Eikon Data API

Cross-Asset Financial Analytics — Discovering Statistical Inefficiencies via Deep Learning

Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | @dyjh | training@tpq.io

The Agenda

This tutorial shows

  • how to retrieve historical data across asset classes via the Eikon Data API,
  • how to work with such data using pandas, Plotly and Cufflinks and
  • how to discover statistical inefficiencies in financial time series with machine learning.

Importing Required Packages

In [1]:
import time
import eikon as ek  # the Eikon package
import numpy as np  # NumPy
import pandas as pd  # pandas
import cufflinks as cf  # Cufflinks
import configparser as cp

The following Python and package versions are used.

In [2]:
import sys
print(sys.version)
3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 13:42:17) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
In [3]:
ek.__version__
Out[3]:
'1.0.1'
In [4]:
np.__version__
Out[4]:
'1.17.2'
In [5]:
pd.__version__
Out[5]:
'0.25.2'
In [6]:
cf.__version__
Out[6]:
'0.17.0'

Connecting to Eikon Data API

This code reads the app_id used to connect to the Eikon Data API Proxy, which needs to be running locally.

In [7]:
cfg = cp.ConfigParser()
cfg.read('eikon.cfg')
Out[7]:
['eikon.cfg']
In [8]:
# ek.set_app_key(cfg['eikon']['app_id'])
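The eikon.cfg file is assumed to have the following minimal structure, with a placeholder instead of a real application key:

[eikon]
app_id = YOUR_APP_KEY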

Retrieving Cross-Asset Data

We first define a small universe of RICs (Reuters Instrument Codes) for which to retrieve data.

In [9]:
rics = [
    'GE',  # General Electric stock
    'AAPL.O',  # Apple stock
    '.SPX',  # S&P 500 stock index
    '.VIX',  # VIX volatility index
    'EUR=',  # EUR/USD exchange rate
    'XAU=',  # Gold price
    'GLD',  # Gold ETF
    'BTC=',  # Bitcoin in USD
]

Second, end-of-day (EOD) data is retrieved.

In [10]:
fn = 'eikon_eod_data.csv'
try:
    data = pd.read_csv(fn, index_col=0, parse_dates=True)
except FileNotFoundError:  # no cached file, retrieve via the API
    first = True
    for ric in rics:
        print(ric)
        d = ek.get_timeseries(ric,  # the RIC
                              fields='CLOSE',  # the required fields
                              start_date='2015-01-01',  # start date
                              end_date='2019-09-30')  # end date
        if first:
            data = d
            data.columns = [ric]
            first = False
        else:
            data[ric] = d
        time.sleep(2)
    data.to_csv(fn)
In [11]:
data.head()  # first five rows
Out[11]:
GE AAPL.O .SPX .VIX EUR= XAU= GLD BTC=
Date
2015-01-02 24.088950 109.33 2058.20 17.79 1.2000 1188.68 114.08 315.41
2015-01-05 23.646775 106.25 2020.58 19.92 1.1931 1203.70 115.80 276.80
2015-01-06 23.137312 106.26 2002.61 21.12 1.1888 1218.45 117.12 276.82
2015-01-07 23.146924 107.75 2025.90 19.31 1.1837 1210.05 116.43 NaN
2015-01-08 23.425687 111.89 2062.14 17.01 1.1792 1208.63 115.94 276.82
In [12]:
data.tail()  # final five rows
Out[12]:
GE AAPL.O .SPX .VIX EUR= XAU= GLD BTC=
Date
2019-09-24 8.97 217.68 2966.60 17.05 1.1018 1531.80 144.51 8575.50
2019-09-25 9.16 221.03 2984.87 15.96 1.0941 1503.75 141.83 8500.98
2019-09-26 9.02 219.89 2977.62 16.07 1.0921 1505.60 141.79 8088.33
2019-09-27 9.04 218.82 2961.79 17.22 1.0938 1496.56 141.06 8055.49
2019-09-30 8.94 223.97 2976.74 16.24 1.0898 1472.00 138.87 8260.00
In [13]:
data.info()  # DataFrame meta information
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1194 entries, 2015-01-02 to 2019-09-30
Data columns (total 8 columns):
GE        1194 non-null float64
AAPL.O    1194 non-null float64
.SPX      1194 non-null float64
.VIX      1194 non-null float64
EUR=      1194 non-null float64
XAU=      1194 non-null float64
GLD       1194 non-null float64
BTC=      1193 non-null float64
dtypes: float64(8)
memory usage: 84.0 KB
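The agenda mentions visualization with Plotly and Cufflinks. As a minimal sketch, the retrieved series might be plotted as follows, normalized to their first values (assuming a Jupyter environment with Plotly offline mode):

cf.set_config_file(offline=True)  # Plotly offline mode
(data / data.iloc[0]).iplot()  # normalized price series as line chart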

Preparing Features

To discover statistical inefficiencies, we work with different features extracted from the time series data. First, we define a function add_lags() that derives the features and adds lagged versions of all data columns.

In [14]:
def add_lags(data, ric, lags):
    cols = []
    df = pd.DataFrame(data[ric])
    df['r'] = np.log(df / df.shift())  # log returns
    df['sma'] = df[ric].rolling(20).mean()  # simple moving average
    df['min'] = df[ric].rolling(20).min()  # rolling minimum
    df['max'] = df[ric].rolling(20).max()  # rolling maximum
    df['mom'] = df['r'].rolling(20).mean()  # momentum (mean log return)
    df['vol'] = df['r'].rolling(20).std()  # rolling volatility
    df.dropna(inplace=True)
    df['d'] = np.where(df['r'] > 0, 1, 0)  # market direction (1 = up, 0 = down)
    features = [ric, 'r', 'd', 'sma', 'min', 'max', 'mom', 'vol']
    for f in features:
        for lag in range(1, lags + 1):
            col = f'{f}_lag_{lag}'  # defines the column name
            df[col] = df[f].shift(lag)  # creates the lagged data column
            cols.append(col)  # stores the column name
    df.dropna(inplace=True)  # gets rid of incomplete data rows
    return df, cols
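As a quick illustration of what add_lags() produces, here is a minimal sketch on synthetic (purely hypothetical) data:

idx = pd.date_range('2019-01-01', periods=100, freq='B')  # business days
synth = pd.DataFrame({'SYNTH': 100 + np.random.standard_normal(100).cumsum()},
                     index=idx)  # random walk as stand-in price series
df_, cols_ = add_lags(synth, 'SYNTH', lags=3)
cols_[:5]  # ['SYNTH_lag_1', 'SYNTH_lag_2', 'SYNTH_lag_3', 'r_lag_1', 'r_lag_2']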

Second, we iterate over all RICs, apply the add_lags() function and store the resulting DataFrame objects in a dictionary.

In [15]:
lags = 7  # historical lags
In [16]:
dfs = {}
for ric in rics:
    print(ric)
    df, cols = add_lags(data, ric, lags)
    dfs[ric] = df.dropna(), cols
GE
AAPL.O
.SPX
.VIX
EUR=
XAU=
GLD
BTC=
In [17]:
dfs.keys()  # the keys of the dictionary
Out[17]:
dict_keys(['GE', 'AAPL.O', '.SPX', '.VIX', 'EUR=', 'XAU=', 'GLD', 'BTC='])
In [18]:
# dfs['AAPL.O'].head(7)

Machine Learning

The matrix consisting of the lagged data columns is used to "predict" the direction of the next day's movement of the RIC (column d) via machine learning.

In [19]:
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
In [20]:
np.set_printoptions(precision=4, suppress=True)
In [21]:
def create_model(algo=4, n_estimators=75):
    base_estimators = [
        GaussianNB(),
        LogisticRegression(C=1),
        DecisionTreeClassifier(random_state=100, max_depth=3,
                               min_samples_leaf=12),
        MLPClassifier(hidden_layer_sizes=2 * [128,],
                      early_stopping=True, shuffle=False,
                      random_state=100,
                      activation='relu',
                      max_iter=250),
        RandomForestClassifier(n_estimators=5, max_depth=3,
                               min_samples_leaf=8, random_state=100)
    ]
    model = BaggingClassifier(base_estimator=base_estimators[algo],
                              n_estimators=n_estimators,
                              bootstrap=True,
                              max_features=0.75,
                              max_samples=0.75,
                              n_jobs=4,
                              random_state=100
                              )
    return model
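The algo parameter selects which base estimator gets bagged; the default of 4 reproduces the random forest setup used in what follows. As an example, bagging logistic regressions instead:

model = create_model(algo=1, n_estimators=50)  # bagged logistic regressions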

First, training and prediction in-sample only.

In [22]:
%%time
for ric in rics:
    model = create_model(n_estimators=10)
    df, cols = dfs[ric]  # getting data for the RIC
    model.fit(df[cols], df['d'])  # the fitting step
    pred = model.predict(df[cols])  # the prediction step
    acc = accuracy_score(df['d'], pred)  # prediction accuracy
    print(f'IN-SAMPLE | {ric:7s} | acc={acc:.4f}')
IN-SAMPLE | GE      | acc=0.5835
IN-SAMPLE | AAPL.O  | acc=0.6195
IN-SAMPLE | .SPX    | acc=0.5938
IN-SAMPLE | .VIX    | acc=0.6127
IN-SAMPLE | EUR=    | acc=0.6427
IN-SAMPLE | XAU=    | acc=0.6401
IN-SAMPLE | GLD     | acc=0.6470
IN-SAMPLE | BTC=    | acc=0.6139
CPU times: user 1.25 s, sys: 753 ms, total: 2 s
Wall time: 1min 6s
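To put the in-sample accuracies into perspective, they can be compared against a naive baseline that always predicts the majority direction; a minimal sketch:

for ric in rics:
    df, cols = dfs[ric]
    base = max(df['d'].mean(), 1 - df['d'].mean())  # majority-class hit ratio
    print(f'BASELINE  | {ric:7s} | acc={base:.4f}')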

Second, training in-sample and prediction out-of-sample.

In [23]:
split = int(len(dfs[ric][0]) * 0.7)  # 70% of the rows for training (based on the last RIC processed)
In [24]:
%%time
for ric in rics:
    model = create_model()
    df, cols = dfs[ric]  # getting data for the RIC
    train = df.iloc[:split].copy()  # training data set
    mu, std = train[cols].mean(), train[cols].std()
    train[cols] = (train[cols] - mu) / std  # normalization
    model.fit(train[cols], train['d'])  # the fitting step
    test = df.iloc[split:].copy()  # test data set
    test[cols] = (test[cols] - mu) / std  # normalization
    pred = model.predict(test[cols])  # the prediction step
    acc = accuracy_score(test['d'], pred)  # prediction accuracy
    print(f'OUT-OF-SAMPLE | {ric:7s} | acc={acc:.4f}')
OUT-OF-SAMPLE | GE      | acc=0.5156
OUT-OF-SAMPLE | AAPL.O  | acc=0.4958
OUT-OF-SAMPLE | .SPX    | acc=0.5212
OUT-OF-SAMPLE | .VIX    | acc=0.5666
OUT-OF-SAMPLE | EUR=    | acc=0.5156
OUT-OF-SAMPLE | XAU=    | acc=0.4986
OUT-OF-SAMPLE | GLD     | acc=0.5042
OUT-OF-SAMPLE | BTC=    | acc=0.4986
CPU times: user 5.29 s, sys: 855 ms, total: 6.15 s
Wall time: 1min 7s
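Whether an out-of-sample accuracy deviates significantly from random guessing (50%) can be checked with a binomial test; a minimal sketch for the last RIC processed, using SciPy (an additional import not used above):

from scipy.stats import binom_test

n = len(test)  # number of out-of-sample predictions
hits = int(round(acc * n))  # number of correct predictions
binom_test(hits, n, 0.5)  # p-value under the null of random guessing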

Analyzing Intraday Data

Let us quickly check whether the results are similar on an intraday basis.

In [25]:
fn = 'eikon_id_data.csv'
try:
    data = pd.read_csv(fn, index_col=0, parse_dates=True)
except FileNotFoundError:  # no cached file, retrieve via the API
    first = True
    for ric in rics:
        print(ric)
        d = ek.get_timeseries(ric,  # the RIC
                              fields='CLOSE',  # the required fields
                              start_date='2019-10-07',  # start date
                              end_date='2019-10-12',  # end date
                              interval='minute')  # bar interval
        if first:
            data = d
            data.columns = [ric]
            first = False
        else:
            data[ric] = d
        time.sleep(2)
    data.to_csv(fn)
In [26]:
data.dropna(inplace=True)  # removes rows with missing values
In [27]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1996 entries, 2019-10-07 13:32:00 to 2019-10-11 20:15:00
Data columns (total 8 columns):
GE        1996 non-null float64
AAPL.O    1996 non-null float64
.SPX      1996 non-null float64
.VIX      1996 non-null float64
EUR=      1996 non-null float64
XAU=      1996 non-null float64
GLD       1996 non-null float64
BTC=      1996 non-null float64
dtypes: float64(8)
memory usage: 140.3 KB
In [28]:
data.tail()
Out[28]:
GE AAPL.O .SPX .VIX EUR= XAU= GLD BTC=
Date
2019-10-11 20:10:00 8.78 236.30 2970.27 15.55 1.1039 1487.8700 140.25 8375.70
2019-10-11 20:11:00 8.77 236.33 2970.27 15.63 1.1038 1487.7975 140.23 8352.60
2019-10-11 20:12:00 8.78 236.40 2970.27 15.56 1.1039 1487.2671 140.20 8369.26
2019-10-11 20:14:00 8.77 236.39 2970.27 15.55 1.1035 1487.2476 140.15 8372.16
2019-10-11 20:15:00 8.78 236.40 2970.27 15.58 1.1037 1486.9347 140.14 8384.40
In [29]:
dfs = {}
for ric in rics:
    df, cols = add_lags(data, ric, lags)
    dfs[ric] = df.dropna(), cols
In [30]:
split = int(len(dfs[ric][0]) * 0.8)  # 80% of the rows for training
In [31]:
%%time
for ric in rics:
    model = create_model(n_estimators=75)
    df, cols = dfs[ric]  # getting data for the RIC
    train = df.iloc[:split].copy()  # training data set
    mu, std = train[cols].mean(), train[cols].std()
    train[cols] = (train[cols] - mu) / std  # normalization
    model.fit(train[cols], train['d'])  # the fitting step
    test = df.iloc[split:].copy()  # test data set
    test[cols] = (test[cols] - mu) / std  # normalization
    pred = model.predict(test[cols])  # the prediction step
    acc = accuracy_score(test['d'], pred)  # prediction accuracy
    print(f'OUT-OF-SAMPLE | {ric:7s} | acc={acc:.4f}')
OUT-OF-SAMPLE | GE      | acc=0.6396
OUT-OF-SAMPLE | AAPL.O  | acc=0.4619
OUT-OF-SAMPLE | .SPX    | acc=0.4898
OUT-OF-SAMPLE | .VIX    | acc=0.4442
OUT-OF-SAMPLE | EUR=    | acc=0.6421
OUT-OF-SAMPLE | XAU=    | acc=0.5127
OUT-OF-SAMPLE | GLD     | acc=0.4848
OUT-OF-SAMPLE | BTC=    | acc=0.6117
CPU times: user 5.04 s, sys: 785 ms, total: 5.83 s
Wall time: 1min 8s

Conclusions

Based on this tutorial, we can conclude that

  • it is easy to retrieve historical end-of-day and intraday data across asset classes via the Eikon Data API,
  • Plotly and Cufflinks make financial data visualization convenient and
  • there is some evidence for statistical inefficiencies based on a machine learning-based prediction approach.

Eikon Data API Developer Resources

Data Item Browser Application: Type DIB into the Eikon Search Bar.