Eikon Data API¶

Predicting Financial Time Series Movements

Dr. Yves J. Hilpisch | The Python Quants GmbH

The Agenda¶

This tutorial shows

how to retrieve historical data across asset classes via the Eikon Data API,
how to work with such data using pandas, Plotly and Cufflinks and
how to apply machine learning (ML) techniques for time series prediction

Importing Required Packages¶

import eikon as ek  # the Eikon Python wrapper package
import numpy as np  # NumPy
import pandas as pd  # pandas
import cufflinks as cf  # Cufflinks
from sklearn.svm import SVC  # scikit-learn
from sklearn.model_selection import train_test_split
import configparser as cp

The following Python and package versions are used.

import sys
print(sys.version)

3.6.3 |Anaconda, Inc.| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]

ek.__version__

'0.1.12'

np.__version__

'1.14.1'

pd.__version__

'0.22.0'

cf.__version__

'0.12.1'

Connecting to Eikon Data API¶

The following code reads the app_id from a configuration file and sets it to connect to the Eikon Data API Proxy, which needs to be running locally.

cfg = cp.ConfigParser()
cfg.read('eikon.cfg')

['eikon.cfg']

ek.set_app_id(cfg['eikon']['app_id'])
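The lookup above expects a configuration file with an [eikon] section that holds the app_id. A minimal sketch of that layout follows; it is parsed from a string so that it runs without a file on disk, and YOUR_APP_ID is a placeholder, not a real identifier.

```python
import configparser as cp

# Assumed layout of eikon.cfg; YOUR_APP_ID is a placeholder value.
cfg_text = """
[eikon]
app_id = YOUR_APP_ID
"""

cfg = cp.ConfigParser()
cfg.read_string(cfg_text)  # parses text instead of reading a file

app_id = cfg['eikon']['app_id']  # same lookup as in the code above
print(app_id)
```

Keeping the app_id in a separate file avoids hard-coding it in the notebook.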

Retrieving Cross-Asset Data¶

We first define a small universe of RICs (Reuters Instrument Codes) for which to retrieve data.

rics = [
    'SPY',  # S&P 500 ETF
    'AAPL.O',  # Apple stock
    'AMZN.O'  # Amazon stock
]

Second, intraday data in the form of one-minute bars is retrieved.

data = ek.get_timeseries(rics,  # the RICs
                         fields='CLOSE',  # the required fields
                         start_date='2018-02-12',  # start date
                         end_date='2018-02-28',  # end date
                         interval='minute')  # bar length

data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10253 entries, 2018-02-12 09:01:00 to 2018-02-28 00:00:00
Data columns (total 3 columns):
SPY       9486 non-null float64
AAPL.O    8896 non-null float64
AMZN.O    8752 non-null float64
dtypes: float64(3)
memory usage: 320.4 KB

data.head()  # first five rows

data.tail()  # final five rows

Only complete data rows are selected.

data.dropna(inplace=True)  # deletes rows with NaN values

data.info()  # DataFrame meta information

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7615 entries, 2018-02-12 09:01:00 to 2018-02-28 00:00:00
Data columns (total 3 columns):
SPY       7615 non-null float64
AAPL.O    7615 non-null float64
AMZN.O    7615 non-null float64
dtypes: float64(3)
memory usage: 238.0 KB
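To illustrate what the row selection does, the following sketch applies dropna() to the five rows that data.tail() returns in this tutorial. It uses the non-in-place variant so that the original frame stays available for comparison.

```python
import numpy as np
import pandas as pd

# Final five rows of the retrieved data, as shown by data.tail().
tail = pd.DataFrame(
    {'SPY': [np.nan, 275.20, 275.21, 275.26, 275.27],
     'AAPL.O': [178.7, np.nan, 178.7, 178.7, 178.7],
     'AMZN.O': [np.nan, 1515.53, 1516.00, 1515.87, 1515.01]},
    index=pd.to_datetime(['2018-02-27 23:56', '2018-02-27 23:57',
                          '2018-02-27 23:58', '2018-02-27 23:59',
                          '2018-02-28 00:00']))

complete = tail.dropna()  # keeps rows in which every column has a value
print(complete)
```

The two rows with at least one NaN value are removed, which leaves the three complete rows.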

Calculating the Log Returns¶

We next calculate the log returns in vectorized fashion.

rets = np.log(data / data.shift(1)).dropna()  # log returns in vectorized fashion

rets.head()
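A small sketch of the same calculation for a single series, using the first five SPY prices from the data.head() output. It also checks a property that makes log returns convenient for backtesting: they add up over time.

```python
import numpy as np
import pandas as pd

# First five SPY closing prices from the data.head() output.
prices = pd.Series([264.71, 264.71, 264.89, 265.02, 265.14])

log_rets = np.log(prices / prices.shift(1)).dropna()

# Log returns are additive: their sum equals the log of the ratio
# of the final price to the initial price.
total = log_rets.sum()
direct = np.log(prices.iloc[-1] / prices.iloc[0])
print(total, direct)
```

The first row is lost because shift(1) has no predecessor for it, which is why dropna() follows the calculation.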

Plotting the Data¶

Using Cufflinks, we can plot the normalized financial time series as line plots for comparison.

cf.set_config_file(offline=True)  # set the plotting mode to offline

data.normalize().iplot(kind='lines')
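Normalization makes series with very different price levels comparable in a single plot. The following is a plain pandas sketch of the idea, under the assumption that normalize() rescales every column to a common starting value of 100; the exact behavior of the Cufflinks method is not verified here.

```python
import pandas as pd

# A few prices from the data.head() output.
prices = pd.DataFrame({'SPY': [264.71, 264.89, 265.14],
                       'AMZN.O': [1356.42, 1359.00, 1360.00]})

# Assumption: rescale every column to a starting value of 100.
normalized = prices / prices.iloc[0] * 100
print(normalized.round(2))
```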

The frequency distributions of the log returns per RIC can be visualized as histograms.

rets.iplot(kind='histogram', subplots=True)

Preparing Lagged Data¶

To gain insights into whether the random walk hypothesis holds true, we work with five lags of the log returns, each reduced to a binary feature (0 for a negative return, 1 otherwise). The code that follows derives the lagged data for every single RIC. First, we define a function that adds columns with lagged data to a DataFrame object.

def add_lags(data, ric, lags):
    cols = []
    # note: the returns are read from the global rets object;
    # the data argument is not used in the function body
    df = pd.DataFrame(rets[ric])
    for lag in range(1, lags + 1):
        col = 'lag_{}'.format(lag)  # defines the column name
        # creates the lagged data column (0 = negative, 1 = otherwise)
        df[col] = np.digitize(df[ric].shift(lag), bins=[0])
        cols.append(col)  # stores the column name
    df.dropna(inplace=True)  # gets rid of incomplete data rows
    return df, cols
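The binarization step inside the function can be looked at in isolation. The sketch below applies np.digitize with bins=[0] to three AAPL.O log returns from this tutorial plus one NaN value, which stands in for an entry made missing by shift().

```python
import numpy as np

# Three log returns and one missing value.
x = np.array([-0.000567, 0.0, 0.002454, np.nan])

# With bins=[0], values below 0 fall into bin 0 and values of 0 or
# above fall into bin 1; NaN is sorted last and also lands in bin 1.
codes = np.digitize(x, bins=[0])
print(codes)
```

Because NaN is mapped to a bin as well, the lag columns contain no missing values after this step.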

Second, we iterate over all RICs, apply the add_lags function and store the resulting DataFrame objects in a dictionary.

lags = 5  # five historical lags

dfs = {}
for ric in rics:
    df, cols = add_lags(data, ric, lags)
    dfs[ric] = df

cols  # the column names for the lags

['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5']

dfs.keys()  # the keys of the dictionary

dict_keys(['SPY', 'AAPL.O', 'AMZN.O'])

dfs['AAPL.O'].head(7)

Implementing ML Algorithm¶

The matrix consisting of the lagged data columns is used to "predict" the direction of movement of the RIC over the next one-minute bar via a support vector machine (SVM) algorithm. Model fitting and prediction use the same data, so the results are in-sample.

for ric in rics:
    model = SVC(C=100)  # the ML model
    df = dfs[ric].copy()  # getting data for the RIC
    model.fit(df[cols], np.sign(df[ric]))  # model fitting
    dfs[ric]['position'] = model.predict(df[cols])  # prediction

for ric in rics:
    print('{:10} | {}'.format(ric, dfs[ric]['position'].values[:12]))

SPY        | [-1. -1. -1. -1. -1. -1. -1. -1.  1.  1.  1.  1.]
AAPL.O     | [-1. -1. -1. -1.  1.  1. -1.  1.  1.  1. -1.  1.]
AMZN.O     | [-1.  1. -1. -1. -1. -1. -1.  1.  1. -1. -1.  1.]
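One simple way to judge such predictions is the hit ratio, i.e. the share of bars for which the predicted direction matches the sign of the realized log return. The sketch below uses hypothetical numbers for illustration, not values from the data set above.

```python
import numpy as np

# Hypothetical log returns and predicted positions.
rets = np.array([0.0012, -0.0007, 0.0003, -0.0010, 0.0005, 0.0002])
position = np.array([1., -1., -1., -1., 1., -1.])

# A prediction counts as a hit if its sign matches that of the return.
hits = np.sign(rets) == position
hit_ratio = hits.mean()
print(hit_ratio)
```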

Vectorized Backtesting¶

Let's backtest the performance of the ML-based trading strategies. First, we calculate the strategy returns.

for ric in rics:
    dfs[ric]['strategy'] = dfs[ric]['position'] * dfs[ric][ric]

Second, we visualize the cumulative performance.

for ric in rics:
    dfs[ric][[ric, 'strategy']].cumsum().apply(np.exp).iplot()
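The arithmetic behind the two backtesting steps can be traced on a tiny example. The numbers and the column name returns below are hypothetical; the calculation mirrors the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical log returns and positions (+1 long, -1 short).
df = pd.DataFrame({'returns': [0.010, -0.020, 0.015],
                   'position': [1, -1, -1]})

# Strategy log return: position times the instrument's log return.
df['strategy'] = df['position'] * df['returns']

# Summing log returns and exponentiating gives the gross performance.
perf = df[['returns', 'strategy']].cumsum().apply(np.exp)
print(perf)
```

A short position turns a negative return into a gain and a positive return into a loss, which is why the strategy column differs from the returns column in the last two rows.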

Train Test Split¶

Next, to get a more realistic picture of the trading performance to be expected, we use a random train-test split to implement out-of-sample backtesting.

res = {}
for ric in rics:
    model = SVC(C=100)  # the ML model
    df = dfs[ric].copy()  # getting data for the RIC
    mu = df[ric].mean()
    v = df[ric].std()
    bins = [mu - v, mu, mu + v]
    # bins = [0]
    train_x, test_x, train_y, test_y = train_test_split(
        df[cols].apply(lambda x: np.digitize(x, bins=bins)),
        np.sign(df[ric]), test_size=0.33, random_state=111)
    train_x.sort_index(inplace=True)
    train_y.sort_index(inplace=True)
    test_x.sort_index(inplace=True)
    test_y.sort_index(inplace=True)
    model.fit(train_x, train_y)  # model fitting
    pred = model.predict(test_x)  # prediction
    strat = pred * df[ric][test_y.index]
    res[ric] = pd.DataFrame({ric: df[ric][test_y.index],
                             'pred': pred,
                             'strategy': strat})

res['AAPL.O'].head()

for ric in rics:
    res[ric][[ric, 'strategy']].cumsum().apply(np.exp).iplot()
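The random split above places earlier and later bars in both the training and the test set. For time series, a chronological split, in which every test observation lies after all training observations, is a common alternative. The following is a sketch of that alternative, not the method used in this tutorial; the frame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical minute-bar frame with one feature and one label column.
idx = pd.date_range('2018-02-12 09:01', periods=9, freq='min')
df = pd.DataFrame({'lag_1': [1, 0, 1, 1, 0, 1, 0, 0, 1],
                   'direction': [1, -1, 1, 1, -1, -1, 1, -1, 1]},
                  index=idx)

split = int(len(df) * 0.67)  # first 67% for training, rest for testing
train, test = df.iloc[:split], df.iloc[split:]

print(len(train), len(test))
```

With such a split, the model is only ever evaluated on bars that come after the data it was fitted on.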

Conclusions¶

Based on this tutorial, we can conclude that

it is easy to retrieve historical end-of-day and intraday data across asset classes via the Eikon Data API,
Plotly and Cufflinks make financial data visualization convenient,
machine learning (ML) techniques are easily applied with Python and
such techniques might be helpful in predicting the direction of market movements.

Eikon Data API Developer Resources¶

Data Item Browser Application: Type DIB into Eikon Search Bar.

Output of data.head():

CLOSE	SPY	AAPL.O	AMZN.O
Date
2018-02-12 09:01:00	264.71	157.25	1356.42
2018-02-12 09:02:00	264.71	158.30	1356.05
2018-02-12 09:03:00	264.89	158.70	1359.00
2018-02-12 09:04:00	265.02	158.79	1360.00
2018-02-12 09:05:00	265.14	158.70	1360.00

Output of data.tail():

CLOSE	SPY	AAPL.O	AMZN.O
Date
2018-02-27 23:56:00	NaN	178.7	NaN
2018-02-27 23:57:00	275.20	NaN	1515.53
2018-02-27 23:58:00	275.21	178.7	1516.00
2018-02-27 23:59:00	275.26	178.7	1515.87
2018-02-28 00:00:00	275.27	178.7	1515.01

Output of rets.head():

CLOSE	SPY	AAPL.O	AMZN.O
Date
2018-02-12 09:02:00	0.000000	0.006655	-0.000273
2018-02-12 09:03:00	0.000680	0.002524	0.002173
2018-02-12 09:04:00	0.000491	0.000567	0.000736
2018-02-12 09:05:00	0.000453	-0.000567	0.000000
2018-02-12 09:06:00	0.000264	0.000000	0.000000

Output of dfs['AAPL.O'].head(7):

	AAPL.O	lag_1	lag_2	lag_3	lag_4	lag_5
Date
2018-02-12 09:02:00	0.006655	1	1	1	1	1
2018-02-12 09:03:00	0.002524	1	1	1	1	1
2018-02-12 09:04:00	0.000567	1	1	1	1	1
2018-02-12 09:05:00	-0.000567	1	1	1	1	1
2018-02-12 09:06:00	0.000000	0	1	1	1	1
2018-02-12 09:07:00	0.002454	1	0	1	1	1
2018-02-12 09:10:00	-0.000189	1	1	0	1	1

Output of res['AAPL.O'].head():

	AAPL.O	pred	strategy
Date
2018-02-12 09:04:00	0.000567	-1.0	-0.000567
2018-02-12 09:05:00	-0.000567	-1.0	0.000567
2018-02-12 09:06:00	0.000000	1.0	0.000000
2018-02-12 09:10:00	-0.000189	-1.0	0.000189
2018-02-12 09:27:00	-0.001636	1.0	-0.001636