Eikon Data API

Predicting Financial Time Series Movements

Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | @dyjh | training@tpq.io

The Agenda

This tutorial shows

  • how to retrieve historical data across asset classes via the Eikon Data API,
  • how to work with such data using pandas, Plotly and Cufflinks and
  • how to apply machine learning (ML) techniques for time series prediction

Importing Required Packages

In [1]:
import eikon as ek  # the Eikon Python wrapper package
import numpy as np  # NumPy
import pandas as pd  # pandas
import cufflinks as cf  # Cufflinks
from sklearn.svm import SVC  # scikit-learn
from sklearn.model_selection import train_test_split
import configparser as cp

The following Python and package versions are used.

In [2]:
import sys
print(sys.version)
3.6.3 |Anaconda, Inc.| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
In [3]:
ek.__version__
Out[3]:
'0.1.12'
In [4]:
np.__version__
Out[4]:
'1.14.1'
In [5]:
pd.__version__
Out[5]:
'0.22.0'
In [6]:
cf.__version__
Out[6]:
'0.12.1'

Connecting to Eikon Data API

The following code reads the app_id from a configuration file and uses it to connect to the Eikon Data API Proxy, which needs to be running locally.

In [7]:
cfg = cp.ConfigParser()
cfg.read('eikon.cfg')
Out[7]:
['eikon.cfg']
In [8]:
ek.set_app_id(cfg['eikon']['app_id'])
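The configuration file is expected to contain an eikon section with an app_id key, as used in the code above. It might look as follows, with a placeholder instead of a real application ID:

[eikon]
app_id = YOUR_APP_ID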

Retrieving Cross-Asset Data

We first define a small universe of RICs (Reuters Instrument Codes) for which to retrieve data.

In [9]:
rics = [
    'SPY',  # S&P 500 ETF
    'AAPL.O',  # Apple stock
    'AMZN.O'  # Amazon stock
]

Second, intraday data in the form of minute bars is retrieved.

In [10]:
data = ek.get_timeseries(rics,  # the RICs
                         fields='CLOSE',  # the required fields
                         start_date='2018-02-12',  # start date
                         end_date='2018-02-28', # end date
                         interval='minute')  # bar length  
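True end-of-day (EOD) data can be retrieved with the same call by switching to a daily bar length; a sketch with the same RICs and fields (the variable name is illustrative):

data_eod = ek.get_timeseries(rics,  # the RICs
                             fields='CLOSE',  # the required fields
                             start_date='2018-01-01',  # start date
                             end_date='2018-02-28',  # end date
                             interval='daily')  # EOD bars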
In [11]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10253 entries, 2018-02-12 09:01:00 to 2018-02-28 00:00:00
Data columns (total 3 columns):
SPY       9486 non-null float64
AAPL.O    8896 non-null float64
AMZN.O    8752 non-null float64
dtypes: float64(3)
memory usage: 320.4 KB
In [12]:
data.head()  # first five rows
Out[12]:
CLOSE                   SPY  AAPL.O   AMZN.O
Date
2018-02-12 09:01:00  264.71  157.25  1356.42
2018-02-12 09:02:00  264.71  158.30  1356.05
2018-02-12 09:03:00  264.89  158.70  1359.00
2018-02-12 09:04:00  265.02  158.79  1360.00
2018-02-12 09:05:00  265.14  158.70  1360.00
In [13]:
data.tail()  # final five rows
Out[13]:
CLOSE                   SPY  AAPL.O   AMZN.O
Date
2018-02-27 23:56:00     NaN   178.7      NaN
2018-02-27 23:57:00  275.20     NaN  1515.53
2018-02-27 23:58:00  275.21   178.7  1516.00
2018-02-27 23:59:00  275.26   178.7  1515.87
2018-02-28 00:00:00  275.27   178.7  1515.01

Only complete data rows are selected.

In [14]:
data.dropna(inplace=True)  # deletes rows with NaN values
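Dropping incomplete rows is one option; alternatively, missing quotes could be forward filled, which may be preferable when gaps stem from differing trading hours (a judgment call depending on the use case; the variable name is illustrative):

# alternative: carry the last available quote forward instead of dropping rows
data_ffill = data.ffill()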
In [15]:
data.info()  # DataFrame meta information
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7615 entries, 2018-02-12 09:01:00 to 2018-02-28 00:00:00
Data columns (total 3 columns):
SPY       7615 non-null float64
AAPL.O    7615 non-null float64
AMZN.O    7615 non-null float64
dtypes: float64(3)
memory usage: 238.0 KB

Calculating the Log Returns

We next calculate the log returns in vectorized fashion.

In [16]:
rets = np.log(data / data.shift(1)).dropna()  # log returns in vectorized fashion
In [17]:
rets.head()
Out[17]:
CLOSE                     SPY    AAPL.O    AMZN.O
Date
2018-02-12 09:02:00  0.000000  0.006655 -0.000273
2018-02-12 09:03:00  0.000680  0.002524  0.002173
2018-02-12 09:04:00  0.000491  0.000567  0.000736
2018-02-12 09:05:00  0.000453 -0.000567  0.000000
2018-02-12 09:06:00  0.000264  0.000000  0.000000
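The log return for bar t is ln(P_t / P_(t-1)); an equivalent formulation, shown here only as a cross-check, uses the first difference of the log prices:

# equivalent: first difference of the log prices
rets_alt = np.log(data).diff().dropna()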

Plotting the Data

Using Cufflinks, we can plot the normalized financial time series as line plots for comparison.

In [18]:
cf.set_config_file(offline=True)  # set the plotting mode to offline
In [19]:
data.normalize().iplot(kind='lines')
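The normalize call rebases every series to a common starting value so that the time series become directly comparable. Should the helper not be available in a given Cufflinks version, a plain pandas alternative achieves the same effect:

# rebase every series to 1.0 at the first timestamp
(data / data.iloc[0]).iplot(kind='lines')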

Next, the frequency distributions, i.e. the histograms, of the log returns are plotted per RIC.

In [20]:
rets.iplot(kind='histogram', subplots=True)

Preparing Lagged Data

To gain insight into whether the random walk hypothesis holds, we work with five lags. The code that follows derives the lagged data for every single RIC. First, a function that adds columns with lagged, digitized return data (0 for a negative return, 1 otherwise) to a DataFrame object.

In [21]:
def add_lags(data, ric, lags):
    cols = []
    df = pd.DataFrame(data[ric])  # works on the passed-in DataFrame
    for lag in range(1, lags + 1):
        col = 'lag_{}'.format(lag)  # defines the column name
        # creates the lagged, digitized data column
        df[col] = np.digitize(df[ric].shift(lag), bins=[0])
        cols.append(col)  # stores the column name
    df.dropna(inplace=True)  # gets rid of incomplete data rows
    return df, cols
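The np.digitize call with bins=[0] encodes each lagged return as 0 (negative) or 1 (zero or positive); note that the NaN values produced by shifting are also mapped to 1. A quick illustration:

np.digitize([-0.01, 0.0, 0.02, np.nan], bins=[0])
# -> array([0, 1, 1, 1])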

Second, the iteration over all RICs, applying the add_lags function to the log returns and storing the resulting DataFrame objects in a dictionary.

In [22]:
lags = 5  # five historical lags
In [23]:
dfs = {}
for ric in rics:
    df, cols = add_lags(rets, ric, lags)
    dfs[ric] = df
In [24]:
cols  # the column names for the lags
Out[24]:
['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5']
In [25]:
dfs.keys()  # the keys of the dictionary
Out[25]:
dict_keys(['SPY', 'AAPL.O', 'AMZN.O'])
In [26]:
dfs['AAPL.O'].head(7)
Out[26]:
                       AAPL.O  lag_1  lag_2  lag_3  lag_4  lag_5
Date
2018-02-12 09:02:00  0.006655      1      1      1      1      1
2018-02-12 09:03:00  0.002524      1      1      1      1      1
2018-02-12 09:04:00  0.000567      1      1      1      1      1
2018-02-12 09:05:00 -0.000567      1      1      1      1      1
2018-02-12 09:06:00  0.000000      0      1      1      1      1
2018-02-12 09:07:00  0.002454      1      0      1      1      1
2018-02-12 09:10:00 -0.000189      1      1      0      1      1

Implementing ML Algorithm

The matrix consisting of the lagged data columns is used to "predict" the next bar's direction of movement of the RIC via a support vector machine (SVM) algorithm.

In [27]:
for ric in rics:
    model = SVC(C=100) # the ML model
    df = dfs[ric].copy()  # getting data for the RIC
    model.fit(df[cols], np.sign(df[ric]))  # model fitting
    dfs[ric]['position'] = model.predict(df[cols])  # prediction
In [28]:
for ric in rics:
    print('{:10} | {}'.format(ric, dfs[ric]['position'].values[:12]))
SPY        | [-1. -1. -1. -1. -1. -1. -1. -1.  1.  1.  1.  1.]
AAPL.O     | [-1. -1. -1. -1.  1.  1. -1.  1.  1.  1. -1.  1.]
AMZN.O     | [-1.  1. -1. -1. -1. -1. -1.  1.  1. -1. -1.  1.]
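As a quick plausibility check, not part of the original output, the in-sample hit ratio, i.e. the fraction of correctly predicted movement directions, might be computed as follows:

for ric in rics:
    hits = (dfs[ric]['position'] == np.sign(dfs[ric][ric])).mean()
    print('{:10} | {:.4f}'.format(ric, hits))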

Vectorized Backtesting

Let's backtest the performance of the ML-based trading strategies. First, the strategy returns are calculated as the product of the position and the log return.

In [29]:
for ric in rics:
    dfs[ric]['strategy'] = dfs[ric]['position'] * dfs[ric][ric]
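Since these are log returns, summing the strategy returns and applying the exponential function recovers the gross performance factor; a quick numerical check (illustrative only):

for ric in rics:
    gross = np.exp(dfs[ric]['strategy'].sum())  # gross performance factor
    print('{:10} | {:.4f}'.format(ric, gross))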

Second, the visualization of the cumulative performance.

In [30]:
for ric in rics:
    dfs[ric][[ric, 'strategy']].cumsum().apply(np.exp).iplot()

Train-Test Split

Next, to get a more realistic picture of the trading performance to be expected, a random train-test split is used to implement out-of-sample backtesting.

In [31]:
res = {}
for ric in rics:
    model = SVC(C=100) # the ML model
    df = dfs[ric].copy()  # getting data for the RIC
    mu = df[ric].mean()
    v = df[ric].std()
    bins = [mu - v, mu, mu + v]
    # bins = [0]
    train_x, test_x, train_y, test_y = train_test_split(
        df[cols].apply(lambda x: np.digitize(x, bins=bins)),                   
        np.sign(df[ric]), test_size=0.33, random_state=111)
    train_x.sort_index(inplace=True)
    train_y.sort_index(inplace=True)
    test_x.sort_index(inplace=True)
    test_y.sort_index(inplace=True)
    model.fit(train_x, train_y)  # model fitting
    pred = model.predict(test_x)  # prediction
    strat = pred * df[ric][test_y.index]
    res[ric] = pd.DataFrame({ric: df[ric][test_y.index],
                             'pred': pred,
                             'strategy': strat})
In [32]:
res['AAPL.O'].head()
Out[32]:
                       AAPL.O  pred  strategy
Date
2018-02-12 09:04:00  0.000567  -1.0 -0.000567
2018-02-12 09:05:00 -0.000567  -1.0  0.000567
2018-02-12 09:06:00  0.000000   1.0  0.000000
2018-02-12 09:10:00 -0.000189  -1.0  0.000189
2018-02-12 09:27:00 -0.001636   1.0 -0.001636
In [33]:
for ric in rics:
    res[ric][[ric, 'strategy']].cumsum().apply(np.exp).iplot()
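The out-of-sample prediction accuracy per RIC can be checked, for instance, with scikit-learn's accuracy_score (an illustrative addition, not part of the original output):

from sklearn.metrics import accuracy_score
for ric in rics:
    acc = accuracy_score(np.sign(res[ric][ric]), res[ric]['pred'])
    print('{:10} | {:.4f}'.format(ric, acc))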

Conclusions

Based on this tutorial, we can conclude that

  • it is easy to retrieve historical end-of-day and intraday data across asset classes via the Eikon Data API,
  • Plotly and Cufflinks make financial data visualization convenient,
  • machine learning (ML) techniques are easily applied with Python and
  • such techniques might be helpful in predicting the direction of market movements.

Eikon Data API Developer Resources

Data Item Browser Application: Type DIB into Eikon Search Bar.