Eikon Data API

Predicting Financial Time Series Movements

Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | @dyjh | training@tpq.io

The Agenda

This tutorial shows

  • how to retrieve historical data across asset classes via the Eikon Data API,
  • how to work with such data using pandas, Plotly, and Cufflinks, and
  • how to apply machine learning (ML) techniques for time series prediction.

Importing Required Packages

In [1]:
import eikon as ek  # the Eikon Python wrapper package
import numpy as np  # NumPy
import pandas as pd  # pandas
import cufflinks as cf  # Cufflinks
from sklearn.svm import SVC  # scikit-learn
from sklearn.model_selection import train_test_split
import configparser as cp

The following Python and package versions are used.

In [2]:
import sys
print(sys.version)
3.6.3 |Anaconda, Inc.| (default, Oct  6 2017, 12:04:38) 
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
In [3]:
ek.__version__
Out[3]:
'0.1.12'
In [4]:
np.__version__
Out[4]:
'1.14.1'
In [5]:
pd.__version__
Out[5]:
'0.22.0'
In [6]:
cf.__version__
Out[6]:
'0.12.1'

Connecting to Eikon Data API

This code sets the app_id used to connect to the Eikon Data API Proxy, which needs to be running locally.

In [7]:
cfg = cp.ConfigParser()
cfg.read('eikon.cfg')
Out[7]:
['eikon.cfg']
In [8]:
ek.set_app_id(cfg['eikon']['app_id'])
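
The lookup cfg['eikon']['app_id'] implies a configuration file with an [eikon] section and an app_id key. The following is a minimal sketch of that layout, parsed from a string instead of a file so that it runs without eikon.cfg; the value YOUR_APP_ID is a placeholder and the file content is an assumption inferred from the lookup, not the author's actual file.

```python
import configparser as cp

# Hypothetical file content; the section name and key mirror the lookup
# cfg['eikon']['app_id'] used above. The value is a placeholder, not a real ID.
sample = """
[eikon]
app_id = YOUR_APP_ID
"""

cfg = cp.ConfigParser()
cfg.read_string(sample)  # read_string() parses text; read() parses files
app_id = cfg['eikon']['app_id']
print(app_id)
```

Keeping the ID in a separate file keeps the credential out of the notebook itself.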

Retrieving Cross-Asset Data

We first define a small universe of RICs (Reuters Instrument Codes) for which to retrieve data.

In [9]:
rics = [
    'SPY',  # S&P 500 ETF
    'AAPL.O',  # Apple stock
    'AMZN.O'  # Amazon stock
]

Second, intraday data (one-minute bars) is retrieved.

In [10]:
data = ek.get_timeseries(rics,  # the RICs
                         fields='CLOSE',  # the required fields
                         start_date='2018-02-12',  # start date
                         end_date='2018-02-28',  # end date
                         interval='minute')  # bar length
In [11]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 10253 entries, 2018-02-12 09:01:00 to 2018-02-28 00:00:00
Data columns (total 3 columns):
SPY       9486 non-null float64
AAPL.O    8896 non-null float64
AMZN.O    8752 non-null float64
dtypes: float64(3)
memory usage: 320.4 KB
In [12]:
data.head()  # first five rows
Out[12]:
CLOSE                   SPY  AAPL.O   AMZN.O
Date
2018-02-12 09:01:00  264.71  157.25  1356.42
2018-02-12 09:02:00  264.71  158.30  1356.05
2018-02-12 09:03:00  264.89  158.70  1359.00
2018-02-12 09:04:00  265.02  158.79  1360.00
2018-02-12 09:05:00  265.14  158.70  1360.00
In [13]:
data.tail()  # final five rows
Out[13]:
CLOSE                   SPY  AAPL.O   AMZN.O
Date
2018-02-27 23:56:00     NaN   178.7      NaN
2018-02-27 23:57:00  275.20     NaN  1515.53
2018-02-27 23:58:00  275.21   178.7  1516.00
2018-02-27 23:59:00  275.26   178.7  1515.87
2018-02-28 00:00:00  275.27   178.7  1515.01

Only complete data rows are selected.

In [14]:
data.dropna(inplace=True)  # deletes rows with NaN values
In [15]:
data.info()  # DataFrame meta information
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 7615 entries, 2018-02-12 09:01:00 to 2018-02-28 00:00:00
Data columns (total 3 columns):
SPY       7615 non-null float64
AAPL.O    7615 non-null float64
AMZN.O    7615 non-null float64
dtypes: float64(3)
memory usage: 238.0 KB
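
To make the effect of dropna() concrete, here is a small sketch on a toy DataFrame. The values are made up to resemble the tail of the data set and are not retrieved from Eikon; the names toy and complete are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Made-up prices for illustration only; not retrieved from Eikon.
idx = pd.date_range('2018-02-27 23:56', periods=4, freq='min')
toy = pd.DataFrame({'SPY': [np.nan, 275.20, 275.21, 275.26],
                    'AAPL.O': [178.7, np.nan, 178.7, 178.7]},
                   index=idx)

# By default, dropna() removes every row that has at least one NaN value.
complete = toy.dropna()
print(len(toy), len(complete))
```

Rows are dropped as soon as a single instrument has no bar for that minute, which is why the row count in the tutorial falls from 10253 to 7615.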

Calculating the Log Returns

We next calculate the log returns in vectorized fashion.

In [16]:
rets = np.log(data / data.shift(1)).dropna()  # log returns in vectorized fashion
In [17]:
rets.head()
Out[17]:
CLOSE                     SPY    AAPL.O    AMZN.O
Date
2018-02-12 09:02:00  0.000000  0.006655 -0.000273
2018-02-12 09:03:00  0.000680  0.002524  0.002173
2018-02-12 09:04:00  0.000491  0.000567  0.000736
2018-02-12 09:05:00  0.000453 -0.000567  0.000000
2018-02-12 09:06:00  0.000264  0.000000  0.000000
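
The log return of a bar is log(P_t / P_{t-1}), where P_t is the closing price of bar t. The sketch below applies the same expression to the first five SPY closes shown in the data.head() output and checks one useful property: log returns add up over time. The variable names prices, log_rets and total are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

# SPY closes taken from the data.head() output above.
prices = pd.Series([264.71, 264.71, 264.89, 265.02, 265.14])

# Same expression as in the tutorial: shift(1) aligns each price with
# its predecessor, and dropna() removes the first (undefined) return.
log_rets = np.log(prices / prices.shift(1)).dropna()

# Log returns are additive: their sum equals the log return
# between the first and the last price.
total = np.log(prices.iloc[-1] / prices.iloc[0])
print(np.isclose(log_rets.sum(), total))
```

This additivity is one reason log returns are often preferred over simple returns when working with time series.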

Plotting the Data

Using Cufflinks, we can plot the normalized financial time series as line plots for comparison.

In [18]:
cf.set_config_file(offline=True)  # set the plotting mode to offline
In [19]:
data.normalize().iplot(kind='lines')
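
Normalizing puts series with very different price levels on a common scale. The following pandas-only sketch rebases every column to 100 at its first observation; that this matches the default scaling of the Cufflinks normalize() method is an assumption here, so treat it as an illustration of the idea, not a description of the library internals. The toy DataFrame uses the first and last complete rows from the outputs above.

```python
import pandas as pd

# First and last complete rows from the outputs above, used as a toy data set.
toy = pd.DataFrame({'SPY': [264.71, 275.27],
                    'AAPL.O': [157.25, 178.7],
                    'AMZN.O': [1356.42, 1515.01]})

# Assumption: rebase each column so that it starts at 100.
normalized = toy / toy.iloc[0] * 100
print(normalized.round(2))
```

After rebasing, the value in any later row can be read directly as the level relative to the starting point of the sample.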