The Python Quants

FXCM Algorithmic Trading Initiative

Algo Trading: Historical Data & Python Wrapper

Dr. Yves J. Hilpisch

The Python Quants GmbH

Risk Disclaimer

Trading forex/CFDs on margin carries a high level of risk and may not be suitable for all investors as you could sustain losses in excess of deposits. Leverage can work against you. Due to the certain restrictions imposed by the local law and regulation, German resident retail client(s) could sustain a total loss of deposited funds but are not subject to subsequent payment obligations beyond the deposited funds. Be aware and fully understand all risks associated with the market and trading. Prior to trading any products, carefully consider your financial situation and experience level. Any opinions, news, research, analyses, prices, or other information is provided as general market commentary, and does not constitute investment advice. FXCM & TPQ will not accept liability for any loss or damage, including without limitation to, any loss of profit, which may arise directly or indirectly from use of or reliance on such information.

Speaker Disclaimer

The speaker is neither an employee, agent nor representative of FXCM and is therefore acting independently. The opinions given are their own, constitute general market commentary, and do not constitute the opinion or advice of FXCM or any form of personal or investment advice. FXCM assumes no responsibility for any loss or damage, including but not limited to, any loss or gain arising out of the direct or indirect use of this or any other content. Trading forex/CFDs on margin carries a high level of risk and may not be suitable for all investors as you could sustain losses in excess of deposits.

Some Basic Imports

In [1]:
import time
import numpy as np
import pandas as pd
import datetime as dt
import cufflinks as cf
from pylab import plt
cf.set_config_file(offline=True)
plt.style.use('seaborn')
%matplotlib inline

You can install fxcmpy via

pip install fxcmpy

The documentation is currently available at http://fxcmpy.tpq.io

Read the license & risk warning carefully.

In [2]:
import fxcmpy
In [3]:
fxcmpy.__version__
Out[3]:
'1.1.11'

Retrieving Historical Tick Data

Available Symbols

In [4]:
from fxcmpy import fxcmpy_tick_data_reader as tdr
In [5]:
print(tdr.get_available_symbols())
('AUDCAD', 'AUDCHF', 'AUDJPY', 'AUDNZD', 'CADCHF', 'EURAUD', 'EURCHF', 'EURGBP', 'EURJPY', 'EURUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'GBPUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'NZDCAD', 'NZDCHF', 'NZDJPY', 'NZDUSD', 'USDCAD', 'USDCHF', 'USDJPY')

Retrieving Tick Data

In [6]:
start = dt.datetime(2018, 3, 26)
stop = dt.datetime(2018, 3, 29)
In [7]:
%time td = tdr('EURUSD', start, stop)
Fetching data from: https://tickdata.fxcorporate.com/EURUSD/2018/13.csv.gz
CPU times: user 1.71 s, sys: 388 ms, total: 2.1 s
Wall time: 5.56 s
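The log line above suggests that tick data is stored in one compressed file per symbol, year and week. The following is a minimal sketch of how such URLs could be derived from a date range. It assumes that the week number in the URL is the ISO calendar week, which matches the log output in this notebook but is not confirmed by the fxcmpy documentation; the helper name weekly_tick_urls and the URL template are illustrative and not part of fxcmpy.

```python
import datetime as dt

# Assumed URL template, inferred from the "Fetching data from" log line.
TICK_URL = 'https://tickdata.fxcorporate.com/{symbol}/{year}/{week}.csv.gz'


def weekly_tick_urls(symbol, start, stop):
    """Return one URL per ISO calendar week touched by [start, stop]."""
    urls = []
    seen = set()
    day = start.date()
    while day <= stop.date():
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        key = (iso[0], iso[1])
        if key not in seen:
            seen.add(key)
            urls.append(TICK_URL.format(symbol=symbol, year=key[0], week=key[1]))
        day += dt.timedelta(days=1)
    return urls


urls = weekly_tick_urls('EURUSD', dt.datetime(2018, 3, 26), dt.datetime(2018, 3, 29))
print(urls)
```

Under this assumption, a request for a few days still loads the whole weekly file, which would explain why the raw data spans the full trading week.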
In [8]:
type(td)
Out[8]:
fxcmpy.fxcmpy_data_reader.fxcmpy_tick_data_reader
In [9]:
td.get_raw_data().info()
<class 'pandas.core.frame.DataFrame'>
Index: 1036420 entries, 03/25/2018 21:00:13.230 to 03/30/2018 20:59:31.180
Data columns (total 2 columns):
Bid    1036420 non-null float64
Ask    1036420 non-null float64
dtypes: float64(2)
memory usage: 23.7+ MB

Working with the Tick Data

In [10]:
%time td.get_data().info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1036420 entries, 2018-03-25 21:00:13.230000 to 2018-03-30 20:59:31.180000
Data columns (total 2 columns):
Bid    1036420 non-null float64
Ask    1036420 non-null float64
dtypes: float64(2)
memory usage: 23.7 MB
CPU times: user 3.96 s, sys: 28.9 ms, total: 3.98 s
Wall time: 3.98 s
In [11]:
%time td.get_data().info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1036420 entries, 2018-03-25 21:00:13.230000 to 2018-03-30 20:59:31.180000
Data columns (total 2 columns):
Bid    1036420 non-null float64
Ask    1036420 non-null float64
dtypes: float64(2)
memory usage: 23.7 MB
CPU times: user 6.91 ms, sys: 2.09 ms, total: 9 ms
Wall time: 6.06 ms
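The raw data carries a plain string index, while get_data() returns a DatetimeIndex. The first call takes about four seconds and the second only a few milliseconds, which suggests (as an assumption, not something the source states) that the conversion happens once and the result is cached. The sketch below shows the conversion step itself with plain pandas; the two rows are illustrative and only mimic the timestamp format shown above.

```python
import pandas as pd

# Two illustrative rows using the string timestamp format of the raw data.
raw = pd.DataFrame(
    {'Bid': [1.23911, 1.23910], 'Ask': [1.23912, 1.23904]},
    index=['03/25/2018 21:00:13.230', '03/30/2018 20:59:31.180'])

parsed = raw.copy()
# An explicit format string avoids format inference during parsing.
parsed.index = pd.to_datetime(raw.index, format='%m/%d/%Y %H:%M:%S.%f')
print(parsed.index)
```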
In [12]:
sub = td.get_data(start='2018-03-27 12:00:00', end='2018-03-27 16:00:00')
In [13]:
sub.head()
Out[13]:
Bid Ask
2018-03-27 12:00:00.006 1.23911 1.23912
2018-03-27 12:00:00.209 1.23911 1.23911
2018-03-27 12:00:00.234 1.23911 1.23908
2018-03-27 12:00:00.243 1.23911 1.23904
2018-03-27 12:00:00.280 1.23910 1.23904
In [14]:
sub['Mid'] = sub.mean(axis=1)
sub['SMA'] = sub['Mid'].rolling(1000).mean()
In [15]:
sub[['Mid', 'SMA']].iplot()
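A small, self-contained variant of the mid price and moving average calculation may help when adapting the cell above. It uses made-up quotes and selects the Bid and Ask columns explicitly, so that re-running the cell does not average over the derived columns as well. The window of 3 is chosen only to keep the example short.

```python
import numpy as np
import pandas as pd

# Made-up quotes, for illustration only.
ticks = pd.DataFrame({'Bid': [1.0, 1.2, 1.4, 1.6],
                      'Ask': [1.2, 1.4, 1.6, 1.8]})

ticks = ticks.copy()  # work on an explicit copy rather than a slice
ticks['Mid'] = ticks[['Bid', 'Ask']].mean(axis=1)  # average of bid and ask only
ticks['SMA'] = ticks['Mid'].rolling(3).mean()      # NaN until the window is full
print(ticks)
```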

Reading Historical Candles Data

Available Symbols

In [16]:
from fxcmpy import fxcmpy_candles_data_reader as cdr
In [17]:
print(cdr.get_available_symbols())
('AUDCAD', 'AUDCHF', 'AUDJPY', 'AUDNZD', 'CADCHF', 'EURAUD', 'EURCHF', 'EURGBP', 'EURJPY', 'EURUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'GBPUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'NZDCAD', 'NZDCHF', 'NZDJPY', 'NZDUSD', 'USDCAD', 'USDCHF', 'USDJPY')

Retrieving Candles Data

In [18]:
start = dt.datetime(2018, 3, 1)
stop = dt.datetime(2018, 3, 25)

The period parameter must be one of m1, H1 or D1 (one-minute, one-hour or one-day candles).

In [19]:
period = 'H1'
In [20]:
candles = cdr('EURUSD', start, stop, period)
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/9.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/10.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/11.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/12.csv.gz
In [21]:
type(candles)
Out[21]:
fxcmpy.fxcmpy_data_reader.fxcmpy_candles_data_reader
In [22]:
data = candles.get_data()
In [23]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 480 entries, 2018-02-25 22:00:00 to 2018-03-23 20:00:00
Data columns (total 8 columns):
BidOpen     480 non-null float64
BidHigh     480 non-null float64
BidLow      480 non-null float64
BidClose    480 non-null float64
AskOpen     480 non-null float64
AskHigh     480 non-null float64
AskLow      480 non-null float64
AskClose    480 non-null float64
dtypes: float64(8)
memory usage: 33.8 KB
In [24]:
data.head()
Out[24]:
BidOpen BidHigh BidLow BidClose AskOpen AskHigh AskLow AskClose
2018-02-25 22:00:00 1.22931 1.22938 1.22849 1.22871 1.22948 1.22973 1.22862 1.22891
2018-02-25 23:00:00 1.22871 1.22949 1.22810 1.22884 1.22891 1.22952 1.22812 1.22887
2018-02-26 00:00:00 1.22884 1.22933 1.22802 1.22897 1.22887 1.22934 1.22804 1.22900
2018-02-26 01:00:00 1.22897 1.23050 1.22881 1.22987 1.22900 1.23051 1.22882 1.22989
2018-02-26 02:00:00 1.22987 1.23190 1.22981 1.23151 1.22989 1.23191 1.22982 1.23153

With period set to D1, retrieval currently only works for time windows before the current year.

In [25]:
start = dt.datetime(2017, 1, 3)
stop = dt.datetime(2017, 12, 31)
In [26]:
period = 'D1'
candles = cdr('EURUSD', start, stop, period)
Fetching data from: https://candledata.fxcorporate.com/D1/EURUSD/2017.csv.gz
In [27]:
candles.get_data().info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 309 entries, 2017-01-02 22:00:00 to 2017-12-31 22:00:00
Data columns (total 8 columns):
BidOpen     309 non-null float64
BidHigh     309 non-null float64
BidLow      309 non-null float64
BidClose    309 non-null float64
AskOpen     309 non-null float64
AskHigh     309 non-null float64
AskLow      309 non-null float64
AskClose    309 non-null float64
dtypes: float64(8)
memory usage: 21.7 KB

Working with Candles Data

In [28]:
raw = candles.get_data()  # fetch the candles once and reuse them
sub = pd.DataFrame({'Mid': (raw['BidClose'] + raw['AskClose']) / 2})
sub['Returns'] = np.log(sub['Mid'] / sub['Mid'].shift(1))  # daily log returns
sub.head()
Out[28]:
Mid Returns
2017-01-02 22:00:00 1.040565 NaN
2017-01-03 22:00:00 1.048930 0.008007
2017-01-04 22:00:00 1.060675 0.011135
2017-01-05 22:00:00 1.053455 -0.006830
2017-01-07 22:00:00 1.053050 -0.000385
In [29]:
sub.iplot(subplots=True)
In [30]:
sub['Returns'].iplot(kind='histogram')
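As a quick check of the return calculation, the following sketch recomputes the log returns for the first three daily mid closes shown in the table above. It also illustrates why log returns are convenient: they add up over time, so their sum equals the log of the total price ratio.

```python
import numpy as np
import pandas as pd

# First three daily mid closes from the table above.
mid = pd.Series([1.040565, 1.048930, 1.060675])

log_ret = np.log(mid / mid.shift(1))  # first entry is NaN by construction

# Sum of log returns equals the log of last price over first price.
total = np.log(mid.iloc[-1] / mid.iloc[0])
print(log_ret.round(6).tolist(), round(total, 6))
```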

Quant Figures

In [31]:
data = candles.get_data()[['AskOpen', 'AskHigh', 'AskLow', 'AskClose']]
data.columns = ['open', 'high', 'low', 'close']
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 309 entries, 2017-01-02 22:00:00 to 2017-12-31 22:00:00
Data columns (total 4 columns):
open     309 non-null float64
high     309 non-null float64
low      309 non-null float64
close    309 non-null float64
dtypes: float64(4)
memory usage: 12.1 KB
In [32]:
qf = cf.QuantFig(data, title='EUR/USD', legend='top',
                 name='EUR/USD', datalegend=False)
In [33]:
qf.iplot()
In [34]:
qf.add_bollinger_bands(periods=10, boll_std=2,
                       colors=['magenta', 'grey'], fill=True)
qf.data.update()
In [35]:
qf.iplot()
In [36]:
qf.add_rsi(periods=14, showbands=False)
qf.data.update()
In [37]:
qf.iplot()
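For readers who want to see what the Bollinger bands represent numerically, here is a minimal pandas sketch using the same parameters as above (10 periods, 2 standard deviations). It is a generic textbook formulation on synthetic data; the function name bollinger_bands is illustrative, and the exact conventions used internally by cufflinks (for example the degrees of freedom of the standard deviation) may differ.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close, periods=10, boll_std=2):
    """Rolling mean plus/minus boll_std rolling standard deviations."""
    mid = close.rolling(periods).mean()
    std = close.rolling(periods).std()  # pandas default: sample std (ddof=1)
    return pd.DataFrame({'lower': mid - boll_std * std,
                         'mid': mid,
                         'upper': mid + boll_std * std})


close = pd.Series(np.linspace(1.04, 1.20, 30))  # synthetic upward drift
bands = bollinger_bands(close)
print(bands.tail(3))
```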

Backtesting AI-Based Algorithmic Trading Strategy

The following example is simplified and for illustration purposes only. Among other things, it does not consider transaction costs or bid-ask spreads.

Data Retrieval

In [38]:
period = 'H1'
candles = cdr('EURUSD', start, stop, period)
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/1.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/2.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/3.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/4.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/5.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/6.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/7.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/8.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/9.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/10.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/11.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/12.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/13.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/14.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/15.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/16.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/17.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/18.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/19.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/20.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/21.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/22.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/23.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/24.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/25.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/26.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/27.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/28.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/29.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/30.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/31.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/32.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/33.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/34.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/35.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/36.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/37.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/38.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/39.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/40.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/41.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/42.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/43.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/44.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/45.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/46.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/47.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/48.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/49.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/50.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/51.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/52.csv.gz
In [39]:
candles.get_data().info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6184 entries, 2017-01-03 00:00:00 to 2017-12-29 21:00:00
Data columns (total 8 columns):
BidOpen     6184 non-null float64
BidHigh     6184 non-null float64
BidLow      6184 non-null float64
BidClose    6184 non-null float64
AskOpen     6184 non-null float64
AskHigh     6184 non-null float64
AskLow      6184 non-null float64
AskClose    6184 non-null float64
dtypes: float64(8)
memory usage: 434.8 KB
In [40]:
data = pd.DataFrame(candles.get_data()[['AskClose', 'BidClose']].mean(axis=1),
                    columns=['midclose'])
In [41]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6184 entries, 2017-01-03 00:00:00 to 2017-12-29 21:00:00
Data columns (total 1 columns):
midclose    6184 non-null float64
dtypes: float64(1)
memory usage: 96.6 KB
In [42]:
data.tail()
Out[42]:
midclose
2017-12-29 17:00:00 1.202155
2017-12-29 18:00:00 1.201370
2017-12-29 19:00:00 1.200930
2017-12-29 20:00:00 1.199815
2017-12-29 21:00:00 1.200705
In [43]:
data.iplot()

Feature Preparation

In [44]:
data['returns'] = np.log(data['midclose'] / data['midclose'].shift(1))
In [45]:
lags = 5
cols = []
for lag in range(1, lags + 1):
    col = 'lag_%s' % lag
    data[col] = data['returns'].shift(lag)
    cols.append(col)
In [46]:
cols
Out[46]:
['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5']
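The lag construction can be verified on a tiny example. The sketch below uses made-up returns and two lags instead of five; each lag column holds the return observed that many bars earlier, so the features for a given bar only contain past information.

```python
import pandas as pd

# Made-up log returns, for illustration only.
returns = pd.Series([0.01, -0.02, 0.03, -0.01, 0.02, 0.00], name='returns')
frame = returns.to_frame()

lags = 2
cols = []
for lag in range(1, lags + 1):
    col = 'lag_%s' % lag
    frame[col] = frame['returns'].shift(lag)  # value observed `lag` bars earlier
    cols.append(col)

print(frame)
```

The first rows contain NaN values because no earlier returns exist for them, which is why the notebook later drops incomplete rows.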
In [47]:
from pylab import plt
plt.style.use('seaborn')
%matplotlib inline
In [48]:
data['direction'] = np.sign(data['returns'])
to_plot = ['midclose', 'returns', 'direction']
data[to_plot].iloc[-75:].plot(figsize=(10, 6),
        subplots=True, style=['-', '-', 'ro'], title='EUR/USD');
In [49]:
# number of possible sign patterns = 2 ** lags
np.digitize(data[cols], bins=[0])[:10]
Out[49]:
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [1, 0, 1, 1, 1],
       [1, 1, 0, 1, 1],
       [0, 1, 1, 0, 1],
       [1, 0, 1, 1, 0],
       [0, 1, 0, 1, 1],
       [0, 0, 1, 0, 1]])
In [50]:
2 ** len(cols)
Out[50]:
32
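The binarization step can be illustrated in isolation. With a single bin edge at 0, np.digitize maps negative values to 0 and non-negative values to 1. NaN values also end up in the upper bin, which is consistent with the all-ones rows at the top of the output above, where the lags are still undefined. The integer encoding of a pattern at the end is an illustrative addition, not something the notebook itself uses.

```python
import numpy as np

# With bins=[0]: x < 0 maps to 0, x >= 0 maps to 1.
signs = np.digitize(np.array([-0.002, 0.0, 0.001]), bins=[0])

# NaN is sorted after all numbers and therefore falls into the upper bin.
nan_bin = np.digitize(np.array([np.nan]), bins=[0])

# Encode one row of five binary features as an integer pattern id in 0..31.
row = np.array([1, 0, 1, 1, 0])
weights = 2 ** np.arange(len(row))[::-1]  # [16, 8, 4, 2, 1]
pattern_id = int(row @ weights)
print(signs, nan_bin, pattern_id)
```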
In [51]:
data.dropna(inplace=True)

Support Vector Machines

In [52]:
from sklearn import svm
In [53]:
model = svm.SVC(C=100)
In [54]:
data.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6178 entries, 2017-01-03 06:00:00 to 2017-12-29 21:00:00
Data columns (total 8 columns):
midclose     6178 non-null float64
returns      6178 non-null float64
lag_1        6178 non-null float64
lag_2        6178 non-null float64
lag_3        6178 non-null float64
lag_4        6178 non-null float64
lag_5        6178 non-null float64
direction    6178 non-null float64
dtypes: float64(8)
memory usage: 434.4 KB
In [55]:
%time model.fit(np.sign(data[cols]), np.sign(data['returns']))
CPU times: user 1.34 s, sys: 54.5 ms, total: 1.39 s
Wall time: 1.39 s
Out[55]:
SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

Predicting Market Direction

In the prediction, a +1 means a positive return is expected and a -1 means a negative return is expected.

In [56]:
pred = model.predict(np.sign(data[cols]))
pred[:15]
Out[56]:
array([ 1.,  1.,  1.,  1.,  1., -1.,  1.,  1.,  1.,  1.,  1., -1., -1.,
       -1.,  1.])

Vectorized Backtesting

In [57]:
data['position'] = pred
In [58]:
data['strategy'] = data['position'] * data['returns']
In [59]:
# unleveraged | no bid-ask spread or transaction costs | only in-sample
data[['returns', 'strategy']].cumsum().apply(np.exp).iplot()
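The performance calculation above relies on the additivity of log returns: summing them cumulatively and applying the exponential function yields gross performance starting at 1. A minimal sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

bt = pd.DataFrame({'returns': [0.01, -0.02, 0.015],  # made-up log returns
                   'position': [1, -1, 1]})          # +1 long, -1 short

# A short position earns the negative of the market return.
bt['strategy'] = bt['position'] * bt['returns']

# Cumulative sum of log returns, exponentiated, gives gross performance.
equity = bt[['returns', 'strategy']].cumsum().apply(np.exp)
print(equity)
```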
In [60]:
data['position'].value_counts()
Out[60]:
 1.0    4018
-1.0    2159
 0.0       1
Name: position, dtype: int64
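The single 0.0 in the counts above is most likely a consequence of using np.sign for the labels: a bar with a return of exactly zero receives the label 0, so the classifier sees three classes instead of two. One possible alternative, shown here as a sketch with made-up data, is a strictly binary label. Treating flat bars as -1 is an arbitrary choice made for this example.

```python
import numpy as np
import pandas as pd

returns = pd.Series([0.001, -0.002, 0.0, 0.003])  # made-up; third bar is flat

three_way = np.sign(returns)  # values in {-1, 0, +1}

# Binary alternative: flat bars are counted as -1 (arbitrary choice).
binary = pd.Series(np.where(returns > 0, 1, -1), index=returns.index)

print(sorted(three_way.unique()), sorted(binary.unique()))
```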

Train Test Split

In [61]:
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

Split Feature Sets

In [62]:
# mu = data['returns'].mean()
# v = data['returns'].std()
# bins = [mu - 0.5 * v, mu, mu + 0.5 * v]
train_x, test_x, train_y, test_y = train_test_split(
    data[cols].apply(lambda x: np.digitize(x, bins=[0])),
    np.sign(data['returns']),
    test_size=0.50, random_state=111)
In [63]:
train_x.sort_index(inplace=True)
train_y.sort_index(inplace=True)
test_x.sort_index(inplace=True)
test_y.sort_index(inplace=True)
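train_test_split draws the test rows at random, so training and test observations are interleaved in time even after sorting. A different technique, not the one used in this notebook, is a chronological split in which all test data lies strictly after the training data. The sketch below shows this with synthetic features and labels and the same 50/50 proportion.

```python
import numpy as np
import pandas as pd

# Synthetic features and labels on a daily index, for illustration only.
index = pd.date_range('2017-01-03', periods=10, freq='D')
features = pd.DataFrame({'lag_1': np.arange(10) % 2}, index=index)
labels = pd.Series(np.where(np.arange(10) % 3 == 0, 1, -1), index=index)

split = int(len(features) * 0.5)  # first half for training, second for testing
train_x, test_x = features.iloc[:split], features.iloc[split:]
train_y, test_y = labels.iloc[:split], labels.iloc[split:]

print(train_x.index.max(), '<', test_x.index.min())
```

A chronological split is often closer to how a strategy would be deployed, since the model never sees observations from the period it is tested on.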
In [64]:
# number of possible patterns = buckets ** lags
train_x.head(5)
Out[64]:
lag_1 lag_2 lag_3 lag_4 lag_5
2017-01-03 06:00:00 0 1 1 0 1
2017-01-03 07:00:00 1 0 1 1 0
2017-01-03 16:00:00 0 0 1 0 0
2017-01-03 17:00:00 1 0 0 1 0
2017-01-03 19:00:00 0 1 1 0 0
In [65]:
test_x.tail(5)
Out[65]:
lag_1 lag_2 lag_3 lag_4 lag_5
2017-12-29 12:00:00 1 0 1 1 0
2017-12-29 13:00:00 1 1 0 1 1
2017-12-29 14:00:00 0 1 1 0 1
2017-12-29 20:00:00 0 0 0 1 1
2017-12-29 21:00:00 0 0 0 0 1
In [66]:
2 ** len(cols)
Out[66]:
32
In [67]:
# restrict both index sets to the last 75 bars before plotting
last = data['midclose'].iloc[-75:]
ax = last[last.index.isin(train_x.index)].plot(style='bo', figsize=(10, 6))
last[last.index.isin(test_x.index)].plot(style='ro', ax=ax)
last.plot(ax=ax, lw=0.5, style='k--');

Model Fitting & Prediction

In [68]:
model.fit(train_x, train_y)
Out[68]:
SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
In [69]:
train_pred = model.predict(train_x)
In [70]:
accuracy_score(train_y, train_pred)
Out[70]:
0.5364195532534801
In [71]:
test_pred = model.predict(test_x)
In [72]:
accuracy_score(test_y, test_pred)
Out[72]:
0.5250890255746197
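Accuracy scores slightly above 0.5 are easier to judge against a naive baseline. The sketch below, with made-up labels and predictions, compares the accuracy of a prediction vector with the accuracy obtained by always predicting the most frequent class. It uses plain NumPy so that the calculation behind accuracy_score is visible.

```python
import numpy as np

y_true = np.array([1, 1, -1, 1, -1, 1, 1, -1])  # made-up directions
y_pred = np.array([1, -1, -1, 1, 1, 1, -1, 1])  # made-up predictions

# Share of correct predictions.
accuracy = float(np.mean(y_true == y_pred))

# Baseline: always predict the most frequent class in y_true.
values, counts = np.unique(y_true, return_counts=True)
baseline = float(counts.max() / counts.sum())

print(accuracy, baseline)
```

In this toy case the predictions are worse than the baseline, which shows that an accuracy figure alone says little without such a comparison.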

Vectorized Backtesting — Direct Predictions

In [73]:
pred = model.predict(data[cols].apply(lambda x: np.digitize(x, bins=[0])).dropna())
pred[:15]
Out[73]:
array([ 1., -1.,  1.,  1.,  1., -1.,  1.,  1.,  1.,  1.,  1., -1., -1.,
       -1.,  1.])
In [74]:
data['position'] = pred
In [75]:
data['strategy'] = data['position'] * data['returns']
In [76]:
# in-sample | unleveraged | no bid-ask spread or transaction costs
data.loc[train_x.index][['returns', 'strategy']].cumsum().apply(np.exp).iplot()
In [77]:
# out-of-sample | unleveraged | no bid-ask spread or transaction costs
data.loc[test_x.index][['returns', 'strategy']].cumsum().apply(np.exp).iplot()
In [78]:
# number of trades
sum(data['position'].diff() != 0)
Out[78]:
2948
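With several thousand position changes, transaction costs matter. The following sketch shows one simple way to account for them in a vectorized backtest: a proportional cost is subtracted from the strategy log return whenever the position changes. The data is made up, and the cost level tc is a hypothetical value, not an FXCM figure. Note also that the first difference of a series is NaN, and NaN != 0 evaluates to True, so the count above includes the first row; the sketch fills that value with 0 instead.

```python
import pandas as pd

bt = pd.DataFrame({'position': [1, 1, -1, -1, 1],
                   'returns': [0.01, -0.02, 0.03, -0.01, 0.02]})  # made-up
bt['strategy'] = bt['position'] * bt['returns']

tc = 0.0001  # hypothetical cost per unit of turnover, in log-return terms

# Turnover per bar: a flip from +1 to -1 (or back) trades two units.
turnover = bt['position'].diff().fillna(0).abs()
n_trades = int((turnover != 0).sum())

bt['strategy_net'] = bt['strategy'] - tc * turnover
print(n_trades, bt['strategy'].sum(), bt['strategy_net'].sum())
```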
