The Python Quants

FXCM Algorithmic Trading Initiative¶

Algo Trading: Historical Data & Python Wrapper¶

Dr. Yves J. Hilpisch

The Python Quants GmbH

Risk Disclaimer¶

Trading forex/CFDs on margin carries a high level of risk and may not be suitable for all investors as you could sustain losses in excess of deposits. Leverage can work against you. Due to the certain restrictions imposed by the local law and regulation, German resident retail client(s) could sustain a total loss of deposited funds but are not subject to subsequent payment obligations beyond the deposited funds. Be aware and fully understand all risks associated with the market and trading. Prior to trading any products, carefully consider your financial situation and experience level. Any opinions, news, research, analyses, prices, or other information is provided as general market commentary, and does not constitute investment advice. FXCM & TPQ will not accept liability for any loss or damage, including without limitation to, any loss of profit, which may arise directly or indirectly from use of or reliance on such information.

Speaker Disclaimer¶

The speaker is neither an employee, agent nor representative of FXCM and is therefore acting independently. The opinions given are their own, constitute general market commentary, and do not constitute the opinion or advice of FXCM or any form of personal or investment advice. FXCM assumes no responsibility for any loss or damage, including but not limited to, any loss or gain arising out of the direct or indirect use of this or any other content. Trading forex/CFDs on margin carries a high level of risk and may not be suitable for all investors as you could sustain losses in excess of deposits.

Some Basic Imports¶

import time
import numpy as np
import pandas as pd
import datetime as dt
import cufflinks as cf
from pylab import plt
cf.set_config_file(offline=True)
plt.style.use('seaborn')
%matplotlib inline

You can install fxcmpy via pip:

pip install fxcmpy

The documentation is currently available at http://fxcmpy.tpq.io

Read the license & risk warning carefully.

import fxcmpy

fxcmpy.__version__

'1.1.11'

Retrieving Historical Tick Data¶

Available Symbols¶

from fxcmpy import fxcmpy_tick_data_reader as tdr

print(tdr.get_available_symbols())

('AUDCAD', 'AUDCHF', 'AUDJPY', 'AUDNZD', 'CADCHF', 'EURAUD', 'EURCHF', 'EURGBP', 'EURJPY', 'EURUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'GBPUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'NZDCAD', 'NZDCHF', 'NZDJPY', 'NZDUSD', 'USDCAD', 'USDCHF', 'USDJPY')

Retrieving Tick Data¶

start = dt.datetime(2018, 3, 26)
stop = dt.datetime(2018, 3, 29)

%time td = tdr('EURUSD', start, stop)

Fetching data from: https://tickdata.fxcorporate.com/EURUSD/2018/13.csv.gz
CPU times: user 1.71 s, sys: 388 ms, total: 2.1 s
Wall time: 5.56 s

type(td)

fxcmpy.fxcmpy_data_reader.fxcmpy_tick_data_reader

td.get_raw_data().info()

<class 'pandas.core.frame.DataFrame'>
Index: 1036420 entries, 03/25/2018 21:00:13.230 to 03/30/2018 20:59:31.180
Data columns (total 2 columns):
Bid    1036420 non-null float64
Ask    1036420 non-null float64
dtypes: float64(2)
memory usage: 23.7+ MB

Working with the Tick Data¶

%time td.get_data().info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1036420 entries, 2018-03-25 21:00:13.230000 to 2018-03-30 20:59:31.180000
Data columns (total 2 columns):
Bid    1036420 non-null float64
Ask    1036420 non-null float64
dtypes: float64(2)
memory usage: 23.7 MB
CPU times: user 3.96 s, sys: 28.9 ms, total: 3.98 s
Wall time: 3.98 s

%time td.get_data().info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1036420 entries, 2018-03-25 21:00:13.230000 to 2018-03-30 20:59:31.180000
Data columns (total 2 columns):
Bid    1036420 non-null float64
Ask    1036420 non-null float64
dtypes: float64(2)
memory usage: 23.7 MB
CPU times: user 6.91 ms, sys: 2.09 ms, total: 9 ms
Wall time: 6.06 ms

# work on a copy so that adding columns below does not touch the reader's cached data
sub = td.get_data(start='2018-03-27 12:00:00', end='2018-03-27 16:00:00').copy()

sub.head()

sub['Mid'] = sub.mean(axis=1)
sub['SMA'] = sub['Mid'].rolling(1000).mean()

sub[['Mid', 'SMA']].iplot()
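
The mid price and moving average steps above can be reproduced without a data connection. The following is a minimal sketch on synthetic ticks; the random-walk data, the 250 ms tick spacing and the fixed spread are assumptions made for illustration only and are not FXCM data.

```python
import numpy as np
import pandas as pd

# synthetic bid/ask ticks (hypothetical data, not FXCM data)
rng = np.random.default_rng(100)
index = pd.date_range('2018-03-27 12:00:00', periods=5000, freq='250ms')
bid = 1.24 + rng.normal(0, 0.00002, len(index)).cumsum()
ticks = pd.DataFrame({'Bid': bid, 'Ask': bid + 0.00002}, index=index)

# mid price as the average of bid and ask
ticks['Mid'] = ticks[['Bid', 'Ask']].mean(axis=1)
# simple moving average over the last 1,000 ticks
ticks['SMA'] = ticks['Mid'].rolling(1000).mean()

# the SMA is undefined until the first window is full
n_missing = int(ticks['SMA'].isna().sum())
print(n_missing)
```

Note that the window of 1,000 counts ticks, not time, so the time span covered by the average varies with market activity.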

Reading Historical Candles Data¶

Available Symbols¶

from fxcmpy import fxcmpy_candles_data_reader as cdr

print(cdr.get_available_symbols())

('AUDCAD', 'AUDCHF', 'AUDJPY', 'AUDNZD', 'CADCHF', 'EURAUD', 'EURCHF', 'EURGBP', 'EURJPY', 'EURUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'GBPUSD', 'GBPCHF', 'GBPJPY', 'GBPNZD', 'NZDCAD', 'NZDCHF', 'NZDJPY', 'NZDUSD', 'USDCAD', 'USDCHF', 'USDJPY')

Retrieving Candles Data¶

start = dt.datetime(2018, 3, 1)
stop = dt.datetime(2018, 3, 25)

The period must be one of m1, H1 or D1.

period = 'H1'

candles = cdr('EURUSD', start, stop, period)

Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/9.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/10.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/11.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2018/12.csv.gz

type(candles)

fxcmpy.fxcmpy_data_reader.fxcmpy_candles_data_reader

data = candles.get_data()

data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 480 entries, 2018-02-25 22:00:00 to 2018-03-23 20:00:00
Data columns (total 8 columns):
BidOpen     480 non-null float64
BidHigh     480 non-null float64
BidLow      480 non-null float64
BidClose    480 non-null float64
AskOpen     480 non-null float64
AskHigh     480 non-null float64
AskLow      480 non-null float64
AskClose    480 non-null float64
dtypes: float64(8)
memory usage: 33.8 KB

data.head()

The D1 period currently only works for time windows before the current year.

start = dt.datetime(2017, 1, 3)
stop = dt.datetime(2017, 12, 31)

period = 'D1'
candles = cdr('EURUSD', start, stop, period)

Fetching data from: https://candledata.fxcorporate.com/D1/EURUSD/2017.csv.gz

candles.get_data().info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 309 entries, 2017-01-02 22:00:00 to 2017-12-31 22:00:00
Data columns (total 8 columns):
BidOpen     309 non-null float64
BidHigh     309 non-null float64
BidLow      309 non-null float64
BidClose    309 non-null float64
AskOpen     309 non-null float64
AskHigh     309 non-null float64
AskLow      309 non-null float64
AskClose    309 non-null float64
dtypes: float64(8)
memory usage: 21.7 KB

Working with Candles Data¶

raw = candles.get_data()
# mid close price as the average of bid and ask close
sub = pd.DataFrame({'Mid': (raw['BidClose'] + raw['AskClose']) / 2})
# log returns of the mid close price
sub['Returns'] = np.log(sub['Mid'] / sub['Mid'].shift(1))
sub.head()

sub.iplot(subplots=True)

sub['Returns'].iplot(kind='histogram')
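
Log returns are used here because they add up over time: the sum of the log returns equals the log of the total gross return. A minimal sketch with a few hypothetical prices (not taken from the data set) illustrates this.

```python
import numpy as np
import pandas as pd

# hypothetical mid prices for illustration
prices = pd.Series([1.0400, 1.0490, 1.0607, 1.0535, 1.0530])

# log returns; the first value is NaN because there is no previous price
log_ret = np.log(prices / prices.shift(1))

# summing the log returns and exponentiating recovers the gross performance
total_from_sum = np.exp(log_ret.sum())
total_direct = prices.iloc[-1] / prices.iloc[0]
print(total_from_sum, total_direct)
```

This property is what makes the cumsum().apply(np.exp) pattern in the backtesting sections work.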

Quant Figures¶

data = candles.get_data()[['AskOpen', 'AskHigh', 'AskLow', 'AskClose']]
data.columns = ['open', 'high', 'low', 'close']
data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 309 entries, 2017-01-02 22:00:00 to 2017-12-31 22:00:00
Data columns (total 4 columns):
open     309 non-null float64
high     309 non-null float64
low      309 non-null float64
close    309 non-null float64
dtypes: float64(4)
memory usage: 12.1 KB

qf = cf.QuantFig(data, title='EUR/USD', legend='top',
                 name='EUR/USD', datalegend=False)

qf.iplot()

qf.add_bollinger_bands(periods=10, boll_std=2,
                       colors=['magenta', 'grey'], fill=True)
qf.data.update()

qf.iplot()

qf.add_rsi(periods=14, showbands=False)
qf.data.update()

qf.iplot()
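
For readers who want to see what the Bollinger bands represent, the following is a minimal pandas sketch: a rolling mean plus and minus a multiple of the rolling standard deviation. The synthetic close prices are hypothetical, and the exact computation inside cufflinks may differ in details (this is an assumption, not a description of its internals).

```python
import numpy as np
import pandas as pd

# synthetic close prices (hypothetical data)
rng = np.random.default_rng(1)
close = pd.Series(1.05 + rng.normal(0, 0.005, 60).cumsum())

periods, boll_std = 10, 2
sma = close.rolling(periods).mean()
std = close.rolling(periods).std()

bands = pd.DataFrame({
    'sma': sma,
    'upper': sma + boll_std * std,  # upper band
    'lower': sma - boll_std * std,  # lower band
})
print(bands.dropna().head())
```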

Backtesting AI-Based Algorithmic Trading Strategy¶

The following example is simplified and for illustration purposes only. Among other things, it does not consider transaction costs or bid-ask spreads.

Data Retrieval¶

period = 'H1'
candles = cdr('EURUSD', start, stop, period)

Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/1.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/2.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/3.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/4.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/5.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/6.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/7.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/8.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/9.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/10.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/11.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/12.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/13.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/14.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/15.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/16.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/17.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/18.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/19.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/20.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/21.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/22.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/23.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/24.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/25.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/26.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/27.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/28.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/29.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/30.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/31.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/32.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/33.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/34.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/35.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/36.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/37.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/38.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/39.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/40.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/41.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/42.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/43.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/44.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/45.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/46.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/47.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/48.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/49.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/50.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/51.csv.gz
Fetching data from: https://candledata.fxcorporate.com/H1/EURUSD/2017/52.csv.gz

candles.get_data().info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6184 entries, 2017-01-03 00:00:00 to 2017-12-29 21:00:00
Data columns (total 8 columns):
BidOpen     6184 non-null float64
BidHigh     6184 non-null float64
BidLow      6184 non-null float64
BidClose    6184 non-null float64
AskOpen     6184 non-null float64
AskHigh     6184 non-null float64
AskLow      6184 non-null float64
AskClose    6184 non-null float64
dtypes: float64(8)
memory usage: 434.8 KB

data = pd.DataFrame(candles.get_data()[['AskClose', 'BidClose']].mean(axis=1),
                    columns=['midclose'])

data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6184 entries, 2017-01-03 00:00:00 to 2017-12-29 21:00:00
Data columns (total 1 columns):
midclose    6184 non-null float64
dtypes: float64(1)
memory usage: 96.6 KB

data.tail()

data.iplot()

Feature Preparation¶

# log returns of the mid close price
data['returns'] = np.log(data['midclose'] / data['midclose'].shift(1))

lags = 5
cols = []
for lag in range(1, lags + 1):
    col = 'lag_%s' % lag
    data[col] = data['returns'].shift(lag)
    cols.append(col)

cols

['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5']

from pylab import plt
plt.style.use('seaborn')
%matplotlib inline

data['direction'] = np.sign(data['returns'])
to_plot = ['midclose', 'returns', 'direction']
data[to_plot].iloc[-75:].plot(figsize=(10, 6),
        subplots=True, style=['-', '-', 'ro'], title='EUR/USD');

# number of possible patterns = 2 ** lags
np.digitize(data[cols], bins=[0])[:10]

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 1],
       [1, 0, 1, 1, 1],
       [1, 1, 0, 1, 1],
       [0, 1, 1, 0, 1],
       [1, 0, 1, 1, 0],
       [0, 1, 0, 1, 1],
       [0, 0, 1, 0, 1]])

2 ** len(cols)

32

data.dropna(inplace=True)
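
The feature preparation can be tried out in isolation. The sketch below repeats the lagging and binarization steps on synthetic returns (hypothetical data, with three lags instead of five to keep the output small) and counts the distinct patterns that occur.

```python
import numpy as np
import pandas as pd

# synthetic log returns (hypothetical data)
rng = np.random.default_rng(0)
data = pd.DataFrame({'returns': rng.normal(0, 0.001, 500)})

lags = 3
cols = []
for lag in range(1, lags + 1):
    col = 'lag_%s' % lag
    data[col] = data['returns'].shift(lag)  # return observed `lag` bars ago
    cols.append(col)
data.dropna(inplace=True)  # the first `lags` rows have incomplete features

# binarize: 0 for a negative return, 1 for a non-negative return
binary = data[cols].apply(lambda x: np.digitize(x, bins=[0]))

# distinct patterns observed; at most 2 ** lags are possible
patterns = binary.drop_duplicates()
print(len(patterns), 2 ** lags)
```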

Support Vector Machines¶

from sklearn import svm

model = svm.SVC(C=100)

data.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6178 entries, 2017-01-03 06:00:00 to 2017-12-29 21:00:00
Data columns (total 8 columns):
midclose     6178 non-null float64
returns      6178 non-null float64
lag_1        6178 non-null float64
lag_2        6178 non-null float64
lag_3        6178 non-null float64
lag_4        6178 non-null float64
lag_5        6178 non-null float64
direction    6178 non-null float64
dtypes: float64(8)
memory usage: 434.4 KB

%time model.fit(np.sign(data[cols]), np.sign(data['returns']))

CPU times: user 1.34 s, sys: 54.5 ms, total: 1.39 s
Wall time: 1.39 s

SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

Predicting Market Direction¶

In the prediction, a +1 means a positive return is expected and a -1 means a negative return is expected.

pred = model.predict(np.sign(data[cols]))
pred[:15]

array([ 1.,  1.,  1.,  1.,  1., -1.,  1.,  1.,  1.,  1.,  1., -1., -1.,
       -1.,  1.])

Vectorized Backtesting¶

data['position'] = pred

data['strategy'] = data['position'] * data['returns']

# unleveraged | no bid-ask spread or transaction costs | only in-sample
data[['returns', 'strategy']].cumsum().apply(np.exp).iplot()

data['position'].value_counts()

 1.0    4018
-1.0    2159
 0.0       1
Name: position, dtype: int64
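
The vectorized backtest boils down to two steps: multiply the position by the log return of the same bar, then cumulate and exponentiate. A minimal sketch with four hypothetical bars makes the arithmetic easy to check by hand.

```python
import numpy as np
import pandas as pd

bt = pd.DataFrame({
    'returns':  [0.010, -0.020, 0.005, -0.010],  # hypothetical log returns
    'position': [1.0, -1.0, -1.0, 1.0],          # hypothetical predictions
})

# the strategy earns the market return when long and its negative when short
bt['strategy'] = bt['position'] * bt['returns']

# gross performance over time for the passive and the active case
gross = bt[['returns', 'strategy']].cumsum().apply(np.exp)
print(gross)
```

Here the strategy log returns are 0.01, 0.02, -0.005 and -0.01, which sum to 0.015, while the passive investment sums to -0.015.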

Train Test Split¶

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

Split Feature Sets¶

# mu = data['returns'].mean()
# v = data['returns'].std()
# bins = [mu - 0.5 * v, mu, mu + 0.5 * v]
train_x, test_x, train_y, test_y = train_test_split(
    data[cols].apply(lambda x: np.digitize(x, bins=[0])),                   
    np.sign(data['returns']),
    test_size=0.50, random_state=111)

train_x.sort_index(inplace=True)
train_y.sort_index(inplace=True)
test_x.sort_index(inplace=True)
test_y.sort_index(inplace=True)
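
The random split above interleaves training and test observations in time. As an alternative, and not the approach taken in this notebook, a sequential split keeps all training observations before the test observations. The sketch below uses the shuffle=False option of train_test_split on a small hypothetical data set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# small hypothetical feature and label sets with a time index
index = pd.date_range('2017-01-03', periods=10, freq='D')
X = pd.DataFrame({'lag_1': np.arange(10) % 2}, index=index)
y = pd.Series(np.where(np.arange(10) % 3 == 0, 1.0, -1.0), index=index)

# shuffle=False keeps the original order: first half train, second half test
train_x, test_x, train_y, test_y = train_test_split(
    X, y, test_size=0.5, shuffle=False)
print(train_x.index.max(), test_x.index.min())
```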

# number of possible patterns = buckets ** lags
train_x.head(5)

test_x.tail(5)

2 ** len(cols)

32

# last 75 bars; blue dots = training observations, red dots = test observations
last = data['midclose'].iloc[-75:]
ax = last[last.index.isin(train_x.index)].plot(style='bo', figsize=(10, 6))
last[last.index.isin(test_x.index)].plot(style='ro', ax=ax)
last.plot(ax=ax, lw=0.5, style='k--');

Model Fitting & Prediction¶

model.fit(train_x, train_y)

SVC(C=100, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

train_pred = model.predict(train_x)

accuracy_score(train_y, train_pred)

0.5364195532534801

test_pred = model.predict(test_x)

accuracy_score(test_y, test_pred)

0.5250890255746197
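
An accuracy score is easier to judge against a naive baseline, for example always predicting +1. The following minimal sketch uses hypothetical labels and predictions (not the model output above) to show both numbers side by side.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# hypothetical true directions and predictions
y_true = np.array([1, 1, -1, 1, -1, 1, 1, -1])
y_pred = np.array([1, 1, -1, 1, 1, 1, 1, -1])

# share of correctly predicted directions
acc = accuracy_score(y_true, y_pred)

# naive baseline: always predict an upward movement
baseline = accuracy_score(y_true, np.ones_like(y_true))
print(acc, baseline)
```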

Vectorized Backtesting with Direct Predictions¶

pred = model.predict(data[cols].apply(lambda x: np.digitize(x, bins=[0])).dropna())
pred[:15]

array([ 1., -1.,  1.,  1.,  1., -1.,  1.,  1.,  1.,  1.,  1., -1., -1.,
       -1.,  1.])

data['position'] = pred

data['strategy'] = data['position'] * data['returns']

# in-sample | unleveraged | no bid-ask spread or transaction costs
data.loc[train_x.index][['returns', 'strategy']].cumsum().apply(np.exp).iplot()

# out-of-sample | unleveraged | no bid-ask spread or transaction costs
data.loc[test_x.index][['returns', 'strategy']].cumsum().apply(np.exp).iplot()

# number of trades
sum(data['position'].diff() != 0)

2948
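
With this many trades, transaction costs matter. One simple way to account for them is to subtract a proportional cost whenever the position changes. The sketch below is an assumption-laden illustration: the data and the cost level ptc are hypothetical, and subtracting the cost from the log return is an approximation.

```python
import numpy as np
import pandas as pd

bt = pd.DataFrame({
    'returns':  [0.010, -0.020, 0.005, -0.010],  # hypothetical log returns
    'position': [1.0, -1.0, -1.0, 1.0],          # hypothetical predictions
})
bt['strategy'] = bt['position'] * bt['returns']

ptc = 0.0001  # hypothetical proportional cost per unit of position change

# position change per bar; the first bar opens the position from flat
turnover = bt['position'].diff().fillna(bt['position']).abs()

# strategy returns net of costs
bt['strategy_net'] = bt['strategy'] - turnover * ptc

trades = int((turnover != 0).sum())
print(trades, bt[['strategy', 'strategy_net']].sum().apply(np.exp))
```

A switch from long to short counts as a turnover of two units here, since one position is closed and the opposite one is opened.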

Output of sub.head() (EUR/USD tick data):

                             Bid      Ask
2018-03-27 12:00:00.006  1.23911  1.23912
2018-03-27 12:00:00.209  1.23911  1.23911
2018-03-27 12:00:00.234  1.23911  1.23908
2018-03-27 12:00:00.243  1.23911  1.23904
2018-03-27 12:00:00.280  1.23910  1.23904

Output of data.head() (EUR/USD H1 candles):

                     BidOpen  BidHigh   BidLow  BidClose  AskOpen  AskHigh   AskLow  AskClose
2018-02-25 22:00:00  1.22931  1.22938  1.22849   1.22871  1.22948  1.22973  1.22862   1.22891
2018-02-25 23:00:00  1.22871  1.22949  1.22810   1.22884  1.22891  1.22952  1.22812   1.22887
2018-02-26 00:00:00  1.22884  1.22933  1.22802   1.22897  1.22887  1.22934  1.22804   1.22900
2018-02-26 01:00:00  1.22897  1.23050  1.22881   1.22987  1.22900  1.23051  1.22882   1.22989
2018-02-26 02:00:00  1.22987  1.23190  1.22981   1.23151  1.22989  1.23191  1.22982   1.23153

Output of sub.head() (EUR/USD D1 mid prices and log returns):

                          Mid   Returns
2017-01-02 22:00:00  1.040565       NaN
2017-01-03 22:00:00  1.048930  0.008007
2017-01-04 22:00:00  1.060675  0.011135
2017-01-05 22:00:00  1.053455 -0.006830
2017-01-07 22:00:00  1.053050 -0.000385

Output of data.tail() (EUR/USD H1 mid close prices):

                     midclose
2017-12-29 17:00:00  1.202155
2017-12-29 18:00:00  1.201370
2017-12-29 19:00:00  1.200930
2017-12-29 20:00:00  1.199815
2017-12-29 21:00:00  1.200705

Output of train_x.head(5):

                     lag_1  lag_2  lag_3  lag_4  lag_5
2017-01-03 06:00:00      0      1      1      0      1
2017-01-03 07:00:00      1      0      1      1      0
2017-01-03 16:00:00      0      0      1      0      0
2017-01-03 17:00:00      1      0      0      1      0
2017-01-03 19:00:00      0      1      1      0      0

Output of test_x.tail(5):

                     lag_1  lag_2  lag_3  lag_4  lag_5
2017-12-29 12:00:00      1      0      1      1      0
2017-12-29 13:00:00      1      1      0      1      1
2017-12-29 14:00:00      0      1      1      0      1
2017-12-29 20:00:00      0      0      0      1      1
2017-12-29 21:00:00      0      0      0      0      1