Brief Overview and Introduction
The Python Quant Platform is developed and maintained by The Python Quants GmbH. It offers Web-/browser-based data and financial analytics for individuals, teams and organizations. Free registration is possible at http://trial.quant-platform.com.
You can freely choose your_user_name and password. You can then log in at http://analytics.quant-platform.com, using trial as company in combination with your_user_name and password.
Please note that trial/test accounts are only for illustration purposes and can be closed at any time (with all data, code, etc. being permanently deleted).
Please read also the Terms & Conditions as well as our Privacy Policy.
If you have questions about the platform or run into any trouble, you can reach us at platform@pythonquants.com.
At the moment, the Python Quant Platform comprises, among other components and features, IPython Notebook and rpy2.
In the left panel of the platform, you find the current working path indicated (in black) as well as the current folder and file structure (as links in purple). Note that only IPython Notebook files are displayed in this panel. You can navigate the current folder structure by clicking on a link. Clicking on the double dots ".." brings you one level up in the structure; clicking the refresh button right next to the double dots updates the folder/file structure. Clicking on a file link opens the respective IPython Notebook file.
At the top of the left panel, you find a link to open a new notebook. With IPython notebooks, like this one, you can interactively code in Python and do data/financial analytics.
print ("Hello Quant World.")
Hello Quant World.
# simple calculations
3 + 4 * 2
11
# working with NumPy arrays
import numpy as np
rn = np.random.standard_normal(100)
rn[:10]
array([ 1.02513028, 0.17631207, 1.24145549, -0.91062576, -0.95066727, -0.24643119, -1.78595937, 0.46583419, -0.04754132, -0.46144837])
# plotting
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(rn.cumsum())
plt.grid(True)
IPython Notebook as a system shell.
!ls -n
total 44
-rwxr-x--- 1 1141 8 31230 Oct  1 13:00 Python_Big_Data_Platform.ipynb
-rwxr-x--- 1 1141 8  2035 Oct  1 12:40 dx_example.py
-rw-r--r-- 1 1141 8  4808 Sep 26 15:23 perf_tests.ipynb
!mkdir test
!ls
Python_Big_Data_Platform.ipynb dx_example.py perf_tests.ipynb test
!rm -r test
IPython Notebook as a media integrator. Here is a talk by Yves about "Interactive Analytics of Large Financial Data Sets" with Python & IPython.
from IPython.display import YouTubeVideo
YouTubeVideo(id="XyqlduIcc2g", width=700, height=400)
Combining the pandas library with IPython Notebook makes for a powerful financial analytics environment.
import pandas as pd
import pandas.io.data as web
AAPL = web.DataReader('AAPL', data_source='google')
# reads data from Google Finance
AAPL['42d'] = pd.rolling_mean(AAPL['Close'], 42)
AAPL['252d'] = pd.rolling_mean(AAPL['Close'], 252)
# 42d and 252d trends
AAPL[['Close', '42d', '252d']].plot(figsize=(10, 5))
<matplotlib.axes._subplots.AxesSubplot at 0x7f6cdf3417d0>
DX Analytics is a Python library for advanced financial and derivatives analytics written by The Python Quants. It is particularly suited to modeling multi-risk derivatives and to consistently valuing portfolios of complex derivatives. It mainly uses Monte Carlo simulation, since this is the only numerical method capable of valuing and risk-managing complex, multi-risk derivatives books.
An example with a European maximum call option on two underlyings.
import dx
%run dx_example.py
# sets up market environments
# and defines derivative instrument
max_call.payoff_func
# payoff of a maximum call option
# on two underlyings (European exercise)
"np.maximum(np.maximum(maturity_value['gbm'], maturity_value['jd']) - 34., 0)"
max_call.vega('jd')
# numerical Vega with respect
# to one risk factor
5.9487999999999985
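In the Monte Carlo setting, such Greeks are estimated numerically rather than analytically, presumably via a finite-difference scheme of the form

$$ \mathrm{Vega} \approx \frac{V(\sigma + \Delta\sigma) - V(\sigma)}{\Delta\sigma} $$

i.e., by revaluing the instrument after a small shift $\Delta\sigma$ in the volatility of the respective risk factor; the exact shift size is an implementation detail of DX Analytics.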
We are going to generate a Vega surface for one risk factor with respect to the initial values of both risk factors.
asset_1 = np.arange(28., 46.1, 2.)
asset_2 = asset_1
a_1, a_2 = np.meshgrid(asset_1, asset_2)
value = np.zeros_like(a_1)
%%time
vega_gbm = np.zeros_like(a_1)
for i in range(np.shape(vega_gbm)[0]):
    for j in range(np.shape(vega_gbm)[1]):
        max_call.update('gbm', initial_value=a_1[i, j])
        max_call.update('jd', initial_value=a_2[i, j])
        vega_gbm[i, j] = max_call.vega('gbm')
CPU times: user 3.91 s, sys: 2 ms, total: 3.91 s Wall time: 3.91 s
dx.plot_greeks_3d([a_1, a_2, vega_gbm], ['gbm', 'jd', 'vega gbm'])
# Vega surface plot
Monte Carlo simulation is a computationally demanding task that nowadays is generally implemented on a large scale in the financial industry (e.g., for Value-at-Risk or Credit Value Adjustment calculations).
import math
This function simulates a geometric Brownian motion.
def simulate_geometric_brownian_motion(p):
    M, I = p
    # time steps, paths
    S0 = 100; r = 0.05; sigma = 0.2; T = 1.0
    # model parameters
    dt = T / M
    paths = np.zeros((M + 1, I))
    paths[0] = S0
    for t in range(1, M + 1):
        paths[t] = paths[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt +
                        sigma * math.sqrt(dt) * np.random.standard_normal(I))
    return paths
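The function implements the exact log-Euler discretization of geometric Brownian motion,

$$ S_t = S_{t - \Delta t} \exp\left(\left(r - \frac{\sigma^2}{2}\right)\Delta t + \sigma \sqrt{\Delta t}\, z_t\right) $$

where $z_t$ is a standard normally distributed random variable, $r$ the constant short rate and $\sigma$ the volatility.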
An example simulation with the function.
%time paths = simulate_geometric_brownian_motion((50, 100000))
# example simulation
CPU times: user 294 ms, sys: 1 ms, total: 295 ms Wall time: 295 ms
plt.plot(paths[:, :10]); plt.grid()
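Such simulated paths are the raw material for Monte Carlo valuation. As a minimal sketch (not part of the original notebook), the value of a European call option can be estimated from the terminal path values; the strike K below is a hypothetical choice, while r and T mirror the values hard-coded in the simulation function:
K = 105.  # hypothetical strike
r, T = 0.05, 1.0  # as hard-coded in the simulation function
# risk-neutral estimator: discounted average of the terminal payoffs
C0 = math.exp(-r * T) * np.maximum(paths[-1] - K, 0).mean()
round(C0, 3)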
Now using the multiprocessing module of Python.
from time import time
import multiprocessing as mp
I = 7500 # number of paths
M = 50 # number of time steps
t = 100 # number of tasks/simulations
# running with a max of 8 cores
times = []
for w in range(1, 9):
    t0 = time()
    pool = mp.Pool(processes=w)
    result = pool.map(simulate_geometric_brownian_motion, t * [(M, I), ])
    times.append(time() - t0)
    pool.close()  # release the worker processes before the next iteration
And the performance results visualized.
plt.plot(range(1, 9), times)
plt.plot(range(1, 9), times, 'ro')
plt.grid(True)
plt.xlabel('number of processes')
plt.ylabel('time in seconds')
plt.title('%d Monte Carlo simulations' % t)
<matplotlib.text.Text at 0x7f6ccc07e0d0>
We analyze the statistical correlation between the EURO STOXX 50 stock index and the VSTOXX volatility index.
First the EURO STOXX 50 data.
import pandas as pd
cols = ['Date', 'SX5P', 'SX5E', 'SXXP', 'SXXE',
        'SXXF', 'SXXA', 'DK5F', 'DKXF', 'DEL']
es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
try:
    es = pd.read_csv(es_url,  # filename
                     header=None,  # ignore column names
                     index_col=0,  # index column (dates)
                     parse_dates=True,  # parse these dates
                     dayfirst=True,  # format of dates
                     skiprows=4,  # ignore these rows
                     sep=';',  # data separator
                     names=cols)  # use these column names
    # deleting the helper column
    del es['DEL']
except:
    # read stored data if there is no Internet connection
    es = pd.HDFStore('data/SX5E.h5', 'r')['SX5E']
Second, the VSTOXX data.
vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
try:
    vs = pd.read_csv(vs_url,  # filename
                     index_col=0,  # index column (dates)
                     parse_dates=True,  # parse date information
                     dayfirst=True,  # day before month
                     header=2)  # header/column names
except:
    # read stored data if there is no Internet connection
    vs = pd.HDFStore('data/V2TX.h5', 'r')['V2TX']
Bridging to R from within IPython Notebook and pushing Python data to the R run-time.
%load_ext rpy2.ipython
import numpy as np
# log returns for the major indices' time series data
datv = pd.DataFrame({'SX5E' : es['SX5E'], 'V2TX': vs['V2TX']}).dropna()
rets = np.log(datv / datv.shift(1)).dropna()
ES = rets['SX5E'].values
VS = rets['V2TX'].values
%Rpush ES VS
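The correlation between the two return series can also be checked quickly in Python itself; a minimal sketch using the rets DataFrame defined above:
rets.corr()
# correlation matrix of the SX5E and V2TX log returns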
Plotting with R in IPython Notebook.
%R plot(ES, VS, pch=19, col='blue'); grid(); title("Log returns ES50 & VSTOXX")
Linear regression with R.
%R c = coef(lm(VS~ES))
<FloatVector - Python:0x10b3d2b90 / R:0x101be3b50> [-0.000069, -2.753413]
%R plot(ES, VS, pch=19, col='blue'); grid(); abline(c, col='red', lwd=5)
Pulling data from R to Python.
%Rpull c
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(9, 6))
plt.plot(ES, VS, 'b.')
plt.plot(ES, c[0] + c[1] * ES, 'r', lw=3)
plt.grid(); plt.xlabel('ES'); plt.ylabel('VS')
<matplotlib.text.Text at 0x10ca23610>
The example we use is a "classical" pairs trading strategy, namely with gold and stocks of gold mining companies, both represented by ETFs with symbols GLD and GDX, respectively. Example courtesy of Thomas Wiecki (@twiecki).
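To make the strategy idea concrete: a pairs trade goes long one leg and short the other whenever the hedged spread deviates far enough from its recent mean, betting on mean reversion. The following is a hypothetical sketch of such signal logic, not part of the original analysis; window and entry are arbitrary illustration values, and the rolling functions follow the (older) pandas API used elsewhere in this notebook:
import numpy as np
import pandas as pd

def pairs_signal(gld, gdx, window=50, entry=2.):
    # static OLS hedge ratio (the Bayesian model below
    # estimates a time-varying one instead)
    beta = np.polyfit(gdx, gld, 1)[0]
    spread = gld - beta * gdx
    # z-score of the spread relative to its rolling history
    z = (spread - pd.rolling_mean(spread, window)) \
        / pd.rolling_std(spread, window)
    # +1: long spread (long GLD, short GDX); -1: short spread; 0: flat
    return np.where(z < -entry, 1, np.where(z > entry, -1, 0))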
We use zipline and PyMC3 for the analysis.
import numpy as np
import pymc as pm
import zipline
import pytz
import datetime as dt
First, we load the data from the Web.
try:
    datg = zipline.data.load_from_yahoo(stocks=['GLD', 'GDX'],
                end=dt.datetime(2014, 3, 15, 0, 0, 0, 0, pytz.utc)).dropna()
except:
    datg = pd.HDFStore('data/gold.h5', 'r')['datg']
%matplotlib inline
datg.plot(figsize=(9, 5))
<matplotlib.axes.AxesSubplot at 0x113bd51d0>
A scatter plot of the value pairs over time and a simple linear regression.
import matplotlib as mpl; import matplotlib.pyplot as plt
mpl_dates = mpl.dates.date2num(datg.index)
plt.figure(figsize=(9, 5))
plt.scatter(datg['GDX'], datg['GLD'], c=mpl_dates, marker='o')
reg = np.polyfit(datg['GDX'], datg['GLD'], 1)
plt.plot(datg['GDX'], np.polyval(reg, datg['GDX']), 'r-', lw=3)
plt.grid(True); plt.xlabel('GDX'); plt.ylabel('GLD')
plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
             format=mpl.dates.DateFormatter('%d %b %y'))
<matplotlib.colorbar.Colorbar instance at 0x7f941e855638>
We implement a Bayesian random walk model (I).
model_randomwalk = pm.Model()
with model_randomwalk:
    sigma_alpha, log_sigma_alpha = \
        model_randomwalk.TransformedVar('sigma_alpha',
                            pm.Exponential.dist(1. / .02, testval=.1),
                            pm.logtransform)
    sigma_beta, log_sigma_beta = \
        model_randomwalk.TransformedVar('sigma_beta',
                            pm.Exponential.dist(1. / .02, testval=.1),
                            pm.logtransform)
We implement a Bayesian random walk model (II).
from pymc.distributions.timeseries import GaussianRandomWalk
# take samples of 50 elements each
subsample_alpha = 50
subsample_beta = 50
with model_randomwalk:
    alpha = GaussianRandomWalk('alpha', sigma_alpha**-2,
                               shape=len(datg) / subsample_alpha)
    beta = GaussianRandomWalk('beta', sigma_beta**-2,
                              shape=len(datg) / subsample_beta)
    # map the subsampled coefficients back to the daily frequency
    alpha_r = np.repeat(alpha, subsample_alpha)
    beta_r = np.repeat(beta, subsample_beta)
We implement a Bayesian random walk model (III).
with model_randomwalk:
    # define regression
    regression = alpha_r + beta_r * datg.GDX.values[:1950]
    sd = pm.Uniform('sd', 0, 20)
    likelihood = pm.Normal('GLD', mu=regression,
                           sd=sd, observed=datg.GLD.values[:1950])
We implement a Bayesian random walk model (IV).
import warnings; warnings.simplefilter('ignore')
import scipy.optimize as sco
with model_randomwalk:
    # first optimize random walk
    start = pm.find_MAP(vars=[alpha, beta], fmin=sco.fmin_l_bfgs_b)
    # sampling
    step = pm.NUTS(scaling=start)
    trace_rw = pm.sample(100, step, start=start, progressbar=False)
The plot of the regression coefficients over time.
part_dates = np.linspace(min(mpl_dates), max(mpl_dates), 39)
fig, ax1 = plt.subplots(figsize=(10, 5))
plt.plot(part_dates, np.mean(trace_rw['alpha'], axis=0),
         'b', lw=2.5, label='alpha')
for i in range(45, 55):
    plt.plot(part_dates, trace_rw['alpha'][i], 'b-.', lw=0.75)
plt.xlabel('date'); plt.ylabel('alpha'); plt.axis('tight')
plt.grid(True); plt.legend(loc=2)
ax1.xaxis.set_major_formatter(mpl.dates.DateFormatter('%d %b %y'))
ax2 = ax1.twinx()
plt.plot(part_dates, np.mean(trace_rw['beta'], axis=0),
         'r', lw=2.5, label='beta')
for i in range(45, 55):
    plt.plot(part_dates, trace_rw['beta'][i], 'r-.', lw=0.75)
plt.ylabel('beta'); plt.legend(loc=4); fig.autofmt_xdate()
The plot of the regression lines over time.
plt.figure(figsize=(10, 5))
plt.scatter(datg['GDX'], datg['GLD'], c=mpl_dates, marker='o')
plt.colorbar(ticks=mpl.dates.DayLocator(interval=250),
             format=mpl.dates.DateFormatter('%d %b %y'))
plt.grid(True); plt.xlabel('GDX'); plt.ylabel('GLD')
x = np.linspace(min(datg['GDX']), max(datg['GDX']))
for i in range(39):
    alpha_rw = np.mean(trace_rw['alpha'].T[i])
    beta_rw = np.mean(trace_rw['beta'].T[i])
    plt.plot(x, alpha_rw + beta_rw * x, color=plt.cm.jet(256 * i / 39))
Let us apply multivariate autoregression, i.e. a vector autoregressive (VAR) model, to the financial time series data we have. First, we resample and join the data sets.
datv.index = datv.index.tz_localize(pytz.utc)
datf = datg.join(datv, how='left')
datf = datf.resample('1M', how='last')
# resampling to monthly data
# datf = datf / datf.ix[0] * 100
# uncomment for normalized starting values
# datf = np.log(datf / datf.shift(1)).dropna()
# uncomment for log return based analysis
The starting values of the time series data we use.
datf.head()
| Date | GDX | GLD | SX5E | V2TX |
| --- | --- | --- | --- | --- |
| 2006-05-31 00:00:00+00:00 | 36.91 | 64.23 | 3637.17 | 23.0529 |
| 2006-06-30 00:00:00+00:00 | 36.77 | 61.23 | 3648.92 | 18.3282 |
| 2006-07-31 00:00:00+00:00 | 36.82 | 63.16 | 3691.87 | 18.5171 |
| 2006-08-31 00:00:00+00:00 | 38.56 | 62.29 | 3808.70 | 16.1689 |
| 2006-09-30 00:00:00+00:00 | 33.87 | 59.47 | 3899.41 | 16.2455 |
We use the VAR class of the statsmodels library.
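For our four series, a first-order vector autoregression, which is what the BIC-based lag selection below ends up with, reads

$$ y_t = c + A_1 y_{t-1} + u_t $$

where $y_t = (GDX_t, GLD_t, SX5E_t, V2TX_t)^\top$, $c$ is a constant vector, $A_1$ a $4 \times 4$ coefficient matrix and $u_t$ a noise term; each variable is thus regressed on one lag of itself and of all other variables.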
from statsmodels.tsa.api import VAR
model = VAR(datf)
lags = 5
# number of lags used for fitting
results = model.fit(maxlags=lags, ic='bic')
# model fitted to data
The summary statistics of the model fit.
results.summary()
  Summary of Regression Results
=================================
Model:                        VAR
Method:                       OLS
Date:          Thu, 02, Oct, 2014
Time:                    11:11:18
--------------------------------------------------------------------
No. of Equations:         4.00000    BIC:                    18.7332
Nobs:                     94.0000    HQIC:                   18.4106
Log likelihood:          -1368.55    FPE:                7.95948e+07
AIC:                      18.1921    Det(Omega_mle):     6.46927e+07
--------------------------------------------------------------------
Results for equation GDX
==========================================================================
            coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------
const          3.656812         7.741225            0.472           0.638
L1.GDX         0.951813         0.052654           18.077           0.000
L1.GLD        -0.019076         0.025755           -0.741           0.461
L1.SX5E       -0.000279         0.001442           -0.193           0.847
L1.V2TX        0.050521         0.076826            0.658           0.512
==========================================================================

Results for equation GLD
==========================================================================
            coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------
const          4.570986        13.017883            0.351           0.726
L1.GDX         0.030915         0.088544            0.349           0.728
L1.GLD         0.959570         0.043310           22.156           0.000
L1.SX5E       -0.000580         0.002425           -0.239           0.812
L1.V2TX        0.047100         0.129193            0.365           0.716
==========================================================================

Results for equation SX5E
==========================================================================
            coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------
const        673.405496       294.957168            2.283           0.025
L1.GDX         0.716665         2.006218            0.357           0.722
L1.GLD        -1.660332         0.981323           -1.692           0.094
L1.SX5E        0.872341         0.054954           15.874           0.000
L1.V2TX       -5.246971         2.927232           -1.792           0.076
==========================================================================

Results for equation V2TX
==========================================================================
            coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------
const          3.538398         9.677262            0.366           0.716
L1.GDX         0.005024         0.065822            0.076           0.939
L1.GLD         0.002697         0.032196            0.084           0.933
L1.SX5E        0.000171         0.001803            0.095           0.924
L1.V2TX        0.820594         0.096040            8.544           0.000
==========================================================================

Correlation matrix of residuals
            GDX       GLD      SX5E      V2TX
GDX    1.000000  0.813748  0.061648 -0.158729
GLD    0.813748  1.000000 -0.058480 -0.081772
SX5E   0.061648 -0.058480  1.000000 -0.789066
V2TX  -0.158729 -0.081772 -0.789066  1.000000
Historical data and forecasts.
results.plot_forecast(50, figsize=(8, 8), offset='M')
# historical/input data and
# forecasts given model fit
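To get the forecast values numerically rather than only as a plot, the fitted results object also provides a forecast method; a minimal sketch, assuming the fit from above:
fc = results.forecast(datf.values[-results.k_ar:], 50)
# point forecasts for the next 50 months, seeded with
# the last k_ar observations of the input data
fc[:3]  # first three forecast steps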