The Python Quants

Combining Multivariate Time Series and Derivatives Analytics

Python for Quant Finance Meetup, 15.02.2016

Dr. Yves J. Hilpisch

yves@tpq.io | http://tpq.io

The Python Quants GmbH

About Me

For the curious:

Multivariate Vector Auto Regression with R

Processing the Data with Python

As an illustrative example we are going to analyze index data for the EURO STOXX 50 stock index and its volatility index VSTOXX. First, some Python imports.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns; sns.set()
%matplotlib inline
%load_ext rpy2.ipython
import warnings; warnings.simplefilter('ignore')

Second, some R imports.

In [2]:
%%R
library('zoo')
library('xts')
library('vars')

STOXX Limited provides open data for the two indicies on their Web site. We start with the VSTOXX data.

In [3]:
vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
vs = pd.read_csv(vs_url,  # filename
                 index_col=0,  # index column (dates)
                 parse_dates=True,  # parse date information
                 dayfirst=True, # day before month
                 header=2)  # header/column names
In [4]:
vs = vs.resample('MS')  # resampling to month start frequency

A quick look at the most recent data rows.

In [5]:
vs.tail()
Out[5]:
V2TX V6I1 V6I2 V6I3 V6I4 V6I5 V6I6 V6I7 V6I8
Date
2015-10-01 23.650945 23.610630 24.053955 24.104005 24.420600 24.953082 25.060995 25.282791 25.778995
2015-11-01 22.819967 22.048163 22.769295 22.446665 23.512514 24.191938 24.377910 24.622286 25.236167
2015-12-01 23.773375 27.789483 23.793500 24.163295 24.787850 25.179785 25.328595 25.543580 25.921474
2016-01-01 29.650480 30.799394 29.462415 28.658326 28.718210 28.152100 27.895670 27.711790 27.275525
2016-02-01 32.171720 32.637760 32.189910 31.089790 30.996920 30.363330 30.206020 29.880490 29.420770

Retrieval of the EURO STOXX 50 data is a bit more involved.

In [6]:
cols = ['Date', 'SX5P', 'SX5E', 'SXXP', 'SXXE',
        'SXXF', 'SXXA', 'DK5F', 'DKXF', 'DEL']
es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
es = pd.read_csv(es_url,  # filename
             header=None,  # ignore column names
             index_col=0,  # index column (dates)
             parse_dates=True,  # parse these dates
             dayfirst=True,  # format of dates
             skiprows=4,  # ignore these rows
             sep=';',  # data separator
             names=cols)  # use these column names
del es['DEL']
es = es.resample('MS')  # resampling to month start

Another quick look at this data set.

In [7]:
es.tail()
Out[7]:
SX5P SX5E SXXP SXXE SXXF SXXA DK5F DKXF
Date
2015-10-01 3122.319545 3275.478636 364.093636 342.184545 428.119091 374.868636 9831.232727 596.840455
2015-11-01 3244.483810 3439.572381 378.570000 358.219048 446.125238 387.472381 10148.546667 622.268095
2015-12-01 3102.559545 3288.632727 365.571818 345.998182 433.103182 374.091818 9908.342273 612.990455
2016-01-01 2884.365000 3030.504500 339.934500 320.753000 403.609500 348.774000 9347.080000 575.960500
2016-02-01 2719.260000 2840.141000 321.522000 301.931000 380.421000 331.262000 8808.030000 543.555000

We write a sub-set of the data to disk as a CSV file.

In [8]:
data = vs.join(es)[['V2TX', 'SX5E']]
data.to_csv('SX5E_V2TX.csv')

A visualization of the data we are going to work with.

In [9]:
data.plot(subplots=True, color='b', figsize=(10, 6));

The time series are negatively correlated.

In [10]:
data.corr()
Out[10]:
V2TX SX5E
V2TX 1.000000 -0.335169
SX5E -0.335169 1.000000

Vector Autoregregression

A VAR model is an $n$-equation, $n$-variable model in which each variable is explained by its own lagged values plus the current and past values of the remaining $n-1$ variables.

Assume we have an economic/financial system with a vector of (demeaned) variables $z_t$. The VAR in structural form is:

$$ Az_t = B_1z_{t-1} + B_2z_{t-2}+ ... + B_pz_{t-p}+u_t $$

with

\begin{eqnarray*} Euu' = \Sigma_u = \begin{bmatrix} \sigma_{u1}^2 \ 0 \ \dots \ 0 \\ 0 \ \sigma_{u2}^2 \ \dots \ 0 \\ \vdots \ \vdots \ \ddots \ \vdots \\ 0 \ 0 \ \dots \ \sigma_{un}^2 \end{bmatrix} \end{eqnarray*}

The model can also be written as:

$$ z_t = A_t^1B_1z_{t-1} + A_t^1B_2z_{t-2}+ ... + A_t^1B_pz_{t-p}+u_t $$

In addition, it holds $E(u_t)=0$, $E(u_tu'_\tau)=\Sigma_u$ for $t=\tau$ and 0 otherwise.

VAR Modeling with R

The data can be read easily with R.

In [11]:
%%R
vd <- read.csv('SX5E_V2TX.csv')
data <- xts(vd[, -1], as.POSIXct(as.character(vd[,1]), format="%Y-%m-%d"))

Next, we generate a multivariate VAR model in R. We start with a simple one.

In [12]:
%%R
mod <- VAR(data,
      p=1,  # one lage only
      type='const');  # only mean relevant

The summary statistics from the model evaluation.

In [13]:
%%R
summary(mod)
VAR Estimation Results:
========================= 
Endogenous variables: V2TX, SX5E 
Deterministic variables: const 
Sample size: 205 
Log Likelihood: -1840.859 
Roots of the characteristic polynomial:
0.9654 0.893
Call:
VAR(y = data, p = 1, type = "const")


Estimation results for equation V2TX: 
===================================== 
V2TX = V2TX.l1 + SX5E.l1 + const 

         Estimate Std. Error t value Pr(>|t|)    
V2TX.l1 0.8844187  0.0361683  24.453   <2e-16 ***
SX5E.l1 0.0004605  0.0004378   1.052    0.294    
const   1.3925538  1.9613288   0.710    0.479    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Residual standard error: 4.513 on 202 degrees of freedom
Multiple R-Squared: 0.7643,	Adjusted R-squared: 0.7619 
F-statistic: 327.4 on 2 and 202 DF,  p-value: < 2.2e-16 


Estimation results for equation SX5E: 
===================================== 
SX5E = V2TX.l1 + SX5E.l1 + const 

         Estimate Std. Error t value Pr(>|t|)    
V2TX.l1  -1.50875    1.23506  -1.222   0.2233    
SX5E.l1   0.97399    0.01495  65.149   <2e-16 ***
const   119.71537   66.97481   1.787   0.0754 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Residual standard error: 154.1 on 202 degrees of freedom
Multiple R-Squared: 0.9599,	Adjusted R-squared: 0.9595 
F-statistic:  2419 on 2 and 202 DF,  p-value: < 2.2e-16 



Covariance matrix of residuals:
        V2TX    SX5E
V2TX   20.37  -511.1
SX5E -511.11 23753.7

Correlation matrix of residuals:
        V2TX    SX5E
V2TX  1.0000 -0.7348
SX5E -0.7348  1.0000


Fitting of the EURO STOXX 50 index.

In [14]:
%R plot(mod, names='SX5E')

Forecast with the simple model.

In [15]:
%%R
pr = predict(mod, n.ahead=60, ci=0.9, dumvar = NULL)
plot(pr)

Second, a more sophisticated one.

In [16]:
%%R
mod <- VAR(data,
      p=12,  # using 12 lags
      season=12,  # seasonality of 12 lags
      type='both');  # both mean and trend

Fitting results (I).

In [17]:
%R plot(mod, names='SX5E')

Fitting results (II).

In [18]:
%R plot(mod, names='V2TX')

A forecast over 60 months.

In [19]:
%%R
pr = predict(mod, n.ahead=60, ci=0.9, dumvar = NULL)
plot(pr)

Model Calibration Data

Derivatives Model Calibration

Model calibration is a numerical optimization routine that looks for such parameters for a given model that best replicate market observed option quotes.

In our case, we fit a square-root diffusion (CIR) process with 3 parameters to options on the VSTOXX volatility index.

$$dv_t = \kappa (\theta - v_t) dt + \sigma \sqrt{v_t} dZ_t$$

%run dx_srd_calibration.py

parameters.set_index('date').to_csv('srd_parameters.csv')

A possible calibration result might look as follows.

In [20]:
from IPython.display import Image 
In [21]:
Image('calibration_result.png', width='75%')
Out[21]:

The Data

We import the data set ...

In [22]:
para = pd.read_csv('srd_parameters.csv', index_col=0, parse_dates=True)
In [23]:
para[['initial_value', 'kappa', 'theta', 'sigma', 'MSE']].head()
Out[23]:
initial_value kappa theta sigma MSE
date
2014-01-02 18.9286 2.735729 17.313088 3.553804 0.002043
2014-01-03 18.0018 2.687111 17.415745 3.648552 0.001941
2014-01-06 18.1672 2.684570 17.438809 3.639782 0.002013
2014-01-07 17.1932 2.711549 17.644289 3.608929 0.004547
2014-01-08 17.3671 2.737476 17.841848 3.592961 0.003493

... and plot it.

In [24]:
para[['initial_value', 'kappa', 'theta', 'sigma', 'MSE']].plot(
        figsize=(10, 12), subplots=True, color='b');

VAR Model for the Calibration Data

Setting up the VAR Model

We write the relevant sub-set of the data to disk ...

In [25]:
para = para[['initial_value', 'kappa', 'theta', 'sigma']]
para[para.index < '2014-3-1'].to_csv('srd_para_for_r.csv')
In [26]:
len(para[para.index >= '2014-3-1']) # out-of-sample period
Out[26]:
21

... and read it with R.

In [27]:
%%R
vd <- read.csv('srd_para_for_r.csv')
data <- xts(vd[, -1], as.POSIXct(as.character(vd[, 1]), format="%Y-%m-%d"))

The VAR model based on the calibration data.

In [28]:
%R mod <- VAR(data, p=5, season=5, type='both');

For instance, theta shows a good fit.

In [29]:
%R plot(mod, names='theta')

Forecast for the Derivatives Model Parameters

We generate a forecast for 21 trading days.

In [30]:
%%R
pr = predict(mod, n.ahead=21, ci=0.75, dumvar = NULL)
plot(pr)

The forecast results are found in the fcst attribute.

In [31]:
%R attr(pr, 'names')
Out[31]:
array(['fcst', 'endog', 'model', 'exo.fcst'], 
      dtype='|S8')

We pull the results to the Python run-time ...

In [32]:
%%R
ivs <- pr$fcst$initial_value
ks <- pr$fcst$kappa
ts <- pr$fcst$theta
ss <- pr$fcst$sigma
In [33]:
%Rpull ivs
%Rpull ks
%Rpull ts
%Rpull ss

... and store them in pandas DataFrame objects.

In [34]:
index = pd.date_range(start='2014-3-1', periods=len(ivs), freq='B')
In [35]:
fce = pd.DataFrame({'initial_value' : ivs[:, 0],
                  'kappa' : ks[:, 0],
                  'theta' : ts[:, 0],
                  'sigma' : ss[:, 0]}, index=index)
fce = fce.append(para[para.index == '2014-2-28']).sort()
In [36]:
fcl = pd.DataFrame({'initial_value' : ivs[:, 1],
                  'kappa' : ks[:, 1],
                  'theta' : ts[:, 1],
                  'sigma' : ss[:, 1]}, index=index)
fcl = fcl.append(para[para.index == '2014-2-28']).sort()
In [37]:
fcu = pd.DataFrame({'initial_value' : ivs[:, 2],
                  'kappa' : ks[:, 2],
                  'theta' : ts[:, 2],
                  'sigma' : ss[:, 2]}, index=index)
fcu = fcu.append(para[para.index == '2014-2-28']).sort()

The forecast results plotted with Python.

In [38]:
ax = fce.plot(figsize=(10, 12), subplots=True, style='b--');
fcl.plot(subplots=True, style='r--', ax=ax, label='lower', legend=False);
fcu.plot(subplots=True, style='y--', ax=ax, label='upper', legend=False);
para[['initial_value', 'kappa', 'sigma', 'theta']][
    para.index >= '2014-2-28'].plot(subplots=True,
                    style='b-', ax=ax, legend=False);

DX Analytics

Background and Philosophy

DX Analytics is a Python library for advanced financial and derivatives analytics authored and maintained by The Python Quants. It is particularly suited to model multi-risk and exotic derivatives and to do a consistent valuation of complex derivatives portfolios. It mainly uses Monte Carlo simulation since it is the only numerical method capable of valuing and risk managing complex, multi-risk derivatives books.

DX Analytics is open source, cf. http://dx-analytics.com & http://github.com/yhilpisch/dx.

The development of the library is guided by two central principles.

  • global valuation approach: in practice, this approach translates into the non-redundant modeling of all risk factors (e.g. option underlyings like equity indexes) and the valuation of all derivative instruments by a unique, consistent numerical method — which is Monte Carlo simulation in the case of DX Analytics
  • unlimited computing resources: in 2016, the technical infrastructures available to even smaller players in the financial industry have reached performance levels that 10 years ago seemed impossible or at least financially not feasible; in that sense "unlimited resources" is not to be understood literally but rather as the guiding principle that hardware and computing resources generally are no longer a bottleneck

Example: DX Analytics — Quick Start

A quick overview of how the library works.

http://dx-analytics.com/00_dx_quickstart.html

Example: DX Analytics — Complex

This example values a portfolio with 50 (correlated) risk factors and 250 options (European/American exercise) via Monte Carlo simulation in risk-neutral fashion with stochastic short rates.

http://dx-analytics.com/13_dx_quite_complex_portfolios.html

Modeling the VSTOXX Index

In what follows, we model the VSTOXX volatility index by a square-root diffusion process with DX Analytics. First, some imports.

In [39]:
import dx
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

All relevant data is stored in a market_environment object.

In [40]:
r = dx.constant_short_rate('r', 0.01)
me = dx.market_environment('me', para.index[0])
me.add_constant('initial_value', para['initial_value'].ix[0])
me.add_constant('kappa', para['kappa'].ix[0])
me.add_constant('theta', para['theta'].ix[0])
me.add_constant('volatility', para['sigma'].ix[0])
me.add_constant('final_date', dt.datetime(2014, 4, 18))
me.add_constant('currency', 'EUR')
me.add_constant('paths', 50000)
me.add_constant('frequency', 'W')
me.add_curve('discount_curve', r)

With this data, we instantiate the simulation object and simulate the process.

In [41]:
vstoxx = dx.square_root_diffusion('vstoxx', me)
In [42]:
paths = vstoxx.get_instrument_values()

The first 10 paths from the Monte Carlo simulation.

In [43]:
plt.figure(figsize=(10, 6));
plt.plot(vstoxx.time_grid, paths[:, :10]);
plt.gcf().autofmt_xdate();

A VSTOXX Option

We import options data for VSTOXX options and pick a particular option to do the further analyses.

In [44]:
h5 = pd.HDFStore('vstoxx_march_2014.h5', 'r')
od = h5['vstoxx_options']
od.set_index('DATE', drop=True, inplace=True)
h5.close()
In [45]:
strike = 19.0
odat = od[(od.TYPE == 'C')  # call option
        & (od.STRIKE == strike)  # fixed strike
        & (od.MATURITY == '2014-4-18')]  # fixed maturity date

The options price for the European call option over time.

In [46]:
odat['PRICE'].plot(figsize=(10, 6));

Modeling the VSTOXX Option

In a next step, we instantiate a valuation object for a single-risk derivative instrument — in our case a European call option.

In [47]:
me.add_constant('strike', strike)
me.add_constant('maturity', dt.datetime(2014, 4, 18))
In [48]:
payoff = 'np.maximum(maturity_value - strike, 0)'
call = dx.valuation_mcs_european_single(name='call', underlying=vstoxx,
        mar_env=me, payoff_func=payoff)

The valuation takes then only a single method call.

In [49]:
call.present_value()
Out[49]:
1.925228

We now take the calibration data and derive/re-calculate values for the option over the first quarter of 2014.

In [50]:
%%time
modvalues = []
for i in range(len(para)):
    pricing_date = para.index[i]
    initial_value, kappa, theta, sigma = para.ix[i][
        ['initial_value', 'kappa', 'theta', 'sigma']]
    vstoxx.update(pricing_date=pricing_date, initial_value=initial_value,
                  kappa=kappa, theta=theta, volatility=sigma)
    call.pricing_date = pricing_date
    modvalues.append(call.present_value())
CPU times: user 1.49 s, sys: 181 ms, total: 1.67 s
Wall time: 1.67 s

The result is a time series with the call option values.

In [51]:
plt.figure(figsize=(10, 6));
plt.plot(odat.index, odat['PRICE'], label='market quote');
plt.plot(para.index, modvalues, '--', label='model value');
plt.ylabel('call value');
plt.legend(loc=0)
plt.gcf().autofmt_xdate();

Next, and maybe a bit more interesting, we do a option price forecast for April 2014 based on the VAR results.

In [52]:
%%time
collect = {}
for fc, name in zip([fce, fcl, fcu], ['mean', 'lower', 'upper']):
    values = [odat[odat.index == '2014-2-28']['PRICE']]
    for i in range(1, len(fc)):
        pricing_date = fc.index[i]
        initial_value, kappa, theta, sigma = fc.ix[i][
            ['initial_value', 'kappa', 'theta', 'sigma']]
        vstoxx.update(pricing_date=pricing_date, initial_value=initial_value,
                      kappa=kappa, theta=theta, volatility=sigma)
        call.pricing_date = pricing_date
        values.append(call.present_value())
    collect[name] = values
fcresults = pd.DataFrame(collect, fc.index)
CPU times: user 913 ms, sys: 27 ms, total: 940 ms
Wall time: 909 ms

The European call values based on the parameter forecasts visualized.

In [53]:
plt.figure(figsize=(10, 6));
plt.plot(odat.index, odat['PRICE'], label='market quote');
plt.plot(para.index, modvalues, '--', label='model value');
plt.plot(fcresults.index, fcresults['lower'], label='lower')
plt.plot(fcresults.index, fcresults['mean'], label='mean')
plt.plot(fcresults.index, fcresults['upper'], label='upper')
plt.ylabel('call value'); plt.legend(loc=0); plt.gcf().autofmt_xdate();

Summary

Some major insights from today:

  • Python: Python is good at processing data, i.e. for data logistics
  • Python & R: Python and R can easily interact with each other, exchanging data back and forth
  • R: R provides powerful statistical libraries that might not be avalilable in Python
  • VAR: multivariate vector auto regression can be efficiently and productively applied via the vars package (constant + trend + seasonality)
  • DX: DX Analytics is a Python library for advanced derivatives and risk analytics
  • VAR + DX: we have combined mVAR with DX to generate forecasts of derivatives prices given a history of model parameters

The Python Quants

http://tpq.io | @dyjh | team@tpq.io

Python Quant Platform | http://quant-platform.com

Python for Finance | Python for Finance @ O'Reilly

Derivatives Analytics with Python | Derivatives Analytics @ Wiley Finance

Listed Volatility and Variance Derivatives | Listed VV Derivatives @ Wiley Finance