The Python Quants

Combining Multivariate Time Series and Derivatives Analytics

Data Science Luxembourg, 10.02.2016

Dr. Yves J. Hilpisch

yves@tpq.io | http://tpq.io

The Python Quants GmbH

Aboute Me

For the curious:

Multivariate Vector Auto Regression with R

Processing the Data with Python

As an illustrative example we are going to analyze index data for the EURO STOXX 50 stock index and its volatility index VSTOXX. First, some Python imports.

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns; sns.set()
%matplotlib inline
%load_ext rpy2.ipython
import warnings; warnings.simplefilter('ignore')

Second, some R imports.

In [2]:
%%R
library('zoo')
library('xts')
library('vars')

STOXX Limited provides open data for the two indicies on their Web site. We start with the VSTOXX data.

In [3]:
vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
vs = pd.read_csv(vs_url,  # filename
                 index_col=0,  # index column (dates)
                 parse_dates=True,  # parse date information
                 dayfirst=True, # day before month
                 header=2)  # header/column names
In [4]:
vs = vs.resample('MS')  # resampling to month start frequency

A quick look at the most recent data rows.

In [5]:
vs.tail()
Out[5]:
V2TX V6I1 V6I2 V6I3 V6I4 V6I5 V6I6 V6I7 V6I8
Date
2015-10-01 23.650945 23.610630 24.053955 24.104005 24.420600 24.953082 25.060995 25.282791 25.778995
2015-11-01 22.819967 22.048163 22.769295 22.446665 23.512514 24.191938 24.377910 24.622286 25.236167
2015-12-01 23.773375 27.789483 23.793500 24.163295 24.787850 25.179785 25.328595 25.543580 25.921474
2016-01-01 29.650480 30.799394 29.462415 28.658326 28.718210 28.152100 27.895670 27.711790 27.275525
2016-02-01 29.383320 28.933540 29.541060 28.858660 29.054280 28.685680 28.659340 28.612700 28.353340

We write a sub-set of the data to disk as a CSV file.

In [6]:
vs[['V2TX', 'V6I4', 'V6I8']].to_csv('v2tx.csv')

VAR for the VSTOXX

The data can be read easily with R.

In [7]:
%%R
vd <- read.csv('v2tx.csv')
data <- xts(vd[, -1], as.POSIXct(as.character(vd[,1]), format="%Y-%m-%d"))

Next, we generate a multivariate VAR model in R.

In [8]:
%R mod <- VAR(data, p=12, season=12, type='both');

Fitting of the main index.

In [9]:
%R plot(mod, names='V2TX')

Fitting of two sub-indices (I).

In [10]:
%R plot(mod, names='V6I4')

Fitting of two sub-indices (II).

In [11]:
%R plot(mod, names='V6I8')

Finally, a forecast over 60 months.

In [12]:
%%R
pr = predict(mod, n.ahead=60, ci=0.9, dumvar = NULL)
plot(pr)

Combining VSTOXX & EURO STOXX 50

Retrieval of the EURO STOXX 50 data is a bit more involved.

In [13]:
cols = ['Date', 'SX5P', 'SX5E', 'SXXP', 'SXXE',
        'SXXF', 'SXXA', 'DK5F', 'DKXF', 'DEL']
es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'

es = pd.read_csv(es_url,  # filename
             header=None,  # ignore column names
             index_col=0,  # index column (dates)
             parse_dates=True,  # parse these dates
             dayfirst=True,  # format of dates
             skiprows=4,  # ignore these rows
             sep=';',  # data separator
             names=cols)  # use these column names

# deleting the helper column
del es['DEL']

We generate a new data set for the two main indices.

In [14]:
es = es.resample('MS')  # resampling to month start
In [15]:
data = vs.join(es)[['V2TX', 'SX5E']]
In [16]:
data.corr()
Out[16]:
V2TX SX5E
V2TX 1.000000 -0.334486
SX5E -0.334486 1.000000

In what follows, we work with normalized data.

In [17]:
data = data / data.ix[0] * 100
In [18]:
data.tail()
Out[18]:
V2TX SX5E
Date
2015-10-01 65.470679 93.253030
2015-11-01 63.170358 97.924786
2015-12-01 65.809589 93.627527
2016-01-01 82.078623 86.278605
2016-02-01 81.339070 83.441038

A plot of the data.

In [19]:
data.plot(figsize=(10, 6), subplots=True, color='b');

We write a sub-set of the data again as CSV file to disk.

In [20]:
data[data.index >= '2005-1-1'].to_csv('v2tx_sx5e.csv')

As before, the data stored as CSV file is read into an R object.

In [21]:
%%R
vd <- read.csv('v2tx_sx5e.csv')
data <- xts(vd[, -1], as.POSIXct(as.character(vd[,1]), format="%Y-%m-%d"))

The VAR model and a plot for the stock index.

In [22]:
%R mod <- VAR(data, p=12, season=12, type='both');
In [23]:
%R plot(mod, names='SX5E')

The forecast over a 60 months period with a 90% confidence interval.

In [24]:
%%R
pr = predict(mod, n.ahead=60, ci=0.9, dumvar = NULL)
plot(pr)

Finally, an impulse response analysis.

In [25]:
%%R
ir <- irf(mod, impulse = NULL, response = NULL, n.ahead = 48,
ortho = TRUE, cumulative = FALSE, boot = TRUE, ci = 0.90,
runs = 100, seed = NULL)

The graphical representation of the IRA results.

In [26]:
%R plot(ir)
Hit <Return> to see next plot: 

Model Calibration Data

Derivatives Model Calibration

Model calibration is a numerical optimization routine that looks for such parameters for a given model that best replicate market observed option quotes.

In our case, we fit a square-root diffusion (CIR) process with 3 parameters to options on the VSTOXX volatility index.

$$dv_t = \kappa (\theta - v_t) dt + \sigma \sqrt{v_t} dZ_t$$

%run dx_srd_calibration.py

parameters.set_index('date').to_csv('srd_parameters.csv')

A possible calibration result might look as follows.

In [27]:
from IPython.display import Image 
In [28]:
Image('calibration_result.png', width='75%')
Out[28]:

The Data

We import the data set ...

In [29]:
para = pd.read_csv('srd_parameters.csv', index_col=0, parse_dates=True)
In [30]:
para[['initial_value', 'kappa', 'theta', 'sigma', 'MSE']].head()
Out[30]:
initial_value kappa theta sigma MSE
date
2014-01-02 18.9286 2.735729 17.313088 3.553804 0.002043
2014-01-03 18.0018 2.687111 17.415745 3.648552 0.001941
2014-01-06 18.1672 2.684570 17.438809 3.639782 0.002013
2014-01-07 17.1932 2.711549 17.644289 3.608929 0.004547
2014-01-08 17.3671 2.737476 17.841848 3.592961 0.003493

... and plot it.

In [31]:
para[['initial_value', 'kappa', 'theta', 'sigma', 'MSE']].plot(
        figsize=(10, 12), subplots=True, color='b');

VAR Model for the Calibration Data

Setting up the VAR Model

We write the relevant sub-set of the data to disk ...

In [32]:
para[['initial_value', 'kappa', 'theta', 'sigma']].to_csv('srd_para_for_r.csv')

... and read it with R.

In [33]:
%%R
vd <- read.csv('srd_para_for_r.csv')
data <- xts(vd[, -1], as.POSIXct(as.character(vd[, 1]), format="%Y-%m-%d"))

The VAR model based on the calibration data.

In [34]:
%R mod <- VAR(data, p=5, season=20, type='both');

For instance, theta shows a good fit.

In [35]:
%R plot(mod, names='theta')

Forecast for the Derivatives Model Parameters

We generate a forecast for 20 trading days.

In [36]:
%%R
pr = predict(mod, n.ahead=20, ci=0.9, dumvar = NULL)
plot(pr)

The forecast results are found in the fcst attribute.

In [37]:
%R attr(pr, 'names')
Out[37]:
array(['fcst', 'endog', 'model', 'exo.fcst'], 
      dtype='|S8')

We pull the results to the Python run-time ...

In [38]:
%%R
ivs <- pr$fcst$initial_value
ks <- pr$fcst$kappa
ts <- pr$fcst$theta
ss <- pr$fcst$sigma
In [39]:
%R ivs[1:5]
Out[39]:
array([ 15.09791646,  12.51026196,  12.1718373 ,  11.15844494,  12.12437766])
In [40]:
%Rpull ivs
%Rpull ks
%Rpull ts
%Rpull ss

... and store them in a pandas DataFrame object.

In [41]:
index = pd.date_range(start='2014-4-1', periods=len(ivs), freq='B')
In [42]:
fc = pd.DataFrame({'initial_value' : ivs[:, 0],
                  'kappa' : ks[:, 0],
                  'theta' : ts[:, 0],
                  'sigma' : ss[:, 0]}, index=index)

The forecast results plotted with Python.

In [43]:
fc.plot(figsize=(10, 10), subplots=True, color='b');

DX Analytics

Background and Philosophy

DX Analytics is a Python library for advanced financial and derivatives analytics authored and maintained by The Python Quants. It is particularly suited to model multi-risk and exotic derivatives and to do a consistent valuation of complex derivatives portfolios. It mainly uses Monte Carlo simulation since it is the only numerical method capable of valuing and risk managing complex, multi-risk derivatives books.

DX Analytics is open source, cf. http://dx-analytics.com & http://github.com/yhilpisch/dx.

The development of the library is guided by two central principles.

  • global valuation approach: in practice, this approach translates into the non-redundant modeling of all risk factors (e.g. option underlyings like equity indexes) and the valuation of all derivative instruments by a unique, consistent numerical method — which is Monte Carlo simulation in the case of DX Analytics
  • unlimited computing resources: in 2016, the technical infrastructures available to even smaller players in the financial industry have reached performance levels that 10 years ago seemed impossible or at least financially not feasible; in that sense "unlimited resources" is not to be understood literally but rather as the guiding principle that hardware and computing resources generally are no longer a bottleneck

Example: DX Analytics — Quick Start

A quick overview of how the library works.

http://dx-analytics.com/00_dx_quickstart.html

Example: DX Analytics — Complex

This example values a portfolio with 50 (correlated) risk factors and 250 options (European/American exercise) via Monte Carlo simulation in risk-neutral fashion with stochastic short rates.

http://dx-analytics.com/13_dx_quite_complex_portfolios.html

Modeling the VSTOXX Index

In what follows, we model the VSTOXX volatility index by a square-root diffusion process with DX Analytics. First, some imports.

In [44]:
import dx
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

All relevant data is stored in a market_environment object.

In [45]:
r = dx.constant_short_rate('r', 0.01)
me = dx.market_environment('me', para.index[0])
me.add_constant('initial_value', para['initial_value'].ix[0])
me.add_constant('kappa', para['kappa'].ix[0])
me.add_constant('theta', para['theta'].ix[0])
me.add_constant('volatility', para['sigma'].ix[0])
me.add_constant('final_date', dt.datetime(2014, 12, 31))
me.add_constant('currency', 'EUR')
me.add_constant('paths', 50000)
me.add_constant('frequency', 'W')
me.add_curve('discount_curve', r)

With this data, we instantiate the simulation object and simulate the process.

In [46]:
vstoxx = dx.square_root_diffusion('vstoxx', me)
In [47]:
paths = vstoxx.get_instrument_values()

The first 10 paths from the Monte Carlo simulation.

In [48]:
plt.figure(figsize=(10, 6));
plt.plot(vstoxx.time_grid, paths[:, :10]);
plt.gcf().autofmt_xdate();

Modeling a Derivative Instrument

In a next step, we instantiate a valuation object for a single-risk derivative instrument — in our case a European call option.

In [49]:
me.add_constant('strike', 15) # me.get_constant('initial_value'))
me.add_constant('maturity', me.get_constant('final_date'))
In [50]:
call = dx.valuation_mcs_european_single(name='call', underlying=vstoxx,
        mar_env=me, payoff_func='np.maximum(maturity_value - strike, 0)')

The valuation takes then only a single method call.

In [51]:
call.present_value()
Out[51]:
3.774749

Let us do several valuations for different strikes.

In [52]:
call_values = []
strikes = np.linspace(10, 30, 20)
orig_strike = call.strike
for k in strikes:
    call.update(strike=k)
    call_values.append(call.present_value())
call.update(strike=orig_strike)

The results visualized.

In [53]:
plt.figure(figsize=(10, 6));
plt.plot(strikes, call_values);
plt.xlabel('strike'); plt.ylabel('call value')
Out[53]:
<matplotlib.text.Text at 0x7fd62c271150>

We now take the calibration data and derive/re-calculate values for the option over the first quarter of 2014.

In [54]:
%%time
values = []
for i in range(len(para)):
    initial_value, kappa, theta, sigma = para.ix[i][
        ['initial_value', 'kappa', 'theta', 'sigma']]
    vstoxx.update(initial_value=initial_value, kappa=kappa,
                  theta=theta, volatility=sigma)
    values.append(call.present_value())
CPU times: user 7.85 s, sys: 1.52 s, total: 9.37 s
Wall time: 9.32 s

The result is a time series with the call option values.

In [55]:
plt.figure(figsize=(10, 6));
plt.plot(para.index, values);
plt.ylabel('call value');
plt.gcf().autofmt_xdate();

Next, and maybe a bit more interesting, we do the same with the data that the multivariate VAR analysis has generated as a forecast for April 2014.

In [56]:
%%time
values = []
for i in range(len(fc)):
    initial_value, kappa, theta, sigma = fc.ix[i][
        ['initial_value', 'kappa', 'theta', 'sigma']]
    vstoxx.update(initial_value=initial_value, kappa=kappa,
                  theta=theta, volatility=sigma)
    values.append(call.present_value())
fc['value'] = values
CPU times: user 2.44 s, sys: 476 ms, total: 2.92 s
Wall time: 2.85 s

The European call values based on the parameter forecasts visualized.

In [57]:
fc.plot(figsize=(10, 10), subplots=True, color='b');

Summary

Some major insights from today:

  • Python: Python is good at processing data, i.e. for data logistics
  • Python & R: Python and R can easily interact with each other, exchanging data back and forth
  • R: R provides powerful statistical libraries that might not be avalilable in Python
  • VAR: multivariate vector auto regression can be efficiently and productively applied via the vars package (constant + trend + seasonality)
  • DX: DX Analytics is a Python library for advanced derivatives and risk analytics
  • VAR + DX: we have combined mVAR with DX to generate forecasts of derivatives prices given a history of model parameters

The Python Quants

http://tpq.io | @dyjh | team@tpq.io

Python Quant Platform | http://quant-platform.com

Python for Finance | Python for Finance @ O'Reilly

Derivatives Analytics with Python | Derivatives Analytics @ Wiley Finance

Listed Volatility and Variance Derivatives | Listed VV Derivatives @ Wiley Finance