Dr. Yves J. Hilpisch
Continuum Analytics Europe GmbH
New York City – August 27, 2013
A brief bio:
See www.hilpisch.com
This talk focuses on
It does not address important issues such as
The majority of financial algorithms, e.g. in option pricing, can be cast in an array (matrix) setting. It is therefore paramount to have flexible and fast array manipulation capabilities available.
NumPy provides the basic infrastructure for that: the fundamental building blocks on which to implement financial algorithms.
There are two major advantages: considerable speed-ups and quite compact, vectorized syntax (cf. the examples below).
We simulate the evolution of a stock index level which we assume to follow a geometric Brownian motion
\(dS_t = r S_t dt + \sigma S_t dZ_t\)
with \(S\) the stock price, \(r\) the risk-free rate, \(\sigma\) the volatility and \(Z\) a Brownian motion.
An exact discretization scheme is given for \(t>0\) by
\(S_t = S_{t - \Delta t} \exp \left((r - 0.5 \sigma^2) \Delta t + \sigma \sqrt{\Delta t} z_t \right)\)
with \(z\) being a standard normally distributed random variable.
First, we import needed functions and define some global variables.
#
# Simulating Geometric Brownian Motion with Python
#
from time import time
from math import exp, sqrt, log
from random import gauss
# Parameters
S0 = 100; r = 0.05; sigma = 0.2
T = 1.0; M = 50; dt = T / M
Then we define a function that returns I simulated index level paths.
# Simulating I paths with M time steps
def genS_py(I):
    ''' I: number of paths '''
    S = []
    for i in range(I):
        path = []
        for t in range(M + 1):
            if t == 0:
                path.append(S0)
            else:
                z = gauss(0.0, 1.0)
                St = path[t - 1] * exp((r - 0.5 * sigma ** 2) * dt
                                       + sigma * sqrt(dt) * z)
                path.append(St)
        S.append(path)
    return S
Let's see how long the simulation takes.
I = 100000
%time S = genS_py(I)
CPU times: user 18.8 s, sys: 152 ms, total: 19 s
Wall time: 19.4 s
plot(S[10])
grid(True)
We use the log version of the discretization scheme to fully apply NumPy capabilities.
from numpy import *
def genS_np(I):
    ''' I: number of paths '''
    inc = ((r - 0.5 * sigma ** 2) * dt
           + sigma * sqrt(dt) * standard_normal((M + 1, I)))
    inc[0] = 0.0  # no increment at t = 0, so that S[0] equals S0 exactly
    S = S0 * exp(cumsum(inc, axis=0))
    return S
%time S = genS_np(I)
CPU times: user 884 ms, sys: 184 ms, total: 1.07 s
Wall time: 1.1 s
This leads to a considerable speed-up and to a quite compact syntax – the algorithm is "fully NumPy".
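As a quick plausibility check (added here as a sketch, not part of the original talk), the average simulated terminal index level should be close to the risk-neutral expectation \(S_0 e^{rT}\).
# sanity check (sketch): Monte Carlo mean vs. analytic expectation
print "MC estimate of E[S_T]   %10.4f" % S[-1].mean()
print "analytic S0 * exp(r*T)  %10.4f" % (S0 * exp(r * T))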
First, we define the parameters for pricing a European call option on a stock index.
from math import exp, sqrt
# Option Parameters
S0 = 105.0 # initial index level
K = 100.0 # strike price
T = 1. # call option maturity
r = 0.05 # constant short rate
vola = 0.25 # constant volatility factor of diffusion
# Time Parameters
M = 1000 # time steps
dt = T / M # length of time interval
df = exp(-r * dt) # discount factor per time interval
# Binomial Parameters
u = exp(vola * sqrt(dt)) # up-movement
d = 1 / u # down-movement
q = (exp(r * dt) - d) / (u - d) # martingale probability
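As a small sanity check (my addition), the martingale probability should lie strictly between 0 and 1 for the discretization to make sense.
# check that q is a valid probability (sketch)
print "u %8.5f | d %8.5f | q %8.5f" % (u, d, q)
assert 0 < q < 1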
A loop-heavy implementation that is nevertheless already based on NumPy arrays:
import numpy as np
def oVal_py():
    # Array Initialization for Index Levels
    S = np.zeros((M + 1, M + 1), dtype=np.float64)  # index level array
    S[0, 0] = S0
    z = 0
    for j in range(1, M + 1, 1):
        z += 1
        for i in range(z + 1):
            S[i, j] = S[0, 0] * (u ** j) * (d ** (i * 2))
    # Array Initialization for Inner Values
    iv = np.zeros((M + 1, M + 1), dtype=np.float64)  # inner value array
    z = 0
    for j in range(0, M + 1, 1):
        for i in range(z + 1):
            iv[i, j] = max(S[i, j] - K, 0)
        z += 1
    # Valuation by Risk-Neutral Discounting
    pv = np.zeros((M + 1, M + 1), dtype=np.float64)  # present value array
    pv[:, M] = iv[:, M]  # initialize last time point
    z = M + 1
    for j in range(M - 1, -1, -1):
        z -= 1
        for i in range(z):
            pv[i, j] = (q * pv[i, j + 1] + (1 - q) * pv[i + 1, j + 1]) * df
    return pv[0, 0]
The execution takes quite a bit of time.
%timeit C = oVal_py()
1 loops, best of 3: 6.31 s per loop
Numba is an open-source, NumPy-aware optimizing compiler for Python sponsored by Continuum Analytics, Inc. It uses the remarkable LLVM compiler infrastructure to compile Python byte-code to machine code especially for use in the NumPy run-time and SciPy modules.
From Numba we can expect some nice performance improvements ...
from numba import *
oVal_nb = autojit(oVal_py)
%timeit C = oVal_nb()
1 loops, best of 3: 282 ms per loop
... with only two lines of additional code.
The NumPy version avoids all but one loop on the Python level.
def oVal_np():
    # Array Initialization for Index Levels
    mu = arange(M + 1)
    mu = resize(mu, (M + 1, M + 1))
    md = transpose(mu)
    mu = u ** (mu - md)
    md = d ** md
    S = S0 * mu * md
    # Valuation Algorithm
    V = maximum(S - K, 0)  # inner values for European call option
    Qu = zeros((M + 1, M + 1), dtype=np.float64)
    Qu[:, :] = q  # upwards martingale probabilities
    Qd = 1 - Qu  # downwards martingale probabilities
    z = 0
    for t in range(M - 1, -1, -1):  # backwards iteration
        V[0:M - z, t] = (Qu[0:M - z, t] * V[0:M - z, t + 1]
                         + Qd[0:M - z, t] * V[1:M - z + 1, t + 1]) * df
        z += 1
    return V[0, 0]
The NumPy version is approximately as fast as the Numba version.
%timeit C = oVal_np()
1 loops, best of 3: 349 ms per loop
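As a cross-check (a sketch added here, not part of the original talk), the binomial value for M = 1000 steps should be close to the Black-Scholes-Merton price of the same European call.
from math import log, sqrt, exp
from scipy.stats import norm
def bsmCall(S0, K, T, r, sigma):
    ''' Black-Scholes-Merton value of a European call (benchmark sketch). '''
    d1 = (log(S0 / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm.cdf(d1) - K * exp(-r * T) * norm.cdf(d2)
print "BSM benchmark %10.4f" % bsmCall(S0, K, T, r, vola)
print "Binomial tree %10.4f" % oVal_np()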
This example illustrates how to implement a parallel valuation of American options by Monte Carlo simulation. The algorithm used is the Least-Squares Monte Carlo algorithm as proposed in Longstaff-Schwartz (2001): "Valuing American Options by Simulation: A Simple Least-Squares Approach." Review of Financial Studies, Vol. 14, 113-147. We observe speed-ups that are almost linear in the number of cores.
The following is an implementation of the Least-Squares Monte Carlo algorithm (LSM) of Longstaff-Schwartz (2001).
def optionValue(S0, vol, T, K=40, M=50, I=4096, r=0.06):
    import numpy as np
    np.random.seed(150000)  # fix the seed for every valuation
    dt = T / M  # time interval
    df = np.exp(-r * dt)  # discount factor per time interval
    # Simulation of Index Levels
    S = np.zeros((M + 1, I), dtype=np.float64)  # stock price matrix
    S[0, :] = S0  # initial values for stock price
    for t in range(1, M + 1):
        ran = np.random.standard_normal(I / 2)
        ran = np.concatenate((ran, -ran))  # antithetic variates
        ran = ran - np.mean(ran)  # correct first moment
        ran = ran / np.std(ran)  # correct second moment
        S[t, :] = S[t - 1, :] * np.exp((r - vol ** 2 / 2) * dt
                                       + vol * ran * np.sqrt(dt))
    h = np.maximum(K - S, 0)  # inner values for put option
    V = np.zeros_like(h)  # value matrix
    V[-1] = h[-1]
    # Valuation by LSM
    for t in range(M - 1, 0, -1):
        rg = np.polyfit(S[t, :], V[t + 1, :] * df, 5)  # regression
        C = np.polyval(rg, S[t, :])  # evaluation of regression
        V[t, :] = np.where(h[t, :] > C, h[t, :],
                           V[t + 1, :] * df)  # exercise decision/optimization
    V0 = np.sum(V[1, :] * df) / I  # LSM estimator
    print "S0 %4.1f|vol %4.2f|T %2.1f| Option Value %8.3f" % (S0, vol, T, V0)
    return V0
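As a quick usage example (my addition), a single valuation with the fixed seed reproduces the first table entry shown later.
V0 = optionValue(36., 0.2, 1.0)
# prints: S0 36.0|vol 0.20|T 1.0| Option Value 4.500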
We want to replicate the whole of Table 1 of the seminal paper.
def seqValue():
    optionValues = []
    for S0 in (36., 38., 40., 42., 44.):  # initial stock price values
        for vol in (0.2, 0.4):  # volatility values
            for T in (1.0, 2.0):  # times-to-maturity
                optionValues.append(optionValue(S0, vol, T))
    return optionValues
Now, we measure the time for the 20 different American put options from that Table 1 with sequential execution.
from time import time
t0 = time()
optionValues = seqValue() # calculate all values
t1 = time(); d1 = t1 - t0
print "Duration in Seconds %6.3f" % d1
S0 36.0|vol 0.20|T 1.0| Option Value 4.500
S0 36.0|vol 0.20|T 2.0| Option Value 4.854
S0 36.0|vol 0.40|T 1.0| Option Value 7.112
...
First, start a local cluster, if you have multiple cores in your machine.
# in the shell ...
####
# ipcluster start -n 2
####
# or maybe more cores
Enable parallel computing capabilities.
from IPython.parallel import Client
cluster_profile = "default"
# the profile is set to the default one
# see IPython docs for further details
c = Client(profile=cluster_profile)
view = c.load_balanced_view()
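Before submitting work, a quick check (my addition) that the engines are actually up:
print "Engines available: %d" % len(c.ids)  # e.g. 2 with the cluster above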
Again, a loop over the 20 options of Table 1. This time, the valuations are asynchronously distributed to the 2 cores.
def parValue():
    optionValues = []
    for S in (36., 38., 40., 42., 44.):
        for vol in (0.2, 0.4):
            for T in (1.0, 2.0):
                value = view.apply_async(optionValue, S, vol, T)
                # asynchronously calculate the option values
                optionValues.append(value)
    return optionValues
Now, we measure the time needed with parallel execution.
def execution():
    optionValues = parValue()  # calculate all values
    print "Submitted tasks %d" % len(optionValues)
    c.wait(optionValues)
    # wait for all tasks to be finished
    return optionValues
t0 = time()
optionValues = execution()
t1 = time(); d2 = t1 - t0
print "Duration in Seconds %6.3f" % d2
Submitted tasks 20
Duration in Seconds  2.712
d1 / d2 # speed-up of parallel execution
1.6769731386025957
You can inspect the metadata and access single data fields from the results dictionary.
optionValues[0].metadata
{'after': [],
 'completed': datetime.datetime(2013, 8, 27, 18, 2, 17, 793919),
 'data': {},
 'engine_id': 0,
 'engine_uuid': u'597b14f1-ecfb-4a50-a35d-aa44897997b9',
 'follow': [],
 'msg_id': u'089767bb-c1f6-4a22-9daa-8247ef700f74',
 'outputs': [],
 'outputs_ready': True,
 'pyerr': None,
 'pyin': None,
 'pyout': None,
 'received': datetime.datetime(2013, 8, 27, 18, 2, 17, 798400),
 'started': datetime.datetime(2013, 8, 27, 18, 2, 17, 511909),
 'status': u'ok',
 'stderr': '',
 'stdout': u'S0 36.0|vol 0.20|T 1.0| Option Value 4.500\n',
 'submitted': datetime.datetime(2013, 8, 27, 18, 2, 17, 506365)}
The whole output of the 20 valuations.
for result in optionValues:
    print result.metadata['stdout'],
S0 36.0|vol 0.20|T 1.0| Option Value 4.500
S0 36.0|vol 0.20|T 2.0| Option Value 4.854
S0 36.0|vol 0.40|T 1.0| Option Value 7.112
S0 36.0|vol 0.40|T 2.0| Option Value 8.472
S0 38.0|vol 0.20|T 1.0| Option Value 3.262
S0 38.0|vol 0.20|T 2.0| Option Value 3.723
S0 38.0|vol 0.40|T 1.0| Option Value 6.112
S0 38.0|vol 0.40|T 2.0| Option Value 7.682
S0 40.0|vol 0.20|T 1.0| Option Value 2.305
S0 40.0|vol 0.20|T 2.0| Option Value 2.861
S0 40.0|vol 0.40|T 1.0| Option Value 5.258
S0 40.0|vol 0.40|T 2.0| Option Value 6.899
S0 42.0|vol 0.20|T 1.0| Option Value 1.569
S0 42.0|vol 0.20|T 2.0| Option Value 2.164
S0 42.0|vol 0.40|T 1.0| Option Value 4.544
S0 42.0|vol 0.40|T 2.0| Option Value 6.171
S0 44.0|vol 0.20|T 1.0| Option Value 1.020
S0 44.0|vol 0.20|T 2.0| Option Value 1.602
S0 44.0|vol 0.40|T 1.0| Option Value 3.912
S0 44.0|vol 0.40|T 2.0| Option Value 5.591
The following example is taken from Eurex's VSTOXX Advanced Services initiative, which uses Python to illustrate volatility concepts and volatility product features, e.g. of VSTOXX futures and VSTOXX options. The complete tutorials and Python scripts are available under
See also the interview with me about this Python-based initiative:
In the model of Andreas Gruenbichler and Francis Longstaff (1996): "Valuing Futures and Options on Volatility." Journal of Banking and Finance, Vol. 20, 985 – 1001, (implied) volatility is assumed to follow a square-root diffusion process
\[ dV_t = \kappa (\theta - V_t) dt + \sigma \sqrt{V_t} dZ_t\]
where \(Z\) is a Brownian motion. GL96 derive a semi-analytical pricing formula for European calls if volatility follows this stochastic differential equation. The following two Python functions implement their formula.
First, a helper function for the complementary distribution function of the non-central chi-squared density.
def cx(K, gamma, nu, lambda_V):
    '''Complementary Distribution Function of Non-Central Chi-Squared Density
    K: strike price
    gamma: as defined in the GL96 model
    nu: degrees of freedom
    lambda_V: non-centrality parameter '''
    from scipy.stats import ncx2
    return 1 - ncx2.cdf(gamma * K, nu, lambda_V)
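A minimal sanity check (my addition): for a very high strike the complementary distribution function must tend to zero.
print cx(1000.0, 1.0, 4.0, 1.0)  # should be (very close to) 0.0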
Now the option pricing function for European calls on volatility.
def eurCallGL96(V0, kappa_V, theta_V, sigma_V, zeta_V, T, r, K):
    '''Calculation of European Call Option Price in GL96 Model
    V0: current volatility level
    kappa_V: mean-reversion factor
    theta_V: long-run mean of volatility
    sigma_V: volatility of volatility
    zeta_V: volatility risk premium
    T: time-to-maturity
    r: risk-free short rate
    K: strike price of the option '''
    from math import exp
    D = exp(-r * T)  # discount factor
    # variables
    alpha = kappa_V * theta_V
    beta = kappa_V + zeta_V
    gamma = 4 * beta / (sigma_V ** 2 * (1 - exp(-beta * T)))
    nu = 4 * alpha / sigma_V ** 2
    lambda_V = gamma * exp(-beta * T) * V0
    # formula for European call price
    C = D * exp(-beta * T) * V0 * cx(K, gamma, nu + 4, lambda_V) \
        + D * (alpha / beta) * (1 - exp(-beta * T)) \
        * cx(K, gamma, nu + 2, lambda_V) \
        - D * K * cx(K, gamma, nu, lambda_V)
    return C
A simple pricing example.
# Model Parameters
V0 = 21.39 # initial level of volatility index
kappa_V = 0.1 # speed of mean reversion
theta_V = 20.0 # long-term index level
sigma_V = 2.0 # volatility of volatility
zeta_V = 0.0 # factor of the expected volatility risk premium
r = 0.02 # risk-free interest rate
# Option Parameters
K = 22.0 # strike
T = 1.0 # time horizon
C = eurCallGL96(V0, kappa_V, theta_V, sigma_V, zeta_V, T, r, K)
print "Value of European Call according to GL96: %10.4f" % C
Value of European Call according to GL96: 3.0969
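As a small extension (my addition, reusing the model parameters above), valuing the call across several strikes illustrates that the value decreases in the strike, as expected.
for strike in (18.0, 20.0, 22.0, 24.0):
    print "K %4.1f | Value %8.4f" % (strike,
        eurCallGL96(V0, kappa_V, theta_V, sigma_V, zeta_V, T, r, strike))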
We first need to import some market data to calibrate the model to.
import pandas as pd
h5 = pd.HDFStore('VSTOXX_Opt_Fut.h5', 'r')
vso_mar = h5['vso_mar']
h5.close()
Some selected data from the DataFrame object.
vso_mar[['Strike Price', 'Last Price', 'Date', 'Time']].ix[10:20]
|    | Strike Price | Last Price | Date       | Time     |
|----|--------------|------------|------------|----------|
| 10 | 35.0         | 0.75       | 12/28/2012 | 18:33:48 |
| 11 | 32.5         | 0.90       | 12/28/2012 | 18:33:48 |
| 12 | 30.0         | 1.10       | 12/28/2012 | 18:33:48 |
| 13 | 29.0         | 1.20       | 12/28/2012 | 18:33:48 |
| 14 | 28.0         | 1.35       | 12/28/2012 | 18:33:48 |
| 15 | 27.0         | 1.45       | 12/28/2012 | 18:33:48 |
| 16 | 26.0         | 1.65       | 12/28/2012 | 18:33:48 |
| 17 | 25.0         | 1.85       | 12/28/2012 | 18:33:48 |
| 18 | 24.0         | 2.05       | 12/28/2012 | 18:33:48 |
| 19 | 23.0         | 2.35       | 12/28/2012 | 18:33:48 |
| 20 | 22.0         | 2.70       | 12/28/2012 | 18:33:48 |
We take a subset of the option data for the calibration and define some fixed parameters.
kL = np.array(vso_mar['Strike Price'][12:25])
vL = np.array(vso_mar['Last Price'][12:25])
# Fixed Parameters
days = 3 + 31 + 28 + 15 # time-to-maturity in days
T = days / 365. # time-to-maturity in years
r = 0.01 # risk-less short rate
V0 = 21.3534 # starting level of VSTOXX (28.12.2012)
zeta_V = 0. # volatility risk premium factor
Now, we define a function that returns an array of model option prices for the whole set of strikes.
def valFunc(p0):
    '''Valuation Function for Array of Strike Prices
    p0: set of parameters for calibration '''
    kappa_V, theta_V, sigma_V = p0
    GL96val = []
    for K in kL:
        GL96val.append(eurCallGL96(V0, kappa_V, theta_V,
                                   sigma_V, zeta_V, T, r, K))
    return np.array(GL96val)
Next, the error function which is to be minimized.
def errFunc(p0):
    '''Error Function for GL96 Calibration
    p0: set of parameters for calibration '''
    GL96val = valFunc(p0)
    kappa_V, theta_V, sigma_V = p0
    pen = 0.
    if 2 * kappa_V * theta_V < sigma_V ** 2:
        pen = 1000.0  # penalty if Feller condition is not met
    if kappa_V < 0 or theta_V < 0 or sigma_V < 0:
        pen = 1000.0  # penalty for negative parameters
    MSE = sum((GL96val - vL) ** 2) / len(vL) + pen
    # print p0.round(3), "%16.4f" % MSE
    return MSE
The calibration strategy rests on both global and local minimization of the error function. The result of the former is used as input for the latter.
import scipy.optimize as sop
%time p0 = sop.brute(errFunc, ((1.0, 15.1, 2.0), (10., 40.1, 2.5), (2.0, 18.1, 4.0)), finish=None)
CPU times: user 4.82 s, sys: 0 ns, total: 4.82 s
Wall time: 4.85 s
errFunc(p0).round(5)
0.02588
We take p0 as the starting value for the local minimization.
%time opt = sop.fmin(errFunc, p0, xtol=0.000001, ftol=0.000001, maxiter=1000, maxfun=1500)
Optimization terminated successfully.
         Current function value: 0.012411
         Iterations: 337
         Function evaluations: 589
CPU times: user 5.53 s, sys: 0 ns, total: 5.53 s
Wall time: 5.59 s
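To inspect the optimal parameter vector (my addition; opt contains kappa_V, theta_V and sigma_V in this order):
print "kappa_V %8.4f | theta_V %8.4f | sigma_V %8.4f" % tuple(opt)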
We want to inspect the quality of the calibration graphically.
def genPlot(opt):
    ''' Function to compare model values with market quotes.
    opt: (optimal) input parameter set. '''
    GL96val = valFunc(opt)
    figure()
    subplot(211)
    plot(kL, vL, label='Market Quotes')
    plot(kL, GL96val, 'ro', label='Model Prices')
    xlabel('Strike Price')
    ylabel('Option Value')
    grid(True)
    legend()
    axis([min(kL) - 0.5, max(kL) + 0.5, 0.0, max(vL) * 1.1])
    subplot(212)
    wi = 0.3
    bar(kL - wi / 2, GL96val - vL, width=wi)
    grid(True)
    ylabel('Difference')
    axis([min(kL) - 0.5, max(kL) + 0.5,
          min(GL96val - vL) * 1.1, max(GL96val - vL) * 1.1])
The calibration seems not too bad.
genPlot(opt)
It is a stylized fact that stock indexes and related volatility indexes are highly negatively correlated. The following example analyzes this stylized fact for the EURO STOXX 50 stock index and the VSTOXX volatility index using ordinary least-squares (OLS) regression.
First, we collect historical data for both the EURO STOXX 50 stock index and the VSTOXX volatility index.
import pandas as pd
from urllib import urlretrieve
es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
urlretrieve(es_url, 'es.txt')
urlretrieve(vs_url, 'vs.txt')
('vs.txt', <httplib.HTTPMessage instance at 0x7278200>)
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary.
lines = open('es.txt').readlines()
new_file = open('es50.txt', 'w')
# fix the header line and name the helper column caused by the trailing ';'
new_file.writelines('date' + lines[3][:-1].replace(' ', '') + ';DEL' + lines[3][-1])
new_file.writelines(lines[4:-1])
new_file.close()
print list(open('es50.txt'))[:5]
['date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n', '31.12.1986;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56\n', '01.01.1987;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56\n', '02.01.1987;770.89 ; 891.78 ; 82.57 ; 97.80 ; 97.43 ; 69.37 ; 647.62 ; 65.81\n', '05.01.1987;771.89 ; 898.33 ; 82.82 ; 98.60 ; 98.19 ; 69.16 ; 649.94 ; 65.82\n']
Now, the data can be safely read into a DataFrame object.
es = pd.read_csv('es50.txt', index_col=0, parse_dates=True, sep=';', dayfirst=True)
del es['DEL'] # delete the helper column
es
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6862 entries, 1986-12-31 00:00:00 to 2013-08-08 00:00:00
Data columns (total 8 columns):
SX5P    6862 non-null values
SX5E    6862 non-null values
SXXP    6862 non-null values
SXXE    6861 non-null values
SXXF    6861 non-null values
SXXA    6861 non-null values
DK5F    6861 non-null values
DKXF    6861 non-null values
dtypes: float64(8)
The VSTOXX data can be read without touching the raw data.
vs = pd.read_csv('vs.txt', index_col=0, header=2, parse_dates=True, sep=',', dayfirst=True)
# you can alternatively read from the Web source without saving the csv file to disk:
# vs = pd.read_csv(vs_url, index_col=0, header=2, parse_dates=True, sep=',', dayfirst=True)
We now merge the data for further analysis.
from datetime import datetime
data1 = pd.DataFrame({'EUROSTOXX' : es['SX5E'][es.index > datetime(1999, 12, 31)]})
data2 = pd.DataFrame({'VSTOXX' : vs['V2TX'][vs.index > datetime(1999, 12, 31)]})
data = data1.join(data2)
data
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3487 entries, 2000-01-03 00:00:00 to 2013-08-08 00:00:00
Data columns (total 2 columns):
EUROSTOXX    3487 non-null values
VSTOXX       3466 non-null values
dtypes: float64(2)
Let's inspect the two time series.
data.head()
| date       | EUROSTOXX | VSTOXX |
|------------|-----------|--------|
| 2000-01-03 | 4849.22   | 30.98  |
| 2000-01-04 | 4657.83   | 33.22  |
| 2000-01-05 | 4541.75   | 32.59  |
| 2000-01-06 | 4500.69   | 31.18  |
| 2000-01-07 | 4648.27   | 27.44  |
A picture can tell the complete story.
data.plot(subplots=True, grid=True, style='b')
array([<matplotlib.axes.AxesSubplot object at 0x4f30990>, <matplotlib.axes.AxesSubplot object at 0x4f0dc10>], dtype=object)
We now generate log returns for both time series.
rets = np.log(data / data.shift(1))
rets.head()
| date       | EUROSTOXX | VSTOXX    |
|------------|-----------|-----------|
| 2000-01-03 | NaN       | NaN       |
| 2000-01-04 | -0.040268 | 0.069810  |
| 2000-01-05 | -0.025237 | -0.019147 |
| 2000-01-06 | -0.009082 | -0.044229 |
| 2000-01-07 | 0.032264  | -0.127775 |
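Before running the regression, a direct look at the correlation of the log returns already hints at the stylized fact (my addition; the off-diagonal entry should be strongly negative).
print rets.corr()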
To this new data set, also stored in a DataFrame object, we apply OLS.
# window=1 and shift(0) leave the return series unchanged;
# adjust these to experiment with smoothing and lags
xdat = pd.rolling_mean(rets['EUROSTOXX'], window=1).shift(0)
ydat = pd.rolling_mean(rets['VSTOXX'], window=1).shift(0)
model = pd.ols(y=ydat, x=xdat)
model
-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         3444
Number of Degrees of Freedom:   2

R-squared:         0.5669
Adj R-squared:     0.5668

Rmse:              0.0371

F-stat (1, 3442):  4505.5960, p-value:     0.0000

Degrees of Freedom: model 1, resid 3442

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -2.6942     0.0401     -67.12     0.0000    -2.7729    -2.6155
     intercept    -0.0007     0.0006      -1.16     0.2459    -0.0020     0.0005
---------------------------------End of Summary---------------------------------
Some selected results and again a picture to illustrate the stylized fact graphically – the results obviously support it.
print "beta %8.4f" % model.beta[0]
print "intc %8.4f" % model.beta[1]
print "r**2 %8.4f" % model.r2
beta  -2.6942
intc  -0.0007
r**2   0.5669
Again, we want to see how our results look graphically.
plot(xdat, ydat, 'r.')
ax = axis() # grab axis values
x = linspace(ax[0], ax[1] + 0.01)
plot(x, model.beta[1] + model.beta[0] * x, 'b', lw=2)
grid(True)
axis('tight')
(-0.10000000000000001, 0.16, -0.43180704799170044, 0.43706494329021089)
Using pandas, the code that follows reads intraday, high-frequency data from a Web source, plots it and resamples it.
date = datetime.now()
url = 'http://hopey.netfonds.no/posdump.php?date=%d%02d%02d&paper=AAPL.O&csv_format=csv' \
      % (date.year, date.month, date.day - 1)  # zero-padded YYYYMMDD
# you may have to adjust the date since only recent dates are available
AAPL = pd.read_csv(url, index_col=0, header=0, parse_dates=True)
AAPL
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3681 entries, 2013-08-26 10:00:02 to 2013-08-26 22:18:35
Data columns (total 6 columns):
bid                  3681 non-null values
bid_depth            3681 non-null values
bid_depth_total      3681 non-null values
offer                3681 non-null values
offer_depth          3681 non-null values
offer_depth_total    3681 non-null values
dtypes: float64(2), int64(4)
The intraday evolution of the Apple stock price.
AAPL['bid'].plot()
<matplotlib.axes.AxesSubplot at 0x4f39fd0>
A resampling of the data is easily accomplished with pandas.
# this resamples the record frequency to 5 minutes, using mean as aggregation rule
AAPL_5min = AAPL.resample(rule='5min', how='mean')
AAPL_5min.head()
| time                | bid        | bid_depth | bid_depth_total | offer      | offer_depth | offer_depth_total |
|---------------------|------------|-----------|-----------------|------------|-------------|-------------------|
| 2013-08-26 10:00:00 | 502.000000 | 400       | 400             | 503.700000 | 175.000000  | 175.000000        |
| 2013-08-26 10:05:00 | 502.233333 | 400       | 400             | 503.206667 | 233.333333  | 233.333333        |
| 2013-08-26 10:10:00 | 502.500000 | 200       | 200             | 503.620000 | 500.000000  | 500.000000        |
| 2013-08-26 10:15:00 | 502.715000 | 200       | 200             | 503.620000 | 200.000000  | 200.000000        |
| 2013-08-26 10:20:00 | 502.930000 | 200       | 200             | 503.510000 | 100.000000  | 100.000000        |
Let's have a look at the new data set.
AAPL_5min['bid'].plot()
<matplotlib.axes.AxesSubplot at 0x384b450>
Obviously, there is no trading data for every interval.
pandas has a number of approaches to fill in missing data.
AAPL_5min['bid'].fillna(method='ffill').plot()
<matplotlib.axes.AxesSubplot at 0x3a48250>
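Forward filling is only one of those approaches; a backward fill works analogously (a sketch of an alternative, my addition).
AAPL_5min['bid'].fillna(method='bfill').plot()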
10 years ago, Python was considered exotic in Finance – at best. Today, Python has become a major force in Finance due to a number of characteristics:
Continuum Analytics, Inc. – the company Web site
Dr. Yves J. Hilpisch – my personal Web site
Derivatives Analytics with Python – my new book
Read an Excerpt and Order the Book
Contact Us
Continuum Analytics is about Python Data Exploration & Visualization.
The team comprises, among others:
Continuum provides the Python community with a number of open source libraries and free solutions to build Python-based data analytics infrastructures:
In June 2013, Continuum Analytics opened its first office and a separate business unit in Europe. We are actively recruiting and are looking for Python enthusiasts with diverse backgrounds.
# Autosave Function for IPython Notebook
def autosave(interval=3):
    """Autosave the notebook every interval (in minutes)"""
    from IPython.core.display import Javascript
    interval *= 60 * 1000  # JS wants intervals in milliseconds
    tpl = 'setInterval ( "IPython.notebook.save_notebook()", %i );'
    return Javascript(tpl % interval)
autosave()
<IPython.core.display.Javascript at 0x39a7350>