The Python Quants



Efficient In-Memory Data Analytics with Python and pandas

Dr. Yves J. Hilpisch

The Python Quants GmbH

www.pythonquants.com

yves@pythonquants.com

@dyjh

CodeJam, Forschungszentrum Jülich – January 29, 2014

About Me

A brief bio:

  • Managing Partner of The Python Quants
  • Founder of Visixion GmbH – The Python Quants
  • Lecturer in Mathematical Finance at Saarland University
  • Focus on Financial Industry and Financial Analytics
  • Book (2013) "Derivatives Analytics with Python"
  • Book (July 2014) "Python for Finance", O'Reilly
  • Dr.rer.pol. in Mathematical Finance
  • Graduate in Business Administration
  • Martial Arts Practitioner and Fan

See www.hilpisch.com

Python for Analytics

Corporations, decision makers and analysts nowadays generally face a number of problems with data:

  • sources: data typically comes from different sources, like from the Web, from in-house databases or it is generated in-memory, e.g. for simulation purposes
  • formats: data is generally available in different formats, like SQL databases/tables, Excel files, CSV files, arrays, proprietary formats
  • structure: data typically comes differently structured, be it unstructured, simply indexed, hierarchically indexed, in table form, in matrix form, in multidimensional arrays
  • completeness: real-world data is generally incomplete, i.e. there is missing data (e.g. along an index) or multiple series of data cannot be aligned correctly (e.g. two time series with different time indexes)
  • conventions: for some types of data there are many “competing” conventions with regard to formatting, e.g. for dates and times
  • interpretation: some data sets contain information that can be easily and intelligently interpreted, like a time index, while others, like plain text, cannot
  • performance: reading, streamlining, aligning, analyzing – i.e. processing – (big) data sets might be slow

In addition to these data-oriented problems, there typically are organizational issues that have to be considered:

  • departments: most companies are organized in departments with different technologies, databases, etc., leading to “data silos”
  • analytics skills: analytical and business skills in general are possessed by people working in line functions (e.g. production) or administrative functions (e.g. finance)
  • technical skills: technical skills, like retrieving data from databases and visualizing them, are generally possessed by people in technology functions (e.g. development, systems operations)

This presentation focuses on

  • Python & pandas as a general purpose data analytics environment
  • interactive analytics examples
  • prototyping-like Python usage

It does not address such important issues like

  • architectural issues regarding hardware and software
  • development processes, testing, documentation and production
  • real world problem modeling

Some Python Fundamentals

Fundamental Python Libraries

A fundamental Python stack for interactive data analytics and visualization should at least contain the following libraries and tools:

  • Python – the Python interpreter itself
  • NumPy – high performance, flexible array structures and operations
  • SciPy – collection of scientific modules and functions (e.g. for regression, optimization, integration)
  • pandas – time series and panel data analysis and I/O
  • PyTables – hierarchical, high performance database (e.g. for out-of-memory analytics)
  • matplotlib – 2d and 3d visualization
  • IPython – interactive data analytics, visualization, publishing

It is best to use either the Python distribution Anaconda or the Web-based analytics environment Wakari. Both provide almost "complete" Python environments.
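
As a quick check that a given environment indeed provides this stack, one can import the libraries and print their version numbers. This is only a minimal sketch; the reported versions will of course differ from installation to installation.

import numpy as np
import scipy
import pandas as pd
import tables  # PyTables
import matplotlib
import IPython
for lib in (np, scipy, pd, tables, matplotlib, IPython):
    print lib.__name__, lib.__version__  # library name and installed version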

For example, pandas can, among other things, help with the following data-related problems (a brief usage sketch follows the list):

  • sources: pandas reads data directly from different data sources such as SQL databases, Excel files or JSON based APIs
  • formats: pandas can process input data in different formats like CSV files or Excel files; it can also generate output in different formats like CSV, XLS, HTML or JSON
  • structure: pandas' strength lies in structured data formats, like time series and panel data
  • completeness: pandas automatically deals with missing data in most circumstances, e.g. computing sums even if some of the values are missing (“not a number”, i.e. NaN)
  • conventions/interpretation: for example, pandas can interpret and convert different date-time formats to Python datetime objects and/or timestamps
  • performance: the majority of pandas classes, methods and functions are implemented in a performance-oriented fashion, making heavy use of Cython, which compiles Python-like code to C
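
The following is a minimal sketch of several of these points in combination; the file name and column names are hypothetical:

import pandas as pd
# hypothetical CSV file with a 'date' column and some missing values
df = pd.read_csv('sample.csv', parse_dates=['date'], index_col='date')
    # date strings are interpreted and converted to a DatetimeIndex
df.sum()  # missing (NaN) values are skipped by default
df.to_csv('sample_out.csv')  # export again, e.g. as CSV ...
html = df.to_html()  # ... or generate an HTML table representation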

First Analytics Steps with Python

As a simple example let's generate a NumPy array with five sets of 1000 (pseudo-)random numbers each.

In [1]:
import numpy as np  # this imports the NumPy library
In [2]:
data = np.random.standard_normal((5, 1000))  # generate 5 sets with 1,000 random numbers each
data[:, :5].round(3)  # print first five values of each set rounded to 3 digits
Out[2]:
array([[ 0.785, -0.544,  0.437,  0.793, -1.705],
       [-0.476, -0.115, -0.95 , -0.8  ,  0.172],
       [-0.349, -1.35 ,  0.188, -0.521, -1.299],
       [ 0.392,  0.062,  0.755, -1.712,  0.966],
       [-0.252, -0.504,  1.731, -0.559,  1.605]])

Let's plot a histogram of the 1st, 2nd and 3rd data sets.

In [3]:
import matplotlib.pyplot as plt  # this imports matplotlib.pyplot
In [4]:
plt.hist([data[0], data[1], data[2]], label=['Set 0', 'Set 1', 'Set 2'])
plt.grid(True)  # grid for better readability
plt.legend()
Out[4]:
<matplotlib.legend.Legend at 0x105cc5910>

We then want to plot the 'running' cumulative sum of each set.

In [5]:
plt.figure()  # initialize figure object
plt.grid(True) 
for i, data_set in enumerate(data):  # iterate over all rows
    plt.plot(data_set.cumsum(), label='Set %s' % i)
        # plot the running cumulative sum for each row
plt.legend(loc=0)  # write legend with labels
Out[5]:
<matplotlib.legend.Legend at 0x105f2ffd0>

Some fundamental statistics from our data sets.

In [6]:
data.mean(axis=1)  # average value of the 5 sets
Out[6]:
array([-0.05009487, -0.02224559,  0.01991948,  0.01816627,  0.01484693])
In [7]:
data.std(axis=1)  # standard deviation of the 5 sets
Out[7]:
array([ 0.97038583,  1.00002371,  0.96460241,  1.01051799,  0.99002334])
In [8]:
np.round(np.corrcoef(data), 2)  # correlation matrix of the 5 data sets
Out[8]:
array([[ 1.  , -0.01,  0.03,  0.02, -0.05],
       [-0.01,  1.  , -0.02, -0.04,  0.08],
       [ 0.03, -0.02,  1.  ,  0.04, -0.03],
       [ 0.02, -0.04,  0.04,  1.  ,  0.04],
       [-0.05,  0.08, -0.03,  0.04,  1.  ]])

Data Analytics with Python & pandas

Creating a Database to Work With

Currently, and for the foreseeable future, most corporate data is stored in SQL databases. We take this as a starting point and generate a SQL table with dummy data.

In [9]:
filename = 'numbers'
import sqlite3 as sq3  # imports the SQLite library
# Remove files if they exist
import os
try:
    os.remove(filename + '.db')
    os.remove(filename + '.h5')
    os.remove(filename + '.csv')
    os.remove(filename + '.xlsx')
except OSError:
    pass

We create a table and populate the table with the dummy data.

In [10]:
query = 'CREATE TABLE numbs (no1 real, no2 real, no3 real, no4 real, no5 real)'
con = sq3.connect(filename + '.db')
con.execute(query)
con.commit()
In [11]:
# Writing Random Numbers to SQLite3 Database
rows = 500000  # 500,000 rows for the SQL table (ca. 25 MB in size)
nr = np.random.standard_normal((rows, 5))
%time con.executemany('INSERT INTO numbs VALUES(?, ?, ?, ?, ?)', nr)
con.commit()
CPU times: user 8.41 s, sys: 176 ms, total: 8.59 s
Wall time: 8.62 s

Reading Data into NumPy Arrays

We can now read the data into a NumPy array.

In [12]:
# Reading a couple of rows from SQLite3 database table
np.array(con.execute('SELECT * FROM numbs').fetchmany(3)).round(3)
Out[12]:
array([[ 2.322, -1.008,  0.967,  0.84 ,  0.565],
       [-0.708, -0.826, -0.042, -1.043,  2.123],
       [-1.141, -0.388, -1.418,  1.172,  0.093]])
In [13]:
# Reading the whole table into a NumPy array
%time result = np.array(con.execute('SELECT * FROM numbs').fetchall())
CPU times: user 4.85 s, sys: 155 ms, total: 5 s
Wall time: 5.06 s

In [14]:
result[:3].round(3)  # the same rows from the in-memory array
Out[14]:
array([[ 2.322, -1.008,  0.967,  0.84 ,  0.565],
       [-0.708, -0.826, -0.042, -1.043,  2.123],
       [-1.141, -0.388, -1.418,  1.172,  0.093]])
In [15]:
nr = 0.0; result = 0.0  # memory clean-up

Reading Data with pandas

pandas is optimized both for convenience and performance when it comes to analytics and I/O operations.

In [16]:
import pandas as pd
import pandas.io.sql as pds
In [17]:
%time data = pds.read_frame('SELECT * FROM numbs', con)
con.close()
CPU times: user 1.93 s, sys: 132 ms, total: 2.06 s
Wall time: 2.07 s

In [18]:
data.head()
Out[18]:
no1 no2 no3 no4 no5
0 2.321564 -1.007644 0.966935 0.839818 0.565468
1 -0.707621 -0.825653 -0.042492 -1.042881 2.122805
2 -1.141395 -0.388145 -1.418309 1.172099 0.093294
3 -1.656099 0.111377 -1.839475 -0.365148 0.162408
4 0.112878 -1.655273 -0.508052 -0.287389 -0.559542

5 rows × 5 columns

Plotting larger data sets often is a problem. Here, this is easily handled by plotting only every 500th data row, for instance.

In [19]:
data[::500].cumsum().plot(grid=True, legend=True)  # every 500th row is plotted
Out[19]:
<matplotlib.axes.AxesSubplot at 0x108758450>

Writing Data with pandas

There are some reasons why you may want to write the DataFrame to disk:

  • Database Performance: you do not want to query the SQL database every time anew
  • IO Speed: you want to speed up data/information retrieval
  • Data Structure: maybe you want to store additional data with the original data set or you have generated a DataFrame out of multiple sources

pandas is tightly integrated with PyTables, a library that implements the HDF5 standard for Python. It makes data storage and I/O operations not only convenient but quite fast as well.

In [20]:
h5 = pd.HDFStore(filename + '.h5', 'w')
%time h5['data'] = data  # writes the DataFrame to HDF5 database
CPU times: user 28.9 ms, sys: 35.9 ms, total: 64.8 ms
Wall time: 75.6 ms

Now, we can read the data from the HDF5 file into another DataFrame object. Performance of such an operation generally is quite high.

In [21]:
%time data_new = h5['data']
CPU times: user 3.29 ms, sys: 12.6 ms, total: 15.9 ms
Wall time: 14.9 ms

In [22]:
%time data.sum() - data_new.sum()   # checking if the DataFrames are the same
CPU times: user 73.6 ms, sys: 13.2 ms, total: 86.8 ms
Wall time: 83.6 ms

Out[22]:
no1    0
no2    0
no3    0
no4    0
no5    0
dtype: float64
In [23]:
%time np.allclose(data, data_new, rtol=0.1)
CPU times: user 85.5 ms, sys: 28.6 ms, total: 114 ms
Wall time: 113 ms

Out[23]:
True
In [24]:
data_new = 0.0  # memory clean-up
h5.close()

In-Memory Analytics with pandas

pandas has powerful selection and querying mechanisms (I).

In [25]:
res = data[['no1', 'no2']][((data['no2'] > 0.5) | (data['no2'] < -0.5))]
In [26]:
plt.plot(res.no1, res.no2, 'ro')
plt.grid(True); plt.axis('tight')
Out[26]:
(-4.5855019802135892,
 4.2902614658186531,
 -4.7115453577460711,
 4.5006018407498072)

pandas has powerful selection and querying mechanisms (II).

In [27]:
res = data[['no1', 'no2']][((data['no1'] > 0.5) | (data['no1'] < -0.5))
                     & ((data['no2'] < -1) | (data['no2'] > 1))]
In [28]:
plt.plot(res.no1, res.no2, 'ro')
plt.grid(True); plt.axis('tight')
Out[28]:
(-4.360512668462273,
 4.2902614658186531,
 -4.7115453577460711,
 4.5006018407498072)

pandas has powerful selection and querying mechanisms (III).

In [29]:
res = data[['no1', 'no2', 'no3' ]][((data['no1'] > 1.5) | (data['no1'] < -1.5))
                     & ((data['no2'] < -1) | (data['no2'] > 1))
                     & ((data['no3'] < -1) | (data['no3'] > 1))]
In [30]:
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(8, 5))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(res.no1, res.no2, res.no3, zdir='z', s=25, c='r', marker='o')
ax.set_xlabel('no1'); ax.set_ylabel('no2'); ax.set_zlabel('no3')
Out[30]:
<matplotlib.text.Text at 0x10616fa10>

Exporting Data with pandas

The data is easily exported to different standard data file formats like comma separated value files (CSV).

In [31]:
%time data.to_csv(filename + '.csv')  # writes the whole data set as CSV file
CPU times: user 5.04 s, sys: 176 ms, total: 5.21 s
Wall time: 5.25 s

In [32]:
csv_file = open(filename + '.csv')
for i in range(5):
    print csv_file.readline(),
,no1,no2,no3,no4,no5
0,2.321564183657276,-1.0076441547182922,0.9669345116776533,0.8398179850987361,0.5654678971576657
1,-0.7076206524338957,-0.8256527633342371,-0.042492017121949534,-1.042880737514809,2.12280463091422
2,-1.1413946947894245,-0.3881445591466267,-1.4183091603968363,1.1720993048812203,0.09329359013480651
3,-1.6560985353714768,0.1113772366765862,-1.8394747939237726,-0.36514754370691044,0.16240810586181725

pandas also interacts easily with Microsoft Excel spreadsheet files (XLS/XLSX).

In [33]:
%time data.ix[:10000].to_excel(filename + '.xlsx')  # first 10,000 rows written to an Excel file
CPU times: user 14.3 s, sys: 295 ms, total: 14.6 s
Wall time: 14.5 s

With pandas you can, in a single step, read data from an Excel file and, for example, plot it.

In [34]:
%time pd.read_excel(filename + '.xlsx', 'Sheet1').hist(grid=True)
CPU times: user 1.17 s, sys: 14.5 ms, total: 1.18 s
Wall time: 1.19 s

Out[34]:
array([[<matplotlib.axes.AxesSubplot object at 0x10e42e190>,
        <matplotlib.axes.AxesSubplot object at 0x10dae7fd0>,
        <matplotlib.axes.AxesSubplot object at 0x10c17cf50>],
       [<matplotlib.axes.AxesSubplot object at 0x10f801b50>,
        <matplotlib.axes.AxesSubplot object at 0x10f7fae90>,
        <matplotlib.axes.AxesSubplot object at 0x110d6bdd0>]], dtype=object)

One can also generate HTML code, e.g. for automatic Web-based report generation.

In [35]:
html_code = data.ix[:2].to_html()
html_code
Out[35]:
u'<table border="1" class="dataframe">\n  <thead>\n    <tr style="text-align: right;">\n      <th></th>\n      <th>no1</th>\n      <th>no2</th>\n      <th>no3</th>\n      <th>no4</th>\n      <th>no5</th>\n    </tr>\n  </thead>\n  <tbody>\n    <tr>\n      <th>0</th>\n      <td> 2.321564</td>\n      <td>-1.007644</td>\n      <td> 0.966935</td>\n      <td> 0.839818</td>\n      <td> 0.565468</td>\n    </tr>\n    <tr>\n      <th>1</th>\n      <td>-0.707621</td>\n      <td>-0.825653</td>\n      <td>-0.042492</td>\n      <td>-1.042881</td>\n      <td> 2.122805</td>\n    </tr>\n    <tr>\n      <th>2</th>\n      <td>-1.141395</td>\n      <td>-0.388145</td>\n      <td>-1.418309</td>\n      <td> 1.172099</td>\n      <td> 0.093294</td>\n    </tr>\n  </tbody>\n</table>'
In [36]:
from IPython.core.display import HTML
In [37]:
HTML(html_code)
Out[37]:
no1 no2 no3 no4 no5
0 2.321564 -1.007644 0.966935 0.839818 0.565468
1 -0.707621 -0.825653 -0.042492 -1.042881 2.122805
2 -1.141395 -0.388145 -1.418309 1.172099 0.093294

Some final checks and data cleaning.

In [38]:
ll numb*
-rw-rw-rw-  1 yhilpisch  staff  52468263 29 Jan 12:15 numbers.csv
-rw-r--r--  1 yhilpisch  staff  27224064 29 Jan 12:15 numbers.db
-rw-rw-rw-  1 yhilpisch  staff  24007368 29 Jan 12:15 numbers.h5
-rw-rw-rw-  1 yhilpisch  staff    766583 29 Jan 12:15 numbers.xlsx

In [39]:
data = 0.0  # memory clean-up
In [40]:
try:
    os.remove(filename + '.db')
    os.remove(filename + '.h5')
    os.remove(filename + '.csv')
    os.remove(filename + '.xlsx')
except OSError:
    pass

N-Body Simulation

The simulation of the movement of \(N\) bodies in relation to each other, where \(N\) is a large number and a body can be a star, a planet or a small particle, is an important problem in physics.

The basic simulation algorithm involves a number of iterations ("loops") since one has to determine the force that every single body exerts on all the other bodies in the system to be simulated (e.g. a galaxy or a system of atomic particles).
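
In the simplified form implemented below (masses and physical constants normalized to one), each time step of length \(\Delta t\) updates the force on, the velocity of and the position of every body \(i\) roughly as follows:

\[ F_i = \sum_{j \neq i} \frac{r_j - r_i}{\lvert r_j - r_i \rvert^{2} + \lvert r_j - r_i \rvert}, \qquad v_i \leftarrow v_i + \Delta t \, F_i, \qquad r_i \leftarrow r_i + \Delta t \, v_i \]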

The algorithm lends itself pretty well to parallelization. However, the example code that follows does not use any parallelization techniques. It rather illustrates what speed-ups are achievable by

  • using vectorization approaches with NumPy and
  • just-in-time compiling capabilities of Numba

A description and discussion of the algorithm is found in the recent article Vladimirov, Andrey and Vadim Karpusenko (2013): "Test-Driving Intel Xeon-Phi Coprocessors with a Basic N-Body Simulation." White Paper, Colfax International.

Pure Python

We first use pure Python to implement the algorithm. We start by generating random vectors of particle locations and a zero vector for the speed of each particle.

In [41]:
import random
In [42]:
nParticles = 250
nSteps = 15
In [43]:
p_py = []
for i in range(nParticles):
    p_py.append([random.gauss(0.0, 1.0) for j in range(3)])
In [44]:
pv_py = [[0, 0, 0] for x in p_py]

Next, we define the simulation function for the movements of each particle given their gravitational interaction.

In [45]:
from math import sqrt  # sqrt is needed in the distance calculation below

def nbody_py(particle, particlev):  # lists as input
    dt = 0.01
    for step in range(1, nSteps + 1, 1):
        for i in range(nParticles):
            Fx = 0.0; Fy = 0.0; Fz = 0.0
            for j in range(nParticles):
                if j != i:
                    dx = particle[j][0] - particle[i][0]
                    dy = particle[j][1] - particle[i][1]
                    dz = particle[j][2] - particle[i][2]
                    drSquared = dx * dx + dy * dy + dz * dz
                    drPowerN32 = 1.0 / (drSquared + sqrt(drSquared))
                    Fx += dx * drPowerN32
                    Fy += dy * drPowerN32
                    Fz += dz * drPowerN32
                particlev[i][0] += dt * Fx
                particlev[i][1] += dt * Fy
                particlev[i][2] += dt * Fz
        for i in range(nParticles):
            particle[i][0] += particlev[i][0] * dt
            particle[i][1] += particlev[i][1] * dt
            particle[i][2] += particlev[i][2] * dt
    return particle, particlev

We then execute the function to measure execution speed.

In [46]:
%time p_py, pv_py = nbody_py(p_py, pv_py)
CPU times: user 13.1 s, sys: 33.9 ms, total: 13.1 s
Wall time: 13.1 s

NumPy Solution

The basic algorithm allows for a number of vectorizations by using NumPy.

In [47]:
import numpy as np
In [48]:
p_np = np.random.standard_normal((nParticles, 3))
pv_np = np.zeros_like(p_np)

Using NumPy arrays and matplotlib's 3d plotting capabilities, we can easily visualize the initial positions of the particles in space.

In [49]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(p_np[:, 0], p_np[:, 1], p_np[:, 2],
           c='r', marker='.')
Out[49]:
<mpl_toolkits.mplot3d.art3d.Patch3DCollection at 0x10b360510>

The respective NumPy simulation function is considerably more compact.

In [50]:
def nbody_np(particle, particlev):  # NumPy arrays as input
    dt = 0.01
    for step in range(1, nSteps + 1, 1):
        Fp = np.zeros((nParticles, 3))
        for i in range(nParticles):
            dp = particle - particle[i]
            drSquared = np.sum(dp ** 2, axis=1)
            h = drSquared + np.sqrt(drSquared)
            drPowerN32 = 1. / np.maximum(h, 1E-10)
            Fp += -(dp.T * drPowerN32).T
            particlev += dt * Fp
        particle += particlev * dt
    return particle, particlev

It is also considerably faster.

In [51]:
%time p_np, pv_np = nbody_np(p_np, pv_np)
CPU times: user 204 ms, sys: 1.19 ms, total: 205 ms
Wall time: 205 ms

Let's check how the particles moved. The following is a snapshot of the final positions of all particles.

In [52]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(p_np[:, 0], p_np[:, 1], p_np[:, 2],
           c='r', marker='.')
Out[52]:
<mpl_toolkits.mplot3d.art3d.Patch3DCollection at 0x10b68e1d0>

Dynamically Compiled Version

For this case, we first implement a function that is a hybrid of the pure Python (loop-heavy) version and the NumPy array-based version.

In [53]:
def nbody_hy(particle, particlev):  # NumPy arrays as input
    dt = 0.01
    for step in range(1, nSteps + 1, 1):
        for i in range(nParticles):
            Fx = 0.0; Fy = 0.0; Fz = 0.0
            for j in range(nParticles):
                if j != i:
                    dx = particle[j,0] - particle[i,0]
                    dy = particle[j,1] - particle[i,1]
                    dz = particle[j,2] - particle[i,2]
                    drSquared = dx * dx + dy * dy + dz * dz
                    drPowerN32 = 1.0 / (drSquared + sqrt(drSquared))
                    Fx += dx * drPowerN32
                    Fy += dy * drPowerN32
                    Fz += dz * drPowerN32
                particlev[i, 0] += dt * Fx
                particlev[i, 1] += dt * Fy
                particlev[i, 2] += dt * Fz
        for i in range(nParticles):
            particle[i,0] += particlev[i,0] * dt
            particle[i,1] += particlev[i,1] * dt
            particle[i,2] += particlev[i,2] * dt
    return particle, particlev

This function is then compiled with Numba.

In [54]:
import numba as nb
In [55]:
nbody_nb = nb.autojit(nbody_hy)

This new, compiled version of the hybrid function is even faster.

In [56]:
p_nb = np.random.standard_normal((nParticles, 3))
pv_nb = np.zeros_like(p_nb)
In [62]:
%time p_nb, pv_nb = nbody_nb(p_nb, pv_nb)
CPU times: user 20.5 ms, sys: 33 µs, total: 20.5 ms
Wall time: 20.5 ms

Speed Comparisons

A helper function to compare the performances.

In [58]:
def perf_comp_data(func_list, data_list, rep=3, number=1):
    from timeit import repeat
    res_list = {}
    for i, name in enumerate(func_list):
        stmt = name + '(' + data_list[i] + ')'
        setup = "from __main__ import " + name + ', ' + data_list[i]
        results = repeat(stmt=stmt, setup=setup, repeat=rep, number=number)
        res_list[name] = sum(results) / rep
    res_sort = sorted(res_list.iteritems(), key=lambda (k, v): (v, k))
    for item in res_sort:
        rel = item[1] / res_sort[0][1]
        print 'function: ' + item[0] + ', av. time sec: %9.5f, ' % item[1] \
            + 'relative: %6.1f' % rel

The dynamically compiled version is by far the fastest.

In [59]:
func_list = ['nbody_py', 'nbody_np', 'nbody_nb', 'nbody_hy']
data_list = ['p_py, pv_py', 'p_np, pv_np', 'p_nb, pv_nb', 'p_nb, pv_nb']
In [60]:
perf_comp_data(func_list, data_list, rep=1)
function: nbody_nb, av. time sec:   0.01484, relative:    1.0
function: nbody_np, av. time sec:   0.17571, relative:   11.8
function: nbody_py, av. time sec:  12.86464, relative:  867.1
function: nbody_hy, av. time sec:  15.18388, relative: 1023.4

Why Python for Data Analytics & Visualization?

10 years ago, Python was considered exotic in the analytics space – at best. Languages/packages like R and Matlab dominated the scene. Today, Python has become a major force in data analytics & visualization due to a number of characteristics:

  • multi-purpose: prototyping, development, production, systems administration – Python is one for all
  • libraries: there is a library for almost any task or problem you face
  • efficiency: Python speeds up all IT development tasks for analytics applications and reduces maintenance costs
  • performance: Python has evolved from a scripting language to a 'meta' language with bridges to all high performance environments (e.g. LLVM, multi-core CPUs, GPUs, clusters)
  • interoperability: Python seamlessly integrates with almost any other language and technology
  • interactivity: Python allows domain experts to get closer to their business and financial data pools and to do real-time analytics
  • collaboration: solutions like Wakari with IPython Notebook allow the easy sharing of code, data, results, graphics, etc.

The Python Quants



The Python Quants GmbH – Python Data Exploration & Visualization

The Python Quants – the company Web site

www.pythonquants.com

Dr. Yves J. Hilpisch – my personal Web site

www.hilpisch.com

Derivatives Analytics with Python – my new book

Read an Excerpt and Order the Book

Contact Us

yves@pythonquants.com

@dyjh

In [61]:
# Autosave Function for IPython Notebook
def autosave(interval=3):
    """Autosave the notebook every interval (in minutes)"""
    from IPython.core.display import Javascript
    interval *= 60 * 1000  # JS wants intervals in milliseconds
    tpl = 'setInterval ( "IPython.notebook.save_notebook()", %i );'
    return Javascript(tpl % interval)
autosave()
Out[61]:
<IPython.core.display.Javascript at 0x108b4e5d0>