Dr. Yves J. Hilpisch
The Python Quants GmbH
BI Forum, Budapest – 06. November 2013
Corporations, decision makers and analysts nowadays generally face a number of problems with data:
In addition to these data-oriented problems, there typically are organizational issues that have to be considered:
This presentation focuses on
It does not address such important issues like
A fundamental Python stack for interactive data analytics and visualization should at least contain the following libraries tools:
It is best to use either the Python distribution Anaconda or the Web-based analytics environment Wakari. Both provide almost "complete" Python environments.
For example, pandas can, among others, help with the following data-related problems:
As a simple example let's generate a NumPy array with five sets of 1000 (pseudo-)random numbers each.
import numpy as np # this imports the NumPy library
data = np.random.standard_normal((5, 1000)) # generate 5 sets with 1000 rn each
data[:, :5].round(3) # print first five values of each set rounded to 3 digits
array([[ 0.321, -1.314, -0.593, -0.102, 1.658], [-1.482, 0.179, -0.682, -1.432, -0.41 ], [-0.138, 0.023, -0.363, 1.308, 0.297], [ 0.562, -0.749, -1.585, -0.208, -0.089], [ 1.425, -0.555, -0.573, 0.169, -0.078]])
Let's plot a histogram of the 1st, 2nd and 3rd data set.
import matplotlib.pyplot as plt # this imports matplotlib.pyplot
plt.hist([data[0], data[1], data[2]], label=['Set 0', 'Set 1', 'Set 2'])
plt.grid(True) # grid for better readability
plt.legend()
<matplotlib.legend.Legend at 0x105901590>
We then want to plot the 'running' cumulative sum of each set.
plt.figure() # initialize figure object
plt.grid(True)
for data_set in enumerate(data): # iterate over all rows
plt.plot(data_set[1].cumsum(), label='Set %s' % data_set[0])
# plot the running cumulative sums for each row
plt.legend(loc=0) # write legend with labels
<matplotlib.legend.Legend at 0x105d13a50>
Some fundamental statistics from our data sets.
data.mean(axis=1) # average value of the 5 sets
array([ 0.02138191, -0.01091389, 0.00409084, 0.006534 , 0.01840274])
data.std(axis=1) # standard deviation of the 5 sets
array([ 1.0369874 , 1.03725012, 0.99011055, 1.00291577, 0.97987308])
np.corrcoef(data) # correltion matrix of the 5 data sets
array([[ 1. , -0.01430132, -0.04065085, 0.03645453, 0.02389929], [-0.01430132, 1. , -0.0287767 , -0.02230724, 0.03217344], [-0.04065085, -0.0287767 , 1. , 0.00691503, -0.0021722 ], [ 0.03645453, -0.02230724, 0.00691503, 1. , 0.00393679], [ 0.02389929, 0.03217344, -0.0021722 , 0.00393679, 1. ]])
Currently, and in the foreseeable future as well, most corporate data is stored in SQL databases. We take this as a starting point and generate a SQL table with dummy data.
filename = 'numbers'
import sqlite3 as sq3 # imports the SQLite library
# Remove files if they exist
import os
try:
os.remove(filename + '.db')
os.remove(filename + '.h5')
os.remove(filename + '.csv')
os.remove(filename + '.xls')
except:
pass
We create a table and populate the table with the dummy data.
query = 'CREATE TABLE numbs (no1 real, no2 real, no3 real, no4 real, no5 real)'
con = sq3.connect(filename + '.db')
con.execute(query)
con.commit()
# Writing Random Numbers to SQLite3 Database
rows = 500000 # 500,000 rows for the SQL table (ca. 25 MB in size)
nr = np.random.standard_normal((rows, 5))
%time con.executemany('INSERT INTO numbs VALUES(?, ?, ?, ?, ?)', nr)
con.commit()
CPU times: user 8.83 s, sys: 177 ms, total: 9.01 s Wall time: 9.03 s
We can now read the data into a NumPy array.
# Reading a couple of rows from SQLite3 database table
array(con.execute('SELECT * FROM numbs').fetchmany(3)).round(3)
array([[ 1.678, -1.522, 0.455, 0.484, -1.186], [ 0.217, -2.242, -0.893, -0.357, -2.32 ], [ 1.241, -0.432, 0.731, 0.372, 1.517]])
# Reading the whole table into an NumPy array
%time result = array(con.execute('SELECT * FROM numbs').fetchall())
CPU times: user 4.71 s, sys: 134 ms, total: 4.84 s Wall time: 4.87 s
nr = 0.0; result = 0.0 # memory clean-up
pandas is optimized both for convenience and performance when it comes to analytics and I/O operations.
import pandas as pd
import pandas.io.sql as pds
%time data = pds.read_frame('SELECT * FROM numbs', con)
con.close()
CPU times: user 2.11 s, sys: 159 ms, total: 2.27 s Wall time: 2.28 s
data.head()
no1 | no2 | no3 | no4 | no5 | |
---|---|---|---|---|---|
0 | 1.677696 | -1.521913 | 0.455371 | 0.483874 | -1.185816 |
1 | 0.216890 | -2.242000 | -0.893104 | -0.357223 | -2.319643 |
2 | 1.240746 | -0.432435 | 0.730807 | 0.371782 | 1.516961 |
3 | 0.575273 | -1.219279 | -0.413639 | 0.433625 | 0.910346 |
4 | -1.913407 | -1.233322 | -0.063490 | -1.231014 | -0.571479 |
The plotting of larger data sets often is a problem. This can easily be accomplished here by only plotting every 500th data row, for instance.
%time data[::500].cumsum().plot(grid=True, legend=True) # every 500th row is plotted
CPU times: user 69.9 ms, sys: 1.61 ms, total: 71.6 ms Wall time: 71.2 ms
<matplotlib.axes.AxesSubplot at 0x107f1b050>
There are some reasons why you may want to write the DataFrame to disk:
pandas is tightly integrated with PyTables, a library that implements the HDF5 standard for Python. It makes data storage and I/O operations not only convenient but quite fast as well.
h5 = pd.HDFStore(filename + '.h5', 'w')
%time h5['data'] = data # writes the DataFrame to HDF5 database
CPU times: user 31.4 ms, sys: 35.4 ms, total: 66.8 ms Wall time: 84.9 ms
Now, we can read the data from the HDF5 file into another DataFrame object. Performance of such an operation generally is quite high.
%time data_new = h5['data']
CPU times: user 3.4 ms, sys: 12.3 ms, total: 15.7 ms Wall time: 15.4 ms
%time data.sum() - data_new.sum() # checking if the DataFrames are the same
CPU times: user 74.4 ms, sys: 15.3 ms, total: 89.7 ms Wall time: 83 ms
no1 0 no2 0 no3 0 no4 0 no5 0 dtype: float64
data_new = 0.0 # memory clean-up
The data is easily exported to different standard data file formats like comma separated value files (CSV).
%time data.to_csv(filename + '.csv') # writes the whole data set as CSV file
CPU times: user 4.89 s, sys: 129 ms, total: 5.02 s Wall time: 5.08 s
csv_file = open(filename + '.csv')
for i in range(5):
print csv_file.readline(),
,no1,no2,no3,no4,no5 0,1.677695668253341,-1.5219134106087264,0.4553707994114911,0.4838739437594936,-1.185815732601574 1,0.2168901241866966,-2.2420002426484427,-0.8931035434748411,-0.3572227389588546,-2.3196427644529645 2,1.2407462786194512,-0.43243481195488837,0.7308065597573864,0.37178197861313794,1.516960849144284 3,0.5752733225616632,-1.2192792718946073,-0.413638685603497,0.43362460986398205,0.9103460017903237
pandas also interacts easily with Microsoft Excel spreadsheet files (XLS/XLSX).
%time data.ix[:65000].to_excel(filename + '.xls') # first 65,000 rows written in a Excel file
CPU times: user 9.35 s, sys: 197 ms, total: 9.55 s Wall time: 9.59 s
With pandas you can in one step read data from an Excel file and e.g. plot it.
%time pd.read_excel(filename + '.xls', 'sheet1').hist(grid=True)
CPU times: user 2.38 s, sys: 66.2 ms, total: 2.44 s Wall time: 2.44 s
array([[<matplotlib.axes.AxesSubplot object at 0x1084433d0>, <matplotlib.axes.AxesSubplot object at 0x10841b090>, <matplotlib.axes.AxesSubplot object at 0x10843f950>], [<matplotlib.axes.AxesSubplot object at 0x106f21dd0>, <matplotlib.axes.AxesSubplot object at 0x106f42190>, <matplotlib.axes.AxesSubplot object at 0x1070fd950>]], dtype=object)
One can also generate HTML code, e.g. for automatic Web-based report generation.
html_code = data.ix[:2].to_html()
html_code
u'<table border="1" class="dataframe">\n <thead>\n <tr style="text-align: right;">\n <th></th>\n <th>no1</th>\n <th>no2</th>\n <th>no3</th>\n <th>no4</th>\n <th>no5</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td> 1.677696</td>\n <td>-1.521913</td>\n <td> 0.455371</td>\n <td> 0.483874</td>\n <td>-1.185816</td>\n </tr>\n <tr>\n <th>1</th>\n <td> 0.216890</td>\n <td>-2.242000</td>\n <td>-0.893104</td>\n <td>-0.357223</td>\n <td>-2.319643</td>\n </tr>\n <tr>\n <th>2</th>\n <td> 1.240746</td>\n <td>-0.432435</td>\n <td> 0.730807</td>\n <td> 0.371782</td>\n <td> 1.516961</td>\n </tr>\n </tbody>\n</table>'
from IPython.core.display import HTML
HTML(html_code)
no1 | no2 | no3 | no4 | no5 | |
---|---|---|---|---|---|
0 | 1.677696 | -1.521913 | 0.455371 | 0.483874 | -1.185816 |
1 | 0.216890 | -2.242000 | -0.893104 | -0.357223 | -2.319643 |
2 | 1.240746 | -0.432435 | 0.730807 | 0.371782 | 1.516961 |
Some final checks.
ll numb*
-rw-rw-rw- 1 yhilpisch staff 52469010 5 Nov 16:11 numbers.csv -rw-r--r-- 1 yhilpisch staff 27224064 5 Nov 16:11 numbers.db -rw-rw-rw- 1 yhilpisch staff 24007368 5 Nov 16:11 numbers.h5 -rw-rw-rw- 1 yhilpisch staff 8130560 5 Nov 16:11 numbers.xls
data = 0.0 # memory clean-up
try:
os.remove(filename + '.db')
os.remove(filename + '.h5')
os.remove(filename + '.csv')
os.remove(filename + '.xls')
except:
pass
First, let us define the function we want to evaluate on a large array.
from math import *
def f(x):
return abs(cos(x)) ** 0.5 + sin(2 + 3 * sqrt(abs(x)))
x = pi
f(x)
1.8594415167660776
Second, as a benchmark, a pure Python implementation.
I = 10000000
x = range(I)
def f1(x):
return [f(x) for x in x]
%time r = f1(x)
CPU times: user 7.82 s, sys: 208 ms, total: 8.03 s Wall time: 8.13 s
print r[:5]
[1.9092974268256817, -0.2238716875184228, 0.6045609273704432, 1.7863049827563042, 1.7978405413358955]
A better implementation uses NumPy for faster looping via vectorization.
import numpy as np
x = np.arange(I)
def f2(x):
return (np.abs(np.cos(x)) ** 0.5 +
np.sin(2 + 3 * np.sqrt(np.abs(x))))
%time r = f2(x)
CPU times: user 937 ms, sys: 177 ms, total: 1.11 s Wall time: 1.2 s
print r[:5]
[ 1.90929743 -0.22387169 0.60456093 1.78630498 1.79784054]
An even better and parallel version uses the numexpr library.
import numexpr as ne
ex = 'abs(cos(x)) ** 0.5 + sin(2 + 3 * sqrt(abs(x)))'
def f3(x):
ne.set_num_threads(4)
return ne.evaluate(ex)
%time r = f3(x)
CPU times: user 1.13 s, sys: 18.6 ms, total: 1.14 s Wall time: 384 ms
print r[:5]
[ 1.90929743 -0.22387169 0.60456093 1.78630498 1.79784054]
Let us compare the performance of the different functions in a bit more detail.
from perf_comp import perf_comp
func_list = [f1, f2, f3]
perf_comp(func_list, 'x', rep=3)
function: f3, av. time sec: 0.35817, relative: 1.0 function: f2, av. time sec: 0.85174, relative: 2.4 function: f1, av. time sec: 9.68758, relative: 27.0
The parallel version with numexpr is magnitudes of order faster than pure Python.
The algorithm we (arbitrarily) choose is the Monte Carlo estimator for a European option value in the Black-Scholes-Merton set-up. In this set-up the underlying follows the stochastic differential equation:
\[dS_t = r S_t dt + \sigma S_t dZ_t\]
The Monte Carlo estimator for a call option value looks like follows:
\(C_0 = e^{-rT} \frac{1}{I} \sum_{i=1}^{I} \max[S_T(i) - K, 0]\)
where \(S_T(i)\) is the \(i\)-th simulated value of the underlying at maturity \(T\).
A dynamic implementation of the BSM Monte Carlo valuation could look like follows.
def BSM_MCS(S0):
''' Dynamic Black-Scholes-Merton MCS Estimator for European Calls. '''
import numpy as np
K = 100.; T = 1.0; r = 0.05; vola = 0.2
M = 50; I = 20000
dt = T / M
rand = np.random.standard_normal((M + 1, I))
S = np.zeros((M + 1, I)); S[0] = S0
for t in range(1, M + 1):
S[t] = S[t-1] * np.exp((r - 0.5 * vola ** 2) * dt + vola * np.sqrt(dt) * rand[t])
BSM_MCS_Value = np.sum(np.maximum(S[-1] - K, 0)) / I
return BSM_MCS_Value
The benchmark case of sequential valuation of \(n\) options.
def seqValue(n):
optionValues = []
for S in linspace(80, 100, n):
optionValues.append(BSM_MCS(S))
return optionValues
First, start a (local) cluster.
# in the shell ...
####
# ipcluster start -n 4
####
# or maybe more cores
Second, enable parallel computing capabilities.
from IPython.parallel import Client
c = Client(profile="default")
view = c.load_balanced_view()
The function for the parallel execution of a number \(n\) of different options valuations.
def parValue(n):
optionValues = []
for S in linspace(80, 100, n):
value = view.apply_async(BSM_MCS, S)
# asynchronously calculate the option values
optionValues.append(value)
c.wait(optionValues)
return optionValues
Let us again compare performance of the two different approaches.
n = 100 # number of option valuations
perf_comp((seqValue, parValue), data='n', rep=3)
function: parValue, av. time sec: 3.26991, relative: 1.0 function: seqValue, av. time sec: 6.11594, relative: 1.9
Performance scales linearly with the number of CPU cores available.
It is a stylized fact that stock indexes and related volatility indexes are highly negatively correlated. The following example analyzes this stylized fact based on the EURO STOXX 50 stock index and the VSTOXX volatility index using Ordinary Least-Squares regession (OLS).
First, we collect historical data for both the EURO STOXX 50 stock and the VSTOXX volatility index.
import pandas as pd
from urllib import urlretrieve
try:
open('es.txt')
open('vs.txt')
except:
es_url = 'http://www.stoxx.com/download/historical_values/hbrbcpe.txt'
vs_url = 'http://www.stoxx.com/download/historical_values/h_vstoxx.txt'
urlretrieve(es_url, 'es.txt')
urlretrieve(vs_url, 'vs.txt')
The EURO STOXX 50 data is not yet in the right format. Some house cleaning is necessary.
lines = open('es.txt').readlines() # reads the whole file line-by-line
new_file = open('es50.txt', 'w') # opens a new file
new_file.writelines('date' + lines[3][:-1].replace(' ', '') + ';DEL' + lines[3][-1])
# writes the corrected third line of the orginal file as first line of new file
new_file.writelines(lines[4:-1]) # writes the remaining lines of the orginial file
print list(open('es50.txt'))[:5] # opens the new file for inspection
['date;SX5P;SX5E;SXXP;SXXE;SXXF;SXXA;DK5F;DKXF;DEL\n', '31.12.1986;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56\n', '01.01.1987;775.00 ; 900.82 ; 82.76 ; 98.58 ; 98.06 ; 69.06 ; 645.26 ; 65.56\n', '02.01.1987;770.89 ; 891.78 ; 82.57 ; 97.80 ; 97.43 ; 69.37 ; 647.62 ; 65.81\n', '05.01.1987;771.89 ; 898.33 ; 82.82 ; 98.60 ; 98.19 ; 69.16 ; 649.94 ; 65.82\n']
Now, the data can be safely read into a DataFrame object.
es = pd.read_csv('es50.txt', index_col=0, parse_dates=True, sep=';', dayfirst=True)
del es['DEL'] # delete the helper column
es
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 6907 entries, 1986-12-31 00:00:00 to 2013-10-10 00:00:00 Data columns (total 8 columns): SX5P 6907 non-null values SX5E 6907 non-null values SXXP 6907 non-null values SXXE 6907 non-null values SXXF 6906 non-null values SXXA 6906 non-null values DK5F 6906 non-null values DKXF 6906 non-null values dtypes: float64(7), object(1)
The VSTOXX data can be read without touching the raw data.
vs = pd.read_csv('vs.txt', index_col=0, header=2,
parse_dates=True, sep=',', dayfirst=True)
# you can alternatively read from the Web source directly
# without saving the csv file to disk:
# vs = pd.read_csv(vs_url, index_col=0, header=2,
# parse_dates=True, sep=',', dayfirst=True)
We now merge the data for further analysis.
from datetime import datetime
data = pd.DataFrame({'EUROSTOXX' :
es['SX5E'][es.index > datetime(1999, 12, 31)]})
data = data.join(pd.DataFrame({'VSTOXX' :
vs['V2TX'][vs.index > datetime(1999, 12, 31)]}))
data
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 3532 entries, 2000-01-03 00:00:00 to 2013-10-10 00:00:00 Data columns (total 2 columns): EUROSTOXX 3532 non-null values VSTOXX 3512 non-null values dtypes: float64(2)
Let's inspect the two time series.
data.head()
EUROSTOXX | VSTOXX | |
---|---|---|
date | ||
2000-01-03 | 4849.22 | 30.9845 |
2000-01-04 | 4657.83 | 33.2225 |
2000-01-05 | 4541.75 | 32.5944 |
2000-01-06 | 4500.69 | 31.1811 |
2000-01-07 | 4648.27 | 27.4407 |
A picture can tell the complete story.
data.plot(subplots=True, grid=True, style='b')
array([<matplotlib.axes.AxesSubplot object at 0x1071d4d50>, <matplotlib.axes.AxesSubplot object at 0x105f46e90>], dtype=object)
We now generate log returns for both time series.
rets = np.log(data / data.shift(1))
rets.head()
EUROSTOXX | VSTOXX | |
---|---|---|
date | ||
2000-01-03 | NaN | NaN |
2000-01-04 | -0.040268 | 0.069740 |
2000-01-05 | -0.025237 | -0.019087 |
2000-01-06 | -0.009082 | -0.044328 |
2000-01-07 | 0.032264 | -0.127785 |
To this new data set, also stored in a DataFrame object, we apply ordinary least-square regression (OLS).
xdat = rets['EUROSTOXX']
ydat = rets['VSTOXX']
model = pd.ols(y=ydat, x=xdat)
model
-------------------------Summary of Regression Analysis------------------------- Formula: Y ~ <x> + <intercept> Number of Observations: 3491 Number of Degrees of Freedom: 2 R-squared: 0.5573 Adj R-squared: 0.5571 Rmse: 0.0378 F-stat (1, 3489): 4391.4224, p-value: 0.0000 Degrees of Freedom: model 1, resid 3489 -----------------------Summary of Estimated Coefficients------------------------ Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5% -------------------------------------------------------------------------------- x -2.7062 0.0408 -66.27 0.0000 -2.7863 -2.6262 intercept -0.0007 0.0006 -1.05 0.2952 -0.0019 0.0006 ---------------------------------End of Summary---------------------------------
Again, we want to see how our results look graphically.
plt.plot(xdat, ydat, 'r.')
ax = plt.axis() # grab axis values
x = np.linspace(ax[0], ax[1] + 0.01)
plt.plot(x, model.beta[1] + model.beta[0] * x, 'b', lw=2)
plt.grid(True)
plt.axis('tight')
(-0.10000000000000001, 0.16, -0.43366353822915737, 0.43687964474802654)
Using pandas, the code that follows reads intraday, high-frequency data from a Web source, plots it and resamples it.
try:
AAPL = pd.read_csv('aapl.csv', index_col=0, header=0, parse_dates=True)
except:
date = datetime.now()
url1 = 'http://hopey.netfonds.no/posdump.php?'
url2 = 'date=%s%s%s&paper=AAPL.O&csv_format=csv' % ('2013', '10', '31')
# you may have to adjust the date since only recent dates are available
urlretrieve(url1 + url2, 'aapl.csv')
AAPL = pd.read_csv('aapl.csv', index_col=0, header=0, parse_dates=True)
AAPL
<class 'pandas.core.frame.DataFrame'> DatetimeIndex: 12944 entries, 2013-10-31 09:00:01 to 2013-10-31 21:21:26 Data columns (total 6 columns): bid 12944 non-null values bid_depth 12944 non-null values bid_depth_total 12944 non-null values offer 12944 non-null values offer_depth 12944 non-null values offer_depth_total 12944 non-null values dtypes: float64(2), int64(4)
The intraday evolution of the Apple stock price.
AAPL['bid'].plot()
<matplotlib.axes.AxesSubplot at 0x106e58090>
A resampling of the data is easily accomplished with pandas.
import datetime as dt
# this resamples the record frequency to 5 minutes, using mean as aggregation rule
AAPL_resam = AAPL.resample(rule='5min', how='mean')
AAPL_resam.head()
bid | bid_depth | bid_depth_total | offer | offer_depth | offer_depth_total | |
---|---|---|---|---|---|---|
time | ||||||
2013-10-31 09:00:00 | 519.223333 | 133.333333 | 133.333333 | 527.000 | 100 | 100 |
2013-10-31 09:05:00 | NaN | NaN | NaN | NaN | NaN | NaN |
2013-10-31 09:10:00 | NaN | NaN | NaN | NaN | NaN | NaN |
2013-10-31 09:15:00 | NaN | NaN | NaN | NaN | NaN | NaN |
2013-10-31 09:20:00 | 523.290000 | 500.000000 | 500.000000 | 526.995 | 300 | 300 |
Let's have a graphical look at the new data set.
AAPL_resam['bid'].plot()
<matplotlib.axes.AxesSubplot at 0x10b1eb5d0>
Obvioulsly, there hasn't been trading data for every interval.
pandas has a number of approaches to fill in missing data.
AAPL_resam['bid'].fillna(method='ffill').plot() # this forward fills missing data
<matplotlib.axes.AxesSubplot at 0x10b19fad0>
The simulation of the movement of N bodies in relation to each other, where \(N\) is a large number and a body can be a star/planet or a small particle, is an important problem in Physics.
The basic simulation algorithm involves a number of iterations ("loops") since one has to determine the force that every single body exerts on all the other bodies in the system to be simulated (e.g. a galaxy or a system of atomic particles).
The algorithm lends iteself pretty well for parallization. However, the example code that follows does not use any parallelization techniques. It illustrates rather what speed-ups are achievable by
A description and discussion of the algorithm is found in the recent article Vladimirov, Andrey and Vadim Karpusenko (2013): "Test-Driving Intel Xeon-Phi Coprocessors with a Basic N-Body Simulation." White Paper, Colfax International.
We first use pure Python to implement the algorithm. We start by generating random vectors of particle locations and a zero vector for the speed of each particle.
import random
nParticles = 500
particle = []
for i in range(nParticles):
particle.append([random.gauss(0.0, 1.0) for j in range(3)])
particlev = [[0, 0, 0] for x in particle]
Next, we define the simulation function for the movements of each particle given their gravitational interaction.
def nbody_py(particle, particlev): # lists as input
nSteps = 15; dt = 0.01
for step in range(1, nSteps + 1, 1):
for i in range(nParticles):
Fx = 0.0; Fy = 0.0; Fz = 0.0
for j in range(nParticles):
if j != i:
dx = particle[j][0] - particle[i][0]
dy = particle[j][1] - particle[i][1]
dz = particle[j][2] - particle[i][2]
drSquared = dx * dx + dy * dy + dz * dz
drPowerN32 = 1.0 / (drSquared + sqrt(drSquared))
Fx += dx * drPowerN32
Fy += dy * drPowerN32
Fz += dz * drPowerN32
particlev[i][0] += dt * Fx
particlev[i][1] += dt * Fy
particlev[i][2] += dt * Fz
for i in range(nParticles):
particle[i][0] += particlev[i][0] * dt
particle[i][1] += particlev[i][1] * dt
particle[i][2] += particlev[i][2] * dt
We then execute the function to measure execution speed.
%time nbody_py(particle, particlev)
CPU times: user 6.52 s, sys: 19.3 ms, total: 6.54 s Wall time: 6.55 s
The basic algorithm allows for a number of vectorizations by using NumPy.
import numpy as np
particle = np.random.standard_normal((nParticles, 3))
particlev = np.zeros_like(particle)
Using NumPy arrays and matplotlib's 3d plotting capabilities, we can easily visualize the intial positions of the particles in the space.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(particle[:, 0], particle[:, 1], particle[:, 2],
c='r', marker='.')
<mpl_toolkits.mplot3d.art3d.Patch3DCollection at 0x1081c9b50>
The respective NumPy simulation function is considerably more compact.
def nbody_np(particle, particlev): # NumPy arrays as input
nSteps = 15; dt = 0.01
for step in range(1, nSteps + 1, 1):
Fp = np.zeros((nParticles, 3))
for i in range(nParticles):
dp = particle - particle[i]
drSquared = np.sum(dp ** 2, axis=1)
h = drSquared + np.sqrt(drSquared)
drPowerN32 = 1. / np.maximum(h, 1E-10)
Fp += -(dp.T * drPowerN32).T
particlev += dt * Fp
particle += particlev * dt
It is also considerably faster.
%time nbody_np(particle, particlev)
CPU times: user 493 ms, sys: 1.47 ms, total: 495 ms Wall time: 494 ms
Let's check how the particles moved. The following is a snapshot of the final positions of all particles.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(particle[:, 0], particle[:, 1], particle[:, 2],
c='r', marker='.')
<mpl_toolkits.mplot3d.art3d.Patch3DCollection at 0x10820d8d0>
Numba is an open-source, NumPy-aware optimizing compiler for Python sponsored by The Python Quants GmbH It uses the remarkable LLVM compiler infrastructure to compile Python byte-code to machine code especially for use in the NumPy run-time and SciPy modules.
For this case, we first implement a function which is somehow a hybrid function of the pure Python (loop-heavy) and the NumPy array class-based version.
def nbody_hy(particle, particlev): # NumPy arrays as input
nSteps = 15; dt = 0.01
for step in range(1, nSteps + 1, 1):
for i in range(nParticles):
Fx = 0.0; Fy = 0.0; Fz = 0.0
for j in range(nParticles):
if j != i:
dx = particle[j,0] - particle[i,0]
dy = particle[j,1] - particle[i,1]
dz = particle[j,2] - particle[i,2]
drSquared = dx * dx + dy * dy + dz * dz
drPowerN32 = 1.0 / (drSquared + sqrt(drSquared))
Fx += dx * drPowerN32
Fy += dy * drPowerN32
Fz += dz * drPowerN32
particlev[i, 0] += dt * Fx
particlev[i, 1] += dt * Fy
particlev[i, 2] += dt * Fz
for i in range(nParticles):
particle[i,0] += particlev[i,0] * dt
particle[i,1] += particlev[i,1] * dt
particle[i,2] += particlev[i,2] * dt
This function is then compiled with Numba.
import numba as nb
nbody_nb = nb.autojit(nbody_hy)
nbody_nb(particle, particlev)
DEBUG -- translate:361:translate ; ModuleID = 'tmp.module.__main__.nbody_hy.108244cf8' @PyArray_API = linkonce_odr global i8** inttoptr (i64 4349352640 to i8**) define void @__numba_specialized_0___main___2E_nbody_hy({ i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev) { entry: %nsteps8 = alloca i64 %target_temp7 = alloca i64 %nsteps6 = alloca i64 %target_temp5 = alloca i64 %nsteps4 = alloca i64 %target_temp3 = alloca i64 %nsteps = alloca i64 %target_temp = alloca i64 %particlev2 = alloca { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle1 = alloca { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* store { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }** %particle1 %0 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle to { i64, i8* }* call void @Py_INCREF({ i64, i8* }* %0) %1 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, i32 0, i32 2 %2 = load i8** %1, !tbaa !2 %3 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, i32 0, i32 4 %4 = load i64** %3, !tbaa !4 %5 = getelementptr i64* %4, i32 0 %6 = load i64* %5 %7 = getelementptr i64* %4, i32 1 %8 = load i64* %7 %9 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, i32 0, i32 5 %10 = load i64** %9, !tbaa !5 %11 = getelementptr i64* %10, i32 0 %12 = load i64* %11 %13 = getelementptr i64* %10, i32 1 %14 = load i64* %13 store { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev, { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }** %particlev2 %15 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev to { i64, i8* }* call void @Py_INCREF({ i64, i8* }* %15) %16 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev, i32 0, i32 2 %17 = load i8** %16, !tbaa !2 %18 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev, i32 0, i32 4 %19 = load i64** %18, !tbaa !4 %20 = getelementptr i64* %19, i32 0 %21 = load i64* %20 %22 = getelementptr i64* %19, i32 1 %23 = load i64* %22 %24 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev, i32 0, i32 5 %25 = load i64** %24, !tbaa !5 %26 = getelementptr i64* %25, i32 0 %27 = load i64* %26 %28 = getelementptr i64* %25, i32 1 %29 = load i64* %28 store i64 0, i64* %target_temp, !tbaa !6 br i1 true, label %ifexp.then, label %ifexp.else cleanup_label: ; preds = %"exit_for_3:4", %error_label %30 = load { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }** %particlev2, !tbaa !14 %31 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %30 to { i64, i8* }* call void @Py_XDECREF({ i64, i8* }* %31) %32 = load { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }** %particle1, !tbaa !14 %33 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %32 to { i64, i8* }* call void @Py_XDECREF({ i64, i8* }* %33) ret void error_label: ; No predecessors! br label %cleanup_label ifexp.then: ; preds = %entry br label %ifexp.merge ifexp.else: ; preds = %entry br label %ifexp.merge ifexp.merge: ; preds = %ifexp.else, %ifexp.then %34 = phi i32 [ 1, %ifexp.then ], [ -1, %ifexp.else ] %35 = sext i32 %34 to i64 %36 = sub i64 16, %35 %37 = sdiv i64 %36, 1 store i64 %37, i64* %nsteps, !tbaa !7 br label %"for_condition_3:16" "for_condition_3:16": ; preds = %ifexp.merge, %"exit_for_19:8" %i_1 = phi i64 [ 123456789, %ifexp.merge ], [ %i_4, %"exit_for_19:8" ] %dz_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %dz_2, %"exit_for_19:8" ] %Fx_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %Fx_2, %"exit_for_19:8" ] %Fy_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %Fy_2, %"exit_for_19:8" ] %Fz_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %Fz_2, %"exit_for_19:8" ] %j_1 = phi i64 [ 123456789, %ifexp.merge ], [ %j_2, %"exit_for_19:8" ] %drPowerN32_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %drPowerN32_2, %"exit_for_19:8" ] %dx_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %dx_2, %"exit_for_19:8" ] %dy_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %dy_2, %"exit_for_19:8" ] %drSquared_1 = phi double [ 0x7FF8000000000000, %ifexp.merge ], [ %drSquared_2, %"exit_for_19:8" ] %38 = load i64* %target_temp, !tbaa !6 %39 = load i64* %nsteps, !tbaa !7 %40 = icmp slt i64 %38, %39 %41 = icmp ne i1 %40, false br i1 %41, label %"loop_body_4:8", label %"exit_for_3:4" "exit_for_3:4": ; preds = %"for_condition_3:16" br label %cleanup_label "loop_body_4:8": ; preds = %"for_condition_3:16" %42 = load i64* %target_temp, !tbaa !6 %43 = mul i64 %42, 1 %44 = add i64 1, %43 %45 = load i64* %target_temp, !tbaa !6 %46 = add i64 %45, 1 store i64 %46, i64* %target_temp, !tbaa !6 store i64 0, i64* %target_temp3, !tbaa !8 store i64 500, i64* %nsteps4, !tbaa !9 br label %"for_condition_4:17" "for_condition_4:17": ; preds = %"loop_body_4:8", %"exit_for_6:12" %dz_2 = phi double [ %dz_3, %"exit_for_6:12" ], [ %dz_1, %"loop_body_4:8" ] %i_2 = phi i64 [ %51, %"exit_for_6:12" ], [ %i_1, %"loop_body_4:8" ] %Fx_2 = phi double [ %Fx_4, %"exit_for_6:12" ], [ %Fx_1, %"loop_body_4:8" ] %Fy_2 = phi double [ %Fy_4, %"exit_for_6:12" ], [ %Fy_1, %"loop_body_4:8" ] %Fz_2 = phi double [ %Fz_4, %"exit_for_6:12" ], [ %Fz_1, %"loop_body_4:8" ] %j_2 = phi i64 [ %j_3, %"exit_for_6:12" ], [ %j_1, %"loop_body_4:8" ] %drPowerN32_2 = phi double [ %drPowerN32_3, %"exit_for_6:12" ], [ %drPowerN32_1, %"loop_body_4:8" ] %dx_2 = phi double [ %dx_3, %"exit_for_6:12" ], [ %dx_1, %"loop_body_4:8" ] %dy_2 = phi double [ %dy_3, %"exit_for_6:12" ], [ %dy_1, %"loop_body_4:8" ] %drSquared_2 = phi double [ %drSquared_3, %"exit_for_6:12" ], [ %drSquared_1, %"loop_body_4:8" ] %47 = load i64* %target_temp3, !tbaa !8 %48 = load i64* %nsteps4, !tbaa !9 %49 = icmp slt i64 %47, %48 %50 = icmp ne i1 %49, false br i1 %50, label %"loop_body_5:12", label %"exit_for_4:8" "exit_for_4:8": ; preds = %"for_condition_4:17" store i64 0, i64* %target_temp7, !tbaa !12 store i64 500, i64* %nsteps8, !tbaa !13 br label %"for_condition_19:17" "loop_body_5:12": ; preds = %"for_condition_4:17" %51 = load i64* %target_temp3, !tbaa !8 %52 = load i64* %target_temp3, !tbaa !8 %53 = add i64 %52, 1 store i64 %53, i64* %target_temp3, !tbaa !8 store i64 0, i64* %target_temp5, !tbaa !10 store i64 500, i64* %nsteps6, !tbaa !11 br label %"for_condition_6:21" "for_condition_6:21": ; preds = %"loop_body_5:12", %"exit_if_7:16" %dz_3 = phi double [ %dz_2, %"loop_body_5:12" ], [ %dz_5, %"exit_if_7:16" ] %drSquared_3 = phi double [ %drSquared_2, %"loop_body_5:12" ], [ %drSquared_5, %"exit_if_7:16" ] %Fx_4 = phi double [ 0.000000e+00, %"loop_body_5:12" ], [ %Fx_6, %"exit_if_7:16" ] %Fy_4 = phi double [ 0.000000e+00, %"loop_body_5:12" ], [ %Fy_6, %"exit_if_7:16" ] %Fz_4 = phi double [ 0.000000e+00, %"loop_body_5:12" ], [ %Fz_6, %"exit_if_7:16" ] %j_3 = phi i64 [ %j_2, %"loop_body_5:12" ], [ %58, %"exit_if_7:16" ] %drPowerN32_3 = phi double [ %drPowerN32_2, %"loop_body_5:12" ], [ %drPowerN32_5, %"exit_if_7:16" ] %dx_3 = phi double [ %dx_2, %"loop_body_5:12" ], [ %dx_5, %"exit_if_7:16" ] %dy_3 = phi double [ %dy_2, %"loop_body_5:12" ], [ %dy_5, %"exit_if_7:16" ] %54 = load i64* %target_temp5, !tbaa !10 %55 = load i64* %nsteps6, !tbaa !11 %56 = icmp slt i64 %54, %55 %57 = icmp ne i1 %56, false br i1 %57, label %"loop_body_7:16", label %"exit_for_6:12" "exit_for_6:12": ; preds = %"for_condition_6:21" br label %"for_condition_4:17" "loop_body_7:16": ; preds = %"for_condition_6:21" %58 = load i64* %target_temp5, !tbaa !10 %59 = load i64* %target_temp5, !tbaa !10 %60 = add i64 %59, 1 store i64 %60, i64* %target_temp5, !tbaa !10 br label %"if_cond_7:19" "if_cond_7:19": ; preds = %"loop_body_7:16" %61 = icmp ne i64 %58, %51 %62 = icmp ne i1 %61, false br i1 %62, label %"if_body_8:20", label %"exit_if_7:16" "exit_if_7:16": ; preds = %"if_cond_7:19", %"if_body_8:20" %dz_5 = phi double [ %152, %"if_body_8:20" ], [ %dz_3, %"if_cond_7:19" ] %drSquared_5 = phi double [ %157, %"if_body_8:20" ], [ %drSquared_3, %"if_cond_7:19" ] %Fx_6 = phi double [ %162, %"if_body_8:20" ], [ %Fx_4, %"if_cond_7:19" ] %Fy_6 = phi double [ %164, %"if_body_8:20" ], [ %Fy_4, %"if_cond_7:19" ] %Fz_6 = phi double [ %166, %"if_body_8:20" ], [ %Fz_4, %"if_cond_7:19" ] %drPowerN32_5 = phi double [ %160, %"if_body_8:20" ], [ %drPowerN32_3, %"if_cond_7:19" ] %dx_5 = phi double [ %122, %"if_body_8:20" ], [ %dx_3, %"if_cond_7:19" ] %dy_5 = phi double [ %137, %"if_body_8:20" ], [ %dy_3, %"if_cond_7:19" ] %63 = mul i64 %51, %27 %64 = add i64 0, %63 %65 = mul i64 0, %29 %66 = add i64 %64, %65 %67 = getelementptr i8* %17, i64 %66 %68 = bitcast i8* %67 to double* %69 = load double* %68, !tbaa !2 %70 = fmul double 1.000000e-02, %Fx_6 %71 = fadd double %69, %70 %72 = mul i64 %51, %27 %73 = add i64 0, %72 %74 = mul i64 0, %29 %75 = add i64 %73, %74 %76 = getelementptr i8* %17, i64 %75 %77 = bitcast i8* %76 to double* store double %71, double* %77, !tbaa !2 %78 = mul i64 %51, %27 %79 = add i64 0, %78 %80 = mul i64 1, %29 %81 = add i64 %79, %80 %82 = getelementptr i8* %17, i64 %81 %83 = bitcast i8* %82 to double* %84 = load double* %83, !tbaa !2 %85 = fmul double 1.000000e-02, %Fy_6 %86 = fadd double %84, %85 %87 = mul i64 %51, %27 %88 = add i64 0, %87 %89 = mul i64 1, %29 %90 = add i64 %88, %89 %91 = getelementptr i8* %17, i64 %90 %92 = bitcast i8* %91 to double* store double %86, double* %92, !tbaa !2 %93 = mul i64 %51, %27 %94 = add i64 0, %93 %95 = mul i64 2, %29 %96 = add i64 %94, %95 %97 = getelementptr i8* %17, i64 %96 %98 = bitcast i8* %97 to double* %99 = load double* %98, !tbaa !2 %100 = fmul double 1.000000e-02, %Fz_6 %101 = fadd double %99, %100 %102 = mul i64 %51, %27 %103 = add i64 0, %102 %104 = mul i64 2, %29 %105 = add i64 %103, %104 %106 = getelementptr i8* %17, i64 %105 %107 = bitcast i8* %106 to double* store double %101, double* %107, !tbaa !2 br label %"for_condition_6:21" "if_body_8:20": ; preds = %"if_cond_7:19" %108 = mul i64 %58, %12 %109 = add i64 0, %108 %110 = mul i64 0, %14 %111 = add i64 %109, %110 %112 = getelementptr i8* %2, i64 %111 %113 = bitcast i8* %112 to double* %114 = load double* %113, !tbaa !2 %115 = mul i64 %51, %12 %116 = add i64 0, %115 %117 = mul i64 0, %14 %118 = add i64 %116, %117 %119 = getelementptr i8* %2, i64 %118 %120 = bitcast i8* %119 to double* %121 = load double* %120, !tbaa !2 %122 = fsub double %114, %121 %123 = mul i64 %58, %12 %124 = add i64 0, %123 %125 = mul i64 1, %14 %126 = add i64 %124, %125 %127 = getelementptr i8* %2, i64 %126 %128 = bitcast i8* %127 to double* %129 = load double* %128, !tbaa !2 %130 = mul i64 %51, %12 %131 = add i64 0, %130 %132 = mul i64 1, %14 %133 = add i64 %131, %132 %134 = getelementptr i8* %2, i64 %133 %135 = bitcast i8* %134 to double* %136 = load double* %135, !tbaa !2 %137 = fsub double %129, %136 %138 = mul i64 %58, %12 %139 = add i64 0, %138 %140 = mul i64 2, %14 %141 = add i64 %139, %140 %142 = getelementptr i8* %2, i64 %141 %143 = bitcast i8* %142 to double* %144 = load double* %143, !tbaa !2 %145 = mul i64 %51, %12 %146 = add i64 0, %145 %147 = mul i64 2, %14 %148 = add i64 %146, %147 %149 = getelementptr i8* %2, i64 %148 %150 = bitcast i8* %149 to double* %151 = load double* %150, !tbaa !2 %152 = fsub double %144, %151 %153 = fmul double %122, %122 %154 = fmul double %137, %137 %155 = fadd double %153, %154 %156 = fmul double %152, %152 %157 = fadd double %155, %156 %158 = call double @"numba.math.['double'].sqrt"(double %157) %159 = fadd double %157, %158 %160 = fdiv double 1.000000e+00, %159 %161 = fmul double %122, %160 %162 = fadd double %Fx_4, %161 %163 = fmul double %137, %160 %164 = fadd double %Fy_4, %163 %165 = fmul double %152, %160 %166 = fadd double %Fz_4, %165 br label %"exit_if_7:16" "for_condition_19:17": ; preds = %"exit_for_4:8", %"loop_body_20:29" %i_4 = phi i64 [ %i_2, %"exit_for_4:8" ], [ %171, %"loop_body_20:29" ] %167 = load i64* %target_temp7, !tbaa !12 %168 = load i64* %nsteps8, !tbaa !13 %169 = icmp slt i64 %167, %168 %170 = icmp ne i1 %169, false br i1 %170, label %"loop_body_20:29", label %"exit_for_19:8" "exit_for_19:8": ; preds = %"for_condition_19:17" br label %"for_condition_3:16" "loop_body_20:29": ; preds = %"for_condition_19:17" %171 = load i64* %target_temp7, !tbaa !12 %172 = load i64* %target_temp7, !tbaa !12 %173 = add i64 %172, 1 store i64 %173, i64* %target_temp7, !tbaa !12 %174 = mul i64 %171, %12 %175 = add i64 0, %174 %176 = mul i64 0, %14 %177 = add i64 %175, %176 %178 = getelementptr i8* %2, i64 %177 %179 = bitcast i8* %178 to double* %180 = load double* %179, !tbaa !2 %181 = mul i64 %171, %27 %182 = add i64 0, %181 %183 = mul i64 0, %29 %184 = add i64 %182, %183 %185 = getelementptr i8* %17, i64 %184 %186 = bitcast i8* %185 to double* %187 = load double* %186, !tbaa !2 %188 = fmul double %187, 1.000000e-02 %189 = fadd double %180, %188 %190 = mul i64 %171, %12 %191 = add i64 0, %190 %192 = mul i64 0, %14 %193 = add i64 %191, %192 %194 = getelementptr i8* %2, i64 %193 %195 = bitcast i8* %194 to double* store double %189, double* %195, !tbaa !2 %196 = mul i64 %171, %12 %197 = add i64 0, %196 %198 = mul i64 1, %14 %199 = add i64 %197, %198 %200 = getelementptr i8* %2, i64 %199 %201 = bitcast i8* %200 to double* %202 = load double* %201, !tbaa !2 %203 = mul i64 %171, %27 %204 = add i64 0, %203 %205 = mul i64 1, %29 %206 = add i64 %204, %205 %207 = getelementptr i8* %17, i64 %206 %208 = bitcast i8* %207 to double* %209 = load double* %208, !tbaa !2 %210 = fmul double %209, 1.000000e-02 %211 = fadd double %202, %210 %212 = mul i64 %171, %12 %213 = add i64 0, %212 %214 = mul i64 1, %14 %215 = add i64 %213, %214 %216 = getelementptr i8* %2, i64 %215 %217 = bitcast i8* %216 to double* store double %211, double* %217, !tbaa !2 %218 = mul i64 %171, %12 %219 = add i64 0, %218 %220 = mul i64 2, %14 %221 = add i64 %219, %220 %222 = getelementptr i8* %2, i64 %221 %223 = bitcast i8* %222 to double* %224 = load double* %223, !tbaa !2 %225 = mul i64 %171, %27 %226 = add i64 0, %225 %227 = mul i64 2, %29 %228 = add i64 %226, %227 %229 = getelementptr i8* %17, i64 %228 %230 = bitcast i8* %229 to double* %231 = load double* %230, !tbaa !2 %232 = fmul double %231, 1.000000e-02 %233 = fadd double %224, %232 %234 = mul i64 %171, %12 %235 = add i64 0, %234 %236 = mul i64 2, %14 %237 = add i64 %235, %236 %238 = getelementptr i8* %2, i64 %237 %239 = bitcast i8* %238 to double* store double %233, double* %239, !tbaa !2 br label %"for_condition_19:17" } declare { i64, i8* }* @Py_BuildValue(i8*, ...) declare i32 @PyArg_ParseTuple({ i64, i8* }*, i8*, ...) declare void @PyErr_Clear() declare void @Py_INCREF({ i64, i8* }*) declare double @"numba.math.['double'].sqrt"(double) declare void @Py_XDECREF({ i64, i8* }*) !tbaa = !{!0, !1, !2, !3, !4, !5, !6, !7, !8, !9, !10, !11, !12, !13, !14} !0 = metadata !{metadata !"root"} !1 = metadata !{metadata !"char *", metadata !0} !2 = metadata !{metadata !"float3264 *", metadata !1} !3 = metadata !{metadata !"npy_intp **", metadata !1} !4 = metadata !{metadata !"tbaa_type(numpy shape, npy_intp *) *", metadata !3} !5 = metadata !{metadata !"tbaa_type(numpy strides, npy_intp *) *", metadata !3} !6 = metadata !{metadata !"unique0", metadata !1} !7 = metadata !{metadata !"unique1", metadata !1} !8 = metadata !{metadata !"unique2", metadata !1} !9 = metadata !{metadata !"unique3", metadata !1} !10 = metadata !{metadata !"unique4", metadata !1} !11 = metadata !{metadata !"unique5", metadata !1} !12 = metadata !{metadata !"unique6", metadata !1} !13 = metadata !{metadata !"unique7", metadata !1} !14 = metadata !{metadata !"object", metadata !1} DEBUG -- translate:361:translate ; ModuleID = 'numba_executable_module' @PyArray_API = linkonce_odr global i8** inttoptr (i64 4349352640 to i8**) define void @Py_INCREF({ i64, i8* }* %obj) { decl: %obj1 = alloca { i64, i8* }* store { i64, i8* }* %obj, { i64, i8* }** %obj1 %0 = bitcast { i64, i8* }* %obj to i64* %1 = load i64* %0 %2 = add i64 %1, 1 store i64 %2, i64* %0 ret void } define void @Py_DECREF({ i64, i8* }* %obj) { decl: %obj1 = alloca { i64, i8* }* store { i64, i8* }* %obj, { i64, i8* }** %obj1 %0 = bitcast { i64, i8* }* %obj to i64* %1 = load i64* %0 %2 = icmp sgt i64 %1, 1 br i1 %2, label %if.then, label %if.else if.then: ; preds = %decl %3 = bitcast { i64, i8* }* %obj to i64* %4 = add i64 %1, -1 store i64 %4, i64* %3 br label %if.end if.else: ; preds = %decl call void @Py_DecRef({ i64, i8* }* %obj) br label %if.end if.end: ; preds = %if.else, %if.then ret void } declare void @Py_DecRef({ i64, i8* }*) define void @Py_XINCREF({ i64, i8* }* %obj) { decl: %obj1 = alloca { i64, i8* }* store { i64, i8* }* %obj, { i64, i8* }** %obj1 %0 = ptrtoint { i64, i8* }* %obj to i64 %1 = icmp ne i64 %0, 0 br i1 %1, label %if.then, label %if.end if.then: ; preds = %decl %2 = bitcast { i64, i8* }* %obj to i64* %3 = load i64* %2 %4 = add i64 %3, 1 store i64 %4, i64* %2 br label %if.end if.end: ; preds = %if.then, %decl ret void } define void @Py_XDECREF({ i64, i8* }* %obj) { decl: %obj1 = alloca { i64, i8* }* store { i64, i8* }* %obj, { i64, i8* }** %obj1 %0 = ptrtoint { i64, i8* }* %obj to i64 %1 = icmp ne i64 %0, 0 br i1 %1, label %if.then, label %if.end if.then: ; preds = %decl call void @Py_DECREF({ i64, i8* }* %obj) br label %if.end if.end: ; preds = %if.then, %decl ret void } define i8* @IndexAxis(i8* %data, i64* %in_shape, i64* %in_strides, i64 %src_dim, i64 %index) { decl: %data1 = alloca i8* %in_shape2 = alloca i64* %in_strides3 = alloca i64* %src_dim4 = alloca i64 %index5 = alloca i64 %result = alloca i8* store i8* %data, i8** %data1 store i64* %in_shape, i64** %in_shape2 store i64* %in_strides, i64** %in_strides3 store i64 %src_dim, i64* %src_dim4 store i64 %index, i64* %index5 %0 = load i64** %in_strides3 %1 = load i64* %src_dim4 %2 = getelementptr inbounds i64* %0, i64 %1 %3 = load i64* %2 %4 = mul i64 %3, %index %5 = load i8** %data1 %6 = getelementptr inbounds i8* %5, i64 %4 store i8* %6, i8** %result ret i8* %6 } define void @NewAxis(i64* %out_shape, i64* %out_strides, i32 %dst_dim) { decl: %out_shape1 = alloca i64* %out_strides2 = alloca i64* %dst_dim3 = alloca i32 store i64* %out_shape, i64** %out_shape1 store i64* %out_strides, i64** %out_strides2 store i32 %dst_dim, i32* %dst_dim3 %0 = load i64** %out_shape1 %1 = getelementptr inbounds i64* %0, i32 %dst_dim store i64 1, i64* %1 %2 = load i64** %out_strides2 %3 = load i32* %dst_dim3 %4 = getelementptr inbounds i64* %2, i32 %3 store i64 0, i64* %4 ret void } define i32 @Broadcast(i64* %dst_shape, i64* %src_shape, i64* %src_strides, i32 %max_ndim, i32 %ndim) { decl: %dst_shape1 = alloca i64* %src_shape2 = alloca i64* %src_strides3 = alloca i64* %max_ndim4 = alloca i32 %ndim5 = alloca i32 %0 = alloca i32 store i64* %dst_shape, i64** %dst_shape1 store i64* %src_shape, i64** %src_shape2 store i64* %src_strides, i64** %src_strides3 store i32 %max_ndim, i32* %max_ndim4 store i32 %ndim, i32* %ndim5 %1 = load i32* %max_ndim4 %2 = sub i32 %1, %ndim store i32 0, i32* %0 br label %loop.cond loop.cond: ; preds = %if.end11, %decl %3 = load i32* %0 %4 = load i32* %ndim5 %5 = icmp slt i32 %3, %4 br i1 %5, label %loop.body, label %loop.end loop.body: ; preds = %loop.cond %6 = load i64** %src_shape2 %7 = getelementptr inbounds i64* %6, i32 %3 %8 = add i32 %3, %2 %9 = load i64** %dst_shape1 %10 = getelementptr inbounds i64* %9, i32 %8 %11 = load i64* %7 %12 = icmp eq i64 %11, 1 br i1 %12, label %if.then, label %if.else loop.end: ; preds = %if.else7, %loop.cond %merge = phi i32 [ 1, %loop.cond ], [ 0, %if.else7 ] ret i32 %merge if.then: ; preds = %loop.body %13 = load i64** %src_strides3 %14 = getelementptr inbounds i64* %13, i32 %3 store i64 0, i64* %14 br label %if.end11 if.else: ; preds = %loop.body %15 = load i64* %10 %16 = icmp eq i64 %15, 1 br i1 %16, label %if.then6, label %if.else7 if.then6: ; preds = %if.else store i64 %11, i64* %10 br label %if.end11 if.else7: ; preds = %if.else %17 = icmp ne i64 %11, %15 br i1 %17, label %loop.end, label %if.end11 if.end11: ; preds = %if.else7, %if.then6, %if.then %18 = load i32* %0 %19 = add i32 %18, 1 store i32 %19, i32* %0 br label %loop.cond } define void @__numba_specialized_0___main___2E_nbody_hy({ i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev) { entry: %0 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle to { i64, i8* }* tail call void @Py_INCREF({ i64, i8* }* %0) %1 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, i64 0, i32 2 %2 = load i8** %1, align 8, !tbaa !2 %3 = ptrtoint i8* %2 to i64 %4 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle, i64 0, i32 5 %5 = load i64** %4, align 8, !tbaa !5 %6 = load i64* %5, align 8 %7 = getelementptr i64* %5, i64 1 %8 = load i64* %7, align 8 %9 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev to { i64, i8* }* tail call void @Py_INCREF({ i64, i8* }* %9) %10 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev, i64 0, i32 2 %11 = load i8** %10, align 8, !tbaa !2 %12 = ptrtoint i8* %11 to i64 %13 = getelementptr { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev, i64 0, i32 5 %14 = load i64** %13, align 8, !tbaa !5 %15 = load i64* %14, align 8 %16 = getelementptr i64* %14, i64 1 %17 = load i64* %16, align 8 %18 = shl i64 %17, 1 %19 = shl i64 %8, 1 %20 = sub i64 0, %3 %21 = sub i64 0, %12 br label %"loop_body_4:8" "for_condition_3:16.loopexit": ; preds = %"loop_body_20:29" %22 = add i64 %25, 1 %exitcond33 = icmp eq i64 %22, 15 br i1 %exitcond33, label %"exit_for_3:4", label %"loop_body_4:8" "exit_for_3:4": ; preds = %"for_condition_3:16.loopexit" %23 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particlev to { i64, i8* }* %24 = bitcast { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %particle to { i64, i8* }* tail call void @Py_XDECREF({ i64, i8* }* %23) tail call void @Py_XDECREF({ i64, i8* }* %24) ret void "loop_body_4:8": ; preds = %"for_condition_3:16.loopexit", %entry %25 = phi i64 [ 0, %entry ], [ %22, %"for_condition_3:16.loopexit" ] br label %"loop_body_5:12" "for_condition_4:17.loopexit": ; preds = %"exit_if_7:16" %26 = add i64 %27, 1 %exitcond31 = icmp eq i64 %26, 500 br i1 %exitcond31, label %"loop_body_20:29", label %"loop_body_5:12" "loop_body_5:12": ; preds = %"for_condition_4:17.loopexit", %"loop_body_4:8" %27 = phi i64 [ 0, %"loop_body_4:8" ], [ %26, %"for_condition_4:17.loopexit" ] %28 = mul i64 %27, %15 %29 = add i64 %28, %17 %30 = add i64 %28, %18 %31 = mul i64 %27, %6 %32 = add i64 %31, %8 %33 = add i64 %31, %19 br label %"loop_body_7:16" "loop_body_7:16": ; preds = %"exit_if_7:16", %"loop_body_5:12" %lsr.iv = phi i64 [ %lsr.iv.next, %"exit_if_7:16" ], [ %20, %"loop_body_5:12" ] %Fz_417 = phi double [ 0.000000e+00, %"loop_body_5:12" ], [ %Fz_6, %"exit_if_7:16" ] %Fy_416 = phi double [ 0.000000e+00, %"loop_body_5:12" ], [ %Fy_6, %"exit_if_7:16" ] %Fx_415 = phi double [ 0.000000e+00, %"loop_body_5:12" ], [ %Fx_6, %"exit_if_7:16" ] %34 = phi i64 [ 0, %"loop_body_5:12" ], [ %35, %"exit_if_7:16" ] %35 = add i64 %34, 1 %36 = icmp eq i64 %27, %34 br i1 %36, label %"exit_if_7:16", label %"if_body_8:20" "exit_if_7:16": ; preds = %"loop_body_7:16", %"if_body_8:20" %Fx_6 = phi double [ %68, %"if_body_8:20" ], [ %Fx_415, %"loop_body_7:16" ] %Fy_6 = phi double [ %70, %"if_body_8:20" ], [ %Fy_416, %"loop_body_7:16" ] %Fz_6 = phi double [ %72, %"if_body_8:20" ], [ %Fz_417, %"loop_body_7:16" ] %sunkaddr = ptrtoint i8* %11 to i64 %sunkaddr23 = add i64 %sunkaddr, %28 %sunkaddr24 = inttoptr i64 %sunkaddr23 to double* %37 = load double* %sunkaddr24, align 8, !tbaa !2 %38 = fmul double %Fx_6, 1.000000e-02 %39 = fadd double %38, %37 store double %39, double* %sunkaddr24, align 8, !tbaa !2 %sunkaddr25 = ptrtoint i8* %11 to i64 %sunkaddr26 = add i64 %sunkaddr25, %29 %sunkaddr27 = inttoptr i64 %sunkaddr26 to double* %40 = load double* %sunkaddr27, align 8, !tbaa !2 %41 = fmul double %Fy_6, 1.000000e-02 %42 = fadd double %41, %40 store double %42, double* %sunkaddr27, align 8, !tbaa !2 %sunkaddr28 = ptrtoint i8* %11 to i64 %sunkaddr29 = add i64 %sunkaddr28, %30 %sunkaddr30 = inttoptr i64 %sunkaddr29 to double* %43 = load double* %sunkaddr30, align 8, !tbaa !2 %44 = fmul double %Fz_6, 1.000000e-02 %45 = fadd double %44, %43 store double %45, double* %sunkaddr30, align 8, !tbaa !2 %lsr.iv.next = sub i64 %lsr.iv, %6 %exitcond = icmp eq i64 500, %35 br i1 %exitcond, label %"for_condition_4:17.loopexit", label %"loop_body_7:16" "if_body_8:20": ; preds = %"loop_body_7:16" %46 = mul i64 %lsr.iv, -1 %47 = inttoptr i64 %46 to double* %48 = load double* %47, align 8, !tbaa !2 %sunkaddr31 = ptrtoint i8* %2 to i64 %sunkaddr32 = add i64 %sunkaddr31, %31 %sunkaddr33 = inttoptr i64 %sunkaddr32 to double* %49 = load double* %sunkaddr33, align 8, !tbaa !2 %50 = fsub double %48, %49 %51 = mul i64 %lsr.iv, -1 %sunkaddr34 = add i64 %8, %51 %sunkaddr35 = inttoptr i64 %sunkaddr34 to double* %52 = load double* %sunkaddr35, align 8, !tbaa !2 %sunkaddr36 = ptrtoint i8* %2 to i64 %sunkaddr37 = add i64 %sunkaddr36, %32 %sunkaddr38 = inttoptr i64 %sunkaddr37 to double* %53 = load double* %sunkaddr38, align 8, !tbaa !2 %54 = fsub double %52, %53 %55 = mul i64 %lsr.iv, -1 %sunkaddr39 = mul i64 %8, 2 %sunkaddr40 = add i64 %55, %sunkaddr39 %sunkaddr41 = inttoptr i64 %sunkaddr40 to double* %56 = load double* %sunkaddr41, align 8, !tbaa !2 %sunkaddr42 = ptrtoint i8* %2 to i64 %sunkaddr43 = add i64 %sunkaddr42, %33 %sunkaddr44 = inttoptr i64 %sunkaddr43 to double* %57 = load double* %sunkaddr44, align 8, !tbaa !2 %58 = fsub double %56, %57 %59 = fmul double %50, %50 %60 = fmul double %54, %54 %61 = fadd double %59, %60 %62 = fmul double %58, %58 %63 = fadd double %61, %62 %64 = tail call double @"numba.math.['double'].sqrt"(double %63) %65 = fadd double %64, %63 %66 = fdiv double 1.000000e+00, %65 %67 = fmul double %50, %66 %68 = fadd double %Fx_415, %67 %69 = fmul double %54, %66 %70 = fadd double %Fy_416, %69 %71 = fmul double %58, %66 %72 = fadd double %Fz_417, %71 br label %"exit_if_7:16" "loop_body_20:29": ; preds = %"for_condition_4:17.loopexit", %"loop_body_20:29" %lsr.iv21 = phi i64 [ %lsr.iv.next22, %"loop_body_20:29" ], [ 500, %"for_condition_4:17.loopexit" ] %lsr.iv15 = phi i64 [ %lsr.iv.next16, %"loop_body_20:29" ], [ %21, %"for_condition_4:17.loopexit" ] %lsr.iv4 = phi i8* [ %89, %"loop_body_20:29" ], [ %2, %"for_condition_4:17.loopexit" ] %lsr.iv414 = bitcast i8* %lsr.iv4 to double* %lsr.iv45 = bitcast i8* %lsr.iv4 to i1* %73 = load double* %lsr.iv414, align 8, !tbaa !2 %74 = mul i64 %lsr.iv15, -1 %75 = inttoptr i64 %74 to double* %76 = load double* %75, align 8, !tbaa !2 %77 = fmul double %76, 1.000000e-02 %78 = fadd double %73, %77 store double %78, double* %lsr.iv414, align 8, !tbaa !2 %scevgep12 = getelementptr i8* %lsr.iv4, i64 %8 %scevgep1213 = bitcast i8* %scevgep12 to double* %79 = load double* %scevgep1213, align 8, !tbaa !2 %80 = mul i64 %lsr.iv15, -1 %sunkaddr45 = add i64 %17, %80 %sunkaddr46 = inttoptr i64 %sunkaddr45 to double* %81 = load double* %sunkaddr46, align 8, !tbaa !2 %82 = fmul double %81, 1.000000e-02 %83 = fadd double %79, %82 %scevgep10 = getelementptr i8* %lsr.iv4, i64 %8 %scevgep1011 = bitcast i8* %scevgep10 to double* store double %83, double* %scevgep1011, align 8, !tbaa !2 %sunkaddr47 = ptrtoint i8* %lsr.iv4 to i64 %sunkaddr48 = mul i64 %8, 2 %sunkaddr49 = add i64 %sunkaddr47, %sunkaddr48 %sunkaddr50 = inttoptr i64 %sunkaddr49 to double* %84 = load double* %sunkaddr50, align 8, !tbaa !2 %85 = mul i64 %lsr.iv15, -1 %sunkaddr51 = mul i64 %17, 2 %sunkaddr52 = add i64 %85, %sunkaddr51 %sunkaddr53 = inttoptr i64 %sunkaddr52 to double* %86 = load double* %sunkaddr53, align 8, !tbaa !2 %87 = fmul double %86, 1.000000e-02 %88 = fadd double %84, %87 %sunkaddr54 = ptrtoint i8* %lsr.iv4 to i64 %sunkaddr55 = mul i64 %8, 2 %sunkaddr56 = add i64 %sunkaddr54, %sunkaddr55 %sunkaddr57 = inttoptr i64 %sunkaddr56 to double* store double %88, double* %sunkaddr57, align 8, !tbaa !2 %scevgep = getelementptr i1* %lsr.iv45, i64 %6 %89 = bitcast i1* %scevgep to i8* %lsr.iv.next16 = sub i64 %lsr.iv15, %15 %lsr.iv.next22 = add i64 %lsr.iv21, -1 %exitcond32 = icmp eq i64 %lsr.iv.next22, 0 br i1 %exitcond32, label %"for_condition_3:16.loopexit", label %"loop_body_20:29" } declare double @"numba.math.['double'].sqrt"(double) define { i64, i8* }* @__numba_specialized_1_numba_2E_codegen_2E_llvmwrapper_2E___numba_wrapper_nbody_hy(i8* %self, { i64, i8* }* %args) { entry: %0 = alloca { i64, i8* }* %1 = alloca { i64, i8* }* %return_value = alloca { i64, i8* }* %2 = call i32 ({ i64, i8* }*, i8*, ...)* @PyArg_ParseTuple({ i64, i8* }* %args, i8* getelementptr inbounds ([3 x i8]* @__STR_0, i32 0, i32 0), { i64, i8* }** %1, { i64, i8* }** %0) %3 = icmp eq i32 %2, 0 br i1 %3, label %cleanup.if.true, label %cleanup.if.end cleanup_label: ; preds = %empty1, %error_label %4 = load { i64, i8* }** %return_value ret { i64, i8* }* %4 error_label: ; preds = %empty2, %cleanup.if.true store { i64, i8* }* null, { i64, i8* }** %return_value %5 = load { i64, i8* }** %return_value, !tbaa !14 call void @Py_XINCREF({ i64, i8* }* %5) br label %cleanup_label cleanup.if.true: ; preds = %entry br label %error_label cleanup.if.end: ; preds = %entry %6 = load { i64, i8* }** %1 %7 = load { i64, i8* }** %0 %8 = bitcast { i64, i8* }* %6 to { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %9 = bitcast { i64, i8* }* %7 to { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* call void @__numba_specialized_0___main___2E_nbody_hy({ i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %8, { i64, i8*, i8*, i32, i64*, i64*, i8*, i8*, i32, i8* }* %9) br label %empty empty: ; preds = %cleanup.if.end %10 = call i8* @PyErr_Occurred() %11 = ptrtoint i8* %10 to i64 %12 = icmp ne i64 %11, 0 br i1 %12, label %empty2, label %empty1 empty1: ; preds = %empty store { i64, i8* }* inttoptr (i64 4296527248 to { i64, i8* }*), { i64, i8* }** %return_value %13 = load { i64, i8* }** %return_value, !tbaa !14 call void @Py_XINCREF({ i64, i8* }* %13) br label %cleanup_label empty2: ; preds = %empty br label %error_label } declare i8* @PyErr_Occurred() declare { i64, i8* }* @Py_BuildValue(i8*, ...) declare i32 @PyArg_ParseTuple({ i64, i8* }*, i8*, ...) declare void @PyErr_Clear() !tbaa = !{!0, !1, !2, !3, !4, !5, !6, !7, !8, !9, !10, !11, !12, !13, !14, !0, !1, !14} !0 = metadata !{metadata !"root"} !1 = metadata !{metadata !"char *", metadata !0} !2 = metadata !{metadata !"float3264 *", metadata !1} !3 = metadata !{metadata !"npy_intp **", metadata !1} !4 = metadata !{metadata !"tbaa_type(numpy shape, npy_intp *) *", metadata !3} !5 = metadata !{metadata !"tbaa_type(numpy strides, npy_intp *) *", metadata !3} !6 = metadata !{metadata !"unique0", metadata !1} !7 = metadata !{metadata !"unique1", metadata !1} !8 = metadata !{metadata !"unique2", metadata !1} !9 = metadata !{metadata !"unique3", metadata !1} !10 = metadata !{metadata !"unique4", metadata !1} !11 = metadata !{metadata !"unique5", metadata !1} !12 = metadata !{metadata !"unique6", metadata !1} !13 = metadata !{metadata !"unique7", metadata !1} !14 = metadata !{metadata !"object", metadata !1}
This new, compiled version of the hybrid function is yet faster.
particle = np.random.standard_normal((nParticles, 3))
particlev = np.zeros_like(particle)
%time nbody_nb(particle, particlev)
CPU times: user 69.4 ms, sys: 113 µs, total: 69.5 ms Wall time: 69.6 ms
The dynamically compiled version is by far the fastest.
func_list = [nbody_py, nbody_np, nbody_nb]
perf_comp(func_list, data='particle, particlev', rep=3)
function: nbody_nb, av. time sec: 0.05783, relative: 1.0 function: nbody_np, av. time sec: 0.50086, relative: 8.7 function: nbody_py, av. time sec: 74.38617, relative: 1286.2
10 years ago, Python was considered exotic in the analytics space – at best. Languages/packages like R and Matlab dominated the scene. Today, Python has become a major force in data analytics & visualization due to a number of characteristics:
One of the easiest ways to deploy Python today across a whole organization for collaboration with a heterogenous IT infrastructure is via Wakari.io, Continuum's Web-/Browser- and Python-based Data Analytics environment. It is availble both as a cloud as well as an enterprise solution for in-house deployment.
The Python Quants – the company Web site
Dr. Yves J. Hilpisch – my personal Web site
Derivatives Analytics with Python – my new book
Contact Us