Python for Science and Engineering: a presentation to A*STAR and the Singapore Computational Sciences Club, Edward Schofield, Python Charmers, June 2011
An introduction to Python in science and engineering.
The presentation was given by Dr Edward Schofield of Python Charmers (www.pythoncharmers.com) to A*STAR and the Singapore Computational Sciences Club in June 2011.
1. Python for Science and Engineering
Dr Edward Schofield
A*STAR / Singapore Computational Sciences Club Seminar
June 14, 2011
2. Scientific programming in 2011
Most scientists and engineers are:
programming for 50+% of their work time (and rising)
self-taught programmers
using inefficient programming practices
using the wrong programming languages: C++, FORTRAN, C#, PHP, Java, ...
4. Ed's story:
How I found Python
PhD in statistical pattern recognition: 2001-2006
Needed good tools for my research!
Discovered Python in 2002 after frustration with C++, Matlab,
Java, Perl
Contributed to NumPy and SciPy:
maxent, sparse matrices, optimization, Monte Carlo, etc.
Managed six releases of SciPy in 2005-6
9. Python’s standard library
data types, strings, networking, threads,
operating system, compression, GUI, arguments,
complex numbers, CGI, FTP, cryptography,
testing, multimedia, databases, CSV files,
calendar, email, XML, serialization
10. What is an efficient programming language?
Native Python code executes 10x more slowly than C and FORTRAN
11. Would you build a racing car ...
... to get to Kuala Lumpur ASAP?
12. Cost per GFLOPS
Date        Cost per GFLOPS (US $)   Technology
1961        $1.1 trillion            17 million IBM 1620s
1984        $15,000,000              Cray X-MP
1997        $30,000                  Two 16-CPU clusters of Pentiums
2000, Apr   $1,000                   Bunyip Beowulf cluster
2003, Aug   $82                      KASY0
2007, Mar   $0.42                    Ambric AM2045
2009, Sep   $0.13                    ATI Radeon R800
Source: Wikipedia: “FLOPS”
23. Languages used at CSIRO
Python Fortran Java
Matlab C VB.net
IDL C++ R
Perl C# +5-10 others!
24. Which language do I choose?
A different language for each task?
A language you know?
A language others in your team are using: support and help?
25. Python Matlab
Interpreted Yes Yes
Powerful data input/output Yes Yes
Great plotting Yes Yes
General-purpose language Powerful Limited
Cost Free $$$
Open source Yes No
26. Python C++
Powerful Yes Yes
Portable Yes In theory
Standard libraries Vast Limited
Easy to write and maintain Yes No
Easy to learn Yes No
27. Python C
Fast to write Yes No
Good for embedded systems, device drivers and operating systems No Yes
Good for most other high-level tasks Yes No
Standard library Vast Limited
28. Python Java
Powerful, well-designed language Yes Yes
Standard libraries Vast Vast
Easy to learn Yes No
Code brevity Short Verbose
Easy to write and maintain Yes Okay
29. Open source
Python is open source software
Benefits:
No vendor lock-in
Cross-platform
Insurance against bugs in the platform
Free
31. Python success stories (2)
Aerospace:
NASA
Research:
universities worldwide ...
Others:
YouTube, Reddit, BitTorrent, Civilization IV,
32. Industrial Light & Magic
Python spread from
scripting to the entire
production pipeline
Numerous reviews since
1996: Python is still the
best tool for them
33. United Space Alliance
A common sentiment:
“We achieve immediate functioning code so much faster in
Python than in any other language that it’s staggering.”
- Robin Friedrich, Senior Project Engineer
34. Case study: air-traffic control
Eric Newton, “Python for Critical Applications”: http://metaslash.com/brochure/recall.html
Metaslash, Inc: 1999 to 2001
Mission-critical system for
air-traffic control
Replicated, fault-tolerant
data storage
35. Case study: air-traffic control
Python prototype -> C++ implementation -> Python again
Why?
C++ dependencies were buggy
C++ threads, STL were not portable enough
Python’s advantages over C++
More portable
75% less code: more productivity, fewer bugs
36. More case studies
See http://www.python.org/about/success/ for lots more case
studies and success stories
41. NumPy
The most fundamental tool for numerical computing in
Python
Fast multi-dimensional array capability
42. What NumPy defines:
Two fundamental objects:
1. n-dimensional array
2. universal function
a rich set of numerical data types
nearly 400 functions and methods on arrays:
type conversions
mathematical
logical
43. NumPy's features
Fast. Written in C with BLAS/LAPACK hooks.
Rich set of data types
Linear algebra: matrix inversion, decompositions, …
Discrete Fourier transforms
Random number generation
Trig, hypergeometric functions, etc.
44. Elementwise array operations
Loops are mostly unnecessary
Operate on entire arrays!
>>> a = numpy.array([20, 30, 40, 50])
>>> a < 35
array([True, True, False, False], dtype=bool)
>>> b = numpy.arange(4)
>>> a - b
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
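To make "loops are mostly unnecessary" concrete, here is a minimal sketch that computes the same subtraction twice: once with an explicit Python loop and once as a single array expression. The variable names `looped` and `vectorized` are illustrative and not from the slides.

```python
import numpy

a = numpy.array([20, 30, 40, 50])
b = numpy.arange(4)

# Explicit loop: build the result one element at a time.
looped = []
for i in range(len(a)):
    looped.append(a[i] - b[i])

# Array operation: the same subtraction applied to the whole arrays at once.
vectorized = a - b

print(looped)
print(vectorized)
```

Both give the same values; the array form is shorter and the looping happens inside NumPy's compiled code.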
45. Universal functions
NumPy defines 'ufuncs' that operate on entire arrays
and other sequences (hence 'universal')
Example: sin()
>>> a = numpy.array([20, 30, 40, 50])
>>> c = 10 * numpy.sin(a)
>>> c
array([ 9.12945251, -9.88031624, 7.4511316 ,
-2.62374854])
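The "other sequences" part can be illustrated with a short sketch: a ufunc accepts a plain Python list and returns an array. The `reduce` call shows that ufuncs also carry methods; the input values here are made up for illustration.

```python
import numpy

angles = [0.0, numpy.pi / 2, numpy.pi]    # a plain Python list, not an array

# The ufunc converts the list to an array and applies sin elementwise.
s = numpy.sin(angles)

# Ufuncs also have methods; add.reduce sums along an axis.
total = numpy.add.reduce([1, 2, 3, 4])

print(type(s), s)
print(total)
```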
46. Array slicing
Arrays can be sliced and indexed powerfully:
>>> a = numpy.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343,
512, 729])
>>> a[2:5]
array([ 8, 27, 64])
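A few more slicing forms, as a sketch rather than a complete tour: a step, a reversal, a column of a 2-D array, and the fact that a basic slice is a view onto the original data. The 2-D array `m` is a made-up example.

```python
import numpy

a = numpy.arange(10)**3

every_second = a[::2].tolist()       # start:stop:step with only a step given
last_three_reversed = a[::-1][:3].tolist()

m = numpy.arange(12).reshape(3, 4)   # 3 rows, 4 columns
second_column = m[:, 1].tolist()     # all rows, column 1

# Basic slices are views: writing through the slice changes the original.
block = m[:2, :2]
block[0, 0] = 99

print(every_second, last_three_reversed, second_column)
print(m[0, 0])
```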
47. Fancy indexing
Arrays can be used as indices into other arrays:
>>> a = numpy.arange(12)**2
>>> ind = numpy.array([ 1, 1, 3, 8, 5 ])
>>> a[ind]
array([ 1, 1, 9, 64, 25])
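Boolean arrays can be used as indices in the same way as integer arrays. The sketch below selects by condition and then assigns through a mask; the threshold of 50 is arbitrary.

```python
import numpy

a = numpy.arange(12)**2

mask = a > 50            # boolean array, True where the condition holds
large = a[mask]          # indexing with it returns a copy of the matching elements

a[a % 2 == 1] = 0        # assigning through a mask changes a in place

print(large)
print(a)
```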
48. Other linear algebra features
Matrix inversion: mat(A).I
Or: linalg.inv(A)
Linear solvers: linalg.solve(A, b) solves Ax = b for x
Pseudoinverse: linalg.pinv(A)
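A small worked example of these calls, using a made-up 2x2 system. For solving Ax = b, `linalg.solve` is generally preferable to forming the inverse explicitly; the inverse and pseudoinverse are shown here only to confirm that all three agree on a well-conditioned square matrix.

```python
import numpy
from numpy import linalg

# Hypothetical system: 3x + y = 9, x + 2y = 8
A = numpy.array([[3.0, 1.0],
                 [1.0, 2.0]])
b = numpy.array([9.0, 8.0])

x = linalg.solve(A, b)             # solves A x = b directly
x_inv = linalg.inv(A).dot(b)       # same answer via the explicit inverse
x_pinv = linalg.pinv(A).dot(b)     # same answer via the pseudoinverse

print(x, x_inv, x_pinv)
```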
49. What is SciPy?
A community
A conference
A package of scientific libraries
50. Python for scientific software
Back-end: computational work
Front-end: input / output, visualization, GUIs
Dozens of great scientific packages exist
51. Python in science (2)
NumPy: numerical / array module
Matplotlib: great 2D and 3D plotting library
IPython: nice interactive Python shell
SciPy: set of scientific libraries: sparse matrices, signal
processing, …
RPy: integration with the R statistical environment
52. Python in science (3)
Cython: C language extensions
Mayavi: 3D graphics, volumetric rendering
Nitime, Nipype: Python tools for neuroimaging
SymPy: symbolic mathematics library
53. Python in science (4)
VPython: easy, real-time 3D programming
UCSF Chimera, PyMOL, VMD: molecular graphics
PyRAF: Hubble Space Telescope interface to the IRAF astronomical data analysis system
BioPython: computational molecular biology
Natural language toolkit: symbolic + statistical NLP
Physics: PyROOT
54. The SciPy package
BSD-licensed software for maths, science, engineering
integration, signal processing, sparse matrices,
optimization, linear algebra, maximum entropy,
interpolation, ODEs, statistics,
n-dim image processing, FFTs, scientific constants,
C/C++ and Fortran integration, clustering
56. Example: fitting a model with scipy.optimize
Task: Fit a model of the form y = (a / x^b) sin(cx) + ε to noisy data.
Spec:
1. Generate noisy data
2. Choose parameters (a, b, c) to minimize sum squared
errors
3. Plot the data and fitted model (next session)
57. SciPy optimisation example
import numpy
import pylab
from scipy.optimize import leastsq

def myfunc(params, x):
    (a, b, c) = params
    return a / (x**b) * numpy.sin(c * x)

true_params = [1.5, 0.1, 2.]

def f(x):
    return myfunc(true_params, x)

def err(params, x, y):  # error function
    return myfunc(params, x) - y
58. SciPy optimisation example
# Generate noisy data to fit
n = 30; xmin = 0.1; xmax = 5
x = numpy.linspace(xmin, xmax, n)
y = f(x)
y += numpy.random.rand(len(x)) * 0.2 * (y.max() - y.min())
v0 = [3., 1., 4.]  # initial param estimate
# Fitting
v, success = leastsq(err, v0, args=(x, y), maxfev=10000)
print('Estimated parameters:', v)
print('True parameters:', true_params)
X = numpy.linspace(xmin, xmax, 5 * n)
pylab.plot(x, y, 'ro', X, myfunc(v, X))
pylab.show()
62. Sparse matrices
Sparse matrices are mostly zeros.
They can be symmetric or
asymmetric.
Sparsity patterns vary:
block sparse, band matrices, ...
They can be huge!
Only non-zeros are stored.
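To illustrate "only non-zeros are stored", here is a minimal sketch of coordinate (triplet) storage written with plain NumPy. It is not SciPy's implementation; the 4x4 matrix, the helper `coo_matvec`, and all variable names are hypothetical.

```python
import numpy

# Hypothetical 4x4 matrix with three non-zeros, stored as (row, col, value) triplets.
rows = numpy.array([0, 1, 3])
cols = numpy.array([0, 2, 3])
vals = numpy.array([2.0, 5.0, 7.0])
shape = (4, 4)

def coo_matvec(rows, cols, vals, shape, x):
    """Multiply the triplet-stored matrix by a dense vector x."""
    y = numpy.zeros(shape[0])
    # Accumulate vals[k] * x[cols[k]] into y[rows[k]]; add.at handles repeated rows.
    numpy.add.at(y, rows, vals * x[cols])
    return y

x = numpy.array([1.0, 2.0, 3.0, 4.0])
y = coo_matvec(rows, cols, vals, shape, x)

# Dense equivalent, for checking only: 16 stored numbers instead of 3.
dense = numpy.zeros(shape)
dense[rows, cols] = vals

print(y)
print(dense.dot(x))
```

Storage and the matrix-vector product both scale with the number of non-zeros rather than with the full matrix size, which is what makes huge sparse matrices tractable.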
63. Sparse matrices in SciPy
SciPy supports seven sparse storage schemes
... and sparse solvers in Fortran.
64. Sparse matrix creation
To construct a 1000x1000 lil_matrix and add values:
>>> from scipy.sparse import lil_matrix
>>> from numpy.random import rand
>>> from scipy.sparse.linalg import spsolve
>>> A = lil_matrix((1000, 1000))
>>> A[0, :100] = rand(100)
>>> A[1, 100:200] = A[0, :100]
>>> A.setdiag(rand(1000))
65. Solving sparse matrix systems
Now convert the matrix to CSR format and solve Ax=b:
>>> A = A.tocsr()
>>> b = rand(1000)
>>> x = spsolve(A, b)
# Convert it to a dense matrix and solve, and
# check that the result is the same:
>>> from numpy.linalg import solve, norm
>>> x_ = solve(A.todense(), b)
# Compute norm of the error:
>>> err = norm(x - x_)
>>> err < 1e-10
True
66. Matplotlib
Great plotting package in Python
Matlab-like syntax
Great rendering: anti-aliasing etc.
Many ‘backends’: Cairo, GTK, Cocoa, PDF
Flexible output: to EPS, PS, PDF, TIFF, PNG, ...
68. Example: NumPy vectorization
1. Use a Monte Carlo algorithm to estimate π:
   1. Generate uniform random variates (x, y) over [0, 1] × [0, 1].
   2. Estimate π as 4p, where p is the proportion that land inside the unit circle (x² + y² ≤ 1).
2. Time two ways of doing this:
   1. Using for loops
   2. Using array operations (vectorized)
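One possible solution to this exercise is sketched below. Points drawn uniformly from the unit square land inside the quarter circle x² + y² ≤ 1 with probability π/4, so π is estimated as four times the observed proportion. The function names, the sample size and the fixed seed are assumptions made for illustration; timings depend on the machine.

```python
import timeit
import numpy

def estimate_pi_loop(n, rng):
    """Estimate pi with an explicit Python loop over n random points."""
    hits = 0
    for _ in range(n):
        x = rng.random_sample()
        y = rng.random_sample()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n

def estimate_pi_vectorized(n, rng):
    """Estimate pi using array operations on all n points at once."""
    xy = rng.random_sample((n, 2))            # n points in the unit square
    inside = (xy ** 2).sum(axis=1) <= 1.0     # boolean array: inside the circle?
    return 4.0 * inside.mean()                # mean of booleans = proportion True

n = 100000
rng = numpy.random.RandomState(0)             # fixed seed so runs are repeatable

t0 = timeit.default_timer()
pi_loop = estimate_pi_loop(n, rng)
t1 = timeit.default_timer()
pi_vec = estimate_pi_vectorized(n, rng)
t2 = timeit.default_timer()

print("loop:       pi ~ %.4f in %.4f s" % (pi_loop, t1 - t0))
print("vectorized: pi ~ %.4f in %.4f s" % (pi_vec, t2 - t1))
```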
72. Python for HPC
Advantages Disadvantages
Portability Global interpreter lock
Easy scripting, glue Less control than C
Maintainability Native loops are slow
Profiling to identify hotspots
Vectorization with NumPy
73. Large data sets
Useful Python language features:
Generators, iterators
Useful packages:
Great HDF5 support from PyTables!
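As a sketch of why generators help with large data sets: the pipeline below processes one value at a time, so memory use does not grow with the input. The helpers `read_values` and `running_mean` are hypothetical, and the in-memory list of lines stands in for a large file opened with `open(...)`.

```python
def read_values(lines):
    """Yield one float per non-empty line, without building a list."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)

def running_mean(values):
    """Consume any iterable of numbers and return its mean."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count if count else float("nan")

# Stand-in for a file object; a real file is also an iterator over its lines.
lines = iter(["1.0\n", "2.5\n", "\n", "4.0\n"])
mean = running_mean(read_values(lines))
print(mean)
```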
76. Applications of PyTables
aeronautics telecommunications
drug discovery data mining
financial analysis statistical analysis
climate prediction etc.
77. Breaking news: June 2011
PyTables Pro is now being open sourced.
Indexed searches for speed
Merging with PyTables
Working project name: NewPyTables
78. PyTables performance
OPSI indexing engine speed:
Querying 10 billion rows can take hundredths of a
second!
Target use-case:
mostly read-only or append-only data
80. Important principles
1. "Premature optimization is the root of all evil"
Don't write cryptic code just to make it more efficient!
2. 1-5% of the code takes up the vast majority of the
computing time!
... and it might not be the 1-5% that you think!
81. Checklist for efficient code
From most to least important:
1. Check: Do you really need to make it more efficient?
2. Check: Are you using the right algorithms and data
structures?
3. Check: Are you reusing pre-written libraries wherever
possible?
4. Check: Which parts of the code are expensive?
Measure, don't guess!
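"Measure, don't guess" can be done with the standard library's profiler. The sketch below profiles a small made-up program with `cProfile` and prints the most expensive calls by cumulative time; the three functions are placeholders for real code.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Deliberately loop-heavy function: the expected hotspot."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def cheap_setup():
    return list(range(10))

def main():
    cheap_setup()
    return slow_sum_of_squares(200000)

profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

For a whole script, `python -m cProfile -s cumulative script.py` produces a similar report without modifying the code.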
82. Relative efficiency gains
Exponential-order and polynomial-order speedups are
possible by choosing the right algorithm for a task.
These require the right data structures!
These dwarf 10-25x linear-order speedups from:
using lower-level languages
using different language constructs.
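A small illustration of an algorithmic speedup that depends on the data structure: checking a list for duplicates by comparing every pair takes time proportional to n², while remembering seen items in a set takes time proportional to n. Both functions are illustrative examples, not from the slides.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: roughly n*n/2 comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_linear(items):
    """Remember seen items in a set: one hash lookup per item."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False

unique = list(range(1000))
repeated = unique + [500]

print(has_duplicates_quadratic(unique), has_duplicates_linear(unique))
print(has_duplicates_quadratic(repeated), has_duplicates_linear(repeated))
```

At a million items the quadratic version needs on the order of 10^11 comparisons and the linear one about 10^6 lookups, a gap that no change of language closes.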
84. The largest Python training provider in South-East Asia
Delighted customers include:
85. Most popular course topics
Python for Programmers 3 days
Python for Scientists and Engineers 4 days
Python for Geoscientists 4 days
Python for Bioinformaticians 4 days
New courses:
Python for Financial Engineers 4 days
Python for IT Security Professionals 3 days
86. Python Charmers:
Topics of expertise
Python: beginners, advanced
Scientific data processing with Python
Software engineering with Python
Large-scale problems: HPC, huge data sets, grids
Statistics and Monte Carlo problems
87. Python Charmers:
Topics of expertise (2)
Spatial data analysis / GIS
General scripting, job control, glue
GUIs with PyQt
Integrating with other languages: R, C, C++, Fortran, ...
Web development in Django
88. How to get in touch
See PythonCharmers.com
or email us at: info@pythoncharmers.com