The Joy of SciPy
1. The Joy of SciPy
David Kammeyer
PUB February 14, 2013
2. Brief History
Person                                      Package                   Year
Jim Fulton                                  Matrix Object in Python   1994
Jim Hugunin                                 Numeric                   1995
Perry Greenfield, Rick White, Todd Miller   Numarray                  2001
Travis Oliphant                             NumPy                     2005
3. SciPy 2001
Founded in 2001 with Travis Vaught
• Travis Oliphant: optimize, sparse, interpolate, integrate, special, signal, stats, fftpack, misc
• Eric Jones: weave, cluster, GA*
• Pearu Peterson: linalg, interpolate, f2py
5. Community effort
• Chuck Harris
• Pauli Virtanen
• David Cournapeau
• Stefan van der Walt
• Dag Sverre Seljebotn
• Robert Kern
• Warren Weckesser
• Ralf Gommers
• Mark Wiebe
• Nathaniel Smith
6. Why Python for Technical Computing
• Syntax (it gets out of your way)
• Over-loadable operators
• Complex numbers built-in early
• Just enough language support for arrays
• “Occasional” programmers can grok it
• Supports multiple programming styles
• Expert programmers can also use it effectively
• Has a simple, extensible implementation
• General-purpose language --- can build a system
• Critical mass
7. Putting Science back in Comp Sci
• Much of the software stack is for systems
programming --- C++, Java, .NET, ObjC, web
- Complex numbers?
- Vectorized primitives?
• Array-oriented programming has been
supplanted by Object-oriented programming
• Software stack for scientists is not as helpful
as it should be
• Fortran is still where many scientists end up
8. NumPy: an Array-Oriented Extension
• Data: the array object
– slicing and shaping
– data-types map to bytes
• Fast Math:
– vectorization
– broadcasting
– aggregations
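The items on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk; the array shapes and variable names are made up for the example.

```python
import numpy as np

# Data: a 3x4 array of 64-bit floats.
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Slicing returns a view onto the same memory, not a copy.
col = a[:, 1]                       # second column
col_is_view = np.shares_memory(a, col)

# Shaping: the same 12 values seen as 4x3.
b = a.reshape(4, 3)

# The data-type says how each element maps to bytes.
itemsize = a.dtype.itemsize         # bytes per element
total_bytes = a.nbytes              # bytes for the whole array

# Vectorization: one expression applied to every element.
scaled = a * 2.0 + 1.0

# Broadcasting: a length-4 row is stretched across all 3 rows.
row = np.array([10.0, 20.0, 30.0, 40.0])
shifted = a + row

# Aggregations: reduce along an axis.
col_sums = a.sum(axis=0)
row_means = a.mean(axis=1)
```

Nothing here loops in Python; the per-element work happens inside NumPy.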
9. Zen of NumPy
• strided is better than scattered
• contiguous is better than strided
• descriptive is better than imperative
• array-oriented is better than object-oriented
• broadcasting is a great idea
• vectorized is better than an explicit loop
• unless it’s too complicated --- then use Cython/Numba
• think in higher dimensions
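A few of these maxims can be made concrete. The sketch below is an illustration under assumed sizes and hypothetical function names, not material from the talk: it contrasts a contiguous row with a strided column, an explicit loop with a vectorized call, and shows broadcasting into a higher dimension.

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)

# In a C-ordered array a row is contiguous: neighbours sit 8 bytes apart.
row = a[0, :]
# A column is strided: neighbours sit one full row (8000 bytes) apart.
col = a[:, 0]

row_contig = row.flags["C_CONTIGUOUS"]
col_contig = col.flags["C_CONTIGUOUS"]
row_strides = row.strides
col_strides = col.strides

def sum_of_squares_loop(x):
    # Explicit loop: one interpreter round-trip per element.
    total = 0.0
    for v in x:
        total += v * v
    return total

def sum_of_squares_vec(x):
    # Vectorized: a single call that does the loop inside NumPy.
    return float(np.dot(x, x))

x = np.arange(1000, dtype=np.float64)
loop_result = sum_of_squares_loop(x)
vec_result = sum_of_squares_vec(x)

# Thinking in higher dimensions: broadcasting a column against a row
# builds a 2-D multiplication table without any loop.
g = np.arange(3)
table = g[:, None] * g[None, :]
```

Both functions compute the same value; the vectorized one states what is wanted rather than how to iterate.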
12. Benefits of Array-oriented
• Many technical problems are naturally array-oriented
(easy to vectorize)
• Algorithms can be expressed at a high level
• These algorithms can be parallelized more
simply (quite often much information is lost in
the translation to typical “compiled” languages)
• Array-oriented algorithms map to modern
hardware caches and pipelines.
13. We need more focus on
compiled array-oriented
languages with fast compilers!
14. What is good about NumPy?
• Array-oriented
• Extensive Dtype System (including structures)
• C-API
• Simple to understand data-structure
• Memory mapping
• Syntax support from Python
• Large community of users
• Broadcasting
• Easy to interface C/C++/Fortran code
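Two items from this list, the dtype system with structures and memory mapping, work together. The sketch below is an illustration only; the `point` record layout and the file name `points.dat` are hypothetical.

```python
import os
import tempfile
import numpy as np

# A structured dtype: each element is a record with three named fields.
point = np.dtype([("id", np.int32), ("x", np.float64), ("y", np.float64)])
record_size = point.itemsize        # packed by default: 4 + 8 + 8 bytes

pts = np.zeros(3, dtype=point)
pts["id"] = [1, 2, 3]
pts["x"] = [0.0, 1.0, 2.0]
pts["y"] = [5.0, 6.0, 7.0]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "points.dat")   # hypothetical file name

    # Memory mapping: the array's storage is the file itself.
    mm = np.memmap(path, dtype=point, mode="w+", shape=(3,))
    mm[:] = pts
    mm.flush()
    del mm                                   # close the writable mapping

    # Reopen read-only and access a field by name, straight from the file.
    back = np.memmap(path, dtype=point, mode="r", shape=(3,))
    xs = np.array(back["x"])                 # copy out before the file goes away
    ids = np.array(back["id"])
    file_size = os.path.getsize(path)
    del back
```

The file holds exactly the packed records, so its size is the record size times the number of records.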
15. New Project
NumPy
Blaze: Next Generation NumPy
• Out-of-core
• Distributed Tables
20. NumPy Users
• Want to be able to write Python to get fast
code that works on arrays and scalars
• Need access to a boat-load of C-extensions
(NumPy is just the beginning)
PyPy doesn’t cut it for us!
21. Dynamic compilation
Python Function → Dynamic Compilation → function pointer → NumPy Runtime
• Ufuncs
• Generalized UFuncs
• Window Kernel Funcs
• Function-based Indexing
• Memory Filters
• I/O Filters
• Reduction Filters
• Computed Columns
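The first entry point on this slide, a Python function becoming a ufunc, can be tried today without any compiler using `np.frompyfunc`. This is only a sketch of the interface: the function is still called from the interpreter for every element, so there is no compiled function pointer and none of the speed-up the slide is after. The kernel `clip_add` is a hypothetical example.

```python
import numpy as np

def clip_add(a, b):
    """Scalar kernel: add two numbers, saturating at 10."""
    return min(a + b, 10)

# Wrap the scalar function as a ufunc with 2 inputs and 1 output.
clip_add_u = np.frompyfunc(clip_add, 2, 1)
is_ufunc = isinstance(clip_add_u, np.ufunc)

x = np.array([1, 5, 9])            # shape (3,)
y = np.array([[0], [4]])           # shape (2, 1)

# The wrapped function gets ufunc behaviour such as broadcasting for free.
# frompyfunc returns object arrays, so convert back to integers.
out = clip_add_u(x, y).astype(np.int64)
```

A dynamically compiled kernel would slot into the same place while running at C speed.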
22. SciPy needs a Python compiler
• optimize, integrate, special, ode
• Writing more of SciPy at a high level
23. Numba -- a Python compiler
• Replays byte-code on a stack with simple type inference
• Translates to LLVM (using LLVM-py)
• Uses LLVM for code-gen
• Resulting C-level function-pointer can be
inserted into NumPy run-time
• Understands NumPy arrays
• Is NumPy / SciPy aware
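From the user's side, compiling a function is a one-line change. The sketch below assumes the `njit` decorator from present-day Numba releases, which may differ from the API available at the time of this talk. Numba is treated as optional, with a plain-Python fallback so the example runs either way.

```python
import numpy as np

try:
    from numba import njit          # assumption: a current Numba release is installed
except ImportError:
    def njit(func):
        # Fallback for illustration only: leave the function uncompiled.
        return func

@njit
def sum_of_squares(x):
    # An explicit loop over a NumPy array: slow in the interpreter,
    # but the kind of code a Python compiler can turn into machine code.
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total

x = np.arange(1000, dtype=np.float64)
result = sum_of_squares(x)
```

The function body is ordinary Python; only the decorator changes how it runs.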
24. NumPy + Mamba = Numba
Python Function → LLVM-PY → LLVM 3.1 → Machine Code
Backends: ISPC, OpenCL, OpenMP, CUDA, CLANG
Vendors: Intel, AMD, Nvidia, Apple
31. NumFOCUS
• Mission
• To initiate and support educational programs
furthering the use of open source software in
science.
• To promote the use of high-level languages and
open source in science, engineering, and math
research
• To encourage reproducible scientific research
• To provide infrastructure and support for open
source projects for technical computing
32. NumFOCUS
Core Projects
NumPy SciPy IPython Matplotlib
Other Projects (seeking more --- need representatives)
Scikits Image
33. • Large-scale data analysis products
• Anaconda, SciPy in a Box
• Wakari.io -- Cloud Hosted SciPy
• Python training (data analysis and
development)
• NumPy support and consulting
• Blaze, Numba, and More Development