Keynote talk at PyCon Estonia 2019 in which I discuss how to extend CPython and how that extensibility has led to a robust ecosystem around Python. I then discuss the need to define and build a Python extension language, which I later propose as EPython on OpenTeams: https://openteams.com/initiatives/2
5. Where I started
Started as my graduate-school
“procrastination project” (as Multipack)
in 1998 and became SciPy in 2001 with
the help of colleagues.
108 releases, 766 contributors
Used by: 128,495
Pearu Peterson: Estonia was critical
to both SciPy and NumPy
6. Where it led for me
Gave up my chance at a tenured academic
position in 2005-2006 to bring together the
diverging array community in Python and unify
Numeric and Numarray.
159 releases, 827 contributors
Used by: 254,856
7. What amplified data science
Created by Wes McKinney. AQR agreed to
release the data-frame library he started there
(while dozens of other data-frames at hedge
funds and investment banks were never open-sourced)
106 releases, 1601 contributors
Used by: 139,133
8. Why Python for ML?
Created by David Cournapeau as a Google Summer
of Code project and then quickly added to by
hundreds of researchers around the world.
Supported by INRIA.
100 releases, 1433 contributors
Used by: 70,287
9. First DL Framework in Python
Built at Université de Montréal by Frédéric
Bastien and his students. Many contributors.
Forms foundation for PyMC3 and other libraries.
33 releases, 332 contributors
Used by: 6,194
13. Keys to Python Success
Modular Extensibility
New Types and Functions
Protocol Overloading (i.e. “dunder” methods)
Interoperability
14. Modular Extensibility
Modules and Packages
>>> import numpy
>>> numpy.__file__
{path-prefix}numpy/__init__.py
>>> numpy.__path__
{path-prefix}numpy
>>> numpy.linalg.__file__
{path-prefix}numpy/linalg/__init__.py
>>> import math
>>> math.__file__
{path}math{platform}.so
>>> import os
>>> os.__file__
{path}os.py
(or .pyd on Windows)
# my_module.py
a = 3
b = 4
def cross(x, y):
    return a*x + b*y
>>> import my_module
>>> my_module.__file__
{path}my_module.py
>>> ks = my_module.__dict__.keys()
>>> [y for y in ks
...  if not y.startswith('__')]
['a', 'b', 'cross']
subpackages = []
for name in dir(numpy):
    obj = getattr(numpy, name)
    if (hasattr(obj, '__file__') and
            obj.__file__.endswith('__init__.py')):
        subpackages.append(obj.__name__)
>>> print(subpackages)
['numpy.compat', 'numpy.core', 'numpy.fft',
 'numpy.lib', 'numpy.linalg', 'numpy.ma',
 'numpy.matrixlib', 'numpy.polynomial', 'numpy.random',
 'numpy.testing']
15. New Types, New Functions
class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.children = []
        if parent is not None:
            parent.children.append(self)
from math import sqrt
def kurtosis(data):
    N = len(data)
    mean = sum(data)/N
    std = sqrt(sum((x-mean)**2 for x in data)/N)
    zi = ((x-mean)/std for x in data)
    return sum(z**4 for z in zi)/N - 3
>>> g = Node("Root")
>>> type(g)
__main__.Node
>>> type(g).__mro__
(__main__.Node, object)
>>> type(Node).__mro__
(type, object)
>>> type(3)
int
>>> type(3).__mro__
(int, object)
>>> type(int).__mro__
(type, object)
>>> type(kurtosis)
function
>>> type(sqrt)
builtin_function_or_method
>>> type(sum)
builtin_function_or_method
>>> import numpy; type(numpy.add)
numpy.ufunc
19. First problem: Efficient Data Input
The first step is to get the data right
“It’s Always About the Data”
http://www.python.org/doc/essays/refcnt/
Reference Counting Essay
May 1998
Guido van Rossum
TableIO
April 1998
Michael A. Miller
NumPyIO
June 1998
20. A walk through bitarray
Ilan Schnell
Built all first versions
of Anaconda
bitarray: efficient arrays of booleans
https://github.com/ilanschnell/bitarray
27. Powerful but requires care!
• Reference counting (you have to do this manually)
• Error handling (can be tedious)
• Initialization (can bite you badly if you aren’t careful)
• Other run-times (PyPy, RustPython) can’t easily use
your tool.
• You have access to all the machinery Python itself
uses to create all of its own builtins.
• You are literally extending Python with new builtin
types and functions.
• Incredible speed: as fast as the machine can work.
29. What should you do today?
• Just write your code in Python and use existing extensions.
• If more speed is needed (my opinionated, modern view):
  • Use Numba
  • Use Cython
  • Use mypy (and eventually mypyc)
• Or, if few existing extensions are being used:
  • Run with PyPy
  • Use Rust and PyO3
30. Numba: A JIT Compiler for Python
• An open-source, function-at-a-time compiler library for Python
• Compiler toolbox for different targets and execution models:
  • single-threaded CPU, multi-threaded CPU, GPU
  • regular functions, “universal functions” (array functions), etc.
• Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure Python)
• Combines the ease of writing Python with speeds approaching FORTRAN
• Empowers scientists who make tools for themselves and other scientists
31. 7 things about Numba you may not know
1. Numba is 100% Open Source
2. Numba + Jupyter = Rapid CUDA Prototyping
3. Numba can compile for the CPU and the GPU at the same time
4. Numba makes array processing easy with @(gu)vectorize
5. Numba comes with a CUDA Simulator
6. You can send Numba functions over the network
7. Numba has typed Lists and Dictionaries (soon)
32. Numba (compile Python to CPUs and GPUs)
conda install numba
[Diagram: Numba's parsing frontend turns Python into an intermediate
representation (IR); the LLVM code-generation backend then targets
x86, ARM, and PTX.]
33. How does Numba work?
[Diagram: Python function (bytecode) → Bytecode Analysis → Numba IR →
Rewrite IR → Type Inference (using the function's arguments) →
Lowering → LLVM IR → LLVM/NVVM JIT → Machine Code (cached) → Execute!]
@jit
def do_math(a, b):
    …
>>> do_math(x, y)
34. Supported Platforms and Hardware
OS: Windows (7 and later), OS X (10.9 and later), Linux (RHEL 6 and later)
HW: 32- and 64-bit CPUs (incl. Xeon Phi), CUDA & HSA GPUs, some support for ARM and ROCm
SW: Python 2.7, 3.4–3.7; NumPy 1.10 and later
36. Basic Example
Array Allocation
Looping over ndarray x as an iterator
Using numpy math functions
Returning a slice of the array
2.7x speedup!
Numba decorator
(nopython=True not required)
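The example code on this slide is an image that didn't survive the transcript. A minimal sketch consistent with the annotations above (the function name `smooth` and its body are illustrative assumptions, not the slide's actual code; the `try/except` just lets the sketch run even where Numba isn't installed):

```python
import numpy as np

try:
    from numba import jit
except ImportError:                    # fallback so the sketch runs without Numba
    def jit(*args, **kwargs):
        return lambda f: f

@jit(nopython=True)                    # Numba decorator (nopython=True not required)
def smooth(x):
    out = np.empty_like(x)             # array allocation
    for i, v in enumerate(x):          # looping over ndarray x as an iterator
        out[i] = np.sqrt(abs(v))       # using numpy math functions
    return out[1:-1]                   # returning a slice of the array
```

The decorated function compiles on first call for the argument types it sees; subsequent calls dispatch straight to machine code.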
37. Numba Features
• Detects CPU model during compilation and optimizes for that target
• Automatic type inference: no need to give type signatures for functions
• Dispatches to multiple type-specializations of the same function
• Call out to C libraries with CFFI and ctypes
• Special "callback" mode for creating C callbacks to use with external libraries
• Optional caching to disk, and ahead-of-time creation of shared libraries
• Compiler is extensible with new data types and functions
38. Parallel Computing
• Three main technologies for parallelism: SIMD, multi-threading, and distributed computing
39. SIMD: Single Instruction, Multiple Data
• Numba's CPU detection will enable LLVM to autovectorize for the
  appropriate SIMD instruction set: SSE, AVX, AVX2, AVX-512
• This will become even more important now that AVX-512 is available
  on both Xeon Phi and Skylake Xeon processors
40. Manual Multithreading: Release the GIL
[Chart: speedup ratio (0–3.5) vs. number of threads (1, 2, 4)]
Option to release the GIL
Using Python concurrent.futures
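The benchmark code behind the chart isn't in the transcript; a minimal sketch of the pattern it describes, assuming illustrative names (`sum_of_squares`, `threaded_sum_of_squares`) and a no-op fallback when Numba is absent:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

try:
    from numba import jit
except ImportError:                      # run the sketch without Numba too
    def jit(*args, **kwargs):
        return lambda f: f

@jit(nopython=True, nogil=True)          # option to release the GIL
def sum_of_squares(x):
    total = 0.0
    for v in x:
        total += v * v
    return total

def threaded_sum_of_squares(arr, n_threads=4):
    # With the GIL released inside the compiled kernel, the thread pool
    # can run the chunks truly concurrently.
    chunks = np.array_split(arr, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

Without `nogil=True`, the threads would serialize on the GIL and the chart's speedup would vanish.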
41. Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented
computing.
◦ A function written for scalar inputs is broadcast across the elements of
the input arrays:
• np.add([1,2,3], 3) == [4, 5, 6]
• np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
◦ Parallelism is present, by construction. Numba will generate
loops and can automatically multi-thread if requested.
◦ Before Numba, creating fast ufuncs required writing C. No
longer!
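To make the "no C required" point concrete, here is a hedged sketch of building a ufunc-style function with Numba's `@vectorize` (the kernel `rel_diff` is an illustrative assumption; the fallback uses `np.vectorize` only so the sketch runs without Numba, with no claim of matching performance):

```python
import numpy as np

try:
    from numba import vectorize
    deco = vectorize(["float64(float64, float64)"])  # compiled ufunc
except ImportError:
    deco = np.vectorize                              # slow stand-in

@deco
def rel_diff(a, b):
    # Written as a scalar kernel; broadcasting across arrays comes for free.
    return abs(a - b) / (abs(a) + abs(b))
```

Calling `rel_diff(np.array([1.0, 3.0]), 1.0)` broadcasts the scalar just like `np.add` does in the examples above.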
44. ParallelAccelerator
• ParallelAccelerator is a special compiler pass contributed by Intel Labs
• Todd A. Anderson, Ehsan Totoni, Paul Liu
• Based on similar contribution to Julia
• Automatically generates multithreaded code in a Numba-compiled
function:
• Array expressions and reductions
• Random functions
• Dot products
• Explicit loops indicated with prange() call
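A minimal sketch of the explicit-loop case (the function `par_sum` is illustrative; the `try/except` substitutes sequential `range` so the sketch still runs where Numba is not installed):

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:                      # sequential stand-ins without Numba
    prange = range
    def njit(*args, **kwargs):
        return lambda f: f

@njit(parallel=True)                     # enables the ParallelAccelerator pass
def par_sum(x):
    total = 0.0
    for i in prange(len(x)):             # explicit parallel loop
        total += x[i]                    # recognized as a reduction
    return total
```

ParallelAccelerator recognizes the `+=` as a reduction, so the loop iterations can be distributed across threads safely.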
49. Basic use
Create a text file with a .pyx extension along with a setup.py
setup.py
helloworld.pyx
Hint: can use %%cython magic in notebooks
After %load_ext Cython
Borrowed from Cython documentation
cython.readthedocs.io
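The two files shown on the slide are images that didn't survive the transcript; a minimal sketch following the Cython documentation's hello-world tutorial (file contents are the standard tutorial example, not the slide's exact code):

```python
# helloworld.pyx
def say_hello():
    print("Hello World")

# setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("helloworld.pyx"))
```

Building with `python setup.py build_ext --inplace` produces a compiled extension module importable as `helloworld`.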
54. MyPyC
mypyc is a compiler that compiles mypy-annotated, statically typed
Python modules into CPython C extensions.
https://github.com/python/mypy/tree/master/mypyc
• Most type annotations are enforced at runtime (raising TypeError on mismatch)
• Classes are compiled into extension classes without __dict__ (much, but not quite, like if they used __slots__)
• Monkey patching doesn't work
• Instance attributes won't fall back to class attributes if undefined
• Metaclasses not supported
• Also there are still a bunch of bad bugs and unsupported features :)
Still Experimental!
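The key idea is that mypyc's input is just ordinary annotated Python; a hedged sketch of a module it could compile (the module and function names are illustrative):

```python
# fib.py: an ordinary mypy-annotated module. mypyc can compile a file
# like this into a CPython C extension, but it also runs unchanged on
# plain CPython, which is what makes the approach incremental.
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

The same source works interpreted or compiled; under mypyc the `int` annotations become runtime-enforced and unboxed where possible.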
60. What do we need?
• A way to extend Python that targets multiple runtimes by default (at least PyPy, CPython, RustPython) with the ability to add new runtimes
• Use a subset of typed Python to do it — i.e. a domain-specific extension language in Python itself
•Need NumPy, Pandas, SciPy, Scikit-Learn, and
more to use this approach (this will take time)
62. A Bold Proposal
• Create a Cython-like tool that uses mypy typing
• Borrow heavily from Cython ideas but start a new
project that could be pulled into Python itself.
• At the same time work from below to continue the
clean-up of CPython C-API that has already started.
63. Need ~$5 million commitment for a 3-year project to start this
• Core team of 5+ devs with 1 lead
• 1/2-time project manager and PSF representative
• 3+ community liaisons and developer evangelists
• Start with a $500k Phase 0 to prove the idea
• Get total funding from at least 20 companies: $25k initial buy-in, with at least a $250k commitment over 3 years to start the effort.
• Allow up to $100k initial and $1 million commitment.
• Paying participants get project-management attention and early, easy-to-use runtimes and binary extensions delivered, with the ability to set priorities (plus marketing and the knowledge they are leading Python forward).
How? Through a Quansight Labs Cooperative Community Work Order:
• We have the people in our network of collaborators.
• We have a sales and marketing team that will pitch this.
• We are just rolling out the proposal.
Interested? travis@quansight.com
65. A new platform to help open-source projects and developers thrive professionally and financially.
Sign up to:
• build your open-source portfolio
• show which projects you use
• thank contributors for projects you love
• (soon) get connected to initiatives like the one to make Python universally extensible
https://openteams.com
66. Quansight Labs: Sustaining the Future
Open-source innovation and maintenance around the entire data-science and AI workflow.
• NumPy ecosystem maintenance
• Maintenance and support with the PyData core team
• Improve connection of NumPy to ML frameworks
• GPU support for the NumPy ecosystem
• Improve foundations of array computing
• JupyterLab
• Data catalog standards
• Packaging (conda-forge, PyPA, etc.)
uarray — unified array interface and symbolic NumPy
xnd — re-factored NumPy (low-level cross-language libraries for N-D (tensor) computing)
Partnered with NumFOCUS and Ursa Labs (supporting Apache Arrow)