Is PyPy Ready for Production? The Sequel

the sequel
Mark Rees
CTO
Century Software (M) Sdn Bhd
is it ready for production?

pypy & me
not affiliated with pypy team
have followed it‟s development since
2004
use cpython and jython at work
used ironpython for small projects
gave a similar talk at PyConAU 2012
the question:
would pypy improve performance of
some of our workloads?
i am a manager, who still is wants to be a
programmer, so i did the analysis

pypy
history
- first sprint 2003, EU project from 2004 – 2007
- open source project from 2007
https://bitbucket.org/pypy
- pypy 1.4 first release suitable for “production”
12/2010
what is pypy?
- RPython translation toolchain, a framework for
generating dynamic programming language
implementations
- a implementation of Python in Python using the
framework

pypy
current release
pypy 2.0 released may 2013
latest iteration 2.0.2
want to know more about pypy
- http://pypy.org/
- david beazley pycon 2012 keynote
http://goo.gl/5PXFQ
- how the pypy jit works http://goo.gl/dKgFp
- why pypy by example http://goo.gl/vpQyJ

production ready – a definition
it runs
it satisfies the project requirements
its design was well thought out
it's stable
it's maintainable
it's scalable
it's documented
it works with the python modules we use
it is as fast or faster than cpython
http://programmers.stackexchange.com/questions/61726/define-production-ready

pypy – does it run?
of course, it runs
See http://pypy.readthedocs.org/en/latest/cpython_differences.html
for differences between PyPy and CPython

pypy – other production criteria
does it satisfy the project requirements
- yes
is it‟s design was well thought out
- I would assume so
is it stable
- yes
is it maintainable
- 7 out of 10
is it scalable
- stackless & greenlets built in
is it documented
- cpython docs for functionality, rpython toolchain 8 out
of 10

pypy – does it work with the modules we use
standard library modules supported:
__builtin__, __pypy__, _ast, _bisect, _codecs, _collections, _ffi, _hashlib,
_io, _locale, _lsprof, _md5, _minimal_curses, _multiprocessing, _random,
_rawffi, _sha, _socket, _sre, _ssl, _warnings, _weakref, _winreg, array,
binascii, bz2, cStringIO, clr, cmath, cpyext, crypt, errno, exceptions,
fcntl, gc, imp, itertools, marshal, math, mmap, operator, oracle, parser,
posix, pyexpat, select, signal, struct, symbol, sys, termios, thread, time,
token, unicodedata, zipimport, zlib
these modules are supported but written in
python:
cPickle, _csv, ctypes, datetime, dbm, _functools, grp, pwd, readline,
resource, sqlite3, syslog, tputil
many python libs are known to work, like:
ctypes, django, pyglet, sqlalchemy, PIL. See
https://bitbucket.org/pypy/compatibility/wiki/Home for a more
exhaustive list.

pypy – does it work with the modules we use
pypy c-api support is beta, worked most of
the time but failed with reportlab:
Fatal error in cpyext, CPython compatibility layer, calling
PySequence_GetItem
Either report a bug or consider not using this particular extension
<OpErrFmt object at 0x7f94582f3100>
RPython traceback:
File ”pypy_module_cpyext_api_1.c", line 30287, in PySequence_GetItem
File ”pypy_module_cpyext_pyobject.c", line 1056, in
BaseCpyTypedescr_realize
File ”pypy_objspace_std_objspace.c", line 3404, in
allocate_instance__W_ObjectObject
File ”pypy_objspace_std_typeobject.c", line 33781, in
W_TypeObject_check_user_subclass
Segmentation fault
But this was the only compatibility issue we
had running all of our python code under
pypy and we could fallback to pure python
reportlab extensions anyway.

pypy – does it work with the modules you use
Ipython notebook requires tornado & zeromq

pypy – does it work with the modules you use

pypy – does it run as fast as cpython
http://speed.pypy.org/
but!

pypy django benchmark
DJANGO_TMPL = Template("""<table>
{% for row in table %}
<tr>{% for col in row %}<td>{{ col|escape }}</td>{% endfor %}</tr>
{% endfor %}
</table>
""")
def test_django(count):
table = [xrange(150) for _ in xrange(150)]
context = Context({"table": table})
# Warm up Django.
DJANGO_TMPL.render(context)
DJANGO_TMPL.render(context)
times = []
for _ in xrange(count):
t0 = time.time()
data = DJANGO_TMPL.render(context)
t1 = time.time()
times.append(t1 - t0)
return times

my csv to xml benchmark
def bench(data, output):
f = open(data, 'rb')
fn = [„age‟,….]
reader = csv.DictReader(f, fn)
writer = SAXWriter(output)
writer.start_doc()
writer.start_tag('data')
try:
for row in reader:
writer.start_tag('row')
for key in row.keys():
writer.tag(key.replace(' ', '_'), body=row[key])
writer.end_tag('row')
finally:
f.close()
writer.end_tag('data')
writer.end_doc()

my pypy benchmarks
https://bitbucket.org/hexdump42/pypy-benchmarks
benchmark cpython
2.7.3
pypy-jit
1.9
pypy-jit
2.0.2
bm_csv2xml 88.26/94.
04
28.89 3.0549 x
faster
23.86 3.7728x
faster
average execution time (in seconds)

my pypy benchmarks
benchmark cpython
2.7.3
pypy-jit
1.9
pypy-jit
2.0.2
04
28.89 3.0549 x
faster
23.86 3.7728x
faster
bm_csv 1.54/1.65 5.89 3.8122 x
slower
1.72 0.9825 x
slower

my pypy benchmarks
benchmark cpython
2.7.3
pypy-jit
1.9
pypy-jit
2.0.2
04
28.89 3.0549 x
faster
23.86 3.7728x
faster
bm_csv 1.54/1.65 5.89 3.8122 x
slower
1.72 0.9825 x
slower
bm_openpyxl 1.31/1.21 3.26 2.4871 x
slower
3.15 2.6051 x
slower

my pypy benchmarks
benchmark cpython
2.7.3
pypy-jit
1.9
pypy-jit
2.0.2
04
28.89 3.0549 x
faster
23.86 3.7728x
faster
bm_csv 1.54/1.65 5.89 3.8122 x
slower
1.72 0.9825 x
slower
bm_openpyxml 1.31/1.21 3.26 2.4871 x
slower
3.15 2.6051 x
slower
bm_xhtml2pdf 1.91/1.95 3.27 1.7155 x
slower
4.22 2.1637 x
slower

my pypy benchmarks
benchmark cpython
2.7.3
pypy-jit
1.9
pypy-jit
2.0.2
bm_interp 5412/5248 12556 2.32 x
larger
21880 4.1692 x
larger
bm_csv2xml 7048/7064 55180 7.8292 x
larger
55232 7.8188 x
larger
bm_csv 5812/5180 52200 8.9814 x
larger
52176 10.0726
x larger
bm_openpyxl 12656/
12656
77252 6.1040 x
larger
80428 6.3549 x
larger
bm_xhtml2pdf 48880/
34884
236792 4.8444 x
larger
101376 2.906 x
larger
max memory use

what is the pypy jit doing?
https://bitbucket.org/pypy/jitviewer/

modified csv pypy benchmarks
benchmark cpython
2.7.3
pypy-jit
1.9
pypy-jit
2.0.2
bm_csv2xml_mod 88.25/90.02 23.65 3.7315 x
faster
21.76 4.0556 x
faster
bm_csv_mod 1.62/1.69 1.89 0.8571 x
slower
1.68 0.9643 x
slower

is pypy ready for production
1. it runs
2. it satisfies the project requirements
3. its design was well thought out
4. it's stable
5. it's maintainable
6. it's scalable
7. it's documented
8. it works with the python modules we use
9. it can be as fast or faster than cpython

some other reasons to consider pypy
cffi – C foreign function interface for python
- http://cffi.readthedocs.org/
pypy version of numpy
py3k version of pypy work-in-progress
check out the STM/AME project
-
https://speakerdeck.com/pyconslides/pypy-
python-without-the-gil-by-armin-rigo-and-
maciej-fijalkowski
You can help
http://www.pypy.org/howtohelp.html

Mark Rees
mark at censof dot com
+Mark Rees
@hexdump42
hex-dump.blogspot.com
contact details
http://www.slideshare.net/hexdump42/pypy-isitreadyforproductionthesequel
http://goo.gl/8IPuX

Is PyPy Ready for Production? The Sequel

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Similar to Is PyPy Ready for Production? The Sequel

Similar to Is PyPy Ready for Production? The Sequel (20)

More from Mark Rees

More from Mark Rees (6)

Recently uploaded

Recently uploaded (20)

Is PyPy Ready for Production? The Sequel

Editor's Notes