Thomas Schorr, https://2020.ploneconf.org/speakers/thomas-schorr
Pyruvate is a non-blocking, multithreaded WSGI server with competitive performance, implemented in Rust.
It features non-blocking read/write based on mio (https://docs.rs/mio/), a rust-cpython (https://docs.rs/cpython/) based Python interface and a worker pool based on threadpool (https://docs.rs/threadpool/).
The sendfile system call is used for efficient file transfer.
Pyruvate integrates with the Python logging API using asynchronous logging.
PasteDeploy configuration and systemd socket activation are supported.
Beta releases are available for CPython (>=3.6) and Linux.
The talk will present the current state of the project and show how to use Pyruvate with Zope/Plone and other Python web frameworks.
Another focus will be on the roadmap towards a 1.0 release scheduled for end of this year.
https://gitlab.com/tschorr/pyruvate
https://pypi.org/project/pyruvate/
https://2020.ploneconf.org/talks/pyruvate-a-reasonably-fast-non-blocking-multithreaded-wsgi-server/
Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server
1. WSGI Why Rust? Project Status Performance Demo Next steps
Pyruvate, a reasonably fast, non-blocking,
multithreaded WSGI server
Thomas Schorr
Plone Conference 2020
PEP-3333: Python Web Server Gateway Interface
def application(environ, start_response):
    """Simplest possible WSGI application"""
    status = '200 OK'
    response_headers = [
        ('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello World!\n']
The Server Side
• The server invokes the application callable once for each HTTP request it receives
• Many possibilities for handling requests
  • Single threaded server
  • Spawn a thread for each incoming request
  • 1:1 threading, 1:n threading
  • maintain a pool of worker threads
  • multiprocessing
  • ...
• The WSGI server can give hints through the environ dictionary
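The hints mentioned above are the PEP 3333 keys wsgi.multithread, wsgi.multiprocess and wsgi.run_once. The following is a minimal sketch of both sides: an application that reports the hints, and a hypothetical helper call_once that plays the server role for a single request. The helper name and the hint values are assumptions chosen for illustration, not taken from any particular server.

```python
def application(environ, start_response):
    """Report the concurrency hints the server put into environ."""
    keys = ("wsgi.multithread", "wsgi.multiprocess", "wsgi.run_once")
    lines = [f"{key}: {environ.get(key)}" for key in keys]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_once(app, environ):
    """Play the server role for one request (hypothetical helper)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        # PEP 3333: the server must call close() on the result if it exists.
        close = getattr(chunks, "close", None)
        if close is not None:
            close()
    return captured["status"], body


# Assumed values for a server with a worker thread pool in a single process.
environ = {"wsgi.multithread": True,
           "wsgi.multiprocess": False,
           "wsgi.run_once": False}
status, body = call_once(application, environ)
```

An application can use these values to decide, for example, whether it needs per-thread resources.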
The Application Side
• often needs to connect to components that outlive the single request
  • databases, caches
  • connection might not be thread safe
  • connection/setup might be expensive
• all of the above is true for Zope
• recipe for disaster: choose a WSGI server with an inappropriate worker model
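One common way to reconcile expensive, non-thread-safe connections with a worker pool is to give every worker thread its own connection. The sketch below assumes a fixed pool of two threads; FakeConnection, get_connection and handle_request are hypothetical names used only for illustration and do not describe how Zope manages its connections.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a database or cache connection."""
    created = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.created += 1
        self.owner = threading.get_ident()

    def query(self):
        # A connection that is not thread safe must only be used by its owner.
        assert self.owner == threading.get_ident()
        return "ok"


_local = threading.local()


def get_connection():
    """Return this thread's connection, opening it on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = _local.conn = FakeConnection()
    return conn


def handle_request(_request_id):
    return get_connection().query()


# 20 requests served by at most 2 worker threads.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(handle_request, range(20)))
```

With a fixed pool the setup cost is paid at most once per worker. A server that spawns a fresh thread for every request would pay it on every request.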
Consequence: Limited Choice
of WSGI servers suitable for Zope/Plone.
• waitress (the default) with very good overall performance
• bjoern: fast, non-blocking, single threaded
• ...
More options please
Wishlist:
• multithreaded, 1:1 threading, workerpool
• PasteDeploy entry point
• handle the Zope/Plone use case
• non-blocking
• File wrapper supporting sendfile
• competitive performance
Non Goals
• Python 2
• ASGI (not yet at least)
• Windows
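The "File wrapper supporting sendfile" wish refers to the optional wsgi.file_wrapper key defined in PEP 3333. A minimal sketch of how an application can use it follows. The function name serve_file and the block size are assumptions; whether the wrapper ends up using sendfile is up to the server.

```python
import io


def serve_file(environ, start_response, fileobj, block_size=8192):
    """Return a file as the response body, preferring the server's wrapper."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    wrapper = environ.get("wsgi.file_wrapper")
    if wrapper is not None:
        # The server may recognise its own wrapper and transfer the file
        # without copying it through Python.
        return wrapper(fileobj, block_size)
    # Fallback from PEP 3333: read fixed-size blocks until EOF.
    return iter(lambda: fileobj.read(block_size), b"")


def ignore_start_response(status, headers, exc_info=None):
    """Hypothetical no-op used only to call the application by hand."""


payload = b"a" * 20000
chunks = list(serve_file({}, ignore_start_response, io.BytesIO(payload)))
```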
Why Rust?
Naive expectations:
• Faster than Python
• Easier to use than C
Performance
Emmerich, P. et al. (2019): The Case for Writing Network Drivers in High-Level Programming Languages.
https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/the-case-for-writing-network-drivers-in-high-level-languages.pdf
Memory Management through Ownership
• feature unique to Rust
• a set of rules that the compiler checks at compile time
(https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
• Each value in Rust has a variable that’s called its owner.
• There can be only one owner at a time.
• When the owner goes out of scope, the value will be dropped.
• Drop is a trait; there’s a default implementation that you can override
• You can still control where (stack or heap) your data is stored.
How is that relevant?
Example: interfacing with Python
• Python memory management: reference counting + garbage collection
• association: increasing an object’s refcount using Py_INCREF
• should match with corresponding Py_DECREF invocations
• garbage collection when object refcount goes to 0
• Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps
• 63 occurrences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF)
• 1 Py_INCREF in rust-cpython (4 Py_DECREF)
• very hard to create a mismatch of Py_INCREF/Py_DECREF invocations, making it harder to create memory leaks or core dumps
• still possible to create more references than needed
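The reference counting described above can be observed from Python itself. This small sketch is CPython-specific and uses sys.getrefcount; the variable names are arbitrary. It shows the bookkeeping that a C extension has to do by hand with Py_INCREF and Py_DECREF.

```python
import sys

payload = object()
before = sys.getrefcount(payload)

# Storing the object in a list creates one more reference to it.
# In C extension code this is where a Py_INCREF would be needed.
holder = [payload]
during = sys.getrefcount(payload)

# Dropping the list releases that reference again (the Py_DECREF side).
del holder
after = sys.getrefcount(payload)
```

A forgotten decrement would leave the count too high and the object would never be freed. An extra decrement frees it while it is still in use.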
Other Rust features
• strict typing will find many problems at compile time
• Pattern matching
• very good documentation, helpful compiler messages
What is Pyruvate from a user perspective
• a package available from PyPI:
    pip install pyruvate
• an importable Python module:
    import pyruvate
    def application(environ, start_response):
        """WSGI application"""
        ...
    pyruvate.serve(application, '0.0.0.0:7878', 3)
Using Pyruvate with Zope/Plone
with plone.recipe.zope2instance:
• buildout.cfg
    [instance]
    recipe = plone.recipe.zope2instance
    http-address = 127.0.0.1:8080
    eggs =
        Plone
        pyruvate
    wsgi-ini-template = ${buildout:directory}/templates/pyruvate.ini.in
• pyruvate.ini.in Template
    [server:main]
    use = egg:pyruvate#main
    socket = %(http_address)s
    workers = 2
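PasteDeploy resolves a [server:main] section to an entry point and, for the paste.server_runner flavour, calls it as runner(wsgi_app, global_conf, **options) with every option passed as a string. The sketch below shows how the socket and workers options from the template could reach a serve function with the call shape of pyruvate.serve(application, socket, workers) shown earlier. make_server_runner and fake_serve are hypothetical names, and this is not Pyruvate's actual entry point code.

```python
def make_server_runner(serve):
    """Wrap a serve(app, socket, workers) function as a server runner."""
    def server_runner(wsgi_app, global_conf, **local_conf):
        # Options from the ini section arrive as strings.
        socket_addr = local_conf["socket"]
        workers = int(local_conf["workers"])
        return serve(wsgi_app, socket_addr, workers)
    return server_runner


calls = []


def fake_serve(app, addr, workers):
    """Stand-in for a real server: records the call instead of serving."""
    calls.append((app, addr, workers))


def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


runner = make_server_runner(fake_serve)
# Equivalent to what PasteDeploy does after reading the template above.
runner(application, {}, socket="127.0.0.1:8080", workers="2")
```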
Pyruvate project structure
• initially created with cargo new --lib
• Rust sources in src folder
• Cargo.toml pulls Rust dependencies
• setup.py
  • uses setuptools_rust to build a RustExtension
  • defines PasteDeploy entry point
• pyproject.toml to specify build system requirements (PEP 518)
• tests folder containing (currently only) Python tests (unit tests in Rust modules)
• __init__.py in pyruvate folder
  • PasteDeploy entry point
  • FileWrapper import
GitLab Pipeline
• Two stages: test + build
• Linting: rustfmt, clippy
• cargo test
• coverage report using kcov, uploaded to https://codecov.io
• Python integration tests with tox
• build wheels
Binary packages
• manylinux2010 wheels for Python 3.6-3.9
• switched from manylinux1 after stable Rust 1.47.0 stopped supporting the old ABI (ELF file OS ABI invalid error when loading Rust shared libraries)
• manylinux2010 needs recent pip and setuptools versions
  • pip >= 19.0, otherwise pip prefers the sdist over the wheel (and the build fails if there’s no Rust)
  • setuptools >= 42.0.0 (when using zc.buildout)
• wanted: macOS
Features
• rust-cpython based Python interface (https://github.com/dgrunwald/rust-cpython)
• Nonblocking IO using mio (https://github.com/tokio-rs/mio)
  • Nonblocking read
  • blocking or nonblocking write
• Worker pool based on threadpool (https://docs.rs/threadpool); 1:1 threading
• PasteDeploy entry point
• integrates with Python logging
  • asynchronous logging -> no need to hold the GIL when creating the log message
  • logging configuration in wsgi.ini
• TCP or Unix Domain sockets
• supports systemd socket activation
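Systemd socket activation hands already-listening sockets to the service as inherited file descriptors, announced through the LISTEN_PID and LISTEN_FDS environment variables and starting at descriptor 3. The following sketch shows that protocol in Python. Pyruvate implements it in Rust, so inherited_fds and listening_socket are hypothetical helpers and not its API.

```python
import os
import socket

# First inherited descriptor, as documented for sd_listen_fds(3).
SD_LISTEN_FDS_START = 3


def inherited_fds(environ, pid):
    """Return the file descriptors systemd passed to this process, if any."""
    if environ.get("LISTEN_PID") != str(pid):
        return []  # variables are absent or meant for another process
    try:
        count = int(environ.get("LISTEN_FDS", "0"))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


def listening_socket(address=("127.0.0.1", 0)):
    """Use the activated socket when there is one, otherwise bind ourselves."""
    fds = inherited_fds(os.environ, os.getpid())
    if fds:
        # Family and type are detected from the descriptor (Python >= 3.7).
        return socket.socket(fileno=fds[0])
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(address)
    sock.listen()
    return sock
```

Checking LISTEN_PID matters because the variables may be inherited by child processes they were not meant for.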
Performance
Image credit: Pierre Terre / Rabbit Hole, Monarch’s Way / CC BY-SA 2.0
• number of requests/amount of data transferred per unit of time
• Testing and eventually improving it
Approach
• Static code analysis + refactoring
  • reminder: pyruvate started as a Hello Rust project
  • memory allocations are expensive
• How to induce socket blocking?
  • limiting socket buffer sizes of a Vagrant box
  • Docker?
• Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html)
  • .to_lowercase() is much more expensive than .to_ascii_uppercase()
• load testing with siege and ab
Performance: Design considerations
• Python Global Interpreter Lock: Python code can only run when holding the GIL
• Multiple worker threads need to acquire the GIL in turn
  • acquire GIL only for application execution
  • drop GIL when doing IO
  • more than one possible way to do this
• IO event polling
  • abstraction: mio Poll instance
  • accepted connections are registered for read events with a Poll instance in the main thread
  • completely read requests + connection are passed to the worker pool
  • iterate over WSGI response chunks (needs GIL)
  • blocking write: loop until response is completely written
  • non-blocking write:
    • write until EAGAIN
    • register connection for write events with per worker Poll instance
    • drop GIL, stash response
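The non-blocking write steps above can be illustrated with Python's socket and selectors modules. This is a sketch of the general technique only. Pyruvate does this in Rust with mio, and the function names, the socket pair and the payload size here are assumptions made for the demonstration.

```python
import selectors
import socket


def write_until_eagain(sock, data, offset=0):
    """Send data[offset:] until done or until the kernel buffer is full.

    Returns the new offset. On a non-blocking socket Python reports
    EAGAIN/EWOULDBLOCK as BlockingIOError.
    """
    view = memoryview(data)
    while offset < len(data):
        try:
            offset += sock.send(view[offset:])
        except BlockingIOError:
            break
    return offset


def demo(payload_size=4 * 1024 * 1024):
    """Push a large payload through a socket pair without ever blocking."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    reader.setblocking(False)
    payload = b"x" * payload_size
    selector = selectors.DefaultSelector()
    received = 0

    offset = write_until_eagain(writer, payload)
    while offset < len(payload):
        # Stash the remaining response and wait for a write event.
        selector.register(writer, selectors.EVENT_WRITE)
        # Meanwhile the peer drains its side, which frees buffer space.
        try:
            while True:
                received += len(reader.recv(65536))
        except BlockingIOError:
            pass
        selector.select(timeout=1)
        selector.unregister(writer)
        offset = write_until_eagain(writer, payload, offset)

    # Everything is written: close and let the peer read the rest.
    writer.close()
    reader.setblocking(True)
    while True:
        chunk = reader.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    reader.close()
    selector.close()
    return received
```

Calling demo() returns the number of bytes the reading side received. In a real server the worker would handle other connections while waiting for the write event, instead of draining the peer itself.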
Performance: current status
• Lenovo X390 and Vagrant (2 CPU, 2 GB RAM, 8K write buffer size limit)
• faster than waitress on a Hello world WSGI application
• faster than waitress on / (looking at https://zope.readthedocs.io/en/4.x/wsgi.html#test-criteria-for-recommendations)
• but slower on /Plone
• more performance testing needed
Live Demo
Release 1.0
• Planned for end of this year
• Reuse connections (keep-alive + chunked transport)
  • Branch on GitLab, needs some work
• macOS support wanted
• optimize pipeline
  • use a kcov binary package
• async logging: thread ID
• More testing + bugfixing
Thanks for your attention
• Thomas Schorr
• info@thomasschorr.de
• https://gitlab.com/tschorr/pyruvate
• https://pypi.org/project/pyruvate