SlideShare uma empresa Scribd logo
1 de 29
Baixar para ler offline
Notes about moving
from Python to C++
Yung-Yu Chen (@yungyuc)

PyConTW 2020, Tainan, Taiwan
1
Why move?
• Numerical software: Computer programs to solve scientific or
mathematic problems.

• Other names: Mathematical software, scientific software, technical
software.

• Python is a popular language for application experts to describe the
problems and solutions, because it is easy to use.

• Most of the computing systems (the numerical software) are designed in
a hybrid architecture.

• The computing kernel uses C++.

• Python is chosen for the user-level API.
2
What I do
• By training, I am a computational scientist, focusing on
applications of continuum mechanics. (I am not a
computer scientist.)

• In my day job, I write high-performance code for
semiconductor applications of computational geometry
and photolithography.

• In my spare time, I am teaching a course ‘numerical
software development’ in the department of computer
science in NCTU.
3
twitter: https://twitter.com/yungyuc

linkedin: https://www.linkedin.com/in/yungyuc/.
Example: PDEs
4
Numerical simulations of
conservation laws:

∂u
∂t
+
3
∑
k=1
∂F(k)
(u)
∂xk
= 0
Use case: stress waves in 

anisotropic solids
Use case: compressible flows
Example: OPC
5
photoresist
silicon substrate
photomask
light source
Photolithography in semiconductor fabrication
wave length is only
hundreds of nm
image I want to
project on the PR
shape I need
on the mask
Optical proximity correction (OPC)
(smaller than the
wave length)
write code to
make it happen
Python: the good is the bad
• Dynamicity enables
convention over
compilation for productivity

• It is so when you have
good testing and
documentation
import math
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
@property
def length(self):
return math.hypot(self.x, self.y)
5.0print(Point(3, 4).length)
print(Point("3", "4").length) Traceback (most recent call last):
File "ambiguity.py", line 10, in <module>
print(Point("3", "4").length)
File "ambiguity.py", line 8, in length
return math.hypot(self.x, self.y)
TypeError: a float is required
6
What is fast?
worker(Data&, unsigned long, int):
mov DWORD PTR [rdi+rsi*4], edx
mov rax, rdi
ret
class Data {
int m_storage[1024];
public:
int & operator[] (size_t it) { return m_storage[it]; }
};
Data & worker(Data & data, size_t idx, int value) {
data[idx] = value;
return data;
}
High-level language
Machine code
7
Dynamic typing is slow
• Python trades runtime for convenience; everything is
indirect and incurs runtime overhead
def work(data, key, name, value):
# Assign value to the data as a container with key.
data[key] = value
# Assign value to the data as an object with name.
setattr(data, name, value)
return data
8
• The innocent-looking Python code uses many checking
functions under the hood
int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value)
{
PyMappingMethods *m;
if (o == NULL || key == NULL || value == NULL) {
null_error();
return -1;
}
m = o->ob_type->tp_as_mapping;
if (m && m->mp_ass_subscript)
return m->mp_ass_subscript(o, key, value);
if (o->ob_type->tp_as_sequence) {
if (PyIndex_Check(key)) {
Py_ssize_t key_value;
key_value = PyNumber_AsSsize_t(key, PyExc_IndexError);
if (key_value == -1 && PyErr_Occurred())
return -1;
return PySequence_SetItem(o, key_value, value);
}
else if (o->ob_type->tp_as_sequence->sq_ass_item) {
type_error(/* ... omit ... */);
return -1;
}
}
type_error(/* ... omit ... */);
return -1;
}
int PyObject_SetAttr(PyObject *v, PyObject *name, PyObject *value)
{
PyTypeObject *tp = Py_TYPE(v);
int err;
if (!PyUnicode_Check(name)) {
PyErr_Format(PyExc_TypeError, /* ... omit ... */);
return -1;
}
Py_INCREF(name);
PyUnicode_InternInPlace(&name);
if (tp->tp_setattro != NULL) {
err = (*tp->tp_setattro)(v, name, value);
Py_DECREF(name);
return err;
}
if (tp->tp_setattr != NULL) {
const char *name_str = PyUnicode_AsUTF8(name);
if (name_str == NULL) {
Py_DECREF(name);
return -1;
}
err = (*tp->tp_setattr)(v, (char *)name_str, value);
Py_DECREF(name);
return err;
}
Py_DECREF(name);
_PyObject_ASSERT(name, name->ob_refcnt >= 1);
if (tp->tp_getattr == NULL && tp->tp_getattro == NULL)
PyErr_Format(PyExc_TypeError, /* ... omit ... */);
else
PyErr_Format(PyExc_TypeError, /* ... omit ... */);
return -1;
}
data[key] = value setattr(data, name, value)
(in Objects/abstract.c) (in Objects/object.c)
static PyObject *__pyx_pf_6simple_work(
PyObject *__pyx_v_data, PyObject *__pyx_v_key
, PyObject *__pyx_v_name, PyObject *__pyx_v_value
) {
PyObject_SetItem(__pyx_v_data, __pyx_v_key, __pyx_v_value);
PyObject_SetAttr(__pyx_v_data, __pyx_v_name, __pyx_v_value);
return __pyx_v_data;
}
def work(data, key, name, value):
data[key] = value
setattr(data, name, value)
return data
mapping
sequence
attribute name check
check to set
9
Speed is the king
To pursue high-performance and flexibility simultaneously,
the systems need to use the best tools at the proper layers.
This is a path to develop such systems.
10
Python controls execution flow
C++ manages resources
Use C++ to generate fast assembly code
Glue Python and C++
PyObject uses malloc
PyObject_MALLOC() is a wrapper to the standard C memory
allocator malloc().
11
PyObject *
_PyObject_New(PyTypeObject *tp)
{
PyObject *op;
op = (PyObject *) PyObject_MALLOC(_PyObject_SIZE(tp));
if (op == NULL)
return PyErr_NoMemory();
return PyObject_INIT(op, tp);
}
Python call stack
• Python maintains its
own stack

• It’s different from that of
the ‘real machine’

• The two (C/C++ and
Python) call stacks work
very differently
12
#0 line 11: show_stack
#1 line 23: caller
#2 line 25: <module>: top-level module
local_val is: defined outside
import inspect
def show_stack():
# Print calling stacks.
f = inspect.currentframe()
i = 0
while f:
name = f.f_code.co_name
if '<module>' == name:
name += ': top-level module'
print("#%d line %d: %s" % (i, f.f_lineno, name))
f = f.f_back
i += 1
# Obtain a local variable defined in an outside stack.
f = inspect.currentframe().f_back
print('local_val is:', f.f_locals['local_var'])
del f
def caller():
local_var = "defined outside"
show_stack()
caller()
Best practice: glue
• For performance and compile-time checking, we do not want to use
Python

• For flow control and configuration, we do not want to use C++

• Glue tools:

• Python C API : https://docs.python.org/3/c-api/index.html

• Cython : https://cython.org/

• Boost.Python : https://www.boost.org/doc/libs/1_74_0/libs/python/
doc/html/index.html

• Pybind11 : https://github.com/pybind/pybind11
13
I only listed tools that require manual generation of wrapping code and compilation.
Pick pybind11
14
Tools Pros Cons
Python C API always available
manual reference count is hard to
get right
Cython easy to use no comprehensive C++ support
Boost.Python comprehensive C++ support requires boost
Pybind11
easy-to-use comprehensive
modern C++ support
require C++11
Build system: cmake
• Again, there are many tools for build systems

• Plain make

• Python distutils, setuptools, pip, etc.

• cmake

• cmake:

• Cross platform

• Consistent and up-to-date support
15
How cmake works
16
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>
#include <cmath>
std::vector<double> distance(
std::vector<double> const & x
, std::vector<double> const & y) {
std::vector<double> r(x.size());
for (size_t i = 0 ; i < x.size() ; ++i)
{
r[i] = std::hypot(x.at(i), y.at(i));
}
return r;
}
PYBIND11_MODULE(_pybmod, mod) {
namespace py = pybind11;
mod.doc() = "simple pybind11 module";
mod.def("distance", &distance);
}
cmake_minimum_required(VERSION 3.9)
project(pybmod)
find_package(pybind11 REQUIRED)
include_directories(${pybind11_INCLUDE_DIRS})
pybind11_add_module(
_pybmod
pybmod.cpp
)
target_link_libraries(_pybmod
PRIVATE ${MKL_LINKLINE})
set_target_properties(_pybmod
PROPERTIES CXX_VISIBILITY_PRESET "default")
add_custom_target(_pybmod_py
COMMAND ${CMAKE_COMMAND} -E
copy $<TARGET_FILE:_pybmod>
${PROJECT_SOURCE_DIR}
DEPENDS _pybmod)
$ mkdir -p build ; cd build
$ cmake .. -DCMAKE_BUILD_TYPE=Release
$ make -C build _pybmod_py
$ python3 -c 'import _pybmod ; print(_pybmod.distance([3, 8], [4, 6]))'
[5.0, 10.0]
CMakeLists.txt pybmod.cpp
Zero-copy: do it where it fits
Python app C++ app
C++
container
Ndarray
manage
access
Python app C++ app
C++
container
Ndarray
manage
accessa11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory buffer shared across language memory buffer shared across language
Top (Python) - down (C++) Bottom (C++) - up (Python)
Python app C++ app
a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn
memory buffer shared across language
Ndarray
C++
container
17
Not recom
m
ended
for performance
Share buffer from C++
Two handy approaches by pybind11
18
cls
.def_buffer
(
[](wrapped_type & self)
{
namespace py = pybind11;
return py::buffer_info
(
self.data() /* Pointer to buffer */
, sizeof(int8_t) /* Size of one scalar */
, py::format_descriptor<char>::format()
/* Python struct-style format descriptor */
, 1 /* Number of dimensions */
, { self.size() } /* Buffer dimensions */
, { 1 } /* Strides (in bytes) for each index */
);
}
)
cls
.def_property_readonly
(
"ndarray"
, [](wrapped_type & self)
{
namespace py = pybind11;
return py::array
(
py::detail::npy_format_descriptor<int8_t>::dtype()
/* Numpy dtype */
, { self.size() } /* Buffer dimensions */
, { 1 } /* Strides (in bytes) for each index */
, self.data() /* Pointer to buffer */
, py::cast(self.shared_from_this())
/* Owning Python object */
);
}
)
.def_property_readonly(“ndarray2”, /* ... */)
Return a distinct numpy ndarray:

• The C++ class can return multiple
buffers
Python buffer protocol:

• The C++ class itself is turned into an
array description
Check assembly
• Make sure compilers generate good code for
performance hotspot

• Checking instead of writing: we don’t
have time to use assembly for everything

• If compilers don’t do a good job, before
jumping into hand-written assembly, consider
to use intrinsics: https://software.intel.com/
sites/landingpage/IntrinsicsGuide/
19
vmovupd ymm0, ymmword [rsi + r9*8]
vmulpd ymm0, ymm0, ymm0
vmovupd ymm1, ymmword [rdx + r9*8]
vmulpd ymm1, ymm1, ymm1
vaddpd ymm0, ymm0, ymm1
vsqrtpd ymm0, ymm0
void calc_distance(
size_t const n
, double const * x
, double const * y
, double * r) {
for (size_t i = 0 ; i < n ; ++i)
{
r[i] = std::sqrt(x[i]*x[i] + y[i]*y[i]);
}
}
movupd xmm0, xmmword [rsi + r8*8]
mulpd xmm0, xmm0
movupd xmm1, xmmword [rdx + r8*8]
mulpd xmm1, xmm1
addpd xmm1, xmm0
sqrtpd xmm0, xmm1
-O3 -mavx2 -O3
# Use redare2 to print the assembly.
r2 -Aqc "e scr.color=0 ; s sym.calc_distance_unsignedlong_doubleconst__doubleconst__double ; pdf" 
_pybmod.cpython-37m-darwin.so
AVX: 256-bit-wide vectorization SSE: 128-bit-wide vectorization
Manage resources
• Python uses reference count to manage object lifecycle.
You can do the in C++ by using shared pointer.
20
• Reference counting involves a lock and is slow. You didn’t
experience it in Python because (i) other parts in Python
are even slower and (ii) you didn’t have a choice.
class ConcreteBuffer
: public std::enable_shared_from_this<ConcreteBuffer>
{
private:
struct ctor_passkey {};
public:
static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes)
{
return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey());
}
ConcreteBuffer(size_t nbytes, const ctor_passkey &)
: m_nbytes(nbytes)
, m_data(allocate(nbytes))
{}
};
Translate reference count
• If reference count is used in C++, make sure it is correctly
translated between Python.
21
pybind11::class_
<
ConcreteBuffer
, std::shared_ptr<ConcreteBuffer>
>(
mod, "ConcreteBuffer"
, "Contiguous memory buffer"
);
class ConcreteBuffer
: public std::enable_shared_from_this<ConcreteBuffer>
{
private:
struct ctor_passkey {};
public:
static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes)
{
return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey());
}
ConcreteBuffer(size_t nbytes, const ctor_passkey &)
: m_nbytes(nbytes)
, m_data(allocate(nbytes))
{}
};
• If reference count is used in C++, make sure it is correctly
translated between Python.
Run Python in C++
• When C++ code puts together the architecture, oftentimes
we want to keep Python code as little as possible.

• But Python is used for scripting and it’s necessary to
return Python objects for the scripts.

• Pybind11 allows to write concise C++. It’s much more
maintainable than in-line Python.
22
>>> joined = cppfunction()
>>> print(joined)
one, two, three
py::str cppfunction() {
py::list lst = py::list();
lst.append("one");
lst.append("two");
lst.append("three");
return py::str(", ").attr("join")(lst);
}
// No, don't do this!
PyRun_SimpleString("', '.join('one', 'two', 'three')");
Import and run
Pybind11 API is very Pythonic
23
>>> show_modules()
1101
3000
void show_modules() {
py::module sys = py::module::import("sys");
py::print(py::len(sys.attr("modules")));
py::print(sys.attr("getrecursionlimit")());
}
def show_modules():
import sys
print(len(sys.modules))
print(sys.getrecursionlimit())
Spend time in compilation
to save time in runtime
Compiler does a lot of good. Make use of it as much as
possible.
24
template < typename T >
class SimpleArray
{
public:
using value_type = T;
using shape_type = small_vector<size_t>;
// Determine content value element size during compile time.
static constexpr size_t ITEMSIZE = sizeof(value_type);
static constexpr size_t itemsize() { return ITEMSIZE; }
// Straight constructor.
explicit SimpleArray(size_t length)
: m_buffer(ConcreteBuffer::construct(length * ITEMSIZE))
, m_shape{length}
, m_stride{1}
{}
template< class InputIt > SimpleArray(InputIt first, InputIt last)
: SimpleArray(last-first)
{
// Let STL decide how to optimize memory copy.
std::copy(first, last, data());
}
};
Compiler reduces runtime
errors
• Wrap the wrappers: reduce duplicated code

• Do it in compile time to reduce runtime errors
25
template
<
class Wrapper
, class Wrapped
, class Holder = std::unique_ptr<Wrapped>
, class WrappedBase = Wrapped
>
class WrapBase
{
public:
using wrapper_type = Wrapper;
using wrapped_type = Wrapped;
using wrapped_base_type = WrappedBase;
using holder_type = Holder;
// Treat inheritance hierarchy.
using class_ = typename std::conditional_t
<
std::is_same< Wrapped, WrappedBase >::value
, pybind11::class_
< wrapped_type, holder_type >
, pybind11::class_
<wrapped_type, wrapped_base_type, holder_type>
>;
// Singleton.
static wrapper_type & commit(pybind11::module & mod)
{
static wrapper_type derived(mod);
return derived;
}
class_ & cls() { return m_cls; }
protected:
// Register through construction.
template <typename... Extra>
WrapBase(
pybind11::module & mod
, char const * pyname, char const * pydoc
, const Extra & ... extra
)
: m_cls(mod, pyname, pydoc, extra ...)
{}
private:
class_ m_cls;
};
Decouple resource
management from algorithms
26
• Fixed-size
contiguous data
buffer
template < typename T >
class SimpleArray
{
public:
using value_type = T;
using shape_type = small_vector<size_t>;
static constexpr size_t ITEMSIZE = sizeof(value_type);
explicit SimpleArray(size_t length)
: m_buffer(ConcreteBuffer::construct(length * ITEMSIZE))
, m_shape{length}
, m_stride{1}
{}
private:
std::shared_ptr<ConcreteBuffer> m_buffer;
shape_type m_shape;
shape_type m_stride;
};
class ConcreteBuffer
: public std::enable_shared_from_this<ConcreteBuffer>
{
private:
struct ctor_passkey {};
public:
static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes)
{ return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); }
ConcreteBuffer(size_t nbytes, const ctor_passkey &)
: m_nbytes(nbytes), m_data(allocate(nbytes))
{}
private:
using unique_ptr_type =
std::unique_ptr<int8_t, std::default_delete<int8_t[]>>;
size_t m_nbytes;
unique_ptr_type m_data;
};
• Descriptive data
object owning the
data buffer
Encapsulate complex calculation
without losing performance
27
// A solution element.
class Selm : public ElementBase<Selm>
{
public:
value_type dxneg() const { return x()-xneg(); }
value_type dxpos() const { return xpos()-x(); }
value_type xctr() const { return (xneg()+xpos())/2; }
value_type const & so0(size_t iv) const { return field().so0(xindex(), iv); }
value_type & so0(size_t iv) { return field().so0(xindex(), iv); }
value_type const & so1(size_t iv) const { return field().so1(xindex(), iv); }
value_type & so1(size_t iv) { return field().so1(xindex(), iv); }
value_type const & cfl() const { return field().cfl(xindex()); }
value_type & cfl() { return field().cfl(xindex()); }
value_type xn(size_t iv) const { return field().kernel().calc_xn(*this, iv); }
value_type xp(size_t iv) const { return field().kernel().calc_xp(*this, iv); }
value_type tn(size_t iv) const { return field().kernel().calc_tn(*this, iv); }
value_type tp(size_t iv) const { return field().kernel().calc_tp(*this, iv); }
value_type so0p(size_t iv) const { return field().kernel().calc_so0p(*this, iv); }
void update_cfl() { return field().kernel().update_cfl(*this); }
};
xj− 1
2
xs
j− 1
2
xj+ 1
2
xs
j+ 1
2
x+
j− 1
2
= x−
j+ 1
2
xj = xs
j
(h∗
)n
j− 1
2 ,+
· (0, −∆x+
j− 1
2
) (h∗
)n
j+ 1
2 ,−
· (0, −∆x−
j+ 1
2
)
h
n+ 1
2
j · (0, ∆xj)
(h∗
)n,+
j− 1
2
· (−∆t
2 , 0) (h∗
)n,+
j+ 1
2
· (∆t
2 , 0)
3 solution elements around a compound conservation element
(Part of a code for the space-time conservation element and solution element (CESE) method)
Conclusion
• Speed is the king for numerical calculation

• Use Python for configuration and C++ for number
crunching: glue them

• Keep in mind: Software engineering, zero-copy data
buffer, read assembly, manage resources, use compiler
for speed and safety

• More information: https://github.com/yungyuc/nsd
28
Thank you
29

Mais conteúdo relacionado

Mais procurados

数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜
数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜
数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜aaaa mmmm
 
部内勉強会 数え上げの基礎
部内勉強会 数え上げの基礎部内勉強会 数え上げの基礎
部内勉強会 数え上げの基礎Kazuma Mikami
 
AtCoder Regular Contest 018 解説
AtCoder Regular Contest 018 解説AtCoder Regular Contest 018 解説
AtCoder Regular Contest 018 解説AtCoder Inc.
 
2部グラフの最小点被覆の求め方
2部グラフの最小点被覆の求め方2部グラフの最小点被覆の求め方
2部グラフの最小点被覆の求め方Kensuke Otsuki
 
Wavelet matrix implementation
Wavelet matrix implementationWavelet matrix implementation
Wavelet matrix implementationMITSUNARI Shigeo
 
代数方程式とガロア理論
代数方程式とガロア理論代数方程式とガロア理論
代数方程式とガロア理論Junpei Tsuji
 
CODE FESTIVAL 2015 予選A 解説
CODE FESTIVAL 2015 予選A 解説CODE FESTIVAL 2015 予選A 解説
CODE FESTIVAL 2015 予選A 解説AtCoder Inc.
 
自動定理証明の紹介
自動定理証明の紹介自動定理証明の紹介
自動定理証明の紹介Masahiro Sakai
 
Introduction to SAC(Soft Actor-Critic)
Introduction to SAC(Soft Actor-Critic)Introduction to SAC(Soft Actor-Critic)
Introduction to SAC(Soft Actor-Critic)Suhyun Cho
 
AtCoder Regular Contest 038 解説
AtCoder Regular Contest 038 解説AtCoder Regular Contest 038 解説
AtCoder Regular Contest 038 解説AtCoder Inc.
 
図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』Proktmr
 
直前合宿 講義スライド
直前合宿 講義スライド直前合宿 講義スライド
直前合宿 講義スライドtozan gezan
 
二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方
二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方
二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方Kensuke Otsuki
 
6 0 cap 006
6 0 cap 0066 0 cap 006
6 0 cap 006luisadr
 

Mais procurados (20)

Calculo sobreelvaciones y ampliaciones
Calculo  sobreelvaciones y ampliacionesCalculo  sobreelvaciones y ampliaciones
Calculo sobreelvaciones y ampliaciones
 
数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜
数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜
数学カフェAdvent calendar2017 12_18〜圏論に於ける準同型定理〜
 
目指せグラフマスター
目指せグラフマスター目指せグラフマスター
目指せグラフマスター
 
部内勉強会 数え上げの基礎
部内勉強会 数え上げの基礎部内勉強会 数え上げの基礎
部内勉強会 数え上げの基礎
 
AtCoder Regular Contest 018 解説
AtCoder Regular Contest 018 解説AtCoder Regular Contest 018 解説
AtCoder Regular Contest 018 解説
 
2部グラフの最小点被覆の求め方
2部グラフの最小点被覆の求め方2部グラフの最小点被覆の求め方
2部グラフの最小点被覆の求め方
 
Wavelet matrix implementation
Wavelet matrix implementationWavelet matrix implementation
Wavelet matrix implementation
 
双対性
双対性双対性
双対性
 
代数方程式とガロア理論
代数方程式とガロア理論代数方程式とガロア理論
代数方程式とガロア理論
 
CODE FESTIVAL 2015 予選A 解説
CODE FESTIVAL 2015 予選A 解説CODE FESTIVAL 2015 予選A 解説
CODE FESTIVAL 2015 予選A 解説
 
曲線弧長
曲線弧長曲線弧長
曲線弧長
 
自動定理証明の紹介
自動定理証明の紹介自動定理証明の紹介
自動定理証明の紹介
 
Introduction to SAC(Soft Actor-Critic)
Introduction to SAC(Soft Actor-Critic)Introduction to SAC(Soft Actor-Critic)
Introduction to SAC(Soft Actor-Critic)
 
Linear algebra
Linear algebraLinear algebra
Linear algebra
 
AtCoder Regular Contest 038 解説
AtCoder Regular Contest 038 解説AtCoder Regular Contest 038 解説
AtCoder Regular Contest 038 解説
 
図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』図と実装で理解する『木構造入門』
図と実装で理解する『木構造入門』
 
直前合宿 講義スライド
直前合宿 講義スライド直前合宿 講義スライド
直前合宿 講義スライド
 
グラフネットワーク〜フロー&カット〜
グラフネットワーク〜フロー&カット〜グラフネットワーク〜フロー&カット〜
グラフネットワーク〜フロー&カット〜
 
二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方
二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方
二部グラフの最小点被覆と最大安定集合と最小辺被覆の求め方
 
6 0 cap 006
6 0 cap 0066 0 cap 006
6 0 cap 006
 

Semelhante a Notes about moving from python to c++ py contw 2020

Threads and Callbacks for Embedded Python
Threads and Callbacks for Embedded PythonThreads and Callbacks for Embedded Python
Threads and Callbacks for Embedded PythonYi-Lung Tsai
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CSteffen Wenz
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented TechnologiesUmesh Nikam
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationGlobalLogic Ukraine
 
Pemrograman Python untuk Pemula
Pemrograman Python untuk PemulaPemrograman Python untuk Pemula
Pemrograman Python untuk PemulaOon Arfiandwi
 
Unit 2function in python.pptx
Unit 2function in python.pptxUnit 2function in python.pptx
Unit 2function in python.pptxvishnupriyapm4
 
An Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAn Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAdam Getchell
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Edureka!
 
Pres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdfPres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdfRamziFeghali
 
#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting Started#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting StartedHadziq Fabroyir
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance PythonIan Ozsvald
 
TWINS: OOP and FP - Warburton
TWINS: OOP and FP - WarburtonTWINS: OOP and FP - Warburton
TWINS: OOP and FP - WarburtonCodemotion
 
Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1ssusera7a08a
 
Revision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdfRevision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdfoptimusnotch44
 

Semelhante a Notes about moving from python to c++ py contw 2020 (20)

Threads and Callbacks for Embedded Python
Threads and Callbacks for Embedded PythonThreads and Callbacks for Embedded Python
Threads and Callbacks for Embedded Python
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
 
Object Oriented Technologies
Object Oriented TechnologiesObject Oriented Technologies
Object Oriented Technologies
 
C++ manual Report Full
C++ manual Report FullC++ manual Report Full
C++ manual Report Full
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
 
Pemrograman Python untuk Pemula
Pemrograman Python untuk PemulaPemrograman Python untuk Pemula
Pemrograman Python untuk Pemula
 
Unit 2function in python.pptx
Unit 2function in python.pptxUnit 2function in python.pptx
Unit 2function in python.pptx
 
C++ Programming
C++ ProgrammingC++ Programming
C++ Programming
 
An Overview Of Python With Functional Programming
An Overview Of Python With Functional ProgrammingAn Overview Of Python With Functional Programming
An Overview Of Python With Functional Programming
 
Lecture2.ppt
Lecture2.pptLecture2.ppt
Lecture2.ppt
 
C++ Programming
C++ ProgrammingC++ Programming
C++ Programming
 
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...Python Functions Tutorial | Working With Functions In Python | Python Trainin...
Python Functions Tutorial | Working With Functions In Python | Python Trainin...
 
Pres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdfPres_python_talakhoury_26_09_2023.pdf
Pres_python_talakhoury_26_09_2023.pdf
 
#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting Started#OOP_D_ITS - 2nd - C++ Getting Started
#OOP_D_ITS - 2nd - C++ Getting Started
 
C Programming Tutorial - www.infomtec.com
C Programming Tutorial - www.infomtec.comC Programming Tutorial - www.infomtec.com
C Programming Tutorial - www.infomtec.com
 
Euro python2011 High Performance Python
Euro python2011 High Performance PythonEuro python2011 High Performance Python
Euro python2011 High Performance Python
 
Pythonpresent
PythonpresentPythonpresent
Pythonpresent
 
TWINS: OOP and FP - Warburton
TWINS: OOP and FP - WarburtonTWINS: OOP and FP - Warburton
TWINS: OOP and FP - Warburton
 
Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1Chapter 1 Class 12 Computer Science Unit 1
Chapter 1 Class 12 Computer Science Unit 1
 
Revision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdfRevision of the basics of python1 (1).pdf
Revision of the basics of python1 (1).pdf
 

Mais de Yung-Yu Chen

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for SpeedYung-Yu Chen
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++Yung-Yu Chen
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a jobYung-Yu Chen
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of pythonYung-Yu Chen
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from PythonYung-Yu Chen
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeYung-Yu Chen
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for SpeedYung-Yu Chen
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computingYung-Yu Chen
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering SoftwareYung-Yu Chen
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational WorkYung-Yu Chen
 

Mais de Yung-Yu Chen (10)

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of python
 
Take advantage of C++ from Python
Take advantage of C++ from PythonTake advantage of C++ from Python
Take advantage of C++ from Python
 
Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New Rope
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational Work
 

Último

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Velocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.pptVelocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.pptRakeshMohan42
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curveAreesha Ahmad
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 

Último (20)

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Velocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.pptVelocity and Acceleration PowerPoint.ppt
Velocity and Acceleration PowerPoint.ppt
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 

Notes about moving from python to c++ py contw 2020

  • 1. Notes about moving from Python to C++ Yung-Yu Chen (@yungyuc) PyConTW 2020, Tainan, Taiwan 1
  • 2. Why move? • Numerical software: Computer programs to solve scientific or mathematic problems. • Other names: Mathematical software, scientific software, technical software. • Python is a popular language for application experts to describe the problems and solutions, because it is easy to use. • Most of the computing systems (the numerical software) are designed in a hybrid architecture. • The computing kernel uses C++. • Python is chosen for the user-level API. 2
  • 3. What I do • By training, I am a computational scientist, focusing on applications of continuum mechanics. (I am not a computer scientist.) • In my day job, I write high-performance code for semiconductor applications of computational geometry and photolithography. • In my spare time, I am teaching a course ‘numerical software development’ in the department of computer science in NCTU. 3 twitter: https://twitter.com/yungyuc
 linkedin: https://www.linkedin.com/in/yungyuc/.
  • 4. Example: PDEs 4 Numerical simulations of conservation laws: ∂u ∂t + 3 ∑ k=1 ∂F(k) (u) ∂xk = 0 Use case: stress waves in 
 anisotropic solids Use case: compressible flows
  • 5. Example: OPC 5 photoresist silicon substrate photomask light source Photolithography in semiconductor fabrication wave length is only hundreds of nm image I want to project on the PR shape I need on the mask Optical proximity correction (OPC) (smaller than the wave length) write code to make it happen
  • 6. Python: the good is the bad • Dynamicity enables convention over compilation for productivity • It is so when you have good testing and documentation import math class Point: def __init__(self, x, y): self.x = x self.y = y @property def length(self): return math.hypot(self.x, self.y) 5.0print(Point(3, 4).length) print(Point("3", "4").length) Traceback (most recent call last): File "ambiguity.py", line 10, in <module> print(Point("3", "4").length) File "ambiguity.py", line 8, in length return math.hypot(self.x, self.y) TypeError: a float is required 6
  • 7. What is fast? worker(Data&, unsigned long, int): mov DWORD PTR [rdi+rsi*4], edx mov rax, rdi ret class Data { int m_storage[1024]; public: int & operator[] (size_t it) { return m_storage[it]; } }; Data & worker(Data & data, size_t idx, int value) { data[idx] = value; return data; } High-level language Machine code 7
  • 8. Dynamic typing is slow • Python trades runtime for convenience; everything is indirect and incurs runtime overhead def work(data, key, name, value): # Assign value to the data as a container with key. data[key] = value # Assign value to the data as an object with name. setattr(data, name, value) return data 8 • The innocent-looking Python code uses many checking functions under the hood
  • 9. int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value) { PyMappingMethods *m; if (o == NULL || key == NULL || value == NULL) { null_error(); return -1; } m = o->ob_type->tp_as_mapping; if (m && m->mp_ass_subscript) return m->mp_ass_subscript(o, key, value); if (o->ob_type->tp_as_sequence) { if (PyIndex_Check(key)) { Py_ssize_t key_value; key_value = PyNumber_AsSsize_t(key, PyExc_IndexError); if (key_value == -1 && PyErr_Occurred()) return -1; return PySequence_SetItem(o, key_value, value); } else if (o->ob_type->tp_as_sequence->sq_ass_item) { type_error(/* ... omit ... */); return -1; } } type_error(/* ... omit ... */); return -1; } int PyObject_SetAttr(PyObject *v, PyObject *name, PyObject *value) { PyTypeObject *tp = Py_TYPE(v); int err; if (!PyUnicode_Check(name)) { PyErr_Format(PyExc_TypeError, /* ... omit ... */); return -1; } Py_INCREF(name); PyUnicode_InternInPlace(&name); if (tp->tp_setattro != NULL) { err = (*tp->tp_setattro)(v, name, value); Py_DECREF(name); return err; } if (tp->tp_setattr != NULL) { const char *name_str = PyUnicode_AsUTF8(name); if (name_str == NULL) { Py_DECREF(name); return -1; } err = (*tp->tp_setattr)(v, (char *)name_str, value); Py_DECREF(name); return err; } Py_DECREF(name); _PyObject_ASSERT(name, name->ob_refcnt >= 1); if (tp->tp_getattr == NULL && tp->tp_getattro == NULL) PyErr_Format(PyExc_TypeError, /* ... omit ... */); else PyErr_Format(PyExc_TypeError, /* ... omit ... */); return -1; } data[key] = value setattr(data, name, value) (in Objects/abstract.c) (in Objects/object.c) static PyObject *__pyx_pf_6simple_work( PyObject *__pyx_v_data, PyObject *__pyx_v_key , PyObject *__pyx_v_name, PyObject *__pyx_v_value ) { PyObject_SetItem(__pyx_v_data, __pyx_v_key, __pyx_v_value); PyObject_SetAttr(__pyx_v_data, __pyx_v_name, __pyx_v_value); return __pyx_v_data; } def work(data, key, name, value): data[key] = value setattr(data, name, value) return data mapping sequence attribute name check check to set 9
  • 10. Speed is the king To pursue high-performance and flexibility simultaneously, the systems need to use the best tools at the proper layers. This is a path to develop such systems. 10 Python controls execution flow C++ manages resources Use C++ to generate fast assembly code Glue Python and C++
  • 11. PyObject uses malloc PyObject_MALLOC() is a wrapper to the standard C memory allocator malloc(). 11 PyObject * _PyObject_New(PyTypeObject *tp) { PyObject *op; op = (PyObject *) PyObject_MALLOC(_PyObject_SIZE(tp)); if (op == NULL) return PyErr_NoMemory(); return PyObject_INIT(op, tp); }
  • 12. Python call stack • Python maintains its own stack • It’s different from that of the ‘real machine’ • The two (C/C++ and Python) call stacks work very differently 12 #0 line 11: show_stack #1 line 23: caller #2 line 25: <module>: top-level module local_val is: defined outside import inspect def show_stack(): # Print calling stacks. f = inspect.currentframe() i = 0 while f: name = f.f_code.co_name if '<module>' == name: name += ': top-level module' print("#%d line %d: %s" % (i, f.f_lineno, name)) f = f.f_back i += 1 # Obtain a local variable defined in an outside stack. f = inspect.currentframe().f_back print('local_val is:', f.f_locals['local_var']) del f def caller(): local_var = "defined outside" show_stack() caller()
  • 13. Best practice: glue • For performance and compile-time checking, we do not want to use Python • For flow control and configuration, we do not want to use C++ • Glue tools: • Python C API : https://docs.python.org/3/c-api/index.html • Cython : https://cython.org/ • Boost.Python : https://www.boost.org/doc/libs/1_74_0/libs/python/ doc/html/index.html • Pybind11 : https://github.com/pybind/pybind11 13 I only listed tools that require manual generation of wrapping code and compilation.
  • 14. Pick pybind11 14 Tools Pros Cons Python C API always available manual reference count is hard to get right Cython easy to use no comprehensive C++ support Boost.Python comprehensive C++ support requires boost Pybind11 easy-to-use comprehensive modern C++ support require C++11
  • 15. Build system: cmake • Again, there are many tools for build systems • Plain make • Python distutils, setuptools, pip, etc. • cmake • cmake: • Cross platform • Consistent and up-to-date support 15
  • 16. How cmake works 16 #include <pybind11/pybind11.h> #include <pybind11/stl.h> #include <cmath> std::vector<double> distance( std::vector<double> const & x , std::vector<double> const & y) { std::vector<double> r(x.size()); for (size_t i = 0 ; i < x.size() ; ++i) { r[i] = std::hypot(x.at(i), y.at(i)); } return r; } PYBIND11_MODULE(_pybmod, mod) { namespace py = pybind11; mod.doc() = "simple pybind11 module"; mod.def("distance", &distance); } cmake_minimum_required(VERSION 3.9) project(pybmod) find_package(pybind11 REQUIRED) include_directories(${pybind11_INCLUDE_DIRS}) pybind11_add_module( _pybmod pybmod.cpp ) target_link_libraries(_pybmod PRIVATE ${MKL_LINKLINE}) set_target_properties(_pybmod PROPERTIES CXX_VISIBILITY_PRESET "default") add_custom_target(_pybmod_py COMMAND ${CMAKE_COMMAND} -E copy $<TARGET_FILE:_pybmod> ${PROJECT_SOURCE_DIR} DEPENDS _pybmod) $ mkdir -p build ; cd build $ cmake .. -DCMAKE_BUILD_TYPE=Release $ make -C build _pybmod_py $ python3 -c 'import _pybmod ; print(_pybmod.distance([3, 8], [4, 6]))' [5.0, 10.0] CMakeLists.txt pybmod.cpp
  • 17. Zero-copy: do it where it fits Python app C++ app C++ container Ndarray manage access Python app C++ app C++ container Ndarray manage accessa11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory buffer shared across language memory buffer shared across language Top (Python) - down (C++) Bottom (C++) - up (Python) Python app C++ app a11 a12 ⋯ a1n a21 ⋯ am1 ⋯ amn memory buffer shared across language Ndarray C++ container 17 Not recom m ended for performance
  • 18. Share buffer from C++ Two handy approaches by pybind11 18 cls .def_buffer ( [](wrapped_type & self) { namespace py = pybind11; return py::buffer_info ( self.data() /* Pointer to buffer */ , sizeof(int8_t) /* Size of one scalar */ , py::format_descriptor<char>::format() /* Python struct-style format descriptor */ , 1 /* Number of dimensions */ , { self.size() } /* Buffer dimensions */ , { 1 } /* Strides (in bytes) for each index */ ); } ) cls .def_property_readonly ( "ndarray" , [](wrapped_type & self) { namespace py = pybind11; return py::array ( py::detail::npy_format_descriptor<int8_t>::dtype() /* Numpy dtype */ , { self.size() } /* Buffer dimensions */ , { 1 } /* Strides (in bytes) for each index */ , self.data() /* Pointer to buffer */ , py::cast(self.shared_from_this()) /* Owning Python object */ ); } ) .def_property_readonly(“ndarray2”, /* ... */) Return a distinct numpy ndarray: • The C++ class can return multiple buffers Python buffer protocol: • The C++ class itself is turned into an array description
  • 19. Check assembly • Make sure compilers generate good code for performance hotspot • Checking instead of writing: we don’t have time to use assembly for everything • If compilers don’t do a good job, before jumping into hand-written assembly, consider to use intrinsics: https://software.intel.com/ sites/landingpage/IntrinsicsGuide/ 19 vmovupd ymm0, ymmword [rsi + r9*8] vmulpd ymm0, ymm0, ymm0 vmovupd ymm1, ymmword [rdx + r9*8] vmulpd ymm1, ymm1, ymm1 vaddpd ymm0, ymm0, ymm1 vsqrtpd ymm0, ymm0 void calc_distance( size_t const n , double const * x , double const * y , double * r) { for (size_t i = 0 ; i < n ; ++i) { r[i] = std::sqrt(x[i]*x[i] + y[i]*y[i]); } } movupd xmm0, xmmword [rsi + r8*8] mulpd xmm0, xmm0 movupd xmm1, xmmword [rdx + r8*8] mulpd xmm1, xmm1 addpd xmm1, xmm0 sqrtpd xmm0, xmm1 -O3 -mavx2 -O3 # Use redare2 to print the assembly. r2 -Aqc "e scr.color=0 ; s sym.calc_distance_unsignedlong_doubleconst__doubleconst__double ; pdf" _pybmod.cpython-37m-darwin.so AVX: 256-bit-wide vectorization SSE: 128-bit-wide vectorization
  • 20. Manage resources • Python uses reference count to manage object lifecycle. You can do the in C++ by using shared pointer. 20 • Reference counting involves a lock and is slow. You didn’t experience it in Python because (i) other parts in Python are even slower and (ii) you didn’t have a choice. class ConcreteBuffer : public std::enable_shared_from_this<ConcreteBuffer> { private: struct ctor_passkey {}; public: static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes) { return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); } ConcreteBuffer(size_t nbytes, const ctor_passkey &) : m_nbytes(nbytes) , m_data(allocate(nbytes)) {} };
  • 21. Translate reference count • If reference count is used in C++, make sure it is correctly translated between Python. 21 pybind11::class_ < ConcreteBuffer , std::shared_ptr<ConcreteBuffer> >( mod, "ConcreteBuffer" , "Contiguous memory buffer" ); class ConcreteBuffer : public std::enable_shared_from_this<ConcreteBuffer> { private: struct ctor_passkey {}; public: static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes) { return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); } ConcreteBuffer(size_t nbytes, const ctor_passkey &) : m_nbytes(nbytes) , m_data(allocate(nbytes)) {} }; • If reference count is used in C++, make sure it is correctly translated between Python.
  • 22. Run Python in C++ • When C++ code puts together the architecture, oftentimes we want to keep Python code as little as possible. • But Python is used for scripting and it’s necessary to return Python objects for the scripts. • Pybind11 allows to write concise C++. It’s much more maintainable than in-line Python. 22 >>> joined = cppfunction() >>> print(joined) one, two, three py::str cppfunction() { py::list lst = py::list(); lst.append("one"); lst.append("two"); lst.append("three"); return py::str(", ").attr("join")(lst); } // No, don't do this! PyRun_SimpleString("', '.join('one', 'two', 'three')");
  • 23. Import and run Pybind11 API is very Pythonic 23 >>> show_modules() 1101 3000 void show_modules() { py::module sys = py::module::import("sys"); py::print(py::len(sys.attr("modules"))); py::print(sys.attr("getrecursionlimit")()); } def show_modules(): import sys print(len(sys.modules)) print(sys.getrecursionlimit())
  • 24. Spend time in compilation to save time in runtime Compiler does a lot of good. Make use of it as much as possible. 24 template < typename T > class SimpleArray { public: using value_type = T; using shape_type = small_vector<size_t>; // Determine content value element size during compile time. static constexpr size_t ITEMSIZE = sizeof(value_type); static constexpr size_t itemsize() { return ITEMSIZE; } // Straight constructor. explicit SimpleArray(size_t length) : m_buffer(ConcreteBuffer::construct(length * ITEMSIZE)) , m_shape{length} , m_stride{1} {} template< class InputIt > SimpleArray(InputIt first, InputIt last) : SimpleArray(last-first) { // Let STL decide how to optimize memory copy. std::copy(first, last, data()); } };
  • 25. Compiler reduces runtime errors • Wrap the wrappers: reduce duplicated code • Do it in compile time to reduce runtime errors 25 template < class Wrapper , class Wrapped , class Holder = std::unique_ptr<Wrapped> , class WrappedBase = Wrapped > class WrapBase { public: using wrapper_type = Wrapper; using wrapped_type = Wrapped; using wrapped_base_type = WrappedBase; using holder_type = Holder; // Treat inheritance hierarchy. using class_ = typename std::conditional_t < std::is_same< Wrapped, WrappedBase >::value , pybind11::class_ < wrapped_type, holder_type > , pybind11::class_ <wrapped_type, wrapped_base_type, holder_type> >; // Singleton. static wrapper_type & commit(pybind11::module & mod) { static wrapper_type derived(mod); return derived; } class_ & cls() { return m_cls; } protected: // Register through construction. template <typename... Extra> WrapBase( pybind11::module & mod , char const * pyname, char const * pydoc , const Extra & ... extra ) : m_cls(mod, pyname, pydoc, extra ...) {} private: class_ m_cls; };
  • 26. Decouple resource management from algorithms 26 • Fixed-size contiguous data buffer template < typename T > class SimpleArray { public: using value_type = T; using shape_type = small_vector<size_t>; static constexpr size_t ITEMSIZE = sizeof(value_type); explicit SimpleArray(size_t length) : m_buffer(ConcreteBuffer::construct(length * ITEMSIZE)) , m_shape{length} , m_stride{1} {} private: std::shared_ptr<ConcreteBuffer> m_buffer; shape_type m_shape; shape_type m_stride; }; class ConcreteBuffer : public std::enable_shared_from_this<ConcreteBuffer> { private: struct ctor_passkey {}; public: static std::shared_ptr<ConcreteBuffer> construct(size_t nbytes) { return std::make_shared<ConcreteBuffer>(nbytes, ctor_passkey()); } ConcreteBuffer(size_t nbytes, const ctor_passkey &) : m_nbytes(nbytes), m_data(allocate(nbytes)) {} private: using unique_ptr_type = std::unique_ptr<int8_t, std::default_delete<int8_t[]>>; size_t m_nbytes; unique_ptr_type m_data; }; • Descriptive data object owning the data buffer
  • 27. Encapsulate complex calculation without losing performance 27 // A solution element. class Selm : public ElementBase<Selm> { public: value_type dxneg() const { return x()-xneg(); } value_type dxpos() const { return xpos()-x(); } value_type xctr() const { return (xneg()+xpos())/2; } value_type const & so0(size_t iv) const { return field().so0(xindex(), iv); } value_type & so0(size_t iv) { return field().so0(xindex(), iv); } value_type const & so1(size_t iv) const { return field().so1(xindex(), iv); } value_type & so1(size_t iv) { return field().so1(xindex(), iv); } value_type const & cfl() const { return field().cfl(xindex()); } value_type & cfl() { return field().cfl(xindex()); } value_type xn(size_t iv) const { return field().kernel().calc_xn(*this, iv); } value_type xp(size_t iv) const { return field().kernel().calc_xp(*this, iv); } value_type tn(size_t iv) const { return field().kernel().calc_tn(*this, iv); } value_type tp(size_t iv) const { return field().kernel().calc_tp(*this, iv); } value_type so0p(size_t iv) const { return field().kernel().calc_so0p(*this, iv); } void update_cfl() { return field().kernel().update_cfl(*this); } }; xj− 1 2 xs j− 1 2 xj+ 1 2 xs j+ 1 2 x+ j− 1 2 = x− j+ 1 2 xj = xs j (h∗ )n j− 1 2 ,+ · (0, −∆x+ j− 1 2 ) (h∗ )n j+ 1 2 ,− · (0, −∆x− j+ 1 2 ) h n+ 1 2 j · (0, ∆xj) (h∗ )n,+ j− 1 2 · (−∆t 2 , 0) (h∗ )n,+ j+ 1 2 · (∆t 2 , 0) 3 solution elements around a compound conservation element (Part of a code for the space-time conservation element and solution element (CESE) method)
  • 28. Conclusion • Speed is the king for numerical calculation • Use Python for configuration and C++ for number crunching: glue them • Keep in mind: Software engineering, zero-copy data buffer, read assembly, manage resources, use compiler for speed and safety • More information: https://github.com/yungyuc/nsd 28