Take advantage of C++ from Python
Yung-Yu Chen
PyCon Kyushu
30th June 2018
Why C++
❖ Python is slow
❖ Everything is on the heap
❖ Types are always dynamic
❖ Hard to access assembly
❖ Convoluted algorithms with ndarray
❖ Access external code written in any language
❖ Detailed control and abstraction
Hard problems take time
• Supersonic jet in cross flow; density contour
• 264 cores for 53 hours: 1.3 B variables (66 M elements) over 12,000 time steps
• At OSC, 2011 (10 Gbps InfiniBand)
HPC (high-performance computing) is hard. Physics is harder. Don’t mingle.
Best of both worlds
❖ C++: fast runtime, strong static type checking, industrial grade
❖ Slow to code
❖ Python: fast prototyping, batteries included, easy to use
❖ Slow to run
❖ Hybrid systems are everywhere.
❖ TensorFlow, Blender, OpenCV, etc.
❖ C++ crunches numbers. Python controls the flow.
❖ Applications work like libraries, libraries like applications.
pybind11
❖ https://github.com/pybind/pybind11: built on C++11
❖ Expose C++ entities to Python
❖ Use Python from C++
❖ list, tuple, dict, and str
❖ handle, object, and none
C++11(/14/17/20)
New language features: auto and decltype, defaulted and deleted
functions, final and override, trailing return type, rvalue references,
move constructors/move assignment, scoped enums, constexpr and
literal types, list initialization, delegating and inherited constructors,
brace-or-equal initializers, nullptr, long long, char16_t and char32_t,
type aliases, variadic templates, generalized unions, generalized
PODs, Unicode string literals, user-defined literals, attributes,
lambda expressions, noexcept, alignof and alignas, multithreaded
memory model, thread-local storage, GC interface, range for (based
on a Boost library), static assertions (based on a Boost library)
http://en.cppreference.com/w/cpp/language/history
Python’s friends
❖ Shared pointer: manage resource ownership between
C++ and Python
❖ Move semantics: speed
❖ Lambda expressions: simplify the wrapping code
Ownership
❖ All Python objects are dynamically allocated on the
heap. Python uses reference counting to know who
should deallocate the object when it is no longer used.
❖ An owner of a reference to an object is responsible for
releasing it. With multiple owners, each one simply
decrements the count by 1 when it is done; the last owner
(the one that drops the count from 1 to 0) triggers the
destructor and the deallocation.
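C++'s `std::shared_ptr` follows the same counting rule, which is what makes it a natural partner for Python objects. A minimal sketch using only the standard library; `Tracked` and `ownership_demo` are hypothetical names made up for illustration. It records the owner count after each step and checks that the object is gone once the last owner lets go.

```cpp
#include <memory>
#include <vector>

// Hypothetical type that counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Records the owner count after each step, then the number of live objects.
std::vector<long> ownership_demo() {
    std::vector<long> counts;
    std::shared_ptr<Tracked> first = std::make_shared<Tracked>();
    counts.push_back(first.use_count());          // 1 owner
    {
        std::shared_ptr<Tracked> second = first;  // copy: a second owner
        counts.push_back(first.use_count());      // 2 owners
    }                                             // second goes away here
    counts.push_back(first.use_count());          // back to 1 owner
    first.reset();                                // last owner lets go: object destroyed
    counts.push_back(Tracked::alive);             // 0 live objects
    return counts;
}
```

Calling `ownership_demo()` should yield the owner counts 1, 2, 1 followed by 0 live objects.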
Shared pointer
#include <cstddef>
#include <iostream>
#include <memory>
#include <numeric> // std::accumulate lives here, not in <algorithm>
#include <vector>
class Series {
std::vector<int> m_data;
public:
int sum() const {
const int ret = std::accumulate(
m_data.begin(), m_data.end(), 0);
std::cout << "Series::sum() = " << ret << std::endl;
return ret;
}
static size_t count;
Series(size_t size, int lead) : m_data(size) {
for (size_t it=0; it<size; it++) { m_data[it] = lead+it; }
count++;
}
~Series() { count--; }
};
size_t Series::count = 0;
void use_raw_pointer() {
Series * series_ptr = new Series(10, 2);
series_ptr->sum(); // call member function
// OUT: Series::sum() = 65
// remember to delete the object or we leak memory
std::cout << "before explicit deletion, Series::count = "
<< Series::count << std::endl;
// OUT: before explicit deletion, Series::count = 1
delete series_ptr;
std::cout << "after the resource is manually freed, Series::count = "
<< Series::count << std::endl;
// OUT: after the resource is manually freed, Series::count = 0
}
void use_shared_pointer() {
auto series_sptr = std::make_shared<Series>(10, 3); // one allocation for object and control block
series_sptr->sum(); // call member function
// OUT: Series::sum() = 75
// note shared_ptr handles deletion for series_sptr
}
int main(int argc, char ** argv) {
// the common raw pointer
use_raw_pointer();
// now, shared_ptr
use_shared_pointer();
std::cout << "no memory leak: Series::count = "
<< Series::count << std::endl;
// OUT: no memory leak: Series::count = 0
return 0;
}
Move semantics
❖ Number-crunching code needs large arrays as memory buffers.
They aren’t supposed to be copied frequently.
❖ A 50,000 × 50,000 array of 8-byte doubles takes 20 GB.
❖ Shared pointers should manage large chunks of memory.
❖ New reference to an object: copy constructor of shared pointer
❖ Borrowed reference to an object: const reference to the shared
pointer
❖ Stolen reference to an object: move constructor of shared
pointer
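The three kinds of reference map onto ordinary C++ parameter passing. A sketch under the assumption that the large buffer is a `std::vector<double>` held by a `std::shared_ptr`; `Array`, `borrow`, `share` and `Holder` are hypothetical names. In none of the three cases are the array elements copied; only the owner count may change.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a large number-crunching buffer.
using Array = std::vector<double>;

// Borrowed reference: a const reference to the shared_ptr; the count is untouched.
long borrow(const std::shared_ptr<Array> & arr) { return arr.use_count(); }

// New reference: passing by value copies the shared_ptr and adds an owner.
long share(std::shared_ptr<Array> arr) { return arr.use_count(); }

// Stolen reference: the caller's shared_ptr is moved in; ownership is
// transferred without touching the count, and the caller's pointer becomes empty.
struct Holder {
    std::shared_ptr<Array> data;
    explicit Holder(std::shared_ptr<Array> && arr) : data(std::move(arr)) {}
};
```

Passing by const reference is the cheapest option when the callee only reads; moving is the way to hand a buffer over without a count update.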
Lambda
❖ Put the code right where it is used
namespace py = pybind11;
auto cls = py::class_< wrapped_type, holder_type >(mod, pyname, clsdoc);
cls
.def(
py::init([](block_type & block, index_type icl, bool init_sentinel) {
return wrapped_type(block, icl, init_sentinel);
}),
py::arg("block"), py::arg("icl"), py::arg("init_sentinel")=true
)
.def("repr", &wrapped_type::repr, py::arg("indent")=0, py::arg("precision")=0)
.def("__repr__", [](wrapped_type & self){ return self.repr(); })
.def("init_sentinel", &wrapped_type::init_sentinel)
.def_readwrite("cnd", &wrapped_type::cnd)
.def_readwrite("vol", &wrapped_type::vol)
.def_property_readonly(
"nbce",
[](wrapped_type & self) { return self.bces.size(); }
)
.def(
"get_bce",
[](wrapped_type & self, index_type ibce) { return self.bces.at(ibce); }
)
;
Lambda, cont’d
❖ Code as free as Python, as fast as C
#include <unordered_map>
#include <functional>
#include <cstdio>
int main(int argc, char ** argv) {
// Python: fmap = dict()
std::unordered_map<int, std::function<void(int)>> fmap;
// Python: fmap[1] = lambda v: print("v = %d" % v)
fmap.insert({
1, [](int v) -> void { std::printf("v = %d\n", v); }
});
// Python: fmap[5] = lambda v: print("v*5 = %d" % (v*5))
fmap.insert({
5, [](int v) -> void { std::printf("v*5 = %d\n", v*5); }
});
std::unordered_map<int, std::function<void(int)>>::iterator search;
// Python: fmap[1](100)
search = fmap.find(1);
search->second(100);
// OUT: v = 100
// Python: fmap[5](500)
search = fmap.find(5);
search->second(500);
// OUT: v*5 = 2500
return 0;
}
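Python lambdas see enclosing variables implicitly, while C++ lambdas list what they capture. A minimal sketch, assuming C++14 for the init-capture; `make_counter` and `run_table` are hypothetical names. Capturing by value gives each closure its own state, and capturing by reference lets several entries of a `std::function` table share one variable, which is only safe while that variable outlives the table.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Each call creates a closure with its own copy of the counter (captured by
// value; `mutable` lets the closure update its copy).
std::function<int()> make_counter(int start) {
    return [count = start]() mutable { return ++count; };
}

// Several table entries capture the same local by reference. This is safe
// only because the table does not outlive `total`.
int run_table() {
    int total = 0;
    std::unordered_map<std::string, std::function<void(int)>> table;
    table["add"] = [&total](int v) { total += v; };
    table["sub"] = [&total](int v) { total -= v; };
    table.at("add")(10);
    table.at("sub")(3);
    return total;
}
```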
Manipulate Python
❖ Don’t mingle Python with C++
❖ Python has a GIL (global interpreter lock)
❖ Don’t include Python.h if you don’t intend to run
Python
❖ Once it enters your core, it is hard to get it out
#include <Python.h> // anti-pattern: the core now depends on the Python C API
class Core {
private:
int m_value;
PyObject * m_pyobject;
};
Do it in the wrapping layer
cls
.def(
py::init([](py::object pyblock) {
block_type * block = py::cast<block_type *>(pyblock.attr("_ustblk"));
std::shared_ptr<wrapped_type> svr = wrapped_type::construct(block->shared_from_this());
for (auto bc : py::list(pyblock.attr("bclist"))) {
std::string name = py::str(bc.attr("__class__").attr("__name__").attr("lstrip")("GasPlus"));
BoundaryData * data = py::cast<BoundaryData *>(bc.attr("_data"));
std::unique_ptr<gas::TrimBase<NDIM>> trim;
if ("Interface" == name) {
trim = make_unique<gas::TrimInterface<NDIM>>(*svr, *data);
} else if ("NoOp" == name) {
trim = make_unique<gas::TrimNoOp<NDIM>>(*svr, *data);
} else if ("NonRefl" == name) {
trim = make_unique<gas::TrimNonRefl<NDIM>>(*svr, *data);
} else if ("SlipWall" == name) {
trim = make_unique<gas::TrimSlipWall<NDIM>>(*svr, *data);
} else if ("Inlet" == name) {
trim = make_unique<gas::TrimInlet<NDIM>>(*svr, *data);
} else {
/* do nothing for now */ // throw std::runtime_error("BC type unknown");
}
svr->trims().push_back(std::move(trim));
}
if (report_interval) { svr->make_qty(); }
return svr;
}),
py::arg("block")
);
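The `if`/`else` chain that maps a boundary-condition class name to a trim object could also be written as a lookup table of factory lambdas, in the style of the `std::unordered_map` example earlier. A sketch with hypothetical stand-in types (`Trim`, `InletTrim`, `WallTrim`, `make_trim`); it uses neither pybind11 nor the real `gas::Trim*` classes, and, like the original, it yields an empty pointer for an unknown name.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for a boundary-condition ("trim") hierarchy.
struct Trim {
    virtual ~Trim() = default;
    virtual std::string kind() const = 0;
};
struct InletTrim : Trim { std::string kind() const override { return "Inlet"; } };
struct WallTrim : Trim { std::string kind() const override { return "SlipWall"; } };

using TrimFactory = std::function<std::unique_ptr<Trim>()>;

// Maps a class name to a factory; an unknown name yields an empty pointer.
std::unique_ptr<Trim> make_trim(const std::string & name) {
    static const std::unordered_map<std::string, TrimFactory> registry = {
        {"Inlet", [] { return std::unique_ptr<Trim>(new InletTrim()); }},
        {"SlipWall", [] { return std::unique_ptr<Trim>(new WallTrim()); }},
    };
    auto found = registry.find(name);
    if (found == registry.end()) { return nullptr; }
    return found->second();
}
```

With this shape, supporting another boundary condition means adding one table entry rather than another branch.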
pybind11::list
❖ Read a list and cast contents:
❖ Populate:
#include <pybind11/pybind11.h> // must be first
#include <string>
#include <iostream>
namespace py = pybind11;
PYBIND11_MODULE(_pylist, mod) {
mod.def(
"do",
[](py::list & l) {
// convert contents to std::string and send to cout
std::cout << "std::cout:" << std::endl;
for (py::handle o : l) {
std::string s = py::cast<std::string>(o);
std::cout << s << std::endl;
}
}
);
mod.def(
"do2",
[](py::list & l) {
// create a new list
std::cout << "py::print:" << std::endl;
py::list l2;
for (py::handle o : l) {
std::string s = py::cast<std::string>(o);
s = "elm:" + s;
py::str s2(s);
l2.append(s2); // populate contents
}
py::print(l2);
}
);
} /* end PYBIND11_MODULE(_pylist) */
>>> import _pylist
>>> # print the input list
>>> _pylist.do(["a", "b", "c"])
std::cout:
a
b
c
>>> _pylist.do2(["d", "e", "f"])
py::print:
['elm:d', 'elm:e', 'elm:f']
pybind11::tuple
❖ A tuple is immutable, and therefore read-only. It is constructed from another iterable object.
❖ Read the contents of a tuple:
#include <pybind11/pybind11.h> // must be first
#include <vector>
namespace py = pybind11;
PYBIND11_MODULE(_pytuple, mod) {
mod.def(
"do",
[](py::args & args) {
// build a list using py::list::append
py::list l;
for (py::handle h : args) {
l.append(h);
}
// convert it to a tuple
py::tuple t(l);
// print it out
py::print(py::str("{} len={}").format(t, t.size()));
// print the element one by one
for (size_t it=0; it<t.size(); ++it) {
py::print(py::str("{}").format(t[it]));
}
}
);
} /* end PYBIND11_MODULE(_pytuple) */
>>> import _pytuple
>>> _pytuple.do("a", 7, 5.6)
('a', 7, 5.6) len=3
a
7
5.6
pybind11::dict
❖ The dictionary is one of the most useful containers in Python.
❖ Populate a dictionary:
❖ Manipulate it:
#include <pybind11/pybind11.h> // must be first
#include <string>
#include <stdexcept>
#include <iostream>
namespace py = pybind11;
PYBIND11_MODULE(_pydict, mod) {
mod.def(
"do",
[](py::args & args) {
if (args.size() % 2 != 0) {
throw std::runtime_error("argument number must be even");
}
// create a dict from the input tuple
py::dict d;
for (size_t it=0; it<args.size(); it+=2) {
d[args[it]] = args[it+1];
}
return d;
}
);
mod.def(
"do2",
[](py::dict d, py::args & args) {
for (py::handle h : args) {
if (d.contains(h)) {
std::cout << py::cast<std::string>(h)
<< " is in the input dictionary" << std::endl;
} else {
std::cout << py::cast<std::string>(h)
<< " is not found in the input dictionary" << std::endl;
}
}
std::cout << "remove everything in the input dictionary!" << std::endl;
d.clear();
return d;
}
);
} /* end PYBIND11_MODULE(_pydict) */
>>> import _pydict
>>> d = _pydict.do("a", 7, "b", "name", 10, 4.2)
>>> print(d)
{'a': 7, 'b': 'name', 10: 4.2}
>>> d2 = _pydict.do2(d, "b", "d")
b is in the input dictionary
d is not found in the input dictionary
remove everything in the input dictionary!
>>> print("The returned dictionary is empty:", d2)
The returned dictionary is empty: {}
>>> print("The first dictionary becomes empty too:", d)
The first dictionary becomes empty too: {}
>>> print("Are the two dictionaries the same?", d2 is d)
Are the two dictionaries the same? True
pybind11::str
❖ One more trick with Python strings in pybind11: the `_s` user-defined literal:
#include <pybind11/pybind11.h> // must be first
#include <iostream>
namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal
PYBIND11_MODULE(_pystr, mod) {
mod.def(
"do",
[]() {
py::str s("python string {}"_s.format("formatting"));
py::print(s);
}
);
} /* end PYBIND11_MODULE(_pystr) */
>>> import _pystr
>>> _pystr.do()
python string formatting
Generic Python objects
❖ Pybind11 defines two generic types for representing
Python objects:
❖ “handle”: base class of all pybind11 classes for Python
types
❖ “object” derives from handle and adds automatic
reference counting
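The same split exists in plain C++ as a non-owning pointer versus an RAII owner. The sketch below imitates it with a fake reference count instead of a real `PyObject`; all names are hypothetical and this is not pybind11's implementation, only the same rule: copying or destroying a handle leaves the count alone, while constructing or destroying an object adjusts it.

```cpp
// Hypothetical stand-in for a reference-counted Python object.
struct FakeObject {
    long refcnt = 1;
};

// Non-owning view: copying or destroying it never touches the count.
class Handle {
public:
    explicit Handle(FakeObject * ptr) : m_ptr(ptr) {}
    FakeObject * ptr() const { return m_ptr; }
    void inc_ref() const { ++m_ptr->refcnt; }
    void dec_ref() const { --m_ptr->refcnt; }
protected:
    FakeObject * m_ptr;
};

// Owning reference: construction adds an owner, destruction removes it.
class Object : public Handle {
public:
    explicit Object(const Handle & h) : Handle(h) { inc_ref(); }
    Object(const Object & other) : Handle(other) { inc_ref(); }
    Object & operator=(const Object &) = delete;  // left out to keep the sketch short
    ~Object() { dec_ref(); }
};
```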
pybind11::handle and object
#include <pybind11/pybind11.h> // must be first
#include <iostream>
namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal
PYBIND11_MODULE(_pyho, mod) {
mod.def(
"do",
[](py::object const & o) {
std::cout << "refcount in the beginning: "
<< o.ptr()->ob_refcnt << std::endl;
py::handle h(o);
std::cout << "no increase of refcount with a new pybind11::handle: "
<< h.ptr()->ob_refcnt << std::endl;
{
py::object o2(o);
std::cout << "increased refcount with a new pybind11::object: "
<< o2.ptr()->ob_refcnt << std::endl;
}
std::cout << "decreased refcount after the new pybind11::object destructed: "
<< o.ptr()->ob_refcnt << std::endl;
h.inc_ref();
std::cout << "manually increases refcount after h.inc_ref(): "
<< h.ptr()->ob_refcnt << std::endl;
h.dec_ref();
std::cout << "manually decreases refcount after h.dec_ref(): "
<< h.ptr()->ob_refcnt << std::endl;
}
);
} /* end PYBIND11_MODULE(_pyho) */
>>> import _pyho
>>> _pyho.do(["name"])
refcount in the beginning: 3
no increase of refcount with a new pybind11::handle: 3
increased refcount with a new pybind11::object: 4
decreased refcount after the new pybind11::object destructed: 3
manually increases refcount after h.inc_ref(): 4
manually decreases refcount after h.dec_ref(): 3
pybind11::none
❖ pybind11 also has a “none” type. In Python, None is a singleton, accessible as Py_None in the C API.
❖ Access the None singleton from C++:
#include <pybind11/pybind11.h> // must be first
#include <iostream>
namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal
PYBIND11_MODULE(_pynone, mod) {
mod.def(
"do",
[](py::object const & o) {
if (o.is(py::none())) {
std::cout << "it is None" << std::endl;
} else {
std::cout << "it is not None" << std::endl;
}
}
);
} /* end PYBIND11_MODULE(_pynone) */
>>> import _pynone
>>> _pynone.do(None)
it is None
>>> _pynone.do(False)
it is not None
Fast Code with C++
Never loop in Python
❖ Sum 100,000,000 integers
❖ The C++ version:
❖ NumPy is better, but still not enough
$ python -m timeit -s 'data = range(100000000)' 'sum(data)'
10 loops, best of 3: 2.36 sec per loop
$ time ./run
real 0m0.010s
user 0m0.002s
sys 0m0.004s
#include <cstdio>
int main(int argc, char ** argv) {
long value = 0;
for (long it=0; it<100000000; ++it) { value += it; }
return 0;
}
$ python -m timeit -s 'import numpy as np ; data =
np.arange(100000000, dtype="int64")' 'data.sum()'
10 loops, best of 3: 74.9 msec per loop
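One caveat about the C++ number: `value` is never used, so an optimizing compiler may remove the loop altogether, in which case the 0.010 s mostly measures process start-up. A sketch of a fairer harness using only the standard library (`sum_range` and `time_sum` are hypothetical names): the bound is a run-time argument and the result is returned, so the work cannot simply be discarded. A compiler may still replace such a loop with a closed-form formula, so treat timings from it as indicative only.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Sums 0 + 1 + ... + (n - 1) with a plain loop.
std::int64_t sum_range(std::int64_t n) {
    std::int64_t value = 0;
    for (std::int64_t it = 0; it < n; ++it) { value += it; }
    return value;
}

// Returns the sum together with the elapsed wall-clock time in seconds.
std::pair<std::int64_t, double> time_sum(std::int64_t n) {
    auto start = std::chrono::steady_clock::now();
    std::int64_t value = sum_range(n);
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = stop - start;
    return {value, elapsed.count()};
}
```

For n = 100,000,000 the returned sum is 4,999,999,950,000,000, which still fits comfortably in 64 bits.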
Wisely use arrays
❖ Python calls are expensive. Data need to be transferred
from Python to C++ in batch. Use arrays.
❖ C++ code may use arrays as internal representation. For
example, matrices are arrays having a 2-D view.
❖ Arrays are used as both
❖ interface between Python and C++, and
❖ internal storage in the C++ engine
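A minimal sketch of "an array with a 2-D view", assuming row-major order (the `Matrix` class is hypothetical, not from the author's code): the elements live in one flat `std::vector`, and `(i, j)` is translated to the offset `i * ncol + j`. Because the storage is one contiguous block, it can cross the Python/C++ boundary in a single batch.

```cpp
#include <cstddef>
#include <vector>

// A matrix stored as one contiguous array with a row-major 2-D view.
class Matrix {
public:
    Matrix(std::size_t nrow, std::size_t ncol)
        : m_nrow(nrow), m_ncol(ncol), m_data(nrow * ncol, 0.0) {}
    // Element (i, j) lives at offset i * ncol + j in the flat buffer.
    double & operator()(std::size_t i, std::size_t j) { return m_data[i * m_ncol + j]; }
    double operator()(std::size_t i, std::size_t j) const { return m_data[i * m_ncol + j]; }
    std::size_t nrow() const { return m_nrow; }
    std::size_t ncol() const { return m_ncol; }
    // The flat buffer is what would be handed across in one batch.
    const std::vector<double> & flat() const { return m_data; }
private:
    std::size_t m_nrow;
    std::size_t m_ncol;
    std::vector<double> m_data;
};
```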
Arrays in Python
❖ What we really mean is numpy(.ndarray)
❖ 12 lines to create the vertices of a zig-zag mesh
❖ They get things done, although the code sometimes looks convoluted
# create nodes.
nodes = []
for iy, yloc in enumerate(np.arange(y0, y1+dy/4, dy/2)):
if iy % 2 == 0:
meshx = np.arange(x0, x1+dx/4, dx, dtype='float64')
else:
meshx = np.arange(x0+dx/2, x1-dx/4, dx, dtype='float64')
nodes.append(np.vstack([meshx, np.full_like(meshx, yloc)]).T)
nodes = np.vstack(nodes)
assert nodes.shape[0] == nnode
blk.ndcrd[:,:] = nodes
assert (blk.ndcrd == nodes).all()
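For comparison, the same zig-zag layout takes two plain loops in C++. A sketch assuming the same inputs as the NumPy code (`x0`, `x1`, `dx`, `y0`, `y1`, `dy`); the function `zigzag_nodes` is hypothetical and returns (x, y) pairs instead of filling `blk.ndcrd`. Node counts are computed up front with `std::lround`, which replaces the `dx/4` and `dy/4` margins that `np.arange` needs to include or exclude its end points.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Nodes of a zig-zag mesh: rows every dy/2; even rows run from x0 to x1,
// odd rows are shifted by dx/2 and hold one node fewer.
std::vector<std::pair<double, double>> zigzag_nodes(double x0, double x1, double dx,
                                                    double y0, double y1, double dy) {
    const long nx = std::lround((x1 - x0) / dx);        // intervals along x
    const long ny = std::lround((y1 - y0) / (dy / 2));  // half-steps along y
    std::vector<std::pair<double, double>> nodes;
    for (long iy = 0; iy <= ny; ++iy) {
        const double yloc = y0 + iy * (dy / 2);
        const bool even = (iy % 2 == 0);
        const double xstart = even ? x0 : x0 + dx / 2;
        const long count = even ? nx + 1 : nx;
        for (long ix = 0; ix < count; ++ix) {
            nodes.emplace_back(xstart + ix * dx, yloc);
        }
    }
    return nodes;
}
```

With `x0 = 0, x1 = 2, dx = 1, y0 = 0, y1 = 1, dy = 1` it yields three rows of 3, 2 and 3 nodes, 8 in total.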
Expose memory buffer
class Buffer: public std::enable_shared_from_this<Buffer> {
private:
size_t m_length = 0;
char * m_data = nullptr;
struct ctor_passkey {};
public:
Buffer(size_t length, const ctor_passkey &)
: m_length(length) { m_data = new char[length](); }
static std::shared_ptr<Buffer> construct(size_t length) {
return std::make_shared<Buffer>(length, ctor_passkey());
}
~Buffer() {
if (nullptr != m_data) {
delete[] m_data;
m_data = nullptr;
}
}
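/** Size of the buffer in bytes. */
size_t nbyte() const { return m_length; }
/** Non-copyable: a copy would delete m_data twice. */
Buffer(Buffer const &) = delete;
Buffer & operator=(Buffer const &) = delete;
/** Backdoor */
template< typename T >
T * data() const { return reinterpret_cast<T*>(m_data); }
};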
py::array from(array_flavor flavor) {
// ndarray shape and stride
// standard C++ has no variable-length arrays; use std::vector
std::vector<npy_intp> shape(m_table.ndim());
std::copy(m_table.dims().begin(),
m_table.dims().end(),
shape.begin());
std::vector<npy_intp> strides(m_table.ndim());
strides[m_table.ndim()-1] = m_table.elsize();
for (ssize_t it = m_table.ndim()-2; it >= 0; --it) {
strides[it] = shape[it+1] * strides[it+1];
}
// create ndarray
void * data = m_table.data();
py::object tmp = py::reinterpret_steal<py::object>(
PyArray_NewFromDescr(
&PyArray_Type,
PyArray_DescrFromType(m_table.datatypeid()),
m_table.ndim(),
shape.data(),
strides.data(),
data,
NPY_ARRAY_WRITEABLE,
nullptr));
// link lifecycle to the underneath buffer
py::object buffer = py::cast(m_table.buffer());
py::array ret;
if (PyArray_SetBaseObject((PyArrayObject *)tmp.ptr(),
buffer.inc_ref().ptr()) == 0) {
ret = tmp;
}
return ret;
}
(Code above: the internal buffer, then exposing the buffer as an ndarray.)
❖ NumPy arrays provide the most common construct: a
contiguous memory buffer, backed by a large body of code
❖ N-dimensional arrays (ndarray)
❖ There are variants, such as masked arrays and sparse
matrices, but they are less useful in C++
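The stride loop in `from()` can be isolated and checked on its own. A sketch that uses `std::vector` for the shape and strides (`c_strides` is a hypothetical name): the last axis advances by one element, and each earlier axis by the byte size of everything after it, which is the layout NumPy calls C-contiguous.

```cpp
#include <cstddef>
#include <vector>

// C-contiguous (row-major) strides in bytes for the given shape.
std::vector<std::ptrdiff_t> c_strides(const std::vector<std::ptrdiff_t> & shape,
                                      std::ptrdiff_t elsize) {
    std::vector<std::ptrdiff_t> strides(shape.size());
    if (shape.empty()) { return strides; }
    strides.back() = elsize;  // last axis: one element
    for (std::size_t it = shape.size() - 1; it > 0; --it) {
        // each earlier axis skips over a whole block of the later axes
        strides[it - 1] = shape[it] * strides[it];
    }
    return strides;
}
```

For a `(2, 3, 4)` array of 8-byte elements this gives `(96, 32, 8)`, matching what NumPy reports as `strides` for a C-ordered array of that shape.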
Define your metadata
❖ Free to define how the memory is used
class LookupTableCore {
private:
std::shared_ptr<Buffer> m_buffer;
std::vector<index_type> m_dims;
index_type m_nghost = 0;
index_type m_nbody = 0;
index_type m_ncolumn = 0;
index_type m_elsize = 1; ///< Element size in bytes.
DataTypeId m_datatypeid = MH_INT8;
public:
std::shared_ptr<Buffer> const & buffer() const { return m_buffer; }
index_type ndim() const { return m_dims.size(); }
index_type nghost() const { return m_nghost; }
index_type nbody() const { return m_nbody; }
index_type nfull() const { return m_nghost + m_nbody; }
index_type ncolumn() const { return m_ncolumn; }
index_type nelem() const { return nfull() * ncolumn(); }
index_type elsize() const { return m_elsize; }
DataTypeId datatypeid() const { return m_datatypeid; }
size_t nbyte() const { return buffer()->nbyte(); }
};
(Figure: memory layout of the table, with ghost and body regions on either side of index 0)
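The layout figure appears to place ghost entries in front of index 0 and body entries from index 0 onward. One way to get that indexing, sketched under the assumption that ghost entries are addressed with negative indices (the `GhostArray` class is hypothetical and far simpler than `LookupTableCore`): store everything in one buffer and shift every index by `nghost`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// One contiguous buffer holding nghost ghost entries followed by nbody body
// entries. Index 0 is the first body entry; ghosts use -nghost .. -1.
class GhostArray {
public:
    GhostArray(std::ptrdiff_t nghost, std::ptrdiff_t nbody)
        : m_nghost(nghost), m_nbody(nbody),
          m_data(static_cast<std::size_t>(nghost + nbody), 0) {}
    // Shift the signed index by nghost to get the offset into the buffer.
    int & at(std::ptrdiff_t index) {
        if (index < -m_nghost || index >= m_nbody) {
            throw std::out_of_range("index outside the ghost/body range");
        }
        return m_data[static_cast<std::size_t>(index + m_nghost)];
    }
    std::ptrdiff_t nghost() const { return m_nghost; }
    std::ptrdiff_t nbody() const { return m_nbody; }
    std::ptrdiff_t nfull() const { return m_nghost + m_nbody; }
    const std::vector<int> & flat() const { return m_data; }
private:
    std::ptrdiff_t m_nghost;
    std::ptrdiff_t m_nbody;
    std::vector<int> m_data;
};
```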
Organize arrays
❖ LookupTable is a class
template providing static
information for the dynamic
array core
❖ Now we can put together a
class that keeps track of all
data for computation
template< size_t NDIM >
class UnstructuredBlock {
private:
// geometry arrays.
LookupTable<real_type, NDIM> m_ndcrd;
LookupTable<real_type, NDIM> m_fccnd;
LookupTable<real_type, NDIM> m_fcnml;
LookupTable<real_type, 0> m_fcara;
LookupTable<real_type, NDIM> m_clcnd;
LookupTable<real_type, 0> m_clvol;
// meta arrays.
LookupTable<shape_type, 0> m_fctpn;
LookupTable<shape_type, 0> m_cltpn;
LookupTable<index_type, 0> m_clgrp;
// connectivity arrays.
LookupTable<index_type, FCMND+1> m_fcnds;
LookupTable<index_type, FCNCL > m_fccls;
LookupTable<index_type, CLMND+1> m_clnds;
LookupTable<index_type, CLMFC+1> m_clfcs;
// boundary information.
LookupTable<index_type, 2> m_bndfcs;
std::vector<BoundaryData> m_bndvec;
};
(This case is for unstructured meshes of mixed elements in 2-/3-dimensional Euclidean space)
Fast and hideous
❖ In theory we can write
beautiful and fast code in
C++, and we should.
❖ In practice, as long as it’s
fast, it’s not too hard to
compromise on elegance.
❖ Testability is the bottom
line.
const index_type *
pclfcs = reinterpret_cast<const index_type *>(clfcs().row(0));
prcells = reinterpret_cast<index_type *>(rcells.row(0));
for (icl=0; icl<ncell(); icl++) {
for (ifl=1; ifl<=pclfcs[0]; ifl++) {
ifl1 = ifl-1;
ifc = pclfcs[ifl];
if (ifc == -1) { // NOT A FACE!? SHOULDN'T HAPPEN.
prcells[ifl1] = -1;
continue;
}
// form the face pointer only after ifc is known to be valid
const index_type *
pfccls = reinterpret_cast<const index_type *>(fccls().row(0))
+ ifc*FCREL;
if (pfccls[0] == icl) {
if (pfccls[2] != -1) { // has neighboring block.
prcells[ifl1] = -1;
} else { // is interior.
prcells[ifl1] = pfccls[1];
};
} else if (pfccls[1] == icl) { // I am the neighboring cell.
prcells[ifl1] = pfccls[0];
};
// count rcell number.
if (prcells[ifl1] >= 0) {
rcellno[icl] += 1;
} else {
prcells[ifl1] = -1;
};
};
// advance pointers.
pclfcs += CLMFC+1;
prcells += CLMFC;
};
(This looks like C since it really was C.)
Final notes
❖ Avoid Python when you need speed; use it as a shell to
your high-performance library from day one
❖ Resource management is in the core of the hybrid
architecture; do it in C++
❖ Use arrays (lookup tables) to hold large data
❖ Don’t access PyObject from your core
❖ Always keep in mind the differences between the two type systems

Mais conteúdo relacionado

Mais procurados

Chapter 1 semantic web
Chapter 1 semantic webChapter 1 semantic web
Chapter 1 semantic web
R A Akerkar
 

Mais procurados (20)

Introduction to the BioLink datamodel
Introduction to the BioLink datamodelIntroduction to the BioLink datamodel
Introduction to the BioLink datamodel
 
Introduction to Text Mining
Introduction to Text MiningIntroduction to Text Mining
Introduction to Text Mining
 
Auto encoding-variational-bayes
Auto encoding-variational-bayesAuto encoding-variational-bayes
Auto encoding-variational-bayes
 
ShEx vs SHACL
ShEx vs SHACLShEx vs SHACL
ShEx vs SHACL
 
Parallel sorting Algorithms
Parallel  sorting AlgorithmsParallel  sorting Algorithms
Parallel sorting Algorithms
 
Deep Learning for Graphs
Deep Learning for GraphsDeep Learning for Graphs
Deep Learning for Graphs
 
Model Your Application Domain, Not Your JSON Structures
Model Your Application Domain, Not Your JSON StructuresModel Your Application Domain, Not Your JSON Structures
Model Your Application Domain, Not Your JSON Structures
 
Introduction to RDF
Introduction to RDFIntroduction to RDF
Introduction to RDF
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - Introduction
 
Neo4j graph database
Neo4j graph databaseNeo4j graph database
Neo4j graph database
 
Learning the structure of Gaussian Graphical models with unobserved variables...
Learning the structure of Gaussian Graphical models with unobserved variables...Learning the structure of Gaussian Graphical models with unobserved variables...
Learning the structure of Gaussian Graphical models with unobserved variables...
 
NLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit DistanceNLP_KASHK:Minimum Edit Distance
NLP_KASHK:Minimum Edit Distance
 
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habibConceptual foundations of text mining and preprocessing steps nfaoui el_habib
Conceptual foundations of text mining and preprocessing steps nfaoui el_habib
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Presentation on binary search, quick sort, merge sort and problems
Presentation on binary search, quick sort, merge sort  and problemsPresentation on binary search, quick sort, merge sort  and problems
Presentation on binary search, quick sort, merge sort and problems
 
Representation Learning of Text for NLP
Representation Learning of Text for NLPRepresentation Learning of Text for NLP
Representation Learning of Text for NLP
 
RDF, linked data and semantic web
RDF, linked data and semantic webRDF, linked data and semantic web
RDF, linked data and semantic web
 
Chapter 1 semantic web
Chapter 1 semantic webChapter 1 semantic web
Chapter 1 semantic web
 
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
 

Semelhante a Take advantage of C++ from Python

C++totural file
C++totural fileC++totural file
C++totural file
halaisumit
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthrough
gabriellekuruvilla
 
CS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2ndCS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2nd
Edward Chen
 

Semelhante a Take advantage of C++ from Python (20)

Start Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New RopeStart Wrap Episode 11: A New Rope
Start Wrap Episode 11: A New Rope
 
Boost.Python: C++ and Python Integration
Boost.Python: C++ and Python IntegrationBoost.Python: C++ and Python Integration
Boost.Python: C++ and Python Integration
 
C++ tutorial
C++ tutorialC++ tutorial
C++ tutorial
 
C++totural file
C++totural fileC++totural file
C++totural file
 
Cluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in CCluj.py Meetup: Extending Python in C
Cluj.py Meetup: Extending Python in C
 
Python and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthroughPython and Pytorch tutorial and walkthrough
Python and Pytorch tutorial and walkthrough
 
Intro To C++ - Class #17: Pointers!, Objects Talking To Each Other
Intro To C++ - Class #17: Pointers!, Objects Talking To Each OtherIntro To C++ - Class #17: Pointers!, Objects Talking To Each Other
Intro To C++ - Class #17: Pointers!, Objects Talking To Each Other
 
tokyotalk
tokyotalktokyotalk
tokyotalk
 
PHP 8: Process & Fixing Insanity
PHP 8: Process & Fixing InsanityPHP 8: Process & Fixing Insanity
PHP 8: Process & Fixing Insanity
 
Return of c++
Return of c++Return of c++
Return of c++
 
Apache Thrift
Apache ThriftApache Thrift
Apache Thrift
 
CS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2ndCS225_Prelecture_Notes 2nd
CS225_Prelecture_Notes 2nd
 
C++primer
C++primerC++primer
C++primer
 
Why learn Internals?
Why learn Internals?Why learn Internals?
Why learn Internals?
 
Linux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloudLinux kernel tracing superpowers in the cloud
Linux kernel tracing superpowers in the cloud
 
Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020Notes about moving from python to c++ py contw 2020
Notes about moving from python to c++ py contw 2020
 
C language introduction
C language introduction C language introduction
C language introduction
 
SRAVANByCPP
SRAVANByCPPSRAVANByCPP
SRAVANByCPP
 
Introduction Of C++
Introduction Of C++Introduction Of C++
Introduction Of C++
 
C++ theory
C++ theoryC++ theory
C++ theory
 

Mais de Yung-Yu Chen

Mais de Yung-Yu Chen (8)

Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
SimpleArray between Python and C++
SimpleArray between Python and C++SimpleArray between Python and C++
SimpleArray between Python and C++
 
Write code and find a job
Write code and find a jobWrite code and find a job
Write code and find a job
 
On the necessity and inapplicability of python
On the necessity and inapplicability of pythonOn the necessity and inapplicability of python
On the necessity and inapplicability of python
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 
Craftsmanship in Computational Work
Craftsmanship in Computational WorkCraftsmanship in Computational Work
Craftsmanship in Computational Work
 

Último

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
1301aanya
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Introduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptxIntroduction,importance and scope of horticulture.pptx
Introduction,importance and scope of horticulture.pptx
Bhagirath Gogikar
 

Último (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics

Take advantage of C++ from Python

  • 1. Take advantage of C++ from Python. Yung-Yu Chen, PyCon Kyushu, 30th June 2018
  • 2. Why C++ ❖ Python is slow ❖ Everything is on the heap ❖ Always dynamic types ❖ Hard to access assembly ❖ Convoluted algorithms with ndarray ❖ Access external code written in any language ❖ Detailed control and abstraction
  • 3. Hard problems take time • Supersonic jet in cross flow; density contour • 264 cores with 53 hours for 1.3 B variables (66 M elements) by 12,000 time steps • At OSC, 2011 (10 Gbps InfiniBand) HPC (high-performance computing) is hard. Physics is harder. Don’t mingle.
  • 4. Best of both worlds ❖ C++: fast runtime, strong static type checking, industrial grade ❖ Slow to code ❖ Python: fast prototyping, batteries included, easy to use ❖ Slow to run ❖ Hybrid systems are everywhere. ❖ TensorFlow, Blender, OpenCV, etc. ❖ C++ crunches numbers. Python controls the flow. ❖ Applications work like libraries, libraries like applications.
  • 5. pybind11 ❖ https://github.com/pybind/pybind11 (requires C++11) ❖ Expose C++ entities to Python ❖ Use Python from C++ ❖ list, tuple, dict, and str ❖ handle, object, and none
  • 6. C++11(/14/17/20) New language features: auto and decltype, defaulted and deleted functions, final and override, trailing return type, rvalue references, move constructors/move assignment, scoped enums, constexpr and literal types, list initialization, delegating and inherited constructors, brace-or-equal initializers, nullptr, long long, char16_t and char32_t, type aliases, variadic templates, generalized unions, generalized PODs, Unicode string literals, user-defined literals, attributes, lambda expressions, noexcept, alignof and alignas, multithreaded memory model, thread-local storage, GC interface, range for (based on a Boost library), static assertions (based on a Boost library) http://en.cppreference.com/w/cpp/language/history
  • 7. Python’s friends ❖ Shared pointer: manage resource ownership between C++ and Python ❖ Move semantics: speed ❖ Lambda expression: ease the wrapping code
  • 8. Ownership ❖ All Python objects are dynamically allocated on the heap. Python uses reference counting to decide who deallocates an object once it is no longer used. ❖ An owner of a reference to an object is responsible for releasing it. With multiple owners, the last owner (when the reference count is 1) calls the destructor and deallocates the object. Other owners simply decrement the count by 1.
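C++ std::shared_ptr applies the same ownership rule: every copy is an owner, and the object is destroyed when the last owner lets go. A minimal sketch, using a hypothetical Tracked type that counts its live instances so the moment of destruction is visible:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical type that counts its live instances, so we can observe
// exactly when the object is destroyed.
struct Tracked {
    static std::size_t alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
std::size_t Tracked::alive = 0;

// Copies add owners; only the release by the last owner destroys the object.
void demo_ownership() {
    std::shared_ptr<Tracked> first = std::make_shared<Tracked>();
    assert(first.use_count() == 1);
    {
        std::shared_ptr<Tracked> second = first;  // a second owner
        assert(first.use_count() == 2);
    }                                             // second releases: count back to 1
    assert(first.use_count() == 1);
    assert(Tracked::alive == 1);                  // one owner left, still alive
    first.reset();                                // the last owner releases
    assert(Tracked::alive == 0);                  // destructor has run
}
```

Here use_count() plays the role of ob_refcnt; the difference is that the smart pointer does the increments and decrements itself instead of relying on explicit calls.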
  • 9. Shared pointer

```cpp
#include <memory>
#include <vector>
#include <numeric>   // std::accumulate
#include <algorithm>
#include <iostream>

class Series {
    std::vector<int> m_data;
public:
    int sum() const {
        const int ret = std::accumulate(m_data.begin(), m_data.end(), 0);
        std::cout << "Series::sum() = " << ret << std::endl;
        return ret;
    }
    static size_t count;
    Series(size_t size, int lead) : m_data(size) {
        for (size_t it=0; it<size; it++) { m_data[it] = lead+it; }
        count++;
    }
    ~Series() { count--; }
};
size_t Series::count = 0;

void use_raw_pointer() {
    Series * series_ptr = new Series(10, 2);
    series_ptr->sum(); // call member function
    // OUT: Series::sum() = 65
    // remember to delete the object or we leak memory
    std::cout << "before explicit deletion, Series::count = "
              << Series::count << std::endl;
    // OUT: before explicit deletion, Series::count = 1
    delete series_ptr;
    std::cout << "after the resource is manually freed, Series::count = "
              << Series::count << std::endl;
    // OUT: after the resource is manually freed, Series::count = 0
}

void use_shared_pointer() {
    std::shared_ptr<Series> series_sptr(new Series(10, 3));
    series_sptr->sum(); // call member function
    // OUT: Series::sum() = 75
    // note shared_ptr handles deletion for series_sptr
}

int main(int argc, char ** argv) {
    // the common raw pointer
    use_raw_pointer();
    // now, shared_ptr
    use_shared_pointer();
    std::cout << "no memory leak: Series::count = " << Series::count << std::endl;
    // OUT: no memory leak: Series::count = 0
    return 0;
}
```
  • 10. Move semantics ❖ Number-crunching code needs large arrays as memory buffers. They aren’t supposed to be copied frequently. ❖ A 50,000 × 50,000 array of 8-byte elements (e.g. double) takes 20 GB. ❖ Shared pointers should manage large chunks of memory. ❖ New reference to an object: copy constructor of the shared pointer ❖ Borrowed reference to an object: const reference to the shared pointer ❖ Stolen reference to an object: move constructor of the shared pointer
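The three kinds of references listed above map onto ordinary shared_ptr operations. A sketch, assuming a hypothetical Array alias (std::vector<double>) for a large buffer and a hypothetical borrowed_count helper:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical large array type; only the shared_ptr is ever copied or
// moved, never the buffer itself.
using Array = std::vector<double>;

// Borrowed reference: a const reference to the shared pointer, no count change.
long borrowed_count(const std::shared_ptr<Array> & arr) {
    return arr.use_count();
}

void demo_references() {
    auto arr = std::make_shared<Array>(1000, 0.0);
    const double * data = arr->data();

    // New reference: the copy constructor increments the count.
    std::shared_ptr<Array> owner2(arr);
    assert(arr.use_count() == 2);

    // Borrowed reference: passing by const reference leaves the count alone.
    assert(borrowed_count(arr) == 2);

    // Stolen reference: the move constructor transfers ownership, leaving
    // the source empty; the total count is unchanged.
    std::shared_ptr<Array> owner3(std::move(owner2));
    assert(!owner2);
    assert(arr.use_count() == 2);

    // No element was copied: every owner sees the same buffer.
    assert(owner3->data() == data);
}
```

In each case only the shared counter changes; the 1,000 doubles are never copied, which is what matters once the buffer is gigabytes in size.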
  • 11. Lambda ❖ Put the code right where it is used

```cpp
namespace py = pybind11;
cls = py::class_< wrapped_type, holder_type >(mod, pyname, clsdoc);
cls
    .def(
        py::init([](block_type & block, index_type icl, bool init_sentinel) {
            return wrapped_type(block, icl, init_sentinel);
        }),
        py::arg("block"), py::arg("icl"), py::arg("init_sentinel")=true
    )
    .def("repr", &wrapped_type::repr, py::arg("indent")=0, py::arg("precision")=0)
    .def("__repr__", [](wrapped_type & self){ return self.repr(); })
    .def("init_sentinel", &wrapped_type::init_sentinel)
    .def_readwrite("cnd", &wrapped_type::cnd)
    .def_readwrite("vol", &wrapped_type::vol)
    .def_property_readonly(
        "nbce",
        [](wrapped_type & self) { return self.bces.size(); }
    )
    .def(
        "get_bce",
        [](wrapped_type & self, index_type ibce) { return self.bces.at(ibce); }
    )
;
```
  • 12. Lambda, cont’d ❖ Code as free as Python, as fast as C

```cpp
#include <unordered_map>
#include <functional>
#include <cstdio>

int main(int argc, char ** argv) {
    // Python: fmap = dict()
    std::unordered_map<int, std::function<void(int)>> fmap;
    // Python: fmap[1] = lambda v: print("v = %d" % v)
    fmap.insert({
        1, [](int v) -> void { std::printf("v = %d\n", v); }
    });
    // Python: fmap[5] = lambda v: print("v*5 = %d" % (v*5))
    fmap.insert({
        5, [](int v) -> void { std::printf("v*5 = %d\n", v*5); }
    });
    std::unordered_map<int, std::function<void(int)>>::iterator search;
    // Python: fmap[1](100)
    search = fmap.find(1);
    search->second(100);
    // OUT: v = 100
    // Python: fmap[5](500)
    search = fmap.find(5);
    search->second(500);
    // OUT: v*5 = 2500
    return 0;
}
```
  • 13. Manipulate Python ❖ Don’t mingle Python with C++ ❖ Python has the GIL (global interpreter lock) ❖ Don’t include Python.h if you don’t intend to run Python ❖ Once it enters your core, it’s hard to get it out

```cpp
#include <Python.h>  // what to avoid: the core now depends on the Python C API

class Core {
private:
    int m_value;
    PyObject * m_pyobject;
};
```
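If the core needs to call back into user code, one alternative to storing a PyObject * is to accept a plain std::function and let the wrapping layer adapt a Python callable to it (pybind11 can do that conversion through its pybind11/functional.h header). A minimal sketch; the Core interface here (set_callback, step) is hypothetical:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical core class: no Python.h and no PyObject. Anything the core
// needs from the outside comes in through plain C++ types.
class Core {
public:
    using Callback = std::function<void(int)>;

    explicit Core(int value) : m_value(value) {}

    // The wrapping layer decides what the callback does; for Python users
    // it may forward to a Python callable, but the core never knows that.
    void set_callback(Callback cb) { m_callback = std::move(cb); }

    // Advance the state and notify the callback, if one is set.
    void step() {
        ++m_value;
        if (m_callback) { m_callback(m_value); }
    }

    int value() const { return m_value; }

private:
    int m_value;
    Callback m_callback;
};
```

The core compiles and can be unit-tested without Python.h; only the binding code has to know that Python exists.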
  • 14. Do it in the wrapping layer

```cpp
cls
    .def(
        py::init([](py::object pyblock) {
            block_type * block = py::cast<block_type *>(pyblock.attr("_ustblk"));
            std::shared_ptr<wrapped_type> svr = wrapped_type::construct(block->shared_from_this());
            for (auto bc : py::list(pyblock.attr("bclist"))) {
                std::string name = py::str(bc.attr("__class__").attr("__name__").attr("lstrip")("GasPlus"));
                BoundaryData * data = py::cast<BoundaryData *>(bc.attr("_data"));
                std::unique_ptr<gas::TrimBase<NDIM>> trim;
                if ("Interface" == name) {
                    trim = make_unique<gas::TrimInterface<NDIM>>(*svr, *data);
                } else if ("NoOp" == name) {
                    trim = make_unique<gas::TrimNoOp<NDIM>>(*svr, *data);
                } else if ("NonRefl" == name) {
                    trim = make_unique<gas::TrimNonRefl<NDIM>>(*svr, *data);
                } else if ("SlipWall" == name) {
                    trim = make_unique<gas::TrimSlipWall<NDIM>>(*svr, *data);
                } else if ("Inlet" == name) {
                    trim = make_unique<gas::TrimInlet<NDIM>>(*svr, *data);
                } else {
                    /* do nothing for now */ // throw std::runtime_error("BC type unknown");
                }
                svr->trims().push_back(std::move(trim));
            }
            if (report_interval) { svr->make_qty(); }
            return svr;
        }),
        py::arg("block")
    );
```
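The if/else chain that picks a boundary-condition type by name can also be written as a table of lambdas, in the same spirit as the std::unordered_map of std::function on the “Lambda, cont’d” slide. A sketch with hypothetical TrimBase, TrimNoOp, and TrimInlet stand-ins whose constructors take no arguments (the real ones take the solver and the BoundaryData):

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for the boundary-condition ("trim") classes.
struct TrimBase {
    virtual ~TrimBase() = default;
    virtual std::string kind() const = 0;
};
struct TrimNoOp : TrimBase { std::string kind() const override { return "NoOp"; } };
struct TrimInlet : TrimBase { std::string kind() const override { return "Inlet"; } };

using TrimFactory = std::function<std::unique_ptr<TrimBase>()>;

// Name -> factory table; supporting a new boundary condition is one more entry.
const std::unordered_map<std::string, TrimFactory> & trim_registry() {
    static const std::unordered_map<std::string, TrimFactory> registry = {
        {"NoOp",  [] { return std::unique_ptr<TrimBase>(new TrimNoOp()); }},
        {"Inlet", [] { return std::unique_ptr<TrimBase>(new TrimInlet()); }},
    };
    return registry;
}

// Look up the factory for a name and build the object.
std::unique_ptr<TrimBase> make_trim(const std::string & name) {
    const auto & registry = trim_registry();
    auto found = registry.find(name);
    if (found == registry.end()) {
        throw std::runtime_error("BC type unknown: " + name);
    }
    return found->second();
}
```

This sketch throws for an unknown name, as the commented-out line in the wrapper suggests; returning a null pointer instead would match the current “do nothing for now” branch.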
  • 15. pybind11::list ❖ Read a list and cast its contents (do) ❖ Populate a new list (do2)

```cpp
#include <pybind11/pybind11.h> // must be first
#include <string>
#include <iostream>

namespace py = pybind11;

PYBIND11_MODULE(_pylist, mod) {
    mod.def(
        "do",
        [](py::list & l) {
            // convert contents to std::string and send to cout
            std::cout << "std::cout:" << std::endl;
            for (py::handle o : l) {
                std::string s = py::cast<std::string>(o);
                std::cout << s << std::endl;
            }
        }
    );
    mod.def(
        "do2",
        [](py::list & l) {
            // create a new list
            std::cout << "py::print:" << std::endl;
            py::list l2;
            for (py::handle o : l) {
                std::string s = py::cast<std::string>(o);
                s = "elm:" + s;
                py::str s2(s);
                l2.append(s2); // populate contents
            }
            py::print(l2);
        }
    );
} /* end PYBIND11_MODULE(_pylist) */
```

```
>>> import _pylist
>>> # print the input list
>>> _pylist.do(["a", "b", "c"])
std::cout:
a
b
c
>>> _pylist.do2(["d", "e", "f"])
py::print:
['elm:d', 'elm:e', 'elm:f']
```
  • 16. pybind11::tuple ❖ A tuple is immutable, so it is read-only once created; it is constructed from another iterable object. ❖ Read the contents of a tuple:

```cpp
#include <pybind11/pybind11.h> // must be first
#include <vector>

namespace py = pybind11;

PYBIND11_MODULE(_pytuple, mod) {
    mod.def(
        "do",
        [](py::args & args) {
            // build a list using py::list::append
            py::list l;
            for (py::handle h : args) {
                l.append(h);
            }
            // convert it to a tuple
            py::tuple t(l);
            // print it out
            py::print(py::str("{} len={}").format(t, t.size()));
            // print the elements one by one
            for (size_t it=0; it<t.size(); ++it) {
                py::print(py::str("{}").format(t[it]));
            }
        }
    );
} /* end PYBIND11_MODULE(_pytuple) */
```

```
>>> import _pytuple
>>> _pytuple.do("a", 7, 5.6)
('a', 7, 5.6) len=3
a
7
5.6
```
  • 17. pybind11::dict ❖ The dictionary is one of the most useful containers in Python. ❖ Populate a dictionary (do) ❖ Manipulate it (do2)

```cpp
#include <pybind11/pybind11.h> // must be first
#include <string>
#include <stdexcept>
#include <iostream>

namespace py = pybind11;

PYBIND11_MODULE(_pydict, mod) {
    mod.def(
        "do",
        [](py::args & args) {
            if (args.size() % 2 != 0) {
                throw std::runtime_error("argument number must be even");
            }
            // create a dict from the input tuple
            py::dict d;
            for (size_t it=0; it<args.size(); it+=2) {
                d[args[it]] = args[it+1];
            }
            return d;
        }
    );
    mod.def(
        "do2",
        [](py::dict d, py::args & args) {
            for (py::handle h : args) {
                if (d.contains(h)) {
                    std::cout << py::cast<std::string>(h)
                              << " is in the input dictionary" << std::endl;
                } else {
                    std::cout << py::cast<std::string>(h)
                              << " is not found in the input dictionary" << std::endl;
                }
            }
            std::cout << "remove everything in the input dictionary!" << std::endl;
            d.clear();
            return d;
        }
    );
} /* end PYBIND11_MODULE(_pydict) */
```

```
>>> import _pydict
>>> d = _pydict.do("a", 7, "b", "name", 10, 4.2)
>>> print(d)
{'a': 7, 'b': 'name', 10: 4.2}
>>> d2 = _pydict.do2(d, "b", "d")
b is in the input dictionary
d is not found in the input dictionary
remove everything in the input dictionary!
>>> print("The returned dictionary is empty:", d2)
The returned dictionary is empty: {}
>>> print("The first dictionary becomes empty too:", d)
The first dictionary becomes empty too: {}
>>> print("Are the two dictionaries the same?", d2 is d)
Are the two dictionaries the same? True
```
  • 18. pybind11::str ❖ One more trick with Python strings in pybind11: the user-defined literal _s

```cpp
#include <pybind11/pybind11.h> // must be first
#include <iostream>

namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal

PYBIND11_MODULE(_pystr, mod) {
    mod.def(
        "do",
        []() {
            py::str s("python string {}"_s.format("formatting"));
            py::print(s);
        }
    );
} /* end PYBIND11_MODULE(_pystr) */
```

```
>>> import _pystr
>>> _pystr.do()
python string formatting
```
  • 19. Generic Python objects ❖ pybind11 defines two generic types for representing Python objects: ❖ “handle”: the base class of all pybind11 classes for Python types; it does not manage the reference count by itself ❖ “object”: derives from handle and adds automatic reference counting
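A toy model in plain C++ may make the distinction concrete. It is not pybind11’s implementation: ToyPyObject, Handle, and Object are hypothetical, and nothing is ever freed. Handle copies a pointer without touching the count, while Object takes a reference in its constructor and releases it in its destructor:

```cpp
#include <cassert>

// Toy stand-in for a Python object header: just a reference count.
struct ToyPyObject {
    long refcnt = 1;
};

// Non-owning wrapper: copying it does not change the reference count;
// only explicit inc_ref()/dec_ref() calls do.
class Handle {
public:
    explicit Handle(ToyPyObject * p = nullptr) : m_ptr(p) {}
    ToyPyObject * ptr() const { return m_ptr; }
    const Handle & inc_ref() const { if (m_ptr) { ++m_ptr->refcnt; } return *this; }
    const Handle & dec_ref() const { if (m_ptr) { --m_ptr->refcnt; } return *this; }
protected:
    ToyPyObject * m_ptr;
};

// Owning wrapper: each Object holds one reference for its whole lifetime.
class Object : public Handle {
public:
    explicit Object(ToyPyObject * p) : Handle(p) { inc_ref(); }
    Object(const Object & other) : Handle(other.m_ptr) { inc_ref(); }
    Object & operator=(const Object & other) {
        other.inc_ref();   // increment first so self-assignment is safe
        dec_ref();
        m_ptr = other.m_ptr;
        return *this;
    }
    ~Object() { dec_ref(); }
};
```

The relative changes mirror the pybind11::handle and object slide: copying into a handle leaves the count alone, a temporary object raises it by one until it is destroyed, and inc_ref()/dec_ref() adjust it manually.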
  • 20. pybind11::handle and object

```cpp
#include <pybind11/pybind11.h> // must be first
#include <iostream>

namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal

PYBIND11_MODULE(_pyho, mod) {
    mod.def(
        "do",
        [](py::object const & o) {
            std::cout << "refcount in the beginning: "
                      << o.ptr()->ob_refcnt << std::endl;
            py::handle h(o);
            std::cout << "no increase of refcount with a new pybind11::handle: "
                      << h.ptr()->ob_refcnt << std::endl;
            {
                py::object o2(o);
                std::cout << "increased refcount with a new pybind11::object: "
                          << o2.ptr()->ob_refcnt << std::endl;
            }
            std::cout << "decreased refcount after the new pybind11::object destructed: "
                      << o.ptr()->ob_refcnt << std::endl;
            h.inc_ref();
            std::cout << "manually increases refcount after h.inc_ref(): "
                      << h.ptr()->ob_refcnt << std::endl;
            h.dec_ref();
            std::cout << "manually decreases refcount after h.dec_ref(): "
                      << h.ptr()->ob_refcnt << std::endl;
        }
    );
} /* end PYBIND11_MODULE(_pyho) */
```

```
>>> import _pyho
>>> _pyho.do(["name"])
refcount in the beginning: 3
no increase of refcount with a new pybind11::handle: 3
increased refcount with a new pybind11::object: 4
decreased refcount after the new pybind11::object destructed: 3
manually increases refcount after h.inc_ref(): 4
manually decreases refcount after h.dec_ref(): 3
```
  • 21. pybind11::none ❖ It’s worth noting that pybind11 has a “none” type. In Python, None is a singleton, accessible as Py_None in the C API. ❖ Access the None singleton from C++:

```cpp
#include <pybind11/pybind11.h> // must be first
#include <iostream>

namespace py = pybind11;
using namespace py::literals; // to bring in the `_s` literal

PYBIND11_MODULE(_pynone, mod) {
    mod.def(
        "do",
        [](py::object const & o) {
            if (o.is(py::none())) {
                std::cout << "it is None" << std::endl;
            } else {
                std::cout << "it is not None" << std::endl;
            }
        }
    );
} /* end PYBIND11_MODULE(_pynone) */
```

```
>>> import _pynone
>>> _pynone.do(None)
it is None
>>> _pynone.do(False)
it is not None
```
  • 23. Never loop in Python ❖ Sum 100,000,000 integers:

```
$ python -m timeit -s 'data = range(100000000)' 'sum(data)'
10 loops, best of 3: 2.36 sec per loop
```

❖ The C++ version:

```cpp
#include <cstdio>

int main(int argc, char ** argv) {
    long value = 0;
    for (long it=0; it<100000000; ++it) {
        value += it;
    }
    return 0;
}
```

```
$ time ./run
real    0m0.010s
user    0m0.002s
sys     0m0.004s
```

❖ NumPy is better, but not enough:

```
$ python -m timeit -s 'import numpy as np ; data = np.arange(100000000, dtype="int64")' 'data.sum()'
10 loops, best of 3: 74.9 msec per loop
```
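The C++ program above never uses value, so an optimizing compiler may remove the loop altogether, in which case the 0.010 s would mostly measure process startup. A sketch that keeps the result observable by returning and printing it, timed with std::chrono::steady_clock; the function names are hypothetical. The expected sum of 0 through n-1 is n(n-1)/2, which is 4,999,999,950,000,000 for n = 100,000,000:

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Sum 0, 1, ..., n-1 with a plain loop. Returning the result makes the work
// observable, so the loop cannot be discarded as dead code (the compiler may
// still simplify it, for example into a closed form).
long long sum_loop(long long n) {
    assert(n >= 0);
    long long value = 0;
    for (long long it = 0; it < n; ++it) {
        value += it;
    }
    return value;
}

// Time sum_loop(n) with a monotonic clock, print the result and the elapsed
// time, and return the sum so the caller can check it.
long long timed_sum(long long n) {
    const auto start = std::chrono::steady_clock::now();
    const long long value = sum_loop(n);
    const auto stop = std::chrono::steady_clock::now();
    const double msec =
        std::chrono::duration<double, std::milli>(stop - start).count();
    std::printf("sum = %lld, %.3f msec\n", value, msec);
    return value;
}
```

Call timed_sum(100000000) from main and compile with optimization enabled (for example -O2) before comparing against the Python numbers; no timing is claimed here, since it depends on the compiler and the machine.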
  • 24. Wisely use arrays ❖ Python calls are expensive, so data should be transferred between Python and C++ in batches. Use arrays. ❖ C++ code may use arrays as its internal representation. For example, matrices are arrays with a 2-D view. ❖ Arrays are used as both ❖ the interface between Python and C++, and ❖ the internal storage in the C++ engine
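A matrix stored as an array with a 2-D view can be as simple as a non-owning wrapper that maps (row, column) to a row-major offset. A minimal sketch with a hypothetical MatrixView class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning 2-D view over a flat, row-major buffer.
// Element (i, j) lives at offset i * ncol + j.
class MatrixView {
public:
    MatrixView(double * data, std::size_t nrow, std::size_t ncol)
        : m_data(data), m_nrow(nrow), m_ncol(ncol) {}

    double & operator()(std::size_t i, std::size_t j) const {
        assert(i < m_nrow && j < m_ncol);
        return m_data[i * m_ncol + j];
    }

    std::size_t nrow() const { return m_nrow; }
    std::size_t ncol() const { return m_ncol; }

private:
    double * m_data;   // not owned: the buffer must outlive the view
    std::size_t m_nrow;
    std::size_t m_ncol;
};
```

For example, a std::vector<double> of 6 elements viewed as 2 × 3 puts element (1, 2) at index 5. Because the storage stays a flat buffer, the same memory can be handed to NumPy as a C-contiguous array; for doubles with 3 columns its strides would be (24, 8) bytes.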
  • 25. Arrays in Python ❖ What we really mean is NumPy’s ndarray ❖ 12 lines to create vertices for a zig-zagging mesh ❖ They get things done, although they sometimes look convoluted

```python
# create nodes.
nodes = []
for iy, yloc in enumerate(np.arange(y0, y1+dy/4, dy/2)):
    if iy % 2 == 0:
        meshx = np.arange(x0, x1+dx/4, dx, dtype='float64')
    else:
        meshx = np.arange(x0+dx/2, x1-dx/4, dx, dtype='float64')
    nodes.append(np.vstack([meshx, np.full_like(meshx, yloc)]).T)
nodes = np.vstack(nodes)
assert nodes.shape[0] == nnode
blk.ndcrd[:,:] = nodes
assert (blk.ndcrd == nodes).all()
```
  • 26. Expose memory buffer ❖ NumPy arrays provide the most common construct: a contiguous memory buffer, and tons of code ❖ N-dimensional arrays (ndarray) ❖ There are variants, but they are less useful in C++: masked arrays, sparse matrices, etc.

Internal buffer:

```cpp
class Buffer: public std::enable_shared_from_this<Buffer> {
private:
    size_t m_length = 0;
    char * m_data = nullptr;
    struct ctor_passkey {};
public:
    Buffer(size_t length, const ctor_passkey &)
        : m_length(length) { m_data = new char[length](); }
    static std::shared_ptr<Buffer> construct(size_t length) {
        return std::make_shared<Buffer>(length, ctor_passkey());
    }
    ~Buffer() {
        if (nullptr != m_data) {
            delete[] m_data;
            m_data = nullptr;
        }
    }
    /** Backdoor */
    template< typename T >
    T * data() const { return reinterpret_cast<T*>(m_data); }
};
```

Expose the buffer as ndarray:

```cpp
py::array from(array_flavor flavor) {
    // ndarray shape and stride (std::vector instead of variable-length
    // arrays, which standard C++ does not allow)
    std::vector<npy_intp> shape(m_table.ndim());
    std::copy(m_table.dims().begin(), m_table.dims().end(), shape.begin());
    std::vector<npy_intp> strides(m_table.ndim());
    strides[m_table.ndim()-1] = m_table.elsize();
    for (ssize_t it = m_table.ndim()-2; it >= 0; --it) {
        strides[it] = shape[it+1] * strides[it+1];
    }
    // create ndarray
    void * data = m_table.data();
    py::object tmp = py::reinterpret_steal<py::object>(
        PyArray_NewFromDescr(
            &PyArray_Type, PyArray_DescrFromType(m_table.datatypeid()),
            m_table.ndim(), shape.data(), strides.data(), data,
            NPY_ARRAY_WRITEABLE, nullptr));
    // link lifecycle to the underlying buffer
    py::object buffer = py::cast(m_table.buffer());
    py::array ret;
    if (PyArray_SetBaseObject((PyArrayObject *)tmp.ptr(), buffer.inc_ref().ptr()) == 0) {
        ret = tmp;
    }
    return ret;
}
```
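The stride loop in from() computes C-contiguous (row-major) strides: the last axis advances by the element size, and each earlier axis by the byte size of one full step of the axis after it. The same computation as a standalone function, with the hypothetical name c_contiguous_strides and std::ptrdiff_t standing in for npy_intp:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte strides of a C-contiguous array with the given shape and element
// size, computed the same way as the loop in from().
std::vector<std::ptrdiff_t> c_contiguous_strides(
    const std::vector<std::ptrdiff_t> & shape, std::ptrdiff_t elsize)
{
    assert(elsize > 0);
    std::vector<std::ptrdiff_t> strides(shape.size());
    if (shape.empty()) { return strides; }
    strides.back() = elsize;  // the last axis steps one element at a time
    for (std::ptrdiff_t it = static_cast<std::ptrdiff_t>(shape.size()) - 2; it >= 0; --it) {
        strides[it] = shape[it + 1] * strides[it + 1];
    }
    return strides;
}
```

For a 3 × 4 array of 8-byte doubles this gives (32, 8), which is what NumPy reports for np.zeros((3, 4)).strides.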
  • 27. Define your metadata ❖ You are free to define how the memory is used

```cpp
class LookupTableCore {
private:
    std::shared_ptr<Buffer> m_buffer;
    std::vector<index_type> m_dims;
    index_type m_nghost = 0;
    index_type m_nbody = 0;
    index_type m_ncolumn = 0;
    index_type m_elsize = 1; ///< Element size in bytes.
    DataTypeId m_datatypeid = MH_INT8;
public:
    index_type ndim() const { return m_dims.size(); }
    index_type nghost() const { return m_nghost; }
    index_type nbody() const { return m_nbody; }
    index_type nfull() const { return m_nghost + m_nbody; }
    index_type ncolumn() const { return m_ncolumn; }
    index_type nelem() const { return nfull() * ncolumn(); }
    index_type elsize() const { return m_elsize; }
    DataTypeId datatypeid() const { return m_datatypeid; }
    size_t nbyte() const { return buffer()->nbyte(); }
};
```

(Figure: memory layout of the table, labeled “ghost”, “body”, and index 0.)
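The slide gives only the counts (nghost, nbody, nfull) and a figure labeled “ghost”, “body”, and 0. One way to use such metadata, assumed here for illustration, is to keep the ghost rows in front of the body rows in one buffer and address rows relative to the first body row, so body rows are 0 to nbody-1 and ghost rows are negative. A minimal sketch with a hypothetical GhostTable class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table: nghost ghost rows stored before nbody body rows in one
// contiguous buffer, each row holding ncolumn values.
class GhostTable {
public:
    GhostTable(std::ptrdiff_t nghost, std::ptrdiff_t nbody, std::ptrdiff_t ncolumn)
        : m_nghost(nghost), m_nbody(nbody), m_ncolumn(ncolumn),
          m_storage((nghost + nbody) * ncolumn, 0) {}

    // Row 0 is the first body row; rows -nghost..-1 are ghost rows.
    int * row(std::ptrdiff_t irow) {
        assert(irow >= -m_nghost && irow < m_nbody);
        return m_storage.data() + (m_nghost + irow) * m_ncolumn;
    }

    std::ptrdiff_t nfull() const { return m_nghost + m_nbody; }

private:
    std::ptrdiff_t m_nghost;
    std::ptrdiff_t m_nbody;
    std::ptrdiff_t m_ncolumn;
    std::vector<int> m_storage;
};
```

Loops over the body start at 0 as usual, and boundary code can reach the ghost layer through negative indices without a separate allocation.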
  • 28. Organize arrays ❖ LookupTable is a class template providing static information for the dynamic array core ❖ Now we can put together a class that keeps track of all data for computation

```cpp
template< size_t NDIM >
class UnstructuredBlock {
private:
    // geometry arrays.
    LookupTable<real_type, NDIM> m_ndcrd;
    LookupTable<real_type, NDIM> m_fccnd;
    LookupTable<real_type, NDIM> m_fcnml;
    LookupTable<real_type, 0> m_fcara;
    LookupTable<real_type, NDIM> m_clcnd;
    LookupTable<real_type, 0> m_clvol;
    // meta arrays.
    LookupTable<shape_type, 0> m_fctpn;
    LookupTable<shape_type, 0> m_cltpn;
    LookupTable<index_type, 0> m_clgrp;
    // connectivity arrays.
    LookupTable<index_type, FCMND+1> m_fcnds;
    LookupTable<index_type, FCNCL > m_fccls;
    LookupTable<index_type, CLMND+1> m_clnds;
    LookupTable<index_type, CLMFC+1> m_clfcs;
    // boundary information.
    LookupTable<index_type, 2> m_bndfcs;
    std::vector<BoundaryData> m_bndvec;
};
```

(This case is for unstructured meshes of mixed elements in 2-/3-dimensional Euclidean space)
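The split between a dynamic core and a class template carrying static information can be sketched as an untyped byte buffer plus a thin typed layer. ByteCore and TypedTable are hypothetical names, not the LookupTableCore/LookupTable interface, which the slides do not show in full. In this sketch NCOLUMN is the number of values per row; the slides pass 0 for per-cell scalars such as m_clvol, so their second parameter follows a different convention:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical untyped core: a zero-initialized byte buffer plus the
// metadata that is only known at run time.
class ByteCore {
public:
    ByteCore(std::size_t nrow, std::size_t row_bytes)
        : m_nrow(nrow), m_row_bytes(row_bytes), m_bytes(nrow * row_bytes) {}
    std::size_t nrow() const { return m_nrow; }
    char * row(std::size_t irow) { return m_bytes.data() + irow * m_row_bytes; }
private:
    std::size_t m_nrow;
    std::size_t m_row_bytes;
    std::vector<char> m_bytes;
};

// Hypothetical typed layer: the element type T and the columns per row
// NCOLUMN are compile-time parameters; storage stays in the untyped core.
template <typename T, std::size_t NCOLUMN>
class TypedTable {
    static_assert(std::is_trivially_copyable<T>::value,
                  "elements are copied as raw bytes");
public:
    explicit TypedTable(std::size_t nrow) : m_core(nrow, sizeof(T) * NCOLUMN) {}

    static constexpr std::size_t ncolumn() { return NCOLUMN; }
    std::size_t nrow() const { return m_core.nrow(); }

    // memcpy avoids reinterpreting the char buffer as T directly.
    T get(std::size_t irow, std::size_t icol) {
        assert(irow < nrow() && icol < NCOLUMN);
        T value;
        std::memcpy(&value, m_core.row(irow) + icol * sizeof(T), sizeof(T));
        return value;
    }
    void set(std::size_t irow, std::size_t icol, const T & value) {
        assert(irow < nrow() && icol < NCOLUMN);
        std::memcpy(m_core.row(irow) + icol * sizeof(T), &value, sizeof(T));
    }

private:
    ByteCore m_core;
};
```

Only the template knows T and NCOLUMN, so the core can be shared with code that works on raw bytes, such as the ndarray export.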
  • 29. Fast and hideous ❖ In theory we can write beautiful and fast code in C++, and we should. ❖ In practice, as long as it’s fast, it’s not too hard to compromise on elegance. ❖ Testability is the bottom line.

```cpp
const index_type * pclfcs = reinterpret_cast<const index_type *>(clfcs().row(0));
prcells = reinterpret_cast<index_type *>(rcells.row(0));
for (icl=0; icl<ncell(); icl++) {
    for (ifl=1; ifl<=pclfcs[0]; ifl++) {
        ifl1 = ifl-1;
        ifc = pclfcs[ifl];
        const index_type * pfccls = reinterpret_cast<const index_type *>(fccls().row(0))
                                  + ifc*FCREL;
        if (ifc == -1) { // NOT A FACE!? SHOULDN'T HAPPEN.
            prcells[ifl1] = -1;
            continue;
        } else if (pfccls[0] == icl) {
            if (pfccls[2] != -1) { // has neighboring block.
                prcells[ifl1] = -1;
            } else { // is interior.
                prcells[ifl1] = pfccls[1];
            };
        } else if (pfccls[1] == icl) { // I am the neighboring cell.
            prcells[ifl1] = pfccls[0];
        };
        // count rcell number.
        if (prcells[ifl1] >= 0) {
            rcellno[icl] += 1;
        } else {
            prcells[ifl1] = -1;
        };
    };
    // advance pointers.
    pclfcs += CLMFC+1;
    prcells += CLMFC;
};
```

(This looks like C since it really was C.)
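One way to keep pointer-walking code testable is to check it against a slow but obvious version. A sketch with two hypothetical functions that count, for each row of a row-major table, the entries that are non-negative (similar in spirit to how rcellno counts valid neighbors): one uses plain indexing, the other advances raw pointers row by row as the loop above does:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: plain indexing, easy to read and to trust.
std::vector<int> count_valid_ref(const std::vector<int> & table, std::size_t ncol) {
    assert(ncol > 0);
    std::vector<int> counts(table.size() / ncol, 0);
    for (std::size_t irow = 0; irow < counts.size(); ++irow) {
        for (std::size_t icol = 0; icol < ncol; ++icol) {
            if (table[irow * ncol + icol] >= 0) { counts[irow] += 1; }
        }
    }
    return counts;
}

// "Fast and hideous" version: walk raw pointers and advance them row by row.
std::vector<int> count_valid_ptr(const std::vector<int> & table, std::size_t ncol) {
    assert(ncol > 0);
    std::vector<int> counts(table.size() / ncol, 0);
    const int * prow = table.data();
    int * pcount = counts.data();
    for (std::size_t irow = 0; irow < counts.size(); ++irow) {
        for (const int * p = prow; p < prow + ncol; ++p) {
            *pcount += (*p >= 0);
        }
        // advance pointers.
        prow += ncol;
        pcount += 1;
    }
    return counts;
}
```

Running both on the same inputs in a unit test pins down the behavior of the fast version before anyone tries to make it faster still.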
  • 30. Final notes ❖ Avoid Python when you need speed; use it as a shell to your high-performance library from day one ❖ Resource management is at the core of the hybrid architecture; do it in C++ ❖ Use arrays (lookup tables) to keep large data ❖ Don’t access PyObject from your core ❖ Always keep in mind the differences in typing systems