«Python on the edge of a razor: PyPy project», Aleksandr Koshkin, Positive Technologies

Talk given at PyCon Russia 2017.


  1. {
       "talk": {
         "title": "Python on the edge of a razor",
         "event_id": "PyConRu_2017",
       },
       "speaker": {
         "__qname__": "Aleksandr Koshkin",
         "linkedin": "lnkfy.com/7Do",
         "github": "/magniff",
       }
     }
  2. Rather weird talk on performance
  3. Is it just me, or...

       def interprete():
           while True:
               age = int(input('age: >> '))  # opcode, kind of
               # handlers
               if age < 6:
                   print("Ahh %s, Who is a little cutie?" % age)
               elif age < 23:
                   print("So, %s ha? How is your school today?" % age)
               else:
                   print("Go find a job, you hippie!1")
  4. Frame evaluator
     - Frame represents running code
     - Code represents instructions and interface for context

       PyObject *
       PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) {
           co = f->f_code;
           for (;;) {
               switch (opcode) {
                   TARGET(LOAD_FAST) { /* implementation of LOAD_FAST */ }
                   TARGET(LOAD_CONST) { /* implementation of LOAD_CONST */ }
                   TARGET(STORE_FAST) { /* implementation of STORE_FAST */ }
                   ...
               }
           }
       }
  5. C API

       TARGET(BINARY_ADD) {
           PyObject *right = POP();
           PyObject *left = TOP();
           PyObject *sum;
           ...
           sum = PyNumber_Add(left, right);
           ...
           DISPATCH();
       }
  6. Years of awkward optimisations
  7. Fibonacci generator

       >>> def fib(a, b):
       ...     while True:
       ...         a, b = b, a+b
       ...         yield b
       >>> f = fib(1, 1)
       >>> next(f)
       2
       >>> next(f)
       3
       >>> next(f)
       5
       >>> next(f)
       8
  8. What CPython generates

       >>> dis.dis(fib)
         2           0 SETUP_LOOP              26 (to 29)
       --------------------------------------------------------------
         3     >>    3 LOAD_FAST                1 (b)
                     6 LOAD_FAST                0 (a)
                     9 LOAD_FAST                1 (b)
                    12 BINARY_ADD
                    13 ROT_TWO
                    14 STORE_FAST               0 (a)
                    17 STORE_FAST               1 (b)

         4          20 LOAD_FAST                1 (b)
                    23 YIELD_VALUE
                    24 POP_TOP
                    25 JUMP_ABSOLUTE            3
  9. CoCode (https://github.com/magniff/cocode)

       from cocode import CodeObjectProxy, Constant, Return, Add

       code_proxy = CodeObjectProxy(
           Constant("Hello "),
           Constant("world!"),
           Add(),
           Return()
       )
       code = code_proxy.assemble()
       assert eval(code) == "Hello world!"
  10. Fibonacci generator, cocode version

       def fibonacci(a, b):
           pass

       fib_asm_code = CodeObjectProxy(
           VariableFast('a'),
           VariableFast('b'),
           # --------------------------------------
           Label(Dup(), "loop"),
           Rot3(),
           Add(),
           Dup(),
           Yield(),
           Pop(),
           Jump("loop"),
           interface=fibonacci,
       )
  11. Using stack machine like a pro

       >>> dis.dis(fibonacci)
         0           0 LOAD_FAST                0 (a)
                     3 LOAD_FAST                1 (b)
       --------------------------------------------------------------
               >>    6 DUP_TOP
                     7 ROT_THREE
                     8 BINARY_ADD
                     9 DUP_TOP
                    10 YIELD_VALUE
                    11 POP_TOP
                    12 JUMP_ABSOLUTE            6

       # so, the algorithm is
       # a,b -> a,b,b -> b,a,b -> b,a+b -> b,a+b,a+b -> yield a+b and loop back
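The stack shuffle described in the comment on slide 11 can be checked by hand with a small simulation. This is only a sketch in plain Python: `fib_stack` and its `count` parameter are hypothetical names introduced here, the list stands in for the CPython value stack, and YIELD_VALUE followed by POP_TOP is simplified to "take the top value off and hand it to the caller".

```python
def fib_stack(a, b, count):
    """Simulate the hand-written bytecode loop for `count` iterations."""
    stack = [a, b]                      # LOAD_FAST a; LOAD_FAST b
    produced = []
    for _ in range(count):
        stack.append(stack[-1])         # DUP_TOP:    a,b   -> a,b,b
        top = stack.pop()               # ROT_THREE:  a,b,b -> b,a,b
        stack.insert(len(stack) - 2, top)
        right = stack.pop()             # BINARY_ADD: b,a,b -> b,a+b
        left = stack.pop()
        stack.append(left + right)
        stack.append(stack[-1])         # DUP_TOP:    b,a+b -> b,a+b,a+b
        produced.append(stack.pop())    # YIELD_VALUE + POP_TOP (simplified)
    return produced


first_four = fib_stack(1, 1, 4)
```

With the arguments from slide 7, `first_four` holds the same values the generator produced there: 2, 3, 5, 8.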
  12. Did it help?

       $ time python fib.py
       real 0m0.449s
       $ time python fib_asm.py
       real 0m0.462s
  13. Did it help?
  14. PyPy is fast because it is jitted, ok? This answer does not satisfy me.
  15. Deadly sins of dynamic programming languages
     - late binding
     - boxing
     - VM overhead
  16. Late binding:

       some_method = some_object.method
       for value in huge_list:
           some_method(value)
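Why hoisting the lookup out of the loop matters can be seen from what late binding permits. The sketch below is an illustration only (the `Greeter` class is a hypothetical name, not from the talk): since a method may be rebound at any moment, the interpreter has to repeat the attribute lookup on every call unless the program captures the result itself.

```python
class Greeter:
    def hello(self):
        return "hello"


g = Greeter()
bound = g.hello  # lookup performed once, result kept in a local name

# Rebinding the method on the class is legal at any time.
Greeter.hello = lambda self: "patched"

late = g.hello()   # looked up again at call time, so it sees the new method
early = bound()    # still the function captured before the patch
```

Here `late` is "patched" while `early` is "hello", which is exactly the freedom that forces a lookup per call in the unhoisted loop.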
  17. Boxing:

       00100000000100101000111100000000
       00000000000000000000000000000000
       10000000000100101000111100000000
       00000000000000000000000000000000
       00001111000000000000000000000000
       00000000000000000000000000000000
       11100000111001001000100100000000
       00000000000000000000000000000000
       00000001000000000000000000000000
       00000000000000000000000000000000
       01100100000000000000000000000000
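The cost of boxing can be made visible without a memory dump. The following is a rough sketch that assumes CPython (where `sys.getsizeof` reports the size of the whole object) and uses the standard `array` module as a stand-in for unboxed storage; the value 100 is chosen only for illustration.

```python
import sys
from array import array

# A boxed integer carries object bookkeeping (reference count, type pointer,
# size field) in addition to the digits of the number itself.
boxed_size = sys.getsizeof(100)

# The 'q' typecode stores signed 64-bit machine integers back to back.
raw = array('q', [100])
raw_size = raw.itemsize

overhead = boxed_size - raw_size  # bytes spent on the box, CPython-specific
```

The exact value of `boxed_size` depends on the interpreter version and platform, so no specific number is claimed here; the point is only that it exceeds the 8 bytes of payload.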
  18. VM overhead: we don't really see it, yet it is there.
  19. Is this the end, mommy? - negative
  20. The key to performance is specialization.
      Example: say you are asked to sort one billion numbers. What are you going to do?
  21. Before we start: RPython, PyPy etc.
     ● PyPy: yet another Python implementation (fastest on the market, though)
     ● PyPy is written in RPython - a turbo ugly subset of the Python language
     ● You might think: hmmm, Python, even a restricted one - that sounds fun
     ● It is not, trust me
  22. Before we start: RPython, PyPy etc. (same bullets as slide 21, plus:)

       $ cat demo.py
       import os

       def main(argv):
           os.write(1, 'Is that even legal?\n')
           return 0

       def target(*args):
           return main, None
  23. Before we start: RPython, PyPy etc.

       $ python2 rpython/bin/rpython -O2 demo.py
  24. Before we start: RPython, PyPy etc.

       $ ./demo-c
       Is that even legal?
  25. Before we start: RPython, PyPy etc.

       [Timer] Timings:
       [Timer] annotate                    ---  3.6 s
       [Timer] rtype_lltype                ---  0.1 s
       [Timer] backendopt_lltype           ---  0.1 s
       [Timer] stackcheckinsertion_lltype  ---  0.0 s
       [Timer] database_c                  ---  8.5 s
       [Timer] source_c                    ---  0.9 s
       [Timer] compile_c                   ---  1.8 s
       [Timer] =========================================
       [Timer] Total:                      --- 15.0 s
  26. "Automatic generation of JIT compilers for dynamic languages in .NET"
      By Davide Ancona, Carl Friedrich Bolz, Antonio Cuni, and Armin Rigo
      I experienced a powerful emotional uplift.
  27. "Partial Evaluation of Computation Process, Revisited"
      By Yoshihiko Futamura
      Having a program $P$ taking inputs $s_1, \dots, s_n$ and $d_1, \dots, d_m$,
      $S(P, (s_1, \dots, s_n)) = P'$
      produces a program $P'$ such that
      $P(s_1, \dots, s_n, d_1, \dots, d_m) = P'(d_1, \dots, d_m)$
  28. $S(P, (s_1, \dots, s_n)) = P'$
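A toy instance of the specializer S may help make the definition concrete. The sketch below is not taken from the talk: `power` plays the role of P, the exponent `n` is the input known in advance, the base `x` is the input that arrives later, and `specialize_power` is a hypothetical, hand-written S that only works for this one program.

```python
def power(x, n):
    """The general program P: both inputs supplied at call time."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n):
    """A hand-rolled S(P, n): returns P' with the loop over n unrolled."""
    body = " * ".join(["x"] * n) or "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)  # build the residual program P'
    return namespace[f"power_{n}"]


cube = specialize_power(3)  # P'(x) = x * x * x, no loop and no n left
```

For every `x`, `cube(x)` agrees with `power(x, 3)`, which is the defining property P(s, d) = P'(d) from the slide.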
  29. Similar to rendering equation
  30. Similar to rendering equation
  31. Classical partial evaluation has a disadvantage: the specialization level (since it is too static).
  32. Classical partial evaluation has a disadvantage. So we had better obtain more info at runtime (Tempo, DyC).
  33. Can we go fully dynamic? Yes!
  34. Case study: TLC
  35. TLC language (rpython/jit/tl/tlc.py)
     – Stack manipulation: POP, PUSHARG, SWAP, etc.
     – Flow control: BR_COND, BR jumps.
     – Arithmetic: ADD, SUB, etc.
     – Comparisons: EQ, LT, GT, etc.
     – Object-oriented operations: NEW, GETATTR, SETATTR.
     – List operations: CONS, CAR, CDR.
  36. def interp_eval(code, pc, args, pool):
           code_len = len(code)
           stack = []
           while pc < code_len:
               opcode = ord(code[pc])
               pc += 1
               if opcode == PUSH:
                   stack.append(IntObj(char2int(code[pc])))
                   pc += 1
               elif opcode == PUSHARG:
                   stack.append(args[0])
               elif opcode == SUB:
                   a, b = stack.pop(), stack.pop()
                   stack.append(b.sub(a))
               elif opcode == LT:
                   a, b = stack.pop(), stack.pop()
                   stack.append(IntObj(b.lt(a)))
               elif opcode == BR_COND:
                   cond = stack.pop()
                   if cond.istrue():
                       pc += char2int(code[pc])
                   pc += 1
               elif opcode == RETURN:
                   break
               ...
           return stack[-1]
  37. class IntObj(Obj):
           def __init__(self, value):
               self.value = value

           def lt(self, other):
               return self.value < other.int_o()

           def sub(self, other):
               return IntObj(self.value - other.int_o())

           def int_o(self):
               return self.value
  38. Say we are provided with the following code

       main:
           PUSHARG
           PUSH 0
           LT
           BR_COND neg
       pos:
           PUSHARG
           RETURN
       neg:
           PUSH 0
           PUSHARG
           SUB
           RETURN
  39. What can a static partial evaluator do? Not much.
  40. def interp_eval(code, pc, args, pool): ... (same listing as slide 36)
  41. def interp_eval_abs(args):
           stack = []
           stack.append(args[0])
           stack.append(IntObj(0))
           a, b = stack.pop(), stack.pop()
           stack.append(IntObj(b.lt(a)))
           cond = stack.pop()
           if cond.istrue():
               stack.append(args[0])
               return stack[-1]
           else:
               stack.append(IntObj(0))
               stack.append(args[0])
               a, b = stack.pop(), stack.pop()
               stack.append(b.sub(a))
               return stack[-1]
  42. Escape analysis
  43. def interp_eval_abs(args):
           stack = []
           stack.append(args[0])
           stack.append(IntObj(0))
           a, b = stack.pop(), stack.pop()
           stack.append(IntObj(b.lt(a)))
           cond = stack.pop()
           if cond.istrue():
               stack.append(args[0])
               return stack[-1]  # 0_o
           else:
               stack.append(IntObj(0))
               stack.append(args[0])
               a, b = stack.pop(), stack.pop()
               stack.append(b.sub(a))
               return stack[-1]  # 0_o
  44. def interp_eval_abs(args):
           v0 = args[0]
           v1 = IntObj(0)
           a, b = v0, v1
           v0 = IntObj(b.lt(a))
           cond = v0
           if cond.istrue():
               v0 = args[0]
               return v0
           else:
               v0 = IntObj(0)
               v1 = args[0]
               a, b = v0, v1
               v0 = b.sub(a)
               return v0
  45. def interp_eval_abs(args):
           a = args[0]
           cls_a = a.__class__
           switch cls_a:
               IntObj:
                   if 0 < a.value:
                       return a
                   else:
                       return IntObj(0 - a.value)
               default:
                   try_something_else()
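Slide 45 is written with a `switch` that Python does not have. One possible runnable rendering, given here only as a sketch, replaces the switch with a class guard. The trimmed `IntObj` and the `generic_abs` fallback are assumptions made for this example: the talk leaves `try_something_else()` unspecified, so the fallback simply refuses to guess.

```python
class Obj:
    pass


class IntObj(Obj):
    def __init__(self, value):
        self.value = value


def generic_abs(args):
    # Hypothetical stand-in for try_something_else(): a real system would
    # fall back to the general interpreter here.
    raise NotImplementedError("no specialized path for %r" % type(args[0]))


def interp_eval_abs(args):
    a = args[0]
    if a.__class__ is IntObj:          # guard: specialized for IntObj only
        if 0 < a.value:
            return a                   # no new box allocated
        return IntObj(0 - a.value)     # one allocation, no stack traffic
    return generic_abs(args)           # guard failed


negative = interp_eval_abs([IntObj(-5)])
seven = IntObj(7)
positive = interp_eval_abs([seven])
```

After the guard, the interpreter stack, the intermediate boolean box and the virtual calls are gone; what remains is a comparison on a raw field.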
  46. But does this make any sense?
  47. It depends.
  48. Assumptions:
     ● Apply the S function only to hot loops
     ● Lots of consecutive iterations of the loop take the same path in the CFG
     Oook, but the hottest loop in the interpreter is an opcode dispatch loop.
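The first assumption, specializing only hot loops, is often approximated by counting backward jumps. The sketch below is a generic illustration, not PyPy's actual mechanism: `HotLoopDetector`, its threshold and the `on_backward_jump` hook are hypothetical names introduced for this example.

```python
from collections import Counter


class HotLoopDetector:
    """Counts backward jumps per target and flags targets that get hot."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()
        self.hot = set()

    def on_backward_jump(self, target_pc):
        """Record one loop iteration; return True once the loop is hot."""
        self.counts[target_pc] += 1
        if self.counts[target_pc] >= self.threshold:
            self.hot.add(target_pc)
        return target_pc in self.hot


detector = HotLoopDetector(threshold=3)
# A loop at pc=6 iterates five times, a loop at pc=40 only once.
flags = [detector.on_backward_jump(6) for _ in range(5)]
cold = detector.on_backward_jump(40)
```

Only the loop at pc 6 crosses the threshold, so only it would be handed to the specializer; the one-off loop at pc 40 keeps running in the plain interpreter.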
  49. Approach (an approach, not the approach): Metatracer
  50. But still: automatic handling of such problems is an extremely difficult task. How about providing a set of hints?
  51. from rpython.rlib.jit import JitDriver

       myjitdriver = JitDriver(greens=['pc', 'code'], reds=['frame', 'pool'])

       def interp_eval(code, pc, args, pool):
           code_len = len(code)
           stack = []
           frame = Frame(args, pc)
           while pc < code_len:
               myjitdriver.jit_merge_point(
                   frame=frame, code=code, pc=pc, pool=pool
               )
               opcode = ord(code[pc])
               pc += 1
               if opcode == some_opcode:
                   ...
               elif opcode == BR:
                   old_pc = pc
                   pc += char2int(code[pc]) + 1
                   if old_pc > pc:
                       myjitdriver.can_enter_jit(
                           code=code, pc=pc, frame=frame, pool=pool
                       )
           return stack[-1]
  52. def lookup(cls, methname):
           ...

       def call_method(obj, method_name, arguments):
           cls = obj.getclass()
           ...
           method = lookup(cls, method_name)
           return method.call(obj, arguments)
  53. Somewhat similar to PIC

       from rpython.rlib.jit import elidable, promote

       @elidable
       def lookup(cls, methname):
           ...

       def call_method(obj, method_name, arguments):
           cls = obj.getclass()
           promote(cls)
           method = lookup(cls, method_name)
           return method.call(obj, arguments)
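The comparison with a PIC (polymorphic inline cache) can be illustrated in plain Python. This is a sketch of the general caching idea, not of how RPython implements `promote` or `@elidable`; `CallSite`, `Dog` and `Cat` are hypothetical names, and the cache is only valid under the same assumption `@elidable` expresses, namely that `lookup` always returns the same result for the same class and name.

```python
def lookup(cls, methname):
    """Slow path: walk the MRO looking for the method."""
    for klass in cls.__mro__:
        if methname in klass.__dict__:
            return klass.__dict__[methname]
    raise AttributeError(methname)


class CallSite:
    """One call site with a per-class cache of lookup results."""

    def __init__(self, method_name):
        self.method_name = method_name
        self.cache = {}   # class -> function, filled on first sight
        self.misses = 0

    def call(self, obj, *args):
        cls = type(obj)                 # the value a JIT would promote
        func = self.cache.get(cls)
        if func is None:                # guard failed: take the slow path
            self.misses += 1
            func = lookup(cls, self.method_name)
            self.cache[cls] = func
        return func(obj, *args)


class Dog:
    def speak(self):
        return "woof"


class Cat:
    def speak(self):
        return "meow"


site = CallSite("speak")
sounds = [site.call(animal) for animal in [Dog(), Cat()] * 50]
```

A hundred calls go through the site, but the slow `lookup` runs only once per class seen, which is the effect the two hints on the slide are after.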
  54. Any details on the meta-JIT in this talk? No, thanks.
  55. :3