SlideShare a Scribd company logo
1 of 36
Download to read offline
Towards Ruby3x3 Performance
Introducing RTL and MJIT
Vladimir Makarov
Red Hat
September 21, 2017
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 1 / 30
About Myself
Red Hat, Toronto office, Canada
Tools group (GCC, Glibc, LLVM, Rust, Go, OpenMP)
part of a bigger platform enablement team (porting
Linux kernel to new hardware)
20 years of work on GCC
2 years of work on MRI
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 2 / 30
Ruby 3 performance goal
Matz set a very ambitious goal: MRI 3 should be 3x faster than MRI
2
Koichi Sasada improved MRI performance by about 3x
It is symbolic to expect MRI 3 should be 3x faster than MRI 2
Doable for CPU intensive programs
Hardly possible for memory or IO bound programs
I treat Matz’s performance goal as: MRI needs another cardinal
performance improvement
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 3 / 30
RTL insns
IR for Ruby code analysis, optimizations, and JIT
Importance of easy data dependence discovery
Stack based insns are an inconvenient IR for such goals
Stack insns vs RTL insns for Ruby code a = b + c:
getlocal_OP__WC__0 <b index>
getlocal_OP__WC__0 <c index>
opt_plus
setlocal_OP__WC__0 <a index>
plus <a index>, <b index>, <c index>
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 4 / 30
Using RTL insns for interpretation
RTL for analysis and JIT code generation
RTL or stack insns for interpretation?
Feature Stack insns RTL insns
Insn length shorter longer
Insn number more less
Code length less more
Insn decoding less more
Code data locality more less
Insn dispatching more less
Memory traffic more less
Instructions: Pros & Cons for interpretation
Decision: Use RTL for the interpreter too
Allows sharing code between the interpreter and JIT
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 5 / 30
How to generate RTL
A simpler way is to generate RTL insns from the stack insns
A faster approach is to generate directly from MRI parse tree nodes
Decision: generate RTL directly from MRI nodes
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 6 / 30
RTL insn operands
What could be an operand:
only temporaries
temporaries and locals
temporaries and locals even from higher levels (outside Ruby block)
the above + instance variables
the above + class variables, globals
Decoding overhead of numerous type operands will not be
compensated by processing smaller number of insns
Complicated operands also complicate optimizations and JIT
Currently we use only temporaries and locals. This gives best
performance results according to my experiments
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 7 / 30
RTL complications
Practically any RTL insn might be an ISEQ call. A call always puts a
result on the stack top. We need to move this result to a destination
operand:
If an RTL insn is actually a call, change the return PC so the next insn
executed after the call will be an insn moving the result from the stack
top to the insn destination
To decrease memory overhead, the move insn is a part of the original
insn
For example, if the following insn
plus <move opcode>, <call data>, dst, op1, op2
is a method call, the next executed insn will be
<move opcode> <call data>, dst, op1, op2
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 8 / 30
RTL insn combining and specialization
Immediate value specialization
e.g. plus − > plusi - addition with immediate fixnum as an operand
Frequent insn sequence combining
e.g. eq + bt − > bteq - comparison and branch if the operands are
equal
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 9 / 30
Speculative insn generation
Some initially generated insns can be transformed into speculative
ones during their execution
Speculation is based on operand types (e.g. plus can be transformed
into an integer plus) and on the operand values (e.g. no
multi-precision integers)
Speculative insns can be transformed into unchanging regular insns
if the speculation is wrong
Speculation insns include code checking the speculation correctness
plus
iplus
uplus
fplus
Speculation will be more important for JITted code performance
It creates a lot of big extended basic blocks which a C compiler
optimizes well
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 10 / 30
RTL insn status and future work
It mostly works (make check reports no regressions)
Slightly better performance than stack based insns
27% GeoMean improvement on 23 small benchmarks (+110% to -7%)
Code Change (Optcarrot):
Stack insns → RTL insns
Executed insns number -23%
Executed insn length +19%
Still some work to do for RTL improvement:
Reducing code size
Reducing overhead in operand decoding
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 11 / 30
Possible JIT approaches
1. Writing own JIT from scratch
LuaJIT, JavaScript V8, etc
2. Using widely used optimizing compilers
GCC, LLVM
3. Using existing JITs
JVM, OMR, RPython, Graal/Truffle, etc.
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 12 / 30
Option 1: Writing own JIT from scratch
Full control, small size, fast compilation
Fast compilation is mostly a result of fewer optimizations than in
industrial optimizing compilers
Still a huge effort to implement decent optimizations
Ongoing burden in maintenance and porting
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 13 / 30
Option 2: Using widely used optimizing compilers
Highly optimized code (GCC has > 300 optimization passes), easier
implementation and porting and extremely well maintained (> 2K
contributors since GCC 2.95)
Portable (currently supports 49 targets)
Reliable and well tested (> 16K reporters since GCC 2.95)
No new dependencies
But slower compilation
Slower mostly because it does much more than a typical JIT
Compilation can be made faster by disabling less valuable optimizations
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 14 / 30
Option 3: Using existing JITs
Duplication: already used for JRuby, Topaz (Rpython), Opal (JS),
OMR Ruby, Graal/Truffle Ruby
JVM is stable, reliable, optimizing, and ubiquitous
But still worse code performance than GCC/JIT
Azul Falcon (LLVM based JIT) up to 8x better performance than JVM
C2 (source: http://stuff-gil-says.blogspot.ca/2017)
License issues and patent minefield
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 15 / 30
Own or existing JITs vs GCC/LLVM based JITs
Webkit moved from LLVM JIT to own JIT (source:
https://webkit.org/blog/5852/introducing-the-b3-jit-compiler)
Implemented about 20 optimizations
4-5 speedup in compilation time
Final results: Jetstream, Kraken, Octane (-9% to +8%)
ISP RAS research: JS V8 ported to LLVM (source
http://llvm.org/devmtg/2016-09/slides/Melnik-LLV8.pdf)
GeoMean speedup 8-16% on Sunspider
Resulting situation: is the glass half full or half empty?
In my opinion, considering implementation and maintenance efforts,
GCC/LLVM JIT is a winner, especially for long running server programs
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 16 / 30
How to use GCC/LLVM for implementing JITs
Using LibGCCJIT/MCJIT/ORC:
New, unstable interfaces
A lot of tedious calls to create the environment (see GNU Octave and
PyPy port to libgccjit)
Generating C code:
No dependency on a particular compiler, easier debugging
But some people call it a heavy, “junky” approach
Wrong! if we implement it carefully
LibGCCJIT vs GCC data flow (red parts are different):
Environment creation
through API calls
C header parsing
(emvironment)
C function parsing Optimizations
and Generation
Optimizations
and Generation
Assembler/LD
Assembler/LD Loading .so file
Loading .so fileFunction creation
through API calls
GCC
LibGCCJIT
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 17 / 30
How to use GCC/LLVM for implementing JITs – cont’d
Generating C code
Environment takes from 21% to 41% of all compilation time
Using a precompiled header (PCH) decreases this to less than 3.5%
Function parsing takes less than 1%
0
100,000
200,000
300,000
400,000
500,000
600,000
700,000
800,000
Header Minimized_Header PCH Minimized_PCH
GCCthousandinsns
GCC −O2 processing a function with 44 RTL insns
Environment
Function Parsing
Optimizations & Generation
GCC with C executable size: 25.1 MB for cc1 vs. 22.6MB for libgccjit
(only 10% difference)
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 18 / 30
MJIT
MJIT is MRI JIT
MJIT is Method JIT
MJIT is a JIT based on C code generation and PCH
MJIT can use GCC or LLVM, in the future other C compilers
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 19 / 30
MJIT architecture
Environment
header
Minimized
header
MRI building phase
New MRI MJIT environment building step
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 20 / 30
MJIT architecture
Environment
header
Minimized
header
MRI building phase
MJIT
MRI execution run
MRI
Precompiled header
CC
thread
MJIT initialized in parallel with Ruby program execution
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 20 / 30
MJIT architecture
Environment
header
Minimized
header
MRI building phase
MJIT
MRI execution run
MRI
Precompiled header
CC
thread
C code .so file
CC
loading
threads
MJIT works in parallel with Ruby program execution
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 20 / 30
Example
Ruby code:
def loop
i = 0; while i < 100_000; i += 1; end
i
end
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
Example
Ruby code:
def loop
i = 0; while i < 100_000; i += 1; end
i
end
RTL code right after compilation:
...
0004 val2loc 3, 0
0007 goto 15
0009 plusi cont_op2, <calldata...>, 3, 3, 1
0015 btlti cont_btcmp, 9, <calldata...>, -1, 3, 100000
0022 loc_ret 3, 16
...
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
Example
Ruby code:
def loop
i = 0; while i < 100_000; i += 1; end
i
end
Speculative RTL code after some execution:
...
0004 val2loc 3, 0
0007 goto 15
0009 iplusi _, _, 3, 3, 1
0015 ibtlti _, 9, _, -1, 3, 100000
0022 loc_ret 3, 16
...
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
Example
Ruby code:
def loop
i = 0; while i < 100_000; i += 1; end
i
end
MJIT generated C code:
...
l4: cfp->pc = (void *) 0x5576729ccd88; val2loc_f(cfp, &v0, 3, 0x1);
l7: cfp->pc = (void *) 0x5576729ccd98; ruby_vm_check_ints(th); goto l15;
l9: if (iplusi_f(cfp, &v0, 3, &v0, 3, &new_insn)) {
vm_change_insn(cfp->iseq, (void *) 0x5576729ccda6, new_insn);
goto stop_spec;
}
l15: flag = ibtlti_f(cfp, &t0, -1, &v0, 200001, &val, &new_insn);
if (val == RUBY_Qundef) {
vm_change_insn(cfp->iseq, (void *) 0x5576729ccdd6, new_insn);
goto stop_spec;
}
if (flag) goto l9;
l22: cfp->pc = (void *) 0x5576729cce26;
loc_ret_f(th, cfp, &v0, 16, &val);
return val;
...
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
Example
Ruby code:
def loop
i = 0; while i < 100_000; i += 1; end
i
end
GCC optimized x86-64 code:
...
movl $200001, %eax
...
ret
There is no loop
JVM can not do this
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
MJIT performance results
Benchmarking MRI v2 (v2), MRI GCC MJIT (MJIT), MRI LLVM
MJIT (MJIT-L), OMR Ruby rev. 57163 using JIT (OMR), JRuby9k
9.1.8 (JRuby9K), JRuby9k -Xdynamic (JRuby9k-D), Graal Ruby 0.22
(Graal)
Mainstream CPU (i3-7100) under Fedora 25 with GCC-6.3 and
Clang-3.9
Microbenchmarks and small benchmarks (dir MJIT-benchmarks)
Each benchmark runs at least 20-30sec on MRI v2
Optcarrot (https://github.com/mame/optcarrot)
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 22 / 30
MJIT performance results
Microbenchmarks: Geomean Wall time improvement relative to
MRI v2
v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal
0
1
2
3
4
5
6
7Speedup(GeoMean)
1.09
1.59
2.48
1.83
6.18
4.02
Wall time Speedup
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 23 / 30
MJIT performance results
Microbenchmarks: Geomean CPU time improvement relative to
MRI v2
v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal
0
1
2
3
4
5
6Speedup(GeoMean)
1.09
1.33
1.88
0.69
5.55
3.67
CPU time Speedup
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 24 / 30
MJIT performance results
Microbenchmarks: Geomean Peak memory overhead relative to
MRI v2
v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal
10-1
100
101
102
103
Peakmemory(GeoMean)
2.54
161.76 198.86
79.65
4.15
6.44
Peak memory overhead
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 25 / 30
MJIT performance results
Optcarrot: FPS speedup relative to MRI v2
v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
Speedup
1.20 1.14
2.38
2.83
2.94
FPS improvement
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 26 / 30
MJIT performance results
Optcarrot: CPU time improvement relative to MRI v2
v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal
0.0
0.5
1.0
1.5
2.0
Speedup
1.13
0.79 0.76
1.53
1.45
CPU time Speedup
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 27 / 30
MJIT performance results
Optcarrot: Peak memory overhead relative to MRI v2
v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal
10-1
100
101
102
103
Peakmemory
1.41
10.67
17.68
1.16 1.16
Peak memory overhead
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 28 / 30
Recommendations to use GCC/LLVM for a JIT
My recommendations in order of importance:
Don’t use MCJIT, ORC, or LIBGCCJIT
Use a pre-compiled header (JIT code environment) in a memory FS
Compile code in parallel with program interpretation
Use a good strategy to choose byte code for JITting
Minimize the environment if you don’t use PCH
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 29 / 30
MJIT status and future directions
The project is at an early development stage:
Unstable, passes ‘make test’, can not pass ‘make check’ yet
Doesn’t work on Windows
At least one more year to mature
Need more optimizations:
No inlining yet. The most important optimization!
Different approaches to implement inlining:
Node or RTL level
Use C inlining (I’ll pursue this one)
New GCC/LLVM extension (a new inline attribute) would be useful
Will RTL and MJIT be a part of MRI?
It does not depend on me
I am going to work in this direction
Will be happy if even some project ideas will be used in future MRI
Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 30 / 30

More Related Content

What's hot

Transparent eBPF Offload: Playing Nice with the Linux Kernel
Transparent eBPF Offload: Playing Nice with the Linux KernelTransparent eBPF Offload: Playing Nice with the Linux Kernel
Transparent eBPF Offload: Playing Nice with the Linux KernelOpen-NFP
 
LinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSLinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSThomas Graf
 
Improving GStreamer performance on large pipelines: from profiling to optimiz...
Improving GStreamer performance on large pipelines: from profiling to optimiz...Improving GStreamer performance on large pipelines: from profiling to optimiz...
Improving GStreamer performance on large pipelines: from profiling to optimiz...Luis Lopez
 
On the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsOn the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsWim Vanderbauwhede
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDBLinaro
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkYu Liu
 
A Guide to Data Versioning with MapR Snapshots
A Guide to Data Versioning with MapR SnapshotsA Guide to Data Versioning with MapR Snapshots
A Guide to Data Versioning with MapR SnapshotsIan Downard
 
Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...
Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...
Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...Dr. Amarjeet Singh
 
Network Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome AgilioNetwork Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome AgilioOpen-NFP
 
Accelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet ProcessingAccelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet ProcessingOpen-NFP
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...NECST Lab @ Politecnico di Milano
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSFernando Luiz Cola
 
Improvement of Congestion window and Link utilization of High Speed Protocols...
Improvement of Congestion window and Link utilization of High Speed Protocols...Improvement of Congestion window and Link utilization of High Speed Protocols...
Improvement of Congestion window and Link utilization of High Speed Protocols...IOSR Journals
 
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersP4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersOpen-NFP
 

What's hot (20)

Transparent eBPF Offload: Playing Nice with the Linux Kernel
Transparent eBPF Offload: Playing Nice with the Linux KernelTransparent eBPF Offload: Playing Nice with the Linux Kernel
Transparent eBPF Offload: Playing Nice with the Linux Kernel
 
LinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSLinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVS
 
Improving GStreamer performance on large pipelines: from profiling to optimiz...
Improving GStreamer performance on large pipelines: from profiling to optimiz...Improving GStreamer performance on large pipelines: from profiling to optimiz...
Improving GStreamer performance on large pipelines: from profiling to optimiz...
 
On the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsOn the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC Applications
 
3rd 3DDRESD: ReCPU 4 NIDS
3rd 3DDRESD: ReCPU 4 NIDS3rd 3DDRESD: ReCPU 4 NIDS
3rd 3DDRESD: ReCPU 4 NIDS
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDB
 
A Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on SparkA Generate-Test-Aggregate Parallel Programming Library on Spark
A Generate-Test-Aggregate Parallel Programming Library on Spark
 
Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...
Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...
Fairness in Transfer Control Protocol for Congestion Control in Multiplicativ...
 
A Guide to Data Versioning with MapR Snapshots
A Guide to Data Versioning with MapR SnapshotsA Guide to Data Versioning with MapR Snapshots
A Guide to Data Versioning with MapR Snapshots
 
Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...
Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...
Simulation of BRKSS Architecture for Data Warehouse Employing Shared Nothing ...
 
Network Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome AgilioNetwork Measurement with P4 and C on Netronome Agilio
Network Measurement with P4 and C on Netronome Agilio
 
Accelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet ProcessingAccelerating Networked Applications with Flexible Packet Processing
Accelerating Networked Applications with Flexible Packet Processing
 
Go at uber
Go at uberGo at uber
Go at uber
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
2017 10 17_quantum_program_v2
2017 10 17_quantum_program_v22017 10 17_quantum_program_v2
2017 10 17_quantum_program_v2
 
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOSBuilding a QT based solution on a i.MX7 processor running Linux and FreeRTOS
Building a QT based solution on a i.MX7 processor running Linux and FreeRTOS
 
Improvement of Congestion window and Link utilization of High Speed Protocols...
Improvement of Congestion window and Link utilization of High Speed Protocols...Improvement of Congestion window and Link utilization of High Speed Protocols...
Improvement of Congestion window and Link utilization of High Speed Protocols...
 
Programming policy diagrams
Programming policy diagramsProgramming policy diagrams
Programming policy diagrams
 
2017 07 04_cmmse_quantum_programming_v1
2017 07 04_cmmse_quantum_programming_v12017 07 04_cmmse_quantum_programming_v1
2017 07 04_cmmse_quantum_programming_v1
 
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server AdaptersP4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
P4-based VNF and Micro-VNF Chaining for Servers With Intelligent Server Adapters
 

Similar to Towards ruby-3x3-performance

Three CRuby Performance Projects
Three CRuby Performance ProjectsThree CRuby Performance Projects
Three CRuby Performance ProjectsVladimir Makarov
 
MongoDB World 2019: The Journey of Migration from Oracle to MongoDB at Rakuten
MongoDB World 2019: The Journey of Migration from Oracle to MongoDB at RakutenMongoDB World 2019: The Journey of Migration from Oracle to MongoDB at Rakuten
MongoDB World 2019: The Journey of Migration from Oracle to MongoDB at RakutenMongoDB
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...Edge AI and Vision Alliance
 
Let's make it flow ... one way
Let's make it flow ... one wayLet's make it flow ... one way
Let's make it flow ... one wayRoberto Ciatti
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Brendan Gregg
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosmictc
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...Rob Skillington
 
Multiprocessor Architecture for Image Processing
Multiprocessor Architecture for Image ProcessingMultiprocessor Architecture for Image Processing
Multiprocessor Architecture for Image Processingmayank.grd
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsSamsung Open Source Group
 
Update on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCUpdate on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCinside-BigData.com
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Yusuke Izawa
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...Ryousei Takano
 
TSC BoF: OSS Toolchain Discussion - SFO17-409
TSC BoF: OSS Toolchain Discussion - SFO17-409TSC BoF: OSS Toolchain Discussion - SFO17-409
TSC BoF: OSS Toolchain Discussion - SFO17-409Linaro
 
Minko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.jsMinko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.jsMinko3D
 
Dryad Paper Review and System Analysis
Dryad Paper Review and System AnalysisDryad Paper Review and System Analysis
Dryad Paper Review and System AnalysisJinGui LI
 
NGRX Apps in Depth
NGRX Apps in DepthNGRX Apps in Depth
NGRX Apps in DepthTrayan Iliev
 

Similar to Towards ruby-3x3-performance (20)

Three CRuby Performance Projects
Three CRuby Performance ProjectsThree CRuby Performance Projects
Three CRuby Performance Projects
 
MongoDB World 2019: The Journey of Migration from Oracle to MongoDB at Rakuten
MongoDB World 2019: The Journey of Migration from Oracle to MongoDB at RakutenMongoDB World 2019: The Journey of Migration from Oracle to MongoDB at Rakuten
MongoDB World 2019: The Journey of Migration from Oracle to MongoDB at Rakuten
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
 
Let's make it flow ... one way
Let's make it flow ... one wayLet's make it flow ... one way
Let's make it flow ... one way
 
An Optics Life
An Optics LifeAn Optics Life
An Optics Life
 
Introduction to IoT.JS
Introduction to IoT.JSIntroduction to IoT.JS
Introduction to IoT.JS
 
Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)Computing Performance: On the Horizon (2021)
Computing Performance: On the Horizon (2021)
 
Microsoft Azure in HPC scenarios
Microsoft Azure in HPC scenariosMicrosoft Azure in HPC scenarios
Microsoft Azure in HPC scenarios
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 
Multiprocessor Architecture for Image Processing
Multiprocessor Architecture for Image ProcessingMultiprocessor Architecture for Image Processing
Multiprocessor Architecture for Image Processing
 
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of ThingsJerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
JerryScript: An ultra-lighteweight JavaScript Engine for the Internet of Things
 
Update on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPCUpdate on the Mont-Blanc Project for ARM-based HPC
Update on the Mont-Blanc Project for ARM-based HPC
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
 
AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...AIST Super Green Cloud: lessons learned from the operation and the performanc...
AIST Super Green Cloud: lessons learned from the operation and the performanc...
 
TSC BoF: OSS Toolchain Discussion - SFO17-409
TSC BoF: OSS Toolchain Discussion - SFO17-409TSC BoF: OSS Toolchain Discussion - SFO17-409
TSC BoF: OSS Toolchain Discussion - SFO17-409
 
SNAP MACHINE LEARNING
SNAP MACHINE LEARNINGSNAP MACHINE LEARNING
SNAP MACHINE LEARNING
 
Minko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.jsMinko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.js
 
Dryad Paper Review and System Analysis
Dryad Paper Review and System AnalysisDryad Paper Review and System Analysis
Dryad Paper Review and System Analysis
 
NGRX Apps in Depth
NGRX Apps in DepthNGRX Apps in Depth
NGRX Apps in Depth
 
Js on-microcontrollers
Js on-microcontrollersJs on-microcontrollers
Js on-microcontrollers
 

Recently uploaded

Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Hasting Chen
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝soniya singh
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...NETWAYS
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Salam Al-Karadaghi
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfhenrik385807
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Delhi Call girls
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )Pooja Nehwal
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMoumonDas2
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceDelhi Call girls
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...Sheetaleventcompany
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Chameera Dedduwage
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesPooja Nehwal
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024eCommerce Institute
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AITatiana Gurgel
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...henrik385807
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Pooja Nehwal
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024eCommerce Institute
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubssamaasim06
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrsaastr
 

Recently uploaded (20)

Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
Call Girls in Sarojini Nagar Market Delhi 💯 Call Us 🔝8264348440🔝
 
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
OSCamp Kubernetes 2024 | A Tester's Guide to CI_CD as an Automated Quality Co...
 
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Vaishnavi 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Vaishnavi 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
Exploring protein-protein interactions by Weak Affinity Chromatography (WAC) ...
 
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdfOpen Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
Open Source Strategy in Logistics 2015_Henrik Hankedvz-d-nl-log-conference.pdf
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
WhatsApp 📞 9892124323 ✅Call Girls In Juhu ( Mumbai )
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
CTAC 2024 Valencia - Sven Zoelle - Most Crucial Invest to Digitalisation_slid...
 
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
Navi Mumbai Call Girls Service Pooja 9892124323 Real Russian Girls Looking Mo...
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStrSaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
SaaStr Workshop Wednesday w: Jason Lemkin, SaaStr
 

Towards ruby-3x3-performance

  • 1. Towards Ruby3x3 Performance Introducing RTL and MJIT Vladimir Makarov Red Hat September 21, 2017 Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 1 / 30
  • 2. About Myself Red Hat, Toronto office, Canada Tools group (GCC, Glibc, LLVM, Rust, Go, OpenMP) part of a bigger platform enablement team (porting Linux kernel to new hardware) 20 years of work on GCC 2 years of work on MRI Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 2 / 30
  • 3. Ruby 3 performance goal Matz set a very ambitious goal: MRI 3 should be 3x faster than MRI 2 Koichi Sasada improved MRI performance by about 3x It is symbolic to expect MRI 3 should be 3x faster than MRI 2 Doable for CPU intensive programs Hardly possible for memory or IO bound programs I treat Matz’s performance goal as: MRI needs another cardinal performance improvement Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 3 / 30
  • 4. RTL insns IR for Ruby code analysis, optimizations, and JIT Importance of easy data dependence discovery Stack based insns are an inconvenient IR for such goals Stack insns vs RTL insns for Ruby code a = b + c: getlocal_OP__WC__0 <b index> getlocal_OP__WC__0 <c index> opt_plus setlocal_OP__WC__0 <a index> plus <a index>, <b index>, <c index> Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 4 / 30
  • 5. Using RTL insns for interpretation RTL for analysis and JIT code generation RTL or stack insns for interpretation? Feature Stack insns RTL insns Insn length shorter longer Insn number more less Code length less more Insn decoding less more Code data locality more less Insn dispatching more less Memory traffic more less Instructions: Pros & Cons for interpretation Decision: Use RTL for the interpreter too Allows sharing code between the interpreter and JIT Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 5 / 30
  • 6. How to generate RTL A simpler way is to generate RTL insns from the stack insns A faster approach is to generate directly from MRI parse tree nodes Decision: generate RTL directly from MRI nodes Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 6 / 30
  • 7. RTL insn operands What could be an operand: only temporaries temporaries and locals temporaries and locals even from higher levels (outside Ruby block) the above + instance variables the above + class variables, globals Decoding overhead of numerous type operands will not be compensated by processing smaller number of insns Complicated operands also complicate optimizations and JIT Currently we use only temporaries and locals. This gives best performance results according to my experiments Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 7 / 30
  • 8. RTL complications Practically any RTL insn might be an ISEQ call. A call always puts a result on the stack top. We need to move this result to a destination operand: If an RTL insn is actually a call, change the return PC so the next insn executed after the call will be an insn moving the result from the stack top to the insn destination To decrease memory overhead, the move insn is a part of the original insn For example, if the following insn plus <move opcode>, <call data>, dst, op1, op2 is a method call, the next executed insn will be <move opcode> <call data>, dst, op1, op2 Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 8 / 30
  • 9. RTL insn combining and specialization Immediate value specialization e.g. plus − > plusi - addition with immediate fixnum as an operand Frequent insn sequence combining e.g. eq + bt − > bteq - comparison and branch if the operands are equal Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 9 / 30
  • 10. Speculative insn generation Some initially generated insns can be transformed into speculative ones during their execution Speculation is based on operand types (e.g. plus can be transformed into an integer plus) and on the operand values (e.g. no multi-precision integers) Speculative insns can be transformed into unchanging regular insns if the speculation is wrong Speculation insns include code checking the speculation correctness plus iplus uplus fplus Speculation will be more important for JITted code performance It creates a lot of big extended basic blocks which a C compiler optimizes well Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 10 / 30
  • 11. RTL insn status and future work It mostly works (make check reports no regressions) Slightly better performance than stack based insns 27% GeoMean improvement on 23 small benchmarks (+110% to -7%) Code Change (Optcarrot): Stack insns → RTL insns Executed insns number -23% Executed insn length +19% Still some work to do for RTL improvement: Reducing code size Reducing overhead in operand decoding Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 11 / 30
  • 12. Possible JIT approaches 1. Writing own JIT from scratch LuaJIT, JavaScript V8, etc 2. Using widely used optimizing compilers GCC, LLVM 3. Using existing JITs JVM, OMR, RPython, Graal/Truffle, etc. Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 12 / 30
  • 13. Option 1: Writing own JIT from scratch Full control, small size, fast compilation Fast compilation is mostly a result of fewer optimizations than in industrial optimizing compilers Still a huge effort to implement decent optimizations Ongoing burden in maintenance and porting Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 13 / 30
  • 14. Option 2: Using widely used optimizing compilers Highly optimized code (GCC has > 300 optimization passes), easier implementation and porting and extremely well maintained (> 2K contributors since GCC 2.95) Portable (currently supports 49 targets) Reliable and well tested (> 16K reporters since GCC 2.95) No new dependencies But slower compilation Slower mostly because it does much more than a typical JIT Compilation can be made faster by disabling less valuable optimizations Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 14 / 30
  • 15. Option 3: Using existing JITs Duplication: already used for JRuby, Topaz (Rpython), Opal (JS), OMR Ruby, Graal/Truffle Ruby JVM is stable, reliable, optimizing, and ubiquitous But still worse code performance than GCC/JIT Azul Falcon (LLVM based JIT) up to 8x better performance than JVM C2 (source: http://stuff-gil-says.blogspot.ca/2017) License issues and patent minefield Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 15 / 30
  • 16. Own or existing JITs vs GCC/LLVM based JITs Webkit moved from LLVM JIT to own JIT (source: https://webkit.org/blog/5852/introducing-the-b3-jit-compiler) Implemented about 20 optimizations 4-5 speedup in compilation time Final results: Jetstream, Kraken, Octane (-9% to +8%) ISP RAS research: JS V8 ported to LLVM (source http://llvm.org/devmtg/2016-09/slides/Melnik-LLV8.pdf) GeoMean speedup 8-16% on Sunspider Resulting situation: is the glass half full or half empty? In my opinion, considering implementation and maintenance efforts, GCC/LLVM JIT is a winner, especially for long running server programs Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 16 / 30
  • 17. How to use GCC/LLVM for implementing JITs Using LibGCCJIT/MCJIT/ORC: New, unstable interfaces A lot of tedious calls to create the environment (see GNU Octave and PyPy port to libgccjit) Generating C code: No dependency on a particular compiler, easier debugging But some people call it a heavy, “junky” approach Wrong! if we implement it carefully LibGCCJIT vs GCC data flow (red parts are different): Environment creation through API calls C header parsing (emvironment) C function parsing Optimizations and Generation Optimizations and Generation Assembler/LD Assembler/LD Loading .so file Loading .so fileFunction creation through API calls GCC LibGCCJIT Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 17 / 30
  • 18. How to use GCC/LLVM for implementing JITs – cont’d Generating C code Environment takes from 21% to 41% of all compilation time Using a precompiled header (PCH) decreases this to less than 3.5% Function parsing takes less than 1% 0 100,000 200,000 300,000 400,000 500,000 600,000 700,000 800,000 Header Minimized_Header PCH Minimized_PCH GCCthousandinsns GCC −O2 processing a function with 44 RTL insns Environment Function Parsing Optimizations & Generation GCC with C executable size: 25.1 MB for cc1 vs. 22.6MB for libgccjit (only 10% difference) Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 18 / 30
  • 19. MJIT MJIT is MRI JIT MJIT is Method JIT MJIT is a JIT based on C code generation and PCH MJIT can use GCC or LLVM, in the future other C compilers Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 19 / 30
  • 20. MJIT architecture Environment header Minimized header MRI building phase New MRI MJIT environment building step Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 20 / 30
  • 21. MJIT architecture Environment header Minimized header MRI building phase MJIT MRI execution run MRI Precompiled header CC thread MJIT initialized in parallel with Ruby program execution Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 20 / 30
  • 22. MJIT architecture Environment header Minimized header MRI building phase MJIT MRI execution run MRI Precompiled header CC thread C code .so file CC loading threads MJIT works in parallel with Ruby program execution Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 20 / 30
  • 23. Example Ruby code: def loop i = 0; while i < 100_000; i += 1; end i end Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
  • 24. Example Ruby code: def loop i = 0; while i < 100_000; i += 1; end i end RTL code right after compilation: ... 0004 val2loc 3, 0 0007 goto 15 0009 plusi cont_op2, <calldata...>, 3, 3, 1 0015 btlti cont_btcmp, 9, <calldata...>, -1, 3, 100000 0022 loc_ret 3, 16 ... Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
  • 25. Example Ruby code: def loop i = 0; while i < 100_000; i += 1; end i end Speculative RTL code after some execution: ... 0004 val2loc 3, 0 0007 goto 15 0009 iplusi _, _, 3, 3, 1 0015 ibtlti _, 9, _, -1, 3, 100000 0022 loc_ret 3, 16 ... Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
  • 26. Example Ruby code: def loop i = 0; while i < 100_000; i += 1; end i end MJIT generated C code: ... l4: cfp->pc = (void *) 0x5576729ccd88; val2loc_f(cfp, &v0, 3, 0x1); l7: cfp->pc = (void *) 0x5576729ccd98; ruby_vm_check_ints(th); goto l15; l9: if (iplusi_f(cfp, &v0, 3, &v0, 3, &new_insn)) { vm_change_insn(cfp->iseq, (void *) 0x5576729ccda6, new_insn); goto stop_spec; } l15: flag = ibtlti_f(cfp, &t0, -1, &v0, 200001, &val, &new_insn); if (val == RUBY_Qundef) { vm_change_insn(cfp->iseq, (void *) 0x5576729ccdd6, new_insn); goto stop_spec; } if (flag) goto l9; l22: cfp->pc = (void *) 0x5576729cce26; loc_ret_f(th, cfp, &v0, 16, &val); return val; ... Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
  • 27. Example Ruby code: def loop i = 0; while i < 100_000; i += 1; end i end GCC optimized x86-64 code: ... movl $200001, %eax ... ret There is no loop JVM can not do this Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 21 / 30
  • 28. MJIT performance results Benchmarking MRI v2 (v2), MRI GCC MJIT (MJIT), MRI LLVM MJIT (MJIT-L), OMR Ruby rev. 57163 using JIT (OMR), JRuby9k 9.1.8 (JRuby9K), JRuby9k -Xdynamic (JRuby9k-D), Graal Ruby 0.22 (Graal) Mainstream CPU (i3-7100) under Fedora 25 with GCC-6.3 and Clang-3.9 Microbenchmarks and small benchmarks (dir MJIT-benchmarks) Each benchmark runs at least 20-30sec on MRI v2 Optcarrot (https://github.com/mame/optcarrot) Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 22 / 30
  • 29. MJIT performance results Microbenchmarks: Geomean Wall time improvement relative to MRI v2 v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal 0 1 2 3 4 5 6 7Speedup(GeoMean) 1.09 1.59 2.48 1.83 6.18 4.02 Wall time Speedup Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 23 / 30
  • 30. MJIT performance results Microbenchmarks: Geomean CPU time improvement relative to MRI v2 v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal 0 1 2 3 4 5 6Speedup(GeoMean) 1.09 1.33 1.88 0.69 5.55 3.67 CPU time Speedup Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 24 / 30
  • 31. MJIT performance results Microbenchmarks: Geomean Peak memory overhead relative to MRI v2 v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal 10-1 100 101 102 103 Peakmemory(GeoMean) 2.54 161.76 198.86 79.65 4.15 6.44 Peak memory overhead Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 25 / 30
  • 32. MJIT performance results Optcarrot: FPS speedup relative to MRI v2 v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 Speedup 1.20 1.14 2.38 2.83 2.94 FPS improvement Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 26 / 30
  • 33. MJIT performance results Optcarrot: CPU time improvement relative to MRI v2 v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal 0.0 0.5 1.0 1.5 2.0 Speedup 1.13 0.79 0.76 1.53 1.45 CPU time Speedup Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 27 / 30
  • 34. MJIT performance results Optcarrot: Peak memory overhead relative to MRI v2 v2 MJIT MJIT-L OMR JRuby9k JRuby9k-D Graal 10-1 100 101 102 103 Peakmemory 1.41 10.67 17.68 1.16 1.16 Peak memory overhead Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 28 / 30
  • 35. Recommendations to use GCC/LLVM for a JIT My recommendations in order of importance: Don’t use MCJIT, ORC, or LIBGCCJIT Use a pre-compiled header (JIT code environment) in a memory FS Compile code in parallel with program interpretation Use a good strategy to choose byte code for JITting Minimize the environment if you don’t use PCH Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 29 / 30
  • 36. MJIT status and future directions The project is at an early development stage: Unstable, passes ‘make test’, can not pass ‘make check’ yet Doesn’t work on Windows At least one more year to mature Need more optimizations: No inlining yet. The most important optimization! Different approaches to implement inlining: Node or RTL level Use C inlining (I’ll pursue this one) New GCC/LLVM extension (a new inline attribute) would be useful Will RTL and MJIT be a part of MRI? It does not depend on me I am going to work in this direction Will be happy if even some project ideas will be used in future MRI Vladimir Makarov (Red Hat) Towards Ruby3x3 Performance September 21, 2017 30 / 30