from Binary to Binary:
How Qemu Works
魏禛 (@_zhenwei_) <zhenwei.tw@gmail.com>
林致民 (Doraemon) <r06944005@csie.ntu.edu.tw>
August 11, 2018 / COSCUP 2018
Who are we?
● 林致民 (Doraemon)
○ Master student @ NTU Compiler Optimization and Virtualization Lab
○ Interested in compiler optimization and system performance
● 魏禛 (@_zhenwei_)
○ From Tainan, Taiwan
○ Master student @ NTU
○ Interested in computer architecture, virtual machines, and compilers
Outline
● Introduction to Qemu
● Guest binary to TCG-IR translation
● Block Chaining!
● TCG-IR to x86_64 translation
● Not covered
○ Full system emulation
○ Interrupt handling
○ Multi-thread implementation
○ Optimization ...
What is Qemu
● Created by Fabrice Bellard in 2003
● Features
○ Just-in-time (JIT) compilation to achieve high performance
○ Cross-platform (most UNIX-like systems and MS-Windows)
○ Supports many host and target architectures (full system emulation)
■ x86, aarch32, aarch64, mips, sparc, risc-v
○ User mode emulation: Qemu can run applications compiled for another CPU (same OS)
● More excellent slides!
○ Qemu JIT Code Generator and System Emulation
○ QEMU - Binary Translation
Environment
● Guest (target) machine: RISC-V
● Host machine: Intel x86_64
● Tools
○ Qemu 2.12.0 https://www.qemu.org
○ RISC-V GNU Toolchain https://github.com/riscv/riscv-gnu-toolchain.git
Translation Block (tb)
● A translation block (tb) is a sequence of guest instructions that ends when the translator:
○ Encounters a branch (an instruction that modifies the PC)
○ Encounters a system call
○ Reaches a page boundary
The picture is referenced from “QEMU - Binary Translation” by Jiann-Fuh Liaw
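The three end-of-block conditions above can be illustrated with a small sketch. It is not Qemu code: the `Insn` type, the instruction categories, the 4 KiB page size, and the fixed 4-byte instruction width are all assumptions made for the example, and real Qemu forms blocks on demand from the current PC rather than by scanning a linear stream.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed guest page size for this sketch


@dataclass
class Insn:
    pc: int
    kind: str  # "alu", "branch", or "syscall" (hypothetical categories)


def split_into_tbs(insns):
    """Group a linear instruction stream into translation blocks."""
    blocks, current = [], []
    for insn in insns:
        current.append(insn)
        next_pc = insn.pc + 4  # assumes fixed 4-byte instructions
        ends_block = (
            insn.kind in ("branch", "syscall")  # branch or system call
            or next_pc % PAGE_SIZE == 0         # next instruction starts a new page
        )
        if ends_block:
            blocks.append(current)
            current = []
    if current:  # trailing instructions that never hit an end condition
        blocks.append(current)
    return blocks
```

With instructions at 0x0FF4..0x1008 and a branch at 0x1004, the stream splits at the page boundary and again after the branch.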
Dynamic Binary Translation
● Translate guest ISA instructions to host ISA instructions at runtime
● After a translation block is executed, control returns to Qemu
Dynamic Binary Translation
● Block Chaining - avoid the “context switching” overhead
Tiny Code Generator (TCG)
[Figure: referenced from tcg/README]
Qemu Execution Flow
● Find the Translation Block
● Generate the Translation Block
● Chain the generated Translation Block to an existing block
● Execute the Translation Block
Qemu Execution Flow
● tb_find()
● gen_intermediate_code()
● tcg_gen_code()
● tb_add_jump()
● cpu_loop_exec_tb()
Data structures that will be used later ...
● TCGContext
● TranslationBlock
● DisasContext
● CPURISCVState
The main loop
cpu_exec() @ accel/tcg/cpu-exec.c
● tb_find()
○ Find the desired translation block by its pc value
● cpu_loop_exec_tb()
○ Execute the native code in the translation block
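A toy version of the find-then-execute loop may help fix the idea. This is a sketch under simplifying assumptions, not the code in cpu-exec.c: each "translated block" is just a Python callable that returns the next pc (or `None` to stop), and the names `make_guest` and `cpu_exec_sketch` are hypothetical.

```python
def make_guest():
    """Hypothetical guest: block 0x100 counts x down to zero, then jumps to 0x200."""
    def block_100(state):
        state["x"] -= 1
        return 0x100 if state["x"] > 0 else 0x200

    def block_200(state):
        return None  # no next pc: the guest program ends here

    return {0x100: block_100, 0x200: block_200}


def cpu_exec_sketch(guest, entry_pc, state):
    tb_cache = {}      # pc -> translated block
    translations = 0   # how many blocks had to be generated
    executed = []      # pcs of the blocks in execution order
    pc = entry_pc
    while pc is not None:
        tb = tb_cache.get(pc)      # plays the role of tb_find()
        if tb is None:
            tb = guest[pc]         # stand-in for generating the block
            tb_cache[pc] = tb
            translations += 1
        executed.append(pc)
        pc = tb(state)             # plays the role of cpu_loop_exec_tb()
    return executed, translations
```

Running it with `x = 3` executes block 0x100 three times but translates it only once, which is the point of caching translated blocks.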
Find the desired Translation Block or create one
tb_find() @ accel/tcg/cpu-exec.c
● tb_lookup__cpu_state()
○ Find the specific tb (Translation Block) by its pc value
● tb_gen_code()
○ If the desired tb has not been generated yet, create one
● tb_add_jump()
○ The block chaining patch point!
○ We will talk about it later ...
The Translation Block Finding Algorithm
tb_lookup__cpu_state() @ include/exec/tb-lookup.h
● tb_jmp_cache_hash_func()
○ A level-1 lookup cache is implemented (fast path)
● tb_htable_lookup()
○ A traditional hash table is used to find the specific tb by hash value (slow path)
○ Update the level-1 lookup cache if the tb is found
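The fast path / slow path split can be sketched as a small direct-mapped array in front of a full hash table. The class below is an illustration only: the cache size, the index function, and keying on pc alone are assumptions for the example, not Qemu's actual hash or key.

```python
JMP_CACHE_BITS = 4                    # assumed tiny size so collisions are easy to see
JMP_CACHE_SIZE = 1 << JMP_CACHE_BITS


class TbLookupSketch:
    def __init__(self):
        self.jmp_cache = [None] * JMP_CACHE_SIZE  # level 1: direct-mapped array
        self.htable = {}                          # level 2: full hash table
        self.fast_hits = 0
        self.slow_hits = 0

    @staticmethod
    def jmp_cache_index(pc):
        # Hypothetical hash: drop the low 2 bits, keep JMP_CACHE_BITS bits.
        return (pc >> 2) & (JMP_CACHE_SIZE - 1)

    def insert(self, pc, tb):
        self.htable[pc] = tb

    def lookup(self, pc):
        idx = self.jmp_cache_index(pc)
        entry = self.jmp_cache[idx]
        if entry is not None and entry[0] == pc:
            self.fast_hits += 1                   # fast path: level-1 hit
            return entry[1]
        tb = self.htable.get(pc)                  # slow path: hash table
        if tb is not None:
            self.slow_hits += 1
            self.jmp_cache[idx] = (pc, tb)        # refill level 1 for next time
        return tb                                 # None means "must translate"
```

The first lookup of a pc takes the slow path and fills the level-1 entry; repeated lookups of the same pc then stay on the fast path until another pc with the same index evicts it.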
The Translation Block Finding Algorithm
                      bzip2       mcf
# of tb executed      32877233    8125325
tb-cache miss ratio   0.01 %      2.10 %
● Some tbs are executed many times throughout the program
[Chart: time (sec); annotations: 40% performance loss, 20% performance loss]
Start to generate the Translation Block
tb_gen_code() @ accel/tcg/translate-all.c
● tb_alloc()
○ Allocates space from the Code Cache in TCGContext
○ If the Code Cache is full, just flush it!
● gen_intermediate_code()
○ Produces the TCG-IR, which is stored in the TCGContext
● tcg_gen_code()
○ Generates the host machine code from the TCG-IR and stores it in the tb
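The "flush when full" policy of the code cache can be shown with a bump allocator. This is a sketch with made-up names and sizes (`CodeCacheSketch`, a capacity in bytes); it only mirrors the policy described above, not the real tb_alloc().

```python
class CodeCacheSketch:
    def __init__(self, capacity):
        self.capacity = capacity   # total bytes available for translated code
        self.used = 0              # bump pointer: next free offset
        self.blocks = {}           # pc -> (offset, size) of each live block
        self.flush_count = 0

    def flush(self):
        """Drop every translated block and start again from offset 0."""
        self.blocks.clear()
        self.used = 0
        self.flush_count += 1

    def alloc(self, pc, size):
        if size > self.capacity:
            raise ValueError("block larger than the whole cache")
        if self.used + size > self.capacity:
            self.flush()           # cache full: throw everything away
        offset = self.used
        self.blocks[pc] = (offset, size)
        self.used += size
        return offset
```

Flushing everything is crude, but it avoids tracking which blocks are still chained to which, since all chains disappear together.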
Generate the TCG-IR first!
gen_intermediate_code() @ target/riscv/translate.c
● The while loop decodes each instruction in the guest binary until it encounters a branch or reaches the tb size limit
● decode_opc()
○ Decode the instruction in binary form
Decode the instruction in guest binary
decode_RV32_64G() @
target/riscv/translate.c
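Decoding starts by slicing the 32-bit instruction word into bit fields. The sketch below follows the standard RISC-V base encoding (opcode in bits 6..0, rd in 11..7, funct3 in 14..12, rs1 in 19..15, rs2 in 24..20, funct7 in 31..25, I-type immediate in 31..20); the function name `decode_fields` is hypothetical and is not the code in translate.c.

```python
def decode_fields(insn):
    """Split a 32-bit RISC-V instruction word into its common fields."""
    imm_i = (insn >> 20) & 0xFFF        # I-type immediate, bits 31..20
    if imm_i & 0x800:                   # sign-extend the 12-bit value
        imm_i -= 0x1000
    return {
        "opcode": insn & 0x7F,          # bits 6..0
        "rd": (insn >> 7) & 0x1F,       # bits 11..7
        "funct3": (insn >> 12) & 0x7,   # bits 14..12
        "rs1": (insn >> 15) & 0x1F,     # bits 19..15
        "rs2": (insn >> 20) & 0x1F,     # bits 24..20
        "funct7": (insn >> 25) & 0x7F,  # bits 31..25
        "imm_i": imm_i,
    }
```

For example, the word 0x00178793 (`addi a5, a5, 1`) decodes to opcode 0x13 with rd = rs1 = 15 (a5 is x15) and an immediate of 1.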
Generate the TCG-IR first! (E.g. arithmetic instr.)
gen_arith() @ target/riscv/translate.c
● The guest instruction needs to be implemented in terms of the TCG variable system
● TCG variables are declared and can be assigned values from the architecture state or from constants
● The TCG frontend ops operate on these TCG variables
Generate the TCG-IR first! (E.g. arithmetic instr.)
tcg_gen_? @ tcg/tcg-op.h & tcg/tcg-op.c
● tcg_emit_op()
○ This function allocates space from TCGContext for a new op and inserts it into the ops linked list
● After the space is allocated, the TCG opcode and arguments are filled into it
● The generated TCG instructions will be shown later ...
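The emit-then-fill pattern can be sketched with a plain list standing in for the ops linked list. Everything here is illustrative: `TcgContextSketch`, `emit_op`, `gen_add`, and the op names `mov` and `add` are assumptions for the example, not the real TCG API or opcode set.

```python
class TcgContextSketch:
    def __init__(self):
        self.ops = []        # stands in for the ops linked list
        self.next_temp = 0

    def new_temp(self):
        """Declare a fresh TCG-style temporary variable."""
        name = f"tmp{self.next_temp}"
        self.next_temp += 1
        return name

    def emit_op(self, opcode, *args):
        """Allocate a new op, append it to the list, and fill in its fields."""
        op = {"opc": opcode, "args": list(args)}
        self.ops.append(op)
        return op


def gen_add(ctx, rd, rs1, rs2):
    """Emit IR for 'add rd, rs1, rs2' using two temporaries."""
    src1 = ctx.new_temp()
    src2 = ctx.new_temp()
    ctx.emit_op("mov", src1, rs1)          # src1 = guest register rs1
    ctx.emit_op("mov", src2, rs2)          # src2 = guest register rs2
    ctx.emit_op("add", src1, src1, src2)   # src1 = src1 + src2
    ctx.emit_op("mov", rd, src1)           # write the result back to rd
```

Calling `gen_add(ctx, "a5", "a5", "a4")` leaves four ops in `ctx.ops`, in program order, ready for a backend to walk.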
More about tcg_emit_op()
Generate the TCG-IR first! (E.g. branch instr.)
gen_branch() @ target/riscv/translate.c
● The label also needs to be generated in TCG-IR form
● gen_goto_tb()
○ Jump into the specific translation block!
● Encountering a branch means the end of the tb
○ ctx->bstate is set to break the outer while loop in gen_intermediate_code()
How does Block Chaining work?
● The Patch Point
○ A slot waits for patching. If it has not been patched yet, it just jumps to the next instruction
○ The location of this slot is recorded in tb->jmp_target_arg when the host machine code is generated
● The tb being generated patches the last executed tb
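The effect of patching can be seen in a toy dispatcher that counts how often control returns to the main loop. This is a sketch under assumptions: each block body returns `(exit_slot, next_pc)`, every exit slot always leads to the same pc (direct branches only), and `TbSketch`, `chain`, and `run_chained` are hypothetical names.

```python
class TbSketch:
    def __init__(self, pc, body):
        self.pc = pc
        self.body = body                 # callable(state) -> (exit_slot, next_pc)
        self.jmp_target = [None, None]   # one patchable slot per exit


def chain(prev_tb, slot, next_tb):
    """Patch prev_tb's exit slot so it jumps straight to next_tb."""
    if prev_tb.jmp_target[slot] is None:
        prev_tb.jmp_target[slot] = next_tb


def run_chained(blocks, entry_pc, state):
    """Run the guest; return how many times the dispatcher had to step in."""
    cache, dispatches = {}, 0
    prev, prev_slot, pc = None, None, entry_pc
    while pc is not None:
        dispatches += 1
        tb = cache.get(pc)
        if tb is None:
            tb = cache[pc] = TbSketch(pc, blocks[pc])
        if prev is not None:
            chain(prev, prev_slot, tb)   # patch the last executed tb
        while True:                      # follow patched slots without dispatching
            slot, next_pc = tb.body(state)
            nxt = tb.jmp_target[slot] if next_pc is not None else None
            if nxt is None:
                break                    # unpatched slot: back to the dispatcher
            tb = nxt
        prev, prev_slot, pc = tb, slot, next_pc
    return dispatches


def loop_block(state):
    state["x"] -= 1
    return (0, 0x100) if state["x"] > 0 else (1, 0x200)


def exit_block(state):
    return (0, None)
```

With `x = 5` the loop block runs five times, yet the dispatcher is entered only three times, because the loop's back edge is patched after its first execution.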
Translate TCG-IR to x86_64 binary code
TCG-IR to x86_64 translation
● Before entering the backend …
How does QEMU generate binary code?
TCG-IR to x86_64 translation
● Let’s take a look at some examples: add
[Figure: the same add shown as RISC-V 32bit code, TCG-IR, and x86_64 code]
○ A pointer refers to the architecture states data structure, which holds the general purpose registers and floating point registers
○ Load data from architecture state a5 reg.
○ tmp0 = tmp0 + tmp1
○ Store tmp0 back to architecture state a5 reg.
TCG-IR to x86_64 translation
● Let’s take a look at some examples: load
[Figure: the same load shown as RISC-V 32bit code, TCG-IR, and x86_64 code]
○ Load data from address ‘tmp0’ to tmp1
○ Store tmp1 back to architecture state a4 reg.
TCG-IR to x86_64 translation
● Let’s take a look at some examples: store
[Figure: the same store shown as RISC-V 32bit code, TCG-IR, and x86_64 code]
○ Load s2 to tmp0
○ tmp0 = tmp0 + 1156 (0x484)
○ Load s3 to tmp1
○ Store tmp1 to address tmp0
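The four store steps can be replayed in a few lines. This sketch assumes a 32-bit little-endian guest and treats the guest address as a direct index into a `bytearray`; the function name `emulate_store_word` and the `env` dictionary are hypothetical stand-ins for the architecture state.

```python
import struct


def emulate_store_word(env, memory, base_reg, offset, src_reg):
    """Replay the store example: memory[base + offset] = src (32-bit)."""
    tmp0 = env[base_reg]                   # Load s2 to tmp0
    tmp0 = (tmp0 + offset) & 0xFFFFFFFF    # tmp0 = tmp0 + 1156 (wraps at 32 bits)
    tmp1 = env[src_reg]                    # Load s3 to tmp1
    # Store tmp1 to address tmp0 as a little-endian 32-bit word.
    struct.pack_into("<I", memory, tmp0, tmp1 & 0xFFFFFFFF)
    return tmp0
```

With `s2 = 0x10` the effective address is 0x10 + 0x484 = 0x494, and the word from `s3` lands there least significant byte first.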
How does QEMU handle branch instructions?
TCG-IR to x86_64 translation
● Let’s take a look at some examples: Branch
○ Direct Branch
■ Conditional Branch
beqz rs, offset / bgt rs, rt, offset / …..
■ Unconditional Branch
j offset / jal offset / call offset / ...
○ Indirect Branch
■ Switch/Case → Branch table
■ Indirect function call
■ Return Instructions (ret)
TCG-IR to x86_64 translation
● Let’s take a look at some examples: Direct Branch (Unconditional)
[Figure: the same branch shown as RISC-V 32bit code, TCG-IR, and x86_64 code]
○ Reminder: the patch point for block chaining
○ Synchronize the program counter to the architecture states
○ Prepare the return value
○ Go back to QEMU to find the next Translation Block
● Block Chaining: link the current TB to the previous TB by patching the jump target address
TCG-IR to x86_64 translation
● Let’s take a look at some examples: Direct Branch (Conditional)
○ If the block is chained by QEMU, jump to the target translation block
○ Go back to QEMU to find the next Translation Block
TCG-IR to x86_64 translation
● Let’s take a look at some examples: Indirect Branch - return instruction
[Figure: the same return shown as RISC-V 32bit code, TCG-IR, and x86_64 code]
○ Store the value from the return address register to the program counter
○ Go back to QEMU to find the next Translation Block for the address in the program counter
TCG-IR to x86_64 translation
● helper function call
○ QEMU provides a ‘hook’ that lets developers write emulation behavior in C. Commonly used for:
■ Emulating hardware not supported by the host machine
● e.g. Hardware FP, SIMD, AES, etc.
■ Dynamic instrumentation
● Collect runtime information from the source program to analyze the program’s behavior (e.g. dynamic call graph / control flow graph)
TCG-IR to x86_64 translation
● helper function call - example
○ call helper_function
○ return result (%rax) to ‘fa5’
TCG-IR to x86_64 translation
● helper - Emulate IEEE 754 floating point add with double precision
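A helper of this kind takes raw register bit patterns and returns a raw bit pattern. The sketch below shows that shape for a double precision add; note that it leans on the host's floating point through Python's `float` instead of a software floating point library, and the name `helper_fadd_d_sketch` is hypothetical.

```python
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def double_to_bits(value):
    """Reinterpret an IEEE 754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def helper_fadd_d_sketch(a_bits, b_bits):
    """Add two doubles given and returned as 64-bit register contents."""
    return double_to_bits(bits_to_double(a_bits) + bits_to_double(b_bits))
```

For instance, 1.5 is 0x3FF8000000000000 and 2.25 is 0x4002000000000000; their sum 3.75 comes back as 0x400E000000000000, the value that would be written to the destination floating point register.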
TCG-IR to x86_64 translation
● Prologue, epilogue - The entry point for each Translation Block
○ Decide whether the current TB can be executed
How to execute translated binary code?
x86_64 code execution
● Entry point - qemu-riscv/accel/tcg/cpu-exec.c
○ tcg_qemu_tb_exec: a pointer to the head of the translation block
○ Put env in %rdi and tb_ptr in %rsi
○ Move %rdi to %r14, and jump to %rsi
○ Exit the code cache and go back to QEMU
○ Return the value to ret, and record the last TB we just executed
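One way a single return value can carry both "which TB ran last" and "which exit it took" is to pack the exit index into the low bits of an aligned TB address. The sketch below assumes a two-bit mask named `TB_EXIT_MASK` with value 3, in the style of QEMU; the functions `encode_exit` and `decode_exit` are hypothetical illustrations, not QEMU functions.

```python
TB_EXIT_MASK = 3  # assumption: the low two bits of the return value hold the exit index


def encode_exit(tb_addr, exit_idx):
    """Pack a 4-byte-aligned TB address and an exit index into one integer."""
    if tb_addr & TB_EXIT_MASK:
        raise ValueError("TB address must be 4-byte aligned")
    if not 0 <= exit_idx <= TB_EXIT_MASK:
        raise ValueError("exit index out of range")
    return tb_addr | exit_idx


def decode_exit(ret):
    """Split the return value back into (last TB address, exit index)."""
    return ret & ~TB_EXIT_MASK, ret & TB_EXIT_MASK
```

The decoded pair is what block chaining needs: the last executed TB and the slot in it that should be patched.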
Reference
● Qemu JIT Code Generator and System Emulation
● QEMU - Binary Translation
● QEMU TCG Frontend Ops
● RISC-V Instruction Set Manual
● Doraemon’s Notes: QEMU Backend
○ Written in Mandarin Chinese
68

Mais conteúdo relacionado

Mais procurados

Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)shimosawa
 
Static partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VStatic partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VRISC-V International
 
twlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsotwlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsoViller Hsiao
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux KernelAdrian Huang
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux SystemJian-Hong Pan
 
Linux on ARM 64-bit Architecture
Linux on ARM 64-bit ArchitectureLinux on ARM 64-bit Architecture
Linux on ARM 64-bit ArchitectureRyo Jin
 
Arm device tree and linux device drivers
Arm device tree and linux device driversArm device tree and linux device drivers
Arm device tree and linux device driversHoucheng Lin
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計鍾誠 陳鍾誠
 
Linux Kernel MMC Storage driver Overview
Linux Kernel MMC Storage driver OverviewLinux Kernel MMC Storage driver Overview
Linux Kernel MMC Storage driver OverviewRajKumar Rampelli
 
LCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted FirmwareLCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted FirmwareLinaro
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Brendan Gregg
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototypingYan Vugenfirer
 
Virtualization Support in ARMv8+
Virtualization Support in ARMv8+Virtualization Support in ARMv8+
Virtualization Support in ARMv8+Aananth C N
 

Mais procurados (20)

Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 
The Internals of "Hello World" Program
The Internals of "Hello World" ProgramThe Internals of "Hello World" Program
The Internals of "Hello World" Program
 
Static partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-VStatic partitioning virtualization on RISC-V
Static partitioning virtualization on RISC-V
 
Interpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratchInterpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratch
 
twlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdsotwlkh-linux-vsyscall-and-vdso
twlkh-linux-vsyscall-and-vdso
 
Slab Allocator in Linux Kernel
Slab Allocator in Linux KernelSlab Allocator in Linux Kernel
Slab Allocator in Linux Kernel
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
Linux on ARM 64-bit Architecture
Linux on ARM 64-bit ArchitectureLinux on ARM 64-bit Architecture
Linux on ARM 64-bit Architecture
 
Character drivers
Character driversCharacter drivers
Character drivers
 
Memory model
Memory modelMemory model
Memory model
 
Linux Internals - Part I
Linux Internals - Part ILinux Internals - Part I
Linux Internals - Part I
 
Arm device tree and linux device drivers
Arm device tree and linux device driversArm device tree and linux device drivers
Arm device tree and linux device drivers
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計
 
Linux Kernel MMC Storage driver Overview
Linux Kernel MMC Storage driver OverviewLinux Kernel MMC Storage driver Overview
Linux Kernel MMC Storage driver Overview
 
LLVM
LLVMLLVM
LLVM
 
I2c drivers
I2c driversI2c drivers
I2c drivers
 
LCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted FirmwareLCU13: An Introduction to ARM Trusted Firmware
LCU13: An Introduction to ARM Trusted Firmware
 
Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)Performance Wins with eBPF: Getting Started (2021)
Performance Wins with eBPF: Getting Started (2021)
 
Qemu device prototyping
Qemu device prototypingQemu device prototyping
Qemu device prototyping
 
Virtualization Support in ARMv8+
Virtualization Support in ARMv8+Virtualization Support in ARMv8+
Virtualization Support in ARMv8+
 

Semelhante a from Binary to Binary: How Qemu Works

lecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdflecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdfAyushKumar93531
 
A taste of GlobalISel
A taste of GlobalISelA taste of GlobalISel
A taste of GlobalISelIgalia
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfYodalee
 
FPGA design with CλaSH
FPGA design with CλaSHFPGA design with CλaSH
FPGA design with CλaSHConrad Parker
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5PRADEEP
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerPlatonov Sergey
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Yusuke Izawa
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDBPingCAP
 
24-02-18 Rejender pratap.pdf
24-02-18 Rejender pratap.pdf24-02-18 Rejender pratap.pdf
24-02-18 Rejender pratap.pdfFrangoCamila
 
GPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMPGPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMPMiller Lee
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103Linaro
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWHsien-Hsin Sean Lee, Ph.D.
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyPriyanka Aash
 
Parallel program design
Parallel program designParallel program design
Parallel program designZongYing Lyu
 
BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64Linaro
 
Cpp17 and Beyond
Cpp17 and BeyondCpp17 and Beyond
Cpp17 and BeyondComicSansMS
 

Semelhante a from Binary to Binary: How Qemu Works (20)

lecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdflecture16-recap-questions-and-answers.pdf
lecture16-recap-questions-and-answers.pdf
 
A taste of GlobalISel
A taste of GlobalISelA taste of GlobalISel
A taste of GlobalISel
 
COSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdfCOSCUP2023 RSA256 Verilator.pdf
COSCUP2023 RSA256 Verilator.pdf
 
FPGA design with CλaSH
FPGA design with CλaSHFPGA design with CλaSH
FPGA design with CλaSH
 
Auto Tuning
Auto TuningAuto Tuning
Auto Tuning
 
EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5EMBEDDED SYSTEMS 4&5
EMBEDDED SYSTEMS 4&5
 
Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory Sanitizer
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
 
How to build TiDB
How to build TiDBHow to build TiDB
How to build TiDB
 
24-02-18 Rejender pratap.pdf
24-02-18 Rejender pratap.pdf24-02-18 Rejender pratap.pdf
24-02-18 Rejender pratap.pdf
 
GPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMPGPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMP
 
CS4961-L9.ppt
CS4961-L9.pptCS4961-L9.ppt
CS4961-L9.ppt
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
 
Async fun
Async funAsync fun
Async fun
 
Finding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated DisassemblyFinding Xori: Malware Analysis Triage with Automated Disassembly
Finding Xori: Malware Analysis Triage with Automated Disassembly
 
Parallel program design
Parallel program designParallel program design
Parallel program design
 
BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64BKK16-304 The State of GDB on AArch64
BKK16-304 The State of GDB on AArch64
 
Quiz 9
Quiz 9Quiz 9
Quiz 9
 
Cpp17 and Beyond
Cpp17 and BeyondCpp17 and Beyond
Cpp17 and Beyond
 

Último

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Último (20)

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Java Programming :Event Handling(Types of Events)

from Binary to Binary: How Qemu Works

  • 1. from Binary to Binary: How Qemu Works 魏禛 (@_zhenwei_) <zhenwei.tw@gmail.com> 林致民 (Doraemon) <r06944005@csie.ntu.edu.tw> August 11, 2018 / COSCUP 2018
  • 2. Who are we? ● 林致民 (Doraemon) ○ Master's student @ NTU Compiler Optimization and Virtualization Lab ○ Interested in compiler optimization and system performance ● 魏禛 (@_zhenwei_) ○ From Tainan, Taiwan ○ Master's student @ NTU ○ Interested in computer architecture, virtual machines, and compilers
  • 3. Outline ● Introduction to Qemu ● Guest binary to TCG-IR translation ● Block chaining ● TCG-IR to x86_64 translation ● Not covered ○ Full system emulation ○ Interrupt handling ○ Multi-thread implementation ○ Optimization
  • 4. What is Qemu ● Created by Fabrice Bellard in 2003 ● Features ○ Just-in-time (JIT) compilation to achieve high performance ○ Cross-platform (most UNIX-like systems and MS-Windows) ○ Supports many hosts and targets (full system emulation) ■ x86, aarch32, aarch64, mips, sparc, risc-v ○ User mode emulation: Qemu can run applications compiled for another CPU (same OS) ● More excellent slides ○ Qemu JIT Code Generator and System Emulation ○ QEMU - Binary Translation
  • 5. Environment ● Guest (target) machine: RISC-V ● Host machine: Intel x86_64 ● Tools ○ Qemu 2.12.0 https://www.qemu.org ○ RISC-V GNU Toolchain https://github.com/riscv/riscv-gnu-toolchain.git
  • 6. Translation Block (tb) ● A translation block (tb) ends when the translator ○ Encounters a branch (which modifies the PC) ○ Encounters a system call ○ Reaches a page boundary (Figure referenced from “QEMU - Binary Translation” by Jiann-Fuh Liaw)
  • 7. Dynamic Binary Translation ● Translates guest ISA instructions to host ISA instructions at runtime ● After a translation block is executed, control returns to Qemu
  • 8. Dynamic Binary Translation ● Block chaining: avoids the “context switching” overhead of returning to Qemu after every block
  • 9. Tiny Code Generator (TCG) (Figure referenced from tcg/README)
  • 10. Qemu Execution Flow ● Find the translation block ● Generate the translation block if it does not exist yet ● Chain the generated translation block to an existing block ● Execute the translation block
  • 12. Data structures used later ● TCGContext ● TranslationBlock ● DisasContext ● CPURISCVState
  • 13. The main loop: cpu_exec() @ accel/tcg/cpu-exec.c ● tb_find() ○ Finds the desired translation block by pc value ● cpu_loop_exec_tb() ○ Executes the native code in the translation block
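The find-then-execute loop can be sketched with a toy model. This is only an illustration under our own assumptions: `ToyCPU`, `ToyTB`, `toy_tb_find()`, `toy_tb_gen()` and `toy_cpu_exec()` are hypothetical names, and C functions stand in for generated host code. It is not how `cpu_exec()` is actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_TB 16
#define TOY_HALT   0xffffffffu   /* hypothetical "guest exited" pc */

/* Hypothetical stand-ins, not QEMU's real structures. */
typedef struct ToyCPU { uint32_t pc; uint32_t acc; } ToyCPU;
typedef struct ToyTB {
    uint32_t pc;                 /* guest pc this block was translated for */
    void (*code)(ToyCPU *cpu);   /* stands in for the generated host code */
} ToyTB;

ToyTB toy_tbs[TOY_MAX_TB];
size_t toy_tb_count;
unsigned toy_translations;       /* how many blocks had to be generated */

/* Toy guest program: two blocks that loop until acc >= 30. */
void toy_block_0(ToyCPU *cpu) { cpu->acc += 1;  cpu->pc = 4; }
void toy_block_4(ToyCPU *cpu) { cpu->acc += 10; cpu->pc = cpu->acc >= 30 ? TOY_HALT : 0; }

/* "Translate" the block at pc (here: pick a prewritten function). */
ToyTB *toy_tb_gen(uint32_t pc) {
    assert(toy_tb_count < TOY_MAX_TB);
    ToyTB *tb = &toy_tbs[toy_tb_count++];
    tb->pc = pc;
    tb->code = (pc == 0) ? toy_block_0 : toy_block_4;
    toy_translations++;
    return tb;
}

/* Look the block up by pc; generate it on a miss. */
ToyTB *toy_tb_find(uint32_t pc) {
    for (size_t i = 0; i < toy_tb_count; i++)
        if (toy_tbs[i].pc == pc) return &toy_tbs[i];
    return toy_tb_gen(pc);
}

/* Main loop: find a block, run it, repeat. Returns blocks executed. */
unsigned toy_cpu_exec(ToyCPU *cpu) {
    unsigned executed = 0;
    while (cpu->pc != TOY_HALT) {
        ToyTB *tb = toy_tb_find(cpu->pc);
        tb->code(cpu);
        executed++;
    }
    return executed;
}
```

In this toy run each block is generated once and then reused on every later iteration, which is the point of keeping translated blocks around.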
  • 14. Find the desired translation block or create one: tb_find() @ accel/tcg/cpu-exec.c ● tb_lookup__cpu_state() ○ Finds the specific tb (translation block) by pc value ● tb_gen_code() ○ If the desired tb has not been generated yet, create it ● tb_add_jump() ○ The block chaining patch point (covered later)
  • 15. The translation block finding algorithm: tb_lookup__cpu_state() @ include/exec/tb-lookup.h ● tb_jmp_cache_hash_func() ○ A level-1 lookup cache is implemented (fast path) ● tb_htable_lookup() ○ A traditional hash table is used to find the specific tb by hash value (slow path) ○ The level-1 lookup cache is updated if the tb is found
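A minimal sketch of the two-level lookup idea follows: a small direct-mapped cache in front of a chained hash table. The names (`LookupState`, `lookup_find()`, ...), the table sizes and the hash functions are all our own assumptions for illustration; the real lookup also compares more CPU state than just the pc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L1_SIZE 16u   /* hypothetical size of the level-1 cache */
#define HT_SIZE 64u   /* hypothetical number of hash buckets */

typedef struct LookupTB {
    uint32_t pc;
    struct LookupTB *hash_next;  /* next block in the same bucket */
} LookupTB;

typedef struct LookupState {
    LookupTB *l1[L1_SIZE];       /* direct-mapped cache: fast path */
    LookupTB *buckets[HT_SIZE];  /* chained hash table: slow path */
    unsigned l1_hits, l1_misses;
} LookupState;

unsigned lookup_l1_index(uint32_t pc) { return (pc >> 2) & (L1_SIZE - 1); }
/* Multiplicative hash; the top 6 bits select one of 64 buckets. */
unsigned lookup_ht_index(uint32_t pc) { return (pc * 2654435761u) >> 26; }

void lookup_insert(LookupState *s, LookupTB *tb) {
    assert(tb != NULL);
    unsigned h = lookup_ht_index(tb->pc);
    tb->hash_next = s->buckets[h];
    s->buckets[h] = tb;
}

LookupTB *lookup_find(LookupState *s, uint32_t pc) {
    unsigned i = lookup_l1_index(pc);
    LookupTB *tb = s->l1[i];
    if (tb && tb->pc == pc) {    /* fast path: level-1 hit */
        s->l1_hits++;
        return tb;
    }
    s->l1_misses++;
    for (tb = s->buckets[lookup_ht_index(pc)]; tb; tb = tb->hash_next) {
        if (tb->pc == pc) {
            s->l1[i] = tb;       /* refill the level-1 cache */
            return tb;
        }
    }
    return NULL;                 /* caller would generate a new block */
}
```

Two blocks whose pc values map to the same level-1 index evict each other, so alternating between them falls back to the slow path each time. A workload with many such conflicts would see a higher miss ratio.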
  • 16. The translation block finding algorithm ● bzip2: 32877233 tbs executed, tb-cache miss ratio 0.01 % ● mcf: 8125325 tbs executed, tb-cache miss ratio 2.10 % ● Some tbs are executed many times throughout the program ● (Chart, time in seconds, annotated “40% performance loss” and “20% performance loss”)
  • 17. Start to generate the translation block: tb_gen_code() @ accel/tcg/translate-all.c ● tb_alloc() ○ Allocates space from the code cache in TCGContext ○ If the code cache is full, it is flushed ● gen_intermediate_code() ○ Produces the TCG-IR, which is stored in the TCGContext ● tcg_gen_code() ○ Generates the host machine code from the TCG-IR and stores it in the tb
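The "allocate from the code cache, flush when full" policy can be sketched as a bump allocator. `ToyCodeCache`, `cache_alloc()` and the 256-byte size are hypothetical. A real flush must also discard every existing tb and the lookup structures that point into the cache, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIZE 256u   /* hypothetical, tiny on purpose */

typedef struct ToyCodeCache {
    uint8_t  buf[CACHE_SIZE];
    size_t   used;          /* bytes handed out so far */
    unsigned flush_count;
} ToyCodeCache;

/* Throw away everything that was allocated. */
void cache_flush(ToyCodeCache *c) {
    c->used = 0;
    c->flush_count++;
}

/* Bump allocation; returns NULL when the request does not fit. */
uint8_t *cache_try_alloc(ToyCodeCache *c, size_t n) {
    if (n > CACHE_SIZE - c->used) return NULL;
    uint8_t *p = c->buf + c->used;
    c->used += n;
    return p;
}

/* Allocate n bytes, flushing the whole cache once if it is full. */
uint8_t *cache_alloc(ToyCodeCache *c, size_t n) {
    assert(n <= CACHE_SIZE);
    uint8_t *p = cache_try_alloc(c, n);
    if (!p) {
        cache_flush(c);
        p = cache_try_alloc(c, n);
    }
    return p;
}
```

Flushing everything is a blunt policy, but it keeps allocation trivial: there is no free list and no fragmentation to manage.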
  • 18. Generate the TCG-IR first: gen_intermediate_code() @ target/riscv/translate.c ● The while loop decodes each instruction in the guest binary until it encounters a branch or reaches the tb size limit ● decode_opc() ○ Decodes the instruction in binary form
  • 19. Decode the instruction in the guest binary: decode_RV32_64G() @ target/riscv/translate.c
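As a rough sketch of the decode loop, the code below walks 32-bit RISC-V instructions and stops at the first one that ends a block. The opcode values are the base opcodes from the RISC-V ISA manual (bits [6:0]); everything else (`block_length()`, `MAX_INSNS`, treating every SYSTEM opcode as a block end, ignoring compressed instructions) is a simplifying assumption of ours, not the logic of `gen_intermediate_code()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096u
#define MAX_INSNS       512u   /* hypothetical tb size limit */

/* RISC-V base opcodes, bits [6:0] of a 32-bit instruction. */
enum { OPC_BRANCH = 0x63, OPC_JALR = 0x67, OPC_JAL = 0x6f, OPC_SYSTEM = 0x73 };

/* Branches, jumps and system instructions end a block in this sketch. */
bool insn_ends_block(uint32_t insn) {
    switch (insn & 0x7f) {
    case OPC_BRANCH:
    case OPC_JALR:
    case OPC_JAL:
    case OPC_SYSTEM:
        return true;
    default:
        return false;
    }
}

/* Count the instructions in the block that starts at guest address pc.
 * code[0] is the instruction located at pc. */
size_t block_length(const uint32_t *code, size_t n_insns, uint32_t pc) {
    assert(code != NULL);
    size_t count = 0;
    while (count < n_insns && count < MAX_INSNS) {
        uint32_t insn = code[count++];
        pc += 4;                                       /* next guest pc */
        if (insn_ends_block(insn)) break;              /* branch or system */
        if ((pc & (GUEST_PAGE_SIZE - 1)) == 0) break;  /* page boundary */
    }
    return count;
}
```

For example, `0x00178793` encodes `addi a5,a5,1` and does not end a block, while `0x00008067` (`ret`, i.e. `jalr x0, 0(ra)`) does.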
  • 20. Generate the TCG-IR first (e.g. arithmetic instructions): gen_arith() @ target/riscv/translate.c ● Guest instructions must be expressed in terms of TCG variables ● TCG variables are declared and can be assigned values from the architecture state or constants ● The TCG frontend ops operate on these TCG variables
  • 21. Generate the TCG-IR first (e.g. arithmetic instructions): tcg_gen_? @ tcg/tcg-op.h & tcg/tcg-op.c ● tcg_emit_op() ○ Allocates space for an op from TCGContext and inserts it into the ops linked list ● After the space is allocated, the TCG opcode and arguments are filled into it ● The generated TCG instructions are shown later
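The "allocate an op, append it to the list, then fill in opcode and arguments" pattern can be sketched as below. The types and names (`IrContext`, `ir_emit_op()`, `ir_gen_op3()`, the four opcodes) are hypothetical and far simpler than TCG's own op representation.

```c
#include <assert.h>
#include <stddef.h>

#define OP_POOL_SIZE 64
#define OP_MAX_ARGS  3

/* Hypothetical miniature IR, not TCG's opcode set. */
typedef enum { IR_MOV, IR_ADD, IR_LD, IR_ST } IrOpcode;

typedef struct IrOp {
    IrOpcode opc;
    int args[OP_MAX_ARGS];
    struct IrOp *next;
} IrOp;

typedef struct IrContext {
    IrOp pool[OP_POOL_SIZE];   /* ops are allocated from the context */
    size_t nb_ops;
    IrOp *first, *last;        /* singly linked list in program order */
} IrContext;

/* Allocate one op from the context and append it to the op list. */
IrOp *ir_emit_op(IrContext *s, IrOpcode opc) {
    assert(s->nb_ops < OP_POOL_SIZE);
    IrOp *op = &s->pool[s->nb_ops++];
    op->opc = opc;
    op->next = NULL;
    if (s->last) s->last->next = op;
    else         s->first = op;
    s->last = op;
    return op;
}

/* Frontend-style wrapper: emit the op, then fill in its arguments. */
void ir_gen_op3(IrContext *s, IrOpcode opc, int a0, int a1, int a2) {
    IrOp *op = ir_emit_op(s, opc);
    op->args[0] = a0;
    op->args[1] = a1;
    op->args[2] = a2;
}
```

Keeping the ops in a list rather than emitting host code directly is what lets a later pass walk, rewrite or drop ops before the backend runs.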
  • 23. Generate the TCG-IR first (e.g. branch instructions): gen_branch() @ target/riscv/translate.c ● Labels also need to be generated in TCG-IR form ● gen_goto_tb() ○ Jumps to the specific translation block ● Encountering a branch means the end of the tb ○ ctx->bstate is set to break the outer while loop in gen_intermediate_code()
  • 24. How does block chaining work? ● The patch point: a slot waits for patching. If it has not been patched yet, it just jumps to the next instruction ● The location of this slot is recorded in tb->jump_target_arg when the host machine code is generated ● After a new tb is generated, the slot in the last executed tb is patched to jump to it
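Block chaining can be modelled with a pointer slot per block: while the slot is empty, control goes back to the main loop; once patched, execution follows the slot directly. This is a toy model under our own assumptions (`ChainTB`, `chain_patch()`, `chain_run()`, a single exit per block). In the real backend the slot is a jump instruction inside the generated host code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block with one direct-branch exit. */
typedef struct ChainTB {
    uint32_t pc;               /* guest pc of this block */
    uint32_t next_pc;          /* guest pc the block branches to */
    struct ChainTB *jmp_slot;  /* NULL: not patched, return to main loop */
} ChainTB;

/* Patch the last executed block so that it jumps straight to next. */
void chain_patch(ChainTB *last, ChainTB *next) {
    assert(next != NULL);
    if (last && !last->jmp_slot) last->jmp_slot = next;
}

/* Execute up to max_blocks blocks starting at pc.
 * Returns how many times the main loop had to look a block up. */
unsigned chain_run(ChainTB *tbs, size_t n, uint32_t pc, unsigned max_blocks) {
    unsigned lookups = 0, executed = 0;
    ChainTB *last = NULL;
    while (executed < max_blocks) {
        ChainTB *tb = NULL;
        lookups++;                           /* back in the main loop */
        for (size_t i = 0; i < n; i++)
            if (tbs[i].pc == pc) tb = &tbs[i];
        if (!tb) break;
        chain_patch(last, tb);
        for (;;) {                           /* "inside the code cache" */
            executed++;
            pc = tb->next_pc;
            last = tb;
            if (!tb->jmp_slot || executed >= max_blocks) break;
            tb = tb->jmp_slot;               /* chained: no lookup needed */
        }
    }
    return lookups;
}
```

With two blocks that branch to each other, ten block executions need only three lookups in this model: one for each block, plus one to patch the second block back to the first.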
  • 25. Translate TCG-IR to x86_64 binary code
  • 26. TCG-IR to x86_64 translation ● Before entering the backend …
  • 27. TCG-IR to x86_64 translation: How does QEMU generate binary code? ● Let’s take a look at some examples: store (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load s2 to tmp0 ○ tmp0 = tmp0 + 1156 (0x484) ○ Load s3 to tmp1 ○ Store tmp1 back to address tmp0
  • 28. TCG-IR to x86_64 translation ● Let’s take a look at some examples: add (columns: RISC-V 32bit, TCG-IR, x86_64)
  • 29. TCG-IR to x86_64 translation ● Let’s take a look at some examples: add (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Architecture state data structure ○ Pointer
  • 30. TCG-IR to x86_64 translation ● Let’s take a look at some examples: add (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Architecture state data structure: general purpose registers, floating point registers
  • 31. TCG-IR to x86_64 translation ● Let’s take a look at some examples: add (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load data from architecture state a5 reg
  • 32. TCG-IR to x86_64 translation ● Let’s take a look at some examples: add (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load data from architecture state a5 reg ○ tmp0 = tmp0 + tmp1
  • 33. TCG-IR to x86_64 translation ● Let’s take a look at some examples: add (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load data from architecture state a5 reg ○ tmp0 = tmp0 + tmp1 ○ Store tmp0 back to architecture state a5 reg
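The three steps of the add example (load from the architecture state, add, store back) can be written out in C as loads and stores at fixed offsets from an `env` pointer. `ToyRISCVState` and the helper names are hypothetical, and `add a5, a5, a4` is an assumed instruction chosen for illustration; only the register numbers (a4 = x14, a5 = x15) come from the RISC-V calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical architecture state, much smaller than CPURISCVState. */
typedef struct ToyRISCVState {
    uint32_t gpr[32];   /* general purpose registers */
    uint64_t fpr[32];   /* floating point registers */
    uint32_t pc;
} ToyRISCVState;

enum { REG_A4 = 14, REG_A5 = 15 };   /* a4 = x14, a5 = x15 */

/* Byte offset of a general purpose register inside the state. */
size_t gpr_offset(unsigned reg) {
    assert(reg < 32);
    return offsetof(ToyRISCVState, gpr) + reg * sizeof(uint32_t);
}

/* Generated code only sees "env + offset", never the C field names. */
uint32_t env_ld32(const void *env, size_t off) {
    uint32_t v;
    memcpy(&v, (const uint8_t *)env + off, sizeof v);
    return v;
}

void env_st32(void *env, size_t off, uint32_t v) {
    memcpy((uint8_t *)env + off, &v, sizeof v);
}

/* add rd, rs1, rs2 expressed as the three IR-level steps. */
void toy_exec_add(void *env, unsigned rd, unsigned rs1, unsigned rs2) {
    uint32_t tmp0 = env_ld32(env, gpr_offset(rs1));   /* load rs1 */
    uint32_t tmp1 = env_ld32(env, gpr_offset(rs2));   /* load rs2 */
    tmp0 = tmp0 + tmp1;                               /* tmp0 = tmp0 + tmp1 */
    env_st32(env, gpr_offset(rd), tmp0);              /* store back to rd */
}
```

On the host side, `env` lives in a fixed register, so each load or store becomes a single `mov` with a constant displacement.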
  • 34. TCG-IR to x86_64 translation ● Let’s take a look at some examples: load (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load data from address ‘tmp0’ to tmp1
  • 35. TCG-IR to x86_64 translation ● Let’s take a look at some examples: load (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load data from address ‘tmp0’ to tmp1 ○ Store tmp1 back to architecture state a4 reg
  • 36. TCG-IR to x86_64 translation ● Let’s take a look at some examples: store (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load s2 to tmp0
  • 37. TCG-IR to x86_64 translation ● Let’s take a look at some examples: store (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load s2 to tmp0 ○ tmp0 = tmp0 + 1156 (0x484)
  • 38. TCG-IR to x86_64 translation ● Let’s take a look at some examples: store (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load s2 to tmp0 ○ tmp0 = tmp0 + 1156 (0x484) ○ Load s3 to tmp1
  • 39. TCG-IR to x86_64 translation ● Let’s take a look at some examples: store (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load s2 to tmp0 ○ tmp0 = tmp0 + 1156 (0x484) ○ Load s3 to tmp1 ○ Store tmp1 to address tmp0
  • 40. TCG-IR to x86_64 translation: How does QEMU handle branch instructions? ● Let’s take a look at some examples: store (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Load s2 to tmp0 ○ tmp0 = tmp0 + 1156 (0x484) ○ Load s3 to tmp1 ○ Store tmp1 back to address tmp0
  • 41. TCG-IR to x86_64 translation ● Let’s take a look at some examples: branch ○ Direct branch ■ Conditional branch: beqz rs, offset / bgt rs, rt, offset / ... ■ Unconditional branch: j offset / jal offset / call offset / ... ○ Indirect branch ■ Switch/case → branch table ■ Indirect function call ■ Return instructions (ret)
  • 42. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (unconditional) (columns: RISC-V 32bit, TCG-IR, x86_64)
  • 43. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (unconditional) (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Reminder: patch point for block chaining
  • 44. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (unconditional) (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Synchronize the program counter to the architecture state
  • 45. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (unconditional) (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Synchronize the program counter ○ Prepare the return value
  • 46. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (unconditional) (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Synchronize the program counter ○ Prepare the return value ○ Go back to QEMU to find the next translation block
  • 47. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (unconditional) (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Reminder: patch point ○ Block chaining: link the current TB to the previous TB by patching the jump target address
  • 48. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (conditional)
  • 49. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (conditional)
  • 50. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (conditional)
  • 51. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (conditional) ○ If the block is chained by QEMU, jump to the target translation block
  • 52. TCG-IR to x86_64 translation ● Let’s take a look at some examples: direct branch (conditional) ○ Go back to QEMU to find the next translation block
  • 53. TCG-IR to x86_64 translation ● Let’s take a look at some examples: indirect branch, return instruction (columns: RISC-V 32bit, TCG-IR, x86_64)
  • 54. TCG-IR to x86_64 translation ● Let’s take a look at some examples: indirect branch, return instruction (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Store the value from the return address register to the program counter
  • 55. TCG-IR to x86_64 translation ● Let’s take a look at some examples: indirect branch, return instruction (columns: RISC-V 32bit, TCG-IR, x86_64) ○ Go back to QEMU to find the next translation block for the new program counter value
  • 56. TCG-IR to x86_64 translation ● Helper function call ○ QEMU provides a ‘hook’ for developers to write emulation behavior in C. Commonly used to: ■ Emulate hardware not supported by the host machine ● e.g. hardware FP, SIMD, AES, etc. ■ Perform dynamic instrumentation ● Collect runtime information from the source program to analyze its behavior (e.g. dynamic call graph / control flow graph)
  • 57. TCG-IR to x86_64 translation ● Helper function call example ○ Call helper_function ○ Return the result (%rax) to ‘fa5’
  • 58. TCG-IR to x86_64 translation ● Helper: emulate IEEE 754 double-precision floating point add
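A helper call boils down to "call a C function with the operand values, then write the return value into the destination register". The sketch below assumes hypothetical names (`HelperEnv`, `toy_helper_fadd_d()`, `toy_call_helper()`) and uses the host's `double` arithmetic for brevity, whereas QEMU's real helpers go through its software floating-point library. `fa5` is register f15 in the RISC-V calling convention.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state: FP registers kept as raw 64-bit patterns. */
typedef struct HelperEnv { uint64_t fpr[32]; } HelperEnv;

typedef uint64_t (*toy_helper_fn)(HelperEnv *env, uint64_t a, uint64_t b);

/* Reinterpret a double as its 64-bit pattern without aliasing issues. */
uint64_t toy_double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Helper written in plain C: double-precision add on raw bit patterns.
 * Assumption: host double arithmetic stands in for a softfloat library. */
uint64_t toy_helper_fadd_d(HelperEnv *env, uint64_t a, uint64_t b) {
    double x, y;
    (void)env;   /* a real helper could update FP status flags here */
    memcpy(&x, &a, sizeof x);
    memcpy(&y, &b, sizeof y);
    return toy_double_bits(x + y);
}

/* What the generated code does around the call: pass the operands,
 * call the helper, store the returned value into the destination. */
void toy_call_helper(HelperEnv *env, toy_helper_fn fn,
                     unsigned rd, unsigned rs1, unsigned rs2) {
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    env->fpr[rd] = fn(env, env->fpr[rs1], env->fpr[rs2]);
}
```

On an x86_64 host the returned integer value arrives in %rax, which matches the "return result (%rax) to fa5" step in the example.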
  • 59. TCG-IR to x86_64 translation ● Prologue, epilogue: the entry point for each translation block
  • 60. TCG-IR to x86_64 translation ● Prologue, epilogue ○ Decide whether the current TB can be executed
  • 61. TCG-IR to x86_64 translation: How to execute translated binary code? ● Prologue, epilogue
  • 62. x86_64 code execution ● Entry point: qemu-riscv/accel/tcg/cpu-exec.c ○ tcg_qemu_tb_exec: a pointer to the head of the translation block
  • 63. x86_64 code execution ● Entry point: qemu-riscv/accel/tcg/cpu-exec.c ○ Put env in %rdi, tb_ptr in %rsi
  • 64. x86_64 code execution ● Entry point: qemu-riscv/accel/tcg/cpu-exec.c ○ Move %rdi to %r14, and jump to %rsi
  • 65. x86_64 code execution ● Entry point: qemu-riscv/accel/tcg/cpu-exec.c ○ Move %rdi to %r14, and jump to %rsi
  • 66. x86_64 code execution ● Entry point: qemu-riscv/accel/tcg/cpu-exec.c ○ Exit the code cache and go back to QEMU
  • 67. x86_64 code execution ● Entry point: qemu-riscv/accel/tcg/cpu-exec.c ○ Return the value in ret and record the last TB we just executed
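One way a single return value can carry both "which TB ran last" and "which exit was taken" is to pack a small exit index into the low bits of the TB pointer, which are zero because of alignment. The sketch below assumes this scheme with a 2-bit mask. `ExitTB`, `TOY_TB_EXIT_MASK` and the function names are hypothetical, and we do not claim the field layout matches QEMU's.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TB_EXIT_MASK ((uintptr_t)3)   /* assumed: low 2 bits hold the exit index */

/* Hypothetical block; uint32_t alignment keeps the low 2 pointer bits zero. */
typedef struct ExitTB { uint32_t pc; } ExitTB;

/* What the generated code would "prepare" before leaving the code cache. */
uintptr_t toy_pack_exit(const ExitTB *tb, unsigned exit_idx) {
    assert(((uintptr_t)tb & TOY_TB_EXIT_MASK) == 0);
    assert(exit_idx <= TOY_TB_EXIT_MASK);
    return (uintptr_t)tb | exit_idx;
}

/* What the main loop does with ret: recover the last executed TB ... */
ExitTB *toy_unpack_tb(uintptr_t ret) {
    return (ExitTB *)(ret & ~TOY_TB_EXIT_MASK);
}

/* ... and the exit slot that was taken, e.g. to know which slot to patch. */
unsigned toy_unpack_exit(uintptr_t ret) {
    return (unsigned)(ret & TOY_TB_EXIT_MASK);
}
```

Knowing both the last TB and the exit taken is what the chaining step needs: it identifies exactly which jump slot to patch once the next TB has been found.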
  • 68. Reference ● Qemu JIT Code Generator and System Emulation ● QEMU - Binary Translation ● QEMU TCG Frontend Ops ● RISC-V Instruction Set Manual ● Doraemon’s Notes: QEMU Backend ○ Written in Mandarin Chinese