This document compares the performance of a MySQL database benchmark (sysbench) run in a virtual machine against the same benchmark run on bare metal. On Fedora, the bare-metal host achieved 6-7% more transactions per second and queries per second, with lower latency, than the virtual machine guest. On Debian, bare metal likewise achieved significantly more transactions per second (over 500 versus under 80) and lower latency. Tracing tools such as trace-cmd can be used to analyze the additional overhead introduced by the virtualization layer.
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
1. Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Steven Rostedt
Software Engineer at Google Inc
2. Steven Rostedt
Software Engineer at Google
■ One of the original authors of the PREEMPT_RT patch set
■ Creator and maintainer of ftrace (the official Linux tracer)
■ Creator of “make localmodconfig”
■ Creator and maintainer of “ktest.pl” Linux testing framework
5. Using Virtual Machines
Pros:
■ Gives more flexibility
● Can migrate from one machine to another
● Duplicate VMs
● Can save them (archives)
■ Can easily shutdown and restart
● No lengthy BIOS
■ More “secure”
● Better isolation of tasks
7. Using Virtual Machines
Cons:
■ Takes up more memory
● Requires two operating systems on the machine
■ Adds overhead
● There’s indirection between the VM and the devices
10. Virtual Machines Overhead
How bad is it really?
■ The pros usually outweigh the cons
● We are always trying to improve
■ Where to look for that improvement
● Usually anytime the hypervisor needs to do work for the VM
● Utilize virtio more
■ Virtual devices that take advantage of the virtual environment
■ Easier to implement, and less overhead than simulating real devices
15. Tools to determine Virtual Machine Overhead
As I’m the ftrace maintainer, I’ll focus on ftrace tracing solutions.
■ trace-cmd
● A front end CLI tool to interact with the Linux kernel tracing facility
● Can start tracing and examine the live events
● Can record to a file for post processing
● This talk will focus on the post processing
17. What to look at for Virtual Machine Overhead
Databases are a common critical path of virtual machine services
■ sysbench
● https://github.com/akopytov/sysbench
● Available on most distributions
● Has a mysql benchmark
● Good to compare different machines (in this case, VM vs Bare-metal)
■ mysql / MariaDB
● In Debian the “mysql” commands are run by tasks called mariadbd
● In Fedora the “mysql” commands are run by tasks called “mysqld”
■ But it is still MariaDB, not MySQL!
18. The setup
Using two bare-metal machines: one running Fedora 33, the other Debian “testing”
■ Fedora 33
● Kernel: 5.14.18-100.fc33.x86_64
● File system: ext4
● 8 CPUs (4 cores, 2 hyperthreads per core)
● 32 GB RAM
● VM - same setup but only 4 CPUs / 2 GB RAM
■ Debian testing (from a month ago)
● Kernel: 5.18.0-3-amd64
● File system: ext4
● 6 CPUs (6 cores)
● 16 GB RAM
● VM - same setup but only 4 CPUs / 2 GB RAM
20. The setup
To make the comparison fair, limit everything to a single CPU
■ taskset
● Sets the CPU affinity of tasks
● Set everything to CPU 1
■ CPU 0 usually runs housekeeping tasks
● Set the database server (mysqld / mariadbd) to CPU 1
● Set the sysbench application to CPU 1
21. Set the database server to CPU 1
# ps aux | grep mysqld
mysql 20539 0.1 3.4 2289088 131892 ? Ssl 19:36 0:02 /usr/libexec/mysqld --basedir=/usr
root 20925 0.0 0.0 221440 860 pts/3 S+ 20:18 0:00 grep --color=auto mysqld
# taskset -a -pc 1 20539
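The same pinning can be done programmatically. A minimal Python sketch using `os.sched_setaffinity` (the Linux syscall behind `taskset`); the talk pins to CPU 1, but the sketch uses CPU 0 so it also runs on single-CPU machines:

```python
import os

def pin_to_cpu(pid: int, cpu: int) -> set:
    """Pin a task to a single CPU, like `taskset -pc <cpu> <pid>`.

    pid 0 means the calling process. Returns the resulting affinity mask.
    """
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)

# Pin the current process to CPU 0 (the talk uses CPU 1; CPU 0 is
# chosen here only so the sketch works on any machine).
print(pin_to_cpu(0, 0))  # → {0}
```

Note that `taskset -a` also changes the affinity of all of the task's threads; `os.sched_setaffinity` applies per thread, so a real equivalent would iterate over `/proc/<pid>/task/`.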
35. Using trace-cmd
■ There are over 100 KVM events (host events to handle guests)
■ Traces when guests enter and leave the virtual environment
■ Shows why guests exit and what the host is doing for the guest
36. On Fedora trace Guest from Host
# trace-cmd record -e sched -e kvm ssh root@Fedora33 'taskset -a -c 1 sysbench
--mysql-user=sbtest_user --mysql-password=password --mysql-db=sbtest --tables=16
--table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1
/usr/share/sysbench/oltp_read_write.lua run'
42. Using libtracecmd
■ Is a library that comes with trace-cmd
■ Allows you to write tools that can analyze trace.dat files
● These are the files that trace-cmd creates.
■ This talk is not about how to use this library
● Only that it exists
■ But we will use a tool I created with this library
● https://rostedt.org/code/kvm-exit.c
■ This examines the kvm_exit and kvm_entry events
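kvm-exit.c itself is written in C against libtracecmd, but the core bookkeeping it performs — pairing each kvm_exit with the following kvm_entry on the same vCPU and accumulating time per exit reason — can be sketched as a toy model (the tuple layout here is an assumption for illustration, not the real event format):

```python
from collections import defaultdict

def exit_stats(events):
    """Accumulate time between kvm_exit and the next kvm_entry, per exit reason.

    events: time-ordered (timestamp_us, name, reason) tuples for a single
    vCPU. A real tool tracks each vCPU separately and handles migrations.
    """
    stats = defaultdict(lambda: {"count": 0, "total": 0, "max": 0, "min": None})
    pending = None  # (timestamp, reason) of an exit awaiting its entry
    for ts, name, reason in events:
        if name == "kvm_exit":
            pending = (ts, reason)
        elif name == "kvm_entry" and pending:
            start, why = pending
            delta = ts - start
            s = stats[why]
            s["count"] += 1
            s["total"] += delta
            s["max"] = max(s["max"], delta)
            s["min"] = delta if s["min"] is None else min(s["min"], delta)
            pending = None
    return stats

# Synthetic trace: two HLT exits and one EPT_VIOLATION
events = [
    (100, "kvm_exit", "HLT"), (350, "kvm_entry", None),
    (400, "kvm_exit", "EPT_VIOLATION"), (404, "kvm_entry", None),
    (500, "kvm_exit", "HLT"), (600, "kvm_entry", None),
]
s = exit_stats(events)
print(s["HLT"]["count"], s["HLT"]["total"])  # → 2 350
```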
43. On Fedora Host run
# kvm-exit trace.dat
vCPU 0: host_pid: 2442
Number of exits: 32505
Total time (us): 8290386
Avg time (us): 255
Max time (us): 66519
Min time (nano): 272
reason: EXCEPTION_NMI isa:1 exit:0
Number of exits: 1
Total time (us): 5
Avg time (us): 5
Max time (us): 5
Min time (nano): 5452
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 5840
Total time (us): 43813
Avg time (us): 7
Max time (us): 97
Min time (nano): 766
Migrated: 3
Preempted:
Number of exits: 12
44. On Fedora Host run
reason: INTERRUPT_WINDOW isa:1 exit:7
Number of exits: 1427
Total time (us): 2263
Avg time (us): 1
Max time (us): 16
Min time (nano): 382
reason: CPUID isa:1 exit:10
Number of exits: 65
Total time (us): 90
Avg time (us): 1
Max time (us): 7
Min time (nano): 596
reason: HLT isa:1 exit:12
Number of exits: 3132
Total time (us): 8149964
Avg time (us): 2602
Max time (us): 66519
Min time (nano): 468
Migrated: 1
Preempted:
Number of exits: 52
Total time (us): 544
Avg time (us): 10
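Reading the breakdown above: almost all of the exit time on this vCPU is HLT — the guest going idle — which is waiting, not overhead. A quick sanity check on the slide's numbers:

```python
total_exit_us = 8290386   # all exits, vCPU 0
hlt_us = 8149964          # HLT (guest idle) exits

hlt_share = hlt_us / total_exit_us
print(f"{hlt_share:.1%}")  # → 98.3%
```

So when hunting for real overhead, HLT time should be discounted; the interesting reasons are the ones the guest did not ask to wait for (interrupts, EPT violations, MSR writes).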
46. Using trace-cmd agent
■ Run trace-cmd agent on the guests
● Listens on the vsocket for connections from the host
● Can start tracing for the guest
● Synchronizes timestamps with the host
■ Use the -A option on the host running trace-cmd record
● Will connect to the guest agent
● Can start tracing on both the host and the guest
● Negotiates timestamp synchronization to keep host and guest events in sync
49. On the Guest
# trace-cmd agent
listening on @4:823
(@4 is the vsocket CID, 823 is the vsocket port)
51. On Host run
# trace-cmd record -e sched -e kvm -A @4:823 --name guest -e sched
ssh root@Fedora33 'taskset -a -c 1 sysbench --mysql-user=sbtest_user
--mysql-password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4
--time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run'
(in -A @4:823, @4 is the guest's vsocket CID and 823 is the agent's port)
52. Analyze Fedora sysbench
# kvm-exit -c sysbench trace.dat trace-guest.dat
vCPU 1: host_pid: 16185
Number of exits: 2548
Total time (us): 16112
Avg time (us): 6
Max time (us): 71
Min time (nano): 655
task sysbench: Total run time: 575048(us)
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 692
Total time (us): 8430
Avg time (us): 12
Max time (us): 71
Min time (nano): 2644
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 118
Total time (us): 280
Avg time (us): 2
Max time (us): 6
Min time (nano): 655
53. Analyze Fedora sysbench
reason: MSR_WRITE isa:1 exit:32
Number of exits: 656
Total time (us): 2111
Avg time (us): 3
Max time (us): 18
Min time (nano): 832
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 500
Total time (us): 2012
Avg time (us): 4
Max time (us): 31
Min time (nano): 948
reason: PREEMPTION_TIMER isa:1 exit:52
Number of exits: 582
Total time (us): 3277
Avg time (us): 5
Max time (us): 16
Min time (nano): 1693
54. Analyze Fedora mysqld
# kvm-exit -c mysqld trace.dat trace-guest.dat
vCPU 1: host_pid: 16185
Number of exits: 10479
Total time (us): 66016
Avg time (us): 6
Max time (us): 72
Min time (nano): 457
task mysqld: Total run time: 2363606(us)
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 2583
Total time (us): 30788
Avg time (us): 11
Max time (us): 72
Min time (nano): 814
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 663
Total time (us): 1174
Avg time (us): 1
Max time (us): 9
Min time (nano): 457
55. Analyze Fedora mysqld
reason: MSR_WRITE isa:1 exit:32
Number of exits: 2956
Total time (us): 10012
Avg time (us): 3
Max time (us): 54
Min time (nano): 534
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 1571
Total time (us): 7942
Avg time (us): 5
Max time (us): 25
Min time (nano): 733
reason: EPT_MISCONFIG isa:1 exit:49
Number of exits: 291
Total time (us): 3448
Avg time (us): 11
Max time (us): 29
Min time (nano): 4022
reason: PREEMPTION_TIMER isa:1 exit:52
Number of exits: 2415
Total time (us): 12651
Avg time (us): 5
Max time (us): 16
Min time (nano): 731
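The totals on these two slides give a rough bound on the virtualization overhead mysqld itself experiences — time spent handling its exits, as a fraction of the task's run time:

```python
exit_us = 66016    # total kvm exit time attributed to mysqld
run_us = 2363606   # task mysqld: total run time

print(f"{exit_us / run_us:.1%}")  # → 2.8%
```

This is in the same ballpark as the 6-7% throughput difference measured between the VM and bare metal on Fedora; exits are one visible component of that gap, not all of it.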
56. Analyze Debian sysbench
# kvm-exit -c sysbench trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
Number of exits: 5287
Total time (us): 26484
Avg time (us): 5
Max time (us): 978
Min time (nano): 283
task sysbench: Total run time: 1361148(us)
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 1236
Total time (us): 10163
Avg time (us): 8
Max time (us): 978
Min time (nano): 590
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 228
Total time (us): 195
Avg time (us): 0
Max time (us): 3
Min time (nano): 408
57. Analyze Debian sysbench
reason: MSR_WRITE isa:1 exit:32
Number of exits: 568
Total time (us): 1744
Avg time (us): 3
Max time (us): 34
Min time (nano): 354
reason: PAUSE_INSTRUCTION isa:1 exit:40
Number of exits: 1
Total time (us): 2
Avg time (us): 2
Max time (us): 2
Min time (nano): 2829
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 2923
Total time (us): 13477
Avg time (us): 4
Max time (us): 411
Min time (nano): 513
reason: PREEMPTION_TIMER isa:1 exit:52
Number of exits: 313
Total time (us): 886
Avg time (us): 2
Max time (us): 21
Min time (nano): 831
58. Analyze Debian mariadbd
# kvm-exit -c mariadbd trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
Number of exits: 17846
Total time (us): 132184
Avg time (us): 7
Max time (us): 31001
Min time (nano): 309
task mariadbd: Total run time: 7045710(us)
reason: EXCEPTION_NMI isa:1 exit:0
Number of exits: 2
Total time (us): 12
Avg time (us): 6
Max time (us): 6
Min time (nano): 6034
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 6320
Total time (us): 43236
Avg time (us): 6
Max time (us): 598
Min time (nano): 572
59. Analyze Debian mariadbd
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 535
Total time (us): 464
Avg time (us): 0
Max time (us): 6
Min time (nano): 417
reason: MSR_WRITE isa:1 exit:32
Number of exits: 3300
Total time (us): 16468
Avg time (us): 4
Max time (us): 6177
Min time (nano): 309
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 2667
Total time (us): 48480
Avg time (us): 18
Max time (us): 31001
Min time (nano): 594
reason: EPT_MISCONFIG isa:1 exit:49
Number of exits: 3237
Total time (us): 18510
Avg time (us): 5
Max time (us): 264
Min time (nano): 1308
60. Analyze Debian mariadbd Guest functions
# kvm-exit -c mariadbd -f trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
Number of exits: 17846
Total time (us): 132184
Avg time (us): 7
Max time (us): 31001
Min time (nano): 309
task mariadbd: Total run time: 7045710(us)
reason: EXCEPTION_NMI isa:1 exit:0
Number of exits: 2
Total time (us): 12
Avg time (us): 6
Max time (us): 6
Min time (nano): 6034
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 6320
Total time (us): 43236
Avg time (us): 6
Max time (us): 598
Min time (nano): 572
61. Analyze Debian mariadbd Guest functions
reason: MSR_WRITE native_write_msr+0x4 isa:1 exit:32
Number of exits: 3300
Total time (us): 16468
Avg time (us): 4
Max time (us): 6177
Min time (nano): 309
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 475
Total time (us): 1066
Avg time (us): 2
Max time (us): 26
Min time (nano): 594
reason: EPT_VIOLATION memcg_slab_post_alloc_hook+0x127 isa:1 exit:48
Number of exits: 2192
Total time (us): 47413
Avg time (us): 21
Max time (us): 31001
Min time (nano): 1191
reason: EPT_MISCONFIG iowrite16+0x9 isa:1 exit:49
Number of exits: 3237
Total time (us): 18510
Avg time (us): 5
Max time (us): 264
Min time (nano): 1308
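The -f breakdown above is the payoff: of the EPT_VIOLATION time charged to mariadbd on the previous slides, nearly all of it lands on one guest function. Checking against slide 59's EPT_VIOLATION total:

```python
ept_total_us = 48480   # all EPT_VIOLATION exit time for mariadbd (slide 59)
memcg_us = 47413       # EPT_VIOLATION exits at memcg_slab_post_alloc_hook

print(f"{memcg_us / ept_total_us:.1%}")  # → 97.8%
```

That makes memory allocation (and the 31 ms worst case behind it) the obvious place to dig next, rather than the iowrite16 EPT_MISCONFIG exits, which are cheap virtio notifications.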
66. Using libtracecmd for task analysis
■ Another tool I created for task analysis
● https://rostedt.org/code/task-time.c
■ This examines the times the task:
● Runs on each CPU
● Is preempted by other tasks (and shows when threads preempt each other)
● Is blocked on I/O
● Is sleeping (but could be blocked on a pthread_mutex)
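task-time.c is C code built on libtracecmd, but the classification it performs can be sketched with a toy model of sched_switch events (the tuple layout and single-letter states here are simplifications, not the real event format): when a task is switched out, its prev_state says why — still runnable means it was preempted, uninterruptible ('D') means blocked on I/O, interruptible ('S') means sleeping (possibly on a pthread_mutex):

```python
def classify(events, task):
    """Toy model of per-task time accounting on one CPU.

    events: time-ordered (timestamp_us, prev_task, prev_state, next_task).
    prev_state: 'R' = still runnable (preempted), 'D' = blocked on I/O,
    'S' = sleeping.
    """
    times = {"run": 0, "preempt": 0, "blocked": 0, "sleep": 0}
    since, state = None, None
    for ts, prev, prev_state, nxt in events:
        if prev == task:                      # task switched out: close a run span
            if since is not None:
                times["run"] += ts - since
            state = {"R": "preempt", "D": "blocked", "S": "sleep"}[prev_state]
            since = ts
        elif nxt == task:                     # task switched in: close the off-CPU span
            if since is not None and state:
                times[state] += ts - since
            since, state = ts, None
    return times

events = [
    (0, "idle", "S", "db"),       # db starts running
    (100, "db", "D", "kworker"),  # db blocks on I/O
    (160, "kworker", "S", "db"),  # I/O done, db runs again
    (200, "db", "R", "other"),    # db preempted while still runnable
    (250, "other", "S", "db"),    # db runs again
    (300, "db", "S", "idle"),     # db goes to sleep
]
print(classify(events, "db"))  # → {'run': 190, 'preempt': 50, 'blocked': 60, 'sleep': 0}
```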
69. Analyze Debian mariadbd task
# task-time -c mariadbd trace.dat
Task: mariadbd
Total run time (us): 7374469
Thread preempt time (us): 23500051
Total preempt time (us): 1375971 (-22124080)
Total blocked time (us): 2432569
Total sleep time (us): 49682796
thread id: 493170
Total run time (us): 215
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 9450
thread id: 493171
Total run time (us): 428
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 9605371
[..]
# task-time -c mariadbd trace-guest.dat
Task: mariadbd
Total run time (us): 7045710
Thread preempt time (us): 20535866
Total preempt time (us): 1327352 (-19208514)
Total blocked time (us): 3352464
Total sleep time (us): 37335578
thread id: 1206
Total run time (us): 233
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 4113
thread id: 1207
Total run time (us): 375
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 10001415
[..]
(bare metal: blocked for 33.0% of run time; guest: blocked for 47.6%)
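The blocked percentages highlighted on the slide fall straight out of the totals above — blocked time as a share of each run's total run time:

```python
# Bare metal (trace.dat) vs guest (trace-guest.dat), numbers from the slide
bare_run, bare_blocked = 7374469, 2432569
guest_run, guest_blocked = 7045710, 3352464

print(f"bare metal: {bare_blocked / bare_run:.1%}")  # → 33.0%
print(f"guest:      {guest_blocked / guest_run:.1%}")  # → 47.6%
```

The guest spends a notably larger share of its time blocked on I/O, pointing at the virtualized I/O path as a major contributor to the Debian VM's slowdown.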
70. Using KernelShark
■ KernelShark can show host/guest interactions
● Follow this tutorial
■ https://rostedt.org/host-guest-tutorial/
● kernelshark trace.dat -a trace-guest.dat
72.
Steven Rostedt
rostedt@goodmis.org
@rostedt