Implements BIOS emulation support for BHyVe
1. Implements BIOS emulation support for BHyVe
Takuya ASADA <syuu@freebsd.org>
Sunday, March 17, 2013
2. Before talking about BIOS emulation on BHyVe
Let’s take a quick look at BHyVe’s internal structure and Intel VT-x
3. BHyVe Overview
[Diagram: three userland tools sit on top of libvmmapi, which talks to /dev/vmm/${vm_name} (vmm.ko) in the FreeBSD kernel via mmap/ioctl]
1. bhyveload: create VM instance, load guest kernel
2. bhyve: run VM instance (guest kernel; HD, NIC, and console backed by disk image, tap device, and stdin/stdout)
3. bhyvectl: destroy VM instance
• bhyveload loads the guest OS
• bhyve is the userland part of the hypervisor; it emulates devices
• bhyvectl is a management tool
• libvmmapi is the userland API
• vmm.ko is the kernel part of the hypervisor
4. vmm.ko
• Provides /dev/vmm/${vmname}
• Each vmm device file holds the state of one VM instance
• The device file is created via sysctl: hw.vmm.create
• It is destroyed via sysctl: hw.vmm.destroy
5. /dev/vmm/${vmname}
interfaces
• read/write/mmap
Guest memory can be accessed with standard syscalls (which means you can even dump guest memory with the dd command)
• ioctl
Provides various operations on the VM
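As a rough illustration of the read path, the sketch below reads a range of bytes from a device file with pread(2), much as dd would. The function name dump_guest_memory is hypothetical, and the sketch assumes that the file offset corresponds to the guest physical address, which the slide implies but does not spell out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to `len` bytes starting at offset `gpa` from the file at `path`
 * (for example /dev/vmm/<vmname>). Assumption: file offset == guest
 * physical address. Returns the number of bytes read, or -1 on error.
 */
ssize_t
dump_guest_memory(const char *path, off_t gpa, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
            gpa + (off_t)done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)     /* end of file: return a short read */
            break;
        done += (size_t)n;
    }
    close(fd);
    return (ssize_t)done;
}
```

Because it only uses open/pread/close, the same function works on any regular file, which makes it easy to try without a running VM.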
6. /dev/vmm/${vmname}
ioctls
• VM_MAP_MEMORY: Maps a guest memory area of the requested size
• VM_SET/GET_REGISTER: Accesses guest registers
• VM_RUN: Runs the guest machine until a virtual device is accessed (or some other trap occurs)
7. bhyveload
• FreeBSD bootloader ported to userland: userboot
• bhyveload loads userboot.so as a dynamic library and calls its loader_main function
• Once called, it does the following:
• Parses UFS on the disk image and finds the kernel
• Loads the kernel into the guest memory area (using mmap)
• Sets initial guest register values (using the VM_SET_REGISTER ioctl)
• RIP = kernel entry point
• CR0 = Paging enable | Protected mode enable
• EFER = Long mode enable | Long mode active
• Initializes the page table and sets its address in CR3
• Creates GDT, IDT, LDT and sets their addresses in GDTR, IDTR, LDTR
• Initializes TR
• The guest machine starts from the kernel entry point, with 64-bit mode enabled
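The register values listed above can be written out as concrete bit masks. The following is only a sketch: the bit positions are the architectural ones (CR0.PE is bit 0, CR0.PG is bit 31, EFER.LME is bit 8, EFER.LMA is bit 10), but the struct, the macro names, and make_initial_regs are made up for illustration and are not bhyveload's actual code. Other state that long mode needs (such as CR4.PAE and the descriptor tables) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions; the macro names are this sketch's own. */
#define CR0_PE   (1ULL << 0)    /* protected mode enable */
#define CR0_PG   (1ULL << 31)   /* paging enable */
#define EFER_LME (1ULL << 8)    /* long mode enable */
#define EFER_LMA (1ULL << 10)   /* long mode active */

/* Hypothetical container for the values a loader would set via ioctl. */
struct initial_regs {
    uint64_t rip;
    uint64_t cr0;
    uint64_t cr3;
    uint64_t efer;
};

struct initial_regs
make_initial_regs(uint64_t kernel_entry, uint64_t page_table_root)
{
    /* The top-level page table must be 4 KiB aligned. */
    assert((page_table_root & 0xfff) == 0);

    struct initial_regs r = {
        .rip  = kernel_entry,
        .cr0  = CR0_PG | CR0_PE,
        .cr3  = page_table_root,
        .efer = EFER_LME | EFER_LMA,
    };
    return r;
}
```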
8. bhyve
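• The bhyve command essentially runs the following loop (pseudocode):
while (1) {
    ioctl(VM_RUN);          /* run the guest until the next VMExit */
    device_io_emulation();  /* emulate the device access that caused the exit */
}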
9. Intel VT-x: Hardware-assisted virtualization
[Diagram: VMX root mode and VMX non-root mode each have their own User (Ring 3) and Kernel (Ring 0) levels; VMEntry switches from root mode to non-root mode, VMExit switches back]
• New CPU modes:
VMX root mode (hypervisor) / VMX non-root mode (guest)
• When an event occurs that must be emulated by the hypervisor, the CPU stops the guest and exits to the hypervisor → VMExit
10. VT-x configuration
• Which events should be handled by the hypervisor?
It depends on the hypervisor implementation!
• VT-x is configurable!
You can enable/disable each event
• You can also change some CPU behavior
11. BHyVe BIOS emulation
project
• Google Summer of Code ’12
“BHyVe BIOS emulation to boot legacy
systems”
• Project Goal:
Implement BIOS emulation on BHyVe
hypervisor, to make BHyVe able to support
more guest OSes
12. Limitation of bhyveload
• It’s legacy free! yay!
• But...
• Only supports FreeBSD/amd64
• You need to implement a kernel loader for each OS
• Want to run more OSes on BHyVe!
13. Why don’t you just
implement OS loader?
• Better than supporting ugly legacy BIOS? True! But...
• An OS loader depends heavily on the kernel implementation
• You would need to implement an OS loader for each OS
ex: Linux loader, NetBSD loader, OpenBSD loader...
• It may be very hard to implement a loader for a proprietary OS
• Even if the OS loader works, the guest OS may call a BIOS interrupt handler → DIE!
This is common on 32-bit x86 OSes.
Most 64-bit OSes are legacy free.
15. What happens when it is called?
[Diagram: flow of a BIOS call on real hardware]
int 13h → software interrupt (INTx) → CPU reads interrupt vector → executes the BIOS call handler (on the ROM) → performs I/O by in/out or MMIO → hardware
16. How Linux KVM
handles BIOS
• KVM uses QEMU as its userland process
• QEMU has a real BIOS called “SeaBIOS”, an open source BIOS
• SeaBIOS performs I/O by in/out instructions or MMIO
• KVM handles these I/O accesses and emulates the devices
17. BIOS call handling on KVM
[Diagram: flow of a BIOS call under KVM]
Guest: int 13h → software interrupt (INTx) → CPU reads interrupt vector → executes interrupt handler → SeaBIOS performs I/O to virtual HW → VMExit by in/out or MMIO
Hypervisor: QEMU HW emulation (QEMU emulates the HW I/O)
18. Bring SeaBIOS in
BHyVe?
• I wanted to use it
• But we can’t bring the code into FreeBSD
• Because it’s GPLv3 licensed
19. OK then, is there a BSDL BIOS?
• Unfortunately, we haven’t found any BSD-licensed (BSDL) BIOS
• But there is a BSDL DOS emulator in Ports: doscmd
• It has a DOS & BIOS interrupt call emulator that runs on FreeBSD/i386
20. How doscmd works
• Maps pages in the low memory area (<1MB) to hold the DOS app
• Sets up the interrupt vector / interrupt handlers (each handler just issues HLT; IRET)
• Loads the DOS app into the low memory area
• Enters virtual 8086 mode (i386_vm86(2)) and jumps to the DOS app’s entry address
• CPU executes the DOS app in virtual 8086 mode
• When the DOS app makes a DOS/BIOS interrupt call, the interrupt handler runs and issues a HLT instruction
• Once HLT is issued, the CPU leaves virtual 8086 mode
• doscmd emulates the DOS/BIOS interrupt call
• Returns to virtual 8086 mode
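To make the "interrupt vector plus HLT; IRET handler" setup concrete, here is a small sketch that fills a real-mode interrupt vector table in a low-memory buffer. Each of the 256 entries is 4 bytes at address vector × 4 (16-bit offset, then 16-bit segment), and each points at its own two-byte stub (HLT is opcode F4, IRET is CF). The function name, the stub layout, and the stub segment are assumptions made for illustration, not doscmd's actual layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IVT_ENTRIES 256
#define IVT_BYTES   (IVT_ENTRIES * 4)
#define STUB_SIZE   2           /* HLT; IRET */

/* Store a 16-bit value in little-endian byte order. */
static void
put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Install one HLT; IRET stub per vector at segment `stub_seg` and point
 * every IVT entry at its stub. `low_mem` is the host buffer backing
 * guest address 0.
 */
void
install_hlt_stubs(uint8_t *low_mem, size_t mem_size, uint16_t stub_seg)
{
    size_t stub_base = (size_t)stub_seg << 4;   /* segment * 16 */

    assert(stub_base >= IVT_BYTES);             /* do not overlap the IVT */
    assert(stub_base + IVT_ENTRIES * STUB_SIZE <= mem_size);

    for (unsigned vec = 0; vec < IVT_ENTRIES; vec++) {
        uint16_t off = (uint16_t)(vec * STUB_SIZE);

        low_mem[stub_base + off]     = 0xF4;    /* HLT  */
        low_mem[stub_base + off + 1] = 0xCF;    /* IRET */
        put16(&low_mem[vec * 4], off);          /* handler offset  */
        put16(&low_mem[vec * 4 + 2], stub_seg); /* handler segment */
    }
}
```

Giving every vector its own stub address is one way for the emulator to tell which interrupt was called when the HLT traps.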
21. How doscmd works
[Diagram: flow of a BIOS call under doscmd on FreeBSD/i386]
DOS app in v8086 mode: int 13h → software interrupt (INTx) → CPU reads interrupt vector → executes interrupt handler → issues HLT instruction → trap on HLT instruction
doscmd: BIOS emulation (doscmd emulates the BIOS call)
22. Difference of BIOS handling
on QEMU vs doscmd
• QEMU
Runs a real BIOS in the guest machine
The BIOS’s interrupt handlers handle BIOS interrupt calls
QEMU just emulates hardware devices
• doscmd
Has no real BIOS
The interrupt handler exists only to trap out of the vm86 machine
doscmd itself emulates the BIOS interrupt call handlers
23. Plan to emulate BIOS
on BHyVe
• Extract only the necessary code from doscmd and make it a library
Export two functions: biosemul_init() / biosemul_call()
• In biosemul_init(), perform BIOS-compatible initialization (initialize register values, load the boot sector, initialize the interrupt vector, install interrupt handlers)
• In the interrupt handler, use the VMCALL instruction instead of HLT, because the guest OS may also use HLT and we don’t want the BIOS emulation code to handle that
• biosemul_call() handles BIOS interrupt calls
It executes the BIOS interrupt call emulation using doscmd code
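One possible shape for the VMCALL-based handlers is sketched below. It writes a "VMCALL; STI; RETF 2" stub (the sequence these slides quote later; opcodes 0F 01 C1, FB, CA 02 00) for each vector and recovers the vector number from the guest address at which the VMCALL exit happened. The per-vector stub layout and both function names are assumptions for illustration; how biosemul_call() actually identifies the vector is not stated in the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* VMCALL (0F 01 C1); STI (FB); RETF 2 (CA 02 00) */
static const uint8_t vmcall_stub[] = {
    0x0F, 0x01, 0xC1, 0xFB, 0xCA, 0x02, 0x00
};
#define STUB_SIZE   ((uint32_t)sizeof(vmcall_stub))
#define NUM_VECTORS 256u

/* Write one stub per vector, back to back, starting at `stub_base`. */
void
write_vmcall_stubs(uint8_t *guest_mem, uint32_t stub_base)
{
    assert(guest_mem != NULL);
    for (uint32_t vec = 0; vec < NUM_VECTORS; vec++)
        memcpy(guest_mem + stub_base + vec * STUB_SIZE,
            vmcall_stub, STUB_SIZE);
}

/*
 * Map the linear address of the VMCALL that caused the exit back to a
 * vector number. Returns -1 if the address is not the start of a stub,
 * for example when the guest issued VMCALL for another reason.
 */
int
vector_from_exit(uint32_t stub_base, uint32_t ip)
{
    if (ip < stub_base || ip >= stub_base + NUM_VECTORS * STUB_SIZE)
        return -1;

    uint32_t rel = ip - stub_base;
    if (rel % STUB_SIZE != 0)
        return -1;
    return (int)(rel / STUB_SIZE);
}
```

After emulating the call, the hypervisor would also have to advance the guest instruction pointer past the 3-byte VMCALL so that STI and RETF 2 run next.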
24. How to handle BIOS interrupt call in BHyVe
[Diagram: flow of a BIOS call under BHyVe]
Guest: int 13h → software interrupt (INTx) → CPU reads interrupt vector → executes interrupt call handler → issues VMCALL instruction → VMExit by VMCALL
Hypervisor: BIOS emulation (doscmd code emulates the BIOS call)
25. Why don’t you trap
interrupt directly?
• Intel VT-x can trap interrupts directly (no need to issue a VMCALL instruction in the interrupt handler)
• Why shouldn’t we use it for BIOS emulation?
Because the guest OS may reuse the BIOS interrupt call vector numbers for different software interrupts after entering protected mode
• Bootloaders may also invoke an interrupt handler by jumping to its address (btx does this)
26. Problems(1)
• doscmd is not 64-bit safe!
Some type definitions need to be rewritten
Ex: u_long → uint32_t
• doscmd maps the guest memory area at 0x0
Maybe we could also mmap the guest memory area at 0x0 on BHyVe, but I rewrote the code instead
Ex:
*(char *)(0x400) = 0;
↓
*(char *)(0x400 + guest_mem) = 0;
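The "+ guest_mem" rewrite can be wrapped in a small helper so that every access goes through one bounds-checked translation. This is a sketch of that idea only; struct guest_ram and gpa_to_hva are hypothetical names, not part of doscmd or BHyVe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host mapping of guest memory: guest address 0 lives at `base`. */
struct guest_ram {
    uint8_t *base;
    size_t   size;
};

/*
 * Translate a guest physical address into a host pointer.
 * Returns NULL if [gpa, gpa + len) does not fit inside guest RAM.
 */
void *
gpa_to_hva(const struct guest_ram *ram, uint64_t gpa, size_t len)
{
    assert(ram != NULL && ram->base != NULL);

    if (gpa > ram->size || len > ram->size - gpa)
        return NULL;
    return ram->base + gpa;
}
```

With it, the example above would read `*(char *)gpa_to_hva(&ram, 0x400, 1) = 0;` after checking the result for NULL.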
27. Problems(2)
• Guest register storage
doscmd keeps register values in its own structure, but BHyVe requires an ioctl to get/set each guest register
I decided to copy all registers first, emulate the BIOS interrupt call, then write back the modified registers
28. Debugging BIOS
emulator
• When I started implementing BIOS emulation, I inserted a register dump at each BIOS interrupt call
• In practice, a dump per BIOS interrupt call is too coarse to determine what’s going on
• The emulation didn’t work properly: the guest eventually jumped to a strange EIP and died, and I had no idea why
• I haven’t found a way to run BHyVe on an emulator and get an instruction-level trace
• BHyVe can run on VMware, but I haven’t found a way to trace it there
• So I decided to implement an instruction-level tracer in BHyVe
29. Implement instruction
level tracer on BHyVe(1)
• If the guest CPU is emulated, dumping each instruction is very easy
Just dump everything whenever the instruction decoder is called
• But on BHyVe the guest program runs natively, because it uses VT-x
• This means you have no way to inspect instructions or dump registers until a VMExit occurs
• So we can raise an exception on every instruction
• You could insert instructions that raise an exception, but x86 has a flag for single-step debugging (the TF bit in EFLAGS)
30. Implement instruction
level tracer on BHyVe(2)
• At first, I implemented the following:
• Set the TF bit in EFLAGS and enable VMExit on #DB exceptions
• bhyve handles the #DB exception, disassembles the instruction at EIP, steps EIP forward, and does VMEnter again
• I suddenly realized the VMExit occurs BEFORE the instruction executes! USELESS!!
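For reference, the two settings used in this first attempt are plain bit operations. TF is bit 8 of EFLAGS, #DB is exception vector 1, and the VT-x exception bitmap is a 32-bit field in which a set bit n requests a VMExit for exception vector n. The helper names below are made up, and the sketch only computes the values; writing them into the guest state is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF  (1u << 8)    /* trap flag: single-step after each insn */
#define VECTOR_DB  1u           /* debug exception (#DB) vector number */

/* Return `eflags` with single-stepping turned on. */
uint32_t
eflags_enable_single_step(uint32_t eflags)
{
    return eflags | EFLAGS_TF;
}

/* Return `bitmap` with a VMExit requested for exception `vector`. */
uint32_t
exception_bitmap_trap(uint32_t bitmap, unsigned vector)
{
    assert(vector < 32);
    return bitmap | (1u << vector);
}
```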
31. Implement instruction
level tracer on BHyVe(3)
• I changed my mind and handled it the same way as a BIOS interrupt call (the interrupt handler issues a VMCALL instruction → VMExit)
• EIP and some other registers have been pushed on the stack, because the handler has not returned yet
They need to be fetched from the stack to dump them
• OLD_EIP = *(uint16_t *)(ESP)
• OLD_CS = *(uint16_t *)(ESP + 2)
• OLD_EFLAGS = *(uint16_t *)(ESP + 4)
• OLD_ESP = ESP + 6 (a real-mode interrupt pushes only FLAGS, CS, and IP)
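A sketch of fetching the saved state in real mode follows. It assumes the handler has not touched the stack yet, so SS:SP still points at the frame pushed by the interrupt: IP at offset 0, CS at offset 2, FLAGS at offset 4, 16 bits each. Because a real-mode interrupt pushes only those three words, the sketch computes the pre-interrupt SP as SP + 6 instead of loading it from the stack. The struct and function names are hypothetical, and SP wrap-around within the segment is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State saved by a real-mode interrupt, plus the SP value before it. */
struct rm_int_frame {
    uint16_t ip;
    uint16_t cs;
    uint16_t flags;
    uint16_t sp_before;
};

/* Load a little-endian 16-bit value. */
static uint16_t
load16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Read the interrupt frame at SS:SP from the host copy of guest memory.
 * Returns 0 on success, -1 if the frame lies outside guest memory.
 */
int
read_rm_int_frame(const uint8_t *guest_mem, size_t mem_size,
    uint16_t ss, uint16_t sp, struct rm_int_frame *out)
{
    assert(guest_mem != NULL && out != NULL);

    size_t base = ((size_t)ss << 4) + sp;   /* real mode: seg * 16 + off */
    if (base + 6 > mem_size)
        return -1;

    out->ip        = load16(guest_mem + base);
    out->cs        = load16(guest_mem + base + 2);
    out->flags     = load16(guest_mem + base + 4);
    out->sp_before = (uint16_t)(sp + 6);    /* nothing else was pushed */
    return 0;
}
```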
33. Tracing suddenly stops!
(1)
• The TF bit in EFLAGS can be cleared under some conditions
• popf clears the flag:
A #DB exception is still raised immediately after the popf instruction, so setting the TF bit in OLD_EFLAGS (on the stack) solves the issue
(The guest machine restores EFLAGS with IRET)
34. Tracing suddenly stops!
(2)
• The TF bit in EFLAGS can be cleared under some conditions
• BIOS interrupt call VMExit:
It looks like the CPU clears the TF flag when it takes an interrupt
doscmd uses the following handler for BIOS interrupt calls:
VMCALL; STI; RETF 2
RETF 2 returns without restoring EFLAGS (it pops IP and CS, then discards the saved FLAGS word), so changing OLD_EFLAGS (on the stack) has no effect
Setting the TF bit directly in EFLAGS solves the issue
• But we must not set the TF bit in EFLAGS when the interrupt is a #DB exception
That causes an infinite loop
35. Tracing suddenly stops!
(3)
• lidt is executed just before switching to protected mode
• After IDTR changes, the #DB exception can no longer be handled
• Because the #DB handler is installed only in the real-mode interrupt vector, not in the IDT
• Modified the IDT and implemented a #DB handler in btx
• No #DB exception was raised in real mode after the lidt instruction
• Probably because the protected-mode IDT is not valid in real mode
• After switching to protected mode, tracing could be resumed by setting the TF flag in EFLAGS
36. Exception causes
exception
• Not really sure, but it looks like an exception is raised inside an exception handler
• Because of this, it can’t print the error on the console
• Inserted VMCALL at the beginning of the exception handler to dump everything
40. Conclusion
• A test implementation of a BIOS emulator for BHyVe has been implemented
• An instruction-level tracer was implemented on it for debugging
• It reaches the /boot/loader stage, but dies before loading a kernel
• Advice from bootloader developers is really needed
• Advice on better debugging methods is also needed
(Is there a hardware debugger for x86?
Or maybe VMware has a cool debugging feature?)