VM Forking & Hypervisor-based Fuzzing with Xen
Open Source Summit Europe 2020
Tamas K Lengyel
Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel
disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a
particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage
in trade.
The products described may contain design defects or errors known as errata which may cause the product to deviate from
published specifications. Current characterized errata are available on request.
You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel
products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted
which includes subject matter disclosed herein.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
No product or component can be absolutely secure.
Your costs and results may vary.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names
and brands may be claimed as the property of others.
# whoami
• Senior Security Researcher @ Intel
• Maintainer of Xen’s introspection subsystem
• Maintainer of LibVMI
• Hypervisor-agnostic introspection library (Xen, KVM, etc)
• Lots of convenient APIs for introspection
• Background in malware research & black-box binary analysis
Outline
1. Intro & Motivation
2. VM introspection
3. VM forking nuts & bolts
4. Fuzzing on Xen
• Harnessing & coverage tracing
• VMs with PCI-passthrough (IOMMU) devices
• Doublefetch detection
Motivation
• Time-tested approach to software validation
• Conceptually straightforward
• In practice can be difficult depending on what you want to fuzz
• How do you create a coverage trace for the kernel?
• How do you recover fast enough for fuzzing to be effective?
• How do you ensure the system is in the proper state?
• How do you fuzz kernel-internal interfaces?
• How do you detect more than just “crashes”?
Kernel fuzzers do exist
• syzkaller
• Linux syscall fuzzer with built-in coverage guidance
• kAFL
• KVM-based, using AFL; coverage via Intel PT & PML
• Chocolate milk
• Custom bootloader & hypervisor, all in Rust
Why make another one?
• These platforms are tightly coupled to their use-case
• We wanted something stable but also flexible to build on
• Preferring code that’s upstream to cut down on the time it takes to
maintain custom patches & debug things when they break
• Xen’s VMI subsystem is still experimental but fits the bill
• Also allows us to consider new types of fuzzing approaches
• Also allows us to target new use-cases
VM introspection
• Inspect VM internals from an external perspective
• Very similar to kernel debugging & memory forensics
• We can pause the VM at any event that traps to the VMM
• EPT faults
• Breakpoints
• CPUID
• Singlestep (MTF)
• Can do it both with in-guest help or without
Why VM forking?
• We need a way to restore VMs to a start point quickly after each
fuzz cycle
• Restoring from a save-file can take up to 2s
• Even from a fast SSD or tmpfs
• For fuzzing to be effective we need to be faster than that
• Xen has a long-forgotten, half abandoned subsystem:
• Memory sharing!
• We can use it to create forks in a fast & lightweight manner!
VM forking overview
1. Create VM with an empty EPT (i.e. no memory)
2. Specify its parent VM
3. Copy vCPU parameters from parent
4. When VM is started it will page-fault back to Xen each time it tries
to access memory not yet mapped
5. Populate pages on-demand in the page-fault handler
• Read & execute accesses are populated with a shared entry
• Write accesses are deduplicated
VM forking details
• It’s a bit different than fork() on Linux
• The parent domain currently remains paused while forks are active
• This was fine for our use-case
• For a full domain split, all the parent pages need to be made shared
• Pages that can’t be made shared would need an extra copy
• Doable, was out-of-scope for now
• Forks can be further forked!
• Pages are searched for recursively
VM forking details
• VM forks can run with only CPU & memory
• No disk
• No networking
• No I/O
• No interrupts!
• It’s possible to launch QEMU to start backend services
• Patches implementing this are posted but not yet upstream
• Launching & resetting QEMU is slow
• Not a priority since it’s not required for fuzzing
VM forks: resetting
• No need to keep creating forks for every fuzz iteration
• We can just reset a previously forked VM
• Re-copy vCPU settings from parent
• Keep memory shared entries in place
• Future iterations will be that much faster
• Throw-away deduplicated memory
• Reset speed depends on how much memory needs to be freed here
• During fuzzing it’s usually very few pages
VM forking speed
VM fork creation time:
~745 μs ~= 1300 VM/s
VM fork reset time:
~111 μs ~= 9000 reset/s
Measured on i5-8350U
Harnessing
• Fuzzer needs to know where the target code starts & stops
• Need to manually mark it
• Harness needs to trap to the hypervisor
• Should not have side-effects
• Code needs to execute normally between start & stop harness
• Code needs to consume some input
• We need to know where the input is so we can fuzz it
Harnessing
CPUID instruction always traps to the VMM
We use a magic CPUID leaf as our mark
No side effect on target code; without
the fuzzer this is effectively a NOP
Call harness() before and after target code
Just printk the info before the first harness!
Harnessing
• Parent VM displays information about the target (the address of the
buffer we’ll fuzz) on its virtual serial console
• Parent VM traps to the VMM on CPUID
• Detect if it’s the start signal (magic value) and pause the Parent VM
• Increment RIP so the vCPU resumes just after the CPUID
Coverage tracing
• Fuzzer (AFL) needs to know when new code-paths are discovered
• By default AFL requires you to recompile your target
• Instruments each branch with hooks
• We don’t want to recompile the whole kernel
• We want to minimize the modifications we make to the target
• Just adding the calls to harness() and displaying relevant information
• During fuzzing the code runs in a VM fork & the only visibility we
have is when it traps to the VMM
Coverage tracing with VMI
• We can read & write the VM fork’s memory from the VMM!
1. Configure VM fork to trap breakpoints to the VMM
2. Read & disassemble code from start point (RIP)
3. Find next control-flow instruction
4. Replace it with breakpoint
5. Resume vCPU
6. Breakpoint traps, remove breakpoint and enable singlestep (MTF)
7. MTF traps, disable MTF, goto Step 2
• Works in nested setups as well (tested with Xen inside VMware)!
Detecting crashes
• Breakpoint the kernel’s crash handlers
• Defined as “sink” points
• Breakpoints trap to the hypervisor; if any of them executes, report a
“crash”
• Good base targets to sink:
• panic()
• oops_begin()
• page_fault() or its new name asm_exc_page_fault()
Putting it all together
1. Setup parent VM: trap on first call to harness()
2. Create first fork: breakpoint the sinks
3. Create second fork: fuzz, execute & collect coverage trace!
Parent VM -> Sink VM -> Fuzz VM
Demo:
https://youtu.be/0A4msmDx30c
Coverage tracing with Intel Processor Trace
• Disassembly, breakpointing & singlestepping are expensive
• We can go faster if the silicon collects the info for us
• Designate memory location (up to 4GB) as PT buffer
• VM forks’ execution will be recorded there
• Need to decode custom PT buffer format to reconstruct coverage
• Can be tedious, and existing decoders aren’t designed for high-speed fuzzing
• Open Source community to the rescue: https://github.com/nyx-fuzz/libxdc
• Does not work in nested setups; single address space only
AFL + PT demo
Alternative harnessing
• What if we can’t recompile our target to add the harness()?
• We can use a debugger to add breakpoints as our harness!
• Run with GDB, set breakpoint before & after target code
• Fuzzer needs to know the original instruction before it was
breakpointed (really just the first byte)
• When breakpoint traps to the VMM, replace breakpoint with
original content
• Fuzz!
Demo:
https://youtu.be/kundkmZMbl4
PCI-passthrough devices & fuzzing
• Making sure your target code is in the right state can be difficult
• Kernel modules may only fully initialize if a physical device is present
• We can attach the device to the parent VM!
• Kernel module fully initializes & actively drives device
• Harness & fork works just the same!
• Only parent VM has access to device
• VM fork can’t corrupt device
• VM fork can’t access the device
Demo:
https://youtu.be/O6zti8V45ds
Detecting doublefetches
• We can define any condition as a “crash”
• Detecting doublefetch conditions is very difficult
• Sometimes introduced by the compiler so source review is not sufficient
• We are already hooked into the VMM pagefault handler
• We can detect doublefetches using EPT
1. Remove R/W permissions from the page suspected of being doublefetched from
2. When an access faults, record the page offset, restore permissions & singlestep
3. In the singlestep handler remove permissions & continue
4. If next access fault is at the same offset: doublefetch detected!
Demo:
https://youtu.be/O6zti8V45ds
Code released as open-source (MIT)
VM forking is upstream in Xen 4.14
Kernel Fuzzer for Xen Project (kfx):
https://github.com/intel/kernel-fuzzer-for-xen-project
Thanks!
Questions? Comments?
tamas.lengyel@intel.com
@tklengyel
Special thanks to the following people for their significant help:
Andrew Cooper, @buherator, @icedevml, @0xTony, @poeplau,
@proskurinserg, @is_eqv, @ms_s3c