QEMU is truly excellent software, from the perspective of a person who very rarely needs to emulate another architecture. It "just works" and has wonderful integrations with basically everything I could want. Sometimes it feels like magic, even if the command-line UX is a bit weird in places.
I've always wondered, though, how it works with KVM. I know KVM is a virtualisation accelerator that somehow lets native code run directly on the CPU, but it feels like QEMU/KVM basically runs the internet now. Almost the entire modern cloud is built on QEMU with KVM as the hypervisor (right?), but I feel like I'm missing a lot about how it works.
I also wonder whether this steals huge amounts of resources away from emulation, or whether it ends up helping, because to say the modern internet is largely running on QEMU is likely a massive understatement.
monocasa 2 days ago
KVM is basically three components.
* An abstraction over second-level page tables to map part of a host user process's memory as what the guest thinks of as physical memory.
* An abstraction to jump into the context that uses those page tables, and to trap back out on anything the hardware would normally handle but the hypervisor wants to handle itself instead.
* A collection of mechanisms to handle some of those traps in kernel space, avoiding a context switch back out to the host user process when the kind of trap is common enough: both in the sense that the trap happens often enough to show up on perf graphs, and in the sense that the abstraction being exercised is relatively standard (think interrupt controllers and timers).
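A toy Python model of the third point. It uses made-up exit categories and is not KVM's code or API. It replays a trace of guest exits and counts which ones could be absorbed in the kernel and which would need a round trip to the userspace VMM. With a realistic-looking mix, the frequent, standardized exits dominate. That is why handling them in the kernel pays off.

```python
from collections import Counter

# Hypothetical exit categories, for illustration only; real KVM exit reasons
# and in-kernel emulation choices are more varied than two small sets.
HANDLED_IN_KERNEL = {"timer", "interrupt_controller"}  # common, standardized
SENT_TO_USERSPACE = {"mmio", "port_io"}                # device-specific


def replay_exits(exits):
    """Replay a trace of guest exits and count where each would be handled.

    "kernel" exits are dealt with without leaving the kernel, so the guest is
    re-entered immediately; "userspace" exits need a round trip to the VMM
    process (e.g. QEMU) before the guest can continue.
    """
    stats = Counter()
    for kind in exits:
        if kind in HANDLED_IN_KERNEL:
            stats["kernel"] += 1
        elif kind in SENT_TO_USERSPACE:
            stats["userspace"] += 1
        else:
            raise ValueError(f"unknown exit kind: {kind!r}")
    return stats


# A made-up trace: lots of timer/interrupt traffic, a little device I/O.
trace = ["timer"] * 1000 + ["interrupt_controller"] * 500 + ["mmio"] * 20
print(replay_exits(trace))
```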
Let me know if you have any other questions.
eddd-ddde 2 days ago
Where could someone get started in terms of reading material to learn more about this in depth?
I would assume sooner or later you're going to end up in the Intel Developer manuals or the equivalent for whatever architecture you are interested in. The Intel ones are very complete at least.
znpy 2 days ago
> I would assume sooner or later you're going to end up in the Intel Developer manuals or the equivalent for whatever architecture you are interested in. The Intel ones are very complete at least.
I can vouch for this. I'm no virtualization expert, but I did stumble upon some Intel developer manuals (truthfully, I fell into the rabbit hole), and just skimming them made everything make much more sense.
The link above explains how the VMX extensions work on Intel processors. Any software doing hardware-assisted virtualization (so no binary translation, no full-system emulation) will likely be using those instructions.
jlokier 2 days ago
The AMD Processor Programming Reference manuals are also good for this, if you like complete and detailed. They complement the Intel manuals. Much of the material is duplicated because the processors are so similar, but it is written in a different way.
billywhizz 2 days ago
If you want to look at existing implementations on top of KVM, these might be useful. rust-vmm is a core library for AWS's Firecracker VMM.
How does nested KVM work? Are all the page tables handled by the top level? Do the traps have to propagate up?
bonzini 2 days ago
Yes, the top level uses write protection of guest memory to combine the two levels of translation into one.
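Here is a toy model, at page granularity, of what "combining two levels of translation into one" means. It is a sketch, not KVM's implementation. The class name and the page numbers are made up, and the write-protection trap is modelled as a plain method call. L0 keeps a combined table mapping L2 pages straight to host pages. Whenever L1 edits its own table, the write traps, and L0 refreshes the affected entry.

```python
class CombinedTranslation:
    """Toy model of folding two levels of page translation into one table.

    l1_map: the L1 (guest) hypervisor's mapping for its L2 guest,
            L2 guest-physical page -> L1 guest-physical page.
    l0_map: the top-level (L0) hypervisor's mapping for L1,
            L1 guest-physical page -> host-physical page.
    combined: what L0 actually installs, L2 page -> host page.
    """

    def __init__(self, l1_map, l0_map):
        self.l1_map = dict(l1_map)
        self.l0_map = dict(l0_map)
        self.combined = {}
        for l2_page, l1_page in self.l1_map.items():
            self._refresh(l2_page, l1_page)

    def _refresh(self, l2_page, l1_page):
        host_page = self.l0_map.get(l1_page)
        if host_page is None:
            self.combined.pop(l2_page, None)  # L1 points at memory L0 hasn't mapped
        else:
            self.combined[l2_page] = host_page

    def l1_writes_entry(self, l2_page, l1_page):
        # In a real system L0 write-protects the pages holding L1's tables, so
        # this write traps to L0; the trap is modelled here as a method call.
        self.l1_map[l2_page] = l1_page
        self._refresh(l2_page, l1_page)

    def translate(self, l2_page):
        return self.combined.get(l2_page)


t = CombinedTranslation(l1_map={0: 5, 1: 6}, l0_map={5: 100, 6: 101, 7: 102})
print(t.translate(0))  # 100
t.l1_writes_entry(0, 7)
print(t.translate(0))  # 102
```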
privatelypublic 2 days ago
I thought part of VT-d/VT-x made the "virtual tables" actual tables.
E.g., the memory the VM can access is controlled by the CPU's MMU (below ring 0/kernel), so the only VM escapes are through the shim(s) for talking with the host (network, memory balloon, graphics).
bonzini 2 days ago
Yes, there are virtualization-specific page tables that convert guest physical to host physical addresses. KVM still has to take host userspace's virtual addresses, convert them to host physical addresses, and make sure that the virtualization-specific page tables stay in sync with the kernel's usual page tables (which convert host virtual addresses to host physical ones).
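The chain described above can be sketched as two lookups composed together. First, guest physical maps to host virtual through the memory slots the VMM registered. Second, host virtual maps to host physical through the host's ordinary page table. The virtualization-specific page tables effectively cache the composition. This is a toy model. The addresses are made up, and the dataclass only borrows the field names of `struct kvm_userspace_memory_region`. When the host changes a mapping (e.g. swaps a page out), the cached combined entry must be invalidated. The real kernel does this via MMU notifiers, which are not modelled here.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096


@dataclass
class MemorySlot:
    # Field names borrowed from struct kvm_userspace_memory_region; toy only.
    guest_phys_addr: int  # where the slot starts in guest-physical space
    memory_size: int      # length in bytes
    userspace_addr: int   # host-virtual address of the VMM's backing memory


def gpa_to_hva(slots, gpa):
    """Guest physical -> host virtual, via the slots the VMM registered."""
    for slot in slots:
        if slot.guest_phys_addr <= gpa < slot.guest_phys_addr + slot.memory_size:
            return slot.userspace_addr + (gpa - slot.guest_phys_addr)
    return None  # no slot covers it (e.g. an MMIO hole left for the VMM)


def hva_to_hpa(host_page_table, hva):
    """Host virtual -> host physical, via a toy host page table {vpn: pfn}."""
    pfn = host_page_table.get(hva // PAGE_SIZE)
    if pfn is None:
        return None  # not resident; the host kernel would have to fault it in
    return pfn * PAGE_SIZE + hva % PAGE_SIZE


def gpa_to_hpa(slots, host_page_table, gpa):
    """The composition that the virtualization-specific tables effectively cache."""
    hva = gpa_to_hva(slots, gpa)
    return None if hva is None else hva_to_hpa(host_page_table, hva)


slots = [MemorySlot(guest_phys_addr=0x0, memory_size=0x2000,
                    userspace_addr=0x7F0000000000)]
host_pt = {0x7F0000000: 0x1234, 0x7F0000001: 0x5678}
print(hex(gpa_to_hpa(slots, host_pt, 0x1010)))  # 0x5678010
```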
jamesy0ung 2 days ago
Yeah, I also found myself curious about how KVM actually works. I found these helpful:
Excellent. I haven't gone through them yet, but if you've any similar pointers for QEMU, please share.
My rough understanding is that it's the user-space emulation part of a virtualization solution. I.e., when the kernel traps the virtualized process, saying 'nope, you can't do that here', control falls back to a user-space handler in QEMU: 'hey, the kernel said I can't do that there; can you sort this out?'. This back-and-forth game keeps happening for the lifetime of the virtualized process.
dijit 2 days ago
Awesome, thanks for the entrypoint!
teekert 2 days ago
If you use it rarely, I can highly recommend the excellent QuickEMU [0]
Any VM is just a `quickget ubuntu 24.04` and `quickemu --vm ubuntu-24.04.conf` away. The conf file is plain text and very readable, and it lets you give the VM more cores/RAM/disk easily. Just run `quickget` to get a list of OSes to download.
Looks like it does :) Maybe take it up with Martin Wimpress, who probably meant it as a way to jar us all just a tad (knowing him from podcasts he probably has a witty and funny response to such inquiries).
stinkbeetle 2 days ago
> I've always wondered though how it works with KVM
Other people have given some more comprehensive explanations, but I'll try to put it as simply as possible.
Plain QEMU has a CPU emulation layer called TCG. The machine basically consists of memory (RAM and MMIO devices) and CPUs (CPU registers and state). When QEMU has set up the machine and is ready to run, it calls TCG to say "given this memory and this initial CPU register state, start running instructions". When you use QEMU with KVM, the TCG emulation layer is swapped out for KVM, and QEMU asks KVM to start running instructions instead. That's it. KVM exposes APIs with which the caller can specify guest memory and initial CPU register state, plus a call to run that CPU with that memory.
Going a bit further, the hardware virtualization features that KVM uses can map that memory through a second level of translation. This lets KVM present the memory to the guest at the locations the guest expects, and prevent the guest from accessing any memory it should not. The hardware can also run the CPU in a mode where it has the normal set of registers (which is what QEMU wants) while maintaining some additional hypervisor control registers that are not available to the guest. Those ensure the guest can't take complete control of the CPU. For example, the guest OS can "disable interrupts" with the usual MSR or similar bit, and that does prevent the guest from receiving interrupts. It does not disable hypervisor-directed interrupts, though, so the hypervisor can always take back control of the CPU with a hypervisor IPI or hypervisor timer interrupt.
Further still: when running in plain QEMU mode, devices are emulated by registering MMIO ranges in the memory address space. Emulated loads and stores have code to detect these regions. Instead of performing a simple load or store, they call into device model code, which handles the access accordingly. When you plug KVM in, you can still use these emulated devices. They are modeled by using the second-level page table to put "not-valid" mappings in those MMIO ranges, which cause the CPU to fault when the guest tries to access them. KVM sees the fault, looks up the table of memory registered by QEMU, and sees that it is an address QEMU wants to handle. It therefore returns from the KVM_RUN ioctl with an exit code indicating an MMIO read/write that needs to be handled. QEMU then directs this into its emulated device model, and when it has performed that device emulation, it calls back into KVM to continue running the CPU.
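A toy model of the userspace half of that loop. It does not open /dev/kvm and is not QEMU's code. `MmioExit`, `DebugUart`, `Machine` and the address 0x10000000 are all made up for illustration. In the real API, the exit details arrive in the mmap'd `struct kvm_run`, with `exit_reason` set to `KVM_EXIT_MMIO`.

```python
from dataclasses import dataclass


@dataclass
class MmioExit:
    # Loosely mirrors the MMIO exit info in struct kvm_run: address, direction, data.
    phys_addr: int
    is_write: bool
    data: int = 0


class DebugUart:
    """Hypothetical device model: a write to offset 0 emits one character."""

    def __init__(self):
        self.output = []

    def write(self, offset, value):
        if offset == 0:
            self.output.append(chr(value))

    def read(self, offset):
        return 0  # this toy device always reads as zero


class Machine:
    def __init__(self):
        self.mmio_regions = []  # (start, size, device) registered by the VMM

    def map_device(self, start, size, device):
        self.mmio_regions.append((start, size, device))

    def handle_mmio(self, exit_info):
        """Route one MMIO exit to the device model that owns the address."""
        for start, size, device in self.mmio_regions:
            if start <= exit_info.phys_addr < start + size:
                offset = exit_info.phys_addr - start
                if exit_info.is_write:
                    device.write(offset, exit_info.data)
                    return None
                # For a read, a real VMM copies this value back into the
                # shared run structure before re-entering the guest.
                return device.read(offset)
        raise RuntimeError(f"no device at {exit_info.phys_addr:#x}")

    def run(self, exits):
        """Stand-in for the loop around KVM_RUN: each item is one MMIO exit."""
        results = []
        for exit_info in exits:
            results.append(self.handle_mmio(exit_info))
            # ...a real loop would now call KVM_RUN again to resume the vCPU.
        return results


machine = Machine()
uart = DebugUart()
machine.map_device(0x10000000, 0x100, uart)
machine.run([MmioExit(0x10000000, True, ord(c)) for c in "hi"])
print("".join(uart.output))  # hi
```

The device lookup is a linear scan for clarity. A real VMM keeps its address space in a faster structure, but the flow is the same: exit, dispatch to the device model, re-enter.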
It's all pretty clever. The really astounding thing is that most of the basic concepts for all this stuff were developed/discovered/invented like 50+ years ago.
egberts1 1 day ago
And there are other emulated accelerators that QEMU leverages:
The only comment that directly answers the original doubt about how QEMU can use and work with KVM. Hats off.
pm215 2 days ago
Resources-wise there's not really any "stealing" going on. The people/companies who care about KVM and the virtualization use cases work on that, and the people/companies who care about emulation work on those parts. If QEMU didn't support virtualization then it's not like the people currently working on QEMU virtualization would shift over to emulation support: they'd be working on some other project instead to achieve their VM goals.
dizhn 2 days ago
qemu/kvm enabling the cloud is huge, but that's not the only place it makes a tremendous difference. One example where it's essential is new OS development. They basically all target the qemu machine with its virtual hardware first. It makes development much faster compared to running on real hardware, and it easily enables debug output without needing cables and the like.
monocasa 1 day ago
Eh, we just used stuff like bochs and vmware prior.
dizhn 1 day ago
I didn't mean qemu is the only option.
monocasa 1 day ago
My point is that the appearance of qemu/kvm didn't really change the space much in practice.
dizhn 1 day ago
Oh, I understand what you're saying. You're probably right. Collectively, virtualization enables a lot, but qemu might not be as exclusive as I said. (I think there are only a few years between bochs/vmware and qemu/xen.)
EDIT: I didn't mean to sound like ChatGPT. It happened naturally :)
lathiat 2 days ago
Not everything uses qemu. Some do. More use KVM. Not everything does.
I've found QEMU's microvm to be faster at boot, with nicer tooling and a cleaner upgrade path if you need more features. Aside from hype, I'm actually not sure why anyone would still use Firecracker.
monocasa 1 day ago
Mainly because of the much larger attack surface of QEMU.
xyse53 14 hours ago
I can't quantify how much of that surface is also reduced with the microvm machine vs other parts of QEMU vs Firecracker... But fair enough point.
hnlmorg 2 days ago
Xen is still used massively too.
bonzini 2 days ago
AWS runs Xen guests through a KVM-based compatibility layer. You can try it with QEMU too.
drzaiusx11 2 days ago
> Experimental support for compiling to WASM using Emscripten.
Neat. This will unlock various online "playgrounds" for a number of CPU architectures, among other interesting use cases.
Likely this was possible beforehand, but it's nice to see it added as a feature to the project directly.
jononor 1 day ago
Could be cool for an interactive microcontroller/embedded simulator in the browser.
dlachausse 2 days ago
If you’re on macOS or iOS, UTM is an excellent Qemu front end.
QEMU is a Software Freedom Conservancy member project like Git, OpenWRT, and many others. You can donate through the Conservancy link you posted and mention which project you wish to support.
stronglikedan 2 days ago
are you by any chance a checker at a grocery store?
tracker1 2 days ago
I'm curious whether QEMU will ever support features like x86(_64) emulation-assist hardware paths on Arm and RISC-V... Since most of the patents have now expired, it makes a lot of sense. Apple seems to be further along here than other competitors, but that seems to be limited to Rosetta rather than broadly supported.
bonzini 2 days ago
If the kernel makes it available to userspace, then it could be done. I am not aware of any specific emulation-friendly extensions to RISC-V though.
tracker1 2 days ago
Me either, but I'm assuming they'll start to become more common. Possibly the inverse as well, ARM cores on x86 even.
xhrpost 2 days ago
Didn't realize there was a MIPS build of Windows NT. Which led me to wikipedia to find there were a lot of other architectures supported in the past.
cbm-vic-20 2 days ago
In my last job, I was on the team that handled the Windows NT build on DEC Alpha. Native Alpha apps were much faster than on equivalent Intel NT machines. Apropos of this topic, DEC had a subsystem called FX!32 that was sort of like what Rosetta does for Apple Silicon, allowing Intel apps to run at usable speeds on Alpha.
If I still had my UltraSPARC, I'd be wanting to find the SPARC port of Windows NT.
ducktective 2 days ago
Awesome tech!
It's not possible to run an android VM on QEMU right? As in, is it officially supported? (I know about Waydroid)
epilys 2 days ago
Yes, it's possible and supported. QEMU can emulate an aarch64 system, and Google provides aarch64 Android builds for virtual machines specifically, called "Cuttlefish". Search for keywords "Android Cuttlefish QEMU" for instructions.
homebrewer 2 days ago
The official Android "emulator" supplied by Google is qemu. If you're not satisfied with it for some reason, IIRC I used these images some years ago on top of vanilla qemu:
They don't seem to be well supported anymore, and there aren't many prebuilt alternatives. One can always compile AOSP from source, though Google does not make this easy.
acuozzo 2 days ago
> The official Android "emulator" supplied by Google is qemu
Nitpick: It's a fork of QEMU. There are quite a few Google-exclusive changes bundled-in.
mrlonglong 22 hours ago
Fabrice Bellard is a genius.
simonebrunozzi 2 days ago
Curious: how do people use Qemu the most these days? Dev environment? Running specific apps on a different OS? I don't know... gaming?
mdaniel 2 days ago
I'd suspect a great deal of people are secretly benefiting from qemu when they do $(docker build --platform linux/{arm64,amd64}) courtesy of binfmt_misc and a static copy of qemu
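Roughly, the binfmt_misc part works like this. The kernel compares the first bytes of an executable against a registered magic/mask pair. On a match, it runs the registered interpreter (here a static qemu-user binary), passing it the program's path. Below is a simplified Python sketch of just the matching step. The interpreter path is illustrative, and the small set of checked bytes is a simplification: real qemu registrations use longer magic strings.

```python
import struct

EM_X86_64 = 62    # ELF e_machine value for x86-64
EM_AARCH64 = 183  # ELF e_machine value for AArch64


def elf_rule(e_machine):
    """Build a (magic, mask) pair matching a 64-bit little-endian ELF for e_machine.

    Only the ELF signature, class, data encoding and e_machine are compared;
    all other bytes are masked out. Real registrations check more fields.
    """
    magic = bytearray(20)
    mask = bytearray(20)
    magic[0:4] = b"\x7fELF"
    mask[0:4] = b"\xff" * 4
    magic[4], mask[4] = 2, 0xFF  # EI_CLASS: ELFCLASS64
    magic[5], mask[5] = 1, 0xFF  # EI_DATA: little-endian
    magic[18:20] = struct.pack("<H", e_machine)
    mask[18:20] = b"\xff\xff"
    return bytes(magic), bytes(mask)


def matches(header, magic, mask):
    """binfmt_misc-style test: (file byte & mask) must equal (magic & mask)."""
    if len(header) < len(magic):
        return False
    return all((h & m) == (g & m) for h, g, m in zip(header, magic, mask))


# Hypothetical registration table on an x86-64 host; the path is illustrative.
REGISTERED = {"/usr/bin/qemu-aarch64-static": elf_rule(EM_AARCH64)}


def pick_interpreter(header):
    for interpreter, (magic, mask) in REGISTERED.items():
        if matches(header, magic, mask):
            return interpreter
    return None  # no rule matched: falls through to the kernel's other handlers


def fake_elf_header(e_machine):
    """Minimal 64-byte ELF64 header with only the fields the rule inspects filled in."""
    hdr = bytearray(64)
    hdr[0:4] = b"\x7fELF"
    hdr[4], hdr[5], hdr[6] = 2, 1, 1  # class, data encoding, version
    struct.pack_into("<HH", hdr, 16, 2, e_machine)  # e_type=ET_EXEC, e_machine
    return bytes(hdr)


print(pick_interpreter(fake_elf_header(EM_AARCH64)))  # /usr/bin/qemu-aarch64-static
print(pick_interpreter(fake_elf_header(EM_X86_64)))   # None
```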
and let me tell you from first-hand experience, that trying to swap in an updated version of the bundled qemu binary when the static version panics on some mis-emulated instruction is some whooooooo, boy
mdaniel 2 days ago
Then again, everything in buildkit is designed for maximum opacity, in my experience so I guess it tracks
nativeit 2 days ago
Mine is a rather prosaic example, but I'm sure it's not uncommon: Proxmox on leased bare-metal servers makes for wonderful, cheap dev hosting (small scale, but impressively equipped at ~$100/mo).
If you find yourself limited by the equivalent VPS expense, I discovered that for my use-case (mixed web hosting, dev services, self-hosting) I could squeeze a lot more out of an entry level bare-metal box with ~48GB of RAM, and everything just becomes a VM in Proxmox, and it's still trivially simple to scale/replicate, maintain backups, and tie together with other VPS or cloud services.
The only part that was a bit of a challenge is negotiating NAT for the virtual NICs so you don't need separate IPv4 addresses for each guest. But Proxmox's docs are pretty robust, and I'm sure there are dozens of tuts available now.
For example: https://www.intel.com/content/dam/www/public/us/en/documents... - "CHAPTER 23 INTRODUCTION TO VIRTUAL MACHINE EXTENSIONS"
https://github.com/rust-vmm/kvm https://github.com/kvmtool/kvmtool https://github.com/sysprog21/kvm-host
https://www.kernel.org/doc/ols/2007/ols2007v1-pages-225-230.... http://www.haifux.org/lectures/312/High-Level%20Introduction... https://zserge.com/posts/kvm/
[0] https://github.com/quickemu-project/quickemu
https://wiki.gentoo.org/wiki/QEMU#Introduction
Example: https://firecracker-microvm.github.io/
https://getutm.app/
https://mac.getutm.app/
https://en.wikipedia.org/wiki/FX!32
https://www.fosshub.com/Android-x86.html
- https://github.com/moby/buildkit/blob/v0.23.2/docs/multi-pla...
- https://github.com/moby/buildkit/blob/v0.23.2/Dockerfile#L16...
- https://github.com/tonistiigi/binfmt/blob/buildkit/v9.2.2-54...
- https://github.com/tonistiigi/binfmt/blob/buildkit/v9.2.2-54... and https://github.com/tonistiigi/binfmt/blob/buildkit/v9.2.2-54...