Performance: Hyper-V vs. ESXi vs. KVM vs. VirtualBox

Posted on May 1st, 2012 by David Luhman

First, let me say that these results should be taken with a grain of salt. These are rather informal tests, and are geared to my environment and my way of working.

Still, I tried to be somewhat rigorous by using the excellent Phoronix Test Suite, and keeping conditions as uniform as possible across the various virtualization environments.

Testing with multiple VMs running concurrently

Initially, I wanted to test under three conditions:

* One VM running alone within the host
* Two VMs running concurrently within the host
* Three VMs running concurrently within the host

My HP MicroServer has one CPU with two cores, so I wanted to see if I would start getting contention issues when the number of VMs exceeded the number of cores.

I found that, at least for the CPU-intensive tests, running two VMs on the host was fine: performance "scores" (e.g., MFLOPS) were about the same with one or two VMs.

I tested only two environments with three concurrent VMs: Windows Server 2008 R2 Hyper-V and VirtualBox. I had limited on-server storage, and ESXi seemed to limit my ability to micromanage it, so I didn't have enough space to create three VMs under ESXi.

The three-VM results under Hyper-V were what I expected. For example, here are the LAME MP3 encoding times (in seconds, so lower is better) with one, two, and three concurrent VMs:

Configuration               Time (s)
HyperV-OneVM                51
HyperV-TwoVM-Dev            51
HyperV-TwoVM-Stage          51
HyperV-ThreeVM-Dev          73
HyperV-ThreeVM-Stage        71
HyperV-ThreeVM-Mon          70
VirtualBox-OneVM            53
VirtualBox-ThreeVM-Dev      81
VirtualBox-ThreeVM-Stage    75
VirtualBox-ThreeVM-Mon      71

Note that under Hyper-V, MP3 encoding took 51 seconds with one or two concurrent VMs, but rose to about 71 seconds with three.

VirtualBox showed the same pattern going from one to three VMs, but was slightly slower than Hyper-V in both cases.
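As a back-of-the-envelope check on the contention question, the Python sketch below compares the mean three-VM time with the one-VM time. The 1.5x "ideal sharing" figure is an assumption, not a measurement: it assumes each VM runs one single-threaded, CPU-bound encode and that three such jobs split the MicroServer's two cores evenly.

```python
# LAME MP3 encoding times in seconds, copied from the table above.
ONE_VM = {"Hyper-V": 51, "VirtualBox": 53}
THREE_VMS = {
    "Hyper-V": [73, 71, 70],
    "VirtualBox": [81, 75, 71],
}

# Assumption: each VM runs one single-threaded, CPU-bound encode, so three
# VMs sharing two cores each get about 2/3 of a core.
CORES = 2
VMS = 3
IDEAL_SLOWDOWN = VMS / CORES  # 1.5


def mean(values):
    return sum(values) / len(values)


def slowdown(name):
    """Ratio of the mean three-VM time to the single-VM time."""
    return mean(THREE_VMS[name]) / ONE_VM[name]


if __name__ == "__main__":
    for name in ONE_VM:
        print(f"{name}: {slowdown(name):.2f}x slower "
              f"(ideal sharing would give {IDEAL_SLOWDOWN:.1f}x)")
```

Both measured slowdowns come out at about 1.4x, a little under that 1.5x bound.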

Little variation in CPU-intensive tasks

Most of my testing involved one or two VMs running concurrently. In these tests, CPU-intensive tasks showed little variation across the four virtualization technologies.

For example, here are results for LAME MP3 encoding when running one or two VMs concurrently. Time is in seconds, so lower numbers are better.

Configuration               Time (s)
HyperV-OneVM                51
HyperV-TwoVM-Dev            51
HyperV-TwoVM-Stage          51
ESXi-OneVM                  48
ESXi-TwoVM-Dev              48
ESXi-TwoVM-Stage            48
KVM-OneVM-Dev               58
KVM-TwoVM-Dev               67
KVM-TwoVM-Stage             50
VirtualBox-OneVM-Dev        53

Note that on this test ESXi was a bit faster than Hyper-V, which was a bit faster than VirtualBox, with KVM having the worst times on average.

This is only one test, but the pattern held for most of the CPU-intensive tests: KVM was typically the slowest, by a modest margin (maybe 10 percent behind the best). There were exceptions; in some tests, for example, Hyper-V had the best performance.
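To put numbers on "a bit better," here is a small Python sketch that averages each hypervisor's runs from the LAME table and reports how far each mean is behind the fastest. Pooling the one-VM and two-VM runs into a single mean is a simplifying choice made here, not something the test suite does. For this one test, KVM's mean comes out roughly 20 percent behind ESXi, largely because of the 67-second KVM-TwoVM-Dev run. The 10 percent figure is a rough read across all the CPU-intensive tests, not just this one.

```python
# LAME MP3 encoding times in seconds (one- and two-VM runs) from the table above.
TIMES = {
    "Hyper-V": [51, 51, 51],
    "ESXi": [48, 48, 48],
    "KVM": [58, 67, 50],
    "VirtualBox": [53],
}


def mean(values):
    return sum(values) / len(values)


def percent_slower_than_best(times):
    """Map each hypervisor to how much slower (in %) its mean is than the fastest mean."""
    means = {name: mean(v) for name, v in times.items()}
    best = min(means.values())  # lower time is better
    return {name: 100.0 * (m - best) / best for name, m in means.items()}


if __name__ == "__main__":
    result = percent_slower_than_best(TIMES)
    for name, pct in sorted(result.items(), key=lambda kv: kv[1]):
        print(f"{name:<10} {pct:5.1f}% slower than the fastest")
```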

Dramatic change in I/O intensive tests

I ran a suite of 26 tests in the Phoronix Test Suite. Judging by what they exercise, most were CPU-intensive, some were a mixture (7-Zip or BZIP compression, PostgreSQL transactions per second), and some were I/O-intensive.

The I/O-intensive tests showed the greatest variation.

For example, here are some results from a "Threaded I/O Write" test. Numbers are in MB/s, so higher is better:

Configuration               Write (MB/s)
HyperV-OneVM                306
HyperV-TwoVM-Dev            228
HyperV-TwoVM-Stage          215
ESXi-OneVM                  11
ESXi-TwoVM-Dev              56
ESXi-TwoVM-Stage            41
KVM-OneVM-Dev               9
KVM-TwoVM-Dev               13
KVM-TwoVM-Stage             9
VirtualBox-OneVM-Dev        73

Note the wide variation in these numbers. In all cases I tried to create thinly provisioned volumes, which may explain some of it: on a thin volume, the first writes of a test may have to allocate new disk space, an overhead that later writes to the same space don't pay.

However, the Phoronix Test Suite has built-in guards that run a test more times when it sees variation in the results, so I'm not sure this overhead had much of an effect.
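To make the thin-volume argument concrete, here is a toy Python model in which a block costs extra the first time it is written. It is not a measurement and not how any of these hypervisors actually implements thin provisioning. `ThinDisk`, `run_test`, and both cost values are hypothetical. Assuming repeated runs rewrite the same blocks, only the first pass pays the allocation cost, so extra runs dilute that overhead in an average without eliminating it.

```python
from dataclasses import dataclass, field


@dataclass
class ThinDisk:
    """Toy model of a thinly provisioned virtual disk.

    A block is backed by real storage only on its first write. Costs are
    hypothetical units, not measured values.
    """
    write_cost: float = 1.0      # hypothetical cost of writing one block
    allocate_cost: float = 4.0   # hypothetical extra cost of allocating a block
    allocated: set = field(default_factory=set)

    def write(self, block: int) -> float:
        cost = self.write_cost
        if block not in self.allocated:
            # First touch: the host has to grow the image to back this block.
            self.allocated.add(block)
            cost += self.allocate_cost
        return cost


def run_test(disk: ThinDisk, blocks: range) -> float:
    """Total cost of one benchmark pass that writes every block once."""
    return sum(disk.write(b) for b in blocks)


if __name__ == "__main__":
    disk = ThinDisk()
    costs = [run_test(disk, range(1000)) for _ in range(3)]
    print(costs)  # only the first pass pays the allocation overhead
```

With the default costs, the three passes cost 5000, 1000, and 1000 units.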

For Hyper-V, KVM, and VirtualBox, I placed the concurrent VMs' disks on the same physical disk, so I expected, and saw, slowdowns in the concurrent tests.

As I mentioned, I wasn't sure how ESXi actually allocated storage, but I believed it was using RAID 0 in my setup, so I expected it to deliver better I/O performance. That wasn't the case.

Despite the wide variation in I/O results, Hyper-V seemed to have the best results, followed by VirtualBox, then ESXi, with KVM last.
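Averaging each hypervisor's Threaded I/O Write runs reproduces that ranking, as the Python sketch below shows. It is a crude summary. The means mix one-VM and two-VM runs, and VirtualBox has only a single run. The best/worst ratio is included only as a rough indicator of run-to-run spread.

```python
# Threaded I/O Write results in MB/s (higher is better), from the table above.
WRITE_MBPS = {
    "Hyper-V": [306, 228, 215],
    "ESXi": [11, 56, 41],
    "KVM": [9, 13, 9],
    "VirtualBox": [73],
}


def rank_by_mean(results):
    """Return (name, mean) pairs sorted from highest to lowest mean throughput."""
    means = {name: sum(v) / len(v) for name, v in results.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


def spread(values):
    """Ratio of the best run to the worst run, a crude measure of variation."""
    return max(values) / min(values)


if __name__ == "__main__":
    for name, avg in rank_by_mean(WRITE_MBPS):
        print(f"{name:<10} mean {avg:6.1f} MB/s, "
              f"best/worst ratio {spread(WRITE_MBPS[name]):.1f}")
```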

Still, I'm sure that with some tweaking you could get very different I/O performance out of any of these. Suffice it to say that if you're disappointed with your VM's performance, I/O is where you should concentrate your tuning efforts.

Comments

Disk performance

I find the difference in disk I/O performance you're seeing interesting.

What type of drives are you using and in what sort of configuration? SATA, SAS? JBOD, RAID?

What sort of guest OS were you testing with? Were you using the accelerated virtual disk and network adapters that Hyper-V, KVM, and VMware provide?

I ask primarily because the Hyper-V disk performance is so much higher than the others that I'm led to believe it's "cheating" and using a caching layer of some sort.

If Hyper-V isn't actually flushing changes to disk before letting the guest happily continue to make filesystem changes, then this could account for both the I/O and other benchmarks being higher than the competitors'.

My assumption would be that KVM using virtio drivers for disk devices with writeback caching enabled would perform as well as, if not better than, Hyper-V. Unfortunately, caching changes to a virtual disk image is asking for data integrity issues if the host shuts down improperly due to a power failure or some other reason.

Comment by Sterling Windmill (not verified) on May 13th, 2012 at 4:18 pm
Comment by Morrizor (not verified) on May 16th, 2012 at 9:44 am

Disk setup and options

@Sterling :

Thanks for the comment. As indicated, all tests wrote to a single disk, except under ESXi, which "took over" the disks (in the default configuration) and seemed to present them as JBOD (non-RAID?) storage. The disks were SATA drives.

In all cases, aside from intentionally selecting thinly provisioned disks, I took the "default" storage options offered by the storage wizard.

I'm sure there are an infinite number of tweaks one could make to improve disk performance. I wanted to see what a layman (me) would get with the default options. It's certainly possible that the virtualization layer, the OS, or the firmware is "cheating" with caching and the like.

Again, your mileage will undoubtedly vary, and disk I/O is probably the one place where you want to focus your attention once you've narrowed your virtualization choice to one or two candidates.

Comment by David Luhman on Jun 7th, 2012 at 1:09 pm

Thanks

Hi

Funny, I'm considering the same thing for my N40L: ESXi vs. Hyper-V. However, I have access to Windows Server 2012, so I'm definitely going to go with that.

It would have been cool to boot ESXi from the internal USB stick, but there's no need; I have a small SSD anyway for a quick boot into 2012 (hopefully).

Did you know you can put 16 GB of RAM in it? Be careful which kind you buy; I don't have the details, but you can probably find them.

Gus

Comment by Gus (not verified) on Aug 26th, 2012 at 11:02 am

Not the same results here

However, instead of file-based disk images, I used LVM and virtio for my virtual machines under KVM. Using image files and IDE interfaces makes for a slow virtual machine because of the added file-to-disk abstraction, and it's extremely painful when you add in thin provisioning. As others have noted, Hyper-V is no doubt using a cache to achieve the I/O you describe. In fact, I don't think a single SATA (II?) drive can sustain 300+ MB/s (big B) of writes. Are they SATA III? Even then I'd be doubtful.

Comment by Denny Hoff (not verified) on Sep 4th, 2012 at 10:08 pm

awesome tests

Good to see some testing done on all four, and the I/O results are surprising.

Comment by moustachio (not verified) on Jun 11th, 2013 at 7:10 pm