Performance overheads of virtualization

In the years since my last article on the performance overheads of virtualization, the most common criticism I have received is that it is out of date. And it is a well-placed criticism – I originally carried out that analysis over a decade ago on Core2-class hardware. Ever since the very next generation of CPUs, there have been claims that things had improved – despite the fact that I have been periodically re-testing with newer hardware and software and finding that things have not, in fact, changed all that much.

So in this article, I will move things on by a few years and publish some findings on more recent generations of CPUs, though by no means the current generation (that will be covered in the next article). The testing methodology used here is different, since I was doing the testing for a client project at the time. On the positive side, this means the testing is based on a real production workload (MySQL performance), so if anything the findings should be more relevant and less synthetic than a kernel compilation workload.

Measuring Virtualization Performance Overheads – Test Setup and Methodology

Data: A copy of the optimally indexed and optimised production database. Data was copied using tar – no data import/export was carried out – to ensure the data was identical to the production environment, complete with its distribution of partially empty data pages. The test replays a day’s worth of SELECT queries captured via the MySQL general log, in parallel, using twice as many threads as there are CPUs in the host system, to simulate full saturation and reduce the effect of possible I/O bottlenecks (a minimal sketch of a comparable replay harness follows below). The data was replayed twice – once to warm up the buffer pool, and again for the measurement run.
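To make the replay step concrete, below is a minimal sketch of such a harness in Python. It is illustrative only – the file name, connection details, and the PyMySQL driver are assumptions made for the sketch, not the exact tooling used in the test – but it shows the essential shape: read the extracted queries and replay them across twice as many worker threads as there are CPUs, timing the run.

# Minimal parallel replay sketch. Assumes the day's SELECT queries have
# already been extracted from the general log into queries.sql, one per line.
import os
import time
from concurrent.futures import ThreadPoolExecutor

import pymysql  # assumed driver; any MySQL client library would do

def run_query(sql):
    # One connection per query keeps the sketch simple; a real harness
    # would reuse a pool of persistent connections.
    conn = pymysql.connect(host="127.0.0.1", user="bench",
                           password="bench", database="production_copy")
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            cur.fetchall()  # drain the result set
    finally:
        conn.close()

def replay(logfile="queries.sql"):
    with open(logfile) as f:
        queries = [line.strip() for line in f if line.strip()]
    threads = (os.cpu_count() or 1) * 2  # 2x CPUs to saturate the host
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(run_query, queries))  # list() surfaces any exceptions
    print(f"replay took {time.monotonic() - start:.0f}s")

if __name__ == "__main__":
    replay()  # run once to warm the buffer pool, then again to measure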

MySQL: Percona Server 5.5.37 on both servers, running on CentOS 6. (Yes, ancient, but these notes are from testing carried out in 2014.)

Cores/Threads: 4 cores, 4 threads
The CPUs in the cloud provider’s servers appeared to be Xeon E7-8837s, which do not support HyperThreading. To provide as close to a like-for-like comparison as possible, the bare metal server was also tested with HT disabled (a quick way to verify HT status is sketched below). There are some unavoidable system discrepancies between the two servers, but the results will be normalised to compensate for this as much as possible.
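As a quick sanity check that HyperThreading really is off, the CPU topology that Linux exposes under /sys can be inspected directly. The snippet below is an illustrative check along those lines, not part of the benchmark itself: if any logical CPU lists a sibling other than itself, HT is still active.

# Check whether any logical CPU shares a physical core with another one.
import glob

def ht_enabled():
    paths = glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list")
    for path in paths:
        with open(path) as f:
            siblings = f.read().strip()
        # A lone CPU id (e.g. "3") means no sibling; "3,7" or "3-4" means
        # two hardware threads share the physical core.
        if "," in siblings or "-" in siblings:
            return True
    return False

print("HyperThreading appears", "enabled" if ht_enabled() else "disabled")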

Servers

Bare Metal Test Server:
1x Xeon L5520 @ 2.27GHz (2009)
Cores: 4
Threads: 8
Cache: 8MB
Memory Channels: 3
Replay time (4 cores / 8 threads): 115m25s
Replay time (4 cores / 4 threads): 128m43s
The results on the bare metal server show HyperThreading providing approximately an 11% throughput boost (128m43s vs 115m25s).

Cloud provider’s Virtual Test Server:
1x Xeon E7-8837 @ 2.67GHz (2011)
Cores: 8
Threads: 8
Cache: 24MB
Memory Channels: 4
Replay time (8 cores / 8 threads): 105m13s
Replay time (4 cores / 4 threads): 194m14s

Equalisation adjustments:
Bare metal (BM): 2.27GHz
Virtual machine (VM): 2.67GHz (1.1762x faster clock)
The performance boost provided by the extra memory channel is not easy to quantify, so we will ignore this for now, and let it favour the VM setup.

Adjusted performance:
BM: 6,925s x 1.0000 = 6,925 seconds
VM: 11,654s x 1.1762 = 13,708 seconds

VM is 1.979x slower than bare metal per core-GHz.
VM has 50.52% of performance of bare metal per core-GHz.
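The normalisation itself reduces to a couple of lines of arithmetic; the following reproduces the figures above from the raw replay times:

# Reproduce the clock-speed normalisation from the figures above.
bm_seconds = 6_925          # bare metal replay time
vm_seconds = 11_654         # VM replay time (4 cores / 4 threads)

clock_ratio = 2.67 / 2.27   # VM clock advantage, ~1.1762x
vm_adjusted = vm_seconds * clock_ratio  # ~13,708 seconds

print(f"VM slowdown:    {vm_adjusted / bm_seconds:.3f}x")  # 1.979x
print(f"VM performance: {bm_seconds / vm_adjusted:.2%}")   # 50.52%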

This does not take into account the advantage the VM has in memory bandwidth from the extra memory channel, or any instructions-per-clock (IPC) improvements available on a CPU of a two-year-newer design. Therefore, for a database workload, in order to maintain an approximately similar performance level, each virtual machine will need twice as many CPU cores as the bare metal machine, assuming equal clock speeds on both. It is also worth noting that there are scalability constraints in software, so as the number of cores goes up, even more cores would need to be added to compensate for the virtualization performance overhead.

The Followup

Following a conversation with the cloud provider to discuss the findings, I re-tested the VM performance on an instance in their newer data centre, which has slightly newer hardware.

The results with the exact same test were as follows:
Bare metal: Xeon L5520 (2009 generation) (4x 2.27GHz cores, HT disabled): 128m43s
VM 1: Xeon E7-8837 (2011 generation) (4x 2.67GHz cores, no HT): 194m14s
VM 2: Xeon E5-4650 (2012 generation) (4x 2.70GHz cores, no HT): 152m53s

Adjusting for clock speed differences:
BM: 6,925 seconds (100%)
VM 1: 13,708 seconds (51% of bare metal performance)
VM 2: 10,982 seconds (63% of bare metal performance)

Without adjusting for clock speed differences:
BM: 6,925 seconds (100%)
VM 1: 11,654 seconds (59% of bare metal performance)
VM 2: 9,173 seconds (75% of bare metal performance)

Conclusion: Virtualization Performance Overhead is Very High

In the best case, ignoring CPU clock speeds and IPC improvements, for database workloads a VM will deliver 3/4 of the performance of a bare metal server with a CPU from a generation three years older, if the number of CPU cores is kept the same. Adjusting for clock speeds, virtualization performance overheads remove more than a third of the performance of bare metal, clock-for-clock.

This is not to say that virtualization and cloud are a bad thing. Cloud has many advantages, particularly the convenience of not having to manage physical hardware yourself or spend capital on servers, especially if you don’t need as much performance as the smallest reasonable server provides. But next time you are having database performance problems in the cloud, remember how much performance you are giving up by running those servers on cloud VMs.

In the next article, I will cover virtualization performance overheads on AMD Zen 2 Epyc CPUs, and the performance impact of mitigating CPU security flaws.