Virtualization performance testing tips
posted by John Troyer (VMWare) on August 20, 2008
I attended an interesting Chalk Talk today from Intel's Kshitij Doshi and Ashok Emani: "Taming the Complexity of Studying Performance Under Virtualization". They walked through the development of a server consolidation study using the vConsolidate Framework. Performance testing of virtualized systems is fraught with complexities, especially when you're trying to say something about real-world scenarios with multiple virtual machines.
Kshitij and Ashok walked us through their methodology and experiences. To oversimplify, when you're dealing with multiple workloads on top of a hypervisor, you not only have many points of failure (like apps or testing frameworks crashing), you also now have many ways for bottlenecks to arise (in CPU, disk, I/O, memory), and often the hypervisor helpfully hides those bottlenecks in an attempt to do production-level resource allocation over all its virtual workloads.
Their list of "common sources of errors and anomalies" is worth a paper of its own, as you can tell it comes from long experience, but for this blog post let me just hit the headers of their slides on "common pitfalls." After reading this, I hope you will think twice before just firing up a quick timer on a process in a virtual machine. It's probably not telling you what you think it's telling you! (Most real-world virtualized workloads are not performance-bound, anyway, but that's a whole other conversation.)
- Time drift
- Unmonitored failures (loadsim, webbench)
- Disk space
- Spurious interrupts, network isolation
- IOPS contention
- Client instabilities (memory leaks, MTU, login failures)
- VMM "knobs"
- Guest OS knobs -- Ticklessness (avoiding context switches)
- Application tuneables -- Java heap and large page tuneables
- Affinitization (shared caches on multicore machines can help or hurt)
- Service packs and application versions
- Client memory exhaustion
- VMM memory fragmentation, oversubscription
- BIOS or hardware "knobs"
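To make the first two pitfalls (time drift and unmonitored failures) a little more concrete, here is a minimal sketch of the kind of sanity check you might run before trusting a number measured inside a guest. This is my own illustration, not anything from the Intel slides or from vConsolidate: the names `ClockSample`, `drift_ppm`, and `validated_throughput` are hypothetical, and it assumes you can collect a timestamp from a trusted clock outside the VM (for example, on the load-generating client) at the start and end of a run.

```python
from dataclasses import dataclass


@dataclass
class ClockSample:
    """One paired reading taken at (roughly) the same instant."""
    guest_s: float      # timestamp read inside the VM, in seconds
    reference_s: float  # timestamp from a trusted clock outside the VM, in seconds


def drift_ppm(first: ClockSample, last: ClockSample) -> float:
    """Guest clock drift relative to the reference clock, in parts per million.

    Negative means the guest clock ran slow, so timings taken inside the
    guest look shorter than they really were.
    """
    guest_elapsed = last.guest_s - first.guest_s
    ref_elapsed = last.reference_s - first.reference_s
    if ref_elapsed <= 0:
        raise ValueError("reference clock did not advance")
    return (guest_elapsed - ref_elapsed) / ref_elapsed * 1e6


def validated_throughput(ops_completed: int,
                         workload_exit_codes: list[int],
                         first: ClockSample,
                         last: ClockSample,
                         max_drift_ppm: float = 500.0) -> float:
    """Return ops/second using the reference clock, or raise if the run is suspect.

    max_drift_ppm is an arbitrary illustrative threshold, not a recommendation.
    """
    # Unmonitored failures: one crashed workload silently frees up
    # resources for the others and inflates their results.
    failed = [code for code in workload_exit_codes if code != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} workload(s) exited with errors")

    # Time drift: refuse to report a number if the guest clock wandered too far.
    drift = drift_ppm(first, last)
    if abs(drift) > max_drift_ppm:
        raise RuntimeError(f"guest clock drifted {drift:.0f} ppm during the run")

    # Always divide by elapsed time on the reference clock, not the guest clock.
    return ops_completed / (last.reference_s - first.reference_s)
```

The design choice worth noting is that the function raises instead of quietly returning a corrected number: a run with a failed workload or a badly drifting clock is usually better discarded and repeated than adjusted after the fact.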
For more information on vConsolidate, check out this paper from the Intel Developer Journal, Redefining Server Performance Characterization for Virtualization Benchmarking. Tom Adelmeyer gives some context in this video presentation on the Intel Software Network: Virtualization Performance Testing with vConsolidate.
Over at VMware, we have also developed a framework known as VMmark, and our performance team often publishes results at VROOM!, the VMware performance blog, which is interesting reading if you're a performance geek or just trying to get a sense of how hypervisors scale in 2008. Both VMware and Intel are also working together with SPEC to develop an industry-wide benchmark for virtualized systems.
tagged: benchmarking, idf, idf2008, performance, virtualization


Comments
Aug 20 | Bryan Rhoads said:
Hey John - Great post and nice to have you aboard!
Aug 21 | Stephen A Readan said:
Hi John,
Having gone through the woes of Performance Testing on a virtualised environment I enjoyed reading this.
The problems we unearthed were disk access speeds being too slow - this was a GIS application with very large maps being generated. The only solution they could come up with was to de-virtualise parts of the environment.
Regards,
Stephen
Aug 21 | John Troyer (VMWare) said:
@Stephen: virtual I/O performance has gotten better (see 100,000 I/O Operations Per Second, One ESX Host), and from what I’m seeing here at IDF with what’s coming with VMDirectPath (more on that later) and Nehalem, we will be able to have our cake (direct access to network and storage devices) and eat it too (live migration and other virtualization goodness).
Even today, most workloads are fine and performance is not an issue, but you do have to watch for the outliers, like in your case. Glad that you got to keep some of it virtual!