Meltdown and Spectre Patches May Increase CPU Load [Initial Findings]
OS and hypervisor patches for Meltdown (CVE-2017-5754) have been released, while the OS, hypervisor, and firmware patches for Spectre (CVE-2017-5715 and CVE-2017-5753) are in varying states of partial release depending on the vendor. That has raised many questions about the performance implications of updating. Because the Meltdown and Spectre patches interact with the system at a low level, the net result will depend significantly on factors like workload (what users are actually doing), CPU architecture, OS version, hypervisor version, and hardware characteristics. This makes it essential to have a method of benchmarking (and in some cases predicting) the net impact. We’ve undertaken some initial testing to give indicative guidance but, as always, your results may differ depending on your setup.
Clearly there are a huge number of variables at play here, so we chose to begin with VDI workloads: our impression was that these (along with other shared-CPU scenarios) would be the most likely to see a significant aggregate impact. While we might not be able to validate every scenario in-house, we will attempt to benchmark common scenarios with more universal workloads, and we’ll focus especially on slightly older hardware, since that is where many enterprise customers are likely to be in the product cycle of their supporting servers. Additionally, we will post more details on the discrete/physical workstation case when there’s more information to draw from the SysTrack Community.
The following is a short summary of our findings around performance impacts of Meltdown and Spectre patches on VDI workloads. You can also listen to the Lifeguard IT podcast for more background on Meltdown and Spectre. The discussion of our CPU impact findings starts at 6:40. The episode is also available on iTunes/Apple Podcasts and Google Play.
|Component|Configuration|
|---|---|
|Patch Tested|Meltdown ONLY|
|Hypervisor|VMware ESXi, 6.0.0, 6921384|
|Guest OS|Windows 10, build 15063|
|CPU|Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz|
|Disk|SSD local storage|
To make our workload generally representative of normal enterprise activity for a task-based worker (mostly browser-based and basic Office usage), we restricted our artificial load to Internet Explorer, Microsoft Word, Outlook, and Excel. Quick shout-out to Login VSI for providing the framework for the synthetic transactions.
The thought process for the testing was straightforward: let’s evaluate the same workload and density on a single VDI host with both hypervisor and OS unpatched and then subsequently patched. For the evaluation of the impact, we continuously collected data with SysTrack to monitor all the resource consumption metrics of interest as well as our own end-user experience KPIs and score.
So, let’s get straight to the interesting part: what are the results? For the unpatched load, we saw an active average (measured only while the workload was running, not at idle) of around 20.26% CPU usage. For the patched load, we saw 21.52%. That is an increase of about 1.26 percentage points of CPU in active load, or roughly 6% relative to the unpatched baseline.
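To make the arithmetic behind those two averages explicit, here is a minimal Python sketch that computes both the absolute change (in percentage points) and the relative change (as a share of the unpatched baseline). The function and constant names are ours for illustration; only the two CPU averages come from the test run.

```python
from typing import Tuple

# Active-load CPU averages reported in this post (percent CPU).
UNPATCHED_CPU_PCT = 20.26
PATCHED_CPU_PCT = 21.52


def cpu_delta(before_pct: float, after_pct: float) -> Tuple[float, float]:
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute_points = after_pct - before_pct
    relative_pct = absolute_points / before_pct * 100.0
    return absolute_points, relative_pct


absolute_points, relative_pct = cpu_delta(UNPATCHED_CPU_PCT, PATCHED_CPU_PCT)
print(f"Absolute increase: {absolute_points:.2f} percentage points")
print(f"Relative increase: {relative_pct:.1f}% over the unpatched baseline")
```

The distinction matters when comparing against other published numbers: a figure quoted as "1.3%" and one quoted as "6%" can describe the same measurement.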
Now, what does that mean overall? The increase in CPU usage on a per-VM basis is small, but as we know, that can add up very quickly when you start dealing with higher densities. More importantly, we were only running very I/O-light applications, and all indications (at least for Windows) suggest that I/O-intensive workloads can present much more of a problem. So, how can you figure out whether this is going to pose a problem for your users?
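As a rough illustration of how a small per-VM change adds up with density, the sketch below scales the two measured averages across a host. It assumes CPU demand grows linearly with VM count and that every VM behaves like our light test workload; neither assumption is something we validated. The VM counts are hypothetical examples, not densities from our test, and the function names are ours.

```python
import math

# Active-load CPU averages reported in this post (percent CPU).
UNPATCHED_CPU_PCT = 20.26
PATCHED_CPU_PCT = 21.52


def equivalent_unpatched_vms(vm_count: int) -> float:
    """CPU demand of vm_count patched VMs, expressed as a number of unpatched VMs.

    Assumes demand scales linearly with VM count (an assumption, not a measurement).
    """
    return vm_count * PATCHED_CPU_PCT / UNPATCHED_CPU_PCT


def vms_within_same_budget(vm_count: int) -> int:
    """Patched VMs that fit in the CPU budget previously used by vm_count unpatched VMs."""
    return math.floor(vm_count * UNPATCHED_CPU_PCT / PATCHED_CPU_PCT)


# Hypothetical host densities, chosen only for illustration.
for vm_count in (50, 100, 200):
    print(
        f"{vm_count} VMs: patched demand is like "
        f"{equivalent_unpatched_vms(vm_count):.1f} unpatched VMs; "
        f"the same CPU budget holds {vms_within_same_budget(vm_count)} patched VMs"
    )
```

Under these assumptions, a host sized for 100 unpatched VMs has CPU headroom for about 94 patched ones. Heavier or more I/O-intensive workloads would likely shift that ratio further.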
How do I play the home game with Meltdown and Spectre patches?
Luckily this can be a simple three-step process, given the right Digital Experience Monitoring tool. Now, assuming you’ve got SysTrack, the process can be made even simpler with Lakeside’s new Kit for both the predictive analysis and impact outline for after the patch is applied. Based on our understanding of the primary factors at play for the mitigation methods to be used for both problems, we’ve developed a method of predicting a potential impact based on the activity from an existing system. This is best seen through our Speculation Control Kit dashboard.
This is a complex topic, and we’ll have a lot more details as soon as we start to see more of the patched system results out in the wild. Stay tuned for more from us on performance testing, and feel free to reach out with any questions.
tl;dr: Performance impact per machine is low with a light workload: roughly 1.3 percentage points of CPU per VM (about 6% relative to the unpatched baseline), but that adds up quickly with density. Expect to see more impact from Spectre, and check with us for resources on how to analyze your environment’s impact.