SQL Server and Virtualization – Part 1 – CPU

Hey,

So a hot topic in my work life at the moment is virtualization, and it’s going to become a hot topic in other people’s work lives very soon. Virtualization as a whole is a great idea: it’s a money saver, it makes hardware redundancy so much easier to manage, and there are so many behind-the-scenes tweaks you can make to improve your guests with little to no fuss.

Enter SQL Server.

As a DBA, I want dedicated hardware, the most expensive out there, with a team of NASA-trained chimps to keep everything ticking over the way I would like, while one, wearing a fez, brings me tea. We’re simplistic folk after all. So the idea of virtualization scared me. Putting all that trust in other, non-NASA-trained people, having no real understanding of how SQL is now getting to its resources, having to be that fez-wearing chimp providing tea to the ESX admins to get anything done: it’s a terrifying prospect. So I went off and did some research to put my mind at ease. Virtualizing SQL Server can be done, but it can also go very wrong if we don’t take into consideration the differences in how it gets to its hardware resources. So I want to do a little series on virtualizing SQL Server and some considerations you might have to take into account, and today I’m going to start with CPU utilization.

A brief history of rhyme


So traditionally (at a high level), SQL Server would ask Windows how many cores are available in the machine, and the SQLOS would start a CPU scheduler per core it was allowed to use. The thinking behind this was that if a query came in and used Scheduler 1/CPU 1, it would cache the data there, so the next time it came in SQL would recognize this, say “Hey, I’ve seen you before, maybe you still have some data in the cache over there” and send it that way again. This saves the CPU the time spent gathering the data again, and hopefully gets some re-use out of work it has already done.
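If you want to see those schedulers for yourself, here’s a minimal sketch against the sys.dm_os_schedulers DMV (assuming you have VIEW SERVER STATE permission) that lists one visible scheduler per CPU the instance is allowed to use:

-- One scheduler per visible CPU. 'VISIBLE ONLINE' filters out internal
-- schedulers, such as the one reserved for the Dedicated Admin Connection.
SELECT  scheduler_id,
        cpu_id,
        status,
        is_online,
        current_tasks_count,
        runnable_tasks_count
FROM    sys.dm_os_schedulers
WHERE   status = 'VISIBLE ONLINE';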

Now at the guest level on a VM, the same procedure is carried out, and as far as SQL and Windows are concerned, they’re performing the same task. Where it gets tricky is where your hypervisor sends your request once it receives it.

Rather than having dedicated CPUs, we now have virtual CPUs presented to the OS. So while our guest is unaware, the hypervisor is sitting in the background load balancing our requests to protect the physical hardware and the other VMs on the host. So why does this concern us?
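You can at least confirm what the guest thinks it has been given. As a rough sketch (the virtual_machine_type_desc column assumes SQL Server 2008 R2 SP1 or later), sys.dm_os_sys_info shows the logical CPU count presented to the guest and whether SQL Server knows it is running under a hypervisor:

-- cpu_count is the number of logical CPUs the guest OS presents to SQL Server;
-- virtual_machine_type_desc reports HYPERVISOR when running as a guest.
SELECT  cpu_count,
        hyperthread_ratio,
        virtual_machine_type_desc
FROM    sys.dm_os_sys_info;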

On physical hardware, our request would go:
SQL Scheduler 1 -> Physical CPU 1.

On a virtual machine, for the first request we may take the route of:

SQL Scheduler 1 -> Virtual CPU 1 -> Physical CPU 1.

Now what about the next time that query comes in? What if our ESX host has since given Physical CPU 1 over to another guest, which is maxing it out? To protect our SQL Server from suffering due to a maxed-out CPU, our request may now go:

SQL Scheduler 1 -> Virtual CPU 1 -> Physical CPU 4

So the caching that was performed on query run 1 is now useless to us for query run 2, as it’s gone to a different physical CPU which will have to gather the data again, undoing the hard work SQL Server’s caching tried to do.

So, what can we do here?


Well, in that scenario, honestly, nothing. Your hypervisor admin can dedicate a CPU/core to you, but if that gets maxed out, it means you’re stuck waiting for resources regardless of any other CPUs being idle. So it’s a risk either way.

So how do we set our server up to run optimally on a VM? Well, as with anything, it depends. The general advice would be to use as few vCPUs as you can get away with: for the reason above (losing your caching), because the guest won’t be aware of the physical host’s NUMA setup (covered in the memory post), and also because you have to play your part in protecting the overall health of the hypervisor.

Consider this setup. I have a physical machine with 8 CPUs or cores. I have two virtualized SQL Servers which, on standalone hardware, had 8 cores each. I have one VM running on Physical Host 1, and one VM running on Physical Host 2. When everything is working smoothly, I can max out the CPUs on both VMs and have no problems. Now disaster strikes and I lose a mainboard in Physical Host 1. VM number 1 fails over to Physical Host 2, and all of a sudden I’ve got 8 physical cores trying to handle 16 virtual CPUs. Basic math tells us we’re in trouble here.

Summary


So here’s a brief summary of what we covered, and a few smaller considerations to keep in mind when you’re planning out your deployment:

1. Query performance wise, the CPU SQL thinks it has might not always be the CPU it actually has, which can nullify all the hard work the caching does.
2. You have to plan for a worst case scenario where all of your servers are running on one physical host, and not over-commit on CPU resources.
3. If you share your host with other VMs, be aware that just because your server isn’t maxing out the physical CPU right now, doesn’t mean another guest isn’t.
4. Start as low as you can go (2 vCPUs) initially, and work your way up. It can be better to have 2 vCPUs working at 60% than 4 vCPUs working at 20%. This:
    A) Gives you a greater chance of being assigned the same physical CPU/cache
    B) Reduces the risk of your vCPUs having to wait to find available resources on the host machine
    C) Gives you a little more peace of mind that in the event of failover, you’ll have enough physical resources on the machine to run both of your systems

Obviously if you assign 2 vCPUs and they get maxed out consistently, it might be time to add more. But don’t take for granted that this will solve your problem either. A poorly written query can max out 16 vCPUs just as easily as it will max out 2, so try to make sure your database/application is in as good a shape as possible.
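One rough way to tell whether you’re genuinely short on CPU, rather than just busy, is the signal wait percentage: signal wait time is time a task spends waiting to get back on a scheduler after its resource wait has finished. Here’s a minimal sketch against sys.dm_os_wait_stats (the 20–25% figure is a commonly quoted rule of thumb, not a hard rule):

-- Signal waits as a percentage of all waits since the last restart or stats clear.
-- A consistently high percentage (often quoted as > 20-25%) suggests tasks are
-- queuing for CPU time and the vCPUs you have are struggling to keep up.
SELECT  SUM(signal_wait_time_ms) AS signal_wait_ms,
        SUM(wait_time_ms)        AS total_wait_ms,
        CAST(100.0 * SUM(signal_wait_time_ms)
             / NULLIF(SUM(wait_time_ms), 0) AS DECIMAL(5, 2)) AS signal_wait_pct
FROM    sys.dm_os_wait_stats;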

These musings/suggestions are purely from my own experience and may not cover the entire range of scenarios, so I would highly recommend you also have a read of the link below:

http://www.brentozar.com/community/virtualization-best-practices/

Brent is a SQL MCM with a ton of SAN and Virtualization experience, and he approaches the subject in a very neutral way. If you are seriously considering Virtualization in the near future, do as much reading as you can to give yourself the best fighting chance.

As always, any questions/comments please don’t hesitate to ask. I’ll do a follow up on Memory considerations later in the week.

Good luck!
