SQL Server and Virtualization – Part 2 – Memory

Hey,

Sorry, I had hoped to have this out last week, but working on fixing the very problems I want to talk about has kept me from it!

So last week I covered virtualization and CPUs at a high level, and this week I'd like to look at the considerations you need to make when speccing out your RAM.

Something I really should have mentioned in the last blog, but which is even more apt when it comes to sizing memory, is that your considerations when setting up servers now have to extend beyond the individual server you are working on. You need to consider the effect this will have on the virtual host, and not only the host you are currently on: what happens if a host fails and your guests all move? A setup that previously had two physical hosts with 64GB of RAM each now has only one host with 64GB of RAM. So if you had 50GB of RAM assigned on each host, you now have 100GB of allocated memory trying to fit into 64GB of physical memory. Something has to give, and unfortunately it's probably going to be everything! (Side note: your guests will still work, but memory will be paging out to disk at the host level, so you'll see some pretty rough performance problems unless your swap is on an SSD or something.) For a bit of background on this:

When the hypervisor comes under memory pressure, a mechanism called the balloon driver comes into play. This tells your guest OSes, "Hey, I'm going to need all of your available memory for a big operation, give me back what you can". In the case of SQL Server, being the helpful type that it is, it says "Hey, I could use this RAM, but you can have it, hugs and kisses, SQL". So it releases the memory to this phantom operation on your guest that is demanding all of the memory, and because the balloon driver is a fake operation, the hypervisor simply reclaims the underlying physical pages and reduces the pressure on physical memory in the host.
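
As an aside, if you suspect this is happening inside a guest, SQL Server keeps a record of the low-memory notifications the OS raises as the balloon inflates. A rough sketch of how you might look at them (the ring buffer DMV is standard, but verify the XML fields on your version):

    -- RESOURCE_MEMPHYSICAL_LOW notifications are the guest OS telling
    -- SQL Server that physical memory is running out, which is exactly
    -- what you'll see while the balloon is inflating.
    SELECT
        x.record.value('(/Record/@id)[1]', 'int') AS record_id,
        x.record.value('(/Record/ResourceMonitor/Notification)[1]', 'varchar(60)') AS notification,
        x.record.value('(/Record/MemoryRecord/AvailablePhysicalMemory)[1]', 'bigint') / 1024 AS available_physical_mb
    FROM (
        SELECT CAST(record AS xml) AS record
        FROM sys.dm_os_ring_buffers
        WHERE ring_buffer_type = 'RING_BUFFER_RESOURCE_MONITOR'
    ) AS x
    ORDER BY record_id DESC;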

So the balloon driver is great for protecting the host, but if you haven't set the Minimum Server Memory setting in SQL Server to reserve some memory for SQL, your server is now going to have to start pulling pages back in from disk and trying to reclaim some RAM for itself. So all of the hard work caching your data pages is for naught, and you have a very tired server.
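
You can see the after-effects of this in the Page life expectancy counter, which measures how long (in seconds) a data page survives in the buffer pool; it falls off a cliff when ballooning has forced SQL Server to dump its cache. A quick sketch:

    -- Page life expectancy: seconds a data page survives in the buffer
    -- pool. On NUMA boxes you'll get one row per buffer node as well
    -- as the overall Buffer Manager value.
    SELECT object_name, counter_name, cntr_value AS seconds
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy';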

The other consideration when deciding how much memory to give your guest server is how much memory is contained in each NUMA node. A VERY high-level description of NUMA: in modern servers, each physical CPU socket has a local memory bank in which it can access data more quickly. It can still access data in the other banks, but it takes a slower route to get there, so pulling data back from memory can take longer if it takes the slow boat.

So, for example, if you have two physical CPUs with 16GB of memory across the two banks, CPU 1 will access its 8GB faster than it will access the 8GB paired with CPU 2. So if you give it 10GB of RAM, it can access it all, but it will get to the last 2GB more slowly than the first 8GB (a crude example).
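
You can check the memory-node layout SQL Server actually sees inside the guest with a quick DMV query; a sketch along these lines returns one row per NUMA memory node:

    -- One row per NUMA memory node visible to SQL Server.
    SELECT memory_node_id,
           virtual_address_space_reserved_kb / 1024 AS reserved_mb,
           virtual_address_space_committed_kb / 1024 AS committed_mb
    FROM sys.dm_os_memory_nodes;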

So why do we care? If you have a physical host as outlined above, assign your guest two cores that both reside on physical CPU 1, and give it 10GB of RAM, then you run straight into the scenario above.

Now, in ESX 4.1 things have gotten a lot smarter. It is NUMA-aware, and unless you specifically tell it to use two cores on CPU 1, it is smart enough to know that it would have to start crossing NUMA nodes, and it will load balance accordingly (one core on each CPU, 5GB of RAM from each pool). But it is still something to consider.

So, a summary and one other bone of contention: reservations. It is possible to set a memory reservation for your guest on your hypervisor. This means that no matter what else is blowing up on your host, your server will always be given X amount of RAM, depending on your reservation. In a perfect world you would have everything you have assigned to your guest reserved, but that most likely won't be possible, as it would starve any other guests on the physical host that need RAM.

What I would consider a decent compromise: if you know that your SQL instance needs a minimum of 4GB to run, you would assign a reservation of 6GB. Set your Minimum Server Memory in SQL Server to 4GB, and that still leaves 2GB for the OS to keep everything alive. Then, when the pressure on the host eases, it can start to reclaim some of the other RAM.
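
The SQL Server half of that compromise is just the Minimum Server Memory setting; here's a minimal sketch using the hypothetical 4GB floor from above:

    -- Pin SQL Server's floor so the balloon driver can't squeeze the
    -- buffer pool below what the instance needs to run.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'min server memory (MB)', 4096;  -- the hypothetical 4GB minimum
    RECONFIGURE;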

The debate here is that if you set a reservation, your server can only fail over to a host with enough physical memory free to satisfy that reservation. So, if you have 6GB reserved and your failover host only has 4GB free, your guest gets a bullet to protect the rest of the guests. From the virtualization guy's perspective, you're just a pain in the neck who is screwing up his High Availability plan (and he's not wrong, but still, we have to protect our babies).

This area, more than most, needs careful planning and discussion with your virtualization guys/vendor. It's so easy to get wrong that it's worth spending the extra time on.

So to summarize:

  • Plan, plan, then plan again, then throw it all away and plan it again
  • Be aware of the physical layout of your server, NUMA nodes can bring performance troubles
  • Keep in mind that you are now speccing servers to fit into an estate, not just individual requirements. Be aware of your surroundings
  • Reservations can be good for you, but can be bad for the estate. Talk to your VM guys

Virtualization, more than any other project I've worked on, is a complete team effort. The integration between DBAs, sysadmins, and SAN admins is so tight in a virtualization project that you all have to have a base knowledge of what the other guys are doing, and you all have to work together to strike a compromise that meets everyone's needs.
I'll leave you with a link to the VMware white paper on memory management in ESX, and I'll do a short one on disk allocations in ESX next week.
Best of luck!
Stephen

4 thoughts on “SQL Server and Virtualization – Part 2 – Memory”

  1. Good article!

    Don't forget TPS (Transparent Page Sharing), a memory page deduplication technique in the hypervisor that allows you to safely (according to VMware at least) overcommit memory on hosts; in my experience by around 20%, but your mileage may vary.

    http://kb.vmware.com/kb/1021095

    I would also advise that DBAs familiarise themselves with the Admission Control techniques employed within an ESX(i) cluster. I prefer the cluster percentage option, but "host failures cluster tolerates" is the default. This can make resource management trickier to calculate, especially if database server VMs have outsized reservations compared to other VMs in the cluster, or hosts have unbalanced resource levels.

    http://www.yellow-bricks.com/vmware-high-availability-deepdiv/#HA-admission

    Reservations are always honoured, so if the VM admin is happy to have them, then definitely take advantage. Just be sure admission control is strictly enforced so your database server will always be restarted in the event of a host failure!

    Cheers,
    Brian

  2. Hi Brian,

    Thanks for the feedback! Your comment proves exactly what I was talking about regarding the need for close integration between teams while managing a project like this. As a DBA I want every CPU you can fit in there, as much memory as I can buy, and 42 mirrored SANs. But the reality is that it's a delicate balancing act to get true performance and reliability, particularly in a virtualized environment.

    I wouldn't have enough VM experience or knowledge to implement the advanced methods you mentioned there, but thankfully, you do! It really is a team effort to try to get this right.

  3. No worries, VMware sysadmin was my former job – it can be both a boon and a curse! You're spot on, teamwork is definitely key although it probably does not hurt to keep a close eye on VM Memory and VM Processor stats in Perfmon either 😉

    Unfortunately they have those counters blocked as part of the ESX hardening in my place, bloody compliance!

    I should mention as well that with Transparent Page Sharing you don't get as much benefit with SQL Servers as you do with app servers. Most of the data cached in memory is going to come from unique databases, so you won't get a whole lot of page overlap. So just be careful if you're leaving room for overcommitting based on the effectiveness of TPS with app servers.

    The joys of databases!
