Why are indexes good for your server?

Hey,

It’s quite a well-known fact that indexes can be good for your queries. They offer a shorter route to your data, and take some load off your server. They achieve this by creating a Balanced Tree (B-Tree) with pointers to your data. So in a table of customer names, if you search for “Mark”, instead of doing a full table scan and checking every row for the name “Mark”, an index on the name column already holds every occurrence of “Mark”, and can take a direct route to get your data.
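
To picture that, here’s a quick made-up sketch. None of these names come from the download further down; the Customers table and its index are purely for illustration:

    -- A made-up table of customer names, with an index on the name column
    CREATE TABLE dbo.Customers
    (
        CustomerID   INT IDENTITY(1,1) PRIMARY KEY,   -- becomes the clustered index
        CustomerName VARCHAR(100) NOT NULL
    );

    CREATE NONCLUSTERED INDEX IX_Customers_CustomerName
        ON dbo.Customers (CustomerName);

    -- With the index in place, SQL Server can seek straight to the "Mark" rows
    -- instead of scanning the whole table
    SELECT CustomerName
    FROM dbo.Customers
    WHERE CustomerName = 'Mark';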

But why does this make your server happy?

I’m going to be working with some fabricated data for the example. All tables/indexes/data I’m using are available here if you want to play along! Sorry for the download site; I don’t have proper web hosting yet.

In my example I’m using Regions, but it can be any data you like to test the principle.
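
If you’d rather not bother with the download, a table along these lines will do the job. This is only my rough stand-in for the real scripts, so your row and page counts won’t match mine exactly:

    -- Rough stand-in for the test table (the download has the real scripts)
    CREATE TABLE dbo.IndexTest
    (
        ID     INT IDENTITY(1,1) NOT NULL,
        Region VARCHAR(50) NOT NULL,
        CONSTRAINT PK_IndexTest PRIMARY KEY CLUSTERED (ID)
    );

    -- Pad it out with a pile of repeating region names
    INSERT INTO dbo.IndexTest (Region)
    SELECT TOP (100000)
           CHOOSE(ABS(CHECKSUM(NEWID()) % 4) + 1, 'North', 'South', 'East', 'West')
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b;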

I’ll cycle through the steps, and then go through the output.

  1. Clear your buffer using DBCC DROPCLEANBUFFERS – http://msdn.microsoft.com/en-us/library/ms187762.aspx
  2. Before your query, you want to Set Statistics IO on – http://msdn.microsoft.com/en-us/library/ms184361.aspx
  3. Your query in full should look like:
    1. Set Statistics IO on
    2. Select Region from dbo.IndexTest
  4. Turn on “Include Actual Execution Plan” in your query toolbar to verify the route the query is taking
  5. Run the query and examine the output (a full script for these steps is sketched just after this list)
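
Put together, the whole thing looks something like this (using my table from the download; swap in whatever you’re testing against):

    -- 1. Clear the buffer pool so every page has to come back in from disk
    DBCC DROPCLEANBUFFERS;

    -- 2. Report page reads in the Messages tab
    SET STATISTICS IO ON;

    -- 3. The query itself, with "Include Actual Execution Plan" turned on in the toolbar
    SELECT Region
    FROM dbo.IndexTest;
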
For the first run, I have nothing on the table but the Clustered Index created by my Primary Key. This forces a Clustered Index (essentially a table) scan to retrieve my data. On my table, this gives me the following results for Statistics IO (shown in the “Messages” tab of your query results):
Table Scan – logical reads 164, physical reads 2
This means that I’m having to pull 164 pages into my Data Cache/Buffer Pool in order to get my results back. Multiply this by 8K and I’m using roughly 1312KB of memory/space to pull back my results. This is because it had to trawl every piece of data in that table in order to find the relevant data.
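
For the second run I need an index on the Region column. The real definition is in the scripts above; as a rough sketch (the index name here is my own), it’s along these lines:

    -- Nonclustered index on the column the query is selecting
    CREATE NONCLUSTERED INDEX IX_IndexTest_Region
        ON dbo.IndexTest (Region);
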
At this point, apply that index to your column. Now run the command “Checkpoint” to write your changes to disk. I’ll run through the steps above again, and check the Statistics IO output again. This time I’ve got:
Index – logical reads 99, physical reads 2

So with the inclusion of the index, I’ve reduced the number of pages pulled in to 99, and I’m only using 792KB of memory/space. It’s exactly the same table and exactly the same number of rows returned, but with the inclusion of an index, I’ve almost halved the resources required to get my results back.
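
If you want to see those pages actually sitting in the data cache, rather than just taking Statistics IO’s word for it, a rough check is to join the buffer descriptor DMV back to the table. This is only a sketch, and the count won’t line up exactly with the logical reads figure, but it gives a feel for the table’s footprint in memory:

    -- Rough count of this table's pages currently held in the buffer pool
    SELECT COUNT(*)     AS pages_in_cache,
           COUNT(*) * 8 AS approx_kb
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au
        ON bd.allocation_unit_id = au.allocation_unit_id
    JOIN sys.partitions AS p
        ON au.container_id = p.hobt_id      -- in-row data; LOB pages map via partition_id
    WHERE bd.database_id = DB_ID()
      AND p.object_id = OBJECT_ID('dbo.IndexTest');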

This example is obviously on a very small scale, and you’d be right not to be worried about saving yourself 500KB or so of memory on a production server. But scale it up by 100, or 1000, and suddenly the resources being saved are extremely significant.

This is just something I found interesting when I started digging a bit deeper. I always knew indexes were good for queries, but I never really thought to ask why!

So a slightly random blog post this time around, but it’s something I’ve meant to cover for a while. The more you know and all that!

Stephen
