(You can see previous T-SQL Tuesday retrospectives by visiting this link.)
In September 2010, Michael J. Swart (blog | Twitter) invited us to talk about indexes.
Indexes are strange things. You never need to explicitly create one to create a fully-functional database, but if you want a database to perform well, they’re indispensable.
Many words have been written and said on this topic, including my own. In fact, I did a two-part podcast interview with SQL Data Partners a few years ago on the topic of indexes. I took the opportunity to rediscover what I’d written on this blog, considering the importance of indexes (or indices as some people like to say).
In 2012 I developed a SQL Server 2000 Duplicate Index Finder, which I based off Kimberly L Tripp’s Duplicate Index Finder for SQL Server 2008 and higher. At the time I was supporting SQL Server 2000 and wanted something that worked the same way on that version. I’ll be the first to assume I’m the only person in the world who used it, but it was useful to find duplicate indexes.
These days SQL Server lets you build up to 999 non-clustered indexes on a table. Even more exciting is that the database engine doesn’t care if you build 999 of the exact same index either, because SQL Server is not the boss of you. Nevertheless, those duplicate indexes — or indexes that are so close to being duplicates that it’s just a matter of column order or included columns — should be avoided for a number of reasons:
- Storage: duplicate indexes take up unnecessary space
- Maintenance: if you have regular index and statistic maintenance, duplicate indexes must be maintained too
- Memory: duplicate indexes may take up space in the buffer pool unnecessarily, if the engine decides to use respective ones in different execution plans
- Transaction log: indexes that are built and maintained also take up space in the transaction log, which will be shipped to secondary servers in an availability group
Reduce, reuse, recycle
To limit the amount of duplicate work that your indexes might be doing, you have to identify the duplicates and either eliminate them (remember to disable them before dropping them completely), or merge them into other indexes that are doing similar work1In some cases you might have to hang on to indexes that look similar but have different left-leading columns and that’s fine..
- Check out Kimberly’s Duplicate Index Finder. It works by looking at included columns as well, unlike the built-in
sp_helpindex
system stored procedure. sp_BlitzIndex
is a community-maintained script as part of Brent Ozar’s First Responder Kit. It helps you identify indexes in a similar way to Duplicate Index Finder.
Care and feeding
Along with sp_BlitzIndex
, you can also check out Glenn Berry’s diagnostic scripts, which help you design better indexes based on the missing indexes reported by SQL Server. This goes a long way to understanding your database and helping the query optimizer make the best decisions. Remember that the primary goal of a fast database is to have the least amount of data in memory as possible. You achieve this by normalizing your database, and indexing the primary and foreign keys (SQL Server does not index foreign keys by default).
While the table designer in SQL Server Management Studio (SSMS) tends to make the decision for you, don’t feel obligated to make your primary key a clustered index if it doesn’t make sense. A clustered index is included in every non-clustered index for that table, so you want it to be as narrow and unique as possible. A GUID might make an excellent unique value for a primary key, but it is 16 bytes wide compared to an integer (4 bytes) or a BIGINT (8 bytes) so it will take up a lot of space for millions of rows.
Indexes are more of an art form than a science. They depend not only on the data distribution of your database, but also how it is queried. Don’t be scared to experiment.
Share your indexing stories in the comments below.
Photo by Fredy Jacob on Unsplash.