A quick(er) post this week, in response to Greg Low’s blog post from a few weeks ago titled “Don’t start identity columns or sequences with large negative values.”
Greg writes that you shouldn’t use large negative values in a table, because… it’s hard to read them, I guess? And also they don’t compress well.
I disagree … to a degree. Dang, words are hard. Anyway, when I design a table I create what's called a surrogate key as my primary key: a value intended for the table itself to uniquely identify a row, so that it can participate efficiently in relational activities like joins and foreign keys. In other words, the identity column is not for me, it's for the database engine. I don't need to know which row has which value. I choose the data type for that identity column based on the estimated number of rows, not whether I can memorize that a
[StatusID] of 5 means something. Magic numbers are bad, mmmkay?
If a table will never have more than 256 rows, I use a
TINYINT and start the identity counter at zero. If it will have fewer than 32,768 rows, I use a
SMALLINT, and if it will have fewer than 2.1 billion rows, I use an
INT. If for some reason there may be more than 2.1 billion rows, I then reach for a
BIGINT and hope row compression is turned on.
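To make this concrete, here's a sketch of what a small lookup table might look like under that sizing rule (the table and column names are mine, purely for illustration):

```sql
-- A lookup table that will never exceed 256 rows, so TINYINT
-- (which holds 0 through 255) is plenty. Seed the identity at zero
-- to use the full range.
CREATE TABLE dbo.[Status] (
    [StatusID] TINYINT IDENTITY(0, 1) NOT NULL,
    [StatusName] NVARCHAR(50) NOT NULL,
    CONSTRAINT [PK_Status] PRIMARY KEY CLUSTERED ([StatusID])
);
```

The same pattern applies with SMALLINT, INT, or BIGINT as the row estimate grows.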
So far, so good. Greg and I agree on choosing the right data type for the purpose of the identity column, and that in most cases starting the counter at zero (and therefore “losing” all the possible values below zero) is naught much of a loss. Sorry, I’ll try harder next week.
There are cases where you may have far fewer than 2.1 billion rows in your table, but the amount of data churn (rows deleted and rows inserted) can cause the identity value to exceed the
INT limit. In cases like this there are a few options. You could alter the table to increase the
INT to a
BIGINT, and on some versions of SQL Server (especially if you have data compression enabled), this could be a metadata-only operation and finish instantly.
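Assuming a hypothetical table called dbo.Transactions with an INT identity column, widening the column might look like this:

```sql
-- Widen the identity column from INT to BIGINT.
-- On SQL Server 2016 and later, with row or page compression enabled,
-- this can be a metadata-only change; otherwise it rewrites every row.
-- Note: any index or primary key on this column must be dropped first
-- and recreated afterwards.
ALTER TABLE dbo.Transactions
    ALTER COLUMN [TransactionID] BIGINT NOT NULL;
```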
Or you could reseed the identity column to start counting at negative 2.1 billion (the “large negative value” in Greg’s post). Guess what? You’ve just bought yourself a whole lot of time by doing that, and you didn’t need to change your data type.
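Using the same hypothetical dbo.Transactions table, the reseed is a one-liner:

```sql
-- Restart the identity counter at the lowest possible INT value.
-- On a table that already has rows, the next insert gets the reseed
-- value plus the increment, i.e. -2,147,483,647.
DBCC CHECKIDENT (N'dbo.Transactions', RESEED, -2147483648);
```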
What happens if you already have two billion rows in your table? Greg suggests that you should be using data compression anyway, and that switching to a
BIGINT here is all that’s needed as the compression algorithm will only use the bytes it needs. Unfortunately row compression isn’t available on non-Enterprise editions prior to SQL Server 2016 with Service Pack 1. If you’re on SQL Server 2012 or 2014, too bad for you.
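If you are on an edition and version where it's available, turning on row compression is itself a one-line rebuild (again using my hypothetical table name):

```sql
-- Rebuild the table with row compression, so fixed-width types like
-- BIGINT are stored using only the bytes each value actually needs.
ALTER TABLE dbo.Transactions
    REBUILD WITH (DATA_COMPRESSION = ROW);
```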
So your options are:
- you can alter the column to be double the size (at four extra bytes times two billion rows, that's roughly another 8 GB you'll need right away), plus a size-of-data operation which will require downtime to make the modification unless you're very smart; or
- you can reseed the identity column to a large negative value.
At least the second option allows you to bring your system online as fast as possible and get your manager off your back. Then you can use that extra time to come up with a plan.
Share your thoughts in the comments below.