Last week we looked at ACID compliance. This week we dive a little deeper into the Isolation portion, and what it all means.
Isolation levels work to manage how my transactions may be isolated from your transactions, to allow concurrency. They work by locking sections of the database until a transaction is committed or rolled back, which results in blocking other access to those sections of the database.
In other words, locking and blocking is expected behaviour in a RDBMS.
There are four isolation levels in SQL Server (as quoted from SQL Server Books Online):
- Read uncommitted (the lowest level where transactions are isolated only enough to ensure that physically corrupt data is not read)
- Read committed (Database Engine default level)
- Repeatable read
- Serializable (the highest level, where transactions are completely isolated from one another)
The default on SQL Server (Read committed) means that readers block writers, and writers block readers. If I’m performing an UPDATE
, DELETE
or INSERT
on a table when you’re trying to run a SELECT
, and they happen to be trying to access the same set of 8 KB data pages that make up that table, whichever transaction started first (the one with a lower LSN) will take precedence, and the database engine will lock all the portions of the table that are being affected. Any other operation that also tries to access those portions of the table will be blocked until the transaction is complete. Locks happen thousands of times a second on a busy system, and most of the time we barely notice.
Deadlocks occur when two operations manage to lock each other out. SQL Server will kill a deadlock by terminating the process that was using fewer resources. This means that a long-running SELECT
could easily take precedence over a short-running UPDATE
. There’s no way of knowing who the deadlock victim will be, which is why our code should always have a retry mechanism built in. Always keep in mind that concurrency, by design, must assume locking, blocking, and deadlocks.
Whither NOLOCK
?
We may have used the query hint WITH (NOLOCK)
in the past because we noticed that queries seemed to run a lot faster. That’s because the NOLOCK keyword forces the lowest possible isolation level.
Referring back to the Books Online link above (or even the Wikipedia article), Read Uncommitted, or NOLOCK
, permits the following bad behaviour:
- dirty reads – a row that is uncommitted (either has not been committed yet, or may be rolled back) will end up in your query.
- non-repeatable reads – a row might be read twice, with differing results, because the database engine might decide to delete and reinsert that row at the end of a table, for example.
- phantom reads – a row that will eventually be deleted, can also show up in your results.
Consider the reliability of these uncommitted reads. Consider the negative impact of making business or financial decisions based on these results.
Don’t use NOLOCK
on data that is changing all the time.
RCSI
There’s a fairly painless solution, though. We use NOLOCK
in the first place because the default Read Committed isolation level causes readers and writers to block each other. A database-level setting, called Read Committed Snapshot Isolation, will use tempdb to keep track of older versions of committed data.
When a data modification (INSERT
, UPDATE
, DELETE
) takes place, the database engine stores the last committed version of each affected row in an internal table on tempdb. When a SELECT
comes along, the engine redirects it to see the last known committed version instead of having to wait for the data modification to complete.
Nothing is for free, though. There is a slight performance impact, relating to the disk requirements for tempdb, along with additional space. However, in all cases that I have implemented RCSI, the benefits far outweigh the costs.
You can read more about this feature in a post by Paul White for more details.
Find me on Twitter at @bornsql if you have any dirty read stories to share.