Acids and Databases: A look at ACID and transactions

Relational database management systems (RDBMS) such as SQL Server, Oracle, MySQL, and PostgreSQL use transactions to allow concurrent users to select, insert, update, and delete data without affecting everyone else.

An RDBMS is considered ACID-compliant if it can guarantee data integrity during transactions under the following conditions:

ACID

Atomic – transactions are all-or-nothing. If something goes wrong, the entire transaction should go back to a previously known state.
Consistent – the data has to follow the internal rules of the database structure, including (but not limited to) data types, foreign key constraints, and default constraints. When we see a failure due to this type of error, the database must revert back to a known state again.
Isolated – when I run an operation, it should not impact how you run an operation. My operations are isolated from yours. That does not mean that you and I cannot update the same column in the same table. If your transaction starts before mine, your change will go through first, and then mine will go through. Isolation does not prevent more than one change on the same data, nor is it designed to.
Durable – once the transaction has been committed, that data is now persisted, and any failure that occurs after the commit will not affect the durability of the operation.

Transaction Log

This is why the transaction log on SQL Server is so important. It doesn’t just keep track of successful changes in the database. Every single modification in the database must be recorded in the transaction log as it occurs, and each modification has to have a start and an end marker.

If there’s a failure, the rollback will “undo” all the logs to that point from the starting marker (called an LSN, or Log Sequence Number in SQL Server). Yes, that generates even more logs as it makes changes to the database to perform the rollback, until the database is in a state that is equivalent to the moment just before the transaction began.

There are cases where operations from the original query may cause changes to the physical database and file structure, which may not be undone exactly, but the rollback will ensure that the database is at least in a consistent state and that any data modifications performed during a failed transaction are reversed.

Crash Recovery

When SQL Server starts up, sometimes it can take a while for a database to come back online again. That’s because it goes through a process called crash recovery, because there’s no way for the service to know with 100% certainty that it started cleanly for each database.

Think of the transaction log as a replay button.

The transaction log is scanned for active portions which contain information about transactions that were not yet committed. If they contain an end marker, the transaction is rolled forward (it runs the transaction again and modifies the database so that it matches what’s in the transaction log). If there is no end marker, or the original transaction was rolled back, that transaction is rolled back again.

The same process happens when a database is restored from a backup because there may have been transactions in progress when the backup started, some of which may have ended by the time the backup completed.

Implicit vs. Explicit Commit

While some RDBMS like Oracle may require explicit commits (in other words, adding a COMMIT keyword to the end of an operation) to end a transaction, SQL Server has implicit commits by default (no COMMIT keyword is required). We have to be mindful of this when moving between certain platforms. On SQL Server, we need to be sure that the change we are making in a production environment is the right one, or at least test our changes beforehand.

Stay tuned next week when we discuss locking, blocking, isolation levels, and NOLOCK.

Find me on Twitter at @bornsql.