System-versioned ledger tables: the next step

In the first post of this series, we learned about a new type of system-versioned table that also works at the database level and introduces a mechanism that demonstrates whether your database has been tampered with. Very simply, if the cryptographic hash does not match what is in the off-site digest, your database has been tampered with and there are forensic tools to establish when they diverged based on transaction IDs.

Two types of table

This week we will look at the different types of ledger table: append-only and updatable.

Unlike temporal tables, a ledger table can be append-only which makes it immutable. You can only insert data and therefore it does not need a history table. In fact, you may be using append-only tables in your data warehouse already. While this is secure, it may not be practical.

In a regular transactional database (also known as a system of record, or SOR) you can have updatable ledger tables. Things are a little trickier, so — just like with temporal tables — keeping track of changes in a history table has a cost associated with it in terms of storage, CPU, memory, and limited data types.

With an updatable ledger table you can use the system-generated ledger view for that table to query the history in a meaningful way. The reason the view is necessary is because changes in ledger tables are transactional, not row based. While temporal tables let you look at the start and end columns to figure out a change and write an appropriate temporal SELECT query, row updates in updatable ledger tables are transactional.

Extra columns

Four columns are needed in an updatable ledger table, and you can customize their names if you wish:

- ledger_transaction_id: BIGINT
- ledger_sequence_number: BIGINT
- ledger_operation_type_id: TINYINT
- ledger_operation_type_desc: NVARCHAR(128)

The ledger_operation_type columns are a little strange to me. It makes sense to have the type of operation in the ID column (which for now can only be 0 or 1 for insert or delete respectively¹), but then the database engine writes out INSERT or DELETE into the description column as well. This feels like an unnecessary waste of space, especially in a table with millions of rows.

Note that the ledger_operation_type columns are not needed in append-only ledger tables, where the only possible operation is an INSERT.

The database ledger

Whether you have one table or the entire database ledger-enabled, you need to store the actual chain of cryptographic hashes (the blockchain), and each of these hashes operates on the transaction level.

In a relational database management server like SQL Server (the same engine which powers Azure SQL Database), there are four conditions for a database transaction: atomicity, consistency, isolation, and durability (which is where the term ACID compliance comes from). Since the transaction is the smallest possible unit of work in a database, this is the level on which the ledger feature operates.

I’ll quote directly from the official documentation here:

Every 30 seconds, the transactions that the database processes are SHA-256 hashed together through a Merkle tree data structure. The result is a root hash that forms a block. The block is then SHA-256 hashed through the root hash of the block, along with the root hash of the previous block as input to the hash function. That hashing forms a blockchain.

Thus, the state of the database ledger can be inferred from the most recent hash: the state of the database now is only possible through the transactional events that produced the current hash. This state (also called the database digest) is then copied off-site to an immutable data store for later comparative purposes during forensic analysis. As with all things security, it is critical that this infrastructure is kept separate, so if you do elect to store the digest in a location of your choice, keep this in mind.

Tamper evidence

Anyone with the right permissions (including an attacker who engineers a privilege escalation) can alter the state of a database. With the ledger verification process, you can demonstrate that tampering has occurred. This is a resource-intensive process because all hashes must be recomputed to compare against the off-site copy of the database digest, but it can show when the hashes diverged and with additional auditing can be mapped back to a particular user or application using the transaction information when that change occurred.

The moral of the story is that defence in depth is the best way to protect your database. While ledger tables are an extremely valuable way to prove tampering, you also need to have auditing at the application and network layers to be able to correlate access.

Stay tuned for next time when we dive into the limitations of ledger tables. In the meantime, share your thoughts in the comments below.

Business photo created by xb100 – www.freepik.com