The dire warning in the subject line is not meant to scare you. Rather, it is advice that is going to be useful to those of us who need to audit changes to a database. This is even more important now as SQL Server 2022 approaches, with its new system-versioned ledger tables which record everything.
As we already know, because we are smart people, when we tell SQL Server to update zero or more rows, it will do so exactly as instructed. What we might not know is that if we run an
UPDATE command against a table (or view) where the data is not actually changing, the database will still perform the update. In other words, it does exactly what you asked it to do. SQL Server is not the arbiter of what amounts to a business rule: if it can modify the data, it will do so, even if it replaces the existing value with an identical one, and a transaction log record is generated for every row it touches.
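This behaviour is easy to demonstrate. SQL Server itself can't be shown in a self-contained snippet, but SQLite (via Python's built-in sqlite3 module) exhibits the same "blind update" semantics; the table and values below are invented for illustration:

```python
# A minimal sketch: an UPDATE that sets a column to its existing value
# is still performed and still counted as a modification.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Customers VALUES (1, 'Alice')")

# The value is already 'Alice', so nothing logically changes...
cur = conn.execute("UPDATE Customers SET Name = 'Alice' WHERE Id = 1")

# ...yet the engine reports one row modified, because it performed
# the write exactly as instructed.
print(cur.rowcount)  # 1
```

The engine does not compare the old and new values before writing; it simply does what it was told, and the write (with its attendant logging) happens regardless.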
Let’s say we have a policy that all modifications to our database must be recorded. This can be achieved with temporal tables, triggers, stored procedures, ledger tables, or a combination of these options. Unless you specifically write custom code to check whether the incoming data differs from what is already in the table, the update will occur, and transaction log records will be generated. With temporal tables and ledger tables in particular, everything that can be recorded will be committed to the transaction log and history tables.
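The custom check can be as simple as a predicate that excludes rows whose values already match. A minimal sketch, again using SQLite as a stand-in with invented names:

```python
# Skip no-op updates by adding a null-safe "value differs" predicate,
# so unchanged rows are never touched and never audited.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Customers VALUES (1, 'Alice')")

new_name = "Alice"

# "IS NOT" is SQLite's null-safe inequality; the row is only updated
# if the stored value actually differs from the incoming one.
cur = conn.execute(
    "UPDATE Customers SET Name = ? WHERE Id = 1 AND Name IS NOT ?",
    (new_name, new_name),
)
print(cur.rowcount)  # 0 -- the row already holds this value
```

In T-SQL, one null-safe way to express the same check across many columns is a `WHERE NOT EXISTS (SELECT <old columns> INTERSECT SELECT <new values>)` predicate, which treats NULLs as equal without writing `OR ... IS NULL` logic for every column.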
Just how bad might it be? Let me give you a real-world example of an auditing process I’ve been investigating for some time now, with a view to improving performance and resource management. All data modifications are performed through stored procedures, and most of these stored procedures are generated using a code generator. The Update and Delete procedures all follow the same pattern: take the old row, write it to the audit table in XML format, and update the table with the new row.
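To make the pattern concrete, here is a sketch of what such a generated procedure does, using SQLite and invented table names (the real system is SQL Server stored procedures, not Python, but the shape is the same):

```python
# The generated Update pattern: copy the old row into an audit table
# as XML, then apply the update -- with no check for actual changes.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Audit (AuditId INTEGER PRIMARY KEY AUTOINCREMENT, RowXml TEXT)"
)
conn.execute("INSERT INTO Customers VALUES (1, 'Alice')")

def update_customer(conn, cust_id, new_name):
    # Step 1: read the old row.
    old_id, old_name = conn.execute(
        "SELECT Id, Name FROM Customers WHERE Id = ?", (cust_id,)
    ).fetchone()
    # Step 2: serialise the old row to XML and append it to the audit table.
    row = ET.Element("Customer", Id=str(old_id), Name=old_name)
    conn.execute(
        "INSERT INTO Audit (RowXml) VALUES (?)",
        (ET.tostring(row, encoding="unicode"),),
    )
    # Step 3: apply the update. Note there is no check for whether
    # anything actually changed -- which is exactly the problem.
    conn.execute(
        "UPDATE Customers SET Name = ? WHERE Id = ?", (new_name, cust_id)
    )

# Calling it twice with the same value audits the "change" twice.
update_customer(conn, 1, "Alice")
update_customer(conn, 1, "Alice")
print(conn.execute("SELECT COUNT(*) FROM Audit").fetchone()[0])  # 2
```

Every call produces an audit row, whether or not the data moved an inch.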
Unfortunately, the generated code does not account for rows that aren’t changing. For example, a process runs once per evening in which 52,000 records for this customer are validated against a third party, on a table with over 100 columns. The application is indiscriminate because it has only one rule: if there’s a valid row matching the surrogate key from the third party, that row from the third party must overwrite whatever is in the database. Over the course of just 161 days (less than six months), those 52,000 records have been updated 8.4 million times. That means there are 8.4 million data modifications in the audit table in XML format, at around 4 KB per row, for a table that has over 100 columns. Each of those 8.4 million modifications has also been dutifully recorded in the transaction log. Backups have naturally grown in size accordingly over time. Maintenance plans are taking longer to run. Large parts of the audit table sit in the buffer pool because it is constantly being appended to… you get the idea. At the current rate of growth, the audit table would gain almost 20 million rows per year, at 4 KB per row.
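The back-of-the-envelope arithmetic behind those figures is worth spelling out:

```python
# 8.4 million audited updates over 161 days, at roughly 4 KB of XML each.
updates = 8_400_000
days = 161

per_day = updates / days                          # ~52,000: one per record, nightly
per_year = per_day * 365                          # ~19 million audit rows per year
storage_per_year_gb = per_year * 4 / 1024 / 1024  # at 4 KB per row

print(round(per_day))              # 52174
print(round(per_year))             # 19043478
print(round(storage_per_year_gb))  # ~73 GB of audit XML per year
```

That is on the order of 73 GB of audit XML per year, before you account for the matching transaction log records, the larger backups, and the buffer pool pressure.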
After an in-depth analysis of the audit table and a fairly arduous deduplication of the XML, I identified that only one half of one percent (0.5%) of the 8.4 million rows (around 42,000) were legitimate audit records. The rest were just recording that the same data was being written over itself, over and over again.
While I can obviously only speak from my own experience, it is rare that an UPDATE doesn’t actually change the underlying data, so this is not something I would have considered much prior to this recent discovery. Now I have something extra to consider when designing an auditing system, and I wanted to share it with you so you can get the absolute best use out of your database resources.
Share your thoughts in the comments below.