Be careful with table updates


The dire warning in the subject line is not meant to scare you. Rather, it is advice that will be useful to those of us who need to audit changes to a database. It is even more relevant now as SQL Server 2022 approaches, with its new system-versioned ledger tables, which record everything.

As we already know because we are smart people, when we tell SQL Server to update zero or more rows, it will do so exactly as instructed. What we might not know is that if we run an UPDATE command against a table (or view) where the data is not actually changing, the database will still perform the update. In other words, it does exactly what you asked it to do. Rather than acting as the arbiter of what amounts to a business rule, SQL Server will modify the data if it can, even if it replaces the existing value with the same value, and a transaction log record is generated.
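To see this in action, here is a minimal sketch (the table and values are made up for illustration):

```sql
-- Hypothetical table; the row already contains the value we
-- are about to "change" it to.
CREATE TABLE dbo.Customer
(
    CustomerID int PRIMARY KEY,
    City       varchar(50) NOT NULL
);
INSERT dbo.Customer (CustomerID, City) VALUES (1, 'Calgary');

-- SQL Server performs the write anyway: @@ROWCOUNT reports 1,
-- and a transaction log record is generated for the update.
UPDATE dbo.Customer SET City = 'Calgary' WHERE CustomerID = 1;
SELECT @@ROWCOUNT AS RowsAffected;  -- 1, even though nothing changed
```

The engine does not compare old and new values before writing; as far as it is concerned, one row was updated.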

Let’s say we have a policy that all modifications to our database must be recorded. This can be achieved with temporal tables, triggers, stored procedures, ledger tables, or a combination of these options. Unless you specifically write custom code to check whether the data being modified has actually changed from what is already in the table, the update will occur, and transaction log records will be generated. With temporal tables and ledger tables in particular, everything that can be recorded will be committed to the transaction log and history tables.
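With a system-versioned temporal table, for example, even a no-op update versions every affected row. A hedged sketch (the table and columns are illustrative, not from any real system):

```sql
-- A system-versioned temporal table: SQL Server maintains the
-- period columns and the history table automatically.
CREATE TABLE dbo.Account
(
    AccountID int PRIMARY KEY,
    Balance   money NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.AccountHistory));

-- A "no-op" update: every affected row still gets a new version,
-- and the superseded copy is written to dbo.AccountHistory.
UPDATE dbo.Account SET Balance = Balance;
```

Run that nightly against every row and the history table fills up with records of nothing changing.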

Just how bad might it be? Let me give you a real-world example of an auditing process I’ve been investigating for some time now in order to improve performance and resource management. All data modifications are performed through stored procedures, and these stored procedures are, for the most part, generated using a code generator. Update and Delete procedures all follow the same pattern: take the old row, write it to the audit table in XML format, and update the table with the new row.
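The generated pattern looks roughly like this (a hedged sketch; the table, column, and audit-table names are made up, not the customer’s actual code):

```sql
CREATE PROCEDURE dbo.Customer_Update
    @CustomerID int,
    @City       varchar(50)
AS
BEGIN
    SET NOCOUNT ON;

    -- Step 1: capture the old row as XML and write it to the audit table,
    -- whether or not anything is about to change.
    INSERT dbo.AuditLog (TableName, RowData, AuditDate)
    SELECT 'Customer',
           (SELECT c.*
            FROM dbo.Customer AS c
            WHERE c.CustomerID = @CustomerID
            FOR XML PATH('row')),
           SYSUTCDATETIME();

    -- Step 2: overwrite the row with the incoming values.
    UPDATE dbo.Customer
    SET City = @City
    WHERE CustomerID = @CustomerID;
END;
```

Note that nothing in this pattern asks whether `@City` differs from the value already stored.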

Unfortunately, the generated code does not account for rows that aren’t changing. For example, a process runs once per evening in which 52,000 records for this customer are validated against a third party, on a table with over 100 columns. The application is indiscriminate because it only has one rule: if there’s a valid row matching the surrogate key from the third party, that row from the third party must overwrite whatever is in the database. Over the course of just 161 days (less than six months), those 52,000 records have been updated 8.4 million times. That means there are 8.4 million data modifications in the audit table in XML format (around 4KB per row), for a table that has over 100 columns. Each of those 8.4 million modifications has also been dutifully recorded in the transaction log. Backups have naturally increased in size over time. Maintenance plans are taking longer to run. Large parts of the audit table sit in the buffer pool because it is constantly being appended to… you get the idea. At the current rate of growth, the audit table will grow by almost 20 million rows per year, at 4KB per row.

After an in-depth analysis of the audit table and deduplication of the XML through a fairly arduous process, I identified that only one half of one percent (0.5%) of the 8.4 million rows were legitimate audit records. The rest merely recorded that the same data had been written, over and over again.
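One common way to skip these no-op writes (a general pattern, not necessarily how this particular system was fixed) is a NULL-safe comparison using INTERSECT in the update procedure:

```sql
-- Only update (and therefore only audit) when at least one column
-- value actually differs. INTERSECT treats two NULLs as equal, so
-- this avoids tedious column-by-column ISNULL logic.
UPDATE c
SET City   = @City,
    Region = @Region
FROM dbo.Customer AS c
WHERE c.CustomerID = @CustomerID
  AND NOT EXISTS
      (SELECT c.City, c.Region
       INTERSECT
       SELECT @City, @Region);
```

If the incoming values match the stored row exactly, zero rows are affected and nothing new lands in the transaction log or the audit trail.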

While I obviously can only speak from my own experience, it is rare for a table update not to actually change the data, so this is not something I had given much thought to before this recent discovery. Now I have something extra to consider when designing an auditing system, and I wanted to share it with you to ensure you get the absolute best use of your database resources.

Share your thoughts in the comments below.

Photo by Markus Winkler on Unsplash.

2 thoughts on “Be careful with table updates”

  • That is a really interesting find and a really weird “feature”. I do get it from a coding perspective though on the SQL engine side of things – if user X says to update A to A, I will update it. Why add the extra logic layer into the SQL engine when it could just as easily be validated in the stored procedure side of things.
    But then again, why bother worrying about it in the stored procedure because changing A to A doesn’t “harm” the data in any way (unless you have a calculated “last modified” column, or any of the things you mentioned above) in which case things can go sideways fast.
    It is definitely not something I had thought of before but is now on my radar. I have a service broker thing set up that watches for changes in a few specific tables and if the data changes (ie trigger gets fired), a service broker message goes out telling other systems that the value changed and then the other systems update their data accordingly. I imagine this is being hit by the same thing you mentioned and I just haven’t noticed because we don’t have a lot of messages going out. But, like with your example, Service Broker uses XML for the messages and we keep a history so we have an audit trail of when values should have changed, so our audit table gets large quickly. The audit table (thankfully) isn’t used by auditors, just internal for when end users say “the data is out of sync” or “the data looks wrong”, then I can jump in and see if a message got stuck, processed out of order, or is in the queue and the end user needs to wait a second or 2 and the data will be good. USUALLY it is just the end user needed to be a bit more patient.

    I now have a new thing to check out and probably modify. Downside is this is now a bit of work to dig through the code… plus side is that the performance of our Service Broker should improve as we will have fewer messages jumping around. The fun part is going to be adding logic to the trigger without causing a performance hit… The system that is sending out the most messages is also the most fussy and gets grumpy when things don’t go how it expects them to…
