Performance as a feature, software developer edition

Relational database management systems (RDBMS) like SQL Server and Azure SQL Database are very good at managing normalized data. Efficient storage and retrieval of data is the name of the game, so performance is a feature. That’s why SQL Server (and other RDBMS products in the market) keep as much of the data in memory as needed, and why more memory is usually a good thing for getting our data out faster.

Unfortunately, having lots of memory and CPU capacity does not cover all scenarios of poor performance in the real world.

The case of the .NET application

Let’s say that we have an application that needs to retrieve 50,000 rows from a table for a month-end report. The table has been designed correctly: it is narrow; it has a unique, ever-increasing integer for its Primary Key; and the table is clustered on that column. So far, so good.

Our application can fetch this data in many ways, but the developer made a call during the design phase several years ago that memory was at a premium on the client side, and asking for too much data at the time might cause the application to crash due to resource constraints. As a result, the application asks for one row at a time from the database engine. Jeff Moden calls this “RBAR,” or “row-by-agonizing-row,” because SQL Server is specifically engineered to bring back data in sets.

In our baseline of this report generation, we find that it takes about an hour — sometimes longer — to display the data in a grid and generate the associated report in HTML.

Was the RBAR design decision several years ago the correct one in retrospect? Looking at the source code which is written in one of the .NET languages, we see a SqlDataReader object iterating over each row from the query (“SELECT * FROM table“) and doing something with each row. Clearly this is the bottleneck, because we see from SQL Server that it is basically sitting idle, delivering one row at a time.

So we branch the the code in source control, and start writing a helper class to manage the data for us closer to the application. We throw in a SqlDataAdapter, use the Fill() method to bring back all the rows from the query in one go, and then use a caching layer to keep it in memory in case we need it again. SQL Server’s part in this story has now faded into the background. This narrow table consumes a tiny 8 MB of RAM, and having two or more copies in memory isn’t the end of the world for our testing. So far, so good again.

We run the new code, first stepping through to make sure that it still does what it used to, massaging here and there so that in the end, a grid is populated on the application with the results of the query. Success! We then compile it in Release mode, and run it without any breakpoints to do some performance testing.

And then we find that it runs at exactly the same speed to produce exactly the same report, using our caching and SqlDataAdapter, and we’ve wasted another hour of our time waiting for the grid and report. Where did we go wrong?

Premature optimization

So we crack open the code again with a code profiler, and take a closer look at what happens after the data is retrieved from the database. We see the grid population is a lot quicker than it used to be, saving two minutes. Good job! The caching layer added a few milliseconds, but nothing worth worrying about, relatively speaking.

Our attention moves to the HTML report generation which happens after the grid is populated.

There's your problem

The table from SQL Server is placed into an array, which the report generator loops through, doing HTML things like adding tags and so on, building up a String object for each loop iteration.

One of the first things I learned as a software developer (we can’t legally call ourselves “engineers” in Canada) is that strings are immutable in .NET (and many other programming languages). This means that a string in memory cannot be modified. If you add or change a string, a copy is made of that string in memory, and the copy is what you then work with. In memory-managed languages like .NET and Java, a garbage collector will remove the old copies of the string. This can severely impact performance, especially if we are building up what amounts to 50,000 strings in a short amount of time, each one progressively longer than the last.

Mutability for the win

.NET has a mutable string object called StringBuilder, which was created for this kind of operation. If you are doing concatenation of strings in a loop, the StringBuilder object is far more efficient because of how it allocates memory.

Once we replace the string concatenation with a StringBuilder.Append(), we find that the report generation happens in under a minute.

Summary

SQL Server is really good at producing sets of data. .NET and other software development platforms are really good at consuming that data. And then we find that some languages don’t work the same way as others, especially at scale. We assumed the performance issues were related to SQL Server and RBAR. While significant, they were not the root cause of the slow application.

The moral of the story is that there is often not one single cause of slow performance. In this case, adding the caching layer, bringing back the full dataset, and optimizing the string concatenation portion all contributed to making the application much faster, at a cost of slightly more memory. In fact, the caching layer could probably be removed if the data is not queried repeatedly, or it changes often.

Share your premature optimization stories with me on Twitter at @bornsql.

P.S. On July 1st, I was renewed as a Microsoft Data Platform MVP. Thank you, Microsoft.

2 thoughts on “Performance as a feature, software developer edition”

Sean Redmond July 11, 2018 at 8:21 am

Do I understand this correctly, a developer is pulling 50,000 rows using Entity Framework and displaying the lot on a webpage? This sounds like madness.

If this is a regular occurrence, why wouldn’t he use a stored-procedure, cast the non-strings as strings, add the HTML-elements as necessary and serve the lot straight to the front-end? 60s seems like a long time for this. The SP will need maybe hundreds of milliseconds, depending on the complexity of the task.

I understand that I am breaking a cardinal rule (that the application server should do the application server’s job) but the application server does seem slow if it needs 60s for this.
1. randolph July 11, 2018 at 9:30 am
  
  I was intentionally vague on details. There was no mention of specific technology aside from .NET because the specifics don’t matter. Thanks for your comment.

Comments are closed.