(This post was co-authored by Erik Darling.)
The more things stay the same, the more they change…
No, that’s not a mistake. In fact, it’s a reference to long-held belief systems that don’t take new information into account, and how confirmation bias is not a good motivator for recommending best practices. Let’s talk about the SQL Server configuration setting “optimize for ad hoc workloads,” first introduced in SQL Server 2008.
Before we talk about what it does, let’s break it down. The setting — which is off by default — suggests that there’s a situation where ad hoc workloads might need some optimization.
So, what is an ad hoc workload?
I’m glad you asked, Internet friend. Ad hoc is a Latin term which means “impromptu,” or “when needed.” In the context of SQL Server, an ad hoc workload has an implied meaning, and that is “non-parameterized queries.” Where this might become a problem is when we have many thousands of queries that are the same shape (meaning they query the same table or tables) but have different parameters (in other words, the conditions of the
WHERE clause might be slightly different).
This is a common issue with ORMs (object-relational mapping tools such as Entity Framework) because a query is submitted as a string and the query optimizer must parse it out. Change one character, or add in a space or a line break, and the query is considered different, even if it is identical to the previous one.
So yes, it is possible to have tens of thousands of queries that are similar enough, but that are cached individually in the plan cache. We call this “plan pollution” because what we want to see are well-parameterized queries! We want to see plan reuse! Memory is expensive!
There is a commonly held belief amongst my peers, a belief I held until very recently, that there is a way to mitigate plan pollution quickly and easily by turning on “optimize for ad hoc workloads.” What that does is turn all single-use queries into query stubs in the plan cache. If the query runs a second time, only then is the full plan cached! That’s so clever. Imagine the amount of expensive memory all those tens of thousands of single-use plans are no longer using. It’s an immediate win! After all, there’s very little evidence that turning this setting on is bad, right?
Enter Erik Darling:
and a jolly good day to everyone except people who pushed optimize for ad hoc workloads as a best practice for 15 years pic.twitter.com/waz2Q1zRuM
— Darling Data (@erikdarlingdata) January 27, 2022
As I noted previously, I’ve recently started working at Microsoft on the Database Docs team. I’m also a co-author on two SQL Server Administration Inside Out (Microsoft Press) books, and assuming contracts are in place soon, we’ll be starting on the third book in this series.
During my onboarding experience I have been watching training videos about bias. While the thrust of the training is more about how I treat people I work with, one of the things that Erik’s tweet demonstrated is that my only evidence of this setting ever working is confirmation bias. I have not extensively tested it since it was originally introduced in SQL Server 2008. I have been telling people all over the world that it’s a net win when we turn on this setting. It even says that in the two books. I know because I wrote those chapters.
My confirmation bias overrode a pretty simple way to test that this setting is actually problematic. I haven’t measured its effect lately. How do I have the temerity, the audacity, to encourage people to use SQL Server 2019 and the upcoming SQL Server 2022 if I’m still stuck in the past?
After speaking with Erik (who kindly contributed to this post), I’m changing my guidance about “optimize for ad hoc workloads” to the following:
If there is a measurable difference in memory pressure and performance by enabling “optimize for ad hoc workloads”, then turn it on. Otherwise leave it off. Consider using “forced parameterization” instead.
But you’re not here for that guidance. You like to know the why and wherefore, so let’s go a little deeper:
- Turn on Query Store. It’s going to make your lives much easier when troubleshooting performance issues. I swear by it, but we should take care in configuring it properly. There’s some great guidance from Erin Stellato if you haven’t touched it since 2016.
- If there are tens of thousands of single-use queries, then parameterize them so that we can help the query optimizer reuse existing plans. For systems with ORMs, or those vendor systems where the query text can’t be modified, that means enabling “forced parameterization.”
- Yes, as Microsoft Docs points out, “forced parameterization can cause inappropriate plan sharing in cases where a single execution plan does not make sense,” and this is where the Query Store and plan guides come in. If you’re on Azure SQL Database or Azure SQL Managed Instance, you can even use Query Store hints! (It’s likely that this functionality will be coming to SQL Server 2022, based on Microsoft’s cloud-first strategy.)
Bottom line: forced parameterization does what you think optimize for ad hoc workloads is supposed to do, but it’s still not magic.
I get it. We’ve been telling folks for years that parameter sniffing can be a problem; we try so many ways to mitigate it, and therefore you’d think we’d want to avoid forced parameterization and find another magic bullet, but… SQL Server Enterprise Edition and batch mode on rowstore says we don’t need to worry about that anymore.
Here are some reasons why “optimize for ad hoc workloads” is not helpful in the year 2022:
- Workload analysis is all but impossible because poorly performing “single-use” query plans aren’t being cached
- As hinted at previously, if the same query is running poorly, you’ll have more than one stub and still no proper plan
- Full plans are still being compiled for these “single-use” queries, even though they’re not being cached
- Stubs still count towards the total number of plans you can have cached, and large plans are rare, even with ORMs
- SQL Server is 64-bit now, and user-space memory is not limited to some value between 2 and 3 GB
“Optimize for ad hoc workloads” often hides the problem we’re trying to solve for — namely unparameterized queries — because a lot of us don’t analyze the plan cache looking for duplicate plans. Instead, we tend to ignore results with no query plan associated, because those stubs don’t show up. And, when we end up with “the same” unparameterized query running repeatedly, we might miss out on aggregate resource usage for it unless we group by a hash or handle.
I guess what I’m saying is, it depends, and that’s exactly why we are supposed to test these things out. That includes testing forced parameterization!
For instance, Michael J Swart wrote a blog post recently about measuring cost threshold for parallelism, another setting we seem to set to “50” and then move on. Read it and understand why this setting matters, especially in your environment.
There are no hard and fast rules about SQL Server. If something works beautifully in my environment then it works beautifully in my environment, but there’s no guarantee it will work in yours. That guarantee, that “best practice,” should always be limited in scope. I’m guilty of it. So are you. We provide reams of advice about managing database environments, but the most important piece of advice should be “test it, measure it.”
This is fundamental stuff. We talk about baselines, about performance metrics, about best practices, but we are all biased in some way, and it bears repeating that we will only know for sure if it’s the right decision once we’ve tested it ourselves.
Share your long held but incorrect “best practices” in the comments below.