Don’t optimize for ad hoc workloads as a best practice

a coffee filtering process

(This post was co-authored by Erik Darling.)

The more things stay the same, the more they change…

No, that’s not a mistake. In fact, it’s a reference to long-held belief systems that don’t take new information into account, and how confirmation bias is not a good motivator for recommending best practices. Let’s talk about the SQL Server configuration setting “optimize for ad hoc workloads,” first introduced in SQL Server 2008.

Before we talk about what it does, let’s break it down. The setting — which is off by default — suggests that there’s a situation where ad hoc workloads might need some optimization.

So, what is an ad hoc workload?

I’m glad you asked, Internet friend. Ad hoc is a Latin term which means “impromptu,” or “when needed.” In the context of SQL Server, an ad hoc workload has an implied meaning, and that is “non-parameterized queries.” Where this might become a problem is when we have many thousands of queries that are the same shape (meaning they query the same table or tables) but have different parameters (in other words, the conditions of the WHERE clause might be slightly different).

This is a common issue with ORMs (object-relational mapping tools such as Entity Framework) because a query is submitted as a string and the query optimizer must parse it out. Change one character, or add in a space or a line break, and the query is considered different, even if it is identical to the previous one.

So yes, it is possible to have tens of thousands of queries that are similar enough, but that are cached individually in the plan cache. We call this “plan pollution” because what we want to see are well-parameterized queries! We want to see plan reuse! Memory is expensive!

There is a commonly held belief amongst my peers, a belief I held until very recently, that there is a way to mitigate plan pollution quickly and easily by turning on “optimize for ad hoc workloads.” What that does is turn all single-use queries into query stubs in the plan cache. If the query runs a second time, only then is the full plan cached! That’s so clever. Imagine the amount of expensive memory all those tens of thousands of single-use plans are no longer using. It’s an immediate win! After all, there’s very little evidence that turning this setting on is bad, right?

Enter Erik Darling:

As I noted previously, I’ve recently started working at Microsoft on the Database Docs team. I’m also a co-author on two SQL Server Administration Inside Out (Microsoft Press) books, and assuming contracts are in place soon, we’ll be starting on the third book in this series.

During my onboarding experience I have been watching training videos about bias. While the thrust of the training is more about how I treat people I work with, one of the things that Erik’s tweet demonstrated is that my only evidence of this setting ever working is confirmation bias. I have not extensively tested it since it was originally introduced in SQL Server 2008. I have been telling people all over the world that it’s a net win when we turn on this setting. It even says that in the two books. I know because I wrote those chapters.

My confirmation bias overrode a pretty simple way to test that this setting is actually problematic. I haven’t measured its effect lately. How do I have the temerity, the audacity, to encourage people to use SQL Server 2019 and the upcoming SQL Server 2022 if I’m still stuck in the past?

After speaking with Erik (who kindly contributed to this post), I’m changing my guidance about “optimize for ad hoc workloads” to the following:

If there is a measurable difference in memory pressure and performance by enabling “optimize for ad hoc workloads”, then turn it on. Otherwise leave it off. Consider using “forced parameterization” instead.

But you’re not here for that guidance. You like to know the why and wherefore, so let’s go a little deeper:

  1. Turn on Query Store. It’s going to make your lives much easier when troubleshooting performance issues. I swear by it, but we should take care in configuring it properly. There’s some great guidance from Erin Stellato if you haven’t touched it since 2016.
  2. If there are tens of thousands of single-use queries, then parameterize them so that we can help the query optimizer reuse existing plans. For systems with ORMs, or those vendor systems where the query text can’t be modified, that means enabling “forced parameterization.”
  3. Yes, as Microsoft Docs points out, “forced parameterization can cause inappropriate plan sharing in cases where a single execution plan does not make sense,” and this is where the Query Store and plan guides come in. If you’re on Azure SQL Database or Azure SQL Managed Instance, you can even use Query Store hints! (It’s likely that this functionality will be coming to SQL Server 2022, based on Microsoft’s cloud-first strategy.)

Bottom line: forced parameterization does what you think optimize for ad hoc workloads is supposed to do, but it’s still not magic.

I get it. We’ve been telling folks for years that parameter sniffing can be a problem; we try so many ways to mitigate it, and therefore you’d think we’d want to avoid forced parameterization and find another magic bullet, but… SQL Server Enterprise Edition and batch mode on rowstore says we don’t need to worry about that anymore.

Here are some reasons why “optimize for ad hoc workloads” is not helpful in the year 2022:

  • Workload analysis is all but impossible because poorly performing “single-use” query plans aren’t being cached
  • As hinted at previously, if the same query is running poorly, you’ll have more than one stub and still no proper plan
  • Full plans are still being compiled for these “single-use” queries, even though they’re not being cached
  • Stubs still count towards the total number of plans you can have cached, and large plans are rare, even with ORMs
  • SQL Server is 64-bit now, and user-space memory is not limited to some value between 2 and 3 GB

“Optimize for ad hoc workloads” often hides the problem we’re trying to solve for — namely unparameterized queries — because a lot of us don’t analyze the plan cache looking for duplicate plans. Instead, we tend to ignore results with no query plan associated, because those stubs don’t show up. And, when we end up with “the same” unparameterized query running repeatedly, we might miss out on aggregate resource usage for it unless we group by a hash or handle.

I guess what I’m saying is, it depends, and that’s exactly why we are supposed to test these things out. That includes testing forced parameterization!

For instance, Michael J Swart wrote a blog post recently about measuring cost threshold for parallelism, another setting we seem to set to “50” and then move on. Read it and understand why this setting matters, especially in your environment.

There are no hard and fast rules about SQL Server. If something works beautifully in my environment then it works beautifully in my environment, but there’s no guarantee it will work in yours. That guarantee, that “best practice,” should always be limited in scope. I’m guilty of it. So are you. We provide reams of advice about managing database environments, but the most important piece of advice should be “test it, measure it.”

This is fundamental stuff. We talk about baselines, about performance metrics, about best practices, but we are all biased in some way, and it bears repeating that we will only know for sure if it’s the right decision once we’ve tested it ourselves.

Share your long held but incorrect “best practices” in the comments below.

Photo by Nathan Dumlao on Unsplash.

8 thoughts on “Don’t optimize for ad hoc workloads as a best practice

  • I am guilty of the ad-hoc workflows setting. Thankfully, I haven’t turned it on in every system, I just turned it on where I saw a lot of single use queries. I am one of those who is hesitant to change anything on live until I confirm there is minimal impact on a test system. And even then, I am sometimes hesitant to do the change on live if I don’t see a benefit.
    I did turn on ad-hoc workloads on our ERP system’s database, but that thing has a lot of single use queries baked in that would not benefit from forced parameterization (I believe… I would need to investigate, review, and test to confirm) because it creates new temporary objects with random names (look like GUID’s but a bit shorter) when it does its work. So since the WHERE may be the same (slight differences as it would be for different items or customers, but sometimes the same) but the FROM or INSERT is different in most of the queries, I don’t think forced parameterization would help. Now, that being said, I have not tested it and confirmed this, it is just what I observed from profiler (I know, use extended events… Profiler is just a lot faster to set up and I have templates in place for monitoring our ERP that I set up back when it was on SQL Server 2008 R2… which sadly wasn’t that long ago).
    Other “best practices” I know are that you should bump up the cost threshold for parallelism, but I’ve never seen a good method to calculate out a “magic number”… it’s always trial and error suggestions. Another best practice I’ve read was to not use cursors unless it is the only option. That one I agree with.
    I also have not found a good best practice for a temp table vs a table variable. I’ve read that number of rows is a factor you can use, but the value for that seems to change depending on where I look.
    One best practice I am aware of that I don’t follow is that SSRS, SSIS, and the SQL instance should be on separate boxes. I agree this will give you the most resources (memory, cpu, etc) for each service, but it also jumps up the licensing cost.
    Another best practice that we don’t follow is that we have mixed mode for login enabled rather than Windows only. The reason being we have SOME tools that do not support AD authentication to the SQL instance and we have SOME PC’s that are not domain joined connecting to the SQL instance. Without being domain joined, the connection is not trusted and the connection is refused.

    Lastly, avoiding NOLOCK is a “best practice” I have read about and I follow. For pretty much every query I have come across where NOLOCK looks like the solution, I’ve found alternate solutions that sometimes even resulted in a performance boost. Had one query that would run in about 2-3 seconds most of the time, but every now and then (thinking like 2-3 times a year when this was run daily) would run for about 8 hours. Further investigation of it, it seemed like when it would run slow, it was due to parallelism and a mix of the server not having enough free cores to run the threads in parallel and the threads doing some self-blocking resulting in the parallel threads essentially needing to run in series. Quick fix prior to investigation – add NOLOCK which resolved the problem. Further investigation resulted in adding a MAXDOP 1 hint which reduced the “good” run time down to 1-2 seconds and further fixing was done by tweaking cost threshold for parallelism to be just above what this query was asking for. I think it worked out to be setting it to 75-ish if I remember right.

    I’ve also read that turning Auto-Shrink on is a best practice. I’m joking about that one! Best practice is to have that turned off. BUT, as the option is available, I expect there is some use case where it is recommended to have it on. I can’t think what it would be or when you would want that turned on, but since the option is available, it MUST have a use case.

  • Confirmation bias is the major contributor, but there are other contributors. The next largest contributor, in my opinion, is guidance from documentation is typically poor. The feature optimize for adhoc workloads is a great example of this problem. After reading this article, I went back to Microsoft’s documentation to see if good guidance exists. It still does not. The key points I need to know, not just for this setting, but for any setting, is how do you determine/measure the need and how to determine/measure that a setting is causing problems. It must be specific. Stating to enable adhoc workloads if single use plans are high is vague. How do you define high?

    I do not expect Microsoft to have good guidance for every feature upon release. However, I would expect a feature to have good guidance after a few years of a feature exiting. The adhoc workloads feature has existed for over a decade, but the guidance has not improved.

    There are many great bloggers that fill in these guidance gaps. The best ones include their methodology in detail. This provides me the information to determine if I may be able to make a change that can have a positive impact to the servers I am responsible for. Randolph West and Kendra Little are two great bloggers that routinely include this kind of detail in their articles, who both happen to recently start working at Microsoft. I am looking forward to their skills being reflected in Microsoft’s documentation!

    • Robert, as a new employee with Microsoft, I will take this response under advisement and see what we can do about clarifying that page. No promises on when!

  • I was someone who went with Optimize for Ad Hoc Workloads being ON from either SQL 2008R2 or SQL 2012. I also followed Kim Tripp’s https://www.sqlskills.com/blogs/kimberly/plan-cache-adhoc-workloads-and-clearing-the-single-use-plan-cache-bloat/ and then added Jeff Iannucci’s https://desertdba.com/removing-ad-hoc-plans-from-query-store/ to keep the Plan Cache clean.

    Besides your blog I now see Pinal’s https://blog.sqlauthority.com/2017/10/20/sql-server-turn-optimize-ad-hoc-workloads/ This gives some guidance as to when to use it.

    For Brian I suggest you look at https://michaeljswart.com/2022/01/measure-the-effect-of-cost-threshold-for-parallelism/ Michael is trying to find a method to see what effect any change makes.

    • This is exactly why I LOVE the SQL community – we help each other out!
      I was looking at that blog post that Chris posted and now I have some method to actually measure what I should set the value to rather than just picking a query that went parallel and setting the value to something higher.
      MOST of our transactions are on OLTP databases, so we want mostly single threaded stuff running on them. I could probably just crank the value to some insane number like 10,000 and MOST of my transactions would be happy… but there are always the few that would be hurt and those few may be the loudest complainers.
      BUT I now have some method to review and analyze what I should set it to! Thanks a bunch Chris!

  • We also need to stop having the confirmation bias about OLTP systems doing only lightweight queries. This is a myth long gone, yet we still keep repeating that mantra. In order to cope with business requirements, which are (for valid reasons) more and more complicated, analytical queries are common in OLTP systems. For example in warehouse or shipment division, such as: “if there’s still enough empty space on this pallet (volume-wise and weight-wise) then put my box on it otherwise create new pallet”, or “if this pallet will fit on the allocated truck that goes in 2 days to this region, then use it else allocate new truck and put this pallet as the first one in it”, or “how many boxes do I need for this warehouse worker to put up goods from this picking task into”, or “of all the toothpastes with code xyz prepare me the list of locations where 100 pieces are located, which are needed for 5 orders, which the warehouse worker should pick. Order them by expiration date, unless they’re positioned in a place difficult to reach, then skip them and take newer ones”. I can go on for hours. OLTP systems DO issue heavy queries ALL THE TIME. Or you work with unrealistically simple stuff.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: