A few months ago I suggested that the following settings should be the default for most SQL Server instances:
- Set cost threshold for parallelism to 50
- Disable lightweight pooling if it is enabled
- Disable priority boost if it is enabled
- Set optimize for ad hoc workloads to enabled
- Set max server memory (MB) to a custom value consistent with Jonathan Kehayias’s algorithm
- Set backup compression default to enabled
- Set the power saving settings on Windows to high performance if possible
- Provide an option to flush the plan cache as needed
Over the next few posts I will dive into the why. This week we will start with cost threshold for parallelism.
Cost threshold for parallelism
Modern Central Processing Units (CPUs) often have more than one CPU core on the physical processor. In principle, this allows certain tasks to be executed across the cores in parallel, to reduce the overall amount of time it takes to perform those tasks.
Note: Multiple physical cores are different to simultaneous multithreading (Intel calls this Hyper-Threading), which splits up a physical core into two virtual cores. SMT virtual cores share execution engines, so the performance gain is only between 15% and 30% over a single-core CPU. Also note that SMT may be affected by Spectre, Meltdown and Foreshadow security vulnerabilities.
SQL Server uses a cost-based query optimizer to come up with a query plan. The query “cost” is a sum of all of the operations that the optimizer estimates it would take to perform the query. After considering thousands of options in a few microseconds, the database engine will pick the query plan with the lowest cost. This cost is a unitless measure that does not map directly to time or system resources.
The query optimizer will take multiple cores into account (called parallel processing) when searching for a query plan with the lowest estimated cost to choose for a given query. The cost threshold for parallelism setting tells the optimizer at which point parallelism should be considered. If the estimated cost for a single-core plan exceeds this setting, then the query optimizer will come up with plans that use more than one CPU core.
This can be an expensive proposition, because each CPU core has its own thread which must be monitored and managed, requiring an additional set of resources (memory and CPU) to keep track, so the parallel plan is not always the one with the lowest cost.
Over the years, CPUs have become exponentially faster. There are fundamental issues with this. For starters, any clock that ticks over billions of times a second is going to need a lot of energy to operate. Additionally, CPUs are prone to overheating as they approach 4 GHz (4 billion on-off cycles per second). To work around these physical limitations, CPU manufacturers added SMT (Hyper-Threading), multiple physical cores, and added options to reduce performance in order to conserve energy and lower cooling costs.
This has an unfortunate side-effect in that single-core performance is generally faster than multi-core performance. In other words, it takes a lot of energy to power two or more cores running at the same speed as a single core, plus they get really hot and might melt. To bring it back to SQL Server, queries may well run faster on a single core up to a certain point, because the query optimizer has no way to measure single-threaded and multi-threaded performance, nor does it care about the difference between logical and physical cores.
That certain point where single-threaded queries will no longer be faster is the cost threshold for parallelism. Unfortunately, the default setting of 5 means that queries are likely to hit that threshold more often than not on modern hardware, and the optimizer is forced to look at parallel plans unnecessarily. A side-effect to this is that queries running in parallel will block queries running on a single thread, so short-running queries will be delayed by long-running queries needing the same resources.
Therefore, if I am setting up a SQL Server instance from scratch, I will set this value to 50 by default, and monitor the performance counters. Once a query cost exceeds 50, then it can use all the cores up to the limit defined by the max degree of parallelism.
Max degree of parallelism
SQL Server over the years has been improved to recognize modern hardware, including multiple physical CPUs and multiple CPU cores (both physical and virtual) on each physical processor, with some considerations:
- If the motherboard has more than one CPU processor socket, are these sockets populated?
- If these multiple processors are populated, do they address their own banks of main memory (NUMA), or is the main memory shared between processors (SMP)?
NUMA (Non-Uniform Memory Access) is an architecture where each physical processor and bank of RAM (Random Access Memory) is a NUMA node, and addresses its own set of memory. Any memory access outside of that local NUMA node is expensive (i.e. slow) because electrons carrying data to and from memory and CPU have farther to travel.
SMP (Symmetric Multi-Processing) architecture where memory is shared between all CPUs does not scale well, so modern servers with multiple physical processors are predominantly NUMA-based.
Bringing it back to the query optimizer, we want SQL Server to avoid using CPU cores that are not part of a NUMA node, so the max degree of parallelism should be set to the number of physical cores on a single NUMA node, up to a maximum of 8.
What about virtualisation?
All of the above assumes physical hardware, but many organisations today are running on virtual machines (VMs). When CPUs are virtualised, there are a few things to take into account.
Firstly, check that SMT (Hyper-Threading) is disabled at the BIOS level. While some people may not be comfortable losing between 15% and 30% in raw processor performance, the abovementioned security vulnerabilities are especially risky on shared resources, such as a VM host.
Secondly, there are SQL Server licencing considerations if the VM host is not licenced with Enterprise Edition, namely that all CPU cores count towards the Standard Edition CPU core count limit, be they virtual cores or physical. If you enable SMT on a machine with 16 cores and install SQL Server Standard Edition, only the first 24 of the 32 logical cores will be used, which will have a detrimental effect on performance.
Thirdly, the virtual CPU configuration for a VM guest should be 1:1 to the physical core count of the underlying host. Any skew in this mapping will also have a detrimental effect on performance.
What about SQL Server on Linux?
For these particular settings, there is no difference between SQL Server running on Windows and Linux.
Share your thoughts about parallelism and your SQL Server settings for it in the comments below.