98% of I/O is taken up by tempdb

During a recent engagement, the customer explained that the performance on their relatively small system was especially slow on Monday morning, but generally bad that entire day.

The SQL Server 2008 R2 instance, which runs in a virtual machine, has 30GB of RAM allocated, but the SQL Server memory usage never exceeds 16GB.

Digging deeper, we discovered that 98% of all I/O was consumed by tempdb, described by Brent Ozar as the public toilet of SQL Server.

The system makes extensive use of temporary tables and table variables in its queries.

The tempdb data file hovers around 2.5GB, and the log file is around 1.5GB. Certainly not a huge database, and puzzling why this small company is experiencing performance issues.

It all fits in memory! Which is fast!

We noticed that the SQL Server instance had been restarted the previous Sunday, as part of a maintenance cycle, and that the file allocation for tempdb was set to the defaults:

  • 8MB for data file, 10% growth
  • 1MB for log file, 10% growth

In other words, no matter how big the tempdb database was before the restart, it would always shrink down to that tiny size, needing to zero out the log file when it grew again.

To get to the size it was today, tempdb had to grow the data file 60 times and the log file 77 times. Each time it took 10% longer than before to grow, requiring 10% more space.

Of note is that SQL Server does not have any global trace flags enabled, in particular, TF1118.

Every half hour, a virtual machine snapshot is taken, freezing the I/O on the server as well.

Our resolution was as follows:

  • Set Windows Power Saving Mode to High Performance
  • Increase tempdb data files to match number of CPU cores
  • Change tempdb default file sizes to the current usage
  • Change tempdb growth to fixed value, appropriate for I/O subsystem
  • Shrink and resize tempdb log to current size
  • Enable Trace Flags 1118 and 1117 to improve data file contention
  • Switch from VM snapshots to Full Recovery Mode and enable transaction log backups

We will have to wait to find out what impact these changes had on the environment over the course of a week. I’ll report back next time.

%d bloggers like this: