A trillion and one

Joe Obbish wrote an epic post a few weeks ago about loading a trillion rows into a non-partitioned table in SQL Server, using a clustered columnstore index to maximise the compression.

(Short version: it’s very slow to query. Don’t do it. Bad things happen. I have an ongoing investigation with Ewald Cress about the evil wait type involved, which Joe noted in his original post. None of what I do is original.)

So I decided to repeat his experiment, mainly to see if I could. There is no limit to the number of rows you can store in a table in SQL Server (other than available disk space), so I replicated Joe’s experiment and … well, my results were practically the same.

Because I’m petty, I inserted one trillion and one rows into my table. I currently hold the record, I guess?
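Joe’s post has the full repro script; purely to sketch the shape of the thing (all object names here are hypothetical, and the single dummy column is an assumption), a non-partitioned table under a clustered columnstore index looks something like this:

```sql
-- Hypothetical sketch: a non-partitioned table with a clustered
-- columnstore index. A single highly repetitive column compresses
-- extremely well, which is why the final table ends up so small.
CREATE TABLE dbo.TrillionRows
(
    Val BIGINT NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_TrillionRows
    ON dbo.TrillionRows;
```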

My particular instance was an Ubuntu 16.04 LTS virtual machine, running on ESXi 6.5. The VM had 40GB of RAM and eight CPU cores assigned, which meant I was really being unfair on my quad-core Xeon CPU. The version of SQL Server was, naturally, 2017 for Linux, and I was using the latest release candidate, RC2. Max Server Memory was 32GB, Cost Threshold for Parallelism was 50, and MAXDOP was 0.
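For reference, those instance settings expressed as T-SQL (Max Server Memory is specified in megabytes):

```sql
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure N'max server memory (MB)', 32768;      -- 32GB
EXEC sys.sp_configure N'cost threshold for parallelism', 50;
EXEC sys.sp_configure N'max degree of parallelism', 0;       -- MAXDOP 0
RECONFIGURE;
```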

Also, I disabled auto-create and auto-update for statistics. In retrospect, this was a mistake, but I live with my mistakes and learn from them, et cetera.
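These are database-level settings (the database name here is hypothetical):

```sql
-- Turning these off was the mistake mentioned above.
ALTER DATABASE [Trillion] SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE [Trillion] SET AUTO_UPDATE_STATISTICS OFF;
```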

One observation I have is that Linux appears to be marginally quicker than Windows Server on the same hardware, particularly around disk I/O. I can’t give you reasonable numbers, because the measured latency on my NVMe drive is zero. Literally zero. At that level, any number greater than zero is an outlier, and I haven’t had time to do a proper statistical analysis.

During the five days it took (I did the trillion-row insert twice, and the second time was much quicker), the latency did climb to 6.6 ms for log writes at one stage.
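If you want to watch log write latency yourself, one way is to derive it from sys.dm_io_virtual_file_stats. Bear in mind the counters are cumulative since instance startup, so a proper analysis needs two samples and a delta:

```sql
-- Average write latency per transaction log file, derived from
-- cumulative counters. A single snapshot is a lifetime average.
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.name AS file_name,
        vfs.io_stall_write_ms * 1.0
            / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id = vfs.file_id
WHERE   mf.type_desc = N'LOG';
```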

What this really tells me is nothing at all. The trillion-row insert was CPU- and memory-bound, because the high compression of the clustered columnstore index meant very few disk writes were actually performed (aside from the transaction log). Like Joe’s, my table was less than 1GB in size.
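Checking the size is a one-liner, using the hypothetical table name from the sketch above:

```sql
-- Reports rows, reserved space, data, and index size for the table.
EXEC sys.sp_spaceused N'dbo.TrillionRows';
```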

On the other hand, I managed to get a sustained speed of 8.4 million inserts per second, which is quite impressive. (At that rate, a single trillion-row pass works out to roughly 33 hours.)
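As an aside, verifying the final count of 1,000,000,000,001 rows with a plain COUNT(*) would be an endurance test of its own; sys.dm_db_partition_stats answers from metadata instead (again, the table name is my hypothetical one):

```sql
-- Row count from metadata, avoiding a full scan of a trillion rows.
SELECT  SUM(ps.row_count) AS total_rows
FROM    sys.dm_db_partition_stats AS ps
WHERE   ps.object_id = OBJECT_ID(N'dbo.TrillionRows')
  AND   ps.index_id IN (0, 1);
```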

SQL Server is definitely capable of taking a lot of abuse.

If you have any million-million-row tables you want to talk about, let’s chat on Twitter at @bornsql.

SQL Server 2017 Announced

By now you will have heard that the next version of SQL Server has been announced. There’s no release date yet, but Brent Ozar seems to think it’ll be before June.

There are many new features, but the biggest deal is that SQL Server 2017 runs on both Windows Server and Linux. Yup, SQL Server 2017 is certified to run on Windows Server, Red Hat Enterprise Linux, Ubuntu, and SUSE Linux Enterprise Server. (You can even run it on macOS in a Docker container.)

There are some big improvements to the Query Optimizer as well, starting with Adaptive Query Processing. It’s like a smart go-faster button, making incremental improvements to your queries the more often they run.
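The Adaptive Query Processing features are tied to the new database compatibility level, so trying them out is a one-line change (database name hypothetical):

```sql
-- Adaptive Query Processing lights up under compatibility level 140.
ALTER DATABASE [YourDatabase] SET COMPATIBILITY_LEVEL = 140;
```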

Despite still being pre-release, SQL Server 2017 is described by Microsoft as production-ready, which means that it should work as expected.

Watch the announcement, and download the latest technical preview to try it for yourself.