Joe Obbish wrote an epic post a few weeks ago about loading a trillion rows into a non-partitioned table in SQL Server, using a clustered columnstore index to maximise the compression.
(Short version: it’s very slow to query. Don’t do it. Bad things happen. I have an ongoing investigation with Ewald Cress about the evil wait type involved, which Joe noted in his original post. None of what I do is original.)
So I decided to repeat his experiment, mainly to see if I could. There is no limit to the number of rows you can store in a table in SQL Server (other than available disk space), so I replicated Joe’s setup and … well, the results were practically the same.
Because I’m petty, I inserted one trillion and one rows into my table. I currently hold the record, I guess?
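For context, here is a minimal sketch of the kind of script involved. This is not Joe’s exact code, and not mine either; the table name, batch size, and loop count are illustrative.

```sql
-- A single-column table with a clustered columnstore index. Inserting in
-- batches of 1,048,576 rows (the maximum rowgroup size) lets each batch
-- compress straight into its own rowgroup, and repeating the same value in
-- every row of a batch is why the finished table compresses down to almost nothing.
CREATE TABLE dbo.TrillionRows
(
    ID BIGINT NOT NULL,
    INDEX CCI_TrillionRows CLUSTERED COLUMNSTORE
);
GO

DECLARE @batch BIGINT = 0;
WHILE @batch < 953675 -- 953,675 batches of 1,048,576 rows is just over one trillion
BEGIN
    INSERT INTO dbo.TrillionRows WITH (TABLOCK) (ID)
    SELECT TOP (1048576) @batch
    FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b;

    SET @batch += 1;
END;
```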
My particular instance was an Ubuntu 16.04 LTS virtual machine, running on ESXi 6.5. The VM had 40GB of RAM and eight CPU cores assigned, which meant I was really being unfair on my quad-core Xeon CPU. The version of SQL Server was, naturally, 2017 for Linux, and I was using the latest release candidate, RC2. Max Server Memory was 32GB, Cost Threshold for Parallelism was 50, and MAXDOP was 0.
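In sp_configure terms, that configuration looks roughly like this (a sketch of the equivalent commands, not the exact script I ran):

```sql
-- Instance settings as described above; 32GB expressed in MB.
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sys.sp_configure 'max server memory (MB)', 32768;
EXEC sys.sp_configure 'cost threshold for parallelism', 50;
EXEC sys.sp_configure 'max degree of parallelism', 0;
RECONFIGURE;
```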
Also, I disabled auto-create and auto-update for statistics. In retrospect, this was a mistake, but I live with my mistakes and learn from them, et cetera.
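For reference, turning those off is just two ALTER DATABASE options (the database name here is illustrative):

```sql
-- Disable automatic statistics creation and updates for the test database.
-- In hindsight, leaving these on would have been the wiser choice.
ALTER DATABASE [TrillionTest] SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE [TrillionTest] SET AUTO_UPDATE_STATISTICS OFF;
```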
One observation I have is that Linux appears to be marginally quicker than Windows Server on the same hardware, particularly around disk I/O. I can’t give you meaningful numbers, because my NVMe drive reports zero latency. Literally: at that level, any number greater than zero is an outlier, and I haven’t had time to do a proper statistical analysis.
During the five days it took (I did the trillion-row insert twice, and the second time was much quicker), latency did climb to 6.6 ms for log writes at one stage.
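For anyone wondering where numbers like that come from, average write latency per file is usually derived from sys.dm_io_virtual_file_stats, along these lines (a sketch, not the exact query I was watching):

```sql
-- Average write latency per database log file, in milliseconds.
SELECT
    DB_NAME(vfs.database_id) AS database_name,
    mf.physical_name,
    CASE WHEN vfs.num_of_writes = 0 THEN 0
         ELSE vfs.io_stall_write_ms * 1.0 / vfs.num_of_writes
    END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
WHERE mf.type_desc = N'LOG';
```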
What this really tells me is nothing at all. The trillion-row insert was CPU- and memory-bound, because the high compression of the clustered columnstore index meant very few disk writes were actually performed (notwithstanding the transaction log). Like Joe’s, my table ended up at less than 1GB in size.
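If you want to verify the compressed size for yourself, the columnstore DMVs make it easy (another sketch; dbo.TrillionRows is the illustrative table name from earlier):

```sql
-- Total rows and compressed size across all rowgroups of the table.
SELECT
    COUNT(*) AS rowgroups,
    SUM(rg.total_rows) AS total_rows,
    CAST(SUM(rg.size_in_bytes) / 1024.0 / 1024.0 AS DECIMAL(12, 2)) AS size_mb
FROM sys.dm_db_column_store_row_group_physical_stats AS rg
WHERE rg.object_id = OBJECT_ID(N'dbo.TrillionRows');
```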
On the other hand, I managed a sustained rate of 8.4 million inserts per second, which is quite impressive.
SQL Server is definitely capable of taking a lot of abuse.
If you have any million-million-row tables you want to talk about, let’s chat on Twitter at @bornsql.