I don’t blog anymore, but not because I work at Microsoft

Last year I had a health scare that sent my blood pressure to 200-and-something over 100-and-something. I was soon wired up to an IV drip and put through a CT scanner, and a couple of weeks later I was poked and prodded from both ends while on fentanyl. I’m in my mid-40s, and my father died when he was 41. It felt the way I imagined a heart attack would, but luckily it wasn’t one.

That’s why I don’t blog regularly. I decided to do other things with my time. I still write: I’m working on a couple of stories now, and in May this year we published the third edition of our book, SQL Server Administration Inside Out, through Pearson. I walk 10,000 steps every day. I spend very little time on social media. I avoid news websites and apps where possible. And I don’t work as many hours as I used to.

I write daily at my job. So what am I doing that I can blog about? What isn’t covered by an NDA?

Well, I was ostensibly hired to be the content lead at Microsoft for SQL Server on Linux, and then I picked up the content lead role for SQL migrations (that’s Azure SQL and SQL Server, not the MySQL and PostgreSQL side). Additionally, our team splits the workload for maintaining thousands of reference articles for the Database Engine, so I help out here and there where I can.

In the (less than) two years I’ve worked there, from the same spot in my condo as always, I’ve changed the Applies to section (at the top of Database Docs articles) to use a vector-based green checkmark, which looks better than the old low-resolution PNG that was there. I’ve also made the product names in that section link to a page explaining what they mean, and done away with the confusing “SQL Server (all supported versions)”.

I spend a lot of my day in Visual Studio Code (which isn’t my favourite text editor; that would be BBEdit) and on GitHub. For all my Git-related tasks, I use Fork.

From time to time I have downtime (which I have to make for myself: there’s a backlog, and more than enough to keep our team of 8, including our manager, busy). During that downtime I work on special projects. Most of them involve C# development, because, as a former consultant and network engineer, I know that automation is everything in IT. If you’re repeating yourself, automate that task.

So I’ve leveraged my (more than) 30 years of experience writing software into producing automation tooling. It combines two of my favourite nerd games: performance tuning and text manipulation. I can process every file in the SQL Docs repository in 43 seconds. Git doesn’t like it when I do that. Git gets confused.

What kind of stuff have I automated?

For starters, the monthly SQL Server on Linux release notes and release history are completely automated. That automation started out as the foundation for my Microsoft.DocsTools project, which you’ll never see. On release day, I put a build number and a date into a CSV file, and the rest just happens.
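
A minimal Python sketch of how a CSV file could drive a release history page. The real tooling is C# and isn’t public, so the column names (`version`, `build`, `date`), the Markdown table layout, and the build numbers below are all made-up assumptions for illustration.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Made-up sample data: one row per release (version, build number, release date).
SAMPLE_CSV = """version,build,date
2022,16.0.4100.1,2024-01-11
2019,15.0.4350.1,2024-01-11
2022,16.0.4090.1,2023-12-14
"""


def load_releases(text):
    """Group (build, date) rows by SQL Server version, newest release first."""
    by_version = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        released = date.fromisoformat(row["date"])
        by_version[row["version"]].append((row["build"], released))
    for releases in by_version.values():
        releases.sort(key=lambda release: release[1], reverse=True)
    return dict(by_version)


def release_history_table(releases):
    """Render one version's release history as a Markdown table."""
    lines = ["| Build | Release date |", "| --- | --- |"]
    for build, released in releases:
        lines.append(f"| {build} | {released.isoformat()} |")
    return "\n".join(lines)


if __name__ == "__main__":
    for version, releases in sorted(load_releases(SAMPLE_CSV).items()):
        print(f"SQL Server {version}")
        print(release_history_table(releases))
```

Keeping the CSV as the only hand-edited input means release day comes down to adding one row. Everything downstream is regenerated from it.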

The trickiest part about generating the release notes is that three versions of SQL Server support Linux, across three supported distributions. Each distribution in turn has multiple supported versions: currently two each for Red Hat and SUSE, and three for Ubuntu. So I wrote a clean-room C# parser for the RPM and Debian package manifest formats, as one does, in an afternoon. Now those 7 x 3 x 3 folders, containing between four and eight packages depending on the version (don’t forget GDR variants!), are collated, sorted, and filtered, and finally become a “latest version for each package for each release” list, plus a full history of each release. That happens in two seconds, once the manifests have been polled.
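
The collation step can be sketched for the Debian side. Debian repository `Packages` indexes are plain-text stanzas of `Field: value` lines separated by blank lines, and indented lines continue the previous field. This Python sketch parses those stanzas and keeps a newest-first history plus the latest version of each package. The version comparison is a deliberate simplification: it compares only the numeric runs and ignores dpkg’s epoch and `~` rules. The package versions in the sample are made up. RPM repositories publish XML metadata instead, which this sketch doesn’t cover.

```python
import re
from collections import defaultdict

# Made-up sample in the Debian "Packages" index format.
SAMPLE_PACKAGES = """Package: mssql-server
Version: 16.0.900.1-1
Description: Example description
 that continues on an indented line

Package: mssql-server
Version: 16.0.4075.1-1

Package: mssql-tools18
Version: 18.2.1.1-1
"""


def parse_stanzas(text):
    """Split a Packages index into one dict of field -> value per stanza."""
    stanzas, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():  # a blank line ends the current stanza
            if current:
                stanzas.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key:  # continuation of the previous field
            current[last_key] += "\n" + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def version_key(version):
    """Simplified ordering: compare the runs of digits as integers.

    Real dpkg ordering (epochs, '~', letters) is more involved; this only
    suits purely numeric versions such as 16.0.4075.1-1.
    """
    return tuple(int(n) for n in re.findall(r"\d+", version))


def history_per_package(stanzas):
    """Every version of every package, newest first."""
    history = defaultdict(list)
    for stanza in stanzas:
        history[stanza["Package"]].append(stanza["Version"])
    return {name: sorted(versions, key=version_key, reverse=True)
            for name, versions in history.items()}


def latest_per_package(stanzas):
    """The newest version of each package."""
    return {name: versions[0] for name, versions in history_per_package(stanzas).items()}
```

Comparing numeric tuples rather than strings is the important design choice here. Compared as strings, `16.0.900.1-1` would sort after `16.0.4075.1-1`.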

Next, the Database Engine errors and events articles are also fully automated. Since about the middle of last year, we’ve split them into versioned error message articles, and the navigation menu at the top left of Database Docs dictates which version you see. Behind the scenes, we generate 25 files per version of SQL Server (2016 and later), directly from the source code that generates those messages in the Database Engine. That also takes about two seconds, once I’ve grabbed the latest source code file for each CU.
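
The source format for those messages isn’t described here, so this Python sketch assumes the messages have already been extracted as `(error_number, text)` pairs. It also assumes that each generated file covers a fixed range of 1,000 error numbers. The bucket size and the page title format are hypothetical choices and may not match how the 25 files are actually split.

```python
from collections import defaultdict


def bucket_messages(messages, size=1000):
    """Group (error_number, text) pairs into fixed ranges such as 0-999."""
    buckets = defaultdict(list)
    for number, text in messages:
        start = (number // size) * size
        buckets[(start, start + size - 1)].append((number, text))
    # Sort the ranges, and the messages within each range, by number.
    return {rng: sorted(items) for rng, items in sorted(buckets.items())}


def render_bucket(version, rng, items):
    """Render one range as a Markdown page (hypothetical title and layout)."""
    start, end = rng
    lines = [
        f"# Database Engine events and errors ({start} to {end}): SQL Server {version}",
        "",
        "| Error | Description |",
        "| --- | --- |",
    ]
    for number, text in items:
        escaped = text.replace("|", r"\|")  # an unescaped pipe would break the table
        lines.append(f"| {number} | {escaped} |")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        (8134, "Divide by zero error encountered."),
        (208, "Invalid object name '%.*ls'."),
    ]
    for rng, items in bucket_messages(sample).items():
        print(render_bucket("2022", rng, items))
```

Running the same generator once per version, against that version’s message source, gives one set of pages per version for the navigation menu to switch between.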

But this isn’t the big project. We have a number of tools built by people who come and go. Some of those tools get operationalized and some don’t. My tooling isn’t operationalized, which means it’s not technically supported, so I’ll be working to fold this next project into the larger one that does the same thing.

So let me explain what it does. In simple terms, it does 50-60% of my busywork for me. I mentioned that we have thousands of reference articles. That’s because SQL Server is a massive multi-million-dollar revenue generator for Microsoft, and the Azure SQL side of things adds even more reference documentation. All those system functions have to be documented, as well as the views (including DMVs), stored procedures, and Transact-SQL language elements. We have over 20,000 files in the SQL Docs repository, of which about 16,000 are Markdown or YAML files. Believe it or not, we even try our best to keep marketing material away from Docs.

Let’s say an issue comes in, either as a GitHub pull request from an external contributor (usually someone who probably reads this blog, and thank you for that), or internally. If it’s internal, it’s either an update to an existing article or new content. The vast majority of my time is spent on existing articles: any one of those 16,000 files, some of which haven’t been looked at since 2015, when they were ported over from MSDN and SQL Server Books Online. And even then, they may have been inherited from Sybase SQL Server for all we know. Thousands of articles, thousands of inconsistencies. Scores of content developers with their own styles. Contributors who don’t speak English as a first language. Content that was written when Microsoft may have been a little less accessible to the average consumer.

I’ll quickly talk about the new content, because I’m sure you’re curious about that. Microsoft is a big place, and I deal directly with maybe 40 people in the organization in a regular month, and with about 60 in a wider circle less frequently. So usually one of the program managers (PMs), who are themselves responsible for managing their own teams, will say that a new feature is coming and we need to document it.

Depending on a number of factors, as in any large working environment, the deadlines seem to make whooshing sounds as they go by. Technical specifications can change. Massive engineering efforts can be ~~WinFS~~ cancelled. Documentation isn’t always at the forefront of the process.

All that being said, new content is easier to deal with than old content. At some point the PM signs off, and if there was a mistake, well, we get two publishing schedules per day to fix it. Not a big deal.

It’s the old stuff that grinds us down. Imagine you find an issue in an article and you open a pull request (thank you!). Now take 16,000 articles, supporting hundreds of features across four versions of SQL Server (Docs only updates content for 2016 and later), written by scores of content writers over the years. We might have to check whether the thing you corrected is also mentioned somewhere else. Maybe you’re correcting an example, or adding a technical limitation. Do we need to repeat that in other places?

Now add in history. A sentence that was technically correct for SQL Server 7.0 may still be correct today (for example, 7.0 is when the database engine was re-engineered from the old Sybase engine to use 8KB data pages). On the other hand, the sentence right after it might only have been correct for a specific range of CUs during the SQL Server 2019 release.

Side note: this happened to me. There’s a behaviour in one SQL Server 2019 configuration feature that affects only one or two CUs, so there’s a small block of text in an article that’s only visible if you choose 2019 from the navigation menu.

Markdown is very forgiving. And with the way DocFX (the engine Docs is built on) uses it, along with our include files, YAML and JSON, and build pipeline, we can get away with a bunch of stuff. We even have automated grammar checking. But it’s not perfect: text layout in Markdown files isn’t consistent, and our templates and style guidelines change. We also have styles specific to SQL Docs that (say) the .NET team doesn’t think about, because they generate their docs from scratch every time.

So I wrote a tool to help us manage the metadata in these 16,000 files. That tool grew to fix common errors in Markdown files for the Database Docs team. For example, I format tables a certain way. I put backticks around system object names, including system databases. I fix the formatting of headings and of syntax code fences. I add blank lines after headings, and remove double blank lines. If you ask it to, the tool will trim line endings. I recently added a contraction fixer because our grammar checker told us to be more informal. Our include files now have to be included a certain way. Images also have to be added a certain way in new content, but for the longest time we used the old Markdown syntax, so those images need to be converted whenever you edit an article that still uses it. Then you fix the alt text to be accessible, and make sure the image itself isn’t outdated. We even have thousands of reference articles that show argument lists in different ways.
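
A few of those fixes can be sketched as line-by-line rules. This Python version is a hypothetical illustration, not the actual C# tool. It trims trailing whitespace, collapses repeated blank lines, adds a blank line after headings, and wraps a small assumed set of system object names in backticks. It also applies a handful of contractions and converts old-style `![alt](path)` images, assuming the target is the Microsoft Learn `:::image:::` extension syntax. Fenced code blocks, inline code, link targets, and headings are left alone. The object-name pattern and contraction list are deliberately tiny, and `model` is omitted because it’s also an ordinary English word.

```python
import re

# Assumed, deliberately small set of system object patterns.
SYSTEM_OBJECT = re.compile(r"\b(sys\.\w+|sp_\w+|msdb|tempdb)\b")
# Inline code spans and link targets are protected from the prose rules.
PROTECTED = re.compile(r"(`[^`]*`|\]\([^)]*\))")
OLD_IMAGE = re.compile(r"^!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)\)$")
CONTRACTIONS = {"do not": "don't", "does not": "doesn't", "is not": "isn't",
                "are not": "aren't", "cannot": "can't"}
CONTRACTION = re.compile(r"\b(do not|does not|is not|are not|cannot)\b", re.IGNORECASE)


def contract(match):
    """Replace a phrase with its contraction, keeping a leading capital."""
    phrase = match.group(0)
    short = CONTRACTIONS[phrase.lower()]
    return short[0].upper() + short[1:] if phrase[0].isupper() else short


def fix_prose(line):
    """Apply the prose rules outside inline code and link targets."""
    parts = PROTECTED.split(line)
    for i in range(0, len(parts), 2):  # even indexes are unprotected text
        parts[i] = SYSTEM_OBJECT.sub(r"`\1`", parts[i])
        parts[i] = CONTRACTION.sub(contract, parts[i])
    return "".join(parts)


def normalize(markdown):
    """Normalize one Markdown document, leaving fenced code untouched."""
    out, in_fence = [], False
    for raw in markdown.splitlines():
        line = raw.rstrip()  # trim trailing whitespace
        is_fence = line.lstrip().startswith("```")
        if not in_fence and not is_fence:
            image = OLD_IMAGE.match(line)
            if image:
                line = (f':::image type="content" source="{image["src"]}" '
                        f'alt-text="{image["alt"]}":::')
            elif not line.startswith("#"):  # headings are left as-is here
                line = fix_prose(line)
            if not line and out and not out[-1]:
                continue  # collapse runs of blank lines
        if not in_fence and line and out and out[-1].startswith("#"):
            out.append("")  # blank line after a heading
        if is_fence:
            in_fence = not in_fence
        out.append(line)
    return "\n".join(out) + "\n"
```

Trimming trailing whitespace unconditionally would also strip Markdown’s two-space hard line breaks, which is a good reason to keep that rule opt-in, as the tool described above does.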

So I wrote all this automation to make all that repeatable stuff go by a little faster. I can run it maybe ten times while I’m working on an article or a series of articles, and it’s fast enough that it doesn’t distract me. It’s written in C#, and it makes extensive use of Span<T> and Parallel.ForEach. There’s already a .NET 8 build in active testing.
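
`Parallel.ForEach` over a list of files has a rough Python analogue in `concurrent.futures`. This sketch runs a text transform over every Markdown and YAML file under a folder. It rewrites only the files whose contents actually changed, which keeps the Git diff limited to real edits. The worker count and file patterns are assumptions. Python threads share the GIL, so for CPU-heavy transforms a `ProcessPoolExecutor` would be the closer match. Threads keep the sketch simple.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_file(path, transform):
    """Apply transform to one file and rewrite it only if the text changed.

    Reading and writing bytes keeps the original line endings intact.
    """
    original = path.read_bytes().decode("utf-8")
    updated = transform(original)
    if updated == original:
        return None
    path.write_bytes(updated.encode("utf-8"))
    return path


def process_tree(root, transform, patterns=("*.md", "*.yml"), workers=8):
    """Run transform over every matching file under root, in parallel.

    Returns the sorted list of files that were changed.
    """
    files = [p for pattern in patterns for p in Path(root).rglob(pattern)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda p: process_file(p, transform), files))
    return sorted(p for p in results if p is not None)
```

Skipping unchanged files matters as much as raw speed. A tool that rewrites all 16,000 files, even with identical content, can still change line endings or timestamps and make the working tree look far noisier than it is.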

I say 50-60%, but it’s probably more. Early last year, it took me a week to refresh a section of around 25 files. Using my automation tooling, it took one day to do 75 files. Depending on the type of work, I can refresh 50 articles in a day, but it’s usually around 5-10.

This isn’t a SQL Server blog post; it’s SQL Server adjacent. To make it count, I’m going to remind you not to use NOLOCK, because RCSI (read committed snapshot isolation) is literally right there, you twit. (Imagine writing that in the NOLOCK article!)

Stay groovy, friends, and keep learning. If you see an error in the docs, open a pull request and one of the 8 of us will see it and help out. If it’s me, you can bet I’ll be running some automation on it from a Canadian MacBook Pro.
