I'm a DBA. Do I really need to know Kubernetes?

Welcome to February 2020. It’s the second month of the year 2020! I remember sitting at a server with SQL Server 6.5 installed on it, worrying about the Y2K bug. It feels like yesterday.

While I haven’t been a database administrator for all that time between then and now, I have been exposed to broad swathes of the information technology industry in that time, and I think I have some insight into future trends, my inside knowledge from Microsoft notwithstanding.

So this question, whether a SQL Server DBA really needs to know about Kubernetes, is really a question about whether DBAs need to know about the plumbing that runs the infrastructure upon which our databases reside.

In October 2018 I asked, “What is a DBA anyway?” It was a week after another post where I declared the DBA role “history.” My answer is:

Yes! You need to know Kubernetes if you’re a SQL Server DBA.

Managing virtual consumers

As you may already know, virtual machines (VMs) are a way to abstract server hardware from the operating system running on top of it, allowing hardware resources to be shared among multiple operating systems. For the software installed on each VM, the virtual hardware layer is indistinguishable from real hardware and is treated the same way. VMs are managed like physical computers, with some benefits around migrating and upgrading them. SQL Server is a supported platform on VMs, and when configured correctly, the hypervisor will add only a 3-5% overhead compared to physical hardware.

Containers take this a step further by virtualizing the operating system as well, and Docker is the market leader here. This offers even better resource allocation because there is no need to have multiple operating systems that require maintenance. Each application can run fully isolated in its own container. On Linux, there isn’t even a need for a container hypervisor; containers just run alongside each other, mostly unaware that they are sharing the underlying operating system. Importantly, this allows containers to run inside a Linux virtual machine. On other operating systems like Windows and macOS, Docker needs a thin virtualization layer to provide the Linux system calls that Linux containers expect, but this is still lighter weight than running a full virtual machine for each application.

If you need to upgrade an application or replace a hanging one, a container can be discarded and replaced almost immediately without the need to reinstall and reconfigure the application inside it. This is because the container image is defined before deployment. This container replacement process can be fully automated, and that’s where Kubernetes comes in. Kubernetes is a container orchestration system: a control plane that manages containers and the rules for deploying and replacing them. For instance, we don’t want more than one SQL Server container pointed at the same storage at the same time.
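To make the "rules for deployment" idea a little more concrete, here is a minimal sketch in Python that builds a Kubernetes StatefulSet manifest for a single SQL Server container and prints it as JSON, which kubectl accepts alongside YAML. The function name, the object name "mssql", the Secret called "mssql-secret", and the 8Gi volume size are all assumptions made up for illustration. Setting replicas to 1 and asking for a ReadWriteOnce volume is one common way to express the "only one SQL Server per set of data files" rule, though ReadWriteOnce on its own limits the volume to a single node rather than a single container, so treat this as a starting point and not a production configuration.

```python
import json


def sql_server_statefulset(name="mssql", storage="8Gi"):
    """Build a StatefulSet manifest for one SQL Server container.

    All names and sizes are illustrative assumptions, not recommendations.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            # One replica: Kubernetes replaces a failed pod, but never runs two.
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "mssql",
                            "image": "mcr.microsoft.com/mssql/server:2019-latest",
                            "ports": [{"containerPort": 1433}],
                            "env": [
                                {"name": "ACCEPT_EULA", "value": "Y"},
                                {
                                    "name": "MSSQL_SA_PASSWORD",
                                    # Hypothetical Secret; create it separately.
                                    "valueFrom": {
                                        "secretKeyRef": {
                                            "name": "mssql-secret",
                                            "key": "sa-password",
                                        }
                                    },
                                },
                            ],
                            # SQL Server on Linux keeps its data under this path.
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/opt/mssql"}
                            ],
                        }
                    ]
                },
            },
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        # Read-write access from a single node at a time.
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(sql_server_statefulset(), indent=2))
```

If you save the printed JSON to a file, something like kubectl apply -f with that file name should hand it to a cluster. The point is less the exact fields and more that the replacement rules live in a declarative document that Kubernetes enforces for you.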

I refer to both virtual machines and containers as “virtual consumers.”

Why do we need to know Kubernetes?

Why do we need to know Kubernetes if we’re still supporting SQL Server 2008 R2 in a traditional corporate environment with dedicated physical hardware? Why do we need to know Kubernetes if we have a team of network engineers managing the environment? Why do we need to know Kubernetes if all we’re doing is query tuning and index maintenance?

The answer is the same as it has always been. Our job as DBAs is to enable our customers (whether internal to the organization where we work, or external to clients and partners) to get to their data effectively and efficiently, in a secure manner. This comes with certain guarantees including that it is accurate, not corrupt, routinely backed up, and that the backups are tested.

While your organization may not be embracing virtualization at the moment (and containers are just another virtual abstraction layer), it eventually will. Hardware manufacturers are optimizing for multiple CPU cores. Storage technology is now fast enough to hide random I/O patterns, with solid-state drives already eliminating the seek times intrinsic to spinning hard drives. ECC RAM is available in larger modules, with persistent memory available in 512 GB modules. Persistent memory allows most workloads to operate purely in RAM, backed by solid-state storage in the same physical RAM module.

So when your organization looks at its next hardware cycle, it will be planning to make the best use of available resources, which means virtualization. And if you don’t know how SQL Server writes to the underlying storage layer because it is abstracted behind a Linux-based management interface, you will no longer be an asset to the organization and will either have to find employment elsewhere or pivot into a new role.

Get ahead of the curve. Lean into the new technology. You can run Docker containers today, and the Docker Desktop application gives you a way to play with Kubernetes on your local computer.
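If you want something to try right away, the sketch below uses Python to assemble a docker run command for a throwaway SQL Server container. The helper name docker_run_args, the container name "sql-dev", and the sample password are assumptions for illustration only; the image name and the two environment variables are the ones Microsoft documents for SQL Server on Linux containers. The script only prints the command, so nothing runs until you paste it into a terminal yourself.

```python
import shlex


def docker_run_args(
    sa_password,
    name="sql-dev",
    host_port=1433,
    image="mcr.microsoft.com/mssql/server:2019-latest",
):
    """Return the argument list for starting a disposable SQL Server container.

    The defaults are illustrative assumptions; adjust them for your machine.
    """
    return [
        "docker", "run",
        "--detach",                       # run in the background
        "--name", name,                   # hypothetical container name
        "--env", "ACCEPT_EULA=Y",         # required by the SQL Server image
        "--env", f"MSSQL_SA_PASSWORD={sa_password}",
        "--publish", f"{host_port}:1433", # map a host port to SQL Server's port
        image,
    ]


if __name__ == "__main__":
    # Placeholder password for a local test container only.
    args = docker_run_args("Str0ng!Passw0rd")
    print(shlex.join(args))
```

Because the container has no volume attached, removing it also removes the databases inside it, which is exactly the discard-and-replace behavior described earlier and a safe way to experiment.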

The Calgary PASS user group will be hosting a free session on SQL Server in Kubernetes on 26 February 2020, presented by Anthony Nocentino. Visit calgary.pass.org for more information.
