What is a container anyway?

Recently there was a thread on Twitter which established that a lot of IT people didn’t know the difference between virtual machines and containers.

https://twitter.com/IanColdwater/status/1137737702227550208

I felt like this was a question I’d already answered, so I searched my computer for the word “container.” It turns out that I have explained this in the past, but I just hadn’t written a blog post about it.

Components, kernels and drivers

At the heart of every modern computer are a central processing unit, memory, storage, and a network interface. An operating system (e.g. Windows, Linux, macOS) is an interface between these components and the software that we use.

If we want to get a little more technical, the kernel of an operating system is where this magic happens: it sits between the software we run and the components (also known as devices), which it reaches through their device drivers (the software that makes them go). When your typical word processor wants to write a document to the hard drive, for example, it asks the operating system to do so by making a system call to the kernel. The kernel asks the device driver to write the file to the mechanical hard drive or solid-state drive, and the firmware on that drive lets the device driver know when it’s done. This confirmation is returned to the kernel, which in turn notifies the word processor you’re using.
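To make that round trip concrete, here is a minimal sketch of a program saving a document through thin wrappers around system calls. The function name save_document and the file name are made up for illustration; os.open, os.write, os.fsync, and os.close are standard Python functions that map closely onto the underlying kernel calls.

```python
import os
import tempfile


def save_document(path: str, text: str) -> int:
    """Write text to path using low-level calls; return the bytes written."""
    data = text.encode("utf-8")
    # Ask the kernel for a file descriptor (create or truncate the file).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # A single write may accept fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Ask the kernel to flush its buffers for this file to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "letter.txt")
        print(save_document(target, "Dear reader"), "bytes written")
```

The program never talks to the drive directly. Each call hands control to the kernel, and the call only returns once the kernel (and, for fsync, the device) has reported back.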

Virtual machines

A virtual machine (VM) is, from an operating system’s perspective, no different than a physical computer. A VM host runs what we call a hypervisor, which is a thin translation layer between the physical components and their virtual representations. It allows more than one virtual machine to run on the same hardware, for better or worse, depending on resource allocation and workload. This solves one of the main drawbacks of powerful physical servers, namely underutilization: their resources can now be shared between several virtual machines. Each VM is independent of the others in that it can see and access every resource assigned to it as if it were the only consumer of those resources. The hypervisor manages these interactions invisibly.

The main drawback with virtual machines is that they have distinct operating environments, which have to be maintained independently. A lot of time and effort is spent on duplicating work. For instance, a hypervisor might host eight VM guests, and each of these guests may have the same kind of operating system installed, so when it comes to updating them they will run the same process eight times over. Additionally, there is a lot of overhead with each virtual machine needing to run an entire operating system.

Bring in the containers

If virtual machines solve the problem of resource utilization of physical machines, containers solve the problem of repetitive maintenance and operating system overhead of virtual machines. Instead of virtualizing the physical hardware, a container host virtualizes just the operating system. This makes it much easier to deploy software, because each container only needs the resources required to run that software. Because a container shares the host’s kernel rather than carrying a full operating system of its own, it is lighter weight, so more containers than virtual machines can run on the same hardware.
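A bit of back-of-the-envelope arithmetic shows why density improves. Every number below is invented for illustration (a 64 GB host, 1 GB per application, 2 GB per guest operating system, 2 GB reserved for the hypervisor or the host operating system), and the function name max_workloads is hypothetical. Real overheads vary widely.

```python
def max_workloads(host_ram_gb: int, app_ram_gb: int,
                  os_ram_per_instance_gb: int, shared_overhead_gb: int) -> int:
    """How many instances fit in memory, given per-instance and shared costs."""
    usable = host_ram_gb - shared_overhead_gb
    return usable // (app_ram_gb + os_ram_per_instance_gb)


# Assumed figures, not measurements.
HOST_RAM_GB = 64
APP_RAM_GB = 1
GUEST_OS_RAM_GB = 2      # each VM carries its own operating system
SHARED_OVERHEAD_GB = 2   # hypervisor, or the one host operating system

# Each VM pays for the application plus a full guest operating system.
vms = max_workloads(HOST_RAM_GB, APP_RAM_GB, GUEST_OS_RAM_GB, SHARED_OVERHEAD_GB)

# Each container pays only for the application; the operating system is shared.
containers = max_workloads(HOST_RAM_GB, APP_RAM_GB, 0, SHARED_OVERHEAD_GB)

if __name__ == "__main__":
    print(vms, "virtual machines")   # 20 with the figures above
    print(containers, "containers")  # 62 with the figures above
```

The same reasoning applies to maintenance: the VM column has one operating system per instance to patch, while the container column has one for the whole host.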

How? This is where the aforementioned Twitter thread comes in. Imagine a house which has its own plumbing, electrical, and heating systems. That’s your typical computer, where a regular checkup of the furnace filters would be analogous to running Windows Updates.

Now imagine a duplex with two identical sets of resources. Two furnaces, two electrical systems, two sets of plumbing, even if they might be sharing the same access to municipal connections. This is your typical virtual machine environment. Even though the duplex shares resources, they have to be maintained independently. Fixing one furnace does not automatically fix the other.

Now imagine a modern apartment building. Each apartment might have its own water pipes and electrical outlets, but they won’t have their own forced air furnaces. The electrical and plumbing systems might be shared per floor. This is your typical container environment, where each apartment has just enough resources to get the job done, but no more than it needs. The proverbial “lock up and go.”

Shipping containers have somehow become synonymous with container software, and there are reasons for that (Kubernetes is named after the Greek word for a helmsman or ship’s pilot), but that analogy isn’t as helpful as the apartment building, in my opinion.

The container host is implemented slightly differently depending on the operating system in question, but it comes down to a thin translation layer between the operating system kernel and the container. This is different from the hypervisor because it’s virtualizing the operating system itself, not the underlying hardware. It’s a narrower scope with smaller overhead.

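In fact, certain operating system families like Linux and BSD make the overhead almost negligible. The oldest building block is chroot, which fakes out the root directory for a process, so that the process and anything it starts cannot see files outside of that directory. On its own, chroot does not hide other running processes; on Linux, kernel features such as namespaces and control groups (cgroups) extend the same idea, so that a container also gets its own view of processes and networking, along with its own resource limits. This is the principle that allows containers to run on these operating systems without requiring the overhead of a full virtual machine.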
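The root directory trick can be modeled in a few lines. This is only a simulation of the path mapping, not a real chroot: the actual call (exposed in Python as os.chroot) requires root privileges and is enforced by the kernel. The function name resolve_in_jail and the directory /srv/app1 are hypothetical.

```python
import posixpath


def resolve_in_jail(jail_root: str, path_seen_by_process: str) -> str:
    """Map a path as the confined process sees it to the real path on the host."""
    # Treat the path as if "/" were the top of the world. Normalizing an
    # absolute path discards any ".." that tries to climb above the root.
    inside = posixpath.normpath(posixpath.join("/", path_seen_by_process))
    # Re-anchor the result underneath the jail directory on the host.
    return posixpath.normpath(posixpath.join(jail_root, inside.lstrip("/")))


if __name__ == "__main__":
    # The process asks for /etc/hosts and gets the copy inside its own directory.
    print(resolve_in_jail("/srv/app1", "/etc/hosts"))
    # Climbing with ".." never gets past the fake root.
    print(resolve_in_jail("/srv/app1", "../../etc/passwd"))
```

Whatever path the confined process asks for, the answer always lands under /srv/app1, which is why it cannot see the rest of the host’s files.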

So instead of having one operating system for each virtual machine, we now only have one operating system for the entire host, and each container is running the software it needs to get the job done.

This is all well and good, but if your host machine crashes or fails, all your containers vanish at that single point of failure. In the same way that VMs can be moved around, containers can be stopped and started on different hosts using a management interface (a control plane) in the form of orchestration software such as Kubernetes. Even Docker has adopted Kubernetes, despite having created its own orchestrator (Docker Swarm) in the past.

With orchestration software, and the architectural decision of separating storage from compute containers (as described in last week’s post), a container may not even be running on the same physical hardware as it was a day ago, or even ten minutes ago. That now becomes a problem for Kubernetes to solve. If a container becomes unresponsive, Kubernetes will restart it, and if its host fails, Kubernetes will start a replacement on another host that has available capacity.
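The idea of moving containers off a failed host can be sketched as a toy model. This is not how Kubernetes is implemented; the Host class, the reschedule function, and the host and container names are all invented for illustration, and a real scheduler weighs far more than a simple slot count.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    name: str
    capacity: int                 # how many containers this host can run
    healthy: bool = True
    containers: List[str] = field(default_factory=list)

    def has_room(self) -> bool:
        return self.healthy and len(self.containers) < self.capacity


def reschedule(hosts: List[Host]) -> List[str]:
    """Move containers off unhealthy hosts; return any that found no home."""
    unplaced: List[str] = []
    for failed in [h for h in hosts if not h.healthy]:
        stranded, failed.containers = failed.containers, []
        for container in stranded:
            # Pick the first healthy host with a free slot, if there is one.
            target = next((h for h in hosts if h.has_room()), None)
            if target is None:
                unplaced.append(container)
            else:
                target.containers.append(container)
    return unplaced


if __name__ == "__main__":
    cluster = [
        Host("host-a", capacity=2, containers=["web", "db"]),
        Host("host-b", capacity=2, containers=["cache"]),
        Host("host-c", capacity=1),
    ]
    cluster[0].healthy = False    # host-a crashes
    leftover = reschedule(cluster)
    for host in cluster:
        print(host.name, host.containers)
    print("unplaced:", leftover)
```

After host-a fails, its two containers end up on whichever remaining hosts have a free slot, and anything that does not fit is reported rather than silently lost.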

The future is Docker containers, even if you don’t think your organization is ready for cloud computing. Even SQL Server runs in Docker containers. SQL Server 2019 Big Data Clusters is implemented using containers under the covers.

Docker containers run on Windows, macOS, and of course Linux. Even Windows 10 has its own Linux kernel now with the recently announced Windows Subsystem for Linux version 2 (WSL 2). Docker containers on Linux, running on Windows. The operating system almost doesn’t matter now.

