My friend Jason asked me about his MySQL database export file last week, whether it was sufficient to create a new database. I replied saying that he would have to set up the database separately, because the physical files are allocated in a particular way depending on the web host.
What is a physical file, though? Kenneth Fisher, that wild stallion from SQL Studies, reminded me on Twitter that a physical file, or table, or database, isn’t actually physical. Hard drives are mostly virtualised, especially with web servers, and nothing is physical anymore. Even more confusingly, hard drives are gradually replaced by electric circuits and solid state memory technology that does something called wear-levelling.
When we speak of physical files, or databases, or tables, we’re referring to the way these files are allocated on a storage device. In the old days, pre-2000, it was possible on some operating systems and file systems to allocate the fastest part of the spinning platter of a hard drive (the inside tracks) to processes that needed low seek times, like Relational Database Management Systems (RDBMS).
Nowadays with virtualised storage, including network storage and cloud storage, the physical aspect is meaningless. Spindles are what matter. But what’s a spindle? Until recently, storage was provided by spinning electromagnetic platters called hard drives. The spindle in the center is what we refer to, and the more drives allocated to a storage device, the better.
The problem with spinning drives is that they have a built-in delay for the read-head to locate the correct area on the platter surface, which is referred to as latency. Adding more spindles (more drives) to an array lowers the latency, because files are striped across multiple disks (RAID).
For example, in a three-disk configuration, one file could effectively be split into three parts. That improved read times, and therefore latency, because the file system could look for different parts of the file on different underlying platters, effectively reducing seek times by one-third.
To complicate things even more, solid-state storage, which comes in many forms, but most commonly as USB thumb drives, camera cards (often called memory cards, which is a misnomer), and solid-state drives (SSDs), are gradually replacing hard drives in devices for network and cloud storage. SSDs have no latency because there’s no physical platter to spin up and no read-head to position over a particular spot on the surface. The cells that store the circuits that contain the bits of data are accessible instantly.
However, this speed improvement has a trade-off. Solid state storage has a limited lifespan. The cells can only be written to a certain number of times before failing. The drive controller has to move data all the time, called wear-levelling, to ensure that the blocks of storage are used evenly, thereby extending the life of a drive. Where a file is physically located on the drive, can change at any given moment.
Virtual storage in cloud and network storage devices already muddied the waters with sharing data between virtual machines. Solid-state drives make it even more abstract.
A physical database might refer to the server it’s running on or a virtual machine that is running on a physical server. A physical file might refer to a collection of bits on a single spinning disk drive or SSD, or a collection of bits that are located on several hundred spinning disks.
We have reached a point where a physical storage paradigm is meaningless. When we talk about the physical file on a disk, we mean the abstract representation of the file system’s knowledge of where that file is stored at that point in time. Ultimately, though, it doesn’t mean what it used to.
If you would like to commiserate the loss of high-failure spinning rust, find me on Twitter at @bornsql .