What is a Database?

What is Microsoft SQL Server?

To answer that, we have to ask what a relational database management server (RDBMS) is.

To answer that, we have to ask what a relational database is.

To answer that, we have to understand what the relational model is, what a database is, and how these two concepts combine to form what is effectively the basis of most technology today: the database.

Database

A database is, fundamentally, a collection of information that is stored in a defined structure.

A telephone book is a database. A recipe book is a database. Each of these books has a set of rules that define how the information is stored and retrieved. That is the structure.

Database Query

When we want to retrieve information, we query the structure with language appropriate to the database. Tell me the phone number of Randolph West, by looking up the surnames and going through all the Ws. Find the recipe for lemon meringue by going through the desserts and searching for meringue.

In a RDBMS, the language is called Structured Query Language, or SQL. You can pronounce it like “sequel”, or say each letter.

Relational Model

A few decades ago, IBM employee and computer scientist Edgar Codd developed his “Twelve Rules”, dictating how data should be laid out in a relational manner, using first-order predicate logic.

There’s no really easy way to explain what this exactly means at a philosophical or mathematical level, especially not in this forum, so I will explain what makes a relational database instead.

Imagine you want to buy a new iPhone. You walk into the store, and find a Genius. Her name tag says Thandi, and she takes you to the desk where the various models are displayed.

You decide after a few minutes that you want to get the glossy black one with the big screen and lots of storage, because in this imaginary scenario, you have lots of disposable income.

You also select Apple Care on the phone, and for a little bonus, you get yourself a blue leather phone cover.

Out comes the credit card, the transaction is approved, and Thandi and you exchange Twitter handles and decide to be friends. Everyone is happy.

Here’s what happens from the relational perspective, in a high level overview:

Stock Item
Product iPhone
Model iPhone 7 Plus
Storage Capacity 256GB
Colour Black
Style Glossy
Carrier Unlocked
Serial Number xxx
Apple Care Yes
Purchase Date 2017-03-08
Price $1000
Stock Item
Product iPhone cover
Model iPhone 7 Plus
Colour Blue
Style Leather
Purchase Date 2017-03-09
Price $50
Location
Store name Cross Iron Mills, Calgary
Salesperson thandi@example.com
Date 2017-03-08
Customer
Customer randolph@example.com
Payment method Credit Card

According to Codd’s paper on relational theory (PDF), these items should be defined by their natural structure, and that there should be one way, and only one way, to uniquely identify each item in that purchase event, from you and Thandi, to the phone cover and Apple Care, and how they relate to each other to create a single, unique transaction.

This is called normalization. The transaction can only be recorded once, but in a way that includes all the different information.

Although Apple has sold billions of phones, there is only one product category in their stock called iPhone. It would be reasonable to assume that their database contains a table called Products, and possible values include Mac, iPhone, iPad and iPod. Models would be in their own unique list.

In the same vein, there is a single list of possible colours that these products can be: white, black, silver, gold, red, blue, rose gold. These would go into a database table called Colours.

There are only a few storage capacities they can be: 32 GB, 128 GB, 256 GB. There’s the Storage table.

There would also be a table of Stores, a table of Staff, and a table of Customers. Each possible item would appear only once in these tables.

To distinguish between each unique value in the table, each record, or row, contains a unique identifier, or what we call a Primary Key, or PK for short (sometimes called an identity column, but this is not always the case). This is usually a number, because computers process numbers very efficiently, but can be any data type (we will cover data types in the next post).

For example, in the Colours table, the primary key might be 2 for black. For Products, the phone’s identifier might be 19, because Apple created several product categories before the iPhone was invented. The Storage primary key might be 5. Thandi might have a Staff PK of 322,544. Your Customer PK value might be 477,211,549.

All this information so far doesn’t tell us anything about the transaction itself. To do that, we have to associate each of these Primary Keys from each table, into a single row in a Transactions table, where we record the date and time, along with the sale amount and GST.

This association is called a relationship, and this is where we get the notion of relational data. The data in the Transactions table only has meaning, when it relates to the other tables.

In other words, all of these elements can be used to uniquely identify the sale of the phone and the phone cover to you, provided they all link back to a unique transaction PK.

When we use the values of those primary keys in a table to refer back to the original table, we call them Foreign Keys (FKs). This is because the Transactions table would have its own PK, as mentioned above, which in our example might be named the TransactionID column.

Next week, we will dive slightly deeper into data types and collation. Stay tuned.

If you have any feedback, please let me know on Twitter at @bornsql.

Author: randolph

Randolph West is a Microsoft Data Platform MVP, and has worked with SQL Server since the late 1990s. When not consulting, he can be seen acting on the stage and screen, or doing voices for independent video games. Randolph is available for talks on SQL Server, and technology in general. He also offers training for junior DBAs. Connect with Randolph on Google+ or Twitter.