Fundamentals of Data Types

Last week, we discussed storing text in a database. This week we will dive deeper into data types.

When storing data in our database, we want to make sure that it’s stored accurately and that we only use the required amount of space.

This is because when we access the data later, we want to make sure any calculations are accurate; plus reading the data takes up memory, and we want to be as efficient as we can with memory usage.

There are seven data type categories in SQL Server:

  • exact numerics
  • approximate numerics
  • date and time
  • character strings
  • Unicode character strings
  • binary strings
  • other

When we want to use these data types for our columns, we need to declare them. Some require a length, some require a precision and scale, and some can be declared without a length at all. For example:

No Length (implied in data type):
DECLARE @age AS TINYINT;

Explicit Length (length is supplied):
DECLARE @firstName AS VARCHAR(255);

Precision and Scale:
DECLARE @interestRate AS DECIMAL(9,3);

Let’s talk a bit about precision and scale, because those values between the brackets may not work the way we think they do.

Precision and Scale

Data types with decimal places are defined by what we call fixed precision and scale. Let’s look at an example:

123,456.789

In the above number, we see a six-digit number (ignoring the thousand separator) followed by a decimal point, and then a fraction represented by three decimal places. This number has a scale of 3 (the digits after the decimal point) and a precision of 9 (the digits for the entire value, on both sides of the decimal point). We would declare this value as DECIMAL(9,3).

This is confusing at first glance, because we have to declare it “backwards”, with the precision first, and then the scale. It may be easier to think of the precision in the same way we think of a character string’s length.

Date and time data types can also have decimal places, and SQL Server supports times accurate to the nearest 100 nanoseconds. The most accurate datetime is DATETIME2(7), where 7 decimal places are reserved for the time.

Before SQL Server 2008, we used DATETIME, which is only accurate to the nearest 3 milliseconds, and uses 8 bytes. A drop-in replacement for this is DATETIME2(3), using 3 decimal places, and accurate to the nearest millisecond. It only needs 7 bytes per column.

Be mindful that, as higher precision and scale are required, a column’s storage requirement increases. Accuracy is a trade-off with disk space and memory, so we may find ourselves using floating point values everywhere.

However, in cases where accuracy is required, always stick to exact numerics. Financial calculations, for example, should always use DECIMAL and MONEY data types.

Exact Numerics

Exact Numerics are exact, because any value that is stored is the exact same value that is retrieved later. These are the most common types found in a database, and INT is the most prevalent.

Exact numerics are split up into integers (BIGINT, INT, SMALLINT, TINYINT, BIT) and decimals (NUMERIC, DECIMAL, MONEY, SMALLMONEY). Decimals have decimal places (defined by precision and scale), while integers do not.

Integers have fixed sizes (see table below), so we don’t need to specify a length when declaring this data type.

Type Bytes Range
BIGINT 8 bytes -2^63 to 2^63-1
INT 4 bytes -2^31 to 2^31-1
SMALLINT 2 bytes -2^15 to 2^15-1
TINYINT 1 byte 0 to 255
BIT 1 bit 0 to 1
  • BIT is often used for storing Boolean values, where 1 = True and 0 = False.
  • Yes, BIGINT can store numbers as large as 2 to the power of 63 minus 1. That’s 19 digits wide, with a value of 9,223,372,036,854,775,807, or 9.2 quintillion.

Decimals may vary depending on the precision and scale, so we have to specify those in the declaration.

Type Bytes Range
DECIMAL 5 to 17 bytes Depends on precision and scale.
38 digits is the longest possible precision.
NUMERIC
  • DECIMAL and NUMERIC are synonyms and can be used interchangeably. Read more about this data type, and how precision and scale affects bytes used, here.

Although the MONEY and SMALLMONEY data types do have decimal places, they don’t require the precision and scale in the declaration because these are actually synonyms for DECIMAL(19,4) and DECIMAL(10,4) respectively. Think of these data types for convenience more than anything.

Type Bytes Range
MONEY 8 bytes -922,337,203,685,477.5808 to 922,337,203,685,477.5807
SMALLMONEY 4 bytes -214,748.3648 to 214,748.3647

Approximate Numerics

Approximate Numerics mean that the value stored is only approximate. Floating point numbers would be classified as approximate numerics, and these comprise FLOAT and REAL.

Declaring a FLOAT requires a length, which represents the number of bits used to store the mantissa. REAL is a synonym of FLOAT(24).

The mantissa means the significant digits of a number in scientific notation, which is how floating point numbers are represented. The default is FLOAT(53). Generally, we stick to the defaults, and use REAL if we want to save space, forgoing some accuracy of the larger FLOAT(53).

Type Bytes Range
FLOAT 4 or 8 bytes -1.79E+308 to -2.23E-308, 0 (zero),
and 2.23E-308 to 1.79E+308
REAL 4 bytes -3.40E+38 to -1.18E-38, 0 (zero),
and 1.18E-38 to 3.40E+38

Date and Time

Date and time data types are slightly more complex. For storing dates (with no time), we use DATE. We store times (with no dates) using TIME. For storing both date and time in the same column, we can use DATETIME2, DATETIME, or SMALLDATETIME. Finally, we can even store timezone-aware values comprising a date and time and timezone offset, using DATETIMEOFFSET.

DATETIME2, TIME, and DATETIMEOFFSET take a length in their declarations, otherwise they default to 7 (accurate to the nearest 100 nanoseconds).

Character Strings

As we saw last week, characters can be fixed-length (CHAR) or variable-length (VARCHAR), and can support special Unicode character types (NCHAR and NVARCHAR respectively). Collation should also be taken into account.

Length can be 1 to 8000 for CHAR and VARCHAR, or 1 to 4000 for NCHAR and NVARCHAR. For storing values larger than that, see the Large Objects section below.

Binary Strings

Sometimes we want to store binary content in a database. This might be a JPEG image, a Word document, an SSL certificate file, or anything that could traditionally be saved on the file system. SQL Server provides the BINARY and VARBINARY data types for this (and IMAGE for backward compatibility).

Length can be 1 to 8000 for BINARY and VARBINARY. For storing values larger than that, see the Large Object section below.

Large Objects

SQL Server 2008 introduced a new MAX length for several data types, including CHAR, NCHAR, VARCHAR, NVARCHAR, BINARY and VARBINARY.

(The XML data type uses MAX under the covers as well.)

This new specification allows up to 2 GB of data to be stored in a column with that declared length. We should take care not to use 2 GB when inserting data into these columns, but it provides greater flexibility when inserting more than 8000 bytes into one of these columns.

Other Data Types

SQL Server supports other types of data, which fall outside the scope of text and numerics. These include CURSOR, TABLE, XML, UNIQUEIDENTIFIER, TIMESTAMP (not to be confused with the date and time types), HIERARCHYID, SQL_VARIANT, and Spatial Types (GEOGRAPHY and GEOMETRY).

Next week, we will see how normalization and data types work together, now that we have a good overview of the different data types in a database.

If you have any thoughts or comments, please find me on Twitter at @bornsql.

%d bloggers like this: