In this post we are looking at how SQL Server stores floating point values (
If you’d like to read the previous posts in this series of how SQL Server stores data types, you can do so here:
Floating point values
If an integer or decimal amount is a precise representation of a value, a floating point is the closest approximation of that value in binary. Programming languages and databases use floating point numbers to trade storage (and memory) costs against precision. A floating point value is imprecise, but even that is underselling the problem.
Say you want to store your bank balance, and then want to calculate possible interest repayments on that balance. Let’s assume you’re storing amounts using the
FLOAT data type because you want to save a few bytes. So you store the value 4,513.19. SQL Server will take this value and store it in binary (which we can represent as hexadecimal) as follows:
The first thing you’ll notice is that it’s using eight bytes to store a value, so what is going on here? According to the official documentation for
FLOAT, if you don’t specify the number of digits after the decimal point it defaults to 15 decimal places, which means the mantissa is set to 53 bits. SQL Server is erring on the side of caution for you, and using the full span of 15 decimal places even if your starting value only requires two decimal places.
The full range of values that the
FLOAT can store is 1.79E+308 to -2.23E-308, the value zero (0), and 2.23E-308 to 1.79E+308. Those numbers are so astronomical in size (hint!) that it’s meaningless to write out 1.79 x 10^308, but this is my blog so I’m going to anyway:
Wow. All that can fit in eight bytes, huh? Yeah, no. No matter which language or database you’re working in, with ranges that have to handle values this large, some rounding is necessary to get the closest approximation of that value, stored in binary.
The IEEE Standard for Floating-Point Arithmetic (IEEE 754) defines how SQL Server stores both
REAL values (
REAL is the 4-byte, or 32-bit variation, while
FLOAT is the 8-byte, or 64-bit double-wide variation). The only way SQL Server differs from the standard is that it reverses the bytes when it stores them because of how Intel CPUs work (called “little-endian”), otherwise it’s the same.
Looking at the binary representation of 4513.19 as stored in a
FLOAT data type,
0x3D0AD7A330A1B140, we flip the bytes around and start prying the binary apart. Yes, binary. This data type stays binary as long as it possibly can:
0 10000001011 0001101000010011000010100011110101110000101000111101
According to IEEE 754, the 64 bits break down as follows:
- 1: the signed bit
- 2 to 12: the exponent, which is offset against a bias
- 13 to 64: the significand (also known as the mantissa)
The eagle-eyed amongst you will notice that this value only has 52 bits in the significand. That’s because there’s an implicit first bit with a value of 1 that is hidden when storing the value. Yes yes, I know, but I don’t make the rules. Keep this hidden bit in mind.
The offset bias is a fixed power of 2 (1023 for
FLOAT according to the IEEE standard) that allows us to store unsigned numbers, giving us a much wider range of values than we would have with signed numbers.
In the case of
REAL (also known as
FLOAT(24)), it uses a 4-byte value (32 bits wide), with an exponent of 8 bits and a significand of 23 bits (22 are stored because of the hidden bit). The signed bit is the same, but the offset bias is 127 as opposed to 1023.
Stepping through the conversion
The first bit represents the sign. 0 means the number is positive, 1 means negative. A floating point value of zero is considered positive.
The next eleven bits are the exponent, which in this case is
0x40B, or 1035. Because it’s an exponent, we raise 2 (the base-2, or binary counting system we’re using) to the power of 1035. However, the offset bias is 2^1023 (2 to the power of 1023) which we have to subtract (hence “offset”). This gives us an actual exponent of 2^(1035 – 1023), or 2^12. Using a quick calculation, that represents the decimal value of 4096. Bearing in mind that our stored value is 4513.19, it makes sense that 4096 is the closest exponent we can get to in base-2. So far, so good.
Now we get to the part that makes floating point values inaccurate, because we’re using what’s called binary fractions: it’s like a regular decimal fraction, but it’s in binary. Remember that “hidden” bit at the start of the significand, that is always assumed to be 1? This is where we use it.
Let’s look closely at our significand (or mantissa). Do you notice the repeating pattern starting at the 15th bit? (The first two rows indicate column positions.)
0000 0000 0111 1111 1112 2222 2222 2333 3333 3334 4444 4444 4555
1234 5678 9012 3456 7890 1234 5678 9012 3456 7890 1234 5678 9012
0001 1010 0001 0011 0000 1010 0011 1101 0111 0000 1010 0011 1101
A repeating pattern is common in floating point binary, and it means that there is a recurring value (similar to decimal fractions like 0.333333). In other words, it is not possible to store an accurate binary representation of the floating point value, because we only have 52 bits (or 22 in the case of a
REAL) to store the significand. Eventually we run out of bits and have to truncate the remainder.
Let’s calculate this out, though, by prefixing our significand with the hidden bit of 1, and then take the remaining 52 bits and put them after a decimal point, like so (the first two rows indicate column positions):
Each position in this fraction is a power of 2, because we’re still in binary. We keep the 1 to the left of the decimal point and add on the 52 positions after the decimal point, where each position must be converted using negative powers.
So for this amount, we’d have 1 x 2^0 before the decimal place, which always equals 1. Then for every position where we see a 1, we add in the negative power for that position:
(1 x 2^0) + (2^-4) + (2^-5) + (2^-7) + (2^-12) + (2^-15) + (2^-16) + (2^-21) + (2^-23) + (2^-27) + (2^-28) + (2^-29) + (2^-30) + (2^-32) + (2^-34) + (2^-35) + (2^-36) + (2^-41) + (2^-43) + (2^-47) + (2^-48) + (2^-49) + (2^-50) + (2^-52)
Depending on the number of decimal places your calculator supports, you end up with a decimal fraction that looks like 1.101853027343749902300373832986225. We take this resulting fraction and multiply it by the exponent and end up with 4513.1899999999995998223312199115776. While floating points can have up to 15 significant decimal digits, our starting value was only two significant decimal digits so the result is rounded up to 4513.19.
We tried to store 4513.19 in a floating point data type, and ended up storing 4513.1899999999995998223312199115776 instead. If you are storing exponents for massive numbers such as scientific or engineering fields,
FLOAT is probably fine.
On the other hand,
REAL which has the same problems but at a larger scale) are not recommended for storing financial amounts. Even with 52 bits, the binary representation of your decimal value will be inaccurate most of the time. It’s just not worth the risk. Store your financial values as
DECIMAL, and make sure you use a large precision (the number of digits after the decimal) when performing calculations.
You can download the Excel Floating Point Calculator I created for checking my work in this post.
Leave your thoughts in the comments below.
A very good and interesting read! I like that and it explains why we shouldn’t use REAL or FLOAT for most cases. I was aware of this in .NET, but hadn’t given it any thought in SQL before.
As a historic note, we originally had only three approximate datatypes in the SQL standard; real, float and double precision. There was no IEEE standard at the time, so every vendor had its own implementation. We defined real as being a base 10 representation and float and double precision as base e. Yes, this sounds a little vague and each manufacturer could’ve implemented some of this in hardware differently. But the goal of the standard was to tell you “what” and not “how” the number should work. While I’ve never had any occasion to use approximate numerics in the last 30 years, I’m glad we went over to the IEEE standards. I started life as a Fortran programmer, so I had to learn an awful lot about floating-point numbers. I’m not sure if foreign programmers are taught all of the tricks we had to learn and I’m not sure that any of the SQL compilers handle them correctly either. One of the exercises I gave in my SQL classes was to write a random list of numbers all sizes using floating-point datatype. Add them and write down the total.. Sort them in ascending order, add them and write down the total. Sort them in descending order and write down the total. Finally, sort them by absolute value in ascending order and add them. The results will not be the same in most SQL implementations, because of floating-point rounding errors and SQL compilers don’t rearrange column by absolute value when they do arithmetic. Scientific languages that deal with smaller data sets than databases will often have a pass in the compiler to rearrange expressions to minimize the rounding error.