On Twitter, Michael Dyrynda writes:
⚠️ Never ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever ever store currency as a float in your database
I completely agree with this statement. Never store values used in financial calculations as floating point values, because a floating point is an approximate representation of a decimal value, stored as binary. In most cases it is inaccurate as soon as you store it. You can read more in this excellent — if a little dry — technical paper.
With that out of the way, we get into an interesting discussion about the correct data type to store currency values.
I’ve contributed to several books about SQL Server, and I wrote recently on this website about MONEY
and SMALLMONEY
. I like to think I know a little bit about picking the right data type for a certain column.
So I was surprised when a friend on that Twitter thread mentioned that he stores currency values as integers (multiplying them by 100 first). Being as literal as I am, I thought he meant the INTEGER
(or INT
) data type, which is a four-byte data type that goes up to around 2.1 billion (for values greater than zero), but Michael reminded me that “big integers exist,” so of course my friend was talking about BIGINT
.
I wrote in a reply that “Decimal will have to do. I’m not going to store money as a BIGINT
.” Nevertheless, I found it an interesting thought experiment to write about here.
Assume that your currency has a hundred possible decimal values from 0.00 to 0.99. Now assume that you can eliminate pesky rounding errors by storing currency data in cents as a BIGINT
. This requires you to multiply the currency amount by 100 when inserting it, and dividing by 100 when displaying it. That seems counterproductive in my mind, but bear with me as we investigate this further.
What problem is this meant to solve? Why avoid the use of the DECIMAL
data type, which is engineered to store decimal values with a decimal precision? Is the risk of rounding errors so great as to avoid a data type completely as the floating point? Surely DECIMAL
can handle this? Most of the time we use simple sums (SUM
), averages (AVG
), and regular arithmetic operations like addition, subtraction, multiplication and division, and in my experience — provided I put the parentheses in the right place — I’ve never run into a rounding error that was significant enough (i.e. outside of the tolerance of the organization calculating the amounts) to be a problem.
As I’ve pointed out before, BIGINT
ranges from negative 9 quintillion, to positive 9 quintillion (a number that is 19 digits wide). That’s a lot of squillions. Even storing these values as cents, you can store amounts that could handle even the ill-fated Zimbabwean dollar, which — at the time it was discontinued — was distributed in $100-trillion notes (100,000,000,000,000.00, or 17 digits wide). A BIGINT
needs 8 bytes of uncompressed storage for each value.
Now DECIMAL
is a different beastie. If you need the highest precision, a DECIMAL
can use up to 17 bytes for each value. Generally though, I like using DECIMAL(19,4)
for currency, which needs 9 bytes and can store numbers 19 digits wide, where the last four digits are after the decimal place. Coincidentally, this has the same range of values as the MONEY
data type. Maybe it’s not a coincidence that I chose that then, eh?
Purists will say that the MONEY
data type is woefully inaccurate when it comes to calculations, including multiplication and division. To those purists I say “pish!” Unless we’re talking Bitcoin, a currency amount usually has two decimal places. By using four decimal places, this means that rounding errors are less likely. In the vast majority of cases you will be doing things like summing values, calculating percentages, working out differences between two values, and so on. If you are being caught out by rounding errors, perhaps you should increase the precision of the input values for the purposes of that calculation only.
After all, if you’re going to the effort of storing currency values as cents in a BIGINT
column, you’re already doing conversions of your data. Why not just eliminate that overhead of multiplication and division on storage and retrieval respectively, store these values as DECIMAL(19,4)
, and increase the precision only when you need it? In fact, sometimes all you need to do is change the order of operations in your calculation.
When I design tables on SQL Server that are going to store a currency value, I generally use DECIMAL(19,4)
. If I encounter a customer database that stores these values using the MONEY
data type, I have no problem whatsoever with that, because they are basically the same thing anyway.
(Note: I know they’re not identical, but for the sake of storing currency values, it makes no practical difference.)
I don’t see the point of using BIGINT
to store currency values. It causes confusion, and will introduce bugs when multiplying or dividing by 100. This is especially moot since MONEY
is stored on disk in exactly the same format as BIGINT
anyway.
If you’re trying to save storage space, just compress your data using row or page compression. One thing we can all agree on though, is never to store currency as a floating point.
Share your thoughts in the comments below.
Photo by Josh Appel on Unsplash.
Here’s why people use BIGINT and why it may be necessary and should not be dismissed out-of-hand. In many cases in the accounting world we aggregate currency amounts in non-additive ways that may not make sense to the lay person. Aggregating these numbers can easily result in USD that is in the quadrillions which will not fit in dec(19,4). There are also a handful of currencies that do in fact have precision to 4 decimal points. Then there is the Korean Won which is like a billion Won to the dollar (exaggerating and too lazy to check current exchange rates). Converting those numbers also spills a dec(19,4). So the “trick” that most accounting software use is to use BIGINT when a number is stored PLUS a currency id for the column. The currency table has metadata col that holds the currency and a multiplier. An API is written that ensures that ALL DATA ACCESS goes through the API which handles the value for display purposes. So, if I want to store my salary which is $45,876.56 I would store it in a BIGINT as 4587656 with currencyid 2 which has a multiplier of 100. You can quickly see how that is beneficial. I’m not suggesting this solution should be used EVERYWHERE (and definitely not salary), but it’s helpful to not be so dismissive of these solutions which are IMPERATIVE in the acctg world, for some use cases. Understand your use case, understand good logical and physical modeling, and understand the capabilities of your rdbms.
Considering that MONEY and BIGINT are stored in *exactly* the same way by the database engine (see last week’s post), I don’t see any benefit to what you describe. I’d sooner use a wide DECIMAL with a high precision.
Regarding not using float in the database, I would also apply the same principal to programming languages too. If an application stores a value in the database where the datatype in the application is float, accuracy is already lost. It’s just not obvious like it is within the database.
Never using float is a bit of an extreme. It has a use case where an approximate value is good enough. For example, the position of a planet in the sky does not need to be exact for many astronomers. However, if you were plotting a course to orbit a planet, this inaccuracy would cause you to miss the planet.
Steal my thunder for next week’s post, why don’t you? 🙂
That is twice that I have done that to you. Great minds think alike! 🙂
No doubt. Thanks for your thoughts on this week’s post.
Just to counter this point, you could make your decimal larger. It doesn’t need to be (19,4). You could do (38,4) in which case decimal will hold a boatload more numbers than BIGINT would. For example, maximum BIGINT is:
9223372036854775807
While maximum decimal(38,4) is:
9999999999999999999999999999999999.9999
which is substantially larger than a BIGINT value. Mind you it uses more space than a BIGINT, but it also allows for more precision and less “lookups” to see what you should be dividing by. Using decimal(38,19) would give you the same number of digits before the decimal as BIGINT while offering 19 additional digits for precision.
so I think the argument for BIGINT hits a snag there, no?
The old Sybase MONEY datatypes are left over from the first early days of SQL. COBOL includes formatting in its declarations. They use what is called a PICTURE clause that gives floating currency symbols, commas, leading zeros, and other punctuation in currency amounts. Sybase was trying to keep COBOL programmers when they designed the MONEY datatypes, so they built in display features to match the PICTURE clauses. The problem was they also defined the precision of those MONEY datatypes and the rules for doing arithmetic with them. They got the rules wrong! You can Google it, but basically when you try to do multiplication and division with MONEY datatypes, you get rounding errors.. Add to this the number of decimal places is also not correctly defined according to EU standards.. You would really need to sit down with your accounting department and figure out which are required to do by law as well as by EU and GAAP accounting standards.
Thanks Joe. As I covered in my previous post, MONEY is stored as BIGINT anyway, so it makes sense it’ll have the same kind of rounding issues.
please give me a real world example, where you have to multiply two money values.
Multiplying money with integer or decimal (usually an amount) is very common, but I never had to multiply 100$ by 5$ (only 100$ by 5)
It doesn’t matter if YOU see benefits, the fact is, many accounting systems are architected this way. Understanding the WHYs is helpful to understand when/if you see it in the wild. Let me be clear, under no circumstances would I model a greenfield schema this way in 2020, but that doesn’t mean you won’t encounter it in the wild. Remember that many acctg systems were architected in a time when a customer could choose which dbms they wanted to implement it on. Not every dbms internally represents the data structures for MONEY and BIGINT the same way, in fact, some don’t support larger numerical data needed to support things like non-additive aggregations and Korean Won conversions that are required. To make my point again, it’s important to understand use cases and decisions, even poor ones, so we can work with them but ensure we don’t make the mistakes of 1971 again in 2020.
My blog post is titled “How should I store currency values in SQL Server?”. Knowing the history of accountants doing it wrong is interesting, but ultimately irrelevant. This post stems from a series I’m writing about how the storage engine persists data, and as a data professional working with SQL Server in 2020, I’m going to use DECIMAL with a high precision starting at (19,4) when it comes to currencies, unless there is a good use case otherwise. No one is arguing that point. The fact that some people use BIGINT instead of MONEY is an interesting tangent, but ultimately meaningless *to the database engine* because they’re persisted the same way on disk and in memory. That’s the point of this post (and tomorrow’s post explores in detail why FLOAT is a terrible choice for currency). I appreciate your comments, and hope you stop by in the future. You are welcome to continue discussing this with me on Twitter.
What’s your definition of “real world”? I mentioned this in the first comment…accountants can do some odd things with data. Here’s one. Large companies often have the concept of “controls” on their reports and data extracts. Something like, I know this report is accurate if I can do some crazy math against 2 summary/control accounts, and the number gets put in the footer or a control file. Often this is modulo math on multiple columns, but that’s really just multiplication, right? This was done, as Joe Celko mentioned above, b/c older acctg systems may use PIC(x) in cobol when the dbms stores both as MONEY or a column where the underlying UOM is a currency. Having a control file gives a warm fuzzy that something wasn’t rounded/trunc’d. You may say that is stupid in 2020 and computers don’t make basic errors like that and a control file/footer is unnecessary. But then why do most acct systems still actually store their data like an old school green bar accting book with DB/CR entries? Answer: because that’s what it takes to make the accountants feel good. In today’s world we do the same thing with “control files”, just using more modern technologies such as hashing or bin_checksum(). In 2020 this is a terrible implementation but unfortunately we still see and need to understand the WHY of it.
Even if we keep money as decimal, shouldn’t we convert it to an integer when making transactions? convert it to decimal again while saving? to eliminate bugs? If this is the case then there is no difference between storing as deciamal or bigint, I guess.
Why would you want to convert to integers? Keep as decimal. The argument to convert to integers doesn’t make any sense to me to begin with.
In most of my recent tables I’ve either separated the values into two or used DECIMAL.
When going the BIGINT route I used BIGINT for the dollars and INT for the cents.
My input scripts do the math separately:
$10.92 + $25.12 => 10+25 & 92+12 => 35 & 104 => 36 & 4 (expressed as $36.04 in the front-end UI)
There is also absolutely no need for them to have more than 2 decimal places. But in the event I needed anywhere up to around 10decimal places of precision, the INT will handle that without the need for me to change anything but how it’s computed. (eg 4 could represent $0.0004 and 4444 could be $0.4444)
If I used FLOAT or MONEY I’d have nightmares.
— I usually don’t waste time commenting on personal-view-articles like this, but because this somehow appears on top of Google search result made me responsible for replying here in case anyone read it. —
The reason why you would want to store money value as BIGINT is because of the applications that consume the data need that exact precision.
Storing money in decimal means it will always get converted into Float within the application layer (regardless of which language), and float has a very limited precision and can cause wrong mathematical result. It’s a concept known as floating point arithmetic, you can find out more here: https://en.wikipedia.org/wiki/Floating-point_arithmetic
Try this on javascript or python: 26.4-25
you will get: 1.3999999999999986
This is the main reason why the majority of system that deal with money and currencies would use integer as the method to store money, and database designers understand this, hence they made MONEY as BIGNIT.
Neil, thank you for your comment.
I’m extremely familiar with floating point data types, and have written another blog post on that which you can read here.
I wanted to point out that in all .NET languages (for example, and not counting scores of other languages) the fixed-precision for the
DECIMAL
data type is absolutely maintained, so your “regardless of which language” assertion is false.Your example uses implicit type conversion, and I’d suggest that anyone writing financial calculations that way needs to undergo a code review with a more experienced software developer. That review would cover a number of best practices, including the need to use strongly typed values, to define your variables up front with the correct data type, and make sure that calculations are performed on values that are the same type. With a few very minor changes, the 26.4 – 25 example would result in the correct, fixed-precision result of 1.4 you’d expect.
Even with your assertion that
BIGINT
is the correct way to store values to work around a problem that would be easily remedied through better coding practices, your example would have the exact same problem with an integer type value, because as soon as you perform an implicit conversion you’d still be working with floating point values anyway.It’s a shame you used a fake email address, because you won’t see the notification for this reply, and we will miss out on a robust discussion.
Thanks for stopping by.
Comments are closed.