Recently I wrote:
Don’t store passwords in a database.
I stand by this statement.
I expected a lot of flak because I didn’t explain myself. This post goes into a bit of an explanation of my position, as well as how to go about storing something in a database that can be used for authenticating users.
If you are storing passwords in a database, you should stop doing that immediately. We, as software developers and data professionals, should never know what passwords our customers are using. The same goes for most sensitive data: we technical staff probably don’t need to know what’s in there. Some stuff should be hashed, and some stuff should be encrypted.
To explain the difference between hashing and encryption, I’m going to quote myself from a book I contributed to:
In a security context, data that is converted in a repeatable manner to an unreadable, fixed-length format using a cryptographic algorithm and that cannot be converted back to its original form is said to be hashed.
Data that is converted to an unreadable form that can be converted back to its original form using a cryptographic key is said to be encrypted.
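To make the "repeatable" and "fixed-length" parts of the hashing definition concrete, here is a minimal sketch using SHA-256 from Python's standard library. The helper name digest_hex is my own, and plain SHA-256 is used here only to show the properties of a hash; on its own it is too fast to be a good choice for storing passwords.

```python
import hashlib


def digest_hex(text: str) -> str:
    """Return the SHA-256 digest of `text` as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_input = digest_hex("hello")
long_input = digest_hex("a much longer passphrase, with spaces & symbols!")

# Fixed length: both digests are 256 bits, regardless of input length.
print(len(short_input), len(long_input))

# Repeatable: hashing the same input again gives the same digest.
print(digest_hex("hello") == short_input)
```

There is no function that takes the digest and returns "hello"; that one-way property is what separates hashing from encryption.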
How do I authenticate my users’ passwords then?
What you can store in a database is a password hash. To quote myself again from the same book:
When a password has been hashed correctly, it cannot be decrypted into its original form. Used with a random salt (a random string applied along with the hash function), this results in passwords that are impossible to reconstruct, even if the same password is used by different people.
Because we never need to know what password our customers are using, we will hash and salt the password using a one-way, irreversible cryptographic algorithm that generates a fixed-length binary value, which we can then persist to our database (as binary data, or even as a textual representation if we wish).
In practical terms, this means that we don’t care about uppercase characters, symbols, or any of that nonsense. As long as the password is escaped correctly (in other words, any characters with special meaning in a particular programming language are rendered as plain text) when the authentication system captures it, the resulting hashed and salted value is written to the database without worrying about SQL injection, or even how long the password is. The length of the hashed value depends on the cryptographic algorithm we’ve used, so every stored hash will have the same length. That means SQL Server database developers don’t need to worry about the maximum length of a password, and can even save two bytes per row by making the data type CHAR(n) or BINARY(n), where n is the length of the hashed and salted value, as opposed to storing a variable-length string as a VARCHAR data type.
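As a rough sketch of the idea above, the following hashes passwords of very different lengths and stores the results as fixed-length binary values. The assumptions are mine: an in-memory SQLite database stands in for SQL Server, the table and function names are hypothetical, and the scrypt cost parameters are illustrative rather than a recommendation.

```python
import hashlib
import secrets
import sqlite3

HASH_LENGTH = 64  # bytes; in SQL Server this could be a BINARY(64) column


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a fixed-length hash from the password and salt using scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14, r=8, p=1,          # illustrative cost parameters
        maxmem=64 * 1024 * 1024,
        dklen=HASH_LENGTH,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "username TEXT PRIMARY KEY, salt BLOB NOT NULL, password_hash BLOB NOT NULL)"
)


def register(username: str, password: str) -> None:
    """Store only the salt and the hash; the password itself is never written."""
    salt = secrets.token_bytes(16)
    # Parameterised query: values are passed separately from the SQL text.
    conn.execute(
        "INSERT INTO users (username, salt, password_hash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )


register("alice", "pw")
register("bob", "Robert'); DROP TABLE users;-- " * 50)

hash_lengths = [row[0] for row in conn.execute("SELECT length(password_hash) FROM users")]
user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(hash_lengths, user_count)
```

Both rows end up with a hash of the same size, whether the password was two characters long or over a thousand.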
Remember to add a salt
The algorithm is not as important as making sure the same password never produces the same hash twice, which is possible by adding a random string to the password (the salt). Without a salt, no matter how many times I hash the password “hello”, a given algorithm will always produce the same hash, which lets an attacker look it up in a precomputed table. The salt is not a secret, so it can be stored in the database along with the password hash; (provided you have designed the system correctly) an attacker gaining access to an unencrypted database means that you have bigger problems.
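Here is a small sketch of what the salt buys us, again using only Python's standard library. The function names are hypothetical, and the scrypt parameters are illustrative assumptions.

```python
import hashlib
import hmac
import secrets


def salted_hash(password: str, salt: bytes) -> bytes:
    """Hash the password together with its salt using scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**14, r=8, p=1,          # illustrative cost parameters
        maxmem=64 * 1024 * 1024,
        dklen=64,
    )


def verify(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    return hmac.compare_digest(salted_hash(candidate, salt), stored_hash)


# Without a salt, "hello" always hashes to the same value.
unsalted_first = hashlib.sha256(b"hello").digest()
unsalted_second = hashlib.sha256(b"hello").digest()
print(unsalted_first == unsalted_second)

# With a random salt per user, two people using "hello" get different hashes.
salt_alice = secrets.token_bytes(16)
salt_bob = secrets.token_bytes(16)
hash_alice = salted_hash("hello", salt_alice)
hash_bob = salted_hash("hello", salt_bob)
print(hash_alice == hash_bob)

# Logging in: the salt stored next to the hash is all we need to check a password.
print(verify("hello", salt_alice, hash_alice), verify("Hello", salt_alice, hash_alice))
```

Authentication never needs the original password, only the stored salt and hash.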
Which cryptographic algorithm should I use?
Over the years, many algorithms have been rendered useless, either because of a weakness in the algorithm itself, a weakness in the implementation of that algorithm, or (more likely) an increase in computing power that makes it easy to brute-force the entire range of hashes that can be generated by that algorithm (the key space). This latter problem is mitigated in part by the random salt that is used when hashing a password, as noted previously, because a unique salt forces an attacker to attack each password separately instead of reusing precomputed results.
However, the newer hashing algorithms also include a couple of built-in defence mechanisms. The first is that the algorithm is slow by design, either by requiring a lot of CPU power, requiring a lot of memory to perform the hash function, or a combination of these factors. Examples are bcrypt and scrypt, and implementations of these algorithms are available for many programming languages. The second built-in way to reduce the chance of brute-force attacks is a larger key or output size (measured in bits). For example, AES-256, which is an encryption algorithm rather than a hashing algorithm, uses a key that is 256 bits in size (32 bytes), and the SHA-256 hashing algorithm produces a 256-bit value. With current computing power, exploring a key space of that size would take vastly longer than thousands of years.
Weaknesses have been found in some implementations of AES (timing side channels, for example), and security professionals commonly recommend AES-256 where data needs to be encrypted. With bcrypt, for example, we can increase the cost factor, where each increment doubles the number of iterations, so that hashing stays slow as hardware gets faster, and scrypt is designed to use large amounts of memory.
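To illustrate "slow by design", here is a sketch that runs scrypt at three cost settings and reports the time taken alongside an approximate memory footprint. The helper names are mine, the 128 * n * r figure is an approximation of scrypt's main working buffer, and the fixed salt is there only so that the runs are comparable; a real system would use a random salt per password.

```python
import hashlib
import time


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main working buffer, in bytes."""
    return 128 * n * r


def timed_scrypt(password: bytes, salt: bytes, n: int, r: int = 8):
    """Hash with the given cost and return (digest, elapsed seconds)."""
    start = time.perf_counter()
    digest = hashlib.scrypt(
        password, salt=salt, n=n, r=r, p=1,
        maxmem=128 * 1024 * 1024, dklen=64,
    )
    return digest, time.perf_counter() - start


demo_salt = b"\x00" * 16  # fixed ONLY for this comparison; never do this for real passwords

results = []
for n in (2**12, 2**14, 2**16):
    digest, seconds = timed_scrypt(b"hello", demo_salt, n)
    results.append((n, digest))
    mib = scrypt_memory_bytes(n, 8) / (1024 * 1024)
    print(f"n={n:>6}  ~{mib:.0f} MiB  {seconds * 1000:.1f} ms")
```

Raising n makes every guess more expensive for an attacker in both time and memory, while a legitimate login pays that cost only once. The right setting depends on your hardware, so measure before choosing.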
Hopefully this clears up my position. You should encrypt your database anyway (using Transparent Data Encryption for SQL Server, for example), but you should also make sure that if your password hashes are accessible, they cannot be reverse-engineered into the original passwords.