Making up passwords is annoying. And when you finally come up with something, you always get a message saying it needs to have a capital letter in it, or a number, or a special character. In this post I’m going to explain the maths behind those bothersome messages and suggest a better way to choose passwords.

First, to provide some context and background information on the topic, I’d like to go on a little digression about how passwords are stored. A while ago it occurred to me that my password must be stored somewhere in a text file, so that next time I have to put my password in, the computer can tell if I’ve got it right, hence anyone who knew where that file was could find out what my password was. This bothered me a bit at the time, and then I forgot about it and went on with my life. Some years later I found out what actually happens. It isn’t your password that’s stored, but rather its image under a particular function called a hash function. When you set your password \(p\) the computer stores the hash of your password, \(f(p)\), and when you next enter a password \(q\) it calculates \(f(q)\) and compares that to \(f(p)\). One obvious property the hash function needs is to be very hard to invert, i.e. given \(f(p)\) it isn’t possible to work out \(p\).

Second, I’d like to mention the most common way that passwords are attacked. What normally happens is that an attacker manages to exploit a security hole in a web server and gets access to a list of user names and the corresponding passwords. If they’re very lucky the passwords are in plaintext, in which case they can then access any of the accounts, or sell the passwords on the black market. More often the passwords are hashed. There are various different hash functions that could be used, and some hash functions are better than others. The attacker can easily find out which hash function has been used, so this does not add to the security.

Now we can go back to my original question: why does your password need to have an upper case character, a number and a special character in it? For security, of course. But how does that make it secure?

Imagine you are an attacker, and you want to get access to something that’s password-protected. You’ve managed to get the list of hashed passwords. The hash function has been designed so that you can’t go from the hash to the original password in a reasonable amount of time. But what you can do is make lots of guesses, run them through the hash function, and see if you can guess the password (or something else which hashes to the same thing). Unless special precautions have been taken, you can check against the whole list of hashed passwords at once, which makes it more likely you’ll find one that matches.

The most obvious thing to do is to try everything, i.e. every string of characters of length 1, then length 2, then length 3, and so on until you get a match. That’s what’s known as a brute force attack. Suppose the password is seven characters long, surely there are too many possible passwords for an attacker to try them all? Well, you’d be surprised. As computers have got faster the rates at which computers can calculate hashes has got faster, and because this has only happened fairly recently old hash functions weren’t designed with this in mind. The old tried-and-tested hash functions, such as MD5 (“message digest 5”, first published in 1992 by Ron Rivest, who put the R in RSA) and SHA-1 (first published in 1995), can now be executed extremely quickly on modern computers, especially on the type of processors found in modern graphics cards. That’s a good thing for some of the other uses of hash functions, but when you’re securing passwords it’s a very bad thing.

In this post I’ll assume a worst-case scenario, that the password has been hashed with a single round of MD5. A single graphics card, which costs about £300, can calculate 8 billion MD5 hashes per second. At that rate, trying all passwords of length six can be done in the time it takes to make a cup of tea. Let’s take our set of characters to be all 26 lower case letters, 26 upper case letters, 10 digits and 33 special characters, so there are 95 in total.

Password length Number of passwords Time needed
1 95 102.0 26.6 12 nanoseconds
2 952 = 9.0×103 104.0 213.1 1.1 microseconds
3 953 = 8.6×105 105.9 219.7 0.11 milliseconds
4 954 = 8.2×107 107.9 226.3 10 milliseconds
5 955 = 7.7×109 109.9 232.8 0.97 seconds
6 956 = 7.4×1011 1011.9 239.4 1.5 minutes
7 957 = 7.0×1013 1013.8 246.0 2.4 hours
8 958 = 6.6×1015 1015.8 252.6 9.6 days
9 959 = 6.3×1017 1017.8 259.1 2.5 years

Great, so we just need to make sure all our passwords are at least nine characters long and we’re okay? Not quite. There’s no reason why the attacker has to try all possible passwords up to a certain length, when some are obviously far more likely than others. For example, if I wanted to find the combination to a four-wheel padlock, I wouldn’t start at 0000 and work my way up to 9999. I would first try some combinations I thought were more likely. For instance:

  1. 1234
  2. 0000, 1111, 2222, …, 9999
  3. 1900-2019
  4. Dates in DDMM format, i.e. <01-31><01-12>
  5. All digits the same except for one, i.e. 0001, …, 0009, 1110, …, 1119, …, 9888

Let’s use the word ‘template’ to refer to a rule for generating a subset of possible passwords. These five templates give me 1, 10, 121, 372 and 400 passwords to guess respectively, so 904 in total, far fewer than the 10,000 possible combinations. If I can try one combination per second, trying all 904 would take about 15 minutes. (If this was a four digit password on a computer those 904 guesses would take 0.1 microseconds – but then, trying all 10,000 four digit passwords would only take 1 microsecond.)

Similarly, when people choose passwords they prefer to use things which have meaning to them, so that they are easier to remember. Looking at lists of leaked passwords, we can see that when there are no restrictions on passwords, people tend to use short words or sequences of numbers. It is very easy for an attacker to get an exhaustive list of words, e.g. all the words used in Wikipedia so you’ve covered people and places as well, and use them as their guesses.

To avoid this problem, when you make a password you are usually required to use a mixture of upper case and lower case characters, with at least one number and special character. But actually, this doesn’t solve the problem at all. People find random passwords difficult to remember, so they tend deal with these restrictions in ways which they find easy to remember. If they do use an upper case letter it’s at the beginning of a word. They replace characters in words with numbers or special characters which look similar. Any other special characters go at the end of the password.

It isn’t hard to come up with templates that cover an awful lot of these kinds of passwords. First you take your list of all the words from before. Then you add in all the things you get by starting from those words and replacing the letter “a” by “@” or “4”, the letter “I” by “1”, the letter “s” by “$”, the letter “e” by “3”, and so on. Then you look at leaked lists of passwords and make up templates that cover as many of the passwords as possible. For example:

  1. All numbers, length up to 9
  2. Lower case word followed by another character
  3. Word with first letter capitalised, followed by a special character
  4. Word, special character, word, special character

Here are some examples of passwords that were recovered this way for a recent article in Ars Technica.

  1. n3xtb1gth1ng
  2. Oscar+emmy2
  3. Garrett1993*
  4. J21.redskin
  5. $0ccerBa11

So, the normal advice about how to make a password doesn’t work particularly well, because in order to make their password memorable people will tend to choose something that will be caught by one of these templates. What is the alternative?

One thing that is suggested quite a lot is using the first letters from a sentence that you can remember, and replacing some letters by numbers and special characters. While that can be secure, if the sentence you use is too well known then an attacker may be able to guess it. People are likely to use famous quotations and lines from songs, which aren’t that hard for an attacker to guess.

Instead, I’d like to describe a password generation scheme called Diceware, that is provably secure. We’re going to remove our reliance on a person coming up with something random. This will make it much easier to say with confidence that our scheme is secure. To do this, we need to have a secure random number generator. As you’ve probably guessed from the name, we’re going to use dice rolls. We could use these random numbers to generate a string of characters of length 9, but we’re going to have a hard time remembering it. What people find easy to remember are words, because they have a meaning. What if we could generate a string of random words instead? That’s exactly what we’re going to do.

First get yourself a list of 7776 (65) short words, indexed using five numbers from one to six. These are available online for US English speakers, non-US English speakers and many other languages. Decide how many words you want in your password, and then for each word use five dice rolls to choose a random word from the list. Put a space between each word, and if required capitalise the first letter and add a number and special character to the end so it will be accepted.

Number of words Number of passwords Time needed
1 65 = 7.8×103 103.9 212.9 97 microseconds
2 610 = 6.0×107 107.8 225.8 7.5 milliseconds
3 615 = 4.7×1011 1011.7 238.8 59 seconds
4 620 = 3.7×1015 1015.6 251.7 5.3 days
5 625 = 2.8×1019 1019.5 264.6 110 years

At first sight, Diceware passwords seem to go against the common advice not to use dictionary words in your passwords. But their security comes from there being so many possible combinations of words that it would take far too long to try them all. Your password is safe even if the attacker knows how you came up with it, and what list of words you used, as long as they don’t know what numbers you got from your dice rolls.

To summarise: the normal advice about making passwords is good for making passwords that are hard to remember and easy for an attacker to guess. If you instead want passwords that are easy to remember and hard for an attacker to guess, you can use Diceware.