You Want Salt With That? Part Two: We Need A Hash

OK, we want to sketch out an authentication system which is sufficiently secure against common attacks even if all the details of the system are known to the attacker. Let's start with a simple system, take a look at what its vulnerabilities are, and see if we can mitigate them:

System #1

The client transmits the username and password "in the clear" to the server. The server consults a list of usernames and passwords, and grants or denies access accordingly.

There are two big problems with such a system. First, it's susceptible to eavesdropping if the client and the server are not the same machine. If someone can listen in on the message sent from the client to the server with a network packet sniffer then the eavesdropper learns a valid username and password. Second, if the server is ever compromised and the password list is read, the attacker learns everyone's username and password.

Can we mitigate these problems? Let's look at the second vulnerability first. Does the server need to store the password? How could it NOT store the password? What would it compare against?

We can eliminate the need for the server to store the password if we have a hash function. A hash function is a function which takes a string as its argument and produces a number, usually a hundred--or-so-bit integer, as its result.

A good hash algorithm has the property that slightly different inputs produce very different outputs. A one-bit change in the input should cause on average 50% of the output bits to change. Because of this, it should be extremely difficult to deduce the an input that produces a given output, even given partial information about the input. (There are other hash algorithm properties that are important for cryptographic operations such as document signing, but that's a topic for another day.)

Proving that a given algorithm actually has this property can be quite tricky, but we have some industry-standard hash algorithms which have withstood rigorous testing and deep analysis by professional cryptographers and are widely believed to be "one way functions" -- it's easy to go from string to number, and very hard to go backwards.

System #2

The client sends the username and password to the server. The server has a list of usernames and the hashes of passwords, but not the passwords themselves. (When the user first created the password, the system hashed the password and then discarded it, saving only the hash.) The server hashes the client-supplied password and compares the hashes.

This is better; now the server is not storing the passwords, just the hashes. If an attacker compromises the server, they can't easily go from the hash back to the password. (It also has the nice property that every entry in the password table is now the same size. Longer passwords do not have longer hashes.)

But there are two new problems with this system. First, any two users who have the same password have the same hash. If one of those users is evil and compromises the server, they immediately learn who has the same password as they do.

Second, this system is susceptible to dictionary attacks. An attacker can hash every word in a large dictionary, compromise the server, and then compare every password hash to every word in the dictionary. Since dictionary words are likely passwords, the attacker will probably be able to figure out at least a few passwords.

And of course we still haven’t mitigated the fact that eavesdroppers could be listening in on the conversation between the client and the server.

Next time, we'll add a little salt to the mix in an attempt to mitigate the dictionary attack and same-password vulnerabilities. Then we'll see if we can use some of the hash technology to mitigate the eavesdropping attack.

Comments

  • Anonymous
    January 31, 2005
    The comment has been removed

  • Anonymous
    January 31, 2005
    The comment has been removed

  • Anonymous
    January 31, 2005
    The comment has been removed

  • Anonymous
    January 31, 2005
    The comment has been removed

  • Anonymous
    January 31, 2005
    When I first learned about Kerberos, I thought it was an overly complex, rather inefficient protocol; I didn't understand why when NTLM and hashed basic auth. already existed.

    Fast forward a year, with a great deal more knowledge of crypto and authentication, and I discovered v5 is a breathtakingly awesome feat. Properly implemented (which seems to be the killer) it solves all MITM and cracking problems as much as currently feasible.

    A salt just forces cracking to concentate on one password at a time - it does nothing to increase the security of a single entity, but it increases the time to attack an entire system by multiplying time for one by the number of accounts attacked, compared to negligible extra time for unsalted passwords.

  • Anonymous
    January 31, 2005
    Indeed, Kerberos is a thing of shiny beauty, though of course, like all security systems, it trades some convenience for greater security -- you've got to have a mutually trusted third party, you've got to ensure that everyone's clocks are synchronized, etc.

  • Anonymous
    January 31, 2005
    The comment has been removed

  • Anonymous
    January 31, 2005
    "take two hundred digit prime numbers, multiply them together. That's a one-way algorithm. It is computationally infeasible to go the other way "

    Why do the numbers have to be prime? Wouldn't this be true of any two hundred digit numbers?

  • Anonymous
    February 01, 2005
    The comment has been removed

  • Anonymous
    February 01, 2005
    The MD5 faq is here:

    http://www.faqs.org/rfcs/rfc1321

    and the SHA1 faq is here:

    http://www.faqs.org/rfcs/rfc3174.html

  • Anonymous
    February 01, 2005
    > Why do the numbers have to be prime? Wouldn't this be true of any two hundred digit numbers?

    Yes and no. If the question is "find the two numbers that I had in mind", then yes, this is an even harder problem because it is information losing. If the product is 36, did I start with 1 x 36, 2 x 18, 3 x 12, ... ?

    But if the question is just "find a factorization" then primeness is important. Factoring 2^250 x 3^200 is trivial. Finding a factorization is hard if all the factors are very large. A number which is the sum of two large prime numbers is very hard to factor.


  • Anonymous
    February 01, 2005
    In your last comment Eric, I think you mean, "A number which is the product of two large prime numbers is very hard to factor."

  • Anonymous
    February 01, 2005
    Yep. Clearly I am typing faster than I am thinking...

  • Anonymous
    February 01, 2005
    The comment has been removed

  • Anonymous
    February 01, 2005
    MD5 is insecure. Why?

    If you manage to get a collision at one stage, then the collisions propogate going forward along subsequent rounds.

    Recent studies have shown how to do this for practical examples, I'm sure you can Teoma to find them.

    For now, make sure you use both MD5 and SHA-1 as a temporary countermeasure. Longer term, let's develop some new, better hashes (without MS input please, we don't want patents on mathematics).

  • Anonymous
    February 01, 2005
    Teoma them? Don't you mean A9 them? :)

  • Anonymous
    February 03, 2005
    Fabulous Adventures in Coding assays this week a, well, fabulous adventure, in simple cryptography. I know enough to get myself in deep trouble with this subject, but Eric has put together three short and knowledgeable posts that begin easy and...

  • Anonymous
    February 10, 2005
    "A number which is the sum of two large prime numbers is very hard to factor"

    I decree that 2 is one factor of that sum.

  • Anonymous
    February 22, 2006
    The blog is very useful.

  • Anonymous
    September 06, 2007
    A recent question I got about the .NET CLR's hashing algorithm for strings is apropos of our discussion

  • Anonymous
    January 10, 2008
    The comment has been removed

  • Anonymous
    January 10, 2008
    The comment has been removed