Breakable hash function or Random Oracle?

12 Jun 2015

Introduction

A recent thread on Twitter started when Frank Denis asked the simple question in relation to ed25519 signatures:

Why did ed25519 opt for h(R || A || m) and h(rh(h(k)) || m) instead of h(m || R || A) and h(m || rh(h(k))) that could be precomputed?
— Frank Denis (@jedisct1) June 11, 2015

Some of the discussion was much terser than desirable (a necessity of the Twitter platform), so I thought I’d write up the thread here for the mystified.

Can we trust hash functions to be collision resistant?

@veorq @jedisct1 @mik235 It's a random oracle. No, it's a crappy hash function full of collisions. Can't decide!
— Matthew Green (@matthew_d_green) June 11, 2015

History has taught us that hash collisions can be found, and they become practical much faster than we can eradicate their use. MD5 is the now-classic example, although even MD4 is still in widespread use in collision-susceptible settings (such as Microsoft Remote Differential Compression).

What the designers of the Ed25519 signature scheme have done is take out insurance against precomputed hash collisions.

Prying open the Oracle

In the case of Merkle-Damgard hash functions (and similar constructs), appending data - even if it’s secret - to a pre-computed colliding message will not break the collision.

The Merkle-Damgard construct works vaguely as follows:

message_blocks := split_into_blocks(pad(message))
ihv := IV
for i in num_blocks(message_blocks) {
	ihv += compress(ihv, message_blocks[i])
}
return ihv

If you have a pair of messages where the IHV collides after block 2, then you can always swap out blocks 0-2 with your alternate message - even if you don’t know what’s in block 3 and it will produce the same hash.

As such, any time you see H(m) or H(m || …) and a hash function such as MD5, SHA-1, SHA-256 you are at the mercy of precomputed hash collisions.

Ok, so what about a true random oracle?

In a true random oracle model there is still an attack on hash functions - a Birthday Attack. Appending or prepending a secret nonce has the same effect - it breaks a precomputed birthday attack.

Simply having long enough output from your hash function fixes this and is what most signature schemes in widespread use depend on. Still - including the secret nonce does prevent any precomputation in this scenario.

What about SHA-3?

SHA-3 uses a crytographic sponge function. This is a clever construct that can be used to turn a PRP (pseudo-random permutation) into a hash function, MAC, stream cipher, AEAD and more.

As a hash function it operates roughly like this:

block := IV
data = pad(message)
for i in range(length(data) / r) {
	rate := block[0..r]
	capacity := block[r..end]
	rate ^= data[i*r .. (i+1)*r]
	block = permute(rate || capacity)
}
return block[0..r] # Optionally permute again and return more r bits

The general concept is that the capacity section of the block is reserved for the function, while the rate determines how much data can be processed per call to the permutation. The security level of a sponge function is capacity/2 bits (c=512 will give 256-bit security).

A precomputed collision attack that only went far enough to control the output (and not the entire permutation) would be foiled by either appending or prepending a secret nonce to the message.

On the other hand an attack producing messages that collide for the entire permutation would not be foiled by a secret suffix (but a secret prefix might do the trick).

So what does this mean for deployed hash functions?

HMAC famously survived MD5 collision attacks. It’s function is

ikey := k ^ ipad
okey := k ^ opad
inner := H(ikey || m)
return H(okey || inner)

Since the key is prepended to the message in the inner hash function, and the key is presumably unknown to an attacker (if they know the key they don’t need collisions), precomputed collisions are impossible for any semi-reasonable hash function. The outer hash serves to protect against Length Extension attacks.

Now look at this code snippet from rsync (version 3.1.0):

char seedbuf[4];
md5_begin(&m);
md5_update(&m, (uchar *)buf, len);
if (checksum_seed) {
	SIVALu(seedbuf, 0, checksum_seed);
	md5_update(&m, seedbuf, 4);
}
md5_result(&m, (uchar *)sum);

Note that the checksum seed is appended to the message - so even if you have a secret checksum seed it’s still susceptible to collisions. Try these ones.

In Summary

If you’re using hash functions in some way, prepending data that attacker doesn’t know will generally improve security. Proofs in the random oracle model say that appending works too, but it pays to dig deeper sometimes.