
       The  New  Cryptographic  Hash  Function  RIPEMD-160
       A secure replacement for MD4 and MD5

       Antoon Bosselaers, Hans Dobbertin, and Bart Preneel

(Footnote:
    Antoon and Bart are with the COSIC research group of the Department of
Electrical Engineering of the Katholieke Universiteit Leuven. They can be
contacted at {antoon.bosselaers,bart.preneel}@esat.kuleuven.ac.be. Bart is an
N.F.W.O. postdoctoral researcher, sponsored by the National Fund for
Scientific Research (Belgium). Hans is with the German Information Security
Agency. He can be contacted at dobbertin@skom.rhein.de.)


   A cryptographic hash function takes an input string (message) of arbitrary
size, and reduces this to a short string of typically 16 or 20 bytes.  The
hash value of an input string is the equivalent of a person's fingerprint.
It is often also called a message digest.  Hash functions are used
for digital signatures such as RSA and DSA, but also for the construction of
MACs (message authentication codes), the protection of passwords, and for the
derivation of independent secret keys from a single master key. Cryptographic
hash functions are an essential building block for applications which require
data integrity such as the detection of computer viruses, Internet security
(for example PGP, IPSEC), and the security of electronic commerce and
banking. Two well-known hash functions, both designed by Ron Rivest, are MD4
and its successor MD5. Recent research, however, has called their security
into question. The authors have designed a new hash function called
RIPEMD-160, which is intended to offer a secure replacement for MD4 and MD5.
   A hash function must be a one-way function, which means that finding an
input corresponding to a given output string is difficult.  Here `difficult'
means `really hard': even an opponent who is willing to spend a significant
amount of money, say $10 million, will have a negligible probability of
success. One also requires that finding a different input hashing to the
same value as a given input should be difficult (such an input is called a
second preimage).
   For some applications, such as digital signatures, it should also be
difficult to find two different inputs with the same hash value.  Such a pair
is called a collision. At first sight, this might seem the same as finding a
second preimage.  However, it turns out that this problem is a lot easier.  A
simple analogy is the following:  you will need, on average, to talk to
365/2=182 people before finding someone who is born on the same day as you
are.  But in a group of only 23 people, the odds are slightly better than
even that two of them share a birthday.  This surprising observation is known
as the `birthday paradox'.  (For the wizards among you:  in practice, the
probabilities are even higher since birthdays are not evenly distributed over
the year.)
   The mathematics behind this can be put to work to find collisions for a
hash function with a 16-byte, i.e., 128-bit result (as for MD4 and MD5) after
about sqrt(2**128) = 2**64 = about 2 x 10**19 evaluations of the hash
function.  For MD4 and MD5, this brute-force job can be done in about a month
with a $10 million investment. These calculations have been made rigorous by
Paul van Oorschot and Mike Wiener, two cryptographers from Nortel.  For
20-byte hash results, the same amount of money will only bring success after
a few thousand years (but this ball-park estimate does not take into account
that computers become faster every year).  Thus if one selects a hash
function for digital signature schemes, one had better choose one with a
20-byte result.  After all, the overhead in storage and computation is
relatively small.
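
The arithmetic behind these estimates can be checked with a short Python
sketch. The one-month figure for a 128-bit result is taken from the text; the
assumption that running time grows in proportion to the number of hash
evaluations, and all variable names, are ours and serve only as illustration.

```python
# Birthday-bound work factors for 128-bit and 160-bit hash results.
evals_128 = 2 ** (128 // 2)      # about sqrt(2**128) evaluations
evals_160 = 2 ** (160 // 2)      # about sqrt(2**160) evaluations

print(f"{evals_128:.2e}")        # 1.84e+19, i.e. about 2 x 10**19

# Assumption: the same machine needs one month for the 128-bit case and
# its running time is proportional to the number of evaluations.
months_128 = 1
ratio = evals_160 // evals_128   # 2**16 = 65536 times more work
years_160 = months_128 * ratio / 12
print(ratio, round(years_160))   # 65536 5461
```

Under these assumptions the 160-bit search takes roughly 5000 years, which
matches the "few thousand years" quoted above.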
   Another reason to move away from MD4 and MD5 is the analytic attacks
found by the second author, Hans Dobbertin. By exploiting the internal
structure of the respective algorithm, one can construct a collision pair for
MD4 in less than a second on a PC, and find collisions for a modified version
of MD5 (with a different so-called initial value, see below) in about 10
hours on a PC. These results show that a more conservative approach to the
design of hash functions is required; new hash functions need more rounds and
will be slower than the previous ones.
   RIPEMD-160 is such a new cryptographic hash function. It is a strengthened
version of RIPEMD, a hash function which was developed in the framework of
the EU project RIPE (RACE Integrity Primitives Evaluation, '88-'92).  RIPEMD
has only a 16-byte result, and the new attacks proposed by Hans apply to a
reduced version of RIPEMD.
   RIPEMD-160 is a fast cryptographic hash function which is expected to be
secure for the next ten years or more. The design philosophy is to build as
much as possible on the experience gained by evaluating MD4, MD5, and RIPEMD.
Like its predecessors, RIPEMD-160 is tuned towards 32-bit processors, which we
feel will remain important for the coming decade. The algorithm will work on
8-bit and 16-bit microprocessors, but will be slower on these. It also does
not take full advantage of 64-bit processors.



Overview of the Algorithm


RIPEMD-160, like all variants of MD4, operates on 32-bit words.  It uses the
following primitive operations.

   - Bitwise Boolean operations (AND, NOT, OR, exclusive-OR).
   - Two's complement addition of words, denoted by `+'. This is modulo 2**32
     addition.
   - Left-rotation (or "left-spin") of words.
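
As an illustration, the three primitive operations can be sketched in Python,
whose integers are unbounded and therefore have to be masked to 32 bits
explicitly. The helper names add32, rol32 and not32 are hypothetical and do
not come from the reference code.

```python
MASK32 = 0xFFFFFFFF  # keeps results within one 32-bit word

def add32(x, y):
    """Addition modulo 2**32."""
    return (x + y) & MASK32

def rol32(x, s):
    """Rotate the 32-bit word x left over s positions (0 < s < 32)."""
    return ((x << s) | (x >> (32 - s))) & MASK32

def not32(x):
    """Bitwise NOT restricted to 32 bits."""
    return ~x & MASK32

print(hex(add32(0xFFFFFFFF, 1)))    # 0x0
print(hex(rol32(0x12345678, 8)))    # 0x34567812
print(hex(not32(0)))                # 0xffffffff
```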

   RIPEMD-160 compresses an arbitrary size input by dividing it into blocks
of 16 words (512 bits) each. In order to guarantee that the total input size
is a multiple of 16 words, the input is padded: one appends a single 1
followed by a string of 0s (the number of 0s lies between 0 and 511); the
last two words of the extended input contain the binary representation of the
input size in bits.
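
The padding rule can be sketched as follows for messages that consist of
whole bytes. The text does not state the byte order; this sketch assumes the
little-endian convention of MD4, with the low-order length word stored first.
The function name pad_message is hypothetical.

```python
import struct

def pad_message(message: bytes) -> bytes:
    """Pad a byte string to a multiple of 64 bytes (16 words of 32 bits)."""
    bit_length = 8 * len(message)
    # A single 1 bit followed by seven 0 bits is the byte 0x80.
    padded = message + b"\x80"
    # Append 0 bytes until exactly 8 bytes (two words) remain in the block.
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # Assumption: 64-bit length in bits, little-endian, low word first.
    padded += struct.pack("<Q", bit_length % 2**64)
    return padded

print(len(pad_message(b"abc")))       # 64
print(len(pad_message(b"a" * 56)))    # 128
```

A 56-byte message no longer leaves room for the 1 bit and the two length
words in its own block, so the padding spills over into a second block.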
   The result of RIPEMD-160 is contained in five 32-bit words, which form the
internal state of the algorithm. This state is initialized with a fixed
string, the initial value. The main part of the algorithm is known as the
compression function: it computes the new state from the old state and the
next 16-word block.
   The compression function consists of five rounds of 16 steps each, and
these rounds are executed in two parallel lines.  The total number of steps
is thus 5 x 16 x 2 = 160, compared to 3 x 16 = 48 for MD4 and 4 x 16 = 64 for
MD5. First, two copies are made of the old state (five left and five right
registers of 32 bits each). Both halves are processed independently. Each
step computes a new value for one of the registers based on the other four
registers and one message word. At the end
of the compression function, we compute the new state by adding to each word
of the old state one register from the left half and one from the right half
(see Figure 1).
   The C source code of the compression function (Listings 1 and 2) has been
tested in many environments (including MS-DOS, Windows, and Unix). RIPEMD-160
has been put in the public domain by its designers so that anyone can use it.



Speed and Security


Highly optimized assembly code of RIPEMD-160 runs at 40 Mbit/s on a 90 MHz
Pentium. The code size is about 4.2 K. On the same processor, the fastest C
versions are about two times slower when compiled with Watcom C 10.0 in
native protected mode.
   RIPEMD-160 is three times slower than MD5, but one should realize that
RIPEMD-160 is a conservative design, which is intended to be used for
applications which require high security for an extended period of time.  It
would be very tempting to design an algorithm with performance in between the
two algorithms, but such an algorithm could not give you the same confidence.
   Although RIPEMD-160 is a new algorithm, it is heavily based on RIPEMD,
which justifies enough confidence to use it in important applications.



References


B. Schneier, "One-Way Hash Functions," Dr. Dobb's Journal, September 1991.

H. Dobbertin, "Cryptanalysis of MD4," Fast Software Encryption, LNCS 1039,
D. Gollmann, Ed., Springer-Verlag, 1996, pp. 53-69.

RIPE, "Integrity Primitives for Secure Information Systems.  Final Report
of RACE Integrity Primitives Evaluation (RIPE-RACE 1040)," LNCS 1007,
Springer-Verlag, 1995.

P.C. van Oorschot, M.J. Wiener, "Parallel collision search with application
to hash functions and discrete logarithms," Proc. 2nd ACM Conference on
Computer and Communications Security, ACM, 1994, pp. 210-218.



Sidebar:  The birthday paradox



Consider the following question: if we have a group of 23 people, what is the
probability p that at least two of them have the same birthday?  The answer
can be computed by looking at the problem in a different way: we compute the
probability that they all have a different birthday (this probability is
equal to 1-p). The first person can have his or her birthday on any day; for
the second person, there are 364 days left out of 365 (we ignore leap years);
for the third person, there are 363 days left out of 365, and so on. For the
23rd person, there are 343 days left out of 365.
   The probability that all these birthdays are different is equal to the
product of these probabilities, or


                             364   363         343
                     1 - p = --- x --- x ... x --- = 0.493
                             365   365         365

And thus we find p = 0.507. This probability is much higher than our
intuition suggests. The explanation of this paradox is that there are only 23
people, but 23x22/2 = 253 pairs of people.
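
The product above is easy to evaluate numerically. The following Python
sketch does so under the same assumptions as the text (365 equally likely
birthdays, no leap years); the function name is ours.

```python
def prob_shared_birthday(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    p_all_different = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

print(round(prob_shared_birthday(23), 3))   # 0.507
print(23 * 22 // 2)                         # 253 pairs of people
```

With 22 people the probability is still below one half, so 23 is the
smallest group for which a shared birthday is more likely than not.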
   How can this now be applied to finding collisions for hash functions? For
a hash function with an n-bit result (typically n = 128 or 160), we have a
space of size 2**n (this corresponds to the 365 days in the year). If we
evaluate the hash function for r inputs, we have t = r(r-1)/2 pairs of
hash results. The probability that the two results in a pair are identical is
1/(2**n). So if t equals about 2**n, we expect to find one pair with identical
results. This corresponds to r = about sqrt(2**(n+1)) = about 2**(n/2). So for
a hash function with a 128-bit result, the effort to find a collision is only
about 2**64 evaluations of the hash function, which is the square root of
the effort to find a preimage. It turns out that clever algorithms allow us
to find these collisions without having to store this huge number of results.
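
The square-root effect can be observed on a toy scale. The sketch below
searches for a collision on a hash result truncated to n = 24 bits, which
should succeed after a few thousand evaluations, in the order of 2**12. Two
things here are our own assumptions: SHA-256 from the Python standard library
serves as a stand-in hash function, and all results are simply stored in a
dictionary, unlike the memory-saving algorithms mentioned above.

```python
import hashlib

def truncated_hash(data: bytes, n_bits: int) -> int:
    """First n_bits of SHA-256(data), used as a small stand-in hash."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)

def find_collision(n_bits: int):
    """Hash the messages "0", "1", "2", ... until two results coincide."""
    seen = {}            # maps truncated hash result -> message
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, n_bits)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1

m1, m2, evaluations = find_collision(24)
print(m1, m2, evaluations)
```

The number of evaluations printed should be far below the 2**24 that a
second-preimage search on the same truncated hash would need on average.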




  Figure 1: Outline of the compression function of RIPEMD-160. Inputs are a
16-word message block Xi[0..15] and a 5-word internal state h0h1h2h3h4, output
is a new value of the internal state. rho and pi permute the order of the
words within the message block Xi[0..15], i.e., they permute the indices
0..15; rho**2 means that rho is applied twice. The fj's are 5 nonlinear
functions and the Kj's and K'j's are 10 additive constants. 





Pseudo-code for RIPEMD-160



    RIPEMD-160: definitions


    nonlinear functions at bit level: exor, mux, -, mux, -

    f(j, x, y, z) = x XOR y XOR z                (0 <= j <= 15)
    f(j, x, y, z) = (x AND y) OR (NOT(x) AND z)  (16 <= j <= 31)
    f(j, x, y, z) = (x OR NOT(y)) XOR z          (32 <= j <= 47)
    f(j, x, y, z) = (x AND z) OR (y AND NOT(z))  (48 <= j <= 63)
    f(j, x, y, z) = x XOR (y OR NOT(z))          (64 <= j <= 79)


    added constants (hexadecimal)

    K(j) = 0x00000000      (0 <= j <= 15)     
    K(j) = 0x5A827999     (16 <= j <= 31)      int(2**30 x sqrt(2))
    K(j) = 0x6ED9EBA1     (32 <= j <= 47)      int(2**30 x sqrt(3))
    K(j) = 0x8F1BBCDC     (48 <= j <= 63)      int(2**30 x sqrt(5))
    K(j) = 0xA953FD4E     (64 <= j <= 79)      int(2**30 x sqrt(7))
    K'(j) = 0x50A28BE6     (0 <= j <= 15)      int(2**30 x cbrt(2))
    K'(j) = 0x5C4DD124    (16 <= j <= 31)      int(2**30 x cbrt(3))
    K'(j) = 0x6D703EF3    (32 <= j <= 47)      int(2**30 x cbrt(5))
    K'(j) = 0x7A6D76E9    (48 <= j <= 63)      int(2**30 x cbrt(7))
    K'(j) = 0x00000000    (64 <= j <= 79)
	    

    selection of message word

    r(j)      = j                    (0 <= j <= 15)
    r(16..31) = 7, 4, 13, 1, 10, 6, 15, 3, 12, 0, 9, 5, 2, 14, 11, 8
    r(32..47) = 3, 10, 14, 4, 9, 15, 8, 1, 2, 7, 0, 6, 13, 11, 5, 12
    r(48..63) = 1, 9, 11, 10, 0, 8, 12, 4, 13, 3, 7, 15, 14, 5, 6, 2
    r(64..79) = 4, 0, 5, 9, 7, 12, 2, 10, 14, 1, 3, 8, 11, 6, 15, 13
    r'(0..15) = 5, 14, 7, 0, 9, 2, 11, 4, 13, 6, 15, 8, 1, 10, 3, 12
    r'(16..31)= 6, 11, 3, 7, 0, 13, 5, 10, 14, 15, 8, 12, 4, 9, 1, 2
    r'(32..47)= 15, 5, 1, 3, 7, 14, 6, 9, 11, 8, 12, 2, 10, 0, 4, 13
    r'(48..63)= 8, 6, 4, 1, 3, 11, 15, 0, 5, 12, 2, 13, 9, 7, 10, 14
    r'(64..79)= 12, 15, 10, 4, 1, 5, 8, 7, 6, 2, 13, 14, 0, 3, 9, 11


    amount for rotate left (rol)

    s(0..15)  = 11, 14, 15, 12, 5, 8, 7, 9, 11, 13, 14, 15, 6, 7, 9, 8
    s(16..31) = 7, 6, 8, 13, 11, 9, 7, 15, 7, 12, 15, 9, 11, 7, 13, 12
    s(32..47) = 11, 13, 6, 7, 14, 9, 13, 15, 14, 8, 13, 6, 5, 12, 7, 5
    s(48..63) = 11, 12, 14, 15, 14, 15, 9, 8, 9, 14, 5, 6, 8, 6, 5, 12
    s(64..79) = 9, 15, 5, 11, 6, 8, 13, 12, 5, 12, 13, 14, 11, 8, 5, 6
    s'(0..15) = 8, 9, 9, 11, 13, 15, 15, 5, 7, 7, 8, 11, 14, 14, 12, 6
    s'(16..31)= 9, 13, 15, 7, 12, 8, 9, 11, 7, 7, 12, 7, 6, 15, 13, 11
    s'(32..47)= 9, 7, 15, 11, 8, 6, 6, 14, 12, 13, 5, 14, 13, 13, 7, 5
    s'(48..63)= 15, 5, 8, 11, 14, 14, 6, 14, 6, 9, 12, 9, 12, 5, 15, 8
    s'(64..79)= 8, 5, 12, 9, 12, 5, 14, 6, 8, 13, 6, 5, 15, 13, 11, 11


    initial value (hexadecimal)

    h0 = 0x67452301; h1 = 0xEFCDAB89; h2 = 0x98BADCFE;
    h3 = 0x10325476; h4 = 0xC3D2E1F0;
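
The added constants are described above as the integer parts of 2**30 times
the square roots and cube roots of 2, 3, 5 and 7. A short Python sketch can
recompute them; it relies on double-precision floating point, which is our
assumption and appears to be accurate enough for these eight values.

```python
import math

primes = (2, 3, 5, 7)

# Left line: K(j), with 0 for the first round.
k_left = [0] + [int(2**30 * math.sqrt(p)) for p in primes]
# Right line: K'(j), with 0 for the last round.
k_right = [int(2**30 * p ** (1.0 / 3.0)) for p in primes] + [0]

print(" ".join(f"0x{k:08X}" for k in k_left))
print(" ".join(f"0x{k:08X}" for k in k_right))
```

The two printed rows can be compared directly with the K(j) and K'(j) values
in the definitions.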


   It is assumed that the message after padding consists of t 16-word blocks
that will be denoted with Xi[j], with 0 <= i <= t-1 and 0 <= j <= 15. The
symbol [+] denotes addition modulo 2**32 and rol_s denotes cyclic left shift
(rotate) over s positions.  The pseudo-code for RIPEMD-160 is then given
below, and an outline of the compression function is given in Figure 1.



    RIPEMD-160: pseudo-code

    for i := 0 to t-1 {
        A := h0; B := h1; C := h2; D := h3; E := h4;
        A' := h0; B' := h1; C' := h2; D' := h3; E' := h4;
        for j := 0 to 79 {
            T := rol_s(j)(A [+] f(j, B, C, D) [+] Xi[r(j)] [+] K(j)) [+] E;
            A := E; E := D; D := rol_10(C); C := B; B := T;
            T := rol_s'(j)(A' [+] f(79-j, B', C', D') [+] Xi[r'(j)] [+] K'(j)) 
                 [+] E';
            A' := E'; E' := D'; D' := rol_10(C'); C' := B'; B' := T;
        }
        T := h1 [+] C [+] D'; h1 := h2 [+] D [+] E'; h2 := h3 [+] E [+] A';
        h3 := h4 [+] A [+] B'; h4 := h0 [+] B [+] C'; h0 := T;
    }
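
For readers who want to experiment, the pseudo-code translates almost line by
line into Python. This is a minimal, unoptimized sketch and not the authors'
reference code. Two details are assumptions that the text does not spell out:
message bytes are packed into words in little-endian order, as in MD4, and
the five state words are written out in the same byte order.

```python
import struct

MASK = 0xFFFFFFFF
H_INIT = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
K_LEFT = (0x00000000, 0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xA953FD4E)
K_RIGHT = (0x50A28BE6, 0x5C4DD124, 0x6D703EF3, 0x7A6D76E9, 0x00000000)
# Message word selection r(j) and r'(j), copied from the definitions.
R_LEFT = list(range(16)) + [
    7, 4, 13, 1, 10, 6, 15, 3, 12, 0, 9, 5, 2, 14, 11, 8,
    3, 10, 14, 4, 9, 15, 8, 1, 2, 7, 0, 6, 13, 11, 5, 12,
    1, 9, 11, 10, 0, 8, 12, 4, 13, 3, 7, 15, 14, 5, 6, 2,
    4, 0, 5, 9, 7, 12, 2, 10, 14, 1, 3, 8, 11, 6, 15, 13]
R_RIGHT = [
    5, 14, 7, 0, 9, 2, 11, 4, 13, 6, 15, 8, 1, 10, 3, 12,
    6, 11, 3, 7, 0, 13, 5, 10, 14, 15, 8, 12, 4, 9, 1, 2,
    15, 5, 1, 3, 7, 14, 6, 9, 11, 8, 12, 2, 10, 0, 4, 13,
    8, 6, 4, 1, 3, 11, 15, 0, 5, 12, 2, 13, 9, 7, 10, 14,
    12, 15, 10, 4, 1, 5, 8, 7, 6, 2, 13, 14, 0, 3, 9, 11]
# Rotation amounts s(j) and s'(j).
S_LEFT = [
    11, 14, 15, 12, 5, 8, 7, 9, 11, 13, 14, 15, 6, 7, 9, 8,
    7, 6, 8, 13, 11, 9, 7, 15, 7, 12, 15, 9, 11, 7, 13, 12,
    11, 13, 6, 7, 14, 9, 13, 15, 14, 8, 13, 6, 5, 12, 7, 5,
    11, 12, 14, 15, 14, 15, 9, 8, 9, 14, 5, 6, 8, 6, 5, 12,
    9, 15, 5, 11, 6, 8, 13, 12, 5, 12, 13, 14, 11, 8, 5, 6]
S_RIGHT = [
    8, 9, 9, 11, 13, 15, 15, 5, 7, 7, 8, 11, 14, 14, 12, 6,
    9, 13, 15, 7, 12, 8, 9, 11, 7, 7, 12, 7, 6, 15, 13, 11,
    9, 7, 15, 11, 8, 6, 6, 14, 12, 13, 5, 14, 13, 13, 7, 5,
    15, 5, 8, 11, 14, 14, 6, 14, 6, 9, 12, 9, 12, 5, 15, 8,
    8, 5, 12, 9, 12, 5, 14, 6, 8, 13, 6, 5, 15, 13, 11, 11]

def rol(x, s):
    """Rotate a 32-bit word left over s positions."""
    return ((x << s) | (x >> (32 - s))) & MASK

def f(j, x, y, z):
    """Nonlinear function for step j; the result is masked to 32 bits."""
    if j < 16:
        r = x ^ y ^ z
    elif j < 32:
        r = (x & y) | (~x & z)
    elif j < 48:
        r = (x | ~y) ^ z
    elif j < 64:
        r = (x & z) | (y & ~z)
    else:
        r = x ^ (y | ~z)
    return r & MASK

def compress(h, block):
    """New 5-word state from the old state and one 64-byte block."""
    x = struct.unpack("<16I", block)       # assumption: little-endian words
    a, b, c, d, e = h                      # left line
    a2, b2, c2, d2, e2 = h                 # right line
    for j in range(80):
        t = (a + f(j, b, c, d) + x[R_LEFT[j]] + K_LEFT[j // 16]) & MASK
        t = (rol(t, S_LEFT[j]) + e) & MASK
        a, e, d, c, b = e, d, rol(c, 10), b, t
        t = (a2 + f(79 - j, b2, c2, d2) + x[R_RIGHT[j]]
             + K_RIGHT[j // 16]) & MASK
        t = (rol(t, S_RIGHT[j]) + e2) & MASK
        a2, e2, d2, c2, b2 = e2, d2, rol(c2, 10), b2, t
    return ((h[1] + c + d2) & MASK, (h[2] + d + e2) & MASK,
            (h[3] + e + a2) & MASK, (h[4] + a + b2) & MASK,
            (h[0] + b + c2) & MASK)

def ripemd160(message: bytes) -> str:
    """Hash a byte string and return the result as 40 hex digits."""
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) % 2**64)
    h = H_INIT
    for i in range(0, len(padded), 64):
        h = compress(h, padded[i:i + 64])
    return struct.pack("<5I", *h).hex()

print(ripemd160(b"abc"))
```

The printed value can be compared with the entry for "abc" in the test
values. The names rol, compress and ripemd160 are ours. Pure Python is slow,
so this sketch is suitable for checking understanding, not for hashing large
amounts of data.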



Test Values for RIPEMD-160

Messages and corresponding RIPEMD-160 hash results. Messages are given as
ASCII strings, hash results are given in hexadecimal format.


Message: "" (Empty string)
Hash result: 9c1185a5c5e9fc54612808977ee8f548b2258d31
Message: "a"
Hash result: 0bdc9d2d256b3ee9daae347be6f4dc835a467ffe
Message: "abc"
Hash result: 8eb208f7e05d987a9b044a8e98c6b087f15a0bfc
Message: "message digest"
Hash result: 5d0689ef49d2fae572b881b123a85ffa21595f36
Message: "abcdefghijklmnopqrstuvwxyz"
Hash result: f71c27109c692c1b56bbdceb5b9d2865b3708dbc
Message: "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq"
Hash result: 12a053384a9c0c88e405a06c27dcf49ada62eb2b
Message: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Hash result: b0e20b6e3116640286ed3a87a5713079b21f5189
Message: 8 times "1234567890"
Hash result: 9b752e45573d4b39f4dbd3323cab82bf63326bfb
Message: 1 million times "a"
Hash result: 52783243c1697bdbe16d37f97f68f08325dc1528


