Cryptography - Authentication, Hash Algorithms

This lesson introduces the concept of message authentication, and introduces the hashing algorithms: MD4, MD5, Haval, RIPEMD, and SHA.

Objectives

understand the difference between hash and signature algorithms, and what each is used for

know in general how hash algorithms like MD5 and SHA work

Lecture Content

Message Authentication

Message Authentication
- message authentication is concerned with:
  - protecting the integrity of a message
  - validating identity of originator
  - non-repudiation of origin (dispute resolution)
  To now, have been concerned with protecting message content (ie secrecy) by encrypting the message. Will now consider how to protect message integrity (ie protection from modification), as well as confirming the identity of the sender. Generically this is the problem of message authentication, and in eCommerce applications is arguably more imprtant than secrecy.
Message Authentication Code (MAC)
- authenticator, signature, or message authentication code (MAC)
- electronic equivalent of a signature on a message
- sent along with the message
- is generated via some algorithm which depends on both the message and some (public or private) key known only to the sender and receiver
- message may be of any length
- MAC may be of any length, but more often is some fixed size
- this requires the use of some hash function
  - to condense the message to the required size
  - if this is not acheived by the authentication scheme
- need to consider replay problems with message and MAC
  - require a message sequence number, timestamp or negotiated random values
  In general, want to send an "electronic signature" along with the message to validate its contents, and the senders identity. Unless the message is processed in a way that provides both encryption and authentication (as when a chaining mode is used with a block cipher), the electronic signature is separate from the encrypted message. Unless the amount of information sent is to be doubled, some means is needed to create a "digest" (MAC) of the message of a suitable, fairly small, size.
Message Authentication Process
Authentication using Private-key Ciphers
- if a message is being encrypted using a session key known only to the sender and receiver, then the message may also be authenticated
  - since only sender or receiver could have created it
  - any interference will corrupt the message (provided it includes sufficient redundancy to detect change)
  - but this does not provide non-repudiation since it is impossible to prove who created the message
  As mentioned above, if a message is sent encrypted using a chaining mode of a block cipher (CBC or CFB) then implicitly the message is also authenticated, since any external modification would corrupt the decryption, and since only the sender & reciever supposedly know the key used. However since the algorithm is symmtric, there is no way to prove to a 3rd party who did what.
Authentication using Private-key Ciphers
- message authentication may also be done using the standard modes of use of a block cipher
  - sometimes do not want to send encrypted messages
  - can use either CBC or CFB modes and send final block, since this will depend on all previous bits of the message
  - no hash function is required, since this method accepts arbitrary length input and produces a fixed output
  - usually use a fixed known IV
  - this is the approached used in Australian EFT standards AS8205
  - major disadvantage is small size of resulting MAC since 64-bits is probably too small
  Can also use block cipher chaining mdoes to create a separate authenticator, by just sending the last block. However this suffers from being a bit too small for acceptable use today.

Hashing Functions

Hashing Functions
- used to condense an arbitrary length message to a fixed size
- usually for subsequent signature by a digital signature algorithm
- it is usually assumed that the hash function is public and not keyed
- traditional CRCs do not satisfy the above requirements
- length should be large enough to resist birthday attacks
  - 64-bits is now regarded as too small
  - using 128-512 is regarded as suitable
  Hash functions are used to "digest" or "condense" a message down to a fixed size, which can then be signed, in a way that makes finding other messages with the same hash extremely difficult (so the signature wont apply easily to other messages). The hash needs to be large enough to resist "birthday attacks" on it - see Stallings 8.4 and 8A.
Hashing Function Design Principles
- a good cryptographic hash function h should have the following properties:
  - h should destroy all homomorphic structures in the underlying public key cryptosystem (be unable to compute hash value of 2 messages combined given their individual hash values)
  - h should be computed on the entire message
  - h should be a one-way function so that messages are not disclosed by their signatures
  - it should be computationally infeasible given a message and its hash value to compute another message with the same hash value
  - should resist birthday attacks (finding any 2 messages with the same hash value, perhaps by iterating through minor permutations of 2 messages)
  These are the "mathematical" specifications for good hash functions. Essentially it must be extremely difficult to find 2 messages with the same hash, and the hash should not be related to the message in any obvious way (ie it should be a complex non-linear function of the message). There are quite a few similarities in the evolution of hash functions & block ciphers, and in the evolution of the design requirements on both.
MD2
- original member of a family of one-way hash functions
- all designed by Ronald Rivest (R in RSA)
- MD2 is the oldest
- produces a 128-bit hash value
- is regarded as slower and less efficient than MD4 and MD5
- specified as an Internet standard (RFC1320)
  This is the first generally used hash function designed by Ronald Rivest. Whilst not broken, it is no longer favoured for use due to its greater complexity and slower speed.
MD4
- MD4 produces a 128-bit hash of the message
- uses bit operations on 32-bit operands for fast implementation
- specified as an Internet standard (RFC1186)
- design goals:
  - collision resistant (hard to find collisions)
  - direct security (no dependence on "hard" problems)
  - fast, simple, compact
  - favours little-endian systems (eg PCs)
  One of the widely used hash functions - has appeared in a number of security uses, eg secure email PEM/PGP, S/Key passwords etc
MD4 overview
- pad message so its length is 448 mod 512
- append a 64-bit message length value to message
- initialise the 4-word (128-bit) buffer (A,B,C,D)
- process the message in 16-word (512-bit) chunks, using 3 rounds of 16 bit operations each on the chunk & buffer
- output hash value is the final buffer value
  Approach is a bit like chaining a block cipher - a previous value is combined with the current chunk of message through a few rounds, before the updating the current value. The final values is the hash.
MD4 Security
- progress at cryptanalysing MD4 has been made
  - Boer/Bosselaers attacked last 2 rounds
  - Merkle attacked first 2 rounds
  - Biham discusses differential crpytanalysis of first 2 rounds
- whilst not actually broken, serious concerns exist
- also a number of collisions have been found
- future use is being depreciated
- Rivest evolved it into MD5
MD5
- MD5 was designed as a strengthened version of MD4
- uses four, more complex, rounds compared to 3 in MD4
- some progress at cryptanalysing MD5 has been made
- a small number of collisions have been found
- MD5 is in use and is considered reasonably secure in most practical applications
- specified as an Internet standard (RFC1321)
  MD5 is the current, and very widely used, member of this family of hash functions. Some progress has been made analysing it, but of more concern is its 128-bit size, which is starting to look a bit small.
MD5 overview
- pad message so its length is 448 mod 512
- append a 64-bit message length value to message
- initialise the 4-word (128-bit) buffer (A,B,C,D)
- process the message in 16-word (512-bit) blocks:
  - using 4 rounds of 16 bit operations each on the chunk & buffer
  - adding the output to the input to form the new buffer value
- output hash value is the final buffer value
  MD5 Main Loop
  Again see the message broken into blocks, processed along with the buffer value using 4 rounds, and the result added to the input buffer to make the new buffer value. Repeat till run out of message, and use final buffer value as hash. nb. due to padding always have a full final block (with length in it). Show Schneier description of MD5 here.
MD5 Round
- each round has 16 steps:
- R(a,b,c,d,M_i,s,t_i): a = b + ((a+F(b,c,d)+M_i+t_i)<<s)
- where F(b,c,d) is a different nonlinear function in each round
- t_i is a value derived from sine
  MD5 Round
  Each round mixes the buffer input with the next "word" of the message in a complex, non-linear manner. A different non-linear function is used in each of the 4 rounds (but the same function for all 16 steps in a round). The 4 buffer words (a,b,c,d) are rotated from step to step so all are used and updated. See Schneier section 18.5 or Stallings 9.1 for details.
SHA (Secure Hash Algorithm)
- SHA was designed by NIST & NSA in 1993, revised 1995
- is the US federal standard for use with the DSA signature scheme
- nb the algorithm is SHA, the standard is SHS
- produces 160-bit hash values
- now the generally preferred hash algorithm
- based on design of MD4/5 with some key differences
  SHA is one of the newer generation of hash functions, more resistant to cryptanalysis, and now probably preferred for new applications.
SHA Overview
- pad message so its length is a multiple of 512 bits
- initialise the 5-word (160-bit) buffer (A,B,C,D,E) to
- (67452301,efcdab89,98badcfe,10325476,c3d2e1f0)
- process the message in 16-word (512-bit) chunks
- expand the 16 words into 80 words by mixing and shifting
- using 4 rounds of 20 bit operations each on the chunk & buffer
- (A,B,C,D,E) <- ((E+F(t,B,C,D)+(A<<5)+W_t+K_t),A,(B<<30),C,D)
- adding the output to the input to form the new buffer value
- output hash value is the final buffer value
  SHA Overview
  Can see SHA shares much in common with MD5, but with 20 instead of 16 steps in each of the 4 rounds. Note the 4 constants are based on sqrt(2,3,5,10). Show Schneier description of SHA here.
SHA Security vs MD4/5
- original SHA was slightly modified in 1995 (in expansion of message blocks)
- following NIST identification of some concerns
- the exact nature of which is not public
- current version is SHA-1
- SHA-1 is regarded as secure and suggested for use:
  - brute force attack is harder (160 vs 128 bits for MD5)
  - not vulnerable to known attacks (vs MD4/5)
  - a little slower than MD5 (80 vs 64 steps)
  - optimised for big endian CPU's (vs MD5 little endian)
  SHA-1 is probbaly the preferred hash function for new applications. Currently no problems are known with it.
RIPEMD-160
- RIPEMD-160 was developed in Europe
- by researches involved in attacks on MD4/5
- somewhat similar to MD5/SHA
- uses 2 parallel lines of 5 rounds of 16 steps
- creates a 160-bit hash value
- slower, but probably more secure, than SHA
- if security a concern, then suggested for use
  RIPEMD-160 was developed in Europe as a result of an EC call for new authentication algorithms in the early-90's. The original version evolved into this, which was finally accepted. It is similar to SHA/MD5 - slower, but likely more secure.
HAVAL
- a variable length one-way hash function
- designed by Uni of Wollongong and published at Auscrypt'92
- it processes messages in 1024-bit blocks
- using an 8-word buffer and 3 to 5 rounds of 16 steps each
- creating hash values of 128, 160, 192, 224, or 256 bits in length
- uses highly non-linear 7-variable functions in each step
- is faster than MD5
- it not subject to MD5 type analysis, no attack is known
  HAVAL is a local Australian hash algorith developed by Profs Pieprzyk & Seberry at UOW. It has a rather different approach to the MD family, and hence is rather less likely to be susceptible to the known attacks on them. It is quite fast, and has been included in some security products, like Tripwire.
Hashing Using Private Key Ciphers
- a large number of "Modes of Use" have been proposed which use a block cipher to create a hash value
- original proposal was by Davies and Meyer
- many other proposals
- most have been broken using a birthday attack
- the design of fast, secure hash functions of this form is still being studied, with many questions unresolved
  Have many proposals to use block ciphers to create a hash, but one larger than the standard block sized used by the ciphers - to avoid birthday attacks. Most of the results have been flawed - see Schneier 18.11. Not much joy to be had with this approach.