What is MD5? Understanding Message-Digest Algorithms
The message-digest algorithm MD5 is a cryptographic hash that is used to generate and verify digital signatures or message digests. MD5 is still widely used despite being declared “cryptographically broken” over a decade ago.
As a cryptographic hash, it has known security vulnerabilities, including a high potential for collisions, which is when two distinct messages end up with the same generated hash value.
MD5 can be successfully used for non-cryptographic functions, including as a checksum to verify data integrity against unintentional corruption. MD5 is a 128-bit algorithm. Even with its known security issues, it remains one of the most commonly used message-digest algorithms.
What is the MD5 message-digest algorithm?
Published as RFC 1321 around 30 years ago, the MD5 message-digest algorithm is still widely used today. MD5 takes a message of variable length as input and produces a compact, fixed-length 128-bit output. This type of cryptographic hash was designed for digital signature applications, in which a large file is securely reduced to a short digest, and the digest is then signed with a private (or secret) key so that it can later be verified with the matching public key.
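The variable-length-in, 128-bit-out behavior can be seen with a minimal sketch using Python's standard hashlib module. The helper name md5_hex is an illustrative assumption and not part of any standard API.

```python
import hashlib


def md5_hex(message: bytes) -> str:
    """Return the MD5 digest of `message` as 32 hexadecimal characters."""
    return hashlib.md5(message).hexdigest()


# Inputs of very different lengths all map to a 16-byte (128-bit) digest.
for message in (b"", b"abc", b"x" * 1_000_000):
    digest = hashlib.md5(message).digest()
    print(len(message), "bytes in ->", len(digest) * 8, "bits out:", md5_hex(message))
```

The digests for the empty message and for "abc" can be checked against the test suite listed in RFC 1321.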
MD5 can also be used to detect file corruption or inadvertent changes within large collections of files, either through command-line tools or through implementations in common programming languages such as Java, Perl, or C. In this role, MD5 serves as a checksum for verifying data integrity. Other non-cryptographic uses of MD5 include determining the partition for a specific key in a partitioned database.
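As a rough illustration of the partitioning use case, the sketch below maps a key to a partition number by hashing it with MD5 and reducing the result modulo the partition count. The function name partition_for_key, the use of the first 8 digest bytes, and the example key are all assumptions made for illustration; real databases define their own schemes.

```python
import hashlib


def partition_for_key(key: str, num_partitions: int) -> int:
    """Map `key` to a partition index in the range [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer
    # (an arbitrary choice for this sketch) and reduce it modulo the count.
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same key always lands in the same partition.
print(partition_for_key("customer-1001", 16))
```

Collision resistance does not matter here; the hash only needs to spread keys evenly and deterministically.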
MD5 can be used to either print (generate) or check (verify) 128-bit cryptographic hashes. MD5 has some serious well-documented vulnerabilities and flaws, however. Because of this, it should not be used for security purposes.
History of MD5 use
Developed as an extension of the cryptographic hash function MD4, MD5 was created in 1991 by Ronald Rivest of RSA Data Security, Inc. and the MIT Laboratory for Computer Science to replace that earlier algorithm, which was deemed insecure. It was published in the public domain a year later, in 1992. Just one year after that, in 1993, a “pseudo-collision” of the MD5 compression function was discovered.
The timeline of MD5 discovered (and exploited) vulnerabilities is as follows:
- In 1996, a collision in the MD5 compression function was reported. Although this was not yet a collision for the full hash function, cryptographers recommended replacing MD5 with a different cryptographic hash function such as SHA-1.
- Early in 2004, a project began to prove that MD5 was vulnerable to a birthday attack due to the small size of the hash value at 128 bits.
- By mid-2004, an analytical attack was completed in only an hour that was able to create collisions for the full MD5.
- In 2005, a practical collision was demonstrated using two X.509 certificates with different public keys and the same MD5 hash value. Days later, an algorithm was created that could construct MD5 collisions in just a few hours.
- A year later, in 2006, an algorithm was published that used tunnelling to find a collision within one minute on a single notebook computer.
- In 2008, MD5 was officially declared “cryptographically broken” because certificates can be crafted whose MD5 hashes collide with those of trusted X.509 certificates issued by well-known certificate authorities (CAs).
Despite these known vulnerabilities, MD5 is still used today, even though more secure alternatives exist.
Security issues with MD5
The MD5 hash function’s security is considered to be severely compromised. Collisions can be found within seconds, and they can be used for malicious purposes.
In fact, in 2012, the Flame spyware that infiltrated thousands of computers and devices in Iran was considered one of the most troublesome security issues of the year. Flame used MD5 hash collisions to generate counterfeit Microsoft update certificates used to authenticate critical systems. Fortunately, the vulnerability was discovered quickly, and a software update was issued to close this security hole. This involved switching to using SHA-1 for Microsoft certificates.
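A hash collision occurs when two different inputs create the same hash value, or output. Because a hash maps inputs of any length to a fixed-size output, collisions always exist in principle; the security of a hash algorithm depends on it being computationally infeasible to find them. Collisions that can be found in practice represent security vulnerabilities that can be exploited.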
Threat actors can force a collision and then present a forged message together with a digital signature that was issued for a different message. Because the forged message produces the same hash value, the signature verifies, and the recipient accepts the threat actor’s message as legitimate even though it did not come from the actual sender.
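The idea of a collision can be illustrated with a small sketch that searches for two different messages whose MD5 hashes agree in their first 3 bytes (24 bits). This is a toy birthday search on a truncated hash, not one of the analytical attacks on full MD5 described above; the function names and the "message-N" inputs are assumptions made for illustration. For an n-bit hash, a birthday search needs on the order of 2^(n/2) attempts, so 24 bits takes only a few thousand tries, while the full 128 bits would take on the order of 2^64.

```python
import hashlib
from itertools import count


def truncated_md5(message: bytes, num_bytes: int = 3) -> bytes:
    """Return only the first `num_bytes` bytes of the MD5 digest."""
    return hashlib.md5(message).digest()[:num_bytes]


def find_truncated_collision(num_bytes: int = 3) -> tuple[bytes, bytes]:
    """Find two distinct messages with the same truncated MD5 value."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        message = f"message-{i}".encode()
        tag = truncated_md5(message, num_bytes)
        if tag in seen:
            # A different, earlier message already produced this value.
            return seen[tag], message
        seen[tag] = message


first, second = find_truncated_collision()
print(first, second, truncated_md5(first).hex())
```

The two messages returned differ, yet anyone comparing only the truncated hash values would treat them as identical, which is the property an attacker exploits when a full collision can be constructed.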
What programs use MD5?
Even though it has known security issues, MD5 is still used for password hashing in software. MD5 is used to store passwords with a one-way hash of the password, but it is not among the recommended hashes for this purpose. MD5 is common and easy to use, and developers often still choose it for password hashing and storage.
MD5 is also still used in cybersecurity to verify and authenticate digital signatures. A user can also check that a downloaded file is authentic by computing its MD5 hash and comparing it with the hash value published by the source. Because MD5 collisions can be constructed deliberately, however, this message-digest algorithm is not ideal for verifying the integrity of data or files against tampering, as threat actors can craft colliding files or replace the published hash value with one of their own.
Data can be verified for integrity using MD5 as a checksum function to ensure that it has not become accidentally corrupted. Files can produce errors when they are unintentionally changed in some of the following ways:
- Errors in data transmission
- Software bugs
- Write errors when files are copied or moved
- Issues within the storage medium
The message-digest algorithm MD5 can be used to ensure that the data is the same as it was initially by computing the hash of the current data and checking that it matches the hash value recorded when the data was known to be good. If a file has been inadvertently changed, it will produce a different hash value, which will no longer match the recorded one. This tells you that the file is corrupted. This is only effective when the data has been unintentionally corrupted, however, and not in the case of malicious tampering.
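The checksum comparison described above might look like the following sketch, which hashes a file in chunks and compares the result with a previously recorded value. The names md5_of_file and is_unchanged, and the 64 KiB chunk size, are illustrative assumptions.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use small for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: str, recorded_hex: str) -> bool:
    """Return True if the file's current MD5 matches the recorded value."""
    return md5_of_file(path) == recorded_hex.strip().lower()
```

A typical use would record md5_of_file(path) when the file is first stored and call is_unchanged(path, recorded) after a copy or transfer. A mismatch signals accidental corruption; a match says nothing about deliberate tampering.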
Alternatives to MD5
MD5 should not be used for security purposes or when collision resistance is important. Given its proven security vulnerabilities and the ease with which MD5 collisions can be created, other, more secure hash functions are recommended.
The SHA-2 family of hashes is typically chosen as a valid alternative. This family of cryptographic hash functions was initially published in 2001 and includes the following:
- SHA-256
- SHA-224
- SHA-384
- SHA-512
- SHA-512/224
- SHA-512/256
SHA-1 can still be used to verify old time stamps and digital signatures, but the NIST (National Institute of Standards and Technology) does not recommend using SHA-1 to generate digital signatures or in cases where collision resistance is required.
Cryptographic hashes approved by NIST include the SHA-2 family as well as the following four fixed-length SHA-3 algorithms:
- SHA3-224
- SHA3-256
- SHA3-384
- SHA3-512
The SHA-2 and SHA-3 families of cryptographic hash functions are secure and recommended alternatives to the MD5 message-digest algorithm. They are much more resistant to collisions, and no practical collision attacks against them are known.
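For code that currently calls MD5, switching to one of these alternatives is often a one-line change. The sketch below, which assumes Python's standard hashlib module, prints the digest size of MD5 next to several SHA-2 and SHA-3 functions; the helper name digest_bits and the selection of algorithms are illustrative.

```python
import hashlib

# Names accepted by hashlib.new(); MD5 is included only for comparison.
ALGORITHMS = (
    "md5",
    "sha224", "sha256", "sha384", "sha512",
    "sha3_224", "sha3_256", "sha3_384", "sha3_512",
)


def digest_bits(name: str, message: bytes = b"abc") -> int:
    """Return the output size, in bits, of the named hash function."""
    return hashlib.new(name, message).digest_size * 8


for name in ALGORITHMS:
    print(f"{name:9s} {digest_bits(name):4d} bits")

# Replacing MD5 with SHA-256 for a message digest:
print(hashlib.sha256(b"abc").hexdigest())
```

Longer digests make birthday-style collision searches far more expensive, which is one reason these functions are preferred where collision resistance matters.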
References
The MD5 Message-Digest Algorithm. (April 1992). Network Working Group Internet Engineering Task Force (IETF).
RSA. (2022). RSA Security, LLC.
MIT CSAIL. MIT CSAIL.
MD5 Is Really Seriously Broken This Time. (December 2008). Security Musings.
Flame’s MD5 Collision Is the Most Worrisome Security Discovery of 2012. (June 2012). Forbes.
NIST Policy on Hash Functions. (August 2015). National Institute of Standards and Technology (NIST).
Hash Functions. (June 2020). National Institute of Standards and Technology (NIST).