SWGDE Position on the Use of MD5 and SHA1 Hash Algorithms in Digital and Multimedia Forensics

12-f-002

Disclaimer and Conditions Regarding Use of SWGDE Documents:

SWGDE documents are developed by a consensus process that involves the best efforts of relevant subject matter experts, organizations, and input from other stakeholders to publish suggested best practices, practical guidance, technical positions, and educational information in the discipline of digital and multimedia forensics and related fields. No warranty or other representation as to SWGDE work product is made or intended. As a condition to the use of this document (and the information contained herein) in any judicial, administrative, legislative, or other adjudicatory proceeding in the United States or elsewhere, the SWGDE requests notification by e-mail before or contemporaneous to the introduction of this document, or any portion thereof, as a marked exhibit offered for or moved into evidence in such proceeding. The notification should include: 1) the formal name of the proceeding, including docket number or similar identifier; 2) the name and location of the body conducting the hearing or proceeding; and 3) the name, mailing address (if available) and contact information of the party offering or moving the document into evidence. Subsequent to the use of this document in the proceeding please notify SWGDE as to the outcome of the matter. Notifications should be sent to secretary@swgde.org.

From time to time, SWGDE documents may be revised, updated, or sunsetted. Readers are advised to verify on the SWGDE website (www.swgde.org) that they are using the current version of this document. Prior versions of SWGDE documents are archived and available on the SWGDE website.

Redistribution Policy:

SWGDE grants permission for redistribution and use of all publicly posted documents created by SWGDE, provided that the following conditions are met:

  1. Redistribution of documents or parts of documents must retain this SWGDE cover page containing the Disclaimer and Conditions of Use.
  2. Neither the name of SWGDE nor the names of contributors may be used to endorse or promote products derived from its documents.
  3. Any reference or quote from a SWGDE document must include the version number (or creation date) of the document and also indicate if the document is in a draft status.

Requests for Modification:

SWGDE encourages stakeholder participation in the preparation of documents. Suggestions for modifications are welcome and must be forwarded to the Secretary in writing at secretary@swgde.org. The following information is required as a part of any suggested modification:

  1. Submitter’s name
  2. Affiliation (agency/organization)
  3. Address
  4. Telephone number and email address
  5. SWGDE Document title and version number
  6. Change from (note document section number)
  7. Change to (provide suggested text where appropriate; comments not including suggested text will not be considered)
  8. Basis for suggested modification

Intellectual Property:

Unauthorized use of the SWGDE logo or documents without written permission from SWGDE is a violation of our intellectual property rights.

Individuals may not misstate and/or overrepresent duties and responsibilities of SWGDE work. This includes claiming oneself as a contributing member without actively participating in SWGDE meetings; claiming oneself as an officer of SWGDE without serving as such; claiming sole authorship of a document; or using the SWGDE logo on any material and/or curriculum vitae.

Any mention of specific products within SWGDE documents is for informational purposes only; it does not imply a recommendation or endorsement by SWGDE.

1. Purpose

The purpose of this document is to recommend that all users of hashing move to current standards and to explain that the use of the Message Digest version 5 (MD5) and Secure Hash Algorithm version 1 (SHA1) hash algorithms remains acceptable for certain functions in digital and multimedia forensic disciplines despite the algorithms having been shown to be inappropriate for broader cryptographic purposes.

2. Scope

This document addresses the use of the MD5 and SHA1 hash algorithms for integrity verification and file identification.

3. Summary

There are many types of hashes that are used for different purposes. This paper compares four commonly-used hashing algorithms: MD5, SHA1, SHA2, and SHA3.

While SWGDE promotes the adoption of SHA2 and SHA3 by vendors and practitioners, the MD5 and SHA1 algorithms remain acceptable for integrity verification and file identification applications in digital forensics. Because of known limitations of the MD5 and SHA1 algorithms, only SHA2 and SHA3 are appropriate for digital signatures and other security applications. SWGDE recommends that all users transition to SHA2 and SHA3, as tools implement support for them.

4. Background

4.1 Hashing: General Background

Hash algorithms apply a mathematical function to a given set of data to produce a fixed-length value (called a hash) that is typically represented as a string of hexadecimal characters. If the data changes, the hash will, for all practical purposes, change as well. When two different datasets produce the same hash, this is called a collision. There are two types of collisions: random collisions and deliberately engineered collisions. While collisions have been engineered for some widely used hash algorithms (MD5 and SHA1), no random collisions have yet been identified. (See Reference Section for additional information.)
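The sensitivity of a hash to its input can be illustrated with a short sketch. The example below uses Python's standard hashlib module; the sample byte strings and the helper name hex_digest are illustrative assumptions and are not part of any SWGDE procedure.

```python
import hashlib


def hex_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hash of `data` as a string of hexadecimal characters."""
    return hashlib.new(algorithm, data).hexdigest()


original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"  # one character changed

# Print the hash of each dataset under three of the algorithms discussed.
for name in ("md5", "sha1", "sha256"):
    print(name, "original:", hex_digest(original, name))
    print(name, "altered: ", hex_digest(altered, name))
```

Changing a single character produces an entirely different hash under each algorithm, while the length of the hash stays fixed for a given algorithm.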

Hashing serves a variety of functions. It is used for integrity verification, file identification, random number generation, creating a unique representation of a file to be digitally signed, and many other uses. Most public key digital signature applications require a hash as part of the process. In the field of digital forensics, hashing is primarily used for integrity verification and file identification.

4.2 Integrity Verification

The goal of integrity verification is to determine whether data has changed since the hash value was calculated. This is a common use of hashes in digital and multimedia forensics. Integrity verification is important for maintaining a chain of custody. When a file is hashed, a “digital fingerprint” of the file is created, which is, for practical purposes, unique to the file’s contents. The change of even one bit in a file will cause the hash to change. Note that hashes only provide this protection from the time the hash is made and depend on the hash being stored securely. A proper chain of custody starts from collection, even before the hash is made and secured. SWGDE recommends hashes be made as early as possible during the collection process.
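As a minimal sketch of integrity verification, the following Python example hashes a file at "collection" and checks it again later. The function names hash_file and verify_file, the temporary file standing in for an evidence item, and the choice of SHA256 as the default are assumptions made for illustration only.

```python
import hashlib
import os
import tempfile


def hash_file(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large items need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, recorded_hash: str, algorithm: str = "sha256") -> bool:
    """Return True if the file still matches the hash recorded earlier."""
    return hash_file(path, algorithm) == recorded_hash.strip().lower()


# Demonstration with a temporary file standing in for an evidence item.
with tempfile.TemporaryDirectory() as workdir:
    evidence_path = os.path.join(workdir, "evidence.bin")
    with open(evidence_path, "wb") as handle:
        handle.write(b"\x00" * 4096)

    recorded = hash_file(evidence_path)  # hash taken at collection
    unchanged_ok = verify_file(evidence_path, recorded)

    with open(evidence_path, "r+b") as handle:  # alter a single bit
        handle.seek(100)
        handle.write(b"\x01")
    changed_ok = verify_file(evidence_path, recorded)

print("unchanged file verifies:", unchanged_ok)
print("altered file verifies:", changed_ok)
```

The check is only meaningful if the recorded hash itself is stored securely, for example in the examination documentation.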

4.3 File Identification

The goal of file identification is to efficiently scan a digital image or other digital object for specific items. Many organizations use hashes to identify files, because a hash is significantly smaller than the file it describes and it is not possible to re-create a file based on its hash. The hashes are easier to distribute and sensitive information is not disclosed. Many common forensic tools use lists of hash values to identify known and notable files in an examination. For example, the National Institute of Standards and Technology (NIST) National Software Reference Library distributes hashes of known software. See www.nsrl.nist.gov.
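Hash-based file identification can be sketched as a simple set lookup. In the Python example below, the hash set, the category labels, the file names, and the in-memory file contents are all hypothetical; a real hash set would be loaded from a distributed list such as one obtained from an authorized source.

```python
import hashlib
from typing import Dict, Optional

# Hypothetical hash set: MD5 hex digest -> category label.
KNOWN_HASHES: Dict[str, str] = {
    hashlib.md5(b"contents of a known operating system file").hexdigest(): "known",
    hashlib.md5(b"contents of a file of investigative interest").hexdigest(): "notable",
}


def classify(data: bytes, hash_set: Dict[str, str]) -> Optional[str]:
    """Return the category for `data` if its MD5 hash is in the set, else None."""
    return hash_set.get(hashlib.md5(data).hexdigest())


# In-memory stand-ins for files found during an examination.
files_under_examination = {
    "system.dll": b"contents of a known operating system file",
    "holiday.jpg": b"contents of a file of investigative interest",
    "notes.txt": b"contents that appear in no hash set",
}

results = {
    name: classify(data, KNOWN_HASHES)
    for name, data in files_under_examination.items()
}
for name, category in results.items():
    print(name, category or "unknown")
```

Only the hashes are needed to perform the lookup, so the hash set can be shared without distributing the underlying files.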

4.4 What makes hash algorithms good or bad?

Hash algorithms are ranked according to their strength for various underlying properties, including collision resistance and preimage resistance.

  • Collision resistance means that it is computationally infeasible to find two different files with the same hash. Collisions can occur randomly or be engineered. If it is practical to create two files with the same hash, the algorithm is considered broken with respect to collision resistance.
  • Preimage resistance means that it is computationally infeasible to create a file whose hash matches a previously calculated hash.
  • Other properties. There are other properties of hashing algorithms, such as the inability to reverse engineer a file from its hash and the computational efficiency of the algorithm.
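To give a sense of how unlikely a random collision is, the sketch below applies the standard birthday approximation, p ≈ 1 - exp(-k(k-1)/(2N)), where k is the number of distinct inputs and N is the number of possible hash values. The function name and the figure of one billion files are assumptions chosen for illustration, and the result describes random collisions only, not engineered ones.

```python
import math


def random_collision_probability(item_count: int, hash_bits: int) -> float:
    """Approximate the chance that any two of `item_count` random inputs collide.

    Uses the birthday approximation p = 1 - exp(-k(k-1) / (2N)), N = 2**hash_bits.
    """
    space = 2 ** hash_bits
    exponent = item_count * (item_count - 1) / (2 * space)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-exponent)


# Hypothetical collection of one billion distinct files.
file_count = 10 ** 9
for label, bits in (("MD5", 128), ("SHA1", 160), ("SHA256", 256)):
    print(label, random_collision_probability(file_count, bits))
```

Even for the 128-bit MD5 hash, the estimated probability for a billion files is far below any practical threshold of concern, which is consistent with no random collisions having been identified.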

4.5 Description of Commonly-Used Hash Algorithms

There are four common hash algorithm families in current use. (See Reference Section for additional information.)

  1. MD5: MD5 is defined in IETF RFC 1321 and is the oldest among these hash algorithms. The first practical public attack on MD5 was published in 2004. This attack can be used to engineer hash collisions, that is, to deliberately create two different files with the same hash. It cannot be used to create a different file whose hash matches a pre-existing file’s hash.
  2. SHA1: SHA1 was first allowed for federal use in NIST publication FIPS 180-1 in 1995 and subsequently disallowed in 2010. The attack on MD5 raised the possibility that the same type of attack could be used on SHA1. The first successful SHA1 collision attack was published in 2017. However, like the MD5 attack, it cannot be used to create a different file whose hash matches a pre-existing file’s hash.
  3. SHA2: NIST first specified SHA2 in FIPS 180-2 in 2002 and, after concerns about SHA1 surfaced, advised federal agencies in 2006 to transition to it. It includes multiple versions, the most common being SHA256 and SHA512. There are no known successful attacks against SHA2.
  4. SHA3: NIST standardized SHA3 in FIPS 202 in 2015. It uses a different algorithm than previous SHA versions. There are no known successful attacks against SHA3.
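For orientation, the sketch below lists the output length of each algorithm family as exposed by Python's standard hashlib module. The selection of variants (SHA256, SHA512, SHA3-256, SHA3-512) is an assumption made for illustration; each family offers others.

```python
import hashlib

ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256", "sha3_512"]

# digest_size is reported in bytes; convert to bits for comparison.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in ALGORITHMS}

for name, bits in digest_bits.items():
    print(f"{name}: {bits}-bit hash")

# SHA256 and SHA3-256 produce hashes of the same length but use different
# algorithms, so the same input yields different values.
sha2_value = hashlib.sha256(b"abc").hexdigest()
sha3_value = hashlib.sha3_256(b"abc").hexdigest()
print(sha2_value != sha3_value)
```

A hash recorded with one algorithm can therefore only be compared against a hash computed with that same algorithm.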

5. Recommendations for the Appropriate Uses of MD5 and SHA1

Because MD5 and SHA1 have proven to be susceptible to engineered collisions, they should not be used for most security applications. (See NIST Policy on Hash Functions.) Within the field of digital forensics, their use is still appropriate in the following situations:

  • Integrity Verification
    It is appropriate to use either MD5 or SHA1 for integrity verification, provided the hash is securely stored or recorded in the examination documentation. Secure storage prevents an individual from substituting a different file and its hash, and it is required for all hash algorithms. Since there are no known practical preimage attacks on any of the four hashing algorithms discussed, the only way to manipulate the evidence without detection is to do it before it is hashed. (Note: hashing files does not protect against manipulation of evidence before it is hashed, but technical and procedural controls during the data collection process guard against such risks.)
  • File Identification
    Since there are no known practical preimage attacks against MD5 and SHA1, it is appropriate to use these algorithms to assist in file discovery to help identify files for further examination. If the file identification will be used as evidence (e.g., to identify child pornography) without visual or other inspection, only SHA2 and SHA3 hashes are appropriate, provided such hashes come from an authorized source that meets evidentiary standards for pedigree and content.
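One way to ease a transition toward SHA2 and SHA3 while continuing to use existing MD5 and SHA1 hash sets is to compute several hashes in a single read of the data. The following Python sketch shows the idea; the function name hash_stream, the chosen algorithms, and the in-memory sample data are assumptions for illustration and do not represent a prescribed workflow.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterable


def hash_stream(
    stream: BinaryIO, algorithms: Iterable[str], chunk_size: int = 1 << 20
) -> Dict[str, str]:
    """Read `stream` once and return a hex digest for each requested algorithm."""
    digests = {name: hashlib.new(name) for name in algorithms}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


# In-memory stand-in for an evidence item.
sample_data = b"example evidence data" * 1000
hashes = hash_stream(io.BytesIO(sample_data), ["md5", "sha1", "sha256", "sha3_256"])

for name, value in hashes.items():
    print(name, value)
```

Recording the legacy and current hashes together allows the item to be matched against older hash sets while a SHA2 or SHA3 value is already on record.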

6. References

[1] Internet Engineering Task Force, The MD5 Message-Digest Algorithm, April 1992, https://www.ietf.org/rfc/rfc1321.txt

[2] National Institute of Standards and Technology, NIST Policy on Hash Functions, August 5, 2015, https://csrc.nist.gov/Projects/Hash-Functions/NIST-Policy-on-Hash-Functions

[3] Google Security Blog, Announcing the first SHA1 collision, February 23, 2017, https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html

[4] Xiaoyun Wang, Dengguo Feng, Xuejia Lai, and Hongbo Yu, Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD, Cryptology ePrint Archive: Report 2004/199, https://eprint.iacr.org/2004/199

[5] National Institute of Standards and Technology, FIPS 180-1: Secure Hash Standard, April 17, 1995, https://csrc.nist.gov/publications/detail/fips/180/1/archive/1995-04-17

[6] National Institute of Standards and Technology, SP 800-107, Recommendations for Applications Using Approved Hash Algorithms, August 2012, https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-107r1.pdf

[7] National Institute of Standards and Technology, NIST Comments on Cryptanalytic Attacks on SHA-1, April 26, 2006, https://csrc.nist.gov/News/2006/NIST-Comments-on-Cryptanalytic-Attacks-on-SHA-1

[8] IOP Science, Journal of Physics: Conference Series, A comparative study of Message Digest 5 (MD5) and SHA256 algorithm, March 2018, http://iopscience.iop.org/article/10.1088/1742-6596/978/1/012116

[9] Sharma, Arvind & K Mittal, S. (2018). Comparative Analysis of Cryptographic Hash Functions, https://www.researchgate.net/publication/327664102_COMPARATIVE_ANALYSIS_OF_CRYPTOGRAPHIC_HASH_FUNCTIONS

[10] Preshing on Programming, Hash Collision Probabilities, May 04, 2011, http://preshing.com/20110504/hash-collision-probabilities/

History

Revision  | Issue Date | Section | History
1.0 DRAFT | 2018-09-20 | All     | Initial draft created and voted by SWGDE for release as a Draft for Public Comment.
1.0 DRAFT | 2018-11-20 |         | Formatted and released as a Draft for Public Comment.
1.0       | 2019-09-19 | All     | Minor changes made throughout document for clarification purposes in response to public comments. SWGDE voted to publish as Approved.
1.0       | 2019-09-29 |         | Formatted for release as Approved version 1.0.

Version: 1.0 (September 29, 2019)