Written By: Adam Caudill February 24, 2017
Today, a long-awaited announcement was made: Google released the first full SHA-1 collision. For those in the cryptography community, it was widely expected that such a break would occur this year; the flaws that allow the attack have been known since 2005, and academic work has continued since then to produce a full collision. Today, the fruits of that labor have been released, and we will explain what this means for you.
What is a collision?
Hashing algorithms such as SHA-1 are used extensively in cryptographic protocols and are often found in places you may not expect. All secure hashing algorithms have a few critical properties; at a high level, they are:
- Collision Resistance – It should be computationally infeasible to find two distinct inputs that produce the same output.
- Pseudo-Random Functions – The output should be indistinguishable from random data.
- Non-Reversible – There should be no practical way to take the output and determine the input.
- Deterministic – The output is always the same, given the same input.
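To make two of these properties concrete, here is a minimal Python sketch using the standard hashlib module. It deliberately uses SHA-256 rather than SHA-1, since SHA-1 is the algorithm in question; the message and the helper names are illustrative only. It checks that hashing is deterministic and counts how many output bits change when a single input bit is flipped (the avalanche effect), which is one visible sign of random-looking output.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of SHA-256 over the input bytes."""
    return hashlib.sha256(data).hexdigest()

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

msg = b"The quick brown fox"

# Deterministic: hashing the same input twice gives the same digest.
assert sha256_hex(msg) == sha256_hex(msg)

# Avalanche: flip the lowest bit of the first byte and compare the two digests.
flipped = bytes([msg[0] ^ 0x01]) + msg[1:]
diff = bit_difference(hashlib.sha256(msg).digest(), hashlib.sha256(flipped).digest())
print(f"{diff} of 256 output bits changed")
```

For a well-behaved hash, the count printed is typically close to half of the 256 output bits.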
If any of these properties are violated, the algorithm is considered to be cryptographically broken. In most cryptographic protocols, collision resistance is the most important. For example, certain types of collisions could allow, and in the past have allowed, the creation of fraudulent SSL certificates. Such a break could be used to bypass code signing checks, break GPG signature checks, and potentially even insert malicious code into source control repositories. Once any of these properties has been broken, the algorithm should not be used in new systems, and should be removed from existing systems. How quickly it needs to be removed depends on the severity of the break.
There are multiple types of collisions possible with a hashing algorithm, and the severity of a break depends on the exact type generated, since some types allow more serious attacks than others. It is also important to understand that once a collision has been found, attack methods commonly improve over time, producing more advanced types of collisions and increasing the risk.
What kind of collision is this?
This collision is interesting due to the technique used: an identical-prefix collision attack. Given a shared beginning (PDF files were used in this case), the attacker computes two different blocks of data that, when appended to that beginning, leave SHA-1 in the same internal state. Any identical ending can then be appended to both, and the results still produce the same SHA-1 hash. The outcome is two files that share some content but differ in the middle, yet produce the same hash.
Similar attacks exist for MD5 and the ill-fated SHA-0 (both members of the same family as SHA-1, along with SHA-2), though the most effective and best-known attack against MD5 is a chosen-prefix collision attack. In that attack, an attacker can take two arbitrary (and different) prefixes and find suffixes that, appended to each, cause the same hash output to be generated. This is a more powerful attack that allows an attacker to produce useful results more easily; it was demonstrated by producing a fraudulent SSL intermediate certificate that could be used to sign server certificates for any domain the attackers wished.
In the case of SHA-1, leveraging this type of collision is more complex because both the beginning and ending data must be the same, though with enough work, results similar to those achieved with MD5 are possible.
What this type of collision does not allow is for an attacker to generate a file that collides with data that is already signed. To perform the attack, both files must be generated together: first the near-colliding blocks, then the final data that completes the collision. The attacker must control both files for this type of attack to work.
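The reason a shared ending keeps the hashes equal is that SHA-1, like MD5, processes data block by block (the Merkle-Damgård construction): once two inputs reach the same internal state, identical data appended afterward keeps them in lockstep. The Python sketch below demonstrates this on a deliberately tiny, hypothetical hash whose 16-bit state is built from truncated SHA-256. It finds the colliding middle blocks by brute-force birthday search, which only works because the state is so small; the real SHA-1 attack relies on far more sophisticated differential cryptanalysis. The names compress, absorb, toy_hash, and collide_after are illustrative, not part of any library.

```python
import hashlib
from itertools import count

BLOCK = 8         # toy block size in bytes (hypothetical)
IV = b"\x00\x00"  # toy 16-bit starting state (hypothetical)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: SHA-256 truncated to 16 bits. NOT secure."""
    return hashlib.sha256(state + block).digest()[:2]

def absorb(state: bytes, data: bytes) -> bytes:
    """Feed whole blocks through the compression function, one after another."""
    assert len(data) % BLOCK == 0, "toy construction only accepts whole blocks"
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(message: bytes) -> bytes:
    """Merkle-Damgard style: chain the blocks, then finish with a length block."""
    state = absorb(IV, message)
    return compress(state, len(message).to_bytes(BLOCK, "big"))

def collide_after(prefix: bytes):
    """Birthday search for two different blocks giving the same state after prefix."""
    state = absorb(IV, prefix)
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block

prefix = b"HEADER!!"      # identical beginning (one block)
suffix = b"TRAILER!" * 2  # identical ending (two blocks)
b1, b2 = collide_after(prefix)
m1 = prefix + b1 + suffix
m2 = prefix + b2 + suffix
assert m1 != m2 and toy_hash(m1) == toy_hash(m2)
```

The two messages differ only in their middle block, mirroring the structure of the colliding PDFs, and the attacker had to choose both middle blocks, which is why this kind of collision cannot target a file that is already signed.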
What does this collision allow?
SHA-1 has been widely adopted since it was standardized by NIST in 1995; here are some examples of the impact:
GPG: The default configuration for GPG on many platforms uses SHA-1 for file signatures. This means that an attacker could prepare an innocent file and a malicious file that collide, sign the innocent file (or get it signed), and later provide the malicious file, against which the same signature verifies because it has the same hash. The attacker would not be able to take an existing signed file and generate a colliding file that results in the same hash.
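Signature schemes such as the ones GPG uses sign a hash of the file rather than the file itself, so anything with the same hash inherits the signature. The Python sketch below illustrates the scenario with two loud simplifications: SHA-1 is truncated to 16 bits so that a collision can be found almost instantly, and an HMAC with a made-up key stands in for a real public-key signature. Neither is how GPG actually works; only the hash-then-sign structure carries over, and the names weak_digest, sign, and verify are hypothetical.

```python
import hashlib
import hmac
from itertools import count

def weak_digest(data: bytes) -> bytes:
    """SHA-1 truncated to 16 bits so a collision is easy to find (toy only)."""
    return hashlib.sha1(data).digest()[:2]

# Hypothetical stand-in for a private key. Like a real signature scheme,
# sign() only ever sees the digest of the file, never the file itself.
SIGNING_KEY = b"hypothetical-signing-key"

def sign(data: bytes) -> bytes:
    return hmac.new(SIGNING_KEY, weak_digest(data), hashlib.sha256).digest()

def verify(data: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(sign(data), signature)

# The attacker controls both files, so they can search for a colliding pair.
innocent_by_digest = {}
for n in range(2**16):
    doc = b"innocent release, build %d" % n
    innocent_by_digest.setdefault(weak_digest(doc), doc)

for n in count():
    malicious = b"malicious release, build %d" % n
    match = innocent_by_digest.get(weak_digest(malicious))
    if match is not None:
        innocent = match
        break

signature = sign(innocent)           # the innocent file gets signed...
assert verify(malicious, signature)  # ...and the signature also verifies the malicious one
```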
SSL Certificates: An attacker would be able to generate two certificates that result in the same hash, in an attack somewhat similar to the well-known attack performed against MD5. Performing this attack would be rather complex due to various factors:
- The CA/Browser Forum (CABForum), the industry group that sets standards for SSL certificates, has banned SHA-1 certificates, and the exception process that was put in place had strict rules limiting the user-supplied data included. Unfortunately, SHA-1 certificates are still used in many environments, and some Certificate Authorities still sell SHA-1 certificates from roots that have been removed from major trust stores, for use with legacy devices and systems; these certificates operate outside of the standard rules.
- SSL certificates are now required to contain randomly generated serial numbers, produced by the certificate authority. Because the attacker cannot predict the exact contents of the certificate the CA will sign, it is harder to produce a certificate that gets signed and still yields a useful collision.
- Browser support is ending: Firefox is disabling SHA-1 as of February 24, 2017, and Chrome disabled SHA-1 support in version 56, released on January 31, 2017. Microsoft will disable support in its browsers for SHA-1 certificates that chain to a root in Microsoft’s trust store; locally installed roots will continue to work.
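The serial number appears near the start of the data a CA signs, so an attacker who wants to compute a collision in advance must predict it. Below is a minimal Python sketch of how a CA might draw unpredictable serial numbers, assuming the standard secrets module as the cryptographically secure random source. The 64-bit size reflects the minimum in the CA/Browser Forum Baseline Requirements as we understand it, and new_serial_number is a hypothetical helper rather than any CA’s actual code.

```python
import secrets

SERIAL_BITS = 64  # assumption: the minimum amount of random output, not a fixed size

def new_serial_number(bits: int = SERIAL_BITS) -> int:
    """Draw a non-sequential, positive certificate serial number from a CSPRNG."""
    while True:
        serial = secrets.randbits(bits)
        if serial > 0:  # serial numbers must be positive
            return serial

serial = new_serial_number()
print(hex(serial))
```

With 64 random bits, an attacker who must commit to a serial number before starting an expensive collision search has roughly a 1 in 2^64 chance of guessing it correctly.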
Git: The popular Git version control system uses SHA-1 extensively, and the exact impact is still being researched. This SHA-1 collision could be used to replace a file with a malicious version, though such an attack would be complex to execute. It would likely involve compromising infrastructure in addition to convincing a person with access to a repository to merge a specially prepared file. Such an attack would be simpler with binary files than with plain-text files, where the near-collision blocks would be obvious.
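To see where SHA-1 appears in Git, consider how it names file contents: each blob’s ID is the SHA-1 hash of a short header (the word "blob", the size in bytes, and a NUL byte) followed by the file’s bytes. The Python sketch below reproduces that computation for SHA-1 repositories; git_blob_id is an illustrative name, and the result should match what git hash-object prints for the same content.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a file's contents in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the empty blob
```

Because the header becomes part of the shared prefix, two raw files that collide under plain SHA-1 do not automatically collide as Git blobs; an attacker would need to compute a collision with the header included, which the identical-prefix technique permits since both colliding files have the same length.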
Others: There are countless other systems that use SHA-1 and may be vulnerable in any number of ways. Determining the impact, and the proper ways to secure each system, requires review by professionals experienced in cryptographic systems.
How much does it cost?
Calculating this type of collision requires substantial processing resources; the researchers who released this collision would have spent close to $800,000 on cloud computing. With improvements to the techniques used, and by moving processing from CPUs to GPUs, it is estimated that creating a collision would cost approximately $100,000.
This cost places the attack within reach of only well-funded attackers, such as nation-states or large criminal organizations. As processing power becomes cheaper and techniques improve, this cost will continue to drop.
How worried should I be?
For now, the attack is complex and expensive. Thanks to the hard work of many, the use of SHA-1 in SSL has been greatly reduced, which substantially limits the impact of one of the most significant attacks. Due to the expense, only the most sensitive systems are at realistic risk, and even then, an attacker with a budget of $100,000 or more likely has more effective attacks available.
As the cost of the attack drops, it will become more practical for less well-funded attackers, and in time it will likely be practical for individuals to execute. This is why it is important that uses of SHA-1 that could introduce vulnerabilities be identified, and that work begin to replace it with more secure options.
What should I use instead?
There are a number of secure hashing algorithms available. While SHA-2 shares a common basic structure with SHA-1, its design is considerably stronger, and no comparable attack against it is known. The SHA-3 family is based on an entirely different structure, and for high-performance applications where SHA-2 and SHA-3 are too slow, BLAKE2 provides high speed and high security.
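In many codebases, moving off SHA-1 for new uses is a small change. The Python sketch below computes digests with one member of each recommended family, using only the standard hashlib module (Python 3.6 or later provides sha3_256 and blake2b). The message is a placeholder, and configuring BLAKE2b for a 32-byte digest is our choice so that all three outputs are 256 bits long.

```python
import hashlib

message = b"placeholder file contents"

# SHA-2 family: same basic structure as SHA-1, no comparable attack known
digest_sha256 = hashlib.sha256(message).hexdigest()

# SHA-3 family: built on an entirely different (sponge) structure
digest_sha3_256 = hashlib.sha3_256(message).hexdigest()

# BLAKE2b, configured here for a 32-byte (256-bit) digest
digest_blake2b = hashlib.blake2b(message, digest_size=32).hexdigest()

for name, digest in [("SHA-256", digest_sha256),
                     ("SHA3-256", digest_sha3_256),
                     ("BLAKE2b-256", digest_blake2b)]:
    print(f"{name:12} {digest}")
```

This only covers computing hashes; replacing SHA-1 inside a protocol or file format also requires every party to agree on the new algorithm.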