Instant MD5 collisions
ralfralf Member
edited September 20 in News

Can instantly generate MD5 collisions for a wide variety of common file formats:
https://github.com/corkami/collisions

If you were still using MD5, you definitely shouldn't be any more!

Comments

  • yoursunnyyoursunny Member, IPv6 Advocate

    Mentally strong people use SHA256.

  • I thought this was possible a decade ago? https://eprint.iacr.org/2013/170.pdf

  • @yoursunny said: Mentally strong people use SHA256.

    Mentally strong people use a non Merkle-Damgard based construction such as SHA3 or BLAKE2.
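
    Both of those are in Python's standard hashlib, for anyone who wants to kick the tires; a minimal sketch:

```python
import hashlib

data = b"hello world"

# SHA-3 (Keccak sponge construction, not Merkle-Damgard)
print(hashlib.sha3_256(data).hexdigest())

# BLAKE2b; digest_size is tunable, and it supports keyed hashing for MAC-like use
print(hashlib.blake2b(data, digest_size=32).hexdigest())
```

    Neither construction is vulnerable to length extension, unlike plain SHA256.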

  • @jiggawattz said:
    I thought this was possible a decade ago? https://eprint.iacr.org/2013/170.pdf

    It was possible, but it'd take hours or days to find a collision. This new tool will modify documents to match an arbitrary hash in seconds.

  • @stevewatson301 said:

    @yoursunny said: Mentally strong people use SHA256.

    Mentally strong people use a non Merkle-Damgard based construction such as SHA3 or BLAKE2.

    Mentally strong people use "wc -l" as a hash function.

    Thanked by 1yoursunny
  • Honestly, I'd be a bit careful before using BLAKE3, they reduced(!) the number of rounds based on the assumption that crypto is too strong; and I'm not sure what kind of security guarantees are provided by the Merkle-tree based construction.

  • @stevewatson301 said:

    Honestly, I'd be a bit careful before using BLAKE3, they reduced(!) the number of rounds based on the assumption that crypto is too strong; and I'm not sure what kind of security guarantees are provided by the Merkle-tree based construction.

    Thanks for the heads-up. In most environments I still use SHA256/SHA384 anyway.

  • @let_rocks said:
    BLAKE3 all the way.

    I always preferred BLAKE's 7.

    Thanked by 1M66B
  • @let_rocks said: SHA384

    This is very uncommon. Why 384 instead of 512?

  • edited September 20

    @LTniger said:

    @let_rocks said: SHA384

    This is very uncommon. Why 384 instead of 512?

    SHA-384 is computed like SHA-512 (with different initial values) and the result truncated to 384 bits. Because the final output doesn't expose the hash's full internal state, it resists length-extension attacks without any extra steps being taken in that regard. Similarly, SHA-224 is a truncated SHA-256.

    Some other hashes are resilient in this manner in all variants, SHA3 for one.

    As a sidenote: if you are using hashes simply for validation (i.e. to check for bit-rot in backups) rather than to defend against corruption by an active attacker, then look at xxHash, which is far faster and no less safe when there isn't an active attacker (but it gives absolutely no guarantees against active attack).
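
    To see the sizes with stdlib hashlib (note SHA-384 also uses different initial values, so its output isn't literally the first 48 bytes of the SHA-512 digest); a rough sketch:

```python
import hashlib

data = b"example"

sha512 = hashlib.sha512(data)
sha384 = hashlib.sha384(data)

# 64 bytes (512 bits) vs 48 bytes (384 bits)
print(sha512.digest_size, sha384.digest_size)

# SHA-384 starts from different initial hash values, so it is not just
# a chopped-off SHA-512 digest of the same input.
print(sha512.digest()[:48] != sha384.digest())  # True
```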

  • Another approach that I like is using two different hashes from different families. e.g. if both SHA-1 and MD5 hash match you can be pretty confident that it's the file you expect.
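
    A sketch of that belt-and-braces idea with stdlib hashlib (the helper names here are made up; and note the combined pair is still considered weaker than a single modern hash):

```python
import hashlib

def dual_digest(data: bytes) -> tuple[str, str]:
    """Return the (MD5, SHA-1) hex digest pair for some bytes."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()

def verify(data: bytes, expected_md5: str, expected_sha1: str) -> bool:
    """Accept the data only if BOTH digests match."""
    md5, sha1 = dual_digest(data)
    return md5 == expected_md5 and sha1 == expected_sha1
```

    A known MD5 collision pair would still fail the SHA-1 half of the check, which is where the extra confidence comes from.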

  • This paper was the breakthrough to finding SHA-1 collisions (all collisions so far use the XOR of rounds 5-20)
    https://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2011/PHD/PHD-2011-08.pdf

  • @MeAtExampleDotCom said:

    @LTniger said:

    @let_rocks said: SHA384

    This is very uncommon. Why 384 instead of 512?

    SHA-384 is computed like SHA-512 (with different initial values) and the result truncated to 384 bits. Because the final output doesn't expose the hash's full internal state, it resists length-extension attacks without any extra steps being taken in that regard. Similarly, SHA-224 is a truncated SHA-256.

    Some other hashes are resilient in this manner in all variants, SHA3 for one.

    As a sidenote: if you are using hashes simply for validation (i.e. to check for bit-rot in backups) rather than to defend against corruption by an active attacker, then look at xxHash, which is far faster and no less safe when there isn't an active attacker (but it gives absolutely no guarantees against active attack).

    Nice read on xxhash. I thought that crc32c is fast. Time for update...

  • Daniel15Daniel15 Member
    edited September 20

    These attacks have been known for over 10 years. It's not new :smile: Use SHA256 if you need a cryptographic hash or xxHash for other use cases.

    @LTniger said: Nice read on xxhash. I thought that crc32c is fast. Time for update...

    xxHash is great and it's used everywhere. It's in the Linux kernel, and a bunch of databases and file transfer apps use it for checksumming.

    xxHash64 is the most commonly used variant. For new development, consider XXH3 (64-bit) or XXH128 (128-bit) which are the latest versions of xxHash and improve performance especially for small data.

    There's some performance comparisons on the author's site: https://cyan4973.github.io/xxHash/
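
    xxHash itself needs a third-party package in Python (e.g. the `xxhash` module on PyPI), but the bit-rot-checking pattern looks the same with stdlib zlib's CRC-32 (plain CRC-32 here, not the crc32c variant):

```python
import zlib

def stream_checksum(chunks) -> int:
    """Run a CRC-32 over an iterable of byte chunks, e.g. a file read in blocks."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc

# Chunked and one-shot checksums agree; this catches accidental corruption,
# but offers no protection at all against an active attacker.
print(stream_checksum([b"hello ", b"world"]) == zlib.crc32(b"hello world"))  # True
```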

    @stevewatson301 said: based on the assumption that crypto is too strong

    help, my crypto is too strong

    said nobody ever

  • angstromangstrom Member, Moderator

    @ralf said:
    Can instantly generate MD5 collisions for a wide variety of common file formats:
    https://github.com/corkami/collisions

    If you were still using MD5, you definitely shouldn't be any more!

    Correct me if I'm mistaken, but I think that MD5 is still resistant to second-preimage attacks, which is relevant insofar as the main use of MD5 nowadays is for verifying file integrity

    More concretely, if one offers a given file f for download together with its MD5 sum, it wouldn't be feasible (or it would be very-very difficult) for a manipulator to create another file f' with the same MD5 sum as file f

    Naturally, one should prefer stronger hash functions nowadays, but the use of MD5 for verifying the file integrity of files out in the wild is still a reasonable use

    (Or am I mistaken?)
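
    The download-verification scenario above, as a toy sketch with stdlib hashlib (the byte strings are stand-ins for real file contents): a forger would need a second preimage to swap in f' undetected, whereas the collision tool has to craft both files itself.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Publisher: offer file f together with its MD5 sum.
f = b"contents of the released file"
published_sum = md5_hex(f)

# Downloader: recompute and compare.
downloaded = b"contents of the released file"
assert md5_hex(downloaded) == published_sum

# A manipulated f' fails the check -- unless the manipulator can find a
# second preimage, which is still infeasible against MD5.
assert md5_hex(b"contents of a malicious file") != published_sum
```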

  • emgemg Member

    @Daniel15 said: help, my crypto is too strong - said nobody ever

    Not true. The US government said it, and forced companies to weaken their cryptographic libraries for exported products back in the 1980s and 1990s. The US export licensing regime for products with strong crypto was challenging, and the process was time-consuming and painful. Among other things, you had to pass cryptographic tests to prove you were using standard algorithms. You also had to provide a copy of your cryptographic source code and have your technical people available to discuss your design and implementation with the NSA. The pain is similar to going through a FIPS 140 certification, for those who know.

    Most people here are too young to remember when manufacturers were forced to weaken the cryptographic algorithms that were included when you bought a new PC or downloaded a browser. For symmetric key ciphers, the maximum key size for unlicensed export was 40 bits, so everybody's computers and browsers had weak ciphers.

    If you lived in the US or another approved country (or not!), the first thing that savvy people did was download the strong cryptographic library. The strong cryptographic library included additional ciphers such as 56-bit DES, 168-bit Triple-DES, and other ciphers with longer-than-40-bit key lengths. You got 40-bit RC4 with the no-license library. 128-bit RC4 was added with the strong library.

    If you were in a country to which cryptographic exports were limited (say, North Korea), or were a restricted company or person, you could still obtain the strong cryptographic library easily enough. It was just technically illegal according to US export regulations.

    Microsoft had a special signing key for the cryptographic libraries. A cryptographic library would not install unless it was signed with Microsoft's special key. The goal was to prevent people from installing unauthorized, custom cryptographic libraries.

    There was a flaw in the logic. Someone noticed a second signing key for cryptographic libraries. Microsoft had accidentally released a version of Windows for developers that included debug symbols with programmers' variable names. That second cryptographic library key was called "NSAKEY". Yes, that NSA.

    There was a lot of speculation about the purpose of the NSAKEY signing key. Some speculated that it let the NSA create backdoors with faulty cryptographic libraries by hacking the computers of NSA and other government agency targets. (It is my personal opinion that NSAKEY lets the NSA create cryptographic libraries with classified algorithms for government-exclusive use, but that is just an opinion. In case you're wondering, nobody has found a cryptographic library signed by NSAKEY ... yet.)

    The presence of the second cryptographic library signing key (NSAKEY) meant that everybody could install their own custom cryptographic libraries. All they had to do was replace the NSAKEY with their own signing key. Because they substituted only the second key, they could continue to use Microsoft's standard cryptographic library signed with the first key, so regular applications would keep working as expected. Duh!!

  • @emg said: Not true. The US government said it, and forced companies to weaken their cryptographic libraries for exported products back in the 1980s and 1990s.

    Oh yeah, I completely forgot about this! I remember seeing something about this in old browsers back when I first started using the Web.

  • ralfralf Member
    edited September 20

    @angstrom said:

    @ralf said:
    Can instantly generate MD5 collisions for a wide variety of common file formats:
    https://github.com/corkami/collisions

    If you were still using MD5, you definitely shouldn't be any more!

    Correct me if I'm mistaken, but I think that MD5 is still resistant to second-preimage attacks, which is relevant insofar as the main use of MD5 nowadays is for verifying file integrity

    More concretely, if one offers a given file f for download together with its MD5 sum, it wouldn't be feasible (or it would be very-very difficult) for a manipulator to create another file f' with the same MD5 sum as file f

    (Or am I mistaken?)

    I think you are mistaken. Essentially this works by using the equivalent of comments in the file formats it supports: data can be inserted that drives the MD5 internal state to a known intermediate value, and then the rest of the content is appended so that the resulting MD5 comes out the same. I think the fake file needs to be smaller than the original, so that padding can be added along with the extra data to make the size match as well as the hash. In fact, matching the size might even be a requirement, because the last block includes the message length; if it differed, the problem would become harder.

    Thanked by 1angstrom
  • @Daniel15 said:
    help, my crypto is too strong

    Apparently, the authors of the paper think so:
    https://eprint.iacr.org/2019/1492.pdf

    Which was included in Blake3:
    https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf

    Based on existing cryptanalysis of BLAKE and BLAKE2, BLAKE3 reduces the number of rounds in the compression function from 10 to 7.

  • angstromangstrom Member, Moderator

    @ralf said:

    @angstrom said:

    @ralf said:
    Can instantly generate MD5 collisions for a wide variety of common file formats:
    https://github.com/corkami/collisions

    If you were still using MD5, you definitely shouldn't be any more!

    Correct me if I'm mistaken, but I think that MD5 is still resistant to second-preimage attacks, which is relevant insofar as the main use of MD5 nowadays is for verifying file integrity

    More concretely, if one offers a given file f for download together with its MD5 sum, it wouldn't be feasible (or it would be very-very difficult) for a manipulator to create another file f' with the same MD5 sum as file f

    (Or am I mistaken?)

    I think you are mistaken. Essentially this works by using the equivalent of comments in the file formats it supports: data can be inserted that drives the MD5 internal state to a known intermediate value, and then the rest of the content is appended so that the resulting MD5 comes out the same. I think the fake file needs to be smaller than the original, so that padding can be added along with the extra data to make the size match as well as the hash. In fact, matching the size might even be a requirement, because the last block includes the message length; if it differed, the problem would become harder.

    Okay, but let me ask: in the README file on the GitHub page that you cite, the authors write (under Status):

    Current status of known attacks:

    • get a file to get another file's hash or a given hash: impossible
      • it's still even not practical with MD2 or MD4.
      • works for simpler hashes(*)

    What case are the authors referring to here if not to a second-preimage attack, as I described above? (Just wondering)

  • @emg said: Microsoft had a special signing key for the cryptographic libraries. A cryptographic library would not install unless it was signed with Microsoft's special key. The goal was to prevent people from installing unauthorized, custom cryptographic libraries.

    There was a flaw in the logic. Someone noticed a second signing key for cryptographic libraries. Microsoft accidentally released a a version of Windows for developers that included debug symbols with programmer's variable names. That second cryptographic library key was called "NSAKEY". Yes, that NSA.

    lol bruh

  • @angstrom said:
    Okay, but let me ask: in the README file on the GitHub page that you cite, the authors write (under Status):

    Current status of known attacks:

    • get a file to get another file's hash or a given hash: impossible
      • it's still even not practical with MD2 or MD4.
      • works for simpler hashes(*)

    What case are the authors referring to here if not to a second-preimage attack, as I described above? (Just wondering)

    Re-reading this, it looks like I'm wrong and that this attack generates two modified files with the same hash. I misunderstood: I thought that only one file was modified, such that at some point within it the running hash matched the original's, after which the rest of the original could be copied to the end so that the whole file also matched the MD5.

    Thanked by 1angstrom
  • angstromangstrom Member, Moderator

    @ralf said:

    @angstrom said:
    Okay, but let me ask: in the README file on the GitHub page that you cite, the authors write (under Status):

    Current status of known attacks:

    • get a file to get another file's hash or a given hash: impossible
      • it's still even not practical with MD2 or MD4.
      • works for simpler hashes(*)

    What case are the authors referring to here if not to a second-preimage attack, as I described above? (Just wondering)

    Re-reading this, it looks like I'm wrong and that this attack generates two modified files with the same hash. I misunderstood: I thought that only one file was modified, such that at some point within it the running hash matched the original's, after which the rest of the original could be copied to the end so that the whole file also matched the MD5.

    Right. It's still about constructing -- in one way or another -- two files to have the same MD5 sum: in other words, collision attacks

  • https://github.com/rurban/smhasher/ benchmarks hashes on quality and speed. The Summary section is good. It all depends on what you use hashing for.

  • emgemg Member
    edited September 21

    This is based on 15-year-old research. I remember when the first paper came out, along with a wonderful PDF collision example: they published two PDF files of recommendation letters, and both had the same hash that anyone could check for themselves. One letter was a glowing recommendation. The other said something like "Don't hire this person under any circumstances."

  • Everyone should be suspicious of any 15+ year old document that suddenly turns up.

  • Packs up to 4KB of compressed shellcode into an executable binary, near-instantly. The output file will always have the same MD5 hash: 3cebbe60d91ce760409bbe513593e401

    https://github.com/DavidBuchanan314/monomorph
