Instant MD5 collisions

ralf Member
edited September 2022 in News

Can instantly generate MD5 collisions for a wide variety of common file formats:
https://github.com/corkami/collisions

If you were still using MD5, you definitely shouldn't be any more!

Comments

  • yoursunny Member, IPv6 Advocate

    Mentally strong people use SHA256.

  • I thought this was possible a decade ago? https://eprint.iacr.org/2013/170.pdf

  • @yoursunny said: Mentally strong people use SHA256.

    Mentally strong people use a non Merkle-Damgard based construction such as SHA3 or BLAKE2.

  • @jiggawattz said:
    I thought this was possible a decade ago? https://eprint.iacr.org/2013/170.pdf

    It was possible, but it'd take hours or days to find a collision. This new tool will modify documents to match an arbitrary hash in seconds.

  • @stevewatson301 said:

    @yoursunny said: Mentally strong people use SHA256.

    Mentally strong people use a non Merkle-Damgard based construction such as SHA3 or BLAKE2.

    Mentally strong people use "wc -l" as a hash function.

    Thanked by 1yoursunny
  • Honestly, I'd be a bit careful before using BLAKE3, they reduced(!) the number of rounds based on the assumption that crypto is too strong; and I'm not sure what kind of security guarantees are provided by the Merkle-tree based construction.

  • @stevewatson301 said:

    Honestly, I'd be a bit careful before using BLAKE3, they reduced(!) the number of rounds based on the assumption that crypto is too strong; and I'm not sure what kind of security guarantees are provided by the Merkle-tree based construction.

    Thanks for the heads-up. In most environments I still use SHA256/SHA384 anyway.

  • @let_rocks said:
    BLAKE3 all the way.

    I always preferred BLAKE's 7.

    Thanked by 1M66B
  • @let_rocks said: SHA384

    This is very uncommon. Why 384 instead of 512?

  • edited September 2022

    @LTniger said:

    @let_rocks said: SHA384

    This is very uncommon. Why 384 instead of 512?

    SHA-384 is formed as SHA-512 then truncated. This means it doesn't preserve the full state of the hash in the final result, so it is resilient to length extension attacks without other steps being taken in that regard. Similarly, SHA-224 is SHA-256 truncated.

    Some other hashes are resilient in this manner in all variants, SHA3 for one.
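
    To illustrate the truncation point, here is a minimal sketch using Python's standard hashlib; the sample message is arbitrary. One detail beyond the description above: SHA-384 also starts from different initial values than SHA-512, so its digest is not simply the first 48 bytes of the SHA-512 digest of the same input.

```python
import hashlib

message = b"example backup manifest"  # arbitrary sample input (assumption)

sha512_digest = hashlib.sha512(message).digest()
sha384_digest = hashlib.sha384(message).digest()

# SHA-512 publishes its whole 64-byte final state as the digest.
# SHA-384 runs the same compression function but publishes only 48 bytes,
# so 16 bytes of the final state stay hidden from anyone holding the digest.
hidden_bytes = len(sha512_digest) - len(sha384_digest)
print(len(sha512_digest), len(sha384_digest), hidden_bytes)

# Because the initial values differ, SHA-384 is not a plain prefix of SHA-512.
is_plain_prefix = sha512_digest[:48] == sha384_digest
print(is_plain_prefix)
```

    The hidden part of the state is what an attacker would need in order to resume hashing from the published digest.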

    As a sidenote: if you are using hashes simply for validation (e.g. to check for bit-rot in backups) rather than to defend against corruption by an active attacker, then look at xxhash which is far far faster and no less safe for situations where there isn't an active attacker (but absolutely does not give any guarantees against active attack).

  • Another approach that I like is using two different hashes from different families. e.g. if both the SHA-1 and MD5 hashes match you can be pretty confident that it's the file you expect.
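
    A rough sketch of that two-hash check in Python, reading the data once and feeding both hashers. The function names are made up for illustration. Both MD5 and SHA-1 have known collision attacks, so treat this as a belt-and-braces check rather than a guarantee.

```python
import hashlib

def dual_digest(chunks):
    """Feed the same byte chunks to MD5 and SHA-1 in a single pass."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    for chunk in chunks:
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()

def matches_both(chunks, expected_md5, expected_sha1):
    """Return True only if both digests match the expected hex strings."""
    actual_md5, actual_sha1 = dual_digest(chunks)
    return (actual_md5 == expected_md5.lower()
            and actual_sha1 == expected_sha1.lower())

# Usage with the well-known test vector "abc", split into two chunks.
print(dual_digest([b"a", b"bc"]))
```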

  • This paper was the breakthrough to finding SHA-1 collisions (all collisions so far use the XOR of rounds 5-20)
    https://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2011/PHD/PHD-2011-08.pdf

  • @MeAtExampleDotCom said:

    @LTniger said:

    @let_rocks said: SHA384

    This is very uncommon. Why 384 instead of 512?

    SHA-384 is formed as SHA-512 then truncated. This means it doesn't preserve the full state of the hash in the final result, so it is resilient to length extension attacks without other steps being taken in that regard. Similarly, SHA-224 is SHA-256 truncated.

    Some other hashes are resilient in this manner in all variants, SHA3 for one.

    As a sidenote: if you are using hashes simply for validation (e.g. to check for bit-rot in backups) rather than to defend against corruption by an active attacker, then look at xxhash which is far far faster and no less safe for situations where there isn't an active attacker (but absolutely does not give any guarantees against active attack).

    Nice read on xxhash. I thought that crc32c is fast. Time for update...
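
    For anyone who wants to try the non-cryptographic route without installing anything, here is a rough sketch of a streaming checksum. It uses zlib.crc32 from the Python standard library purely as a stand-in (that is plain CRC-32, not crc32c and not xxhash), and the function name is made up for illustration. Like xxhash, it only catches accidental corruption, not tampering.

```python
import zlib

def crc32_of_chunks(chunks):
    """Compute a running CRC-32 over an iterable of byte chunks."""
    checksum = 0
    for chunk in chunks:
        # Passing the previous value continues the same checksum.
        checksum = zlib.crc32(chunk, checksum)
    return checksum

# Chunk boundaries do not change the result.
whole = crc32_of_chunks([b"123456789"])
split = crc32_of_chunks([b"1234", b"56789"])
print(hex(whole), hex(split))
```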

  • Daniel15 Veteran
    edited September 2022

    These attacks have been known for over 10 years. It's not new :smile: Use SHA256 if you need a cryptographic hash or xxHash for other use cases.

    @LTniger said: Nice read on xxhash. I thought that crc32c is fast. Time for update...

    xxHash is great and it's used everywhere. It's in the Linux kernel, and a bunch of databases and file transfer apps use it for checksumming.

    xxHash64 is the most commonly used variant. For new development, consider XXH3 (64-bit) or XXH128 (128-bit) which are the latest versions of xxHash and improve performance especially for small data.

    There are some performance comparisons on the author's site: https://cyan4973.github.io/xxHash/

    @stevewatson301 said: based on the assumption that crypto is too strong

    help, my crypto is too strong

    said nobody ever

  • @ralf said:
    Can instantly generate MD5 collisions for a wide variety of common file formats:
    https://github.com/corkami/collisions

    If you were still using MD5, you definitely shouldn't be any more!

    Correct me if I'm mistaken, but I think that MD5 is still resistant to second-preimage attacks, which is relevant insofar as the main use of MD5 nowadays is for verifying file integrity

    More concretely, if one offers a given file f for download together with its MD5 sum, it wouldn't be feasible (or it would be very-very difficult) for a manipulator to create another file f' with the same MD5 sum as file f

    Naturally, one should prefer stronger hash functions nowadays, but the use of MD5 for verifying the file integrity of files out in the wild is still a reasonable use

    (Or am I mistaken?)
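
    The download scenario above can be sketched like this in Python. It is a minimal illustration, not a hardened tool: the function names and the chunk size are assumptions, and the algorithm is a parameter so the same code works for MD5 or a stronger hash.

```python
import hashlib
import hmac

def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files need little memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_published(path, published_hex, algorithm="sha256"):
    """Compare a file's digest with the checksum published next to it."""
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

    The check is only as trustworthy as the channel the published checksum arrives over: if a manipulator can replace the file, and the checksum sits on the same server, they can replace both.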

  • @Daniel15 said: help, my crypto is too strong - said nobody ever

    Not true. The US government said it, and forced companies to weaken their cryptographic libraries for exported products back in the 1980s and 1990s. The US export licensing regime for products with strong crypto was challenging, and the process was time consuming and painful. Among other things, you had to pass cryptographic tests to prove you were using standard algorithms. You also had to provide a copy of your cryptographic source code and have your technical people available to discuss your design and implementation with the NSA. The pain is similar to going through a FIPS 140 certification for those who know.

    Most people here are too young to remember when manufacturers were forced to weaken the cryptographic algorithms that were included when you bought a new PC or downloaded a browser. For symmetric key ciphers, the maximum key size for unlicensed export was 40 bits, so everybody's computers and browsers had weak ciphers.

    If you lived in the US or another approved country (or not!), the first thing that savvy people did was download the strong cryptographic library. The strong cryptographic library included additional ciphers such as 56-bit DES, 168-bit Triple-DES, and other ciphers with longer-than-40-bit key lengths. You got 40-bit RC4 with the no-license library. 128-bit RC4 was added with the strong library.

    If you were a country, company, or person to which cryptographic exports were restricted (say, North Korea), you could obtain the strong cryptographic library easily enough. It was technically illegal according to US export regulations.

    Microsoft had a special signing key for the cryptographic libraries. A cryptographic library would not install unless it was signed with Microsoft's special key. The goal was to prevent people from installing unauthorized, custom cryptographic libraries.

    There was a flaw in the logic. Someone noticed a second signing key for cryptographic libraries. Microsoft accidentally released a version of Windows for developers that included debug symbols with the programmers' variable names. That second cryptographic library key was called "NSAKEY". Yes, that NSA.

    There was a lot of speculation about the purpose of the NSAKEY signing key. Some speculated that it let the NSA plant backdoors, in the form of faulty cryptographic libraries, on the computers of people targeted by the NSA and other government agencies. (It is my personal opinion that NSAKEY lets the NSA create cryptographic libraries with classified algorithms for government-exclusive use, but that is just an opinion. In case you're wondering, nobody has found a cryptographic library signed by NSAKEY ... yet.)

    The presence of the second cryptographic library signing key (NSAKEY) meant that everybody could install their own custom cryptographic libraries. All they had to do was replace the NSAKEY with their own signing key. Because they substituted the second key, they could still continue to use Microsoft's standard cryptographic library, which was signed with the first key, so regular applications continued to work as expected. Duh!!

  • @emg said: Not true. The US government said it, and forced companies to weaken their cryptographic libraries for exported products back in the 1980s and 1990s.

    Oh yeah, I completely forgot about this! I remember seeing something about this in old browsers back when I first started using the Web.

  • ralf Member
    edited September 2022

    @angstrom said:

    @ralf said:
    Can instantly generate MD5 collisions for a wide variety of common file formats:
    https://github.com/corkami/collisions

    If you were still using MD5, you definitely shouldn't be any more!

    Correct me if I'm mistaken, but I think that MD5 is still resistant to second-preimage attacks, which is relevant insofar as the main use of MD5 nowadays is for verifying file integrity

    More concretely, if one offers a given file f for download together with its MD5 sum, it wouldn't be feasible (or it would be very-very difficult) for a manipulator to create another file f' with the same MD5 sum as file f

    (Or am I mistaken?)

    I think you are mistaken. Essentially this works by using the equivalent of comments in the file formats it supports, so that data can be inserted that causes the MD5 sum to be set to a known intermediate value so that the rest of the content can be appended so that the resultant MD5 is the same. I think the size of the fake file needs to be smaller than the original, so padding can be added as well as the extra data so that the resultant size matches as well as the hash. In fact, it might even be a requirement that the size is the same, because the last block includes the file size, so if it was different, the problem becomes harder.

    Thanked by 1angstrom
  • @Daniel15 said:
    help, my crypto is too strong

    Apparently, the authors of the paper think so:
    https://eprint.iacr.org/2019/1492.pdf

    Which was included in Blake3:
    https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf

    Based on existing cryptanalysis of BLAKE and BLAKE2, BLAKE3 reduces the number of rounds in the compression function from 10 to 7.

  • @ralf said:

    @angstrom said:

    @ralf said:
    Can instantly generate MD5 collisions for a wide variety of common file formats:
    https://github.com/corkami/collisions

    If you were still using MD5, you definitely shouldn't be any more!

    Correct me if I'm mistaken, but I think that MD5 is still resistant to second-preimage attacks, which is relevant insofar as the main use of MD5 nowadays is for verifying file integrity

    More concretely, if one offers a given file f for download together with its MD5 sum, it wouldn't be feasible (or it would be very-very difficult) for a manipulator to create another file f' with the same MD5 sum as file f

    (Or am I mistaken?)

    I think you are mistaken. Essentially this works by using the equivalent of comments in the file formats it supports, so that data can be inserted that causes the MD5 sum to be set to a known intermediate value so that the rest of the content can be appended so that the resultant MD5 is the same. I think the size of the fake file needs to be smaller than the original, so padding can be added as well as the extra data so that the resultant size matches as well as the hash. In fact, it might even be a requirement that the size is the same, because the last block includes the file size, so if it was different, the problem becomes harder.

    Okay, but let me ask: in the README file on the GitHub page that you cite, the authors write (under Status):

    Current status of known attacks:

    • get a file to get another file's hash or a given hash: impossible
      • it's still even not practical with MD2 or MD4.
      • works for simpler hashes(*)

    What case are the authors referring to here if not to a second-preimage attack, as I described above? (Just wondering)

  • @emg said: Microsoft had a special signing key for the cryptographic libraries. A cryptographic library would not install unless it was signed with Microsoft's special key. The goal was to prevent people from installing unauthorized, custom cryptographic libraries.

    There was a flaw in the logic. Someone noticed a second signing key for cryptographic libraries. Microsoft accidentally released a version of Windows for developers that included debug symbols with the programmers' variable names. That second cryptographic library key was called "NSAKEY". Yes, that NSA.

    lol bruh

  • @angstrom said:
    Okay, but let me ask: in the README file on the GitHub page that you cite, the authors write (under Status):

    Current status of known attacks:

    • get a file to get another file's hash or a given hash: impossible
      • it's still even not practical with MD2 or MD4.
      • works for simpler hashes(*)

    What case are the authors referring to here if not to a second-preimage attack, as I described above? (Just wondering)

    Re-reading this, it looks like I'm wrong and that this attack is generating two modified files to have the same hash. I misunderstood and thought that only one file was modified such that somewhere in it the residual hash matched the original and then the rest of the original could be copied to the end so that the rest of the file also matched the MD5.

    Thanked by 1angstrom
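
    The part about copying the rest of the file to the end does hold once two prefixes collide, and it can be shown with a toy model. The sketch below is not MD5: it is a made-up Merkle-Damgard style hash with a 16-bit state (built from truncated SHA-256 so that collisions are cheap to find), and every name in it is hypothetical. It finds two different first blocks that collide, then shows that any shared suffix keeps the two messages colliding.

```python
import hashlib
import struct

BLOCK = 16          # toy block size in bytes (assumption for the demo)
IV = b"\x00\x00"    # toy 16-bit initial state

def compress(state, block):
    """Toy compression function: a 2-byte state, deliberately weak."""
    return hashlib.sha256(state + block).digest()[:2]

def pad(message):
    """MD-style padding: 0x80, zero bytes, then the 64-bit bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    return padded + struct.pack("<Q", len(message) * 8)

def toy_md(message):
    """Iterate the compression function over the padded message."""
    state = IV
    data = pad(message)
    for offset in range(0, len(data), BLOCK):
        state = compress(state, data[offset:offset + BLOCK])
    return state.hex()

def find_block_collision():
    """Find two distinct blocks with the same state after one step."""
    seen = {}
    counter = 0
    while True:  # ends within 65537 tries: only 65536 states exist
        block = counter.to_bytes(BLOCK, "big")
        state = compress(IV, block)
        if state in seen:
            return seen[state], block
        seen[state] = block
        counter += 1

block_a, block_b = find_block_collision()
suffix = b"the rest of the file, identical in both versions"
print(toy_md(block_a + suffix) == toy_md(block_b + suffix))
```

    Both messages have the same length, so the length block at the end is identical too, which is why the final hashes agree.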
  • @ralf said:

    @angstrom said:
    Okay, but let me ask: in the README file on the GitHub page that you cite, the authors write (under Status):

    Current status of known attacks:

    • get a file to get another file's hash or a given hash: impossible
      • it's still even not practical with MD2 or MD4.
      • works for simpler hashes(*)

    What case are the authors referring to here if not to a second-preimage attack, as I described above? (Just wondering)

    Re-reading this, it looks like I'm wrong and that this attack is generating two modified files to have the same hash. I misunderstood and thought that only one file was modified such that somewhere in it the residual hash matched the original and then the rest of the original could be copied to the end so that the rest of the file also matched the MD5.

    Right. It's still about constructing -- in one way or another -- two files to have the same MD5 sum: in other words, collision attacks
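
    To make the difference between the two attack types concrete, here is a toy sketch on a 16-bit hash (SHA-256 truncated to 2 bytes, chosen only so the searches finish quickly; all names are made up). By the birthday bound, a generic collision on an n-bit hash takes on the order of 2^(n/2) tries, while hitting one fixed target takes on the order of 2^n.

```python
import hashlib
from itertools import count

def tiny_hash(data, size=2):
    """A deliberately tiny hash: the first `size` bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:size]

def find_collision():
    """Attacker picks BOTH messages: any two inputs that agree will do."""
    seen = {}
    for tries in count(1):
        message = b"c%d" % tries
        digest = tiny_hash(message)
        if digest in seen:
            return seen[digest], message, tries
        seen[digest] = message

def find_second_preimage(target_message, limit=10_000_000):
    """Attacker must match ONE fixed message chosen by someone else."""
    target = tiny_hash(target_message)
    for tries in range(1, limit + 1):
        candidate = b"p%d" % tries
        if candidate != target_message and tiny_hash(candidate) == target:
            return candidate, tries
    return None  # not found within the limit

first, second, collision_tries = find_collision()
match = find_second_preimage(b"original file")
print("collision after", collision_tries, "tries")
print("second preimage:", match)
```

    For MD5 the practical collision attacks are far cheaper than even the generic 2^64 bound, but that does not help an attacker who has to match a digest someone else already published.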

  • https://github.com/rurban/smhasher/ benchmarks hashes on quality and speed. The Summary section is good. It all depends on what you use hashing for.

  • emg Veteran
    edited September 2022

    This is based on 15-year-old research. I remember when the first paper came out and the wonderful PDF collision example: they published two PDF files of recommendation letters. Both PDF files had the same hash, which anyone could check for themselves. One letter was a glowing recommendation. The other letter said something like "Don't hire this person under any circumstances."

  • Everyone should be suspicious of any 15+ year old document that suddenly turns up.

  • Related: this tool packs up to 4KB of compressed shellcode into an executable binary, near-instantly. The output file will always have the same MD5 hash: 3cebbe60d91ce760409bbe513593e401

    https://github.com/DavidBuchanan314/monomorph
