Instant MD5 collisions
Can instantly generate MD5 collisions for a wide variety of common file formats:
https://github.com/corkami/collisions
If you were still using MD5, you definitely shouldn't be any more!
Comments
Mentally strong people use SHA256.
I thought this was possible a decade ago? https://eprint.iacr.org/2013/170.pdf
Mentally strong people use a non-Merkle-Damgård construction such as SHA3 or BLAKE2.
BLAKE3 all the way.
https://github.com/BLAKE3-team/BLAKE3
It was possible, but it'd take hours or days to find a collision. This new tool will modify documents to match an arbitrary hash in seconds.
Mentally strong people use "wc -l" as a hash function.
Honestly, I'd be a bit careful before using BLAKE3, they reduced(!) the number of rounds based on the assumption that crypto is too strong; and I'm not sure what kind of security guarantees are provided by the Merkle-tree based construction.
Thanks for the heads-up. In most environments I still use SHA256/SHA384 anyway.
I always preferred BLAKE's 7.
This is very uncommon. Why 384 instead of 512?
SHA-384 is computed like SHA-512 (with different initial values) and the output truncated to 384 bits. Because the digest doesn't expose the hash's full internal state, it is resilient to length-extension attacks without other steps being taken in that regard. Similarly, SHA-224 is a truncated variant of SHA-256.
Some other hashes are resilient in this manner in all variants, SHA3 for one.
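A quick stdlib-only sketch of the point above: SHA-384's 48-byte digest is shorter than the 64-byte state SHA-512 carries internally, so the digest alone isn't enough to resume the compression function. (It's also not a literal prefix of the SHA-512 digest, since the initial values differ.)

```python
import hashlib

msg = b"hello world"
h512 = hashlib.sha512(msg).hexdigest()
h384 = hashlib.sha384(msg).hexdigest()

print(len(h512) // 2)  # 64 bytes: SHA-512 exposes its full final state
print(len(h384) // 2)  # 48 bytes: SHA-384 withholds 16 bytes of state

# SHA-384 starts from different initial values, so it is not simply the
# first 48 bytes of the SHA-512 digest of the same message:
print(h384 == h512[:96])  # False
```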
As a sidenote: if you are using hashes simply for validation (i.e. to check for bit-rot in backups) rather than to defend against corruption by an active attacker, then look at xxHash. It is far, far faster and no less safe in situations where there isn't an active attacker (but it absolutely does not give any guarantees against active attack).
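A rough timing sketch of the speed gap between a cheap checksum and cryptographic hashes. This uses only the stdlib (xxHash itself is a third-party package, so crc32 stands in for the "fast, non-cryptographic" side here); absolute numbers depend on your machine.

```python
import hashlib
import timeit
import zlib

data = b"\x00" * (1 << 20)  # 1 MiB buffer

for name, fn in [
    ("crc32", lambda: zlib.crc32(data)),
    ("md5", lambda: hashlib.md5(data).digest()),
    ("sha256", lambda: hashlib.sha256(data).digest()),
]:
    # Average over 20 runs; crc32 is typically well ahead of the SHA family,
    # and xxHash (not shown) is faster still on large buffers.
    t = timeit.timeit(fn, number=20)
    print(f"{name:>7}: {t * 1000 / 20:6.2f} ms per MiB")
```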
Another approach that I like is using two different hashes from different families. e.g. if both SHA-1 and MD5 hash match you can be pretty confident that it's the file you expect.
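A minimal sketch of that belt-and-braces check (the helper name is made up for illustration): an attacker would need a simultaneous collision in both functions, not just one.

```python
import hashlib

def double_digest(data: bytes) -> tuple[str, str]:
    """Return (md5, sha1) hex digests; a forgery has to match both."""
    return hashlib.md5(data).hexdigest(), hashlib.sha1(data).hexdigest()

expected = double_digest(b"the file you expect")
candidate = double_digest(b"the file you expect")
print(candidate == expected)  # True only if both digests match
```

Worth noting as a caveat: for Merkle-Damgård hashes, multicollision results suggest concatenating two digests is not as strong as their combined length implies, so this raises the bar rather than guaranteeing safety.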
This paper was the breakthrough that led to finding SHA-1 collisions (all collisions so far use the XOR of rounds 5-20):
https://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2011/PHD/PHD-2011-08.pdf
Nice read on xxHash. I thought crc32c was fast. Time for an update...
These attacks have been known for over 10 years; this is not new. Use SHA256 if you need a cryptographic hash, or xxHash for other use cases.
xxHash is great and it's used everywhere. It's in the Linux kernel, and a bunch of databases and file transfer apps use it for checksumming.
xxHash64 is the most commonly used variant. For new development, consider XXH3 (64-bit) or XXH128 (128-bit) which are the latest versions of xxHash and improve performance especially for small data.
There's some performance comparisons on the author's site: https://cyan4973.github.io/xxHash/
help, my crypto is too strong
said nobody ever
Correct me if I'm mistaken, but I think that MD5 is still resistant to second-preimage attacks, which is relevant insofar as the main use of MD5 nowadays is for verifying file integrity
More concretely: if one offers a given file f for download together with its MD5 sum, it wouldn't be feasible (or it would be very, very difficult) for a manipulator to create another file f' with the same MD5 sum as file f.
Naturally, one should prefer stronger hash functions nowadays, but the use of MD5 for verifying the file integrity of files out in the wild is still a reasonable use
(Or am I mistaken?)
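To make the collision/second-preimage distinction concrete, here's a toy sketch of what a second-preimage search looks like, run against a deliberately weakened 16-bit slice of MD5. Full 128-bit MD5 remains infeasible to attack this way, which is the point above.

```python
import hashlib
import itertools

def toy_hash(data: bytes) -> bytes:
    # First 2 bytes of MD5: a 16-bit toy hash, trivially breakable.
    return hashlib.md5(data).digest()[:2]

original = b"file offered for download"
target = toy_hash(original)

# Brute-force a second preimage: a *different* input with the same hash.
# Expected work here is about 2**16 tries; for real MD5 it would be 2**128.
for i in itertools.count():
    forged = b"forged-%d" % i
    if forged != original and toy_hash(forged) == target:
        break

print(forged, "matches the 16-bit toy hash of the original")
```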
Not true. The US government said it, and forced companies to weaken their cryptographic libraries for exported products back in the 1980s and 1990s. The US export licensing regime for products with strong crypto was challenging, and the process was time consuming and painful. Among other things, you had to pass cryptographic tests to prove you were using standard algorithms. You also had to provide a copy of your cryptographic source code and have your technical people available to discuss your design and implementation with the NSA. The pain is similar to going through a FIPS 140 certification for those who know.
Most people here are too young to remember when manufacturers were forced to weaken the cryptographic algorithms that were included when you bought a new PC or downloaded a browser. For symmetric key ciphers, the maximum key size for unlicensed export was 40 bits, so everybody's computers and browsers had weak ciphers.
If you lived in the US or another approved country (or not!), the first thing that savvy people did was download the strong cryptographic library. The strong cryptographic library included additional ciphers such as 56-bit DES, 168-bit Triple-DES, and other ciphers with longer-than-40-bit key lengths. You got 40-bit RC4 with the no-license library. 128-bit RC4 was added with the strong library.
If you were a country, company, or person to which cryptographic exports were limited (say, North Korea), you could obtain the strong cryptographic library easily enough. It was technically illegal under US export regulations.
Microsoft had a special signing key for the cryptographic libraries. A cryptographic library would not install unless it was signed with Microsoft's special key. The goal was to prevent people from installing unauthorized, custom cryptographic libraries.
There was a flaw in the logic. Someone noticed a second signing key for cryptographic libraries. Microsoft accidentally released a version of Windows for developers that included debug symbols with programmers' variable names. That second cryptographic library key was called "NSAKEY". Yes, that NSA.
There was a lot of speculation about the purpose of the NSAKEY signing key. Some speculated that it let the NSA create backdoors with faulty cryptographic libraries, installed by hacking the computers of targets of the NSA and other government agencies. (It is my personal opinion that NSAKEY lets the NSA create cryptographic libraries with classified algorithms for government-exclusive use, but that is just an opinion. In case you're wondering, nobody has found a cryptographic library signed by NSAKEY ... yet.)
The presence of the second cryptographic library signing key (NSAKEY) meant that everybody could install their own custom cryptographic libraries. All they had to do was replace the NSAKEY with their own signing key. Because they substituted only the second key, they could still use Microsoft's standard cryptographic library signed with the first key, so regular applications would continue to work as expected. Duh!!
Oh yeah, I completely forgot about this! I remember seeing something about this in old browsers back when I first started using the Web.
I think you are mistaken. Essentially this works by using the equivalent of comments in the file formats it supports: data can be inserted that drives the MD5 state to a known intermediate value, and then the rest of the content is appended so that the resulting MD5 is the same. I think the fake file needs to be smaller than the original, so padding can be added along with the extra data so that the resulting size matches as well as the hash. In fact, matching the size might even be a requirement, because the last block includes the file length; if it differed, the problem would become harder.
Apparently, the authors of the paper think so:
https://eprint.iacr.org/2019/1492.pdf
Which was included in Blake3:
https://raw.githubusercontent.com/BLAKE3-team/BLAKE3-specs/master/blake3.pdf
Okay, but let me ask: in the README file on the GitHub page that you cite, the authors write (under Status):
What case are the authors referring to here if not to a second-preimage attack, as I described above? (Just wondering)
lol bruh
Re-reading this, it looks like I'm wrong and this attack generates two modified files that have the same hash. I misunderstood and thought that only one file was modified so that somewhere in it the intermediate hash state matched the original, and then the rest of the original could be copied to the end so that the rest of the file also matched the MD5.
Right. It's still about constructing -- in one way or another -- two files to have the same MD5 sum: in other words, collision attacks
https://github.com/rurban/smhasher/ benchmarks hashes on quality and speed. The Summary section is good. It all depends on what you use hashing for.
This is based on 15 year old research. I remember when the first paper came out and the wonderful PDF collision example: They published two PDF files of recommendation letters. Both PDF files had the same hash that anyone could check for themselves. One letter was a glowing recommendation. The other letter said something like "Don't hire this person under any circumstances."
Everyone should be suspicious of any 15+ year old document that suddenly turns up.
Monomorph packs up to 4KB of compressed shellcode into an executable binary, near-instantly. The output file will always have the same MD5 hash: 3cebbe60d91ce760409bbe513593e401
https://github.com/DavidBuchanan314/monomorph