It’s using multicollision attacks against MD5, combined with a sequence of groups of alternative data blocks. Each group holds one block per letter of the alphabet, encoded in a way that’s compatible with the file format.
It’s basically <[a+random]/[b+random]/[c+random]…> * (length of message). The random-looking data is crafted by the attack tool so that every alternative at a given position leaves MD5 in exactly the same state once it has been processed. You need to decide in advance how many variable blocks you need, where they go, and how they’re encoded. You encode the blocks so the random data isn’t visible in the final rendered file.
When you have that prepped, you compute the final hash, which comes out the same whichever alternatives you pick. Then at each block position you select the block representing the letter you want (along with its associated random data), so the letters shown spell out the file’s actual hash value.
It only works against hash functions with practical multicollision attacks. It doesn’t work on SHA-256 or newer hashes.
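Tl;dr: modern hash algorithms process data in fixed-size blocks. MD5 takes 512 bits (64 bytes) at a time and carries a 128-bit internal state from one block to the next.
The core function in a hash is a little scrambler function (the compression function) that takes two different inputs, the current state and one message block, and gives you a single output back.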
So it starts with a fixed value built into the algorithm, and then scrambles the first block of the message with it. Then it takes that scrambled piece and mixes that with the next block of the message, then takes THAT scrambled piece and mixes it with the next block. And so on until the end of the message. The last scrambled piece is the hash value.
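To make the chaining concrete, here is a minimal sketch of that loop. The `toy_compress` and `toy_hash` names are made up for illustration, the scrambler is a truncated SHA-256 call standing in for MD5's real round function, and the padding is simplified (real MD5 also appends the message length). Only the block and state sizes are taken from MD5 itself.

```python
import hashlib

BLOCK_SIZE = 64   # bytes: MD5 consumes 512-bit blocks
STATE_SIZE = 16   # bytes: MD5's chaining value and digest are 128 bits

def toy_compress(state: bytes, block: bytes) -> bytes:
    """Stand-in scrambler: mixes the current state with one block.
    This is NOT MD5's compression function, it only has the same shape."""
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]

def toy_hash(message: bytes, iv: bytes = bytes(STATE_SIZE)) -> bytes:
    # Simplified padding (assumption): zero-fill the last block.
    if len(message) % BLOCK_SIZE:
        message += bytes(BLOCK_SIZE - len(message) % BLOCK_SIZE)
    state = iv  # fixed starting value built into the algorithm
    for i in range(0, len(message), BLOCK_SIZE):
        state = toy_compress(state, message[i:i + BLOCK_SIZE])
    return state  # the last scrambled piece is the hash value

# The real MD5 in hashlib reports the same sizes.
md5 = hashlib.md5()
print(md5.block_size, md5.digest_size)  # 64 16
```

The point to take away is that the state after block N depends only on blocks 1 to N, which is what the collision tricks below rely on.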
Collision attacks target that core function by figuring out how to tweak multiple messages so that their scrambler outputs “collide”, ending up equal. So you can hash two tweaked messages and get the same hash value. To work, these tweaks usually need a bunch of random-looking bits.
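A rough illustration of what a collision buys you, under a big assumption: the state here is cut down to 16 bits so plain brute force finds a collision in a few hundred tries. Real MD5 collisions come from cryptanalysis, not brute force, and `compress`, `toy_hash` and `find_collision` are hypothetical helpers, not part of any attack tool.

```python
import hashlib
from itertools import count

STATE_BYTES = 2  # 16-bit toy state (assumption), small enough to brute-force

def compress(state: bytes, block: bytes) -> bytes:
    """Toy scrambler, NOT MD5: truncated SHA-256 of state + block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_hash(blocks, iv=bytes(STATE_BYTES)):
    state = iv
    for block in blocks:
        state = compress(state, block)
    return state

def find_collision(state: bytes, text_a: bytes, text_b: bytes):
    """Search for garbage so that text_a+garbage and text_b+garbage
    scramble to the same output when starting from `state`."""
    seen = {}  # scrambler output -> block that starts with text_a
    for n in count():
        garbage = n.to_bytes(4, "big")
        seen.setdefault(compress(state, text_a + garbage), text_a + garbage)
        out_b = compress(state, text_b + garbage)
        if out_b in seen:
            return seen[out_b], text_b + garbage

block_a, block_b = find_collision(bytes(STATE_BYTES), b"a", b"b")
suffix = [b"the rest of the file"]

# Different blocks, same state afterwards, so any shared suffix keeps the hashes equal.
print(block_a != block_b, toy_hash([block_a] + suffix) == toy_hash([block_b] + suffix))
```

Once the two blocks leave the same state behind, everything appended after them is processed identically, which is why the colliding files can share the rest of their content.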
Then for a multicollision we don’t just do it for two messages. We do it for every letter in the alphabet. For an HTML document we encode something like <div hidden garbage=xyz>a</div> and repeat for every letter. Every letter gets a distinct random-looking value. Then we have many documents with the same hash that differ in one letter. We can show you a hash and then pick which letter to present you with in the document. All of them check out.
But then we repeat the attack. We add another whole alphabet right after the first one! Now we have <div hidden_garbage=xyz>a</div> <div hidden_garbage_2=xyz>a</div>. And because the second letter is in a different block, that works just fine! Adding a second letter doesn’t change the first intermediate value, and you can attack the second intermediate value for the second letter separately. So you add the whole alphabet again (with new associated calculated garbage for every letter in the second position), and now after the second letter we have a new intermediate value which is the same regardless of which letter we pick in the second position.
So now we can independently pick any letter in the first position and in the second position too! Every combination of two letters has the same hash because of the hidden calculated garbage after each letter!
Then we just repeat the multicollision attack on the whole alphabet over and over until your document is long enough to encode your message. And that message may include the document’s own hash.
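Here is the whole trick end to end on a toy scale, as a sketch only. Assumptions: the "hash" is a 12-bit truncated SHA-256 chain (3 hex digits) so the per-letter garbage can be brute-forced, the alphabet is the 16 hex digits, and the header, footer and block encoding are invented. The real thing uses MD5 collision tooling and format-specific encoding instead.

```python
import hashlib
from itertools import count, product

DIGITS = 3                     # toy digest: 3 hex digits = 12 bits (assumption)
ALPHABET = "0123456789abcdef"  # letters we want to be able to display

def compress(state: str, block: bytes) -> str:
    """Toy scrambler (NOT MD5): state and block in, new 12-bit state out."""
    return hashlib.sha256(state.encode() + block).hexdigest()[:DIGITS]

def toy_hash(blocks, iv="000"):
    state = iv
    for block in blocks:
        state = compress(state, block)
    return state

def build_position(state):
    """Give every letter its own garbage so all letters lead to one next state."""
    first = ALPHABET[0].encode() + b"|0"
    target = compress(state, first)
    table = {ALPHABET[0]: first}
    for letter in ALPHABET[1:]:
        for n in count():  # brute force, only viable because the state is tiny
            block = letter.encode() + b"|" + str(n).encode()
            if compress(state, block) == target:
                table[letter] = block
                break
    return table, target

HEADER, FOOTER = b"<header>", b"<footer>"
state = compress("000", HEADER)
tables = []
for _ in range(DIGITS):        # one full alphabet per displayed letter
    table, state = build_position(state)
    tables.append(table)
digest = compress(state, FOOTER)  # fixed before any letter is chosen

def assemble(letters):
    return [HEADER] + [tables[i][ch] for i, ch in enumerate(letters)] + [FOOTER]

# Every one of the 16**3 letter combinations hashes to the same value.
assert all(toy_hash(assemble(c)) == digest for c in product(ALPHABET, repeat=DIGITS))

document = assemble(digest)    # pick the letters that spell the digest itself
shown = "".join(chr(block[0]) for block in document[1:-1])
print(shown, toy_hash(document))  # the two values match
```

Note the order of operations: the digest is known before any letter is picked, because no choice of letters can change it. Selecting the letters afterwards is just a table lookup.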
Okay first of all this message is really nicely written to explain multi collision attacks! (I knew some stuff about hashing and collision attacks before but not about multi collision and why that would be really useful here.)
However, I first thought they were looking for inputs which basically preserve a known state and then generating an alphabet with those kinds of blocks (basically have one for each symbol and up to n additional blocks to “reset” the state to the known value) because that could shrink the size of stored blocks by a lot (I’d imagine).
But now I am wondering if that’s even possible currently (even with an algorithm as “broken” as MD5 has become now)?
That would be a second pre-image attack: you’re targeting an existing state (attacking the hash value of existing data by creating a second file that matches it). Even with MD5 that’s still infeasible. Collision attacks are different: you don’t have a target output value, only partial target inputs that need to end up with the same output hash, and those are practical and fast.
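Some back-of-the-envelope numbers for why the two problems are so far apart, using only the generic bounds for a 128-bit digest: about 2^64 work for a birthday collision versus about 2^128 for hitting a fixed target. The hash rate below is a made-up round figure. The practical MD5 collision attacks are much cheaper than even the generic 2^64, because they exploit weaknesses in the compression function, while no comparably practical shortcut is known for second pre-images.

```python
DIGEST_BITS = 128  # MD5 output size

generic_collision = 2 ** (DIGEST_BITS // 2)   # birthday bound: any two inputs that match
generic_second_preimage = 2 ** DIGEST_BITS    # brute force against one fixed target

RATE = 10 ** 12                # assumed hashes per second (hypothetical figure)
SECONDS_PER_YEAR = 365 * 24 * 3600

collision_years = generic_collision / RATE / SECONDS_PER_YEAR
preimage_years = generic_second_preimage / RATE / SECONDS_PER_YEAR

print(f"{collision_years:.2f} years")   # roughly 0.58
print(f"{preimage_years:.2e} years")    # roughly 1.08e+19
```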
You can, but you need to define which part of the data the signature covers (a signature can’t sign itself, so it must be excluded from the data bundle). Signed PDF files have the signature appended after the document data.
Exactly. And even though there are message start and end markers it’s not quite clear at which pixel the signed image starts and ends. Also the image format that is signed is not defined.
Has anyone confirmed that signature? I think it’s not possible to have the signature as a part of the data itself. Kinda chicken egg problem
Here you go:
https://www.bleepingcomputer.com/news/security/this-image-shows-its-own-md5-checksum-and-its-kind-of-a-big-deal/
(MD5 is not PGP, but impressive nonetheless)
I opened the comment section to ask if it was possible to have an image with its own hash.
Thanks.
I know some of these words. But I think I roughly understood the general idea. Thanks!
md5 has been broken for years, but that’s pretty damn cool/scary.
Yeah, that’s only due to MD5 hash collisions though. That wouldn’t work on SHA-256, for example.
*whispers* I stole that signature from cryptostorms warrant canary: https://cryptostorm.is/canary.txt
You fraud.
Hold on I gotta pgp sign my PGP sign so my pgp is signed and I know who it came from.
It might be possible to keep signing with a different key until it matches. But I assume the signature is of the above text.
I mean if you’re prepared to do it 2^128 times in a row…
Or at once if we have a big enough quantum computer.
yea would be interesting. but im also too lazy to type all that text in by hand to verify
Here:
iQIzBAEBCgAdFiEETYf5hKIig5JX/jalu9uZGunHyUIFAmaB8YEACgkQu9uZGunH yUKi7Q/+OJPzHWfGPtzk53KnMJ3C8KQGEUCzKkSKmE0ugdI 9h1Lj4SkvHpKWECK Y1GxNujMPRM/aAS2M97AEbtYolenWzgYm01wt131/hEG4tk+iYeB2Sfyvngbg5KI y4D7mapcVWYSf6S13vUX8VuyKeTxK6xdkp95E0wPVLfJwx505nHOnjLXxeW0IblY URLonem/yuBrJ6Ny3XX9+sKRKcdI9tOghMhTxPcQySXcTx1pAG7YE7G5UqTbJxis wy7LbYZB5Yy0F03CtRIkA+cclG4y2RMM9M9buHzXTWCyDuoQao68yEVh40dqwH1U 5AUnqdve5SiwygF/vc50Ila6VjJ4hyz1qVQnjqqD96p7CSVzVudLDDZMQZ8WvgLh gaEr51xJvH6p6/CP1ji4HHucbJf6BhtSqc8ID9KFfaXxjfZHiUtgsVDYMV0e7u9v 1hcDH/3kmw/JImX25qsEsBeQyzOJsBvx0YD31ZIwSY9+7KNGVQstFrEvCuVPHr72 BQJPIhg3+9g6m36+9Uhs1N6b8G9DsZ60gnNqr9dGturUg6CtRsLSpqoZq0ET9cLA tnFTJDaXgx1DZnsLGDSoQQYjZ3vS+YYZ8jG86KGLEyXVK+uSssvorm9YR1/GGOy7 suaxro72An+MxCczF5TIR9n3gisKvcwa8ZbdoaGd9cigyzWlYg8= =EgZm