Maybe, depending on the algorithm used. Some are designed to produce the same output given similar inputs.
It’s also easy to abuse systems like that in order to get someone falsely flagged, by generating a file with the same checksum as known CSAM.
It’s also easy for someone in power (or with the right access) to add checksums of anything they don’t like, such as documents associated with opposing political or religious views.
One-way math doesn’t preclude finding a collision.
(And just to be clear, checksum in the context of this conversation is a generic term that includes cryptographic hashes and perceptual hashes.)
Also, since we’re talking about a list of checksums, an attacker wouldn’t even have to find a collision with a specific one to get someone in trouble. This makes an attack far easier. See also: the birthday problem.
Checksums, on the other hand, are designed to minimize the probability of collisions between similar inputs, without regard for collisions between very different inputs.[8] Instances where bad actors attempt to create or find hash collisions are known as collision attacks.[9]
Maybe, depending on the algorithm used. Some are designed to produce the same output given similar inputs.
It’s also easy to abuse systems like that in order to get someone falsely flagged, by generating a file with the same checksum as known CSAM.
It’s also easy for someone in power (or with the right access) to add checksums of anything they don’t like, such as documents associated with opposing political or religious views.
In other words, still invasive and dangerous.
More thoughts here: https://www.eff.org/deeplinks/2019/11/why-adding-client-side-scanning-breaks-end-end-encryption
Checksums wouldnt work well for their purposes if they could easily be made to match any desired checksum. It’s one way math.
One-way math doesn’t preclude finding a collision.
(And just to be clear, checksum in the context of this conversation is a generic term that includes cryptographic hashes and perceptual hashes.)
Also, since we’re talking about a list of checksums, an attacker wouldn’t even have to find a collision with a specific one to get someone in trouble. This makes an attack far easier. See also: the birthday problem.
Checksums, on the other hand, are designed to minimize the probability of collisions between similar inputs, without regard for collisions between very different inputs.[8] Instances where bad actors attempt to create or find hash collisions are known as collision attacks.[9]
https://en.m.wikipedia.org/wiki/Hash_collision#:~:text=Checksums%2C on the other hand,are known as collision attacks.