- cross-posted to:
- technology@lemmy.world
Today, a prominent child safety organization, Thorn, in partnership with a leading cloud-based AI solutions provider, Hive, announced the release of an AI model designed to flag unknown CSAM at upload. It’s the earliest AI technology striving to expose unreported CSAM at scale.
I'm a bit confused about how it's legal for them to have the training data here.
Like, is there anything a corpo can't do?
Like, why can't Subway Jared and the Catholic Church "train the AI"?
Only halfway joking, but what's the catch here?
There are laws around it. Law enforcement doesn’t just delete any digital CSAM they seize.
Known CSAM is archived and analyzed rather than destroyed, and is used to recognize additional instances of the same files in the wild, wherever file scanning is possible.
Institutions and corporations can request licenses to access the database, or just the metadata that lets software tell whether a given file might be a copy of known CSAM.
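As a rough illustration of that kind of file matching, here's a minimal Python sketch. The hash set is made up for the example, and real deployments use perceptual hashes (e.g. PhotoDNA) rather than plain SHA-256 so that resized or re-encoded copies still match:

```python
import hashlib

# Hypothetical hash list standing in for the licensed database metadata.
KNOWN_HASHES = {
    "3a7bd3e2360a3d29eea436fcfb7e44c735d117c42d1c1835420b6b9942dd4f1b",
}

def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def is_known(path: str) -> bool:
    """True if the file's digest appears in the licensed hash list."""
    return file_digest(path) in KNOWN_HASHES
```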
This is the first time the database is being used to create software able to recognize CSAM that isn't already known.
I’m personally quite sceptical of the merit. It may well be useful for scanning the public internet, but I’m guessing the plan is to push for it to be somehow implemented for private communication, no matter how badly that compromises the integrity of encryption.
I don’t think you even need the actual stuff to train a neural network to recognize it. For example, if I wanted to train a neural network to recognize pictures of lions, but I didn’t have any actual pictures of lions, I could use pictures of lion-shaped things, lion-colored things and locations where lions might appear. If a picture is hitting all three of those, it’s very likely to be a lion. Very likely is all a neural network can do, so it’s good enough for my purposes.
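To make the "hitting all three" idea concrete, here's a toy sketch. The three probabilities are assumed outputs of separately trained shape, colour and location detectors, not anything from the actual model:

```python
# Toy version of the "lion by proxy" idea: combine weak signals from
# separate detectors instead of training on pictures of the real thing.
def looks_like_lion(shape_p: float, color_p: float, habitat_p: float,
                    threshold: float = 0.7) -> bool:
    # A picture scoring high on all three proxies is "very likely" a lion,
    # which is all a classifier like this ever claims.
    return (shape_p + color_p + habitat_p) / 3 >= threshold

print(looks_like_lion(0.9, 0.85, 0.8))  # True: hits all three signals
print(looks_like_lion(0.2, 0.9, 0.1))   # False: only lion-coloured
```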