A.I.’s un-learning problem: Researchers say it’s virtually impossible to make an A.I. model ‘forget’ the things it learns from private user data

assassin_aragorn@lemmy.world · 1 year ago

FaceDeer@kbin.social · 1 year ago

It’s more like the law is saying you must draw seven red lines, all of them strictly perpendicular, some with green ink and some with transparent ink.

It’s not “virtually” impossible, it’s literally impossible. If the law requires that it be possible then it’s the law that must change. Otherwise it’s simply a more complicated way of banning AI entirely, which means that some other jurisdiction will become the world leader in such things.

Primarily0617@kbin.social · 1 year ago

ok i guess you don’t get to use private data in your models too bad so sad

why does the capitalistic urge to become “the world leader” in whatever technology-of-the-month is popular right now supersede a basic human right to privacy?

LittleLordLimerick@lemm.ee · 1 year ago

ok i guess you don’t get to use private data in your models too bad so sad

You seem to have an assumption that all AI models are intended for the sole benefit of corporations. What about medical models that can predict disease more accurately and more quickly than human doctors? Something like that could be hugely beneficial for society as a whole. Do you think we should just not do it because someone doesn’t like that their data was used to train the model?

Primarily0617@kbin.social · 1 year ago

You seem to have an assumption that all AI models are intended for the sole benefit of corporations.

You seem to have the assumption that they’re not. And that “helping society” is anything more than a happy accident that results from “making big profits”.

What about medical models

A pretty big “what if” when every single model that’s been tried for the purpose you suggest so far has either predicted based off the age of a medical imaging scan, or off the doctor’s signature in the corner of one.

Are you asking me whether it’s a good idea to give up the concept of “Privacy” in return for an image classifier that detects how much film grain there is in a given image?

LittleLordLimerick@lemm.ee · 1 year ago

You seem to have the assumption that they’re not. And that “helping society” is anything more than a happy accident that results from “making big profits”.

It’s not an assumption. There are academic researchers at universities working on developing these kinds of models as we speak.

Are you asking me whether it’s a good idea to give up the concept of “Privacy” in return for an image classifier that detects how much film grain there is in a given image?

I’m not wasting time responding to straw men.

Primarily0617@kbin.social · edit-2 1 year ago

There are academic researchers at universities working on developing these kinds of models as we speak.

Where does the funding for these models come from? Why are they willing to fund those models? And in comparison, why does so little funding go towards research into how to make neural networks more privacy-compatible?

I’m not wasting time responding to straw men.

Please learn what a straw man argument is.
The technology you’re describing doesn’t exist, and likely won’t for a very long time, so all you’re doing is allowing data harvesting en masse in return for nothing. Your hypothetical would have more teeth if it was anywhere close to being anything but a hypothetical.

Ottomateeverything@lemmy.world · 1 year ago

It’s more like the law is saying you must draw seven red lines, all of them strictly perpendicular, some with green ink and some with transparent ink.

No, it’s more like the law is saying you have to draw seven red lines and you’re saying, “well I can’t do that with indigo, because indigo creates purple ink, therefore the law must change!” No, you just can’t use indigo. Find a different resource.

It’s not “virtually” impossible, it’s literally impossible. If the law requires that it be possible then it’s the law that must change.

There’s nothing that says AI has to exist in a form created from harvesting massive user data in a way that can’t be reversed or retracted. It’s not technically impossible to do that at all, we just haven’t done it because it’s inconvenient and more work.

The law sometimes makes things illegal because they should be illegal. It’s not like you run around saying we need to change murder laws because you can’t kill your annoying neighbor without going to prison.

Otherwise it’s simply a more complicated way of banning AI entirely

No it’s not, AI is way broader than this. There are tons of forms of AI besides forms that consume raw existing data. And there are ways you could harvest only data you could then “untrain”, it’s just more work.

Some things, like user privacy, are actually worth protecting.
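To make "harvest only data you could then untrain" concrete, here is a minimal sketch of one approach from the research literature, usually called sharded training (SISA). Everything in it is an illustrative assumption: the `ShardedModel` class, the modulo sharding rule, and the toy "sub-model" that is just the mean of its shard. The idea is that each record only ever influences one sub-model, so forgetting it means retraining that one shard instead of the whole thing.

```python
from statistics import mean


class ShardedModel:
    """Toy ensemble: one sub-model per shard (hypothetical, for illustration only).

    Each "sub-model" is simply the mean of the values in its shard, and the
    ensemble prediction is the mean of the trained sub-models.
    """

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]  # record_id -> value
        self.sub_models = [None] * num_shards

    def _shard_of(self, record_id):
        # Assumed sharding rule: a record always maps to the same shard.
        return record_id % len(self.shards)

    def _retrain(self, idx):
        # Retrain one sub-model from scratch using only its own shard.
        values = list(self.shards[idx].values())
        self.sub_models[idx] = mean(values) if values else None

    def add(self, record_id, value):
        idx = self._shard_of(record_id)
        self.shards[idx][record_id] = value
        self._retrain(idx)

    def forget(self, record_id):
        # Drop the record, then retrain only the shard that ever saw it.
        idx = self._shard_of(record_id)
        self.shards[idx].pop(record_id, None)
        self._retrain(idx)
        return idx

    def predict(self):
        trained = [m for m in self.sub_models if m is not None]
        return mean(trained)


model = ShardedModel(num_shards=2)
for record_id, value in [(0, 10.0), (1, 20.0), (2, 30.0)]:
    model.add(record_id, value)

retrained_shard = model.forget(2)  # only shard 0 is rebuilt
```

After `forget(2)` the ensemble is identical to one that never saw record 2, and shard 1 was never touched. The trade-off is the "more work" part: you have to design for this before training, and splitting data across shards can cost accuracy in a real model.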

LittleLordLimerick@lemm.ee · 1 year ago

There’s nothing that says AI has to exist in a form created from harvesting massive user data in a way that can’t be reversed or retracted. It’s not technically impossible to do that at all, we just haven’t done it because it’s inconvenient and more work.

What if you want to create a model that predicts, say, diseases or medical conditions? You have to train that on medical data or you can’t train it at all. There’s simply no way that such a model could be created without using private data. Are you suggesting that we simply not build models like that? What if they can save lives and massively reduce medical costs? Should we scrap a massively expensive and successful medical AI model just because one person whose data was used in training wants their data removed?

eltimablo@kbin.social · 1 year ago

I guarantee the person you’re arguing with would rather see people die than let an AI help them and be proven wrong.

Ottomateeverything@lemmy.world · edit-2 1 year ago

Well then you’d be wrong. What a fucking fried and delusional take. The fuck is wrong with you?

Ottomateeverything@lemmy.world · 1 year ago

This is an entirely different context - most of the talk here is about LLMs. Health data is entirely different: the regulations and legalities are different, people don’t publicly post their health data to begin with, and health data isn’t obtained without consent and already has tons of red tape around it. It would be much easier to obtain “well sourced” medical data than the broad swaths of stuff LLMs are sifting through.

But the point still stands - if you want to train a model on private data, there are different ways to do it.

SkyNTP@lemmy.ml · 1 year ago

At some point, you have to ask yourself if “being a world leader in ai” is worth everything you are sacrificing for it.

AFAIK, trading human creativity for AI art and ai poems is a shit trade. For a lot of reasons. But primarily because AI art is kind of boring.

As for military use of AI… You don’t need grandma’s cookie recipe, or to violate people’s humanity, to build it.

a4ng3l@lemmy.world · 1 year ago

Not all applications of AI and related tech are nefarious… I’m shopping for a solution to help my company classify its data and do data discovery. I really hope I find a solution - which will likely be based on AI - because the alternative is either we don’t do the activity or the guys that will do it will be miserable. No one should have to spend days looking at very old data stores and wonder what’s in them - and then be accountable for the classification.

Bogasse@lemmy.ml · edit-2 1 year ago

How is “don’t rely on content you have no right to use” literally impossible?

We teach children that there is a Google filter to include only the CC images (that they should use for their presentations).

Also it’s not like we are talking small companies here: a new billion-making industry is being born, and it could totally afford contracts with big platforms that would allow it to use their content.

stealthnerd@lemmy.world · 1 year ago

This is an article about unlearning data, not about not consuming it in the first place.

LLMs are not storing learned data in its raw, original form. They are ingesting it and building an understanding of language based on it.

Attempting to peel out that knowledge would be incredibly difficult, if not impossible, because there’s really no way to identify it.

LittleLordLimerick@lemm.ee · 1 year ago

How is “don’t rely on content you have no right to use” literally impossible?

At the time they used the data, they had a right to use it. The participants later revoked their consent for their data to be used, after the model was already trained at an enormous cost.

Bogasse@lemmy.ml · 1 year ago

I have to admit my comment is not really relevant to the article itself (also, I read only the free part of it).

It was more a reaction to the comment above, which felt more generic. My concern about LLMs is that I could never find an auditable list of websites that were crawled, which would be reasonable to ask for, I think.

rebelsimile@sh.itjust.works · 1 year ago

And the rest of the data Google has been viewing, cataloging and selling back to everyone for years, because they’re legally allowed to do so… you don’t see the irony in that?

Bogasse@lemmy.ml · edit-2 1 year ago

Are they selling back scraped content? I thought it was only user behaviors through the ad network?

About cataloging, at least it is opt-out through robots.txt 🤷

EDIT: plus, “we are already doing bad” is never a good argument to continue doing bad. If Google were to be at fault, this could get the traction to slap their ass
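For anyone wondering what the robots.txt opt-out looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler names (`ExampleAIBot`, `SomeSearchBot`), the example.com URL, and the `may_crawl` helper are made up for illustration. Note that robots.txt is only a convention: a crawler has to choose to check it, and nothing technically enforces the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one (made-up) AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network
    return parser.can_fetch(user_agent, url)


ai_bot_allowed = may_crawl(ROBOTS_TXT, "ExampleAIBot", "https://example.com/post/1")
search_bot_allowed = may_crawl(ROBOTS_TXT, "SomeSearchBot", "https://example.com/post/1")
```

With this file the made-up AI crawler is refused and any other crawler is allowed, which is the per-bot opt-out a site owner can express today.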

rebelsimile@sh.itjust.works · 1 year ago

Google crawls the internet, archives entire actual photos, large snippets (at least) from every website it sees, aggregates it into a different form and serves it back to people for profit. It’s the same business model, different results with the processing of the data.

bobettes_bob@kbin.social · 1 year ago

Google doesn’t sell the data they collect… They sell ads and use their data to better target people with said ads. Third parties are paying google to target their ads to the right people.

rebelsimile@sh.itjust.works · 1 year ago

You go to Google because of the data they collected from the open internet. People’s photos, articles they’ve written, books, etc. They aggregate it, process it and serve it back to you alongside ads. They also collect data about you and sell that as well. But no one would go to Google if they hadn’t aggregated, processed and repackaged the internet’s data.

bobettes_bob@kbin.social · edit-2 1 year ago

They also collect data about you and sell that as well.

No they don’t. Why would they sell the data they use to target ads? If other corporations could just buy the data, they wouldn’t need to pay Google to target the ads; they’d just buy the data and do it themselves. Google isn’t a data broker. They keep the data for themselves; it would be business suicide if they just sold all the data they collect.

eltimablo@kbin.social · 1 year ago

https://www.eff.org/deeplinks/2020/03/google-says-it-doesnt-sell-your-data-heres-how-company-shares-monetizes-and

BraveSirZaphod@kbin.social · 1 year ago

Because the question of what data one has the right to use is a very open legal question right now.

There is absolutely nothing illegal about a person examining publicly accessible artwork or text, learning from it, and attempting to reproduce a similar style. AIs are, in essence, doing basically the same thing. However, the sheer difference in time and scale may warrant a different legal treatment. That has not yet been settled, and it will probably take a fair amount of societal debate and new legislation before we have a definite answer.