OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series

L4sBot@lemmy.world · 2 years ago

fubo@lemmy.world · edited 2 years ago

If I memorize the text of Harry Potter, my brain does not thereby become a copyright infringement.

A copyright infringement only occurs if I then reproduce that text, e.g. by writing it down or reciting it in a public performance.

Training an LLM from a corpus that includes a piece of copyrighted material does not necessarily produce a work that is legally a derivative work of that copyrighted material. The copyright status of that LLM’s “brain” has not yet been adjudicated by any court anywhere.

If the developers have taken steps to ensure that the LLM cannot recite copyrighted material, that should count in their favor, not against them. Calling it “hiding” is backwards.

cantstopthesignal@sh.itjust.works · edited 2 years ago

You are a human, you are allowed to create derivative works under the law. Copyright law as it relates to machines regurgitating what humans have created is fundamentally different. Future legislation will have to address a lot of the nuance of this issue.

uis@lemmy.world · 2 years ago

And allowed to get sued anyway

UnculturedSwine@lemmy.world · 2 years ago

Another sensationalist title. The article makes it clear that the problem is users reconstructing large portions of a copyrighted work word for word. OpenAI is trying to implement a solution that prevents ChatGPT from regurgitating entire copyrighted works using “maliciously designed” prompts. OpenAI doesn’t hide the fact that these tools were trained using copyrighted works and legally it probably isn’t an issue.

Gyoza Power@discuss.tchncs.de · 2 years ago

Let’s not pretend that LLMs are like people where you’d read a bunch of books and draw inspiration from them. An LLM does not think nor does it have an actual creative process like we do. It should still be a breach of copyright.

efstajas@lemmy.world · 2 years ago

… you’re getting into philosophical territory here. The plain fact is that LLMs generate cohesive text that is original and doesn’t occur in their training sets, and it’s very hard if not impossible to get them to quote back copyrighted source material to you verbatim. Whether you want to call that “creativity” or not is up to you, but it certainly seems to disqualify the notion that LLMs commit copyright infringement.

Snorf@reddthat.com · edited 2 years ago

This topic is fascinating.

I really do think I understand both sides here and want to find the hard line that separates man from machine.

But it feels, to me, that some philosophical discussion may be required. Art is not something that is just manufactured. “Created” is the word to use without quotation marks. Or maybe not, I don’t know…

Gyoza Power@discuss.tchncs.de · 2 years ago

I wasn’t referring to whether the LLM commits copyright infringement when creating a text (though that’s an interesting topic as well), but rather the act of feeding it the texts. My point was that it is not like us in a sense that we read and draw inspiration from it. It’s just taking texts and digesting them. And also, from a privacy standpoint, I feel kind of disgusted at the thought of LLMs having used comments such as these ones (not exactly these, but you get it), for this purpose as well, without any sort of permission on our part.

That’s mainly my issue, the fact that they have done so the usual capitalistic way: it’s easier to ask for forgiveness than to ask for permission.

RedKrieg@lemmy.redkrieg.com · 2 years ago

I think you’re putting too much faith in humans here. As best we can tell the only difference between how we compute and what these models do is scale and complexity. Your brain often lies to you and makes up reasoning behind your actions after the fact. We’re just complex networks doing math.

Schadrach@lemmy.sdf.org · 2 years ago

but rather the act of feeding it the texts.

Unless you are going to argue the act of feeding it the texts is distributing the original text or doing some kind of public performance of the text, I don’t see how.

CrustyCrinkles@sh.itjust.works · 2 years ago

*could

StrongFox@lemmy.world · 2 years ago

you bought the book to memorize from, anyway.

Agent641@lemmy.world · 2 years ago

No, I shoplifted it from an Aldi

Blapoo@lemmy.ml · 2 years ago

We have to distinguish between LLMs

Trained on copyrighted material and
Outputting copyrighted material

They are not one and the same

Tetsuo@jlai.lu · 2 years ago

Output from an AI has just been recently considered as not copyrightable.

I think it stemmed from the actors strikes recently.

It was stated that only work originating from a human can be copyrighted.

Anders429@lemmy.world · 2 years ago

Output from an AI has just been recently considered as not copyrightable.

Where can I read more about this? I’ve seen it mentioned a few times, but never with any links.

Even_Adder@lemmy.dbzer0.com · 2 years ago

They clearly only read the headline. If they’re talking about the ruling that came out this week, that whole thing was about trying to give an AI authorship of a work generated solely by a machine and having the copyright go to the owner of the machine through the work-for-hire doctrine. So an AI itself can’t be an author or hold a copyright, but humans using one can still be copyright holders of any qualifying works.

TwilightVulpine@lemmy.world · 2 years ago

Should we distinguish it though? Why shouldn’t (and didn’t) artists have a say if their art is used to train LLMs? Just like publicly displayed art doesn’t provide a permission to copy it and use it in other unspecified purposes, it would be reasonable that the same would apply to AI training.

Blapoo@lemmy.ml · 2 years ago

Ah, but that’s the thing. Training isn’t copying. It’s pattern recognition. If you train a model “The dog says woof” and then ask a model “What does the dog say”, it’s not guaranteed to say “woof”.

Similarly, just because a model was trained on Harry Potter, all that means is it has a good corpus of how the sentences in that book go.

Thus the distinction. Can I train on a comment section discussing the book?

Even_Adder@lemmy.dbzer0.com · 2 years ago

Yeah, this headline is trying to make it seem like training on copyrighted material is or should be wrong.

scv@discuss.online · 2 years ago

Legally the output of the training could be considered a derived work. We treat brains differently here, that’s all.

I think the current intellectual property system makes no sense and AI is revealing that fact.

Skanky@lemmy.world · 2 years ago

Vanilla Ice had it right all along. Nobody gives a shit about copyright until big money is involved.

uis@lemmy.world · 2 years ago

Yep. Legally every word is copyrighted. Yes, law is THAT stupid.

UsernameIsTooLon@lemmy.world · 2 years ago

People think it’s a broken system, but it actually works exactly how the rich want it to work.

stappern@lemmy.one · 2 years ago

So did I so what? Is my brain property of Warner now?

Asuka@sh.itjust.works · 2 years ago

If I read Harry Potter and wrote a novel of my own, no doubt ideas from it could consciously or subconsciously influence it and be incorporated into it. Hey is that any different from what an LLM does?

stappern@lemmy.one · 2 years ago

Not what happened in this case tho.

newIdentity@sh.itjust.works · 2 years ago

Your brain isn’t an AI model

OR IS IT?

TwilightVulpine@lemmy.world · 2 years ago

You joke but AI advocates seem to forget that people have fundamentally different rights than tools and objects. A photocopier doesn’t get the right to “memorize” and “learn” from a text that a human being does. As much as people may argue that AIs work different, AIs are still not people.

And if they ever become people, the situation will be much more complicated than whether they can imitate some writer. But we aren’t there yet; even their advocates just use them as tools.

kmkz_ninja@lemmy.world · 2 years ago

How do you see that as a difference? Tools are extensions of ourselves.

Restricting the use of LLMs is only restricting people.

TwilightVulpine@lemmy.world · 2 years ago

When we get to the realm of automation and AI, calling tools just an “extension of ourselves” doesn’t make sense.

Especially not when the people being “extended” by Machine Learning models did not want to be “extended” to begin with.

wmassingham@lemmy.world · edited 2 years ago

They can own it, actually. If you use the characters of Bugs Bunny, etc., or the setting (do they have a canonical setting?) then Warner does own the rights to the material you’re using.

For example, see how the original Winnie the Pooh material just entered public domain, but the subsequent Disney versions have not. You can use the original stuff (see the recent horror movie for an example of legal use) but not the later material like Tigger or Pooh in a red shirt.

Now if your work is satire or parody, then you can argue that it’s fair use. But generally, most companies don’t care about fan fiction because it doesn’t compete with their sales. If you publish your Harry Potter fan fiction on Livejournal, it wouldn’t be worth the money to pay the lawyers to take it down. But if you publish your Larry Cotter and the Wizard’s Rock story on Amazon, they’ll take it down because now it’s a competing product.

Sethayy@sh.itjust.works · 2 years ago

I think it’s more like writing a Looney Tunes fanfic based only on pirated material

stappern@lemmy.one · 2 years ago

How are you gonna prove that I watched it on tv or torrented?

Sethayy@sh.itjust.works · 2 years ago

Can’t, but they’re pretty open about how they trained the model, so it’s almost an admission of guilt (though they weren’t hosting the pirated content, it’s still out there and would have been trained on). Because unless they trained it on a paid Netflix account, there’s no way to get it legally.

Idk where this lands legally, but I’d assume not in their favour

CoderKat@lemm.ee · edited 2 years ago

It’s honestly a good question. It’s perfectly legal for you to memorize a copyrighted work. In some contexts, you can recite it, too (particularly the perilous fair use). And even if you don’t recite a copyrighted work directly, you are most certainly allowed to learn to write from reading copyrighted books, then try to come up with your own writing based off what you’ve read. You’ll probably try your best to avoid copying anyone, but you might still make mistakes, simply by forgetting that some idea isn’t your own.

But can AI? If we want to view AI as basically an artificial brain, then shouldn’t it be able to do what humans can do? Though at the same time, it’s not actually a brain nor is it a human. Humans are pretty limited in what they can remember, whereas an AI could be virtually boundless.

If we’re looking at intent, the AI companies certainly aren’t trying to recreate copyrighted works. They’ve actively tried to stop it as we can see. And LLMs don’t directly store the copyrighted works, either. They’re basically just storing super hard to understand sets of weights, which are a challenge even for experienced researchers to explain. They’re not denying that they read copyrighted works (like all of us do), but arguably they aren’t trying to write copyrighted works.

SubArcticTundra@lemmy.ml · 2 years ago

No, because you paid for a single viewing of that content with your cinema ticket. And frankly, I think that the price of a cinema ticket (= a single viewing, which it was) should be what OpenAI should be made to pay.

stappern@lemmy.one · 2 years ago

I didn’t. I torrented it.

rosenjcb@lemmy.world · edited 2 years ago

The powers that be have done a great job convincing the layperson that copyright is about protecting artists and not publishers. It’s historically inaccurate and you can discover that copyright law was pushed by publishers who did not want authors keeping second hand manuscripts of works they sold to publishing companies.

Additional reading: https://en.m.wikipedia.org/wiki/Statute_of_Anne

Sentau@lemmy.one · edited 2 years ago

I think a lot of people are not getting it. AI/LLMs can train on whatever they want, but when these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money-making endeavour. Similar to how using copyrighted clips in a monetized video can get you a strike against your channel, but if the video is not monetized, the chances of YouTube taking action against you are lower.

Edit - If this was an open source model available for use by the general public at no cost, I would be far less bothered by claims of copyright infringement by the model

Tyler_Zoro@ttrpg.network · 2 years ago

AI/LLMs can train on whatever they want, but when these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money-making endeavour.

And does this apply equally to all artists who have seen any of my work? Can I start charging all artists born after 1990, for training their neural networks on my work?

Learning is not and has never been considered a financial transaction.

maynarkh@feddit.nl · edited 2 years ago

Actually, it has. The whole concept of copyright is relatively new, and corporations absolutely tried to have people who learned proprietary copyrighted information not be able to use it in other places.

It’s just that labor movements got such non-compete agreements thrown out of our society, or at least severely restricted on humanitarian grounds. The argument is that a human being has the right to seek happiness by learning and using the proprietary information they learned to better their station. By the way, it took a lot of violent convincing for us to have this.

So yes, knowledge and information learned is absolutely within the scope of copyright as it stands; it’s only that the fundamental rights that humans have override copyright. LLMs (and companies, for that matter) do not have such fundamental rights.

Copyright, by the way, is stupid in its current implementation, but OpenAI and ChatGPT do not get to get out of it IMO just because it’s “learning”. We humans ourselves are only getting out of copyright because of our special legal status.

zbyte64@lemmy.blahaj.zone · 2 years ago

Ehh, “learning” is doing a lot of lifting. These models “learn” in a way that is foreign to most artists. And that’s ignoring the fact that humans are not capital. When we learn we aren’t building a form of capital; when models learn they are only building a form of capital.

Tyler_Zoro@ttrpg.network · 2 years ago

Artists, construction workers, administrative clerks, police and video game developers all develop their neural networks in the same way, a method simulated by ANNs.

This is not, “foreign to most artists,” it’s just that most artists have no idea what the mechanism of learning is.

The method by which you provide input to the network for training isn’t the same thing as learning.

zbyte64@lemmy.blahaj.zone · 2 years ago

ANNs are not the same as synapses, analogous yes, but different mathematically even when simulated.

Prager_U@lemmy.world · 2 years ago

This is orthogonal to the topic at hand. How does the chemistry of biological synapses alone result in a different type of learned model that therefore requires different types of legal treatment?

The overarching (and relevant) similarity between biological and artificial nets is the concept of connectionist distributed representations, and the projection of data onto lower dimensional manifolds. Whether the network achieves its final connectome through backpropagation or a more biologically plausible method is beside the point.

Sentau@lemmy.one · 2 years ago

Artists, construction workers, administrative clerks, police and video game developers all develop their neural networks in the same way, a method simulated by ANNs.

Do we know enough about how our brain functions and how neural networks function to make this statement?

Yendor@reddthat.com · 2 years ago

Do we know enough about how our brain functions and how neural networks function to make this statement?

Yes, we do. Take a university level course on ML if you want the long answer.

Sentau@lemmy.one · 2 years ago

My friends who took computer science told me that we don’t totally understand how machine learning algorithms work. Though this conversation was a few years ago in college. Will have to ask them again

Yendor@reddthat.com · 2 years ago

When we learn we aren’t building a form of capital; when models learn they are only building a form of capital.

What do you think education is? I went to university to acquire knowledge and train my skills so that I could later be paid for those skills. That was literally building my own human capital.

zbyte64@lemmy.blahaj.zone · 2 years ago

Humanities and Art majors are often criticized for not producing such capital.

FMT99@lemmy.world · 2 years ago

But wouldn’t this training and the subsequent output be so transformative that being based on the copyrighted work makes no difference? If I read a Harry Potter book and then write a story about a boy wizard who becomes a great hero, anyone trying to copyright strike that would be laughed at.

Sentau@lemmy.one · edited 2 years ago

Your probability of getting a copyright strike depends on two major factors -

• How similar your story is to Harry Potter.

• Whether you are making money off that story.

uis@lemmy.world · 2 years ago

It doesn’t matter how similar. Copyright doesn’t protect meaning, copyright protects form. If you read HP and then draw a picture of it, said picture becomes its own separate work, not even a derivative.

1ird@notyour.rodeo · edited 2 years ago

How is it any different from someone reading the books, being influenced by them and writing their own book with that inspiration? Should the author of the original book be paid for sales of the second book?

Sentau@lemmy.one · 2 years ago

Again that is dependent on how similar the two books are. If I just change the names of the characters and change the grammatical structure and then try to sell the book as my own work, I am infringing the copyright. If my book has a different story but the themes are influenced by another book, then I don’t believe that is copyright infringement. Now where the line between infringement and no infringement lies is not something I can say and is a topic for another discussion

uis@lemmy.world · edited 2 years ago

change the grammatical structure

I.e. change the form. Copyright protects form, so in countries that judge by either the spirit or the letter of the law instead of the size of moneybags, this is OK.

Affine Connection@lemmy.world · 2 years ago

using copyrighted clips in a monetized video can make you get a strike against your channel

Much of the time, the use of very brief clips is clearly fair use, but the people who issue DMCA claims don’t care.

ciwolsey@lemmy.world · edited 2 years ago

You could run a paid training course using a paid-for book, that doesn’t mean you’re breaking copyright.

Schadrach@lemmy.sdf.org · 2 years ago

I think a lot of people are not getting it. AI/LLMs can train on whatever they want, but when these LLMs are used for commercial reasons to make money, an argument can be made that the copyrighted material has been used in a money-making endeavour.

Only in the same way that I could argue that if you’ve ever watched any of the classic Disney animated movies then anything you ever draw for the rest of your life infringes on Disney’s copyright, and if you draw anything for money then the Disney animated movies you have seen in your life have been used in a money making endeavor. This is of course ridiculous and no one would buy that argument, but when you replace a human doing it with a machine doing essentially the same thing (observing and digesting a bunch of examples of a given kind of work, and producing original works of the general kind that meet a given description) suddenly it’s different, for some nebulous reason that mostly amounts to creatives who believed their jobs could not at least in part be automated away trying to get explicit protection from their jobs being at least in part automated away.

uzay@infosec.pub · 2 years ago

I hope OpenAI and JK Rowling take each other down

Touching_Grass@lemmy.world · 2 years ago

What’s the issue against openAI?

Corkyskog@sh.itjust.works · 2 years ago

They used to be a nonprofit that immediately turned into a for-profit once their product was refined. They took a bunch of people’s effort, whether training materials or the training monkeys using the product, and then slapped a huge price tag on it.

Touching_Grass@lemmy.world · 2 years ago

I didn’t know they were a non profit. I’m good as long as they keep the current model. Release older models free to use while charging for extra or latest features

BURN@lemmy.world · 2 years ago

They’re stealing a ridiculous amount of copyrighted works to use to train their model without the consent of the copyright holders.

This includes the single person operations creating art that’s being used to feed the models that will take their jobs.

OpenAI should not be allowed to train on copyrighted material without paying a licensing fee at minimum.

uzay@infosec.pub · 2 years ago

Also Sam Altman is a grifter who gives people in need small amounts of monopoly money to get their biometric data

LifeInMultipleChoice@lemmy.ml · 2 years ago

So hypothetical here. If Dreddit did launch a system that made it so users could trade Karma in for real currency or some alternative, does that mean that all fan fictions and all other fan boy account created material would become copyright infringement as they are now making money off the original works?

uzay@infosec.pub · 2 years ago

He’s not helping them. That’s my point. He’s taking advantage of them for his grift, so fuck him.

Snapz@lemmy.world · 2 years ago

Sticky this comment

paraphrand@lemmy.world · 2 years ago

Why are people defending a massive corporation that admits it is attempting to create something that will give them unparalleled power if they are successful?

bamboo@lemm.ee · 2 years ago

Mostly because fuck corporations trying to milk their copyright. I have no particular love for OpenAI (though I do like their product), but I do have great disdain for already-successful corporations that would hold back the progress of humanity because they didn’t get paid (again).

msage@programming.dev · 2 years ago

But OpenAI will do the same?

bamboo@lemm.ee · 2 years ago

Perhaps, and when that happens I would be equally disdainful towards them.

LifeInMultipleChoice@lemmy.ml · edited 2 years ago

In the United States there was a judgement made the other day saying that works created solely by AI are not copyrightable. So that would put a speed bump there.
I may have misunderstood you, though.

msage@programming.dev · 2 years ago

Yeah, they might not copyright it, but after it becomes the ‘one true AI’, it will be at the hands of Microsoft, so please do not act friendly towards them.

It will turn on you just like every private company has.

(don’t mean specifically you, but everyone generally)

uis@lemmy.world · 2 years ago

Huh. Doesn’t this mean that, technically, AI cannot commit copyright infringement?

LifeInMultipleChoice@lemmy.ml · 2 years ago

Nah, it would mean that you cannot copyright a work created by an AI, such as a piece of art.

E.g. if you tell it to draw you a donkey carting avocados, the picture can be used by anyone from what I understand.

uis@lemmy.world · 2 years ago

you cannot copyright a work created by an AI, such as a piece of art.

That’s what I said. Copyright infringement is when there is another copyrightable object that is a copy of the first object. AI is not within the area of copyright. You can’t copyright it, but you also can’t be sued for copyright infringement.

if you tell it to draw you a donkey carting avocados, the picture can be used by anyone from what I understand.

Yes. Same for Public Domain, but PD is another status. PD applies only to copyrightable work.

uis@lemmy.world · 2 years ago

It’s like the argument “but new politicians will steal more” that I hear in Russia from people who defend Putin

msage@programming.dev · 2 years ago

It’s literally not, wtf.

Do not let any private entity get an overwhelming majority on anything, period.

But do not kid yourself that Microsoft will let OpenAI do anything for public once it gets big enough.

OpenAI is open only in name after they rolled back all the promises of being for everyone.

uis@lemmy.world · edited 2 years ago

That’s my entire point. It’s not who, but how long.

Also Microsoft plays both sides here. OpenAI vs copyright is wrong question. There’s more: both are status-quo. Both are for keeping corporate ownership of ideas.

assassin_aragorn@lemmy.world · 2 years ago

There’s a massive difference though between corporations milking copyright and authors/musicians/artists wanting their copyright respected. All I see here is a corporation milking copyrighted works by creative individuals.

Cosmic Cleric@lemmy.world · 2 years ago

Because ultimately, it’s about the truth of things, and not what team is winning or losing.

stappern@lemmy.one · 2 years ago

I think trying to keep this cat in the bag is just a waste of time. Plus I don’t respect copyright sooo…

Whimsical@lemmy.world · 2 years ago

The dream would be that they manage to make their own glorious free & open source version, so that after a brief spike in corporate profit as they fire all their writers and artists, suddenly nobody needs those corps anymore because EVERYONE gets access to the same tools - if everyone has the ability to churn out massive content without hiring anyone, that theoretically favors those who never had the capital to hire people to begin with, far more than those who did the hiring.

Of course, this stance doesn’t really have an answer for any of the other problems involved in the tech, not the least of which is that there’s bigger issues at play than just “content”.

Stinkywinks@lemmy.world · 2 years ago

Because everyone learns from books, it’s stupid.

otherbastard@lemm.ee · 2 years ago

An LLM is not a person, it is a product. It doesn’t matter that it “learns” like a human - at the end of the day, it is a product created by a corporation that used other people’s work, with the capacity to disrupt the market that those folks’ work competes in.

Touching_Grass@lemmy.world · edited 2 years ago

And it should be able to freely use anything that’s available to it. These massive corporations and entities have exploited all the free spaces to advertise and sell us their own products and are now sour.

If they had their way they are going to lock up much more of the net behind paywalls. Everybody should be with the LLMs on this.

otherbastard@lemm.ee · 2 years ago

You are somehow conflating “massive corporation” with “independent creator,” while also not recognizing that successful LLM implementations are and will be run by massive corporations, and eventually plagued with ads and paywalls.

People that make things should be allowed payment for their time and the value they provide their customer.

Touching_Grass@lemmy.world · edited 2 years ago

People are paid. But they’re greedy and expect far more compensation than they deserve. In this case they should not be compensated for having an LLM ingest their work if that work was legally owned or obtained

assassin_aragorn@lemmy.world · 2 years ago

Except the massive corporations and entities are the ones getting rich on this. They’re seeking to exploit the work of authors and musicians and artists.

Respecting the intellectual property of creative workers is the anti corporate position here.

uis@lemmy.world · 2 years ago

Except corporations have infinitely more resources (money, lawyers) than the people who create. Take Jarek Duda (a mathematician from Poland) and Microsoft as an example. He created a new compression algorithm, and Microsoft came along a few years later and patented it, in Britain AFAIK. To contest the patent and file prior art he needs £100k.

assassin_aragorn@lemmy.world · 2 years ago

I think there’s an important distinction to make here between patents and copyright. Patents are the issue with corporations, and I couldn’t care less if AI consumed all that.

uis@lemmy.world · 2 years ago

And for copyright there is no possible way to contest it. Also, when copyright expires there is no guarantee the work will be accessible to humanity. Patents are bad, copyright is even worse.

uis@lemmy.world · 2 years ago

There is nothing anti corporate if result can be alienated.

Cosmic Cleric@lemmy.world · 2 years ago

If they had their way they are going to lock up much more of the net behind paywalls.

This!

When the Internet was first a thing corpos tried to put everything behind paywalls, and we pushed back and won.

Now, the next generation is advocating to put everything behind a paywall again?

Stinkywinks@lemmy.world · 2 years ago

How are we going to make ai, if it can’t learn?

scarabic@lemmy.world · 2 years ago

First, we don’t have to make AI.

Second, it’s not about it being unable to learn, it’s about the fact that they aren’t paying the people who are teaching it.

Stinkywinks@lemmy.world · 2 years ago

Then give the AI a library card, feel better?

FatCrab@lemmy.one · 2 years ago

The reasoning that claims training a generative model is infringing IP would still mean a robot going into a library with a card it has to optically read all the books there to create the same generative model would still be infringing IP.

Touching_Grass@lemmy.world · 2 years ago

Same way that counting cards is illegal

AncientMariner@lemmy.world · 2 years ago

Humans can judge information, make decisions based on it, and adapt it. AI mostly just looks at what is statistically most likely based on its training data. If only one piece of data exists, it will copy, not paraphrase. An example was, I think, Copilot, which just printed out the code and comments from an old game verbatim. I think it was Quake 2. It isn’t intelligence, it is statistical copying.
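
To make the "one piece of data" point concrete, here is a toy sketch. It assumes a word-level bigram (Markov chain) model as a stand-in, which is vastly simpler than a real LLM and says nothing about how Copilot actually works; the function names `train_bigrams` and `generate` and the example sentences are made up for illustration.

```python
import random
from collections import defaultdict


def train_bigrams(sentences):
    """Map each word to the list of words observed right after it."""
    table = defaultdict(list)
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            table[current].append(following)
    return table


def generate(table, start, max_words=10, seed=0):
    """Walk the table from `start`, sampling one observed next word per step."""
    rng = random.Random(seed)
    out = [start]
    # Stop at the length cap or when the last word was never seen with a successor.
    while len(out) < max_words and out[-1] in table:
        out.append(rng.choice(table[out[-1]]))
    return " ".join(out)


# One training sentence: every step has exactly one possible continuation,
# so the model can only reproduce its training data word for word.
one = train_bigrams(["the dog says woof"])
print(generate(one, "the"))

# Several overlapping sentences: continuations are sampled, so the output
# depends on the seed and can be a mix that appears in no training sentence.
many = train_bigrams(["the dog says woof", "the cat says meow", "the dog says hello"])
for seed in range(3):
    print(generate(many, "the", seed=seed))
```

With a single sentence the walk has no choice but to recite it; with overlapping data it can also produce things like "the cat says woof" that were never in the training set. Whether either behaviour counts as "copying" in the legal sense is a separate question.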

uis@lemmy.world · 2 years ago

Well, mathematics cannot be copyrighted. In most countries at least.

stappern@lemmy.one · 2 years ago

yeah let’s not explore this technology because it might hurt some copyright holders

LOOOOL fuck em

assassin_aragorn@lemmy.world · 2 years ago

because it might hurt authors and musicians and artists and other creative workers

FTFY. Corporations shouldn’t be making a fucking dime from any of these works without fairly paying the creators.

SCB@lemmy.world · 2 years ago

Leftists hating on AI while dreaming of post-scarcity will never not be funny

Crozekiel@lemmy.zip · 2 years ago

AI is the new fan boy following since it became official that nfts are all fucking scams. They need a new technological God to push to feel superior to everyone else…

GroggyGuava@lemmy.world · 2 years ago

Are you ok? You seem upset

RadialMonster@lemmy.world · 2 years ago

what if they scraped a whole lot of the internet, and those excerpts were in random blogs and posts and quotes and memes etc etc all over the place? They didn’t ingest the material directly, or knowingly.

beetus@sh.itjust.works · 2 years ago

Not knowing something is a crime doesn’t stop you from being prosecuted for committing it.

It doesn’t matter if someone else is sharing copyrighted works and you don’t know it and use them in ways that infringe on that copyright.

“I didn’t know that was copyrighted” is not a valid defence.

stewsters@lemmy.world · 2 years ago

Is reading a passage from a book actually a crime though?

Sure, you could try to regenerate the full text from quotes you read online, much like you could open a lot of video reviews and recreate larger portions of the original text, but you would not blame the video editing program for that, you would blame the one who did it and decided to post it online.

chemical_cutthroat@lemmy.world · 2 years ago

That’s why this whole argument is worthless, and why I think that, at its core, it is disingenuous. I would be willing to bet a steak dinner that a lot of these lawsuits are just fishing for money, and the rest are set up by competition trying to slow the market down because they are lagging behind. AI is an arms race, and it’s growing so fast that if you got in too late, you are just out of luck. So, companies that want in are trying to slow down the leaders, at best, and at worst they are trying to make them publish their training material so they can just copy it. AI training models should be considered IP, and should be protected as such. It’s like trying to get the Colonel’s secret recipe by saying that all the spices that were used have been used in other recipes before, so it should be fair game.

Kujo@lemm.ee · 2 years ago

If training models are considered IP then shouldn’t we allow other training models to view and learn from the competition? If learning from other IPs that are copyrighted is okay, why should the training models be treated differently?

chemical_cutthroat@lemmy.world · 2 years ago

They are allegedly learning from copyrighted material; there is no actual proof of whether they have been trained on the actual material or just on snippets that have been published online. And it would be illegal for them to be trained on full copyrighted materials, because it is protected by laws that prevent that.

ClamDrinker@lemmy.world · edit-2 2 years ago

This is just OpenAI covering their ass by attempting to block the most egregious and obvious outputs in legal gray areas, something they’ve been doing for a while, hence why their AI models are known to be massively censored. I wouldn’t call that ‘hiding’. It’s kind of hard to hide it was trained on copyrighted material, since that’s common knowledge, really.

Default_Defect@midwest.social · 2 years ago

They made it read Harry Potter? No wonder it’s gonna kill us all one day.

Thorny_Thicket@sopuli.xyz · 2 years ago

I don’t get why this is an issue. Assuming they purchased a legal copy that it was trained on then what’s the problem? Like really. What does it matter that it knows a certain book from cover to cover or is able to imitate art styles etc.? That’s exactly what people do too. We’re just not quite as good at it.

Hildegarde@lemmy.world · 2 years ago

A copyright holder has the right to control who has the right to create derivative works based on their copyright. If you want to take someone’s copyright and use it to create something else, you need permission from the copyright holder.

The one major exception is Fair Use. It is unlikely that AI training is a fair use. However this point has not been adjudicated in a court as far as I am aware.

FatCat@lemmy.world · 2 years ago

It is not a derivative work, it is a transformative work. Just like human artists “synthesise” art they see around them and make new art, so do LLMs.

BURN@lemmy.world · 2 years ago

LLMs don’t create anything new. They have limited access to what they can be based on, and all assumptions made by it are based on that data. They do not learn new things or present new ideas. Only ideas that have been already done and are present in their training.

Hildegarde@lemmy.world · 2 years ago

Transformative works are not a thing.

If you copy the copyrightable elements of another work, you have created a derivative work. That work needs to be transformative in order to be eligible for its own copyright, but being transformative alone is not enough to make it non-infringing.

There are four fair use factors. Transformativeness is only considered by one of them. That is not enough to make a fair use.

Cosmic Cleric@lemmy.world · 2 years ago

Transformativeness is only considered by one of them. That is not enough to make a fair use.

Somebody better let YouTube content creators know that. /s

LordShrek@lemmy.world · 2 years ago

this is so fucking stupid though. almost everyone reads books and/or watches movies, and their speech is developed from that. the way we speak is modeled after characters and dialogue in books. the way we think is often from books. do we track down what percentage of each sentence comes from what book every time we think or talk?

SpiderShoeCult@sopuli.xyz · 2 years ago

Aye, but I’m thinking the whole notion of copyright is banking on the fact that human beings are inherently lazy and not everyone will start churning out books in the same universe or style. And if they do, it takes quite some time to get the finished product and they just get sued for it. It’s easy, because there’s a single target.

So there’s an extra deterrent to people writing and publishing a new harry potter novel, unaffiliated with the current owner of the copyright. Invest all that time and resources just to be sued? Nah…

Issue with generating stuff with 'puters is that you invest way less time, so the same issue pops up for the copyright owner, they’re just DDoS-ed on their possible attack routes. Will they really sue thousands or hundreds of thousands of internet randos generating harry potter erotica using an LLM? Would you even know who they are? People can hide money away in Switzerland from entire governments, I’m sure there are ways to hide your identity from a book publisher.

It was never about the content, it’s about the opportunities the technology provides to halt the gears of the system that works to enforce questionable laws. So they’re nipping it in the bud.

LordShrek@lemmy.world · 2 years ago

this brings up the question: what is a book? what is art? if an “AI” can now churn out the next harry potter sequel and people literally can’t tell that it’s not written by JK Rowling, then what does that mean for what people value in stories? what is a story? is this a sign that we humans should figure something new out, instead of reacting according to an outdated protocol?

yes, authors made money in the past before AI. now that we have AI and most people can get satisfied by a book written by AI, what will differentiate human authors from AI? will it become a niche thing, where some people can tell the difference and they prefer human authors? or will there be some small number of exceptional authors who can produce something that is obviously different from AI?

i see this as an opportunity for artists to compete with AI, rather than say “hey! no fair! he can think and write faster than me!”

SpiderShoeCult@sopuli.xyz · 2 years ago

Well, poor literature has always existed, which some might not even dignify to call literature. Are writers of such things threatened by LLMs? Of course they are. Every new technology has brought with it the fear of upending somebody’s world. And to some extent, every new technology has indeed done just that.

Personally, and… this will probably be highly unpopular, I honestly don’t care who or what created a piece of art. Is it pretty? Does it satisfy my need for just the right amount of weird, funny and disturbing to stir emotions or make me go ‘heh, interesting!’? Then it really doesn’t matter where it comes from. We put way too much emphasis on the pedigree of art and not on the content. Hell, one very nice short story I read was the greentext one about humans being AI and escaping from the simulation. Wonder how many would scoff at calling art something that came out of 4chan?

Maybe this is the issue? Art is thought of as a purely human endeavour (also birds do it, and that one pufferfish that draws on the seabed, but they’re “dumb” animals so they don’t count, right? hell, there’s even a jumping spider that does some pretty rad dances). And if code in a machine can do it just as well (can it? let it - we’ll be all the better for it. can’t it? let it be then - no issue) then what would be the significance of being human?

stappern@lemmy.one · 2 years ago

Assuming they purchased a legal copy that it was trained on then what’s the problem?

i never purchased a copy of harry potter i got a loaner. now what?

Uriel238 [all pronouns]@lemmy.blahaj.zone · edit-2 2 years ago

Training AI on copyrighted material is no more illegal or unethical than training human beings on copyrighted material (from library books or borrowed books, no less!). And trying to challenge the veracity of generative AI systems on the notion that it was trained on copyrighted material only raises the specter that IP law has lost its validity as a public good.

The only valid concern about generative AI is that it could displace human workers (or swap out skilled jobs for menial ones) which is a problem because our society recognizes the value of human beings only in their capacity to provide a compensation-worthy service to people with money.

The problem is this is a shitty, unethical way to determine who gets to survive and who doesn’t. All the current controversy about generative AI does is kick this can down the road a bit. But we’re going to have to address soon that our monied elites will be glad to dispose of the rest of us as soon as they can.

Also, amateur creators are as good as professionals, given the same resources. Maybe we should look at creating content by other means than for-profit companies.

Tetsuo@jlai.lu · edit-2 2 years ago

If I’m not mistaken AI work was just recently considered as NOT copyrightable.

So I find it interesting that an AI learning from copyrighted work is an issue even though what will be generated will NOT be copyrightable.

So even if you generated some copy of Harry Potter you would not be able to copyright it. So in no way could you really compete with the original art.

I’m not saying that it makes it ok to train AIs on copyrighted art but I think it’s still an interesting aspect of this topic.

As others probably have stated, the AI may be creating content that is transformative and therefore under fair use. But even if that work is transformative it cannot be copyrighted because it wasn’t created by a human.

Even_Adder@lemmy.dbzer0.com · edit-2 2 years ago

If you’re talking about the ruling that came out this week, that whole thing was about trying to give an AI authorship of a work generated solely by a machine and having the copyright go to the owner of the machine through the work-for-hire doctrine. So an AI itself can’t be an author or hold a copyright, but humans using them can still be copyright holders of any qualifying works.

XEAL@lemm.ee · 2 years ago

How are they going to prove if something was written by an AI? Also, you can take the AI’s output and then modify it.

Tetsuo@jlai.lu · 2 years ago

That’s definitely an issue. At what point does copyright apply if you are just helped by an AI?

I guess the courts will have to decide that…

habanhero@lemmy.ca · 2 years ago

How do you tell if a piece of work contains AI generated content or not?

It’s not hard to generate a piece of AI content, put in some hours to round out AI’s signatures / common mistakes, and pass it off as your own. So in practice it’s still easy to benefit from AI systems by masking generated content as largely your own.

Lucidlethargy@sh.itjust.works · 2 years ago

That’s not how copyright works. I’m embarrassed for you, and all the people who blindly upvoted you. Like… Yikes. What’s happening to this world?

You can’t publish copyrighted work as your own just because you’re legally not able to publish copyrighted work. That’s an open-and-shut case of copyright infringement. Why do I have to say this? Am I on candid camera?