In Cringe Video, OpenAI CTO Says She Doesn’t Know Where Sora’s Training Data Came From

ylai@lemmy.ml · 1 year ago

Echo Dot@feddit.uk · edit-2 8 months ago

Removed by mod

oKtosiTe@lemmy.world · 1 year ago

if I pirate a movie and then only I watch it, I don’t think anyone would really think I should be arrested for that

There are definitely people out there that think you should be arrested for that.

Echo Dot@feddit.uk · edit-2 8 months ago

Removed by mod

exanime@lemmy.today · 1 year ago

But they ARE selling it … Every answer Chat GPT makes came from possibly stolen material

BoscoBear@lemmy.sdf.org · 1 year ago

Isn’t that true of every opinion you have. All the knowledge you have is based on works of others that came before you.

exanime@lemmy.today · 1 year ago

Not until I bill you for it

Also, no, there is such a thing as an original thought or opinion… Even if it’s informed by other knowledge

There is a difference between reinterpreting other knowledge and just Frankensteining multiple works together

BoscoBear@lemmy.sdf.org · 1 year ago

I don’t know enough about LLMs but Neural networks are capable of original thought. I suspect LLMs are too because of their relationship to Neural Networks.

confusedbytheBasics@lemmy.world · 1 year ago

You’re using the word ‘stolen’, which doesn’t fit. It would be accurate to say ‘every answer comes from possibly unlicensed material’.

Guntrigger@feddit.ch · 1 year ago

Allegedly possibly maybe accidentally whoopsie not quite licensed fully material.

exanime@lemmy.today · 1 year ago

Yep, the real term (I think) would be copyright infringement

rottingleaf@lemmy.zip · 1 year ago

That is a bad thing if they want to be exempt from the law because they are doing a big, very important thing, and we shouldn’t.

The copyright laws are shit, but applying them selectively is orders of magnitude worse.

A_Very_Big_Fan@lemmy.world · 1 year ago

if I pirate a movie and then only I watch it, I don’t think anyone would really think I should be arrested for that, so why is it unacceptable for them but fine for me?

Because it’s more analogous to watching a video being broadcast outdoors in public, or looking at a mural someone painted on a wall, and letting it inform your creative works going forward. Not even recording it, just looking at it.

As far as we know, they never pirated anything. What we do know is it was trained on data that literally anybody can go out and look at themselves and have it inform their own work. If they’re out here torrenting a bunch of movies they don’t own or aren’t licencing, then the argument against them has merit. But until then, I think all of this is a bunch of AI hysteria over some shit humans have been doing since the first human created a thing.

StarPupil@ttrpg.network · 1 year ago

An AI (in its current form) isn’t a person drawing inspiration from the world around it, it’s a program made by people with inputs chosen by those people. If those people didn’t ask permission to use other people’s licensed work for their product, then they are plagiarising that work, and they should be subject to the same penalties that, for example, a game company using stolen art in their game should face. An AI doesn’t become inspired, it copies existing things to predict what it thinks its user wants to see. If we produce a real thinking AI at some point in the future, one with self determination and whatnot, the story will be different, but for now it isn’t.

A_Very_Big_Fan@lemmy.world · 1 year ago

What is web scraping if not gathering information from around the world? As long as you’re not distributing copyrighted content (and the models in question here don’t, btw), then fair use is at play. I’m not plagiarizing the news by reading it or by talking about what I learned, but I would be if I just copy/pasted my response from the article.

Reading publicly available data isn’t a copyright violation, and it certainly isn’t a violation of fair use. If it were, then you just plagiarized my comment by reading it before you responded.

exanime@lemmy.today · 1 year ago

Because the actual comparison is that you stole ALL movies, started your own Netflix with them and are lining up to literally make billions by taking the jobs of millions of people, including those you stole from

BoscoBear@lemmy.sdf.org · 1 year ago

I would say it is closer to watching all the movies, regardless of how you got them, then teaching a film class at UCLA.

A_Very_Big_Fan@lemmy.world · edit-2 1 year ago

If I paint a melty clock hanging off of a table, how have I stolen from Salvador Dali? What did I “steal” from Tolkien when I drew this?

you stole ALL movies, started your own Netflix with them

The model in question can’t even try to distribute copyrighted material. You could have easily checked for yourself, but once again I find myself having to do the footwork for you guys.

exanime@lemmy.today · 1 year ago

If you sell your melty clock, yes, it’s not “stealing”, but you are violating copyright. That’s how it works

The “model in question” is a bit of a prototype, I thought it was clear we are talking about where these models are going… Maybe you’d get it if you came down off your high horse

A_Very_Big_Fan@lemmy.world · 1 year ago

Dali doesn’t own the concept of a melting clock. If I include a melting clock in my own work, as long as it’s not his melting clock with all the other elements of his painting, it’s fair use.

GPT hasn’t been a prototype since before 2018, and the copyright restrictions are only getting tighter every time it’s updated so idk what you’re on about.

GiveMemes@jlai.lu · 1 year ago

Ok but training an AI is not equivalent to watching a movie. It’s more like putting a game on one of those 300-games-in-one DS cartridges back in the day.

BoscoBear@lemmy.sdf.org · 1 year ago

I don’t think that is true. You aren’t reselling the movies. It is more like watching the movies then writing a recap or critique of the movies. Do you owe the copyright holder for doing that?

Gabu@lemmy.world · 1 year ago

The problem with that being?

GiveMemes@jlai.lu · 1 year ago

Obviously, it’s illegal to sell a product that’s using copyrighted material you don’t have the copyright to. This AI is not open source, it’s a for profit system.

A_Very_Big_Fan@lemmy.world · 1 year ago

It doesn’t, though. You could have easily checked yourself, but I guess I’ll do your research for you.

GiveMemes@jlai.lu · 1 year ago

It does though. You could have easily checked for yourself, but I guess I’ll do your research for you.

https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

A_Very_Big_Fan@lemmy.world · edit-2 1 year ago

That article doesn’t even claim it’s distributing copyrighted material.

If that qualifies as distributing stolen copyrighted material, then this is stealing and distributing the “you shall not pass” LoTR scene. Which, again, ChatGPT won’t even do

GiveMemes@jlai.lu · 1 year ago

Sorry, I know reading the whole article is hard:

The complaint cites several examples when a chatbot provided users with near-verbatim excerpts from Times articles that would otherwise require a paid subscription to view.

A_Very_Big_Fan@lemmy.world · edit-2 1 year ago

Yeah lmao after like 20 paragraphs of nothing, it wasn’t hard to believe you didn’t know what you were talking about. But I looked at the complaint itself out of curiosity, and it’s flimsy and misleading.

The first issue is that 100% of the allegedly paywalled text from all 4 articles mentioned in the complaint can be read by non-paying customers for free outside of the paywall. You can’t read the whole article, but you can get far enough to read all 4 quotes mentioned in the complaint yourself. The links to each article are in the complaint if you don’t believe me. They have nothing to show they bypassed a paywall or that it was trained on unlicensed content.

The second issue is the third exhibit claims it will bypass paywalls when asked. This is demonstrably false because for one, the article they asked it for isn’t paywalled, and for two, using their exact prompts word for word doesn’t work if you try it yourself.

Two of the four exhibits don’t even have screenshots, so there’s no evidence it happened in the first place, but more importantly they don’t (and apparently won’t when asked) disclose what lengths they had to go to in order to get that output. For all we know they gave it 90% of the words and told it to fill in the gaps.