Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

0x815@feddit.de · 2 years ago

Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

ConsciousCode@beehaw.org · 2 years ago

To be honest I’m fine with it in isolation, copyright is bullshit and the internet is a quasi-socialist utopia where information (an infinitely-copyable resource which thus has infinite supply and 0 value under capitalist economics) is free and humanity can collaborate as a species. The problem becomes that companies like Google are parasites that take and don’t give back, or even make life actively worse for everyone else. The demand for compensation isn’t so much because people deserve compensation for IP per se, it’s an implicit understanding of the inherent unfairness of Google claiming ownership of other people’s information while hoarding it and the wealth it generates with no compensation for the people who actually made that wealth. “If you’re going to steal from us, at least pay us a fraction of the wealth like a normal capitalist”.

If they made the models open source then it’d at least be debatable, though still suss since there’s a huge push for companies to replace all cognitive labor with AI whether or not it’s even ready for that (which itself is only a problem insofar as people need to work to live, professionally created media is art insofar as humans make it for a purpose but corporations only care about it as media/content so AI fits the bill perfectly). Corporations are artificial metaintelligences with misaligned terminal goals so this is a match made in superhell. There’s a nonzero chance corporations might actually replace all human employees and even shareholders and just become their own version of skynet.

Really what I’m saying is we should eat the rich, burn down the googleplex, and take back the means of production.

cambriakilgannon@beehaw.org · 2 years ago

Or, if it was some non-profit doing the work for the good of everyone :')

ConsciousCode@beehaw.org · 2 years ago

If only there were some kind of open AI research lab lmao. In all seriousness Anthropic is pretty close to that, though it appears to be a public benefit corporation rather than a nonprofit. Luckily the open source community in general is really picking up the slack even without a centralized organization, I wouldn’t be surprised if we get something like the Linux Foundation eventually.

Ubermeisters@lemmy.zip · 2 years ago

Okay so I took back the means of production but it says it’s a subscription basis now

ConsciousCode@beehaw.org · 2 years ago

That’s late-stage capitalism for you – even revolution comes with a subscription fee

SpaceCowboy@lemmy.ca · 2 years ago

Probably shoulda read the Revolution TOS before clicking “I Agree”.

superkret@feddit.de · edit-2 2 years ago

deleted by creator

ConsciousCode@beehaw.org · 2 years ago

That’s fair, also congratulations. Idk if I would count that towards contributing to the internet though, since it’s all within their walled garden on their own terms. It’s helpful for people, but only insofar as it helps Google. 10 years ago I might be less critical since they were still in their “don’t be evil” phase and creating open source projects like Android left and right, something they’re evidently regretting now and trying to lock down using propriety core apps. It’s also worth noting Google’s AI employees authored “Attention is all you need”, the paper which laid the groundwork for modern Transformer-based LLMs, though that’s an architecture and not a full model or code.

SokathHisEyesOpen@lemmy.ml · edit-2 2 years ago

This is like the beginning of a Hitchhiker’s Guide to the Galaxy, where they put the responsibility on the main character to go to the department of transportation basement and see that they had posted a notice that they’re going to destroy his house. No Google, you don’t get to dictate that people come to your dark pattern website and tell you you’re not allowed to use their content. Disapproval is implied until people OPT-IN! It’s a good thing Google changed their motto from Don’t Be Evil or we’d have quite the conundrum.

YⓄ乙 @aussie.zone · edit-2 2 years ago

Can we get some young politicians elected who has a degree in IT ? Boomers dont understand technology that’s why these companies keeps screwing the people.

zephyrvs@lemmy.ml · 2 years ago

It’s because they’re corrupt and young people are just as susceptible to lobbyists bribes, unfortunately. The gerontocracy doesn’t make things better though, that’s for sure.

Storksforlegs@beehaw.org · edit-2 2 years ago

True but that doesn’t mean it wouldn’t be better to have politicians who have a better understanding of the systems they’re legislating. “People can be bribed” isn’t a good excuse to not change anything.

zephyrvs@lemmy.ml · 2 years ago

Definitely, I didn’t mean to sound too defeatist.

wargreymon2023@sopuli.xyz · 2 years ago

This is more true than anything.

katy ✨@lemmy.blahaj.zone · 2 years ago

Me, twenty years ago: i wish the word web 2.0 could disappear forever

Me, like 8 years ago: i wish the word web 3.0 could disappear forever

Monkeys paw: 👆

AI, coming crashing through the window and blindsiding me upside the head: surprise bitch

FaceDeer@kbin.social · 2 years ago

Copyright law already allows generative AI systems to scrape the internet. You need to change the law to forbid something, it isn’t forbidden by default. Currently, if something is published publicly then it can be read and learned from by anyone (or anything) that can see it. Copyright law only prevents making copies of it, which a large language model does not do when trained on it.

lostmypasswordanew@feddit.de · 2 years ago

An AI model is a derivative work of its training data and thus a copyright violation if the training data is copyrighted.

BlameThePeacock@lemmy.ca · 2 years ago

A human is a derivative work of its training data, thus a copyright violation if the training data is copyrighted.

The difference between a human and ai is getting much smaller all the time. The training process is essentially the same at this point, show them a bunch of examples and then have them practice and provide feedback.

If that human is trained to draw on Disney art, then goes on to create similar style art for sale that isn’t a copyright infringement. Nor should it be.

lostmypasswordanew@feddit.de · 2 years ago

Humans and AI are not the same and an equivalence should never be drawn.

BlameThePeacock@lemmy.ca · 2 years ago

Your feelings don’t really matter, the fact of the matter is that the goal of ai is literally to replicate the function of a human brain. The way we’re building them is often mimicking the same processes.

acastcandream@beehaw.org · 2 years ago

the fact of the matter is that the goal of AI is literally to replicate the function of a human brain

…says who? That’s absolutely your feeling and not facts.

nickwitha_k (he/him)@lemmy.sdf.org · 2 years ago

And LLMs and related technologies, by themselves, are artificial but not intelligent. So, the facts are not in favor of your argument to allow commercial parasitism on creative works.

BlameThePeacock@lemmy.ca · 2 years ago

I think you’re missing a point here. If someone uses these to models to produce and distribute copyright infringing works, the original rights holder could go after the infringer.

The model itself isn’t infringing though, and the process of creating the model isn’t either.

It’s a similar kind of argument to the laws that protect gun manufacturers from culpability from someone using their weapon to commit a crime. The user is the one doing the bad thing, they just produce a tool.

Otherwise, could Disney go after a pencil company because someone used one of their pencils to infringe on their copyright. Even if that pencil company had designed the pencil to be extremely good at producing Disney imagery by looking at a whole bunch of Disney images and movies to make sure it matches the size, colour, etc? No, because a pencil isn’t a copyright infringement of art, regardless of the process used to design it.

acastcandream@beehaw.org · edit-2 2 years ago

deleted by creator

nickwitha_k (he/him)@lemmy.sdf.org · edit-2 2 years ago

Nah. You’re missing the forest for the trees. Let’s get abstract:

Person A makes a living by making product X and selling it.

Person B makes a living by making product Y and selling it.

Both A and B are in the same industry.

Person C uses a machine to extract the essence of product X and Y and blend them. Person C then claims authorship and sells it as product Z, which they sell in competition to X and Y.

Person C has not created anything. Their machine does not have value in the absence of products X and Y, yet received no permission, offers no credit nor compensation. In addition, they are competing for the same customers and harming the livelihoods of A and B. Person C is acting in a purely parasitic manner that cannot be seen as ethical in any widely accepted definition of the word.

Phanatik@kbin.social · 2 years ago

This is stupid and I’ll tell you why.
As humans, we have a perception filter. This filter is unique to every individual because it’s fed by our experiences and emotions. Artists make great use of this by producing art which leverages their view of the world, it’s why Van Gogh or Picasso is interesting because they had a unique view of the world that is shown through their work.
These bots do not have perception filters. They’re designed to break down whatever they’re trained on into numbers and decipher how the style is constructed so it can replicate it. It has no intention or purpose behind any of its decisions beyond straight replication.
You would be correct if a human’s only goal was to replicate Van Gogh’s style but that’s not every artist. With these art bots, that’s the only goal that they will ever have.

I have to repeat this every time there’s a discussion on LLM or art bots:
The imitation of intelligence does not equate to actual intelligence.

acastcandream@beehaw.org · 2 years ago

this is stupid I’ll tell you why

Not sure why you think anyone would read anything if that’s how you start it.

BlameThePeacock@lemmy.ca · 2 years ago

You’re completely wrong, and I’ll tell you why.

None of what you said matters, perception filters, intent, intelligence… it’s all irrelevant to the discussion.

Copyright infringement only gives certain rights, and at least here in Canada using them to generate a model isn’t one of those. Rights are for things like distribution, reproduction, public performance, communication, and exhibition. US law says you can’t “Prepare derivative works based upon the work.” but the model isn’t a derivative work because it’s not really a work at all, you can’t even visually look at the model. You can’t copyright an algorithm in the US or Canada.

Only the created art should be scrutinized for copyright infringement, and these systems can generate both (just like a human can).

Any enforcement should then be handled when that protected work is then used to infringe on the actual rights of the copyright holder.

Phanatik@kbin.social · 2 years ago

I wasn’t talking about copyright law in regards to the model itself.

I was talking about what is/isn’t grounds for plagiarism. I strongly disagree with the idea that artists and art bots go through the same process. They don’t and it’s reductive to claim otherwise. It negatively impacts the perception of artists’ work to assert that these models can automate a creative process which might not even involve looking at other artists’ work because humans are able to create on their own.

A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.

This is why ChatGPT required the internet to do what it does (the privacy violation is another big concern there). The model needed vast quantities of information to be sufficiently trained because language is difficult to decipher. Languages evolved by getting in contact with other languages and organically making new words. ChatGPT will never invent a new word because it’s not intelligent, it is merely imitating intelligence.

BlameThePeacock@lemmy.ca · 2 years ago

“A person who has never looked upon a single painting in their life can still produce a piece but the same cannot be said for an art bot. A model must be trained on work that you want the model to be able to imitate.”

No, they really can’t. Go look a 1 year old’s first attempt at “art” because it’s nothing more than random smashing of colour on paper. A computer could easily generate such “work” as well with no training data at all. They’ve seen art at that point, and still can’t replicate it because they need much more training first.

Humans require books (or teachers who read books) to learn how to read and write. That is “vast quantities of information” being consumed to learn how to do it. If you had never seen or heard of a book, you wouldn’t be able to write a novel. It’s also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice)

Phanatik@kbin.social · 2 years ago

Once again, being reductive about artists’ work. Jackson Pollock’s entire career was smashing colours on a canvas. If you want to argue that Pollock had to look at thousands of paintings before making his, I honestly can’t take you seriously at that point.

A computer could easily generate such “work” as well with no training data at all.

Yes and in the eyes of its creators, that was deemed a failure which is why Midjourney and Dall-E are the way they are. These bots don’t want to create art, they want to imitate it.

Children have barely any experiences and can still create something. You might not deem it worthy of calling it art but they created something despite their limited knowledge and life experience.

Of course, you’d need books to read and write. The words have to be written and you need to see the words in written form if you also want to write them. But one thing you don’t take into account is handwriting. Another thing that is unique to every individual. Some have worse handwriting than others and with practice (like any muscle) it can be improved but you haven’t had to have seen handwritten text before writing it yourself. You only need to be taught how to hold a pen and you can write.

Novels are complex structures of language just like poetry. In order to write novels, you have to consume novels because it’s well understood that to find your own narrative voice you must see how others express theirs. Stories are told in unique ways and it’s crucial as a writer to understand and break these concepts down. Intention and purpose form a core part of storytelling and an LLM cannot and will not be able to express those things.

They’re written in certain ways because the author intended them to be that way, such as Cormac McCarthy deciding to be very minimalist with his punctuation.
I would love to see you make a point that an LLM without being specifically prompted to do so would make that stylistic decision. An LLM can’t make that decision because unless you specify a style it is aware of, it won’t organically do it.

I am also a writer. I’ve written a short story. One of my stylistic choices is that I don’t use dialogue tags like “said”. An LLM won’t make that choice because it isn’t designed to do so, it won’t decide to minimise its use of dialogue tags to improve the flow of the narrative unless you told it to.

It’s also completely ignoring the fact that you had to previously learn the spoken language as well (which is a vast quantity of information that takes a human decades to acquire proficiency in even with daily practice).

Yes, in order to learn a spoken language you have to have heard it. However, languages evolve over time. You develop regional accents and dialects. All of the UK speaks English but no two towns speak the same way.

frog 🐸@beehaw.org · 2 years ago

Absolutely agreed! I think if the proponents of AI artwork actually had any knowledge of art history, they’d understand that humans don’t just iterate the same ideas over and over again. Van Gogh, Picasso, and many others, did work that was genuinely unique and not just a derivative of what had come before, because they brought more to the process than just looking at other artworks.

nickwitha_k (he/him)@lemmy.sdf.org · 2 years ago

Yup. There seems to be a strong motive in many to not understand this concept as it makes their practices clearly ethically questionable.

frog 🐸@beehaw.org · 2 years ago

My feeling is that the vast majority of pro-AI techbros come from a computer science, finance, or business background; undoubtedly intelligent people, but completely and utterly lacking in any appreciation or understanding of what actually goes into creative work. I’m sure they genuinely believe that there’s no difference between what a human does and what an AI does, because they think art (or writing, music, etc) are just the product of an algorithm.

Phanatik@kbin.social · 2 years ago

Ironically, my background is in mathematics but I also happen to be a writer so I see both sides of the argument. I just see the utter lack of compassion people have for those who produce creative work and the same people believe that if it can be automated, it should be automated.

nickwitha_k (he/him)@lemmy.sdf.org · 2 years ago

Likely. Which is weird because algorithms are only a subset of software engineering, which requires abstract and creative thought to perform well.

FaceDeer@kbin.social · 2 years ago

It is not a derivative work, the model does not contain any recognizable part of the original material that it was trained on.

frog 🐸@beehaw.org · 2 years ago

Except when it produces exact copies of existing works, or when it includes a recognisable signature or watermark?

NumbersCanBeFun@kbin.social · 2 years ago

deleted by creator

frog 🐸@beehaw.org · 2 years ago

The point is that if the model doesn’t contain any recognisable parts of the original material it was trained on, how can it reproduce recognisable parts of the original material it was trained on?

ricecake@beehaw.org · 2 years ago

That’s sorta the point of it.
I can recreate the phrase “apple pie” in any number of styles and fonts using my hands and a writing tool. Would you say that I “contain” the phrase “apple pie”? Where is the letter ‘p’ in my brain?

Specifically, the AI contains the relationship between sets of words, and sets of relationships between lines, contrasts and colors.
From there, it knows how to take a set of words, and make an image that proportionally replicates those line pattern and color relationships.

You can probably replicate the Getty images watermark close enough for it to be recognizable, but you don’t contain a copy of it in the sense that people typically mean.
Likewise, because you can recognize the artist who produced a piece, you contain an awareness of that same relationship between color, contrast and line that the AI does. I could show you a Picasso you were unfamiliar with, and you’d likely know it was him based on the style.
You’ve been “trained” on his works, so you have internalized many of the key markers of his style. That doesn’t mean you “contain” his works.

maynarkh@feddit.nl · 2 years ago

A lot of licensing prevents or constrains creating derivative works and monetizing them. The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

If yes, Github Copilot is illegal as it produces code that should comply to multiple conflicting license requirements. If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt, and run a leaked copy of Windows through it, then go around selling Binbows and MSFT can’t do anything about it.

The truth is mostly between the two, this is just piracy, which always has been a gray area because of the difficulty of prosecuting it, previously because the perpetrators were many and hard to find, now it’s because the perpetrators are billion dollar companies with expensive lawyer teams.

FaceDeer@kbin.social · 2 years ago

The question is for example if you train an AI on GPL code, does the output of the model constitute a derivative work?

This question is completely independent of whether the code was generated by an AI or a human. You compare code A with code B, and if the judge and jury agree that code A is a derivative work of code B then you win the case. If the two bodies of work don’t have sufficient similarities then they aren’t derivative.

If no, I can write some simple AI that is “trained” to regurgitate its output on a prompt

You’ve reinvented copy-and-paste, not an “AI.” AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

Filtering the code through something you call an AI isn’t going to have any impact on whether you get sued. If the resulting code looks like copyrighted code, then you’re in trouble. If it doesn’t look like copyrighted code then you’re fine.

maynarkh@feddit.nl · 2 years ago

AIs are deliberately designed to not copy-and-paste.

AI is a marketing term, not a technical one. You can call anything “AI”, but it’s usually predictive models that get called that.

AIs are deliberately designed to not copy-and-paste. What would be the point of one that did? Nobody wants that.

For example if the powers that be decided to say licenses don’t apply once you feed material through an “AI”, and failed to define AI, you could say you wrote this awesome OS using an AI that you trained exclusively using Microsoft proprietary code. Their licenses and copyright and stuff doesn’t apply to AI training data so you could sell that new code your AI just created.

It doesn’t even have to be 100% identical to Windows source code. What if it’s just 80%? 50%? 20%? 5%? Where is the bar where the author can claim “that’s my code!”?

Just to compare, the guys who set out to reimplement Win32 APIs for use in Linux (the thing that made it into MacOS as well now) deliberately would not accept help from anyone who ever saw any Microsoft source code for fear of being sued. The bar was that high when it was a small FOSS organization doing it. It was 0%, proven beyond a doubt.

Now that Microsoft is the author, it’s not a problem when Github Copilot spits out GPL code word for word, ironically together with its license.

FaceDeer@kbin.social · 2 years ago

AI is a marketing term, not a technical one.

The reverse, actually. Artificial intelligence is a field of research that includes things like machine learning, as well as lots of even more mundane applications. It’s pop culture that has hijacked it to mean “a thing exactly as capable as a human brain, but in computer form.”

For example if the powers that be decided to say licenses don’t apply once you feed material through an “AI”, and failed to define AI, you could say you wrote this awesome OS using an AI that you trained exclusively using Microsoft proprietary code.

Once again, it doesn’t matter what you “feed code through.” Copyright applies to the tangible result. If the output from the AI matches closely to something that’s already copyrighted then that copyright applies to it. If it doesn’t match closely then that copyright doesn’t apply to it. The actual process by which the code was produced doesn’t matter one whit. If I took a Harry Potter book, put its pages through a shredder, randomly glued the particles of paper back together and it just so happened to closely replicate Lord of the Rings then the Tolkien estate has a case against me but the Rowling estate does not.

AbsolutelyNotABot@feddit.it · 2 years ago

then go around selling Binbows and MSFT can’t do anything about it

I think this already happen. A very practical example, windows GUI has been copied by many Linus distros. And with windows 11 there’s clearly a reference to Apple MacOS GUI with a sparkling of Google material design.

Should apple and Google be able to sue Microsoft because it “copied” their work? Should Google be able to sue apple because they “copied” the notification drop-down in iOS?

As you say it’s really a grey area because the only reason we consider AI code to be “regurgitated” while human code to be “inspired” is only because we give humans more recognition of their intellectual abilities.

Boinketh@lemm.ee · 2 years ago

Except that programmers get sued for taking their expertise to competitors and musical artists get sued for using similar melodies. Give everyone the freedom you want to give AI or don’t give it to anyone, but carving out an exception for AI is just plainly wrong.

nous@programming.dev · 2 years ago

Someone getting sued does not mean they are wrong or that they lost the case. Each case needs to look at the works in question and decide if that perceptual case violates copy write. Lots of things are taken into account here, and even is small elements might have been used or be similar does not automatically win the case.

There is also a difference between some implementation and the overall feature in question. For instance, APIs are not copy writeable, nor are cords in music, nor what something does overall. Only specific implementations are copy writeable.

The same can apply to AI - if it generates a work that if a human did it it would violate copy write then it does - if not then it does not. But AI shows a different problem. That of scale. There is only a limited amount of work that a human can do. But an AI can produce vastly more content - enough that a case by case evaluation of infringement might not be viable. And if that becomes the case then AI works might need to be treated differently from human created works - or maybe how the models are created and how they can use copy writed works. The current laws were never designed with the speed at which AI can work in mind.

Boinketh@lemm.ee · 2 years ago

If an AI has been trained on copyrighted material and can be shown to be capable of reproducing something close enough to said material, would that be infringement already or not? If you use a paid service like Midjourney to generate copyrighted content, the company is essentially selling you access to copyrighted content they lack the rights to.

nous@programming.dev · 2 years ago

What do you mean by infringement already? So you mean it automatically infringes copyright for all its output just because it might create something similar to a copyrighted work? Or do you mean that if it does create a copyrighted work that work in infringing on a copyright? Your wording is vague here.

can be shown to be capable of reproducing something close enough to said material

I don’t think it is a good benchmark for forbidding AI generation of content. If you create a random image generate that has no inputs and is truly random then it is capable of generating something similar to copyrighted work - by pure chance. Even if that chance is very low you could generate enough images and show it can create something similar to copyrighted works.

What happens if you create one that is trained only on public domain images or works properly licensed? Its output is still partially random and could still generate an image similar to some other copyrighted work outside of its training set by pure chance.

I would argue that both of these should be allowed. They are not doing anything obviously wrong even if they could be used to generate copyrighted works. Just like you could use photoshop - or a paint brush to create copyrighted work.

But then, what if you take some other AI that is trained on all sorts of data, copyrighted or not. But then the output of that is fed through a checker that compares it to the training set (and maybe more copyrighted content) and rejects/regenerates work until it is known to not infringe on copyrighted work. Making the chances of it ever producing a copyrighted work far less then the above programs? Should that be allowed? It is using copyrighted work much like an artist would and you could argue that any copyrighted work it does produce was by pure accident as there are intentional steps to mitigate that.

If you use a paid service like Midjourney to generate copyrighted content, the company is essentially selling you access to copyrighted content they lack the rights to.

As far as I understand the laws involved, yeah I would expect that to infringe on some copyright holders work and midjourney would likely be coppable for damages. Just like hiring a artist to create some work and they decide to copy some copyrighted work would also make that artist coppable for damages.

And you also have to consider another side of things - if you can effectively stop AI from training on most works you will effectively stunt its usefulness. Which could lead all efforts in regulated nations to become useless which can result in it just moving to places that are much more open with the technology and where authors of the copyrighted work will have far less control over things. IMO AI generated content is out of the bag now and we will not get it back in. So the best we can do is ensure the right people get compensated for their works. Push to hard in the wrong direction (either way) and there is a real chance they never will.

I don’t really have the solutions to many of these problems - but I do think it is worth talking about and don’t think that outright bans (or actions leading to an effective ban) on this tech is the correct way to go.

Boinketh@lemm.ee · edit-2 2 years ago

To be clear, my position is that copyright law should be loosened, not tightened. I know that it’s unreasonable and infeasible to limit AI like that, both for practical and competitive reasons.

When I said that it could be shown to generate copyrighted content, I didn’t mean it had a chance, I meant showing actual examples of it doing so. I also think that it should be allowed to do that, but so should everyone else. In my opinion, derivative works should almost always be allowed unless they can be proven to cause significant harm to the original creator.

Niello@kbin.social · 2 years ago

Exactly this right here.

andresil@lemm.ee · edit-2 2 years ago

Copyright law is gaslighting at this point. Piracy being extremely illegal but then this kind of shit being allowed by default is insane.

We really are living under the boot of the ruling classes.

z3rOR0ne@lemmy.ml · edit-2 2 years ago

The ruling class is seeing the end of capitalism. They’re getting desperate and making it obvious.

Gutless2615@ttrpg.network · edit-2 2 years ago

It’s not turning copyright law on its head, in fact asserting that copyright needs to be expanded to cover training a data set IS turning it on its head. This is not a reproduction of the original work, its learning about that work and and making a transformative use from it. An generative work using a trained dataset isn’t copying the original, its learning about the relationships that original has to the other pieces in the data set.

argv_minus_one@beehaw.org · 2 years ago

This is artificial pseudointelligence, not a person. It doesn’t learn about or transform anything.

acastcandream@beehaw.org · edit-2 1 year ago

spoiler

asdfasdfsadfasfasdf

jarfil@beehaw.org · edit-2 2 years ago

To take those statements seriously, you will need to:

define and describe in detail the processes by which “a person” learns
define and describe in detail how “a person” transforms anything
define and describe in detail what is “intelligence”
define and describe in detail what these “artificial paeudointelligences” are doing
define and describe in detail the differences between the latter and the previous points

Otherwise, I’ll claim that “a person” is running exactly the same processes (neural networks, LLMs, hallucinations), and that calling these AIs “artificial paeudointelligences” is nothing else than dehumanizing a minority just because you feel threatened by them.

Gutless2615@ttrpg.network · edit-2 2 years ago

Im not the one anthropomorphising the technology here.

phillaholic@lemm.ee · 2 years ago

The lines between learning and copying are being blurred with AI. Imagine if you could replay a movie any time you like in your head just from watching it once. Current copyright law wasn’t written with that in mind. It’s going to be interesting how this goes.

ricecake@beehaw.org · 2 years ago

Imagine being able to recall the important parts of a movie, it’s overall feel, and significant themes and attributes after only watching it one time.

That’s significantly closer to what current AI models do. It’s not copyright infringement that there are significant chunks of some movies that I can play back in my head precisely. First because memory being owned by someone else is a horrifying thought, and second because it’s not a distributable copy.

SkepticElliptic@beehaw.org · 2 years ago

How many movies are based on each other? It’s a lot, even if it’s just loosely based on it. If you stopped allowing that then you would run out of new things to do.

phillaholic@lemm.ee · 2 years ago

the thought of human memory being owned is horrifying. We’re talking about AI. This is a paradigm shift. New laws are inevitable. Do we want AI to be able to replicate small creators work and ruin their chances at profitability? If we aren’t careful, we are looking at yet another extinction wave where only the richest who can afford the AI can make anything. I don’t think it’s hyperbole to be concerned.

ricecake@beehaw.org · 2 years ago

The question to me is how you define what the AI is doing in a way that isn’t hilariously overbroad to the point of saying “Disney can copyright the style of having big eyes and ears”, or “computers can’t analyze images”.

Any law expanding copyright protections will be 90% used by large IP holders to prevent small creators from doing anything.

What exactly should be protected that isn’t?

acastcandream@beehaw.org · edit-2 2 years ago

Let me ask you this: do you think our brains and LLM’s are, overall, pretty distinct? This is not a trick or bait or something, I’m just going through this methodically in hopes my position - which is shared by some others in this thread it seems - is better understood.

ricecake@beehaw.org · 2 years ago

I don’t think they work the same way, but I think they work in ways that are close enough in function that they can be treated the same for the purposes of this conversation.

Pen and pencil are “the same”, and either of those and printed paper are “basically the same”.
The relationship between a typical modern AI system and the human mind is like that between a pencil written document and a word document: entirely dissimilar in essentially every way, except for the central issue of the discussion, namely as a means to convey the written word.

Both the human mind and a modern AI take in input data, and extract relationships and correlations from that data and store those patterns in a batched fashion with other data.
Some data is stored with a lot of weight, which is why I can quote a movie at you, and the AI can produce a watermark: they’ve been used as inputs a lot. Likewise, the AI can’t perfectly recreate those watermarks and I can’t tell you every detail from the scene: only the important bits are extracted. Less important details are too intermingled with data from other sources to be extracted with high fidelity.

jarfil@beehaw.org · edit-2 2 years ago

my head […] not a distributable copy.

There has been an interesting counter-proposal to that: make all copies “non-distributable” by replacing the 1:1 copying, by AI:AI learning, so the new AI would never have a 1:1 copy of the original.

It’s in part embodied in the concept of “perishable software”, where instead of having a 1:1 copy of an OS installed on your smartphone/PC, a neural network hardware would “learn how to be a smartphone/PC”.

Reinstalling, would mean “killing” the previous software, and training the device again.

MachineFab812@discuss.tchncs.de · 2 years ago

Right, because the cool part of upgrading your phone is trying to make it feel like its your phone, from scratch. Perishable software is anything but desirable, unless you enjoy having the very air you breathe sold to you.

jarfil@beehaw.org · 2 years ago

Well, depends on desirable “by whom”.

Imagine being a phone manufacturer and having all your users running a black box only you have the means to re-flash or upgrade, with software developers having to go through you so you can train users’ phones to “behave like they have the software installed”

It’s a dictatorial phone manufacturer’s wet dream.

MachineFab812@discuss.tchncs.de · 2 years ago

Yes, that’s exactly my problem with it.

jarfil@beehaw.org · edit-2 2 years ago

Imagine if you could replay a movie any time you like in your head just from watching it once.

Two points:

These AIs can’t do that; they need thousands or millions of repetitions to “learn” the movie, and every time they “replay” the movie it is different from the original.
“learning by rote” is something fleshbags can do, and are actually required to by most education systems.

So either humans have been breaking the copyright all this time, or the machines aren’t breaking it either.

phillaholic@lemm.ee · 2 years ago

You have one brain. You could have as many instances of AI as you can afford. In a general sense, it’s different, and acting like it’s not is going to hit you like a freight train if you don’t prepare for it.

jarfil@beehaw.org · edit-2 2 years ago

That’s a different goalpost. I get the difference between 8 billion brains, and 8 billion instances of the same AI. That has nothing to do with whether there is a difference in copyright infringement, though.

If you want another goalpost, that IMHO is more interesting: let’s discuss the difference between 8 billion brains with up to 100 years life experience each, vs. just a million copies of an AI with the experience of all human knowledge each.

(That’s still not really what’s happening, which is tending more towards several billion copies of AIs with vast slices of human knowledge each).

phillaholic@lemm.ee · 2 years ago

It’s all theoretical at this stage, but like everything else that society waits until it’s too late for, I think it’s reasonable to be cautious and not just let AI go unregulated.

jarfil@beehaw.org · 2 years ago

It’s not reasonable to regulate stuff before it gets developed. Regulation means establishing some limits and controls on something, which can’t be reasonably defined before that “something” even exists, much less tested or decided whether the regulation has whatever desired effects it intends.

For what is worth, a “theoretical regulation” already exists: it’s the Asimov’s Rules of Robotics. Turns out current AIs are not robots, and that regulation is nonsense when applied to stable diffusion or LLMs.

phillaholic@lemm.ee · 2 years ago

I disagree. Over the last twenty years or so we have plenty examples of things they should have been regulated from the start that weren’t, and now it’s very difficult to do so. Every “gig economy” business for example.

SokathHisEyesOpen@lemmy.ml · 2 years ago

Well fleshbags have to pay several years worth of salary to get their education, so by your comparison, Google’s AI should too.

MachineFab812@discuss.tchncs.de · 2 years ago

Imagine thinking Public Education doesn’t count. Or that no one without a college degree ever invented anything useful. That’s before we get to your notion of “College SHOULD be expensive, for everyone, always”.

The problem with education is NOT that some people pay less for theirs, or nothing at all, nor that some even have the audacity to learn quickly. AI could help everyone to have a chance to learn cheaply, even quickly.

SokathHisEyesOpen@lemmy.ml · 2 years ago

You’re just off on your own little rant now, arguing points I never even implied.

jarfil@beehaw.org · edit-2 2 years ago

That’s wrong on so many levels:

Go check the Gutenberg Project and the patent registry, come back when you’ve learned them all, they’re 100% free for everyone.
Fleshbags have to pay for “dumbed down” educational material just to have a chance at learning anything during their lifespan, AIs don’t.
The lion’s share of “paying for education” isn’t even paid for education, but for certification. AIs would have to pay the same… if any were dumb enough to spend “several years worth of salary” on some diploma.
The only part worth paying for, is “hands on experience”, which right now is far more expensive for AIs (need simulations and robots built).
Training AIs already isn’t free, they need thousands to millions of repetitions to learn the stuff, which means quite a buck in server costs.

So just because fleshbags are really bad at learning, does not mean Google’s AI has to pay for the same shortcomings, they already pay for their own.

MonsiuerPatEBrown@reddthat.com · edit-2 2 years ago

If my data is worth scraping then it is worth Google paying me for it.

How exactly did books.google.com turn out ?

modulus@lemmy.ml · 2 years ago

Worth considering that this is already the law in the EU. Specifically, the Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market has exceptions for text and data mining.

Article 3 has a very broad exception for scientific research: “Member States shall provide for an exception to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, and Article 15(1) of this Directive for reproductions and extractions made by research organisations and cultural heritage institutions in order to carry out, for the purposes of scientific research, text and data mining of works or other subject matter to which they have lawful access.” There is no opt-out clause to this.

Article 4 has a narrower exception for text and data mining in general: “Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.” This one’s narrower because it also provides that, “The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.”

So, effectively, this means scientific research can data mine freely without rights’ holders being able to opt out, and other uses for data mining such as commercial applications can data mine provided there has not been an opt out through machine-readable means.

frog 🐸@beehaw.org · 2 years ago

I think the key problem with a lot of the models right now is that they were developed for “research”, without the rights holders having the option to opt out when the models were switched to for-profit. The portfolio and gallery websites, from which the bulk of the artwork came from, didn’t even have opt out options until a couple of months ago. Artists were therefore considered to have opted in to their work being used commercially because they were never presented with the option to opt out.

So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project. I would favour a complete rebuild of the models that only draws from works that are either in the public domain or whose rights holders have explicitly opted in to their work being used for commercial models.

Basically, you can’t deny rights’ holders an ability to opt out, and then say “hey, it’s not our fault that you didn’t opt out, now we can use your stuff to profit ourselves”.

SokathHisEyesOpen@lemmy.ml · 2 years ago

The standard needs to be opt-in, not opt-out. You can’t take people’s stuff without their permission. Just because they didn’t contact you and tell you directly that you’re not allowed to take their lawn ornaments doesn’t make them free.

modulus@lemmy.ml · 2 years ago

Why not? Copyright is a monopoly. Generally society benefits from having it as weak as possible.

Pixel@lemmy.sdf.org · 2 years ago

Books will start needing to add a robots.txt page to the back of the book

sirjash@feddit.de · 2 years ago

Which will be ignored by search engines, as is tradition?

some_guy@lemmy.sdf.org · 2 years ago

… which was the style at the time.