this post was submitted on 22 Aug 2023

OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series::A new research paper laid out ways in which AI developers should try and avoid showing LLMs have been trained on copyrighted material.

[–] uriel238@lemmy.blahaj.zone 2 points 2 years ago* (last edited 2 years ago)

Training AI on copyrighted material is no more illegal or unethical than training human beings on copyrighted material (from library books or borrowed books, no less!). And trying to challenge the legitimacy of generative AI systems on the notion that they were trained on copyrighted material only raises the specter that IP law has lost its validity as a public good.

The only valid concern about generative AI is that it could displace human workers (or swap out skilled jobs for menial ones) which is a problem because our society recognizes the value of human beings only in their capacity to provide a compensation-worthy service to people with money.

The problem is this is a shitty, unethical way to determine who gets to survive and who doesn't. All the current controversy about generative AI does is kick this can down the road a bit. But we're going to have to address soon that our monied elites will be glad to dispose of the rest of us as soon as they can.

Also, amateur creators are as good as professionals, given the same resources. Maybe we should look at creating content by other means than for-profit companies.

[–] rosenjcb@lemmy.world 2 points 2 years ago* (last edited 2 years ago)

The powers that be have done a great job convincing the layperson that copyright is about protecting artists and not publishers. It's historically inaccurate and you can discover that copyright law was pushed by publishers who did not want authors keeping second hand manuscripts of works they sold to publishing companies.

Additional reading: https://en.m.wikipedia.org/wiki/Statute_of_Anne

[–] paraphrand@lemmy.world 1 points 2 years ago (5 children)

Why are people defending a massive corporation that admits it is attempting to create something that will give them unparalleled power if they are successful?

[–] CosmicCleric@lemmy.world 1 points 2 years ago

Because ultimately, it's about the truth of things, and not what team is winning or losing.

[–] Whimsical@lemmy.world 1 points 2 years ago

The dream would be that they manage to make their own glorious free & open source version, so that after a brief spike in corporate profit as they fire all their writers and artists, suddenly nobody needs those corps anymore because EVERYONE gets access to the same tools - if everyone has the ability to churn out massive content without hiring anyone, that theoretically favors those who never had the capital to hire people to begin with, far more than those who did the hiring.

Of course, this stance doesn't really have an answer for any of the other problems involved in the tech, not the least of which is that there are bigger issues at play than just "content".

[–] Crozekiel@lemmy.zip 1 points 2 years ago

AI is the new fan boy following since it became official that nfts are all fucking scams. They need a new technological God to push to feel superior to everyone else...

[–] bamboo@lemm.ee 1 points 2 years ago (2 children)

Mostly because fuck corporations trying to milk their copyright. I have no particular love for OpenAI (though I do like their product), but I do have great disdain for already-successful corporations that would hold back the progress of humanity because they didn't get paid (again).

[–] msage@programming.dev 1 points 2 years ago (3 children)

But OpenAI will do the same?

[–] assassin_aragorn@lemmy.world 1 points 2 years ago

There's a massive difference though between corporations milking copyright and authors/musicians/artists wanting their copyright respected. All I see here is a corporation milking copyrighted works by creative individuals.

[–] uzay@infosec.pub 1 points 2 years ago (12 children)

I hope OpenAI and JK Rowling take each other down

[–] Skanky@lemmy.world 1 points 2 years ago

Vanilla Ice had it right all along. Nobody gives a shit about copyright until big money is involved.

[–] afraid_of_zombies@lemmy.world 1 points 2 years ago

I am sure they have patched it by now, but at one point I was able to get ChatGPT to give me copyrighted text from books by asking for ever larger quotations. It seemed more willing to do this with books that were out of print.

[–] Default_Defect@midwest.social 1 points 2 years ago

They made it read Harry Potter? No wonder it's gonna kill us all one day.

[–] ClamDrinker@lemmy.world 1 points 2 years ago* (last edited 2 years ago)

This is just OpenAI covering their ass by attempting to block the most egregious and obvious outputs in legal gray areas, something they've been doing for a while, hence why their AI models are known to be massively censored. I wouldn't call that 'hiding'. It's kind of hard to hide it was trained on copyrighted material, since that's common knowledge, really.

[–] TropicalDingdong@lemmy.world 1 points 2 years ago (2 children)

It's a bit pedantic, but I'm not really sure I support this kind of extremist view of copyright and the scale of what's being interpreted as 'possessed' under the idea of copyright. Once an idea is communicated, it becomes a part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator's intention. It's like the issues with sampling beats or records in the early days of hip-hop. It's like the very principle of an idea goes against this vision; rather, once you put something out into the commons, it's irretrievable. It's not really yours any more once it's been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don't control how the idea is interpreted, so it's not really yours any more.

Whether that's ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is a very real but very silly and obvious malady of this weirdly accepted but very extreme view of the ability to possess an idea.

[–] RadialMonster@lemmy.world 1 points 2 years ago (2 children)

What if they scraped a whole lot of the internet, and those excerpts were in random blogs and posts and quotes and memes etc. all over the place? They didn't ingest the material directly, or knowingly.

[–] beetus@sh.itjust.works 1 points 2 years ago

Not knowing something is a crime doesn't stop you from being prosecuted for committing it.

It doesn't matter if someone else is sharing copyrighted works and you don't know it and use them in ways that infringe on that copyright.

"I didn't know that was copyrighted" is not a valid defence.

[–] fubo@lemmy.world 1 points 2 years ago* (last edited 2 years ago) (6 children)

If I memorize the text of Harry Potter, my brain does not thereby become a copyright infringement.

A copyright infringement only occurs if I then reproduce that text, e.g. by writing it down or reciting it in a public performance.

Training an LLM from a corpus that includes a piece of copyrighted material does not necessarily produce a work that is legally a derivative work of that copyrighted material. The copyright status of that LLM's "brain" has not yet been adjudicated by any court anywhere.

If the developers have taken steps to ensure that the LLM cannot recite copyrighted material, that should count in their favor, not against them. Calling it "hiding" is backwards.
