this post was submitted on 19 Jan 2026

204 points (100.0% liked)


NVIDIA Contacted Anna’s Archive to Secure Access to Millions of Pirated Books (torrentfreak.com)

submitted 4 months ago* (last edited 4 months ago) by hexagonwin@lemmy.sdf.org to c/piracy@lemmy.dbzer0.com


damn i really hope they stay. this right after their spotify crawl and domain suspension doesn't inspire hope.


[–] abbiistabbii@lemmy.blahaj.zone 51 points 4 months ago (1 children)

I don't know why, but I find it absolutely hilarious that Nvidia, a company currently in the AI business and notorious for not giving a shit about copyright, is just straight up going to Anna's Archive.

[–] halcyoncmdr@piefed.social 18 points 4 months ago (1 children)

Nah, it's pretty simple actually. If the archive doesn't exist at all, they can't even steal from it.

[–] Whostosay@sh.itjust.works 14 points 4 months ago

Fucking Schroedinger's copyright

[–] hexagonwin@lemmy.sdf.org 33 points 4 months ago

Anna's new comment on this matter from reddit.

[–] B0rax@feddit.org 29 points 4 months ago (1 children)

Isn’t that what Anna’s Archive is looking for? They even have a separate page for exactly that use case: https://annas-archive.li/llm

[–] mrmaplebar@fedia.io 7 points 4 months ago (2 children)

Maybe I'm missing something, but I'm confused how they can promise "high speed access" to the data while also claiming:

We do not host any copyrighted materials here. We are a search engine, and as such only index metadata that is already publicly available. When downloading from these external sources, we would suggest to check the laws in your jurisdiction with respect to what is allowed. We are not responsible for content hosted by others.

Do they have the data or do they not have it?

They also claim to be able to do things like extract text and deduplicate the data... That seems to suggest a significant amount of storage and compute power for a non-profit that has only been around for ~3 years.

I find this entire thing fishy as fuck. Call me a conspiracy theorist, but I'm not convinced that this data theft operation doesn't exist simply to be an illicit data broker for AI companies. And now there is direct evidence tying both Anthropic and NVidia to them.

[–] hexagonwin@lemmy.sdf.org 8 points 4 months ago

i think they mean they'll provide direct access to data hosted by "third parties" (torrents?), without the captchas and throttling/rate limiting present when normally using the anna's archive website

they're asking for text extraction and dedup in exchange for providing datasets. at least publicly they claim this whole project is aimed at data preservation and wide access. they're mostly aggregating/collecting data from other shadow libraries, and even if they have malicious(?) intent, i'd say they're a net positive since their code and data are mostly(?) open sourced.

[–] B0rax@feddit.org 3 points 4 months ago

No no, they're asking for deduplication and text extraction in exchange for access.

[–] BlueSquid0741@lemmy.sdf.org 28 points 4 months ago

Anna’s Archive is the perfect place to find specific translations of ebooks. Something I hadn’t thought of the need for until recently.

[–] Almacca@aussie.zone 21 points 4 months ago* (last edited 3 months ago)

Have you seen the quality of some of those OCR scans? I'm reading the Stainless Steel Rat books from Anna's Archive right now, and the number of errors is ridiculous, and it's not an isolated case. Pretty much every one I've read had at least a few. Good luck getting decent training data from them.

[–] phpinjected@lemmy.sdf.org 8 points 4 months ago

simply a data grab for their ai training sets.

[–] nil@piefed.ca 1 points 4 months ago

Pirating books is helping AI... this is going to cause some double standards