
theunknownmuncher@lemmy.world 13 points 2 months ago

This makes no sense: the snapshots are updated regularly, and Wikipedia isn't even that big. The full dump is only about 25 GB.
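
For reference, the full snapshots are published at dumps.wikimedia.org, so getting all of English Wikipedia is literally one download, no crawling required. A minimal sketch (the chunk size and output filename are just illustrative):

```python
# Minimal sketch: download the latest English Wikipedia dump instead of scraping pages.
# This is the public pages-articles dump; it's roughly 25 GB compressed.
import requests

DUMP_URL = "https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2"

def download_dump(url: str = DUMP_URL, dest: str = "enwiki-latest.xml.bz2") -> None:
    """Stream the dump to disk in chunks so memory use stays flat."""
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                out.write(chunk)

if __name__ == "__main__":
    download_dump()
```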

Gullible@sh.itjust.works 19 points 2 months ago (last edited 2 months ago)

The answer is simpler than you could ever conceive. Companies piloted by incompetent, selfish pricks are just scraping the entire internet to grab every niblet of data they can. Writing code to do what they’re doing in a less destructive fashion would require effort that they are entirely unwilling to put in. If that weren’t the case, the overwhelming majority of scrapers wouldn’t ignore robots.txt files. I hate AI companies so fucking much.
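
And honoring robots.txt takes almost no code; Python ships a parser for it in the standard library. A minimal sketch (the user agent string and target URL are placeholders):

```python
# Minimal sketch: consult robots.txt before fetching a page.
# Standard library only; the user agent and target URL below are placeholders.
from urllib import robotparser
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

USER_AGENT = "ExampleCrawler/0.1"  # hypothetical bot name

def allowed_to_fetch(url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True only if the site's robots.txt permits this user agent to fetch the URL."""
    parts = urlsplit(url)
    parser = robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()  # download and parse robots.txt
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    target = "https://en.wikipedia.org/wiki/Main_Page"  # example target
    if allowed_to_fetch(target):
        req = Request(target, headers={"User-Agent": USER_AGENT})
        with urlopen(req) as resp:
            print(resp.status, "fetched", len(resp.read()), "bytes")
    else:
        print("robots.txt disallows this fetch; skipping.")
```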

pivot_root@lemmy.world 8 points 2 months ago

"robots.txt files? You mean those things we use as part of the site index when scraping it?"

— AI companies, probably