this post was submitted on 21 May 2026

160 points (99.4% liked)

Fuck AI

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.

founded 2 years ago

Aggressive AI scrapers are making it kinda suck to run wikis (weirdgloop.org)

submitted 1 month ago by Itwasntme223@discuss.online to c/fuck_ai@lemmy.world

[–] CapuccinoCoretto@lemmy.world 50 points 1 month ago (5 children)

One thing I want to see is poisoned wells. When you detect scrapers, don't stop them, feed them pseudo content designed to COST them. Make their training data poisonous and damaging. Make it cost them to purge it, and difficult and expensive to identify it.

[–] solxix@pawb.social 11 points 1 month ago (1 children)

https://iocaine.madhouse-project.org/

[–] other_cat@piefed.zip 1 points 1 month ago

I was looking into this today, trying to figure out how to make it work with Docker Compose, but sadly had a hell of a time. I'll take another crack at it some other day. Fingers crossed!

[–] Agent641@lemmy.world 11 points 1 month ago (1 children)

We need to host the data version of asbestos. Very appealing and useful, a miracle material in fact, and you don't realise until 30 years later and well after it's too late that it's causing an incurable disease in your lungs.

Get that poisonous data so deep in the databases of these AIs that it festers and spawns billions of tumors.

I wish I was smart enough to devise a practical way to weaponise data like this.

[–] MousePotatoDoesStuff@piefed.social 1 points 1 month ago

Misinformation?

E.g. "Asbestos is good for your diet"

[–] TheOctonaut@piefed.zip 10 points 1 month ago (3 children)

Unless a significant portion of the internet does this, and we're talking hundreds of millions of pages, the only cost here is to you.

LLMs are statistics. They don't "remember" their training. They just know what statistically speaking the next words should be. But sure, be the web dev version of þorn guy.
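To make the "statistically speaking the next words" point concrete, here is a toy sketch. It is a plain bigram counter, which is vastly simpler than an actual LLM, and the corpus and function names are invented for illustration. It only shows that a single contrary sentence does not change the most frequent continuation.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def most_likely_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up corpus: two "normal" sentences and one odd one.
corpus = "cheese goes on pizza . cheese goes on pizza . glue goes on paper ."
model = train_bigrams(corpus)
print(most_likely_next(model, "on"))  # pizza (seen twice, versus paper once)
```

Real models are not simple counters, so this says nothing about how much targeted content would be needed to shift one.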

[–] ATPA9@feddit.org 7 points 1 month ago (1 children)

Remember the glue on pizza? Sometimes it takes just one stupid post somewhere to poison an LLM.

[–] TheOctonaut@piefed.zip 4 points 1 month ago

Glue on pizza was the result of an early version of an agent tool: built-in search. It wasn't an output of the LLM model (yes I know, ATM machine) itself. It was an LLM using a tool to find a search result from a site considered reputable (yes, I know) and presenting it to the user as fact. That's an instructions problem, not a statistical one.

[–] nlgranger@lemmy.world 1 points 1 month ago

That is not entirely true in theory. It is possible to engineer content to have a disproportionate impact on model performance. But we are talking state-of-the-art research, and it's a moving target since the models evolve quite fast.

[–] CapuccinoCoretto@lemmy.world 1 points 1 month ago (1 children)

So training data suddenly doesn't matter? Disagree. And yes, a significant portion of sources should do this.

[–] TheOctonaut@piefed.zip 1 points 1 month ago* (last edited 1 month ago) (1 children)

I don't think you understand the scale of the amount of data that has been fed into these models. Already fed in, as in the models are already created, the baseline already established, the dataset responsible for the output they want already retained.

Any attempt to "poison" them is attempting to add one, ten, a thousand, a million confounding data points against every webpage 1993-2026, every book ever digitised, every social media post made public, every transcript of every video on YouTube, every code comment made public, every post on this federated platform.

For news articles alone, that's about 20 billion non-poisoned articles. Do you know what the difference between a million poisoned pages and 20 billion is? 20 billion.

The Daily Mail (vomit) alone publishes 1,500 articles a day. How many do you plan on publishing?
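As a rough check on the scale argument, here is a minimal sketch that uses only the figures quoted in this comment (20 billion news articles, a million poisoned pages, 1,500 articles a day). The figures themselves are not verified here, and the names are made up for illustration.

```python
# Figures as quoted in the comment above; not independently verified.
CLEAN_ARTICLES = 20_000_000_000   # "about 20 billion non-poisoned articles"
POISONED_PAGES = 1_000_000        # "a million poisoned pages"
ARTICLES_PER_DAY = 1_500          # quoted Daily Mail publishing rate

def poisoned_share(poisoned: int, clean: int) -> float:
    """Fraction of the combined corpus that is poisoned."""
    return poisoned / (poisoned + clean)

def days_to_publish(pages: int, per_day: int) -> float:
    """Days needed to publish `pages` at a steady `per_day` rate."""
    return pages / per_day

share = poisoned_share(POISONED_PAGES, CLEAN_ARTICLES)
days = days_to_publish(POISONED_PAGES, ARTICLES_PER_DAY)
print(f"poisoned share of corpus: {share:.4%}")
print(f"days to publish at that rate: {days:.0f}")
```

On those numbers the poisoned share comes to roughly 0.005% of the news corpus alone, and producing a million pages at that publishing rate would take about 667 days.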

[–] CapuccinoCoretto@lemmy.world 1 points 1 month ago (1 children)

I don't think you understand how outdated most information gets.

[–] TheOctonaut@piefed.zip 1 points 1 month ago (1 children)

Ok, suppose that I've made it to my 40s without realising that time is in linear motion.

Explain to me what relevance that has to LLMs?

[–] CapuccinoCoretto@lemmy.world 1 points 1 month ago

I'm sorry, I don't like red herring. I never know what whine to pair with it.

[–] hansolo@lemmy.today 8 points 1 month ago

I really want a tutorial on how to do this. I think it's a great way to practice self-aggrandizement by making myself the pretend king of a pretend country.

[–] Droopy@programming.dev 3 points 1 month ago* (last edited 1 month ago) (1 children)

omgawd yes... how do people do this

[–] CapuccinoCoretto@lemmy.world 3 points 1 month ago

Basically A/B testing on a live site where B is poison.
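For anyone asking how, here is a minimal sketch of that A/B idea, not a tested defence. It assumes detection by User-Agent substring, which scrapers can trivially spoof, and every name here (`SUSPECT_AGENT_MARKERS`, `decoy_page`, `choose_variant`) is hypothetical. Regular visitors get variant A, the real page; suspected scrapers get variant B, generated filler.

```python
import random

# Illustrative substrings only; real scrapers often spoof or omit these.
SUSPECT_AGENT_MARKERS = ("gptbot", "ccbot", "scrapy")

# Tiny made-up vocabulary for the decoy text.
DECOY_WORDS = ["wiki", "dragon", "recipe", "quantum",
               "ledger", "moss", "tariff", "violin"]

def looks_like_scraper(user_agent: str) -> bool:
    """Naive check: does the User-Agent contain a suspect marker?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in SUSPECT_AGENT_MARKERS)

def decoy_page(path: str, n_words: int = 50) -> str:
    """Build filler text, seeded by the path so each URL is stable."""
    rng = random.Random(path)
    return " ".join(rng.choice(DECOY_WORDS) for _ in range(n_words))

def choose_variant(user_agent: str, path: str, real_page: str):
    """Return ("A", real page) for visitors, ("B", decoy) for scrapers."""
    if looks_like_scraper(user_agent):
        return "B", decoy_page(path)
    return "A", real_page

print(choose_variant("Mozilla/5.0 (X11; Linux) Firefox/126.0",
                     "/wiki/Main_Page", "<real page>")[0])
print(choose_variant("Mozilla/5.0 (compatible; GPTBot/1.0)",
                     "/wiki/Main_Page", "<real page>")[0])
```

Random word salad like this is probably easy to filter out of a training set, so a dedicated tool such as the one linked upthread may be a better starting point. Generating decoys also costs the host CPU and bandwidth, which is the trade-off raised earlier in the thread.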