TechTakes

2577 readers

49 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

MODERATORS

dgerard@awful.systems

110

You can’t feed generative AI on ‘bad’ data then filter it for only ‘good’ data (pivot-to-ai.com)

submitted 1 year ago by dgerard@awful.systems to c/techtakes@awful.systems

57 comments fedilink hide all child comments

video version

you are viewing a single comment's thread
view the rest of the comments

[–] dgerard@awful.systems 12 points 1 year ago (1 children)

why did you post literally just the text from the article

[–] homesweethomeMrL@lemmy.world -5 points 1 year ago (1 children)

It's just a section. There's more of the article.

Like this:

Another day, another preprint paper shocked that it’s trivial to make a chatbot spew out undesirable and horrible content. [arXiv]

How do you break LLM security with “prompt injection”? Just ask it! Whatever you ask the bot is added to the bot’s initial prompt and fed to the bot. It’s all “prompt injection.”

An LLM is a lossy compressor for text. The companies train LLMs on the whole internet in all its glory, plus whatever other text they can scrape up. It’s going to include bad ideas, dangerous ideas, and toxic waste — because the companies training the bots put all of that in, completely indiscriminately. And it’ll happily spit it back out again.

There are “guard rails.” They don’t work.

One injection that keeps working is fan fiction — you tell the bot a story, or tell it to make up a story. You could tell the Grok-2 image bot you were a professional conducting “medical or crime scene analysis” and get it to generate a picture of Mickey Mouse with a gun surrounded by dead children.

Another recent prompt injection wraps the attack in XML code. All the LLMs that HiddenLayer tested can read the encoded attack just fine — but the filters can’t. [HiddenLayer]

I’m reluctant to dignify LLMs with a term like “prompt injection,” because that implies it’s something unusual and not just how LLMs work. Every prompt is just input. “Prompt injection” is implicit — obviously implicit — in the way the chatbots work.

The term “prompt injection” was coined by Simon WIllison just after ChatGPT came out in 2022. Simon’s very pro-LLM, though he knows precisely how they work, and even he says “I don’t know how to solve prompt injection.” [blog]

[–] dgerard@awful.systems 15 points 1 year ago (1 children)

Yes, I know, I wrote it. Why do you consider this useful to post here?

[–] homesweethomeMrL@lemmy.world -5 points 1 year ago (2 children)

Well, I don't think that last part was useful, but I do think the previous part was useful as a way to focus conversation. Many people don't read the article, and I thought that was the most relevant section.

[–] blakestacey@awful.systems 9 points 1 year ago

Good grief. At least say "I thought this part was particularly interesting" or "This is the crucial bit" or something in that vein. Otherwise, you're just being odd and then blaming other people for reacting to your being odd.

[–] swlabr@awful.systems 5 points 1 year ago (1 children)

Actually I’m finding this quite useful. Do you mind posting more of the article? I can’t open links on my phone for some reason

[–] homesweethomeMrL@lemmy.world -5 points 1 year ago

Actually this comm seems really messed up, so I'mma just block it and move on. Sorry for ruffling your feathers, guv.