this post was submitted on 20 Jun 2026

Low-skilled attacker used Claude, Codex to breach 14 companies (www.helpnetsecurity.com)

submitted 4 days ago by sanitation@lemmy.today to c/technology@lemmy.world

[–] beveradb@sh.itjust.works 16 points 4 days ago (3 children)

Most people on Lemmy seem to condemn use of LLMs in any way for anything. I wonder what those folks' opinion of this stance is: should companies use the tools or not?

[–] village604@adultswim.fan 18 points 4 days ago (2 children)

Cybersecurity is actually one of the few fields that can benefit from AI. There are companies like Horizon3 who are using it alongside their other threat models to do continuous pen testing.

[–] Chronographs@lemmy.zip 15 points 4 days ago (1 children)

Yeah imo the one thing ai is legitimately useful for is finding answers to difficult problems that can be trivially verified as correct.

[–] MalReynolds@slrpnk.net 6 points 4 days ago

In this case hallucinations actually help...

[–] Duke_Nukem_1990@feddit.org 0 points 4 days ago (1 children)

Gonna take a guess here that what is used in cybersecurity is not LLMs but one of the more useful machine learning applications. Just a nitpick cause today "ai" and "LLM" are sadly synonymous.

[–] boonhet@sopuli.xyz 10 points 4 days ago (1 children)

No, LLMs can definitely be useful for cyber too. It's the whole reason the US government banned Claude Fable for export.

An LLM can not only try existing exploits like a script kiddie; with iteration it can also try variations and, if you know what runs on the server, inspect the source for potential exploits.

They can also look at your setup and say what issues they see (reverse proxy config, etc).

Doesn't replace an expert, but can be useful for a first pass before you get the highly paid people involved.

[–] Duke_Nukem_1990@feddit.org 4 points 4 days ago (1 children)

You know what, fair enough. I don't know enough about that particular one.

[–] boonhet@sopuli.xyz 1 points 2 days ago

I do. I reverse engineered some proprietary software using an agent. A pro could've maybe done it faster, but I did it AFK with little knowledge about reverse engineering.

An agent could similarly try tons of attacks against online targets. Fairly sure some are doing it.

[–] marzhall@lemmy.world 6 points 3 days ago (2 children)

Security research has used "fuzzing", where you send completely random payloads, as a tactic for finding holes in software for quite a while (and it has found exploits). LLMs just seem like "educated" fuzzing; I don't see why anyone would complain about updating your suite with them.
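
For anyone who hasn't seen it, random-payload fuzzing can be sketched in a few lines. This is only a toy illustration: `parse_record` is a made-up target with a deliberately planted bug, and `fuzz` is a hypothetical helper, not any real fuzzing tool's API.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target (hypothetical): first byte is a length, the rest is the payload."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:]
    # Planted bug: the length byte is trusted without checking the payload size.
    return bytes(payload[i] for i in range(length))


def fuzz(target, rounds=1000, max_len=8, seed=0):
    """Send random payloads to target; collect inputs that raise unexpected errors."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        size = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass  # expected rejection of malformed input, not a finding
        except Exception as exc:
            # Anything else is a crash, and the input is its reproducer.
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record)
    print(f"{len(found)} crashing inputs found")
```

Every entry in `crashes` comes with the exact input that triggered it, so each finding is trivially reproducible.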

[–] borari@lemmy.dbzer0.com 3 points 3 days ago* (last edited 3 days ago)

I’ve been fucking around with using Claude to solve CTF challenges. I’m using a harness built out of a custom agent I wrote that progressively loads a specific skill for the challenge category: cryptography, binary exploitation, reverse engineering, forensics, etc.
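
A rough sketch of the per-category loading idea, assuming challenges live under paths shaped like `./category/challenge/`. The `SKILLS` table, the file names, and both functions are hypothetical placeholders for illustration, not the actual harness.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from category directory to a skill file.
SKILLS = {
    "crypto": "skills/cryptography.md",
    "pwn": "skills/binary_exploitation.md",
    "rev": "skills/reverse_engineering.md",
    "forensics": "skills/forensics.md",
}


def category_of(challenge_path: str) -> str:
    """Return the category directory from a path like ./category/challenge/."""
    parts = PurePosixPath(challenge_path).parts
    if len(parts) < 2:
        raise ValueError(f"expected ./category/challenge/, got {challenge_path!r}")
    return parts[-2]


def skills_for(challenge_path: str) -> list[str]:
    """Pick only the skill matching the challenge category to keep context small."""
    skill = SKILLS.get(category_of(challenge_path))
    return [skill] if skill else []


if __name__ == "__main__":
    print(skills_for("./crypto/rsa1/"))
```

Loading one skill per challenge instead of all of them is what keeps the context and token use down.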

It’s solving the simple shit in <1m using sonnet. In ~20 minutes it’s solved some shit that I couldn’t figure out at all within the time limit we had during the CTF. There have been 2 challenges where, after about 25 minutes, I killed the agent working on them and changed to opus, and opus then solved them in about 20m. One crypto challenge was so math heavy I never would have figured it out. One bin exp challenge didn’t provide a local binary; everything was remote. There was a catch that I never would have solved bc it was remote only and I couldn’t locally debug it.

It’s fucking scary good at solving these things. I just prompt with “use to solve ./category/challenge/” and it fully just does everything. It’s definitely akin to a fuzzer that can be used for way more than just finding crashes and memory leaks. It takes some work and understanding to make it context/token efficient I think, but it lowers the bar so tremendously that I definitely see why there’s concern here. And again, it’s solving most of these things with sonnet, not even opus and definitely not fable.

All told, this feels like the same panic that happened when metasploit first got released/demo’d at defcon back in the day.

[–] ozymandias117@lemmy.world 3 points 3 days ago

As long as they produce a PoC like fuzzing tools, I don't think anyone is complaining

The problem with slop reports is the theoretical attacks that nearly always turn out to be impossible, wasting time and making it harder to find the real issues that need investigation.

[–] DeadDigger@lemmy.zip 5 points 4 days ago (1 children)

Well, the problem is that curl, for example, got flooded with generated security reports where only 5% had some true security potential. So your LLM will basically flood you with false positives.

[–] ByteJunk@lemmy.world 5 points 4 days ago (2 children)

If 5% of the reports are genuine security vulnerabilities that they wouldn't have found otherwise, that's looking like a big win to me, not sure how you see it differently.

[–] DeadDigger@lemmy.zip 1 points 3 days ago

No, 5% is very low compared to before AI, and it still does not mean the absolute number of bugs found has risen. From my understanding, it didn't for curl. Further, it is unlikely that bugs in curl would otherwise go unfound. Basically everything works with curl and it has a paid bug bounty program, so a lot of security researchers are looking at it.

[–] frongt@lemmy.zip 4 points 4 days ago (1 children)

The problem is identifying which 5%. Nobody wants to filter that much AI slop.

[–] AwesomeLowlander@sh.itjust.works 9 points 4 days ago (3 children)

If you're working for a company's cybersec, that's your job. And a much preferable one to waiting for an attacker to do it for you.

[–] borari@lemmy.dbzer0.com 6 points 3 days ago

If you’re submitting a vulnerability to a public repo, that’s also your job. These slop reports that are wasting maintainers’ time should never have been submitted. The person tasking the LLM is out of their depth and can’t be the human in the loop who verifies the vulnerability report before submitting, because they don’t have the required knowledge to do that. It’s a shame, because if people who had the requisite knowledge were the ones submitting, the ratio of valid reports to noise would be way higher than 5% and open source maintainers wouldn’t be feeling burned the fuck out.

[–] ByteJunk@lemmy.world 5 points 3 days ago

Exactly. If you go through 100 tickets and find 5 real vulnerabilities to patch, that sounds incredibly good...

[–] frongt@lemmy.zip -1 points 3 days ago

Sure, but nobody wants to do that, even at fair pay. Unpaid open source volunteer projects REALLY don't want to do that, and risk burning out what is typically a solo main dev.