this post was submitted on 27 Apr 2026
1171 points (97.7% liked)

Science Memes

you are viewing a single comment's thread
[–] Oriion@jlai.lu 48 points 2 days ago (10 children)

And without hallucinations ??? That sounds freaking awesome

[–] a_non_monotonic_function@lemmy.world 125 points 2 days ago (1 children)
[–] OfCourseNot@fedia.io 37 points 2 days ago (1 children)
[–] WhyIHateTheInternet@lemmy.world 18 points 2 days ago (1 children)

You're them! You're the person! Holy shit!!

[–] msage@programming.dev 8 points 2 days ago (1 children)

That's why you hate the internet???

[–] Klear@quokk.au 6 points 1 day ago (1 children)
[–] 0ops@piefed.zip 3 points 1 day ago

Sorry 'bout that

[–] Madrigal@lemmy.world 83 points 2 days ago (1 children)

Yeah they added “Don’t hallucinate” to the prompt.

[–] fartographer@lemmy.world 8 points 2 days ago

Seems like the kind of prompt a hallucination would say

[–] FiskFisk33@startrek.website 42 points 2 days ago

Have they solved the huge unsolved problem no one else has solved?

yeah, no.

[–] morto@piefed.social 68 points 2 days ago

And without hallucinations ???

Likely not

[–] iceberg314@slrpnk.net 44 points 2 days ago (2 children)

It probably uses Retrieval-Augmented Generation (RAG), which can still hallucinate but usually does a better job on niche questions, and it can even provide sources, depending on how you set it up.
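For the curious, the retrieval half of RAG can be sketched in a few lines. Everything here (the corpus, the keyword scoring, the prompt template) is invented for illustration; real systems use embedding models for retrieval and send the assembled prompt to an actual LLM:

```python
def score(query: str, chunk: str) -> int:
    """Toy relevance score: count query terms that appear in the chunk."""
    terms = set(query.lower().split())
    return sum(1 for t in terms if t in chunk.lower())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring chunks for the query."""
    return sorted(corpus, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the model."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"

# Made-up three-document corpus, just to exercise the pipeline.
corpus = [
    "CRISPR-Cas9 enables targeted gene editing in living cells.",
    "The mitochondrion is the site of oxidative phosphorylation.",
    "Gene editing with CRISPR has known off-target effects.",
]
query = "CRISPR gene editing risks"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)
```

The "can provide a source" part falls out naturally: the numbered chunks in the prompt are exactly what the model is asked to cite.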

[–] Dojan@pawb.social -5 points 2 days ago (4 children)

I hate it when people use unnecessary terms to describe something.

It’s a script that runs a search and then the LLM takes the output of that and reformats it into an answer. It’s the same as feeding it a document and having it rephrase something.

[–] TheTechnician27@lemmy.world 55 points 2 days ago* (last edited 2 days ago) (2 children)

It’s a script that runs a search and then the LLM takes the output of that and reformats it into an answer.

"I hate when people use concise, reasonably common, and understandable terminology. Why can't we just expand everything into full sentences that are also oversimplified?"

[–] Dojan@pawb.social 7 points 2 days ago (1 children)

that aren't even entirely accurate?

Point it out then.

RAG is literally just pulling in information and rewriting it. It’s the same garbage that gave us Gemini telling us to put glue on pizza to prevent the cheese from slipping off.

You can, and should, be more critical of where you source the information, but it’s not going to magically make language models actually intelligent. It’s not going to make them reason, or be able to properly select what is relevant. Just because you give it a bunch of scientific papers doesn’t mean the output will be accurate or free of misleading claims.

They’re still just token prediction engines.

[–] TheTechnician27@lemmy.world 12 points 2 days ago* (last edited 2 days ago) (1 children)

Point it out then.

Literally here. And sorry, before you posted this, I did quickly edit my comment to "oversimplified". Because technically yes, it's searching and using what it's retrieved mixed with a (modified) user prompt to generate an output. But it's searching based on a prompt (rewriting it to aid retrieval), often reranking results, stripping the query-specific context from the results into chunks, attempting to resolve contradictions between sources (which is objectively more than just rephrasing), and then synthesizing between whatever its pretraining is and what its retrieval results are (thus "retrieval-augmented generation"). That's why I amended it to "oversimplified": you're, for no explicable reason, taking well-established terminology that you think people shouldn't use (for being "unnecessary"), expanding it out to sentence-length, and even then oversimplifying the process.
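The intermediate stages mentioned above (query rewriting, reranking) can be sketched like this. These are deliberately toy versions with made-up stop words and candidates; production pipelines typically use the LLM itself to rewrite the query and a cross-encoder model to rerank:

```python
def rewrite_query(q: str) -> str:
    """Toy query rewriting: strip filler words to aid retrieval."""
    stop = {"please", "tell", "me", "about", "the"}
    return " ".join(w for w in q.split() if w.lower() not in stop)

def rerank(query: str, chunks: list[str]) -> list[str]:
    """Toy reranker: reorder an initial candidate list by term overlap."""
    terms = set(query.lower().split())
    return sorted(chunks, key=lambda c: -sum(t in c.lower() for t in terms))

q = rewrite_query("please tell me about CRISPR safety")
candidates = [
    "The history of the microscope is long.",
    "CRISPR safety concerns include off-target edits.",
]
print(rerank(q, candidates)[0])
```

Even this toy shows why it's more than "search plus rephrase": the query the retriever sees is no longer the query the user typed, and the candidate order the generator sees is no longer the search engine's order.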

[–] Dojan@pawb.social -3 points 2 days ago (1 children)

LLMs do not possess the ability to reason over the information they are fed. They convert it to numbers and perform arithmetic on it. Augmenting them with scripts won’t change the fundamental nature of how they work.

It takes information and regurgitates it. There is no analytical capability present that lets it distinguish the importance of a small aside from the main points. It can just as easily combine several separate facts into a single point, and phrase things in a way that gives a footnote as much weight as the main subject.

Hiding the actual workings behind silly marketing buzzwords serves to sensationalise what these things actually do. It feeds the AI hysteria and further muddles the discussion around them. It’s why laymen think these models are basically magic and buy into the idea that they’re somehow going to solve all our problems.

I love machine learning. It is, and has historically been a fantastic tool for plenty of tasks, but it isn’t magic.

If I implement a script to automate database migrations during application deployment I could definitely market that as Deployment Ready Database Optimisations or some other BS term, but that doesn’t make it more than a simple automation.

[–] TheTechnician27@lemmy.world 10 points 2 days ago (1 children)

LLMs do not possess the ability to reason over the information that it is fed.

Ah, yes, I forgot that if an LLM has no conscious ability to reason, then we shouldn't have any terminology to describe the general process it's using to create an output. Case closed. I'm glad you've enlightened us about how useful jargon isn't actually useful. Data goes in, data goes out; you can't explain that.

[–] Dojan@pawb.social 1 points 2 days ago* (last edited 2 days ago) (1 children)

That isn't what I said. You're doing a pretty good LLM impression yourself.

I hate it when people use unnecessary terms to describe something. Hiding the actual workings behind silly marketing buzzwords serves to sensationalise what these things actually do.

That is why I hate marketing buzzwords.

Putting an LLM to process the output of a search in a repository of scientific papers isn't going to automatically make the output useful or accurate. Papers aren't necessarily high quality just because they've been published, just look at the garbage that Lisa Littman, Kenneth Zucker, and their ilk have shat out over the decades.

An LLM, no matter how many scripts or cleverly written prompts you augment it with, will never be able to differentiate good science from bad, and will just as easily give equal credence to garbage papers as it will to actual quality ones. That's a problem, without "hallucinations" even entering the picture.

Edit: I think the overall idea of the site is awesome, knowledge should be freely available. I just don't see the value add that an LLM provides. I only see problems with it.

[–] TheTechnician27@lemmy.world 5 points 2 days ago* (last edited 2 days ago)

Putting an LLM to process the output of a search in a repository of scientific papers isn't going to automatically make the output useful or accurate. Papers aren't necessarily high quality just because they've been published.

For someone who likes to get riled up about people not responding to "what you said", this whole tangent about the accuracy of RAG and the fact scientific papers aren't automatically 100% reliable is pretty hilarious. Literally nobody was arguing that it makes it "automatically useful or accurate" or that published papers are "necessarily high-quality".

You're genuinely acting like you're taking issue with terminology describing a process because that process isn't perfect. "RAG" adequately describes a general technique to improve the accuracy of an LLM's output, and all you're doing now is pissing and moaning that "um, just because it's published doesn't mean it's high-quality", which has categorical fuck-all to do with the usefulness of the term.

We'll continue to use it, and you're welcome to continue being annoyed by it.

PS: I write material that LLMs are trained on as a hobby; sorry if it annoys you that my writing style is coincidentally similar.

[–] BigDiction@lemmy.world 2 points 2 days ago

Need the deets asap with all that hot tea low key context? Get on the RAG!

Pre-order access for $5.99/USD month for your first 12 months. You know the next one comin’ soon!

[–] psycotica0@lemmy.ca 17 points 2 days ago

Sure, but RAG has a Wikipedia article about the specifics of the process, history of its use, links to papers and articles about it and its advantages and drawbacks. It's also useful as a feature on a matrix for comparing one tool or model's capabilities to another. None of that is true of the sentence.

Virtually all of computing could be reduced to voltages across terminals changing over time, but it can still be useful to give specific terms to specific applications of this process, so we have something to talk about.

[–] Not_mikey@lemmy.dbzer0.com 10 points 2 days ago

Retrieval augmented generation

is way easier to search than:

a script that runs a search and then the LLM takes the output of that and reformats it into an answer.

So if people want to look into it further and research what it is, instead of taking some person's one-sentence explanation, they can.

Ironically, trying to search for that phrase would work better in a RAG than in a standard keyword search.

[–] Fmstrat@lemmy.world 3 points 2 days ago

So... Search.. Assisted.. Generation?

RAG is a name from a research paper that very accurately describes what happens, but your argument seems to say you just don't like acronyms.

[–] expr@piefed.social 20 points 2 days ago

Obviously not, because that's not possible.

[–] takeda@lemmy.dbzer0.com 1 points 1 day ago

LOL, of course not.

Speaking of hallucinations, I think the best way to see them is to go to Google Gemini (Reddit is selling it access to Reddit posts) and start a conversation about a Reddit account you have, acting as if you don't know anything about it. It usually starts well, but as it progresses you can see how it's making shit up. The more you ask, the more insane it gets.

And this is with it supposedly having all the comments at its disposal.

I also tried Lemmy, as I'm sure they're indexing it too. It told me that I'm actually the admin who created Lemmy.dbzer0.com.

[–] DarrinBrunner@lemmy.world 10 points 2 days ago

What fun would that be?

[–] Atelopus-zeteki@fedia.io 7 points 2 days ago (1 children)

I'll keep the hallucinations for myself, tyvm.

Per sci-hub.ru this has been available since March 6th.

"Hear the good news: recent advances in artificial intelligence enabled Sci-Hub to launch a robot that gives scientifically-grounded responses to questions. The robot starts with searching for relevant literature in Sci-Hub database, then turns to selecting and reading most recent studies, and composes the answer based on this information. The answer includes all the references, and each referenced article can be read on Sci-Hub with one click.

Unlike question-answering robots that were based upon the early generation of neural networks, the Sci-Hub bot does not hallucinate: it does not make up scientific facts and does not cite sources that do not exist. To support its statements, Sci-Bot uses articles from the Sci-Hub database. Questions can be asked in any language, and answers can be saved on the server and shared.

The alpha version only supports answering one question; a more advanced variation that supports conversation mode is coming soon. The right column displays example questions that have been answered by the robot. Click a question to see the generated answer."

[–] Oriion@jlai.lu 9 points 2 days ago (1 children)

Thanks for doing what I should have done. I actually read that and thought it sounded great. The claim of "no hallucinations" should of course be taken with a grain of salt, as other comments have pointed out.

[–] Atelopus-zeteki@fedia.io 2 points 1 day ago (1 children)

Sci-Hub has been an invaluable resource. I posted a question yesterday at work. There was a queue, and it was time to leave, so I'll see what the result was when I get over there. I've avoided using AI, but this was too tempting. My question was in an area where I have some knowledge, so I'm hoping I'll be able to spot any problems in the reply.

[–] Oriion@jlai.lu 1 points 1 day ago

I'd be interested in hearing your feedback!

[–] IrateAnteater@sh.itjust.works 2 points 2 days ago (1 children)

From what I understand of the sales brochure, these types of "AI" that are grounded in highly curated data are far less prone to hallucinations.

[–] sobchak@programming.dev 3 points 2 days ago

I doubt it's fine-tuned; it's likely just one of the open-weight LLMs with RAG. I've done similar things, and they don't really work as well as I'd like (the most relevant chunks of text aren't always ranked highest or closest in embedding distance, and the models still hallucinate sometimes).
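The "embedding distance" ranking works roughly like this. The 3-d vectors below are made up purely to illustrate the ordering; real embedding models produce vectors with hundreds or thousands of dimensions, and this is exactly where relevance can go wrong, since "closest vector" is not the same thing as "most relevant chunk":

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: smaller means the vectors point more alike."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Invented "embeddings" for a query and two candidate chunks.
query_vec = [1.0, 0.2, 0.0]
chunks = {
    "chunk A": [0.9, 0.3, 0.1],
    "chunk B": [0.1, 0.9, 0.2],
}

# Rank chunks by distance to the query; the nearest is used first for RAG.
ranked = sorted(chunks, key=lambda name: cosine_distance(query_vec, chunks[name]))
print(ranked)
```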