this post was submitted on 03 Mar 2026

Researchers tested different medical scenarios with the chatbot. In more than half of cases in which doctors would send patients to the ER, the chatbot said it was OK to delay care.

ChatGPT Health, OpenAI's new health-focused chatbot, frequently underestimated the severity of medical emergencies, according to a study published last week in the journal Nature Medicine.

In the study, researchers tested ChatGPT Health's ability to triage, or assess the severity of, medical cases based on real-life scenarios.

Previous research has shown that ChatGPT can pass medical exams, and nearly two-thirds of physicians reported using some form of AI in 2024. But other research has shown that chatbots, including ChatGPT, don't provide reliable medical advice.

SaveTheTuaHawk@lemmy.ca 2 points 2 days ago* (last edited 2 days ago)

In the study, the researchers fed 60 medical scenarios to ChatGPT Health. The chatbot’s responses were compared with the responses of three physicians who also reviewed the scenarios and triaged each one based on medical guidelines and clinical expertise.

They should have included more physician opinions, because those can be highly variable, and they should have blinded the physicians so they didn't know which cases were in the study; otherwise they could have taken more time and effort than usual, skewing the data. The LLM will be more consistent than random MDs at 5am at the end of a 12-hour shift. I would have asked for more real-world, real-time physician opinions to compare against ChatGPT Health.

Regardless, the genie is out of the bottle and all hospitals will eventually use LLMs to cross-check MD decisions. Certainly in pathology reports, automated scoring of imaging is far more accurate than even three MDs agreeing, and pathology decisions from meatbags are notoriously inaccurate.

Here's a Harvard study where 83% of radiologists missed a gorilla pasted into images.

Pigeons are less biased in image analysis.