this post was submitted on 19 May 2025
1343 points (98.0% liked)

Microblog Memes

[–] ByteJunk@lemmy.world 0 points 11 hours ago (1 children)

Could you provide references? I'm genuinely interested, and what I found seems to suggest otherwise:

Overall, GPT-4 passed the board residency examination in four of five specialties, revealing a median score higher than the official passing score of 65%.

AI NEJM

Also, I believe you're seriously underestimating the abilities of present-day LLMs. They're able to ask relevant follow-up questions, interpret the answers to request additional studies, and reach an accurate diagnosis.

See here a study specifically on conversational diagnostic AIs. It has some important limitations, chiefly from having to work around a text-only interface, which is not ideal, but it otherwise achieved really interesting results.

Call them "idiot machines" all you want, but I feel this is going down the same path as full self-driving cars: eventually they'll be making fewer errors than humans, and that will save lives.

[–] medgremlin@midwest.social 1 points 4 hours ago* (last edited 4 hours ago) (1 children)

My mistake, I recalled incorrectly. It got 83% wrong. https://arstechnica.com/science/2024/01/dont-use-chatgpt-to-diagnose-your-kids-illness-study-finds-83-error-rate/

The chat interface is stupid in so many ways, and I would hate using text to talk to a patient myself. There are so many non-verbal aspects of communication that are hard to teach to humans and would be impossible to teach to an AI. If you're familiar with people and know how to work with them, you can pick up on things like intonation and body language that indicate they didn't actually understand the question and you need to rephrase it to get the information you need, or that there's something they're uncomfortable saying or asking. Or signs that they might be lying about things like sexual activity or substance use.

And that's not even getting into the fact that AIs can't do a physical exam, which may reveal things the interview did not. It also ignores patients who can't tell you what's wrong because they're babies, have an altered mental status, or are unconscious. There are so many situations where an LLM is just completely fucking useless in the diagnostic process, and even more once you start talking about treatments that aren't pills.

Also, the exams are only one part of the evaluation to get through medical training. As a medical student and as a resident, your performance and interactions are constantly evaluated and examined to ensure that you are actually competent as a physician before you're allowed to see patients without a supervising attending physician.

For example, there was a student at my school who had almost perfect grades and passed the first board exam easily, but once he was in the room with real patients and interacting with the other medical staff, it became blatantly apparent that he had no business being in the medical field at all. He said and did things that were wildly inappropriate and was summarily expelled. If becoming a doctor were just a matter of passing the boards, he would have gotten through and likely would have been an actual danger to patients. Medicine is as much an art as it is a science, and the only way to test the art portion is through supervised practice until a trainee is able to operate independently.

[–] ByteJunk@lemmy.world 1 points 3 minutes ago

From the article referenced in your news source:

_JAMA Pediatrics and the NEJM were accessed for pediatric case challenges (N = 100). The text from each case was pasted into ChatGPT version 3.5 with the prompt List a differential diagnosis and a final diagnosis._

A couple of key points:

  • These are case challenges, which are usually meant to be hard. I could find no comparison to actual physicians' results in the article, which would have been nice.
  • More importantly, the study was conducted in June 2023 and used GPT-3.5. GPT-4 improved substantially upon it, especially on complex scientific problems, and this shows in the newer studies that used it.

I don't think anyone's advocating that an AI will replace doctors, much like it won't replace other white-collar jobs either.

But if it helps achieve better outcomes for the patients, like the current research seems to indicate, aren't you sworn to consider it in your practice?