
Diagnosis is an especially tantalizing application for generative AI: Even when given tough cases that might stump doctors, the large language model GPT-4 has solved them surprisingly well.

But a new study points out that accuracy isn't everything, and shows exactly why health care leaders already rushing to deploy GPT-4 should slow down and proceed with caution. When the tool was asked to drum up likely diagnoses or to come up with a patient case study, it produced problematic, biased results in some cases.


“GPT-4, being trained off of our own textual communication, shows the same — or maybe even more exaggerated — racial and sex biases as humans,” said Adam Rodman, a clinical reasoning researcher who co-directs the iMED Initiative at Beth Israel Deaconess Medical Center and was not involved in the research.
