DURING THE PANDEMIC, the medical and scientific communities held out great hope that artificial intelligence (AI) would crunch all the covid data being generated worldwide and save lives.
“An unprecedented number of AI models was released during the pandemic into the medical literature,” said Sara Murray, MD, MAS, chief health AI officer at the University of California, San Francisco, speaking at UCSF’s fall 2023 hospital medicine conference. Unfortunately, she pointed out, as one author of a Harvard Business Review article put it, most of those models “landed with a thud.”
“People were desperate to use AI to predict covid surges, but we were never able to predict them,” she said. “People wanted AI to detect early covid on chest X-rays, but we never saw that widely implemented.” Those AI models didn’t pan out for many reasons, Dr. Murray added, including the use of bad data sets that weren’t big enough to be generalizable.
Health care has reaped some recent AI wins, although most have come in large health systems like UCSF, and most have focused on operational use cases: predicting bed availability by acuity level, for instance, or projecting unplanned admissions for patients undergoing ambulatory surgery.
“We’ve implemented a number of these models for UCSF’s capacity management center, and they have all been relatively useful,” Dr. Murray said. “But the clinical models that traditional AI has been used for—like sepsis alerts—haven’t been very successful. As a clinician, I want to know things that I don’t already know.”
Now, however, Dr. Murray said that physicians are about to experience what cardiologist Eric Topol, MD, has called “keyboard liberation,” freedom from hours a day of documentation and a potential reduction in burnout.
“We’re entering the era of generative AI, and I think things are actually going to change quite dramatically,” she said. “With this new technology, we’ll finally be able to undo some of the harms that previous technology, like EHRs, has caused us both as providers and in our relationships with patients.”
Generative AI—so called because it actually generates new content, as opposed to traditional AI, which analyzes data, answers specific questions and produces predictions—can understand the relationship between words and derive meaning from language. “GPT” stands for generative pre-trained transformer.
Generative AI also ushers in a paradigm shift: Only computer scientists worked with traditional AI to figure out tools for health systems or EHRs. “Now, generative tools are open and available to the public, and everyone can access them,” said Dr. Murray. ChatGPT has gotten the most press, but Google has Bard and Amazon has Bedrock.
Generative AI is also rapidly being improved upon, with GPT-4 drastically outperforming its predecessor on medical use cases. When earlier versions were first released, “it was very easy to get those models to ‘hallucinate,’ ” a term used for GPT’s tendency to generate wrong information, she said. Now, however, “GPT-4 can fact-check itself for hallucinations and identify human errors.”
But for clinicians, the challenge right now is that none of these tools is HIPAA-compliant or secure—yet. “Organizations like ours are building infrastructure—the UCSF tool is called Versa—that clinicians will be able to use to put clinical data into GPT.”
STAT+, a subscription news service, maintains a GPT tracker that keeps tabs on the top applications, developers and target audiences for health care AI products. (The monster developers right now are Epic and Microsoft, which are working together, and Google.)
Clinicians are the target audience for the vast majority of applications being developed, and clinical documentation is the top application, with the most products under development.
Dr. Murray said she remembers the “very low-tech” days of phoning in notes to be transcribed. Some health systems later got in-person scribes. More recently, those gave way to ambient systems that rely on voice recognition “but still have a human in the loop, editing as necessary.” But the tools now being marketed as AI assistants have no person in the loop, and “it’s a very competitive market. There are so many AI scribe companies working on streamlining documentation that the pace has just been unprecedented.”
AI scribes will be adopted by outpatient clinics and by hospitals, Dr. Murray said. “The tools that are ultimately most effective will have a deep integration with the EHR,” she explained, “and they will do things for you other than write your notes.”
Other tasks that generative AI may soon be helping with in the hospital include automating your billing and checking your medication reconciliation to make sure patients’ med lists are right. And while it’s not happening yet, “AI eventually should be able to check your orders and even pend orders for you. I think we’ll definitely be using these applications in the hospital in the next few years.”
Medical review and handoffs
The type of application that excites Dr. Murray the most is AI’s ability to summarize medical information. “They’ll do that in many ways,” she said, “with literature, for instance, and with summarizing medical information at different educational levels to help patients.”
Some of hospitalists’ most challenging cases, she pointed out, are “transfer patients or patients in our own hospital sent to us from surgery after being hospitalized for six weeks.” GPT-4 can take those reams of impenetrable electronic notes, generate the relevant clinical events and condense the output, “all while remembering that the intended audience is doctors. The program can generate what I actually want to hear from trainees when they sign out to me in the morning.”
If you ask the program to do a similar summary for patients and families, it will accommodate various levels of reading proficiency—an 8th grade or a 3rd grade level, for instance—and (soon) deliver those summaries in different languages.
“Generative AI will transform medical record review for new admissions and handoffs between services or hospitals,” Dr. Murray said. “It will do the same thing for lay readers. This is something that’s very digestible, not just for clinicians but for everyone else involved in a patient’s care, including family members and case management.”
As for clinical cases, “we were all really impressed when we heard that an early GPT could pass USMLE,” said Dr. Murray. As she added, however, “it barely passed”—53%—”and you wouldn’t want that to be your doctor.”
GPT-4, on the other hand, has aced that test, scoring more than 86%. “Not only does it answer questions correctly without seeing the pictures or hearing the audio files like we do, but it can provide detailed explanations on why one answer is correct and why the alternatives are wrong,” she noted. “If you ask it to, it will even rewrite the prompts to make other answers be right.”
Imagine, she said, being in medical school and practicing for the exam with such a tool. “You get brilliant explanations, as well as words of encouragement from GPT-4 because you’re still learning,” Dr. Murray said. “I think this could revolutionize how we teach medicine.” (See “AI and the next generation of doctors,” below.)
While AI will certainly have its uses, it comes with plenty of challenges and ethical issues. “We’re still trying to figure this out,” said Dr. Murray. “When we implement AI in health care, we have to ensure that it’s trustworthy.”
Toward that end, Dr. Murray noted that UCSF has set up what it calls AI governance, “a multidisciplinary group of people—leaders, researchers, ethicists—that looks at AI tools we’re thinking of implementing. Then we assess them according to guidelines for trustworthiness from the HHS.”
“We want these tools to be transparent and explainable, which is extraordinarily difficult with generative AI,” she noted. “We have to figure out how to adapt those principles in this environment, and we want accountability in how these tools are used.” An article published by Politico in October 2023 pointed out that the U.S. government and the FDA specifically are still trying to figure out what new regulations are needed.
A lot of attention is also being paid to assessing AI tools for bias. “We’ll have to have separate conversations to start holding our vendors accountable for meeting AI guidelines”—another evolving area, said Dr. Murray. “If there is bias in the underlying data that are used to train generative AI, you’re going to have a biased model.”
At the same time, Dr. Murray admitted, “we don’t fully understand the risks here. We have technology now that is capable, by itself, of creating new technology.” In March 2023, tens of thousands of technology leaders and scientists signed a letter that called for a six-month moratorium on AI development beyond GPT-4, arguing that the timeout is desperately needed to craft and agree on safety protocols and to give policy-makers time to come up with regulations.
Even more disconcerting: One key AI developer, writing in Time, claimed he didn’t sign that letter because it wasn’t urgent enough. Instead, he called for an indefinite moratorium on generative AI development and raised the specter of an artificial intelligence in the future that far outstrips our own.
Dr. Murray certainly didn’t sound that level of alarm. But “we have to be very thoughtful about this,” she said. “I’m focusing on uses in medicine, particularly hospital medicine, but more broadly we have to move forward carefully. When you talk to leaders of AI at Microsoft and elsewhere, no one really knows why these models work as well as they do, and that’s a bit scary.”
Phyllis Maguire is Executive Editor of Today’s Hospitalist.
AI and the next generation of doctors
AS AN EXAMPLE of the dramatic advances in different GPT models, Sara Murray, MD, MAS, chief health AI officer for the University of California, San Francisco, shared an anecdote at this fall’s UCSF hospital medicine conference.
With an earlier version of GPT, she said, she used to enjoy making it “hallucinate,” a word used to describe when generative AI produces false information that can be very persuasive.
“I asked it to draft a letter asking for prior authorization to order apixaban to treat insomnia,” Dr. Murray said. “That earlier GPT model claimed that recent studies had found that apixaban indeed had the potential to improve sleep.”
But “when I posed that same query to GPT-4, I got scolded. It explained the evidence behind apixaban and then wrote that ‘it would be unethical and inappropriate for me to draft such a request.’ So problems with hallucination are definitely getting better.”
An audience member picked up on that anecdote and pointed out that Dr. Murray, as a practicing clinician, had enough knowledge when working with GPT to know when it was wrong. But what about residents and medical students? Going forward, how can medical educators ensure that trainees receive enough knowledge to complement AI rather than be dominated by it?
That, Dr. Murray said, “is honestly a real area of concern. People in education are trying to figure out how to train the next generation of doctors.” Right now, she added, “we are still asking them not to use this technology as a primary source of information.” But she predicted that will change over the next five to 10 years.
“I think AI tools will eventually serve as a very reliable source of information,” Dr. Murray said, “like a more sophisticated version of UpToDate. While residents may end up using the tool to tell them the best way to present a case on rounds, they will still have to learn how to communicate.”