Doctors Wrestle With A.I. in Patient Care, Citing Lax Rules

In medicine, the cautionary tales about the unintended effects of artificial intelligence are already legendary.

There was the program meant to predict when patients would develop sepsis, a deadly bloodstream infection, that triggered a litany of false alarms. Another, intended to improve follow-up care for the sickest patients, appeared to deepen troubling health disparities.

Wary of such flaws, physicians have kept A.I. working on the sidelines: assisting as a scribe, as a casual second opinion and as a back-office organizer. But the field has gained investment and momentum for uses in medicine and beyond.

Within the Food and Drug Administration, which plays a key role in approving new medical products, A.I. is a hot topic. It is helping to discover new drugs. It could pinpoint unexpected side effects. And it is even being discussed as an aid to staff who are overwhelmed with repetitive, rote tasks.

Yet in one crucial way, the F.D.A.’s role has been subject to sharp criticism: how carefully it vets and describes the programs it approves to help doctors detect everything from tumors to blood clots to collapsed lungs.

“We’re going to have a lot of choices. It’s exciting,” Dr. Jesse Ehrenfeld, president of the American Medical Association, a leading doctors’ lobbying group, said in an interview. “But if physicians are going to incorporate these things into their workflow, if they’re going to pay for them and if they’re going to use them — we’re going to have to have some confidence that these tools work.”

From doctors’ offices to the White House and Congress, the rise of A.I. has elicited calls for heightened scrutiny. No single agency governs the entire landscape. Senator Chuck Schumer, Democrat of New York and the majority leader, summoned tech executives to Capitol Hill in September to discuss ways to nurture the field and also identify pitfalls.

Google has already drawn attention from Congress with its pilot of a new chatbot for health workers. Called Med-PaLM 2, it is designed to answer medical questions, but has raised concerns about patient privacy and informed consent.

How the F.D.A. will oversee such “large language models,” or programs that mimic expert advisers, is just one area where the agency lags behind rapidly evolving advances in the A.I. field. Agency officials have only begun to talk about reviewing technology that would continue to “learn” as it processes thousands of diagnostic scans. And the agency’s existing rules encourage developers to focus on one problem at a time — like a heart murmur or a brain aneurysm — a contrast to A.I. tools used in Europe that scan for a range of problems.

The agency’s reach is limited to products being approved for sale. It has no authority over programs that health systems build and use internally. Large health systems like Stanford, Mayo Clinic and Duke — as well as health insurers — can build their own A.I. tools that affect care and coverage decisions for thousands of patients with little to no direct government oversight.

Still, doctors are raising more questions as they attempt to deploy the roughly 350 software tools that the F.D.A. has cleared to help detect clots, tumors or a hole in the lung. They have found few answers to basic questions: How was the program built? How many people was it tested on? Is it likely to identify something a typical doctor would miss?

The lack of publicly available information, perhaps paradoxical in a realm replete with data, is causing doctors to hang back, wary that technology that sounds exciting can lead patients down a path to more biopsies, higher medical bills and toxic drugs without significantly improving care.

Dr. Eric Topol, author of a book on A.I. in medicine, is a nearly unflappable optimist about the technology’s potential. But he said the F.D.A. had fumbled by allowing A.I. developers to keep their “secret sauce” under wraps and failing to require careful studies to assess any meaningful benefits.

Other forces complicate efforts to adapt machine learning for major hospital and health networks. Software systems don’t talk to each other. No one agrees on who should pay for them.

Dr. Kottler said she began evaluating approved A.I. programs by quizzing their developers and then tested some to see which programs missed relatively obvious problems or pinpointed subtle ones.

She rejected one approved program that did not detect lung abnormalities beyond the cases her radiologists found — and missed some obvious ones.

Another program that scanned images of the head for aneurysms, a potentially life-threatening condition, proved impressive, she said. Though it flagged many false positives, it detected about 24 percent more cases than radiologists had identified. More people with an apparent brain aneurysm received follow-up care, including a 47-year-old with a bulging vessel in an unexpected corner of the brain.

At the end of a telehealth appointment in August, Dr. Roy Fagan realized he was having trouble speaking to the patient. Suspecting a stroke, he hurried to a hospital in rural North Carolina for a CT scan.

The image went to Greensboro Radiology, a Radiology Partners practice, where it set off an alert in a stroke-triage A.I. program. A radiologist didn’t have to sift through cases ahead of Dr. Fagan’s or click through more than 1,000 image slices; the one spotting the brain clot popped up immediately.

The radiologist had Dr. Fagan transferred to a larger hospital that could rapidly remove the clot. He woke up feeling normal.

“It doesn’t always work this well,” said Dr. Sriyesh Krishnan, of Greensboro Radiology, who is also director of innovation development at Radiology Partners. “But when it works this well, it’s life changing for these patients.”

Dr. Fagan wanted to return to work the following Monday, but agreed to rest for a week. Impressed with the A.I. program, he said, “It’s a real advancement to have it here now.”

Radiology Partners has not published its findings in medical journals. Researchers who have published, though, have highlighted less inspiring examples of A.I.'s effects in medicine.

University of Michigan researchers examined a widely used A.I. tool in an electronic health-record system meant to predict which patients would develop sepsis. They found that the program fired off alerts on one in five patients — though only 12 percent went on to develop sepsis.

Another program that analyzed health costs as a proxy to predict medical needs ended up depriving Black patients of treatment even though they were just as sick as white ones. The cost data turned out to be a bad stand-in for illness, a study in the journal Science found, since less money is typically spent on Black patients.

Those programs were not vetted by the F.D.A. But given the uncertainties, doctors have turned to agency approval records for reassurance. They found little. One research team looking at A.I. programs for critically ill patients found evidence of real-world use “completely absent” or based on computer models. The University of Pennsylvania and University of Southern California team also discovered that some of the programs were approved based on their similarities to existing medical devices — including some that did not even use artificial intelligence.

Another study of F.D.A.-cleared programs through 2021 found that of 118 A.I. tools, only one described the geographic and racial breakdown of the patients the program was trained on. The majority of the programs were tested on 500 or fewer cases — not enough, the study concluded, to justify deploying them widely.

Dr. Keith Dreyer, a study author and chief data science officer at Massachusetts General Hospital, is now leading a project through the American College of Radiology to fill that information gap. With the help of A.I. vendors that have been willing to share information, he and colleagues plan to publish an update on the agency-cleared programs.
