What can Amazon teach healthcare about AI?

Sep 26, 2021

There's a lot that healthcare can learn from online retail giant Amazon about how to integrate artificial intelligence (AI) into clinical care, Regina Barzilay, PhD, said during a September 26 plenary talk at the 2021 annual American Association for Clinical Chemistry (AACC) meeting.

In her talk, Barzilay discussed the current AI landscape in healthcare, offered advice, and dispelled some myths. Barzilay is the distinguished professor for AI and health at the Massachusetts Institute of Technology (MIT) School of Engineering.

What can the healthcare field learn from Amazon?

Barzilay said that when Amazon recommends a product to a consumer, the platform, for better or for worse, aggregates all the available data on that consumer before deciding what to recommend.

She contrasted this "use-everything" approach with how clinical decisions are currently made: based on biomarkers of dubious value and on findings from clinical trials that enroll about 3% of the population.

Applying the Amazon approach, all the data known about a patient, including tests, history, and demographics, would be utilized in a flexible manner for each diagnosis.

"In an ideal case, where I see the diagnostics moving is [providing] the same flexibility," Barzilay said. "The way I envision us moving forward is in taking all the information about the patient and from there predicting all the different outcomes that you can predict about the patient."

Learning from the outcomes

Next, Barzilay addressed an issue she is very familiar with: AI and cancer screening.

Barzilay's group has successfully deployed machine-learning models for cancer detection. She is interested in the best approach to formulating machine-learning prediction tasks, and these methods may not always align with what clinicians think. Barzilay advocated removing humans from the loop and proceeding directly from the data to the outcome.

Specifically, Barzilay questioned using density in breast exams as a biomarker to predict a woman's risk of developing breast cancer. (That is, women whose breasts are dense have a higher risk of breast cancer than women whose breasts are fatty.) In the U.S., most states require a breast density notification after a mammogram.

Common approaches to predicting a woman's risk of breast cancer include the classical risk model, in which clinicians record data such as age, family history, and prior breast procedures and incorporate them into a risk assessment score. Where random guessing corresponds to an area under the curve (AUC) of 0.5, this classical model yields a risk score with an AUC of 0.6. When the federally regulated density biomarker is added, the AUC moves to 0.63.

In other words, "not really great," Barzilay said.
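To make the AUC figures above concrete, here is a minimal Python sketch of how the metric can be computed. It is not code from the talk: the function name `auc` and the toy scores and labels are hypothetical and used for illustration only. The sketch relies on the standard interpretation of AUC as the probability that a randomly chosen patient who develops cancer receives a higher risk score than a randomly chosen patient who does not.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive case has the
    higher score. Tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the talk): 1 = developed cancer, 0 = did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_scores, toy_labels))  # 0.75: three of the four pairs are ranked correctly

# Scores that carry no information land near 0.5, the "random guessing" baseline.
random.seed(0)
random_labels = [i % 2 for i in range(2000)]
random_scores = [random.random() for _ in random_labels]
print(round(auc(random_scores, random_labels), 2))
```

On this scale, 1.0 means every such pair is ranked correctly and 0.5 means the score is no better than a coin flip, which is why a move from 0.6 to 0.63 reads as a small gain.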

Additionally, breast density is often assigned by breast radiologists, who are inconsistent in characterizing it. "Human doctors are so inconsistent in how they mark this density," she said.

She highlighted a study in which breast radiologists looked at the same set of patients: some found dense breasts in 6.3% of the patients evaluated, while others found them in 84.5%.

Barzilay suggested moving away from the breast density biomarker and toward machine-learning models, which are not as inconsistent as humans and can learn from image outcomes -- the same way that face recognition systems can spontaneously learn what constitutes a face.

"I very strongly advocate against using this kind of very rough biomarker," she said. "You can give images to the model and explain what happens to a patient in the next five years. Let the machine itself identify this pattern so you don't need to say how much [density] should be in the image."

Dispelling AI myths

Barzilay dispelled two of the most common myths about AI she has encountered.

Myth No. 1: "Humans provide the gold standard."

Barzilay questioned the assumption that human judgment is the gold standard by which AI performance should be judged.

In particular, she cited the American College of Radiology (ACR). In 2020, the ACR issued a statement urging caution in accepting the medical judgments of AI over human interpretation. While Barzilay acknowledged that there are cases where machines make mistakes, she noted that human medical professionals make plenty of mistakes as well.

Barzilay cited a paper addressing the high percentage of missed diagnoses of breast cancer. The study, which was based on MRI scans of women with the BRCA genetic mutation, a high-risk category, found that 31% of breast cancers that were visible on a previous MRI were missed, causing a one-year delay in diagnosis.

Human judgments are often a low bar to clear rather than a gold standard to aim for. "We should not have this very low barrier in comparing ourselves," she said. "We should aim much higher."

Myth No. 2: "Every clinician can be an AI researcher."

The second myth that Barzilay dispelled involves the democratization of AI technology. Today, anyone can download code and train their own AI model. "We are just starting to see [the] increasing availability of tools to enable on-premises development of AI models by clinicians," she noted.

However, Barzilay cautioned that the results from a homegrown system will not be nearly as accurate as those an AI expert can obtain. To achieve the highest possible prediction performance, the model needs to be tweaked by a sophisticated AI practitioner with PhD-level training. "Don't be overconfident," she concluded.

AI in drug discovery

Barzilay's team at the MIT Jameel Clinic is also evaluating the role of AI in the drug discovery space, which she said is "developing very fast, faster than clinical AI."

The team's in silico, deep-learning approach identified an antibacterial molecule called halicin that showed a strong profile against Clostridium difficile. She said that the molecule was identified with very little experimentation and went through a full clinical evaluation in the lab.

Barzilay and her team also published a study on a machine-learning model that was able to predict desired combinations of drugs to treat patients with COVID-19. Her research in this field is supported by a consortium of companies, including Pfizer, Merck, Amgen, Lilly, GSK, and Janssen.

Humans and machines need to work together to counterbalance each other, Barzilay said. "[The] landscape of AI in healthcare is not about competing with humans but finding a way where you can capitalize on the strengths of humans and capitalize on the strength of machines to improve the care, because humans and machines do different types of mistakes," she concluded.