The Robots Are Coming, Part 2

Last week we tiptoed up to some of the problems that increasing automation in health care may uncover. This week let’s talk briefly about proposed FDA action to address them.

FDA approvals of automated and artificially intelligent health care platforms have been accelerating, and in the flurry of activity that accompanies all Presidential transfers of power, the outgoing Trump administration had controversially proposed waiving much of the regulatory oversight of medical artificial intelligence tools. With a new administration in place, the WhiteHouse.gov link I’d originally included in this post no longer exists, and the FDA has pivoted hard in the other direction, releasing a new five-part action plan laying out its efforts to regulate products that incorporate “Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD).” Let’s go through the five parts one at a time:

  1. One of the big challenges in regulating artificial intelligence is that an AI program is, by definition, not a static product the way a drug or a physical device is. Once the FDA approves a drug, we know that the drug we’re dispensed ten years from now will be the same drug we get today.

    But artificial intelligence learns over time. As the dataset a software platform “learns from” grows, the program’s interpretation of that data may change. We call this process of gradual improvement through multiple steps an “iterative” process. So the FDA has expressed an expectation that manufacturers and the agency be able to transparently monitor performance through this iterative process, in hopes of maximizing safety while still allowing the platform to gradually change and improve (a rough sketch of this kind of version-by-version check appears after this list).

  2. When you go to an ATM you have the expectation that you can put any bank card into the ATM and check your bank balance, even if your ATM card was issued from a different bank than the machine. When you put a CD into a CD player, you know it will play, regardless of whether the machine is a Sony or a Panasonic. But this isn’t true of every device. Electronic health records are notoriously finicky about what data they can exchange, for example. So the FDA has asked manufacturers to use “Good Machine Learning Practice (GMLP)” to “encourage harmonization of the development of programs through consensus standards efforts.” These include harmonization of data management, feature extraction, training, interpretability, evaluation, security, and documentation standards.

  3. Many artificial intelligence/machine learning platforms are trained on existing datasets. The patients who contributed their data were real, but the data is now so divorced from any living creature that it is easy to think of it in hard, mathematical terms rather than as attached to a real, living person who had thoughts, feelings, emotions, and parents. So the FDA has asked that manufacturers take a “patient centered” approach to how these technologies interact with people. What this precisely means is still under discussion, but broadly it seems to mean transparency to users (that is, you’ll know when your data is being used, and you’ll know when a machine is helping make decisions in your care), usability (meaning the operation of the software won’t be a mystery), equity (everyone gets a fair shake at representation within the software’s training dataset, for example), trust, and accountability.

  4. The machines we build carry our biases. You may have read about bias in software used in sentencing convicted criminals: the software was meant to reduce bias in sentencing, but because it was trained on a dataset that reflected the bias of past sentencing, the software itself was biased against certain groups of people. Or you may have seen news of the same problem in facial recognition algorithms, which only become adept at recognizing faces when intentionally exposed to diverse faces. A good example of a similar phenomenon in medicine is a tool developed to predict knee pain in patients with osteoarthritis: a commonly used model built on data from mostly white, British patients was found to be less accurate than a similar model trained on data that included more Black and low-income patients. The new, more diversely trained model roughly doubled the likelihood that an evaluated Black patient would be considered eligible for surgery (a minimal sketch of the kind of subgroup check that surfaces this sort of gap appears after this list). With this in mind, the FDA has pledged to “evaluate and address algorithmic bias and to promote algorithm robustness,” specifically as it relates to race, ethnicity, and socioeconomic status, to keep biases present in the health care system from seeping into algorithms.

  5. Since many currently available commercial AI and machine learning products were approved based on their performance on historical datasets, not on prospective testing in real patients the way a drug or another device would traditionally have been, we don’t know for certain how they’ll do in the real, nitty-gritty care of patients. But even with the traditional model of drug testing prior to approval, a need for post-approval monitoring is not unprecedented. Roughly a fifth of medications prescribed, for example, are used “off-label,” meaning they’re used for something other than the purpose for which the FDA originally approved them. Gabapentin, for instance, is frequently used for pain relief, but its FDA approvals are only for seizure disorder and for a specific type of pain syndrome called “postherpetic neuralgia.” With this in mind, the FDA has pledged to monitor “Real-World Performance.” This not only allows the FDA to track the accuracy of a device’s recommendations but also to see exactly how the devices are being used (a rough sketch of what this kind of monitoring might look like follows below). As far as I can tell, real-world performance monitoring at this point is voluntary but encouraged. Depending on how well the voluntary system works, the FDA intends to develop a framework for mandatory prospective reporting in the future.
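
To make the “iterative” idea in item 1 a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the synthetic data, the choice of model, and the simple no-regression rule are stand-ins for whatever a real manufacturer and the FDA might agree to track, but it shows the basic loop of retraining on a growing dataset and checking each new version’s performance before it replaces the old one.

```python
# Minimal, hypothetical sketch of the "iterative" concern in item 1: the model is
# periodically retrained as its dataset grows, and each new version is checked
# against a fixed held-out set before it replaces the currently deployed version.
# The synthetic data and the no-regression rule are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_batch(n):
    """Generate a synthetic batch of patient features and binary outcomes."""
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_test, y_test = make_batch(500)      # fixed held-out set reused for every version
X_train, y_train = make_batch(200)    # initial training data

deployed_model, deployed_score = None, 0.0
for version in range(1, 4):           # three retraining rounds as data accumulates
    candidate = LogisticRegression().fit(X_train, y_train)
    score = candidate.score(X_test, y_test)
    print(f"version {version}: held-out accuracy {score:.1%}")
    if score >= deployed_score:       # only promote versions that don't regress
        deployed_model, deployed_score = candidate, score
    X_new, y_new = make_batch(200)    # more data arrives before the next round
    X_train = np.vstack([X_train, X_new])
    y_train = np.concatenate([y_train, y_new])
```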
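
The subgroup check mentioned in item 4 can be illustrated just as simply. The records below are made up and the two groups are deliberately generic; the point is only that overall accuracy can look respectable while accuracy within one group lags badly, which is exactly the gap an audit for algorithmic bias is trying to surface.

```python
# Minimal, hypothetical sketch of a subgroup audit like the one implied in item 4:
# compare a model's accuracy within each demographic group instead of trusting a
# single overall number. The records here are invented for illustration only.
from collections import defaultdict

# Each record: (demographic_group, model_said_eligible, clinician_said_eligible)
records = [
    ("group_a", True,  True),
    ("group_a", False, False),
    ("group_a", True,  True),
    ("group_b", False, True),   # the model keeps missing eligible patients here
    ("group_b", False, True),
    ("group_b", True,  True),
]

hits, totals = defaultdict(int), defaultdict(int)
for group, predicted, actual in records:
    totals[group] += 1
    hits[group] += int(predicted == actual)

overall = sum(hits.values()) / sum(totals.values())
print(f"overall accuracy: {overall:.0%}")
for group in sorted(totals):
    print(f"{group}: accuracy {hits[group] / totals[group]:.0%} over {totals[group]} patients")
# A large gap between groups, not the overall number, is the red flag.
```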
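
Finally, here is a rough sketch of what the “Real-World Performance” monitoring in item 5 might look like in code. The log structure, the benchmark accuracy, and the alert threshold are all assumptions made up for illustration, not anything the FDA has specified; the idea is simply that deployed predictions get logged, eventual outcomes get attached as they become known, and live accuracy is compared against the performance claimed at approval.

```python
# Minimal, hypothetical sketch of "Real-World Performance" monitoring as described
# in item 5: log each deployed prediction, attach the eventual outcome, and compare
# live accuracy against the accuracy claimed at approval. The benchmark, margin,
# and in-memory log are illustrative assumptions, not FDA requirements.
from datetime import datetime, timezone

APPROVAL_ACCURACY = 0.90   # hypothetical performance reported at clearance
ALERT_MARGIN = 0.05        # hypothetical tolerance before a review is triggered

prediction_log = []        # in practice this would be a database, not a list

def log_prediction(case_id, predicted):
    prediction_log.append({
        "case_id": case_id,
        "predicted": predicted,
        "actual": None,
        "timestamp": datetime.now(timezone.utc),
    })

def record_outcome(case_id, actual):
    for entry in prediction_log:
        if entry["case_id"] == case_id:
            entry["actual"] = actual

def live_accuracy():
    resolved = [e for e in prediction_log if e["actual"] is not None]
    if not resolved:
        return None
    return sum(e["predicted"] == e["actual"] for e in resolved) / len(resolved)

# Example use: two cases, one of which the model gets wrong in the field.
log_prediction("case-001", predicted=True)
log_prediction("case-002", predicted=True)
record_outcome("case-001", actual=True)
record_outcome("case-002", actual=False)

acc = live_accuracy()
if acc is not None and acc < APPROVAL_ACCURACY - ALERT_MARGIN:
    print(f"live accuracy {acc:.0%} is below the approval benchmark; flag for review")
```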

 Will any of this solidify public and physician trust in artificial intelligence? I don’t know. My hunch is that trust on the physician side will hinge more on whether AI helps or hurts clinics’ bottom lines. And public trust in technology seems to depend more on convenience than on the good or ill intentions of the company. Few of us complain about Google, for example, because even though Google knows a lot about us, it makes certain parts of our lives, like the composition of this blog post in Google Docs, better.

As the Medical Director of the Kansas Business Group on Health I’m sometimes asked to weigh in on hot topics that might affect employers or employees. This is a reprint of a blog post from KBGH.