These days, it isn’t just doctors and nurses who keep a hospital running smoothly: It is also computer programs. Data scientists have developed scores of brainy algorithms to pinpoint patients at risk of complications, catch errors in medical records or prescriptions, fast-track paperwork and billing, and even diagnose patients.
While new doctors usually recite some version of the Hippocratic Oath at their medical school graduation — swearing to uphold ethical standards in treating their patients — programmers who develop AI for the health care industry are rarely given formal ethics training. Mildred Cho, PhD, a professor of pediatrics and associate director of the Stanford Center for Biomedical Ethics, is trying to change that.
“Developers are often not from a medical background and haven’t spent years thinking about this moral framework — how things like respect and justice and personal principles spill over into medicine,” Cho said. “But as we start seeing artificial intelligence programs being used more widely in medicine, it’s important that developers think about the real-world ethical implications of their work.”
Over the past five years, Cho has interviewed dozens of programmers who work in a variety of settings, all creating health care-related machine learning programs. With machine learning, developers feed existing patient data into a computer, which finds patterns that might not be obvious to a person. Using these patterns, the machine learning algorithm can then analyze new data from outside the original set. Machine learning algorithms can be used to identify patients who are at risk of dangerous complications like malnutrition, falls or infections, for instance, and flag them for additional attention or treatments.
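To make the learn-then-apply idea concrete, here is a minimal Python sketch. It is not how any of the tools mentioned in this article work. It learns a single cutoff (a so-called decision stump) from hypothetical, made-up records pairing a patient's number of falls in the past year with whether that patient fell during a hospital stay, then applies the cutoff to new patients. The function names and the data are illustrative assumptions.

```python
from typing import List, Tuple

# (feature value, label) where label 1 = the patient had the complication.
Record = Tuple[int, int]

def training_errors(records: List[Record], threshold: int) -> int:
    """Count records the rule 'flag if value >= threshold' gets wrong."""
    return sum((value >= threshold) != bool(label) for value, label in records)

def fit_stump(records: List[Record]) -> int:
    """Learn the cutoff that makes the fewest mistakes on the training records.

    Candidate cutoffs are the distinct values seen in training; ties go to the
    smallest cutoff because min() keeps the first minimum it encounters.
    """
    candidates = sorted({value for value, _ in records})
    return min(candidates, key=lambda t: training_errors(records, t))

def predict(threshold: int, new_values: List[int]) -> List[int]:
    """Apply the learned pattern to patients outside the training set."""
    return [int(value >= threshold) for value in new_values]

# Hypothetical toy data, not real patients: (falls in past year, fell during stay)
training = [(0, 0), (0, 0), (1, 0), (1, 0), (1, 1), (2, 1), (3, 1), (4, 1)]
cutoff = fit_stump(training)
print(cutoff)                      # 2
print(predict(cutoff, [0, 2, 5]))  # [0, 1, 1]
```

Real clinical models use many more variables and far more data, but the two steps are the same: learn from existing records, then apply what was learned to cases the model has never seen.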
“Almost anything you can think of in medicine is being tackled with machine learning right now, because so much of medicine is about pattern recognition,” Cho said.
But artificial intelligence algorithms also pose hazards: They can misdiagnose patients, fail to identify people at risk of complications or reveal pieces of private information.
AI can also exacerbate existing biases in the health care system: If doctors are less likely to diagnose women or minorities with a particular condition, machine learning models trained on those doctors' records will learn that people in those groups develop the condition less often, perpetuating the bias. Scientists at Duke University Hospital, for instance, designed an AI program to identify children at risk of sepsis, a dangerous response to an infection. But the program took longer to flag Latino kids than white kids, possibly delaying the identification and treatment of Latino children with sepsis. The bias, it turned out, existed because doctors themselves took longer to diagnose sepsis in Latino kids, which taught the AI program that these children might develop sepsis more slowly or less often than white children.
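The way a gap in diagnosis turns into a learned pattern can be shown with simple arithmetic. In the hypothetical Python sketch below (invented numbers, not data from the Duke study), two groups of 10 patients each truly have three cases of a condition, but one case in group B was never diagnosed. A model that learns base rates from diagnosis records sees only what was recorded. The function name and data are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def learned_rate_by_group(records: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of patients with a recorded diagnosis in each group.

    This is the base rate a model trained on these records would pick up;
    it reflects what doctors recorded, not what patients actually had.
    """
    totals: Dict[str, int] = defaultdict(int)
    diagnosed: Dict[str, int] = defaultdict(int)
    for group, was_diagnosed in records:
        totals[group] += 1
        diagnosed[group] += int(was_diagnosed)
    return {g: diagnosed[g] / totals[g] for g in totals}

# Hypothetical: both groups truly have 3 cases in 10 patients,
# but one of group B's cases was missed and recorded as "no diagnosis".
records = (
    [("A", True)] * 3 + [("A", False)] * 7
    + [("B", True)] * 2 + [("B", False)] * 8
)
rates = learned_rate_by_group(records)
print(rates)  # {'A': 0.3, 'B': 0.2}
```

Nothing in the code singles out group B; the lower rate comes entirely from the missing diagnosis in the records, which is why the quality of the training data matters as much as the model itself.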
“What’s really lacking in AI right now are standards for evaluating data quality,” Cho said. “What does it mean to have a safe and effective AI tool in a health care setting?”
When Cho interviewed developers, she was surprised by how many acknowledged the potential pitfalls of their products; she had suspected they might not be aware of all the risks and biases.
“Despite not having training in medical research, most developers were actually able to identify quite a few potential harms that might come about as a result of their work,” she said. “They were thinking on a much bigger level than I thought they might be.”
But when she asked them what to do about these potential downsides, the developers tended to pass the buck. They said that someone else — their bosses, their companies, the health care systems using their products, or physicians themselves — should be making sure the AI programs were used ethically and responsibly.
“The phrase I heard the most often was ‘at the end of the day,’” recalled Cho. “They would shrug and say things like, ‘At the end of the day, this is a business’ or, ‘At the end of the day, I’m just a low-level data scientist and it isn’t my problem.’”
Cho doesn’t agree. She wants to teach AI developers that small decisions they make while coding can have massive implications for patient care. So in 2022, she began a pilot program offering two-hour group training sessions in ethics for AI programmers. In each session, she asked the developers to brainstorm what they would need to build a machine learning algorithm that predicted diabetes risk.
At first, she told them they were making a research tool and asked what they’d need to consider in creating it. Their list, she said, was mostly technical: They needed high-quality patient data and good existing models of what health and demographic factors influence diabetes. Next, Cho asked them to repeat the exercise but to assume their program would be used in a large health care system rather than only for research. At this point, she said, the developers started talking about clinicians for the first time, imagining how doctors and nurses might implement the AI into their practice. These kinds of considerations, Cho explained, can ultimately change how the AI is designed in the first place.
Finally, Cho and her colleagues asked the developers to imagine that they were creating the AI tool for diabetes screening not just for any health care system, but for their own health care system. Suddenly, the developers began talking about the patient perspective of the algorithm, discussing topics like how to ensure that patient privacy is maintained and that health care remains high-quality.
“They actually switched their entire perspective and considered completely new aspects of the project,” Cho said.
Her hope is that developers who go through this exercise can apply the lessons to their work, putting themselves in clinicians’ and patients’ shoes while creating health care-related AI programs.
“What I want is for developers to move toward thinking about their own ethical responsibilities, anticipating what harms their programs could have, and pulling those ideas into the design phases of their work,” she said.
So far, Cho has tested the training with five groups of four developers. Eventually, she’d like to try it in a workplace environment, with developers carrying out the exercise with real AI software that they’re in the process of coding, rather than a hypothetical diabetes-prediction tool.
Peter Washington, PhD, an assistant professor of information and computer sciences at the University of Hawaii, participated in Cho’s pilot program when he was a graduate student at Stanford University. Washington has built machine learning programs to detect autism and has worked as an intern at Google, Amazon and Microsoft Research. He now leads a digital health research lab in Hawaii building a variety of machine learning models for diagnosis and disease tracking. He said that programs like Cho’s, which encourage developers to think about the applications of their work, can help improve the privacy and fairness of AI. Now, he integrates ethics lessons that he learned while interacting with Cho and other members of the Stanford Center for Biomedical Ethics into the computer science classes he teaches.
“Ethics is not usually taught in computer science programs, and if it is, it’s an elective rather than a required course,” Washington said. “But I think it’s really important for developers to understand core sociotechnical issues like the inherent trade-offs that exist between things such as privacy and accuracy.”
In other words, the more patient details an AI program has access to, the more accurate it may be but the more likely it is to invade privacy. He said it’s especially important for developers to think deeply about what data they are using to train their machine learning programs. A program that has learned patterns from a mostly white patient population, for instance, may not work as well for Black or Latino patients; a program trained on data from a small, rural hospital may not draw accurate conclusions if used in a large, urban hospital.
Even when they don’t have ultimate control over when and how their products are used, AI developers can make changes that go a long way towards solving these challenges, Washington said. For instance, fairness metrics — numbers showing the potential biases within an AI model — can add transparency to an AI program and help users understand how the model might perform differently in different settings and for different populations of patients.
“You can write a few quick lines of code that calculate fairness metrics that will uncover potential biases in the model,” he explained. “This is incredibly easy to implement if you have the demographic data available, but very few groups are incorporating this as standard practice.”
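As a rough illustration of what such a check could look like (this is not Washington's code), the Python sketch below computes two commonly used fairness metrics from a model's predictions: the selection rate, or share of each group the model flags, and the true positive rate, or share of each group's real cases the model catches. The largest gap between groups on the first is often called the demographic parity difference, and on the second the equal opportunity difference. The group labels, outcomes and predictions are hypothetical.

```python
import math
from typing import Dict, List

def safe_rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when there is nothing to divide by."""
    return numerator / denominator if denominator else math.nan

def group_metrics(y_true: List[int], y_pred: List[int],
                  groups: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-group selection rate (share flagged) and true positive rate
    (share of actual cases that were flagged)."""
    metrics: Dict[str, Dict[str, float]] = {}
    for g in sorted(set(groups)):
        members = [i for i, grp in enumerate(groups) if grp == g]
        positives = [i for i in members if y_true[i] == 1]
        metrics[g] = {
            "selection_rate": safe_rate(sum(y_pred[i] for i in members), len(members)),
            "true_positive_rate": safe_rate(sum(y_pred[i] for i in positives), len(positives)),
        }
    return metrics

def largest_gap(metrics: Dict[str, Dict[str, float]], key: str) -> float:
    """Difference between the best- and worst-served groups, skipping NaN values."""
    values = [m[key] for m in metrics.values() if not math.isnan(m[key])]
    return max(values) - min(values)

# Hypothetical outcomes (1 = has the condition) and model flags for eight patients.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

metrics = group_metrics(y_true, y_pred, groups)
print(metrics)
print(largest_gap(metrics, "selection_rate"))      # 0.5 (demographic parity difference)
print(largest_gap(metrics, "true_positive_rate"))  # 0.5 (equal opportunity difference)
```

In this made-up example the model is right about three of four patients in each group, yet it catches every real case in group A and only half of those in group B. A single overall accuracy number would hide that gap; reporting per-group metrics makes it visible. Open-source toolkits such as Fairlearn package similar metrics.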
Ultimately, the companies developing AI tools and the health care systems deploying them in hospitals and clinics do have to take responsibility for the ethical use of AI in medicine, Cho said.
“It is hard to completely pin responsibility on developers for how things are used after they’re released,” she admitted.
But the more developers think about how to minimize bias in their tools and be transparent in the strengths, weaknesses and best applications of their AI tools, the easier it will be to use their products in appropriate — and ethical — ways, she said.