Five years ago, computer scientist Alan Fern and his colleagues at Oregon State University, in Corvallis, set out to answer one of the most pressing questions in artificial intelligence (AI): Could they understand how AI reasons, makes decisions, and chooses its actions?
The military is among the institutions using artificial intelligence to perform high-tech, high-stakes, and potentially dangerous tasks. Understanding exactly how that AI makes decisions could help users more effectively manage, debug, and, importantly, trust their machine partners. Image credit: Shutterstock/sibsky2016.
The proving ground for their exploration was five classic arcade games, circa 1980, including Pong and Space Invaders. The player was a computer program that didn’t follow predetermined rules, but learned by trial and error, playing those games over and over to develop internal rules for winning, based on its past mistakes (1). These rules, because they’re generated by the algorithm, can run counter to human intuition and be difficult, if not impossible, to decipher.
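This kind of trial-and-error training is the realm of reinforcement learning. The sketch below is a toy, hypothetical example, not the deep reinforcement-learning agents used in the actual study (1): a learner on a five-cell strip earns a point for reaching the goal and gradually builds up a table of values that encodes its strategy.

```python
# Toy, hypothetical sketch of trial-and-error (reinforcement) learning.
# A learner on a 5-cell strip earns a point for reaching the rightmost cell
# and gradually builds a table of action values from its own mistakes.
import random

N_STATES = 5                 # positions 0..4; position 4 is the winning state
ACTIONS = [-1, +1]           # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}   # learned values

alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Explore occasionally; otherwise exploit what has been learned so far.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Nudge this action's value toward the reward plus the best future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

print({k: round(v, 2) for k, v in q.items()})   # the learned "rules" are just numbers
```

Even in this miniature setting, the learned strategy is just a table of numbers; in the arcade-game agents, it is buried in the weights of a deep neural network.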
Fern and his colleagues were optimistic—at first. They developed methods to understand the focus of their game-playing AI’s visual attention. But they found it nearly impossible to confidently decode its winning strategies.
Fern’s group was one of 11 that had joined an Explainable AI, or XAI, project funded by the Defense Advanced Research Projects Agency (DARPA), the research and development arm of the US Department of Defense (2). The program, which ended in 2021, was driven in large part by military applications. In principle, research findings would help members of the military understand, effectively manage, debug, and, importantly, trust an “emerging generation of artificially intelligent machine partners,” according to its stated mission.
But XAI implications reach far beyond the military. Researchers worry that a lack of understanding of AI systems and how they make decisions will erode trust among all kinds of users. That may not matter for playing arcade games or knowing how Uber assigns its drivers. “Most of the time, most people don’t understand anything about the internal workings of the hardware and software that they use,” Fern says. “It is just magic.”
But it matters plenty when the stakes are higher and a lack of understanding could limit AI’s utility—whether making decisions on the battlefield or in a hospital. It could also matter in the case of increasingly prominent generative AI applications like ChatGPT and GPT-4, which can generate impressive facsimiles of human language—and even computer code—but can also produce falsehoods or err in reasoning. “It’s only when the decision process is transparent to the user [will] the user trust or rely on the system, even if it’s really good,” says Luyao Yuan, a computer scientist at Meta, in New York City, who worked with another group in the DARPA XAI project (3).
Results from the XAI program suggest that there may never be a one-size-fits-all solution to understanding AI, says Matt Turek, a DARPA computer scientist who helped run the program. But with AI seeping into nearly every facet of our lives, some contend that the quest to open the black box has never been more vital.
Hidden Figures
The term “artificial intelligence” generally refers to computer programs that tackle problems usually solved by humans. Not every such algorithm is complicated and opaque. One family of algorithms called decision trees, for example, follows a strict sequence of tests (yes/no or true/false questions) to classify information, and its reasoning process is transparent: A user can see the step-by-step reasoning, and the outcome is deterministic.
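As a toy illustration of that transparency, with invented features and thresholds rather than anything drawn from a real system, here is what a small decision tree looks like in code: every decision is a readable test, and the full reasoning path can be printed alongside the answer.

```python
# A toy decision tree for a loan decision. The features and thresholds are
# invented for illustration; the point is that every step is a readable test.
def approve_loan(income: float, debt_ratio: float, late_payments: int) -> bool:
    steps = []
    steps.append(f"income >= 40000? {'yes' if income >= 40000 else 'no'}")
    if income >= 40000:
        steps.append(f"debt_ratio <= 0.4? {'yes' if debt_ratio <= 0.4 else 'no'}")
        decision = debt_ratio <= 0.4
    else:
        steps.append(f"late_payments == 0? {'yes' if late_payments == 0 else 'no'}")
        decision = late_payments == 0
    # The explanation is simply the path taken through the tree.
    print(" -> ".join(steps), "=>", "approve" if decision else "deny")
    return decision

approve_loan(income=52000, debt_ratio=0.35, late_payments=1)
# income >= 40000? yes -> debt_ratio <= 0.4? yes => approve
```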
But over the last decade or so, interest has exploded in more sophisticated AI methods called deep neural networks, which are composed of computational nodes. The nodes are arranged in layers, with one or more layers sandwiched between the input and the output. Training these networks—a process called deep learning—involves iteratively adjusting the weights, or the strengths of the connections between nodes, until the network produces an acceptably accurate output for a given input.
That training process also makes deep networks opaque: Whatever ChatGPT has learned, for example, is encoded in hundreds of billions of internal weights, and it’s impossible to make sense of the AI’s decision-making simply by examining those weights.
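A minimal, illustrative sketch of that training loop, shrunk to a toy problem, shows both points at once: a tiny two-layer network learns the XOR function by repeatedly nudging its weights, and the “knowledge” it ends up with is nothing more than arrays of numbers. (This toy is not any of the systems discussed here; real deep networks differ mainly in scale.)

```python
# Illustrative toy only: a two-layer network learns XOR by nudging its weights.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # targets (XOR)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)    # input -> hidden weights
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)    # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0                                          # learning rate

for step in range(10000):
    # Forward pass: input -> hidden layer -> output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: how much did each weight contribute to the error?
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Adjust every weight a little to shrink the error, then repeat.
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # should end up close to [0, 1, 1, 0]
print(W1.round(2))            # ...but the learned "knowledge" is just these numbers
```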
Still, deep neural networks have driven many of the latest headline-grabbing advances in AI, from helping physicians diagnose disease to generating original images in the style of a favorite artist, composing human-like prose, and enabling the development of autonomous vehicles.
Understanding these networks has become critical. For example, researchers have been testing such AI systems against human clinicians in areas ranging from emergency triage, where they can stratify incoming patients according to how quickly they need attention, to analyzing computed tomography (CT) scans for diagnosing the severity of COVID-19 cases (4, 5). If people don’t understand the AI system, they may not be able to use it effectively—or trust it. “Explanations do lead to trust,” Turek says.
Peeking Under the Hood
It’s to this end that DARPA launched the XAI program in 2015, led by DARPA computer scientist David Gunning. (Turek took over the program in 2018.) It was motivated in part by advances in the abilities of deep networks, such as their capacity to accurately classify images. Researchers recognized the extraordinary potential of these systems but became increasingly concerned about their black-box opacity. Some AI systems also produce troubling outputs that beg for explanation—a tendency, for example, to amplify racial or other biases implicit in their training datasets (6).
Gunning and Turek anticipated that their results would fall into one of three categories (7). The first would include new kinds of “interpretable” AI models whose reasoning process experts in the field could track, like mechanics peering into the inner workings of an engine. A second class would include models that not only produce predictions but also output explanations that a human user can understand. The third approach was less direct: Instead of probing the black box itself, a separate “white box” proxy—an explainable model that mimics the inputs and outputs of the original—would stand in for it.
The Fern group’s arcade-game-playing AI fell into the second category. They approached the problem by augmenting their algorithm with code to show where, on the screen, the AI focused its attention when it acted and what part of its internal network was active (1). But the results were maddeningly inconclusive. In some games, the algorithm seemed to base its decisions on specific events on the screen; in others, it did not. And the observed correlations between the algorithm’s decisions and the patterns of its internal activity weren’t consistent, so Fern felt there was still a huge gap in their understanding of the system. “No matter how hard we tried, we never really found a one-to-one correspondence,” Fern says.
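The group’s published analyses (1) rely on their own attention and recurrent-state machinery. As a generic, hypothetical stand-in for the question of where an agent is looking, the sketch below perturbs patches of an input frame and measures how much the agent’s action scores change; the policy function and frame dimensions are placeholders.

```python
# Generic, hypothetical saliency probe: blur patches of a game frame and see
# how much the agent's action scores change. "policy" is a placeholder for any
# function mapping a frame to action scores; this is not the published method.
import numpy as np

def saliency_map(policy, frame: np.ndarray, patch: int = 8) -> np.ndarray:
    """Score each patch by how much blurring it changes the policy's output."""
    baseline = policy(frame)
    h, w = frame.shape
    scores = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            perturbed = frame.copy()
            # Replace the patch with its mean value (a crude blur).
            perturbed[i:i + patch, j:j + patch] = frame[i:i + patch, j:j + patch].mean()
            scores[i // patch, j // patch] = np.abs(policy(perturbed) - baseline).sum()
    return scores   # high score = the decision depends heavily on that region

# Toy usage: a stand-in "policy" that reacts to the brightest pixel on screen.
fake_policy = lambda f: np.array([f.max()])
frame = np.zeros((84, 84))
frame[40, 40] = 1.0                    # a single bright "object"
heat = saliency_map(fake_policy, frame)
print(heat.argmax(), heat.max())       # the patch holding the bright pixel stands out
```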
Computer scientist and mathematician Patrick Shafto at Rutgers University in Newark, New Jersey, led the development of a model in the third category—one that could effectively characterize what happened in the black box, even without getting inside—and had some success.
The group developed a probabilistic method for explaining a deep neural network that could recognize and diagnose pneumothorax, also known as a collapsed lung, in chest X-rays. In addition to reporting whether it identified the condition, the algorithm selected two or three images from the dataset of training X-rays that best explained why the image was classified as pneumothorax or not. It then shared those images with a subject matter expert—a radiologist, in this case—as a visual explanation for its decision regarding the new image.
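The published system chooses its explanatory X-rays with a technique called Bayesian teaching (8). The hypothetical sketch below is a much simpler stand-in for the same example-based idea: after a new image is classified, it returns the training cases that share the predicted label and sit closest to the new case in the model’s feature space. The embeddings and labels here are random placeholders.

```python
# Simplified stand-in for example-based explanation (the real system uses
# Bayesian teaching): return the k training cases that share the new image's
# predicted label and are nearest to it in the model's feature space.
import numpy as np

def explanatory_examples(new_embedding, train_embeddings, train_labels,
                         predicted_label, k=3):
    """Indices of the k nearest training cases with the same predicted label."""
    candidates = np.where(train_labels == predicted_label)[0]
    dists = np.linalg.norm(train_embeddings[candidates] - new_embedding, axis=1)
    return candidates[np.argsort(dists)[:k]]

# Toy usage with random placeholder "embeddings" and labels.
rng = np.random.default_rng(1)
train_emb = rng.normal(size=(200, 16))
train_lab = rng.integers(0, 2, size=200)     # 0 = no pneumothorax, 1 = pneumothorax
new_emb = rng.normal(size=16)
print(explanatory_examples(new_emb, train_emb, train_lab, predicted_label=1))
```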
Shafto and his team found that experts who saw these sample images, after a new diagnosis, were more likely to trust the system (8). Even so, Shafto thinks it’s unwise to entirely trust an AI system in high-stakes domains like medicine. At the same time, he sees ways for new models to ease the workload of radiologists. “I think the real promise of this is triaging cases that are easy or noncontroversial,” Shafto says.
Explain Yourself
Leading members of the AI community suspected that better explanations led to worse performance, Turek says, but some of the DARPA projects challenged that assumption. In some cases, models trained to provide explanations did better than their unexplainable counterparts. “That was something we didn’t expect and wouldn’t have been intuitive at the beginning,” Turek says.
Work by Yuan’s team serves as an example. Before he joined Meta, Yuan worked with a group at the University of California, Los Angeles, to develop an AI-based game, similar to Minesweeper, in which a robotic scout autonomously navigates a safe passage through a dangerous field littered with bombs. As the AI makes decisions, it sends plain-language descriptions of its rationale back to a user, who can respond. Based on whether the user prefers the fastest, shortest, or safest route, for example, the robot adjusts its rationale and behavior. Most AI systems are built to achieve a well-defined task, Yuan says, but his robotic scout could pivot to a new task because of interactions with a user (3). “The goal could change every time, but there’s no way we could program that into the model,” he says.
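The actual scout aligns its values with the user through that back-and-forth communication (3). The toy sketch below illustrates only the adjustable-goal idea: the same planner ranks the same candidate routes differently once the user says they care most about speed, distance, or safety. All routes, costs, and weights are invented.

```python
# Toy illustration of the adjustable-goal idea: the same planner picks a
# different route once the user states a preference. All values are invented.
routes = {
    "ridge":  {"time": 12, "distance": 5, "risk": 0.30},
    "valley": {"time": 20, "distance": 7, "risk": 0.05},
    "direct": {"time":  9, "distance": 4, "risk": 0.60},
}

# Each stated preference becomes a different weighting of the same costs.
PREFERENCE_WEIGHTS = {
    "fastest":  {"time": 1.0, "distance": 0.1, "risk": 1.0},
    "shortest": {"time": 0.1, "distance": 1.0, "risk": 1.0},
    "safest":   {"time": 0.1, "distance": 0.1, "risk": 10.0},
}

def pick_route(preference: str) -> str:
    weights = PREFERENCE_WEIGHTS[preference]
    cost = lambda name: sum(weights[k] * routes[name][k] for k in weights)
    best = min(routes, key=cost)
    # A plain-language rationale the user can push back on.
    print(f"Choosing '{best}' because you asked for the {preference} route "
          f"(weighted cost {cost(best):.1f}).")
    return best

pick_route("safest")     # -> valley
pick_route("fastest")    # -> direct
```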
The idea that an AI system could work better if it explains itself aligns with recent work by Duke University computer scientist Cynthia Rudin, who says it’s a “myth” that interpretability requires sacrificing accuracy (9). She and her colleagues have developed interpretable machine-learning systems that don’t use deep networks and are transparent in their reasoning. For tasks related to finance, healthcare, and predicting criminal activity, these simpler models performed as well as or better than black-box programs (10). Understanding when and where AI is appropriate—and where it isn’t—remains a huge challenge, she says.
Explainability can also improve user trust, Turek says. When Yuan and colleagues invited volunteers to play the game, they found that explanations and solicitations for guidance increased players’ ratings of how highly they trusted the system. “A system that could take advice was seen as more trustworthy to users than a system that didn’t,” Turek says.
A big challenge, however, is grasping just how reliable and comprehensive an explanation must be in order to be of practical use. After Fern became frustrated with the inconsistent results of his arcade-game-playing AIs, he closely examined other studies claiming to have cracked the black box of neural nets—and found those wanting as well. For example, some teams claimed that the activations of the computational nodes in their neural network corresponded with certain pieces of information, even if that correspondence was true only 60% of the time, Fern says. “But if I’m talking to you and whenever you say ‘cat,’ it only means ‘cat’ 60% of the time, you’re not a very reliable person to talk to.”
In a 2019 talk titled “Don’t Be Fooled by Explanations,” he expressed skepticism about published results claiming to “explain” how AI works in recognizing images. “To some degree, this stuff is like reading tea leaves,” Fern says. Rudin, at Duke, points out that if an explanation is wrong even 10% of the time, then it probably shouldn’t be trusted. “Even worse, those 10% are probably the most difficult and important 10% of the data to classify,” she says.
Efforts to build interpretable and explainable AI have an important purpose, Fern says, even if it’s not the one that researchers originally envisioned. He thinks generating explanations will be useful in debugging a system—if a vehicle’s AI system, for example, can explain how it mistook a truck for a tree, or why it thought a lung infection was a tumor on a CT scan, then programmers can debug the algorithm. “If an explanation looks wrong, then it can highlight a problem in the system,” Fern says.
Moreover, it’s unrealistic to expect every algorithm to be able to explain itself in the same way, Fern adds. “Getting something that is human-understandable for a lot of really hard problems that neural nets are solving is not going to be possible,” he says. Plus, the research into XAI, including his own group’s findings, has revealed that the very notion of explainability is slippery. “Humans don’t even know all the things that go into our own decision-making,” he notes. Nevertheless, Fern says that he does believe it will be possible to develop a better understanding of how these systems work with further advances in both computer and cognitive science.
Indeed, as real-world applications increasingly depend on decisions by AI systems, a working knowledge of the decision process will be critical in building trust and avoiding harm. “What we want,” Shafto says, “are better outcomes.”
References
- 1. Koul A., Greydanus S., Fern A., Learning finite state representations of recurrent policy networks. arXiv [Preprint] (2018). https://arxiv.org/abs/1811.12530 (Accessed 25 March 2023).
- 2. Turek M., Explainable artificial intelligence (XAI). https://www.darpa.mil/program/explainable-artificial-intelligence (Accessed 25 March 2023).
- 3. Yuan L., et al., In situ bidirectional human-robot value alignment. Sci. Robot. 7, eabm4183 (2022).
- 4. Raita Y., et al., Emergency department triage prediction of clinical outcomes using machine learning models. Crit. Care 23, 64 (2019).
- 5. Fusco R., et al., Artificial intelligence and COVID-19 using chest CT scan and chest X-ray images: Machine learning and deep learning approaches for diagnosis and treatment. J. Pers. Med. 11, 993 (2021).
- 6. OpenAI, GPT-4 technical report. arXiv [Preprint] (2023). https://arxiv.org/abs/2303.08774 (Accessed 25 March 2023).
- 7. Gunning D., Vorm E., Wang J. Y., Turek M., DARPA’s explainable AI (XAI) program: A retrospective. Appl. AI Lett. 2, e61 (2021).
- 8. Folke T., Yang S.C.-H., Anderson S., Shafto P., Explainable AI for medical imaging: Explaining pneumothorax diagnoses with Bayesian teaching. arXiv [Preprint] (2021). https://arxiv.org/abs/2106.04684 (Accessed 1 March 2023).
- 9. Rudin C., Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).
- 10. Angelino E., Larus-Stone N., Alabi D., Seltzer M., Rudin C., Learning certifiably optimal rule lists for categorical data. J. Mach. Learn. Res. 18, 1–78 (2018).