PNAS Nexus. 2025 Mar 11;4(3):pgaf030. doi: 10.1093/pnasnexus/pgaf030

Engineering and AI: Advancing the synergy

Ramalingam Chellappa 1,2, Guru Madhavan 3, T E Schlesinger 4, John L Anderson 5
Editor: Yannis Yortsos
PMCID: PMC11887848  PMID: 40070433

Abstract

Recent developments in artificial intelligence (AI) and machine learning (ML), driven by unprecedented data and computing capabilities, have transformed fields from computer vision to medicine, beginning to influence culture at large. These advances face key challenges: accuracy and trustworthiness issues, security vulnerabilities, algorithmic bias, lack of interpretability, and performance degradation when deployment conditions differ from training data. Fields lacking large datasets have yet to see similar impacts. This paper examines AI and ML's growing influence on engineering systems—from self-driving vehicles to materials discovery—while addressing safety and performance assurance. We analyze current progress and challenges to strengthen the engineering-AI synergy.

Keywords: artificial intelligence, autonomous systems, robotics, materials and manufacturing, ethics and responsibility

Introduction

In March 2004, 15 teams with 15 driverless vehicles trekked out to the Mojave Desert to compete in a $1 million Grand Challenge hosted by DARPA, the US Defense Advanced Research Projects Agency. The task: to have their vehicles, piloted only by AI, complete a winding, 150-mile dirt-road route from Barstow, CA, to Primm, NV, negotiating its sometimes-sharp curves and avoiding the stones, steep embankments, and sharp drop-offs found at various points along the route. DARPA hoped that the contest might help accelerate the development of autonomous vehicle technology that could be adapted for military purposes.

It did not go as well as one might have hoped. The 15 vehicles were what remained of an original field of 21 entrants, each of which had been tested on a mile-long stretch of the California Speedway made more challenging by the placement of various obstacles. Only 7 of the 21 vehicles reached the end of that qualifying stretch, although 8 more got far enough to move on to the real test.

Of those final 15, 2 pulled out ahead of time, and another 5 did not make it out of the starting area for one reason or another. One of those, for instance, flipped over and started leaking fuel. The vehicle with the day's best performance—from a Carnegie Mellon group called the Red Team—made it 7.36 miles, <5% of the planned course, before running into an obstacle that caused its front wheels to catch fire. Others came up (way) short because of a failed sensor, a problem with the vehicle's throttle, or getting caught on an embankment.

One of the clear lessons of the challenge was that while the teams had tended to focus their efforts on developing AI systems that could observe the surrounding environment and pilot the vehicle, it was the interaction with the physical world that produced engineering issues that proved to be the Achilles’ heel for most of the entrants. No matter how “smart” the AI might have been, it needed to be paired with an engineered system that was capable, well matched, and well integrated with the AI.

Undaunted by the underwhelming performances, DARPA announced that it would repeat the challenge in 18 months with double the prize money, $2 million, awarded to whichever vehicle could finish the course in the fastest time. The agency's optimism was rewarded in October 2005 when five vehicles finished the revised 132-mile course, and all but one of the challenge's 23 finalists traveled farther than the Red Team's 7.36 miles in the first competition.

A team from Stanford was the winner, with its vehicle completing the course in 6 h and 54 min. Two teams from Carnegie Mellon, Red Team and Red Team Too, came in second and third, finishing roughly 11 and 20 min behind the winner. While shorter than the earlier one, the course was significantly more challenging, with narrower roads, many more curves, and a final stretch through a mountain pass with steep drop-offs on either side. DARPA saw the challenge as a success since it created a mindset and jump-started a community that would lead to the development of various autonomous military ground vehicles and autonomous cars, which are now starting to become a presence in various US communities.

More broadly, as AI has grown in power, developers in a variety of areas have begun efforts to combine AI with engineered systems to create machines that can gather information from their environments, make decisions, and then take actions based on those decisions, with or without human supervision. Supporters and visionaries have suggested that the marriage of AI and engineering has the potential to revolutionize nearly every part of the designed world, from transportation and manufacturing to medicine, consumer goods, and military technology. The potential comes not just from adding the capabilities of AI to the capabilities of engineered systems but from the expected synergistic effects of merging the two technologies, as engineered AI systems are typically much more than the sum of their parts.

However, as the DARPA challenges illustrated, the road to this future will not likely be smooth. New technologies always pose challenges, and the hurdles to developing effective and trustworthy AI systems are becoming increasingly apparent. And any new engineering design will have its challenges; in addition, one can expect synergy to be at play here, too, as the challenges to building engineered AI systems are likely to be much greater than the sum of the challenges of the individual parts.

Consider, for instance, the problems discovered in DARPA's Grand Challenge. While many of the teams focused on the AI part of the challenge, believing that engineering an off-road vehicle should not be that difficult, a significant percentage of the breakdowns actually involved the engineered parts. Various issues arose from the interplay between the AI, the engineered design, and the physical world in which the system was operating. An analysis after the fact found a direct correlation between how many autonomous miles a vehicle had driven before the challenge and how well it performed in that challenge. This is unsurprising to engineers, who recognize that unanticipated problems will inevitably arise in complex systems and that testing is the best way to identify and address failure modes. Still, it is a lesson that must be kept in mind as more engineered AI systems are developed and used.

In “The Unreasonable Effectiveness of Data,” Halevy, Norvig, and Pereira (2009) showed that enormous data sets could be used effectively for statistical machine translation. More importantly, they suggested that data could be used in a new way to understand systems of inherent complexity that do not lend themselves to “traditional” mathematical approaches for their description. This has blossomed today in various application areas through the various architectures and techniques we refer to as data science and AI. But many fundamental questions remain unanswered, such as:

  • Are these approaches revealing new truths about the universe, in the same way that traditional mathematical descriptions of physics allowed us to understand aspects of how the universe works, or are they simply useful and important tools for advancing our descriptive knowledge and, again, enhancing our ability to do useful tasks?

  • Why do these systems work as well as they do? What emergent behaviors might surprise us? Can we fully understand all that data science and AI allow us to do for engineering design and beyond?

  • What are the implications for educating engineers and fostering broader cultural responsibility?

Given the growing interest in engineered AI systems of various sorts, it is a good time to take stock of what might be expected, both in terms of potential and challenges, as AI and engineering are brought together—and not just what AI can offer to engineered systems but what engineering has to offer those developing AI systems. For this paper, we define an engineered AI system as one that exploits sensed data and domain knowledge using techniques from AI and machine learning (ML), subject to the expectations that such a system will be safe, robust, reliable, and provide assured performance.

This article proceeds as follows. "Autonomous systems" discusses the development of AI-enabled autonomous cars and aircraft. "Robotics" presents developments in AI-assisted surgery and the potential of AI-guided closed-loop anesthesiology, and it covers AI and robotics in high-mix manufacturing. "Materials science" discusses recent developments in AI-guided discovery of new materials. The technical, societal, and policy-related challenges of building and deploying engineered AI systems are discussed in "Challenges," followed by a summary.

Autonomous systems

No other use for engineered AI systems has received more attention than autonomous vehicles. Whether on land, in the air, or at sea, vehicles that can pilot themselves have several advantages over those that require humans behind the wheel: they can take on jobs that are too dangerous for humans, they can potentially be much safer in terms of avoiding accidents, they can operate without stopping for much longer periods than is possible with a human operator, and they can be significantly cheaper to build and deploy, among other things. Furthermore, they are much farther along than most other engineered AI systems, offering an obvious place to start our discussion.

Autonomous vehicles

Since DARPA's Grand Challenge two decades ago, autonomous ground vehicles have come a long way. Waymo driverless cabs are on the streets of Los Angeles, San Francisco, and Phoenix. Driverless buses are circling a route on Treasure Island in San Francisco Bay—albeit each with a human attendant who can take over the driving if necessary. And Tesla has some 2 million cars on the road with its full self-driving option. However, the company warns that the option should be used only in certain circumstances and under the supervision of a human driver. At this point, the Tesla cars surveil their environment with eight cameras that provide a 360° view, and the driving decisions are made by a neural net trained on a huge—and rapidly growing—dataset of cars traveling in every sort of situation on every sort of road.

The stumbles of these self-driving vehicles have been well reported. On June 19, 2024, in Phoenix, police pulled over an autonomous Waymo vehicle because it had been driving into oncoming traffic. According to the company, the car had gotten confused by inconsistent signage in a construction zone, driven into the wrong lane, and been blocked from returning to the correct lane. When a police car pulled in behind it and turned its lights on, the company said, the car pulled through an intersection to avoid blocking it and then pulled into a parking lot. The police report was a bit more colorful, saying that the Waymo vehicle drove through a red light and "FREAKED OUT" before it pulled over (1).

Meanwhile, a report noted that Tesla cars in autopilot mode (a somewhat limited version of full self-driving) had been involved in 736 crashes since 2019, including 17 fatalities (2). Tesla responded that its vehicles on autopilot or in full self-driving mode have better overall safety records than other cars driven by humans and attributed the accidents to drivers using the autopilot and self-driving mode in ways they were not supposed to—either on roads these driving tools were not designed for or without paying sufficient attention to what the car was doing. Critics counter that Tesla's advertising of these options leads drivers to believe it is safe to let them do the driving with minimal oversight.

One difficult-to-overcome problem is that autonomous cars tend to make nonintuitive mistakes—or at least ones that seem unexplainable to human drivers who do not understand AI but expect it to make common-sense choices. The Waymo taxi driving into oncoming traffic and then running a red light when the police car turned on its lights is one example. But one of the best examples is the Tesla car on autopilot that ignored an 18-wheeler that had pulled out in front of it and drove full speed into the side of the trailer, instantly killing the driver (3). Tesla argued in court documents that its autopilot function had not been trained to reliably detect vehicles moving perpendicular to the car—as the semi was—and while that sort of technical argument may well win in court, it will not do much to assuage a consumer who is wondering how the “AI” driving a car can completely ignore the fact that an 18-wheeler has pulled out in front of it.

This is part of a larger phenomenon that those developing AI and engineered AI systems must grapple with: average consumers—that is, those who have not delved into the practical details of AI—expect that something described as "intelligent" should have at least a modicum of what people think of as "common sense." However, existing AI systems do not have common sense. AI "knows" what it has been trained to know, and in the areas in which it has been trained, it generally performs better than most humans. But pose a question or ask it to decide or act on something it has not been specifically trained for, and you will most likely be disappointed. Another way of saying this is that AI does not like surprises!

Bringing an engineering perspective to some issues affecting autonomous vehicles could point the way to possible solutions. Engineers understand, for instance, that handling the physics aspects of this technology is easy; dealing with human drivers is hard. Tesla's approach seems to be to develop an elegant technology and then expect its users to perform exactly as Tesla said they should—e.g. even though the cars are marketed as having a "full self-driving mode," drivers should always keep their hands on or near the wheel and their eyes on the road, paying attention to their own car and the other cars around them and always being ready to take over. But a human-factors engineer understands that technological design needs to consider how users will likely behave and adjust accordingly. This insight could be applied to any engineered AI design. The design of trustworthy engineered AI systems is contingent on understanding how humans should interact with AI systems. Investigation of human–AI interfaces is one of the significant challenges that need much more attention in the future (4).

Autonomous flying vehicles

Autonomous flying vehicles face different challenges than autonomous vehicles that travel on the ground, and in terms of real-world deployment, they are further along. Consider, for instance, the suite of products made by Dzyne Technologies: Long-Endurance Aerial Platform (LEAP), ULTRA, and ROBOpilot.

The LEAP is a fully autonomous aircraft used by the military for intelligence, surveillance, and reconnaissance (ISR)—that is, it can fly over an area with cameras and other detection devices to get a clear look at what is happening on the ground—and it can stay in the air for 30–36 h. Based on a commercial aircraft design, it takes advantage of commercial production lines to keep its cost low, and it took only 10 months from design to first flight.

LEAP is designed to do its job with a minimum of human involvement. It can take off and land by itself, allowing “point and click” operations, where a controller tells it where to go, and the plane takes it from there. It also has a sense-and-avoid capability to prevent crashing into something else. It analyzes the data from its imaging sensors to detect and track targets. And it has various sensors that allow it to monitor its performance and diagnose problems. Over its 60,000 h of operations, the vast majority of undesired incidents have been caused by human error after a human operator decided to take over. In other words, experience has shown that interfacing with a human operator is the major challenge facing the plane's designers and not the autonomous flying itself. Since greater autonomy has led to higher reliability, the ideal solution would seem to be to engineer humans out of the loop altogether. Still, the military has insisted that a human operator must be involved. Thus, challenges remain in finding the right balance between autonomy, human oversight, and occasional involvement.

The ULTRA, for Long-Endurance UAS Platform (UAS refers to an uncrewed aerial system), has taken several steps beyond LEAP. Another point-and-click, fully automated ISR aircraft, it can stay in the air for over 3 days—enough to fly from Abu Dhabi to the tip of South Africa, loiter there for a day, and fly back.

The AI developed for these autonomous aircraft makes it possible to carry out missions with fewer operators. More generally, automation makes it possible to fly smaller aircraft (because no pilot is needed), fly farther, and reach locations other aircraft cannot—basically, carry out tasks that are impossible with human-crewed aircraft.

ROBOpilot is a robot that can be installed in a general aviation aircraft, turning it into a fully autonomous vehicle. ROBOpilot is trained in a simulator; it performs fully autonomous takeoff, flight, and landing; it handles most contingencies; and, designed to mount on standard seat rails, it can be installed in 4 h. It uses cameras to watch the plane's gauges. Getting ROBOpilot to the point that it could fly independently took about a year and a half of simulator training, and the large amount of training data it required was generated by simulation.

One of the key questions raised by both autonomous flying and ground vehicles is how AI should interact with a human pilot or driver. How should the responsibilities be divided? Should the AI be given complete control? Should the human be able to step in and assume control when it seems necessary? Should there be well-defined criteria for when AI is in charge and when humans are not? Should the human always have the ability to override the AI? The answers to these questions may ultimately not be based on any objective engineering considerations but, at least in part, on how humans “feel” their role should or should not be preserved.

Ultimately, the real value of AI-controlled vehicles will not appear until it is possible to eliminate humans altogether and have, for example, buses without drivers and airplanes without pilots. But that point has not yet been reached, and as long as we insist that a human driver or pilot must be on standby, ready to take over when necessary, another challenge must be addressed. It is human nature that when one is not fully engaged, attention wanders, and an optimal “cognitive load” is required to keep humans engaged (5). That means that if something happens that requires human intervention, the human may not notice until it is too late or may not be fully prepared to take the necessary actions.

It is possible, of course, for the AI to learn to recognize certain situations where human intervention might be needed—if an autonomous car is entering a construction zone, for example—and then signal the human to be ready to take over. But there will always be situations, such as the 18-wheeler pulling out in front of the Tesla, where either the AI does not recognize a problem, or the problem appears too quickly for the human, who probably was not paying full attention, to respond in time. It is also possible that the human takes over when it is not necessary and takes actions that make an accident more likely. Experience with autonomous drones has shown that accidents are most likely to occur after a human takes over.

Robotics

The simplest definition of a robot is a machine equipped with sensors, processors, and actuators that allow it to sense its environment, make decisions, and perform actions in the physical world. Many special-purpose robots are operating today, from the Roomba vacuuming our rooms to assembly robots working in manufacturing plants and the array of robots that Amazon uses in its warehouses and fulfillment facilities to sort and move packages. General-purpose robots, on the other hand, are still more a dream than a reality. It should be noted that the vacuuming Roomba was designed using subsumption architecture, which was originally proposed nearly four decades ago (6).

ML plays multiple roles in robots. It is used in the robot's perception system to help it recognize objects in its surroundings and deal with the fact that an object can look quite different when seen from different angles, in different light, or against different backgrounds. It is used in planning so the robot can learn from experience and make better decisions. And it is used in the robot's movement control loop to improve accuracy. In short, ML helps a robot adapt to the physical world and respond effectively to it.

Many factors have driven the increasing use of robots in various settings, such as dramatic improvements in AI and ML, increased computing power, and the availability of more reliable and affordable robot hardware. Robot perception is still a major bottleneck, but the recognized value of AI-directed robots means that considerable money is being spent on overcoming this challenge and others. A short list of the areas where AI-driven robots have tremendous potential would include warehouse robotics, consumer robotics, robot delivery, surveillance robots, agriculture robots, autonomous driving, robot construction, and manufacturing. Each of these areas represents a market of tens of billions to thousands of billions of dollars.

A major challenge is developing robots that can move around in difficult, unstructured environments. Today, most mobile robots are designed to work on relatively flat surfaces with relatively few obstacles and do not perform well elsewhere. However, considerable progress has been made in mobile robot hardware—including both humanoid-style robots and quadruped robots that move more like dogs—and there is continuing work to improve the AI systems that direct these robots, so the future is likely to see AI-directed robots that are much better at moving around challenging, unpredictable environments.

Beyond that, one goal will be to develop general-purpose robots that can operate in multiple environments and learn to do various tasks. However, this will remain far in the future.

High-mix manufacturing applications

To explore the potential and challenges of AI in robotics in more detail, consider the use of AI-powered smart robotic cells in high-mix manufacturing. “High-mix” manufacturing refers to manufacturing a large number of different parts, each in low quantities, as opposed to mass manufacturing.

Robots have a long history in mass production since they can be set up and programmed once and then perform the same task over and over again, thousands or even millions of times, with relatively little human oversight or intervention other than to make sure that the robot is performing exactly as designed. However, high-mix manufacturing—which is a part of, for example, aircraft production and the making of machine tools—has a different set of requirements that are not well matched with these traditional robots. With each new task, the robot may need to switch out tools and likely have to use different motions. An example would be a painting robot that must paint various parts with different shapes and sizes. As long as it is painting thousands of identical parts, it makes sense to design and program the robot for that one job. Still, if it is painting just 5 or 10 identical parts at a time and then switching to something else, having a human program it for each new part is not practical. On the other hand, this sort of tedious task is not something that most people want to do, so it would be valuable to do it with robots.

Enter AI. The steps required to use robots in high-mix manufacturing are all tasks that AI has been shown to be able to do in various settings. First, the robot must be able to examine a part and create a digital model of the part that will be used to guide its motions. Then, it must generate an efficient trajectory along the part to allow it to complete its tasks. And, since there will always be uncertainties in the model's representation of the part, the robot must have a control system to deal with the uncertainties. While the robot is carrying out its task, it must monitor the work to guarantee its quality. It also needs a smart human–machine interface so that it can respond correctly to human directions and guidance.
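To make these steps concrete, the sketch below lays out the generic control flow of such a self-programming cell. It is purely illustrative; the class and function names are hypothetical placeholders and do not correspond to any particular vendor's software.

```python
# Illustrative sketch of a high-mix robotic cell's control flow (hypothetical names).
from dataclasses import dataclass

@dataclass
class PartModel:
    mesh: list            # digital model built from scan data
    uncertainty: float    # estimated geometric uncertainty (mm)

def scan_part(raw_scan):
    """Step 1: build a digital model of the part from sensor data."""
    return PartModel(mesh=list(raw_scan), uncertainty=0.5)

def plan_trajectory(model):
    """Step 2: generate an efficient tool path over the part."""
    return [f"waypoint_{i}" for i, _ in enumerate(model.mesh)]

def execute_with_control(path, model):
    """Step 3: execute while compensating for model uncertainty; return a sensor log."""
    return [{"waypoint": wp, "force_error": model.uncertainty * 0.1} for wp in path]

def quality_ok(log):
    """Step 4: in-process quality monitoring (here, a trivial force-error threshold)."""
    return all(entry["force_error"] < 0.2 for entry in log)

def run_cell(raw_scan):
    model = scan_part(raw_scan)
    path = plan_trajectory(model)
    log = execute_with_control(path, model)
    if not quality_ok(log):
        # Step 5: the human-machine interface -- ask for help rather than guess.
        print("Quality check failed; requesting operator guidance.")
    return log

run_cell(["face_a", "face_b", "edge_c"])
```

The essential point is the loop structure: scan, model, plan, act under uncertainty, check quality, and fall back to a human when confidence is low.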

Such AI-powered smart robotic cells already exist (for example, those made by GrayMatter Robotics (7)). One such robot is used to sand parts: it learns its tasks by watching an expert human perform the desired task, and it uses sensors and a physics-based model to make decisions about its behavior.

A smart robotic cell of this sort needs five characteristics: it must be able to program itself, it must be able to adapt by conducting experiments, it must ensure safe execution under uncertainty, it must be able to seek help from humans, and it must be able to communicate effectively with humans.

AI is used in various ways in these smart robotic cells. It detects defects through deep learning. It carries out active learning for tuning simulation models using experimental data. It makes predictions through models trained using simulation data. It builds models using reinforcement learning. It visualizes neural networks for diagnostics. And generative AI is used to create synthetic data for training and creating test scenarios.

One challenge is that data-driven AI requires a large amount of data to develop a complex model, but manufacturing data are very limited. In particular, experimentally generating large amounts of data is not feasible. Sensor data are noisy, so experimentally generated data can lead to overfitting, and the predicted results may contradict physics.

One solution is to use a physics-based model to reduce the data requirement by constraining the model and reducing the effects of experimental errors. Such physics-informed AI takes advantage of known physics-based models, using a data-driven approach to augment the models with experimental data.
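The sketch below illustrates the basic idea with a deliberately simple, hypothetical example: a handful of noisy measurements of a material removal rate are fit with a flexible model, while a penalty term keeps the fit close to a known Preston-type physics relation (removal rate proportional to pressure times velocity). The data, coefficients, and loss weights are invented for illustration, not taken from any cited system.

```python
# Minimal sketch of the physics-informed idea: a data fit constrained by known physics.
import numpy as np

rng = np.random.default_rng(0)

# A few noisy "experiments": (pressure, velocity) -> measured removal rate
P = rng.uniform(1.0, 5.0, size=8)
V = rng.uniform(0.5, 2.0, size=8)
true_k = 0.8
rate_meas = true_k * P * V + rng.normal(0.0, 0.3, size=8)   # noisy measurements

def predict(theta, P, V):
    # Flexible data-driven form: rate = a*P*V + b*P + c*V
    a, b, c = theta
    return a * P * V + b * P + c * V

def loss(theta, lam=1.0):
    data_term = np.mean((predict(theta, P, V) - rate_meas) ** 2)
    # Physics term: the extra coefficients b, c have no physical basis, so penalize them.
    physics_term = theta[1] ** 2 + theta[2] ** 2
    return data_term + lam * physics_term

# Crude random search stands in for a proper optimizer, to keep the sketch short.
thetas = rng.uniform(-2, 2, size=(5000, 3))
best_theta = min(thetas, key=loss)
print("fitted (a, b, c):", np.round(best_theta, 2))   # b, c are pushed toward 0
```

With only eight noisy points, the unconstrained terms would otherwise soak up measurement noise; the physics penalty suppresses them and keeps predictions physically plausible.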

However, even if physics-informed AI removes some challenges to creating smart robotic cells, others will remain. It is challenging, for instance, to build a system that meets all five practical manufacturing requirements: quality, speed, task coverage, availability, and affordable cost. Furthermore, the robotics community often underestimates the adoption challenges, and since affordability can only come through scaling, this creates a chicken-and-egg problem. The best way to make inroads into manufacturing for AI-driven robotics may be to emphasize some of its advantages that other approaches cannot match, such as creating digital twins, traceability, improved product performance, and reducing the need for humans to work on ergonomically challenging tasks.

Ultimately, the vision for AI-directed robots is to use AI for such things as vision, language, planning, and control to create a robot that can communicate with humans, ask questions to make sure it understands a command, survey its environment, make a plan for how to accomplish its task given the details of that environment, then perform the motions needed to accomplish the task, use feedback to determine whether it performed the task correctly, and, if not, figure out what needs to be done differently and try again. Most of these capabilities already exist in one form or another, but a robot combining them all remains a dream.

Surgical applications

In medicine, engineered AI systems could be used—or already are being used—in various ways. One exemplar is robot-assisted surgery. Regarding surgery, humans and machines have different and complementary capabilities. Humans have excellent situational and task awareness, good judgment, and the ability to improvise. Machines are very precise, know where they are in three dimensions with great accuracy, can monitor their surroundings with various sensitive sensors, and can control their movements according to what they detect with the sensors. Thus, an ideal approach to surgery would be to combine the strengths of humans with those of machines to make it possible to do things that neither could do alone.

There are various ways to do this. In the current paradigm, a surgeon manipulates handles, which control a robot or robots carrying out the procedure. While the computer controlling the robot has much to do in translating the surgeon's movements, most of the information about the patient's anatomy, the goal of the procedure, and the approach is in the surgeon's head.

An emerging paradigm is to take advantage of the fact that there is an AI system between the surgeon and the machine doing the surgery and have AI take on further responsibilities. For example, in robotic facial surgeries, it is possible to teach the AI software controlling the robot where the facial nerve is so that it provides haptic feedback to let the surgeon know when the tool is getting too close to the nerve. It can even make it impossible for the surgeon to get closer than a certain safety margin. The surgeon still guides the tool, but the AI software works with the surgeon to increase the safety of the procedure. They are working as partners.

One can list various levels of autonomy for surgical robots, from none at all, where the human surgeon makes all of the decisions and directs every move, to task autonomy, where the robot can perform certain operator-initiated tasks automatically, to high autonomy, where the robot can make decisions but under the supervision of a qualified human operator, and to full autonomy, where the robot performs an entire surgery. The higher levels of autonomy require more sophisticated and trustworthy AI to be in charge of the robot. Still, all of these levels of autonomy require that a human be able to make it clear to the robot what it is supposed to do and trust that the robot will do what it is supposed to do—and not do things it is not supposed to do.

This, in turn, requires a shared situational awareness—i.e. that the human and the AI-guided robot have congruent understandings of the situation, the plan, what is happening at the moment, and the ultimate goal. The current approach to creating this shared situational awareness is with a digital twin, a real-time simulation of both the patient and the robot that is updated continuously as the robot observes, moves around in, and operates on the patient. Such a digital twin can be used to guide a surgeon—it can, for instance, show the surgeon where the tools are relative to various anatomical features that might be obscured by blood or something else—and it can be used to guide an autonomous robot as a surgeon monitors the operation.

Of course, the more autonomy that is required, the more data are needed to train the model, and there are various ways that this interaction between humans and robots can be used to accumulate data. Data can be generated by human surgeons operating on a cadaver with a robot monitoring or assisting the surgery. Surgeons can use the digital twin for practice surgeries, and the data generated from those can be used for additional training of the model. In a particular robot-assisted sinus surgery, a robot holds an endoscope, freeing the surgeon's hands; the data accumulated by the endoscope in these surgeries can be used to build or improve a digital twin, and so on.

As AI-directed surgical robots become more capable and trustworthy, they will likely be given increasing amounts of autonomy. However, given the potentially catastrophic consequences of surgical mistakes and the likely resistance of patients—and surgeons—to giving robots too much responsibility, it seems unlikely that in highly developed medical contexts, surgery robots will be given full autonomy at any point in the foreseeable future for any but the most basic, least risky procedures.

Materials science

Materials science seems to be almost tailor-made for AI and ML with its aim to find materials with particular desirable properties—say, materials with the strength and heat resistance to work in a high-temperature jet engine. The traditional approach to finding these materials has involved testing a large number of materials whose selection is partly informed by insights from fundamental physics and partly by the researcher's experience and intuition. However, the number of materials that can be tested is never more than a small fraction of all possible options. A typical metal alloy, for instance, may have five different components, so choosing an alloy involves determining which five metals should be included and in what proportions, with the properties of the alloy often depending sensitively on both factors. The processing conditions used to create the alloy matter as well, since they help determine the structure and hence the properties. How and when the components are added to the mixture, the use of heating and cooling, annealing treatments, and the use of catalysts are all factors that can be varied to affect the final product—and are factors that must be considered when searching for a material with certain desired properties.

The huge number of possibilities can be overwhelming to human researchers, but the potential for staggering amounts of data makes this exactly the sort of area where AI can shine, and researchers have suggested several different ways that AI can be used to improve the discovery of materials optimized for particular uses. One approach is to use AI to help generate and analyze data on large numbers of candidate materials. The process of creating material samples can be automated, with robots programmed to churn out samples with different compositions or different processing conditions or both, and the analysis of these samples can also be automated, with robots measuring such things as tensile strength or grain structure. These data can then be used, for example, to train an AI model that predicts what compositions and processing conditions will lead to the best properties. Indeed, the effectiveness of this approach has already been demonstrated (8).

The greatest potential of AI in this context lies in managing robotic systems for sample production and testing, creating an autonomous closed-loop materials discovery system where AI iteratively selects, creates, tests, and analyzes materials to identify the most promising options. It would be like having an army of graduate students looking for the optimal material for a particular use. Several labs are working toward such a closed, AI-directed materials discovery system, and various pieces of it have been created and tested, but the final step to an autonomous AI discovery system remains to be taken.

It is possible to use a similar approach with a relatively small number of samples. Not every lab will have the resources to create and test tens of thousands of samples, but not every materials search requires that much data. In some situations, for example, one has a reasonably good idea of the composition, process, and structure that will work and needs to zero in on the optimum from a relatively small set of possibilities. One might, for example, be looking for a material of the form ABX3, where A is a cation such as cesium or methylammonium, B is a metal cation such as lead or tin, and X is a halide such as chloride, bromide, or iodide. The processing of the material requires combining these components with a solvent, so the optimization process involves looking at combinations of various choices for A, B, X, and the solvent. Such an approach has been used successfully with as few as 12 initial data points concerning the properties of different materials, and, in a real-world application, it was used to optimize high-entropy alloys (alloys with five or more elements in relatively equal proportions) for hardness for use in space actuators, which are part of the control systems for thrusters used on space vehicles (9).
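A minimal sketch of this small-data, closed-loop style of search is shown below, using a Gaussian-process surrogate and an upper-confidence-bound rule to pick the next composition to test. The candidate compositions, the "hardness" function, and all numbers are synthetic stand-ins, not the data or method of the cited study.

```python
# Illustrative small-data, closed-loop materials search (synthetic data throughout).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)

# Candidate compositions: fractions of 5 elements (each row sums to 1).
candidates = rng.dirichlet(np.ones(5), size=200)

def measure_hardness(x):
    """Stand-in for a real experiment (unknown to the optimizer)."""
    target = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
    return 10.0 - 40.0 * np.sum((x - target) ** 2) + rng.normal(0, 0.1)

# Start with roughly a dozen measured points, as in the small-data setting above.
tested = list(rng.choice(len(candidates), size=12, replace=False))
y = [measure_hardness(candidates[i]) for i in tested]

for step in range(10):                      # closed loop: model -> propose -> test
    gp = GaussianProcessRegressor(normalize_y=True).fit(candidates[tested], y)
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + 2.0 * sigma                  # favor high predicted hardness and high uncertainty
    ucb[tested] = -np.inf                   # do not re-test measured compositions
    nxt = int(np.argmax(ucb))
    tested.append(nxt)
    y.append(measure_hardness(candidates[nxt]))

print("best hardness found:", round(max(y), 2))
```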

A second approach to dealing with the huge number of potential materials involved in a search is to use AI to reduce the dimensionality of the search space and thus reduce the amount of computing resources needed. For instance, this approach has been used in work to find optimal alternative fuels for use in clean energy. This sort of materials search is very computationally intensive because not only are there multiple fuels to be considered, but also the performance of these fuels must be simulated in a complex combustion system where there will be reactive turbulent flows.

A team at Sandia National Laboratories has tackled this problem by defining a low-dimensional manifold in the high-dimensional composition space consisting of all the different possible fuel compositions and doing the calculations on that low-dimensional manifold (10). If that manifold is chosen correctly, it can serve as a surrogate for the full composition space—a “reduced-order model”—in the sense that an optimal solution on the manifold will be remarkably close to the optimal solution over the entire composition space. Thus, finding a near-optimal solution with far fewer computing resources is possible.

A suitable low-dimensional manifold in the composition space will generally exist because the performances of the different compositions are not independent of one another but have various interrelationships. Those interrelationships may not always be known, but ML can be used to discover them and define the low-dimensional manifold by looking for patterns in data taken from experiments with different fuels. This requires a significant amount of high-quality training data for the ML, but far less than would be needed to examine huge numbers of different compositions directly in search of the best fuel. In studies of compression ignition, the Sandia group increased the efficiency of its search by a factor of >200.
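The following sketch conveys the reduced-order-model idea on synthetic data: a 30-component "composition" space whose variation is actually governed by three latent factors, a linear manifold learned from data with PCA, and a search carried out on that manifold rather than in the full space. The performance function and dimensions are invented for illustration; the Sandia work uses far more sophisticated manifold constructions and combustion simulations.

```python
# Minimal sketch of searching on a learned low-dimensional manifold (synthetic data).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Synthetic "training" data: 30-component fuel blends whose variation is in fact
# governed by a few latent factors (the interrelationships mentioned above).
latent = rng.normal(size=(300, 3))
mixing = rng.normal(size=(3, 30))
compositions = latent @ mixing + 0.01 * rng.normal(size=(300, 30))

def performance(x):
    """Stand-in for an expensive combustion simulation of one blend."""
    return -np.sum((x[:3] - 1.0) ** 2) - 0.1 * np.sum(x[3:] ** 2)

# Learn the low-dimensional manifold (here linear, via PCA) from the data.
pca = PCA(n_components=3).fit(compositions)

# Search on the 3-D manifold: far fewer evaluations than sampling 30-D directly.
z_grid = rng.uniform(-3, 3, size=(2000, 3))
x_grid = pca.inverse_transform(z_grid)          # map manifold points back to compositions
scores = np.array([performance(x) for x in x_grid])
print("best candidate score found on the manifold:", round(scores.max(), 2))
```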

Challenges

As with any other new technology, particularly one with revolutionary potential, engineered AI systems face a number of challenges that must be overcome if they are to achieve that potential. These range from fundamental technical issues with AI systems themselves to challenges in human–AI interaction, social impacts, and policy considerations.

Technical challenges

At the core of engineered AI systems are the AI models themselves, which, despite their exceptional performance over the past decade, face significant technical challenges. As outlined in recent work (11), convolutional neural networks (CNNs), transformers, and stable diffusion models have revolutionized ML and AI, outperforming nearly all classical ML models in many applications. For example, CNN models have made significant advances in a wide range of computer vision problems, such as object detection and tracking, image classification, action recognition, image captioning, human pose estimation, face recognition, and semantic segmentation.

However, these data-driven models suffer from several critical weaknesses that raise concerns about their trustworthiness. Dominant issues include domain shift, lack of robustness to adversarial attacks, bias in decision-making, and lack of explainability (11). Domain shift refers to the situation in which training and test data come from different distributions. In the case of images or videos, domain shift could arise due to changes in object pose, scene illumination, the type of sensors used to acquire the images or videos, day/night operation, and so on. It has also been known for almost a decade that data-driven AI models can perform poorly when even very small perturbations are added to an image, leading to misclassification; the literature is awash with examples of such attacks and their consequences. Finally, it has been shown that CNN models, especially those designed for face recognition, exhibit different levels of performance for different subgroups of populations, leading to concerns about bias or a lack of fairness (12).
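The adversarial-perturbation phenomenon can be illustrated even without a deep network. The toy example below trains a linear classifier on synthetic high-dimensional "images" and then applies a fast-gradient-sign-style step: a small, uniform-magnitude change to every feature, aligned with the model's weights, flips the prediction. The data and numbers are invented; on deep CNNs the required perturbations can be far smaller, often imperceptible to humans.

```python
# Toy illustration of an adversarial perturbation on a linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Two classes of 200-dimensional synthetic "images" (feature values roughly in [-3, 3]).
X = np.vstack([rng.normal(-0.3, 1.0, (300, 200)), rng.normal(0.3, 1.0, (300, 200))])
y = np.array([0] * 300 + [1] * 300)
clf = LogisticRegression(max_iter=2000).fit(X, y)

x = X[0]                                        # a class-0 sample
w = clf.coef_.ravel()
margin = abs(clf.decision_function([x])[0])

# Smallest sign-of-gradient step (same size in every feature) that crosses the boundary.
eps = 1.05 * margin / np.abs(w).sum()
x_adv = x + eps * np.sign(w)

print("clean prediction:    ", clf.predict([x])[0])       # expected: 0
print("perturbed prediction:", clf.predict([x_adv])[0])   # flips to 1
print("per-feature change eps = %.3f (feature scale ~1)" % eps)
```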

These fundamental AI challenges manifest in particularly concerning ways when the systems are deployed in real-world engineering applications. Of all of the AI-related phenomena that have emerged to date, perhaps the most striking is the way that AI systems that give the right answer or take the right action 99% or 99.9% of the time will sometimes go wrong in the most nonsensical ways; these are not minor errors or near misses but are rather outrageously wrong in ways that are difficult to understand. Researchers have accumulated long lists of such errors, mostly from large language models such as ChatGPT or generative AI. A new feature that Google introduced, AI Overviews, was quickly limited in the questions it would answer after telling users to put glue on pizza and eat rocks to get nutrients (13). And in one striking example, when an AI algorithm was asked to examine a picture and identify the color of a dress worn by a woman playing tennis, it first correctly answered orange; however, when the tennis ball in the picture was replaced with a soccer ball but nothing else was changed, the answer to the same question was white.

Although it is challenging to know what an AI algorithm was "thinking" when it made such a mistake, it is often possible to identify why AI makes mistakes of this sort. For instance, one issue for visual AI systems is a limit on the number and detail of annotated images that can be used to train them. To offer a simplistic example, one might train an AI algorithm to recognize a dog in a photograph by giving it a million photographs of dogs and a million photographs without a dog in the picture; with that training, it can identify a photograph with a dog in it. What happens if you then provide a picture with a cat but no dog? The algorithm, never having been trained on cats, might well identify this as a picture with a dog. It will need another million photographs of cats to learn to recognize them as a separate animal. (Compare this to what a 2-year-old can do given a few examples of dogs and cats!)

The major limitation here is not the training itself—computers can scan and analyze photographs extremely fast—but rather the availability of enough training data. Training a visual AI system requires a huge number of photographs that have been annotated to let the computer know what is in them, and this annotation needs to be much more sophisticated than simple binary classifications.

At least three different sorts of challenges must be overcome to make AI systems more reliable. The first is that large language models are not cost-efficient to scale. One can reduce the frequency of errors by adding more data to the training set, but there are diminishing returns: as the number of errors drops linearly, the computing needs (data, computing time, and human supervision) increase exponentially. The second is a long-tail problem: in any dataset, a large percentage of the items will be repetitious and redundant, while many other items are much less common and have only relatively few entries. The third problem is temporal misalignment: large language models become stale over time. Studies have shown that these models perform best on questions about events from roughly 5 years before the training cutoff date and then get consistently worse as the time point of the question approaches the cutoff (14). This is because, in reality, the models are pretrained with data that are significantly older than the claimed cutoff date, so a huge percentage of the data the model is trained on is from several years before that date.

Human–AI interaction challenges

Beyond the technical challenges inherent in AI systems themselves, a crucial set of challenges emerges at the interface between humans and AI systems. The modeling of human–AI (HAI) and human–machine team interactions has evolved from decades of research in human systems, human–computer, and human–robot interactions. These interactions are particularly intriguing as they fundamentally rest on questions of trustworthiness and justified confidence. As noted in recent research (15), the pace at which humans and AI make decisions presents a significant challenge for HAI systems.

Recent research has identified six grand challenges that must be addressed before efficient, resilient, and trustworthy HAI systems can be deployed: developing AI that (i) is human well-being oriented, (ii) is responsible, (iii) respects privacy, (iv) incorporates human-centered design and evaluation frameworks, (v) is governance and oversight enabled, and (vi) respects human cognitive processes at the human–AI interaction frontier (16).

Whether humans trust smart machines is a major concern. However, since trust is typically perceived as a binary yes-or-no concept, rather than asking whether end-users will "trust" their AI-enabled machines, we should instead consider how users gain justified confidence in smart systems over time (4). The process of building confidence in any AI-enabled system is continuous and cumulative; it never ends. A user's confidence in a smart system will depend on context: the nature and complexity of the task, the system's previous performance record, the user's familiarity with the system, and so on. In general, continued successful performance in lower risk, lower consequence tasks will give users more confidence in using AI when facing higher risk, higher consequence tasks.

Until users gain more experience teaming with smart machines, they will face the dilemma of placing too much or too little confidence in their AI-enabled systems. This topic is particularly relevant across domains ranging from combat to education to surgery and is especially evident in the examples discussed earlier, from autonomous vehicles to surgical systems.

Social challenges

As is typically the case with a new and powerful technology, engineered AI systems pose a number of significant social challenges related to equity, ethics, and privacy. Since AI and ML rely on large datasets for their training, skewed datasets can lead to results that favor some groups and disfavor others. There are only so many large databases that AI can be trained on, and none of them were assembled with an eye toward equity; furthermore, the obvious solution—adding data to a database to balance it out—is impractical, both because of the expense and because it is not clear what sorts of data are needed to balance a database until skewed or wrong AI answers make the gaps apparent.

In June 2024, Nvidia, the leading maker of the computer chips used in large language models and other AI processes, became the world's most valuable company, surpassing Microsoft and Apple (17). This indicates just how important AI has already become to the world's economy. That importance will only grow as different uses are developed for AI and engineered AI systems. But who will benefit from that revolution, and who will be left behind?

From an equity perspective, no part of society should bear a disproportionate share of the costs of coming AI technologies or receive less than its share of the benefits. In particular, AI technologies may cost workers their jobs in certain areas and open up new types of jobs for workers in other areas. If the job losses fall mainly on workers of one group (say, those with no more than a high school education) and the job gains are taken advantage of by workers of another group (say, those with a college or postgraduate education), then AI will have an inequitable impact on society.

None of these are new concerns—they are discussed with each new disruptive technology that appears. Most recently, much was said about how the digital revolution—computers, the internet, smartphones, and so on—was having a disparate impact on society. Nonetheless, it is still important to consider what steps might be taken to limit the inequities produced by coming AI technologies. For instance, the educational system could take steps to ensure that training in AI-related subjects is spread equitably. Given how difficult this task has proven to be in the more general area of STEM education, it seems likely that various inequalities will appear and persist in the specific area of AI-related education, but the effort should still be made.

Beyond equity concerns, various ethics issues are raised by AI technologies (18). Using large datasets to train AI algorithms raises significant privacy concerns. Will people whose data are found in those datasets find their privacy threatened? Should they be able to keep their data from being used to train AI algorithms in the first place? A related question is whether people whose data are used to train an AI system should share in any profits that system generates. This is particularly relevant for people who generate original content—writers, artists, photographers, and musicians—that is used to train generative AI. If an AI-generated song becomes a top 40 hit, should the musicians whose songs were used in training share in the proceeds?

The question of data ownership becomes increasingly critical as data become the vital "fuel" for AI systems. Does a principal investigator own the lab's data? Does the university own it? Does the funding agency own it? Who should enjoy monopolistic rights to the data? The answers may differ depending on whether the data relate to individuals or materials, but these questions need resolution as AI systems become more prevalent.

Policy challenges

As AI becomes increasingly powerful and can make various decisions with or without human oversight, questions of governance and policy become increasingly urgent. Traditionally, transparency has been a key element required of decision-making, since it is difficult to trust a process or believe it is fair if it has not been carried out transparently. However, by their very nature, AI processes are not transparent—an AI algorithm is trained on a database that is far too large for a human mind to comprehend, then acts as a black box when answering questions or making decisions. The algorithm uses its training data to reach answers in ways that human minds have not designed but that instead emerge from the data patterns themselves. Furthermore, AI algorithms typically cannot supply reasons for their decisions—although large language models like ChatGPT will sometimes generate plausible-sounding but potentially incorrect explanations when asked.

A central policy challenge is determining how much autonomy to give to engineered AI systems and what sort of human oversight remains necessary. This is a critical issue for AI-driven cars, AI-piloted aircraft, closed-loop anesthesia-delivery systems, robotic surgeries, and autonomous weapons systems. The greater the potential damage from a mistaken AI-directed action, the less likely an AI system will be given full autonomy with no human oversight. While AI systems are already autonomously driving cars with passengers, these may be the riskiest systems to be given autonomy in the near term. It seems unlikely that AI-flown commercial airliners with no human pilots will appear soon. And despite science fiction predictions, it is difficult to imagine nuclear weapons ever being controlled by AI systems without human oversight.

Currently, there are no established guidelines—and little systematic analysis—concerning appropriate levels of AI system autonomy. Instead, different entities are trying various approaches that amount to real-world experimentation. In ground transportation, for instance, Wuhan, China, has 500 fully autonomous taxis operating without backup drivers (19), while Phoenix, Los Angeles, and San Francisco are conducting more limited trials. San Francisco's autonomous minibuses on Treasure Island retain human drivers for emergencies. Some Tesla drivers ignore warnings and use autopilot with minimal oversight, but the safety implications remain unclear without comprehensive data on autonomous miles driven.

Similar policy experiments are occurring wherever engineered AI systems are being deployed, from military drones to surgical robots to manufacturing, with operators determining appropriate autonomy levels case by case. While these experiments will inform policy development, this “figure it out as you go” approach has limitations—by the time optimal policies become clear, the technology may be too established to be significantly shaped by them. Technologies are most effectively guided by policy early in their development, precisely when the appropriate policies are hardest to determine.

Nevertheless, given their potential societal impact, developing thoughtful policies for engineered AI systems is critical. While much policy discussion focuses on risk management and maintaining US AI leadership, growing attention is being paid to beneficial AI deployment. A comprehensive national AI adoption plan could address key questions: Who bears responsibility for this technology? Which applications have the most potential and importance? How can we make decisions about a technology whose capabilities remain uncertain? How can we embed ethical principles in engineered AI systems so they embody democratic values and contribute to human flourishing?

Summary

We discussed several critical issues in the design of engineered AI systems. As AI has grown in power, we are witnessing many efforts to combine AI with engineered systems to create machines that can gather information from their environments, make decisions, and then take actions based on those decisions, with or without human supervision. The integration of AI and engineering has the potential to revolutionize nearly every part of the designed world, from transportation and manufacturing to medicine, consumer goods, and military technology. The potential comes not just from adding the capabilities of AI to engineered systems but from the expected synergistic effects of combining the two technologies, as engineered AI systems are typically much more than the sum of their parts. While our discussion focused primarily on autonomous vehicles, the principles and challenges we identified likely extend to other domains such as space exploration, agricultural robotics, and other emerging applications. Still, many technical challenges, such as safety and assurance in novel environments, bias, interpretability, and societal implications due to potential displacements in employment, must be addressed to realize the full potential of engineered AI systems. Future work should explore how these challenges manifest across different domains of autonomous systems, particularly given the current landscape where AI development is concentrated among a small number of large companies. Additionally, as these systems become more prevalent, there is a pressing need for research into appropriate regulatory frameworks that can address questions of safety, liability, and governance across various applications of engineered AI systems—determining not just what should be regulated, but how to implement effective oversight while fostering innovation.

Acknowledgments

The authors thank Monty Alger, Jaafar El-Awady, John Baras, Behtash Babadi, James Bellingham, Emery Brown, Daniel Castro, Misha Chertkov, Jacqueline Chen, Paulette Clancy, Lisa Cooper, Munmun De Choudhury, Mark Dredze, Satyandra “SK” Gupta, Ranu Jung, Yannis Kevrekidis, Daniel Khashabi, Luis Kun, Michael Littman, K.J. Ray Liu, Dinesh Manocha, Derek Paley, Sudip Parikh, K.T. Ramesh, Kishan Sabnani, Abhinav Shrivastava, Thomas Strat, Russell Taylor, Matthew Turek, Rene Vidal, and Alan Yuille for their thoughtful contributions and insights that informed this piece. We are appreciative of Denis Wirtz, Julie Messersmith, and Chasmine Stoddart for supporting the program, Robert Pool for synthesizing the discussions, and Melissa Moore-Esquivel, Dorothea Nikas, and Janel Cummings for logistics.

Contributor Information

Ramalingam Chellappa, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; School of Medicine, Johns Hopkins University, Baltimore, MD 21218, USA.

Guru Madhavan, National Academy of Engineering, Washington, DC 20001, USA.

T E Schlesinger, Whiting School of Engineering, Johns Hopkins University, Baltimore, MD 21218, USA.

John L Anderson, National Academy of Engineering, Washington, DC 20001, USA.

Funding

The authors declare no funding.

Author Contributions

R.C., G.M., T.E.S., and J.L.A. developed the concept and the draft.

References

