Summary
Significant acceleration of the future discovery of novel functional materials requires a fundamental shift from the current materials discovery practice, which is heavily dependent on trial-and-error campaigns and high-throughput screening, to one that builds on knowledge-driven advanced informatics techniques enabled by the latest advances in signal processing and machine learning. In this review, we discuss the major research issues that need to be addressed to expedite this transformation along with the salient challenges involved. We especially focus on Bayesian signal processing and machine learning schemes that are uncertainty aware and physics informed for knowledge-driven learning, robust optimization, and efficient objective-driven experimental design.
The bigger picture
Thanks to the rapid advances in artificial intelligence, AI for science (AI4Science) has emerged as one of the new promising research directions for modern science and engineering. In this review, we focus on recent efforts to develop knowledge-driven Bayesian learning and experimental design methods for accelerating the discovery of novel functional materials as well as enhancing the understanding of composition-process-structure-property relationships. We specifically discuss the challenges and opportunities in integrating prior scientific knowledge and physics principles with AI and machine learning (ML) models for accelerating materials and knowledge discovery. The current state-of-the-art methods in knowledge-based prior construction, model fusion, uncertainty quantification, optimal experimental design, and symbolic regression are detailed in the review, along with several detailed case studies and results in materials discovery.
Developments in the application of signal processing and machine learning methods for the discovery of novel materials can shift the current trial and error practices to informatics-driven discovery, which can reduce cost and time. This work reviews the recent developments in knowledge-driven Bayesian learning, model fusion, uncertainty quantification, optimal experimental design, and automated knowledge discovery in a historical context of robust signal processing and robust decision-making in materials science applications.
Introduction
Accelerating the development of novel functional materials with desirable properties is a worldwide imperative because it can facilitate advances in diverse fields across science, engineering, and biomedicine with significant potential contributions to economic growth. For example, the US Materials Genome Initiative (MGI) calls for cutting the cost and time for bringing new materials from discovery to deployment by half by integrating experiments, computer simulations, and data analytics.1,2 However, the current prevailing practice in materials discovery primarily relies on trial-and-error experimental campaigns or high-throughput virtual screening approaches by computational simulations, neither of which can efficiently explore the huge materials design space to develop materials that possess targeted functional properties.
To fundamentally shift the current trial-and-error practice to an efficient informatics-driven practice, there have been increasing research efforts to develop signal processing (SP) and machine learning (ML) methods that may ultimately enable autonomous materials discovery and expedite the discovery of novel materials at a substantially reduced cost and time.3,4,5 When applying SP and ML methods in materials science, several unique challenges arise, which include (1) a limited amount of data (if any) for investigating and exploring new materials systems, (2) data of varying and inconsistent quality because of technical limitations and a lack of common profiling prototypes, (3) significant complexity and uncertainty in existing computational simulation and surrogate models,6 and (4) incomplete domain knowledge.
To cope with the aforementioned challenges and to effectively discover novel functional materials with the desired target properties, robust decision-making strategies are critical for efficient exploration of the immense materials design space through effective learning, optimization, and experimental design under significant uncertainty. Directly applying existing data-driven SP and ML methods falls short of achieving these goals, and comprehensive theoretical and methodological developments tailored to address these unique challenges are crucial. Some salient issues that need to be addressed include the following:
(1) Knowledge-based prior construction: mapping scientific knowledge into a prior distribution that reflects model uncertainty to alleviate the issues stemming from data scarcity
(2) Model fusion: updating the prior distribution to a posterior distribution with multiple uncertain models and data sources of different data quality
(3) Uncertainty quantification (UQ): quantification of the cost of uncertainty relative to one or more objectives for efficient materials discovery
(4) Optimization under uncertainty (OUU): derivation of an optimal operator from the posterior distribution
(5) Optimal experimental design (OED): efficient experimental design and data acquisition schemes to improve the model to explore the materials design space more effectively
(6) Knowledge discovery: closing the knowledge gap in the current model, such as composition-process-structure-property (CPSP) relationships relevant to the materials discovery objectives, based on newly acquired data or increased model knowledge
In this article, we review the recent advances related to the aforementioned research issues. In particular, we provide an in-depth review of Bayesian SP and ML approaches for knowledge-driven learning, objective-based UQ, and efficient experimental design for materials discovery under substantial model and data uncertainties. The core foundation underlying these strategies is a Bayesian framework that enables mathematical representation of the model and data uncertainties, encoding available domain knowledge into a Bayesian prior, seamlessly integrating experimental (or simulation) data with the domain knowledge to obtain a posterior, quantifying the impact of the uncertainty on the objective, and effectively designing strategies that can reduce this uncertainty. It is important to note that the guiding principle of this Bayesian framework is to have a (knowledge-based) prior represent an uncertainty class of models. The prior characterizes the state of our knowledge about the model representing the system, based on which we can design operators to achieve the scientific objectives. Artificial intelligence for science (AI4Science) has emerged as an enormous modern research field. Because of the rapidly evolving nature of this field, it is challenging to provide a comprehensive review of all ongoing research efforts, and readers are strongly encouraged to refer to additional resources, including recent publications in AI4Science7,8,9,10 as well as those in AI/ML-augmented materials discovery.11,12,13
In the following sections, we first introduce the UQ framework that encompasses the various components in knowledge-driven learning, optimization, and experimental design. This will be followed by in-depth discussion of the individual research themes, where we will review the latest research results along these directions.
Bayesian learning, UQ, and experimental design
Engineering generally aims at optimization to achieve operational objectives when studying complex systems. Because all but very simple systems must account for randomness, modern engineering may be defined as the study of optimal operators on random processes. Besides the mathematical and computational challenges that arise with classical system identification (learning) and operator optimization (control or filtering, for example) problems, such as nonstationary processes, high dimensions, and nonlinear operators, another profound issue is model uncertainty. For instance, with linear filtering there may be incomplete knowledge regarding the covariance functions or power spectra in the case of Wiener filtering. In such cases, not only must optimization of the operator (i.e., filter in this example) be relative to the original cost function but also relative to an uncertainty class of random processes. This naturally leads to the need for postulation of a new cost function that integrates the original cost function with the model uncertainty. If there is a prior (or posterior) distribution governing the likelihood of a model within the uncertainty class, then one can choose an operator that minimizes the expected cost over all possible models in the uncertainty class. In what follows, we first lay out the mathematical foundations pertinent to quantifying and handling model uncertainty and then review relevant existing literature, with recent efforts focusing on materials science research.
Mathematical backgrounds
The design of optimal operators can take different forms depending on the random process constituting the scientific model and the operator class of interest. The operators might be filters, classifiers, or controllers. The underlying random process might be a random signal/image for filtering, a feature-label distribution for classification, or a Markov process for control. Optimal operator design involves a mathematical model representing the underlying (materials) system and a class of operators from which the best operator that minimizes the cost function reflecting the objective should be selected. It takes the general form
(Equation 1) $\psi^{*} = \arg\min_{\psi \in \Psi} C(\psi),$
where $\Psi$ is the operator class and $C(\psi)$ is the cost of applying operator ψ on the system. The genesis of such an operator design formulation can be traced back to the Wiener-Kolmogorov theory in SP for optimal linear filters developed in the 1930s,14,15 where the operational objective is to recover the underlying signals given noisy observations with the minimum mean squared error (MSE). In this class of filtering problems, the operators mentioned above are filters. The underlying system can be modeled by a joint random process $\{(X(t), Y(t))\}$, where $X$ denotes the signal process and $Y$ the observation process. Optimal filtering involves estimating the signal $X(s)$ at time s via a filter ψ given observations $\{Y(t)\}$. A filter is a mapping on the space of possible observed signals, and a cost function takes the form $C(\psi) = E\big[(X(s) - \hat{X}(s))^2\big]$, with $\hat{X}(s) = \psi(Y)(s)$. For fixed s, an optimal filter is defined by Equation 1 with this MSE cost. Similar operator design formulations have been adopted in control16 and, more recently, in ML,17 where the corresponding operators are controllers that can desirably alter the system behavior or predictive models for system properties of interest (e.g., classifiers). For example, the operator may be a predictor that tries to characterize the property of a given material based on input features (such as its composition and structure).
When the true model is not known with certainty, it is prudent to consider the entire uncertainty class $\Theta$ of possible models that contains the true model, where θ typically denotes a parameter vector specifying a model, rather than aiming at accurate inference of the true model alone. Given $\Theta$, the goal would then be to design a robust operator that guarantees good performance over all possible models. For example, there have been significant research efforts taking a minimax strategy to design robust operators:
(Equation 2) $\psi_{\mathrm{minimax}} = \arg\min_{\psi \in \Psi}\, \max_{\theta \in \Theta}\, C_\theta(\psi),$
where $C_\theta(\psi)$ characterizes the cost of the operator ψ for model θ. Taking filtering as an example, $C_\theta(\psi) = E_\theta\big[(X(s) - \psi(Y)(s))^2\big]$, where θ denotes the model parameters for the signal and observation random processes. Such a minimax robust strategy is risk averse because it aims to find an operator whose worst performance over an uncertainty class of models is the best among all operators in $\Psi$.18,19 Minimax robustness has been applied in many optimization frameworks; for example, for filtering20,21,22,23 with a general formulation in the context of game theory,24 as well as recently in ML.25,26 One critical downside of minimax robustness is that, in avoiding the worst-case scenario, the average performance of the designed operator can be poor, in particular when prior knowledge about the uncertainty class is available and the worst-case model is unlikely. There has been extensive research on alleviating this potential issue by developing risk measures, such as conditional value at risk in a recently proposed risk quadrangle scheme,18 to achieve a better trade-off between the attainment of the operational objective and the aversion of potential risk because of uncertainty.
Unlike such minimax robust strategies, we focus on Bayesian robust strategies that try to optimize the expected performance in the presence of uncertainty. This leads to the design of the intrinsically Bayesian robust (IBR) operator, which is defined as
(Equation 3) $\psi_{\mathrm{IBR}} = \arg\min_{\psi \in \Psi}\, E_\theta\big[C_\theta(\psi)\big],$
where the expectation is with respect to a prior probability distribution $\pi(\theta)$ over the uncertainty class $\Theta$ of models. While not as risk averse as minimax robust operators, these Bayesian robust operators guarantee optimal performance on average. The prior probabilistically characterizes our prior knowledge as to which models are more likely to be the true model than the others. If there is no prior knowledge beyond the uncertainty class itself, then a uniform (non-informative) prior may be used.
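For concreteness, the following minimal Python sketch (with purely illustrative numbers, not taken from the reviewed works) contrasts the minimax robust operator of Equation 2 with the IBR operator of Equation 3 for a small discrete uncertainty class and operator class, with the costs stored in a matrix.

```python
import numpy as np

# Minimal sketch: a discrete uncertainty class of models and a discrete operator
# class, with the cost of applying operator psi to model theta in C[theta, psi].
rng = np.random.default_rng(0)
n_models, n_operators = 4, 6
C = rng.uniform(0.0, 1.0, size=(n_models, n_operators))  # C_theta(psi)
prior = np.array([0.1, 0.2, 0.3, 0.4])                   # pi(theta), sums to 1

# Minimax robust operator (Equation 2): best worst-case cost over Theta.
psi_minimax = int(np.argmin(C.max(axis=0)))

# Intrinsically Bayesian robust (IBR) operator (Equation 3):
# minimize the prior-expected cost E_theta[C_theta(psi)].
expected_cost = prior @ C                                 # one value per operator
psi_ibr = int(np.argmin(expected_cost))

print("minimax operator:", psi_minimax, "worst-case cost:", C[:, psi_minimax].max())
print("IBR operator:", psi_ibr, "expected cost:", expected_cost[psi_ibr])
```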
Related works
Before we delve into the Bayesian framework for learning, UQ, and experimental design, here we provide a literature review of related topics. We first review the history of operator design, in particular related to filtering, classification, and control. For optimal operator design in filtering, Kalman-Bucy recursive filtering was proposed in the 1960s27 after the Wiener filter.14,15 Optimal control began in the 1950s, as did classification as now understood. In all three areas, it was quickly recognized that often the underlying scientific model would not be known—hence the development of adaptive linear/Kalman filters and adaptive controllers.28,29 Classification became dependent on classification rules that make no effort to estimate the true feature-label distribution.17 From the perspective of model uncertainty classes, control theorists delved into Bayesian robust control for Markov decision processes in the work of Bellman and Kalaba,30 Silver,31 and Martin32 in the 1960s, but computation was prohibitive, and adaptive methods prevailed. Optimal linear filtering was approached via minimax in the late 1970s in the work of Kuznetsov,20 Kassam and Lim,21 Poor,22 and Verdu and Poor.24 Model-constrained Bayesian robust (MCBR) MSE linear filtering and classification appeared in the early 2000s.33,34
When considering uncertainty in optimization, there has been extensive research in designing different risk metrics for UQ. For example, different values at risk18 and quantities of interest (QoIs)19 have been proposed based on different statistics when modeling random processes or the corresponding model parameters as random variables, including the ones based on prediction variance35 and predictive entropy.36,37 With these risk metrics, different robust operator design strategies have been studied to derive risk-averse operators that can achieve good performance.19,25,26,35,37,38,39 While introducing additional risk metrics enables balancing the trade-off between the operational objectives and the potential risk (or regret) because of uncertainty, incorporating different metrics with different strategies can be subjective. For example, there may be large predictive variance or entropy, but it may not always directly affect the operational objectives and, therefore, the consequent decision-making.
Bayesian learning and experimental design offer one solution for robust design under uncertainty.19,40,41,42,43,44 In this framework, UQ can be naturally measured by the loss of performance because of the utilization of a robust operator to cope with uncertainty. This leads to an experimental design strategy where experiments are selected to optimally reduce this performance loss, following the early thinking of Bayesian robust filtering and control.15,30,31,32 Such an experimental design framework, rooted in the foundation of modern engineering, closes the loop from scientific knowledge of a complex system, models for the complex system under uncertainty, data generated by the system, and experiments to enhance the current system knowledge to better attain the objectives. In this article, we focus on this closed-loop framework, which distinguishes itself from (1) other existing schemes that are purely data driven45,46,47,48 or (2) experimental design frameworks based on high-throughput simulations, such as 4U49 and DAKOTA.50
Data-driven frameworks heavily depend on the availability of data, upon which “black box” surrogate models are trained. They typically model the operators (used to achieve the objectives) of interest rather than modeling the system itself when designing experiments. For example, in materials discovery, many existing methods rely on Bayesian optimization (BO), which uses Gaussian processes (GPs) as surrogate models to directly approximate the target materials properties as “black box” functions.51,52,53 While BO may be useful for optimizing the properties, the acquired data do not improve our knowledge regarding the materials system. As a consequence, there is often a scientific gap in making prior assumptions on these “black box” models and their uncertainty.54 To better integrate scientific knowledge, such as materials’ process-structure-property relationships, as detailed under “Knowledge-driven prior construction,” the model uncertainty should be directly imposed on the system model that incorporates inter-relationships among the underlying random processes. For simulation-based frameworks, including 4U and DAKOTA, UQ, sensitivity analysis, and experimental design are mostly based on forward model simulations, which do not provide a natural way to propagate the data generated by the selected experiments back to the system to fill the gap in our system knowledge and to improve the current model, which is precisely what our proposed paradigm aims to do. The emphasis here is that (1) the uncertainty is placed directly on the underlying random process (i.e., current knowledge regarding the materials system) and not on surrogate models that reflect operational performance on this uncertain process and that (2) the experimental design is centered around attaining specific objectives. A wide range of approaches can emerge, depending on the assumptions made regarding the uncertainty class, action space, and experiment space. Popular Bayesian experimental design policies, such as knowledge gradient (KG)46,47 and efficient global optimization (EGO),45 are special cases in this framework under their modeling assumptions. These approaches often adopt generic surrogate models with the uncertainty placed on the reward function; therefore, there is no direct connection between the prior model assumptions and the underlying process/system.
Because of these characteristics, Bayesian frameworks have been increasingly used to address a wide range of materials discovery problems.55,56,57,58,59 BO’s ability to balance exploration and exploitation is ideally suited to materials discovery tasks because queries to the materials design space (either through computations or experiments) are extremely resource intensive. Most approaches focused on materials discovery are myopic in the sense that increased knowledge of the materials space being explored is not necessarily part of the objective. In other cases, Bayesian learning is used to increase knowledge of the physics underlying observed physical phenomena without much attention being paid to improving the materials’ performance relative to the existing state of the art.60,61,62,63,64 In materials discovery applications, the complexity and stochasticity because of substantial model and data uncertainty call for SP and ML approaches in a Bayesian setting that can provide a unified closed-loop framework for objective-based learning and optimal design of robust operators and effective experiments under uncertainty. This is illustrated in Figure 1.
Figure 1.
Illustration of the knowledge-driven optimal experimental design (OED) cycle for materials discovery
IBR operator and mean objective cost of uncertainty (MOCU)-based UQ
In this section, we focus on the objective-based UQ (objective-UQ) framework using the MOCU,65,66 which measures the expected loss with respect to the final operational objective because of the model uncertainty. Uncertainty is directly imposed on the model representing the underlying system and not on the parameters of the operator, as typically done in the ML community. Because the uncertainty is on the system model, reduction of this uncertainty inevitably leads to improving our knowledge regarding the system, leaving no discrepancy between what is learned (through data acquisition or experiments) about the model and what we know about the underlying system (and the relevant science).
Consider a stochastic model with uncertainty class $\Theta$ composed of possible parameter vectors θ. Let C be a cost function and $\Psi$ a class of operators on the model. For each operator $\psi \in \Psi$, $C_\theta(\psi)$ denotes the cost of applying ψ on the model parametrized by θ. An IBR operator on $\Theta$ is an operator $\psi_{\mathrm{IBR}} \in \Psi$ such that the expected value over $\Theta$ of the cost is minimized by $\psi_{\mathrm{IBR}}$, as formulated in Equation 3,67 the expected value being with respect to a prior probability distribution $\pi(\theta)$ capturing model uncertainty over $\Theta$. Here, each parameter vector θ corresponds to a model, and $\pi(\theta)$ quantifies the likelihood that a model is the true model and therefore reflects prior knowledge. If there is no prior knowledge beyond the uncertainty class itself, then it is taken to be uniform with all models being equally likely. Given a data sample S sampled independently from the full model, the IBR theory can be used with a posterior distribution $\pi(\theta \mid S)$, giving the optimal Bayesian operator. Because of the optimality of the IBR operator over $\Theta$, $E_\theta[C_\theta(\psi_{\mathrm{IBR}})] \le E_\theta[C_\theta(\psi)]$ for any operator ψ. For $\theta \in \Theta$, the objective cost of uncertainty relative to θ is the difference between $C_\theta(\psi_{\mathrm{IBR}})$ and $C_\theta(\psi_\theta)$. Averaging this loss differential provides our basic UQ, the MOCU:65
(Equation 4) $M(\Theta) = E_\theta\big[C_\theta(\psi_{\mathrm{IBR}}) - C_\theta(\psi_\theta)\big],$
where $\psi_\theta$ denotes the optimal operator with respect to the model specified by the model parameter θ. The expectation is computed with respect to the distribution $\pi(\theta)$ of the model θ in the uncertainty class $\Theta$.
While the entropy of the prior (or posterior) has been commonly used to measure model uncertainty, entropy does not focus on the objective. In other words, there may be large entropy, but it may not directly affect the operational objective because it may not affect the expected cost in Equation 3. Unlike entropy, MOCU aims to quantify the uncertainty that practically “matters” as it pertains to a specific objective (Figure 1).
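Continuing the toy setting above, a minimal sketch of Equation 4: the MOCU is the prior-averaged gap between the cost of the IBR operator and the model-specific optimal cost (all numbers illustrative).

```python
import numpy as np

# Minimal sketch of Equation 4 on a toy discrete uncertainty class:
# MOCU = E_theta[ C_theta(psi_IBR) - C_theta(psi_theta) ].
rng = np.random.default_rng(1)
C = rng.uniform(0.0, 1.0, size=(4, 6))   # cost matrix C_theta(psi)
prior = np.full(4, 0.25)                 # uniform prior pi(theta)

psi_ibr = int(np.argmin(prior @ C))      # IBR operator (Equation 3)
cost_ibr = C[:, psi_ibr]                 # C_theta(psi_IBR) for each theta
cost_opt = C.min(axis=1)                 # C_theta(psi_theta): model-specific optimum

mocu = prior @ (cost_ibr - cost_opt)     # objective cost of uncertainty, averaged
print("MOCU:", mocu)                     # zero only if one operator is optimal for every model
```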
IBR (with a prior) and optimal Bayesian (with an updated posterior given observed data) operator design have been applied in modern engineering, statistics, and ML. Based on different operators (for example, for filtering, classification, and control) and their corresponding cost functions, the research focus has been mostly on solving the corresponding inference and optimization problems, known as Bayesian learning or Bayesian inverse problems.68,69,70 When systems understanding and operator design are the objectives of modeling complex systems, Bayesian experimental design and decision-making are often carried out with respect to the uncertainty class of models and the cost function related to the operator of interest. More importantly, MOCU provides a natural measure for the cost of uncertainty that quantifies the potential operator performance degradation because of uncertainty, directly focusing on operational objectives. Therefore, this IBR-MOCU framework not only provides robust operator design and objective-oriented UQ but also leads to experimental design that chooses experiments to optimally reduce performance loss by adding to existing scientific knowledge. The IBR-MOCU paradigm follows the early thinking of Wiener and Kolmogorov, and it extends and unifies previous work on robust filtering, classification, and control. The historical context of the IBR-MOCU framework is depicted in Figure 2.
Figure 2.
Illustration of the historical context of the intrinsically Bayesian robust (IBR) framework and the concept of mean objective cost of uncertainty (MOCU)
In the following sections, we focus on recent developments on the corresponding components of this IBR-MOCU framework, including prior construction, model fusion, OED, and automated feature engineering for knowledge discovery in the context of materials science applications.
Knowledge-driven prior construction
The first challenge of applying SP/ML methods in the MOCU framework to materials science is modeling and quantifying uncertainty: there rarely exist sufficient data for satisfactory system identification because of the enormous search space and the complicated CPSP relationships.4 Small samples are commonplace in materials applications, in particular when the research focus is to discover novel complex functional materials. Hence, when prior knowledge, such as physics principles, can help constrain the SP/ML model space, it is critical to utilize it in systems modeling.71,72,73 While Bayesian methods naturally model uncertainty because of their distribution-based nature of treating model parameters as random variables, the salient obstacle confronting Bayesian methods is how to appropriately impose the model prior.
Regarding prior construction, Jaynes74 has remarked, “… there must exist a general formal theory of determination of priors by logical analysis of prior information—and that to develop it is today the top priority research problem of Bayesian theory.” However, the most common practice in Bayesian methods is to adopt either a non-informative or a conjugate prior for computational convenience. When there are limited data or strong scientific prior knowledge, it is precisely then that the formal structure called for by Jaynes74 is critical for appropriate prior construction.
In this section, we first briefly review traditional prior construction methods and then focus on the formal structure for prior construction involving a constrained optimization, in which the constraints incorporate existing scientific knowledge augmented by slackness variables. The constraints tighten the prior distribution in accordance with prior knowledge while at the same time avoiding inadvertent over-restriction of the prior, an important consideration with small samples.
Traditional priors
Starting from Jeffreys’75 non-informative prior, there has been a series of information-theoretic and statistical methods: maximal data information priors (MDIP),76 non-informative priors for integers,77 entropic priors,78 reference (non-informative) priors obtained through maximization of the missing information,79 and least informative priors.80 As discussed in the literature,81,82,83 the principle of maximum entropy can be seen as a method of constructing least informative priors,84,85 though it was first introduced in statistical mechanics for assigning probabilities. Except for the Jeffreys75 prior, almost all of these methods are based on optimization: maximizing or minimizing an objective function, usually an information-theoretic one. The least informative prior80 is found among a restricted set of distributions, where the feasible region is a set of convex combinations of certain types of distributions. Zellner86 proposed several non-informative and informative priors for different problems. All of these methods emphasize the separation of prior knowledge and observed sample data.
A priori knowledge in the form of graphical models (e.g., Markov random fields) has also been widely utilized to either constrain the model space (for example, in covariance matrix estimation in Gaussian graphical models)87,88 or impose regularization terms.89 In these studies, using a given graphical model illustrating the interactions between variables, different problems have been addressed; e.g., constraints on the matrix structure87,90,91 or known independencies between variables.88,92 Nonetheless, these studies rely on a fundamental assumption: the given prior knowledge is complete and hence provides one single solution. However, in many applications, the given prior knowledge is uncertain, incomplete, and may contain errors. Therefore, instead of interpreting the prior knowledge as a single solution (e.g., a single deterministic covariance matrix), we aim to construct a prior distribution on an uncertainty class.
CPSP relationships in materials science
A central tenet in the field of materials science and engineering is that the processing history controls the material’s internal structure, which, in turn, controls the effective (macroscale) properties or performance characteristics exhibited by the material. Exploration and exploitation of the materials space thus necessitate the generation of CPSP linkages.93,94 Given the multiscale nature of materials’ structures,93 such (abstract) sets of CPSP linkages can be visualized as a large connected and nested network of models that mediate the flow of information about the material’s state and behavior up and down the scales.
Any single model in this large network of models can be formally expressed as $f(\mu, \varphi)$, where μ represents the appropriate CPSP variables (i.e., related to process history, material structure, or material property), and φ denotes variables describing the physics controlling the material phenomenon of interest. Established domain knowledge can be used to construct a prior on φ. Seeking $f(\mu, \varphi)$ allows us to explicitly capture physics in formulating our ML/AI models. This allows us to use physics-based simulation data to train $f$ by independently varying μ and φ. Given the enormous challenges associated with the development of concurrent multiscale CPSP relationships, materials analysis tends to be carried out (most of the time) at different, not necessarily strongly coupled scales. At the mesoscale level and beyond (i.e., larger than the atomic scale), several efforts have been made to predict materials’ behavior by using data-driven approaches. Most successful efforts at this scale have exploited low-dimensional representations of microstructure information to build effective property models.95 To date, however, there is not much work on the direct use of physical principles to constrain the models used to establish these CPSP linkages. In this regard, more success has been achieved when considering the structure-property connections at the atomic scale.
From the atomic point of view, materials are fundamentally composed of atoms of similar or different types of chemical elements located on real-space sites. The equilibrium atomic structures of materials are reached through the minimization of the total energy originating from the complex interactions among ions and electrons in the presence or absence of an external field. This total energy consists of the Coulomb and kinetic energies of electrons and ions as well as additional important contributions from quantum mechanical effects, such as (1) exchange energy because of the fermionic spin statistics of electrons, (2) static and dynamical correlation energy beyond single-Slater-determinant approximations of the electronic wave functions, and (3) nuclear quantum effects when tunneling and delocalization of ions become important.96 Recently, graph convolutional neural networks have been applied to describe crystal and molecular structures of materials because atoms and bonds can be naturally represented by graph nodes and edges, respectively. Recent examples include the crystal graph convolutional neural networks (CGCNN),97 the improved CGCNN (iCGCNN),98 the materials graph network (MEGNet),99 etc. An underlying physical prior hypothesis is the locality of interactions; that is, the physical knowledge of interest can be learned from the local chemical interactions. For example, in the CGCNN,97 the feature vector for atom i is updated via iterative convolution as
(Equation 5) $\mathbf{v}_i^{(t+1)} = \mathbf{v}_i^{(t)} + \sum_{j,k} \sigma\!\left(\mathbf{z}_{(i,j)_k}^{(t)} \mathbf{W}_f^{(t)} + \mathbf{b}_f^{(t)}\right) \odot g\!\left(\mathbf{z}_{(i,j)_k}^{(t)} \mathbf{W}_s^{(t)} + \mathbf{b}_s^{(t)}\right),$
where $\mathbf{z}_{(i,j)_k}^{(t)} = \mathbf{v}_i^{(t)} \oplus \mathbf{v}_j^{(t)} \oplus \mathbf{u}_{(i,j)_k}$ is the concatenated neighbor vector consisting of atom i’s feature vector $\mathbf{v}_i^{(t)}$, the feature vector $\mathbf{v}_j^{(t)}$ of atom j located on the k-th bond of atom i, and the corresponding bond feature $\mathbf{u}_{(i,j)_k}$. σ is a sigmoid function, and g is a nonlinear activation function. $\mathbf{W}_f^{(t)}, \mathbf{W}_s^{(t)}$ and $\mathbf{b}_f^{(t)}, \mathbf{b}_s^{(t)}$ denote the convolution weight matrices and biases of the corresponding layer, respectively. In these convolutional filters, the summation only runs through the local neighboring sites via local coordination determination97 or Voronoi tessellation.98 The results from these graph convolutional neural network approaches are promising because it is generally true that the physical interaction decreases as the distance of an atom pair (i.e., the bond length) increases. This a priori physical knowledge is built inside these graph networks as an implicit constraint. While the bare Coulomb operator decays slowly with distance, the destructive interference of electronic wave functions in many-particle systems leads to the nearsightedness of electronic matter in the absence of long-range ionic interactions;100,101 i.e., local electronic properties, such as electron density, depend mostly on the effective external potential at nearby locations. However, for ionic systems, the long-range Coulomb interaction can have a non-negligible contribution to the total energy and atomic forces even when the atom pair is separated far apart, and further work to include these long-range interactions will be of great importance for more accurately describing the physical properties of ionic materials. In addition to these interaction-based physics principles, another important consideration when developing ML methods for materials systems is to make sure that the input feature and the derived descriptor representations are invariant to the symmetries of the system, such as rotation, reflection, translation, and permutation of atoms of the same species. Kernel-based methods and topological invariants based on group theory have recently been investigated to help improve the accuracy of predictions in the ML modeling of solid-state materials.102
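The following sketch implements one such gated graph-convolution update in the spirit of Equation 5 using plain NumPy loops; the tiny graph, the feature dimensions, and the choice of softplus for g are illustrative assumptions rather than the exact published architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softplus(x):          # assumed choice for the nonlinearity g
    return np.log1p(np.exp(x))

def cgcnn_conv(v, u, neighbors, W_f, b_f, W_s, b_s):
    """One gated graph-convolution step in the spirit of Equation 5.

    v: (n_atoms, d_v) atom feature vectors
    u: dict mapping (i, j, k) -> bond feature vector of length d_u
    neighbors: dict mapping atom i -> list of (j, k) neighbor/bond indices
    """
    v_new = v.copy()
    for i, nbrs in neighbors.items():
        for (j, k) in nbrs:
            z = np.concatenate([v[i], v[j], u[(i, j, k)]])   # concatenated neighbor vector
            gate = sigmoid(z @ W_f + b_f)                    # sigma(z W_f + b_f)
            core = softplus(z @ W_s + b_s)                   # g(z W_s + b_s)
            v_new[i] += gate * core                          # sum over local neighbors only
    return v_new

# Tiny example: 3 atoms in a chain, 2-dimensional atom and 1-dimensional bond features.
rng = np.random.default_rng(0)
d_v, d_u = 2, 1
v = rng.normal(size=(3, d_v))
u = {(0, 1, 0): rng.normal(size=d_u), (1, 0, 0): rng.normal(size=d_u),
     (1, 2, 0): rng.normal(size=d_u), (2, 1, 0): rng.normal(size=d_u)}
neighbors = {0: [(1, 0)], 1: [(0, 0), (2, 0)], 2: [(1, 0)]}
W_f = rng.normal(size=(2 * d_v + d_u, d_v)); b_f = np.zeros(d_v)
W_s = rng.normal(size=(2 * d_v + d_u, d_v)); b_s = np.zeros(d_v)
print(cgcnn_conv(v, u, neighbors, W_f, b_f, W_s, b_s))
```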
Maximal knowledge-driven prior (MKDIP) construction
Knowledge-driven prior construction utilizes first principles and expert domain knowledge to alleviate the model/data uncertainty and the small sample size issues through constraining the model space or deriving the uncertainty class of models based on physical and chemical constraints. Incorporating scientific knowledge to directly constrain Bayesian predictive models can achieve robust predictions, which would be impossible by using data alone. In materials science, there is a substantial body of knowledge in the form of phenomenological models and physical theories for prior construction. Such knowledge can be used in choosing features or descriptors and constrain the model space for predicting novel materials with desired properties.
To translate more general materials knowledge into Bayesian learning, a general prior construction framework can be developed to map the known physical, chemical, and structural constraints into prior distributions in Bayesian learning. We have proposed such a framework, capable of transforming any source of prior information into prior probabilities given an uncertainty class of predictive models.103,104,105,106 We call the final prior probability constructed via this framework an MKDIP. The MKDIP construction consists of two steps: (1) functional information quantification, where prior knowledge manifested as functional relationships is quantified as constraints to regularize the prior probabilities in an information-theoretic way, and (2) objective-based prior selection, where, by combining sample data and prior knowledge, we build an objective function in which the expected mean log likelihood is regularized by the quantified information in step (1). As a special case, when we do not have any sample data, or when there is only one data point available for constructing the prior probability, the proposed framework reduces to a regularized extension of the maximum entropy principle (MaxEnt).107
By introducing general constraints, which can appear as conditional statements based on expert domain knowledge or physics principles, the idea here is to maximally constrain the model uncertainty with respect to the prior knowledge characterized by these constraints. To give a simple example, assume that we know a priori, based on physics principles, that certain microstructural properties R for a target material are determined by its composition X. We can then derive the corresponding constraint on the conditional Shannon entropy $H_\theta(R \mid X)$ of R given X under the probabilistic model determined by θ: if our prior knowledge is correct, then $H_\theta(R \mid X) = 0$ for any appropriate model. Hence, under the uncertainty characterized by the prior distribution $\pi(\theta)$, we aim to derive the MKDIP with the expected conditional entropy $E_{\pi(\theta)}[H_\theta(R \mid X)]$ as small as possible. Depending on different types of prior knowledge, we can write different forms of such constraints. Specifically, the MKDIP construction integrates materials science and statistical learning by (1) model prior knowledge quantification, where general materials knowledge, from physical theories or expert domain knowledge, is quantified via quantitative constraints or conditional probabilities and (2) optimization, where MKDIP construction requires solving the constrained optimization problems depending on different applications and data types of available observed measurements. When sufficient data exist, we can also split the data for prior construction and for updating the posterior, appropriately integrating prior knowledge and existing data.
In particular, MKDIP aims to derive the solution to the following optimization problem:
(Equation 6) $\pi^{*}(\theta) = \arg\min_{\pi(\theta) \in \Pi}\; E_{\pi(\theta)}\big[C_\theta(\xi, D)\big],$
where $\Pi$ is the set of all proper priors, and $C_\theta(\xi, D)$ is a cost function that depends on (1) θ, the random vector parameterizing the underlying probability distribution; (2) ξ, our state of (prior) knowledge; and (3) D, partial observations. Alternatively, by parameterizing the prior probability as $\pi(\theta; \gamma)$, with γ denoting the hyperparameters, the MKDIP can be found by solving
(Equation 7) $\gamma^{*} = \arg\min_{\gamma}\; E_{\pi(\theta; \gamma)}\big[C_\theta(\xi, D)\big].$
We have considered cost functions that can be decomposed into three terms,106 weighted by non-negative regularization parameters. The first term denotes the information-theoretic cost, which can take different forms, including MaxEnt;107 the second is the cost that involves the partially observed data when they are available, including the regularized MDIP and the regularized expected mean log-likelihood prior;103 and, more critically, the third denotes the knowledge-driven constraints that convert prior knowledge into functional constraints to further regularize the prior, as detailed in Boluki et al.106 Using this cost function, we formulate the MKDIP construction problem as the following optimization problem:
(Equation 8) $\gamma^{*} = \arg\min_{\gamma}\; E_{\pi(\theta; \gamma)}\big[C_\theta(\xi, D)\big] \quad \text{subject to} \quad E_{\pi(\theta; \gamma)}\big[g_i(\theta)\big] \le \epsilon_i, \;\; i = 1, \ldots, n_c,$
where $g_i(\theta)$, $i = 1, \ldots, n_c$, are constraints resulting from our state of knowledge ξ via the mapping from ξ to the constraint set; for example, based on the aforementioned composition-structure relationship, one such constraint is $E_{\pi(\theta; \gamma)}[H_\theta(R \mid X)] \le \epsilon$. The overall MKDIP scheme is illustrated in Figure 3.
Figure 3.
Illustration of knowledge-based prior construction via MKDIP
In contrast to non-informative priors, MKDIP aims to incorporate the available prior knowledge and uses part of the data to construct an informative prior. While, in theory, the observed data can be entirely used in the optimization problem in Equation 8, in practice one should be cautious to avoid overfitting to the given data. The MKDIP construction here introduces a formal procedure for incorporating prior knowledge. It allows the incorporation of the knowledge of functional relationships and any constraints on the conditional probabilities. Finally, we shall note that deriving the solution to the MKDIP optimization problem Equation 8 can be challenging because of the non-convexity of the objective function and constraints. Nevertheless, feasible and local optimal solutions, especially with the specific distribution families and constraint forms, can be derived.103
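The following is a deliberately simplified sketch of the MKDIP idea in Equation 8, under assumptions that are ours rather than the cited papers’: θ = (p0, p1) collects the probabilities P(R = 1 | X = x) for a binary composition X and a binary structural property R, the prior is a product of Beta densities with hyperparameters γ, the information-theoretic cost is the negative prior entropy (a MaxEnt-like choice), and the knowledge constraint on the expected conditional entropy is handled with a simple penalty.

```python
import numpy as np
from scipy.stats import beta
from scipy.optimize import minimize

def bernoulli_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_cond_entropy(gamma, n_mc=4000, seed=0):
    # Monte Carlo estimate of E_pi[H_theta(R | X)], with P(X=0) = P(X=1) = 1/2.
    a0, b0, a1, b1 = gamma
    rng = np.random.default_rng(seed)
    p0 = rng.beta(a0, b0, n_mc)
    p1 = rng.beta(a1, b1, n_mc)
    return 0.5 * (bernoulli_entropy(p0).mean() + bernoulli_entropy(p1).mean())

def objective(log_gamma, eps=0.1, penalty=50.0):
    gamma = np.exp(log_gamma)          # keep hyperparameters positive
    a0, b0, a1, b1 = gamma
    # Information-theoretic cost: negative prior (differential) entropy,
    # i.e., prefer the least informative prior allowed by the constraint.
    info_cost = -(beta.entropy(a0, b0) + beta.entropy(a1, b1))
    # Knowledge-driven constraint "X determines R", enforced via a penalty.
    violation = max(0.0, expected_cond_entropy(gamma) - eps)
    return info_cost + penalty * violation

res = minimize(objective, x0=np.zeros(4), method="Nelder-Mead")
gamma_star = np.exp(res.x)
print("hyperparameters gamma:", gamma_star)
print("expected conditional entropy:", expected_cond_entropy(gamma_star))
```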
Integrating prior knowledge in materials science
Xue et al.52 have applied Bayesian learning and experimental design based on materials knowledge using results from the Landau-Devonshire theory for piezoelectric materials. In particular, a Bayesian regression model,54 constrained by the Landau functional form and the constraints on morphotropic phase boundaries (MPBs), was developed to guide the design of novel materials with the functional response of interest and to help navigate the search space efficiently so that the desired composition can be achieved in a few trials. The Landau-Devonshire theory has been widely used to reproduce phase diagrams for many piezoelectrics and to investigate their performance at the MPB. The ferroelectric nanodomain phases can be characterized by different polarization vectors $\mathbf{P} = p\,\hat{\mathbf{n}}$, where $\hat{\mathbf{n}}$ is a unit vector in the direction of polarization, and p is its magnitude.108 The free energy, g, of the ferroelectric system (e.g., BaTiO3-based piezoelectrics) can be described by a Landau polynomial that depends on the modulus of the polarization vector (p) and the polarization direction ($\hat{\mathbf{n}}$) at a given temperature τ:
where the coefficients α, β′s, and γ′s are materials dependent and often determined from experiments; for example, α depends on the temperature (τ) and composition (x). The MPB is a phase boundary where the two phases (i.e., the tetragonal [T] and rhombohedral [R] phases in BaTiO3-based piezoelectrics) coexist and have degenerate free energy. Therefore, at the MPB, $g_T = g_R$, which leads to a condition relating the Landau coefficients along the phase boundary.
Here, $p_0$ denotes the polarization at equilibrium and has the functional form $p_0^2 = \rho\,[\tau_C(x) - \tau]$, where ρ is a constant, and $\tau_C(x)$ is the composition-dependent Curie temperature. Based on these relationships (more details can be found in Xue et al.52), the MPB curve has the following quadratic form:
$\tau_{\mathrm{MPB}}(x) = \theta_2 x^2 + \theta_1 x + \theta_0$, where $\theta_0$, $\theta_1$, and $\theta_2$ are the corresponding model parameters to learn from experimental data. This serves as the prior knowledge to constrain our Bayesian regression model to map the material composition x to the MPB curves.
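A minimal sketch of how such a Landau-derived quadratic form can constrain the regression: Bayesian linear regression in the basis (1, x, x²) with a Gaussian prior on the coefficients and Gaussian noise, fit to synthetic data (the numbers below are hypothetical and are not the measurements used by Xue et al.52).

```python
import numpy as np

# Bayesian linear regression for tau_MPB(x) = theta0 + theta1*x + theta2*x**2,
# with a Gaussian prior on theta, so the posterior is available in closed form.
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, 20)                    # e.g., 20 characterized compositions
theta_true = np.array([400.0, -250.0, 150.0])        # hypothetical (theta0, theta1, theta2)
noise_sd = 10.0
y_obs = (theta_true[0] + theta_true[1] * x_obs + theta_true[2] * x_obs**2
         + rng.normal(0.0, noise_sd, x_obs.size))    # noisy MPB temperatures

Phi = np.column_stack([np.ones_like(x_obs), x_obs, x_obs**2])   # quadratic design matrix
prior_cov = np.diag([1e4, 1e4, 1e4])                             # weak Gaussian prior on theta
post_cov = np.linalg.inv(np.linalg.inv(prior_cov) + Phi.T @ Phi / noise_sd**2)
post_mean = post_cov @ (Phi.T @ y_obs) / noise_sd**2

# Posterior predictive mean and 95% band for the MPB curve on a composition grid.
x_grid = np.linspace(0.0, 1.0, 5)
Phi_g = np.column_stack([np.ones_like(x_grid), x_grid, x_grid**2])
mean = Phi_g @ post_mean
sd = np.sqrt(np.einsum("ij,jk,ik->i", Phi_g, post_cov, Phi_g) + noise_sd**2)
for xg, m, s in zip(x_grid, mean, sd):
    print(f"x = {xg:.2f}: tau = {m:7.1f} +/- {1.96 * s:5.1f}")
```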
As illustrated in Figure 4, with the minimal collected data (only 20 characterized BaTiO3-based piezoelectrics), the Bayesian regression model with the aforementioned functional constraints provides reliable phase boundaries and faithful uncertainty estimates. More importantly, we demonstrated our approach for finding BaTiO3-based piezoelectrics with the desired target of a vertical MPB. We have predicted, synthesized, and characterized a solid solution, (Ba0.5Ca0.5)TiO3-Ba(Ti0.7Zr0.3)O3, with piezoelectric properties showing better temperature reliability than other BaTiO3-based piezoelectrics in our initial training data.
Figure 4.
Bayesian learning and experimental design constrained by the Landau functional for discovery of BaTiO3-based piezoelectrics as described in the text
Shown are predicted (solid lines) and experimental (dots) phase diagrams for BZT-m50-n30, together with uncertainty estimates, from Bayesian regression. The solid lines show the mean phase boundaries, and the dashed lines mark the 95% confidence intervals. Notice the uncertainty reduction given more data.
When the prior knowledge, including different functional forms and constraints, is available, the MKDIP framework can help take full advantage of it to explicitly determine the predictive models as well as their corresponding predictors for specific functional responses of interest. Besides such explicit functional-form prior knowledge, which allows us to directly constrain predictive models, the existing prior knowledge on CPSP relationships may simply be in the form of correlations, conditional relationships, and inequality constraints. To enable users, especially materials domain experts, to easily explore and integrate existing phenomenological knowledge into Bayesian learning, infrastructure and user-friendly interfaces should be developed to help prior construction via active knowledge acquisition from either materials scientists or even the more recent large language foundation models as an unprecedented knowledge base.9,13 The current practice is mostly hand crafted based on different problems and how data scientists work with their collaborating materials scientists. More interfacing efforts between data scientists and materials domain experts are required to achieve more synergistic collaboration in materials science.
Bayesian model averaging (BMA) with experimental design
With a derived surrogate model, we would like to exploit it in combination with experiments to accelerate the development of new materials. However, often, because of incomplete prior knowledge, there are multiple feasible surrogate models within the uncertainty class. We further explore a Bayesian experimental design framework that is capable of adaptively selecting or aggregating competing models connecting materials composition and processing features to performance metrics through BMA.109,110
Review on Bayesian model fusion
Bayesian model fusion methods have been studied extensively to achieve better predictive accuracy as well as robust risk and uncertainty estimates.70,109,111,112 There are different Bayesian model ensemble strategies stemming from Bayes’ theorem in Bayesian inference,70 including Bayesian model selection, Bayesian model combination, and BMA. They all start with an ensemble of candidate models as the uncertainty model class and then update the model posterior probabilities given observed data. The main difference among these strategies lies in how the updated posterior probabilities guide the way to derive posterior predictive probabilities. For example, Bayesian model selection aims to identify the best predicting model(s) with different criteria, including the Bayesian information criterion (BIC) and Akaike information criterion (AIC).113,114 Bayesian model combination often samples the best model subsets based on the updated model posterior, hoping to achieve better convergence.115 In this review, we focus on BMA, which essentially relies on the weighted ensemble of the models in the uncertainty class by the model posterior.109,112 The theoretical properties of BMA have been studied in the literature. For example, BMA can achieve better average prediction performance than any single model in the uncertainty class.109,116 The corresponding implementations addressing model uncertainty have also been investigated for more effective and efficient inference procedures.112
BMA with MOCU for OED
For Bayesian experimental design in general, there can be three categories of objective functions to guide the experimental design. In the first case, we have a parametric model where the parameters come from an underlying physical system. One such example is in biomedicine, where the objective function is the likelihood of the cell being in a cancerous state, given a state-space model based on genetic regulatory pathways.117 Another example is in imaging (for example, for image reconstruction or filtering), where the parameters characterize the image appearance, and the objective function is an error measure between two images.
In the second category, the features are given, and the parameters come from a surrogate model used in place of the actual physical model but are believed to be appropriately related to the physical model. For example, in the materials science applications under “OED with MOCU,” the surrogate model is based on the time-dependent Ginzburg-Landau (TDGL) theory and simulates the free energy given dopant parameters, the objective function is the energy dissipation, and the action is to find an optimal dopant and concentration.5 To see how the approach in Dehghannasiri et al.5 fits the above general theory, the reader can refer to Boluki et al.118
In the third category, we do not know the physical model, and we lack sufficient knowledge to posit a surrogate model with known features/forms relating to our objective. This case arises in many scenarios where the objective function is a “black box” function. Nevertheless, we can adopt a model, albeit one with known predictive properties. This model can be a kernel-based model, such as a GP.119 Moreover, this model can consist of a set of possible parametric families, a kernel-based model with different possible feature sets, or even kernel-based models with different choices for the kernel function. In such scenarios, we do not a priori have any knowledge about which feature set or model family would be the best, and reliable model selection cannot be performed before starting the experiment design loop because of the limited number of observed samples. Considering the average prediction from models based on different feature sets or model families weighted by their posterior probability of being the correct model, namely BMA, is one possible approach.
In the context of materials discovery, we can frame the model averaging problem in a hierarchy to define a family of uncertain model classes in which, for example, different features contribute differently to functional property prediction. With such a hierarchical Bayesian model, BMA, which essentially weights all the possible models by their corresponding probability of being the true model, is embedded in BO for OED to realize a system that is not only capable of autonomously and adaptively learning the surrogate predictive models for the most promising materials of desired properties but also utilizes the models to efficiently guide exploration of the design space. With more acquired data, the uncertainty of different models will be quantified, and improved predictive models as well as efficient experimental design can be attained.
Again, assume an uncertainty class $\Theta$ with the probability measure $\pi(\theta)$, characterizing predictive models on a design space $\mathcal{X}$. The experimental design goal is to optimize an objective function over $\mathcal{X}$. For example, we want to find a design $x \in \mathcal{X}$ that minimizes an unknown true objective function $f_{\theta^{*}}(x)$ over $\mathcal{X}$, where $\theta^{*}$ denotes the true model. When there is no strong prior knowledge on functional forms of the objective function, often GP regression (GPR) is adopted and iteratively updated given data $D_n$ from performed experiments: $f(x) \sim \mathcal{GP}\big(m(x; \theta_m), k(x, x'; \theta_k)\big)$, where $\theta_m$ and $\theta_k$ denote the corresponding mean and kernel parameters. To account for potential model uncertainty, BMA can be used for more robust modeling of the objective function:
(Equation 9) $f(x \mid D_n) = \sum_i P(M_i \mid D_n)\, f_i(x \mid D_n, M_i),$
where i is the index of the candidate models $M_i$ in the uncertainty class, $P(M_i \mid D_n)$ is the posterior probability of model $M_i$ given the observed data $D_n$, and $f_i$ is the corresponding model-specific prediction.
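A minimal sketch of Equation 9 with scikit-learn GPs (synthetic data; the feature subsets and the approximation of P(M_i | D_n) by normalized GP marginal likelihoods under equal model priors are illustrative assumptions).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Each candidate model M_i is a GP surrogate built on a different feature subset.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(25, 3))                 # 3 candidate features
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=25)

feature_sets = [[0], [0, 1], [2]]                    # candidate models
gps, log_evidence = [], []
for feats in feature_sets:
    gp = GaussianProcessRegressor(kernel=RBF(), alpha=1e-2, normalize_y=True)
    gp.fit(X[:, feats], y)
    gps.append(gp)
    log_evidence.append(gp.log_marginal_likelihood_value_)

log_evidence = np.array(log_evidence)
weights = np.exp(log_evidence - log_evidence.max())  # proportional to P(M_i | D_n) for equal priors
weights /= weights.sum()

x_new = np.array([[0.3, -0.2, 0.7]])
means = np.array([gp.predict(x_new[:, feats])[0] for gp, feats in zip(gps, feature_sets)])
bma_mean = weights @ means                           # Equation 9: weighted model average
print("model weights:", np.round(weights, 3), "BMA prediction:", bma_mean)
```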
As explained under “Bayesian learning, UQ, and experimental design,” a robust design is an element $x \in \mathcal{X}$ that minimizes the average of the objective function across all possibilities in the uncertainty class relative to a probability distribution governing the corresponding space. This probability at each experimental design iteration is the posterior distribution given the observed data points available up to that step. Mathematically,
(Equation 10) $x_{\mathrm{robust}} = \arg\min_{x \in \mathcal{X}}\; E_{\theta \mid D_n}\big[f_\theta(x)\big],$
where $D_n$ denotes the observed data up to the nth iteration. MOCU in this context can be defined as the average gap in the attained objective between the robust design and the actual optimal designs across the possibilities:
(Equation 11) $M(\Theta \mid D_n) = E_{\theta \mid D_n}\big[f_\theta(x_{\mathrm{robust}}) - f_\theta(x^{*}_\theta)\big],$
where $x^{*}_\theta$ denotes the optimal action for a given model parameterized by θ, including both the GPR parameters and additional parameters from BMA. Note that, if we actually knew the true (correct) model, then we would simply take the optimal design for that model, and MOCU would be 0. Denoting the set of possible experiments by $\Xi$, the best experiment at each time step (in a one-step look-ahead scenario) is the one that maximally reduces the expected MOCU following the experiment; i.e.,
(Equation 12) $\xi^{*} = \arg\min_{\xi \in \Xi}\; E_{y_\xi \mid D_n}\Big[M\big(\Theta \mid D_n \cup \{(\xi, y_\xi)\}\big)\Big],$
where $y_\xi$ denotes the yet-unobserved outcome of experiment ξ.
In most cases in materials discovery, each experiment consists of synthesizing the corresponding materials design and measuring its actual properties (or their noisy versions). Thus, the experiment space is equivalent to the design space.
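The sketch below (illustrative numbers only) walks through Equations 10, 11, and 12 for a finite design pool and a small discrete uncertainty class, where each experiment is a noisy measurement of the objective at a chosen design, so that the experiment space coincides with the design space.

```python
import numpy as np

# F[t, x] holds f_theta(x), the objective value of design x under model theta.
rng = np.random.default_rng(0)
F = rng.uniform(0.0, 1.0, size=(3, 8))        # 3 candidate models, 8 candidate designs
post = np.array([0.5, 0.3, 0.2])              # current posterior pi(theta | D_n)
noise_sd = 0.05                               # measurement noise of an experiment

def robust_design(dist):                      # Equation 10
    return int(np.argmin(dist @ F))

def mocu(dist):                               # Equation 11
    x_rob = robust_design(dist)
    return dist @ (F[:, x_rob] - F.min(axis=1))

def update(dist, x, y):                       # Bayes rule with a Gaussian likelihood
    lik = np.exp(-0.5 * ((y - F[:, x]) / noise_sd) ** 2)
    dist = dist * lik
    return dist / dist.sum()

def expected_remaining_mocu(dist, x, n_mc=200):
    # One-step look-ahead (Equation 12): average MOCU over simulated outcomes y_x.
    total = 0.0
    for _ in range(n_mc):
        theta = rng.choice(len(dist), p=dist)            # sample a model
        y = F[theta, x] + rng.normal(0.0, noise_sd)      # simulate the experiment outcome
        total += mocu(update(dist, x, y))
    return total / n_mc

scores = [expected_remaining_mocu(post, x) for x in range(F.shape[1])]
print("current MOCU:", mocu(post))
print("next design to synthesize and measure:", int(np.argmin(scores)))
```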
It is beneficial to recognize that MOCU can be viewed as the minimum expected value of a Bayesian loss function, where the Bayesian loss function maps an operator (the materials design in this context) to its differential objective value (for using the given operator instead of an optimal operator), and its minimum expectation is attained by an optimal robust operator that minimizes the average differential objective value. In decision theory, this differential objective value has been referred to as the regret. Under certain conditions, MOCU-based experimental design is, in fact, equivalent to KG and EGO.118
BMA for materials science applications
We have integrated BMA with the MOCU-based experimental design to deploy an autonomous computational materials discovery framework that is capable of performing optimal sequential computational experiments to find optimal materials and updating the knowledge on materials system model at the same time. One of our recent exercises120 consisted of implementing the BMA approach for robust selection of computational experiments to optimize properties of the MAX phase crystal system.121 Employing BMA approaches using a set of GPR functions based on different feature sets, we demonstrated that the framework was robust against selection of poor feature sets because the approach considers all the feature sets at once, updating their relative statistical weights according to their ability to predict (successful) outcomes of unrealized simulations. More critically, we have demonstrated the effectiveness of our computational materials discovery platform for single and multiobjective optimization problems.
This framework has been used efficiently for objective-oriented exploration of materials design spaces (MDSs) through computational models and, more importantly, to guide experiments by focusing on gathering data in sections of the MDS that will result in the most efficient path to achieving the optimal material within resource budgets. Additionally, the BO approach was successfully combined with BMA for autonomous and adaptive learning, which may be used to auto-select the best models in the MDS, thereby eliminating the requirement of knowing the best model a priori. Thus, this framework constitutes a paradigm shift in the approach to materials discovery by simultaneously (1) accounting for the need to adaptively build increasingly effective models for the accelerated discovery of materials while (2) accounting for the uncertainty in the models themselves. It enables a long-desired seamless connection between computation and experiments, each informing the other, while progressing optimally toward the target material.
In our implementation for MAX phase crystal systems, after training the GPs based on the current and previous observations, solving for the GP hyperparameters to maximize the marginal likelihood of the observed data, each GP provides a Gaussian distribution over the objective function value of each design. Averaging several GPs based on their posterior model probabilities amounts to mixing weighted Gaussian distributions over the objective value of each design. Based on this sum of weighted Gaussian distributions, the MOCU-based utility function or other acquisition functions, including expected improvement (EI) with a single objective45 or expected hypervolume improvement (EHVI) with multiple objectives,122 can be calculated for all possible designs, and the maximizer is chosen as the next experiment. In our experiments, six sets of basic compositional and structural features were chosen a priori without assuming any knowledge of their suitability for the underlying true model that generates data. We have investigated whether the updated model posterior in BMA captured the expected CPSP relationships.120 The goal of experimental design is to discover MAX phases with maximum bulk modulus and minimum shear modulus, which were computed through density functional theory (DFT) calculations123,124 for 1,500 randomly sampled MAX ternary carbide/nitride crystals. Among these DFT-calculated results, there were 10 MAX phases belonging to the Pareto front when considering the design goals. All of the reported performances of Bayesian experimental design were based on the average values of 1,500 runs starting from random initial sets of 10 training samples. In Figure 5A, we show the change of the average maximum bulk modulus with the iterations of sequential experimental design. It is clear that, among the six models with different feature sets, one particular feature set achieves the best experimental design performance because its average maximum bulk modulus is consistently higher than that of the other models. On the other hand, another feature set has the worst performance. When adopting BMA (either based on first-order or second-order maximum likelihood inference [BMA1 or BMA2, respectively]), it is clear that BMA achieves robust performance even when some models may not have good predictive power (Figure 5B). With the increasing number of iterations, it is also clear that the posterior probability of the best model gets higher (Figure 5C). Last but not least, as shown in Figure 5D, our BMA-based multiobjective experimental design can approach the Pareto front within a small number of sequential design iterations considering the vast MAX ternary carbide/nitride space. All of these experimental results on the maximization/minimization of mechanical properties of MAX phases suggest that BMA-based model fusion can lead to a considerable reduction in the number of experiments/computations that need to be carried out to identify the desired solutions to this specific materials design problem.
Figure 5.
Bayesian experimental design with BMA for MAX phases as described in the text
(A) The change of average maximum bulk modulus for the original six feature sets with the number of design iterations.
(B) The change of average maximum bulk modulus comparing BMA surrogates with the best and worst feature sets.
(C) The change of posterior model probabilities corresponding to six feature sets.
(D) The average number of sampled Pareto front points when considering bulk modulus and shear modulus.
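To make the model-averaged acquisition computation concrete, the following Python sketch (our own minimal illustration, not the implementation of Talapatra et al.120; the helper name expected_improvement_bma and the use of scikit-learn-style GP surrogates are assumptions) mixes the Gaussian predictive distributions of several GPs according to their posterior model probabilities and estimates EI for each candidate design by Monte Carlo sampling from the mixture.

import numpy as np

def expected_improvement_bma(gp_models, posterior_probs, X_cand, y_best, n_mc=2000, seed=0):
    """Model-averaged expected improvement for maximizing a property.

    gp_models       : list of fitted GP surrogates exposing predict(X, return_std=True),
                      e.g., sklearn.gaussian_process.GaussianProcessRegressor instances
    posterior_probs : posterior model probabilities (one per GP, summing to 1)
    X_cand          : candidate designs, shape (n_candidates, n_features)
    y_best          : best objective value observed so far
    """
    rng = np.random.default_rng(seed)
    preds = [gp.predict(X_cand, return_std=True) for gp in gp_models]
    means = np.stack([m for m, _ in preds])           # (n_models, n_candidates)
    stds = np.stack([s for _, s in preds])            # (n_models, n_candidates)
    # Each Monte Carlo draw first picks a model according to its posterior probability,
    # then samples from that model's Gaussian: this realizes the BMA mixture distribution.
    model_idx = rng.choice(len(gp_models), size=n_mc, p=posterior_probs)
    draws = rng.normal(means[model_idx, :], stds[model_idx, :])    # (n_mc, n_candidates)
    return np.maximum(draws - y_best, 0.0).mean(axis=0)            # Monte Carlo EI per candidate

# The next experiment is then the candidate maximizing the model-averaged acquisition, e.g.:
# x_next = X_cand[np.argmax(expected_improvement_bma(gps, probs, X_cand, y_best))]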
Along these directions, we can develop robust Bayesian learning methods through model fusion that exploits correlations among sources/models. Together with a multi-information-source optimization framework driven by scientific knowledge, such methods can reliably and efficiently identify, given the current knowledge, the next best information source to query and thereby guide the materials design.125
OED with MOCU
OED has a long history in science and engineering because a properly designed experimental procedure provides much greater efficiency than simply making random probes. Indeed, Francis Bacon’s call for experimental design in 1620 is often taken to mark the beginning of modern science.126
MOCU-based OED
Because the MOCU65,66 can be used to quantify objective-based uncertainty, it provides an effective means to estimate the expected impact of potential experiments on the objective (i.e., the operational goal) through the reduction of model uncertainty. Suppose we are given a set of potential experiments from which the next experiment could be chosen. Which among the possible experiments should be selected if we wish to optimally improve the operational performance of the operator based on the expected experimental outcome? A natural way to select the best possible experiment is to choose the one that leads to the minimum expected remaining MOCU after observing its outcome. To be more specific, let ξ be an experiment in the experimental design space Ξ. Given ξ and its outcome, the MOCU conditioned on this experiment can be computed as
$M_{\Psi}(\Theta \mid \xi) = E_{\theta \mid \xi}\left[ C_{\theta}\left(\psi_{\mathrm{IBR}}(\Theta \mid \xi)\right) - C_{\theta}(\psi_{\theta}) \right]$, (Equation 13)
where $\psi_{\mathrm{IBR}}(\Theta \mid \xi)$ is the IBR operator that is optimally robust for the uncertainty class of models that is now conditioned on this experiment ξ, and the expectation is taken with respect to the conditional distribution of the uncertain model θ given ξ. The expected remaining MOCU can be evaluated by
$D(\xi) = E_{\xi}\left[ M_{\Psi}(\Theta \mid \xi) \right]$, (Equation 14)
where the outer expectation is taken over the possible outcomes of experiment ξ, and the optimal experiment is the one that minimizes the expected remaining MOCU in Equation 14 so that it satisfies
$\xi^{*} = \arg\min_{\xi \in \Xi} D(\xi)$. (Equation 15)
While this strategy does not guarantee that the selected experiment will indeed minimize the uncertainty impacting the objective among all experiments (because the experimental outcome is not known in advance with certainty), it will be optimal on average. Recently, this MOCU-based experimental design scheme has been developed for a variety of systems and applications, which include enhancing the performance of gene-regulatory network intervention with partial network knowledge,117,127 synchronization of an uncertain Kuramoto model that consists of interconnected oscillators with uncertain interaction strength,128,129 optimal sequential sampling,130 Bayesian classification through active learning,44,131 and robust filtering of uncertain stochastic differential equation (SDE) systems.42
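To make the selection rule in Equations 13, 14, and 15 concrete, the Python sketch below (our own illustration under simplifying assumptions; sample_posterior, simulate_outcome, posterior_update, and costs_fn are hypothetical problem-specific helpers, not part of any published implementation) estimates the expected remaining MOCU of each candidate experiment by Monte Carlo and selects the minimizer.

import numpy as np

def ibr_action(thetas, costs_fn, actions):
    """IBR action: minimizes the average cost over the sampled models."""
    avg_costs = [np.mean([costs_fn(theta, a) for theta in thetas]) for a in actions]
    return actions[int(np.argmin(avg_costs))]

def mocu(thetas, costs_fn, actions):
    """Monte Carlo MOCU: expected excess cost of the IBR action over each model's optimum."""
    psi_ibr = ibr_action(thetas, costs_fn, actions)
    gaps = [costs_fn(t, psi_ibr) - min(costs_fn(t, a) for a in actions) for t in thetas]
    return float(np.mean(gaps))

def select_experiment(experiments, sample_posterior, simulate_outcome,
                      posterior_update, costs_fn, actions, n_mc=200):
    """Pick the experiment minimizing the expected remaining MOCU (Equations 14 and 15)."""
    scores = []
    for xi in experiments:
        thetas = sample_posterior(n_mc)              # draws from the current uncertainty class
        remaining = []
        for theta in thetas:                         # outer expectation over possible outcomes of xi
            y = simulate_outcome(theta, xi)          # hypothetical outcome if theta were the true model
            post = posterior_update(thetas, xi, y)   # models consistent with (xi, y): the conditioned class
            remaining.append(mocu(post, costs_fn, actions))   # remaining MOCU as in Equation 13
        scores.append(float(np.mean(remaining)))     # Equation 14
    return experiments[int(np.argmin(scores))]       # Equation 15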
For materials discovery via OED guided by MOCU, as shown in Equation 15, optimization algorithms have to be developed based on the structure of the input design space as well as the properties of the MOCU computation in different problem settings. For example, when investigating pool-based high-throughput screening or discovery problems with a finite set of candidates, either exhaustive search, as in typical BO implementations,5,53,131 or dynamic programming algorithms based on KGs47,132,133,134 can be developed to solve the optimization problems. When the input design space is continuous and the gradient of MOCU can be estimated, gradient-based local search algorithms can be implemented, as discussed in Zhao et al.135 Other solution strategies can also be used to solve OED guided by MOCU, including sampling as well as genetic and other evolutionary algorithms.136
Figure 6 shows the performance of the MOCU-based OED strategy in reducing the uncertainty that impacts the synchronization cost of a Kuramoto model consisting of 5 oscillators, where the coupling strength between oscillators is uncertain and known only up to a range.128 In this example, an experiment picks an oscillator pair and observes whether the selected pair synchronizes in the absence of external control. The observation can be used to reduce the range of the uncertain coupling strength between the oscillators. For this Kuramoto model, the experimental design space Ξ consists of the candidate oscillator pairs, and Figure 6 shows how MOCU decreases as a function of experimental updates. As can be seen, the MOCU-based OED strategy leads to a sharp reduction in uncertainty within a few updates, outperforming random selection (which selects one of the possible experiments from Ξ with uniform probability) and an entropy-based approach (which selects the experiment for the oscillator pair whose coupling strength has the largest uncertain range).
Figure 6.
Experimental design results based on a 5-oscillator Kuramoto model with uncertain coupling strength between the oscillators as described in the text
The MOCU-based OED scheme quickly reduces the model uncertainty that impacts the performance.
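A minimal sketch of the interval update underlying each experiment in this Kuramoto example is given below (our own illustration; it assumes that synchronization of a pair is monotone in its coupling strength, with a pair-specific value sync_threshold separating the two regimes, which is an assumption for illustration rather than the original formulation).

def update_coupling_range(a_lo, a_hi, sync_threshold, observed_sync):
    """Shrink the uncertainty interval [a_lo, a_hi] of one pair's coupling strength
    after observing whether the pair synchronizes without external control."""
    if observed_sync:
        # Observed synchronization implies the coupling is at least the threshold.
        return max(a_lo, sync_threshold), a_hi
    # No synchronization implies the coupling is below the threshold.
    return a_lo, min(a_hi, sync_threshold)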
OED for shape memory alloy (SMA)
In materials design, the MOCU-based OED strategy has been applied to a computational problem of shape memory alloy (SMA) design with desired stress-strain profiles for a particular dopant at a given concentration, utilizing the TDGL theory.5 The TDGL model simulates the free energy for a specific dopant with a specified concentration, given the dopant’s parameters, and is considered an oracle in the experiments. Because the computational complexity of the TDGL model is enormous, an uncertain surrogate model is first trained to approximately predict the dissipation energy for a specified dopant and concentration. In particular, based on TDGL, a reciprocal function is adopted to model the energy dissipation at a specific temperature as a function of dopant potency, dopant spread, and dopant concentration. The experimental design goal is to discover SMAs with the minimum energy dissipation, and therefore this surrogate model is used as the cost function defining MOCU to efficiently guide the experimental design iterations toward an optimal dopant and concentration. With MOCU defined based on this Landau mesoscale surrogate for SMAs as the cost function, the expected remaining MOCU, given the corresponding dopant and its concentration levels, can be computed by the definition in Equation 14. The optimal experiment can then be determined by minimizing the expected remaining MOCU under model uncertainty as in Equation 15.
In the reported experiments,5 MOCU-based OED was compared with pure exploitation and random selection policies. Averaged over 10,000 simulations, our MOCU-based OED strategy, which strives to minimize the uncertainty in the model pertaining to the design objective, identified the dopant and concentration with the optimal dissipation after only two iterations on average, whereas neither the exploitation nor the random selection policy found the optimal dopant even after 10 iterations. Reaching optimal results in fewer iterations is especially crucial in materials discovery, where measurements by either high-throughput simulation models or synthesis and profiling experiments are expensive and time consuming.
Automatic feature engineering (AFE)
Finally, with the knowledge and data accumulated from experimental design based on objective-UQ using MOCU, we may help fill the gaps in our understanding of the materials systems under study. In materials science, the fundamental paradigm is the existence of causal relationships connecting composition and processing (i.e., the modifications to a material’s current state), structure (i.e., the multiscale arrangement of the material), and properties (i.e., the response of the material to an external stimulus); i.e., CPSP relationships. Navigating this CPSP space is enormously resource intensive, regardless of whether the queries are physical or computational experiments. As a result, it typically takes more than 20 years to identify, develop, and finally deploy one material in real-world applications, which is a key bottleneck for the MGI.1,3,4 Attempting to use physics-agnostic models to build these relationships is limited by the scarcity of the training data itself. Moreover, one is interested in discovering derived relationships that connect features to properties/behavior because these relationships can further be used to design/discover materials with optimal properties. Besides designing and discovering promising new materials with desired functional properties, identifying the critical input features (related to composition, process, and structure) that determine functional properties, as well as principled CPSP relationships, can provide a systematic understanding of the underlying physics of different materials systems. Such knowledge can be explored and updated, as illustrated in the previous examples under “Integrating prior knowledge in materials science” and “BMA for materials science applications.” One such knowledge discovery strategy is AFE, which enables us to impose physics constraints on learning surrogate models while facilitating the discovery of fundamental materials design rules at the same time.
Engineered features obeying physics principles provide valuable interpretability, which is essential for new knowledge discovery and the consequent critical decision-making. It is worth noting that, in scientific ML (sciML) involving complex systems, training data tend to be scarce and noisy because obtaining data can be difficult, time consuming, and costly. Materials problems clearly reflect these challenges.
Related work in feature engineering
Feature representation learning has been studied extensively in the SP/ML community, including “white box” methods based on specific basis families (Fourier and wavelet bases are two representatives) and data-driven “black box” methods, such as dictionary learning and deep learning.137,138,139 Although “black box” deep AFE models140 have shown great potential to improve the performance of the corresponding ML algorithms, in this survey we focus on feature engineering that aims to derive features with explicit functional forms. Desirable feature engineering should attain considerable improvement in prediction performance and generalizability as well as good interpretability with little manual labor. Among the existing methods, deep feature synthesis141 extracts features based on explicit functional relationships without experts’ domain knowledge by stacking multiple primary features and applying operations or transformations to them, but it suffers from efficiency and scalability problems because of its brute-force way of generating and selecting features. Kaul et al.142 proposed Autolearn, a regression-based feature learning method that mines pairwise feature associations. While it avoids the overfitting to which deep learning-based feature engineering methods are prone, and improves efficiency by selecting subsets of engineered features according to stability and information gain, it does not directly produce interpretable features. Khurana et al.143 introduced Cognito, which formulates the feature engineering problem as a search on a transformation tree with an incremental search strategy to explore prominent features, and later extended the framework by combining reinforcement learning (RL) with linear function approximation144 to improve efficiency. A similar framework was recently developed in Zhang et al.,145 where the deep reinforcement learning (DRL) policy is learned on a tree-like transformation graph, improving the policy learning capability compared with Cognito. However, neither framework explicitly incorporates available prior knowledge into the AFE procedure.
For AFE in materials science applications, we are interested in finding the actuating mechanisms of the materials’ functional properties of interest by identifying a set of physically meaningful variables and their relationships.146 Such a set of physical variables, with corresponding parameters that uniquely describe the materials’ properties of interest, can be denoted as “descriptors.” Discovering descriptors in materials science can help better predict target functional properties, with potential interpretability, for a given complete class of materials.147 Several methods have been developed, such as a method based on compressed sensing147 and the more recent sure independence screening and sparsifying operator (SISSO)148 approach, which uses brute-force search to generate features and then selects subsets of the generated features by sure independence screening149 together with sparsifying operators such as the least absolute shrinkage and selection operator (LASSO).150 These methods face a scalability challenge because of the exponentially growing memory requirement for storing intermediate features and the high computational complexity of searching for features.
Physics-constrained AFE
In our recently developed AFE framework,151 a feature generation tree (FGT) was constructed with physics constraints to explore the engineered feature (descriptor) space more efficiently based on first principles, which was demonstrated in several materials problems to be able to take advantage of prior chemical and physical knowledge of the materials systems under study.
Our FGT-based AFE framework focuses on sciML applications, where interpretability is critical for the consequent critical decision-making under data scarcity and uncertainty. Specifically, AFE strategies have been developed by combining FGT exploration with DRL152 to address the interpretability and scalability challenges. Instead of performing algebraic operations on the raw features of a given dataset in a brute-force manner and then selecting important descriptors, we combine the descriptor generation and selection processes by constructing FGTs and developing the corresponding tree exploration policies guided by a deep Q network (DQN). An efficient exploration of the prominent descriptors can thus be attained in the growing feature space defined by the allowed algebraic operations. Our FGT-based AFE strategies construct interpretable descriptors through lists of operations according to the DRL-learned policies; they are more scalable and offer a flexible performance-complexity trade-off through an adjustable batch size for generating intermediate features. More critically for materials science and other sciML problems, the FGT provides a flexible framework for incorporating prior knowledge (e.g., physics constraints) when generating and selecting features. This is important for knowledge discovery via interpretable learning with physics constraints under data scarcity and uncertainty because the space connecting intrinsic materials attributes/features to materials behavior is vast, sparse, and complex in nature.
In particular, let $X = \{x_1, \ldots, x_p\}$ denote the finite set of p variables serving as raw or primary features, and let y denote the target output vector. AFE aims to develop an algorithm that constructs sets of engineered features, serving as interpretable and predictive descriptors expressed in explicit functional forms with allowed algebraic operations, that accurately predict y. The set of algebraic operations φ in an operation set O can be constructed based on prior knowledge and typically contains unary operations (applied to a single feature) and binary operations (combining two features). For each generated descriptor f, c(f) denotes its complexity, defined as the number of algebraic operations used to construct it; for example, a descriptor built by applying five nested operations to the primary features has a complexity of 5. The operation set O can be pre-defined based on the prior knowledge about the system under study. If we denote the set of primary features by $\mathcal{F}_0 = X$, then $\mathcal{F}_{c_{\max}}$ denotes the iteratively generated set of descriptors with the maximum allowed complexity $c_{\max}$. Our goal is to find an optimal descriptor set $\mathcal{D}^{\ast}$ that maximizes a prediction performance score S, for example, classification or regression accuracy:
$\mathcal{D}^{\ast} = \arg\max_{\mathcal{D} \subseteq \mathcal{F}_{c_{\max}}} S\left(L(\mathcal{D}), y\right)$, (Equation 16)
where L denotes the prediction model (for example, linear regression or a Support Vector Machine (SVM) for interpretability with the generated descriptors), and every descriptor in $\mathcal{D}$ (including primary features) belongs to $\mathcal{F}_{c_{\max}}$, the set of all generated features with the maximum allowed complexity $c_{\max}$.
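As a minimal sketch of evaluating the objective in Equation 16 for one candidate descriptor set (our own illustration; it assumes the prediction model L is a linear regression and the score S is cross-validated R², using scikit-learn):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def score_descriptor_set(descriptor_matrix, y, cv=5):
    """Prediction performance score S(L(D), y) for a candidate descriptor set D.

    descriptor_matrix : array of shape (n_samples, n_descriptors), the evaluated descriptors
    y                 : target property values
    Returns the mean cross-validated R^2 of a linear model fit on the descriptors.
    """
    model = LinearRegression()
    return float(np.mean(cross_val_score(model, descriptor_matrix, y, cv=cv, scoring="r2")))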
The combinatorial optimization problem in Equation 16 is NP hard. We solve it approximately by introducing the FGT to iteratively construct the descriptor space, transforming the problem into a tree search problem for efficient AFE. Each node in the FGT represents a set of descriptors, and each edge represents an operation φ. When the cardinality of the descriptor set is chosen to be d, we denote the current top d optimal features by $\mathcal{D}_d$ and the selected optimal feature for the dth dimension by $f_d^{\ast}$. The FGT exploration searches for the best descriptors one by one based on the testing accuracy given the observed data. The corresponding complete AFE procedure constructs the feature subspace sequentially as the search space of each exploration, starting from the root node with the primary feature set. At each node, we learn a generation policy π that chooses an operation to generate a new descriptor set as the corresponding child node, with which the current optimal $f_d^{\ast}$ and $\mathcal{D}_d$ are updated accordingly. The FGT grows by repeating these operations until it reaches the maximum complexity $c_{\max}$.
To learn the FGT generation policy π, we adopt a DQN with experience replay.152 Formally, we define the states, actions, and rewards as follows:
• state s: a set of primary features or generated descriptors encountered while searching for the dth optimal descriptor;
• action a: an operation φ in the set O;
• reward r: defined in terms of the prediction performance score attained by the descriptors generated at the new state.
The pseudo-code for learning the DQN-based FGT exploration is given in Algorithm 1. To obtain a flexible exploration procedure for the performance-complexity trade-off and for incorporating prior knowledge, each selected descriptor $f_d^{\ast}$ in $\mathcal{D}_d$ can be chosen from the top n features with the highest rewards in the corresponding feature subspace, composing a candidate set $\mathcal{C}_d$. Hence $\mathcal{D}_d$ can take multiple combinations according to the candidate sets $\mathcal{C}_1, \ldots, \mathcal{C}_d$, and the reward is computed as the maximum reward over these combinations.
Algorithm 1. DQN for AFE.
1: Input: primary feature set $\mathcal{F}_0$, action (operation) set O
2: for each target descriptor dimension d do
3: Construct new DQN
4: Clear replay buffer
5: for each training episode do
6: for each step within the episode do
7: Select an operation by the ε-greedy method on the current DQN
8: FGT_Grow: apply the selected operation to grow the FGT and obtain the new descriptor set and its reward
9: Store the transition (state, action, reward, next state) in the buffer
10: Train DQN with experience replay
11: if the target prediction performance is reached then
12: goto Output
13: end if
14: if the maximum allowed complexity $c_{\max}$ is reached then
15: break
16: end if
17: end for
18: end for
19: Form candidate set $\mathcal{C}_d$ with the n features of highest reward
20: end for
21: Output: Optimal feature set chosen from the candidate sets $\mathcal{C}_1, \ldots, \mathcal{C}_d$
Note that when we apply binary operations, besides the one feature at the current node, we have to choose another feature from the generated descriptor space, which leads to an exponentially exploding number of new descriptors. To achieve an appropriate performance-complexity trade-off, we introduce flexible batch sampling: each time, we randomly sample a feature subspace B from the generated descriptor space as a “batch set,” enumerate partner features only from B, and take the maximum reward over all of the resulting combinations as the reward. When prior knowledge is available in the form of physics constraints on applying specific operations to specific feature groups, this batch sampling procedure can naturally accommodate them.
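The batch sampling and constraint handling described above can be sketched as follows (our own illustration; generate_children, allowed, and the representation of features as arrays of values are assumptions, not the original implementation of Xiang et al.151).

import numpy as np

def generate_children(feature, feature_pool, unary_ops, binary_ops, allowed, batch_size, seed=0):
    """Expand one FGT node: apply allowed operations to `feature`, pairing binary operations
    only with a randomly sampled batch of partner features to bound the branching factor.
    `allowed(op, *feats)` is a hypothetical predicate encoding physics constraints
    (e.g., only add or subtract features with matching physical units)."""
    rng = np.random.default_rng(seed)
    children = []
    for op in unary_ops:
        if allowed(op, feature):
            children.append(op(feature))
    batch_idx = rng.choice(len(feature_pool), size=min(batch_size, len(feature_pool)), replace=False)
    for op in binary_ops:
        for j in batch_idx:
            partner = feature_pool[j]
            if allowed(op, feature, partner):
                children.append(op(feature, partner))
    return children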
AFE to learn interatomic potential models for copper
First-principles DFT123,124 has been extensively applied in materials science, physics, and chemistry. However, it is often constrained to simulating materials with 100–1,000 atoms for several thousand ab initio molecular dynamics steps, covering about 10 picoseconds. In contrast, classic interatomic potentials have been widely adopted in the past, allowing large-scale molecular dynamics simulations of millions of atoms for millions of time steps (that is, covering about 10 nanoseconds). However, the construction of the functional form and the optimization of the corresponding parameters of classic potentials are highly nontrivial. Recently, developing and training neural network potentials based on first-principles DFT calculations of relatively small systems has become an important research direction in atomistic simulation, in combination with active learning.153,154,155,156 Very recently, a feature engineering method has been pursued in which genetic programming was applied to develop fast and accurate classic interatomic potentials with explicit functional forms from a physically meaningful hypothesis space.136 In particular, genetic programming was applied to optimize the exact functional form of pairwise and many-body potentials as well as other potential forms, highlighting an important avenue toward the development of physics-constrained models with analytic functional forms.
Different from the genetic programming approach above, we have adopted our FGT-based AFE and evaluated its ability to find potential models from data generated by DFT. To compare with the genetic programming symbolic regression approach,136 we used the same 150 snapshots of 32-atom DFT molecular dynamics simulations of fcc copper as in Hernandez et al.,136 where each snapshot was generated every 100 steps with a time step of 1 fs. The 150 snapshots include 50 snapshots from ab initio molecular dynamics performed at 300 K in the canonical (NVT) ensemble, 50 snapshots at 1,400 K in the NVT ensemble, and 50 snapshots at 1,400 K in the isothermal-isobaric (NPT) ensemble with a pressure of 100 kPa. The 150 total energies calculated by Hernandez et al.136 were considered the target output of interest,157 with a random split of 125 structures and their corresponding total energies for training and the remaining 25 structures and total energies for validation in the feature engineering.
We have compared our AFE with the recently developed physics-informed genetic programming method136 to arrive at analytical many-body classical interatomic potential models. Figure 7 shows the plots of predicted total energy of these different copper structures vs. the simulated total energy based on primary features (left), genetic-programming-generated descriptors (center), and AFE-generated descriptors (right) on the same held-out testing data. With the same simulated molecular dynamics data and experimental setup in the paper, our AFE has achieved the total energy prediction with a mean absolute error (MAE) of 3.73 meV/atom within 12 h. By contrast, the reported model GP1 by genetic programming in Hernandez et al.136 had a prediction MAE of 4.13 meV/atom after 360 CPU hours on the same training and test sets.
Figure 7.
Copper energy regression results for different interatomic potential models
Our proposed AFE strategies approximate the expected future reward of engineered descriptors through DQN-based policy learning and replace the exhaustive feature generation by DQN-guided FGT exploration considering physics prior knowledge. Consequently, our AFE enhances scalability and computational efficiency without sacrificing prediction performance, as demonstrated in the reported experiment as well as other materials systems in Xiang et al.151 The results of these real-world materials science experiments have demonstrated the potential of our DQN-guided FGT exploration in reducing the runtime and enhancing the scalability for AFE. More importantly, the engineered descriptors are interpretable with the corresponding lists of algebraic operations on the original primary features. Our physics-constrained AFE aims at generalizable learning under data scarcity and uncertainty. Interpretable instead of “black box” learning helps new knowledge discovery and better decision-making.
Conclusions and future work
When facing real-world complex systems in various science and engineering domains, such as the complex materials systems that are our focus in this paper, where acquiring ample data is practically difficult and formidably expensive, existing ML methods often fail to produce reliable and generalizable predictions. To cope with these shortcomings of existing SP and ML schemes in materials discovery, there is a pressing need for novel methods that enable robust optimal decision-making under challenging conditions, such as small data size, enormous system complexity, nonstationarity, and data and model uncertainty. In this paper, we have presented recent efforts in knowledge-driven learning, optimization, and experimental design, and we have placed them in historical context against the rich research history in the SP community on robust filtering and control, which can be dated back to the 1950s. Specifically, we have shown several examples of an objective-based UQ framework, via MOCU, used to develop sciML methods that address the aforementioned challenges in accelerating materials discovery, focusing on learning and experimental design under uncertainty. The problems of studying complex systems will persist across diverse science and engineering disciplines, and we expect that the learning and optimization schemes based on objective-based UQ presented in this paper will provide useful guidance for developing new sciML methods that more effectively incorporate scientific knowledge, designing surrogate ML models that are better suited to the systems under study, and devising computational solutions that are more scalable and efficient.
Acknowledgments
The authors would like to thank the collaborators involved in the reviewed projects: Shahin Boluki, Guang Zhao, Mingzhou Fan, Ziyu Xiang, Tao Hu, Roozbeh Dehghannasiri, Nathan Wilson, and Daniel Willhelm. The authors were supported in part by National Science Foundation (NSF) awards CCF-1553281, IIS-1812641, OAC-1835690, DMR-1753054, DMR-2119103, CMMI-2226908, SHF-2215573, and IIS-2212419 as well as by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Mathematical Multifaceted Integrated Capability Centers program under award DE-SC0019303.
Declaration of interests
The authors declare no competing interests.
References
- 1.Kaufman L., Ågren J. CALPHAD, first and second generation–Birth of the materials genome. Scripta Mater. 2014;70:3–6. doi: 10.1016/j.scriptamat.2012.12.003. [DOI] [Google Scholar]
- 2.McDowell D.L., Kalidindi S.R. The materials innovation ecosystem: a key enabler for the materials genome initiative. MRS Bull. 2016;41:326–337. doi: 10.1557/mrs.2016.61. [DOI] [Google Scholar]
- 3.Ghiringhelli L.M., Vybiral J., Levchenko S.V., Draxl C., Scheffler M. Big data of materials science: Critical role of the descriptor. Phys. Rev. Lett. 2015;114 doi: 10.1103/PhysRevLett.114.105503. [DOI] [PubMed] [Google Scholar]
- 4.Kim C., Pilania G., Ramprasad R. From organized high-throughput data to phenomenological theory using machine learning: The example of dielectric breakdown. Chem. Mater. 2016;28:1304–1311. doi: 10.1021/acs.chemmater.5b04109. [DOI] [Google Scholar]
- 5.Dehghannasiri R., Xue D., Balachandran P.V., Yousefi M.R., Dalton L.A., Lookman T., Dougherty E.R. Optimal experimental design for materials discovery. Comput. Mater. Sci. 2017;129:311–322. doi: 10.1016/j.commatsci.2016.11.041. [DOI] [Google Scholar]
- 6.Yip S., editor. Handbook of Materials Modeling. Springer; 2005. [DOI] [Google Scholar]
- 7.Krenn M., Pollice R., Guo S.Y., Aldeghi M., Cervera-Lierta A., Friederich P., Dos Passos Gomes G., Häse F., Jinich A., Nigam A.K., et al. On scientific understanding with artificial intelligence. Nat. Rev. Phys. 2022;4:761–769. doi: 10.1038/s42254-022-00518-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wang H., Fu T., Du Y., Gao W., Huang K., Liu Z., Chandak P., Liu S., Van Katwyk P., Deac A., et al. Scientific discovery in the age of artificial intelligence. Nature. 2023;620:47–60. doi: 10.1038/s41586-023-06221-2. [DOI] [PubMed] [Google Scholar]
- 9.Zhang X., Wang L., Jacob H., Luo Y., Fu C., Xie Y., Liu M., Lin Y., Zhao X., Yan K., et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. arXiv. 2023 doi: 10.48550/arXiv.2307.08423. Preprint at. [DOI] [Google Scholar]
- 10.Choudhary A., Fox G., Hey T. World Scientific Publishing Co. Pte Ltd; 2023. Artificial Intelligence for Science: A Deep Learning Revolution. [DOI] [Google Scholar]
- 11.Arróyave R., Khatamsaz D., Vela B., Couperthwaite R., Molkeri A., Singh P., Johnson D.D., Qian X., Srivastava A., Allaire D. A perspective on Bayesian methods applied to materials discovery and design. MRS Communications. 2022;12:1037–1049. doi: 10.1557/s43579-022-00288-0. [DOI] [Google Scholar]
- 12.Fuhr A.S., Sumpter B.G. Deep generative models for materials discovery and machine learning-accelerated innovation. Front. Mater. 2022;9 doi: 10.3389/fmats.2022.865270. [DOI] [Google Scholar]
- 13.Pyzer-Knapp E.O., Pitera J.W., Staar P.W.J., Takeda S., Laino T., Sanders D.P., Sexton J., Smith J.R., Curioni A. Accelerating materials discovery using artificial intelligence, high performance computing and robotics. npj Comput. Mater. 2022;8:84. doi: 10.1038/s41524-022-00765-z. [DOI] [Google Scholar]
- 14.Wiener N. Wiley; 1949. Extrapolation, Interpolation, and Smoothing of Stationary Time Series. [DOI] [Google Scholar]
- 15.Kailath T., Sayed A.H., Hassibi B. Prentice-Hall; 2000. Linear Estimation. [Google Scholar]
- 16.Betts J.T. 2nd ed. SIAM Press; 2010. Practical Methods for Optimal Control Using Nonlinear Programming. [DOI] [Google Scholar]
- 17.Dalton L., Dougherty E.R. SPIE Press; 2020. Optimal Bayesian Classification. [Google Scholar]
- 18.Rockafellar R.T., Uryasev S. The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surveys in Operations Research and Management Science. 2013;18:33–53. doi: 10.1016/j.sorms.2013.03.001. [DOI] [Google Scholar]
- 19.Spantini A., Cui T., Willcox K., Tenorio L., Marzouk Y. Goal-oriented optimal approximations of Bayesian linear inverse problems. SIAM J. Sci. Comput. 2017;39:S167–S196. doi: 10.1137/16m1082123. [DOI] [Google Scholar]
- 20.Kuznetsov V.P. Stable detection when signal and spectrum of normal noise are inaccurately known. Telecommun. Radio Eng. 1976;30:58–64. [Google Scholar]
- 21.Kassam S.A., Lim T.L. Robust Wiener filters. J. Franklin Inst. 1977;304:171–185. doi: 10.1016/0016-0032(77)90011-4. [DOI] [Google Scholar]
- 22.Poor H.V. On robust Wiener filtering. IEEE Trans. Automat. Control. 1980;25:531–536. doi: 10.1109/TAC.1980.1102349. [DOI] [Google Scholar]
- 23.Chen Y., Chen B. Minimax robust deconvolution filters under stochastic parametric and noise uncertainties. IEEE Trans. Signal Process. 1994;42:32–45. doi: 10.1109/78.258119. [DOI] [Google Scholar]
- 24.Verdu S., Poor H. Minimax linear observers and regulators for stochastic systems with uncertain second-order statistics. IEEE Trans. Automat. Control. 1984;29:499–511. doi: 10.1109/TAC.1984.1103576. [DOI] [Google Scholar]
- 25.Li T., Yi X., Caramanis C., Ravikumar P. Minimax Gaussian classification & clustering. In: Singh A., Zhu J., editors. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Volume 54 of Proceedings of Machine Learning Research; 2017. pp. 1–9. https://proceedings.mlr.press/v54/li17a.html [Google Scholar]
- 26.Bertsimas D., Paskov I. Stable regression: On the power of optimization over randomization. J. Mach. Learn. Res. 2020;21:1–25. http://jmlr.org/papers/v21/19-408.html [Google Scholar]
- 27.Kalman R.E., Bucy R.S. New results in linear filtering and prediction theory. J. Basic Eng. 1961;83:95–108. doi: 10.1115/1.3658902. [DOI] [Google Scholar]
- 28.Mehra R. Approaches to adaptive filtering. IEEE Trans. Automat. Control. 1972;17:693–698. doi: 10.1109/TAC.1972.1100100. [DOI] [Google Scholar]
- 29.Morris J. The Kalman filter: A robust estimator for some classes of linear quadratic problems. IEEE Trans. Inf. Theor. 1976;22:526–534. doi: 10.1109/TIT.1976.1055611. [DOI] [Google Scholar]
- 30.Bellman R., Kalaba R. Dynamic programming and adaptive processes: Mathematical foundation. IRE Trans. Automatic Control. 1960;5:5–10. doi: 10.1109/TAC.1960.6429288. [DOI] [Google Scholar]
- 31.Silver E.A. MIT Cambridge Operations Research Center; 1963. Markovian Decision Processes with Uncertain Transition Probabilities or Rewards. Technical report. [Google Scholar]
- 32.Martin J.J. Wiley; 1967. Bayesian Decision Problems and Markov Chains. [Google Scholar]
- 33.Grigoryan A.M., Dougherty E.R. Bayesian robust optimal linear filters. Signal Process. 2001;81:2503–2521. doi: 10.1016/S0165-1684(01)00144-X. [DOI] [Google Scholar]
- 34.Dougherty E.R., Hua J., Xiong Z., Chen Y. Optimal robust classifiers. Pattern Recogn. 2005;38:1520–1532. doi: 10.1016/j.patcog.2005.01.019. [DOI] [Google Scholar]
- 35.Atkinson A.C. Optimal Design. 2015. pp. 1–17. [DOI] [Google Scholar]
- 36.Sebastiani P., Wynn H.P. Maximum entropy sampling and optimal Bayesian experimental design. J. Roy. Stat. Soc. B. 2000;62:145–157. doi: 10.1111/1467-9868.00225. [DOI] [Google Scholar]
- 37.Mussmann S., Liang P. On the relationship between data efficiency and error for uncertainty sampling. In: Dy J., Krause A., editors. Proceedings of the 35th International Conference on Machine Learning, Volume 80 of Proceedings of Machine Learning Research. PMLR; 2018. pp. 3674–3682. [Google Scholar]
- 38.Fedorov V.V., Leonov S.L. Chapman and Hall/CRC Press; 2014. Optimal Design for Nonlinear Response Models. [DOI] [Google Scholar]
- 39.Duarte B.P.M., Wong W.K., Atkinson A.C. A semi-infinite programming based algorithm for determining t-optimum designs for model discrimination. J. Multivariate Anal. 2015;135:11–24. doi: 10.1016/j.jmva.2014.11.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Lindley D.V. Bayesian Statistics, A Review. SIAM; 1972. [Google Scholar]
- 41.Huan X., Marzouk Y. Gradient-based stochastic optimization methods in Bayesian experimental design. Int. J. Uncertain. Quantification. 2014;4:479–510. doi: 10.1615/int.j.uncertaintyquantification.2014006730. [DOI] [Google Scholar]
- 42.Zhao G., Qian X., Yoon B.-J., Alexander F.J., Dougherty E.R. Model-based robust filtering and experimental design for stochastic differential equation systems. IEEE Trans. Signal Process. 2020;68:3849–3859. doi: 10.1109/TSP.2020.3001384. [DOI] [Google Scholar]
- 43.Foster A., Jankowiak M., O’Meara M., Teh Y.W., Rainforth T. A unified stochastic gradient approach to designing Bayesian-optimal experiments. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 108; 2020. pp. 2959–2969. https://proceedings.mlr.press/v108/foster20a.html [Google Scholar]
- 44.Zhao G., Dougherty E., Yoon B.-J., Alexander F., Qian X. 9th International Conference on Learning Representations (ICLR); 2021. Uncertainty-aware active learning for optimal Bayesian classifier. [Google Scholar]
- 45.Jones D.R., Schonlau M., Welch W.J. Efficient global optimization of expensive black-box functions. J. Global Optim. 1998;13:455–492. doi: 10.1023/a:1008306431147. [DOI] [Google Scholar]
- 46.Frazier P.I., Powell W.B., Dayanik S. A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 2008;47:2410–2439. doi: 10.1137/070693424. [DOI] [Google Scholar]
- 47.Frazier P., Powell W., Dayanik S. The knowledge-gradient policy for correlated normal beliefs. Inf. J. Comput. 2009;21:599–613. doi: 10.1287/ijoc.1080.0314. [DOI] [Google Scholar]
- 48.Denil M., Agrawal P., Kulkarni T.D., Erez T., Battaglia P., de Freitas N. International Conference on Learning Representations (ICLR); 2017. Learning to perform physics experiments via deep reinforcement learning. [Google Scholar]
- 49.Hadjidoukas P.E., Angelikopoulos P., Papadimitriou C., Koumoutsakos P. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models. Journal of Computational Physics. 2015;284:1–21. [Google Scholar]
- 50.Adams B.M., Bohnhoff W.J., Dalbey K.R., Ebeida M.S., Eddy J.P., Eldred M.S., Hooper R.W., Hough P.D., Hu K.T., Jakeman J.D., et al. A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 6.15 User’s Manual. Sandia Technical Report SAND2020-12495. 2021 https://dakota.sandia.gov/content/about. [Google Scholar]
- 51.Xue D., Balachandran P.V., Hogden J., Theiler J., Xue D., Lookman T. Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 2016;7 doi: 10.1038/ncomms11241. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 52.Xue D., Balachandran P.V., Yuan R., Hu T., Qian X., Dougherty E.R., Lookman T. Accelerated search for BaTiO3-based piezoelectrics with vertical morphotropic phase boundary using Bayesian learning. Proc. Natl. Acad. Sci. USA. 2016;113:13301–13306. doi: 10.1073/pnas.1607412113. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 53.Frazier P.I., Wang J. Information Science for Materials Discovery and Design. Springer; 2016. Bayesian optimization for materials design; pp. 45–75. [DOI] [Google Scholar]
- 54.Qian X., Dougherty E.R. Bayesian regression with network prior: Optimal Bayesian filtering perspective. IEEE Trans. Signal Process. 2016;64:6243–6253. doi: 10.1109/TSP.2016.2605072. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 55.Ueno T., Rhone T.D., Hou Z., Mizoguchi T., Tsuda K. COMBO: An efficient bayesian optimization library for materials science. Materials Discovery. 2016;4:18–21. doi: 10.1016/j.md.2016.04.001. [DOI] [Google Scholar]
- 56.Seko A., Maekawa T., Tsuda K., Tanaka I. Machine learning with systematic density-functional theory calculations: Application to melting temperatures of single-and binary-component solids. Phys. Rev. B. 2014;89 doi: 10.1103/PhysRevB.89.054303. [DOI] [Google Scholar]
- 57.Ju S., Shiga T., Feng L., Hou Z., Tsuda K., Shiomi J. Designing nanostructures for phonon transport via Bayesian optimization. Phys. Rev. X. 2017;7 doi: 10.1103/PhysRevX.7.021024. [DOI] [Google Scholar]
- 58.Gopakumar A.M., Balachandran P.V., Xue D., Gubernatis J.E., Lookman T. Multi-objective optimization for materials discovery via adaptive design. Sci. Rep. 2018;8:3738. doi: 10.1038/s41598-018-21936-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 59.Khatamsaz D., Molkeri A., Couperthwaite R., James J., Arróyave R., Srivastava A., Allaire D. Adaptive active subspace-based efficient multifidelity materials design. Mater. Des. 2021;209 doi: 10.1016/j.matdes.2021.110001. [DOI] [Google Scholar]
- 60.Castillo A.R., Kalidindi S.R. Bayesian estimation of single ply anisotropic elastic constants from spherical indentations on multi-laminate polymer-matrix fiber-reinforced composite samples. Meccanica. 2021;56:1575–1586. doi: 10.1007/s11012-020-01154-w. [DOI] [Google Scholar]
- 61.Marshall A., Kalidindi S.R. Autonomous development of a machine-learning model for the plastic response of two-phase composites from micromechanical finite element models. JOM. 2021;73:2085–2095. doi: 10.1007/s11837-021-04696-w. [DOI] [Google Scholar]
- 62.Honarmandi P., Hossain M.A., Arroyave R., Baxevanis T. A top-down characterization of NiTi single-crystal inelastic properties within confidence bounds through Bayesian inference. Shap. Mem. Superelasticity. 2021;7:50–64. doi: 10.1007/s40830-021-00311-8. [DOI] [Google Scholar]
- 63.Ladygin V., Beniya I., Makarov E., Shapeev A. Bayesian learning of thermodynamic integration and numerical convergence for accurate phase diagrams. Phys. Rev. B. 2021;104 doi: 10.1103/PhysRevB.104.104102. [DOI] [Google Scholar]
- 64.Olivier A., Shields M.D., Graham-Brady L. Bayesian neural networks for uncertainty quantification in data-driven materials modeling. Comput. Methods Appl. Mech. Eng. 2021;386 doi: 10.1016/j.cma.2021.114079. [DOI] [Google Scholar]
- 65.Yoon B.-J., Qian X., Dougherty E.R. Quantifying the objective cost of uncertainty in complex dynamical systems. IEEE Trans. Signal Process. 2013;61:2256–2266. doi: 10.1109/TSP.2013.2251336. [DOI] [Google Scholar]
- 66.Yoon B.-J., Qian X., Dougherty E.R. Quantifying the multi-objective cost of uncertainty. IEEE Access. 2021;9:80351–80359. doi: 10.1109/ACCESS.2021.3085486. [DOI] [Google Scholar]
- 67.Dalton L.A., Dougherty E.R. Intrinsically optimal Bayesian robust filtering. IEEE Trans. Signal Process. 2014;62:657–670. doi: 10.1109/TSP.2013.2291213. [DOI] [Google Scholar]
- 68.Box G.E.P., Tiao G.C. Wiley; 1973. Bayesian Inference in Statistical Analysis. [DOI] [Google Scholar]
- 69.Berger J.O. Springer-Verlag; 1985. Statistical Decision Theory and Bayesian Analysis. [DOI] [Google Scholar]
- 70.Bishop C.M. Pattern Recognition and Machine Learning. Springer; 2006. https://link.springer.com/book/9780387310732 [Google Scholar]
- 71.Dougherty E.R., Zollanvari A., Braga-Neto U.M. The illusion of distribution-free small-sample classification in genomics. Curr. Genom. 2011;12:333–341. doi: 10.2174/138920211796429763. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 72.Dougherty E.R., Dalton L.A. Scientific knowledge is possible with small-sample classification. EURASIP J. Bioinf. Syst. Biol. 2013;2013:10–12. doi: 10.1186/1687-4153-2013-10. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 73.Coveney P.V., Dougherty E.R., Highfield R.R. Big data need big theory too. Phil. Trans. R. Soc. A. 2016;374 doi: 10.1098/rsta.2016.0153. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 74.Jaynes E.T. 1980. What Is the Question? Bayesian Statistics. [Google Scholar]
- 75.Jeffreys H. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A Math. Phys. Sci. 1946;186:453–461. doi: 10.1098/rspa.1946.0056. [DOI] [PubMed] [Google Scholar]
- 76.Zellner A. Past and Recent Results on Maximal Data Information Priors. Working Paper Series in Economics and Econometrics. University of Chicago, Graduate School of Business, Department of Economics; 1995. [Google Scholar]
- 77.Rissanen J. A universal prior for integers and estimation by minimum description length. Ann. Stat. 1983;11:416–431. doi: 10.1214/aos/1176346150. [DOI] [Google Scholar]
- 78.Rodriguez C.C. Entropic priors for discrete probabilistic networks and for mixtures of Gaussian models. AIP Conf. Proc. 2002 doi: 10.1063/1.1477063. [DOI] [Google Scholar]
- 79.Berger J.O., Bernardo J.M. On the development of reference priors. Bayesian statistics. 1992;4:35–60. [Google Scholar]
- 80.Spall J.C., Hill S.D. Least-informative Bayesian prior distributions for finite samples based on information theory. IEEE Trans. Automat. Control. 1990;35:580–583. doi: 10.1109/CDC.1989.70640. [DOI] [Google Scholar]
- 81.Bernardo J.M. Reference posterior distributions for Bayesian inference. J. Roy. Stat. Soc. B. 1979;41:113–128. doi: 10.1111/j.2517-6161.1979.tb01066.x. [DOI] [Google Scholar]
- 82.Kass R.E., Wasserman L. The selection of prior distributions by formal rules. J. Am. Stat. Assoc. 1996;91:1343–1370. doi: 10.1080/01621459.1996.10477003. [DOI] [Google Scholar]
- 83.Berger J.O., Bernardo J.M., Sun D. Objective priors for discrete parameter spaces. J. Am. Stat. Assoc. 2012;107:636–648. doi: 10.1080/01621459.2012.682538. [DOI] [Google Scholar]
- 84.Jaynes E.T. Information theory and statistical mechanics. Phys. Rev. 1957;106:620–630. doi: 10.1103/PhysRev.106.620. [DOI] [Google Scholar]
- 85.Jaynes E.T. Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 1968;4:227–241. doi: 10.1109/TSSC.1968.300117. [DOI] [Google Scholar]
- 86.Zellner A. Models, prior information, and Bayesian analysis. J. Econom. 1996;75:51–68. doi: 10.1016/0304-4076(95)01768-2. [DOI] [Google Scholar]
- 87.Werner K., Jansson M., Stoica P. On estimation of covariance matrices with kronecker product structure. IEEE Trans. Signal Process. 2008;56:478–491. doi: 10.1109/TSP.2007.907834. [DOI] [Google Scholar]
- 88.Wiesel A., Eldar Y.C., Hero A.O. Covariance estimation in decomposable Gaussian graphical models. IEEE Trans. Signal Process. 2010;58:1482–1492. doi: 10.1109/TSP.2009.2037350. [DOI] [Google Scholar]
- 89.Eldar Y.C. Generalized SURE for exponential families: Applications to regularization. IEEE Trans. Signal Process. 2009;57:471–481. doi: 10.1109/TSP.2008.2008212. [DOI] [Google Scholar]
- 90.Burg J.P., Luenberger D.G., Wenger D.L. Estimation of structured covariance matrices. Proc. IEEE. 1982;70:963–974. doi: 10.1109/PROC.1982.12427. [DOI] [Google Scholar]
- 91.Wei P., Pan W. Bayesian joint modeling of multiple gene networks and diverse genomic data to identify target genes of a transcription factor. Ann. Appl. Stat. 2012;6:334–355. doi: 10.1214/11-aoas502. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Wiesel A., Hero A.O. 2010 IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM) IEEE; 2010. Distributed covariance estimation in Gaussian graphical models; pp. 193–196. [Google Scholar]
- 93.Kalidindi S.R. Elsevier; 2015. Hierarchical Materials Informatics: Novel Analytics for Materials Data. [DOI] [Google Scholar]
- 94.Ghoreishi S.F., Molkeri A., Srivastava A., Arroyave R., Allaire D. Multi-information source fusion and optimization to realize ICME: Application to dual-phase materials. J. Mech. Des. N. Y. 2018;140 doi: 10.1115/1.4041034. [DOI] [Google Scholar]
- 95.Kalidindi S.R. A Bayesian framework for materials knowledge systems. MRS Communications. 2019;9:518–531. doi: 10.1557/mrc.2019.56. [DOI] [Google Scholar]
- 96.Markland T.E., Ceriotti M. Nuclear quantum effects enter the mainstream. Nat. Rev. Chem. 2018;2. doi: 10.1038/s41570-017-0109. [DOI] [Google Scholar]
- 97.Xie T., Grossman J.C. Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties. Phys. Rev. Lett. 2018;120 doi: 10.1103/PhysRevLett.120.145301. [DOI] [PubMed] [Google Scholar]
- 98.Park C.W., Wolverton C. Developing an improved crystal graph convolutional neural network framework for accelerated materials discovery. Phys. Rev. Mater. Jun 2020;4 doi: 10.1103/PhysRevMaterials.4.063801. [DOI] [Google Scholar]
- 99.Chen C., Ye W., Zuo Y., Zheng C., Ong S.P. Graph networks as a universal machine learning framework for molecules and crystals. Chem. Mater. 2019;31:3564–3572. doi: 10.1021/acs.chemmater.9b01294. URL https://doi.org/10.1021/acs.chemmater.9b01294. [DOI] [Google Scholar]
- 100.Kohn W. Density functional and density matrix method scaling linearly with the number of atoms. Phys. Rev. Lett. 1996;76:3168–3171. doi: 10.1103/PhysRevLett.76.3168. [DOI] [PubMed] [Google Scholar]
- 101.Prodan E., Kohn W. Nearsightedness of electronic matter. Proc. Natl. Acad. Sci. USA. 2005;102:11635–11638. doi: 10.1073/pnas.0505436102. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Schmidt J., Marques M.R.G., Botti S., Marques M.A.L. Recent advances and applications of machine learning in solid-state materials science. npj Comput. Mater. 2019;5:83–3960. doi: 10.1038/s41524-019-0221-0. [DOI] [Google Scholar]
- 103.Esfahani M.S., Dougherty E.R. Incorporation of biological pathway knowledge in the construction of priors for optimal Bayesian classification. IEEE ACM Trans. Comput. Biol. Bioinf. 2014;11:202–218. doi: 10.1109/TCBB.2013.143. [DOI] [PubMed] [Google Scholar]
- 104.Esfahani M.S., Shahrokh E., Dougherty E.R. An optimization-based framework for the transformation of incomplete biological knowledge into a probabilistic structure and its application to the utilization of gene/protein signaling pathways in discrete phenotype classification. IEEE ACM Trans. Comput. Biol. Bioinf. 2015;12:1304–1321. doi: 10.1109/TCBB.2015.2424407. [DOI] [PubMed] [Google Scholar]
- 105.Boluki S., Esfahani M.S., Qian X., Dougherty E.R. Constructing pathway-based priors within a Gaussian mixture model for Bayesian regression and classification. IEEE ACM Trans. Comput. Biol. Bioinf. 2019;16:524–537. doi: 10.1109/TCBB.2017.2778715. [DOI] [PubMed] [Google Scholar]
- 106.Boluki S., Esfahani M.S., Qian X., Dougherty E.R. Incorporating biological prior knowledge for Bayesian learning via maximal knowledge-driven information priors. BMC Bioinf. 2017;18(Suppl 14):552. doi: 10.1186/s12859-017-1893-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Guiasu S., Shenitzer A. The principle of maximum entropy. Math. Intel. 1985;7:42–48. doi: 10.1007/bf03023004. [DOI] [Google Scholar]
- 108.Heitmann A.A., Rossetti G.A. Thermodynamics of ferroelectric solid solutions with morphotropic phase boundaries. J. Am. Ceram. Soc. 2014;97:1661–1685. doi: 10.1111/jace.12979. [DOI] [Google Scholar]
- 109.Hoeting J., Madigan D., Raftery A., Volinsky C. Bayesian model averaging: A tutorial. Stat. Sci. 1999;4:382–417. https://www.jstor.org/stable/2676803 [Google Scholar]
- 110.Wasserman L. Bayesian model selection and model averaging. J. Math. Psychol. 2000;44:92–107. doi: 10.1006/jmps.1999.1278. [DOI] [PubMed] [Google Scholar]
- 111.Clarke B. Comparing Bayes model averaging and stacking when model approximation error cannot be ignored. J. Mach. Learn. Res. 2003;4:683–712. doi: 10.1162/153244304773936090. [DOI] [Google Scholar]
- 112.Clyde M.A., Ghosh J., Littman M.L. Bayesian adaptive sampling for variable selection and model averaging. J. Comput. Graph Stat. 2011;20:80–101. doi: 10.1198/jcgs.2010.09049. [DOI] [Google Scholar]
- 113.George E., Foster D. Calibration and empirical Bayes variable selection. Biometrika. 2000;87:731–747. doi: 10.1093/biomet/87.4.731. [DOI] [Google Scholar]
- 114.Yang Y. Regression with multiple candidate models: Selecting or mixing? Stat. Sin. 2003;13:783–809. [Google Scholar]
- 115.Monteith K., Carroll J.L., Seppi K., Martinez T. The 2011 International Joint Conference on Neural Networks. 2011. Turning Bayesian model averaging into Bayesian model combination; pp. 2657–2663. [DOI] [Google Scholar]
- 116.Madigan D., Raftery A.E. Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Am. Stat. Assoc. 1994;89:1535–1546. doi: 10.2307/2291017. [DOI] [Google Scholar]
- 117.Dehghannasiri R., Yoon B.-J., Dougherty E.R. Optimal experimental design for gene regulatory networks in the presence of uncertainty. IEEE ACM Trans. Comput. Biol. Bioinf. 2015;12:938–950. doi: 10.1109/TCBB.2014.2377733. [DOI] [PubMed] [Google Scholar]
- 118.Boluki S., Qian X., Dougherty E.R. Experimental design via generalized mean objective cost of uncertainty. IEEE Access. 2019;7:2223–2230. doi: 10.1109/ACCESS.2018.2886576. [DOI] [Google Scholar]
- 119.Rasmussen C.E., Williams C.K.I. The MIT Press; 2005. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning) [DOI] [Google Scholar]
- 120.Talapatra A., Boluki S., Duong T., Qian X., Dougherty E., Arróyave R. Autonomous efficient experiment design for materials discovery with Bayesian model averaging. Phys. Rev. Mater. 2018;2 doi: 10.1103/PhysRevMaterials.2.113803. [DOI] [Google Scholar]
- 121.Barsoum M.W. John Wiley & Sons; 2013. MAX Phases: Properties of Machinable Ternary Carbides and Nitrides. [DOI] [Google Scholar]
- 122.Emmerich M.T.M., André H.D., Jan W.K. 2011 IEEE Congress of Evolutionary Computation (CEC) IEEE; 2011. Hypervolume-based expected improvement: Monotonicity properties and exact computation; pp. 2147–2154. [Google Scholar]
- 123.Hohenberg P., Kohn W. Inhomogeneous electron gas. Phys. Rev. 1964;136:B864–B871. doi: 10.1103/PhysRev.136.B864. [DOI] [Google Scholar]
- 124.Kohn W., Sham L.J. Self-consistent equations including exchange and correlation effects. Phys. Rev. 1965;140:A1133–A1138. doi: 10.1103/PhysRev.140.A1133. [DOI] [Google Scholar]
- 125.Talapatra A., Boluki S., Honarmandi P., Solomou A., Zhao G., Ghoreishi S.F., Molkeri A., Allaire D., Srivastava A., Qian X., et al. Experiment design frameworks for accelerated discovery of targeted materials across scales. Front. Mater. 2019;6:82. doi: 10.3389/fmats.2019.00082. [DOI] [Google Scholar]
- 126.Bacon F. Cambridge University Press; 2000. The New Organon. [DOI] [Google Scholar]
- 127.Dehghannasiri R., Yoon B.-J., Dougherty E.R. Efficient experimental design for uncertainty reduction in gene regulatory networks. BMC Bioinf. 2015;16:S2. doi: 10.1186/1471-2105-16-S13-S2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 128.Hong Y., Kwon B., Yoon B.-J. Optimal experimental design for uncertain systems based on coupled differential equations. IEEE Access. 2021;9:53804–53810. doi: 10.1109/ACCESS.2021.3071038. [DOI] [Google Scholar]
- 129.Woo H.-M., Hong Y., Kwon B., Yoon B.-J. Accelerating optimal experimental design for robust synchronization of uncertain kuramoto oscillator model using machine learning. IEEE Trans. Signal Process. 2021;69:6473–6487. doi: 10.1109/TSP.2021.3130967. [DOI] [Google Scholar]
- 130.Broumand A., Esfahani M.S., Yoon B.-J., Dougherty E.R. Discrete optimal Bayesian classification with error-conditioned sequential sampling. Pattern Recogn. 2015;48:3766–3782. doi: 10.1016/j.patcog.2015.03.023. [DOI] [Google Scholar]
- 131.Zhao G., Dougherty E., Yoon B.-J., Alexander F., Qian X. 24th International Conference on Artificial Intelligence and Statistics (AISTATS); 2021. Bayesian active learning by soft mean objective cost of uncertainty. [Google Scholar]
- 132.Ben-Gal I., Caramanis M. Sequential DOE via dynamic programming. IIE Trans. 2002;34:1087–1100. doi: 10.1023/A:1019670414725. [DOI] [Google Scholar]
- 133.Powell W.B. Approximate Dynamic Programming: Solving the Curses of Dimensionality. 2nd edition. John Wiley & Sons, Inc.; 2011. [DOI] [Google Scholar]
- 134.Huan X., Marzouk Y.M. Sequential Bayesian optimal experimental design via approximate dynamic programming. arXiv. 2016 doi: 10.48550/arXiv.1604.08320. Preprint at. [DOI] [Google Scholar]
- 135.Zhao G., Dougherty E., Yoon B.-J., Alexander F., Qian X. 35th Conference on Neural Information Processing Systems (NeurIPS); 2021. Efficient active learning for Gaussian process classification by error reduction. [Google Scholar]
- 136.Hernandez A., Balasubramanian A., Yuan F., Mason S.A.M., Mueller T. Fast, accurate, and transferable many-body interatomic potentials by symbolic regression. npj Comput. Mater. 2019;5:112. doi: 10.1038/s41524-019-0249-1. [DOI] [Google Scholar]
- 137.Rubinstein R., Bruckstein A.M., Elad M. Dictionaries for sparse representation modeling. Proc. IEEE. 2010;98:1045–1057. doi: 10.1109/JPROC.2010.2040551. [DOI] [Google Scholar]
- 138.Bengio Y., Courville A., Vincent P. Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013;35:1798–1828. doi: 10.1109/TPAMI.2013.50. [DOI] [PubMed] [Google Scholar]
- 139.Li Y., Yang M., Zhang Z. A survey of multi-view representation learning. IEEE Trans. Knowl. Data Eng. 2019;31:1863–1883. doi: 10.1109/TKDE.2018.2872063. [DOI] [Google Scholar]
- 140.Long W., Lu Z., Cui L. Deep learning-based feature engineering for stock price movement prediction. Knowl. Base Syst. 2019;164:163–173. doi: 10.1016/j.knosys.2018.10.034. [DOI] [Google Scholar]
- 141.Kanter J.M., Veeramachaneni K. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA) IEEE; 2015. Deep feature synthesis: Towards automating data science endeavors; pp. 1–10. [DOI] [Google Scholar]
- 142.Kaul A., Maheshwary S., Pudi V. 2017 IEEE International Conference on Data Mining (ICDM) IEEE; 2017. Autolearn—automated feature generation and selection; pp. 217–226. [Google Scholar]
- 143.Khurana U., Turaga D., Samulowitz H., Srinivasan P. 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) IEEE; 2016. Cognito: Automated feature engineering for supervised learning; pp. 1304–1307. [Google Scholar]
- 144.Khurana U., Samulowitz H., Turaga D. Feature engineering for predictive modeling using reinforcement learning. Proc of AAAI 2018. 2018;32:3407–3414. [Google Scholar]
- 145.Zhang J., Jianye H., Fogelman-Soulié F., Wang Z. Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019. Automatic feature engineering by deep reinforcement learning; pp. 2312–2314. [Google Scholar]
- 146.Ghiringhelli L.M., Vybiral J., Levchenko S.V., Draxl C., Scheffler M. Big data of materials science: critical role of the descriptor. Phys. Rev. Lett. 2015;114 doi: 10.1103/PhysRevLett.114.105503. [DOI] [PubMed] [Google Scholar]
- 147.Ghiringhelli L.M., Vybiral J., Ahmetcik E., Ouyang R., Levchenko S.V., Draxl C., Scheffler M. Learning physical descriptors for materials science by compressed sensing. New J. Phys. 2017;19 doi: 10.1088/1367-2630/aa57bf. [DOI] [Google Scholar]
- 148.Ouyang R., Curtarolo S., Ahmetcik E., Scheffler M., Ghiringhelli L.M. SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys. Rev. Mater. 2018;2 doi: 10.1103/PhysRevMaterials.2.083802. [DOI] [Google Scholar]
- 149.Fan J., Lv J. Sure independence screening for ultrahigh dimensional feature space. J. Roy. Stat. Soc. B. 2008;70:849–911. doi: 10.1111/j.1467-9868.2008.00674.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 150.Tibshirani R. Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B. 1996;58:267–288. https://www.jstor.org/stable/2346178 [Google Scholar]
- 151.Xiang Z., Fan M., Vazquez Tovar G., Trehern W., Yoon B.-J., Qian X., Arróyave R., Qian X. The 35th AAAI Conference on Artificial Intelligence (AAAI 2021); 2021. Physics-constrained automatic feature engineering for predictive modeling in materials science. [Google Scholar]
- 152.Mnih V., Kavukcuoglu K., Silver D., Rusu A.A., Veness J., Bellemare M.G., Graves A., Riedmiller M., Fidjeland A.K., Ostrovski G., et al. Human-level control through deep reinforcement learning. Nature. 2015;518:529–533. doi: 10.1038/nature14236. [DOI] [PubMed] [Google Scholar]
- 153.Podryabinkin E.V., Shapeev A.V. Active learning of linearly parametrized interatomic potentials. Comput. Mater. Sci. 2017;140:171–180. doi: 10.1016/j.commatsci.2017.08.031. [DOI] [Google Scholar]
- 154.Zhang L., Lin D.-Y., Wang H., Car R., Weinan E. Active learning of uniformly accurate interatomic potentials for materials simulation. Phys. Rev. Mater. Feb 2019;3 doi: 10.1103/PhysRevMaterials.3.023804. [DOI] [Google Scholar]
- 155.Vandermause J., Torrisi S.B., Batzner S., Xie Y., Sun L., Kolpak A.M., Kozinsky B. On-the-fly active learning of interpretable Bayesian force fields for atomistic rare events. npj Comput. Mater. 2020;6:20. doi: 10.1038/s41524-020-0283-z. [DOI] [Google Scholar]
- 156.Wilson N., Willhelm D., Qian X., Arróyave R., Qian X. Batch active learning for accelerating the development of interatomic potentials. Comput. Mater. Sci. 2022;208 doi: 10.1016/j.commatsci.2022.111330. [DOI] [Google Scholar]
- 157.Chen C., Deng Z., Tran R., Tang H., Chu I.-H., Ong S.P. Accurate force field for molybdenum by machine learning large materials data. Phys. Rev. Mater. 2017;1 doi: 10.1103/PhysRevMaterials.1.043603. [DOI] [Google Scholar]