Abstract
Recently there has been an ever-increasing trend in the use of machine learning (ML) and artificial intelligence (AI) methods by the materials science, condensed matter physics, and chemistry communities. This perspective article identifies key scientific, technical, and social opportunities that the materials community must prioritize to consistently develop and leverage Scientific AI (SciAI) to provide a credible path towards the advancement of current materials-limited technologies. Here we highlight the intersections of these opportunities with a series of proposed paths forward. The opportunities are roughly sorted from scientific/technical (e.g. development of robust, physically meaningful multiscale material representations) to social (e.g. promoting an AI-ready workforce). The proposed paths forward range from developing new infrastructure and capabilities to deploying them in industry and academia. We provide a brief introduction to AI in materials science and engineering, followed by detailed discussions of each of the opportunities and paths forward.
1. A brief perspective on AI in materials science
Recent reports, reviews, symposia, and workshops have heralded machine learning (ML) and artificial intelligence (AI) methods as the next scientific paradigm in materials discovery and optimization [1-5]. Applications to materials science have exploded, spanning data analysis, knowledge extraction, and experiment selection [1, 6-9]. The numerous reasons for this trend are related to the omnipresence of ML systems in our everyday lives, the free availability of software, and the demonstrated successes in materials discovery and on-the-fly data acquisition inspired by the Materials Genome Initiative (MGI) [1, 10-12]. However, despite their recent prominence, these techniques have been applied in a variety of materials science fields since the early 1960s [13-17].
Some recent examples of the successful application of ML to materials science were demonstrated by the high-throughput experimental (HTE, also known as “combinatorial”) community. Parallel material synthesis and rapid characterization introduce a critical bottleneck in the analysis of hundreds to thousands of high-quality measurements correlated in composition, processing, and microstructure [18-21]. There have been several international efforts to standardize data formats and create data analysis and interpretation tools for large-scale data sets [22-24]. The rise of the HTE community resulted in the creation of new and creative modes of measuring properties and visualizing and interpreting data. As algorithms for automating these tasks mature, decision-making and experimental planning are emerging as new bottlenecks in the materials research process. This means that advances in AI for materials research are as important as ever for accelerating innovation in materials, for example through emerging autonomous experimental systems [25-29].
Although the application of AI is now increasingly commonplace in the materials community, we are approaching the peak of excitement and inflated expectations. Some disillusionment is inevitable, but we believe that by pursuing the following opportunities the community will more rapidly reach a steady state of widespread productive application of AI. Achieving this goal will require 1) methodological development in scientific AI, 2) significant investment in cyber-physical infrastructure, 3) commitment to measures improving trust in AI systems, and 4) workforce development.
2. Opportunities in Scientific AI
The transition from expectations to practice for AI will require development of robust Scientific AI systems that can go beyond generating leads, i.e., nudges in the right direction, to providing rich functionality that enables scientific discovery. Two opportunities to close this gap are:
developing Scientific AI systems that combine ML techniques with physical mechanisms
innovative applications of AI systems to directly derive scientific insight
A robust community of interdisciplinary materials science and engineering (MSE) and ML researchers is needed to enable the algorithmic development to support these two goals. Distributed automated laboratory systems will facilitate this development by equalizing access to cutting edge experimental materials science, providing a substrate for high-impact interdisciplinary collaboration. Materials have always been technology enablers, and currently there are many key technology areas that await materials discovery and processing solutions. Addressing these opportunities will drive and propel the required developments.
2.1. Incorporating physical mechanisms into ML models
The brute force strategy of collecting massive annotated datasets, such as those that enabled the current wave of advances in image recognition, natural language understanding, and neural translation, is untenable due to the relative scarcity of many types of materials data, and the high cost of obtaining materials data. Instead, the materials community needs to address underdeveloped material and processing representations to improve model quality and expand application of AI methods, leveraging the so-called bias-variance tradeoff [30]. Simple models fail to capture the complexity of hierarchical materials structures (i.e., they underfit due to high bias), while high-capacity models often yield pathological or trivial results for small and medium-sized datasets (i.e., they overfit due to high variance). The challenge is to introduce the right kind of bias into high-capacity models by designing input representations and model forms to reflect known invariances, equivariances, and symmetries in the domain [31-35]. In the context of scientific AI, this means incorporation of mechanistic biases to create interpretable models and learning algorithms, explicitly incorporating physical heuristics, theories, and laws into the model form. For example, the Physically Inspired Neural Network interatomic potential [36] uses a neural network to adaptively parametrize a classical interatomic potential form instead of directly modeling forces and energies. More expansively, universal differential equations [34] directly incorporate neural networks into mechanistic differential equation models.
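As a minimal illustration of introducing the right kind of bias, the sketch below fits the same small synthetic dataset with two models: one whose form encodes an Arrhenius rate law, and a flexible 1-nearest-neighbor regressor with no physical structure. The Arrhenius example, the parameter values, and all function names are assumptions chosen for illustration and are not taken from the cited works.

```python
import math
import random

K_B = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius(temp_k, prefactor, e_act):
    """Rate k = A * exp(-Ea / (kB * T))."""
    return prefactor * math.exp(-e_act / (K_B * temp_k))


def fit_arrhenius(temps, rates):
    """Least-squares line through ln(k) versus 1/T; returns (A, Ea)."""
    xs = [1.0 / t for t in temps]
    ys = [math.log(r) for r in rates]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope * K_B


def nearest_neighbor(temps, rates, temp_k):
    """Flexible baseline: return the rate measured at the closest temperature."""
    idx = min(range(len(temps)), key=lambda i: abs(temps[i] - temp_k))
    return rates[idx]


# Synthetic "measurements": assumed Ea = 1.0 eV, A = 1e13 1/s, 5 % log-normal noise.
rng = random.Random(0)
TRUE_A, TRUE_EA = 1.0e13, 1.0
train_temps = [600.0 + 25.0 * i for i in range(13)]  # 600 K to 900 K
train_rates = [arrhenius(t, TRUE_A, TRUE_EA) * math.exp(rng.gauss(0.0, 0.05))
               for t in train_temps]

fit_a, fit_ea = fit_arrhenius(train_temps, train_rates)

# Extrapolate well outside the training range.
query_temp = 1200.0
truth = arrhenius(query_temp, TRUE_A, TRUE_EA)
physics_error = abs(arrhenius(query_temp, fit_a, fit_ea) - truth) / truth
baseline_error = abs(nearest_neighbor(train_temps, train_rates, query_temp) - truth) / truth

print(f"fitted Ea = {fit_ea:.3f} eV")
print(f"relative error at {query_temp:.0f} K: physics form {physics_error:.2f}, "
      f"nearest neighbor {baseline_error:.2f}")
```

The physics-constrained form has only two free parameters, yet it extrapolates because its bias matches the assumed mechanism. The unconstrained baseline can only repeat the nearest observation.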
A key opportunity is to systematically integrate the vast implicit and explicit materials knowledge in the published literature on a per-task basis through model form specification and learning algorithm design choices. This principle is applied in a limited way in the materials AI community, but much research is needed to more fully incorporate physical intuition before ML models can extrapolate to new regions of material space. Development of knowledge graphs and ontologies that capture subject matter expertise will help to provide more actionable material representations and hierarchical material models. Differentiable programming [35] (and probabilistic programming more generally) is a promising new set of tools for coordinating and unifying complementary sources of mechanistic physical information.
In addition to incorporating mechanistic biases at the level of individual modeling tasks, scientific AI systems for materials development will require the development of hybrid machine learning systems that bridge time and length scales as well as experimental and computational paradigms. Outside of the interatomic potentials community, there are few demonstrations of material structure representations tailored for dynamic processes. It is difficult to encode certain types of metadata (environment, processing paths, heat transfer characteristics that depend on geometry, etc.) that are known to influence material properties. Vector-valued and time-varying material processing attributes (such as loading and annealing schedules) are often reduced to categorical and tabular representations. Importantly, much effort is required to address technologically important materials systems, where the complexity of material processing far exceeds that of laboratory-scale studies.
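The point about time-varying processing attributes can be made concrete with a small sketch. The code below keeps an annealing schedule as a time-resolved structure, rather than a category label, and samples it into a fixed-length vector that a model could consume. The class names, field names, and numerical values are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One linear segment of a thermal schedule (hypothetical schema)."""
    duration_s: float
    start_temp_k: float
    end_temp_k: float


@dataclass
class ThermalSchedule:
    segments: List[Segment]

    def total_time(self) -> float:
        return sum(seg.duration_s for seg in self.segments)

    def temperature_at(self, t_s: float) -> float:
        """Linearly interpolate the temperature at time t_s (seconds)."""
        if t_s < 0 or t_s > self.total_time():
            raise ValueError("time outside schedule")
        elapsed = 0.0
        for seg in self.segments:
            if t_s <= elapsed + seg.duration_s:
                frac = (t_s - elapsed) / seg.duration_s
                return seg.start_temp_k + frac * (seg.end_temp_k - seg.start_temp_k)
            elapsed += seg.duration_s
        return self.segments[-1].end_temp_k

    def sample(self, n_points: int) -> List[float]:
        """Fixed-length vector of temperatures, usable as a model input."""
        total = self.total_time()
        step = total / (n_points - 1)
        return [self.temperature_at(min(i * step, total)) for i in range(n_points)]


# Ramp to 800 K in 600 s, hold for 1800 s, cool to 300 K in 1200 s (illustrative values).
anneal = ThermalSchedule([
    Segment(600.0, 300.0, 800.0),
    Segment(1800.0, 800.0, 800.0),
    Segment(1200.0, 800.0, 300.0),
])
print(anneal.temperature_at(300.0))  # temperature midway through the ramp
print(anneal.sample(7))
```

Two schedules with the same label (for example "annealed") but different ramp rates yield different sampled vectors, so the model retains information that a categorical encoding would discard.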
2.2. Deriving scientific insight from AI models
In many ways, current applications of AI in materials science focus more on solving engineering and design problems than on directly deriving fundamental scientific insight from data. Current materials science AI applications predominantly focus on lead generation and black-box optimization. To realize the full potential of AI to help us more efficiently and effectively practice scientific inquiry, the materials community must develop AI systems that can represent, evaluate, and perform inference about physical mechanisms underlying observational data.
In the short term, creative application of existing ML methods is enabling new avenues to accelerate scientific discovery. Active learning, for example, might be applied to identify a set of optimal experiments to disambiguate a list of potential physical theories, as is being explored in the social sciences [37]. Similarly, algorithmically driven experimentation could be used to search for counterexamples to heuristic models or physical theories, potentially providing materials scientists with valuable insights into why these heuristics and theories break down [38]. Furthermore, much of the existing materials knowledge base is in the form of implicit institutional knowledge and expert intuition. Thus, development of “human-in-the-loop” methodologies leveraging real-time model visualization, introspection, and feedback must not be overlooked.
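A minimal sketch of experiment selection for disambiguating candidate theories is given below. It assumes two competing rate laws for oxide growth, picks the candidate experiment where their predictions disagree most, and then updates a posterior over the two hypotheses with a Gaussian likelihood. The rate constants, noise level, and the observed value are illustrative assumptions. The selection rule is a simple heuristic and not the method of [37].

```python
import math

# Two competing rate laws for oxide thickness x(t); constants are illustrative assumptions.
HYPOTHESES = {
    "linear": lambda t_h: 1.0 * t_h,                # x = k_l * t
    "parabolic": lambda t_h: math.sqrt(4.0 * t_h),  # x = sqrt(k_p * t)
}
NOISE_STD = 0.5  # assumed measurement noise, same units as x


def most_discriminating(candidates, hypotheses):
    """Pick the candidate experiment where the hypotheses disagree the most."""
    def spread(t_h):
        preds = [model(t_h) for model in hypotheses.values()]
        return max(preds) - min(preds)
    return max(candidates, key=spread)


def update_posterior(prior, hypotheses, t_h, observed, noise_std):
    """Bayes update over hypotheses with a Gaussian measurement likelihood."""
    unnorm = {}
    for name, model in hypotheses.items():
        z = (observed - model(t_h)) / noise_std
        unnorm[name] = prior[name] * math.exp(-0.5 * z * z)
    total = sum(unnorm.values())
    return {name: weight / total for name, weight in unnorm.items()}


candidate_times = [1.0, 2.0, 4.0, 8.0, 16.0]  # hours
prior = {"linear": 0.5, "parabolic": 0.5}

next_time = most_discriminating(candidate_times, HYPOTHESES)
observed_thickness = 8.3  # stand-in for the value returned by the experiment
posterior = update_posterior(prior, HYPOTHESES, next_time, observed_thickness, NOISE_STD)
print(next_time, posterior)
```

With these constants the two laws coincide at 4 hours, so a measurement there would carry no information about which law holds. The selection rule avoids that point automatically.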
An important next step in scientific AI is the development of new AI methods tailored for scientific discovery. This includes methods that can infer physical relationships, mechanisms, and principles from data, potentially drawing from the fields of causal discovery [39] and probabilistic programming [40]. At the “Strong AI” extreme of this line of inquiry, hypothetical AI systems will be expected to formulate and test scientific theories to credibly identify new scientific paradigms. Even if such systems can be constructed, they will still need to overcome the Pauling Problem [41], where physical bias overwhelms new evidence of worldview-breaking phenomena such as superconductivity, 2D materials, or quasicrystals.
2.3. Paths forward
Cross-disciplinary Collaboration:
Create funding opportunities for cross-disciplinary research at the cutting edge of MSE, ML, AI, and robotics, promoting the communication skills needed to identify and frame mutually interesting research.
Collaborate to develop multiscale materials and knowledge representations and generative modeling techniques
Create career opportunities at the research associate and technician levels in applied ML and Software Engineering.
Explore probabilistic programming methods to meld physical and phenomenological modeling with machine learning.
Develop objective methods for identification and evaluation of the most informative or unusual datum in any given scientific dataset.
Autonomous research platforms:
Develop open autonomous research platforms to provide a substrate for developing and deploying materials AI methods on large-scale materials design problems.
Provide opportunities for the broad materials and AI communities to have access to these platforms, lowering the barrier to entry to materials discovery and design.
Reference data:
Develop challenge problems to focus innovation and collaboration on difficult scientific discovery problems, i.e. the materials discovery and design analog to Large Scale Visual Recognition Challenge [42].
Compile materials datasets with annotated physical rules and heuristics.
3. Opportunities in Cyber-Physical infrastructure
Realization of scientific AI’s potential in materials science and engineering will require advanced cyber-physical infrastructure. We have identified four major opportunities to facilitate this development:
Improved standards and coordination in materials data infrastructure
Development of open and interoperable API-enabled experimental tools
Development of scalable on-demand synthesis/characterization capabilities
Democratization of research platforms
An improved materials data infrastructure will enable data stewardship throughout the research data lifecycle, which will greatly improve the accessibility of data and metadata to both AI systems and human researchers. Fully automatable synthesis and characterization tools that execute standardized experimental protocols will improve reproducibility while seamlessly capturing provenance. This will decrease the cost of generating new data and knowledge, and will support real-time distributed and autonomous experimentation. Development of new impedance-matched, on-demand synthesis and characterization techniques will be critical to expand the applicability of this approach. The fundamental question is: how do we rethink the “synthesize-then-characterize” framework when actionable knowledge can be generated at a rate faster than it takes to transfer the specimens? Finally, we must develop organizational frameworks to democratize access to these new experimental, computational, and data resources, something comparable to the user facility paradigm at high performance computing centers. Ultimately, this framework would enable scientists and engineers to focus more of their time on conceiving, planning, and executing scientific studies.
3.1. Standards and coordination in materials data infrastructure
Over the past decade, several reports have identified materials data infrastructure as a critical gap limiting innovation in materials research [43-45]. These reports consistently highlight the need for long-term support of shared data services, improved coordination among government agencies, publication of all research data (novel as well as null) with robust metadata, and improved development of community standards for these data and metadata. Findable, Accessible, Interoperable, and Reusable (FAIR) data principles [46] can guide the materials science and engineering community in developing infrastructure suited to collaborative and adaptive research. However, the complexity of materials science and engineering data poses unique challenges to the adoption of FAIR principles. International groups, such as the Research Data Alliance, are fully embracing FAIR Data Principles and are extending them beyond data and metadata, to data types, instruments, and physical samples. Currently, the materials science and engineering community does not have robust frameworks for assigning persistent identifiers to data types, instruments, physical samples, and data and metadata within a larger dataset. Furthermore, once persistent identifiers are assigned to smaller units within a larger dataset, the community will face challenges in effectively and uniformly citing data.
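To illustrate what persistent identifiers for samples, instruments, and data could look like in practice, the sketch below links a measurement record to the sample and instrument that produced it and serializes the result to JSON. The schema, the field names, and the use of random UUIDs as stand-in identifiers are all assumptions. A real deployment would rely on a registered identifier service and a community-agreed schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List


def new_pid() -> str:
    """Stand-in persistent identifier (a random UUID expressed as a URN)."""
    return uuid.uuid4().urn


@dataclass
class Sample:
    description: str
    pid: str = field(default_factory=new_pid)


@dataclass
class Instrument:
    name: str
    pid: str = field(default_factory=new_pid)


@dataclass
class Measurement:
    sample_pid: str
    instrument_pid: str
    quantity: str
    unit: str
    values: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)
    pid: str = field(default_factory=new_pid)


sample = Sample("thin-film library, pad 17")
xrd = Instrument("laboratory X-ray diffractometer")
scan = Measurement(
    sample_pid=sample.pid,
    instrument_pid=xrd.pid,
    quantity="diffracted intensity",
    unit="counts",
    values=[12.0, 15.0, 240.0, 18.0],
    metadata={"note": "illustrative record only"},
)

# Each unit carries its own identifier, so it can be cited independently.
record = json.dumps(
    {"sample": asdict(sample), "instrument": asdict(xrd), "measurement": asdict(scan)},
    indent=2,
)
print(record)
```

Because the measurement refers to the sample and instrument by identifier rather than by free text, the links survive when the three records are stored in different repositories.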
3.2. Open and interoperable API-enabled experimental tools
Critical bottlenecks for adaptive science and autonomous control of experimental systems are (i) a widespread absence of application programming interfaces (APIs) to interact with laboratory equipment, (ii) lack of a unified language for experimental workflow protocols, and (iii) lack of standardized and open data formats to facilitate accessibility and interoperability. Currently, downstream researchers are developing ad hoc hardware interfaces, duplicating effort and often incurring substantial technical debt. Materials synthesis and characterization workflows are typically manifested in custom software rather than in composable and machine-actionable data representations. Finally, experimental equipment is supported by a diverse collection of vendor-specific interfaces and formats, which may not be well-documented, and may be difficult to use independently from vendor-developed software frameworks. This presents an unnecessary impediment to innovation in real-time data analysis and adaptive experimental planning and control. There is a significant need to facilitate industry-led development of standards for open and machine-actionable instrument APIs, executable protocols for experimental workflows, and file formats.
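The distinction between workflows embedded in custom software and workflows expressed as machine-actionable data can be sketched as follows. The interface, the action names, and the simulated furnace are hypothetical. They do not correspond to any existing vendor API or standard and only show the shape such an arrangement might take.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class Instrument(ABC):
    """Hypothetical minimal interface a vendor-neutral instrument API might expose."""

    @abstractmethod
    def execute(self, action: str, **params: Any) -> Dict[str, Any]:
        """Run one action and return a machine-readable result."""


class SimulatedFurnace(Instrument):
    """Stand-in device so the sketch runs without hardware."""

    def __init__(self) -> None:
        self.temperature_k = 300.0

    def execute(self, action: str, **params: Any) -> Dict[str, Any]:
        if action == "set_temperature":
            self.temperature_k = float(params["target_k"])
            return {"status": "ok", "temperature_k": self.temperature_k}
        if action == "read_temperature":
            return {"status": "ok", "temperature_k": self.temperature_k}
        return {"status": "error", "message": f"unknown action {action!r}"}


def run_protocol(protocol: List[Dict[str, Any]],
                 instruments: Dict[str, Instrument]) -> List[Dict[str, Any]]:
    """Execute a declarative protocol step by step and keep a provenance log."""
    log = []
    for index, step in enumerate(protocol):
        device = instruments[step["instrument"]]
        result = device.execute(step["action"], **step.get("params", {}))
        log.append({"step": index, **step, "result": result})
        if result["status"] != "ok":
            break  # stop on the first failure; later steps are not attempted
    return log


# The protocol is plain data, so it can be stored, shared, and validated.
protocol = [
    {"instrument": "furnace", "action": "set_temperature", "params": {"target_k": 773.0}},
    {"instrument": "furnace", "action": "read_temperature"},
]
provenance = run_protocol(protocol, {"furnace": SimulatedFurnace()})
for entry in provenance:
    print(entry)
```

Swapping the simulated furnace for a real driver would leave the protocol and the provenance log unchanged, which is the practical benefit of separating the workflow description from the hardware interface.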
3.3. Scalable on-demand synthesis/characterization capabilities
Current materials synthesis and characterization tools are not designed for low latency and high agility between experiments, leading to a significant time-constant mismatch with the algorithmic decision-making that is enabling autonomous experimentation. Currently, an individual high-throughput experimental campaign is restricted to depositing a monolithic combinatorial library under (typically) identical processing conditions and characterizing each sample within the library for the composition, structure, and multiple figures of merit. While high-throughput synthesis techniques enabled revolutionary improvements in the rapid exploration of process-structure-property relationships [47-51], library generation now presents a major bottleneck due to its high latency and the intensity of human labor involved. Therefore, low latency, automated synthesis platforms, integrated with multimodal characterization tools, should be developed. AI also presents unprecedented opportunities for novel adaptive experiments enabled by in situ automated perception and data analysis, e.g., through real-time identification, tracking, and subsequent fine-grained analysis of features of interest [52]. For low-latency decision-making, it may be necessary to leverage edge computing [53], e.g. running a deep learning model directly on detector output.
3.4. Resource Democratization
Large materials research user facilities (e.g. Advanced Photon Source, NERSC) have demonstrated a model for decoupling the construction and operation of experimental tools and computing infrastructure from the use of those tools by scientific subject matter experts. Similarly, the adaptive synthesis and integrated multimodal characterization platforms described above will require significant capital investment to invent, develop, build, and operate. Therefore, the materials community, and the greater community at large, is presented with an opportunity to develop an organizational and technological framework to facilitate collaboration between theoretical and experimental research groups, and to lower the barrier for cross-material-system, cross-synthesis-method, and cross-modality studies. This framework would also provide increased access to cutting edge experimental materials capabilities to new user communities from underrepresented groups and smaller institutions.
In addition to the cyber-physical infrastructure challenges described above, experimental synthesis and characterization methods are very specific to a given class of materials. There is unlikely to be even one brick-and-mortar facility that allows researchers to study several materials classes. The Materials Innovation Platforms (MIPs) at the National Science Foundation [54] provide one avenue for resource democratization spanning from predictive synthesis to characterization. These topical platforms are well suited to serve as highly-connected experimental nodes in a research network where information is shared through repositories with community-designed schemas and communication protocols. MIPs might also provide a means of performing the expansive microstructure and interface characterization needed to explore property/performance landscapes across a diverse set of critical materials systems, where microstructure and interfaces strongly mediate material performance.
3.5. Paths Forward
Consortia:
Develop community standards to enable FAIR data and equipment interoperability, while learning from successful examples, such as MTConnect
Design, deploy, operate, and provide democratized access to distributed autonomous laboratory platforms and broader cyber-physical infrastructure, as advocated in the high throughput experimental materials collaboratory (HTE-MC) concept [55].
Launch new funding initiatives to support creation of materials-focused AI Research Centers and Mission-Driven AI Laboratories as described in [56].
Autonomous materials science:
Design for automation: Rethink the “high throughput” materials synthesis methodology portfolio in light of new capabilities in real-time automated perception, modeling, decision making, and the need for real-time closed-loop feedback from multiple structure and property probes.
Leverage automation: identify new opportunities to turn ex situ analysis methods into AI-driven in situ adaptive techniques.
4. Opportunities in Trust
Promoting community-wide trust in Scientific AI results is key to reducing the impact of increased disillusionment. We have identified three important opportunities for improving confidence in scientific AI as applied to materials:
Develop and enforce community wide standards for reporting uncertainty from archival data to final model predictions.
Create a scientific culture that values and promotes reproducibility, validation, and verification of published data
Work towards improving the interpretability of AI models and providing a solid foundation towards trust in their predictions
Creating a robust interdisciplinary community spanning MSE and computer science will create opportunities for real-time methods for exploring materials representations, permitting researchers to have confidence that the final model reflects solid physical principles. Generation of reference data sets and materials data challenge problems will allow benchmarking of new models/algorithms using specially designed performance indicators relevant to the specific AI task. Dissemination of best practices will create a community of informed skeptics that request open code and datasets, look for task-appropriate performance indicators, and are alert to issues of dataset and modeling bias.
4.1. Uncertainty: archival data to final model predictions
Current applications of AI in materials science largely ignore the uncertainty of the raw data used to train models. Leveraging larger scale datasets derived from the open literature and published materials databases will require systematic evaluation of source and reporting of measurement and model uncertainty. Reporting uncertainty is introduced by incomplete collection, storage, and/or publication of relevant data, metadata, experimental uncertainties, and potential spurious covariates. At any point where manual annotations from human experts (or non-experts) enter the process, one must also account for annotation uncertainty.
Robust uncertainty quantification is particularly important for robust data fusion and transfer learning based on multiple experimental and simulation-based information sources. These will play an important role in scientific AI for materials research because of the diversity of information needed for modeling complex materials, the relative expense of experiments, and the dominance of simulation results in the current body of openly available materials data. Each material characterization and simulation technique has known ranges of applicability and sources of bias and uncertainty, but these are not typically expressed in a quantitative form amenable to seamless composition of simulation and experiment. This presents an opportunity to develop methods for propagating such uncertainties through AI models, while providing guard rails that alert users to known limitations of the input data. A familiar example is to use caution when interpreting the role of high-throughput DFT bandgaps (which exhibit well-known systematic biases [57]) as model inputs, especially when modeling properties that arise from unrelated phenomena, such as melting temperature. There is a critical need for validation and verification data (e.g. [58]) to benchmark data fusion and transfer learning efforts, and to assess the physicality of the predictions of scientific AI.
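As a minimal sketch of propagating input uncertainty through a model, the code below pushes an uncertain grain-size measurement and an uncertain fitted coefficient through the Hall-Petch relation by Monte Carlo sampling, and compares the result with first-order (linearized) propagation. All numerical values are illustrative assumptions, and the two inputs are assumed to be independent and Gaussian.

```python
import math
import random
import statistics


def hall_petch(grain_size_um, sigma0_mpa, k_mpa_sqrt_um):
    """Yield strength from grain size: sigma_y = sigma_0 + k / sqrt(d)."""
    return sigma0_mpa + k_mpa_sqrt_um / math.sqrt(grain_size_um)


# Illustrative inputs as (mean, standard deviation); none are measured values.
GRAIN_SIZE = (10.0, 0.5)  # micrometres, e.g. from a microscopy measurement
K_COEFF = (600.0, 30.0)   # MPa * um^0.5, e.g. from a fitted model
SIGMA0 = 100.0            # MPa, treated as exact in this sketch

rng = random.Random(42)
samples = []
for _ in range(20000):
    d = rng.gauss(*GRAIN_SIZE)
    k = rng.gauss(*K_COEFF)
    samples.append(hall_petch(d, SIGMA0, k))

mc_mean = statistics.fmean(samples)
mc_std = statistics.stdev(samples)

# First-order (linearized) propagation for comparison.
d0, sd = GRAIN_SIZE
k0, sk = K_COEFF
dsigma_dd = -0.5 * k0 * d0 ** -1.5
dsigma_dk = d0 ** -0.5
linear_std = math.sqrt((dsigma_dd * sd) ** 2 + (dsigma_dk * sk) ** 2)

print(f"Monte Carlo: {mc_mean:.1f} +/- {mc_std:.1f} MPa")
print(f"Linearized std: {linear_std:.1f} MPa")
```

Sampling makes no assumption about linearity, so the same loop applies when the downstream model is an ML predictor rather than a closed-form relation. The linearized estimate is cheaper but is only reliable when the input uncertainties are small relative to the curvature of the model.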
The outputs of ML models also have uncertainty related to the model selection and fitting process. This kind of uncertainty must be systematically propagated through a larger pipeline of interlinked physical and ML models. Unbiased assessments of the full model uncertainty from raw data through final predictions are needed to determine with confidence whether it is reasonable to trust the predictions of a machine learning pipeline. Furthermore, well-calibrated uncertainty estimates are crucial to the performance of active learning systems, which rely on quantification of model uncertainty to identify experiments that are likely to be informative.
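One simple way to check whether uncertainty estimates are well calibrated is to compare the nominal coverage of predictive intervals with the fraction of held-out observations that actually fall inside them. The sketch below assumes Gaussian predictive distributions. The data are made up for illustration and the function name is hypothetical.

```python
from statistics import NormalDist


def interval_coverage(means, stds, observed, level):
    """Fraction of observations inside the central `level` predictive interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = sum(1 for m, s, y in zip(means, stds, observed) if abs(y - m) <= z * s)
    return inside / len(observed)


# Illustrative predictions (mean 0, std 1) and held-out observations.
observed = [-2.5, -0.5, 0.1, 0.3, 0.9, 1.5, -1.2, 0.0, 0.7, 2.2]
means = [0.0] * len(observed)
stds = [1.0] * len(observed)

for level in (0.5, 0.8, 0.95):
    print(level, interval_coverage(means, stds, observed, level))
```

Empirical coverage well below the nominal level suggests overconfident uncertainty estimates, and coverage well above it suggests overly conservative ones. Either case can mislead an active learning loop that ranks candidate experiments by predicted uncertainty.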
4.2. Reproducibility, validation, and verification
Ensuring the reproducibility of scientific AI in materials research depends critically on transparency in publication, attention to correct methodology in evaluating results, and independent testing and verification of model predictions [59]. We must develop a strong culture of scrutinizing modeling assumptions, checking for due diligence in training procedures, and verifying that ML models are not being applied outside their regime of applicability.
Recent development of open libraries (Matminer, TPOT), data repositories and platforms (MP, JARVIS, AFLOW, OQMD, Citrination, MDF, NOMAD and AIIDA), and paper repositories are significantly increasing the accessibility and reproducibility of materials research. However, manuscripts often do not fully document model hyperparameters, or the model selection and tuning process used, and data and software are not commonly made available. This can make it difficult to evaluate whether the results suffer from overfitting or information leakage, and impossible for independent verification or comparison with other works. Researchers using AI methods should investigate and publish the failure modes of the models they use, as this can promote improvement in, and trust of, AI methods. For any given modeling task, choice of appropriate performance metrics is of paramount importance [60]. Metrics that account for dataset bias are particularly important in the face of systemic publication bias in favor of “positive” scientific results, community pursuit of “lead material” derivatives, and in modeling phenomena governed by rare materials features (such as fatigue crack initiation).
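Information leakage often arises when closely related entries, such as several measurements of the same chemical system, end up on both sides of a train/test split. The sketch below shows one common safeguard, a group-aware split that keeps each group entirely in one partition. The record fields and values are hypothetical.

```python
import random
from collections import defaultdict


def grouped_split(records, group_key, test_fraction, seed=0):
    """Split so that every group lands entirely in train or entirely in test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec)
    names = sorted(groups)
    random.Random(seed).shuffle(names)
    n_test = max(1, round(test_fraction * len(names)))
    test_names = set(names[:n_test])
    train = [r for name in names if name not in test_names for r in groups[name]]
    test = [r for name in names if name in test_names for r in groups[name]]
    return train, test


# Hypothetical dataset: several measurements per chemical system.
records = [
    {"system": "Fe-Ni", "composition": 0.10, "hardness_gpa": 2.1},
    {"system": "Fe-Ni", "composition": 0.12, "hardness_gpa": 2.2},
    {"system": "Cu-Zn", "composition": 0.30, "hardness_gpa": 1.4},
    {"system": "Cu-Zn", "composition": 0.32, "hardness_gpa": 1.5},
    {"system": "Al-Mg", "composition": 0.05, "hardness_gpa": 0.9},
    {"system": "Ti-Al", "composition": 0.48, "hardness_gpa": 3.8},
    {"system": "Ti-Al", "composition": 0.50, "hardness_gpa": 3.9},
]

train, test = grouped_split(records, "system", test_fraction=0.25)
print("train systems:", sorted({r["system"] for r in train}))
print("test systems:", sorted({r["system"] for r in test}))
```

Reporting which grouping was used, together with the random seed, is part of the documentation that allows independent verification of the reported performance.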
Finally, much of the materials AI literature describes proof-of-concept work applied to a single material system, and experimental validation of predictions is often deferred to follow-up studies. In contrast, many computer science venues expect methods papers to demonstrate generalizable results on multiple datasets and/or multiple tasks. Addressing this problem will entail finding ways to lower the barrier for groups to collaborate. Creation of a reproducibility and validation consortium would facilitate the collaboration process and potentially lead to extensive use of shared resources throughout the materials research landscape.
4.3. Establishing Interpretability and Trust
Models and theories are fundamental to the scientific method, and scientists expect to be able to rationalize predictions and discoveries by explaining observations through an underlying phenomenon or mechanism. Thus, the interpretability of scientific AI models is necessary to establish sufficient trust in AI methods for widespread scientific application. Interestingly, trust and interpretability currently lack consensus definitions in computer science and psychology [61]. The challenge of interpretability lies in balancing faithful representation of the model’s mechanisms and the ease of intuitive understanding by a human [62], while trust corresponds to a user’s willingness to accept or reject model predictions relative to the baseline error rate of the model [61].
The most common scenario in AI-driven materials science involves completely opaque previously-generated models, for example in a process-oriented environment [63]. Here the user does not have access to the full descriptor set or material representation, may not know the model form, and only has access to the final prediction. Thus the distribution of user trust levels may have a large variance. Informed trust in this scenario must be gained through meticulous empirical validation procedures. Feature importance ranking provides some level of insight into a model, but does not go far enough to support claims of physical realism; this interpretability tool is not typically robust in the face of correlated or spurious inputs. Great opportunities exist to boost the interpretability of AI models by providing output in the form of either a human interpretable series of selection criteria (e.g. a simple decision tree/process flow diagram) [64], a set of physically meaningful equations, or a textual explanation. At this level of interpretability expert opinions could be built transparently into the framework through extensive interactions and the trust in the model outputs will be increased.
4.4. Paths Forward
Cross-disciplinary Collaboration:
Develop and deploy real-time algorithms for exploring interpretable material representations during research campaigns.
Design human-in-the-loop methodologies for quantifying interpretability and trust.
Reference Data:
Develop and adopt common benchmark datasets and performance indicators for measuring and comparing methodological progress.
Create dedicated funding mechanisms for experimental validation of materials predictions
Best Practices:
Reviewers should insist on full and open access publication of source code, machine-readable training data, and artifacts such as trained models from publicly funded research.
Reviewers should insist on task-appropriate performance indicators and fully documented research protocols.
Identify and quantify bias / variance issues in datasets
Assess dataset and source bias through round-robin type studies to establish reproducible results.
Create community accepted benchmarks for fusing experimental and computational data with uncertainty and applicability propagation through the model training, testing, and interpretation pipeline.
5. Opportunities in Workforce Development
There is an urgent need for workforce development to ensure that AI techniques are introduced into the materials science workflow with the appropriate level of scientific rigor. Briefly, there are opportunities in:
Educating the next generation workforce to be conversant in AI techniques and their application to materials science.
Expanding skills within the current workforce, enabling them to effectively mentor the next-generation workforce.
Adopting an open data culture
Here consortia will play an important role by developing open source educational materials, hosting bootcamps, and introducing workshop tracks at professional meetings, creating learning opportunities and materials that can be disseminated up and down the educational tiers. Reference [65] summarizes the current status in these areas. Likewise, the definition, publication, and demonstration of best practices in AI (e.g., [5]) will go a long way toward increasing awareness and trust within the community.
5.1. Educating the next-generation workforce
Materials science curricula are in need of urgent restructuring to produce a competitive next generation workforce [65, 66]. This restructuring needs to take into account the level of skills transferability needed throughout the overall data landscape, in addition to direct application to materials research. Traditional materials science education contains few required courses in statistical methods and programming. This is a major limitation on the adoption of ML techniques by the materials science community, as graduates lack foundational knowledge and skills.
At the undergraduate level, there are few treatments of the application of AI to materials science available for developing course modules, though graduate programs and standalone summer courses are rapidly expanding [65]. One critical need is the development of open data/code repositories that provide “plug and play” modules to augment the current materials science undergraduate curriculum. GaTech [67] and recently developed ML content on nanoHUB [68-70] are excellent early examples of this educational model. Open educational resources, in addition to formalized courses providing a more rigorous introduction to research computing and statistical methods, are needed to create a BS-level workforce capable of implementing ML.
5.2. Expanding skills within the current workforce
At the graduate and post-graduate level, there is an urgent need for salient feedback on the relevance of models and their outputs. Mid- to late-career materials scientists might feel unprepared to mentor researchers applying ML techniques to their research. This can lead to naively trusting (or dismissing out of hand) results from ML workflows, or to feeling unequipped to practice informed skepticism.
There is a great need for a new professional track in the materials field, since federally funded data-intensive centers and facilities will start building both physical and cyber data infrastructure. At present, few data technicians are available. Establishing a few training pilots across the country among undergraduate institutions and community colleges will provide the workforce needed to accomplish this task. If the MGI/AI visions are to be realized, workforce training pilots are needed immediately.
5.3. Adopting an open data culture
Data science has proven to be a democratizing force in a variety of fields, notably in astronomy, bioinformatics, and high energy physics. In an open data culture, data and metadata are rigorously acquired and deposited in standardized open repositories that are also accessible to low-capacity research institutions. Adopting the open data paradigm will afford community colleges and low-capacity research institutions membership in the materials research community and greatly contribute to diversity.
5.4. Paths Forward
Consortia:
Develop open source educational materials (e.g., https://datacarpentry.org/) broadly targeting other opportunities throughout this document. Educational materials should leverage and reference known best practices. The materials, associated data, and code should be published so that they are indexed by internet search engines to maximize visibility.
Host bootcamps (e.g., NIST’s MLMR), webinars, and hackathons framed around AI usage in materials science.
Introduce a “workshop track” at major materials conferences for students and researchers to acquire and practice new skills.
Support internships (e.g., NIST’s SURF program) for students and researchers to develop real-world experience.
Best Practices:
Engage with stakeholders to define, publish, and demonstrate best practices in the use of AI in materials science and engineering.
6. Summary and Outlook
In the previous sections, we provided a perspective on the application of AI in materials science and engineering and outlined four overarching opportunities for potential advancement within the community. We have proposed five cross-cutting paths forward for each opportunity:
Cross-disciplinary Collaboration
Reinvigorated collaboration among the domains of materials science and engineering, computer science, and data science will advance state-of-the-art solutions for scientific AI and cyber-physical infrastructure while enabling trust in AI.
Autonomous Research Platforms
The development and deployment of diverse autonomous research platforms will enable implementation and evaluation of new technology in scientific AI and cyber-physical infrastructure by the rapid generation of high-quality experimental materials data. Connecting these platforms will create compound network effects that increase the leverage of any single experiment or calculation.
Reference Data
The creation of new reference and challenge datasets will enable the broader community to develop scientific AI and increase trust in AI, just as the classic MNIST handwritten digit database [71] has enabled these outcomes in the broader STEM community.
Consortia
The creation of new consortia will engage stakeholders in industry, government, and academia to provide economically sustainable frameworks for deploying and operating cyber-physical infrastructure and for expanding the AI skills of the current and future workforce, which will boost consortia member trust in AI.
Best Practices
The creation of stakeholder-led standards and best practices will enable trust in AI and foster a workforce that understands how to use AI effectively.
If the community makes coordinated efforts in these areas, we can anticipate rapid acceleration of materials discovery and process optimization, which will open new pathways for technological advancement in sustainable development, transportation, water security, medicine, and other technologies central to human welfare.
Figure 1: Opportunities and proposed paths forward that will enable broader and more effective use of AI in materials science and engineering. The blue intersections represent areas of particular importance to be discussed in this paper.
Footnotes
Disclosures
This article was prepared while E.M. Campo was employed at the National Science Foundation. The opinions expressed in this article are the author’s own and do not reflect the view of the National Science Foundation.
Data Availability
Data sharing is not applicable to this article as no new data were created or analysed in this study.
References
- [1] Agrawal Ankit and Choudhary Alok. “Perspective: Materials Informatics and Big Data: Realization of the ‘Fourth Paradigm’ of Science in Materials Science”. In: APL Materials 4.5 (2016), p. 053208. DOI: 10.1063/1.4946894.
- [2] Kalidindi Surya R and De Graef Marc. “Materials data science: current status and future outlook”. In: Annual Review of Materials Research 45 (2015), pp. 171–193.
- [3] Dimiduk Dennis M, Holm Elizabeth A, and Niezgoda Stephen R. “Perspectives on the impact of machine learning, deep learning, and artificial intelligence on materials, processes, and structures engineering”. In: Integrating Materials and Manufacturing Innovation 7.3 (2018), pp. 157–172.
- [4] Schmidt Jonathan et al. “Recent advances and applications of machine learning in solid-state materials science”. In: npj Computational Materials 5.1 (2019), pp. 1–36.
- [5] Wang Anthony Yu-Tung et al. “Machine Learning for Materials Scientists: An introductory guide towards best practices”. In: (2020).
- [6] Holdren John P et al. “Materials genome initiative strategic plan”. In: Washington DC: Office of Science and Technology Policy; 6 (2014).
- [7] Aspuru-Guzik A and Persson K. “Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods and Artificial Intelligence.” In: Mission Innovation: Innovation Challenge 6 (2018).
- [8] Workshop on Artificial Intelligence Applied to Materials Discovery and Design. Tech. rep. DOE Advanced Manufacturing Office, 2017. URL: https://www.energy.gov/sites/prod/files/2018/03/f49/AI%20Applied%20to%20Materials%20Discovery%20and%20Design_Workshop%20Summary%20Report.pdf
- [9] Workshop Report on Basic Research Needs for Scientific Machine Learning: Core Technologies for Artificial Intelligence. Tech. rep. DOE Office of Scientific and Technical Information, 2019. DOI: 10.2172/1478744. URL: https://www.osti.gov/biblio/1478744/
- [10] Aziza Bruno. “Machine Learning And Data – Where You’d Least Expect It”. In: Forbes Magazine (2018). URL: https://www.forbes.com/sites/ciocentral/2018/10/30/machine-learning-data-where-youd-least-expect-it/#5c4fc16e9871
- [11] Lookman Turab, Alexander Francis J, and Bishop Alan R. “Perspective: Codesign for materials science: An optimal learning approach”. In: APL Materials 4.5 (2016), p. 053501.
- [12] Ren Fang et al. “Accelerated discovery of metallic glasses through iteration of machine learning and high-throughput experiments”. In: Science Advances 4.4 (2018), eaaq1566.
- [13] Hussey SJ, Placek PL, and Schack CH. An Introduction to Statistical Design of Experiments in Metallurgical Research. Tech. rep. 1963.
- [14] De Wilde WP and Sol H. “Anisotropic Material Identification Using Measured Resonant Frequencies of Rectangular Composite Plates”. In: Composite Structures 4. Dordrecht: Springer Netherlands, 1987, pp. 317–324. DOI: 10.1007/978-94-009-3457-3_24. URL: http://link.springer.com/10.1007/978-94-009-3457-3_24
- [15] Burati JL Jr, Antle CE, and Willenbrock JH. “Development of a Bayesian Acceptance Approach for Bituminous Pavements”. In: Transportation Research Record 924 (1983).
- [16] Teti R and Caprino G. “Prediction Of Composite Laminate Residual Strength Based On A Neural Network Approach”. In: WIT Transactions on Information and Communication Technologies 6.8 (January 1994). DOI: 10.2495/AI940071. URL: https://www.witpress.com/elibrary/wit-transactions-on-information-and-communication-technologies/6/10933
- [17] Bhadeshia HKDH. “Neural Networks in Materials Science.” In: ISIJ International 39.10 (1999), pp. 966–979. ISSN: 0915-1559. DOI: 10.2355/isijinternational.39.966.
- [18] Long CJ et al. “Rapid structural mapping of ternary metallic alloy systems using the combinatorial approach and cluster analysis”. In: Review of Scientific Instruments 78.7 (2007), p. 072217.
- [19] Long CJ et al. “Rapid identification of structural phases in combinatorial thin-film libraries using x-ray diffraction and non-negative matrix factorization”. In: Review of Scientific Instruments 80.10 (2009), p. 103902.
- [20] Kusne Aaron Gilad et al. “On-the-fly machine-learning for high-throughput experiments: search for rare-earth-free permanent magnets”. In: Scientific Reports 4 (2014).
- [21] Suram Santosh K et al. “Automated phase mapping with AgileFD and its application to light absorber discovery in the V–Mn–Nb oxide system”. In: ACS Combinatorial Science 19.1 (2016), pp. 37–46.
- [22] Koinuma Hideomi. “Combinatorial materials research projects in Japan”. In: Applied Surface Science 189.3-4 (2002), pp. 179–187.
- [23] Lippmaa Mikk et al. “On-line data management for high-throughput experimentation”. In: MRS Online Proceedings Library Archive 894 (2005).
- [24] Chikyow Toyohiro. Trends in materials informatics in research on inorganic materials. Tech. rep. NISTEP Science & Technology Foresight Center, 2006.
- [25] Nikolaev Pavel et al. “Autonomy in Materials Research: a Case Study in Carbon Nanotube Growth”. In: npj Computational Materials 2.1 (2016), p. 16031. DOI: 10.1038/npjcompumats.2016.31.
- [26] Sanchez-Lengeling Benjamin and Aspuru-Guzik Alán. “Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering”. In: Science 361.6400 (2018), pp. 360–365. DOI: 10.1126/science.aat2663.
- [27] Dunn Alexander, Brenneck Julien, and Jain Anubhav. “Rocketsled: a software library for optimizing high-throughput computational searches”. In: Journal of Physics: Materials 2.3 (2019), p. 034002.
- [28] Talapatra Anjana et al. “Autonomous efficient experiment design for materials discovery with Bayesian model averaging”. In: Physical Review Materials 2.11 (2018), p. 113803.
- [29] Gongora Aldair E et al. “A Bayesian experimental autonomous researcher for mechanical design”. In: Science Advances 6.15 (2020), eaaz1708.
- [30] Kohavi Ron, Wolpert David H, et al. “Bias plus variance decomposition for zero-one loss functions”. In: ICML. Vol. 96. 1996, pp. 275–83.
- [31] Anselmi Fabio et al. “Symmetry-adapted representation learning”. In: Pattern Recognition 86 (2019), pp. 201–208.
- [32] Senior Andrew W et al. “Improved protein structure prediction using potentials from deep learning.” In: Nature (January 2020). ISSN: 1476-4687. DOI: 10.1038/s41586-019-1923-7. URL: http://www.ncbi.nlm.nih.gov/pubmed/31942072
- [33] Chen Tian Qi et al. “Neural ordinary differential equations”. In: Advances in Neural Information Processing Systems. 2018, pp. 6571–6583.
- [34] Rackauckas Christopher et al. “Universal Differential Equations for Scientific Machine Learning”. In: (2020). arXiv: 2001.04385 [cs.LG].
- [35] Innes Mike et al. “Zygote: A differentiable programming system to bridge machine learning and scientific computing”. In: arXiv preprint arXiv:1907.07587 (2019).
- [36] Purja Pun GP et al. “Physically informed artificial neural networks for atomistic modeling of materials”. In: Nature Communications 10 (2019).
- [37] Ouyang Long et al. “Practical Optimal Experiment Design With Probabilistic Programs”. In: CoRR (2016). arXiv: 1608.05046 [cs.AI]. URL: http://arxiv.org/abs/1608.05046v1
- [38] Jia Xiwen et al. “Anthropogenic biases in chemical reaction data hinder exploratory inorganic synthesis”. In: Nature 573.7773 (2019), pp. 251–255.
- [39] Heckerman David, Meek Christopher, and Cooper Gregory. “A Bayesian approach to causal discovery”. In: Computation, Causation, and Discovery 19 (1999), pp. 141–166.
- [40] Vajda Steven. Probabilistic programming. Academic Press, 2014.
- [41] Shechtman Daniel. “Quasi-Periodic Crystals—The Long Road from Discovery to Acceptance”. In: Rambam Maimonides Medical Journal 4.1 (2013).
- [42] Russakovsky Olga et al. “ImageNet Large Scale Visual Recognition Challenge”. In: International Journal of Computer Vision 115.3 (April 2015), pp. 211–252. ISSN: 1573-1405. DOI: 10.1007/s11263-015-0816-y.
- [43] Tinkle Sally et al. “Sharing data in materials science”. In: Nature 503 (2013).
- [44] Ward Charles H and Warren James A. Materials genome initiative: materials data. US Department of Commerce, National Institute of Standards and Technology, 2015. DOI: 10.6028/NIST.IR.8038.
- [45] Jain Anubhav, Persson Kristin A, and Ceder Gerbrand. “Research Update: The materials genome initiative: Data sharing and the impact of collaborative ab initio databases”. In: APL Materials 4.5 (2016), p. 053102.
- [46] Wilkinson Mark D et al. “The FAIR Guiding Principles for scientific data management and stewardship”. In: Scientific Data 3 (2016).
- [47] Simon Carl G and Lin-Gibson Sheng. “Combinatorial and High-Throughput Screening of Biomaterials”. In: Advanced Materials 23.3 (January 2011), pp. 369–87. ISSN: 1521-4095. DOI: 10.1002/adma.201001763. URL: http://www.ncbi.nlm.nih.gov/pubmed/20839249
- [48] Green Martin L, Takeuchi Ichiro, and Hattrick-Simpers Jason R. “Applications of high throughput (combinatorial) methodologies to electronic, magnetic, optical, and energy-related materials”. In: Journal of Applied Physics 113.23 (June 2013), p. 231101. ISSN: 00218979. DOI: 10.1063/1.4803530. URL: http://link.aip.org/link/JAPIAU/v113/i23/p231101/s1&Agg=doi
- [49] Potyrailo Radislav and Mirsky Vladimir M. “Combinatorial and High-Throughput Development of Sensing Materials: The First 10 Years”. In: Chemical Reviews 108.2 (2008), pp. 770–813.
- [50] Maier Wilhelm F, Stöwe Klaus, and Sieg Simone. “Combinatorial and High-Throughput Materials Science.” In: Angewandte Chemie (International ed. in English) 46.32 (January 2007), pp. 6016–67. ISSN: 1433-7851. DOI: 10.1002/anie.200603675. URL: http://www.ncbi.nlm.nih.gov/pubmed/17640024
- [51] Potyrailo Radislav et al. “Combinatorial and high-throughput screening of materials libraries: review of state of the art.” In: ACS Combinatorial Science 13.6 (November 2011), pp. 579–633. ISSN: 2156-8944.
- [52] Burnett TL and Withers PJ. “Completing the picture through correlative characterization”. In: Nature Materials (2019), p. 1.
- [53] Shi Weisong et al. “Edge computing: Vision and challenges”. In: IEEE Internet of Things Journal 3.5 (2016), pp. 637–646.
- [54] Program Solicitation NSF 19-526: Materials Innovation Platforms. Tech. rep. 2018. URL: https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=505133
- [55] Workshop on Advanced Energy Materials Discovery, Development, and Process Design Utilizing High-Throughput Experimental Methods, Artificial Intelligence, Autonomous Systems, and a Collaboratory Network. Tech. rep. DOE Office of Energy Efficiency and Renewable Energy, 2018. URL: https://www.energy.gov/sites/prod/files/2018/12/f58/Multi-Agency%20Multi-Year%20Program%20Plan%20in%20Advanced%20Energy%20Materials%20Discovery%20Development%20and%20Process%20Design_Workshop%20Summary%20Report.pdf
- [56] Gil Yolanda and Selman Bart. “A 20-Year Community Roadmap for Artificial Intelligence Research in the US”. In: arXiv preprint arXiv:1908.02624 (2019).
- [57] Cohen Aron J, Mori-Sánchez Paula, and Yang Weitao. “Insights into current limitations of density functional theory”. In: Science 321.5890 (2008), pp. 792–794.
- [58] Kirklin Scott et al. “The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies”. In: npj Computational Materials 1.1 (2015), pp. 1–15.
- [59] Hanisch Robert J, Gilmore Ian S, and Plant Anne L. “Improving Reproducibility in Research: The Role of Measurement Science”. In: Journal of Research of the National Institute of Standards and Technology 124 (2019), pp. 1–13.
- [60] Riley Patrick. Three pitfalls to avoid in machine learning. 2019.
- [61] Schmidt Philipp and Biessmann Felix. “Quantifying Interpretability and Trust in Machine Learning Systems”. In: arXiv preprint arXiv:1901.08558 (2019).
- [62] Herman Bernease. “The promise and peril of human evaluation for model interpretability”. In: arXiv preprint arXiv:1711.07414 (2017).
- [63] Holm Elizabeth A. “In defense of the black box”. In: Science 364.6435 (2019), pp. 26–27.
- [64] Raccuglia Paul et al. “Machine-learning-assisted materials discovery using failed experiments”. In: Nature 533.7601 (2016), p. 73.
- [65] The Minerals Metals & Materials Society (TMS). Creating the Next-Generation Materials Genome Initiative Workforce. Pittsburgh, PA: TMS, 2019. ISBN: 978-0-578-60369-8. DOI: 10.7449/mgiworkforce_1.
- [66] Alekseeva Liudmila et al. “The Demand for AI Skills in the Labor Market”. In: (2020). URL: https://cepr.org/active/publications/discussion_papers/dp.php?dpno=14320
- [67] Kalidindi SR. Materials Data Sciences and Informatics. https://www.coursera.org/learn/material-informatics. Accessed 13/05/2020. 2018.
- [68] Klimeck Gerhard et al. “nanoHUB.org: Advancing education and research in nanotechnology”. In: Computing in Science & Engineering 10.5 (2008), pp. 17–23.
- [69] Verduzco Gastelum Juan Carlos, Strachan Alejandro, and Desai Saaketh. Machine Learning for Materials Science: Part 1. February 2019. DOI: 10.21981/WGQC-3249. URL: https://nanohub.org/resources/mseml
- [70] Verduzco Gastelum Juan Carlos and Strachan Alejandro. Citrine Tools for Materials Informatics. December 2019. DOI: 10.21981/EH1N-T337. URL: https://nanohub.org/resources/citrinetools
- [71] Grother Patrick J. “NIST special database 19”. In: Handprinted forms and characters database, National Institute of Standards and Technology (1995).

