Assembling the world's largest materials image and spectroscopy dataset enables training of machine learning models that learn hidden relationships in materials data, providing a key example of the data requirements to capitalize on recent advancements in computer science.
Abstract
As the materials science community seeks to capitalize on recent advancements in computer science, the sparsity of well-labelled experimental data and limited throughput by which it can be generated have inhibited deployment of machine learning algorithms to date. Several successful examples in computational chemistry have inspired further adoption of machine learning algorithms, and in the present work we present autoencoding algorithms for measured optical properties of metal oxides, which can serve as an exemplar for the breadth and depth of data required for modern algorithms to learn the underlying structure of experimental materials science data. Our set of 178 994 distinct materials samples spans 78 distinct composition spaces, includes 45 elements, and contains more than 80 000 unique quinary oxide and 67 000 unique quaternary oxide compositions, making it the largest and most diverse experimental materials set utilized in machine learning studies. The extensive dataset enabled training and validation of 3 distinct models for mapping between sample images and absorption spectra, including a conditional variational autoencoder that generates images of hypothetical materials with tailored absorption properties. The absorption patterns auto-generated from sample images capture the salient features of ground truth spectra, and band gap energies extracted from these auto-generated patterns are quite accurate with a mean absolute error of 180 meV, which is the approximate uncertainty from traditional extraction of the band gap energy from measurements of the full transmission and reflection spectra. Optical properties of materials are not only ubiquitous in materials applications but also emblematic of the confluence of underlying physical phenomena yielding the type of complex data relationships that merit and benefit from neural network-type modelling.
Introduction
Recent advances in computer science1–4,38 enable materials scientists to predict new properties,2 generate entirely new materials,5 and identify reaction pathways.6 Illustrative examples of predictive machine learning models in materials science include the prediction of optical and electrical properties based on representations of crystal structures as fragments7–9 and the prediction of materials with complex electronic structures such as thermoelectrics10 or organic light emitting diodes.11 These successful implementations of modern machine learning algorithms are mostly limited to theoretical (i.e. computational) data, leaving an open question as to whether such algorithms can be impactful in materials science experiments. Many machine learning algorithms require diverse, expansive training datasets, limiting their adoption in experimental materials science where such data is generally not available. High-throughput materials science12–17 can help address these data scarcity issues, as demonstrated by Zakutayev et al.18 with their utilization of high throughput experiment data to train a random forest model that predicts electrical resistivity from material composition using 16 093 distinct materials. While there exists a variety of algorithms for machine learning in low data regimes (<103 samples), which to date have been primarily applied in organic chemistry,19–23 meaningful exploration and prediction in the breadth of materials compositions offered by the periodic table will most likely require significantly larger datasets.
Among the materials science machine learning algorithms reported to date, random forest models are commonly used, which is understandable given their predictive power, but the lack of interpretability of this and other machine learning models limits their ability to generate new materials knowledge. Design of materials with tailored properties is central to materials research, and machine learning-based acceleration of materials design was demonstrated by Gómez-Bombarelli et al.5 through development of a conditional variational autoencoder (cVAE) that predicts new organic molecules based on user-specified properties. Variational autoencoders (VAEs)24 and cVAEs25 utilize neural networks, whose deployment in materials science can enable new modes of scientific discovery through exploration of the latent space to reveal new and previously unknown relationships.26 Our quest to develop models that learn the underlying structure of experimental materials data has resulted in the development of a VAE and cVAE to predict optical absorption spectra from images of materials, and images from user-tailored absorption spectra.
Our focus on optical characterization data is motivated by the importance of optical properties for a broad span of technologies, from computer displays to solar energy utilization.27 The data employed in the present work results from years of extensive materials synthesis and optical characterization using a fixed set of high throughput instruments in the Joint Center for Artificial Photosynthesis.28,29 Optical characterizations utilizing inexpensive commercial sensors are particularly amenable to high throughput experimentation, making coarse optical characterization of new materials more expedient by experiment than by computation, particularly due to the high computational expense for predicting optical properties like band gap energy at reasonable accuracy; state of the art hybrid functionals require several CPU hours per material to achieve a bandgap prediction RMSE of 0.74 eV for metal oxide materials.30 The recently reported machine learning model by Oses et al.7 achieves an RMSE of 0.51 eV for computationally-predicted band gaps, which improves band gap prediction but not band gap measurement. Recently published algorithms have automated the extraction of band gap energy from an ultraviolet-visible (UV-vis) optical absorption spectrum,30,31 leaving spectrum acquisition as the rate limiting step of band gap measurement. As a result, the prediction of absorption spectra from a higher throughput experimental technique, such as imaging with a consumer product, would be quite impactful. We demonstrate machine learning automation of this task by combining a VAE with a deep neural network, requiring only an image of the material as input. We also exploit machine learning of the relationships between image and absorption spectrum to create a predictive model for the image of a material with tailored optical absorption properties, which is the first generative model5,19 trained exclusively from experimental materials data.
Results and discussion
Design and training of machine learning models
At a high level, imaging a material with a standard sensor, such as a red-green-blue (RGB) complementary metal oxide semiconductor (CMOS) sensor, is a spatially resolved measurement of an optical property averaged over some spectral range, including some spectral overlap of the 3 color filters.32 The optical property being measured is an unknown combination of reflection, absorption and transmission properties, which is complementary to a spectral optical absorption measurement that averages over a sample region (lower spatial resolution) but uses spectrometers to attain high energy (wavelength) resolution. The standard spectral absorption technique also employs distinct transmission and reflection measurements from which the spectral absorption can be modelled. The inability to derive a first-principles transformation between these 2 types of optical data arises from the unknown relationship between the RGB image and absorption, and the unknown mapping from the broad spectral response of each CMOS channel to the high energy resolution of an absorption spectrum, which in the present case is 220 energies between 1.31 and 3.1 eV. Deriving such a mapping would be facilitated by a low-parameter functional form for how absorption varies with energy in metal oxide materials, but such a model is not forthcoming due to the various types of absorption phenomena and the mixing of absorption signals from multiple phases in the typically mixed-phase metal oxide samples. Consequently, machine learning of the underlying data relationships is the only viable option. Predicting absorption spectra from images is thus only possible if a machine learning algorithm can exploit “hidden” information in the high spatial-resolution images, i.e. patterns unbeknownst to expert materials scientists.
Our exploration of the ability of machine learning to model complex relationships in materials data proceeded through the development of 3 models (Fig. 1) with training and validation data extracted from a set of 180 902 images and spectra, including 1908 “blank” samples (nothing deposited on the substrate) and 178 994 metal oxide samples synthesized via inkjet printing of mixed elemental precursors followed by thermal processing in an O2-containing atmosphere. The metal oxide samples contain various combinations of 1 to 4 cation elements along with various inkjet printing and thermal processing parameters, and while these important metadata are included in the dataset, they are not used in the models described herein.
In the present work, the absorption spectrum of a sample is taken to be the product of the spectral absorption coefficient and the sample thickness, which is a unitless quantity and the standard output from spectral absorption measurements. The thickness of the inkjet printed samples is on the order of 100 nm, so an absorption spectrum value of 1 corresponds to an absorption coefficient on the order of 105 cm–1. As previously described,23 the inkjet printed samples are rough over multiple length scales, further complicating the relationship between images and absorption spectra. The discrete samples are deposited over approximately 1 mm2 of the glass substrate with a 2 mm sample pitch. Each sample image is automatically cropped from the image of the entire library of materials on the glass substrate, and each 2.1 mm × 2.1 mm sample image is sufficiently large to cover the extent of each material with a border of bare substrate.
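For concreteness, the unit argument above can be written out explicitly; the symbol A for the plotted (unitless) absorption spectrum is introduced here only for illustration:

```latex
A(E) = \alpha(E)\, d
\qquad\Rightarrow\qquad
\alpha \approx \frac{A}{d} = \frac{1}{100\ \mathrm{nm}} = \frac{1}{10^{-5}\ \mathrm{cm}} = 10^{5}\ \mathrm{cm}^{-1}
\quad \text{for } A = 1.
```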
Model 1 – variational autoencoder
To establish the appropriate methods for encoding images of metal oxides, we commenced with the design and training of model 1, an autoencoder for flatbed scanner images of materials synthesized by the inkjet printing technique. An autoencoder takes an input (here an image of a material) and encodes it into a latent space of lower dimension (here 100 dimensions). Decoding a latent space coordinate produces an image in the same format as the input data, making the process akin to lossy compression, and the latent space (compressed) representation can enable new analyses and algorithms. Models employing convolutional layers19 excel at reconstructing sample morphology and were thus employed in model 1, requiring hyperparameter optimization as described below.
Model 2 – prediction of UV-vis spectra
The Absorption Spectra Prediction Model (ASPM) builds upon the compact latent space representation of the VAE to predict a UV-vis absorption spectrum (220 energies). Under the assumption that the image encoder captures various image properties such as the color, color variation, morphology, etc. in its latent space representation, this approach exploits the high information density of the latent space (100 dimensions compared to the 12 288 dimensions of the 64 × 64 RGB image) for the construction of absorption spectra, in this case using a hybrid dense and convolutional deep neural network model that was trained independently from model 1.
Model 3 – conditional variational autoencoder
The conditional variational autoencoder (cVAE) follows the general structure of the VAE with modified inputs for both the encoding and decoding algorithms. The encoder input is the concatenation of the flattened image and absorption spectrum, and the decoder input is the concatenation of the latent space coordinate and the conditional absorption spectrum so that the resulting image represents the latent space coordinate under the condition that the material exhibits the specified ‘conditional’ absorption spectrum. During training, the same absorption spectrum was used in the encoder and decoder inputs as noted in Fig. 1. During application of the model (conditional decoding), the conditional absorption spectrum was user-specified as described below.
Image autoencoding and spectral prediction
The VAE of model 1 was trained for 100 epochs after optimization of hyperparameters, including the number of filters in the convolutional layers and the latent space dimensionality, as shown in the ESI.† Using t-distributed stochastic neighbor embedding (t-SNE),33 the 100-dimensional latent space can be visualized as shown in Fig. 2a for the 54 270 images of the test set, where each sample point is plotted using its representative color (see figure caption). Even though the VAE was not supplied any spectral information, it inherently exploits spectral features during autoencoding, as evident from the black-brown to blue-purple color gradient from left to right. The apparent clustering of samples, particularly those with a similar representative color, is emblematic of the structuring in the latent space based on optical properties.
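As an illustration of this visualization step, a minimal sketch using scikit-learn's t-SNE is given below; the arrays Z and colors are random stand-ins for the encoder latent coordinates and per-sample representative colors, and the perplexity setting is an assumption of this sketch rather than a reported parameter.

```python
# Hedged sketch of the latent-space visualization: 2D t-SNE embedding of 100-D latent
# coordinates, colored by each sample's representative RGB color.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 100))        # stand-in for encoder latent coordinates
colors = rng.random((2000, 3))          # stand-in for representative RGB colors in [0, 1]

Z_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(Z)
plt.scatter(Z_2d[:, 0], Z_2d[:, 1], c=colors, s=2)
plt.xlabel("t-SNE dimension 1")
plt.ylabel("t-SNE dimension 2")
plt.show()
```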
Example raw (Ii) and VAE-reconstructed (Ĩi) images are shown in Fig. 3, demonstrating that the general appearance, and especially the human-perceived color, of the materials are well reconstructed, albeit with some blurring, as occurs generally in image autoencoding when the dimensionality of the latent space is well below that of the images.24,25 Since an absorption spectrum is measured with illumination of the entire sample, yielding the spatially “averaged” absorption signal, this blurriness of the reconstructed images is not important for the present purposes, but it is worth noting that the presence of a so-called coffee ring35 in Ii typically results in a darker edge of the sample blob in Ĩi. The VAE preservation of perceived color (Fig. 3) and color-based clustering in the latent space (Fig. 2a) indicate that the VAE successfully encoded spectral features even though the model was not supplied any spectral information, motivating the use of latent space representations for predicting spectral absorption.
Absorption spectrum prediction
The Absorption Spectra Prediction Model (ASPM) was trained and validated using the VAE latent space coordinates, with the same train-test split used in model 1. The weights of the VAE of model 1 were no longer trainable at this stage. Overall, there was good convergence for the ASPM across the energy range of the absorption spectra as shown in Fig. 4. The relatively high and consistent R2 and Pearson correlation coefficients together with low residual losses demonstrate that at each energy value, the measured absorption spectra are well-reconstructed by model 2. Visual inspection of the absorption spectrum predictions for a span of representative samples is provided in Fig. 5, which compares ground truth absorption spectra (green) from the test set with their predictions (black) from model 2. The figure includes a row of plots from each loss decile, with ten randomly selected samples in each row. Up to the 80th loss percentile, the predicted spectra appear in good agreement with the ground truth spectra. Impressively, the model reconstructs fine features of the absorption spectra such as local maxima that result from sub-band gap absorption or thin-film interference, even when these features occur over spectral ranges much smaller than the sensitivity range of the original RGB sensor. Even an expert materials scientist cannot identify the presence of such features from inspection of an image, demonstrating the super-human analysis capabilities of these machine learning models.
The quality of the predicted UV-vis spectra allows their utilization for estimating band gap energy, which is typically a manual human analysis exercise but has recently been automated to identify a representative band gap energy for a given absorption spectrum.31,34 As most of the materials studied herein are multiphase (due to their high compositional complexity), it should be noted that the MARS algorithm employed here returns only a single representative band gap energy without a measure of uncertainty. For each sample i, the performance of model 2 for band gap estimation is thus evaluated by comparing the MARS-identified direct band gap energy from the ground truth spectrum Si to that from the predicted spectrum S̃i, as summarized in Fig. 6 for the test set. The mean band gap error is –10 meV (median 8.8 meV), the root mean squared error is 261 meV, and the mean absolute error is 180 meV. The prediction of band gaps based on the latent space representation of images therefore outperforms the ab initio calculations noted above by extracting knowledge from the coarse optical characterization data in the flatbed scanner images.
Conditional variational autoencoder
A complementary demonstration of the ability of machine learning models to encode materials properties is the development of a generative model that makes predictions of materials data from user-specified properties. For this purpose, the cVAE of model 3 was designed to predict what a printed material looks like based on a target absorption spectrum. Based on visual inspection of images reconstructed using the experimental image and spectrum as input, the cVAE performs similarly to the VAE.
To generate conditionally decoded images, we used a random sample from the test set and identified its latent space coordinate Z̃i. From this fixed point in the cVAE latent space, various tailored absorption spectra were applied as the conditional input to the decoder, resulting in cVAE-generated images of hypothetical materials as shown in Fig. 7.
To generate a series of synthetic absorption spectra that each represent a semiconductor's absorption edge, a sigmoid function was used with variation of the inflection point energy (representing absorption edge energy, increasing from bottom to top in Fig. 7a) and the slope (representing sharpness of absorption increase, decreasing from left to right in Fig. 7a). Application of the MARS algorithm to this series of absorption patterns results in band gap variation of approximately 1.4 to 2.9 eV. To model different material thickness or maximum absorption coefficient, this family of synthetic absorption spectra was scaled to maximum values of 0.84, 0.42, and 0.21 for image generation in Fig. 7b–d, respectively. The highest absorption factor measured in the test set was 0.75, making the generated images in Fig. 7b an extension beyond the span of absorption spectra in the train set. Fig. 7b is commensurate with the general observation for metal oxide semiconductors that a material with a high band gap typically appears yellowish-transparent (e.g. BiVO4 with 2.5 eV band gap), a material with an intermediate band gap appears red-brown (e.g. Fe2O3 with 2.2 eV band gap), and a material with a very low band gap appears blue-grey (e.g. Si with 1.2 eV band gap). The apparent transparency and saturation of the generated images are also quite intuitive, as high absorption values and low sigmoid slopes, which correspond to absorption over a broad spectral range, lead to high opacity and low color saturation. With lower maximum absorption, the center part of each image tends to become grayer, which is assumed to be the model's simulation of transparency given the gray appearance of the substrate/background in the flatbed scanner images. A high absorption slope in the conditional spectra results in a slightly increased sample size in the decoded image, likely due to the network trying to match the absorption condition by making the absorbing part of the image larger, an unintended but interesting mechanism by which the conditional spectrum impacts shapes in the generated image.
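The synthetic spectrum family described here can be sketched as sigmoids over the measured energy grid; the parameterization below, including the slope values and the edge-energy grid, is an illustrative assumption rather than the exact values used to produce Fig. 7.

```python
# Sketch of synthetic "absorption edge" spectra: sigmoids in photon energy with a variable
# inflection point (edge energy) and slope, scaled to a chosen maximum absorbance.
import numpy as np

energies = np.linspace(1.31, 3.1, 220)          # eV, matching the 220-point spectral grid

def synthetic_spectrum(edge_ev, slope, max_abs=0.84):
    """Sigmoid absorption edge: low absorption below edge_ev, rising to max_abs above it."""
    return max_abs / (1.0 + np.exp(-slope * (energies - edge_ev)))

# Family of conditional spectra: varying edge energy (rows of Fig. 7a) and sharpness (columns)
conditional_spectra = np.stack([
    synthetic_spectrum(edge, slope)
    for edge in np.linspace(1.4, 2.9, 8)        # approximate band gap range noted above
    for slope in (20.0, 10.0, 5.0)              # decreasing sharpness of the absorption onset
])
```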
To ascertain the relative insensitivity of the generated image to the starting latent space coordinate, Fig. S6† shows a series of images using the conditional spectra from Fig. 7b and 200 Z̃i values from randomly chosen samples. The ability to generate simulated data for a “coarse” measurement based on a desired fundamental property enables rapid screening for desired materials, but more foundationally the cVAE demonstrates the successful training of a generative model using only experimental data, charting a pathway similar to ChemVAE in organic chemistry.5
Deploying variational autoencoders in materials science
As noted above, a VAE was chosen as the basis for model 1 due to its latent space representation that enables prediction of other properties, such as the ASPM of model 2. This is a powerful approach for situations where the training set for the desired property is smaller than that of the related, less expensive property. The demonstration of this latent space exploitation to gain “expensive” information from “inexpensive” data motivated our use of the VAE instead of other algorithms that could also predict properties, such as band gap energy, from images. A regression model would be among the simplest algorithms for such a prediction but would not provide the compact latent space representation that we believe will become a key construct in the machine learning-based acceleration of materials science.
To assess the size of materials data required for training the VAE and ASPM, we made subsets of the train set from the above models by randomly splitting it into 5 distinct sets, each with 20% of the train set size. Similarly, 10, 30 and 100 subsets were generated with 10%, 3.33% and 1% of the train set size, respectively. The VAE and ASPM (models 1 and 2) were trained from random initializations with each of these 145 subsets without use of the test set from models 1 and 2, enabling the evaluation of each trained model using a static test set (>5 × 104 materials), as summarized in Fig. 8. The R2 correlation increases dramatically with the logarithm of the train set size, indicating that our successful training and use of VAEs and cVAEs for experimental materials science was enabled by the >105 materials in the train set, and that expansion of the train set by an additional order of magnitude would further improve the predictive power of the models.
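The subset construction can be sketched as a random permutation followed by disjoint splits; the train-set size and index bookkeeping below are placeholders.

```python
# Sketch of the train-set subsampling for the learning-curve study: disjoint subsets covering
# 20%, 10%, ~3.33% and 1% of the train set (5, 10, 30, and 100 splits, respectively).
import numpy as np

rng = np.random.default_rng(0)
n_train = 126_000                                # placeholder for the ~70% train split size
indices = rng.permutation(n_train)

subsets = {n_splits: np.array_split(indices, n_splits) for n_splits in (5, 10, 30, 100)}
```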
Building more experimental materials databases of this size requires a revolution in data and metadata management. To date, computational materials datasets have been more amenable to machine learning due to the relative ease of integrating data across research groups, whereas variations in experimental instruments and the lack of a framework to encode differences between instruments and experimental techniques limit assembly of large experimental databases. Consequently, the machine learning demonstrations in experimental solid state materials science, namely the work by Zakutayev et al.18 and the present work, utilize specific types of data acquired within a single research organization, and we believe these demonstrations lay the foundation for future generation of more broadly-applicable machine learning models3,36,37 in experimental solid state materials science so that the field may catch up to the tremendous progress made in organic chemistry and drug discovery.19–23
Conclusion
Empowered by an unprecedented dataset of optical characterizations of metal oxide materials, we train a series of machine learning models employing convolutional and deep neural networks. A materials image autoencoder was developed by training a VAE using images of thin film materials acquired with a consumer flatbed scanner. The VAE, even though not trained with spectral information, encodes spectral characteristics in its information-rich latent space, enabling the development of a DNN model for predicting the full UV-vis absorption spectrum of a material from only its image. Band gap energies extracted from the predicted spectra match the uncertainty of the extraction algorithm and supersede common ab initio methods for phase-pure materials. An additional model predicts the image of a hypothetical material based on a user-specified absorption pattern, providing the first example of a cVAE model trained exclusively on experimental materials data. This study has been enabled by the construction of a database of over 105 materials, demonstrating the utility of high throughput experiments with rigorous data management for further adoption of machine learning in experimental materials science.
Methods
The dataset for this study was generated through a database search in the Materials Experiment and Analysis Database (MEAD) of the High-Throughput Experimentation (HTE) group at the Joint Center for Artificial Photosynthesis at Caltech. The dataset containing sample images, UV-vis spectra and composition can be found at 〈TBD〉. We cannot assert the absence of a bias towards either larger or smaller bandgap materials in this dataset since there is no comparably large dataset of optical properties of mixed metal oxides for comparison. Samples were synthesized using inkjet printing of precursor salts, typically metal nitrates, that were subsequently annealed to form metal oxides.31 Optical absorption spectra were recorded using an on-the-fly scanning UV-vis dual-sphere spectrometer as described elsewhere.22 Sample images were acquired using a commercially-available consumer flatbed scanner (EPSON Perfection V600) in reflection configuration as described elsewhere.23 The scanner acquired 1200 dpi images at a rate of 2.0 cm2 s–1, corresponding to 0.019 s per sample with our library design of approximately 1 mm2 samples on a square grid with 2 mm pitch. Original 2.1 mm × 2.1 mm sample images were 101 × 101 pixels with 24 bit color depth and were rescaled to 64 × 64 pixels via the python image library (pillow) with anti-aliasing.
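A minimal sketch of this preprocessing step is given below; the file name, the array scaling, and the choice of Pillow's LANCZOS filter as the anti-aliasing resampler are illustrative assumptions.

```python
import numpy as np
from PIL import Image

def load_sample_image(path):
    """Load a cropped sample image and rescale it to the 64 x 64 model input with anti-aliasing."""
    img = Image.open(path).convert("RGB")                 # 24 bit RGB crop, nominally 101 x 101 pixels
    img = img.resize((64, 64), resample=Image.LANCZOS)    # anti-aliased downscaling
    return np.asarray(img, dtype=np.float32) / 255.0      # scale to [0, 1] for the sigmoid-output VAE

# x = load_sample_image("sample_0001.png")                # hypothetical file name
```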
All calculations were performed on an Alienware Aurora R7 workstation equipped with an Intel i7-8700K @ 3.70 GHz CPU, 32 GB RAM, and an Nvidia GTX 1080 Ti GPU with 12 GB dedicated GPU memory. Software used was Python version 3.6.4, Keras version 2.1.5, and TensorFlow version 1.1.0. The random train-test split was 70% train and 30% test. Since no decisions were based on the test set, an additional validation set was not generated.
Machine learning model descriptions
Model 1
For autoencoding, a convolutional variational autoencoder was trained. The encoder uses a series of four 2D convolutional layers, each followed by a max pooling layer with 2 × 2 pool size. All layers used the ReLU activation function; the filter sizes were 8, 16, 4, 4 with a kernel size of 3 × 3, found through hyperparameter optimization (see ESI†), resulting in 23 778 trainable parameters, less than one fifth the size of the training set. The output of the convolutional layers is flattened and passed to two layers, μ and σ, with 100 output dimensions (the length of the latent space embedding, also optimized through hyperparameter optimization) and linear activation. The output of these is passed to a sampling layer z that samples the latent space via

z = μ + a ε exp(σ/2)

where ε is a random normal tensor of the same shape as μ with zero mean and unit variance. During training the constant a is set to one, otherwise zero. The model up to this point is the encoder EVAE as shown in Fig. 1. The output of z (i.e. the encoder output) is fed to a dense layer with 64 output dimensions and ReLU activation. The output of this layer is reshaped to match the dimensions of the last convolutional layer of the encoder to be able to mirror its structure (i.e. 4 × 4 × 8). The decoder mirrors the encoder such that four 2D convolutional layers are each followed by a 2D upsampling layer of size 2 × 2. The filter sizes of the 2D convolutional layers are reversed, i.e. 4, 4, 16, 8. The final decoder layer is a 2D convolutional layer with 3 filters (corresponding to the RGB channels) and sigmoid activation. The model from latent space to sample image is the decoder, DVAE. The model is trained using the Adam optimizer over 50 epochs. The training loss is the sum of the Kullback–Leibler divergence and the image reconstruction binary cross entropy, the latter multiplied by the number of values in the output image (12 288). The scaling of the binary cross entropy ensures convergence of both the KL loss and the image reconstruction during training; when the reconstruction loss was not weighted, the KL loss converged but images were not reconstructed well.
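A minimal sketch of this architecture is given below for illustration. The filter counts, kernel size, pooling, 100-dimensional latent space, sampling rule, and scaled BCE + KL loss follow the description above; the padding choice, the decoder reshape target, the omission of the train-time constant a, and the use of the modern tf.keras API (rather than the Keras 2.1.5 / TensorFlow 1.1.0 versions listed in the Methods) are assumptions of this sketch.

```python
# Hedged sketch of the convolutional VAE (model 1) in tf.keras.
from tensorflow import keras
from tensorflow.keras import layers, backend as K

latent_dim = 100
img = keras.Input(shape=(64, 64, 3))

# Encoder EVAE: four conv + 2x2 max-pooling blocks, 64 -> 32 -> 16 -> 8 -> 4 spatially
x = img
for n_filters in (8, 16, 4, 4):
    x = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
flat = layers.Flatten()(x)
z_mean = layers.Dense(latent_dim)(flat)      # mu, linear activation
z_log_var = layers.Dense(latent_dim)(flat)   # sigma (log variance), linear activation

def sample_z(args):
    mu, log_var = args
    eps = K.random_normal(shape=K.shape(mu))          # epsilon ~ N(0, 1)
    return mu + eps * K.exp(log_var / 2)              # z = mu + eps * exp(sigma / 2)

z = layers.Lambda(sample_z)([z_mean, z_log_var])
encoder = keras.Model(img, z, name="E_VAE")

# Decoder DVAE: mirror of the encoder with upsampling and reversed filter counts
latent_in = keras.Input(shape=(latent_dim,))
y = layers.Dense(4 * 4 * 4, activation="relu")(latent_in)   # reshape target is an assumption
y = layers.Reshape((4, 4, 4))(y)
for n_filters in (4, 4, 16, 8):
    y = layers.Conv2D(n_filters, 3, padding="same", activation="relu")(y)
    y = layers.UpSampling2D(2)(y)                            # 4 -> 8 -> 16 -> 32 -> 64
img_out = layers.Conv2D(3, 3, padding="same", activation="sigmoid")(y)  # RGB reconstruction
decoder = keras.Model(latent_in, img_out, name="D_VAE")

# Full VAE: reconstruction BCE scaled by the 12 288 image values, plus the KL divergence
recon = decoder(z)
vae = keras.Model(img, recon, name="VAE")
bce = keras.losses.binary_crossentropy(K.batch_flatten(img), K.batch_flatten(recon))
kl = -0.5 * K.sum(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
# Keras 2-style symbolic add_loss; newer Keras versions may require a custom train_step instead
vae.add_loss(K.mean(12288.0 * bce + kl))
vae.compile(optimizer="adam")
```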
Model 2
For spectrum prediction, the ASPM uses a mixture of dense and 1D convolutional layers. Since the latent space contains no spatial information, we use two dense layers with output dimensions of 100 (found via hyperparameter optimization) and 55, each with PReLU activation. The output of the second dense layer is passed to two 1D convolutional layers (again with PReLU activation) with filter counts of 64 and 32 and kernel sizes of 10 and 20, each followed by a 1D upsampling layer. The ASPM is trained using the Adam optimizer with a loss that multiplies an R2-based term by the MSE. For each batch, the per-energy R2 is calculated along the 220 energies (Re2) and translated to a loss via subtraction from 1 (i.e. Re,loss2 = 1 – Re2), and these per-energy terms are aggregated over the energies. The loss L for the spectrum model is therefore

L = MSE × Σe (1 – Re2)

where the sum runs over the 220 energies (normalizing by the number of energies changes L only by a constant factor).
An equivalent definition of L is the MSE loss scaled by the sum of the fractions of unexplained variance per energy. Interestingly, the MSE differs only slightly between the train and test sets, while the Re2 loss varies significantly.
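A hedged sketch of this loss as a Keras-compatible function is shown below, assuming y_true and y_pred are batches of absorption spectra of shape (batch, 220); the epsilon guard against division by zero is an addition of this sketch, not a reported detail of the model.

```python
from tensorflow.keras import backend as K

def aspm_loss(y_true, y_pred):
    """MSE scaled by the summed per-energy fraction of unexplained variance, 1 - Re^2."""
    # mean squared error over all samples and energies in the batch
    mse = K.mean(K.square(y_true - y_pred))
    # per-energy residual and total sums of squares, computed across the batch dimension
    ss_res = K.sum(K.square(y_true - y_pred), axis=0)
    ss_tot = K.sum(K.square(y_true - K.mean(y_true, axis=0, keepdims=True)), axis=0) + K.epsilon()
    unexplained = ss_res / ss_tot            # equals 1 - Re^2 at each of the 220 energies
    return mse * K.sum(unexplained)          # L = MSE x sum_e (1 - Re^2)

# usage (model construction not shown): model.compile(optimizer="adam", loss=aspm_loss)
```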
Model 3
The conditional variational autoencoder (cVAE) followed the structure of a purely dense-layer implementation of the VAE, except for the concatenation of an absorption spectrum to the image prior to encoding and to the latent space before decoding.25 The spectra were not scaled or transformed.
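The wiring described here can be sketched as follows. Only the concatenation scheme and the dense-layer structure are taken from the text; the hidden-layer width and depth and the reuse of a 100-dimensional latent space are assumptions of this sketch.

```python
# Hedged sketch of the cVAE wiring: spectrum concatenated to the flattened image before
# encoding, and to the latent coordinate before decoding.
from tensorflow import keras
from tensorflow.keras import layers, backend as K

latent_dim = 100                                   # assumed to match the VAE latent space
img_in = keras.Input(shape=(64 * 64 * 3,))         # flattened RGB sample image
spec_in = keras.Input(shape=(220,))                # absorption spectrum used as the condition

# Encoder: image and spectrum concatenated
h = layers.Concatenate()([img_in, spec_in])
h = layers.Dense(512, activation="relu")(h)        # hidden width is an assumption
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
z = layers.Lambda(
    lambda t: t[0] + K.random_normal(K.shape(t[0])) * K.exp(t[1] / 2)
)([z_mean, z_log_var])

# Decoder: latent coordinate and conditional spectrum concatenated
d = layers.Concatenate()([z, spec_in])
d = layers.Dense(512, activation="relu")(d)
img_out = layers.Dense(64 * 64 * 3, activation="sigmoid")(d)

cvae = keras.Model([img_in, spec_in], img_out, name="cVAE")
# For conditional generation, the decoder portion is evaluated at a fixed latent coordinate
# with user-specified conditional spectra, as described in the Results section.
```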
Conflicts of interest
There are no conflicts to declare.
Supplementary Material
Acknowledgments
This study is based upon work performed by the Joint Center for Artificial Photosynthesis, a DOE Energy Innovation Hub, supported through the Office of Science of the U.S. Department of Energy (Award No. DE-SC0004993).
Footnotes
†Electronic supplementary information (ESI) available: Details of model structure, additional model results, and file containing all model weights. See DOI: 10.1039/c8sc03077d; The full dataset can be found at DOI: 10.22002/D1.1103
References
- Ramprasad R., Batra R., Pilania G., Mannodi-Kanakkithodi A., Kim C. npj Comput. Mater. 2017;3:54.
- Ward L., Wolverton C. Curr. Opin. Solid State Mater. Sci. 2017;21:167–176.
- Suram S. K., Pesenson M. Z. and Gregoire J. M. High Throughput Combinatorial Experimentation + Informatics = Combinatorial Science, in Information Science for Materials Discovery and Design, 271–300, Springer International Publishing, 2015, doi: 10.1007/978-3-319-23871-5_14.
- Rajan K. Annu. Rev. Mater. Res. 2015;45:153–169.
- Gómez-Bombarelli R. ACS Cent. Sci. 2018;4:268–276. doi: 10.1021/acscentsci.7b00572.
- Ulissi Z. W., Medford A. J., Bligaard T., Nørskov J. K. Nat. Commun. 2017;8:14621. doi: 10.1038/ncomms14621.
- Oses C. Nat. Commun. 2017;8:1–12.
- Isayev O. Chem. Mater. 2015;27:735–743.
- Schütt K. T. Phys. Rev. B: Condens. Matter Mater. Phys. 2014;89:1875.
- Carrete J., Li W., Mingo N., Wang S., Curtarolo S. Phys. Rev. X. 2014;4:18.
- Gómez-Bombarelli R. Nat. Mater. 2016;15:1120–1127. doi: 10.1038/nmat4717.
- Green M. L. Appl. Phys. Rev. 2017;4:011105.
- Setyawan W., Curtarolo S. Comput. Mater. Sci. 2010;49:299–312.
- Ren F. Sci. Adv. 2018;4:eaaq1566. doi: 10.1126/sciadv.aaq1566.
- Ludwig A., Zarnetta R., Hamann S. J. Mater. Chem. A. 2008;99:1144–1149.
- Woodhouse M., Parkinson B. A. Chem. Soc. Rev. 2008;38:197–210. doi: 10.1039/b719545c.
- Woodhouse M., Herman G. S., Parkinson B. A. Chem. Mater. 2005;17:4318–4324.
- Zakutayev A., et al. High Throughput Experimental Materials Database, 2017, doi: 10.7799/1407128.
- Altae-Tran H., Ramsundar B., Pappu A. S., Pande V. ACS Cent. Sci. 2017;3:283–293. doi: 10.1021/acscentsci.6b00367.
- Duros V. Angew. Chem., Int. Ed. 2017;56:10815–10820. doi: 10.1002/anie.201705721.
- Dragone V., Sans V., Henson A. B., Granda J. M., Cronin L. Nat. Commun. 2017;8:15733. doi: 10.1038/ncomms15733.
- Roch L. M., et al. ChemOS: An Orchestration Software to Democratize Autonomous Discovery, 2018, doi: 10.26434/chemrxiv.5953606.v1.
- Houben C., Lapkin A. A. Curr. Opin. Chem. Eng. 2015;9:1–7.
- Kingma D. P., Welling M. Auto-Encoding Variational Bayes, ICLR, 2014, arXiv:1312.6114.
- Radford A., Metz L., Chintala S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR, 2016, arXiv:1511.06434.
- Pyzer-Knapp E. O., Li K., Aspuru-Guzik A. Adv. Funct. Mater. 2015;25:6495–6502.
- Döscher H., Geisz J. F., Deutsch T. G., Turner J. A. Energy Environ. Sci. 2014;7:2951–2956.
- Mitrovic S. Rev. Sci. Instrum. 2015;86:013904. doi: 10.1063/1.4905365.
- Mitrovic S. ACS Comb. Sci. 2015;17:176–181. doi: 10.1021/co500151u.
- Morales-García Á., Valero R., Illas F. J. Phys. Chem. C. 2017;121:18862–18866.
- Schwarting M., Siol S., Talley K., Zakutayev A., Phillips C. Materials Discovery. 2017;10:43–52.
- Agranov G., Berezin V., Tsai R. H. IEEE Trans. Electron Devices. 2003;50:4–11.
- van der Maaten L. J. Mach. Learn. Res. 2014;15:3221–3245.
- Suram S. K., Newhouse P. F., Gregoire J. M. ACS Comb. Sci. 2016;18:673–681. doi: 10.1021/acscombsci.6b00053.
- Li H. Chem. Sci. 2018;9:7596–7605. doi: 10.1039/c8sc03302a.
- Xue Y., et al. Phase-Mapper: An AI Platform to Accelerate High Throughput Materials Discovery, Proc. IAAI-17, pp. 4635–4642.
- Stein H. S., Jiao S., Ludwig A. ACS Comb. Sci. 2017;19:1–8. doi: 10.1021/acscombsci.6b00151.
- Sanchez-Lengeling B., Aspuru-Guzik A. Science. 2018;361(6400):360–365. doi: 10.1126/science.aat2663.