J Phys Chem C. 2022 Jul 29;126(31):13053–13061. doi: 10.1021/acs.jpcc.2c04725

Data-Driven Analysis of Hole-Transporting Materials for Perovskite Solar Cells Performance

Marcos del Cueto, Charles Rawski-Furman, Juan Aragó, Enrique Ortí, Alessandro Troisi
PMCID: PMC9376947  PMID: 35983311

Abstract

We have created a dataset of 269 perovskite solar cells, containing information about their perovskite family, cell architecture, and multiple hole-transporting material features, including fingerprints, additives, and structural and electronic features. We propose a predictive machine learning model that is trained on these data and can be used to screen candidate hole-transporting materials. Our approach predicts the performance of perovskite solar cells with reasonable accuracy and successfully identifies most of the top-performing and lowest-performing hole-transporting materials in the dataset. We discuss how biases in the distribution of perovskite families and architectures in the data affect the model’s accuracy, and we offer an analysis with a subset of the data to study more accurately the effect of the hole-transporting material on the solar cell performance. Finally, we discuss some chemical fragments, like arylamine and aryloxy groups, which present a relatively large positive correlation with the efficiency of the cell, whereas other groups, like thiophene groups, display a negative correlation with power conversion efficiency (PCE).

1. Introduction

Photovoltaic cells have been gaining popularity as a renewable source of energy and, among the different types, perovskite solar cells (PSCs) have emerged as low-cost alternatives to traditional silicon-based devices. Since the seminal work by Miyasaka and co-workers in 2009,1 the power conversion efficiencies (PCEs) for PSC-based devices have risen remarkably from the initial 3.8% to values over 25%.2 This unprecedented enhancement is due to the outstanding intrinsic features of perovskites, including wide light absorption (from visible to near-infrared), tunable bandgap energy, long electron/hole diffusion lengths, high charge-carrier mobilities, and also solution-processed fabrication.3,4 Nevertheless, PSCs still have stability issues. To overcome this limitation, the common organic–inorganic metal-halide perovskites can be compositionally engineered (i.e., by modifying both the cations and anions of the ABX3 stoichiometry) to obtain novel and enhanced perovskite materials with improved stability and efficiency.4–6 In the most widely used device architecture, the active perovskite is infiltrated in a mesoporous scaffold of TiO2 and subsequently covered by a layer of a hole-transporting material (HTM). The hole-transporting layer extracts the positive charges from the perovskite and blocks the electron movement, thus minimizing charge recombination, and also improves the stability of the perovskite layer by protecting it against moisture and oxygen.7,8 In this regard, the quest for novel, low-cost and efficient HTMs for their implementation in PSCs is an active research field.

In recent years, there have been significant efforts to synthesize a wide variety of HTMs with different structural motifs for PSC applications, as discussed in several recent reviews.9–13 Despite hundreds of HTMs being available, their impact on the efficiency of the PSCs is not straightforward. From a computational perspective, there have been different attempts14–20 to understand the role of HTMs in the efficiency of PSCs by analyzing the structural and electronic features of the targeted HTMs with quantum-chemical calculations. However, these studies tend to be computationally expensive, which limits the number of molecules one can study at once. It is therefore difficult to extract valuable structure–property guidelines for a better rational design of novel and enhanced HTMs beyond the trial-and-error approach. A possible way of overcoming this limitation is taking advantage of the available data in the literature and training a machine learning (ML) model to detect correlations between relevant HTM properties and PCEs of the PSC-based devices. Similar approaches have been used, for example, in the context of organic solar cells21–23 and other energy materials.24–28

To date, the ML models applied in the context of PSCs29,30 have been mainly focused on the prediction of perovskite properties, like the band gap,31,32 stability,33,34 ionic conductivities,35 and other transport properties.36 Only a few models have tried to predict solar cell performance.37–39 All of these ML models focused on the perovskite, and the reduced number of HTMs limited the scope of the models to link the role of the HTM with the performance of the cell. However, in recent years, the number and types of HTMs have shown a significant increase, and we are currently able to gather hundreds of experimental PCE values from the literature. In this paper, we build an ML model to predict the PCE of the cell, using a series of features describing the properties of the HTM (fingerprints, structural properties, electronic properties, and additives), as well as the perovskite type and cell architecture. This model can be used to screen possible candidate HTMs, identifying those that are more likely to have a larger PCE, although it should be noted that in practice, one might want to consider other factors like stability or synthesizability.40

We first describe how we gather and process the data, paying attention to alternative ways to partition the dataset. Then, we briefly describe how we built our ML model, how we trained it, and how the model was evaluated with unknown data. Finally, we discuss the distribution of different chemical groups present in cells with large PCE and how one can use the fingerprints to analyze which chemical groups are positively and negatively correlated with cell efficiency. A schematic representation of this workflow is shown in Figure 1.

Figure 1. Workflow used to construct the database and train/evaluate the predictive ML model.

2. Methods

We have gathered data for a total of 269 perovskite solar cells, including the PCE, perovskite type, device architecture, HTM structures, and additives from the literature. Data are sourced from multiple review articles published in the last 6 years.9–13 We eliminated repeated data points, data that did not provide complete HTM structure or perovskite information, and outliers in the PCE < 5% region, where only a few points are available and predictions would be challenging. We show the chemical structure of all HTMs in Figure S7 in the Supporting Information (SI). In addition, we offer the complete database and the analysis code in a public repository.41

2.1. Perovskite Features

We have classified the perovskites in our dataset in several families, broadly matching the literature classification,42 depending on their composition: methylammonium (MA) lead halide perovskites are labeled as family 1 (MAPbX3, X = Cl, Br, I), mixed-halide perovskites as family 2 (MAPbXxI3–x, X = Cl, Br), and different mixed-cation perovskites as family 3 ([FAPbIxX3–x]1–y[MAPbX3]y, FA = formamidinium, X = Cl, Br, I), family 4 ([MAxFA1–x]Pb[IwBr3–w]), family 5 (Csy[FAxMA1–x]1–yPb[IwBr3–w]), and family 6 (FAxCs1–xPbIyBr3–y). To further classify the perovskites in the PSCs, we group them as a function of the device architecture:43 mesoporous (M), planar (P), or inverted planar (IP)—sketched in Figure 2.

Figure 2. Schematic representation of the three perovskite solar cell architectures present in our database: mesoporous (M), planar (P), and inverted planar (IP).

We show the number of data points in our database belonging to each family and architecture in Figure 3, where one can observe how the database presents a heterogeneous distribution of perovskite family/architecture. Our model accounts for this heterogeneity, and each PSC has a one-hot encoded44 vector of length 6 indicating its perovskite family and a one-hot encoded vector of length 3 indicating its device architecture. However, to isolate the effect of the HTM on the PSC efficiency, we have considered a subset of the total database, in which we only consider the most numerous class involving the same perovskite family and device architecture. Thus, we are considering two different databases:

  • Heterogeneous database. A total of 269 PSCs, with six different families of perovskites and three possible architectures.

  • Homogeneous database. A total of 100 PSCs, with the standard MAPbX3 perovskite and a mesoporous architecture.
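The encoding and the subset selection described above can be sketched in a few lines. This is a minimal illustration only: the label lists `FAMILIES` and `ARCHITECTURES` and the helper names are assumptions made for the example and are not taken from the published repository.

```python
# Hypothetical label lists: the text defines six perovskite families
# and three device architectures (M, P, IP).
FAMILIES = [1, 2, 3, 4, 5, 6]
ARCHITECTURES = ["M", "P", "IP"]


def one_hot(value, categories):
    """Return a list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_cell(family, architecture):
    """Return the family vector (length 6) and the architecture vector (length 3)."""
    return one_hot(family, FAMILIES), one_hot(architecture, ARCHITECTURES)


def is_homogeneous(family, architecture):
    """Membership test for the homogeneous subset (family 1, mesoporous)."""
    return family == 1 and architecture == "M"


fam_vec, arch_vec = encode_cell(1, "M")
print(fam_vec, arch_vec)
```

Keeping the two vectors separate, rather than concatenating them, makes it possible to assign each one its own distance and weight later on.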

Figure 3. Number of data points of each perovskite family and architecture in our database.

2.2. HTM Features

Each HTM molecular structure is encoded into a SMILES string (a simplified molecular-input line-entry system that defines the structure of a chemical species as a short text string).45 The SMILES of each HTM is used to calculate different HTM features:

  • Molecular fingerprints: The chemical structure of the HTMs is expected to influence the efficiency of the solar cell, which motivates us to use their Morgan fingerprints.46 These fingerprints are bit-vectors indicating the presence or absence of specific fragments within the HTM, and thus can be used to describe the chemical similarity between our HTMs. We have used a radius of 2 and a length of 2048 bits per fingerprint.

  • Structural features: We explicitly included additional structural features that might affect the efficiency of the cell. We show in Section S1 in the SI the list of the 32 structural features we initially considered, which include the number of specific chemical groups (e.g., porphyrin, triphenylamine (TPA), carbazole, etc.) and other structural properties obtainable from the HTM SMILES (e.g., number of rotatable bonds, number of stereocenters, number of heteroatoms, etc.). As shown in the SI, some of these features are strongly correlated, and we reduced the number of relevant structural features to 24 by eliminating those with a high correlation (r > 0.7). After performing a recursive feature elimination, we obtain the best performance when using 12 structural features with the heterogeneous database (number of phthalocyanine groups, number of porphyrin groups, number of carbazole groups, number of triphenylamine groups, whether HTM is a polymer or not, number of acenaphthene groups, number of benzotrithiophene groups, number of silicon atoms, number of rotatable bonds, number of aliphatic carbocycles, number of aliphatic heterocycles and number of stereocenters), and using nine structural features with the homogeneous database (number of spiro atoms, number of carbazole groups, number of triphenylamine groups, number of acenaphthene groups, number of benzotrithiophene groups, number of sp3 carbon atoms, number of aliphatic heterocycles, number of aromatic heterocycles and molecular planarity—calculated as the sum of atomic distances from the plane of best fit).47

  • Electronic features: To account for possible limitations of the fingerprints and structural features, which are based just on the molecular topology and connectivity, we have included features obtained from the HTM electronic structure. Initially, we considered four electronic features: (i) highest occupied molecular orbital (HOMO) energy (EHOMO), which is expected to be correlated to the PCE; (ii) lowest unoccupied molecular orbital (LUMO) energy (ELUMO), which might be relevant since the HTM ELUMO should be higher than the conduction band of the perovskite to act as an electron blocker; (iii) ionization potential (IP), which might affect the generation of a hole; and (iv) reorganization energy for oxidation (λ), which largely influences the charge transport behavior of the HTM. As shown in Figure S5 in the SI, the IP and EHOMO are (as expected) strongly correlated, so we finally only considered EHOMO, ELUMO, and λ. Starting from each HTM SMILES, we optimized the most stable conformer at the semiempirical PM7 level,48 as described in Section S2 in the SI, and we used that geometry as a starting point to optimize the geometry of the neutral molecule and cation at the B3LYP/3-21G* level, which has been shown to reproduce similar trends to larger basis sets.49 We then performed single-point calculations of the neutral molecule and cation using CH2Cl2 as a solvent within the polarized continuum model (PCM) at the B3LYP/6-31G** level, with the Gaussian16 software.50

  • HTM additives: It is common to dope the HTMs with different additives to increase their hole mobility. We have considered the presence or absence of 10 different additives for each HTM in our database: Li-TFSi, t-BP, FK102, FK209, FK269, F4-TCNQ, MY11, Et4N-TFSi, H-TFSi, and 2-Py. These additives have been one-hot encoded, so each data point in the database has a binary array of length 10 describing its HTM additives.
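The correlation-based pruning of structural features mentioned above (dropping features with r > 0.7) could be done along the following lines. This is a sketch under stated assumptions: the greedy keep-first strategy, the use of |r| rather than signed r, and the feature names and values are all illustrative and are not taken from the SI.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for constant features


def drop_correlated(features, threshold=0.7):
    """Greedy filter: keep a feature only if |r| with every kept feature
    is at or below `threshold`. `features` maps name -> list of values."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values for five hypothetical HTMs.
features = {
    "n_heteroatoms": [4, 6, 8, 10, 12],
    "n_nitrogen": [2, 3, 4, 5, 6],        # perfectly correlated with n_heteroatoms
    "n_rotatable_bonds": [5, 1, 4, 2, 3],
}
kept = drop_correlated(features)
print(kept)
```

With a greedy filter the result depends on the order in which features are visited, so a different ordering may retain a different member of each correlated group.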

2.3. Machine Learning Model

We have used Kernel Ridge Regression (KRR), as implemented in Scikit-learn,51 to build our predictive model. The KRR algorithm has been previously used to predict other properties of PSCs52,53 and organic solar cells,54,55 being particularly effective for multidimensional datasets with a limited amount of data. The algorithm combines ridge regression with the kernel trick and is able to approximate nonlinear functions. Given a training set defined by {xi} features and y target property values (PCE in our case), the target property of a new data point (y′) is approximated as

$$y' = \mathbf{k}^{T} \left( \mathbf{K} + \alpha \mathbf{I} \right)^{-1} \mathbf{y} \qquad (1)$$

where α is the regularization parameter, I is the identity matrix, K is the kernel matrix defined as Ki,j= f(xi,xj), and ki= f(xi, x′). The function f is the kernel function, which effectively measures the similarity between different data points in the feature space. Our kernel function is defined as:

[Equation 2: kernel function f(xi,xj)]

where the γi parameters indicate the weight of each distance, and the Di(xi,xj) terms measure the distance of the different features:

  • Dfam: measured as the Euclidean distance between the one-hot-encoded vectors indicating the perovskite family of each PSC.

  • Darch: measured as the Euclidean distance between the one-hot-encoded vectors indicating the device architecture of each PSC.

  • Dfp: measured from the Tanimoto similarity index (T)56 between two fingerprint vectors as $D_{fp}(x_i, x_j) = 1 - T(x_i^{fp}, x_j^{fp})$.

  • Dstr: measured as the Euclidean distance between the vectors containing all structural features (after standardizing them to have a mean value of 0 and a standard deviation of 1).

  • Dadd: measured as the Euclidean distance between the one-hot-encoded vectors indicating the presence/absence of each HTM additive.

  • Delec: measured as the Euclidean distance between the vectors containing all electronic features (after standardizing them to have a mean value of 0 and a standard deviation of 1).
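To make the role of the weights and distances concrete, the sketch below combines the six distances into a single similarity value. The Gaussian-type form exp(−Σ γ_k D_k²) is an assumption made for this illustration and is not a statement of the kernel defined in the paper. The dictionary layout and example values are likewise hypothetical, and the structural and electronic vectors are assumed to be standardized already.

```python
import math


def tanimoto_distance(bits_a, bits_b):
    """1 - T, with T = |A and B| / |A or B| for sets of 'on' bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return 1.0 - len(bits_a & bits_b) / union


def euclidean(u, v):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kernel(cell_i, cell_j, gammas):
    """Assumed Gaussian-type similarity: exp(-sum_k gamma_k * D_k**2)."""
    distances = {
        "fam": euclidean(cell_i["fam"], cell_j["fam"]),
        "arch": euclidean(cell_i["arch"], cell_j["arch"]),
        "fp": tanimoto_distance(cell_i["fp"], cell_j["fp"]),
        "str": euclidean(cell_i["str"], cell_j["str"]),
        "add": euclidean(cell_i["add"], cell_j["add"]),
        "elec": euclidean(cell_i["elec"], cell_j["elec"]),
    }
    return math.exp(-sum(gammas[k] * d ** 2 for k, d in distances.items()))


# Two made-up cells; "fp" holds the indices of the bits that are set.
cell_a = {"fam": [1, 0, 0, 0, 0, 0], "arch": [1, 0, 0], "fp": {1, 2, 3},
          "str": [0.5, -1.0], "add": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
          "elec": [0.1, -0.2, 0.3]}
cell_b = {"fam": [0, 0, 1, 0, 0, 0], "arch": [0, 1, 0], "fp": {2, 3, 4},
          "str": [-0.5, 0.2], "add": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
          "elec": [0.4, 0.0, -0.1]}
gammas = {"fam": 0.1, "arch": 0.1, "fp": 1.0, "str": 0.1, "add": 0.1, "elec": 0.1}
print(kernel(cell_a, cell_b, gammas))
```

In a form like this, setting a weight to zero removes the corresponding feature group from the similarity measure, which is one way to switch feature groups on and off.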

Note that having these different features with different weights in the kernel allows us to see their contribution to the final model, as well as control which features are considered in the model since they represent unique properties and they might have different availability. In our model, we have used 90% of our data as a training set and used the other randomly chosen 10% of our dataset as a test set. We have used a leave-one-out cross-validation (LOO-CV) to optimize the γi parameters and the regularization parameter α, by minimizing the root-mean-square error (rmse) of our prediction in the training set. Finally, the optimized parameters are used to evaluate the PCE of the data points in the test set. The parameters of the model are optimized using a differential evolution algorithm.57 The details of this procedure, along with the optimized parameters, are shown in Section S3 in the SI.
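The training loop described here (closed-form KRR prediction plus leave-one-out cross-validation over the hyperparameters) can be illustrated with a dependency-free toy example. Several simplifications are assumed: a one-dimensional Gaussian kernel with a single γ stands in for the multi-distance kernel, a small grid search replaces the differential evolution optimizer, and the data are synthetic.

```python
import math


def gaussian_kernel(a, b, gamma):
    """Toy 1-D kernel; stands in for the multi-distance kernel of the model."""
    return math.exp(-gamma * (a - b) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def krr_predict(x_train, y_train, x_new, gamma, alpha):
    """y' = k^T (K + alpha I)^-1 y for a single new point."""
    n = len(x_train)
    K = [[gaussian_kernel(x_train[i], x_train[j], gamma) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    weights = solve(K, y_train)  # (K + alpha I)^-1 y
    return sum(w * gaussian_kernel(xi, x_new, gamma) for w, xi in zip(weights, x_train))


def loo_rmse(x, y, gamma, alpha):
    """Leave-one-out rmse: each point is predicted from all the others."""
    sq = 0.0
    for i in range(len(x)):
        pred = krr_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], x[i], gamma, alpha)
        sq += (pred - y[i]) ** 2
    return math.sqrt(sq / len(x))


# Synthetic data and a small hyperparameter grid (both illustrative).
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [math.sin(v) for v in x]
grid = [(g, a) for g in (0.1, 1.0, 10.0) for a in (1e-3, 1e-1, 1.0)]
best_gamma, best_alpha = min(grid, key=lambda p: loo_rmse(x, y, *p))
print(best_gamma, best_alpha)
```

A grid search is only practical for a couple of hyperparameters. With six γ weights plus α, a global optimizer such as differential evolution is a more natural choice.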

3. Results and Discussion

We first discuss our ML model with the heterogeneous database. In Figure 4, we show the predicted PCE and experimental PCE values for the 242 points in the training set, as well as the 27 points in the test set. Both sets result in a very similar rmse = 3.0% and Pearson correlation (rtrain = 0.68 and rtest = 0.72). These results are very promising and suggest that one could use this model to screen unknown HTM candidates. However, the errors for some of these points are relatively large (>5%), so we also discuss how one can use this ML model to make qualitative predictions that might accelerate the screening of novel HTM candidates. For example, we can split our test set into two groups: top-performance HTMs (14 points with the highest PCE) and lowest-performance HTMs (13 points with the lowest PCE). Then, we can rank our predictions and check whether the HTMs in the experimental top-14 are also in the predicted top-14. We show both the experimental and predicted PCE values in Table 1, where the HTM number (HTM no.) corresponds to the list shown in Figure S7. Our model correctly predicts 12 out of the top-14 HTMs as top-performers and 11 out of the lowest-13 HTMs as lowest-performers.

Figure 4. Experimental and predicted PCE of data in the heterogeneous database. Blue points correspond to data in the training set, and orange points correspond to data in the test set.

Table 1. Experimental and Predicted PCE Values for Each of the HTMs in the Test Set of the Heterogeneous Database (a)

HTM no.   experimental PCE (%)   predicted PCE (%)   experimental rank order   predicted rank order
9         21.04                  18.72               1                         1
15        20.56                  15.29               2                         15*
16        20.38                  17.66               3                         5
36        19.42                  17.31               4                         6
42        19.27                  16.35               5                         9
67        18.48                  16.75               6                         7
72        18.40                  18.05               7                         4
81        18.17                  18.60               8                         2
91        18.03                  16.32               9                         10
100       17.76                  15.94               10                        12
106       17.60                  14.44               11                        20*
121       17.00                  15.83               12                        13
125       16.90                  18.44               13                        3
126       16.90                  15.35               14                        14
137       16.50                  10.96               15                        26
151       16.00                  15.28               16                        16
162       15.46                  16.38               17                        8*
166       15.40                  13.66               18                        22
177       14.20                  15.23               19                        17
185       13.90                  14.68               20                        18
213       11.90                  12.65               21                        23
217       11.51                  11.17               22                        25
224       11.00                  14.02               23                        21
229       10.80                  14.61               24                        19
237       9.67                   16.32               25                        11*
265       5.60                   9.19                26                        27
268       5.20                   11.51               27                        24

(a) Ranked order for experimental and predicted values corresponds to the sorted values in decreasing order. Values marked with * are incorrectly categorized as top- or lowest-performer.
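The top/lowest classification counts quoted in the text can be checked directly from the values in Table 1. The short script below is only an illustrative check, with a hypothetical helper name, and is not the authors' analysis code.

```python
# (HTM no., experimental PCE %, predicted PCE %) rows copied from Table 1.
table1 = [
    (9, 21.04, 18.72), (15, 20.56, 15.29), (16, 20.38, 17.66),
    (36, 19.42, 17.31), (42, 19.27, 16.35), (67, 18.48, 16.75),
    (72, 18.40, 18.05), (81, 18.17, 18.60), (91, 18.03, 16.32),
    (100, 17.76, 15.94), (106, 17.60, 14.44), (121, 17.00, 15.83),
    (125, 16.90, 18.44), (126, 16.90, 15.35), (137, 16.50, 10.96),
    (151, 16.00, 15.28), (162, 15.46, 16.38), (166, 15.40, 13.66),
    (177, 14.20, 15.23), (185, 13.90, 14.68), (213, 11.90, 12.65),
    (217, 11.51, 11.17), (224, 11.00, 14.02), (229, 10.80, 14.61),
    (237, 9.67, 16.32), (265, 5.60, 9.19), (268, 5.20, 11.51),
]


def top_bottom_hits(rows, n_top):
    """Count HTMs placed in the same group (top or lowest) by both rankings."""
    by_exp = [r[0] for r in sorted(rows, key=lambda r: r[1], reverse=True)]
    by_pred = [r[0] for r in sorted(rows, key=lambda r: r[2], reverse=True)]
    top_hits = len(set(by_exp[:n_top]) & set(by_pred[:n_top]))
    low_hits = len(set(by_exp[n_top:]) & set(by_pred[n_top:]))
    return top_hits, low_hits


top_hits, low_hits = top_bottom_hits(table1, n_top=14)
print(top_hits, low_hits)  # prints "12 11"
```

The result matches the 12 of 14 top-performers and 11 of 13 lowest-performers reported in the text. The two tied predictions (16.32) both fall inside the predicted top-14, so the tie does not affect the counts.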

However, it should be noted that the perovskite families 2–6 only have a few data points, as shown in Figure 3. In addition, families 2–6 tend to have a narrow PCE distribution, as shown in Figure 5, which means that simply predicting the average performance of these families would return accurate results, even when ignoring the effect of the HTM. These data clusters are not uncommon in experimental datasets since negative results are rarely reported and datasets tend to have clusters of high-performing points, which affects the performance of the predictive model.58 There have been attempts to mitigate these biases, like including data from failed experiments,59 optimizing the design of experiments,60 or adopting novel frameworks.61,62 In our case, we can reduce the complexity of the problem and isolate the effect of the HTM simply by working with our homogeneous dataset, which corresponds to a data subset with the most numerous class—family 1 (MAPbX3) and mesoporous architecture. Note that if more data becomes available for some of the less populated families and architectures, one could also try to improve the performance by doing a more sophisticated encoding of the perovskite layer and considering additional features with its composition and properties.

Figure 5. PCE distribution in our heterogeneous database as a function of the perovskite family.

When working with the homogeneous database, we reoptimize the model parameters to minimize the rmse in the training set. As in the case of the heterogeneous database, the training and test sets produce very similar rmse values (rmsetrain = 2.8% and rmsetest = 2.7%) and Pearson correlations (rtrain = 0.59 and rtest = 0.57), as shown in Figure 6.

Figure 6. Experimental and predicted PCE of data in the homogeneous database. Blue points correspond to data in the training set, and orange points correspond to data in the test set.

Although the prediction error for some of these HTMs is not negligible, the model still returns a moderate correlation when predicting the efficiency of unknown cells (r = 0.57). We show in Table 2 the experimental and predicted PCE values of the 10 HTMs in the test set of the homogeneous database. As we did with the heterogeneous database, we can split our test set into top-performers and lowest-performers (top-5 HTMs and lowest-5 HTMs, respectively). Then, if we rank our predicted PCE values, we can observe that four out of five of our top-5 predictions are within the experimental top-5. Similarly, four out of five of the lowest-5 predictions are within the experimental lowest-5. The chemical structures for these 10 HTMs are shown in Figure 7.

Table 2. Experimental and Predicted PCE Values for Each of the HTMs in the Test Set of Our Homogeneous Database (a)

HTM no.   experimental PCE (%)   predicted PCE (%)   experimental rank order   predicted rank order
109       17.50                  13.03               1                         3
134       16.60                  10.80               2                         6*
172       14.90                  14.72               3                         1
200       13.00                  12.30               4                         5
205       12.40                  12.94               5                         4
206       12.30                  8.81                6                         10
223       11.00                  13.16               7                         2*
235       9.70                   10.60               8                         7
247       8.62                   9.21                9                         8
257       7.40                   8.94                10                        9

(a) Ranked order for experimental and predicted values corresponds to their respective PCE values sorted in decreasing order. Values marked with * are incorrectly categorized as top- or lowest-performer.
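As a consistency check, the test-set rmse and Pearson correlation for the homogeneous database can be recomputed from the ten rows of Table 2. The helper functions below are written for this example only.

```python
import math

# (HTM no., experimental PCE %, predicted PCE %) rows copied from Table 2.
table2 = [
    (109, 17.50, 13.03), (134, 16.60, 10.80), (172, 14.90, 14.72),
    (200, 13.00, 12.30), (205, 12.40, 12.94), (206, 12.30, 8.81),
    (223, 11.00, 13.16), (235, 9.70, 10.60), (247, 8.62, 9.21),
    (257, 7.40, 8.94),
]


def rmse(y_true, y_pred):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


experimental = [row[1] for row in table2]
predicted = [row[2] for row in table2]
test_rmse = rmse(experimental, predicted)
test_r = pearson_r(experimental, predicted)
print(f"rmse = {test_rmse:.1f}%, r = {test_r:.2f}")  # prints "rmse = 2.7%, r = 0.57"
```

Both values agree with the test-set figures quoted in the text (rmse = 2.7% and r = 0.57).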

Figure 7. Chemical structures of the 10 HTMs used as a test set to evaluate our ML model with the homogeneous database.

Finally, it is worth noting that, because our predictive model is formed by several types of features (e.g., fingerprints, structural and electronic features), we can explore how our predictions are affected when using only one of these features at a time (along with the HTM additives, to account for data points that might have the same HTM, but a different additive, affecting the PCE). We show these cases in Table 3, along with the prediction results when combining all features. We observe how using only structural or electronic features results in large rmse and low r, which indicates that they are not suitable to predict PCE by themselves. However, when we use only fingerprints, the rmse and r improve significantly, which tells us that this simple feature can account for most of the correlation in our data, and one could build a reasonably accurate model with just the fingerprints, which has also been observed in other organic photovoltaic materials.54 Combining all features results in a marginal increase of the model performance, although it comes at the cost of adding more computationally expensive features. The optimized parameters and predicted values are shown in Section S3 in the SI.

Table 3. Values of rmse and r When Training the Model with Different Sets of Features with the Homogeneous Database (a)

features                                             rmse (%)    r
fingerprint + additives                              2.8 (2.8)   0.59 (0.54)
structural + additives                               6.9 (3.9)   0.19 (0.27)
electronic + additives                               4.1 (5.4)   0.40 (0.65)
fingerprint + structural + electronic + additives    2.8 (2.7)   0.59 (0.57)

(a) Values outside the parentheses correspond to the training set, and those within parentheses correspond to the test set.

Since most of our model performance is due to the information encoded in the fingerprint, we performed an analysis to detect if any fragment is significantly correlated with PCE. For each of the 2048 bits in the fingerprints, we studied the correlation between its presence in each molecule of the dataset and the corresponding PCE value with a point biserial correlation coefficient, r, which is equivalent to Pearson’s correlation when one variable is binary.
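The equivalence between the point biserial coefficient and Pearson’s r can be illustrated numerically. In the sketch below, the bit flags and PCE values are invented for the example and do not come from the database. The point biserial formula uses the population standard deviation of all PCE values.

```python
import math


def point_biserial(bit, pce):
    """r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n**2), where M1 and M0 are the
    mean PCE with and without the fragment and s_n is the population std."""
    with_frag = [p for b, p in zip(bit, pce) if b == 1]
    without = [p for b, p in zip(bit, pce) if b == 0]
    n1, n0, n = len(with_frag), len(without), len(pce)
    mean_all = sum(pce) / n
    s_n = math.sqrt(sum((p - mean_all) ** 2 for p in pce) / n)
    m1, m0 = sum(with_frag) / n1, sum(without) / n0
    return (m1 - m0) / s_n * math.sqrt(n1 * n0 / n ** 2)


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented data: 1 means the fragment bit is set for that molecule.
bit = [1, 1, 0, 1, 0, 0, 1, 0]
pce = [17.5, 16.0, 12.0, 14.5, 9.0, 11.0, 13.0, 10.5]

r_pb = point_biserial(bit, pce)
r_pearson = pearson_r(bit, pce)
print(r_pb, r_pearson)
```

Because the two quantities coincide, a general-purpose Pearson routine applied to the 0/1 bit column gives the same coefficient. Repeating this for each of the 2048 bits yields one coefficient per fragment.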

When we focus on the homogeneous database, we observe that there are 19 bits with |r| > 0.30, which are shown in Figure 8. In Section S4 in the SI, we show the p-value corresponding to this measurement (below 0.003 in all cases) and the 95% confidence interval for r, showing that there is statistical significance to these trends. We also show in the SI the results of a Mann–Whitney U test for each of these fragments, which measures the statistical difference between the PCE values of the molecules without the fragment and the PCE values of the molecules with the fragment, confirming that there is a statistically significant correlation between the presence of these fragments and the PCE values.

Figure 8. Fragments corresponding to the bits of the fingerprint that have a correlation coefficient r > 0.30 (top) and r < −0.30 (bottom). Bit representations were obtained with the RDKit package.64

We can observe in Figure 8 that most of the bits with a positive correlation correspond to conjugated fragments with oxygen and nitrogen (e.g., arylamine and aryloxy groups) that form part of the triphenylamine (TPA) or diphenylamine (DPA) peripheral units present in many HTMs. This statistical analysis performed over many HTMs therefore supports the empirical trend that arylamine units with alkoxy chains are important for efficient HTMs for PSC devices. On the other hand, it is worth noting that the fragments with a negative correlation with PCE are alkane chains (bit #740, #2, and #1445) or conjugated fragments with sulfur (bit #145, #1568, #123, and #1936) like thiophene rings. This is an interesting trend: although two out of the five molecules with the highest PCE in the homogeneous database contain thiophene rings, the average PCE of molecules with thiophene rings in the homogeneous database is 11.7%, which is lower than the overall average PCE for this database (12.3%). This means that even if there is a significant percentage of top-performers with thiophene rings, only a small portion of molecules with this fragment are top-performers, and we observe a slight negative correlation between this group and the cell’s performance. Importantly, this negative correlation contrasts with the general conception, extracted from only a few HTM systems, that sulfur-containing conjugated systems are good HTM candidates due to favorable sulfur–perovskite interactions.63

The results for the heterogeneous database are very similar, and 13 of the 20 most correlated bits are the same as those observed in the homogeneous database. In this case, only one fragment (#1097) has a value of |r| > 0.3, although there are a total of 27 fragments with a correlation of |r| > 0.2, as shown in the SI.

4. Conclusions

We have gathered data on 269 perovskite solar cells, including their PCE, perovskite family, device architecture, and multiple HTM features from the literature. Constructing PSCs, or even performing accurate electronic structure calculations of the HTMs, might be very resource- and time-consuming. Therefore, we have used our database to train a predictive ML model with reasonably good accuracy, which suggests that this model would be a useful HTM screening tool. Moreover, we have shown how this model can also be used to correctly identify most of the top-performing and lowest-performing HTMs. We have discussed how the performance of the predictive model is overestimated by data clusters for different perovskite families and architectures, although one can still gain insight from predictions made for HTMs corresponding to a heterogeneous distribution of perovskite families and architectures. Finally, we have shown the correlation between specific molecular fragments and PCE: arylamine and aryloxy groups have a relatively large positive correlation with PCE, whereas thiophene groups have a negative correlation with PCE. In essence, this approach enables researchers to discriminate between apparent correlations emerging from very limited observations and statistically meaningful correlations consistent with the entire dataset. We expect this type of analysis to become increasingly powerful as more data are added to the provided dataset and more sophisticated descriptors are used, helping to increase the prediction generalizability and reduce bias effects.

Acknowledgments

The authors acknowledge the financial support of the EPSRC and ERC (grant no. 101020369) and the EPSRC vacation internship program. They are also grateful to the Spanish Ministry of Science and Innovation (MICINN) (projects PID2021-128569NB-I00, RED2018-102815-T, RED2018-102331-T, and RyC-2017-23500) and the Generalitat Valenciana (PROMETEO/2020/077).

Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jpcc.2c04725.

  • List of features used in the model, additional details about the conformer search and ML model, k-nearest neighbor results, extra information on the correlation between chemical fragments and PCE, and chemical structures in the database (PDF)

The authors declare no competing financial interest.


References

  1. Kojima A.; Teshima K.; Shirai Y.; Miyasaka T. Organometal Halide Perovskites as Visible-Light Sensitizers for Photovoltaic Cells. J. Am. Chem. Soc. 2009, 131, 6050–6051. 10.1021/ja809598r. [DOI] [PubMed] [Google Scholar]
  2. Min H.; Lee D. Y.; Kim J.; Kim G.; Lee K. S.; Kim J.; Paik M. J.; Kim Y. K.; Kim K. S.; Kim M. G.; et al. Perovskite Solar Cells with Atomically Coherent Interlayers on SnO2 Electrodes. Nature 2021, 598, 444–450. 10.1038/s41586-021-03964-8. [DOI] [PubMed] [Google Scholar]
  3. Brenner T. M.; Egger D. A.; Kronik L.; Hodes G.; Cahen D. Hybrid Organic—Inorganic Perovskites: Low-Cost Semiconductors with Intriguing Charge-Transport Properties. Nat. Rev. Mater. 2016, 1, 15007 10.1038/natrevmats.2015.7. [DOI] [Google Scholar]
  4. Yang W. S.; Park B.-W.; Jung E. H.; Jeon N. J.; Kim Y. C.; Lee D. U.; Shin S. S.; Seo J.; Kim E. K.; Noh J. H.; Seok S. I. Iodide Management in Formamidinium-Lead-Halide–Based Perovskite Layers for Efficient Solar Cells. Science 2017, 356, 1376–1379. 10.1126/science.aan2301. [DOI] [PubMed] [Google Scholar]
  5. Jeon N. J.; Noh J. H.; Yang W. S.; Kim Y. C.; Ryu S.; Seo J.; Seok S. I. Compositional Engineering of Perovskite Materials for High-Performance Solar Cells. Nature 2015, 517, 476–480. 10.1038/nature14133.
  6. Saliba M.; Matsui T.; Seo J.-Y.; Domanski K.; Correa-Baena J.-P.; Nazeeruddin M. K.; Zakeeruddin S. M.; Tress W.; Abate A.; Hagfeldt A.; et al. Cesium-containing triple cation perovskite solar cells: improved stability, reproducibility and high efficiency. Energy Environ. Sci. 2016, 9, 1989–1997. 10.1039/C5EE03874J.
  7. Tiep N. H.; Ku Z.; Fan H. J. Recent Advances in Improving the Stability of Perovskite Solar Cells. Adv. Energy Mater. 2016, 6, 1501420. 10.1002/aenm.201501420.
  8. Boyd C. C.; Cheacharoen R.; Leijtens T.; McGehee M. D. Understanding Degradation Mechanisms and Improving Stability of Perovskite Photovoltaics. Chem. Rev. 2019, 119, 3418–3451. 10.1021/acs.chemrev.8b00336.
  9. Calió L.; Kazim S.; Grätzel M.; Ahmad S. Hole-Transport Materials for Perovskite Solar Cells. Angew. Chem., Int. Ed. 2016, 55, 14522–14545. 10.1002/anie.201601757.
  10. Ameen S.; Rub M. A.; Kosa S. A.; Alamry K. A.; Akhtar M. S.; Shin H.-S.; Seo H.-K.; Asiri A. M.; Nazeeruddin M. K. Perovskite Solar Cells: Influence of Hole Transporting Materials on Power Conversion Efficiency. ChemSusChem 2016, 9, 10–27. 10.1002/cssc.201501228.
  11. Zhang L.; Zhou X.; Liu C.; Wang X.; Xu B. A Review on Solution-Processable Dopant-Free Small Molecules as Hole-Transporting Materials for Efficient Perovskite Solar Cells. Small Methods 2020, 4, 2000254. 10.1002/smtd.202000254.
  12. Urieta-Mora J.; García-Benito I.; Molina-Ontoria A.; Martín N. Hole Transporting Materials for Perovskite Solar Cells: A Chemical Approach. Chem. Soc. Rev. 2018, 47, 8541–8571. 10.1039/C8CS00262B.
  13. Rakstys K.; Igci C.; Nazeeruddin M. K. Efficiency vs. Stability: Dopant-Free Hole Transporting Materials towards Stabilized Perovskite Solar Cells. Chem. Sci. 2019, 10, 6748–6769. 10.1039/C9SC01184F.
  14. Low P. J.; Paterson M. A. J.; Yufit D. S.; Howard J. A. K.; Cherryman J. C.; Tackley D. R.; Brook R.; Brown B. Towards an Understanding of Structure–Property Relationships in Hole-Transport Materials: The Influence of Molecular Conformation on Oxidation Potential in Poly(Aryl)Amines. J. Mater. Chem. 2005, 15, 2304–2315. 10.1039/b417962e.
  15. Zimmermann I.; Urieta-Mora J.; Gratia P.; Aragó J.; Grancini G.; Molina-Ontoria A.; Ortí E.; Martín N.; Nazeeruddin M. K. High-Efficiency Perovskite Solar Cells Using Molecularly Engineered, Thiophene-Rich, Hole-Transporting Materials: Influence of Alkyl Chain Length on Power Conversion Efficiency. Adv. Energy Mater. 2017, 7, 1601674. 10.1002/aenm.201601674.
  16. Calbo J.; Viruela R.; Aragó J.; Ortí E. Theoretical Insights into the Structural, Electronic and Optical Properties of Benzotrithiophene-Based Hole-Transporting Materials. Theor. Chem. Acc. 2017, 136, 73. 10.1007/s00214-017-2100-4.
  17. Ashassi-Sorkhabi H.; Salehi-Abar P. Design of Two Novel Hole Transport Materials via Replacing the Core of Spiro-OMeTAD with Tetrathiafulvalene and Tetraazafulvalene for Application in Perovskite Solar Cells. Sol. Energy 2018, 173, 132–138. 10.1016/j.solener.2018.07.047.
  18. García-Benito I.; Zimmermann I.; Urieta-Mora J.; Aragó J.; Calbo J.; Perles J.; Serrano A.; Molina-Ontoria A.; Ortí E.; Martín N.; et al. Heteroatom Effect on Star-Shaped Hole-Transporting Materials for Perovskite Solar Cells. Adv. Funct. Mater. 2018, 28, 1801734. 10.1002/adfm.201801734.
  19. Qiu M.; Pei W.; Lu Q.; Li Z.; Li Y.; Liang J. DFT Characteristics of Charge Transport in DBTP-Based Hole Transport Materials. Appl. Sci. 2019, 9, 2244. 10.3390/app9112244.
  20. Naqvi S.; Patra A. Hole Transport Materials for Perovskite Solar Cells: A Computational Study. Mater. Chem. Phys. 2021, 258, 123863. 10.1016/j.matchemphys.2020.123863.
  21. Saeki A.; Kranthiraja K. A High Throughput Molecular Screening for Organic Electronics via Machine Learning: Present Status and Perspective. Jpn. J. Appl. Phys. 2019, 59, SD0801. 10.7567/1347-4065/ab4f39.
  22. Lin Y.-C.; Lu Y.-J.; Tsao C.-S.; Saeki A.; Li J.-X.; Chen C.-H.; Wang H.-C.; Chen H.-C.; Meng D.; Wu K.-H.; et al. Enhancing Photovoltaic Performance by Tuning the Domain Sizes of a Small-Molecule Acceptor by Side-Chain-Engineered Polymer Donors. J. Mater. Chem. A 2019, 7, 3072–3082. 10.1039/C8TA11059J.
  23. Mahmood A.; Wang J.-L. Machine Learning for High Performance Organic Solar Cells: Current Scenario and Future Prospects. Energy Environ. Sci. 2021, 14, 90–105. 10.1039/D0EE02838J.
  24. Kuhar K.; Pandey M.; Thygesen K. S.; Jacobsen K. W. High-Throughput Computational Assessment of Previously Synthesized Semiconductors for Photovoltaic and Photoelectrochemical Devices. ACS Energy Lett. 2018, 3, 436–446. 10.1021/acsenergylett.7b01312.
  25. Herbol H. C.; Hu W.; Frazier P.; Clancy P.; Poloczek M. Efficient Search of Compositional Space for Hybrid Organic–Inorganic Perovskites via Bayesian Optimization. Npj Comput. Mater. 2018, 4, 51. 10.1038/s41524-018-0106-7.
  26. Paterson L.; May F.; Andrienko D. Computer Aided Design of Stable and Efficient OLEDs. J. Appl. Phys. 2020, 128, 160901. 10.1063/5.0022870.
  27. Wang H.; Ji Y.; Li Y. Simulation and Design of Energy Materials Accelerated by Machine Learning. WIREs Comput. Mol. Sci. 2020, 10, e1421. 10.1002/wcms.1421.
  28. Srivastava M.; Howard J. M.; Gong T.; Rebello Sousa Dias M.; Leite M. S. Machine Learning Roadmap for Perovskite Photovoltaics. J. Phys. Chem. Lett. 2021, 12, 7866–7877. 10.1021/acs.jpclett.1c01961.
  29. Yılmaz B.; Yıldırım R. Critical Review of Machine Learning Applications in Perovskite Solar Research. Nano Energy 2021, 80, 105546. 10.1016/j.nanoen.2020.105546.
  30. Tao Q.; Xu P.; Li M.; Lu W. Machine Learning for Perovskite Materials Design and Discovery. Npj Comput. Mater. 2021, 7, 23. 10.1038/s41524-021-00495-8.
  31. Takahashi K.; Takahashi L.; Miyazato I.; Tanaka Y. Searching for Hidden Perovskite Materials for Photovoltaic Systems by Combining Data Science and First Principle Calculations. ACS Photonics 2018, 5, 771–775. 10.1021/acsphotonics.7b01479.
  32. Gladkikh V.; Kim D. Y.; Hajibabaei A.; Jana A.; Myung C. W.; Kim K. S. Machine Learning for Predicting the Band Gaps of ABX3 Perovskites from Elemental Properties. J. Phys. Chem. C 2020, 124, 8905–8918. 10.1021/acs.jpcc.9b11768.
  33. Li W.; Jacobs R.; Morgan D. Predicting the Thermodynamic Stability of Perovskite Oxides Using Machine Learning Models. Comput. Mater. Sci. 2018, 150, 454–463. 10.1016/j.commatsci.2018.04.033.
  34. Lu S.; Zhou Q.; Ouyang Y.; Guo Y.; Li Q.; Wang J. Accelerated Discovery of Stable Lead-Free Hybrid Organic-Inorganic Perovskites via Machine Learning. Nat. Commun. 2018, 9, 3405. 10.1038/s41467-018-05761-w.
  35. Xu L.; Wencong L.; Chunrong P.; Qiang S.; Jin G. Two Semi-Empirical Approaches for the Prediction of Oxide Ionic Conductivities in ABO3 Perovskites. Comput. Mater. Sci. 2009, 46, 860–868. 10.1016/j.commatsci.2009.04.047.
  36. Li L.; You Y.; Hu S.; Shi Y.; Zhao G.; Chen C.; Wang Y.; Stroppa A.; Ren W. Electronic Transport of Organic-Inorganic Hybrid Perovskites from First-Principles and Machine Learning. Appl. Phys. Lett. 2019, 114, 083102. 10.1063/1.5045512.
  37. Odabaşı Ç.; Yıldırım R. Performance Analysis of Perovskite Solar Cells in 2013–2018 Using Machine-Learning Tools. Nano Energy 2019, 56, 770–791. 10.1016/j.nanoen.2018.11.069.
  38. Li J.; Pradhan B.; Gaur S.; Thomas J. Predictions and Strategies Learned from Machine Learning to Develop High-Performing Perovskite Solar Cells. Adv. Energy Mater. 2019, 9, 1901891. 10.1002/aenm.201901891.
  39. Gok E. C.; Yildirim M. O.; Haris M. P. U.; Eren E.; Pegu M.; Hemasiri N. H.; Huang P.; Kazim S.; Uygun Oksuz A.; Ahmad S. Predicting Perovskite Bandgap and Solar Cell Performance with Machine Learning. Sol. RRL 2022, 6, 2100927. 10.1002/solr.202100927.
  40. Anizelli H.; David T. W.; Tyagi P.; Laureto E.; Kettle J. Enhancing the Stability of Perovskite Solar Cells through Functionalisation of Metal Oxide Transport Layers with Self-Assembled Monolayers. Sol. Energy 2020, 203, 157–163. 10.1016/j.solener.2020.04.035.
  41. del Cueto M.; Rawski-Furman C.; Aragó J.; Ortí E.; Troisi A. HTM Effect on PCE. github.com/marcosdelcueto/htm_effect_on_pce (accessed April 26, 2022).
  42. Pitchaiya S.; Natarajan M.; Santhanam A.; Asokan V.; Yuvapragasam A.; Madurai Ramakrishnan V.; Palanisamy S. E.; Sundaram S.; Velauthapillai D. A Review on the Classification of Organic/Inorganic/Carbonaceous Hole Transporting Materials for Perovskite Solar Cell Application. Arabian J. Chem. 2020, 13, 2526–2557. 10.1016/j.arabjc.2018.06.006.
  43. Fakharuddin A.; Schmidt-Mende L.; Garcia-Belmonte G.; Jose R.; Mora-Sero I. Interfaces in Perovskite Solar Cells. Adv. Energy Mater. 2017, 7, 1700623. 10.1002/aenm.201700623.
  44. Harris D.; Harris S. L. Digital Design and Computer Architecture; Morgan Kaufmann, 2010.
  45. Weininger D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. J. Chem. Inf. Comput. Sci. 1988, 28, 31–36. 10.1021/ci00057a005.
  46. Rogers D.; Hahn M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742–754. 10.1021/ci100050t.
  47. Firth N. C.; Brown N.; Blagg J. Plane of Best Fit: A Novel Method to Characterize the Three-Dimensionality of Molecules. J. Chem. Inf. Model. 2012, 52, 2516–2525. 10.1021/ci300293f.
  48. Stewart J. J. P. Optimization of Parameters for Semiempirical Methods VI: More Modifications to the NDDO Approximations and Re-Optimization of Parameters. J. Mol. Model. 2013, 19, 1–32. 10.1007/s00894-012-1667-x.
  49. Padula D.; Omar Ö. H.; Nematiaram T.; Troisi A. Singlet Fission Molecules among Known Compounds: Finding a Few Needles in a Haystack. Energy Environ. Sci. 2019, 12, 2412–2416. 10.1039/C9EE01508F.
  50. Frisch M. J.; Trucks G. W.; Schlegel H. B.; Scuseria G. E.; Robb M. A.; Cheeseman J. R.; Scalmani G.; Barone V.; Petersson G. A.; Nakatsuji H.; et al. Gaussian 16, revision C.01; Gaussian, Inc.: Wallingford, CT, 2016.
  51. Pedregosa F.; Varoquaux G.; Gramfort A.; Michel V.; Thirion B.; Grisel O.; Blondel M.; Prettenhofer P.; Weiss R.; Dubourg V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  52. Kim C.; Pilania G.; Ramprasad R. Machine Learning Assisted Predictions of Intrinsic Dielectric Breakdown Strength of ABX3 Perovskites. J. Phys. Chem. C 2016, 120, 14575–14580. 10.1021/acs.jpcc.6b05068.
  53. Pilania G.; Mannodi-Kanakkithodi A.; Uberuaga B. P.; Ramprasad R.; Gubernatis J. E.; Lookman T. Machine Learning Bandgaps of Double Perovskites. Sci. Rep. 2016, 6, 19375. 10.1038/srep19375.
  54. Zhao Z.-W.; del Cueto M.; Geng Y.; Troisi A. Effect of Increasing the Descriptor Set on Machine Learning Prediction of Small Molecule-Based Organic Solar Cells. Chem. Mater. 2020, 32, 7777–7787. 10.1021/acs.chemmater.0c02325.
  55. Padula D.; Simpson J. D.; Troisi A. Combining Electronic and Structural Features in Machine Learning Models to Predict Organic Solar Cells Properties. Mater. Horiz. 2019, 6, 343–349. 10.1039/C8MH01135D.
  56. Bajusz D.; Rácz A.; Héberger K. Why Is Tanimoto Index an Appropriate Choice for Fingerprint-Based Similarity Calculations? J. Cheminf. 2015, 7, 20. 10.1186/s13321-015-0069-3.
  57. Storn R.; Price K. Differential Evolution – A Simple and Efficient Heuristic for Global Optimization over Continuous Spaces. J. Global Optim. 1997, 11, 341–359. 10.1023/A:1008202821328.
  58. Krawczyk B. Learning from Imbalanced Data: Open Challenges and Future Directions. Prog. Artif. Intell. 2016, 5, 221–232. 10.1007/s13748-016-0094-0.
  59. Raccuglia P.; Elbert K. C.; Adler P. D. F.; Falk C.; Wenny M. B.; Mollo A.; Zeller M.; Friedler S. A.; Schrier J.; Norquist A. J. Machine-Learning-Assisted Materials Discovery Using Failed Experiments. Nature 2016, 533, 73–76. 10.1038/nature17439.
  60. Cao B.; Adutwum L. A.; Oliynyk A. O.; Luber E. J.; Olsen B. C.; Mar A.; Buriak J. M. How To Optimize Materials and Devices via Design of Experiments and Machine Learning: Demonstration Using Organic Photovoltaics. ACS Nano 2018, 12, 7434. 10.1021/acsnano.8b04726.
  61. Kailkhura B.; Gallagher B.; Kim S.; Hiszpanski A.; Han T. Y.-J. Reliable and Explainable Machine-Learning Methods for Accelerated Material Discovery. Npj Comput. Mater. 2019, 5, 108. 10.1038/s41524-019-0248-2.
  62. del Cueto M.; Troisi A. Determining Usefulness of Machine Learning in Materials Discovery Using Simulated Research Landscapes. Phys. Chem. Chem. Phys. 2021, 23, 14156–14163. 10.1039/D1CP01761F.
  63. Saliba M.; Orlandi S.; Matsui T.; Aghazada S.; Cavazzini M.; Correa-Baena J.-P.; Gao P.; Scopelliti R.; Mosconi E.; Dahmen K.-H.; et al. A molecularly engineered hole-transporting material for efficient perovskite solar cells. Nat. Energy 2016, 1, 15017. 10.1038/nenergy.2015.17.
  64. Landrum G. RDKit: Open-Source Cheminformatics Software. https://www.rdkit.org/ (accessed April 25, 2022).

Associated Data

Supplementary Materials

jp2c04725_si_001.pdf (4.8MB, pdf)

Articles from The Journal of Physical Chemistry. C, Nanomaterials and Interfaces are provided here courtesy of American Chemical Society
