Advanced Science. 2026 Mar 24;13(23):e74952. doi: 10.1002/advs.74952

Machine Learning for Designing Perovskites and Perovskite‐Inspired Solar Materials: Emerging Opportunities and Challenges

Yangfan Zhang 1, Yiming Xia 1, Ali Shakiba 1, Hongrui Zhang 1, Xiaojing Hao 1, Priyank V Kumar 2, Mahesh P Suryawanshi 1
PMCID: PMC13104142  PMID: 41874517

ABSTRACT

The development of perovskites and perovskite‐inspired materials (PIMs) is driven by the need for efficient, non‐toxic, and stable solar energy conversion technologies. While halide perovskites exhibit outstanding optoelectronic properties, their practical deployment remains hindered by toxicity concerns and long‐term instability. Conventional experimental and computational approaches, though effective, are often limited by high costs and low throughput, prompting the need for data‐driven strategies. In this review, we provide a comprehensive analysis of machine learning (ML)‐driven approaches for predicting key properties such as bandgap, stability, and lattice constants in perovskite and PIM systems. We outline a complete ML workflow, from target identification and data collection to feature engineering and model selection across supervised, unsupervised, and reinforcement learning frameworks. Special attention is given to the transferability of ML strategies developed for halide perovskites to the more chemically diverse PIM landscape. By highlighting recent progress and current limitations, we provide a critical roadmap for integrating ML into the rational design and discovery of next‐generation non‐toxic, stable solar materials. These insights are expected to accelerate the discovery‐to‐deployment cycle for low‐toxicity, high‐efficiency solar absorbers and catalyze innovation across the broader field of data‐driven energy materials.

Keywords: machine learning, perovskites, perovskite‐inspired materials, solar energy conversion


This review offers a comprehensive comparison between perovskites and perovskite‐inspired materials (PIMs), focusing on their crystal structures, electronic properties, and chemical compositions. It evaluates the applicability of machine learning (ML) descriptors and models across both material classes. The review outlines recent ML advancements, details a complete ML framework for material discovery, and discusses the challenges and future prospects of applying ML to accelerate the development of high‐performance photovoltaic materials.


1. Introduction

Perovskite materials have rapidly advanced as promising photovoltaic (PV) candidates due to their unique crystal structure and excellent optoelectronic characteristics [1]. Lead (Pb)‐based halide perovskites (LHPs), in particular, have received significant attention as PV absorber materials, achieving remarkable progress in power conversion efficiency (PCE) from 3.8% to 27% within just over a decade [2]. These materials offer efficient light absorption, high efficiency, and simple fabrication [3]. However, despite their significant potential, LHPs, including both all‐inorganic and hybrid organic–inorganic perovskites (HOIPs), face critical challenges such as Pb toxicity and poor environmental stability [4, 5]. To address these issues, significant efforts have been made to find safer, lead‐free alternatives, known as perovskite‐inspired materials (PIMs). PIMs have gained significant momentum in recent years as next‐generation PV candidates, offering the potential to combine desirable optoelectronic properties with improved environmental stability and lower toxicity. These materials aim to retain the key advantages of LHPs, such as tunable bandgaps, strong light absorption, and defect tolerance, while addressing critical issues such as Pb toxicity and structural instability under ambient conditions [6, 7]. The design and screening of such materials, however, remain a major challenge due to the vast and complex compositional space they occupy.

Traditional discovery of new energy materials has relied heavily on trial‐and‐error experimentation. Density functional theory (DFT) has emerged as a particularly powerful and widely adopted tool for predicting quantum‐level material properties and has supported the high‐throughput screening of extensive compound databases, thereby reducing reliance on purely experimental methods [8]. Although DFT is considerably faster than many‐body perturbation methods such as GW, its computational cost remains significant compared with more efficient alternatives such as machine‐learned interatomic potentials (MLIPs), especially in high‐throughput workflows involving millions of hypothetical candidates or complex chemical compositions [9, 10]. MLIPs can be trained directly on DFT reference data and then reproduce potential energy surfaces with near‐DFT precision (typically within 5–10 meV atom⁻¹ in energy and 0.05 eV Å⁻¹ in forces) while being 10³–10⁵ times faster than DFT. This enables atomistic simulations involving up to millions of atoms over nanosecond‐to‐microsecond time scales, allowing realistic modeling of processes such as phase transitions, defect dynamics, and thermal transport [11].

Beyond atomistic potential modeling, the broader field of machine learning (ML) has also demonstrated remarkable potential in accelerating materials discovery [8, 12, 13]. ML models, trained on either DFT‐derived or experimental data, can rapidly and accurately predict material properties such as bandgap, formation energy, decomposition energy (Ehull), and lattice constants at a fraction of the computational cost [14, 15, 16, 17, 18, 19]. This enables efficient screening of the vast compositional spaces associated with PIMs, allowing faster identification of non‐toxic, stable, and promising candidates for solar energy conversion. While several excellent reviews have discussed PIM development [20, 21] and the application of ML to halide perovskites [8, 17, 22], no comprehensive review to date has systematically started from the structural and compositional similarities between perovskites and PIMs and then explored the extent to which ML descriptors are shared between these two material families and how ML strategies developed for perovskites can be applied to PIMs.

In this review, we address that gap. We provide a comprehensive analysis of ML applications in perovskites and PIMs, evaluating differences in crystal structures, electronic structures, and chemical compositions. We critically examine the generalizability of ML frameworks, from feature engineering to model selection, and highlight key advances, limitations, and transfer strategies. This review is organized as follows: Section 2 presents structural, dimensional, and compositional insights into perovskites and PIMs. Sections 3–8 outline the complete ML workflow applied to these materials, as illustrated in Figure 1, including algorithms from foundational to state‐of‐the‐art and their performance in terms of usage and accuracy in relevant materials systems. Sections 9 and 10 review recent key progress of ML applications in both the perovskite and PIM classes. Finally, Section 11 discusses open challenges, such as data scarcity, model interpretability, and physics‐informed learning, and proposes future directions for accelerating the ML‐driven discovery of next‐generation PIMs. Given that the manuscript contains abbreviations spanning different fields, we provide a consolidated list of all abbreviations and their definitions in Table S1 to facilitate clarity and ease of reading.

FIGURE 1

A complete workflow of an ML model, structured into problem formulation, data collection or generation, feature engineering, model training, and application; data sources, feature‐engineering methods, and ML algorithms are indicated.
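As a concrete illustration of the supervised branch of this workflow, the sketch below walks through the stages with a toy ridge‐regression model. All descriptor values and target bandgaps are invented placeholders rather than data from this review, and a real study would use curated DFT or experimental datasets and stronger models; the sketch only shows how the stages fit together.

```python
import numpy as np

# Toy sketch of the supervised-learning branch of the ML workflow.
# All numbers below are illustrative placeholders, NOT real data.

# Stages 1-2: problem formulation + data collection. Each row is one
# hypothetical ABX3 entry featurized as (r_A, r_B, r_X, electronegativity
# difference); y holds the target property (bandgap, eV).
X = np.array([
    [1.88, 1.19, 2.20, 0.40],
    [2.17, 1.19, 2.20, 0.40],
    [1.88, 1.19, 1.96, 0.60],
    [2.17, 1.19, 1.96, 0.60],
    [1.88, 1.10, 2.20, 0.50],
    [2.17, 1.10, 1.96, 0.65],
])
y = np.array([1.73, 1.55, 2.30, 2.10, 1.40, 1.90])

# Stage 3: feature engineering. Append the Goldschmidt tolerance factor.
t = (X[:, 0] + X[:, 2]) / (np.sqrt(2) * (X[:, 1] + X[:, 2]))
A = np.column_stack([X, t, np.ones(len(y))])  # bias column

# Stage 4: model training. Ridge regression via the normal equations.
lam = 1e-6
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# Stage 5: application. Screen a new (hypothetical) candidate.
def predict(row):
    r_a, r_b, r_x, chi = row
    t_new = (r_a + r_x) / (np.sqrt(2) * (r_b + r_x))
    return float(np.array([r_a, r_b, r_x, chi, t_new, 1.0]) @ w)

pred = predict([1.88, 1.10, 1.96, 0.60])
print(f"predicted bandgap: {pred:.2f} eV")
```

In practice the model slot is filled by random forests, gradient boosting, or neural networks, as discussed in the algorithm sections of this review.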

2. Structure, Dimension, and Composition of Perovskites and PIMs

With the development of energy‐materials research, perovskites have demonstrated significant potential in PV, photocatalysis (PC), and other fields due to their excellent photoelectric properties, rich chemical diversity, and tunable structure [23, 24, 25, 26, 27]. Building on the three‐dimensional (3D) perovskite framework, extensive efforts have led to the advancement of low‐dimensional counterparts (2D, 1D, and 0D perovskites) [28], which also exhibit remarkable potential for PV applications [29]. Meanwhile, PIMs further enhance the functionality and stability of perovskites by introducing structural modifications or substituting specific elements within the conventional perovskite lattice [21]. Figure 2 illustrates schematic representations of various perovskites and PIMs, highlighting the differences and similarities in their structures and compositions. With the widespread adoption of ML in materials science, an important research direction is the transfer of ML methodologies from perovskites to PIMs, and the structural and compositional similarities between the two classes provide a feasible basis for this transfer. Therefore, before exploring ML strategies in depth, it is essential to analyze the structural and chemical characteristics of both. This section systematically compares the structural and chemical characteristics of perovskites and PIMs, tracing the transition from traditional 3D perovskites to low‐dimensional perovskites and exploring development trends in PIMs. We analyze how PIMs inherit and extend key properties of perovskites and their potential applications in PV. By comparing their structural and chemical properties, we establish a theoretical foundation for the migration of ML strategies, which will be explored further in subsequent sections.

FIGURE 2

Illustration of the crystal structures of different perovskites and PIMs. 2D, 1D, and 0D perovskite structures reproduced with permission [28]. Copyright 2018, Elsevier. Chalcogenide perovskite structure reproduced under the terms of the CC‐BY license [29]. Copyright 2021, The Authors, published by IOP Publishing.

2.1. Perovskites

One of the most widely studied energy materials in recent years is the halide perovskite [30]. Research on halide perovskites for PV applications began in 2009, when Miyasaka and his team first reported the use of the organic–inorganic lead halide perovskite CH3NH3PbI3 (MAPbI3) as a light absorber in dye‐sensitized solar cells, achieving a PCE of about 3.8% [31]. In 2012, Kim and co‐workers reported the first solid‐state perovskite solar cells, with a PCE of 9.7% and 500 h stability for devices stored in air at room temperature without encapsulation [32]; this became a milestone for halide perovskites in PV applications. Afterward, research on perovskite PVs grew remarkably and rapidly, leading to record efficiencies of 27% in recent years [2], approaching the PCE of the most efficient crystalline silicon solar cell (27.3%) [33]. ABX3 is the typical structural formula of a 3D perovskite, where A is a monovalent organic or inorganic cation, B is an octahedrally coordinated divalent cation with a smaller radius, and X is usually oxygen or a monovalent halide such as Cl, Br, or I [34]. Theoretically, most elements in the periodic table can occupy the A or B sites of ABX3 to form perovskites, but the typical perovskite crystal structure can only be maintained when the B‐site cation has a +2 oxidation state and the A‐site ionic radius fits within the crystal structure [29]. In detail, two main factors are commonly used to judge the structural formability and phase stability of a perovskite: the Goldschmidt tolerance factor t [35] and the octahedral factor µ [36], given in Equations (1) and (2), where rA, rB, and rX are the ionic radii of A, B, and X, respectively.

t = (rA + rX) / [√2 (rB + rX)] (1)
µ = rB / rX (2)

Empirically, stable perovskite structures should have t between 0.81 and 1 and µ between 0.44 and 0.90 [37], criteria that only a limited number of compositions satisfy [38]. Meanwhile, recent studies indicate that the accuracy of the Goldschmidt tolerance factor t is insufficient, especially for halide systems [39]. To address this problem, Bartel et al. [40] introduced a new tolerance factor τ, which can be written as

τ = rX/rB − nA [nA − (rA/rB) / ln(rA/rB)] (3)

where nA is the oxidation state of A; τ < 4.18 indicates a perovskite structure. This new descriptor achieves a much higher accuracy (92%) than the traditional t (74%), and the prediction accuracy for halide systems increases from 31% to 91%.
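Equations (1)–(3) are straightforward to evaluate. The sketch below does so for MAPbI3 using commonly quoted Shannon‐type ionic radii; the effective radius of 2.17 Å for MA+ is a literature estimate, so the exact numerical outputs are illustrative rather than authoritative.

```python
import math

# Structure descriptors from Equations (1)-(3); radii in angstroms.

def goldschmidt_t(r_a, r_b, r_x):
    """Eq. (1): classic Goldschmidt tolerance factor."""
    return (r_a + r_x) / (math.sqrt(2) * (r_b + r_x))

def octahedral_mu(r_b, r_x):
    """Eq. (2): octahedral factor."""
    return r_b / r_x

def bartel_tau(r_a, r_b, r_x, n_a):
    """Eq. (3): Bartel's tolerance factor; tau < 4.18 predicts a perovskite."""
    ratio = r_a / r_b
    return r_x / r_b - n_a * (n_a - ratio / math.log(ratio))

# MAPbI3: r_MA ~ 2.17, r_Pb2+ = 1.19, r_I- = 2.20, n_A = 1
t = goldschmidt_t(2.17, 1.19, 2.20)    # ~0.91, inside [0.81, 1]
mu = octahedral_mu(1.19, 2.20)         # ~0.54, inside [0.44, 0.90]
tau = bartel_tau(2.17, 1.19, 2.20, 1)  # ~3.88, below 4.18
print(f"t = {t:.2f}, mu = {mu:.2f}, tau = {tau:.2f}")
```

All three descriptors place MAPbI3 inside the empirical stability windows, consistent with its well‐known perovskite structure.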

Currently, three main types of 3D perovskites are widely researched for PV applications: HOIPs, inorganic perovskites, and double perovskites.

2.1.1. Hybrid Organic Inorganic Perovskites

In addition to the HOIPs discussed above, which have organic cations at the A site, the X site of HOIPs can also be changed from the traditional halide to an organic anion. The introduction of these organic components gives HOIPs functionality and structural flexibility unattainable in inorganic perovskites [39]. At the same time, their diverse structures and chemical spaces provide more opportunities to tune and regulate their physical properties through simple chemical modifications [41]. When the X site is an organic anion, the tolerance factor must be adjusted according to the following equation, where hX is the length of the X‐site molecular ion.

t = (rA + rX) / [√2 (rB + hX/2)] (4)

Moreover, another key characteristic of HOIPs is dynamic movement: the organic cations in HOIPs (occasionally also present at the X site) can undergo rotation, vibration, or multidirectional movement, as can be seen in Figure 3a, leading to a disordered state, whereas inorganic perovskites show only off‐center displacement of the A‐site cations [39]. When the temperature decreases or environmental conditions shift, these organic molecules may become fixed in specific orientations, transitioning into an ordered state. This disorder‐to‐order transformation inevitably alters hydrogen bonding and intermolecular interactions, such as van der Waals and dispersion forces, thereby influencing the crystal symmetry of HOIPs. Symmetry changes can further lead to octahedral tilting and displacements, and order–disorder transitions often result in more complicated phase transitions in HOIPs [42]. The interplay between the A, B, and X sites during phase transitions in HOIPs gives rise to functionalities, such as conductivity [43] and dielectric properties [44], that are not observed in purely inorganic perovskites. Diverse A‐ and X‐site options greatly expand the structural and chemical versatility of HOIPs, but the instability caused by reactions of these molecules with the external environment has prevented HOIPs from further commercialization in the PV field [45]. Consequently, improving the stability of HOIPs is a primary research focus. The instability of HOIPs mainly occurs under four environmental conditions: humidity, heat, light, and oxygen [45, 46, 47, 48]; the specific degradation mechanism under each condition is shown in Figure 3b. Therefore, a general research direction for solving the stability problem is to screen the huge molecular space for HOIP combinations with suitable geometry and thermodynamic stability, so as to obtain a more stable phase.

FIGURE 3

Comprehensive illustration of the main challenges of HOIPs, inorganic perovskites, and double perovskites. (a) Illustration of the dynamic movement of the A‐site organic cation, reproduced with permission [37]. Copyright 2025, Springer Nature Limited. (b) Decomposition routes of HOIPs under humidity, heat, light, and oxygen, reproduced with permission [45]. Copyright 2016, Angewandte Chemie International Edition. (c) Structural phase transitions of CsPbI3 among the cubic (α), tetragonal (β), orthorhombic (γ), and metastable (δ) phases, governed by octahedral distortions and temperature triggers, reproduced with permission [49]. Copyright 2019, The American Association for the Advancement of Science. (d) Calculated transition energy levels ε(q/q′) for intrinsic acceptors (top) and intrinsic donors (bottom) in Cs2AgBiBr6, reproduced with permission [50]. Copyright 2016, WILEY‐VCH GmbH & Co. KGaA, Weinheim. (e) Schematic representation of the electronic‐structure change of Cs2AgInCl6 upon Sb doping, reproduced with permission [51]. Copyright 2017, Royal Society of Chemistry.

2.1.2. Inorganic Perovskites

To overcome the inherent instability of HOIPs, inorganic perovskites are attracting increasing attention due to their better thermodynamic stability at high temperatures. Among them, Cs‐based inorganic perovskites show the greatest potential and can maintain their original composition and crystal structure at temperatures up to 400°C [52]. They are also well suited for tandem solar cells because of their suitably wide bandgap [53]. At the same time, inorganic perovskites retain the excellent optoelectronic properties of HOIPs, including high carrier mobility and long carrier lifetime [54], and the record PCE of CsPbI3 has reached 21% [55]. Although inorganic perovskites have developed rapidly, their further commercialization is hindered by phase stability and phase transitions. For example, CsPbI3 exhibits four distinct phases: cubic (α), tetragonal (β), orthorhombic (γ), and the non‐perovskite yellow (δ) phase. In the stable α phase (above 350°C), the material features a perfect three‐dimensional perovskite lattice in which [PbI6]4− octahedra are corner‐connected with a Pb─I─Pb bond angle of 180°, endowing it with excellent optoelectronic properties ideal for PV applications. As the temperature decreases, the bond angle is reduced to approximately 170° (at around 260°C) and 150° (at around 175°C), and the structure transforms into the β and γ phases, respectively; these phases retain the three‐dimensional network, albeit with slightly distorted octahedra. In contrast, the δ phase, characterized by non‐corner‐sharing octahedra, exhibits a much larger bandgap and, consequently, poor light absorption and charge transport, making it unsuitable for solar cells [49]. Notably, the phase transition is reversible; when the temperature rises above 350°C, the δ phase transforms back into the α phase. The phase transition route can be seen in Figure 3c.
Researchers believe that there are two main reasons for the phase transitions of Cs‐based inorganic perovskites: first, their tolerance factor is usually close to the critical value (about 0.81), which makes the material intrinsically prone to phase transitions; second, the off‐centering of Cs+ ions introduces local lattice strain, further destroying the ideal cubic symmetry. This structural mismatch makes it difficult for the lattice to maintain a stable cubic phase [52]. There are currently two research directions for solving the inorganic perovskite phase‐transition problem: crystal size reduction and doping. Crystal size reduction specifically refers to quantum dot preparation methods, which lead to the formation of a more symmetrical crystal structure [56]. Doping is currently the most widely studied method: dopant ions can modify the surface of the perovskite crystal or be incorporated into the lattice, replacing one of its constituents [57]. For example, several doping strategies can be implemented to enhance the stability of Cs‐based perovskites. One approach involves A‐site doping to mitigate the inherent Cs off‐centering. Additionally, replacing the A site with a larger cation, performing monovalent or heterovalent doping at the B site (such as Sn2+ or Ge2+) [58, 59], or incorporating smaller anions at the X site can collectively shift the tolerance factor away from its critical value. These modifications work together to stabilize the perovskite lattice and reduce the propensity for phase transitions [60, 61, 62]. Meanwhile, the band structure and the corresponding optical and electronic properties, such as the light absorption coefficient, bandgap, and charge‐carrier diffusion length, can be significantly altered by element substitution on the B and/or X sites [63].

2.1.3. Double Perovskites

During the search for low‐toxicity B‐site alternatives in perovskites, In, Sb, and Bi have emerged as promising substitutes for Pb. However, because these elements typically exhibit a +3 oxidation state, they cannot form the conventional ABX3 perovskite structure. This issue is circumvented by adopting a double perovskite architecture with the formula A2B′B″X6. In this structure, the A site and X site remain occupied by monovalent cations and halides, respectively, while two Pb2+ ions are replaced by a pair of monovalent B′ and trivalent B″ cations to maintain overall charge neutrality. Consequently, the crystal is composed of alternating [B′X6]5− and [B″X6]3− octahedra. The availability of a wide range of A‐site candidates (both inorganic and organic), diverse options for B‐site combinations, and various halide compositions at the X site provides expanded opportunities for tailoring material properties. Therefore, extensive investigations into double perovskites are essential [64, 65, 66, 67]. Theoretically, double perovskites could combine the high performance of hybrid organic‐inorganic perovskites (HOIPs) with the enhanced stability of inorganic ABX3 perovskites. However, the reported properties of these materials have not yet reached ideal levels. For example, halide double perovskites such as Cs2AgBiBr6 have achieved a power conversion efficiency of only 6.37% since their initial PV application in 2017 [68, 69], despite excellent stability: Cs2AgBiBr6 maintains a stable phase in ambient air (relative humidity ≈60%) and in the dark for three months [70]. The currently low efficiency of double perovskites is caused by two main factors: a large and/or indirect bandgap, and lower defect tolerance compared with traditional ABX3 inorganic perovskites.
Bandgaps of double perovskites normally range from around 2 to 3.4 eV [71], making them less ideal for single‐junction PV applications, although bandgaps around 2 eV are still suitable for top cells in tandem solar cells. Therefore, researchers have tried to use doping to change the electronic structure of double perovskites and lower the bandgap. Tran et al. [51] demonstrated bandgap engineering of Cs2Ag(SbxIn1−x)Cl6 by doping Sb to partially substitute In. In this way, the conduction‐band‐minimum electronic structure was successfully changed, resulting in a smaller bandgap, and owing to the contribution of Sb orbitals, the bandgap type also changed from direct to indirect, as can be seen in Figure 3e. An indirect bandgap normally yields low absorption coefficients (10²–10⁴ cm⁻¹ in the visible region [72]). Compared with conventional ABX3 perovskites, double perovskites exhibit comparatively lower defect tolerance. In detail, Xiao et al. [50] studied twenty different point defects in Cs2AgBiBr6, including vacancies, cation‐on‐anion antisites, anion‐on‐cation antisites, and interstitials. Their study showed that AgBi and BiAg antisites, as well as Bi and Br vacancies, are not thermally ionized within the lattice, as can be seen in Figure 3d, forming uncontrollable non‐radiative recombination channels (governed by Shockley–Read–Hall recombination) that eliminate charges and deteriorate device performance [73]. Cs2AgBiBr6 thus shows defect intolerance, possessing more deep‐level defects than lead‐based ABX3 perovskites.
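To make the Shockley–Read–Hall (SRH) argument concrete, the sketch below evaluates the standard single‐trap SRH rate, R = (np − ni²)/[τp(n + n1) + τn(p + p1)] with n1 = ni·exp(ΔEt/kT) and p1 = ni·exp(−ΔEt/kT). The carrier densities and lifetimes are arbitrary illustrative values, not parameters for Cs2AgBiBr6, but the comparison shows why deep (near‐midgap) defect levels dominate non‐radiative loss.

```python
import math

KT = 0.02585  # thermal energy in eV at 300 K

def srh_rate(n, p, ni, e_t_minus_ei, tau_n=1e-6, tau_p=1e-6):
    """Single-trap SRH recombination rate (cm^-3 s^-1) for a trap
    offset e_t_minus_ei (eV) from the intrinsic Fermi level."""
    n1 = ni * math.exp(e_t_minus_ei / KT)
    p1 = ni * math.exp(-e_t_minus_ei / KT)
    return (n * p - ni**2) / (tau_p * (n + n1) + tau_n * (p + p1))

# Illustrative injection conditions (cm^-3)
n, p, ni = 1e15, 1e15, 1e9

deep = srh_rate(n, p, ni, 0.0)      # trap at midgap
shallow = srh_rate(n, p, ni, 0.45)  # trap 0.45 eV away from midgap
print(deep / shallow)  # the midgap trap recombines far faster
```

The exponential factors n1 and p1 suppress recombination through shallow levels, which is exactly why the deep antisite and vacancy levels reported for Cs2AgBiBr6 are so damaging.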

2.1.4. Low‐Dimensional Perovskites

The earliest studies on 2D metal halide perovskites were conducted by Maruyama et al. in 1986 [74]. Compared with the narrow compositional range of 3D metal halide perovskites, which is limited by the t‐factor constraint, the introduction of larger organic A′‐site cations (organic spacers) relaxes the constraints on the formation of viable 2D perovskites [75]. 2D perovskites are composed of two parts: an organic interlayer and an inorganic octahedral layer. The introduction of interlayer cations not only adds a new compositional dimension, represented by A′, but also brings unprecedented structural complexity, thus enabling tunability of optoelectronic properties [76]. The dimension‐reformation process can be seen in Figure 4a. Normally, the chemical formula of a 2D perovskite can be written as (A′)m(A)n−1BnX3n+1, where m represents the number of interlayer cations (usually 1 or 2) and n represents the number of inorganic layers [77]; the value of n can be controlled by adjusting the stoichiometric ratio of the A′‐site cations to the A‐site cations. As the thickness (number of layers) of the inorganic slab increases, the bandgap and exciton binding energy (Eb) of the material decrease [78]. When n equals 1, the material is generally considered a pure 2D perovskite, in which the exciton binding energy is too high for PV applications. When 2 ≤ n ≤ 5, it is called a quasi‐2D perovskite; when n > 5, it is called a quasi‐3D perovskite. As n tends to infinity, the dimension can be considered restored to a 3D perovskite. 2D perovskites can be divided into layered perovskites with <100>, <110>, and <111> orientations (see Figure 4b). Among them, the <100> orientation is dominant: the introduction of most interlayer cations leads to a <100>‐oriented structure, making it the most common 2D perovskite structure.
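The stoichiometric bookkeeping behind (A′)m(A)n−1BnX3n+1 can be sketched in a few lines: charge neutrality, m·qA′ + (n − 1)·(+1) + n·(+2) − (3n + 1)·(+1) = 0, forces m·qA′ = 2, so monovalent spacers require m = 2 and divalent spacers m = 1, matching the RP and DJ phases discussed in the text. The helper functions below are our own illustrative naming, not an established API.

```python
# Charge-balance and dimensionality bookkeeping for
# (A')m (A)n-1 B_n X_{3n+1} layered perovskites.

def spacer_count(spacer_charge: int) -> int:
    """Number m of interlayer cations required for charge neutrality:
    m * q(A') = 2, assuming A+ monovalent, B2+, and X- halide."""
    assert 2 % spacer_charge == 0, "only +1 or +2 spacers handled"
    return 2 // spacer_charge

def classify_by_n(n: int) -> str:
    """Dimensionality label used in the text for inorganic-layer number n."""
    if n == 1:
        return "pure 2D"
    if 2 <= n <= 5:
        return "quasi-2D"
    return "quasi-3D"  # n > 5; n -> infinity recovers the 3D limit

print(spacer_count(1), classify_by_n(1))  # monovalent spacer, single layer
print(spacer_count(2), classify_by_n(3))  # divalent spacer, triple layer
```

For example, the monovalent spacer PEA+ gives m = 2, consistent with RP‐phase formulas such as (PEA)2SnI4 (n = 1).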
According to their charge, the interlayer cations can be divided into monovalent (+1) and divalent (+2). The different geometries of these cations have different effects on the crystal structure of 2D perovskites, because they may form ionic bonds with a single inorganic layer or span the organic spacer layer to bond with two inorganic layers. Therefore, based on the <100> orientation, 2D perovskites can be further divided into three types according to the molecular structure of the organic cations: the Ruddlesden–Popper (RP) phase, the Dion–Jacobson (DJ) phase, and the alternating cation interlayer (ACI) phase [79], as can be seen in Figure 4c. In most cases, the RP phase corresponds to monovalent cations, while the DJ phase corresponds to divalent cations. However, there are special cases in which monovalent cations are present in the DJ phase and divalent cations in the RP phase [80, 81]. These exceptional cases, along with the ACI phase [82, 83, 84], involve very complex organic content, so we will not discuss them in more depth here. Currently, the RP phase is the most widely studied, characterized by the diversity of interlayer cations, which enables the construction of a vast perovskite network. The 2D RP phase has successfully improved the stability of Sn‐ and Ge‐based perovskites, especially in terms of anti‐oxidation performance: when the larger organic cations butylammonium (BA) and 2‐phenylethylammonium (PEA) are incorporated with Sn or Ge in a halide system, (PEA)2GeI4 and (PEA)2SnI4 form with direct bandgaps of 2.12 and 2.2 eV, respectively [85]. These compounds show better stability in humid environments [86] and can also maintain thermal stability at temperatures above 200°C [87]. The DJ phase likewise shows excellent stability.
For example, the (4AMP)(FA)3Sn4I13 absorber formed using 4‐(aminomethyl)piperidinium as the organic component lost only 9% of its initial PCE (4.22%) after continuous operation in a N2 atmosphere [88]. Even better results were achieved in a system using 1,4‐butanediamine (BEA): the (BEA)FA2Sn3I10 absorber maintained 90% of its initial efficiency of 6.43% after 1000 h of operation under similar conditions and also performed well after 200 h of operation at room temperature [89]. However, compared with 3D perovskites, 2D perovskites show poor charge‐transport properties due to the anisotropy of the inorganic skeleton. There are several reasons for this. First, due to the structural symmetry, the band dispersion of the inorganic layer in the vertical direction is almost zero in 2D perovskites, indicating weak interlayer coupling [90]. The movement of electrons perpendicular to the perovskite crystal plane is restricted, resulting in localized and narrowed energy bands. In addition, the distortion of the electronic structure caused by electron–phonon interactions in the Jahn–Teller effect further narrows the localized bands, resulting in a wider bandgap for 2D perovskites than their 3D counterparts [91]. Second, the unique physicochemical properties of the organic spacer cations have a significant impact on the optoelectronic properties of 2D perovskites. The hydrophobic organic spacers separate the conductive inorganic layers to form a quantum well structure, as can be seen in Figure 4d. In this structure, the inorganic layers and the organic spacer layers act as potential wells and barriers, respectively. Organic interlayers disrupt the orbital hybridization between adjacent inorganic layers, thereby confining photogenerated charge carriers within the inorganic layers and severely hindering out‐of‐plane charge transport [92].
In addition, in 2D perovskites the dielectric constant of the organic interlayer cations is much lower than that of the inorganic octahedral layers, which leads to a dielectric confinement effect. The charge‐screening ability provided by the organic interlayers is weak, thereby enhancing the Coulomb interaction between photogenerated electrons and holes. The significant quantum confinement effect and the dielectric confinement effect together produce the higher exciton binding energy of 2D perovskites, making it easier for excitons to form rather than dissociate into free electrons and holes, thereby limiting charge transport and reducing the PCE [79, 93]. Third, the introduction of organic interlayer cations changes the stacking mode of the inorganic octahedral layers and distorts the metal–halide bond angles and bond lengths. These changes significantly affect the orbital overlap between metal and halide ions, thereby affecting the bandgap [76]. Fourth, 2D perovskite films prepared by solution methods are usually composed of multiple quantum wells whose width (n value) and orientation are randomly distributed, which reduces the mobility and diffusion length of charge carriers at the level of the overall material [94, 95]. Although 2D perovskites exhibit good stability, poor charge transport remains a major challenge for PV applications. Generally, strategies to improve the performance of 2D perovskites still focus on adjusting the A′, B, and X site components. For the A site, Hautzinger et al. [96] studied a variety of A‐site cations and found that FA+ and MA+ are “ideal” choices; cations that are too large (such as EA+ or DMA+) or too small (such as Cs+) distort the inorganic layer, increase defects and non‐radiative recombination, and widen the bandgap. At the same time, Zhou et al. [97] found that FA+ helps reduce non‐radiative recombination centers and form high‐quality, highly oriented films.
Hence, there is little need to engineer the A site at the current stage. At the A′ site, introducing functional groups (such as −F, −OH, or −CN) or heteroatoms (such as S) allows the distance and electronic coupling between inorganic layers to be finely controlled, which helps improve charge transport and reduce trap density [98]. The selection of aromatic or conjugated organic cations can reduce the dielectric mismatch between the organic and inorganic layers, lower the exciton binding energy, and enhance crystal stability through π–π interactions [99, 100]. Symmetrical imidazolium‐based cations such as benzimidazolium (Bn) and benzodiimidazolium (Bdi) can narrow the bandgap; they have been shown to narrow the bandgap of Sn‐based halide systems to 1.81 and 1.79 eV for Bn2SnI4 and BdiSnI4, respectively [85]. The B site is usually occupied by Pb2+ (or alternative ions such as Sn2+ and Bi3+); by replacing or partially doping it, the band structure, carrier mobility, and defect tolerance of the material can be tuned [101]. At the X site, halogens play a vital role in 2D perovskites, and their influence can be summarized in three aspects: optical properties, structural regulation, and charge transport. Br and I can significantly improve the light absorption coefficient and material stability, but their diffusion rates are slow, while Cl exhibits excellent charge‐transport characteristics but weak absorption capacity. Differences in halide ionic radius, polarizability, and electronegativity change the tolerance factor of the inorganic framework, leading to octahedral distortion and thereby regulating the lattice parameters, interlayer distance, and bandgap; for example, increasing the Cl content aggravates the lattice distortion and widens the bandgap, while also affecting the formation of self‐trapped excitons and changing the photoluminescence properties [102].
In addition, halogens also change the valence band structure by regulating the degree of orbital mixing between metals and halogens, so that the bandgap gradually decreases when the I content increases [103]. Therefore, the ratio of different X site components has become an important design strategy for optimising the optoelectronic properties of 2D perovskites.

FIGURE 4.


Schematic diagram of 2D perovskites reproduced with permission [76]. Copyright 2024, Wiley‐VCH GmbH. (a) Different n‐values. (b) Different crystal plane orientations. (c) 2D perovskite structures of the RP, DJ, and ACI phases. (d) Schematic diagram of a quantum well (QW), where "O" and "I" represent the organic and inorganic layers in 2D perovskites.

2.2. Perovskite‐Inspired Materials (PIMs)

PIMs are another class of semiconducting materials designed to retain the outstanding optoelectronic properties of lead halide perovskites while mitigating their instability and toxicity [29]. Most PIMs are lead‐free systems that emulate the light absorption and charge transport properties of traditional perovskites by precisely tailoring their crystal structure and chemical composition [104, 105, 106, 107].

2.2.1. Chalcogenide Perovskites

Chalcogenide perovskites are an emerging class of PIMs that also adopt the ABX3 chemical formula: the A site remains occupied by monovalent or divalent cations, the B site is typically composed of high‐valent metals such as Zr4+ or Bi3+, and the halogen (Cl, Br, I) of traditional perovskites is replaced with a chalcogen (S, Se, Te). Widely studied chalcogenide perovskites include BaZrS3, SrZrS3, and LaYS3 [29]. This compositional substitution not only eliminates toxic Pb but also strengthens the metal–chalcogen bonds, thereby enhancing thermal and moisture stability relative to traditional halide perovskites [108]. For example, LaYS3 thin films have already been used in an n‐i‐p cell structure, verifying the device feasibility of chalcogenide perovskites [109]. At the same time, the 1.8 eV bandgap of BaZrS3 is well suited to the top cell of silicon‐based tandems: researchers estimate that pairing BaZrS3 with silicon in a tandem solar cell could yield a PCE of around 35% [109]. However, this promising approach comes with its own set of challenges. The strong covalent character of the chalcogenide bonds often induces structural distortions, as can be seen in Figure 5a, resulting in an orthorhombically or even hexagonally distorted perovskite lattice rather than the ideal cubic symmetry of typical ABX3 perovskites [35, 110, 111]. Consequently, most chalcogenide perovskites such as BaZrS3 exhibit wider bandgaps, typically in the range of 1.7 to 2.1 eV, which suits tandem solar cells but not single‐junction devices [109]. Although anion doping can help, for example, replacing part of the sulfur with a certain proportion of Se tunes the bandgap of BaZr(S,Se)3 across 1.5–1.9 eV, structural stability gradually decreases as the Se content increases [112].
Therefore, optimising the bandgap through compositional control while maintaining structural stability is a major research direction for chalcogenide perovskites.

FIGURE 5.


Illustration of chalcogenide perovskite structures and composition engineering of chalcohalide. (a) Distorted structures of chalcogenide perovskites reproduced with permission [111]. Copyright 2022, American Chemical Society. (b) Split‐anion approach of chalcohalide material to replace Pb in CH3NH3PbI3 reproduced with permission [120]. Copyright 2016, Royal Society of Chemistry.

2.2.2. Chalcohalide Materials

Chalcohalide materials contain both chalcogen and halogen anions and are generally described by the formula MChX (M is a metal cation, Ch a chalcogen anion, and X a halogen anion). This type of material can be considered a PIM derived from typical perovskites or chalcogenide perovskites: a "split‐anion" system formed by partially replacing halogens with chalcogens (or vice versa). Chalcohalide materials have shown considerable promise in various energy applications, ranging from PVs to photocatalysis and thermoelectrics. Their ability to combine the properties of both chalcogenide perovskites and typical halide perovskites allows versatile manipulation of their band structures and electronic behaviors, making them highly adaptable for different energy‐related technologies [113, 114]. Currently, chalcohalide materials can be classified into four categories: heavy pnictogen chalcohalides, transition/post‐transition chalcohalides, mixed‐metal chalcohalides, and hybrid organic‐inorganic chalcohalides.

In terms of the heavy pnictogen chalcohalides, SbSI and BiSI are two examples that were studied extensively in the early stage. They exhibit low effective masses, high absorption coefficients, and suitable bandgaps. In addition, since Sb3+ and Bi3+ possess ns2 lone electron pairs, their chemistry can mimic Pb2+ to a certain extent, giving these materials better defect tolerance and optoelectronic properties. This type of material is not only suitable for PV applications but also shows potential advantages in fields such as ferroelectricity and photocatalysis [115, 116, 117]. Transition/post‐transition chalcohalides place transition or post‐transition metals at the M site, and their electronic structure and crystal chemistry differ from those of the heavy pnictogen chalcohalides. The first report of transition/post‐transition chalcohalides dates to 1960, for Ag3SI and Ag3SBr, whose crystal structure is also called an anti‐perovskite structure [118]. Ag3SX (X = I, Br, or Cl) compounds normally have bandgaps of around 0.9–1.1 eV at room temperature and are suitable for single‐junction and tandem solar cells but are currently limited by expensive and time‐consuming synthesis and deposition processes (e.g., laser ablation requires pre‐synthesis under high‐temperature vacuum conditions) [119]. Introducing two or more different metal ions at the M site forms mixed‐metal chalcohalides, in which the bandgap, band edges, and defect energy levels can be finely controlled. The mixed‐metal system can not only tune the light‐absorption characteristics of the material to a certain extent but also improve its crystal stability and carrier transport. This multi‐metal‐cation strategy is similar to the double‐perovskite approach of designing materials that achieve both high PCE and good stability [113].
In the hybrid organic‐inorganic chalcohalides, organic cations are combined with inorganic chalcohalide structures to form a hybrid structure similar to traditional HOIPs [120], as can be seen in Figure 5b. The hybrid structure helps to further reduce the bandgap of the material, improve interface bonding, and regulate the growth dynamics of the film. It may also introduce flexibility and tunability unique to organic components. At the same time, however, the hybrid system faces challenges in long‐term environmental stability against moisture and heat [121]. It is worth noting that for any type of chalcohalide material, although the introduction of mixed anions can increase the covalency of the B─X bond and thus improve structural stability, it also increases the thermodynamic difficulty of forming the perovskite phase [20]. Therefore, the research direction for chalcohalides should be to screen, from the huge combinatorial composition space, candidates that combine a suitable bandgap with thermodynamic stability.

3. Candidates Screening and Problem Formulation

Whether for perovskites or PIMs, the biggest challenge lies in identifying potential candidate materials within a nearly infinite chemical space. Currently, the most widely used method is to set up a reasonable "funnel" of screening stages [29]. Specifically, limiting element selection to non‐toxic elements is an effective first step in PIMs screening; further screening criteria, including charge neutrality, valency, and electronegativity, can then reduce the search space by two orders of magnitude from ~10^12 possibilities [122]. The Goldschmidt tolerance factor t, the octahedral factor µ, and the new tolerance factor τ mentioned above are the primary standards for screening traditional 3D perovskite structure candidates. Successfully applying ML to the design of perovskites and PIMs requires correctly formulating the problem in a form suitable for ML. This step shapes the project framework, influencing algorithm selection, data collection, and evaluation methods [8]. From the discussion in the previous section, the main research direction for perovskites and PIMs with organic components is the selection or modification of organic cations to address stability; for inorganic perovskites and PIMs, it is composition engineering to improve optoelectronic performance; and for low‐dimensional perovskites, it is improving passivation or surface optimization in combination with 3D perovskites. Based on these different demands, the screening criteria are further determined by four properties: stability, synthesizability, optical properties, and electrical properties.
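As a toy illustration of the first stages of such a funnel, the sketch below applies charge neutrality followed by the Goldschmidt tolerance factor t and octahedral factor µ to a small ABX3 composition grid. The ionic radii are approximate literature values, the stability windows are the commonly quoted ones, and the function names are our own:

```python
import math

# Approximate ionic radii (angstrom) and formal charges for a few ions;
# these are illustrative literature values, not authoritative numbers.
RADII = {"Cs+": 1.88, "MA+": 2.17, "Pb2+": 1.19, "Sn2+": 1.15,
         "I-": 2.20, "Br-": 1.96, "Cl-": 1.81}
CHARGE = {"Cs+": 1, "MA+": 1, "Pb2+": 2, "Sn2+": 2,
          "I-": -1, "Br-": -1, "Cl-": -1}

def passes_funnel(a, b, x):
    """Minimal ABX3 funnel: charge neutrality first, then the Goldschmidt
    tolerance factor t and octahedral factor mu within commonly quoted
    stability windows (0.8 <= t <= 1.0, 0.414 <= mu <= 0.732)."""
    if CHARGE[a] + CHARGE[b] + 3 * CHARGE[x] != 0:
        return False
    ra, rb, rx = RADII[a], RADII[b], RADII[x]
    t = (ra + rx) / (math.sqrt(2) * (rb + rx))   # Goldschmidt tolerance factor
    mu = rb / rx                                 # octahedral factor
    return 0.8 <= t <= 1.0 and 0.414 <= mu <= 0.732

candidates = [(a, b, x) for a in ("Cs+", "MA+")
              for b in ("Pb2+", "Sn2+") for x in ("I-", "Br-", "Cl-")]
survivors = [c for c in candidates if passes_funnel(*c)]
```

In a real screen, the surviving compositions would then be passed to the property-specific criteria discussed in the following subsections.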

3.1. Stability

Thermodynamic stability is a fundamental property that determines whether a material can remain in its desired phase without decomposing into other phases under equilibrium conditions. For PV materials, achieving high thermodynamic stability is crucial to ensure durability and consistent performance over extended periods. Thermodynamic stability is typically evaluated using two key parameters: the formation energy (Ef ) and the energy above the convex hull (Ehull ). Formation energy measures the energy required to form a compound from its constituent elements. A lower Ef indicates that the material is energetically favourable and less likely to decompose. Mathematically, it is expressed as:

Ef = Etotal − ∑i ni µi (5)

where Etotal is the total energy of the compound, ni is the number of atoms of element i, and µi is the chemical potential of element i. For a PV material, a negative and low Ef relative to competing phases is desirable to ensure phase stability during operation. In high‐throughput computational screening and ML‐guided materials discovery, Ehull is widely recognised as a more rigorous indicator of thermodynamic stability than Ef alone. Ehull quantifies the energy above the convex hull, which represents the thermodynamic ground state of all competing phases at a given composition. A value of Ehull = 0 indicates that the compound lies on the convex hull and is thus thermodynamically stable. Materials with Ehull > 0 are metastable, meaning they may decompose into more stable phases [123]. However, metastable compounds can still be experimentally synthesizable, especially if kinetic barriers prevent decomposition. A widely adopted threshold for synthesizability is Ehull ≤ 50 meV/atom, below which materials are generally considered stable enough for practical synthesis and applications [124].
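The bookkeeping behind Equation (5) and the Ehull threshold can be written down directly; the function names are ours, and the energies and chemical potentials below are made-up toy numbers, not real DFT values:

```python
def formation_energy_per_atom(e_total, composition, mu):
    """Equation (5) normalised per atom: (E_total - sum_i n_i * mu_i) / N."""
    n_atoms = sum(composition.values())
    return (e_total - sum(n * mu[el] for el, n in composition.items())) / n_atoms

def is_synthesizable(e_hull_mev_per_atom, threshold=50.0):
    """Apply the widely used E_hull <= 50 meV/atom metastability window."""
    return e_hull_mev_per_atom <= threshold

# Toy BaZrS3-like bookkeeping with invented energies (eV):
ef = formation_energy_per_atom(-45.0, {"Ba": 1, "Zr": 1, "S": 3},
                               {"Ba": -1.9, "Zr": -8.5, "S": -4.1})
```

In practice, Ehull itself is obtained by constructing the convex hull over all competing phases (e.g., with a phase-diagram tool), after which the threshold check above is trivial.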

3.2. Synthesizability

While thermodynamic and kinetic stability are foundational, they are insufficient to guarantee lab‐scale synthesizability. In many cases, compounds predicted to be thermodynamically stable remain unsynthesized due to precursor incompatibility or complex reaction pathways [125, 126, 127, 128, 129]. To address this gap, data‐driven approaches have recently been developed to assess synthesizability directly from historical, experimentally examined datasets. For example, positive‐unlabelled (PU) learning models label experimentally confirmed entries from the Inorganic Crystal Structure Database (ICSD) as positive samples (P), while unverified computational structures from the Materials Project (MP) are treated as unlabelled data (U). Since explicit negative examples are unavailable, a subset of U is randomly selected and temporarily assigned as negative samples. Using both P and this sampled subset, the model is trained as a binary classifier, and this process is repeated many times using bootstrap aggregation. The model produces a crystal‐likeness score (CLscore) between 0 and 1, reflecting how likely synthesis is to succeed. This partially supervised strategy lets the model infer the structural patterns that characterize synthesizable materials without pre‐defined negative data and has achieved prediction accuracies of 75–88% [130]. Building on PU learning, Gleaves et al. [131] developed a semi‐supervised teacher–student neural network that introduces a dual‐network architecture to dynamically leverage both labelled data (from ICSD and PU learning) and unlabelled data (from MP). In this framework, the teacher model generates pseudo‐labels for unlabelled samples, while the student model learns from these labels and provides feedback to refine the teacher's predictions. 
This adaptive learning mechanism effectively mitigates bias from limited labelled data and achieves superior synthesizability prediction performance of up to 92.9%, surpassing conventional PU learning approaches. Moreover, recent Crystal Synthesis Large Language Models (CSLLM) convert crystallographic data (CIF and POSCAR) into "Material Strings" that encode lattice parameters, space groups, and atomic coordinates in a textual format suitable for natural language processing; they can not only predict the synthesizability of materials with 98.6% accuracy but also propose precursor pathways for the screened materials [132]. These emerging alternative approaches dramatically reduce the reliance on extensive thermodynamic and AIMD calculations, thereby accelerating the pace of materials discovery.
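The bootstrap-aggregated PU scheme can be caricatured in one dimension as below; the nearest-centroid "classifier", the toy data, and all names are ours, standing in for the real structure-based models:

```python
import random
import statistics

def centroid_score(x, pos, neg):
    """Toy base classifier: vote 1 if x lies closer to the positive centroid."""
    cp, cn = statistics.mean(pos), statistics.mean(neg)
    return 1.0 if abs(x - cp) < abs(x - cn) else 0.0

def pu_bagging_score(x, positives, unlabeled, n_rounds=50, seed=0):
    """PU bagging sketch: each round treats a random subset of the unlabeled
    pool as provisional negatives, trains the toy classifier, and the votes
    are averaged into a CLscore-like value in [0, 1]."""
    rng = random.Random(seed)
    votes = [centroid_score(x, positives, rng.sample(unlabeled, len(positives)))
             for _ in range(n_rounds)]
    return sum(votes) / n_rounds

# 1-D toy descriptor: "synthesized" entries cluster near 1.0.
P = [0.9, 1.0, 1.1, 1.05]
U = [0.2, 0.3, 2.5, 2.8, 1.0, 0.1, 2.9, 0.4]
score_hi = pu_bagging_score(1.0, P, U)   # near the positive cluster
score_lo = pu_bagging_score(2.9, P, U)   # far from it
```

The published models replace the centroid rule with graph or descriptor-based classifiers, but the bagging-over-provisional-negatives structure is the same.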

3.3. Optical Properties

The bandgap is of paramount importance among all optical properties. Only materials with an appropriate bandgap type (direct/indirect) and value can serve as candidates for ideal PV materials, and the suitability of the bandgap value is determined by the Shockley–Queisser (SQ) limit [133]. According to the SQ limit, the maximum efficiency achievable for a single‐junction solar cell is approximately 33% [134]. This maximum is continuously being approached by materials with bandgaps in the range of 1.1–1.45 eV, which includes many technologically significant semiconductors such as crystalline silicon (Si), gallium arsenide (GaAs), copper indium gallium selenide (Cu(In,Ga)Se2) across a wide range of indium‐to‐gallium ratios, and cadmium telluride (CdTe). Currently, the highest certified efficiency for a single‐junction solar cell is around 29.1%, achieved using GaAs thin‐film technology by Alta Devices [135, 136]. Multi‐junction or tandem solar cells combine semiconductors with different bandgaps to achieve higher efficiencies; an ideal configuration pairs silicon or Cu(In,Ga)Se2 (bandgap ≈1.1 eV) with a wider‐bandgap material of around 1.7 eV [137]. Therefore, the bandgap of an ideal PV material needs to be 1.1–1.45 eV for single‐junction cells or around 1.7 eV for multi‐junction (perovskite/silicon tandem) cells. In addition to the bandgap, the optical absorption coefficient is another key optical property. An ideal PV material should have a high absorption coefficient to absorb sunlight efficiently, meaning strong absorption in the visible and near‐infrared spectrum. Generally, materials with an optical absorption coefficient above 10^4 cm−1 can be considered to have good light absorption capabilities [138].
A higher light absorption coefficient means that a thinner absorber layer can absorb most of the incident light, which is very important for reducing material costs and improving device efficiency [139]. Additionally, exciton binding energy is another important optical property worth considering. A low exciton binding energy is preferred, as it facilitates the dissociation of photo‐generated excitons (electron‐hole pairs) into free carriers, thereby improving charge collection efficiency and overall device performance [140].
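These bandgap windows translate directly into a screening rule; a minimal sketch is shown below, where the 1.6–1.8 eV tolerance band around the 1.7 eV tandem target is our own choice and the gap values are quoted only for illustration:

```python
def bandgap_role(eg_ev):
    """Map a bandgap (eV) to its likely PV role using the criteria above:
    1.1-1.45 eV for single-junction absorbers, ~1.7 eV for the wide-gap
    top cell of a perovskite/silicon tandem."""
    if 1.1 <= eg_ev <= 1.45:
        return "single-junction"
    if 1.6 <= eg_ev <= 1.8:   # tolerance band around the 1.7 eV target
        return "tandem top cell"
    return "unsuitable"

screened = {name: bandgap_role(eg)
            for name, eg in {"GaAs": 1.42, "BaZrS3": 1.8, "CsPbBr3": 2.3}.items()}
```

A fuller optical screen would also require a direct gap and an absorption coefficient above ~10^4 cm−1, as discussed above.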

3.4. Electrical Properties

Electrical properties are another critical factor in determining the suitability of a material for PV applications. They primarily influence the collection and transport of photogenerated charge carriers, directly affecting the overall PCE of solar cells. Among the key electrical properties, carrier mobility (µ) and carrier lifetime (τ) are of paramount importance [137]. Carrier mobility quantifies how quickly charge carriers (electrons and holes) move through a material under an electric field. High carrier mobility ensures efficient charge transport to the electrodes before recombination occurs, minimizing energy losses. For an ideal PV material, the carrier mobility should be sufficiently high to support long‐distance transport of charge carriers without significant scattering or recombination. However, direct prediction of carrier mobility is time‐consuming, so the effective mass is normally used as an alternative metric for assessing whether a material has high carrier mobility. Carrier mobility is inversely related to the effective mass: the lower the effective mass, the higher the mobility. For example, gallium arsenide (GaAs) has an effective mass of 0.067 and a high mobility of 8500 cm2/V·s, while silicon (Si) has an effective mass of 1.09 and a lower mobility of 1400 cm2/V·s. Therefore, any compound with an effective mass below 1 is highly likely to be a promising PV material [141]. Likewise, carrier lifetime defines the average time photogenerated charge carriers (electrons and holes) exist before recombination occurs, and it directly affects the efficiency of charge collection and transport within solar cells [142]. However, owing to its complex dependencies, carrier lifetime is difficult to capture directly with ML models, so simple alternative targets and criteria are also needed; for example, the Debye temperature can be used. 
When it is greater than 500 K, it can help suppress nonradiative recombination and thus improve carrier lifetime [143]. In addition, defect tolerance has become increasingly crucial for perovskites and PIMs. Shallow defect levels have little effect on carrier transport, while deep defect levels act as non‐radiative recombination centers, seriously reducing carrier lifetime and device efficiency [144]. Therefore, accurately predicting defect energy levels is essential for evaluating the electronic properties of materials [145].
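The two proxy criteria above (effective mass below ~1 and Debye temperature above ~500 K) can be combined into a simple screen; the helper name and the Debye temperatures below are our own illustrative choices, with the GaAs and Si effective masses taken from the text:

```python
def passes_electrical_screen(m_eff, debye_temp_k):
    """Proxy screen for electrical quality: m* < 1 suggests usable mobility
    (mobility scales inversely with effective mass), and a Debye temperature
    above ~500 K is associated with suppressed non-radiative recombination."""
    return m_eff < 1.0 and debye_temp_k > 500.0

# GaAs-like candidate (m* = 0.067) vs a Si-like one (m* = 1.09);
# the Debye temperatures here are only for illustration.
gaas_ok = passes_electrical_screen(0.067, 520.0)
si_ok = passes_electrical_screen(1.09, 645.0)
```

Such proxies are deliberately coarse: they trade physical fidelity for the ability to screen thousands of candidates before any expensive transport calculation.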

4. Data Collection and Processing

The dataset utilized for ML often consists of both features (independent variables) and targets (dependent variables) related to the materials. Independent variables, also called features or descriptors, are the specific details that represent the structure and characteristics of materials, including the chemical composition, atomic or molecular parameters, structural parameters, and the technological conditions used in the synthesis process. The dependent variables are the specific properties of the materials that are influenced by the independent factors, also referred to as the target variables [146, 147]. For accurate ML prediction, the training dataset must be both sufficiently large and of high quality. While the required data volume varies with the model type and task complexity, large models such as neural networks and deep learning architectures typically demand datasets ranging from 10^4 to 10^6 entries to ensure robust and generalizable performance [148]. In terms of data quality, using high‐quality data prevents the inclusion of erroneous, missing, or redundant information; hence, we need to ensure data comes from reliable sources [8]. Current main data sources can be broadly categorized into two types: public sources and self‐generated data. Public sources are further divided into two categories: scientific literature and public databases. Scientific literature offers the latest research findings and serves as a rich resource for data, including experimental results, synthesis procedures, and property measurements [149, 150]. With the development of materials science, public databases are now becoming the main data sources in ML projects [151, 152]. Currently, there are many reliable public databases such as MP, ICSD, the Open Quantum Materials Database (OQMD), etc. [153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171]. 
These databases, illustrated in Table 1, collectively store a vast array of material structures and properties. Self‐generated data sources can likewise be divided into two parts: experimental datasets and computational datasets [172]. Experimental datasets are fundamental, as they offer ground‐truth data against which ML models can be trained and validated; they are normally compiled from laboratory experiments and provide accurate data on various compounds, including chemical composition, synthesis methods, crystal structures, and measured properties such as bandgap, stability, carrier mobility, and efficiency [173]. Computational datasets are primarily derived from DFT calculations [174, 175], which allow researchers to simulate material properties, predict stability, and optimize compositions for specific applications. DFT is an ab initio computational method that reduces the computational intensity of electronic structure calculations by using the charge density as the key variable instead of the more complex wavefunction. Although the DFT formulation is exact, practical DFT employs approximate exchange‐correlation functionals, including the local density approximation (LDA), the generalized gradient approximation (GGA), and hybrid functionals, to balance computational efficiency against accuracy when predicting electronic structures, bandgaps, phonon properties, and reaction energetics [176]. Another crucial consideration in data collection for bandgap prediction is the inconsistency arising from different levels of theory in DFT calculations. For example, bandgaps computed using GGA functionals such as Perdew‐Burke‐Ernzerhof (PBE) systematically underestimate experimental values, whereas the Heyd‐Scuseria‐Ernzerhof (HSE) hybrid functional yields bandgaps typically in much closer agreement with experiment [177, 178, 179]. 
Mixing bandgap values obtained from different functionals within the same training dataset can introduce significant biases and degrade model performance. Therefore, to ensure accuracy and generalizability in ML models, it is essential to clearly distinguish and consistently use data generated at the same level of theory during dataset assembly. Beyond electronic structure calculations, AIMD simulations are increasingly used to study the finite‐temperature mechanical and thermodynamic behavior of materials at the atomic scale. These simulations capture how materials respond to varying temperatures, pressures, and external forces, generating extensive time‐series data that is frequently leveraged in ML frameworks to develop interatomic potentials and predict transport properties such as diffusion and mechanical response [152]. Building upon these computational approaches, high‐throughput computational screening has become a powerful and systematic approach for accelerating materials discovery by automating large‐scale first‐principles calculations. It leverages computational frameworks, databases, and workflow automation tools to efficiently explore vast compositional and structural spaces, significantly reducing the time and effort required to identify promising materials for specific applications. Commonly used tools for high‐throughput screening can be seen in Table S2 [180, 181, 182, 183, 184, 185, 186, 187, 188]. However, owing to experimental conditions, measurement technology, human bias, and other factors, the same type of experimental data may differ across sources, leading to data inconsistency problems. To address this issue, it is essential to compare data from multiple databases and apply techniques such as data fusion [189], data reconciliation [190], and consensus methods [191]. 
Beyond these methods, multi‐fidelity learning has recently become another powerful tool for combining datasets with different levels of accuracy and computational cost. In materials science, this method allows low‐fidelity data (such as PBE‐based DFT calculations) to be integrated with high‐fidelity results (such as HSE calculations or experimental measurements) to create hierarchical models that reflect both general patterns and specific physical relationships [192]. It transfers knowledge from one dataset to another by learning how their deviations are related, which makes predictions more accurate and reduces the need for expensive high‐level calculations [193]. This approach is particularly useful for perovskites and PIMs, where experimental data are scarce and the accuracy of theoretical datasets varies with the functional or computational scheme used. Figure 6a schematically illustrates the fundamental principle of multi‐fidelity learning.
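A minimal sketch of this idea is a linear correction learned from the few points where both fidelities exist; the toy numbers below mimic PBE's systematic bandgap underestimation (here an exact 0.5 eV offset) and are not real calculations:

```python
def fit_linear_correction(x_low, y_high):
    """Fit y_high ~ a * x_low + b by ordinary least squares (closed form),
    using the few samples where both fidelity levels are available."""
    n = len(x_low)
    mx, my = sum(x_low) / n, sum(y_high) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(x_low, y_high))
         / sum((x - mx) ** 2 for x in x_low))
    return a, my - a * mx

# Toy low/high-fidelity bandgap pairs (eV); real data would be noisier.
pbe = [1.0, 1.2, 1.5, 2.0]
hse = [1.5, 1.7, 2.0, 2.5]
a, b = fit_linear_correction(pbe, hse)
corrected = [a * g + b for g in [1.1, 1.8]]   # upgrade cheap PBE predictions
```

Real multi-fidelity models replace this linear map with a learned, nonlinear relation, but the principle of calibrating cheap data against a small amount of expensive data is the same.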

TABLE 1.

Public material databases.

Database Data Type URL Free API
AFLOWLIB Inorganic & Computational http://aflowlib.org Yes Yes
ASM Alloy Phase Diagram http://www.asminternational.org No No
AiiDA Computational http://www.aiida.net Yes Yes
C2DB 2D https://cmr.fysik.dtu.dk/c2db/c2db.html Yes Yes
CMR Multiple (3D, 2D) https://cmr.fysik.dtu.dk Yes Yes
COD Multiple & Experimental http://crystallography.net Yes Yes
Cambridge Structural Database (CSD) Multiple https://www.ccdc.cam.ac.uk No Yes
ChEMBL Bioactive molecules https://www.ebi.ac.uk/chembl Yes Yes
ChemSpider Multiple https://www.chemspider.com Yes Yes
Citrination Multiple https://citrination.com No Yes
Clean Energy Project Solar cell https://cepdb.molecularspace.org Yes No
EELS Data Base Spectra https://eelsdb.eu Yes No
GDB Small organic molecules http://gdb.unibe.ch Yes No
HTEM Inorganic https://htem.nrel.gov/ Yes No
ICSD Inorganic & Experimental https://icsd.fiz‐karlsruhe.de No Yes
JARVIS‐DFT 2D https://www.ctcms.nist.gov/~knc6/JVASP.html Yes No
LPF Multiple https://paulingfile.com Yes No
MPDS Multiple https://mpds.io/#/modal/menu Yes No
MatNavi Multiple https://mits.nims.go.jp Yes No
MatWeb Engineering http://matweb.com Yes No
Materials Cloud Multiple (3D, 2D) https://www.materialscloud.org Yes No
Materials Commons Computational http://materialscommons.org Yes No
Materials Project Multiple https://materialsproject.org Yes Yes
NOMAD Multiple https://nomad‐repository.eu Yes Yes
NREL Materials Computational & Renewable https://materials.nrel.gov Yes No
Nano‐HUB Nanomaterials http://nanohub.org Yes No
OMDB‐GAP1 Organic crystals https://omdb.mathub.io/dataset Yes No
OQMD Multiple & Computational http://oqmd.org Yes No
PCD Multiple http://www.crystalimpact.com/pcd No No
Springer Materials Multiple https://materials.springer.com No No
Supercon Superconducting https://supercon.nims.go.jp Yes No
TEDesignLab Thermoelectric http://www.tedesignlab.org Yes No
XAFS database Spectra https://www.cat.hokudai.ac.jp/catdb/ Yes No

FIGURE 6.


Illustration of data processing methods. (a) Multi‐fidelity learning. (b) Active learning.

As a complement to multi‐fidelity learning, active learning provides a dynamic strategy to further optimize data acquisition and model training. Instead of processing a fixed dataset in a static, linear manner, active learning continuously evaluates the data within a closed loop [194, 195]. As shown in Figure 6b, the workflow begins with an initial model trained on a small labelled dataset, which then evaluates a large unlabelled dataset. A query strategy based on uncertainty, diversity, or expected model improvement selects a subset of the most informative samples for further evaluation. These selected samples are then labelled through human annotation, and the updated labelled dataset is used to retrain the model. This creates a closed feedback loop that keeps improving predictive accuracy and generalization.
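The loop just described can be reduced to a few lines; in this sketch the acquisition rule is a simple distance-based uncertainty proxy and the `oracle` stands in for an expensive labelling step (a DFT run or an experiment), with all names being our own:

```python
def active_learning_loop(pool, oracle, n_queries=4, init=(0.0, 4.0)):
    """Pool-based active learning sketch: each round queries the pool point
    farthest from any labelled sample (a crude uncertainty/diversity proxy),
    the oracle labels it, and the labelled set grows in a closed loop."""
    labeled = {x: oracle(x) for x in init}
    for _ in range(n_queries):
        x_star = max(pool, key=lambda x: min(abs(x - xl) for xl in labeled))
        labeled[x_star] = oracle(x_star)  # the expensive evaluation happens here
    return labeled

pool = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
labeled = active_learning_loop(pool, oracle=lambda x: x * x)
```

In a real workflow the query strategy would use model-derived uncertainty (e.g., ensemble variance) and the model would be retrained after every batch of new labels.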

The integration of these data processing techniques into research on perovskites and PIMs can substantially alleviate the data shortage problem, thereby accelerating the pace of material discovery and design.

5. Feature Engineering and Selection

In ML, feature engineering and selection are two important steps following data collection. Their primary objectives are multifaceted: enhancing the performance of ML models, reducing computational complexity, and improving model interpretability. A complete feature engineering process should include feature extraction, preprocessing, normalization, and standardization [196]. In the context of periodic inorganic materials such as inorganic perovskites or PIMs, feature extraction is often carried out using domain‐specific tools such as Matminer [197], Pymatgen [186], the smooth overlap of atomic positions (SOAP) [198], and component‐based feature vectors [199]. These tools can effectively generate features based on crystal structures or chemical formulas, making them suitable for inorganic compounds with periodic lattices. For organic compounds, feature extraction commonly employs specialized molecular representation methods, as illustrated in Figure 7. These molecular representations are crucial for ML applications [200]. Typical representations for organic molecules include molecular fingerprints, simplified molecular‐input line‐entry system (SMILES) strings, potentials, weighted graph representations, Coulomb matrices, bag‐of‐bonds or fragments, three‐dimensional (3D) geometry, and electronic density distributions. Molecular fingerprints represent molecules as binary vectors indicating the presence or absence of particular chemical substructures, which allows efficient analysis of structure‐property relationships. SMILES provides a textual encoding of molecular structures, facilitating its use in cheminformatics and molecular design workflows. Potentials describe the energetic landscapes of molecules, useful in predicting their stability and reactivity. 
Weighted graph representations interpret molecules as graphs composed of atoms (nodes) and chemical bonds (edges), with detailed atomic and bonding information, thus enabling graph neural networks (GNNs) to effectively capture the topological and chemical characteristics. The Coulomb matrix encodes molecules based on atomic charges and spatial positions, effectively capturing internal electrostatic interactions. The bag‐of‐bonds or fragments approach decomposes molecules into distinct bonding environments or functional groups, offering intuitive and interpretable chemical descriptors. Additionally, the accurate representation of molecular geometry via 3D coordinates provides essential spatial structural information, which is crucial for predicting properties influenced by stereochemistry and molecular conformations. Finally, electronic density distributions derived from quantum chemical calculations directly reflect electron distributions within molecules, enabling precise predictions of electronic properties and chemical behavior. Utilizing these diverse molecular representation techniques significantly enhances ML models' ability to predict various properties of organic compounds, such as electronic, optical, thermal characteristics, chemical reactivity, and stability, thus supporting efficient inverse molecular design processes.
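As a concrete example of one of these representations, the Coulomb matrix can be computed in a few lines. This sketch takes distances in angstrom and omits unit-conversion prefactors, and the H2 geometry is only illustrative:

```python
import math

def coulomb_matrix(zs, coords):
    """Coulomb matrix representation: M_ii = 0.5 * Z_i**2.4 (a fitted form
    for the isolated atom) and M_ij = Z_i * Z_j / |R_i - R_j| off-diagonal."""
    n = len(zs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * zs[i] ** 2.4
            else:
                d = math.dist(coords[i], coords[j])  # internuclear distance
                m[i][j] = zs[i] * zs[j] / d
    return m

# H2 at its ~0.74 angstrom bond length as a minimal example:
M = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)])
```

Because the matrix depends only on nuclear charges and geometry, it is invariant to translation and rotation, which is one reason it became a popular baseline descriptor.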

FIGURE 7.

Different molecular representations commonly used in feature extraction for organic compounds in ML: (1) Molecular fingerprints — binary encodings indicating the presence or absence of specific chemical substructures; (2) SMILES strings — textual representations that describe molecular structure; (3) Potential energy surfaces — representations of molecular energetic landscapes; (4) Weighted molecular graphs — nodes (atoms) and edges (bonds) capturing topological and bonding information; (5) Coulomb matrices — numerical encodings of pairwise electrostatic interactions; (6) Bag‐of‐bonds or fragment descriptors — simplified vectorized groups reflecting local chemical environments; (7) 3D geometries — explicit spatial coordinates of atoms; and (8) Electron density maps — spatial distributions derived from quantum chemical calculations. Reproduced with permission [200]. Copyright 2018, The American Association for the Advancement of Science.
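As a concrete illustration of one of the representations above, the Coulomb matrix can be assembled in a few lines of NumPy. This is a minimal sketch (the water‐like geometry and the `coulomb_matrix` helper are illustrative, not taken from any dataset in the cited works):

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Build the Coulomb matrix for a molecule.

    Z : (n,) atomic numbers
    R : (n, 3) Cartesian coordinates in Angstrom
    Diagonal: 0.5 * Z_i**2.4 (a fit to isolated-atom energies);
    off-diagonal: Z_i * Z_j / |R_i - R_j| (nuclear repulsion terms).
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    n = len(Z)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    return M

# Water-like toy geometry: O at the origin, two H atoms
Z = [8, 1, 1]
R = [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
M = coulomb_matrix(Z, R)
```

Because the matrix depends only on charges and interatomic distances, it is invariant to translation and rotation, which is one reason it became a popular baseline descriptor.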

These standardized feature generation tools form the cornerstone for transferring ML strategies developed for perovskites to PIMs. By providing a unified and physically consistent feature representation framework, they bridge the structural and compositional similarities between the two material families. Once the reliability of these descriptors and corresponding ML models has been verified on large, data‐rich perovskite datasets, the entire feature–ML framework can be confidently extended to data‐augmented PIMs datasets, enabling comparable predictive accuracy and interpretability.

When data is abundant but contains redundancy and irrelevant features, feature selection becomes a crucial step in data preprocessing. It effectively addresses the ‘curse of dimensionality’, a phenomenon where the feature space becomes exponentially large and the model's performance starts to deteriorate rather than improve [201]. Feature selection is categorized into three approaches: filter, wrapper, and embedded methods [202]. The filter method selects features based on their intrinsic properties, independent of any ML model. It ranks features using a relevance score and applies a predefined threshold to determine the optimal subset; the workflow is essentially a ranking process that retains high‐scoring features and discards low‐scoring ones [203]. The wrapper method, on the other hand, assesses subsets of features based on their effectiveness in improving a particular model. It wraps feature selection around the learning algorithm and evaluates, typically via cross‐validation, the benefit of adding or removing each feature. A commonly used wrapper method is sequential forward selection (SFS), which starts with an empty feature set and iteratively adds one feature at a time, selecting the feature that maximizes model accuracy at each step [204, 205]. Embedded methods integrate feature selection into the model training process itself, offering a balance between filter and wrapper methods [206, 207]. During normalization and standardization, the feature and target matrices are rescaled to place all variables on an equal footing, preventing large‐magnitude features from dominating the model and reducing accuracy [208]. Table S3 shows some important features of perovskites.
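The SFS wrapper loop described above can be sketched with NumPy alone. Here the scoring model is a hypothetical ordinary least‐squares fit evaluated by validation R², and the synthetic data is constructed so that only two of the six features carry signal:

```python
import numpy as np

def r2_score(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_predict(Xtr, ytr, Xval):
    # least-squares linear model with an intercept column
    A = np.c_[Xtr, np.ones(len(Xtr))]
    w, *_ = np.linalg.lstsq(A, ytr, rcond=None)
    return np.c_[Xval, np.ones(len(Xval))] @ w

def sfs(Xtr, ytr, Xval, yval, k):
    """Sequential forward selection: greedily add the feature
    that most improves validation R2, until k features are chosen."""
    selected, remaining = [], list(range(Xtr.shape[1]))
    while len(selected) < k:
        best_f, best_s = None, -np.inf
        for f in remaining:
            cols = selected + [f]
            s = r2_score(yval, fit_predict(Xtr[:, cols], ytr, Xval[:, cols]))
            if s > best_s:
                best_f, best_s = f, s
        selected.append(best_f)
        remaining.remove(best_f)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3 * X[:, 2] - 2 * X[:, 4] + 0.1 * rng.normal(size=200)  # only features 2 and 4 matter
sel = sfs(X[:100], y[:100], X[100:], y[100:], k=2)
```

The greedy loop correctly recovers the two informative features; in practice the same pattern is available off‐the‐shelf, e.g. as scikit‐learn's `SequentialFeatureSelector`.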

6. Model Selection

Model selection is a crucial step in any ML project; it determines not only the prediction performance but also the model's applicability to new data. The procedure starts with choosing the most appropriate algorithm from a pool of candidates spanning different ML methods, so as to effectively capture the fundamental patterns of the dataset. A well‐chosen model can effectively balance bias and variance, minimizing overfitting and underfitting and thus ensuring reliable performance on unseen data. Furthermore, model selection has a significant impact on computational efficiency, which is crucial for processing large datasets. Liu et al. [209] also highlighted that model selection is one of the five main parts of a complete ML‐driven perovskite research workflow, along with data curation, feature engineering, model validation, and interpretability. Their framework provides a valuable reference for the algorithm selection discussed in this section. In the following sections, we mainly discuss the established ML algorithms, covering supervised, unsupervised, and reinforcement learning, that have been applied to perovskite materials and devices. We also highlight the emerging trend toward advanced generative models, which represent a promising direction for the future of new materials discovery. A brief overview of more detailed ML categories and algorithms can be found in Figure S1.

6.1. Supervised Learning

Supervised learning is a task‐driven approach aimed at constructing a relationship between a collection of input variables (X) and an output variable (Y). This relationship is then used to predict outcomes for unknown data [210]. Two further important concepts in supervised learning are interpolation and extrapolation (see Figure S2). Interpolation involves making predictions within the range of the training data: the model estimates outputs for inputs that are similar to, or within the bounds of, the data it has already seen [211]. In materials science, interpolation generally corresponds to predictions made within the composition or structural space represented in the training dataset. When a model encounters compounds with element combinations, lattice types, or bonding environments similar to those it has already learned, the predictions are usually more stable and accurate. By contrast, extrapolation involves making predictions outside the domain of the training data [211]. In this case, the model's predictive reliability deteriorates due to the lack of reference data, leading to higher uncertainty and potential overfitting to spurious correlations.

To tackle this issue, uncertainty quantification (UQ) methodologies, including ensemble learning [212] and Bayesian neural networks [213], have been progressively implemented to evaluate model confidence and to differentiate between epistemic (model‐related) and aleatoric (data‐related) uncertainty. Adding UQ to ML‐driven screening pipelines makes predictions more trustworthy and easier to act on: researchers can prioritize candidates that are predicted to perform well with low uncertainty, making data‐driven materials discovery more robust.
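A bootstrap ensemble gives a minimal, model‐agnostic version of this idea: the spread of the member predictions serves as an epistemic‐uncertainty proxy, and it grows once a query point leaves the training range. All data below is synthetic and the linear model is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=80)            # training inputs confined to [0, 1]
y = 2.0 * x + 0.5 + 0.05 * rng.normal(size=80)

def fit_line(x, y):
    return np.polyfit(x, y, deg=1)            # (slope, intercept)

# Bootstrap ensemble: each member is fit on a resampled training set
members = []
for _ in range(50):
    idx = rng.integers(0, len(x), len(x))
    members.append(fit_line(x[idx], y[idx]))

def predict_with_uncertainty(x_query):
    preds = np.array([np.polyval(m, x_query) for m in members])
    return preds.mean(axis=0), preds.std(axis=0)   # mean, epistemic spread

mean_in, std_in = predict_with_uncertainty(np.array([0.5]))    # interpolation
mean_out, std_out = predict_with_uncertainty(np.array([3.0]))  # extrapolation
```

The ensemble disagreement (`std`) is small at x = 0.5 inside the training range and visibly larger at x = 3.0, flagging the extrapolated prediction as untrustworthy.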

6.1.1. Linear Models

The linear regression model, along with its extension, multiple linear regression, stands as the most basic and most extensively used ML model in materials science, and is often written as:

f_{w,b}(x) = ωx + b  (6)

where f_{w,b}(x), abbreviated as y, is the target property, x is the feature descriptor, b is the bias, and ω is the regression weight that constructs the relationship between y and x. Normally, the feature descriptors are multiple and multidimensional and can be arranged in a matrix X = [x_11 x_12 x_13; x_21 x_22 x_23; x_31 x_32 x_33], with the corresponding target vector y = (y_1, y_2, y_3)^T and weight vector ω = (ω_1, ω_2, ω_3)^T. Then, for the given data point x_1, that is, the first column of the matrix X, the prediction result µ_1 is given by the following equation:

µ_1 = x_1^T ω = x_11ω_1 + x_21ω_2 + x_31ω_3  (7)

Logistic regression is another member of the linear model family, a generalized linear model used for classification tasks. Its primary goal is to establish a regression equation for classification by defining the decision boundary using existing data [214]. It achieves this through the sigmoid function, which models the conditional probability. The sigmoid function is expressed as:

g(z) = 1 / (1 + e^{−z})  (8)

where z represents either a single value or an array of values. When applied to arrays, the sigmoid function leverages vectorization, enhancing computational efficiency. The logistic regression equation applies the sigmoid function to the linear model, as follows:

f_{w,b}(x) = g(wx + b)  (9)

Linear models are easy to train and interpret and can achieve good results for properties with approximately linear relationships. Li et al. [215] used ridge regression to predict the thermodynamic stability of perovskite oxides, and the results showed that ridge regression gave the smallest prediction deviation for thermal stability. The advantages of linear models are simple implementation, fast training, and interpretable results, making them suitable for small datasets or preliminary analysis. The disadvantages are that they perform poorly when the relationship between properties and descriptors is nonlinear, they struggle to capture complex interactions, and handcrafted features need to be constructed to compensate for the underfitting of the model [216].
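Equations (8) and (9) translate directly into vectorized code. The weight and bias below are illustrative values that place the decision boundary at x = 1:

```python
import numpy as np

def sigmoid(z):
    # vectorized logistic function g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_predict(X, w, b):
    # f_{w,b}(x) = g(w.x + b); returns class probabilities in (0, 1)
    return sigmoid(X @ w + b)

# Hypothetical 1-feature classifier: w = 4, b = -4 puts the boundary at x = 1
X = np.array([[0.0], [1.0], [2.0]])
p = logistic_predict(X, w=np.array([4.0]), b=-4.0)
labels = (p >= 0.5).astype(int)
```

Points below x = 1 receive probabilities under 0.5 and are assigned class 0; points at or beyond the boundary are assigned class 1.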

6.1.2. Kernel‐Based Models

SVM is an advanced model whose fundamental concept revolves around identifying the optimal hyperplane that separates the different classes in the feature space. In two‐dimensional space, this hyperplane manifests as a line separating the data into distinct classes. The core of an SVM lies in solving a quadratic programming problem [217] to maximize the margin between the data points and the hyperplane. The equation is:

f_{w,b}(x) = w^T x + b = Σ_{i=1}^{m} a_i y^{(i)} ⟨x^{(i)}, x⟩ + b  (10)

The ⟨x^{(i)}, x⟩ above is the kernel function applied to the input space, which can also be written as k(x^{(i)}, x). As mentioned, SVM excels in situations requiring linear classification. However, when the data is not linearly separable in the original (input) space, a transformation is necessary to map the data into a higher‐dimensional feature space. The goal is to make the classes linearly separable in this new, high‐dimensional feature space, allowing decision boundaries to be fitted to separate the classes and make predictions. A crucial technique in SVM for addressing non‐linear data is the kernel trick. This technique allows SVM to efficiently handle non‐linear data by enabling learning algorithms to operate in high‐dimensional spaces without changing the underlying algorithm. However, kernel techniques can also suffer from the curse of dimensionality in some cases: when the feature space dimension is too high, the data becomes sparse, increasing the difficulty of model training and the risk of overfitting. In addition, the size of the kernel matrix grows quadratically with the size of the dataset, resulting in a significant increase in computational and memory costs, especially on large datasets. An improperly selected kernel function may also make the model too complex, further exacerbating these problems. Therefore, when using kernel techniques, it is necessary to carefully select the kernel function and balance the data size against the model complexity. A simplified explanation of the kernel trick is as follows: suppose we have data points x^{(i)}, x ∈ X and a mapping ∅ from the original feature x to a high‐dimensional feature vector ∅(x); then the kernel function k(x^{(i)}, x) = ∅(x^{(i)})^T ∅(x) computes the inner product in the high‐dimensional space without ever constructing ∅(x) explicitly.
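The kernel trick can be verified numerically. For the quadratic kernel k(x, z) = (x·z)², an explicit feature map ∅ exists, and the kernel value matches the inner product ∅(x)ᵀ∅(z) without the map ever being built (this is a standard textbook identity, not specific to any perovskite study):

```python
import numpy as np

def phi(x):
    # explicit quadratic feature map for 2-D input:
    # phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def poly2_kernel(x, z):
    # kernel trick: k(x, z) = (x . z)^2 equals phi(x) . phi(z)
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
explicit = phi(x) @ phi(z)       # inner product in the 3-D feature space
via_kernel = poly2_kernel(x, z)  # same number from a 2-D dot product
```

The kernel evaluates one dot product in the original space, while the explicit route requires constructing the higher‐dimensional vectors; for the RBF kernel the corresponding feature space is infinite‐dimensional, so only the kernel route is possible at all.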

Kernel‐based models can be used to predict many perovskite properties. Feng et al. [218] used SVM to predict the bandgap of HOIPs, achieving an R² of 0.974, which represents good accuracy. In addition, SVM also has advantages in predicting crystal structure and lattice parameters. Jarin et al. [219] used a genetic‐algorithm‐supported SVM to achieve 87% accuracy in predicting the crystal structure type (including cubic, orthorhombic, and tetragonal crystal systems) and 95% accuracy in predicting lattice parameters. On the other hand, kernel ridge regression also achieved high accuracy, with an R² of 0.854, in a study of ABX3 perovskite bandgaps [220]. Kernel‐based models can effectively handle nonlinear relationships, are robust for high‐dimensional feature spaces and small samples, and can avoid overfitting. However, the model parameters (such as the kernel function type and penalty coefficient) need to be carefully tuned, training is slow on large datasets, and the results are less intuitive than those of linear models, with weaker interpretability [221].

6.1.3. Tree‐Based Models

Tree‐based models represent a significant category of ML algorithms, built upon the foundation of decision trees. These models are widely utilized for both classification and regression tasks. The core mechanism of tree‐based models involves performing conditional judgments on data features to recursively partition the dataset into smaller subsets, ultimately forming a tree‐structured decision framework [222]. The construction of decision trees is grounded in the principles of nodes and splits. Each internal node represents a test on a specific feature, while each leaf node corresponds to a class label or a numerical prediction. The dataset is split into subsets by selecting the optimal feature and its corresponding threshold. This splitting process is guided by three criteria: Information Gain, Gini Index, and Reduction in Variance [223]. The splitting is recursively applied to each subset until a stopping condition is met, such as reaching a maximum depth or a minimum sample size. In addition to decision trees, most traditional models currently applied in the field of materials science are also tree‐based models. These include algorithms such as Random Forest [224], LightGBM [225], and eXtreme Gradient Boosting (XGBoost) [226], which have become widely utilized due to their robustness and effectiveness in handling complex datasets.
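The Reduction in Variance criterion above can be sketched for a single feature. The toy step‐function data below is constructed so that the best threshold (x = 0.5) is obvious:

```python
import numpy as np

def best_split(x, y):
    """Find the threshold on a single feature that maximizes the
    reduction in variance (the regression-tree split criterion)."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    parent_var = y.var()
    best_t, best_gain = None, 0.0
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue
        t = 0.5 * (x[i] + x[i - 1])         # candidate threshold: midpoint
        left, right = y[:i], y[i:]
        # weighted child variance after the split
        child = (len(left) * left.var() + len(right) * right.var()) / len(y)
        gain = parent_var - child
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Step-function target: the ideal split sits at x = 0.5
x = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
y = np.array([1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0])
t, gain = best_split(x, y)
```

A full decision tree simply applies this search over every feature at every node and recurses on the resulting subsets; ensembles such as RF and XGBoost then average or boost many such trees.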

Tree‐based models are widely used in predicting PV material properties and often achieve high accuracy. Eren et al. [227] used a random forest model to predict the bandgap and device PCE of perovskites and achieved high‐precision fits for both targets from a small experimental dataset (8 compositions; R² > 0.99 for absorption spectrum data and R² > 0.82 for J–V characteristic data), accurately reproducing the experimental values. Tree‐based models are also used to predict the band structure of perovskites. Mattur et al. [173] used RF to classify perovskite oxides as direct‐ or indirect‐bandgap materials, and the RF model achieved an accuracy of about 91%. Liu et al. [228] compared LR, KNN, SVR, RF, MLP, and XGBoost for predicting the bandgap of ABX3‐type perovskites based on 227 sets of experimentally measured data collected from 1254 publications. The XGBoost model achieved the highest predictive accuracy (RMSE = 0.055 eV, R = 0.99), while the RF model also achieved high accuracy (RMSE = 0.064 eV, R = 0.98). Even compared with currently mainstream neural networks, tree models still perform well. Zhu et al. [229] compared ANN, GBDT, KNN, RF, and XGBoost for predicting the decomposition energy, bandgap, and spectrally limited maximum efficiency (SLME) of 177 264 halide perovskites. The XGBoost model achieved the best performance, with an RMSE of 0.19 eV for decomposition energy, 0.20 eV for bandgap, and 1.77% for SLME, demonstrating high prediction accuracy and generalization capability. The advantages of tree‐based models include the ability to automatically capture high‐dimensional nonlinear relationships, general insensitivity to input feature distributions, and good robustness on small‐ and medium‐sized datasets. In addition, random forests and related ensembles are relatively resistant to noise and overfitting and can provide a robust benchmark model.
However, decision tree models depend strongly on data quality, and data bias may affect model generalization. Boosting models have more hyperparameters and require careful tuning to prevent overfitting. Compared with simpler models, the results of tree models are difficult to parse directly into physical meaning (explanation methods are required). Nevertheless, the balance between accuracy and interpretability makes tree models one of the common choices for predicting the properties of perovskite materials [230, 231].

6.1.4. Deep Learning

Deep learning extracts feature relationships through multi‐layer neural networks and has strong nonlinear fitting capabilities. In perovskite materials research, deep models have begun to emerge in recent years [22]. Multilayer feedforward neural networks (also known as ANNs or BPNNs) have demonstrated significant effectiveness in predicting various properties of perovskites. For example, a previous study successfully developed a BPNN model to simultaneously predict multiple key properties of perovskite oxides, including formation energy, thermodynamic stability, unit cell volume, and oxygen vacancy formation energy. The results indicated that the neural network achieved low prediction errors across all these properties, particularly excelling in the prediction of oxygen vacancy formation energy and outperforming the other models tested. By contrast, the simpler models each had their own strengths (e.g., RF was better for formation energy and SVM for unit cell volume), while the deep neural network achieved higher accuracy across all metrics [215]. This shows that deep learning models can capture complex mappings of different properties at the same time. In addition, deep learning can also be combined with more abstract descriptions, such as directly using atomic composition and structure as input. A representative example is the graph neural network (GNN) approach, particularly the Crystal Graph Convolutional Neural Network (CGCNN), which has shown strong performance in materials research [232]. CGCNN treats a crystal as a graph in which atoms act as nodes and bonds as edges, enabling the model to learn structure–property relationships directly without using predefined descriptors. Through convolutional operations on atomic graphs, it captures both local coordination and long‐range interactions.
This method would be especially useful for complex systems like PIMs, where atomic arrangement and mixed‐anion coordination strongly influence stability and optoelectronic behavior [20]. Figure 8 illustrates the working mechanism of CGCNN.

FIGURE 8.

Illustration of the crystal graph convolutional neural networks. (a) Construction of the crystal graph. (b) Structure of the convolutional neural network on top of the crystal graph. Reproduced with permission [232]. Copyright 2018, Physical Review Letters.
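The crystal‐graph idea in Figure 8 can be reduced to a toy message‐passing step in NumPy. This is a deliberately simplified sketch of the concept (hypothetical weights, mean aggregation, and a tanh nonlinearity), not the exact CGCNN convolution of ref. [232]:

```python
import numpy as np

# Toy 3-atom "crystal graph": nodes carry feature vectors and the
# adjacency matrix encodes which atoms are bonded to which.
node_feats = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [1.0, 1.0]])
adj = np.array([[0, 1, 1],
                [1, 0, 1],
                [1, 1, 0]], dtype=float)

W_self = np.eye(2)               # hypothetical "learned" weight matrices
W_neigh = 0.5 * np.eye(2)

def message_pass(h, adj):
    # each node aggregates the mean of its neighbours' features,
    # then mixes it with its own state through a nonlinearity
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ h) / deg
    return np.tanh(h @ W_self + neigh_mean @ W_neigh)

h1 = message_pass(node_feats, adj)
crystal_vec = h1.mean(axis=0)    # pooled graph-level representation
```

Stacking several such layers lets information propagate beyond nearest neighbours, and the pooled graph vector is what a final dense head would map to a target property such as formation energy.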

In the field of perovskite solar cells, many major research efforts have employed neural networks to model complicated device behaviors and performance relationships. Liu et al. [233] combined a convolutional neural network (CNN) and RF with SHapley Additive exPlanations (SHAP) and Genetic Programming Symbolic Regression (GPSR) to predict both photovoltaic parameters (PCE, J sc, V oc, FF) and EQE spectra with high accuracy (RMSE ≤ 1.25%). Yan et al. [234] established an interpretable RF framework combined with SHAP to find a link between experimental characterization data and photovoltaic performance. The model accurately reproduced both the bandgap (R² > 0.99) and PCE (R² > 0.82) using optical absorption and J–V curves from eight perovskite compositions as input. The SHAP‐based feature attribution further identified the absorption edge (∼750 nm) and the J–V slope as the most influential factors governing PCE, demonstrating that ML can serve as a “virtual characterization tool” directly linking spectral features to device efficiency with excellent precision. Liu et al. [235] employed Random Forest (RF) and XGBoost models integrated with SHAP to predict multiple key photovoltaic parameters of perovskite solar cells, including PCE, V oc, J sc, and FF, based on 814 experimentally validated datasets. The optimized RF model achieved outstanding outcomes, with RMSE values of 1.58% (PCE), 0.051 V (V oc), 1.04 mA cm−2 (J sc), and 0.046 (FF), and correlation coefficients (r) above 0.85 for most targets. Moreover, SHAP analysis indicated that a moderate bandgap (1.25–1.5 eV), an appropriate energy offset (∼0.15–0.2 eV), and elevated carrier mobility are essential factors for improving device efficiency, providing robust physical interpretability and alignment with the SQ limit.

The advantage of deep learning is that it can automatically extract complex patterns from big data without the need to manually specify particular nonlinear forms. Especially when the amount of data is large enough (for example, a perovskite database containing tens of thousands of samples), deep models are expected to further improve prediction accuracy. At the same time, deep models are easy to combine with physical priors (such as using the SQ limit to verify the relationship between bandgap and device parameters, enabling joint optimization of materials and devices). Their disadvantage is a high requirement on data volume and quality: if a deep model is not trained with enough samples, it easily overfits, resulting in inaccurate predictions for unknown compositions. In addition, deep learning models are also “black box” models that lack intuitive interpretability and require additional methods (such as feature visualization and sensitivity analysis) to explain their decision‐making basis. Therefore, for materials with limited data, such as PIMs, it is often necessary to carefully evaluate whether to use complex neural networks, or to use data augmentation, to ensure model reliability. Figure 9 (collected from ref. [236]) summarises the usage of different ML algorithms in perovskite solar cell research and highlights the advantages and disadvantages of the four main classes of supervised learning.

FIGURE 9.

The usage of different algorithms in 119 perovskite solar cell research papers. Reproduced with permission [236]. Copyright 2025, Elsevier.

6.2. Unsupervised Learning

Unsupervised learning, another important type of ML, involves analysing data without predetermined labels or categories. This approach enables computers to independently discover patterns and categorizations within the data [237]. The method is based on the principle of identifying inherent structures in data, a key concept in the field of AI, and it deviates from supervised learning, which depends on labelled datasets to train models [238]. The theoretical foundation of unsupervised learning is the assumption that datasets, regardless of human‐imposed classifications, exhibit inherent clusters or patterns that can be identified by algorithmic analysis [239]. Principal component analysis (PCA) and word embedding methods are two main approaches used for discovering solar materials and PIMs. PCA is a statistical dimensionality reduction technique that transforms complex, high‐dimensional datasets into a smaller set of uncorrelated variables, known as principal components, effectively capturing most of the variance in the original dataset. However, this method naturally carries a risk of information loss, so the appropriate dimension must be chosen carefully during dimensionality reduction to achieve the best balance between information retention and model complexity [240, 241]. Hosni et al. [242] employed PCA to systematically reduce the dimensionality of perovskite materials data to optimize ML predictions of specific surface area, a critical factor influencing PV performance. The results show that PCA can significantly improve prediction performance for both the RF and SVR models. The difference is that in the SVR model the performance stabilizes once the feature dimension reaches five, whereas in the RF model the performance peaks at nine dimensions and then declines.
This result suggests that PCA may be more suitable for algorithms that are sensitive to linear relationships in the data. In addition, Mai et al. [243] employed PCA for dimensionality reduction in their ML work aimed at identifying organic spacer materials (ammonium salts) that could significantly enhance the PCE of perovskite solar cells. By reducing the original 16‐dimensional feature space to just two principal components, PCA substantially simplified the complexity of the predictive model. This dimensionality reduction not only streamlined the data representation but also markedly improved the model's predictive accuracy and generalization to novel, unexplored materials. In addition to PCA, word embedding techniques, rooted in natural language processing (NLP), have been effectively applied to mining and predicting PV materials from extensive literature databases. Zhang and He [244] utilized an NLP‐based unsupervised learning method, leveraging word embedding models to automatically extract hidden relationships between material compositions and their PV properties from textual databases, and successfully recovered well‐known solar materials including Si, GaAs, ZnO, CIGS, InP, c‐Si, CdS, GaInP, and InGaAsP. In addition to common solar cell materials widely reported in the literature, the NLP‐based approach also predicted several uncommon materials, such as As2O5; the optoelectronic properties of As2O5 were further investigated through first‐principles calculations, and the accuracy of the ML predictions was confirmed. Expanding the scope of word embedding techniques, Zhang et al. [245] developed a new NLP model specifically for non‐English scientific literature, constructing a substantial database comprising 210 000 Chinese‐language abstracts on materials and chemistry research.
The model successfully screens out materials that are highly relevant to “solar cells” from the literature, identifies known materials (such as CH3NH3PbX3), and predicts potential new candidates (such as TiNb2O7, BiPO4, and Y2O3). These new candidates were further verified by DFT using three evaluation metrics: bandgap (1.6 eV for TiNb2O7 and Y2O3, 2.1 eV for BiPO4), the UV–vis absorption spectrum, and theoretical efficiencies, namely the spectrally limited maximum efficiency (SLME, 26.7–28.1%), the SQ limit (20.7–30.6%), and the “potential energy loss” maximum achievable efficiency (16.0–24.9%). The results confirm the potential of these materials for solar cell applications.
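PCA itself reduces to an SVD of the mean‐centered data matrix. The synthetic three‐descriptor example below is built so that almost all of the variance lies along one direction, which the first component then captures:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the mean-centered data matrix.

    Returns the projected scores and the fraction of total
    variance explained by each retained component.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)       # variance fraction per component
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Synthetic descriptor matrix whose variance lives mostly in one direction
rng = np.random.default_rng(2)
t = rng.normal(size=200)
X = np.c_[t, 2 * t + 0.1 * rng.normal(size=200), 0.1 * rng.normal(size=200)]
scores, var_ratio = pca(X, n_components=2)
```

The explained‐variance ratio is exactly the quantity used in practice to decide how many components to keep, i.e. where the trade‐off between information retention and model complexity discussed above is made.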

6.3. Reinforcement Learning

Reinforcement learning (RL) is a special type of ML in which an agent learns to make decisions by carrying out actions in an environment towards achieving some goal. What sets RL algorithms apart is that they learn the best actions through the experience of interacting with the environment, rather than from a fixed labelled dataset [246, 247]. One of the main components of RL is the policy, which determines the agent's next action based on its current state. Algorithms used in RL mainly fall into three classes: (1) policy‐based, (2) value‐based, and (3) model‐based methods. Policy‐based methods directly learn the policy function that maps a state to the probability of taking each action; their advantage is their effectiveness in high‐dimensional or continuous action spaces [248]. Value‐based methods, like Q‐learning and Deep Q‐Networks, learn the value of each action in a state from experience and choose actions based on the learned values [249]. In contrast to policy‐based methods, value‐based methods focus on learning the value of each action in a given state rather than directly learning the policy; they are simpler and can be more effective in discrete action spaces [250]. Model‐based methods attempt to model the environment and then plan using that model; they can be more sample‐efficient than model‐free methods (like policy‐ and value‐based methods), as they can use the model to simulate experience rather than having to collect it [251, 252]. Currently, there are limited reports on the application of RL to materials exploration; it is usually used for the optimization of PV device designs. For example, Jiang and Yoshie [253] proposed an RL method in which the high‐dimensional refractive index data of more than 300 materials were reduced to a two‐dimensional environment space, and the materials and thicknesses of a multilayer solar absorber were automatically optimized.
After 1000 iterations, the method designed a 5‐layer structure composed of MgF2, TiO2, Si, Ge, and Cu, with thicknesses ranging from 35.3 to 200.0 nm, and achieved an average absorption rate of 91% in the 250–800 nm band. On the other hand, Sajedian et al. [254] used a dual deep Q‐network to optimize a symmetric metamaterial consisting of a cylinder and two layers of film, generating 1250 perfect solar absorbers in about 35 000 steps from a design space of about 527 billion candidates. The absorption rate of these structures in the 350–800 nm band exceeds 90% (up to 97.6%), while the absorption rate in the 8–13 µm band is less than 10% (minimum 1.37%). These two studies demonstrate the application of RL to the efficient exploration of the design space of solar and perovskite‐related materials, improving absorption performance through multi‐material selection and structural optimisation, respectively, and providing a new path for automated design in the PV field.
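Value‐based RL is easiest to see in tabular form. The sketch below runs Q‐learning on a hypothetical five‐state chain (reward only at the last state), using a fully random behaviour policy for exploration, which off‐policy Q‐learning tolerates; the learned greedy policy is simply "always move right":

```python
import numpy as np

# Tabular Q-learning on a toy 5-state chain; reward 1 for reaching state 4.
n_states, n_actions = 5, 2              # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9                 # learning rate, discount factor
rng = np.random.default_rng(3)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward, s2 == n_states - 1

for _ in range(3000):                   # episodes
    s = 0
    for _ in range(30):                 # cap on episode length
        a = int(rng.integers(n_actions))            # random exploration
        s2, r, done = step(s, a)
        # off-policy update: bootstrap on the greedy value of s2
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if done:
            break

policy = Q.argmax(axis=1)               # greedy policy after training
```

Deep Q‐networks used in the absorber‐design studies above replace the Q table with a neural network so that the same update rule scales to enormous state spaces, but the bootstrapped target r + γ·max Q(s′) is identical.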

6.4. Generative Models

Generative models have recently become a breakthrough method in materials science [255, 256], making it possible to inverse‐design new materials, including perovskites and PIMs. Unlike traditional supervised models, generative models attempt to learn the underlying probability distribution of the materials data, thereby enabling the independent creation of chemically valid structures with specified physicochemical properties. This ability makes it possible to systematically explore regions of compositional and structural space that have never been explored before, which can greatly accelerate the discovery of new materials. Currently, there are two main branches of generative models: large language models (LLMs) and diffusion models.

LLMs rely on autoregressive token prediction, where the model learns contextual dependencies within sequential data to predict the next word or token, thereby capturing high‐dimensional relationships between syntax, semantics, and scientific knowledge. When trained on scientific papers, this mechanism enables an LLM to learn the co‐occurrence patterns between materials, properties, and numerical values that appear in research papers. As a result, LLMs can effectively capture structure–composition–property relationships and extract key information such as bandgap and Ehull directly from the literature. In this way, LLMs can rapidly construct reliable, experiment‐based databases by capturing and extracting key information from the vast scientific literature, effectively addressing the long‐standing issue of data scarcity. Sipilä et al. [257] demonstrated this practical application of LLMs by helping to construct the PV600 database, which contains bandgap information for HOIPs, by extracting information from 238 431 open‐access papers. Figure 10a shows the concept of information extraction from literature.

FIGURE 10.

Illustration of generative models. (a) Demonstration of key information extraction by LLMs reproduced under terms of the CC‐BY license [257]. Copyright 2025, The Authors, Springer Nature. (b–d) Inorganic materials design with MatterGen reproduced under terms of the CC‐BY license [258]. Copyright 2025, The Authors, Springer Nature.

Diffusion models, on the other hand, learn to turn Gaussian noise into realistic crystal structures through iterative denoising, which gives them strong generative stability and structural diversity. Recent frameworks such as MatterGen [258] have shown that inorganic materials can be generated with controllable lattice symmetry, formation energy, and electronic structure. MatterGen combines a crystal graph encoder with a denoising diffusion probabilistic model to sample atomic positions and lattice parameters simultaneously, providing strong structural validity and property control. Figure 10b–d demonstrates the complete inorganic material design workflow of MatterGen.

Although the application of generative models to perovskites and PIMs is still developing, these methods have already shown great potential for addressing one of the biggest problems in materials science: the lack of large, high‐quality datasets. We believe that generative models will become an increasingly reliable model development direction for materials science.

7. Model Optimization and Evaluation

During model training, the optimal values of the parameters ω and b are typically determined by minimizing the discrepancy between the predicted output f_{ω,b}(x) and the actual output y. This discrepancy is quantified through a loss function. For linear regression, the Mean Squared Error (MSE) is commonly employed:

$J(\omega,b) = \frac{1}{2m}\sum_{i=0}^{m-1}\left(f_{\omega,b}(x_i)-y_i\right)^2$ (11)

Where m is the number of training examples. One limitation of linear regression is its susceptibility to underfitting: because it seeks the unbiased estimate with the smallest MSE, it may not be flexible enough for complex data, and an underfitted model cannot achieve good prediction results [259]. To address this, regularization techniques such as Ridge or Lasso regression intentionally introduce a small amount of bias into the model, which can help reduce variance and improve predictive accuracy.
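To make the bias–variance trade‐off concrete, here is a minimal NumPy sketch (our illustration, not code from the cited works) comparing ordinary least squares with closed‐form Ridge regression, whose penalty shrinks the coefficient norm at the cost of a slightly higher training MSE:

```python
import numpy as np

def fit_linear(X, y, alpha=0.0):
    """Closed-form (ridge) regression: w = (X^T X + alpha*I)^{-1} X^T y.
    alpha=0 recovers ordinary least squares; alpha>0 adds the Ridge
    penalty, trading a little bias for lower variance."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def mse(X, y, w):
    """Mean squared error, matching Eq. (11) up to the 1/2 convention."""
    r = X @ w - y
    return float(np.mean(r ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=40)

w_ols = fit_linear(X, y, alpha=0.0)
w_ridge = fit_linear(X, y, alpha=10.0)
# The ridge solution is shrunk toward zero relative to OLS,
# while OLS attains the lowest possible training MSE.
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # True
```

The comparison shows the mechanism only; in practice the penalty strength is chosen by cross‐validation, as discussed below.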

Gradient descent is an optimization method widely used to minimize the loss function by iteratively updating the parameters [260]. The update equations are given below, where λ is the learning rate:

$b = b - \lambda\,\frac{\partial J(\omega,b)}{\partial b}, \qquad \omega = \omega - \lambda\,\frac{\partial J(\omega,b)}{\partial \omega}$ (12)

The coefficients ω and b are updated simultaneously, where:

$\frac{\partial J(\omega,b)}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\omega,b}(x_i)-y_i\right)$ (13)
$\frac{\partial J(\omega,b)}{\partial \omega} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\omega,b}(x_i)-y_i\right)x_i$ (14)

In gradient descent, the choice of λ is crucial: if λ is too large, the coefficients may fail to converge; if λ is too small, gradient descent can be very time‐consuming or even get stuck on a plateau. Thus, careful tuning of the learning rate is necessary for efficient convergence.
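The update rule of Eqs. (12)–(14) can be sketched in a few lines of NumPy (an illustrative toy fit of ours, with hypothetical data):

```python
import numpy as np

def gradient_descent(x, y, lam=0.05, iters=5000):
    """Minimise the MSE loss of f(x) = w*x + b using the updates of
    Eqs. (12)-(14); lam is the learning rate."""
    w, b = 0.0, 0.0
    m = len(x)
    for _ in range(iters):
        err = w * x + b - y                  # f_{w,b}(x_i) - y_i
        dw = np.sum(err * x) / m             # Eq. (14)
        db = np.sum(err) / m                 # Eq. (13)
        w, b = w - lam * dw, b - lam * db    # Eq. (12): simultaneous update
    return w, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                            # ground truth: w = 2, b = 1
w, b = gradient_descent(x, y, lam=0.05)
print(round(w, 2), round(b, 2))              # converges close to 2.0 and 1.0
```

With this small, well‐conditioned dataset the iterates converge; raising lam well above the stable range would make them diverge, illustrating the sensitivity to λ discussed above.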

For logistic regression, the commonly used loss function is cross‐entropy loss, suitable for classification tasks, where predictions are probabilities between 0 and 1:

$J(w,b) = \frac{1}{m}\sum_{i=0}^{m-1}\mathrm{Loss}\left(f_{w,b}(x_i),\,y_i\right)$ (15)
$\mathrm{Loss} = -y_i\log f_{w,b}(x_i) - (1-y_i)\log\left(1-f_{w,b}(x_i)\right)$ (16)

Moreover, cross‐validation and grid search are two commonly used hyperparameter tuning methods, while performance metrics are used for evaluation. A hyperparameter in ML is a parameter whose value is set before the learning process begins, as opposed to parameters that are learned from the data during training. Hyperparameters control the overall behavior of the learning algorithm, influencing how the model learns and performs [261]. Cross‐validation (CV) is a commonly used technique for evaluating the prediction accuracy of statistical models, especially when selecting the best model and fine‐tuning its hyperparameters [262]. The fundamental logic of CV is to partition the data into subsets, train the model on some of them, and then validate it on the remaining part. There are many CV schemes, such as k‐fold CV, leave‐one‐out (LOO), and blocked CV. K‐fold is one of the most widely used: datasets in materials science are usually limited in size, and k‐fold CV provides a more stable and reliable performance estimate on limited data at relatively low computational cost. In addition, by splitting and validating the data multiple times, k‐fold CV gives a more comprehensive assessment of the stability and generalization performance of the model [263]. LOO is an extreme case of k‐fold in which k equals the size of the dataset: in each iteration, only one data point is used as the validation set and the remaining points form the training set. LOO maximizes the use of data, but its computational cost is high, especially for large datasets [264].
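The k‐fold scheme just described, combined with a simple hyperparameter search, can be sketched as follows (a toy ridge‐regression example of ours, assuming only NumPy):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle indices and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cv_score(X, y, alpha, k=5):
    """Mean validation MSE of closed-form ridge regression over k folds.
    The same seed gives every candidate alpha the same CV split."""
    folds = kfold_indices(len(y), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w = np.linalg.solve(X[train].T @ X[train] + alpha * np.eye(X.shape[1]),
                            X[train].T @ y[train])
        scores.append(np.mean((X[val] @ w - y[val]) ** 2))
    return float(np.mean(scores))

# Grid search: evaluate every candidate hyperparameter under identical CV.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -1.0, 0.5, 2.0]) + 0.2 * rng.normal(size=60)
grid = [0.01, 0.1, 1.0, 10.0]
best_alpha = min(grid, key=lambda a: cv_score(X, y, a))
print(best_alpha)
```

In practice, libraries such as scikit‐learn wrap this pattern (e.g., `GridSearchCV`), but the underlying logic is exactly this loop.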
Blocked CV is typically applied to time‐series data or data with a natural order, and is mainly used to process batch experimental data: for example, material samples produced in different batches may have different properties due to batch‐to‐batch differences [265]. It is therefore not as versatile as k‐fold in materials science. Grid search is another important hyperparameter optimization technique widely used in ML. It systematically evaluates every combination in a predefined parameter space to find the settings that optimize model performance. Different models have different hyperparameters, and these have a significant impact on performance [266]. For example, in an SVM model, grid search can optimize the regularization parameter C and the kernel function parameters [267]; in a random forest, the number of trees (n_estimators), maximum depth (max_depth), and minimum number of samples per leaf (min_samples_leaf) [268]; and in neural networks, the learning rate, the number and size of hidden layers, and the activation functions [269]. Performance metrics are used to evaluate and compare model performance. Commonly used metrics include MSE, percentage of absolute difference (PAD%), accuracy, precision, recall, F1‐score, coefficient of determination (R2), Pearson coefficient (r), and area under the receiver operating characteristic curve (AUC‐ROC). These metrics are defined in Equations (11), (17)–(19) for regression and (15), (20)–(23) for classification.

$\mathrm{PAD}\% = \frac{\left|\mathrm{experimental}-\mathrm{predicted}\right|}{\mathrm{experimental}}\times 100$ (17)
$R^2 = 1-\frac{\sum_i\left(y_i^{\mathrm{true}}-y_i^{\mathrm{pred}}\right)^2}{\sum_i\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)^2}$ (18)
$r = \frac{\sum_i\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)\left(y_i^{\mathrm{pred}}-\bar{y}^{\mathrm{pred}}\right)}{\sqrt{\sum_i\left(y_i^{\mathrm{true}}-\bar{y}^{\mathrm{true}}\right)^2\sum_i\left(y_i^{\mathrm{pred}}-\bar{y}^{\mathrm{pred}}\right)^2}}$ (19)
$\mathrm{Accuracy} = \frac{TP+TN}{TP+FP+TN+FN}$ (20)
$\mathrm{Precision} = \frac{TP}{TP+FP}$ (21)
$\mathrm{Recall} = \frac{TP}{TP+FN}$ (22)
$F1 = \frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$ (23)

Where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives.
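These classification metrics follow directly from the confusion‐matrix counts; the NumPy sketch below (ours) implements Eqs. (20)–(23):

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Confusion-matrix counts and the metrics of Eqs. (20)-(23)
    for binary labels in {0, 1}."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. (20)
        "precision": precision,                         # Eq. (21)
        "recall": recall,                               # Eq. (22)
        "f1": 2 * precision * recall / (precision + recall),  # Eq. (23)
    }

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # one miss, one false alarm
m = classification_metrics(y_true, y_pred)
print(m["accuracy"], m["precision"], m["recall"])  # 0.75 0.75 0.75
```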

8. Model Interpretability

As ML models develop, their complexity also increases, and the ability to understand them has become an important part of materials science: it ensures that predictions are not only accurate but also physically meaningful. Interpretability connects data‐driven correlations with fundamental physical understanding, making it possible to uncover structure–property relationships that can guide rational material design. In this section, we summarize important interpretability frameworks, such as SHAP, symbolic regression, and attention mechanisms, which go beyond prediction to provide mechanistic and quantifiable insight into material behavior.

8.1. SHAP

Among interpretability techniques, SHAP has become one of the most powerful and widely used approaches for quantifying feature contributions in complex models. SHAP attributes a model's output to individual features by estimating their marginal contribution relative to all possible feature combinations. This additive explanation ensures both local consistency (for a single prediction) and global interpretability (across the dataset) [270]. In materials science, SHAP allows researchers to rank and visualize how various structural, compositional, and electronic descriptors affect a property of interest. For example, SHAP has shown that ionic radii, electronegativity differences, and tolerance factors have the strongest effect on the bandgap, formation energy, and thermodynamic stability of perovskites [123, 271]. Beyond feature ranking, SHAP can also capture nonlinear relationships among features through dependence plots and interaction values, which greatly reduces the loss of information caused by directly removing correlated features. Figure 11a shows how SHAP quantitatively attributes the model output to each feature by evaluating its marginal contribution across all possible feature combinations; the horizontal distribution of SHAP values indicates both the direction and magnitude of each feature's influence, with red and blue representing high and low feature values, respectively.
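To illustrate the principle behind SHAP, the sketch below (ours, not the SHAP library itself) computes exact Shapley values for a tiny two‐feature model by enumerating all coalitions, with absent features masked by a baseline value; practical SHAP implementations approximate this exponential sum for many features:

```python
import numpy as np
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley attribution for one prediction: the marginal
    contribution of each feature, averaged over all coalitions, with
    absent features replaced by a baseline value (a common masking
    approximation)."""
    n = len(x)
    phi = np.zeros(n)

    def f(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]
        return model(z)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (f(S + (i,)) - f(S))
    return phi

# Hypothetical model with a main effect per feature plus an interaction.
model = lambda z: 3.0 * z[0] - 2.0 * z[1] + z[0] * z[1]
x = np.array([1.0, 2.0])
base = np.zeros(2)
phi = shapley_values(model, x, base)
# Attributions sum to f(x) - f(baseline): the additivity SHAP relies on.
print(np.isclose(phi.sum(), model(x) - model(base)))  # True
```

The interaction term is split evenly between the two features, which is exactly how SHAP handles coupled descriptors instead of forcing one of them to be dropped.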

FIGURE 11.

FIGURE 11

Illustration of three representative interpretability techniques. (a) SHAP Beeswarm plot [270]. (b) Symbolic regression. (c) Attention mechanisms.

8.2. Symbolic Regression

Symbolic regression, on the other hand, aims to find models that are inherently easy to understand. Instead of explaining a trained model, symbolic regression searches during training for a clear, concise, human‐readable mathematical expression that fits the data. This expression (for example, y = a·x1 + b·x2² + ⋯ + c·xn⁴) is itself the model, thus providing global interpretability. Symbolic regression creates a white‐box model that lets users see exactly how the input features affect the output through their mathematical relationship. This is particularly useful in materials science and other scientific research areas that seek underlying mechanisms or physical laws; it is a method that ensures interpretability at the source of model construction. Figure 11b describes the process of symbolic regression, where the algorithm constructs an analytical equation tree that directly represents the mathematical relationship between features and output, thereby offering a transparent and physically meaningful model.
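A toy version of this search (ours; real symbolic‐regression engines use genetic programming over much larger expression spaces) enumerates a few basis functions and keeps the lowest‐error closed‐form expression:

```python
import numpy as np
from itertools import product

# Candidate basis functions for a toy expression search; real symbolic
# regression explores far richer equation trees.
basis = {
    "x": lambda v: v,
    "x^2": lambda v: v ** 2,
    "sqrt|x|": lambda v: np.sqrt(np.abs(v)),
}

def search_expression(x1, x2, y):
    """Try y ~ a*g1(x1) + b*g2(x2) for every basis pair, fitting (a, b)
    by least squares and keeping the lowest-error human-readable form."""
    best_err, best_expr = np.inf, ""
    for (n1, g1), (n2, g2) in product(basis.items(), repeat=2):
        A = np.column_stack([g1(x1), g2(x2)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.mean((A @ coef - y) ** 2))
        if err < best_err:
            best_err = err
            best_expr = f"{coef[0]:.2f}*{n1}(x1) + {coef[1]:.2f}*{n2}(x2)"
    return best_err, best_expr

rng = np.random.default_rng(2)
x1, x2 = rng.uniform(0, 3, 50), rng.uniform(0, 3, 50)
y = 2.0 * x1 + 0.5 * x2 ** 2            # hidden "physical law"
err, expr = search_expression(x1, x2, y)
print(expr)   # recovers the generating expression; err is ~0
```

The output is itself the model, a readable equation, which is the defining property of symbolic regression discussed above.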

8.3. Attention Mechanisms

Unlike SHAP, which explains predictions after a model is trained, or symbolic regression, which explicitly discovers mathematical relationships, attention mechanisms provide intrinsic interpretability within deep learning models [272, 273]. Attention does not analyse the model after training; instead, it dynamically learns the importance of each input for making a prediction during training. It does this by calculating a similarity score for each input or connection and converting the scores into attention weights using the Softmax function [274]. Softmax normalizes these scores so that they sum to 1, making explicit the relative importance of each input feature or atomic environment. The attention mechanism thus enables ANNs to focus on the most critical parts of the data while deemphasizing other information. Figure 11c shows the workflow of an attention mechanism, in which the model computes similarity scores between query and key vectors, applies a Softmax normalization to obtain attention weights, and aggregates the weighted values to form the attention output.
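The scaled dot‐product form of this mechanism can be sketched in NumPy (our illustration of the standard Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V formula):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Similarity scores between queries and keys become normalised
    weights (summing to 1 per query) over the values."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(3)
Q = rng.normal(size=(2, 4))   # 2 queries
K = rng.normal(size=(5, 4))   # 5 keys
V = rng.normal(size=(5, 4))   # 5 values
out, w = scaled_dot_product_attention(Q, K, V)
# Each query's attention weights are non-negative and sum to 1,
# which is what makes them directly readable as importances.
print(out.shape, np.allclose(w.sum(axis=1), 1.0))  # (2, 4) True
```

Inspecting `w` row by row gives exactly the kind of per‐input importance map referred to in Figure 11c.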

9. Applications of ML in Perovskite and Perovskite Inspired Materials

ML has significantly impacted materials science, especially the discovery of new materials. One of the earliest notable applications of ML in this field dates to the 1960s with the DENDRAL project, among the first computer programs designed for chemical synthesis prediction [275]. In recent years, ML has also contributed significantly to exploring and optimizing energy materials, particularly in developing high‐efficiency halide perovskite solar cells [276]. The achievements in this area primarily focus on using ML algorithms to predict material properties, including bandgap, formation energy, thermodynamic stability, crystal volume, and defect energy levels, that are crucial for enhancing solar cell performance [215, 277]. This section highlights successful ML applications in exploring various perovskite materials and PIMs and their properties.

9.1. Inorganic Perovskites

The earliest research on perovskites using ML dates back to 2007, when Javed et al. [278] reported lattice constant prediction of orthorhombic ABO3 perovskites. The SVR algorithm was used, and the final model could also predict the lattice constants of other structurally known perovskites. As a regression task, five features served as input patterns: the ionic radii of cations A (rA) and B (rB), the electronegativities of cations A (xA) and B (xB), and the valence of cation A (zA), while the target was the experimental lattice constant a, b, or c. In the model‐training stage, the dataset was first separated into a training set and a testing set; the training set was then further divided into a validation set and a training set via four‐fold cross‐validation [279]. An ANN was also trained, and its time cost and accuracy were compared against the SVR. In terms of time cost, the smallest gap was observed in the test group for the b lattice constant (a roughly twofold difference), whereas the largest gap appeared in the c lattice constant training group (nearly a 100‐fold difference). For prediction performance, the percentage absolute difference (PAD) was used: across all three lattice constants, the average PAD of the SVR remained under 0.7%, significantly outperforming the ANN's predictions. The complete workflow of this work is shown in Figure 12a, and the model performance comparison in Figure 12b,c. Although this work dates back many years, its ML algorithm, data‐splitting method, and overall process are still widely used today.

FIGURE 12.

FIGURE 12

Lattice constant prediction (a–c) reproduced with permission [278]. Copyright 2007, Elsevier, and narrow‐bandgap prediction of inorganic perovskites (d–f) reproduced with permission [283]. Copyright 2024, Springer Nature. (a) The complete workflow of the SVR model. (b) Average PAD performance of ANN and SVR models on training set. (c) Average PAD performance of ANN and SVR model on testing set. (d) The ML workflow for narrow‐bandgap prediction. (e) Accuracy comparison of four algorithms on training and testing set. (f) AUC‐ROC for four algorithms on the testing set.

In recent years, numerous studies have combined ML with research on inorganic perovskites, employing various algorithms to target different properties such as ionic conductivity [280], dielectric breakdown strength [281], and maximum magnetic entropy change [282]. Most ML work on perovskites, however, still centers on bandgap prediction. In 2024, Li et al. [283] proposed a method using the XGBoost algorithm to predict the narrow‐bandgap range of inorganic halide perovskites. Data for 447 ABX3‐type inorganic halide perovskites were collected from the Materials Project database, covering bandgaps from 0 to 6 eV; samples with bandgaps between 0 and 3 eV were coded as 1, samples from 3 to 6 eV were coded as 0, and the dataset was divided into training and test sets at a ratio of 4:1. In the feature engineering stage, additional features were generated using the Matminer tool. The researchers then standardized the data to eliminate unit differences between features, and identified and removed highly correlated features by calculating the Pearson correlation coefficient [284], ultimately retaining 116 valid features for model training. The team selected four classification algorithms, XGBoost, Support Vector Classifier (SVC), Multilayer Perceptron (MLP), and RF, and optimized their hyperparameters through ten‐fold cross‐validation and RandomizedSearchCV to improve the prediction accuracy and generalization ability of the model. Model performance was evaluated through indicators such as AUC‐ROC, accuracy, precision, recall, and F1‐score. The results showed that the XGBoost model performed best on the test set, with an accuracy of 95%. To further interpret the model output, the researchers used SHAP [285] and identified the electronegativity range as an important factor affecting the bandgap: the larger the electronegativity range, the higher the probability that the perovskite has a narrow bandgap.
Compared with earlier ML work, many stages have been optimized, such as the application of Matminer for feature engineering, cross‐validation with more folds, hyperparameter optimization using RandomizedSearchCV (random sampling of parameter combinations, which is more efficient than grid search when the parameter space is large and the optimal parameter range is uncertain), and a more comprehensive evaluation of the ML model. The complete workflow is shown in Figure 12d, and the performance of XGBoost compared with the other models in Figure 12e,f.
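The Pearson‐correlation feature filter used in this workflow can be sketched as follows (our toy example with hypothetical descriptor names, not the authors' code):

```python
import numpy as np

def drop_correlated_features(X, names, threshold=0.95):
    """Greedy Pearson-correlation filter: scan features in order and
    drop any feature whose |r| with an already-kept feature exceeds
    the threshold -- a common pre-training step."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for i in range(X.shape[1]):
        if all(corr[i, j] < threshold for j in keep):
            keep.append(i)
    return X[:, keep], [names[i] for i in keep]

rng = np.random.default_rng(4)
a = rng.normal(size=200)
b = a + 0.01 * rng.normal(size=200)      # nearly a duplicate of a
c = rng.normal(size=200)                 # independent descriptor
X = np.column_stack([a, b, c])
# Descriptor names are illustrative placeholders only.
X_kept, kept_names = drop_correlated_features(X, ["r_A", "r_A_dup", "chi_B"])
print(kept_names)  # ['r_A', 'chi_B']
```

In published workflows this step is often done with pandas correlation matrices; the greedy keep‐first rule above is one simple convention among several.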

9.2. HOIPs

One classic ML work on HOIPs was completed by Lu et al. in 2018 [286]. The dataset was collected from previous high‐throughput first‐principles calculations and contained 346 HOIP samples. From these, 212 HOIP compounds with orthorhombic crystal structures and bandgaps calculated using the PBE functional were selected and divided into a training dataset (80%) and a test dataset (20%). Feature engineering involved selecting and evaluating 30 initial features (such as ionic radii, tolerance factor, and electronegativity) to describe HOIPs in chemical space. The GBR algorithm was used for feature selection, employing a "last‐place elimination" procedure to exclude features with little impact on the bandgap and ultimately retaining the 14 most important features. In the training and evaluation stage, six algorithms were compared under five‐fold cross‐validation, GBR, SVR, KRR, GP, DT, and Multilayer Perceptron Regression (MLPR), with performance evaluated using R2, r, and MSE. Among these, GBR was found to be the best‐performing model. Model validation involved predicting the bandgaps of all 5504 possible HOIP structures using the trained ML model, initially screening 1669 HOIPs with appropriate bandgaps and further narrowing down to 218 HOIPs suitable for PV applications. Further DFT calculations were conducted to validate the electronic structure, thermal stability, and environmental stability of the selected HOIPs. As shown in Figure 13a, the workflow involves feature engineering, statistical inference, and ML‐based screening to predict potential HOIP candidates, followed by DFT‐based validation for global optimization, bandgap calculation, and stability assessment. The predicted bandgaps of six newly identified lead‐free HOIPs are illustrated in Figure 13b, which compares ML predictions with PBE‐calculated values.
These materials exhibit suitable bandgaps for solar cells, making them promising candidates. Additionally, the crystal structures of two screened‐out HOIPs are visualized in Figure 13c, highlighting their octahedral coordination and molecular configurations.

FIGURE 13.

FIGURE 13

Bandgap prediction of HOIPs (a–c) reproduced under terms of the CC‐BY license [286]. Copyright 2018, The Authors, published by Nature Publishing Group UK, (d–f) reproduced with permission [18]. Copyright 2019, Elsevier. (a) The ML framework designed by Lu et al. (b) ML model accuracy comparison with DFT for screened six new HOIPs candidates with suitable bandgap. (c) The final two candidates have direct bandgap and excellent environmental stability. (d) The ML framework designed by Wu et al. (e) Fitting result of DFT bandgap and GBR predicted bandgap. (f) Four representatives newly discovered HOIPs.

In 2019, Wu et al. [18] used the same target‐driven method as Lu et al. [286] and successfully selected 132 nontoxic and stable candidates from 38 230 potential candidates. Notably, unlike the traditional 4:1 train‐test split, Wu et al. used a 90% training and 10% testing split. This split allows the model to better capture the complex features of the data, thereby improving its predictive capability; having more training data also makes the model more stable and less susceptible to individual outliers, enhancing its robustness. The potential generalization issues caused by the smaller 10% testing set were addressed through ten‐fold cross‐validation. This approach provides a potential solution for future applications of ML in new‐material discovery, especially when datasets are limited. Figure 13d illustrates the screening process, where charge neutrality and stability filtering reduce the initial dataset to 38 086 candidates. ML models (GBR, SVR, KRR) further refine the selection before DFT validation. Figure 13e shows a strong correlation (R2 = 0.827) between ML‐predicted and DFT‐calculated bandgaps, confirming model accuracy. Figure 13f presents the crystal structures of selected non‐toxic HOIPs with suitable bandgaps for solar cell applications.

9.3. Double Perovskites

In 2022, Liu et al. [287] collected bandgap data for 236 double perovskites from approximately 60 peer‐reviewed publications, including 62 A‐site doped, 110 B‐site doped, 43 A‐ and B‐site co‐doped, and 21 pure perovskites, and used 80% of the dataset for training and 20% for testing. The feature engineering stage selected 42 initial features related to the electronic structure, the relative position of elements in the periodic table, and their physical properties. Feature importance was evaluated by calculating the Pearson correlation coefficient between each feature and the bandgap value, and features were weight‐averaged according to the A and B sites to ensure uniformity. The bandgap of double perovskite oxides was predicted using RFR; feature selection was done using Recursive Feature Elimination (RFE) and Univariate Feature Selection (UFS), and the optimal feature combination was used to develop the model. Performance was assessed using RMSE and R2: the final model achieved an R2 of 0.932 and an RMSE of 0.196 eV on the test set, showing high prediction accuracy. The team then manually generated an A′1−xA′′xB′1−yB′′yO3 candidate dataset, obtained 4 058 905 stable double perovskite oxide materials by screening for stability and electroneutrality, applied the trained model to predict their bandgaps, and screened out 75 723 materials with bandgaps between 1.1 and 1.7 eV suitable for PV applications. Finally, the model was verified on a validation set independent of the training set; the predicted bandgap values agreed well with the experimental values, verifying the high accuracy and reliability of the model.
Figure 14a outlines the ML workflow, from dataset preparation and feature selection to stability screening and bandgap prediction, identifying 75 723 promising candidates. Figure 14b demonstrates the effects of different feature sets and feature selection methods on the prediction performance (R2 and RMSE) of the random forest regression model. Figure 14c highlights the feature importance and test set prediction performance metrics for the best random forest model (M1), identifying rspA, nveB, and vβ as key influential features. Figure 14d presents the bandgap prediction distribution of 9576 BiFeO3‐based double perovskites using model M1, with color intensity indicating the electronegativity of the doped B″‐site element; the inset specifically illustrates the distribution of doped elements among the 236 best candidate materials with bandgaps between 1.46 and 1.7 eV, highlighting Rh, Pd, and Ir as impactful dopants. In 2023, Talapatra et al. [288] constructed a multi‐model framework for discovering ideal double perovskites. They compiled a dataset of bandgap values for over 5000 materials sourced from the Materials Project database, calculating their bandgaps using DFT. The materials were categorized as either narrow bandgap (<0.5 eV) or wide bandgap (≥0.5 eV). Corresponding descriptors were generated from the dataset, which, combined with the bandgap data, facilitated the construction of two distinct ML models: a binary classification model distinguishing narrow from wide bandgap materials, and a regression model predicting specific bandgap values for materials identified as wide‐bandgap candidates. Hyperparameter tuning was performed using ten‐fold cross‐validation and RandomizedSearchCV, significantly enhancing the predictive accuracy and generalization of the ML models. Model performance metrics included AUC‐ROC, accuracy, precision, recall, and F1‐score, with the XGBoost algorithm achieving optimal performance and an accuracy of 95% on the test dataset.
The optimized models were subsequently applied to an extensive chemical space comprising 68 elements that could potentially form thermodynamically stable double perovskites. This process systematically screened for wide‐bandgap candidate materials, resulting in the identification of 13 589 cubic oxide perovskite compounds. Among these, 310 were predicted with more than 90% confidence in their stability and formability, marking them as promising candidates for further investigation. To provide deeper insights into the ML models' predictions, SHAP analysis was employed. This analysis highlighted the electronegativity range as a critical feature influencing bandgap predictions, revealing that a larger electronegativity range correlates with an increased likelihood of narrow‐bandgap perovskites. Notably, this methodology mirrors the approach previously employed by Li et al. [283], confirming the robustness and replicability of this ML workflow across different inorganic perovskites or PIMs. Additionally, the re‐verification of electronegativity as a critical determinant underscores its broader relevance in predicting perovskite bandgaps. Figure 14e summarizes the ML model development process, outlining steps from feature selection and model validation to the identification of high‐confidence oxide perovskite candidates. Figure 14f details the feature importance analysis, emphasizing key electronic and structural parameters influencing the bandgap classification, alongside the confusion matrix assessing classification performance. Figure 14g presents a comparative analysis of predicted versus DFT‐calculated bandgap values, demonstrating a strong correlation (R2 = 0.84) and low MAE = 0.21 eV, reinforcing the reliability and accuracy of the ML model predictions.

FIGURE 14.

FIGURE 14

Bandgap prediction of doped double perovskites (a–d) reproduced with permission [287]. Copyright 2022, Elsevier, and application of multi‐model for both classification and prediction of double perovskites (e–g) reproduced under terms of the CC‐BY license [288]. Copyright 2023, The Authors, published by Springer Nature. (a) ML framework designed by Liu et al. for predicting the bandgap of doped double perovskites. (b) Effects of different feature sets and feature selection methods on the prediction performance (R2 and RMSE) of the random forest regression model. (c) Feature importance and test set prediction performance metrics for the best RF model (M1). (d) Bandgap prediction distribution of 9576 BiFeO3 based double perovskites using model M1, where the color indicates the electronegativity of the doped B″‐site element (the inset shows the specific distribution of doped elements in the 236 best candidate materials with bandgaps between 1.46–1.7 eV). (e) ML framework designed by Talapatra et al. (f) Feature importance analysis. (g) Comparison of actual and calculated bandgap predictions.

9.4. Chalcogenide Perovskites

In 2023, Sharma et al. [289] used ML to identify that Ca doping at the Ba site is superior to Ti doping at the Zr site. The goal of this research was to identify the optimal dopant to shift the bandgap of BaZrS3 into the optimal range for high‐efficiency PV devices. Since BaZrS3 has a direct bandgap of 1.7–1.8 eV, which exceeds the optimal value for single‐junction solar cells (about 1.3 eV), doping is required to reduce the bandgap. Through DFT calculations, a database was created containing 35 different dopants at doping concentrations of 8.33%, 12.5%, and 25%, and the defect formation energies of the different dopants were calculated to evaluate the stability of the doped structures. In the feature engineering stage, important chemical descriptors were selected from the Mendeleev database, such as atomic radius, heat of formation, density, electron affinity, Pauling electronegativity, Glawe number, covalent radius, and dipole polarizability; these descriptors represent various chemical and physical properties of the elements. To apply these chemical descriptors to the doped BaZrS3 structures, a weighted average of the descriptors of each element was taken: each descriptor is multiplied by the number of atoms of the corresponding element, and the sum is divided by the total number of atoms. For example, for BaZrS3 doped with calcium or titanium, the weighted average of a chemical descriptor is calculated as follows:

$\eta_{\mathrm{weighted}} = \frac{\sum_i x_i\,\eta_i}{N}$ (24)

Where xi is the number of atoms of element i, ηi is the chemical descriptor of element i, and N is the total number of atoms. RFR and a Crystal Graph Convolutional Neural Network (CGCNN) were used: RFR predicted the bandgap and formation energy and ranked feature importance via the Gini index, while CGCNN, a graph‐based deep learning method, encoded the atomic attributes and relationships in the crystal structure. The dataset was divided into a training set (75%) and a test set (25%), and model performance was evaluated by mean absolute error (MAE) and R2. For bandgap prediction, the MAE was 0.14 eV, with an R2 of 0.964 on the training set and 0.762 on the test set; for formation energy prediction, the MAE was 0.02 eV/atom, with an R2 of 0.971 on the training set and 0.797 on the test set. Ca at the A site (Ba) and Ti at the B site (Zr) were identified as the best dopants by filtering on bandgaps within the Shockley‐Queisser range (1–1.5 eV) and structural stability. Experimental validation confirmed the accuracy of the prediction: Ca‐doped BaZrS3 thin films were synthesized by chemical vapor deposition, the success of doping and the structural integrity were confirmed by X‐ray diffraction, scanning electron microscopy, and X‐ray photoelectron spectroscopy, and photoluminescence and bandgap measurements were performed. Experimental results showed that Ca doping significantly reduced the bandgap of BaZrS3 from about 1.75 to about 1.26 eV at a doping concentration below 2 at%. In terms of bandgap tuning, Ca doping at the Ba site outperformed Ti doping at the Zr site. This study demonstrates the great potential of ML in accelerating PV material discovery and optimization.
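Equation (24) translates directly into code; the composition below is a hypothetical Ca‐doped BaZrS3 supercell fragment of our choosing, and the electronegativity values are approximate Pauling values:

```python
def weighted_descriptor(counts, descriptors):
    """Composition-weighted average of an elemental descriptor, Eq. (24):
    eta_weighted = sum_i(x_i * eta_i) / N, with N the total atom count."""
    total_atoms = sum(counts.values())
    return sum(counts[el] * descriptors[el] for el in counts) / total_atoms

# Approximate Pauling electronegativities (illustrative values).
electronegativity = {"Ba": 0.89, "Ca": 1.00, "Zr": 1.33, "S": 2.58}
# Hypothetical supercell with one of four Ba atoms replaced by Ca (25% A-site doping).
counts = {"Ba": 3, "Ca": 1, "Zr": 4, "S": 12}
eta = weighted_descriptor(counts, electronegativity)
print(f"{eta:.4f}")  # 1.9975
```

The same helper works for any tabulated descriptor (heat of formation, covalent radius, and so on), which is how a single doped structure is reduced to a fixed‐length feature vector.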
Figure 15a illustrates the effect of A‐site (Ba) Ca doping and B‐site (Zr) Ti doping on the bandgap of BaZrS3, reducing it from 1.7 to 1.26 eV and 1.4 eV, respectively. Figure 15b shows the ML‐predicted vs. HSE‐shifted bandgaps, with Ca and Ti doping highlighted near the Shockley‐Queisser limit. Figure 15c presents feature importance analysis, where heat of formation and electronegativity (Pauling scale) are the most influential descriptors in bandgap prediction.

FIGURE 15.

FIGURE 15

Applications of ML on chalcogenide perovskites (a–c) reproduced with permission [289]. Copyright 2023, American Chemical Society, (d–g) reproduced with permission [290]. Copyright 2019, John Wiley and Sons. (a) Schematic diagram of doping of BaZrS3 in Sharma et al.’s work. (b) Comparison of predicted bandgap and HSE shifted bandgap. (c) Feature importance analysis chart. (d) Workflow of ML designed by Agiorgousis et al. (e) Pearson correlation matrix among 12 input features and bandgap. (f) and (g) Two ideal double chalcogenide perovskites for potential PV application.

Moreover, Agiorgousis et al. [290] applied ML to the discovery of double chalcogenide perovskites for PV applications. In their work, DFT was first used to calculate the structural and electronic properties of 220 initial compounds, ensuring that these compounds had stable oxidation states. In the feature-engineering stage, atomic properties from the periodic table, including ionization potential, Pauling electronegativity, and atomic radius, were used as input features. The work involved two model-selection stages. In the first stage, SVM, RFR, and KRR were used to directly predict the bandgap; each model was trained and tested over 200 runs with an 8:2 train/test split of the input dataset, and the best hyperparameters of each model were selected through five-fold cross-validation. However, the performance of each model varied considerably across different train/test splits: the test errors (± standard deviation) were 0.457 ± 0.280 eV for SVM, 0.514 ± 0.294 eV for KRR, and 0.466 ± 0.226 eV for RFR. Since these regression predictions were not sufficiently stable, classification methods were applied instead. Owing to its poor regression performance, KRR was dropped, and only SVM and RF were retained for classification. Materials were classified into potential PV absorbers (bandgap between 0.7 and 2.0 eV) and non-absorbers (bandgap outside this range); the Random Forest Classifier (RFC) achieved an average accuracy of 86.4% under five-fold cross-validation, outperforming SVM. The trained SVM model was then applied to the entire compound space to screen out more than 450 materials with potential PV properties, which were further evaluated in detail for optical absorption, thermodynamic stability, and kinetic stability. By calculating the dielectric constants, formation energies, and related quantities, Ba2AlNbS6, Ba2GaNbS6, Ca2GaNbS6, Sr2InNbS6, and Ba2SnHfS6 were identified as having optimal bandgaps and significant optical absorption while being both thermodynamically and kinetically stable. Finally, band-structure calculations showed that these materials have nearly degenerate indirect and direct bandgaps as well as low carrier effective masses, making them suitable as efficient PV absorbers. In particular, Ba2AlNbS6 and Ba2GaNbS6 perform best in terms of bandgap and absorption and are the most promising solar absorber materials.

Several points in this work are worth discussing. For example, when a regression task cannot achieve the expected results, a classification task can serve as an alternative. Moreover, the researchers found that removing the low-correlation features associated with the A-site cation improved model performance: the test errors of RFC and SVM dropped to 13.28% and 16.70%, respectively, compared with 13.80% and 31.63% when the A-site cation features were included. Notably, this outcome differs from Li et al.'s work discussed in the inorganic perovskite section, where removing highly correlated features optimized performance, whereas here removing low-correlation features did so. The main reason lies in the differences in dataset size and number of features: when processing a large feature set, highly correlated (redundant) features should be removed, whereas with only 12 features, as in this work, removing low-correlation features reduces noise and dimensionality [291, 292]. It also remains unclear why the A-site cation has a relatively small effect on the bandgap in this work (variations of only 0.05–0.3 eV), which contrasts with previous reports that A-site electronegativity can effectively tune the bandgap; whether this difference stems from the double chalcogenide perovskite structure is still unknown. Figure 15d presents the DT and RF models utilizing ionization potential, electronegativity, and atomic radius to classify potential absorbers based on bandgap range. Figure 15e shows the Pearson correlation matrix, indicating the relationships between the bandgap and elemental descriptors. Figure 15f,g displays the electronic band structures of Ba2GaNbS6 and Ba2AlNbS6, respectively, highlighting their band dispersion and potential suitability as absorbers.
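The regression-to-classification fallback described above can be sketched in a few lines. This is only an illustration, not the authors' code: the three descriptors and the "DFT" bandgaps below are synthetic placeholders, and the bandgap window and five-fold cross-validation follow the recipe in the text.

```python
# Sketch: when bandgap regression is too unstable, recast the task as
# classifying materials into PV-suitable vs. not (bandgap in 0.7-2.0 eV),
# scored by five-fold cross-validation. All data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical elemental descriptors: ionization potential,
# Pauling electronegativity, atomic radius (normalized).
X = rng.uniform(size=(200, 3))
# Toy stand-in for a DFT-computed bandgap.
band_gap = 3.0 * X[:, 1] - 0.5 * X[:, 2] + 0.1 * rng.normal(size=200)
y = ((band_gap >= 0.7) & (band_gap <= 2.0)).astype(int)  # 1 = potential PV absorber

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")  # five-fold CV
print(f"mean CV accuracy: {scores.mean():.3f}")
```

The same pattern (swap the estimator, keep the CV loop) covers the SVM comparison reported in the paper.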

9.5. Chalcohalide Materials

In 2022, Ming et al. [293] utilized DFT computational screening coupled with experimental synthesis to identify stable, lead-free, defect-tolerant chalcohalide materials. Their investigation pinpointed CuBiSCl2 as a promising candidate with an optimal bandgap of approximately 1.37 eV, validated by comprehensive electronic-structure and defect analyses. Although their work did not incorporate ML, their detailed computational screening workflow provides valuable insights applicable to future ML-driven studies. Specifically, they noted that because PBE calculations are well known to underestimate bandgaps, it is advisable to broaden the initial bandgap screening range to 0.5–1.6 eV rather than the practically ideal 1.1–1.6 eV. Notably, their search space from the Materials Project, which included cations (In, Sn, Sb, Bi) with lone pairs and large atomic numbers and anions comprising chalcogens (S, Se, Te) and halogens (Cl, Br, I), yielded only 193 candidates, highlighting a significant data shortage compared with perovskites and the pressing need for more extensive computational exploration of PIMs. Figure 16a illustrates the workflow and the structure of the identified compound.
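The effect of broadening the PBE screening window can be illustrated with a toy filter. The candidate names and PBE gap values below are invented for illustration; only the two window bounds come from the text.

```python
# Toy illustration of the screening heuristic: because PBE underestimates
# bandgaps, a strict 1.1-1.6 eV window on PBE values can miss good absorbers,
# so the initial window is broadened to 0.5-1.6 eV before HSE refinement.
# The PBE gaps below are invented placeholders.
pbe_gaps = {"candidate_A": 0.72, "candidate_B": 1.25, "candidate_C": 2.10}

def in_window(gaps, lo, hi):
    """Return candidates whose PBE gap falls inside [lo, hi], sorted by name."""
    return sorted(name for name, gap in gaps.items() if lo <= gap <= hi)

strict = in_window(pbe_gaps, 1.1, 1.6)  # keeps only candidate_B
broad = in_window(pbe_gaps, 0.5, 1.6)   # also keeps candidate_A for HSE re-checking
print(strict, broad)
```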

FIGURE 16.

Applications of DFT and ML on chalcohalide materials. (a) High‐throughput screening refined by hybrid functional (HSE+SOC) calculations reproduced with permission [293]. Copyright 2022 John Wiley and Sons. (b) Computer‐aided‐design workflow for chalcohalide structure identification reproduced under terms of the CC‐BY license [294]. Copyright 2018, The Authors, published by Royal Society of Chemistry. (c) ML framework followed by multiple screening criteria including stability, electronic properties, and optical properties reproduced with permission [295]. Copyright 2019, American Chemical Society. (d) CNNs for predicting stability, bandgaps, and optical absorption coefficients of pnictogen chalcohalides reproduced under terms of the CC‐BY license [296]. Copyright 2024, The Authors, published by John Wiley and Sons.

In 2018, Davies et al. [294] proposed a method combining high-throughput screening with structure-prediction ML algorithms to search for new chalcohalide materials for PV applications; the complete workflow and results are shown in Figure 16b. First, the SMACT library and the solid-state energy (SSE) scale were used for preliminary compositional screening of 32 million compounds, ensuring that candidates met charge-neutrality and electronegativity-balance requirements and narrowing the pool to about 161,000 compositions. The SSE scale was then used to select compounds with a suitable bandgap range (1.5–2.5 eV), further narrowing the candidate set. Next, Pymatgen's structure-substitution algorithm was used to predict the crystal structures of the candidates by analogy with known structures, providing preliminary structural models. The USPEX evolutionary algorithm was then used to perform a global structure search for the lowest-energy crystal structures; this stage overcomes the limitations of analogy-based prediction and can identify new structure types. Through DFT calculations, the researchers evaluated the thermodynamic and kinetic stability of these compounds, determining their energies relative to phase transitions and decomposition. Four previously unreported metal sulfide-halide compounds (Sn5S4Cl2, Sn4SF6, Cd5S4Cl2, and Cd4SF6) were identified. Despite being slightly metastable at 0 K (within 100 meV/atom of the convex hull), these materials appear synthetically feasible.
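The first compositional screen above can be illustrated with a hand-rolled check; the real workflow uses the SMACT library, whereas here the oxidation states and Pauling electronegativities are hard-coded for a few example species, and the "anion reference" heuristic is our simplification.

```python
# Minimal hand-rolled version of the charge-neutrality + electronegativity-
# balance screen: a composition passes if some assignment of known oxidation
# states sums to zero and every cation is less electronegative than the most
# electronegative element present. Not the SMACT API; an illustration only.
from itertools import product

ELECTRONEG = {"Sn": 1.96, "Cd": 1.69, "S": 2.58, "Cl": 3.16, "F": 3.98}
OX_STATES = {"Sn": [2, 4], "Cd": [2], "S": [-2], "Cl": [-1], "F": [-1]}

def passes_screen(elements, counts):
    """Return True if the composition can be charge-neutral and EN-balanced."""
    anion_en = max(ELECTRONEG[e] for e in elements)
    for states in product(*(OX_STATES[e] for e in elements)):
        if sum(q * n for q, n in zip(states, counts)) != 0:
            continue  # not charge-neutral under this assignment
        # every cation (positive state) must be less electronegative than anions
        if all(ELECTRONEG[e] < anion_en for e, q in zip(elements, states) if q > 0):
            return True
    return False

print(passes_screen(("Sn", "S", "Cl"), (5, 4, 2)))  # Sn5S4Cl2 -> True (Sn2+)
```

Scanning millions of element/stoichiometry combinations with such a filter is what reduces the 32 million starting compositions to a tractable candidate pool.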

In 2019, Ma et al. [295] combined DFT and ML to accelerate the discovery of 2D chalcohalide materials with excellent optoelectronic properties, as shown in Figure 16c. First, DFT was used to calculate the geometric and electronic properties of 300 2D chalcohalides to build the dataset. Four algorithms were trained and evaluated: SVR, RFR, Bagging, and GBR. The GBR model performed best under ten-fold cross-validation, with the lowest mean squared error (MSE = 0.086) and the largest coefficient of determination (R2 = 0.835). This GBR model was then used to predict the electronic properties of 5000 potential 2D chalcohalides, from which 411 compounds were screened based on criteria including bandgap, toxicity, cost, and kinetic stability. High-cost materials were then excluded from these 411, leaving 73 candidates. These materials were verified by detailed DFT calculations, including evaluation of kinetic stability, carrier effective mass, and carrier mobility. Among the 73 materials, six stable compounds with appropriate bandgaps and high carrier mobility were identified. In particular, Bi2Se2Br2, Bi2Se2BrI, and Bi2Se2I2 showed excellent light absorption and optoelectronic properties, making them well suited for PV applications. This study also introduced the distorted stacking octahedron factor (DSOf) as an improved structural descriptor, which significantly improved the prediction accuracy of the ML models. This approach not only accelerates material discovery but also provides a general framework for exploring other complex material systems.

López et al. [296] employed an integrated bottom-up approach combining ML models with DFT calculations to predict and optimize the properties of pnictogen chalcohalides for PV applications, as shown in Figure 16d. They constructed a comprehensive dataset by performing DFT calculations on 125 compositions of pnictogen chalcohalides with the general formula ABC (where A = Bi, Sb; B = S, Se; and C = I, Br), focusing on key properties such as thermodynamic stability, energy bandgaps, and optical absorption coefficients. CNNs were then used to predict these properties across a much larger compositional space, generating predictions for 9,261 possible candidates. Feature engineering used the stoichiometry of the materials as the primary input, and model performance was evaluated by MAE under 20-fold cross-validation, ensuring robust and reliable results. Although RF models were also tested, they were less accurate than the CNNs and thus served mainly as a benchmark. After obtaining the CNN predictions, the authors filtered the results for thermodynamically stable compositions, using a formation-enthalpy threshold of 0.1 eV per atom, below which materials were considered stable against phase segregation. This filtering narrowed the candidates to those with the most promising properties for PV applications and successfully identified Bi0.3Sb0.7SeI as an ideal absorber layer.
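The final filtering step in such workflows reduces to a simple threshold pass over the ML predictions. In this sketch, the 0.1 eV/atom stability threshold follows the text, while the candidate list, its predicted hull energies and bandgaps, and the 1.1–1.6 eV target window are invented placeholders.

```python
# Sketch of post-prediction screening: keep only candidates that are both
# predicted stable (energy above hull <= 0.1 eV/atom) and inside a target
# bandgap window. Candidate values below are illustrative, not from the paper.
candidates = [
    # (composition, predicted E_above_hull (eV/atom), predicted bandgap (eV))
    ("Bi0.3Sb0.7SeI", 0.02, 1.35),
    ("BiSBr",         0.18, 1.90),  # fails stability threshold
    ("SbSeI",         0.05, 2.40),  # fails bandgap window
]

def screen(cands, e_hull_max=0.1, gap_window=(1.1, 1.6)):
    """Return compositions passing both the stability and bandgap criteria."""
    lo, hi = gap_window
    return [name for name, e_hull, gap in cands
            if e_hull <= e_hull_max and lo <= gap <= hi]

print(screen(candidates))  # -> ['Bi0.3Sb0.7SeI']
```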

10. Summary of Applications

Here, we summarize the applications of ML to different perovskite and PIM systems, as shown in Table 2. The table systematically lists the ML applications discussed in the previous sections, including the prediction targets, the algorithms used, and their evaluation results.

TABLE 2.

Summary of ML applications to different materials, with the algorithms used and evaluation metrics.

| Material type | Target | Algorithms | Evaluation |
| --- | --- | --- | --- |
| Inorganic perovskites | Lattice constant prediction | SVM, ANN | SVM: PAD < 0.7% on training data, < 1% on testing data |
| Inorganic halide perovskites | Bandgap classification | XGBoost, SVC, MLP, RF | XGBoost: precision 95%, recall 78%, F1 score 0.86, AUC 0.89 |
| HOIPs | Bandgap prediction | GBR, SVR, KRR, GPR, DTR, MLPR | GBR: R2 = 0.97, Pearson's r = 0.985, MSE = 0.085 |
| HOIPs | Bandgap prediction | GBR, SVR, KRR | GBR: R2 = 0.827, MAE = 0.377, MSE = 0.201 |
| Double perovskites | Bandgap prediction | RFR | R2 = 0.932, RMSE = 0.196 eV |
| Double perovskites | Bandgap classification and regression | RFR, RFC | Classification AUC = 0.98; regression MAE = 0.18 eV, R2 = 0.86 |
| Chalcogenide perovskites | Bandgap and formation energy prediction | RFR | MAE = 0.14 eV (bandgap), MAE = 0.02 eV (formation energy) |
| Chalcogenide double perovskites | Bandgap prediction | SVM, RFR, KRR (regression); SVM, RFC (classification) | Regression: SVM MAE = 0.457 eV, RFR MAE = 0.466 eV; classification: RFC accuracy = 86.4% |
| Chalcohalides | Bandgap and stability prediction | Pymatgen (structure prediction), USPEX (global structure search), DFT | Energy deviation < 100 meV/atom; bandgap 0.9–2.75 eV |
| 2D octahedral chalcohalides | Bandgap and carrier mobility prediction | GBR, SVR, RFR, Bagging | GBR: R2 = 0.835, MSE = 0.086 |
| Pnictogen chalcohalides | Bandgap, stability, and optical properties | CNN, RF | MAE for energy and optical-property predictions; convex-hull stability analysis |

11. Conclusions and Outlook

In conclusion, integrating machine learning into the design, discovery, and optimization of non-toxic and stable perovskites and perovskite-inspired materials, along with other associated material systems, represents a transformative step toward next-generation solar absorber materials. Rapid advances in computational power and the growing availability of materials data have enabled ML techniques to address persistent challenges in toxicity, stability, and scalability more effectively than traditional trial-and-error approaches.

In this review, we presented a comprehensive analysis of ML applications in perovskites and PIMs, highlighting shared chemical and structural characteristics, while emphasizing the major challenges, particularly the scarcity of high‐quality datasets for emerging PIMs. To address this, we outlined two complementary strategies: multi‐fidelity learning, which integrates datasets of varying accuracy and cost to improve robustness; and active learning, which accelerates data generation by prioritizing the most informative samples.
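The active-learning strategy mentioned above can be sketched as a query loop; this is our own minimal illustration (synthetic bandgap function, RF ensemble disagreement as the uncertainty measure), not a specific paper's implementation.

```python
# Minimal active-learning sketch: at each step, "label" (i.e., pretend to run
# DFT on) the candidate where the random-forest ensemble's per-tree predictions
# disagree most, prioritizing the most informative calculation.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X_pool = rng.uniform(size=(300, 3))  # hypothetical compositional descriptors

def true_gap(X):
    """Synthetic stand-in for an expensive DFT bandgap calculation."""
    return 2.5 * X[:, 0] - X[:, 1] + 0.3 * np.sin(6 * X[:, 2])

labeled = list(range(10))  # a small seed set of "already computed" samples
for _ in range(20):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_pool[labeled], true_gap(X_pool[labeled]))
    per_tree = np.stack([tree.predict(X_pool) for tree in model.estimators_])
    uncertainty = per_tree.std(axis=0)     # ensemble disagreement per candidate
    uncertainty[labeled] = -1.0            # never re-query labeled points
    labeled.append(int(uncertainty.argmax()))  # "run DFT" on the most uncertain

print("labeled set size:", len(labeled))
```

Each loop iteration spends the labeling budget where the model is least certain, which is what makes data generation for data-poor PIMs more efficient than random sampling.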

One promising strategy for advancing PIMs is the transfer of established ML workflows from halide perovskites. We demonstrated how transferable descriptors, particularly electron affinity, a strong predictor of electronic structure and stability, can enable accurate prediction in PIMs with minimal model retraining. We also introduced automated feature-engineering tools such as Matminer and discussed how their integration with techniques like recursive feature elimination (RFE) and principal component analysis (PCA) enhances model performance and interpretability.
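Feature selection of the kind described here can be sketched with scikit-learn's RFE; the random regression features below are placeholders standing in for Matminer-generated descriptors.

```python
# Sketch of recursive feature elimination (RFE): iteratively drop the
# least important features of a random-forest model until a target
# number of descriptors remains. Features here are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

# 10 candidate descriptors, of which only 4 actually carry signal.
X, y = make_regression(n_samples=150, n_features=10, n_informative=4,
                       random_state=0)
rfe = RFE(RandomForestRegressor(n_estimators=50, random_state=0),
          n_features_to_select=4)
rfe.fit(X, y)
print("selected feature indices:", np.flatnonzero(rfe.support_))
```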

We also reviewed a range of ML algorithms, from classical models such as SVM and RF to modern techniques such as gradient boosting (e.g., XGBoost) and generative models, which enable inverse materials design by proposing novel crystal structures. We further highlighted the importance of model interpretability as a means to link ML outputs to physically meaningful insights and to broaden accessibility across disciplines such as materials science, chemistry, device physics, and engineering, and among experimentalists and technology developers.

By systematically reviewing the state‐of‐the‐art, identifying unresolved challenges, and proposing practical solutions, this review offers a comprehensive roadmap for the ML‐guided discovery of high‐efficiency, low‐toxicity, and stable solar absorber materials. We anticipate that these insights will further catalyze interdisciplinary collaboration and accelerate the development of commercially viable, efficient, and sustainable solar conversion technologies.

Conflicts of Interest

The authors declare no conflict of interest.

Supporting information

Supporting file: advs74952‐sup‐0001‐SuppMat.docx


Acknowledgements

X.H. acknowledges the Australian Research Council Future Fellowship (FT190100756). P.V.K. acknowledges the Scientia Fellowship scheme at the University of New South Wales and the Australian Research Council for financial support through the Discovery Early Career Researcher Award (DE210101259). M.P.S. gratefully acknowledges the support by the Australian Research Council under Discovery Early Career Researcher Award (DE210101565) and Discovery Project (DP230101676). The views expressed herein are those of the authors and are not necessarily those of the Australian Research Council.

Biographies

Yangfan Zhang is currently a Ph.D. candidate at UNSW under the supervision of Dr. Mahesh Suryawanshi. His research focuses on applying machine learning to accelerate the discovery and optimization of novel photovoltaic materials. By integrating data‐driven approaches with first‐principles calculations, Yangfan aims to identify stable, efficient, and environmentally friendly materials for next‐generation solar energy conversion. With a strong commitment to advancing sustainable energy technologies, his work explores innovative strategies for material design and screening, contributing to the development of high‐performance solar cells.


Xiaojing Hao, a tenured professor and ARC Future Fellow at UNSW, completed her Ph.D. in 2010 at UNSW. She focuses on high-efficiency thin-film and tandem solar cells using earth-abundant materials such as chalcogenides. With over 160 peer-reviewed publications, including in Nature Energy and Energy & Environmental Science, she has received significant recognition, including an ARC DECRA, an Australian Renewable Energy Agency Postdoctoral Fellowship, and the 2020 Prime Minister's Malcolm McIntosh Prize for Physical Scientist of the Year. She leads a strong research group advancing the efficiency of emerging thin-film solar technologies.


Priyank V. Kumar is a Scientia senior lecturer in chemical engineering at UNSW Sydney, Australia. His group is interested in understanding and designing nanomaterials using theory, computation and data‐driven methods including density functional theory (DFT), time‐dependent DFT (TDDFT), molecular dynamics and machine learning. The group focuses on applications such as photo/electro/thermo catalytic systems, batteries and functional polymers, and strives to collaborate with experimentalists in the relevant areas.


Mahesh P. Suryawanshi is a senior lecturer and ARC DECRA Fellow at the School of Photovoltaic and Renewable Energy Engineering, UNSW Sydney. He has received multiple prestigious awards, including a Doctoral Exchange Scholarship (2012), the Brain-Korea (BK-21) Postdoctoral Fellowship (2016–2019), the Early Career Researcher Award (2019–2021, Ministry of Spain & European Commission), and the Australian Research Council's DECRA (2021–2024). Since 2021, he has led the 'Materials Innovation Lab for Sustainable Energy Futures', which focuses on the nanoscale design and development of functional materials, exploring their structure-optoelectronic property relationships through experiments and modeling for solar energy conversion and catalysis applications.


Contributor Information

Xiaojing Hao, Email: xj.hao@unsw.edu.au.

Priyank V. Kumar, Email: priyank.kumar@unsw.edu.au.

Mahesh P. Suryawanshi, Email: m.suryawanshi@unsw.edu.au.

Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

  • 40. Bartel C. J., Sutton C., Goldsmith B. R., et al., “New Tolerance Factor to Predict the Stability of Perovskite Oxides and Halides,” Science Advances 5, no. 2 (2019): aav0693. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Cheetham A. K. and Rao C., “There's Room in the Middle,” Science 318, no. 5847 (2007): 58–59. [DOI] [PubMed] [Google Scholar]
  • 42. Ptak M., Mączka M., Gągor A., et al., “Experimental and Theoretical Studies of Structural Phase Transition in a Novel Polar Perovskite‐Like [C2H5NH3][Na0.5Fe0.5(HCOO)3] Formate,” Dalton Transactions 45, no. 6 (2016): 2574–2583. [DOI] [PubMed] [Google Scholar]
  • 43. Sourisseau S., Louvain N., Bi W., et al., “Reduced Band Gap Hybrid Perovskites Resulting From Combined Hydrogen and Halogen Bonding at the Organic−Inorganic Interface,” Chemistry of Materials 19, no. 3 (2007): 600–607. [Google Scholar]
  • 44. Shang R., Xu G. C., Wang Z. M., and Gao S., “Phase Transitions, Prominent Dielectric Anomalies, and Negative Thermal Expansion in Three High Thermally Stable Ammonium Magnesium–Formate Frameworks,” Chemistry—A European Journal 20, no. 4 (2014): 1146–1158. [DOI] [PubMed] [Google Scholar]
  • 45. Wang Z., Shi Z., Li T., Chen Y., and Huang W., “Stability of Perovskite Solar Cells: A Prospective on the Substitution of the A Cation and X Anion,” Angewandte Chemie International Edition 56, no. 5 (2017): 1190–1212. [DOI] [PubMed] [Google Scholar]
  • 46. Kim H. S., Seo J. Y., and Park N. G., “Material and Device Stability in Perovskite Solar Cells,” Chemsuschem 9, no. 18 (2016): 2528–2540. [DOI] [PubMed] [Google Scholar]
  • 47. Rong Y., Liu L., Mei A., Li X., and Han H., “Beyond Efficiency: The Challenge of Stability in Mesoscopic Perovskite Solar Cells,” Advanced Energy Materials 5, no. 20 (2015): 1501066. [Google Scholar]
  • 48. Tiep N. H., Ku Z., and Fan H. J., “Recent Advances in Improving the Stability of Perovskite Solar Cells,” Advanced Energy Materials 6, no. 3 (2016): 1501420. [Google Scholar]
  • 49. Steele J. A., Jin H., Dovgaliuk I., et al., “Thermal Unequilibrium of Strained Black CsPbI3 Thin Films,” Science 365, no. 6454 (2019): 679–684. [DOI] [PubMed] [Google Scholar]
  • 50. Xiao Z., Meng W., Wang J., and Yan Y., “Thermodynamic Stability and Defect Chemistry of Bismuth‐Based Lead‐Free Double Perovskites,” Chemsuschem 9, no. 18 (2016): 2628–2633. [DOI] [PubMed] [Google Scholar]
  • 51. Tran T. T., Panella J. R., Chamorro J. R., Morey J. R., and McQueen T. M., “Designing Indirect–Direct Bandgap Transitions in Double Perovskites,” Materials Horizons 4, no. 4 (2017): 688–693. [Google Scholar]
  • 52. Ma X., Yang L., Lei K., Zheng S., Chen C., and Song H., “Doping in Inorganic Perovskite for Photovoltaic Application,” Nano Energy 78 (2020): 105354. [Google Scholar]
  • 53. Tian J., Xue Q., Yao Q., Li N., Brabec C. J., and Yip H. L., “Inorganic Halide Perovskite Solar Cells: Progress and Challenges,” Advanced Energy Materials 10 (2020): 2000183. [Google Scholar]
  • 54. De Roo J., Ibáñez M., Geiregat P., et al., “Highly Dynamic Ligand Binding and Light Absorption Coefficient of Cesium Lead Bromide Perovskite Nanocrystals,” ACS Nano 10, no. 2 (2016): 2071–2081. [DOI] [PubMed] [Google Scholar]
  • 55. Chen X., Peng H., Sun X., et al., “The Mixed Phases of α and γ‐CsPbI3 Enable Efficient and Stable Semitransparent Solar Cells,” Small 21 (2025): 2500710. [DOI] [PubMed] [Google Scholar]
  • 56. Swarnkar A., Marshall A. R., Sanehira E. M., et al., “Quantum Dot–Induced Phase Stabilization of α‐CsPbI3 Perovskite for High‐Efficiency Photovoltaics,” Science 354, no. 6308 (2016): 92–95. [DOI] [PubMed] [Google Scholar]
  • 57. Fan Y., Qin H., Ye W., Liu M., Huang F., and Zhong D., “Improving the Stability of Methylammonium Lead Iodide Perovskite Solar Cells by Cesium Doping,” Thin Solid Films 667 (2018): 40–47. [Google Scholar]
  • 58. Liu J., Meng X., Liu K., et al., “Optimizing the Performance of Ge‐Based Perovskite Solar Cells by Doping CsGeI3 Instead of Charge Transport Layer,” Solar Energy 259 (2023): 398–415. [Google Scholar]
  • 59. Wang L., Miao Q., Wang D., et al., “14.31 % Power Conversion Efficiency of Sn‐Based Perovskite Solar Cells via Efficient Reduction of Sn4+ ,” Angewandte Chemie International Edition 62, no. 33 (2023): 202307228. [DOI] [PubMed] [Google Scholar]
  • 60. Hu Y., Bai F., Liu X., et al., “Bismuth Incorporation Stabilized α‐CsPbI3 for Fully Inorganic Perovskite Solar Cells,” ACS Energy Letters 2, no. 10 (2017): 2219–2227. [Google Scholar]
  • 61. Yao Z., Jin Z., Zhang X., et al., “Pseudohalide (SCN − )‐Doped CsPbI3 for High‐Performance Solar Cells,” Journal of Materials Chemistry C 7, no. 44 (2019): 13736–13742. [Google Scholar]
  • 62. Zhang T., Dar M. I., Li G., et al., “Bication Lead Iodide 2D Perovskite Component to Stabilize Inorganic α‐CsPbI3 Perovskite phase for High‐Efficiency Solar Cells,” Science Advances 3, no. 9 (2017): 1700841. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Swarnkar A., Mir W. J., and Nag A., “Can B‐Site Doping or Alloying Improve Thermal‐ and Phase‐Stability of All‐Inorganic CsPbX3 (X = Cl, Br, I) Perovskites?,” ACS Energy Letters 3, no. 2 (2018): 286–289. [Google Scholar]
  • 64. Cheng P., Wu T., Li Y., Jiang L., Deng W., and Han K., “Combining Theory and Experiment in the Design of a Lead‐Free ((CH3NH3)2 AgBiI6) Double Perovskite,” New Journal of Chemistry 41, no. 18 (2017): 9598–9601. [Google Scholar]
  • 65. Luo J., Wang X., Li S., et al., “Efficient and Stable Emission of Warm‐White Light From Lead‐Free Halide Double Perovskites,” Nature 563, no. 7732 (2018): 541–545. [DOI] [PubMed] [Google Scholar]
  • 66. McClure E. T., Ball M. R., Windl W., and Woodward P. M., “Cs2AgBiX6 (X = Br, Cl): New Visible Light Absorbing, Lead‐Free Halide Perovskite Semiconductors,” Chemistry of Materials 28, no. 5 (2016): 1348–1354. [Google Scholar]
  • 67. Volonakis G. and Giustino F., “Surface Properties of Lead‐Free Halide Double Perovskites: Possible Visible‐Light Photo‐Catalysts for Water Splitting,” Applied Physics Letters 112, no. 24 (2018): 243901. [Google Scholar]
  • 68. Greul E., Petrus M. L., Binek A., Docampo P., and Bein T., “Highly Stable, Phase Pure Cs2AgBiBr6 Double Perovskite Thin Films for Optoelectronic Applications,” Journal of Materials Chemistry A 5, no. 37 (2017): 19972–19981. [Google Scholar]
  • 69. Zhang Z., Sun Q., Lu Y., et al., “Hydrogenated Cs2AgBiBr6 for Significantly Improved Efficiency of Lead‐Free Inorganic Double Perovskite Solar Cell,” Nature Communications 13, no. 1 (2022): 3397. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Pan W., Wu H., Luo J., et al., “Cs2AgBiBr6 Single‐Crystal X‐Ray Detectors With a Low Detection Limit,” Nature Photonics 11, no. 11 (2017): 726–732. [Google Scholar]
  • 71. Creutz S. E., Crites E. N., De Siena M. C., and Gamelin D. R., “Colloidal Nanocrystals of Lead‐Free Double‐Perovskite (Elpasolite) Semiconductors: Synthesis and Anion Exchange To Access New Materials,” Nano Letters 18, no. 2 (2018): 1118–1123. [DOI] [PubMed] [Google Scholar]
  • 72. Hoye R. L., Eyre L., Wei F., et al., “Fundamental Carrier Lifetime Exceeding 1 µs in Cs2AgBiBr6 Double Perovskite,” Advanced Materials Interfaces 5, no. 15 (2018): 1800464. [Google Scholar]
  • 73. Freysoldt C., Grabowski B., Hickel T., et al., “First‐Principles Calculations for Point Defects in Solids,” Reviews of Modern Physics 86, no. 1 (2014): 253–305. [Google Scholar]
  • 74. Dolzhenko Y. I., Inabe T., and Maruyama Y., “In Situ X‐Ray Observation on the Intercalation of Weak Interaction Molecules Into Perovskite‐Type Layered Crystals (C9H19NH3)2PbI4 and (C10H21NH3)2CdCl4,” Bulletin of the Chemical Society of Japan 59, no. 2 (1986): 563–567. [Google Scholar]
  • 75. Stoumpos C. C., Cao D. H., Clark D. J., et al., “Ruddlesden–Popper Hybrid Lead Iodide Perovskite 2D Homologous Semiconductors,” Chemistry of Materials 28, no. 8 (2016): 2852–2867. [Google Scholar]
  • 76. Zhang Y., Abdi‐Jalebi M., Larson B. W., and Zhang F., “What Matters for the Charge Transport of 2D Perovskites?,” Advanced Materials 36, no. 31 (2024): 2404517. [DOI] [PubMed] [Google Scholar]
  • 77. Lu D., Lv G., Xu Z., Dong Y., Ji X., and Liu Y., “Thiophene‐Based Two‐Dimensional Dion–Jacobson Perovskite Solar Cells With Over 15% Efficiency,” Journal of the American Chemical Society 142, no. 25 (2020): 11114–11122. [DOI] [PubMed] [Google Scholar]
  • 78. Zheng K. and Pullerits T., “Two Dimensions Are Better for Perovskites,” Journal of Physical Chemistry Letters 10 (2019): 5881–5885. [DOI] [PubMed] [Google Scholar]
  • 79. Yan L., Ma J., Li P., et al., “Charge‐Carrier Transport in Quasi‐2D Ruddlesden–Popper Perovskite Solar Cells,” Advanced Materials 34, no. 7 (2022): 2106822. [DOI] [PubMed] [Google Scholar]
  • 80. Yang M.‐J., Tang S.‐Y., Weng Y.‐R., et al., “H/F Substitution on the Spacer Cations Leads to 1D‐to‐2D Increment of the Pyrrolidinium‐Containing Lead Iodide Hybrid Perovskites,” Inorganic Chemistry 61, no. 15 (2022): 5836–5843. [DOI] [PubMed] [Google Scholar]
  • 81. Pariari D., Mehta S., Mandal S., et al., “Realizing the Lowest Bandgap and Exciton Binding Energy in a Two‐Dimensional Lead Halide System,” Journal of the American Chemical Society 145, no. 29 (2023): 15896–15905. [DOI] [PubMed] [Google Scholar]
  • 82. Wei Q., Ren H., Liu J., et al., “Long‐Lived Hot Carriers in Two‐Dimensional Perovskites: The Role of Alternating Cations in Interlayer Space,” ACS Energy Letters 8, no. 10 (2023): 4315–4322. [Google Scholar]
  • 83. Huang Y., Li Y., Lim E. L., et al., “Stable Layered 2D Perovskite Solar Cells With an Efficiency of Over 19% via Multifunctional Interfacial Engineering,” Journal of the American Chemical Society 143, no. 10 (2021): 3911–3917. [DOI] [PubMed] [Google Scholar]
  • 84. Ding Y., Wu Y., Tian Y., et al., “Effects of Guanidinium Cations on Structural, Optoelectronic and Photovoltaic Properties of Perovskites,” Journal of Energy Chemistry 58 (2021): 48–54. [Google Scholar]
  • 85. Xu Z., Mitzi D. B., Dimitrakopoulos C. D., and Maxcy K. R., “Semiconducting Perovskites (2‐XC6H4C2H4NH3)2SnI4 (X = F, Cl, Br): Steric Interaction Between the Organic and Inorganic Layers,” Inorganic Chemistry 42, no. 6 (2003): 2031–2039. [DOI] [PubMed] [Google Scholar]
  • 86. Cheng P., Wu T., Zhang J., et al., “(C6H5C2H4NH3)2GeI4: A Layered Two‐Dimensional Perovskite With Potential for Photovoltaic Applications,” Journal of Physical Chemistry Letters 8, no. 18 (2017): 4402–4406. [DOI] [PubMed] [Google Scholar]
  • 87. Mitzi D. B., “Synthesis, Crystal Structure, and Optical and Thermal Properties of (C4H9NH3)2MI4 (M = Ge, Sn, Pb),” Chemistry of Materials 8, no. 3 (1996): 791–800. [Google Scholar]
  • 88. Chen M., Ju M.‐G., Hu M., et al., “Lead‐Free Dion–Jacobson Tin Halide Perovskites for Photovoltaics,” ACS Energy Letters 4, no. 1 (2018): 276–277. [Google Scholar]
  • 89. Li P., Liu X., Zhang Y., et al., “Low‐Dimensional Dion–Jacobson‐Phase Lead‐Free Perovskites for High‐Performance Photovoltaics With Improved Stability,” Angewandte Chemie International Edition 59, no. 17 (2020): 6909–6914. [DOI] [PubMed] [Google Scholar]
  • 90. Metcalf I., Sidhik S., Zhang H., et al., “Synergy of 3D and 2D Perovskites for Durable, Efficient Solar Cells and Beyond,” Chemical Reviews 123, no. 15 (2023): 9565–9652. [DOI] [PubMed] [Google Scholar]
  • 91. Gui D., Ji L., Muhammad A., et al., “Jahn–Teller Effect on Framework Flexibility of Hybrid Organic–Inorganic Perovskites,” Journal of Physical Chemistry Letters 9, no. 4 (2018): 751–755. [DOI] [PubMed] [Google Scholar]
  • 92. Gao L., Zhang F., Xiao C., et al., “Improving Charge Transport via Intermediate‐Controlled Crystal Growth in 2D Perovskite Solar Cells,” Advanced Functional Materials 29, no. 47 (2019): 1901652. [Google Scholar]
  • 93. Shao M., Bie T., Yang L., et al., “Over 21% Efficiency Stable 2D Perovskite Solar Cells,” Advanced Materials 34, no. 1 (2022): 2107211. [DOI] [PubMed] [Google Scholar]
  • 94. Liang J., Zhang Z., Zheng Y., et al., “Overcoming the Carrier Transport Limitation in Ruddlesden–Popper Perovskite Films by Using Lamellar Nickel Oxide Substrates,” Journal of Materials Chemistry A 9, no. 19 (2021): 11741–11752. [Google Scholar]
  • 95. Min L., Tian W., Cao F., Guo J., and Li L., “2D Ruddlesden–Popper Perovskite With Ordered Phase Distribution for High‐Performance Self‐Powered Photodetectors,” Advanced Materials 33, no. 35 (2021): 2101714. [DOI] [PubMed] [Google Scholar]
  • 96. Hautzinger M. P., Pan D., Pigg A. K., et al., “Band Edge Tuning of Two‐Dimensional Ruddlesden–Popper Perovskites by A Cation Size Revealed Through Nanoplates,” ACS Energy Letters 5, no. 5 (2020): 1430–1437. [Google Scholar]
  • 97. Zhou N., Shen Y., Li L., et al., “Exploration of Crystallization Kinetics in Quasi Two‐Dimensional Perovskite and High Performance Solar Cells,” Journal of the American Chemical Society 140, no. 1 (2018): 459–465. [DOI] [PubMed] [Google Scholar]
  • 98. Zhang F., Kim D. H., Lu H., et al., “Enhanced Charge Transport in 2D Perovskites via Fluorination of Organic Cation,” Journal of the American Chemical Society 141, no. 14 (2019): 5972–5979. [DOI] [PubMed] [Google Scholar]
  • 99. Li X., Ke W., Traoré B., et al., “Two‐Dimensional Dion–Jacobson Hybrid Lead Iodide Perovskites With Aromatic Diammonium Cations,” Journal of the American Chemical Society 141, no. 32 (2019): 12880–12890. [DOI] [PubMed] [Google Scholar]
  • 100. Ghosh D., Acharya D., Pedesseau L., et al., “Charge Carrier Dynamics in Two‐Dimensional Hybrid Perovskites: Dion–Jacobson vs. Ruddlesden–Popper Phases,” Journal of Materials Chemistry A 8, no. 42 (2020): 22009–22022. [Google Scholar]
  • 101. Wang F., Ju M.‐G., and Ma L., “Metal‐Cation‐Mixed Lead‐Less Two‐Dimensional Hybrid Perovskites With High Carrier Mobility and Promoted Light Adsorption,” Materials Today Physics 27 (2022): 100769. [Google Scholar]
  • 102. Mao L., Wu Y., Stoumpos C. C., et al., “Tunable White‐Light Emission in Single‐Cation‐Templated Three‐Layered 2D Perovskites (CH3CH2NH3)4Pb3Br10–xClx,” Journal of the American Chemical Society 139, no. 34 (2017): 11956–11963. [DOI] [PubMed] [Google Scholar]
  • 103. Yangui A., Pillet S., Lusson A., et al., “Control of the White‐Light Emission in the Mixed Two‐Dimensional Hybrid Perovskites (C6H11NH3)2[PbBr4−xIx],” Journal of Alloys and Compounds 699 (2017): 1122–1133. [Google Scholar]
  • 104. Brandt R. E., Poindexter J. R., Gorai P., et al., “Searching for “Defect‐Tolerant” Photovoltaic Materials: Combined Theoretical and Experimental Screening,” Chemistry of Materials 29, no. 11 (2017): 4667–4674. [Google Scholar]
  • 105. Ganose A. M., Savory C. N., and Scanlon D. O., “Beyond Methylammonium Lead Iodide: Prospects for the Emergent Field of ns2 Containing Solar Absorbers,” Chemical Communications 53, no. 1 (2017): 20–44. [DOI] [PubMed] [Google Scholar]
  • 106. Hoye R. L., Schulz P., Schelhas L. T., et al., “Perovskite‐Inspired Photovoltaic Materials: Toward Best Practices in Materials Characterization and Calculations,” Chemistry of Materials 29, no. 5 (2017): 1964–1988. [Google Scholar]
  • 107. Lee L. C., Huq T. N., MacManus‐Driscoll J. L., and Hoye R. L., “Research Update: Bismuth‐Based Perovskite‐Inspired Photovoltaic Materials,” APL Materials 6, no. 8 (2018): 084502. [Google Scholar]
  • 108. Tiwari D., Hutter O. S., and Longo G., “Chalcogenide Perovskites for Photovoltaics: Current Status and Prospects,” Journal of Physics: Energy 3, no. 3 (2021): 034010. [Google Scholar]
  • 109. Agarwal S., Vincent K. C., and Agrawal R., “From Synthesis to Application: A Review of BaZrS3 Chalcogenide Perovskites,” Nanoscale 17 (2025): 4250–4300. [DOI] [PubMed] [Google Scholar]
  • 110. Filip M. R. and Giustino F., “The Geometric Blueprint of Perovskites,” Proceedings of the National Academy of Sciences 115, no. 21 (2018): 5397–5402. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111. Jess A., Yang R., and Hages C. J., “On the Phase Stability of Chalcogenide Perovskites,” Chemistry of Materials 34, no. 15 (2022): 6894–6901. [Google Scholar]
  • 112. Sadeghi I., Van Sambeek J., Simonian T., et al., “Expanding the Perovskite Periodic Table to Include Chalcogenide Alloys With Tunable Band Gap Spanning 1.5–1.9 eV,” Advanced Functional Materials 33, no. 41 (2023): 2304575. [Google Scholar]
  • 113. Henkel P., Li J., Grandhi G. K., Vivo P., and Rinke P., “Screening Mixed‐Metal Sn2M(III)Ch2X3 Chalcohalides for Photovoltaic Applications,” Chemistry of Materials 35, no. 18 (2023): 7761–7769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114. Nicolson A., Breternitz J., Kavanagh S. R., et al., “Interplay of Static and Dynamic Disorder in the Mixed‐Metal Chalcohalide Sn2SbS2I3,” Journal of the American Chemical Society 145, no. 23 (2023): 12509–12517. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115. Ran Z., Wang X., Li Y., et al., “Bismuth and Antimony‐Based Oxyhalides and Chalcohalides as Potential Optoelectronic Materials,” npj Computational Materials 4, no. 1 (2018): 14. [Google Scholar]
  • 116. Brandt R. E., Stevanović V., Ginley D. S., and Buonassisi T., “Identifying Defect‐Tolerant Semiconductors With High Minority‐Carrier Lifetimes: Beyond Hybrid Lead Halide Perovskites,” MRS Communications 5, no. 2 (2015): 265–275. [Google Scholar]
  • 117. Gödel K. C. and Steiner U., “Thin Film Synthesis of SbSI Micro‐Crystals for Self‐Powered Photodetectors with Rapid Time Response,” Nanoscale 8, no. 35 (2016): 15920–15925. [DOI] [PubMed] [Google Scholar]
  • 118. Reuter B. and Hardel K., “Silbersulfidbromid und Silbersulfidjodid [Silver Sulfide Bromide and Silver Sulfide Iodide],” Angewandte Chemie 72, no. 4 (1960): 138–139. [Google Scholar]
  • 119. Palazon F., “Metal Chalcohalides: Next Generation Photovoltaic Materials?,” Solar RRL 6, no. 2 (2022): 2100829. [Google Scholar]
  • 120. Sun Y.‐Y., Shi J., Lian J., et al., “Discovering Lead‐Free Perovskite Solar Materials With a Split‐Anion Approach,” Nanoscale 8, no. 12 (2016): 6284–6289. [DOI] [PubMed] [Google Scholar]
  • 121. Hong F., Saparov B., Meng W., Xiao Z., Mitzi D. B., and Yan Y., “Viability of Lead‐Free Perovskites With Mixed Chalcogen and Halogen Anions for Photovoltaic Applications,” Journal of Physical Chemistry C 120, no. 12 (2016): 6435–6441. [Google Scholar]
  • 122. Davies D. W., Butler K. T., Jackson A. J., et al., “Computational Screening of All Stoichiometric Inorganic Materials,” Chemistry 1, no. 4 (2016): 617–627. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123. Priyanga G. S., Sampath S., Shravan P., Sujith R., Javeed A. M., and Latha G., “Advanced Prediction of Perovskite Stability for Solar Energy Using Machine Learning,” Solar Energy 278 (2024): 112782. [Google Scholar]
  • 124. Kang S., Jeong W., Hong C., Hwang S., Yoon Y., and Han S., “Accelerated Identification of Equilibrium Structures of Multicomponent Inorganic Crystals Using Machine Learning Potentials,” npj Computational Materials 8, no. 1 (2022): 108. [Google Scholar]
  • 125. Miura A., Bartel C. J., Goto Y., et al., “Observing and Modeling the Sequential Pairwise Reactions that Drive Solid‐State Ceramic Synthesis,” Advanced Materials 33, no. 24 (2021): 2100312. [DOI] [PubMed] [Google Scholar]
  • 126. Miura A., Ito H., Bartel C. J., et al., “Selective Metathesis Synthesis of MgCr2S4 by Control of Thermodynamic Driving Forces,” Materials horizons 7, no. 5 (2020): 1310–1316. [Google Scholar]
  • 127. Narayan A., Bhutani A., Rubeck S., Eckstein J. N., Shoemaker D. P., and Wagner L. K., “Computational and experimental Investigation for new Transition Metal Selenides and Sulfides: The Importance of Experimental Verification for Stability,” Physical Review B 94, no. 4 (2016): 045105. [Google Scholar]
  • 128. Szymanski N. J., Nevatia P., Bartel C. J., Zeng Y., and Ceder G., “Autonomous and Dynamic Precursor Selection for Solid‐State Materials Synthesis,” Nature Communications 14, no. 1 (2023): 6956. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 129. Todd P. K., McDermott M. J., Rom C. L., et al., “Selectivity in Yttrium Manganese Oxide Synthesis via Local Chemical Potentials in Hyperdimensional Phase Space,” Journal of the American Chemical Society 143, no. 37 (2021): 15185–15194. [DOI] [PubMed] [Google Scholar]
  • 130. Jang J., Gu G. H., Noh J., Kim J., and Jung Y., “Structure‐Based Synthesizability Prediction of Crystals Using Partially Supervised Learning,” Journal of the American Chemical Society 142, no. 44 (2020): 18836–18843. [DOI] [PubMed] [Google Scholar]
  • 131. Gleaves D., Fu N., Siriwardane E. M. D., Zhao Y., and Hu J., “Materials Synthesizability and Stability Prediction Using a Semi‐Supervised Teacher‐Student Dual Neural Network,” Digital Discovery 2, no. 2 (2023): 377–391. [Google Scholar]
  • 132. Song Z., Lu S., Ju M., Zhou Q., and Wang J., “Accurate Prediction of Synthesizability and Precursors of 3D Crystal Structures via Large Language Models,” Nature Communications 16, no. 1 (2025): 6530. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 133. Shockley W. and Queisser H., “Detailed Balance Limit of Efficiency of p–n Junction Solar Cells,” (Routledge, 2018). [Google Scholar]
  • 134. Kuddus A., Ismail A. B. M., and Hossain J., “Design of a Highly Efficient CdTe‐Based Dual‐Heterojunction Solar Cell With 44% Predicted Efficiency,” Solar Energy 221 (2021): 488–501. [Google Scholar]
  • 135. Green M. A., Dunlop E. D., Yoshita M., et al., “Solar Cell Efficiency Tables (Version 66),” Progress in Photovoltaics: Research and Applications 33, no. 7 (2025): 795–810. [DOI] [Google Scholar]
  • 136. Kayes B. M., Nie H., and Twist R., “27.6% Conversion Efficiency, A New Record for Single‐Junction Solar Cells under 1 Sun Illumination,” 2011 37th IEEE Photovoltaic Specialists Conference (IEEE, 2011): 000004–000008. [Google Scholar]
  • 137. Kirchartz T. and Rau U., “What Makes a Good Solar Cell?,” Advanced Energy Materials 8, no. 28 (2018): 1703385. [Google Scholar]
  • 138. Morales‐Acevedo A., “Effective Absorption Coefficient for Graded Band‐Gap Semiconductors and the Expected Photocurrent Density in Solar Cells,” Solar Energy Materials and Solar Cells 93, no. 1 (2009): 41–44. [Google Scholar]
  • 139. Han G., Zhang S., Boix P. P., Wong L. H., Sun L., and Lien S.‐Y., “Towards High Efficiency Thin Film Solar Cells,” Progress in Materials Science 87 (2017): 246–291. [Google Scholar]
  • 140. Miyata A., Mitioglu A., Plochocka P., et al., “Direct Measurement of the Exciton Binding Energy and Effective Masses for Charge Carriers in Organic–Inorganic Tri‐Halide Perovskites,” Nature Physics 11, no. 7 (2015): 582–587. [Google Scholar]
  • 141. Myronov M., “Molecular Beam Epitaxy of High Mobility Silicon, Silicon Germanium and Germanium Quantum Well Heterostructures,” chap. 3 (Elsevier, 2018): 37–54. [Google Scholar]
  • 142. Green M. A., Solar Cells: Operating Principles, Technology, and System Applications (Prentice‐Hall, 1982). [Google Scholar]
  • 143. Cai X., Zhang Y., Shi Z., et al., “Discovery of Lead‐Free Perovskites for High‐Performance Solar Cells via Machine Learning: Ultrabroadband Absorption, Low Radiative Combination, and Enhanced Thermal Conductivities,” Advanced Science 9, no. 4 (2022): 2103648. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 144. Zakutayev A., Caskey C. M., Fioretti A. N., et al., “Defect Tolerant Semiconductors for Solar Energy Conversion,” Journal of Physical Chemistry Letters 5, no. 7 (2014): 1117–1125. [DOI] [PubMed] [Google Scholar]
  • 145. Wu X., Chen H., Wang J., and Niu X., “Machine Learning Accelerated Study of Defect Energy Levels in Perovskites,” Journal of Physical Chemistry C 127, no. 23 (2023): 11387–11395. [Google Scholar]
  • 146. Lu W., Xiao R., Yang J., Li H., and Zhang W., “Data Mining‐Aided Materials Discovery and Optimization,” Journal of Materiomics 3, no. 3 (2017): 191–201. [Google Scholar]
  • 147. Wan X., Feng W., Wang Y., et al., “Materials Discovery and Properties Prediction in Thermal Transport via Materials Informatics: A Mini Review,” Nano Letters 19, no. 6 (2019): 3387–3395. [DOI] [PubMed] [Google Scholar]
  • 148. Alwosheel A., Van Cranenburgh S., and Chorus C. G., “Is Your Dataset Big Enough? Sample Size Requirements When Using Artificial Neural Networks for Discrete Choice Analysis,” Journal of Choice Modelling 28 (2018): 167–182. [Google Scholar]
  • 149. Song Z., Chen X., Meng F., et al., “Machine Learning in Materials Design: Algorithm and Application*,” Chinese Physics B 29, no. 11 (2020): 116103. [Google Scholar]
  • 150. Roh Y., Heo G., and Whang S. E., “A Survey on Data Collection for Machine Learning: A Big Data—AI Integration Perspective,” IEEE Transactions on Knowledge and Data Engineering 33, no. 4 (2019): 1328–1347. [Google Scholar]
  • 151. Agrawal A. and Choudhary A., “Perspective: Materials Informatics and Big Data: Realization of the “Fourth Paradigm” of Science in Materials Science,” APL Materials 4, no. 5 (2016): 053208. [Google Scholar]
  • 152. Himanen L., Geurts A., Foster A. S., and Rinke P., “Data‐Driven Materials Science: Status, Challenges, and Perspectives,” Advanced Science 6, no. 21 (2019): 1900808. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153. Fink T. and Reymond J.‐L., “Virtual Exploration of the Chemical Universe up to 11 Atoms of C, N, O, F: Assembly of 26.4 Million Structures (110.9 Million Stereoisomers) and Analysis for New Ring Systems, Stereochemistry, Physicochemical Properties, Compound Classes, and Drug Discovery,” Journal of Chemical Information and Modeling 47, no. 2 (2007): 342–353. [DOI] [PubMed] [Google Scholar]
  • 154. Gaulton A., Hersey A., Nowotka M., et al., “The ChEMBL Database in 2017,” Nucleic Acids Research 45, no. D1 (2017): D945–D954. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 155. Glasser L., “Crystallographic Information Resources,” Journal of Chemical Education 93, no. 3 (2016): 542–549. [Google Scholar]
  • 156. Gorai P., Gao D., Ortiz B., et al., “TE Design Lab: A Virtual Laboratory for Thermoelectric Material Design,” Computational Materials Science 112 (2016): 368–376. [Google Scholar]
  • 157. Gražulis S., Chateigner D., Downs R. T., et al., “Crystallography Open Database – An Open‐Access Collection of Crystal Structures,” Journal of Applied Crystallography 42, no. 4 (2009): 726–729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158. Haastrup S., Strange M., Pandey M., et al., “The Computational 2D Materials Database: High‐Throughput Modeling and Discovery of Atomically Thin Crystals,” 2D Materials 5, no. 4 (2018): 042002. [Google Scholar]
  • 159. Hachmann J., Olivares‐Amaya R., Atahan‐Evrenk S., et al., “The Harvard Clean Energy Project: Large‐Scale Computational Screening and Design of Organic Photovoltaics on the World Community Grid,” Journal of Physical Chemistry Letters 2, no. 17 (2011): 2241–2251. [Google Scholar]
  • 160. Hellenbrandt M., “The Inorganic Crystal Structure Database (ICSD)—Present and Future,” Crystallography Reviews 10, no. 1 (2004): 17–22. [Google Scholar]
  • 161. Hill J., Mannodi‐Kanakkithodi A., Ramprasad R., and Meredig B., “Materials Data Infrastructure and Materials Informatics,” Computational Materials System Design (Springer Nature, 2018): 193–225. [Google Scholar]
  • 162. Jain A., Ong S. P., Hautier G., et al., “Commentary: The Materials Project: A Materials Genome Approach to Accelerating Materials Innovation,” APL Materials 1, no. 1 (2013): 011002. [Google Scholar]
  • 163. Kirklin S., Saal J. E., Meredig B., et al., “The Open Quantum Materials Database (OQMD): Assessing the Accuracy of DFT Formation Energies,” npj Computational Materials 1, no. 1 (2015): 15010. [Google Scholar]
  • 164. Landis D. D., Hummelshøj J. S., Nestorov S., et al., “The Computational Materials Repository,” Computing in Science & Engineering 14, no. 6 (2012): 51–57. [Google Scholar]
  • 165. Puchala B., Tarcea G., Marquis E. A., Hedstrom M., Jagadish H., and Allison J. E., “The Materials Commons: A Collaboration Platform and Information Repository for the Global Materials Community,” Jom Journal of the Minerals Metals and Materials Society 68 (2016): 2035–2044. [Google Scholar]
  • 166. Ramakrishnan R., Dral P. O., Rupp M., and Von Lilienfeld O. A., “Quantum Chemistry Structures and Properties of 134 Kilo Molecules,” Scientific Data 1, no. 1 (2014): 140022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 167. Saal J. E., Kirklin S., Aykol M., Meredig B., and Wolverton C., “Materials Design and Discovery With High‐Throughput Density Functional Theory: The Open Quantum Materials Database (OQMD),” Jom Journal of the Minerals Metals and Materials Society 65 (2013): 1501–1509. [Google Scholar]
  • 168. Sterling T. and Irwin J. J., “ZINC 15 – Ligand Discovery for Everyone,” Journal of Chemical Information and Modeling 55, no. 11 (2015): 2324–2337. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169. Stevanović V., Lany S., Zhang X., and Zunger A., “Correcting Density Functional Theory for Accurate Predictions of Compound Enthalpies of Formation: Fitted Elemental‐Phase Reference Energies,” Physical Review B 85, no. 11 (2012): 115104. [Google Scholar]
  • 170. Villars P., Onodera N., and Iwata S., “The Linus Pauling File (LPF) and Its Application to Materials Design,” Journal of Alloys and Compounds 279, no. 1 (1998): 1–7. [Google Scholar]
  • 171. Zakutayev A., Wunder N., Schwarting M., et al., “An Open Experimental Database for Exploring Inorganic Materials,” Scientific Data 5, no. 1 (2018): 180053. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 172. Fang J., Xie M., He X., et al., “Machine Learning Accelerates the Materials Discovery,” Materials Today Communications 33 (2022): 104900. [Google Scholar]
  • 173. Mattur M. N., Nagappan N., Rath S., and Thomas T., “Prediction of Nature of Band Gap of Perovskite Oxides (ABO3) Using a Machine Learning Approach,” Journal of Materiomics 8, no. 5 (2022): 937–948. [Google Scholar]
  • 174. Garrity K. F., Bennett J. W., Rabe K. M., and Vanderbilt D., “Pseudopotentials for High‐Throughput DFT Calculations,” Computational Materials Science 81 (2014): 446–452. [Google Scholar]
  • 175. Tang L., Leung P., Xu Q., and Flox C., “Machine Learning Orchestrating the Materials Discovery and Performance Optimization of Redox Flow Battery,” ChemElectroChem 11 (2024): 202400024. [Google Scholar]
  • 176. Morgan D. and Jacobs R., “Opportunities and Challenges for Machine Learning in Materials Science,” Annual Review of Materials Research 50, no. 1 (2020): 71–103. [Google Scholar]
  • 177. Mori‐Sánchez P., Cohen A. J., and Yang W., “Localization and Delocalization Errors in Density Functional Theory and Implications for Band‐Gap Prediction,” Physical Review Letters 100, no. 14 (2008): 146401. [DOI] [PubMed] [Google Scholar]
  • 178. Perdew J. P. and Levy M., “Physical Content of the Exact Kohn‐Sham Orbital Energies: Band Gaps and Derivative Discontinuities,” Physical Review Letters 51, no. 20 (1983): 1884–1887. [Google Scholar]
  • 179. Sham L. J. and Schlüter M., “Density‐Functional Theory of the Energy Gap,” Physical Review Letters 51, no. 20 (1983): 1888–1891. [Google Scholar]
  • 180. Jain A., Ong S. P., Chen W., et al., “FireWorks: A Dynamic Workflow System Designed for High‐Throughput Applications,” Concurrency and Computation: Practice and Experience 27, no. 17 (2015): 5037–5059. [Google Scholar]
  • 181. Janssen J., Surendralal S., Lysogorskiy Y., et al., “pyiron: An Integrated Development Environment for Computational Materials Science,” Computational Materials Science 163 (2019): 24–36. [Google Scholar]
  • 182. Lambert H., Fekete A., Kermode J. R., and De Vita A., “Imeall: A Computational Framework for the Calculation of the Atomistic Properties of Grain Boundaries,” Computer Physics Communications 232 (2018): 256–263. [Google Scholar]
  • 183. Larsen A. H., Mortensen J. J., Blomqvist J., et al., “The Atomic Simulation Environment—A Python Library for Working with Atoms,” Journal of Physics: Condensed Matter 29, no. 27 (2017): 273002. [DOI] [PubMed] [Google Scholar]
  • 184. Mathew K., Montoya J. H., Faghaninia A., et al., “Atomate: A High‐Level Interface To Generate, Execute, and Analyze Computational Materials Science Workflows,” Computational Materials Science 139 (2017): 140–152. [Google Scholar]
  • 185. Mathew K., Singh A. K., Gabriel J. J., et al., “MPInterfaces: A Materials Project Based Python Tool for High‐Throughput Computational Screening of Interfacial Systems,” Computational Materials Science 122 (2016): 183–190. [Google Scholar]
  • 186. Ong S. P., Richards W. D., Jain A., et al., “Python Materials Genomics (pymatgen): A Robust, Open‐Source Python Library for Materials Analysis,” Computational Materials Science 68 (2013): 314–319. [Google Scholar]
  • 187. Pizzi G., Cepellotti A., Sabatini R., Marzari N., and Kozinsky B., “AiiDA: Automated Interactive Infrastructure and Database for Computational Science,” Computational Materials Science 111 (2016): 218–230. [Google Scholar]
  • 188. Supka A. R., Lyons T. E., Liyanage L., et al., “AFLOWπ: A Minimalist Approach to High‐Throughput Ab Initio Calculations Including the Generation of Tight‐Binding Hamiltonians,” Computational Materials Science 136 (2017): 76–84. [Google Scholar]
  • 189. Salustiano R. E. and dos Reis Filho C. A., “Signal‐Level Sensor Fusion Applied to Monitoring Environment Conditions,” 2006 International Caribbean Conference on Devices, Circuits and Systems (IEEE, 2006): 261–265. [Google Scholar]
  • 190. Cochinwala M., Kurien V., Lalk G., and Shasha D., “Efficient Data Reconciliation,” Information Sciences 137, no. 1‐4 (2001): 1–15. [Google Scholar]
  • 191. Goibert M., Calauzenes C., Irurozki E., and Clémençon S., “Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues,” International Conference on Machine Learning (PMLR, 2023): 11584–11597. [Google Scholar]
  • 192. Pilania G., Gubernatis J. E., and Lookman T., “Multi‐Fidelity Machine Learning Models for Accurate Bandgap Predictions of Solids,” Computational Materials Science 129 (2017): 156–163. [Google Scholar]
  • 193. Fernández‐Godino M. G., “Review of Multi‐Fidelity Models,” arXiv preprint arXiv:1609.07196 (2016). [Google Scholar]
  • 194. Lookman T., Balachandran P. V., Xue D., and Yuan R., “Active Learning in Materials Science With Emphasis on Adaptive Sampling Using Uncertainties for Targeted Design,” npj Computational Materials 5, no. 1 (2019): 21. [Google Scholar]
  • 195. Wang A., Liang H., McDannald A., Takeuchi I., and Kusne A. G., “Benchmarking Active Learning Strategies for Materials Optimization and Discovery,” Oxford Open Materials Science 2, no. 1 (2022): itac006. [Google Scholar]
  • 196. Zheng A. and Casari A., Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists (O'Reilly Media, Inc., 2018). [Google Scholar]
  • 197. Ward L., Dunn A., Faghaninia A., et al., “Matminer: An Open Source Toolkit for Materials Data Mining,” Computational Materials Science 152 (2018): 60–69. [Google Scholar]
  • 198. Bartók A. P., Kondor R., and Csányi G., “On Representing Chemical Environments,” Physical Review B 87, no. 18 (2013): 184115. [Google Scholar]
  • 199. Kauwe S. K., Graser J., Vazquez A., and Sparks T. D., “Machine Learning Prediction of Heat Capacity for Solid Inorganics,” Integrating Materials and Manufacturing Innovation 7 (2018): 43–51. [Google Scholar]
  • 200. Sanchez‐Lengeling B. and Aspuru‐Guzik A., “Inverse Molecular Design Using Machine Learning: Generative Models for Matter Engineering,” Science 361, no. 6400 (2018): 360–365. [DOI] [PubMed] [Google Scholar]
  • 201. Wang J., Xu P., Ji X., Li M., and Lu W., “Feature Selection in Machine Learning for Perovskite Materials Design and Discovery,” Materials 16, no. 8 (2023): 3134. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 202. Balasubramanian V., Ho S.‐S., and Vovk V., Conformal Prediction for Reliable Machine Learning: Theory, Adaptations and Applications (Elsevier, 2014). [Google Scholar]
  • 203. Lazar C., Taminau J., Meganck S., et al., “A Survey on Filter Techniques for Feature Selection in Gene Expression Microarray Analysis,” IEEE/ACM Transactions on Computational Biology and Bioinformatics 9, no. 4 (2012): 1106–1119. [DOI] [PubMed] [Google Scholar]
  • 204. Blum A. L. and Langley P., “Selection of Relevant Features and Examples in Machine Learning,” Artificial Intelligence 97, no. 1‐2 (1997): 245–271. [Google Scholar]
  • 205. Guyon I. and Elisseeff A., “An Introduction to Variable and Feature Selection,” Journal of Machine Learning Research 3 (2003): 1157–1182. [Google Scholar]
  • 206. Chandrashekar G. and Sahin F., “A Survey on Feature Selection Methods,” Computers & Electrical Engineering 40, no. 1 (2014): 16–28. [Google Scholar]
  • 207. Remeseiro B. and Bolon‐Canedo V., “A Review of Feature Selection Methods in Medical Applications,” Computers in Biology and Medicine 112 (2019): 103375. [DOI] [PubMed] [Google Scholar]
  • 208. Amos R. D. and Kobayashi R., “Feature Engineering for Materials Chemistry—Does Size Matter?,” Journal of Chemical Information and Modeling 59, no. 5 (2019): 1873–1881. [DOI] [PubMed] [Google Scholar]
  • 209. Liu Y., Tan X., Liang J., Han H., Xiang P., and Yan W., “Machine Learning for Perovskite Solar Cells and Component Materials: Key Technologies and Prospects,” Advanced Functional Materials 33, no. 17 (2023): 2214271. [Google Scholar]
  • 210. Crisci C., Ghattas B., and Perera G., “A Review of Supervised Machine Learning Algorithms and Their Applications to Ecological Data,” Ecological Modelling 240 (2012): 113–122. [Google Scholar]
  • 211. Jiang B., Zhu X., Tian X., Yi W., and Wang S., “Integrating Interpolation and Extrapolation: A Hybrid Predictive Framework for Supervised Learning,” Applied Sciences 14, no. 15 (2024): 6414. [Google Scholar]
  • 212. Dong X., Yu Z., Cao W., Shi Y., and Ma Q., “A Survey on Ensemble Learning,” Frontiers of Computer Science 14, no. 2 (2020): 241–258. [Google Scholar]
  • 213. Jospin L. V., Laga H., Boussaid F., Buntine W., and Bennamoun M., “Hands‐On Bayesian Neural Networks—A Tutorial for Deep Learning Users,” IEEE Computational Intelligence Magazine 17, no. 2 (2022): 29–48. [Google Scholar]
  • 214. Ng A. and Jordan M., “On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes,” Advances in Neural Information Processing Systems 14 (2001). [Google Scholar]
  • 215. Li R., Deng Q., Tian D., Zhu D., and Lin B., “Predicting Perovskite Performance With Multiple Machine‐Learning Algorithms,” Crystals 11, no. 7 (2021): 818. [Google Scholar]
  • 216. Montgomery D. C., Peck E. A., and Vining G. G., Introduction to Linear Regression Analysis (John Wiley & Sons, 2021). [Google Scholar]
  • 217. Frank M. and Wolfe P., “An Algorithm for Quadratic Programming,” Naval Research Logistics Quarterly 3, no. 1–2 (1956): 95–110. [Google Scholar]
  • 218. Feng S. and Wang J., “Prediction of Organic–Inorganic Hybrid Perovskite Band Gap by Multiple Machine Learning Algorithms,” Molecules (Basel, Switzerland) 29, no. 2 (2024): 499. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 219. Jarin S., Yuan Y., Zhang M., et al., “Predicting the Crystal Structure and Lattice Parameters of the Perovskite Materials via Different Machine Learning Models Based on Basic Atom Properties,” Crystals 12, no. 11 (2022): 1570. [Google Scholar]
  • 220. Gladkikh V., Kim D. Y., Hajibabaei A., Jana A., Myung C. W., and Kim K. S., “Machine Learning for Predicting the Band Gaps of ABX3 Perovskites From Elemental Properties,” Journal of Physical Chemistry C 124, no. 16 (2020): 8905–8918. [Google Scholar]
  • 221. Müller K.‐R., Mika S., Tsuda K., and Schölkopf B., An Introduction to Kernel‐Based Learning Algorithms (CRC Press, 2018). [DOI] [PubMed] [Google Scholar]
  • 222. Clark L. A. and Pregibon D., Tree‐Based Models (Routledge, 2017). [Google Scholar]
  • 223. Raileanu L. E. and Stoffel K., “Theoretical Comparison Between the Gini Index and Information Gain Criteria,” Annals of Mathematics and Artificial Intelligence 41 (2004): 77–93. [Google Scholar]
  • 224. Breiman L., “Random Forests,” Machine Learning 45 (2001): 5–32. [Google Scholar]
  • 225. Ke G., Meng Q., Finley T., et al., “LightGBM: A Highly Efficient Gradient Boosting Decision Tree,” Advances in Neural Information Processing Systems 30 (2017): 3149–3157. [Google Scholar]
  • 226. Chen T. and Guestrin C., “XGBoost: A Scalable Tree Boosting System,” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016): 785–794.
  • 227. Datta S., Baul A., Sarker G. C., Sadhu P. K., and Hodges D. R., “A Comprehensive Review of the Application of Machine Learning in Fabrication and Implementation of Photovoltaic Systems,” IEEE Access 11 (2023): 77750–77778. [Google Scholar]
  • 228. Liu Y., Yan W., Zhu H., Tu Y., Guan L., and Tan X., “Study on Bandgap Predications of ABX3‐Type Perovskites by Machine Learning,” Organic Electronics 101 (2022): 106426. [Google Scholar]
  • 229. Zhu C., Liu Y., Wang D., et al., “Exploration of Highly Stable and Highly Efficient New Lead‐Free Halide Perovskite Solar Cells by Machine Learning,” Cell Reports Physical Science 5, no. 12 (2024): 102321. [Google Scholar]
  • 230. Grinsztajn L., Oyallon E., and Varoquaux G., “Why do Tree‐Based Models Still Outperform Deep Learning on Typical Tabular Data?,” Advances in Neural Information Processing Systems 35 (2022): 507–520. [Google Scholar]
  • 231. Touati S., Benghia A., Hebboul Z., Lefkaier I. K., Kanoun M. B., and Goumri‐Said S., “Machine Learning Models for Efficient Property Prediction of ABX3 Materials: A High‐Throughput Approach,” ACS Omega 9, no. 48 (2024): 47519–47531. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 232. Xie T. and Grossman J. C., “Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties,” Physical Review Letters 120, no. 14 (2018): 145301. [DOI] [PubMed] [Google Scholar]
  • 233. Liu Y., Tan X., Xiang P., et al., “Machine Learning as a Characterization Method for Analysis and Design of Perovskite Solar Cells,” Materials Today Physics 42 (2024): 101359. [Google Scholar]
  • 234. Yan W., Liu Y., Zang Y., et al., “Machine Learning Enabled Development of Unexplored Perovskite Solar Cells With High Efficiency,” Nano Energy 99 (2022): 107394. [Google Scholar]
  • 235. Liu Y., Yan W., Han S., et al., “How Machine Learning Predicts and Explains the Performance of Perovskite Solar Cells,” Solar RRL 6, no. 6 (2022): 2101100. [Google Scholar]
  • 236. Mao L. and Xiang C., “A Comprehensive Review of Machine Learning Applications in Perovskite Solar Cells: Materials Discovery, Device Performance, Process Optimization and Systems Integration,” Materials Today Energy 47 (2025): 101742. [Google Scholar]
  • 237. Usama M., Qadir J., Raza A., et al., “Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges,” IEEE Access 7 (2019): 65579–65615. [Google Scholar]
  • 238. Hastie T., Tibshirani R., and Friedman J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer Nature, 2009): 485. [Google Scholar]
  • 239. Chaudhry M., Shafi I., Mahnoor M., Vargas D. L. R., Thompson E. B., and Ashraf I., “A Systematic Literature Review on Identifying Patterns Using Unsupervised Clustering Algorithms: A Data Mining Perspective,” Symmetry 15, no. 9 (2023): 1679. [Google Scholar]
  • 240. Greenacre M., Groenen P. J., Hastie T., d'Enza A. I., Markos A., and Tuzhilina E., “Principal Component Analysis,” Nature Reviews Methods Primers 2, no. 1 (2022): 100. [Google Scholar]
  • 241. Wang Y., Huang H., Rudin C., and Shaposhnik Y., “Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t‐SNE, UMAP, TriMap, and PaCMAP for Data Visualization,” Journal of Machine Learning Research 22, no. 1 (2021): 1–73. [Google Scholar]
  • 242. Hosni Z., Achour S., Saadi F., Lin J., Sheng J., and Al Qaraghuli M., “Specific Surface Area (SSA) of Perovskites With Uncertainty Estimation Approach,” Computational Materials Science 249 (2025): 113668. [Google Scholar]
  • 243. Mai Y., Tang J., Meng H., et al., “Machine Learning‐Based Screening of Two‐Dimensional Perovskite Organic Spacers,” Advanced Composites and Hybrid Materials 7, no. 3 (2024): 104. [Google Scholar]
  • 244. Zhang L. and He M., “Unsupervised Machine Learning for Solar Cell Materials from the Literature,” Journal of Applied Physics 131, no. 6 (2022): 064902. [Google Scholar]
  • 245. Zhang L., He M., Huang E., et al., “Overcoming Language Barrier for Scientific Studies via Unsupervised Literature Learning: Case Study on Solar Cell Materials Prediction,” Solar RRL 8, no. 10 (2024): 2301079. [Google Scholar]
  • 246. Arulkumaran K., Deisenroth M. P., Brundage M., and Bharath A. A., “Deep Reinforcement Learning: A Brief Survey,” IEEE Signal Processing Magazine 34, no. 6 (2017): 26–38. [Google Scholar]
  • 247. Wiering M. A. and Van Otterlo M., “Reinforcement Learning,” Adaptation, Learning, and Optimization 12, no. 3 (2012): 729. [Google Scholar]
  • 248. Li N., Li X., and Xu Z. Q., “Policy Iteration Reinforcement Learning Method for Continuous‐Time Linear–Quadratic Mean‐Field Control Problems,” IEEE Transactions on Automatic Control (2023). [Google Scholar]
  • 249. Patel N., Lee S., Mannelli S. S., Goldt S., and Saxe A. M., “The RL Perceptron: Dynamics of Policy Learning in High Dimensions,” ICLR 2023 Workshop on Physics for Machine Learning (2023): arXiv:2306.10404. [Google Scholar]
  • 250. Sabbioni L., Corda F., and Restelli M., “Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases (Springer, 2023): 506–523. [Google Scholar]
  • 251. Abdullah H. M., Gastli A., and Ben‐Brahim L., “Reinforcement Learning Based EV Charging Management Systems—A Review,” IEEE Access 9 (2021): 41506–41531. [Google Scholar]
  • 252. Du S. S., Kakade S. M., Wang R., and Yang L. F., “Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?,” arXiv preprint arXiv:1910.03016 (2019).
  • 253. Jiang A. and Yoshie O., “A Reinforcement Learning Method for Optical Thin‐Film Design,” IEICE Transactions on Electronics 105, no. 2 (2022): 95–101. [Google Scholar]
  • 254. Sajedian I., Badloe T., Lee H., and Rho J., “Deep Q‐Network to Produce Polarization‐Independent Perfect Solar Absorbers: A Statistical Report,” Nano Convergence 7 (2020): 26. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 255. Anstine D. M. and Isayev O., “Generative Models as an Emerging Paradigm in the Chemical Sciences,” Journal of the American Chemical Society 145, no. 16 (2023): 8736–8750. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 256. Liu Y., Yang Z., Yu Z., et al., “Generative Artificial Intelligence and Its Applications in Materials Science: Current Situation and Future Perspectives,” Journal of Materiomics 9, no. 4 (2023): 798–816. [Google Scholar]
  • 257. Sipilä M., Mehryary F., Pyysalo S., Ginter F., and Todorović M., “Annotated Textual Dataset PV600 of Perovskite Bandgaps for Information Extraction from Literature,” Scientific Data 12, no. 1 (2025): 1401. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 258. Zeni C., Pinsler R., Zügner D., et al., “A Generative Model for Inorganic Materials Design,” Nature 639, no. 8055 (2025): 624–632. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 259. Jabbar H. and Khan R. Z., “Methods to Avoid Over‐Fitting and Under‐Fitting in Supervised Machine Learning (Comparative Study),” Computer Science, Communication and Instrumentation Devices 70 (2015): 978–981. [Google Scholar]
  • 260. Ying X., “An Overview of Overfitting and Its Solutions,” Journal of Physics: Conference Series 1168 (2019): 022022. [Google Scholar]
  • 261. Feurer M. and Hutter F., Automated Machine Learning: Methods, Systems, Challenges (Springer, 2019): 3. [Google Scholar]
  • 262. Raschka S., “Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning,” arXiv preprint arXiv:1811.12808 (2018).
  • 263. Wong T.‐T. and Yeh P.‐Y., “Reliable Accuracy Estimates From k ‐Fold Cross Validation,” IEEE Transactions on Knowledge and Data Engineering 32, no. 8 (2019): 1586–1594. [Google Scholar]
  • 264. Vehtari A., Gelman A., and Gabry J., “Practical Bayesian Model Evaluation Using Leave‐One‐Out Cross‐Validation and WAIC,” Statistics and Computing 27 (2017): 1413–1432. [Google Scholar]
  • 265. Merola G. M., Reducing Cross‐Validation Variance through Seed Blocking in Hyperparameter Tuning (Springer Nature, 2023). [Google Scholar]
  • 266. Liashchynskyi P. and Liashchynskyi P., “Grid Search, Random Search, Genetic Algorithm: A Big Comparison for NAS,” arXiv preprint arXiv:1912.06059 (2019).
  • 267. Syarif I., Prugel‐Bennett A., and Wills G., “SVM Parameter Optimization using Grid Search and Genetic Algorithm to Improve Classification Performance,” TELKOMNIKA (Telecommunication Computing Electronics and Control) 14, no. 4 (2016): 1502. [Google Scholar]
  • 268. Probst P., Wright M. N., and Boulesteix A. L., “Hyperparameters and Tuning Strategies for Random Forest,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9, no. 3 (2019): e1301. [Google Scholar]
  • 269. Pontes F. J., Amorim G., Balestrassi P. P., Paiva A., and Ferreira J. R., “Design of Experiments and Focused Grid Search for Neural Network Parameter Optimization,” Neurocomputing 186 (2016): 22–34. [Google Scholar]
  • 270. Lundberg S. M. and Lee S.‐I., “A Unified Approach to Interpreting Model Predictions,” Advances in Neural Information Processing Systems 30 (2017). [Google Scholar]
  • 271. Khan A., Kandel J., Tayara H., and Chong K. T., “Predicting the Bandgap and Efficiency of Perovskite Solar Cells Using Machine Learning Methods,” Molecular Informatics 43, no. 2 (2024): 202300217. [DOI] [PubMed] [Google Scholar]
  • 272. Guo M.‐H., Xu T.‐X., Liu J.‐J., et al., “Attention Mechanisms in Computer Vision: A Survey,” Computational Visual Media 8, no. 3 (2022): 331–368. [Google Scholar]
  • 273. Niu Z., Zhong G., and Yu H., “A Review on the Attention Mechanism of Deep Learning,” Neurocomputing 452 (2021): 48–62. [Google Scholar]
  • 274. Qin Z., Sun W., Deng H., et al., “cosFormer: Rethinking Softmax in Attention,” arXiv preprint arXiv:2202.08791 (2022).
  • 275. Lindsay R. K., Buchanan B. G., Feigenbaum E. A., and Lederberg J., “DENDRAL: A Case Study of the First Expert System for Scientific Hypothesis Formation,” Artificial Intelligence 61, no. 2 (1993): 209–261. [Google Scholar]
  • 276. Guo Z. and Lin B., “Machine Learning Stability and Band Gap of Lead‐Free Halide Double Perovskite Materials for Perovskite Solar Cells,” Solar Energy 228 (2021): 689–699. [Google Scholar]
  • 277. Mahmood A. and Wang J.‐L., “Machine Learning for High Performance Organic Solar Cells: Current Scenario and Future Prospects,” Energy & Environmental Science 14, no. 1 (2021): 90–105. [Google Scholar]
  • 278. Javed S. G., Khan A., Majid A., Mirza A. M., and Bashir J., “Lattice Constant Prediction of Orthorhombic ABO3 Perovskites using Support Vector Machines,” Computational Materials Science 39, no. 3 (2007): 627–634. [Google Scholar]
  • 279. Hsu C.‐W., Chang C.‐C., and Lin C.‐J., A Practical Guide to Support Vector Classification (National Taiwan University, Taipei, 2003). [Google Scholar]
  • 280. Xu L., Wencong L., Chunrong P., Qiang S., and Jin G., “Two Semi‐Empirical Approaches for the Prediction of Oxide Ionic Conductivities in ABO3 Perovskites,” Computational Materials Science 46, no. 4 (2009): 860–868. [Google Scholar]
  • 281. Kim C., Pilania G., and Ramprasad R., “Machine Learning Assisted Predictions of Intrinsic Dielectric Breakdown Strength of ABX3 Perovskites,” Journal of Physical Chemistry C 120, no. 27 (2016): 14575–14580. [Google Scholar]
  • 282. Zhang Y. and Xu X., “Machine Learning the Magnetocaloric Effect in Manganites from Lattice Parameters,” Applied Physics A 126, no. 5 (2020): 341. [Google Scholar]
  • 283. Li G., Wang C., Huang J., Huang L., and Zhu Y., “Machine Learning Guided Rapid Discovery of Narrow‐Bandgap Inorganic Halide Perovskite Materials,” Applied Physics A 130, no. 2 (2024): 93. [Google Scholar]
  • 284. Sedgwick P., “Confounding in Clinical Trials,” BMJ 345 (2012): e7951. [Google Scholar]
  • 285. Ekanayake I., Meddage D., and Rathnayake U., “A Novel Approach to Explain the Black‐Box Nature of Machine Learning in Compressive Strength Predictions of Concrete Using Shapley Additive Explanations (SHAP),” Case Studies in Construction Materials 16 (2022): e01059. [Google Scholar]
  • 286. Lu S., Zhou Q., Ouyang Y., Guo Y., Li Q., and Wang J., “Accelerated Discovery of Stable Lead‐Free Hybrid Organic‐Inorganic Perovskites via Machine Learning,” Nature Communications 9, no. 1 (2018): 3405. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 287. Liu H., Feng J., and Dong L., “Quick Screening Stable Double Perovskite Oxides for Photovoltaic Applications by Machine Learning,” Ceramics International 48, no. 13 (2022): 18074–18082. [Google Scholar]
  • 288. Talapatra A., Uberuaga B. P., Stanek C. R., and Pilania G., “Band Gap Predictions of Double Perovskite Oxides Using Machine Learning,” Communications Materials 4, no. 1 (2023): 46. [Google Scholar]
  • 289. Sharma S., Ward Z. D., Bhimani K., et al., “Machine Learning‐Aided Band Gap Engineering of BaZrS3 Chalcogenide Perovskite,” ACS Applied Materials & Interfaces 15, no. 15 (2023): 18962–18972. [DOI] [PubMed] [Google Scholar]
  • 290. Agiorgousis M. L., Sun Y. Y., Choe D. H., West D., and Zhang S., “Machine Learning Augmented Discovery of Chalcogenide Double Perovskites for Photovoltaics,” Advanced Theory and Simulations 2, no. 5 (2019): 1800173. [Google Scholar]
  • 291. Mishra S. and Pradhan R. K., “Analyzing the Impact of Feature Correlation on Classification Accuracy of Machine Learning Model,” in 2023 International Conference on Artificial Intelligence and Smart Communication (AISC) (IEEE, 2023): 879–883. [Google Scholar]
  • 292. Rickert C. A., Henkel M., and Lieleg O., “An Efficiency‐Driven, Correlation‐Based Feature Elimination Strategy for Small Datasets,” APL Machine Learning 1 (2023): 016105. [Google Scholar]
  • 293. Ming C., Chen Z., Zhang F., et al., “Mixed Chalcogenide‐Halides for Stable, Lead‐Free and Defect‐Tolerant Photovoltaics: Computational Screening and Experimental Validation of CuBiSCl2 With Ideal Band Gap,” Advanced Functional Materials 32, no. 27 (2022): 2112682. [Google Scholar]
  • 294. Davies D. W., Butler K. T., Skelton J. M., Xie C., Oganov A. R., and Walsh A., “Computer‐Aided Design of Metal Chalcohalide Semiconductors: From Chemical Composition to Crystal Structure,” Chemical Science 9, no. 4 (2018): 1022–1030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 295. Ma X.‐Y., Lewis J. P., Yan Q.‐B., and Su G., “Accelerated Discovery of Two‐Dimensional Optoelectronic Octahedral Oxyhalides via High‐Throughput Ab Initio Calculations and Machine Learning,” Journal of Physical Chemistry Letters 10, no. 21 (2019): 6734–6740. [DOI] [PubMed] [Google Scholar]
  • 296. López C., Caño I., Rovira D., et al., “Machine‐Learning Aided First‐Principles Prediction of Earth‐Abundant Pnictogen Chalcohalide Solid Solutions for Solar‐Cell Devices,” Advanced Functional Materials 34 (2024): 2406678. [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supporting file: advs74952‐sup‐0001‐SuppMat.docx
Data Availability Statement

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

Articles from Advanced Science are provided here courtesy of Wiley