Science Advances. 2026 Feb 11;12(7):eaeb1323. doi: 10.1126/sciadv.aeb1323

Navigating high-dimensional processing parameters in organic photovoltaics via a multitier machine learning framework

Yaping Wen 1,2,*, Yipu Zhang 1, Haibo Ma 2,*
PMCID: PMC12893291  PMID: 41671361

Abstract

Optimizing organic photovoltaic (OPV) performance requires navigating the high-dimensional, interdependent processing parameters governing bulk heterojunction morphology. To address this, we have constructed a standardized database integrating donor/acceptor pairs, nine key fabrication parameters, and device efficiencies, consolidating over a decade of experimental results. Leveraging this resource, we developed a three-tiered machine learning framework using gradient boosting regression trees. The strategy progresses from single-parameter baseline models to stage-combined models that capture intraprocess synergies, culminating in a global nine-parameter optimization model. This final model achieves a Pearson correlation of >0.9 and a success rate of >80% in identifying optimal multiparameter configurations. Validation on 78 external systems, each containing a previously unseen donor or acceptor, demonstrates robust generalization with >75% accuracy in predicting the optimal or secondary condition for individual parameters. This work establishes a practical, data-driven framework for accelerating the rational optimization of OPV photoactive layers.


A machine learning framework navigates complex processing parameters to accelerate the optimization of organic photovoltaics.

INTRODUCTION

Organic photovoltaics (OPVs) represent a promising sustainable energy technology, offering advantages in cost-effectiveness, lightweight design, environmental compatibility, and potential for flexible and semitransparent devices (1–4). Progress in OPVs has been driven by continuous development in materials design and device engineering, where innovations in donor/acceptor (D/A) systems, particularly nonfullerene acceptors (5–10) and wide-bandgap polymer donors (11–13), have substantially expanded the materials library. However, achieving commercially viable power conversion efficiencies (PCEs) also critically depends on precise optimization of the fabrication process, specifically nanoscale morphology control within the bulk heterojunction (BHJ) photoactive layer, which is a decisive factor in governing fundamental processes such as exciton formation, diffusion, dissociation, and charge transport (14–18). Recent breakthroughs in morphology control have elevated champion efficiencies beyond 20% (19–23), underscoring active layer optimization as a pivotal determinant of device performance.

Despite progress, optimizing BHJ morphology for new materials presents ongoing challenges. Optimal active layer processing conditions exhibit strong system dependence, where subtle molecular variations such as alkyl chain topology or π-conjugation length may substantially alter processing-structure-property relationships, necessitating customized optimization for each D/A combination (5, 24). Moreover, the connections between the processing conditions of the active layer (e.g., solvent selection, additive treatments, solution concentration, deposition techniques, and annealing conditions), the resulting thin-film morphology, and the final device efficiency are highly nonlinear and interdependent. Navigating this high-dimensional parameter space requires labor-intensive exploration, while the intricate coupling between processing variables and device performance resists empirical prediction and displays limited cross-material transferability. Conventional approaches to OPV device optimization typically follow sequential protocols requiring considerable time investment, with inherent constraints in probing complex parameter interdependencies or extrapolating optimal conditions to unexplored systems. Quantum chemical methods offer valuable mechanistic insights but face constraints in the timescales and system sizes relevant for experimental implementation.

Machine learning (ML) has accelerated the discovery of previously unknown D/A molecules for OPV by enabling accurate prediction of optoelectronic properties (e.g., absorption spectrum and energy levels) (25–27) and photovoltaic performance (e.g., PCE) (28–46). However, the application of ML to optimize device fabrication processes for given D/A pairs faces multiple barriers rooted in experimental realities. High experimental costs yield sparse datasets with incomplete coverage of the multidimensional parameter space, while data heterogeneity arising from laboratory-specific equipment, environmental conditions (e.g., temperature and humidity), and operator techniques compromises dataset consistency. Meanwhile, minor undocumented variations in processing details, such as spin coating parameters and posttreatment conditions, often lead to different morphological outcomes. Although prior efforts have focused on using ML to assist in regulating one or two processing conditions (47–50), it is difficult to directly treat the entire fabrication process as a black box without considering the coupling effects between processing optimization steps. Crucially, efficiently identifying global optima within these interdependent high-dimensional fabrication parameter spaces of the photoactive layer remains a key but unexplored computational and engineering issue.

Here, we develop a multitier ML framework for OPV photoactive layer optimization to address these interconnected obstacles. As illustrated in Fig. 1A, our workflow begins with creating a standardized materials-processing-performance database, rigorously curated from experimental literature reported from 2016 to 2025 and encompassing D/A molecular structures alongside nine key active layer processing parameters (D/A weight ratio, solvent type, additive type, additive volume ratio, D/A blend concentration, spin coating rate, D/A layer thickness, annealing temperature, and annealing time), as well as the device PCE. We then integrate three complementary descriptor classes: geometric fingerprints of D/A pairs, quantum chemically derived electronic structural properties, and the processing parameters themselves. Within this infrastructure, we implement a three-tiered modeling strategy using the gradient boosting regression tree (GBRT) algorithm. The strategy begins with single-parameter models that establish baseline relationships for step-by-step optimization. It then advances to stage-combined models that group parameters into solution preparation, spin coating film formation, and postprocessing stages to capture essential intrastage synergies, such as solvent-additive interplay. It culminates in full-parameter optimization that navigates the complete nine-dimensional space to identify global optima where isolated approaches miss emergent couplings (Fig. 1B and table S1). This integrated framework systematically decodes the nonlinear relationships linking photoactive layer fabrication processes, material properties, and device performance in OPV, enabling theoretically guided optimization across multiple complexity levels.

Fig. 1. A multitier ML framework for OPV active layer processing optimization.

(A) Workflow for standardized materials-processing-performance database creation, descriptor integration, modeling strategy implementation, and photoactive layer fabrication optimization. (B) Three-tiered modeling strategy, progressing from single-parameter baseline models to stage-combined models (grouping parameters into solution preparation, spin coating film formation, and postprocessing) and culminating in full nine-parameter global optimization to capture emergent couplings.

RESULTS

Building on the established OPV database and descriptor framework, this work focuses on nine critical processing parameters known to strongly influence photoactive layer performance. These processing parameters were investigated across three distinct modeling levels: single parameters, stage-combined parameters, and the full nine-parameter set. ML models incorporating these parameters were constructed at each level to enable targeted optimization.

Single active layer processing parameter models

Establishing quantitative baselines for isolated parameter impacts is essential to decouple complex process-property relationships. This univariate analysis emulates the controlled-variable approach fundamental to experimental optimization, systematically quantifying individual parameter influences while mitigating combinatorial complexity. Therefore, we first focus on the univariate level, analyzing the individual relationships between each of these nine key parameters, the D/A pair, and the target PCE. Data points and range/type for nine parameters are given in table S2. Solvent type and additive type inherently represent categorical parameters within the nine studied. To effectively integrate them into regression models, descriptors with continuous values that capture their fundamental properties were identified. The solvent/additive energy gap critically influences stability and potential photochemical reactions; an appropriate energy gap minimizes undesirable side reactions with photoactive materials and enhances morphological stability. Furthermore, solvent/additive polarity, characterized by dipole moment and polarizability, governs D/A preaggregation in solution and modulates molecular diffusion and phase separation kinetics during film formation. This process ultimately determines the domain size of the resulting interpenetrating network (relative to the exciton diffusion length) and phase purity, factors paramount for efficient charge generation, transport, and collection (51, 52). Consequently, energy gap, dipole moment, and polarizability were selected as key electronic structural descriptors to jointly describe solvent/additive types within our ML models. Chemical structures of all solvents and additives in the database are shown in figs. S1 and S2. Other details, including full name, abbreviation, calculated energy gap, dipole moment, and polarizability are provided in tables S3 and S4.
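The replacement of a categorical solvent or additive label by three continuous descriptors can be sketched as a simple lookup. This is a minimal illustration only: the names `ElectronicDescriptors`, `SOLVENT_TABLE`, `encode_solvent`, and `build_feature_row` are hypothetical, and the numbers in the table are placeholders rather than the calculated values reported in tables S3 and S4.

```python
from typing import Dict, List, NamedTuple


class ElectronicDescriptors(NamedTuple):
    energy_gap: float
    dipole_moment: float
    polarizability: float


# PLACEHOLDER numbers for illustration only; the calculated values
# used in the study are those listed in tables S3 and S4.
SOLVENT_TABLE: Dict[str, ElectronicDescriptors] = {
    "CB": ElectronicDescriptors(6.0, 1.7, 80.0),
    "CF": ElectronicDescriptors(7.5, 1.1, 55.0),
    "o-XY": ElectronicDescriptors(6.2, 0.6, 95.0),
}


def encode_solvent(name: str) -> List[float]:
    """Replace a solvent label with its three continuous descriptors."""
    if name not in SOLVENT_TABLE:
        raise KeyError(f"no descriptors tabulated for solvent {name!r}")
    return list(SOLVENT_TABLE[name])


def build_feature_row(da_weight_ratio: float, solvent: str) -> List[float]:
    """Concatenate one numeric processing parameter with the encoded solvent."""
    return [da_weight_ratio] + encode_solvent(solvent)


row = build_feature_row(da_weight_ratio=0.8, solvent="CB")
```

With this encoding, two chemically similar solvents map to nearby points in descriptor space, which an integer label such as the numbering used in Fig. 2 cannot express.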

Analysis of kernel density distributions (Fig. 2) revealed distinct correlations between solvent/additive types and PCE performance. High-PCE devices (PCE > 14%) predominantly used nongreen solvents (e.g., CB and CF) or transitional green solvents (e.g., o-XY, Tol, m-XY, and Mes), while pure green solvents [e.g., CPME, LM, Ace, BuOH, IPA, ethanol (EtOH), and EA] were rarely associated with top-tier performance. A similar pattern emerged for additives. Nongreen additives (e.g., DIO, CN, DPE, DCBB, DBCL, DIM, DMF, DIB, TCB, and TTCl) and transitional green additives (e.g., PN and 2-MN) frequently yielded higher PCEs (>14%), whereas green additives (e.g., VA and MeTHF) typically correlated with lower efficiencies. This statistical observation underscores the considerable potential of transitional green solvents/additives for enabling high performance in environmentally sustainable devices, aligning well with current OPV development priorities. Analysis of the seven continuous processing parameters in Fig. 2 further identified trends associated with high-performance devices. High PCE values are predominantly clustered in regions with a D/A weight ratio between 0.5 and 1 and additive volume ratios below 2%. Blend concentrations of 10 to 20 mg/ml, a D/A layer thickness of 50 to 100 nm, and annealing times under 20 min are favorable for achieving desirable PCE. Conversely, clear trends were less discernible for annealing temperature and spin coating rate due to the scattered distribution of high-PCE data points across their respective ranges.

Fig. 2. Kernel density distributions of individual active layer processing parameters with PCE.

Among nine processing parameters, solvent type and additive type are categorical attributes. Numbers of solvent type correspond as follows: 1. CB 2. CF 3. DCB 4. o-XY 5. TMB 6. TCE 7. THF 8. TCB 9. CPME 10. LM 11. Tol 12. m-XY 13. Mes 14. Ace 15. BuOH 16. IPA 17. EtOH 18. EA. Numbers of additive type correspond as follows: 0. None 1. DIO 2. CN 3. DPE 4. NMP 5. MeTHF 6. DMN 7. CB 8. DCB 9. o-XY 10. THF 11. SHNa 12. ODT 13. CBA 14. PN 15. DBE 16. VA 17. BN 18. DCBB 19. DBCL 20. DIM 21. 2-MN 22. DMF 23. DIB 24. TCB 25. TTCl. Full name of all solvents and additives can be found in tables S3 and S4.

Three descriptor combinations, Dev + Fp, Dev + Ele, and Dev + Fp + Ele, were generated by combining the active layer processing parameters (Dev) with geometric fingerprints (Fp) and electronic structural properties (Ele) calculated by quantum chemical methods, and were used as input to the ML models built with the GBRT algorithm. See Materials and Methods for a detailed description of the descriptors as well as the evaluation criteria of the ML models. Predictive performance for the nine single-parameter ML models is provided in tables S5 to S7. Models based on different descriptor combinations exhibited comparable mean absolute error (MAE), root mean squared error (RMSE), and Pearson’s correlation coefficient (r). However, discernible differences emerged in their predictive capability for device process parameters within identical D/A material groups. Integration of device parameters with geometric features and electronic structural properties generally enhanced the model capability for pinpointing the best-performing process conditions. For instance, on the test set for D/A layer thickness, the success rate in predicting the optimal device parameter based on the Dev + Fp + Ele combination (PCE@top1 = 90%) is higher than that obtained with Dev + Fp (PCE@top1 = 70%) or Dev + Ele (PCE@top1 = 80%). Figures 3 and 4 visualize the overall predictive performance and the accuracy in identifying optimal processing parameters for the nine single-parameter models using the Dev + Ele + Fp descriptor set. As shown in Fig. 3, the models demonstrated excellent overall predictive power, with Pearson correlation coefficients for seven parameters exceeding 0.90 on both training and test sets. The remaining two parameters (solvent type and additive type, represented by energy gap, dipole moment, and polarizability) also achieved strong correlations above 0.80 in both sets, indicating highly reliable models for predicting PCE from the individual parameter values.
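For reference, the three overall accuracy metrics named above have standard textbook definitions, sketched here in plain Python. The function names and the example PCE values are illustrative assumptions and are not taken from the study's data or code.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between experimental and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between experimental and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson's correlation coefficient r."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up PCE values (%) used only to exercise the functions.
experimental = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 17.0]
scores = (mae(experimental, predicted),
          rmse(experimental, predicted),
          pearson_r(experimental, predicted))
```

Because all three metrics summarize agreement over the whole dataset, two models can score similarly on them and still differ in how often they rank the best processing condition first within a given D/A pair.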

Fig. 3. Overall predictive performance of nine single active layer processing parameter models.

Pearson correlation between experimental PCE and predicted PCE of ML model based on the Dev + Ele + Fp descriptor combination for both training and test sets. rval, average Pearson’s correlation coefficient from 10-fold cross-validation on the training set; rtest, Pearson’s correlation coefficient on the held-out external test set.

Fig. 4. Predictive accuracy on active layer processing parameter of nine single-parameter models.

Success rate of the processing parameter set yielding the optimal PCE (PCE@top1), the secondary PCE (PCE@top2), and other cases (PCE@others) predicted by an ML model based on Dev + Fp + Ele descriptor combination. The evaluation is only conducted within data groups with the identical D/A pairs.

Regarding predictive capability for device process parameters, Fig. 4 reveals more detailed results. While the success rate of PCE@top1 across all parameters exceeded 85% on the training set, test set performance varied considerably. Models for parameters like D/A layer thickness and annealing time achieved an exceptional success rate of more than 90% for PCE@top1 on the test set. In contrast, identification accuracy for solvent type, additive type, additive volume ratio, and annealing temperature dropped below 70% on the test set. This disparity likely stems from data distribution challenges: the number of distinct D/A pair groups available per parameter setting and the quantity of comparable data points within those groups directly affect the model’s ability to learn generalizable relationships and subsequently identify the true optimum within unseen groups. Parameters associated with fewer distinct groups or limited intragroup variation provide less robust learning signals, hindering the model’s capacity to generalize its identification capability effectively to the test set.
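One plausible reading of the PCE@top1, PCE@top2, and PCE@others success rates is sketched below: within each group of records sharing the same D/A pair, the condition that the model ranks highest is checked against the experimental ranking. This interpretation, the `Record` layout, and the function name `topk_success` are assumptions made for illustration; the exact criteria are those described in Materials and Methods.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (D/A pair label, experimental PCE, predicted PCE); hypothetical layout.
Record = Tuple[str, float, float]


def topk_success(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of D/A groups whose predicted-best condition is the
    experimental best (top1), second best (top2), or neither (others)."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for pair, exp_pce, pred_pce in records:
        groups[pair].append((exp_pce, pred_pce))

    counts = {"top1": 0, "top2": 0, "others": 0}
    for rows in groups.values():
        if len(rows) < 2:
            continue  # a single condition leaves nothing to rank
        # Condition the model believes is best.
        best_pred = max(range(len(rows)), key=lambda i: rows[i][1])
        # Experimental ranking, best condition first.
        exp_order = sorted(range(len(rows)), key=lambda i: rows[i][0], reverse=True)
        rank = exp_order.index(best_pred)
        key = "top1" if rank == 0 else "top2" if rank == 1 else "others"
        counts[key] += 1

    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Made-up records for three hypothetical D/A pairs.
records = [
    ("pair_A", 15.0, 14.8), ("pair_A", 13.0, 13.5), ("pair_A", 12.0, 12.0),
    ("pair_B", 16.0, 14.0), ("pair_B", 15.0, 15.0), ("pair_B", 10.0, 9.0),
    ("pair_C", 17.0, 10.0), ("pair_C", 16.0, 11.0), ("pair_C", 12.0, 13.0),
]
rates = topk_success(records)
```

Under this reading, the rate depends on the number of groups and on how many comparable conditions each group contains, which is consistent with the data distribution explanation given above for the weaker test-set results.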

Stage-combined active layer processing parameter models

Building on the insights gained from analyzing individual parameters, we note that actual OPV device fabrication progresses through sequential experimental stages where parameters within each stage exhibit substantial interactions. Capturing these intrastage synergies is essential for aligning optimization with the physical sequence of fabrication. Accordingly, parameters were logically grouped into three fabrication stages reflecting the experimental workflow: solution preparation (encompassing D/A weight ratio, solvent type, additive type, and additive volume ratio), spin coating film formation (involving D/A blend concentration, spin coating rate, and D/A layer thickness), and postprocessing (defined by annealing temperature and time), as depicted in Fig. 5A. Data sizes for the three stages of photoactive layer preparation are provided in table S8.
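The grouping of the nine parameters into three fabrication stages can be written down as a small lookup structure. The snake_case parameter names, the `STAGES` dictionary, and the `stage_features` helper are hypothetical naming choices for this sketch, and the example record contains made-up values.

```python
from typing import Dict, List

# Stage membership follows the grouping described in the text.
STAGES: Dict[str, List[str]] = {
    "solution_preparation": [
        "da_weight_ratio", "solvent_type", "additive_type", "additive_volume_ratio",
    ],
    "spin_coating_film_formation": [
        "da_blend_concentration", "spin_coating_rate", "da_layer_thickness",
    ],
    "postprocessing": [
        "annealing_temperature", "annealing_time",
    ],
}


def stage_features(record: Dict[str, object], stage: str) -> Dict[str, object]:
    """Select only the processing parameters that belong to one stage."""
    return {name: record[name] for name in STAGES[stage]}


# Made-up device record used only to exercise the helper.
example = {
    "da_weight_ratio": 0.8, "solvent_type": "CF", "additive_type": "DIO",
    "additive_volume_ratio": 0.5, "da_blend_concentration": 16.0,
    "spin_coating_rate": 3000, "da_layer_thickness": 100,
    "annealing_temperature": 100, "annealing_time": 10,
}
post = stage_features(example, "postprocessing")
```

A stage-combined model then takes the selected parameters of one stage together with the material descriptors as input, rather than one parameter at a time.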

Fig. 5. Parameter grouping for the three stages of photoactive layer fabrication and data distribution within each stage.

(A) Three stages, namely, solution preparation, spin coating film formation, and postprocessing, along with the device processing parameters that make up each stage. (B) Joint distributions of device processing parameters and PCE for the three stages, where PCE levels are indicated by fill color. Full names of the solvents and additives can be found in tables S3 and S4.

The joint distributions of device parameters and PCE within each stage are visualized in Fig. 5B. To elucidate synergistic relationships within these stages, particularly for the complex four-dimensional space of solution preparation, we analyzed projections combining any two parameters with PCE. These projections consistently revealed that high performance is associated with specific parameter combinations, indicating that optimal outcomes arise from coordinated tuning within a stage rather than from isolated parameter adjustment. This principle emerged clearly in the spin coating film formation and postprocessing stages. Superior device efficiencies reliably occurred when blend concentrations of 10 to 20 mg/ml were coupled with spin coating rates near 3000 rpm and D/A layer thicknesses approximating 150 nm. Similarly, optimal postprocessing consistently involved annealing temperatures around 100°C applied for annealing times shorter than 10 min. Collectively, these observations underscore that achieving peak PCE necessitates compatible parameter combinations at each distinct fabrication step.

Stage-specific ML models constructed using the three descriptor combinations (Dev + Fp, Dev + Ele, and Dev + Ele + Fp) provided valuable insights. All descriptor sets achieved comparable and robust accuracy in predicting the overall PCE for each stage on both training and test sets (tables S9 to S11), confirming the ability to estimate efficiency from stage-specific parameters and material descriptors. However, when evaluating the models’ capacity to identify the precise optimal parameter combination (PCE@top1) within D/A groups, we observed a decline in performance on the test set relative to the training set. For example, while the model for the spin coating film formation stage attained a near-perfect success rate on training data (97%), the corresponding rate on the test set was lower (60%; Fig. 6B). This difference between training and test performance in optimal-combination identification contrasts with the consistently high test-set correlations for PCE prediction (r = 0.79 to 0.91; Fig. 6A), indicating that the models reliably predict the magnitude of achievable PCE but face a greater challenge in pinpointing the exact best parameter set within a material group. The observed discrepancy likely stems from the combinatorial complexity within each stage coupled with the inherent sparsity of data covering all possible synergistic parameter combinations across diverse material systems, which limits the model’s ability to generalize the identification of the absolute optimum for this specific task.

Fig. 6. Overall predictive performance and predictive accuracy on active layer processing parameter of three stage-combined active layer processing parameter models.

(A) Pearson correlation between experimental PCE and predicted PCE of ML model based on the Dev + Ele + Fp descriptor combination for both training and test sets. rval, average Pearson’s correlation coefficient from 10-fold cross-validation on the training set; rtest, Pearson’s correlation coefficient on the held-out external test set. (B) Success rate of the processing parameter set yielding the optimal PCE (PCE@top1), the secondary PCE (PCE@top2), and other cases (PCE@others) predicted by ML model based on Dev + Fp + Ele descriptor combination. The evaluation is only conducted within data groups with the identical D/A pairs.

Integrating the findings from the stage-combined models with those of the single-parameter analyses provides a more complete understanding of the processing landscape. The stage-combined models complement the single-parameter approaches by explicitly capturing essential intrastage interactions (for example, the solvent-additive interplay during solution preparation), which are critical for realistic sequential fabrication. While single-parameter models efficiently identify promising ranges for individual variables under favorable data distributions, the stage-combined framework reveals how parameters cooperate within a fabrication step to drive performance. The difficulty in generalizing the identification of the absolute best parameter combination (PCE@top1) highlights the inherent challenge of optimizing high-dimensional, coupled parameter spaces with limited experimental data. Nevertheless, the strong PCE prediction accuracy maintained across stages demonstrates that the models have learned physically meaningful relationships, offering valuable guidance for narrowing down the parameter search in practice. This tiered strategy thus provides a progressive, interpretable route to navigate fabrication complexity, bridging isolated parameter effects and integrated process synergies.

Global optimization in the full parameter space of active layer processing

The intricate interplay between all nine processing parameters creates a complex, high-dimensional optimization landscape, and navigating it requires resolving cross-stage parameter interdependencies. Identifying the global optimum within this multidimensional space represents a major challenge in photoactive layer fabrication. To address this challenge, we constructed models incorporating all nine parameters simultaneously, enabling exploration of the full parameter space for optimal PCE. The database supporting this analysis contained 1028 data points. Initial insights into parameter relationships were gleaned from the Pearson correlation matrix (Fig. 7A). Notably, solvent type and additive type, represented by their calculated energy gap, dipole moment, and polarizability, exhibited moderate but statistically significant correlations with PCE (r > 0.3), validating these electronic descriptors as effective discriminators capturing their functional impact. Furthermore, spin coating rate, D/A layer thickness, and D/A weight ratio also showed discernible linear correlations with PCE (r = 0.3, 0.2, and 0.1, respectively), reinforcing their established roles in performance control. Although correlations for other parameters were less pronounced in this analysis, they may be involved in more complex, nonlinear relationships or interactions that warrant further investigation. Reinforcing the practical coherence of our parameter classification, a strong correlation (r = 0.5) was observed between D/A blend concentration and D/A layer thickness, aligning with empirical knowledge that thickness is governed by both blend concentration and spin speed. The value ranges/types of the device parameters within the database are visualized in Fig. 7B.

Fig. 7. Database information and model performance of nine active layer processing parameters model.

(A) Pearson correlation of nine processing parameters (where the solvent type and additive type are jointly characterized by their respective band gaps, dipole moments, and polarizabilities) and PCE. (B) Value range/included types for the nine processing parameters in the database. (C) Pearson correlation between experimental PCE and predicted PCE of ML model based on the Dev + Ele descriptor combination for both training and test sets. rval, average Pearson’s correlation coefficient from 10-fold cross-validation on the training set; rtest, Pearson’s correlation coefficient on the held-out external test set. (D) Success rate of the processing parameter set yielding the optimal PCE (PCE@top1), the secondary PCE (PCE@top2), and other cases (PCE@others) predicted by ML model based on Dev + Fp + Ele descriptor combination. The evaluation is only conducted within data groups with the identical D/A pairs. (E) Feature importances of the nine processing parameters in the ML model based on all nine processing parameters combined with electronic structural properties (Dev + Ele).

Evaluating models using the three descriptor combinations (Dev + Fp, Dev + Ele, and Dev + Ele + Fp) for full-parameter PCE prediction yielded critical insights (tables S12 to S14). Overall prediction accuracy (MAE, RMSE, and r) proved largely comparable across descriptor sets. However, a key distinction surfaced in identifying the optimal parameter combination (PCE@top1). The Dev + Ele combination (device parameters combined with electronic properties) demonstrated superior PCE@top1 capability compared to models incorporating fingerprints (Dev + Fp and Dev + Ele + Fp). This outcome suggests that the high dimensionality of fingerprint descriptors may dilute the model’s capacity to learn critical interactions between the processing parameters themselves, which Dev + Ele captures more effectively. Model performance based on Dev + Ele is shown in Fig. 7 (C and D). Notably, the Dev + Ele–based model achieved exceptional generalization with nearly identical successful prediction rate of PCE@top1 on training (84%) and test (81%) sets, effectively resolving the overfitting limitations observed in both the single-parameter and stage-combined analyses. This success underscores that the electronic structural descriptors provide sufficient material context to effectively interpret the impact of device parameters without the potential noise introduced by high-dimensional geometric fingerprints in this specific global optimization task. Further analysis of feature importance (Fig. 7E) within the Dev + Ele–based model highlighted the D/A weight ratio, additive type, D/A blend concentration, and annealing temperature as particularly influential parameters for prediction.
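To make the notion of feature importance in a boosted tree ensemble concrete, the sketch below implements a deliberately small gradient boosting regressor in plain Python. It rests on several assumptions that are not taken from the study: squared-error loss, depth-1 trees (stumps), a fixed learning rate, and importance defined as each feature's share of the total squared-error reduction. All function names and the synthetic data are hypothetical, and the study's own models and hyperparameters are those described in Materials and Methods.

```python
from typing import Dict, List, Optional, Tuple

# (feature index, threshold, left-leaf value, right-leaf value)
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], r: List[float]) -> Tuple[float, Optional[Stump]]:
    """Return (squared-error reduction, best single split) for residuals r."""
    n, total = len(X), sum(r)
    best_gain, best = 0.0, None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            prev, cur = order[k - 1], order[k]
            left_sum += r[prev]
            if X[prev][j] == X[cur][j]:
                continue  # cannot split between identical feature values
            right_sum = total - left_sum
            # Reduction in the sum of squared errors achieved by this split.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if gain > best_gain:
                threshold = (X[prev][j] + X[cur][j]) / 2
                best_gain = gain
                best = (j, threshold, left_sum / k, right_sum / (n - k))
    return best_gain, best


def fit_gbrt(X: List[List[float]], y: List[float],
             n_rounds: int = 50, learning_rate: float = 0.1) -> Dict[str, object]:
    """Boost stumps on the residuals of a squared-error loss."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        gain, stump = fit_stump(X, residuals)
        if stump is None:
            break  # no split reduces the error any further
        j, thr, left, right = stump
        stumps.append(stump)
        importance[j] += gain
        pred = [p + learning_rate * (left if x[j] <= thr else right)
                for p, x in zip(pred, X)]
    norm = sum(importance) or 1.0
    return {"base": base, "lr": learning_rate, "stumps": stumps,
            "importance": [g / norm for g in importance]}


def predict(model: Dict[str, object], x: List[float]) -> float:
    """Sum the base value and the shrunken contribution of every stump."""
    out = model["base"]
    for j, thr, left, right in model["stumps"]:
        out += model["lr"] * (left if x[j] <= thr else right)
    return out


# Synthetic data: the target depends on feature 0 only.
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [1.0 if row[0] > 9 else 0.0 for row in X]
model = fit_gbrt(X, y)
```

In this toy case nearly all of the importance falls on feature 0, because only splits on that feature reduce the error. Importances of this kind reflect how the fitted model uses its inputs and should not be read as causal effects of the processing parameters.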

This exploration of the full parameter space represents the pinnacle of our tiered optimization strategy. While the individual parameter models efficiently identified promising ranges for individual variables, and the stage-combined models revealed critical intrastage synergies, both approaches faced challenges in perfectly generalizing the identification of the absolute best parameter combination (PCE@top1) due to combinatorial complexity and data constraints within their respective scopes. The full-parameter model, particularly using Dev + Ele, successfully navigates this complexity. By considering the simultaneous influence of all nine parameters, it achieves robust generalization in identifying globally optimal configurations, overcoming the overfitting hurdle encountered in the previous levels of analysis.

Extrapolation validation on previously unseen donor or acceptor structures

To rigorously evaluate the model’s generalizability, we performed an extrapolation validation on an external dataset comprising 78 recently reported D/A systems (53–67). These systems were strictly independent of our training database, with each pair featuring either a donor or an acceptor with a chemical structure not present in the training set. This test thus assesses the model’s ability to predict processing conditions for unseen molecular structures, a key requirement for guiding the development of new materials. This dataset covered eight of the nine device parameters involved in this study, although parameter coverage varied because of experimental reporting practices. For example, the D/A weight ratio had 23 groups, while the spin coating rate and D/A layer thickness each had only 1 to 2 groups. Since these studies optimized parameters using controlled variable testing, evaluation was limited to the single active layer processing parameter models. D/A pairs and model-predicted results for each active layer processing parameter of the 78 external validation cases can be found in table S15. Analysis showed that these models identified either the optimal or secondary optimal parameter setting in more than 75% of external validation cases across all parameters, as summarized in Fig. 8. This accuracy refers to the model’s success rate in predicting either the optimal or the secondary optimal value for each individual processing parameter separately, not an exact match of all parameters simultaneously for a given device. Discrepancies between absolute predicted and experimental PCE values persist, and they can be attributed to two main causes. First, the inherent heterogeneity of literature data introduces systematic noise into absolute PCE values due to unavoidable interlaboratory variations in measurement and fabrication protocols; while the model learns robust comparative trends, perfect calibration to every systematic offset is not feasible. Second, for processing conditions or material structures near the edge of the training data’s chemical space, the model’s ability to precisely quantify absolute performance can be limited, although its qualitative ranking often remains reliable.

Fig. 8. Experimental verification of extrapolation of single-device parameter models.

Counts of ML model predictions for eight device parameters yielding the optimal PCE (PCE@top1), the secondary PCE (PCE@top2), and other cases (PCE@others) in an external dataset with recently synthesized, previously unseen structures.

Building on the comprehensive OPV database, this study achieves state-of-the-art predictive accuracy on internal validation (table S16) and provides an effective framework for navigating complex experimental processing spaces. While quantitative deviations in absolute PCE values are observed in external validation, the models successfully identify the optimal or suboptimal processing conditions in the majority of cases, demonstrating robust ranking capability. This establishes a practical, data-driven foundation for experimental optimization. We envision that the framework established here will encourage more standardized reporting of experimental data. This, in turn, will be instrumental in building an expansive, high-fidelity knowledge base, powering iterative model refinement and enabling a more efficient materials discovery pipeline for OPV.

DISCUSSION

In summary, this study addresses the critical challenge of high-dimensional process optimization in OPV photoactive layer fabrication through a multitier structured ML framework. By curating a standardized materials-processing-performance database and integrating quantum chemical descriptors with key fabrication variables, we systematically decode multidimensional relationships across three interconnected tiers. Single-parameter models quantitatively establish the relationship between each processing parameter and device efficiency, while stage-combined models uncover essential intraprocess synergies governing solution preparation, film formation, and posttreatment. The global optimization model using electronic property–enhanced device descriptors (Dev + Ele) suppresses combinatorial overfitting, achieving >0.9 Pearson’s correlation coefficients and >80% success rates in identifying optimal nine-parameter sets, with feature importance highlighting the D/A weight ratio, additive type, D/A blend concentration, and annealing temperature as dominant factors. This framework demonstrates robust generalizability in extrapolation, achieving a >75% success rate in identifying optimal or secondary processing parameters for external D/A systems that incorporate structurally unprecedented donor or acceptor components. This supports the model’s capacity to capture fundamental processing-structure-property relationships that are transferable across molecular structures. Collectively, our multitier strategy delivers a virtual screening tool specifically for active layer fabrication, which can help accelerate the rational design of high-performance OPV devices.

Although numerous ML studies in OPVs have made remarkable strides in predicting molecular properties, designing previously unexplored structures, or extracting physical parameters from device characteristics, a systematic ML approach dedicated to navigating the high-dimensional fabrication parameter space of the photoactive layer remains less explored. This work aims to fill this gap by establishing a multitiered framework that not only predicts device performance but also elucidates intra- and interstage synergies among processing parameters, ultimately enabling extrapolative guidance for the fabrication of unexplored material systems. In contrast to prior ML studies in OPV (38–47) (summarized in table S16) that primarily focus on molecular discovery or postfabrication analysis, our framework provides a comprehensive strategy for the experimental optimization of the photoactive layer processing itself. The tiered modeling approach offers a more granular and interpretable understanding of parameter interactions than a single black-box model. When validated on external datasets of unprecedented D/A pairs, our model demonstrated robust extrapolation capabilities, a critical step toward the practical application of ML in guiding OPV experimentation. We envision that our methodology, which complements the thriving ML-based molecular design efforts, could form an essential part of a fully integrated ML-driven pipeline for OPV development, from in silico molecule screening to optimized device fabrication.

While this work focuses on binary BHJ systems, the proposed multitier ML framework extends naturally to multicomponent systems such as ternary blends and tandem cells. The methodology can be adapted by (i) expanding the database to include additional components and their ratios; (ii) extending the descriptor set to capture interactions between multiple components; and (iii) maintaining the tiered modeling strategy to navigate the enlarged parameter space. Such an extension would enable the optimization of complex multicomponent systems and represents a promising direction for future research that builds directly on the foundation established here.

MATERIALS AND METHODS

Database construction

A standardized materials-processing-performance database for OPV was established by manually curating 609 papers published between 2016 and 2025. The database encompasses 361 polymer donors and 933 nonfullerene acceptors, ensuring substantial molecular diversity. The donors include prominent families such as the J81 series, benzodithiophene (BDT)-based polymers, fluorinated/chlorinated polymers, the D18 series, and the PM6 series. The acceptors cover key structural motifs, including the A-D-A structured ITIC series, the A-D-A-D-A structured Y6 series, quinoxaline-based derivatives, fluoro-alkoxy end-capped groups, and L8-BO. The complete dataset is available in a public repository (see Data and materials availability). Each entry records the molecular structures of the D/A pair, nine critical active layer processing parameters (D/A weight ratio, solvent type, additive type, additive volume ratio, D/A blend concentration, spin coating rate, D/A layer thickness, annealing temperature, and annealing time), and the resulting PCE. For polymeric donors, a single repeating unit was used as the structural representation. To benchmark optimal performance, the highest reported PCE for each D/A pair under a given set of processing parameter values was designated as the target value. Entries with identical D/A pairs but varying processing parameters were grouped to enable assessment of processing-dependent performance prediction.
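The grouping logic described above can be illustrated with a minimal sketch. It assumes one plausible reading of the text: records sharing the same D/A pair and the same processing parameter values are collapsed to their highest reported PCE, and the surviving entries are then grouped by D/A pair. The function name `build_groups` and the record layout are hypothetical and are not taken from the published code.

```python
from collections import defaultdict


def build_groups(records):
    """Deduplicate and group literature records (illustrative only).

    records: iterable of (donor, acceptor, params, pce), where params is a
    hashable tuple of processing-parameter values (hypothetical layout).
    Returns {(donor, acceptor): {params: best_pce}}.
    """
    groups = defaultdict(dict)
    for donor, acceptor, params, pce in records:
        pair = (donor, acceptor)
        best = groups[pair].get(params)
        # Keep only the highest PCE reported for this exact parameter set.
        if best is None or pce > best:
            groups[pair][params] = pce
    return dict(groups)
```

Each inner dictionary then corresponds to one multientry D/A group, i.e., one material pair evaluated under several processing conditions.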

A key consideration in curating the database was managing potential inconsistencies, such as those arising from different thickness measurement methods. To ensure data quality, a multistep procedure was adopted. First, a physical range filter (40 to 500 nm) was applied to exclude implausible values. Second, statistical outlier detection (beyond three SDs) was performed both globally and within local data groupings. Last, all flagged data points underwent manual verification against the original literature. This process, combined with the inherent robustness of the GBRT algorithm to noise, helped to minimize the influence of measurement discrepancies on the model’s learning process.
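As an illustration of this screening procedure, the following sketch applies the range filter and the three-SD rule to layer thickness values using only the Python standard library. The field names (`pair`, `thickness_nm`) and the choice of the D/A pair as the "local grouping" are assumptions made for this example; the text does not specify how local groupings were defined.

```python
from statistics import mean, stdev

# Thresholds taken from the text; all names below are hypothetical.
MIN_NM, MAX_NM = 40.0, 500.0
N_SIGMA = 3.0


def outside_n_sigma(values, n_sigma=N_SIGMA):
    """Return indices of values more than n_sigma sample SDs from the mean."""
    if len(values) < 3:
        return set()  # too few points for a meaningful SD
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return set()
    return {i for i, v in enumerate(values) if abs(v - mu) > n_sigma * sd}


def screen_thickness(entries):
    """Split entries into (kept, flagged_for_manual_check).

    Each entry is a dict with keys 'pair' and 'thickness_nm'.
    """
    # Step 1: physical range filter (40 to 500 nm).
    in_range = [e for e in entries if MIN_NM <= e["thickness_nm"] <= MAX_NM]

    # Step 2a: global outliers over all remaining entries.
    flagged = outside_n_sigma([e["thickness_nm"] for e in in_range])

    # Step 2b: local outliers within each grouping (assumed: per D/A pair).
    groups = {}
    for i, e in enumerate(in_range):
        groups.setdefault(e["pair"], []).append(i)
    for idx in groups.values():
        local = outside_n_sigma([in_range[i]["thickness_nm"] for i in idx])
        flagged |= {idx[j] for j in local}

    # Step 3: flagged entries are set aside for manual verification.
    kept = [e for i, e in enumerate(in_range) if i not in flagged]
    to_check = [in_range[i] for i in sorted(flagged)]
    return kept, to_check
```

Note that the sketch only sets flagged entries aside; whether they are retained depends on the manual check against the original literature.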

It should be noted that our comprehensive database includes additional parameters related to device architecture (conventional/inverted structures) and interfacial layers. However, the current study focuses specifically on photoactive layer processing parameters to enable a detailed investigation of morphology-property relationships in the BHJ. The systematic analysis of device architecture and interfacial layers constitutes a separate research effort that will be presented in subsequent work.

Descriptors

Three types of descriptors were used to characterize the material and device space of our database. Geometric features were encoded using 1024-bit Chemistry Development Kit (CDK) molecular fingerprints (68) generated for the individual D and A components via the ChemDes platform (69); these fingerprints were concatenated to form a unified 2048-bit descriptor per D/A pair. Electronic structural properties comprised 19 quantum-chemically derived descriptors relevant to OPV mechanisms (e.g., frontier orbital energies and dipole moments; see table S17, fig. S3, and section S1 in the Supplementary Materials for detailed definitions). Molecular geometries were optimized using density functional theory at the M06-2X/6-31G(d) (70, 71) level within the Gaussian 16 package (72). Last, the active layer processing parameters (Dev) constitute the third descriptor type. Depending on the modeling tier, different sets of these parameters were used as input. For the single-parameter models, the specific parameter being optimized (e.g., D/A weight ratio) was the primary Dev input, while the others provided context. For the stage-combined models, the Dev inputs were the parameters grouped within each stage: solution preparation (D/A weight ratio, solvent type, additive type, and additive volume ratio), spin coating film formation (D/A blend concentration, spin coating rate, and D/A layer thickness), and postprocessing (annealing temperature and annealing time). The full-parameter model incorporated all nine parameters simultaneously. The representation of solvent and additive types was designed to capture their physicochemical characteristics. Solvent types were characterized by three electronic structural descriptors: energy gap, dipole moment, and polarizability. For additive types, in addition to these same three electronic descriptors, their physical state (solid or liquid) and key physicochemical properties, including melting point, boiling point, density, and molecular weight, were incorporated. This comprehensive descriptor set provides enhanced physical characterization of these processing components. Geometry optimization and vibrational frequency calculations for the solvent and additive molecules were carried out at the M06-2X/6-31G(d,p) level, which was consistent with the computational level used for the D and A molecules.

Model inputs subsequently combined these descriptors into three distinct sets, i.e., active layer processing parameters with geometric features (Dev + Fp), active layer processing parameters with electronic structural properties (Dev + Ele), and the full complement of all three classes (Dev + Ele + Fp).
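The assembly of these three input sets can be sketched as simple vector concatenation. In the example below, the function name `build_inputs`, the ordering of the blocks within each vector, and the assumption that all descriptors have already been encoded numerically are illustrative choices, not details taken from the original implementation.

```python
def build_inputs(dev, ele, fp_donor, fp_acceptor):
    """Assemble the three input sets for one database entry (illustrative).

    dev: numerically encoded processing-parameter values
    ele: electronic structural descriptors
    fp_donor, fp_acceptor: 1024-bit fingerprints as sequences of 0/1
    """
    if len(fp_donor) != 1024 or len(fp_acceptor) != 1024:
        raise ValueError("each fingerprint must have 1024 bits")
    # Concatenate donor and acceptor bits into the 2048-bit pair descriptor.
    fp = list(fp_donor) + list(fp_acceptor)
    return {
        "Dev+Fp": list(dev) + fp,
        "Dev+Ele": list(dev) + list(ele),
        "Dev+Ele+Fp": list(dev) + list(ele) + fp,
    }
```

The Dev + Ele vector is far shorter than either fingerprint-containing set, since it omits the 2048 fingerprint bits.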

Data splitting

To ensure robust model evaluation and generalization, a stratified manual data splitting approach was implemented. The dataset was initially divided into 10 PCE-based groups (see table S18). Subsequently, 15% of entries were randomly selected from each group and pooled to form an external test set, while the remaining 85% per group constituted the training set. Critically, hyperparameter optimization and 10-fold cross-validation were performed exclusively within the training set, thereby maintaining strict isolation of the external test set throughout model development to ensure unbiased assessment of predictive generalizability.
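A minimal sketch of such a stratified split is given below. The actual PCE group boundaries are listed in table S18 and are not reproduced here, so the `bin_edges` argument, the field name `pce`, and the fixed random seed are placeholders for illustration.

```python
import random
from bisect import bisect_right


def stratified_split(entries, bin_edges, test_fraction=0.15, seed=0):
    """Split entries into (train, test), sampling test_fraction per PCE bin.

    entries: list of dicts with a 'pce' key (hypothetical field name)
    bin_edges: ascending interior edges, giving len(bin_edges) + 1 bins
    """
    rng = random.Random(seed)
    bins = {}
    for e in entries:
        # bisect_right maps each PCE value to the index of its bin.
        bins.setdefault(bisect_right(bin_edges, e["pce"]), []).append(e)

    train, test = [], []
    for key in sorted(bins):
        members = bins[key][:]
        rng.shuffle(members)
        n_test = round(test_fraction * len(members))
        test.extend(members[:n_test])   # pooled external test set
        train.extend(members[n_test:])  # remainder forms the training set
    return train, test
```

Cross-validation and hyperparameter tuning would then operate on `train` only, leaving `test` untouched until the final evaluation.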

Model construction and evaluation

All predictive models in this study used the GBRT algorithm implemented in Scikit-learn (73), with hyperparameter specifications detailed in section S2 of the Supplementary Materials. Given the presence of multientry D/A groups (i.e., identical material pairs with varying processing conditions), performance assessment addressed two distinct objectives. First, overall predictive accuracy was quantified using conventional metrics, i.e., MAE, RMSE, and r, as defined in section S3 of the Supplementary Materials. Second, to evaluate practical utility for device parameter optimization, the predictions of experimental processing parameters within each D/A group were classified into three cases: PCE@top1, successful prediction of the device parameter set yielding the optimal PCE; PCE@top2, successful prediction of the device parameter set yielding the second-highest PCE (applied to groups with ≥3 entries); and PCE@others, all remaining predictions. This dual evaluation thus assesses both PCE prediction accuracy and the capability to guide experimental optimization of active layer processing.
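The group-level classification can be sketched as follows. The sketch assumes that a "successful prediction" means the parameter set with the highest predicted PCE coincides with the set having the highest (top1) or second-highest (top2) experimental PCE; this is one reading of the definitions above, and the function names are hypothetical. Ties are not treated specially.

```python
def classify_group(true_pce, pred_pce):
    """Classify one D/A group as 'PCE@top1', 'PCE@top2', or 'PCE@others'.

    true_pce, pred_pce: equal-length lists, one value per parameter set.
    """
    if len(true_pce) != len(pred_pce) or len(true_pce) < 2:
        raise ValueError("need at least two entries with matching predictions")
    # Index of the parameter set the model ranks highest.
    picked = max(range(len(pred_pce)), key=pred_pce.__getitem__)
    # Indices sorted by experimental PCE, best first.
    ranked = sorted(range(len(true_pce)), key=true_pce.__getitem__, reverse=True)
    if picked == ranked[0]:
        return "PCE@top1"
    # The top2 case is only counted for groups with at least three entries.
    if len(true_pce) >= 3 and picked == ranked[1]:
        return "PCE@top2"
    return "PCE@others"


def success_rates(groups):
    """Return the fraction of groups in each class.

    groups: non-empty list of (true_pce, pred_pce) pairs.
    """
    counts = {"PCE@top1": 0, "PCE@top2": 0, "PCE@others": 0}
    for true_pce, pred_pce in groups:
        counts[classify_group(true_pce, pred_pce)] += 1
    total = len(groups)
    return {k: v / total for k, v in counts.items()}
```

Because this measure depends only on the ranking within each group, a model can score well on it even when its absolute PCE predictions are offset.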

Acknowledgments

We thank K. Gao and K. Long for helpful discussions.

Funding:

The work was supported by the National Natural Science Foundation of China (grant 22403026 to Y.W. and grants 22325302 and 52541104 to H.M.), the National Key Research and Development Program of China (grant 2022YFA1503103 to H.M.), and the Key Research Projects of Henan Provincial Universities (grant 24B150017 to Y.W.).

Author contributions:

Y.W. conceived and designed the study and performed data collection, analysis, modeling, and validation. Y.Z. assisted with data collection. H.M. supervised the study. Y.W. wrote the original draft. H.M. reviewed and edited the manuscript.

Competing interests:

The authors declare that they have no competing interests.

Data and materials availability:

All data and code needed to evaluate and reproduce the results in the paper are present in the paper and/or the Supplementary Materials. The data for this study have been deposited in the Zenodo repository and are available at: https://doi.org/10.5281/zenodo.17656284. This study did not generate new materials.

Supplementary Materials

This PDF file includes:

Sections S1 to S3

Figs. S1 to S3

Tables S1 to S18

sciadv.aeb1323_sm.pdf (1.2MB, pdf)

REFERENCES

  • 1.Li Y., Huang X., Sheriff H. K. M. Jr., Forrest S. R., Semitransparent organic photovoltaics for building-integrated photovoltaic applications. Nat. Rev. Mater. 8, 186–201 (2022). [Google Scholar]
  • 2.Zhao Y., Li Z., Deger C., Wang M., Peric M., Yin Y., Meng D., Yang W., Wang X., Xing Q., Chang B., Scott E. G., Zhou Y., Zhang E., Zheng R., Bian J., Shi Y., Yavuz I., Wei K.-H., Houk K. N., Yang Y., Achieving sustainability of greenhouses by integrating stable semi-transparent organic photovoltaics. Nat. Sustain. 6, 539–548 (2023). [Google Scholar]
  • 3.Yan Y., Duan B., Ru M., Gu Q., Li S., Zhao W., Toward flexible and stretchable organic solar cells: A comprehensive review of transparent conductive electrodes, photoactive materials, and device performance. Adv. Energy Mater. 15, 2404233 (2024). [Google Scholar]
  • 4.Li S., Li Z., Wan X., Chen Y., Recent progress in flexible organic solar cells. eScience 3, 100085 (2023). [Google Scholar]
  • 5.Li C., Song J., Lai H., Zhang H., Zhou R., Xu J., Huang H., Liu L., Gao J., Li Y., Jee M. H., Zheng Z., Liu S., Yan J., Chen X.-K., Tang Z., Zhang C., Woo H. Y., He F., Gao F., Yan H., Sun Y., Non-fullerene acceptors with high crystallinity and photoluminescence quantum yield enable >20% efficiency organic solar cells. Nat. Mater. 24, 433–443 (2025). [DOI] [PubMed] [Google Scholar]
  • 6.Jiang Y., Sun S., Xu R., Liu F., Miao X., Ran G., Liu K., Yi Y., Zhang W., Zhu X., Non-fullerene acceptor with asymmetric structure and phenyl-substituted alkyl side chain for 20.2% efficiency organic solar cells. Nat. Energy 9, 975–986 (2024). [Google Scholar]
  • 7.Liu K., Jiang Y., Ran G., Liu F., Zhang W., Zhu X., 19.7% efficiency binary organic solar cells achieved by selective core fluorination of nonfullerene electron acceptors. Joule 8, 835–851 (2024). [Google Scholar]
  • 8.Sun Y., Wang L., Guo C., Xiao J., Liu C., Chen C., Xia W., Gan Z., Cheng J., Zhou J., Chen Z., Zhou J., Liu D., Wang T., Li W., π-extended nonfullerene acceptor for compressed molecular packing in organic solar cells to achieve over 20% efficiency. J. Am. Chem. Soc. 146, 12011–12019 (2024). [DOI] [PubMed] [Google Scholar]
  • 9.Chen Z., Ge J., Song W., Tong X., Liu H., Yu X., Li J., Shi J., Xie L., Han C., Liu Q., Ge Z., 20.2% efficiency organic photovoltaics employing a π-extension quinoxaline-based acceptor with ordered arrangement. Adv. Mater. 36, 2406690 (2024). [DOI] [PubMed] [Google Scholar]
  • 10.Cao X., Wang P., Jia X., Zhao W., Chen H., Xiao Z., Li J., Bi X., Yao Z., Guo Y., Long G., Li C., Wan X., Chen Y., Rebuilding peripheral F, Cl, Br footprints on acceptors enables binary organic photovoltaic efficiency exceeding 19.7%. Angew. Chem. Int. Ed. 64, e202417244 (2024). [DOI] [PubMed] [Google Scholar]
  • 11.Jiang Q., Yuan X., Li Y., Luo Y., Zhu J., Zhao F., Zhang Y., Wei W., Feng H., Li H., Wu J., Ma Z., Tang Z., Huang F., Cao Y., Duan C., A structurally simple polymer donor enables high-efficiency organic solar cells with minimal energy losses. Angew. Chem. Int. Ed. 64, e202416883 (2025). [DOI] [PubMed] [Google Scholar]
  • 12.Lin T., Hai Y., Luo Y., Feng L., Jia T., Wu J., Ma R., Dela Peña T. A., Li Y., Xing Z., Li M., Wang M., Xiao B., Wong K. S., Liu S., Li G., Isomerization of benzothiadiazole yields a promising polymer donor and organic solar cells with efficiency of 19.0%. Adv. Mater. 36, 2312311 (2024). [DOI] [PubMed] [Google Scholar]
  • 13.Liang H., Ma K., Ding S., Zhao W., Si X., Cao X., Yao Z., Duan T., Long G., Li C., Wan X., Chen Y., A pyrazinyl wide-bandgap polymer donor yields 19.35% efficiency in tandem organic solar cells. Adv. Energy Mater. 14, 2402370 (2024). [Google Scholar]
  • 14.Huang Y., Kramer E. J., Heeger A. J., Bazan G. C., Bulk heterojunction solar cells: Morphology and performance relationships. Chem. Rev. 114, 7006–7043 (2014). [DOI] [PubMed] [Google Scholar]
  • 15.Wang J., Xie Y., Chen K., Wu H., Hodgkiss J. M., Zhan X., Physical insights into non-fullerene organic photovoltaics. Nat. Rev. Phys. 6, 365–381 (2024). [Google Scholar]
  • 16.Zeng R., Zhang M., Wang X., Zhu L., Hao B., Zhong W., Zhou G., Deng J., Tan S., Zhuang J., Han F., Zhang A., Zhou Z., Xue X., Xu S., Xu J., Liu Y., Lu H., Wu X., Wang C., Fink Z., Russell T. P., Jing H., Zhang Y., Bo Z., Liu F., Achieving 19% efficiency in non-fused ring electron acceptor solar cells via solubility control of donor and acceptor crystallization. Nat. Energy 9, 1117–1128 (2024). [Google Scholar]
  • 17.Zhu L., Zhang M., Xu J., Li C., Yan J., Zhou G., Zhong W., Hao T., Song J., Xue X., Zhou Z., Zeng R., Zhu H., Chen C.-C., MacKenzie R. C. I., Zou Y., Nelson J., Zhang Y., Sun Y., Liu F., Single-junction organic solar cells with over 19% efficiency enabled by a refined double-fibril network morphology. Nat. Mater. 21, 656–663 (2022). [DOI] [PubMed] [Google Scholar]
  • 18.Zhang R., Chen H., Wang T., Kobera L., He L., Huang Y., Ding J., Zhang B., Khasbaatar A., Nanayakkara S., Zheng J., Chen W., Diao Y., Abbrent S., Brus J., Coffey A. H., Zhu C., Liu H., Lu X., Jiang Q., Coropceanu V., Brédas J.-L., Li Y., Li Y., Gao F., Equally high efficiencies of organic solar cells processed from different solvents reveal key factors for morphology control. Nat. Energy 10, 124–134 (2025). [Google Scholar]
  • 19.Chen H., Huang Y., Zhang R., Mou H., Ding J., Zhou J., Wang Z., Li H., Chen W., Zhu J., Cheng Q., Gu H., Wu X., Zhang T., Wang Y., Zhu H., Xie Z., Gao F., Li Y., Li Y., Organic solar cells with 20.82% efficiency and high tolerance of active layer thickness through crystallization sequence manipulation. Nat. Mater. 24, 444–453 (2025). [DOI] [PubMed] [Google Scholar]
  • 20.Zhu L., Zhang M., Zhou G., Wang Z., Zhong W., Zhuang J., Zhou Z., Gao X., Kan L., Hao B., Han F., Zeng R., Xue X., Xu S., Jing H., Xiao B., Zhu H., Zhang Y., Liu F., Achieving 20.8% organic solar cells via additive-assisted layer-by-layer fabrication with bulk p-i-n structure and improved optical management. Joule 8, 3153–3168 (2024). [Google Scholar]
  • 21.Wang H., Zhang B., Wang L., Guo X., Mei L., Cheng B., Sun W., Kan L., Xia X., Hao X., Geue T., Liu F., Zhang M., Chen X. K., Achieving uniform phase structure for layer-by-layer processed binary organic solar cells with 20.2% efficiency. Angew. Chem. Int. Ed. 64, e202508257 (2025). [DOI] [PubMed] [Google Scholar]
  • 22.Guan S., Li Y., Xu C., Yin N., Xu C., Wang C., Wang M., Xu Y., Chen Q., Wang D., Zuo L., Chen H., Self-assembled interlayer enables high-performance organic photovoltaics with power conversion efficiency exceeding 20%. Adv. Mater. 36, 2400342 (2024). [DOI] [PubMed] [Google Scholar]
  • 23.Liu L., Li H., Xie J., Yang Z., Bai Y., Li M., Huang Z., Zhang K., Huang F., Organic solar cell with efficiency of 20.49% enabled by solid additive and non-halogenated solvent. Adv. Mater. 37, e2500352 (2025). [DOI] [PubMed] [Google Scholar]
  • 24.Gong Y., Tan S., Li X., Qin S., Li X., Zou T., Li Y., Yuan M., Zhang Z., Hu H., Liang T., Zhang J., Meng L., Liu F., Li Y., Molecular geometry-property relationship of benzodipyrrole-based A-DA'D-A type acceptors for high-performance organic solar cells. Angew. Chem. Int. Ed. 64, e202505366 (2025). [DOI] [PubMed] [Google Scholar]
  • 25.Yan J., Rodríguez-Martínez X., Pearce D., Douglas H., Bili D., Azzouzi M., Eisner F., Virbule A., Rezasoltani E., Belova V., Dörling B., Few S., Szumska A. A., Hou X., Zhang G., Yip H.-L., Campoy-Quiles M., Nelson J., Identifying structure–absorption relationships and predicting absorption strength of non-fullerene acceptors for organic photovoltaics. Energy Environ. Sci. 15, 2958–2973 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Yao C., Li X., Yang Y., Li L., Bo M., Peng C., Wang J., Machine learning with quantum chemistry descriptors: Predicting the solubility of small-molecule optoelectronic materials for organic solar cells. J. Mater. Chem. A 10, 15999–16006 (2022). [Google Scholar]
  • 27.Moore G. J., Bardagot O., Banerji N., Deep transfer learning: A fast and accurate tool to predict the energy levels of donor molecules for organic photovoltaics. Adv. Theor. Simul. 5, 2100511 (2022). [Google Scholar]
  • 28.Hachmann J., Olivares-Amaya R., Atahan-Evrenk S., Amador-Bedolla C., Sánchez-Carrera R. S., Gold-Parker A., Vogt L., Brockway A. M., Aspuru-Guzik A., The Harvard Clean Energy Project: Large-scale computational screening and design of organic photovoltaics on the world community grid. J. Phys. Chem. Lett. 2, 2241–2251 (2011). [Google Scholar]
  • 29.Hachmann J., Olivares-Amaya R., Jinich A., Appleton A. L., Blood-Forsythe M. A., Seress L. R., Román-Salgado C., Trepte K., Atahan-Evrenk S., Er S., Shrestha S., Mondal R., Sokolov A., Bao Z., Aspuru-Guzik A., Lead candidates for high-performance organic photovoltaics from high-throughput quantum chemistry-the Harvard Clean Energy Project. Energy Environ. Sci. 7, 698–704 (2014). [Google Scholar]
  • 30.Sahu H., Ma H., Unraveling correlations between molecular properties and device parameters of organic solar cells using machine learning. J. Phys. Chem. Lett. 10, 7277–7284 (2019). [DOI] [PubMed] [Google Scholar]
  • 31.Sahu H., Yang F., Ye X., Ma J., Fang W., Ma H., Designing promising molecules for organic solar cells via machine learning assisted virtual screening. J. Mater. Chem. A 7, 17480–17488 (2019). [Google Scholar]
  • 32.Lopez S. A., Sanchez-Lengeling B., de Goes Soares J., Aspuru-Guzik A., Design principles and top non-fullerene acceptor candidates for organic photovoltaics. Joule 1, 857–870 (2017). [Google Scholar]
  • 33.Peng S.-P., Zhao Y., Convolutional neural networks for the design and analysis of non-fullerene acceptors. J. Chem. Inf. Model. 59, 4993–5001 (2019). [DOI] [PubMed] [Google Scholar]
  • 34.Lee M. H., Insights from machine learning techniques for predicting the efficiency of fullerene derivatives-based ternary organic solar cells at ternary blend design. Adv. Energy Mater. 9, 1900891 (2019). [Google Scholar]
  • 35.Kranthiraja K., Saeki A., Machine learning-assisted polymer design for improving the performance of non-fullerene organic solar cells. ACS Appl. Mater. Interfaces 14, 28936–28944 (2022). [DOI] [PubMed] [Google Scholar]
  • 36.Hußner M., Pacalaj R. A., Olaf Müller-Dieckert G., Liu C., Zhou Z., Majeed N., Greedy S., Ramirez I., Li N., Hosseini S. M., Uhrich C., Brabec C. J., Durrant J. R., Deibel C., MacKenzie R. C. I., Machine learning for ultra high throughput screening of organic solar cells: Solving the needle in the haystack problem. Adv. Funct. Mater. 14, 2303000 (2024). [Google Scholar]
  • 37.Liu C., Lüer L., Corre V. M. L., Forberich K., Weitz P., Heumüller T., Du X., Wortmann J., Zhang J., Wagner J., Ying L., Hauch J., Li N., Brabec C. J., Understanding causalities in organic photovoltaics device degradation in a machine-learning-driven high-throughput platform. Adv. Mater. 36, e2300259 (2024). [DOI] [PubMed] [Google Scholar]
  • 38.Nagasawa S., Al-Naamani E., Saeki A., Computer-aided screening of conjugated polymers for organic solar cell: Classification by random forest. J. Phys. Chem. Lett. 9, 2639–2646 (2018). [DOI] [PubMed] [Google Scholar]
  • 39.Sahu H., Rao W., Troisi A., Ma H., Toward predicting efficiency of organic solar cells via machine learning and improved descriptors. Adv. Energy Mater. 8, 1801032 (2018). [Google Scholar]
  • 40.Sun W., Zheng Y., Yang K., Zhang Q., Shah A. A., Wu Z., Sun Y., Feng L., Chen D., Xiao Z., Lu S., Li Y., Sun K., Machine learning-assisted molecular design and efficiency prediction for high-performance organic photovoltaic materials. Sci. Adv. 5, eaay4275 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Padula D., Troisi A., Concurrent optimization of organic donor–acceptor pairs through machine learning. Adv. Energy Mater. 9, 1902463 (2019). [Google Scholar]
  • 42.Wu Y., Guo J., Sun R., Min J., Machine learning for accelerating the discovery of high-performance donor/acceptor pairs in non-fullerene organic solar cells. NPJ Comput. Mater. 6, 120 (2020). [Google Scholar]
  • 43.Kranthiraja K., Saeki A., Experiment-oriented machine learning of polymer: Non-fullerene organic solar cells. Adv. Funct. Mater. 31, 2011168 (2021). [Google Scholar]
  • 44.Shetty P., Adeboye A., Gupta S., Zhang C., Ramprasad R., Accelerating materials discovery for polymer solar cells: Data-driven insights enabled by natural language processing. Chem. Mater. 36, 7676–7689 (2024). [Google Scholar]
  • 45.Liu X., Zhang X., Sheng Y., Zhang Z., Xiong P., Ju X., Zhu J., Ye C., Advanced organic photovoltaic materials by machine learning-driven design with polymer-unit fingerprints. NPJ Comput. Mater. 11, 107 (2025). [Google Scholar]
  • 46.Tadokoro S., Kamimura R., Ishiwari F., Saeki A., Design of simple-structured conjugated polymers for organic solar cells by machine learning-assisted structural modification and experimental validation. Digit. Discov. 4, 3774–3781 (2025). [Google Scholar]
  • 47.Mahmood A., Wang J.-L., A time and resource efficient machine learning assisted design of non-fullerene small molecule acceptors for P3HT-based organic solar cells and green solvent selection. J. Mater. Chem. A 9, 15684–15695 (2021). [Google Scholar]
  • 48.Wen Y., Liu Y., Yan B., Gaudin T., Ma J., Ma H., Simultaneous optimization of donor/acceptor pairs and device specifications for nonfullerene organic solar cells using a QSPR model with morphological descriptors. J. Phys. Chem. Lett. 12, 4980–4986 (2021). [DOI] [PubMed] [Google Scholar]
  • 49.Majeed N., Saladina M., Krompiec M., Greedy S., Deibel C., MacKenzie R. C. I., Using deep machine learning to understand the physical performance bottlenecks in novel thin-film solar cells. Adv. Funct. Mater. 30, 1907259 (2019). [Google Scholar]
  • 50.Mahmood A., Sandali Y., Wang J.-L., Easy and fast prediction of green solvents for small molecule donor-based organic solar cells through machine learning. Phys. Chem. Chem. Phys. 25, 10417–10426 (2023). [DOI] [PubMed] [Google Scholar]
  • 51.Chen Y., Zhan C., Yao J., Understanding solvent manipulation of morphology in bulk-heterojunction organic solar cells. Chem. Asian J. 11, 2620–2632 (2016). [DOI] [PubMed] [Google Scholar]
  • 52.Li W., Xie C., Qin X., Pan F., Luo X., Li X., Zhang B., Wang Z., Wang X., Lv M., Synergistic regulation of the active layer morphology by halogenated methoxybenzene derivative solid additives for high-performance organic solar cells. Mater. Today Chem. 46, 102730 (2025). [Google Scholar]
  • 53.Zhu J., Tang A., Tang L., Cong P., Li C., Guo Q., Wang Z., Xu X., Wu J., Zhou E., Chlorination of benzyl group on the terminal unit of A2-A1-D-A1-A2 type nonfullerene acceptor for high-voltage organic solar cells. Chin. Chem. Lett. 36, 110233 (2025). [Google Scholar]
  • 54.Zhang W., Zhao K., Zhang N., Dong Q., Shen S., Lu H., Hu B., Zhao F., Yuan S., Lu G., Chen Y., Ma Z., Bo Z., Song J., Backbone twisting and terminal overlapping via π-bridge engineering for highly efficient non-fused ring electron acceptors with balanced JSC-VOC. Adv. Funct. Mater. 35, 2423242 (2025). [Google Scholar]
  • 55.In Kim D., Kim K., Park B., Kim J., Kim Y. H., Lee K., Kwon S. K., Lee J., Effect of side chain modification on edge-on oriented dithienobenzodithiophene-based non-fullerene acceptors for organic solar cells. J. Appl. Polym. Sci. 141, e56216 (2024). [Google Scholar]
  • 56.Choi M.-W., Park S., Kim S.-Y., Kim D. W., Kim S. I., Yoon W. S., Park S. Y., Naphthyridinedione-based multifunctional small molecules for applications in both photovoltaics and transistors. J. Mater. Chem. C 12, 12474–12482 (2024). [Google Scholar]
  • 57.Xie F., Li X., Wang Z., Du M., Ji M., Du J., Cong P., Guo Q., Zhou E., Shamrock-shaped nonfullerene acceptors containing nonhalogenated end groups for high-voltage organic photovoltaics. Dye Pigm. 235, 112616 (2025). [Google Scholar]
  • 58.Zhang N., Chen T., Li Y., Li S., Yu J., Liu H., Wang M., Ye X. K., Ding X., Lu X., Li C. Z., Zhu H., Shi M., Chen H., Benzothiadiazole-fused cyanoindone: A superior building block for designing ultra-narrow bandgap electron acceptor with long-range ordered stacking. Angew. Chem. Int. Ed. 64, e202420090 (2024). [DOI] [PubMed] [Google Scholar]
  • 59.Wei Q., Li Y., Chen W., Shi Q., Zhu S., Yan W., Zou Y., Efficient organic solar cells based on low-cost pentacyclic fused-ring small molecule acceptors with a fill factor over 80%. J. Mater. Chem. A 12, 30558–30566 (2024). [Google Scholar]
  • 60.Wang Y.-B., Tsai C.-L., Xue Y.-J., Jiang B.-H., Lu H.-C., Hong J.-C., Huang Y.-C., Huang K.-H., Chien S.-Y., Chen C.-P., Cheng Y.-J., Fluorinated and methylated ortho-benzodipyrrole-based acceptors suppressing charge recombination and minimizing energy loss in organic photovoltaics. Chem. Sci. 16, 3259–3274 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Zhang M., Wang Z., Zhu L., Zeng R., Xue X., Liu S., Yan J., Yang Z., Zhong W., Zhou G., Kan L., Xu J., Zhang A., Deng J., Zhou Z., Song J., Jing H., Xu S., Zhang Y., Liu F., Jamming giant molecules at interface in organic photovoltaics to improve performance and stability. Adv. Mater. 36, 2407297 (2024). [DOI] [PubMed] [Google Scholar]
  • 62.Wang P., Feng F., Liu T., Wang X., Yan X., Du Z., Yang C., Li Y., Bao X., New polymeric acceptors based on benzo[1,2-b:4,5-b′] difuran moiety for efficient all-polymer solar cells. Macromol. Rapid Commun. 46, 2400787 (2025). [DOI] [PubMed] [Google Scholar]
  • 63.Qiu D., Zhang L., Zhang H., Tang A., Zhang J., Wei Z., Lu K., Elucidating the effects of bromine substitution in asymmetric quinoxaline central core-based non-fullerene acceptors on molecular stacking and photovoltaic performances. J. Mater. Chem. A 13, 4237–4246 (2025). [Google Scholar]
  • 64.Wang Z., Zhu S., Peng X., Luo S., Liang W., Zhang Z., Dou Y., Zhang G., Chen S., Hu H., Chen Y., Regulating intermolecular interactions and film formation kinetics for record efficiency in difluorobenzothiadizole-based organic solar cells. Angew. Chem. Int. Ed. 64, e202412903 (2024). [DOI] [PubMed] [Google Scholar]
  • 65.Li H., Ren J., Ma L., Chen Z., Yu Y., Wang J., Zhang S., Chlorinated polythiophene-based donors with reduced energy loss for organic solar cells. Chin. J. Chem. 42, 3405–3413 (2024). [Google Scholar]
  • 66.Liu H., Zhang H., Li M., Wu D., Tang H., Zhang X., Huang M., Zhao B., Tuning molecular aggregation to enhance photovoltaic performance of polymers by isomerizing benzodithiophene moiety. Synth. Met. 310, 117783 (2025). [Google Scholar]
  • 67.Wang L., Zhu M., Gu T., Liang X., Pandey S. K., Xu H., Singhal R., Sharma G. D., Dimeric BODIPY donors based on the donor–acceptor structure for all-small-molecule organic solar cells. ACS Appl. Energy Mater. 7, 11195–11205 (2024). [Google Scholar]
  • 68.Steinbeck C., Han Y., Kuhn S., Horlacher O., Luttmann E., Willighagen E., The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 43, 493–500 (2003). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Dong J., Cao D.-S., Miao H.-Y., Liu S., Deng B.-C., Yun Y.-H., Wang N.-N., Lu A.-P., Zeng W.-B., Chen A. F., ChemDes: An integrated web-based platform for molecular descriptor and fingerprint computation. J. Cheminform. 7, 60 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Zhao Y., Truhlar D. G., The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: Two new functionals and systematic testing of four M06-class functionals and 12 other functionals. Theor. Chem. Acc. 120, 215–241 (2008). [Google Scholar]
  • 71.Hariharan P. C., Pople J. A., The influence of polarization functions on molecular orbital hydrogenation energies. Theor. Chim. Acta 28, 213–222 (1973). [Google Scholar]
  • 72.M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, G. A. Petersson, H. Nakatsuji, X. Li, M. Caricato, A. V. Marenich, J. Bloino, B. G. Janesko, R. Gomperts, B. Mennucci, H. P. Hratchian, J. V. Ortiz, A. F. Izmaylov, J. L. Sonnenberg, D. Williams-Young, F. Ding, F. Lipparini, F. Egidi, J. Goings, B. Peng, A. Petrone, T. Henderson, D. Ranasinghe, V. G. Zakrzewski, J. Gao, N. Rega, G. Zheng, W. Liang, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, K. Throssell, J. A. Montgomery, Jr., J. E. Peralta, F. Ogliaro, M. J. Bearpark, J. J. Heyd, E. N. Brothers, K. N. Kudin, V. N. Staroverov, T. A. Keith, R. Kobayashi, J. Normand, K. Raghavachari, A. P. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, J. M. Millam, M. Klene, C. Adamo, R. Cammi, J. W. Ochterski, R. L. Martin, K. Morokuma, O. Farkas, J. B. Foresman, D. J. Fox, Gaussian 16, Revision C.01, Gaussian Inc., Wallingford CT 2016.
  • 73.Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay É., Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011). [Google Scholar]


Articles from Science Advances are provided here courtesy of American Association for the Advancement of Science
