Scientific Data. 2025 Apr 23;12:679. doi: 10.1038/s41597-025-04959-0

PAH101: A GW+BSE Dataset of 101 Polycyclic Aromatic Hydrocarbon (PAH) Molecular Crystals

Siyu Gao 1,#, Xingyu Liu 1,#, Yiqun Luo 2, Xiaopeng Wang 3, Kaiji Zhao 1, Vincent Chang 1, Bohdan Schatschneider 4, Noa Marom 1,2,5
PMCID: PMC12019249  PMID: 40268957

Abstract

The excited-state properties of molecular crystals are important for applications in organic electronic devices. The GW approximation and Bethe-Salpeter equation (GW+BSE) is the state-of-the-art method for calculating the excited-state properties of crystalline solids with periodic boundary conditions. We present the PAH101 dataset of GW+BSE calculations for 101 molecular crystals of polycyclic aromatic hydrocarbons (PAHs) with up to ~500 atoms in the unit cell. To the best of our knowledge, this is the first GW+BSE dataset for molecular crystals. The data records include the GW quasiparticle band structure, the fundamental band gap, the static dielectric constant, the first singlet exciton energy (optical gap), the first triplet exciton energy, the dielectric function, and optical absorption spectra for light polarized along the three lattice vectors. The dataset can be used to (i) discover materials with desired electronic/optical properties, (ii) identify correlations between DFT and GW+BSE quantities, and (iii) train machine-learned models to help in materials discovery efforts.

Subject terms: Electronic structure, Materials for devices, Materials for optics, Electronic materials

Background & Summary

Computational materials design and discovery require exploring a practically infinite chemical space using quantum mechanical methods that can reliably predict the electronic and optical properties of candidate materials. The computational cost of quantum mechanical simulations increases rapidly with method accuracy and system size. This limits the scope of simulations that can be performed within a reasonable time in terms of the number of systems explored, their size, the accuracy of the predicted properties, and the types of phenomena that can be investigated1–7.

Density functional theory (DFT) is the workhorse of first-principles simulations8. DFT relies on approximate exchange-correlation functionals to describe the many-body quantum mechanical interactions between electrons. Computationally efficient semi-local functionals have been used extensively for high-throughput materials screening9–18. However, DFT is a ground-state theory and is therefore inherently unable to describe excited-state properties of interest, such as fundamental band gaps, singlet and triplet excitation energies, optical gaps (i.e., the first singlet excitation energy), and optical absorption spectra. The excited states of isolated molecules may be calculated relatively efficiently with time-dependent DFT (TDDFT)19–23. The excited states of crystalline systems may be calculated using Green’s function based many-body perturbation theory (MBPT) within the GW approximation and Bethe-Salpeter equation (BSE)24–27, which lends itself more easily than TDDFT to periodic implementations. Unfortunately, the high computational cost of GW+BSE simulations makes these methods infeasible for large-scale materials exploration.

Machine learning (ML) may accelerate computational materials discovery by bypassing the need to perform expensive first-principles simulations10,28–45. To this end, statistical models are constructed based on training data to make predictions for new data points. Training ML models, especially deep neural networks (DNN), typically requires huge datasets46,47. Therefore, data acquisition is often the bottleneck of applying ML to computational materials discovery. With the supercomputing resources available nowadays, acquiring DFT training data with semi-local functionals is relatively fast. This has led to the proliferation of DFT datasets28,48–55. As a result, ML models have been trained predominantly on semi-local DFT data, which limits their applicability to structural and ground-state properties. Owing to the high computational cost of GW+BSE, GW and GW+BSE datasets are scarce and the amount of data they contain is relatively small compared to DFT datasets54,56,57. We note that the GW datasets cited here comprise small isolated molecules, which are considerably faster to calculate than periodic molecular crystals with hundreds of atoms in the unit cell. Recently, ML has been applied to predict the GW quasiparticle energies of small molecules58,59.

It is challenging to construct transferable ML models based on “small data”. This has limited the applicability of ML to excited-state properties of molecular crystals. Emerging approaches to ML with small data include multi-fidelity methods, which combine a small amount of high-fidelity data with a large amount of low-fidelity data that, although not as accurate, is sufficiently correlated with the high-fidelity data for statistical inference60–69. Recently, high-quality results have been achieved by fine-tuning a pre-trained DNN model with small datasets or combining feature selection with DNN70,71. Other approaches involve using low-fidelity features, selected based on physical/chemical knowledge, to construct surrogate models that are predictive of high-fidelity data. One such approach is the sure-independence-screening-and-sparsifying-operator (SISSO)72,73 ML algorithm. The input of SISSO is a set of primary features, which are physical descriptors that could be correlated with the target property. SISSO generates a huge feature space by iteratively combining the primary features using linear and nonlinear algebraic operations. Subsequently, linear regression is performed to identify the most predictive models. Physical and chemical knowledge is leveraged in the choice of primary features and in the rules for combining them. SISSO has been demonstrated to work well with a relatively small amount of data for several different types of materials systems and properties13,74–92.

One application that requires predicting the excited-state properties of molecular crystals is singlet fission (SF), the conversion of one singlet exciton into two triplet excitons93–96. The efficiency of solar cells can be boosted by augmenting traditional absorbers with SF materials97–99. The SF material can convert photons with energies high above the traditional absorber’s band gap into two charge carriers instead of losing their excess energy to heat. Currently, few classes of materials are known to undergo intermolecular SF in the solid state, and insufficient stability under operating conditions precludes their utilization in commercial modules93,94,100,101. Therefore, there is a need for computational discovery of new SF materials. The primary criterion for a material to undergo SF is the thermodynamic driving force, i.e., the difference between the singlet exciton energy and twice the triplet exciton energy, E_S − 2E_T, which can be calculated using GW+BSE102–106.

Recently, we have used SISSO to find models based on low-cost DFT properties that can reliably predict the GW+BSE SF driving force107. SISSO generated several models that predicted the GW+BSE SF driving force with errors below 0.2 eV. Based on considerations of accuracy and computational cost, two SISSO models were selected to build a two-step hierarchical classifier for screening promising candidates for SF. To train SISSO, we generated a dataset of GW+BSE calculations of the SF driving force of 101 molecular crystals of polycyclic aromatic hydrocarbons (PAHs). PAHs are compounds comprising carbon and hydrogen atoms and containing multiple aromatic rings. Most SF materials are PAHs. In addition to SF, PAHs and their functionalized derivatives have versatile applications in organic electronic devices108–118. To form the PAH101 set, crystal structures of unsubstituted PAHs (containing only C and H atoms) were extracted from the Cambridge Structural Database (CSD)119. The PAH101 set contains several sub-classes including acenes, rylenes, zethrenes, as well as various compounds that do not belong to any particular family. As shown in Fig. 1, the PAH101 set contains molecules ranging in size from 12 atoms in benzene (CSD Reference: BENZEN) to 136 atoms in three molecules: the two pyrene-stabilized acenes 9,11,13,22,24,26-Hexaphenyltetrabenzo[derswxk1l1]nonacene (CSD Reference: KECLAH) and 9,11,13,14,15,16,18,20-Octaphenyldibenzo[dec1d1]heptacene (CSD Reference: TAYSUJ), and the phenylated pentacene 1,2,3,4,6,8,9,10,11,13-Decaphenylpentacene (CSD Reference: VEBJAO). The crystal size in the PAH101 set ranges from 44 atoms in the unit cell for biphenyl (CSD Reference: BIPHEN) to 544 atoms in 1,2,3,4,6,8,9,10,11,13-Decaphenylpentacene (CSD Reference: VEBJAO).

Fig. 1.

Statistics of PAH101. Histograms of the number of atoms (a) in a single molecule and (b) in a crystal unit cell for the materials in the PAH101 set. Also shown are illustrations of the molecular structures of benzene (BENZEN), 9,11,13,22,24,26-Hexaphenyltetrabenzo[derswxk1l1]nonacene (KECLAH), 9,11,13,14,15,16,18,20-Octaphenyldibenzo[dec1d1]heptacene (TAYSUJ), 1,2,3,4,6,8,9,10,11,13-Decaphenylpentacene (VEBJAO), and the crystal structures of 1,2,3,4,6,8,9,10,11,13-Decaphenylpentacene (VEBJAO) and Biphenyl (BIPHEN).

The PAH101 dataset contains GW+BSE results for the electronic and optical properties of molecular crystals, as well as the DFT-level SISSO primary features used in Ref. 107. We envision this dataset being used for computational discovery of crystalline organic semiconductors and chromophores with desired properties for applications in various organic electronic devices. The electronic and optical properties of most of the materials in the PAH101 set have not been thoroughly investigated experimentally. Some of the quantities calculated here, such as triplet excitation energies, are difficult to probe experimentally and require highly specialized techniques and facilities. Therefore, although the PAH101 set is relatively small, it is possible that some useful materials would be found in it. For example, the dataset contains information on optical gaps and absorption spectra, which could be used to search for chromophores that absorb light in a certain energy range. In addition, the dataset contains singlet and triplet excitation energies, which can be used to evaluate candidate chromophores for triplet-triplet annihilation (TTA) and thermally activated delayed fluorescence (TADF). TTA chromophores can be used for harvesting photons with energies below the absorption threshold of a solar cell by up-conversion of two low-energy triplet excitons into one singlet exciton that can be absorbed120,121. TADF chromophores can be used to enhance the efficiency of OLEDs by converting electrically generated non-radiative triplet excitons into radiative singlet excitons23,122,123.

The dataset also contains quantities related to charge separation and transport in organic devices. The band dispersion, which can be extracted from GW band structures in the dataset, is related to transport in crystalline organic semiconductors, which affects the performance of organic electronic devices such as organic field effect transistors (OFETs)124. The singlet exciton binding energy corresponds to the difference between the GW fundamental gap and the BSE optical gap. This is the energy required to split photogenerated excitons into free charge carriers in organic solar cells. In most organic materials the exciton binding energy is large compared to that of inorganic materials because the dielectric screening of charges is not as strong. However, some materials in the PAH101 set, characterized by very extended and/or elongated π systems, have low exciton binding energies, below 0.2 eV. The GW static dielectric constant is related to the strength of charge screening in the material, and consequently to the exciton binding energy and band dispersion. Organic materials typically have relatively low dielectric constants (around 3). The PAH101 set contains several materials with unusually high dielectric constants, ranging from 7 to 10.

Furthermore, this dataset can be used as a resource for comparing and benchmarking the performance of various electronic structure methods for calculating the electronic and optical properties of molecular crystals. Finally, this dataset can be used to augment other datasets, e.g., DFT datasets for molecular crystals or TDDFT datasets for isolated molecules, to train multi-fidelity ML models for predicting various electronic and optical properties of molecular crystals. In summary, because the PAH101 set is a unique collection of GW+BSE data for molecular crystals, we expect it to be a valuable resource for the computational community.

Methods

Hydrogen Addition

The starting geometries of the 101 molecular crystals were extracted from the Cambridge Structural Database (CSD)119. The CSD reference codes for each material are available in the data records. Some of the CIF files in the CSD are missing the hydrogen atom positions, which are difficult to determine reliably by X-ray diffraction. To provide an approximate position for each missing H atom, we have developed the Hydrogen Append (HAppend) code, available in the GitHub repo: https://github.com/BLABABA/HAppend. HAppend is written in Python and uses RDKit125 and Pymatgen126. The workflow of HAppend is illustrated in Fig. 2 using BEANTR as an example. All H atoms were removed from the CIF file for the purpose of demonstration. HAppend does not use the symmetry information provided in the CIF file. In step (1) the unit cell is replicated to build a super-cell so that any molecular fragments inside the unit cell can be completed. In step (2) all the complete molecules and molecular fragments are identified. Subsequently, any broken fragments at the super-cell boundary (colored in blue in Fig. 2) are removed. In step (3) all the complete molecules are extracted. Only two molecules are shown in Fig. 2 for demonstration purposes. In step (4) the missing hydrogen sites are identified and H atoms are appended to each molecule. A detailed schematic of step (4) is shown in the bottom row of Fig. 2. In step (4a) the missing hydrogen sites are identified by checking the type of hybridization of each carbon atom against the number of valence electrons participating in covalent bonds. In this example, all C atoms in the aromatic rings have sp2 hybridization. In step (4b) H atoms are attached to atoms with unpaired valence electrons. The bond length and angle are determined based on the bonded neighbors and hybridization type. In this example, given that the C atom is sp2 hybridized, the two H-C-C angles should be about 120°. This process is performed for all atoms in the BEANTR molecule and the completed molecule is obtained after step (4c). Step (5) reconstructs the complete super-cell with appended H atoms. Step (6) reduces the super-cell back to the original unit cell with all the coordinates for the missing H atoms now known. Finally, sanity checks are performed to verify that the structure is correct. The structure is checked against the expected chemical formula (if provided in the CIF file from the CSD). In addition, RDKit is used to repeat step (4b) and confirm that the explicit valence matches the type of hybridization for each atom. If the sanity check fails, the user may have to attach H atoms manually. HAppend is not limited to PAHs and may be used to add missing H atoms to other types of organic molecules.
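The geometric idea behind step (4b) for an sp2 carbon can be sketched as follows. This is a minimal illustration and not the HAppend implementation: the function names, the C-H bond length of 1.09 Å, and the test coordinates are assumptions made here. The H atom is placed in the direction opposite to the bisector of the two C-C bonds, which gives H-C-C angles of about 120° for an ideal aromatic ring.

```python
import math

# Assumed typical aromatic C-H bond length in angstrom (not taken from the paper).
CH_BOND = 1.09

def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def place_sp2_hydrogen(carbon, nbr1, nbr2, bond=CH_BOND):
    """Approximate H position for an sp2 carbon bonded to two heavy-atom neighbours."""
    u1 = _unit([n - c for n, c in zip(nbr1, carbon)])
    u2 = _unit([n - c for n, c in zip(nbr2, carbon)])
    # Point away from both neighbours, along the negative bisector of the two bonds.
    direction = _unit([-(a + b) for a, b in zip(u1, u2)])
    return [c + bond * d for c, d in zip(carbon, direction)]

def angle_deg(a, center, b):
    """Angle a-center-b in degrees."""
    va = [x - y for x, y in zip(a, center)]
    vb = [x - y for x, y in zip(b, center)]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return math.degrees(math.acos(dot / (na * nb)))

# Hypothetical ideal ring carbon: two neighbours at 1.40 angstrom, 120 degrees apart.
carbon = (0.0, 0.0, 0.0)
nbr1 = (1.40 * math.cos(math.radians(60)), 1.40 * math.sin(math.radians(60)), 0.0)
nbr2 = (1.40 * math.cos(math.radians(60)), -1.40 * math.sin(math.radians(60)), 0.0)
h_pos = place_sp2_hydrogen(carbon, nbr1, nbr2)
print(h_pos, angle_deg(h_pos, carbon, nbr1))
```

For non-ideal rings the two H-C-C angles come out equal to each other but not exactly 120°, which is consistent with the approximate positions described above; these are subsequently refined by the structural relaxation.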

Fig. 2.

Hydrogen addition. Schematic illustration of the workflow of adding missing H atoms with HAppend, demonstrated for the BEANTR crystal. The top row shows the steps of (1) super-cell construction, (2) removal of broken molecular fragments (colored in blue), (3) extraction of molecules, (4) addition of H atoms to all molecules, (5) reconstruction of the supercell with the H atoms attached to all molecules, and (6) reduction of the supercell to a single unit cell. For Steps (3) and (4) only two molecules are shown for clarity. The bottom row presents a detailed view of the hydrogen addition step: (a) identification of missing H sites, (b) calculation of approximate H atom positions, and (c) attachment of H atoms to the molecule.

Structural Relaxation

Full unit cell relaxation was performed with either CASTEP127 or FHI-aims128,129; the code used for each material is reported in the data records. The Perdew, Burke, and Ernzerhof (PBE) exchange-correlation functional130 was used with the Tkatchenko-Scheffler (TS) pairwise dispersion method131. For relaxations performed with CASTEP, norm-conserving pseudopotentials were used for carbon and hydrogen. The plane-wave basis set cutoff was 750 eV. A Monkhorst-Pack k-grid with a spacing of 0.07 Å⁻¹ was adopted. The convergence thresholds for total energy, maximum force, maximum stress, and maximum displacement were 5 × 10⁻⁶ eV/atom, 0.01 eV/Å, 0.02 GPa, and 5 × 10⁻⁴ Å, respectively. For structures relaxed with FHI-aims, the tight numerical settings and tier-2 basis sets were used. The fully relaxed crystal structures and the molecular geometries extracted from them are provided in the data records. The GW+BSE calculations were performed for the fully relaxed crystal structures.
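To give a feel for what a k-point spacing of 0.07 Å⁻¹ implies for cells of different sizes, the sketch below converts a spacing into Monkhorst-Pack subdivisions. It rests on two assumptions that are not stated in the text: reciprocal lattice vectors are taken without the factor of 2π, and the number of divisions is rounded up. The exact rule applied by a given code may differ, and the function name and example cell are hypothetical.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def mp_grid_from_spacing(lattice, spacing=0.07):
    """Monkhorst-Pack divisions for lattice vectors in angstrom and spacing in 1/angstrom.

    Assumes reciprocal vectors without the 2*pi factor and rounding up.
    """
    a, b, c = lattice
    volume = abs(sum(x * y for x, y in zip(a, cross(b, c))))
    reciprocal = [cross(b, c), cross(c, a), cross(a, b)]
    grid = []
    for vec in reciprocal:
        length = math.sqrt(sum(x * x for x in vec)) / volume
        grid.append(max(1, math.ceil(length / spacing)))
    return tuple(grid)

# Hypothetical orthorhombic cell with edges of 5, 10, and 20 angstrom.
example_cell = [(5.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 20.0)]
print(mp_grid_from_spacing(example_cell))
```

Longer lattice vectors receive fewer divisions, so large unit cells need far fewer k-points than small ones at the same spacing.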

DFT Features

The data records include the DFT primary features used for SISSO in Ref. 107. The DFT features of molecules and crystals were calculated using FHI-aims128,129. For computational efficiency, the DFT primary features were calculated with locally-optimized geometries. The crystal structures were relaxed with the lattice vectors fixed at the experimental values and the single molecule properties were calculated using molecules extracted from these locally-optimized crystal structures. All primary features were calculated with the PBE functional130 using the tight numerical settings and tier-2 basis sets of FHI-aims128. This procedure, followed in Ref. 107, was intended to simulate a screening workflow, in which the primary features are evaluated quickly and more accurate calculations are pursued only for materials predicted to be promising by the SISSO models.

Mean-Field Wave-Function Calculation

The Quantum ESPRESSO package132 was used to compute the DFT eigenvectors and eigenvalues, which served as the starting point for non-self-consistent GW+BSE calculations, using the PBE functional. The kinetic energy cutoff was 50 Ry. The k-point grids used for each material are reported in the data records. Norm-conserving pseudopotentials were chosen in order to take advantage of the simplification of matrix elements in GW+BSE calculations133. We used the Troullier-Martins norm-conserving pseudopotentials provided on the Quantum ESPRESSO website. These were generated using FHI98PP134 and converted with fhi2upf.x v.5.0.2 to the Quantum ESPRESSO format. The cutoff radii used for carbon (C) were 1.5 Bohr for s states and 1.5 Bohr for p states. For the s states of hydrogen (H) a cutoff radius of 0.8 Bohr was used. The reference configurations for C and H were 2s²2p² and 1s¹, respectively. The pseudopotential files used in our calculations are provided in the GitHub repository HAppend/UPF/.

GW+BSE Calculations

The BerkeleyGW package133 was used to perform GW+BSE calculations. To limit the computational cost, non-self-consistent G₀W₀ calculations were performed. A total of 550 unoccupied states were included in the evaluation of the dielectric function and the self-energy operator. The inverse microscopic dielectric function was calculated in the static limit at zero frequency. The generalized plasmon-pole model was used to obtain the self-energy at finite frequency. The static remainder correction135 was applied to accelerate convergence. The screened Coulomb cutoff was set to 10 Ry. For all other settings the default values of BerkeleyGW were used. The Bethe-Salpeter equation was solved within the Tamm-Dancoff approximation (TDA) with 24 valence bands and 24 conduction bands included. The fine-grid wave-functions were generated on a k-point grid twice as dense as the coarse k-point grid. The coarse and fine k-point grid settings for each material are reported in the data records.

Data Records

The PAH101 dataset136 is available via the NOvel MAterials Discovery (NOMAD) repository137 and can be accessed at DOI 10.17172/NOMAD/2024.12.05-1. The data are provided in YAML (.yaml) format. Each file is named CSD-REFERENCE.archive.yaml, where CSD-REFERENCE is the CSD reference code for the structure. The data structure for each material record is described in Table 1. The top-level sections are struct_id, geometry, dft, and gwbse. The struct_id section contains the CSD reference code. The geometry section provides the fully relaxed crystal structure and the single molecule geometry extracted from it. The dft section contains all the SISSO primary features used in Ref. 107. The gwbse section provides quasiparticle (QP) and excitonic properties for the PAH101 crystals, including the fundamental gap, quasiparticle band structure, the static dielectric constant, the first singlet exciton energy (optical gap), the first triplet exciton energy, the full dielectric function, and optical absorption spectra for light polarized along the three lattice vectors. The GW static dielectric constant is not available for some of the materials in the dataset because some data that were not needed for Ref. 107 were not preserved.

Table 1.

Data records: Description of the data structure of the PAH101 set with explanations of all entries. Abbreviated notations for the DFT primary features are provided in parentheses.

| Section | Entry | Description |
| --- | --- | --- |
| struct_id | | the CSD reference code for this structure |
| geometry | relaxed_crystal | the DFT-relaxed crystal structure saved in Pymatgen Structure format |
| | molecule | the single molecule geometry extracted from relaxed_crystal, saved in Pymatgen Molecule format |
| | chemical_formula | chemical formula of the single molecule |
| | relax_code | code used to perform crystal structure relaxation |
| dft | gap_s (GapS) | the single molecule gap, calculated based on the energy difference between the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO) |
| | Et_s (ETS) | the single molecule triplet formation energy, calculated based on the total energy difference between the ground-state and triplet-state molecule |
| | DF_s | the single molecule DFT estimate for the SF driving force, calculated by taking the difference between gap_s and twice Et_s |
| | IP_s (IPS) | the single molecule ionization potential (IP), calculated based on the total energy difference between a cation and neutral molecule |
| | EA_s (EAS) | the single molecule electron affinity (EA), calculated based on the total energy difference between an anion and neutral molecule |
| | bandgap (GapC) | the crystal band gap |
| | Et (ETC) | the crystal triplet formation energy, calculated based on the total energy difference between the ground-state and triplet-state crystal |
| | DF | the crystal DFT estimate for the SF driving force, calculated by taking the difference between bandgap and twice Et |
| | VBdisp (VBdispC) | the valence band dispersion, i.e., the energy range of the HOMO-derived band |
| | CBdisp (CBdispC) | the conduction band dispersion, i.e., the energy range of the LUMO-derived band |
| | hab (Hab) | the transfer integral, calculated with fragment orbital DFT172 |
| | polarization (PolarTensorS) | the trace of the polarization tensor for a single molecule, calculated with DFT using the PBE functional and the range-separated self-consistently screened version of many-body dispersion (MBD@rsSCS) method143,173 |
| | epsilon_mbd (ϵC) | the dielectric constant calculated with PBE+MBD@rsSCS |
| | weight_s (MolWtS) | the molecular weight in atomic mass units (amu) |
| | density (ρC) | the crystal density in amu Å⁻³ |
| | eigenvalues | the eigenvalues for the single molecules, data stored as n × 4 matrix, whose columns are: State, Occupation, Eigenvalue [Ha], Eigenvalue [eV] |
| | kgrid | the k-grid settings for the calculation of crystal primary features |
| gwbse | absorption: a, b, c | optical absorption spectrum for light polarized along the a, b, and c crystal axes. Each absorption data record contains four columns: energy (eV), the imaginary and real parts of the dielectric function ϵ₂ and ϵ₁, and the normalized joint density of states |
| | bandstructure: kpoints | the high-symmetry k-point path used to calculate the GW band structure |
| | bandstructure: val | the values of band structure, saved as n × 8 matrix, whose columns are: spin, band index, k-point coordinate x, k-point coordinate y, k-point coordinate z, mean-field energy, quasi-particle energy, difference between mean-field and quasi-particle energy |
| | bse_Es | the singlet exciton energy (optical gap) calculated with BSE |
| | bse_Et | the triplet exciton energy calculated with BSE |
| | bse_DF | the SF driving force for a crystal, bse_Es − 2 × bse_Et |
| | kgrid_coarse | the k-grid used for coarse grid wave-function calculation |
| | kgrid_fine | the k-grid used for fine grid wave-function calculation |
| | fundamental_gap | the fundamental gap calculated with GW |
| | bse_Es_bind | the singlet-state exciton binding energy |
| | bse_Et_bind | the triplet-state exciton binding energy |
| | epsilon_gw | the static dielectric constant calculated within the random phase approximation (RPA); N.A. entered if not available |
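As an illustration of how the gwbse entries in Table 1 relate to one another, the sketch below recomputes the derived quantities from a record held as a nested Python dictionary. The numerical values are placeholders invented for this example and are not PAH101 data. The nesting of entries directly under a gwbse key is an assumption based on Table 1; the actual NOMAD archive files may wrap the sections differently. The triplet binding energy is assumed here to be defined analogously to the singlet one. A real file could be read with PyYAML's yaml.safe_load before calling the function.

```python
# Placeholder record mimicking the layout of Table 1 (values are NOT from PAH101).
record = {
    "struct_id": "EXAMPLE",
    "gwbse": {
        "fundamental_gap": 2.50,  # GW fundamental gap in eV (made-up value)
        "bse_Es": 2.00,           # first singlet exciton energy in eV (made-up value)
        "bse_Et": 1.10,           # first triplet exciton energy in eV (made-up value)
    },
}

def derived_quantities(rec):
    """Recompute binding energies and the SF driving force from a gwbse section."""
    g = rec["gwbse"]
    return {
        # Singlet binding energy: GW fundamental gap minus BSE optical gap.
        "bse_Es_bind": g["fundamental_gap"] - g["bse_Es"],
        # Triplet binding energy: assumed to be defined in the same way.
        "bse_Et_bind": g["fundamental_gap"] - g["bse_Et"],
        # SF driving force: singlet energy minus twice the triplet energy.
        "bse_DF": g["bse_Es"] - 2 * g["bse_Et"],
    }

derived = derived_quantities(record)
for key, value in derived.items():
    print(f"{key}: {value:.2f} eV")
```

Comparing such recomputed values against the stored bse_Es_bind, bse_Et_bind, and bse_DF entries is a simple consistency check when parsing the files.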

Technical Validation

Crystal Structures

To verify the results of full unit cell relaxation with PBE+TS, the root-mean-square distance (RMSD) between the relaxed structures and the experimental structures was calculated. We used the COMPACK138 molecular overlay method, implemented as the Crystal Packing Similarity tool in Mercury 2023.2.0139. COMPACK overlays clusters of molecules taken from each crystal, within given distance and angle tolerances, and minimizes the RMSD between atoms, typically excluding hydrogen. The output of COMPACK is the number of molecules that could be overlaid and the RMSD. COMPACK comparisons were performed with a cluster of 30 molecules and distance and angle tolerances of 35% and 35°. H atoms were not included. These were the settings used for structure comparison in the 7th crystal structure prediction blind test140,141. Figure 3 shows a histogram of the RMSD obtained for the PAH101 set. For the majority of the structures in the dataset the RMSD is below 0.3 Å. The three structures with the largest RMSDs are BNPERY, BIFUOR, and BEANTR. All three are monoclinic structures with larger than average deviations in their b lattice parameter and β angle. For instance, the relaxed b parameter of BEANTR is 6.00 Å, compared to 6.50 Å in the experimental structure, a deviation of 7.7%. The relaxed structure of BNPERY has a β angle of 92.2°, compared to 98.5° in the experimental structure. Some differences between structures relaxed by DFT at 0 K and structures experimentally characterized at room temperature are to be expected142. Overall, the performance of PBE+TS is within the community-accepted standards of agreement with experiment, as established in the crystal structure prediction blind tests140,141. It is possible that performing relaxations with the more accurate many-body dispersion (MBD)143 method would reduce the RMSD.
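The two metrics quoted above can be illustrated with a short sketch. This is not COMPACK: it assumes the two sets of atomic positions are already matched atom by atom and superimposed, whereas COMPACK also selects the molecular cluster and optimizes the overlay. The function names and the test coordinates are hypothetical; only the BEANTR lattice parameters are taken from the text.

```python
import math

def percent_deviation(relaxed, experimental):
    """Relative deviation of a relaxed lattice parameter from experiment, in percent."""
    return 100.0 * abs(relaxed - experimental) / experimental

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered, already superimposed sets of positions."""
    squared = [sum((x - y) ** 2 for x, y in zip(p, q))
               for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

# b lattice parameter of BEANTR: 6.00 angstrom relaxed vs 6.50 angstrom experimental.
print(round(percent_deviation(6.00, 6.50), 1))

# Hypothetical two-atom example with a rigid shift of 0.3 angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.3)]
print(rmsd(reference, shifted))
```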

Fig. 3.

Crystal structure validation. Histogram of the RMSD₃₀ of crystal structures relaxed with PBE+TS compared to the experimental structures from the CSD for the PAH101 set. The similarity overlay plots generated by Mercury are shown for BNPERY, BIFUOR, and BEANTR with the experimental structures colored in gray and the relaxed structures colored in green.

GW+BSE Convergence

The results of GW+BSE calculations with the BerkeleyGW code are sensitive to the convergence of several parameters57,144,145. Because of the large number of calculations performed for the PAH101 set, we have chosen parameters that provide a balance between accuracy and computational cost. The settings used for the PAH101 dataset have been demonstrated previously to be sufficiently converged for selected systems102,107. Figure 4a,b shows the convergence with respect to the coarse k-point grid used in the GW step for representative materials. The required number of k-points is inversely proportional to the unit cell size. Benzene has the smallest unit cell in the PAH101 set and therefore requires a relatively large number of k-points. 9,9’-bifluorenyl (CSD reference code BIFUOR) represents a system of intermediate size. For both materials, increasing the number of k-points beyond the chosen settings leads to a change of less than 0.001 eV in the GW band gap.

Fig. 4.

Convergence of GW+BSE calculations. Change in the GW band gap as a function of the coarse k-point grid for (a) benzene and (b) 9,9’-bifluorenyl (BIFUOR) with respect to the finest k grid considered. (c) Change in the GW band gap as a function of the number of empty bands for fluoranthene (FLUANT02), 6-phenylpentacene (VEBKAP), and 1,2,3,4,6,8,9,10,11,13-decaphenylpentacene (VEBJAO) with respect to the extrapolated value. The dashed lines are a hyperbolic fit to the data. (d) Change in the optical gap of fluoranthene, chrysene (CRYSEN01), and triphenylene (TRIPHE12) as a function of the number of valence and conduction bands used in the BSE step with respect to the highest number of bands considered. The chosen settings are circled in red in Panels a-d. Absorption spectra obtained using an increasing number of bands in the BSE step for light polarized along the a-axis of (e) chrysene and (f) triphenylene.

Figure 4c shows the convergence with respect to the number of bands used in the GW step for the representative materials fluoranthene (CSD reference code FLUANT02), 6-phenylpentacene (CSD reference code VEBKAP), and 1,2,3,4,6,8,9,10,11,13-decaphenylpentacene (CSD reference code VEBJAO). The latter has the largest number of atoms in the unit cell in the PAH101 set and is therefore expected to require the most empty bands. To extrapolate the GW band gap to the limit of an infinite number of empty bands we applied a hyperbolic fit146: f(N) = a/(N − N₀) + b, where N is the number of bands and a, b, and N₀ are fitting parameters. In the limit N → ∞ the fit tends to b, which is therefore the extrapolated gap. For all three materials the difference between the GW band gap obtained with 550 empty bands and the extrapolated value is below 0.08 eV.
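A hyperbolic extrapolation of this form can be sketched in a few lines. The fitting strategy used here (scanning N₀ over a grid and solving for a and b by linear least squares at each grid point) is a choice made for this example and is not necessarily how the fit was performed for the dataset. The band counts and gap values are synthetic numbers generated from a known hyperbola, not PAH101 results.

```python
def fit_hyperbola(n_bands, gaps, n0_grid):
    """Fit f(N) = a / (N - N0) + b by scanning N0 and solving for a, b linearly.

    Every value in n0_grid must be smaller than the smallest entry of n_bands.
    Returns (a, b, n0); b is the gap extrapolated to infinitely many bands.
    """
    best = None
    for n0 in n0_grid:
        x = [1.0 / (n - n0) for n in n_bands]
        mean_x = sum(x) / len(x)
        mean_y = sum(gaps) / len(gaps)
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, gaps))
        a = sxy / sxx
        b = mean_y - a * mean_x
        sse = sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, gaps))
        if best is None or sse < best[0]:
            best = (sse, a, b, n0)
    return best[1], best[2], best[3]

# Synthetic convergence data generated from a = -30, b = 3.0 eV, N0 = 50.
bands = [150, 250, 350, 450, 550]
synthetic_gaps = [-30.0 / (n - 50) + 3.0 for n in bands]

a_fit, b_fit, n0_fit = fit_hyperbola(bands, synthetic_gaps, range(0, 140, 5))
print(f"extrapolated gap: {b_fit:.3f} eV")
print(f"gap at 550 bands minus extrapolated: {synthetic_gaps[-1] - b_fit:.3f} eV")
```

The difference printed in the last line is the quantity compared against the convergence threshold in the text.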

Figure 4d shows the convergence with respect to the number of fine bands used in the BSE step for the representative examples of fluoranthene, triphenylene (TRIPHE12), and chrysene (CRYSEN01). Increasing the number of valence and conduction bands used for the BSE step beyond 24 leads to a change of less than 0.06 eV in the optical gap. Figure 4e,f shows the absorption spectra obtained with an increasing number of fine bands for light polarized along the a-axis of chrysene and triphenylene. In both cases, the spectrum obtained with 12 fine bands is clearly unconverged. The spectra obtained with 24 and 36 fine bands are similar up to about 8 eV for chrysene and about 9 eV for triphenylene. In general, if one wishes to converge the absorption spectrum tightly up to a high energy, then a larger number of fine bands should be used.

The settings used for the PAH101 set are sufficiently robust for “production” calculations. Notably, since the PAH101 set was generated, there have been advances in streamlining the convergence of MBPT calculations147–150. These have focused primarily on inorganic crystals with a few atoms in the unit cell. A workflow that converges the settings for each system individually would be too expensive for systems of the size of the PAH101 set. If a certain material is of particular interest, then more detailed calculations may be pursued with ultra-converged settings and/or more accurate methods than G₀W₀@PBE.

Optical Absorption

The GW+BSE approach has been benchmarked extensively for isolated molecules, for which high-level quantum chemistry reference data can be calculated57,151–155. For molecular crystals no benchmark studies are available, owing to the difficulty of obtaining reference data for large systems with periodic boundary conditions. Therefore, we are only able to validate the results of GW+BSE by comparison to experiments. Table 2 shows a comparison of the GW+BSE optical gaps (singlet exciton energies) to experimental values and GW+BSE values reported by others, where available. The GW+BSE values reported here are within 0.2 eV of the values reported by others. The results of GW+BSE calculations can differ because of differences in the implementation and convergence settings, as discussed extensively in Ref. 57. Because the absorption edge is not abrupt, Tauc plots are typically used to extract the optical gap from absorption spectra156–159. This can lead to some uncertainty in the experimental values. Here, if multiple experimental values are found for the same material, they are within 0.1 eV of each other in most cases. For the entries marked with *, we used the Tauc method to extract the optical gap from the experimental data because no value for the optical gap was reported in the paper. For the entries marked with **, there is a larger uncertainty in the optical gaps extracted using the Tauc method because the absorption edge does not decay to zero. In most cases, the GW+BSE optical gaps are within 0.2 eV of the experimental values.
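The Tauc extrapolation mentioned above can be sketched as a linear fit to the absorption edge followed by an extrapolation to zero. The text does not state which Tauc exponent or fitting window was used, so both are assumptions in this example: the exponent defaults to 2 (the form commonly used for direct allowed transitions) and the window is chosen by hand. The spectrum is synthetic, constructed so that the edge corresponds to a gap of 2.5 eV.

```python
import math

def tauc_gap(energies, absorbance, window, power=2.0):
    """Estimate the optical gap from a Tauc plot.

    Fits a straight line to (absorbance * energy) ** power within `window`
    (an energy range in eV chosen by the user) and returns its x-intercept.
    The exponent is an assumption; it depends on the type of transition.
    """
    lo, hi = window
    points = [(e, (a * e) ** power)
              for e, a in zip(energies, absorbance) if lo <= e <= hi]
    n = len(points)
    mean_e = sum(e for e, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((e - mean_e) * (y - mean_y) for e, y in points)
             / sum((e - mean_e) ** 2 for e, _ in points))
    intercept = mean_y - slope * mean_e
    return -intercept / slope

# Synthetic edge obeying (absorbance * E)^2 = E - 2.5, i.e. a gap of 2.5 eV.
energies = [2.6, 2.7, 2.8, 2.9, 3.0]
absorbance = [math.sqrt(e - 2.5) / e for e in energies]

estimated_gap = tauc_gap(energies, absorbance, window=(2.6, 3.0))
print(f"Tauc gap: {estimated_gap:.3f} eV")
```

When the measured absorption does not decay to zero below the edge, the fitted line depends more strongly on the chosen window, which is the source of the larger uncertainty noted for the entries marked with **.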

Table 2.

Optical gaps obtained using GW+BSE ($E_g^{GW+BSE}$) compared with experimental values ($E_g^{Exp}$) and GW+BSE values reported by others ($E_g^{GW+BSE}$ in literature), where available.

| CSD Ref. Code | Compound Name | $E_g^{GW+BSE}$ (eV) in PAH101 | $E_g^{Exp}$ (eV) | $E_g^{GW+BSE}$ (eV) in literature |
|---|---|---|---|---|
| BENZEN | Benzene | 4.83 | 4.69-4.8 [174] | 5.0 [167] |
| ANTCEN | Anthracene | 3.22 | 3.16 [175] | 3.3 [167] |
| TETCEN01 | Tetracene | 2.24 | 2.38 [176,177] | 2.4 [167] |
| PENCEN | Pentacene | 1.72 | 1.8-1.85 [178–180] | 1.7-1.8 [167,181,182] |
| ZZZDKE01 | Hexacene | 1.17 | 1.37*-1.4 [183–185] | 1.0 [167] |
| QQQCIG04 | Rubrene (Orthorhombic) | 2.28 | 2.32 [186] | |
| QQQCIG13 | Rubrene (Monoclinic) | 2.62 | 2.36 [187] | |
| QQQCIG14 | Rubrene (Triclinic) | 2.30 | 2.31 [187] | |
| PERLEN05 | Perylene (SHB) | 2.61 | 2.58* [188,189] | |
| PERLEN07 | Perylene (HB) | 2.45 | 2.49* [188,189] | |
| POBPIG | Diindeno[1,2,3-cd:1′,2′,3′-lm]perylene | 2.21 | 2.25 [190] | |
| QUATER10 | Quaterrylene | 1.33 | 1.48-1.60 [134,191,192] | |
| CORONE01 | Coronene | 2.96 | 2.9-2.92* [193,194] | |
| HBZCOR | Hexabenzo(bc,ef,hi,kl,no,qr)coronene | 2.70 | 2.80 [142,162] | |
| BEANTR | 1,2-Benzanthracene | 3.27 | 3.14 [195] | |
| BIPHEN | Biphenyl | 3.41 | 4.1-4.18 [196–199] | |
| CRYSEN01 | Chrysene | 3.66 | 3.6** [160] | |
| TERPHE02 | p-Terphenyl | 4.17 | 3.9** [160] | |
| BNPERY | 1,12-Benzoperylene | 2.80 | 2.4-2.5* [200] | |
| KUBVUY | 10,10’-Diphenyl-9,9’-bianthryl | 3.23 | 2.9* [201] | |
| KUBWAF01 | 9,9’-Bianthracenyl | 3.05 | 2.7-2.8* [202] | |

Entries marked with * were extracted by us from absorption spectra using the Tauc method. Entries marked with ** have an absorption spectrum that is non-zero in the low-energy region, leading to a larger uncertainty in the optical gaps extracted using the Tauc method.
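As a rough check of the statement that most GW+BSE optical gaps lie within 0.2 eV of experiment, the Table 2 values can be compared programmatically. The sketch below is illustrative: it assumes the reading of Table 2 in which trailing digits are citation numbers, it treats an experimental range as an interval and measures the distance to its nearest end, and the names `TABLE2` and `deviation` are hypothetical.

```python
# (CSD refcode, GW+BSE gap in PAH101, experimental low, experimental high), in eV.
# Single experimental values are entered with low == high.
TABLE2 = [
    ("BENZEN", 4.83, 4.69, 4.80),
    ("ANTCEN", 3.22, 3.16, 3.16),
    ("TETCEN01", 2.24, 2.38, 2.38),
    ("PENCEN", 1.72, 1.80, 1.85),
    ("ZZZDKE01", 1.17, 1.37, 1.40),
    ("QQQCIG04", 2.28, 2.32, 2.32),
    ("QQQCIG13", 2.62, 2.36, 2.36),
    ("QQQCIG14", 2.30, 2.31, 2.31),
    ("PERLEN05", 2.61, 2.58, 2.58),
    ("PERLEN07", 2.45, 2.49, 2.49),
    ("POBPIG", 2.21, 2.25, 2.25),
    ("QUATER10", 1.33, 1.48, 1.60),
    ("CORONE01", 2.96, 2.90, 2.92),
    ("HBZCOR", 2.70, 2.80, 2.80),
    ("BEANTR", 3.27, 3.14, 3.14),
    ("BIPHEN", 3.41, 4.10, 4.18),
    ("CRYSEN01", 3.66, 3.60, 3.60),
    ("TERPHE02", 4.17, 3.90, 3.90),
    ("BNPERY", 2.80, 2.40, 2.50),
    ("KUBVUY", 3.23, 2.90, 2.90),
    ("KUBWAF01", 3.05, 2.70, 2.80),
]

THRESHOLD_EV = 0.2
TOLERANCE = 1e-9  # guards against floating-point noise at exactly 0.2 eV


def deviation(calculated, exp_low, exp_high):
    """Distance from the calculated gap to the experimental interval (0 if inside)."""
    if calculated < exp_low:
        return exp_low - calculated
    if calculated > exp_high:
        return calculated - exp_high
    return 0.0


within = [row[0] for row in TABLE2
          if deviation(row[1], row[2], row[3]) <= THRESHOLD_EV + TOLERANCE]
outliers = [row[0] for row in TABLE2 if row[0] not in within]

print(f"{len(within)} of {len(TABLE2)} entries within {THRESHOLD_EV} eV of experiment")
print("larger deviations:", ", ".join(outliers))
```

Under these assumptions, 15 of the 21 entries fall within 0.2 eV. Several of the remaining entries are ones marked with * or **, for which the experimental value itself carries a larger uncertainty.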

The GW+BSE absorption spectra are validated by comparison to thin film experimental data for representative materials [160–162]. For an anisotropic crystal, the absorbance depends on the polarization direction of the incident light. Most absorption experiments are performed on polycrystalline samples, and even in experiments performed on single crystals, the crystallographic orientation of the sample with respect to the polarization of the incident light is often unknown. This introduces some uncertainty in the comparison with experiments. We calculate the absorbance for light polarized along the a, b, and c lattice vectors and normalize the maximum of the total absorbance to one. The results are shown in Fig. 5. For 1,2-benzanthracene (BEANTR), coronene (CORONE01), and hexabenzo(bc,ef,hi,kl,no,qr)coronene (HBZCOR), the agreement of the GW+BSE spectra with experiment is very good. For chrysene (CRYSEN01), p-terphenyl (TERPHE02), and triphenylene (TRIPHE12), the agreement is more qualitative.
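The normalization described above can be sketched as follows. This is a minimal illustration that assumes the total absorbance is the sum of the three polarization-resolved spectra sampled on a common energy grid; the function name `normalize_by_total` and the sample numbers are hypothetical and are not taken from the dataset.

```python
def normalize_by_total(abs_a, abs_b, abs_c):
    """Scale three polarization-resolved spectra so the total peaks at one.

    abs_a, abs_b, abs_c : absorbance for light polarized along a, b, and c,
                          sampled on the same energy grid
    Returns (a, b, c, total) after applying one common scale factor, so the
    relative weights of the three polarizations are preserved.
    """
    if not (len(abs_a) == len(abs_b) == len(abs_c)):
        raise ValueError("spectra must share the same energy grid")

    # Assumption: the total is the plain sum of the three components.
    total = [a + b + c for a, b, c in zip(abs_a, abs_b, abs_c)]
    peak = max(total)
    if peak <= 0.0:
        raise ValueError("total absorbance has no positive maximum")

    scale = 1.0 / peak
    return (
        [x * scale for x in abs_a],
        [x * scale for x in abs_b],
        [x * scale for x in abs_c],
        [x * scale for x in total],
    )


# Made-up spectra on a four-point grid, for illustration only.
a_n, b_n, c_n, total_n = normalize_by_total(
    [0.0, 1.0, 2.0, 0.5],
    [0.0, 2.0, 1.0, 0.5],
    [1.0, 1.0, 1.0, 0.0],
)
print(total_n)
```

Using a single scale factor for all three components, rather than normalizing each one separately, keeps the anisotropy of the response visible in the plot.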

Fig. 5.

Absorption spectra. Absorption spectra calculated using GW+BSE compared with thin film experiments [160–162] for (a) 1,2-benzanthracene (BEANTR), (b) coronene (CORONE01) with the region around the absorption edge magnified for clarity, (c) chrysene (CRYSEN01), (d) hexabenzo(bc,ef,hi,kl,no,qr)coronene (HBZCOR), (e) p-terphenyl (TERPHE02), and (f) triphenylene (TRIPHE12).

In addition to the unknown direction of the polarization with respect to the crystal axes, there are other factors, on both the experimental side and the theoretical side, that can contribute to discrepancies. In Ref. 160, the crystal structure of the films is not reported. The crystal structures used in our calculations are the common forms of p-terphenyl and chrysene, but both materials have other polymorphs reported in the CSD (for triphenylene, all CSD entries appear to be the same structure, but we cannot rule out the appearance of a different thin film polymorph). In polycrystals, there can be contributions from grain boundaries (in samples comprising very small crystallites, which is not the case here, there can be surface contributions as well). Furthermore, we do not consider vibrational contributions in our simulations. Sources of errors in GW+BSE calculations include the DFT exchange-correlation functional used for the mean-field starting point, numerical convergence of various settings (k-point grids, the number of empty states used in the GW step, the number of bands used in the BSE step), the non-self-consistency in the GW step, the plasmon pole approximation used in the GW step [57], the Tamm-Dancoff approximation used in the BSE step [25,163–168], and the static approximation for W used in the BSE step [163,169]. See also Ref. 170 for additional discussion. The significance of different sources of errors can be material dependent. In the future, it would be desirable to rigorously assess the contributions of different sources of errors in GW+BSE by comparison to high-level theories or well-controlled experiments (performed on single crystals with well-defined polarization) for a diverse benchmark set of molecular crystals.

Usage Notes

The PAH101 set is currently the largest available collection of GW+BSE data for molecular crystals. As such, it offers unique opportunities to (i) discover materials with desired electronic/optical properties in the dataset itself, (ii) learn about correlations between DFT and GW+BSE values of various properties, and (iii) train machine learning models to help in materials discovery efforts. Examples of these use cases are provided in the Supplementary Information (SI).

The prospect of discovering materials with potentially useful electronic and optical properties for organic devices is demonstrated in the SI. The PAH101 dataset contains materials with a broad range of optical gaps and triplet exciton energies. Based on the singlet and triplet excitation energies, materials can be evaluated as prospective candidates for SF, TTA, and TADF to improve the efficiency of organic solar cells and OLEDs. The PAH101 dataset also contains band structures, exciton binding energies, and static dielectric constants. Notably, a few of the materials in the dataset have particularly low singlet exciton binding energies and particularly high dielectric constants. In addition to containing information on potentially useful materials, the dataset could be used to identify structure-property correlations.
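A screening step of the kind described above might look like the following sketch. It uses only the standard energetic conditions: singlet fission (SF) is exoergic when the singlet energy is at least twice the triplet energy, triplet-triplet annihilation (TTA) is energetically allowed when twice the triplet energy is at least the singlet energy, and thermally activated delayed fluorescence (TADF) requires a small singlet-triplet gap. The 0.2 eV TADF cutoff, the class and function names, and the three placeholder entries are assumptions made for illustration; they are not values or criteria taken from the dataset or the SI.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    fundamental_gap: float  # GW fundamental band gap (eV)
    singlet: float          # first singlet exciton energy, i.e. optical gap (eV)
    triplet: float          # first triplet exciton energy (eV)


def exciton_binding_energy(entry):
    """Singlet exciton binding energy: fundamental gap minus optical gap."""
    return entry.fundamental_gap - entry.singlet


def singlet_triplet_gap(entry):
    return entry.singlet - entry.triplet


def sf_driving_force(entry):
    """E_S - 2 E_T; non-negative values indicate exoergic singlet fission."""
    return entry.singlet - 2.0 * entry.triplet


def classify(entry, tadf_max_gap=0.2):
    """Return energetics-based labels; tadf_max_gap is a hypothetical cutoff."""
    labels = []
    if sf_driving_force(entry) >= 0.0:
        labels.append("SF")
    if sf_driving_force(entry) <= 0.0:
        labels.append("TTA")
    if singlet_triplet_gap(entry) <= tadf_max_gap:
        labels.append("TADF")
    return labels


# Placeholder entries with made-up energies, for illustration only.
candidates = [
    Entry("placeholder_A", fundamental_gap=2.2, singlet=1.7, triplet=0.8),
    Entry("placeholder_B", fundamental_gap=4.0, singlet=3.2, triplet=2.0),
    Entry("placeholder_C", fundamental_gap=3.0, singlet=2.5, triplet=2.4),
]
for entry in candidates:
    print(entry.name, classify(entry),
          f"E_b = {exciton_binding_energy(entry):.2f} eV")
```

These labels reflect thermodynamic feasibility only. Whether a material performs well in a device also depends on factors such as coupling strengths and competing decay channels, which this sketch does not address.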

In materials discovery workflows it is desirable to use models that are fast to evaluate for preliminary screening of a large number of candidates. Semi-local DFT has been used extensively for this purpose. However, such models must be sufficiently reliable to at least capture the correct trends. In the SI, we present statistical analysis across our dataset to examine whether selected DFT models are sufficiently predictive of GW+BSE quantities. The PAH101 dataset may similarly serve as a resource for researchers interested in comparing the results of other DFT and TDDFT models to GW+BSE.
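One simple way to examine whether a DFT quantity tracks its GW+BSE counterpart is to compute a correlation coefficient and a linear fit. The sketch below is a generic illustration and not the analysis presented in the SI; the function names and the five paired values are placeholders and are not taken from the dataset.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(predicted, reference):
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Placeholder gaps in eV (not real data): a DFT estimate and a GW reference.
dft_gaps = [1.0, 1.5, 2.0, 2.5, 3.0]
gw_gaps = [2.1, 2.9, 3.4, 4.2, 4.9]

r = pearson_r(dft_gaps, gw_gaps)
slope, intercept = linear_fit(dft_gaps, gw_gaps)
fitted = [slope * g + intercept for g in dft_gaps]
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f} eV")
print(f"MAE of linear correction: {mean_absolute_error(fitted, gw_gaps):.3f} eV")
```

A high correlation with a small residual error would suggest that the cheaper model captures the trend well enough for preliminary screening, even if its absolute values are systematically shifted.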

To demonstrate how the PAH101 dataset can be reused to train ML models for purposes other than SF, we used SISSO to find predictive models for the GW fundamental band gap. The results are presented in the SI. In Ref. 171, we further used SISSO to train models to predict the optical gap, the triplet exciton energy, the singlet-triplet gap, and the singlet exciton binding energy. The dataset can be used in a similar manner to train ML models other than SISSO to predict any of the quantities included in the dataset. In addition, it can be used to supplement larger lower-fidelity datasets to train multi-fidelity models.

Acknowledgements

Work at CMU was supported by the National Science Foundation (NSF) Designing Materials to Revolutionize and Engineer our Future (DMREF) program under award DMR-2323749. This research used resources of the Argonne Leadership Computing Facility (ALCF), which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357 and of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, under Contract DE-AC02-05CH11231.

Author contributions

S.G., X.L., Y.L., X.W., K.Z., and V.C. performed the calculations and curated the data. S.G. and Y.L. performed additional analysis and validation. B.S. provided the structures relaxed with CASTEP. N.M., S.G., X.L., Y.L., and X.W. wrote the manuscript. N.M. conceived and led the project.

Code availability

• The HAppend code for adding missing hydrogen atoms to molecular crystal structures is available in the GitHub repository HAppend (10.5281/zenodo.15093246), together with the pseudopotentials used in our calculations and scripts for making band structure and absorption plots.
• Scripts for calculating the SISSO primary features and for processing SISSO results are available in the GitHub repository MLfeat_FHI-aims (10.5281/zenodo.15093306).
• The BerkeleyGW code for performing GW+BSE calculations [133] is available at the BerkeleyGW website.
• The FHI-aims code [128], used to perform some relaxations and calculate DFT features, is available at the FHI-aims website. Version 18.06.07 was used here.
• The Quantum ESPRESSO code [132], used to calculate the mean-field wave functions for subsequent GW+BSE calculations, is available at the Quantum ESPRESSO website.
• The SISSO code [72], used to perform sure independence screening and sparsifying operator model training, is available at the GitHub repository SISSO. SISSO version 3.3 dated July 2023 was used here.
• Scripts for preparing the input for SISSO, running the training and model evaluation, analyzing the SISSO output, and making Pareto plots and correlation plots between the SISSO model predictions and the true labels are provided in the GitHub repository SISSOonPAH (10.5281/zenodo.15093308).

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Siyu Gao, Xingyu Liu.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-025-04959-0.

References

  • 1. Louie, S. G., Chan, Y.-H., da Jornada, F. H., Li, Z. & Qiu, D. Y. Discovering and understanding materials through computation. Nature Materials 20, 728–735 (2021).
  • 2. Luo, S., Li, T., Wang, X., Faizan, M. & Zhang, L. High-throughput computational materials screening and discovery of optoelectronic semiconductors. Wiley Interdisciplinary Reviews: Computational Molecular Science 11, e1489 (2021).
  • 3. Marzari, N., Ferretti, A. & Wolverton, C. Electronic-structure methods for materials design. Nature Materials 20, 736–749 (2021).
  • 4. Stein, H. S. & Gregoire, J. M. Progress and prospects for accelerating materials science with automated and autonomous workflows. Chemical Science 10, 9640–9649 (2019).
  • 5. Szczypiński, F. T., Bennett, S. & Jelfs, K. E. Can we predict materials that can be synthesised? Chemical Science 12, 830–840 (2021).
  • 6. Jain, A., Shin, Y. & Persson, K. A. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 1–13 (2016).
  • 7. Pyzer-Knapp, E. O., Suh, C., Gómez-Bombarelli, R., Aguilera-Iparraguirre, J. & Aspuru-Guzik, A. What is high-throughput virtual screening? A perspective from organic materials discovery. Annual Review of Materials Research 45, 195–216 (2015).
  • 8. Teale, A. M. et al. DFT exchange: sharing perspectives on the workhorse of quantum chemistry and materials science. Physical Chemistry Chemical Physics 24, 28700–28781 (2022).
  • 9. Allen, A. E. & Tkatchenko, A. Machine learning of material properties: Predictive and interpretable multilinear models. Science Advances 8, eabm7185 (2022).
  • 10. Fiedler, L., Shah, K., Bussmann, M. & Cangi, A. Deep dive into machine learning density functional theory for materials science and chemistry. Physical Review Materials 6, 040301 (2022).
  • 11. Foppa, L., Purcell, T. A., Levchenko, S. V., Scheffler, M. & Ghiringhelli, L. M. Hierarchical symbolic regression for identifying key physical parameters correlated with bulk properties of perovskites. Physical Review Letters 129, 055301 (2022).
  • 12. Hoock, B., Rigamonti, S. & Draxl, C. Advancing descriptor search in materials science: feature engineering and selection strategies. New Journal of Physics 24, 113049 (2022).
  • 13. Peng, J. et al. Human- and machine-centred designs of molecules and materials for sustainability and decarbonization. Nature Reviews Materials 7, 991–1009 (2022).
  • 14. Doan, H. A., Wang, X. & Snurr, R. Q. Computational screening of supported metal oxide nanoclusters for methane activation: Insights into homolytic versus heterolytic C–H bond dissociation. The Journal of Physical Chemistry Letters 14, 5018–5024 (2023).
  • 15. Jain, A., Voznyy, O. & Sargent, E. H. High-throughput screening of lead-free perovskite-like materials for optoelectronic applications. The Journal of Physical Chemistry C 121, 7183–7187 (2017).
  • 16. Diao, X. et al. High-throughput screening of stable and efficient double inorganic halide perovskite materials by DFT. Scientific Reports 12, 12633 (2022).
  • 17. Broberg, D. et al. High-throughput calculations of charged point defect properties with semi-local density functional theory- performance benchmarks for materials screening applications. npj Computational Materials 9, 72 (2023).
  • 18. Brunin, G., Ricci, F., Ha, V.-A., Rignanese, G.-M. & Hautier, G. Transparent conducting materials discovery using high-throughput computing. npj Computational Materials 5, 63 (2019).
  • 19. Casida, M. E. Time-dependent density-functional theory for molecules and molecular solids. Journal of Molecular Structure: THEOCHEM 914, 3–18 (2009).
  • 20. Adamo, C. & Jacquemin, D. The calculations of excited-state properties with time-dependent density functional theory. Chemical Society Reviews 42, 845–856 (2013).
  • 21. Laurent, A. D. & Jacquemin, D. TD-DFT benchmarks: A review. International Journal of Quantum Chemistry 113, 2019–2039 (2013).
  • 22. Herbert, J. M. Density-functional theory for electronic excited states. In Theoretical and Computational Photochemistry, 69–118 (Elsevier, 2023).
  • 23. Wang, X., Gao, S., Zhao, M. & Marom, N. Benchmarking time-dependent density functional theory for singlet excited states of thermally activated delayed fluorescence chromophores. Physical Review Research 4, 033147 (2022).
  • 24. Rohlfing, M. & Louie, S. G. Electron-hole excitations and optical spectra from first principles. Phys. Rev. B 62, 4927–4944 (2000).
  • 25. Sharifzadeh, S. Many-body perturbation theory for understanding optical excitations in organic molecules and solids. Journal of Physics: Condensed Matter 30, 153002 (2018).
  • 26. Blase, X., Duchemin, I. & Jacquemin, D. The Bethe-Salpeter equation in chemistry: Relations with TD-DFT, applications and challenges. Chemical Society Reviews 47, 1022–1043 (2018).
  • 27. Blase, X., Duchemin, I., Jacquemin, D. & Loos, P.-F. The Bethe-Salpeter equation formalism: From physics to chemistry. The Journal of Physical Chemistry Letters 11, 7371–7382 (2020).
  • 28. Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65, 1501–1509 (2013).
  • 29. Xu, P., Ji, X., Li, M. & Lu, W. Small data machine learning in materials science. npj Computational Materials 9, 42 (2023).
  • 30. Sorkun, M. C., Astruc, S., Koelman, J. V. A. & Er, S. An artificial intelligence-aided virtual screening recipe for two-dimensional materials discovery. npj Computational Materials 6, 106 (2020).
  • 31. Chen, C. & Ong, S. P. A universal graph deep learning interatomic potential for the periodic table. Nature Computational Science 2, 718–728 (2022).
  • 32. Deng, B. et al. CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling. Nature Machine Intelligence 5, 1031–1041 (2023).
  • 33. Brockherde, F. et al. Bypassing the Kohn-Sham equations with machine learning. Nature Communications 8, 872 (2017).
  • 34. Schleder, G. R., Padilha, A. C., Acosta, C. M., Costa, M. & Fazzio, A. From DFT to machine learning: Recent approaches to materials science- A review. Journal of Physics: Materials 2, 032001 (2019).
  • 35. Li, H. et al. Deep-learning density functional theory Hamiltonian for efficient ab initio electronic-structure calculation. Nature Computational Science 2, 367–377 (2022).
  • 36. Huang, B., von Rudorff, G. F. & von Lilienfeld, O. A. The central role of density functional theory in the AI age. Science 381, 170–175 (2023).
  • 37. Choudhary, K. et al. Recent advances and applications of deep learning methods in materials science. npj Computational Materials 8, 59 (2022).
  • 38. Bhat, V., Ganapathysubramanian, B. & Risko, C. Rapid estimation of the intermolecular electronic couplings and charge-carrier mobilities of crystalline molecular organic semiconductors through a machine learning pipeline. The Journal of Physical Chemistry Letters 15, 7206–7213 (2024).
  • 39. Duan, C., Liu, F., Nandy, A. & Kulik, H. J. Putting density functional theory to the test in machine-learning-accelerated materials discovery. The Journal of Physical Chemistry Letters 12, 4628–4637 (2021).
  • 40. Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).
  • 41. Kalita, B., Li, L., McCarty, R. J. & Burke, K. Learning to approximate density functionals. Accounts of Chemical Research 54, 818–826 (2021).
  • 42. Sattari, K. et al. De novo molecule design towards biased properties via a deep generative framework and iterative transfer learning. Digital Discovery 3, 410–421 (2024).
  • 43. del Rio, B. G., Phan, B. & Ramprasad, R. A deep learning framework to emulate density functional theory. npj Computational Materials 9, 158 (2023).
  • 44. Hou, B., Wu, J. & Qiu, D. Y. Unsupervised representation learning of Kohn-Sham states and consequences for downstream predictions of many-body effects. Nature Communications 15, 9481 (2024).
  • 45. Packwood, D. M., Kaneko, Y., Ikeda, D. & Ohno, M. An intelligent, user-inclusive pipeline for organic semiconductor design. Advanced Theory and Simulations 6, 2300159 (2023).
  • 46. Ghosh, K. et al. Deep learning spectroscopy: Neural networks for molecular excitation spectra. Advanced Science 6, 1801367 (2019).
  • 47. Singh, K., Münchmeyer, J., Weber, L., Leser, U. & Bande, A. Graph neural networks for learning molecular excitation spectra. Journal of Chemical Theory and Computation 18, 4408–4417 (2022).
  • 48. Jain, A. et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Materials 1, 011002 (2013).
  • 49. Curtarolo, S. et al. AFLOW: An automatic framework for high-throughput materials discovery. Computational Materials Science 58, 218–226 (2012).
  • 50. Choudhary, K. et al. The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design. npj Computational Materials 6, 173 (2020).
  • 51. Ramakrishnan, R., Dral, P. O., Rupp, M. & Von Lilienfeld, O. A. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1, 1–7 (2014).
  • 52. Schreiner, M., Bhowmik, A., Vegge, T., Busk, J. & Winther, O. Transition1x- A dataset for building generalizable reactive machine learning potentials. Scientific Data 9, 779 (2022).
  • 53. O’Mara, J., Meredig, B. & Michel, K. Materials data infrastructure: a case study of the Citrination platform to examine data import, storage, and access. JOM 68, 2031–2034 (2016).
  • 54. Stuke, A. et al. Atomic structures and orbital energies of 61,489 crystal-forming organic molecules. Scientific Data 7, 58 (2020).
  • 55. Olsthoorn, B., Geilhufe, R. M., Borysov, S. S. & Balatsky, A. V. Band gap prediction for large organic crystal structures with machine learning. Advanced Quantum Technologies 2, 1900023 (2019).
  • 56. Fediai, A., Reiser, P., Peña, J., Friederich, P. & Wenzel, W. Accurate GW frontier orbital energies of 134 kilo molecules. Scientific Data 10, 581 (2023).
  • 57. van Setten, M. J. et al. GW100: Benchmarking G0W0 for molecular systems. Journal of Chemical Theory and Computation 11, 5665–5687 (2015).
  • 58. Venturella, C., Hillenbrand, C., Li, J. & Zhu, T. Machine learning many-body Green’s functions for molecular excitation spectra. Journal of Chemical Theory and Computation 20, 143–154 (2024).
  • 59. Çaylak, O. & Baumeier, B. Machine learning of quasiparticle energies in molecules and clusters. Journal of Chemical Theory and Computation 17, 4891–4900 (2021).
  • 60. Fare, C., Fenner, P., Benatan, M., Varsi, A. & Pyzer-Knapp, E. O. A multi-fidelity machine learning approach to high throughput materials screening. npj Computational Materials 8, 257 (2022).
  • 61. Palizhati, A. et al. Agents for sequential learning using multiple-fidelity data. Scientific Reports 12, 4694 (2022).
  • 62. Chen, C., Zuo, Y., Ye, W., Li, X. & Ong, S. P. Learning properties of ordered and disordered materials from multi-fidelity data. Nature Computational Science 1, 46–53 (2021).
  • 63. Batra, R. & Sankaranarayanan, S. Machine learning for multi-fidelity scale bridging and dynamical simulations of materials. Journal of Physics: Materials 3, 031002 (2020).
  • 64. Liu, D. & Wang, Y. Multi-fidelity physics-constrained neural network and its application in materials modeling. Journal of Mechanical Design 141, 121403 (2019).
  • 65. Pilania, G., Gubernatis, J. E. & Lookman, T. Multi-fidelity machine learning models for accurate bandgap predictions of solids. Computational Materials Science 129, 156–163 (2017).
  • 66. Islam, M., Thakur, M. S. H., Mojumder, S. & Hasan, M. N. Extraction of material properties through multi-fidelity deep learning from molecular dynamics simulation. Computational Materials Science 188, 110187 (2021).
  • 67. Yang, J., Manganaris, P. & Mannodi-Kanakkithodi, A. Discovering novel halide perovskite alloys using multi-fidelity machine learning and genetic algorithm. The Journal of Chemical Physics 160 (2024).
  • 68. Liu, X., De Breuck, P.-P., Wang, L. & Rignanese, G.-M. A simple denoising approach to exploit multi-fidelity data for machine learning materials properties. npj Computational Materials 8, 233 (2022).
  • 69. Greenman, K. P., Green, W. H. & Gómez-Bombarelli, R. Multi-fidelity prediction of molecular optical peaks with deep learning. Chemical Science 13, 1152–1162 (2022).
  • 70. Feng, S., Zhou, H. & Dong, H. Using deep neural network with small dataset to predict material defects. Materials & Design 162, 300–310 (2019).
  • 71. De Breuck, P.-P., Hautier, G. & Rignanese, G.-M. Materials property prediction for limited datasets enabled by feature selection and joint learning with modnet. npj Computational Materials 7, 83 (2021).
  • 72. Ouyang, R., Curtarolo, S., Ahmetcik, E., Scheffler, M. & Ghiringhelli, L. M. SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Phys. Rev. Mater. 2, 083802 (2018).
  • 73. Purcell, T. A. R., Scheffler, M. & Ghiringhelli, L. M. Recent advances in the SISSO method and their implementation in the SISSO++ code. The Journal of Chemical Physics 159, 114110 (2023).
  • 74. Cao, G. et al. Artificial intelligence for high-throughput discovery of topological insulators: The example of alloyed tetradymites. Phys. Rev. Mater. 4, 034204 (2020).
  • 75. Bartel, C. J. et al. New tolerance factor to predict the stability of perovskite oxides and halides. Sci. Adv. 5, eaav0693 (2019).
  • 76. Andersen, M., Levchenko, S. V., Scheffler, M. & Reuter, K. Beyond Scaling Relations for the Description of Catalytic Materials. ACS Catal. 9, 2752–2759 (2019).
  • 77. Bartel, C. J. et al. Physical descriptor for the Gibbs energy of inorganic crystalline solids and temperature-dependent materials chemistry. Nat. Commun. 9, 4168 (2018).
  • 78. Foppa, L. et al. Materials genes of heterogeneous catalysis from clean experiments and artificial intelligence. MRS Bull. 46, 1016–1026 (2021).
  • 79. Hart, G. L., Mueller, T., Toher, C. & Curtarolo, S. Machine learning for alloys. Nature Reviews Materials 6, 730–755 (2021).
  • 80. Luo, Y., Li, M., Yuan, H., Liu, H. & Fang, Y. Predicting lattice thermal conductivity via machine learning: A mini review. npj Computational Materials 9, 4 (2023).
  • 81. Song, Z. et al. Distilling universal activity descriptors for perovskite catalysts from multiple data sources via multi-task symbolic regression. Materials Horizons 10, 1651–1660 (2023).
  • 82. Han, Z.-K. et al. Single-atom alloy catalysts designed by first-principles calculations and artificial intelligence. Nature Communications 12, 1833 (2021).
  • 83. Hoffmann, N., Cerqueira, T. F., Schmidt, J. & Marques, M. A. Superconductivity in antiperovskites. npj Computational Materials 8, 150 (2022).
  • 84. Guo, Z., Hu, S., Han, Z.-K. & Ouyang, R. Improving symbolic regression for predicting materials properties with iterative variable selection. Journal of Chemical Theory and Computation 18, 4945–4951 (2022).
  • 85. Ma, B. et al. An interpretable machine learning strategy for pursuing high piezoelectric coefficients in (K0.5Na0.5)NbO3-based ceramics. npj Computational Materials 9, 229 (2023).
  • 86. Mou, L.-H., Han, T., Smith, P. E. S., Sharman, E. & Jiang, J. Machine learning descriptors for data-driven catalysis study. Advanced Science 10, 2301020 (2023).
  • 87. Ren, C., Li, Q., Ling, C. & Wang, J. Mechanism-guided design of photocatalysts for CO2 reduction toward multicarbon products. Journal of the American Chemical Society 145, 28276–28283 (2023).
  • 88. Oh, S.-H., Yoo, S.-H. & Jang, W. Small dataset machine-learning approach for efficient design space exploration: Engineering ZnTe-based high-entropy alloys for water splitting. npj Computational Materials 10, 166 (2024).
  • 89. Khatua, R., Das, B. & Mondal, A. Physics-informed machine learning with data-driven equations for predicting organic solar cell performance. ACS Applied Materials & Interfaces 16, 57467–57480 (2024).
  • 90. Tian, S., Zhou, K., Yin, W. & Liu, Y. Machine learning enables the discovery of 2D Invar and anti-Invar monolayers. Nature Communications 15, 6977 (2024).
  • 91.Jacobs, R., Liu, J., Abernathy, H. & Morgan, D. Machine learning design of perovskite catalytic properties. Advanced Energy Materials14, 2303684 (2024). [Google Scholar]
  • 92.Wang, H., Ouyang, R., Chen, W. & Pasquarello, A. High-quality data enabling universality of band gap descriptor and discovery of photovoltaic perovskites. Journal of the American Chemical Society146, 17636–17645 (2024). [DOI] [PubMed] [Google Scholar]
  • 93.Smith, M. B. & Michl, J. Singlet fission. Chemical Reviews110, 6891–6936 (2010). [DOI] [PubMed] [Google Scholar]
  • 94.Smith, M. B. & Michl, J. Recent advances in singlet fission. Annual Review of Physical chemistry64, 361–386 (2013). [DOI] [PubMed] [Google Scholar]
  • 95.Monahan, N. & Zhu, X.-Y. Charge transfer–mediated singlet fission. Annual Review of Physical chemistry66, 601–618 (2015). [DOI] [PubMed] [Google Scholar]
  • 96.Lee, J. et al. Singlet exciton fission photovoltaics. Accounts of Chemical Research46, 1300–1311 (2013). [DOI] [PubMed] [Google Scholar]
  • 97.Xia, J. et al. Singlet fission: progress and prospects in solar cells. Advanced Materials29, 1601652 (2017). [DOI] [PubMed] [Google Scholar]
  • 98.Pazos-Outón, L. M. et al. A silicon–singlet fission tandem solar cell exceeding 100% external quantum efficiency with high spectral stability. ACS Energy Letters2, 476–480 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Daiber, B., van den Hoven, K., Futscher, M. H. & Ehrler, B. Realistic efficiency limits for singlet-fission silicon solar cells. ACS Energy Letters 6, 2800–2808 (2021).
  • 100. Fallon, K. J. et al. Exploiting excited-state aromaticity to design highly stable singlet fission materials. Journal of the American Chemical Society 141, 13867–13876 (2019).
  • 101. Rao, A. & Friend, R. H. Harnessing singlet exciton fission to break the Shockley–Queisser limit. Nature Reviews Materials 2, 1–12 (2017).
  • 102. Wang, X., Garcia, T., Monaco, S., Schatschneider, B. & Marom, N. Effect of crystal packing on the excitonic properties of rubrene polymorphs. CrystEngComm 18, 7353–7362 (2016).
  • 103. Wang, X., Liu, X., Cook, C., Schatschneider, B. & Marom, N. On the possibility of singlet fission in crystalline quaterrylene. The Journal of Chemical Physics 148, 184101 (2018).
  • 104. Liu, X. et al. Pyrene-stabilized acenes as intermolecular singlet fission candidates: importance of exciton wave-function convergence. Journal of Physics: Condensed Matter 32, 184001 (2020).
  • 105. Liu, X., Tom, R., Gao, S. & Marom, N. Assessing zethrene derivatives as singlet fission candidates based on multiple descriptors. The Journal of Physical Chemistry C 124, 26134–26143 (2020).
  • 106. Wang, X. et al. Phenylated acene derivatives as candidates for intermolecular singlet fission. The Journal of Physical Chemistry C 123, 5890–5899 (2019).
  • 107. Liu, X. et al. Finding predictive models for singlet fission by machine learning. npj Computational Materials 8, 70 (2022).
  • 108. Wu, J., Pisula, W. & Müllen, K. Graphenes as potential material for electronics. Chemical Reviews 107, 718–747 (2007).
  • 109. Anthony, J. E. Functionalized acenes and heteroacenes for organic electronics. Chemical Reviews 106, 5028–5048 (2006).
  • 110. Hou, J., Inganäs, O., Friend, R. H. & Gao, F. Organic solar cells based on non-fullerene acceptors. Nature Materials 17, 119–128 (2018).
  • 111. Congreve, D. N. et al. External quantum efficiency above 100% in a singlet-exciton-fission–based organic photovoltaic cell. Science 340, 334–337 (2013).
  • 112. Weiss, L. R. et al. Strongly exchange-coupled triplet pairs in an organic semiconductor. Nature Physics 13, 176–181 (2017).
  • 113. Katz, H. E. & Huang, J. Thin-film organic electronic devices. Annual Review of Materials Research 39, 71–92 (2009).
  • 114. Brédas, J.-L., Norton, J. E., Cornil, J. & Coropceanu, V. Molecular understanding of organic solar cells: the challenges. Accounts of Chemical Research 42, 1691–1699 (2009).
  • 115. Anthony, J. E. The larger acenes: Versatile organic semiconductors. Angewandte Chemie International Edition 47, 452–483 (2008).
  • 116. Wang, C., Dong, H., Hu, W., Liu, Y. & Zhu, D. Semiconducting π-conjugated systems in field-effect transistors: A material odyssey of organic electronics. Chemical Reviews 112, 2208–2267 (2012).
  • 117. Mei, J., Diao, Y., Appleton, A. L., Fang, L. & Bao, Z. Integrated materials design of organic semiconductors for field-effect transistors. Journal of the American Chemical Society 135, 6724–6746 (2013).
  • 118. Khasbaatar, A. et al. From solution to thin film: Molecular assembly of π-conjugated systems and impact on (opto)electronic properties. Chemical Reviews 123, 8395–8487 (2023).
  • 119. Groom, C. R., Bruno, I. J., Lightfoot, M. P. & Ward, S. C. The Cambridge structural database. Acta Crystallographica Section B: Structural Science, Crystal Engineering and Materials 72, 171–179 (2016).
  • 120. Wang, X., Tom, R., Liu, X., Congreve, D. N. & Marom, N. An energetics perspective on why there are so few triplet–triplet annihilation emitters. Journal of Materials Chemistry C 8, 10816–10824 (2020).
  • 121. Wang, X. & Marom, N. An energetics assessment of benzo[a]tetracene and benzo[a]pyrene as triplet–triplet annihilation emitters. Molecular Systems Design & Engineering 7, 889–898 (2022).
  • 122. Chen, X.-K., Kim, D. & Brédas, J.-L. Thermally activated delayed fluorescence (TADF) path toward efficient electroluminescence in purely organic materials: molecular level insight. Accounts of Chemical Research 51, 2215–2224 (2018).
  • 123. Wang, X., Wang, A., Zhao, M. & Marom, N. Inverted lowest singlet and triplet excitation energy ordering of graphitic carbon nitride flakes. The Journal of Physical Chemistry Letters 14, 10910–10919 (2023).
  • 124. Coropceanu, V. et al. Charge transport in organic semiconductors. Chemical Reviews 107, 926–952 (2007).
  • 125. Landrum, G. et al. RDKit: A software suite for cheminformatics, computational chemistry, and predictive modeling. https://www.rdkit.org (2013).
  • 126. Ong, S. P. et al. Python Materials Genomics (pymatgen): A robust, open-source Python library for materials analysis. Computational Materials Science 68, 314–319 (2013).
  • 127. Clark, S. J. et al. First principles methods using CASTEP. Zeitschrift für Kristallographie-Crystalline Materials 220, 567–570 (2005).
  • 128. Blum, V. et al. Ab initio molecular simulations with numeric atom-centered orbitals. Computer Physics Communications 180, 2175–2196 (2009).
  • 129. Havu, V., Blum, V., Havu, P. & Scheffler, M. Efficient O(N) integration for all-electron electronic structure calculation using numeric basis functions. Journal of Computational Physics 228, 8367–8379 (2009).
  • 130. Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Physical Review Letters 77, 3865 (1996).
  • 131. Tkatchenko, A. & Scheffler, M. Accurate molecular van der Waals interactions from ground-state electron density and free-atom reference data. Physical Review Letters 102, 073005 (2009).
  • 132. Giannozzi, P. et al. Quantum ESPRESSO: A modular and open-source software project for quantum simulations of materials. Journal of Physics: Condensed Matter 21, 395502 (2009).
  • 133. Deslippe, J. et al. BerkeleyGW: A massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Computer Physics Communications 183, 1269–1289 (2012).
  • 134. Fuchs, M. & Scheffler, M. Ab initio pseudopotentials for electronic structure calculations of poly-atomic systems using density-functional theory. Computer Physics Communications 119, 67–98 (1999).
  • 135. Deslippe, J., Samsonidze, G., Jain, M., Cohen, M. L. & Louie, S. G. Coulomb-hole summations and energies for GW calculations with limited number of empty orbitals: A modified static remainder approach. Phys. Rev. B 87, 165124 (2013).
  • 136. Luo, Y., Marom, N., Gao, S. & Liu, X. PAH101. NOMAD 10.17172/NOMAD/2024.12.05-1 (2024).
  • 137. Scheidgen, M. et al. NOMAD: A distributed web-based platform for managing materials science research data. Journal of Open Source Software 8, 5388 (2023).
  • 138. Chisholm, J. A. & Motherwell, S. COMPACK: A program for identifying crystal structure similarity using distances. Journal of Applied Crystallography 38, 228–231 (2005).
  • 139. Macrae, C. F. et al. Mercury 4.0: From visualization to analysis, design and prediction. Journal of Applied Crystallography 53, 226–235 (2020).
  • 140. Hunnisett, L. M. et al. The seventh blind test of crystal structure prediction: structure generation methods. Acta Crystallographica Section B 80, 517–547 (2024).
  • 141. Hunnisett, L. M. et al. The seventh blind test of crystal structure prediction: structure ranking methods. Acta Crystallographica Section B 80, 548–574 (2024).
  • 142. Schatschneider, B., Monaco, S., Liang, J.-J. & Tkatchenko, A. High-throughput investigation of the geometry and electronic structures of gas-phase and crystalline polycyclic aromatic hydrocarbons. The Journal of Physical Chemistry C 118, 19964–19974 (2014).
  • 143. Tkatchenko, A., DiStasio, R. A., Car, R. & Scheffler, M. Accurate and efficient method for many-body van der Waals interactions. Phys. Rev. Lett. 108, 236402 (2012).
  • 144. Sharifzadeh, S., Tamblyn, I., Doak, P., Darancet, P. T. & Neaton, J. B. Quantitative molecular orbital energies within a G0W0 approximation. European Physical Journal B 85, 323 (2012).
  • 145. Filip, M. R., Qiu, D. Y., Del Ben, M. & Neaton, J. B. Screening of excitons by organic cations in quasi-two-dimensional organic–inorganic lead-halide perovskites. Nano Letters 22, 4870–4878 (2022).
  • 146. Friedrich, C., Müller, M. C. & Blügel, S. Band convergence and linearization error correction of all-electron GW calculations: The extreme case of zinc oxide. Physical Review B 83, 081101 (2011).
  • 147. Biswas, T. & Singh, A. pyGWBSE: a high throughput workflow package for GW-BSE calculations. npj Computational Materials 9, 22 (2023).
  • 148. Bonacci, M. et al. Towards high-throughput many-body perturbation theory: efficient algorithms and automated workflows. npj Computational Materials 9, 74 (2023).
  • 149. Rasmussen, A., Deilmann, T. & Thygesen, K. Towards fully automated GW band structure calculations: What we can learn from 60,000 self-energy evaluations. npj Computational Materials 7 (2021).
  • 150. Großmann, M., Grunert, M. & Runge, E. A robust, simple, and efficient convergence workflow for GW calculations. npj Computational Materials 10, 135 (2024).
  • 151. Jacquemin, D., Duchemin, I. & Blase, X. Benchmarking the Bethe-Salpeter formalism on a standard organic molecular set. Journal of Chemical Theory and Computation 11, 3290–3304 (2015).
  • 152. Jacquemin, D., Duchemin, I., Blondel, A. & Blase, X. Benchmark of Bethe-Salpeter for triplet excited-states. Journal of Chemical Theory and Computation 13, 767–783 (2017).
  • 153. Forster, A. & Visscher, L. GW100: A Slater-type orbital perspective. Journal of Chemical Theory and Computation 17, 5080–5097 (2021).
  • 154. Knight, J. W. et al. Accurate ionization potentials and electron affinities of acceptor molecules III: A benchmark of GW methods. Journal of Chemical Theory and Computation 12, 615–626 (2016).
  • 155. Bruneval, F., Hamed, S. M. & Neaton, J. B. A systematic benchmark of the ab initio Bethe-Salpeter equation approach for low-lying optical excitations of small organic molecules. The Journal of Chemical Physics 142, 244101 (2015).
  • 156. Tauc, J. Optical properties and electronic structure of amorphous Ge and Si. Materials Research Bulletin 3, 37–46 (1968).
  • 157. Viezbicke, B. D., Patel, S., Davis, B. E. & Birnie III, D. P. Evaluation of the Tauc method for optical absorption edge determination: ZnO thin films as a model system. Physica Status Solidi (b) 252, 1700–1710 (2015).
  • 158. Makuła, P., Pacia, M. & Macyk, W. How to correctly determine the band gap energy of modified semiconductor photocatalysts based on UV-Vis spectra. The Journal of Physical Chemistry Letters 9, 6814–6817 (2018).
  • 159. Klein, J. et al. Limitations of the Tauc plot method. Advanced Functional Materials 33, 2304523 (2023).
  • 160. Hino, S., Veszprémi, T., Ohno, K., Inokuchi, H. & Seki, K. Absorption spectra of volatile aromatic hydrocarbon films in the vacuum ultraviolet region. Chemical Physics 71, 135–144 (1982).
  • 161. Tanaka, J. The electronic spectra of pyrene, chrysene, azulene, coronene and tetracene crystals. Bulletin of the Chemical Society of Japan 38, 86–102 (1965).
  • 162. Proehl, H. et al. Comparison of ultraviolet photoelectron spectroscopy and scanning tunneling spectroscopy measurements on highly ordered ultrathin films of hexa-peri-hexabenzocoronene on Au(111). Physical Review B 63, 205409 (2001).
  • 163. Puschnig, P., Meisenbichler, C. & Draxl, C. Excited state properties of organic semiconductors: breakdown of the Tamm-Dancoff approximation. arXiv, 1306.3790 (2013).
  • 164. Lettmann, T. & Rohlfing, M. Electronic excitations of polythiophene within many-body perturbation theory with and without the Tamm–Dancoff approximation. Journal of Chemical Theory and Computation 15, 4547–4554 (2019).
  • 165. Ma, Y., Rohlfing, M. & Molteni, C. Excited states of biological chromophores studied using many-body perturbation theory: Effects of resonant-antiresonant coupling and dynamical screening. Phys. Rev. B 80, 241405 (2009).
  • 166. Lettmann, T. & Rohlfing, M. Finite-momentum excitons in rubrene single crystals. Phys. Rev. B 104, 115427 (2021).
  • 167. Rangel, T. et al. Structural and excited-state properties of oligoacene crystals from first principles. Physical Review B 93, 115206 (2016).
  • 168. Ambrosch-Draxl, C., Nabok, D., Puschnig, P. & Meisenbichler, C. The role of polymorphism in organic thin films: Oligoacenes investigated from first principles. New Journal of Physics 11, 125010 (2009).
  • 169. Zhang, X., Leveillee, J. A. & Schleife, A. Effect of dynamical screening in the Bethe-Salpeter framework: Excitons in crystalline naphthalene. Phys. Rev. B 107, 235205 (2023).
  • 170. Wang, X. et al. Computational discovery of intermolecular singlet fission materials using many-body perturbation theory. The Journal of Physical Chemistry C 128, 7841–7864 (2024).
  • 171. Gao, S., Luo, Y., Liu, X. & Marom, N. Predicting the excited-state properties of crystalline organic semiconductors using GW+BSE and machine learning. Digital Discovery 10.1039/D4DD00396A (2025).
  • 172. Schober, C., Reuter, K. & Oberhofer, H. Critical analysis of fragment-orbital DFT schemes for the calculation of electronic coupling values. The Journal of Chemical Physics 144, 054103 (2016).
  • 173. Ambrosetti, A., Reilly, A. M., DiStasio, R. A. & Tkatchenko, A. Long-range correlation energy calculated from coupled atomic response functions. The Journal of Chemical Physics 140 (2014).
  • 174. Schnepp, O. Electronic spectra of molecular crystals. Annual Review of Physical Chemistry 14, 35–60 (1963).
  • 175. Ahn, T.-S. et al. Experimental and theoretical study of temperature dependent exciton delocalization and relaxation in anthracene thin films. The Journal of Chemical Physics 128 (2008).
  • 176. Lim, S.-H., Bjorklund, T. G., Spano, F. C. & Bardeen, C. J. Exciton delocalization and superradiance in tetracene thin films and nanoaggregates. Physical Review Letters 92, 107402 (2004).
  • 177. Bree, A. & Lyons, L. 998. Photo- and semi-conductance of organic crystals. Part VI. Effect of oxygen on the surface photo-current and some photochemical properties of solid anthracene. Journal of the Chemical Society (Resumed) 5179–5186 (1960).
  • 178. Park, S., Kim, S., Kim, J., Whang, C. & Im, S. Optical and luminescence characteristics of thermally evaporated pentacene films on Si. Applied Physics Letters 80, 2872–2874 (2002).
  • 179. Faltermeier, D., Gompf, B., Dressel, M., Tripathi, A. K. & Pflaum, J. Optical properties of pentacene thin films and single crystals. Physical Review B 74, 125416 (2006).
  • 180. Jentzsch, T., Juepner, H., Brzezinka, K.-W. & Lau, A. Efficiency of optical second harmonic generation from pentacene films of different morphology and structure. Thin Solid Films 315, 273–280 (1998).
  • 181. Sharifzadeh, S., Biller, A., Kronik, L. & Neaton, J. B. Quasiparticle and optical spectroscopy of the organic semiconductors pentacene and PTCDA from first principles. Physical Review B 85, 125307 (2012).
  • 182. Tiago, M. L., Northrup, J. E. & Louie, S. G. Ab initio calculation of the electronic and optical properties of solid pentacene. Physical Review B 67, 115212 (2003).
  • 183. Sun, D. et al. Anisotropic singlet fission in single crystalline hexacene. iScience 19, 1079–1089 (2019).
  • 184. Watanabe, M. et al. The synthesis, crystal structure and charge-transport properties of hexacene. Nature Chemistry 4, 574–578 (2012).
  • 185. Busby, E. et al. Multiphonon relaxation slows singlet fission in crystalline hexacene. Journal of the American Chemical Society 136, 10654–10660 (2014).
  • 186. Najafov, H., Lee, B., Zhou, Q., Feldman, L. C. & Podzorov, V. Observation of long-range exciton diffusion in highly ordered organic semiconductors. Nature Materials 9, 938–943 (2010).
  • 187. Huang, L. et al. Rubrene micro-crystals from solution routes: their crystallography, morphology and optical properties. Journal of Materials Chemistry 20, 159–166 (2010).
  • 188. Tanaka, J. The electronic spectra of aromatic molecular crystals. II. The crystal structure and spectra of perylene. Bulletin of the Chemical Society of Japan 36, 1237–1249 (1963).
  • 189. Mulder, B. Photoconductivity spectra of stable and metastable single-crystals of perylene. Recueil des Travaux Chimiques des Pays-Bas 84, 713–728 (1965).
  • 190. Kurrle, D. & Pflaum, J. Exciton diffusion length in the organic semiconductor diindenoperylene. Applied Physics Letters 92, 133306 (2008).
  • 191. Maruyama, Y., Iwaki, T., Kajiwara, T., Shirotani, I. & Inokuchi, H. Molecular orientation and absorption spectra of quaterrylene evaporated film. Bulletin of the Chemical Society of Japan 43, 1259–1261 (1970).
  • 192. Maruyama, Y., Inokuchi, H. & Harada, Y. Electronic properties of quaterrylene, C40H20. Bulletin of the Chemical Society of Japan 36, 1193–1198 (1963).
  • 193. Nijegorodov, N., Mabbs, R. & Downey, W. Evolution of absorption, fluorescence, laser and chemical properties in the series of compounds perylene, benzo(ghi)perylene and coronene. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 57, 2673–2685 (2001).
  • 194. Xiao, J. et al. Preparation, characterization, and photoswitching/light-emitting behaviors of coronene nanowires. Journal of Materials Chemistry 21, 1423–1427 (2011).
  • 195. Brodin, M. & Soskin, M. Investigation of the absorption spectrum of a single crystal of 1,2-benzanthracene in the region of lowest electronic transitions. Optika i Spektroskopiya 6, 600–604 (1959).
  • 196. Ramasesha, S., Albert, I. & Sinha, B. Optical and magnetic properties of the exact PPP states of biphenyl. Molecular Physics 72, 537–547 (1991).
  • 197. Coffman, R. & McClure, D. S. The electronic spectra of crystalline toluene, dibenzyl, diphenylmethane, and biphenyl in the near ultraviolet. Canadian Journal of Chemistry 36, 48–58 (1958).
  • 198. Puschnig, P. et al. Pressure studies on the intermolecular interactions in biphenyl. Synthetic Metals 116, 327–331 (2001).
  • 199. Gondo, Y. Electronic structure and spectra of biphenyl and its related compound. The Journal of Chemical Physics 41, 3928–3938 (1964).
  • 200. Mukherjee, B. & Ganguly, S. Anisotropy of the electronic spectra of a single crystal of 1,12-benzperylene (C22H12). Proceedings of the Physical Society 83, 93 (1964).
  • 201. Pu, Y.-J. et al. Absence of delayed fluorescence and triplet–triplet annihilation in organic light emitting diodes with spatially orthogonal bianthracenes. Journal of Materials Chemistry C 7, 2541–2547 (2019).
  • 202. Manna, B., Nandi, A. & Chandrakumar, K. Comparative study of exciton dynamics in 9,9′-bianthracene nanoaggregates and thin films: Observation of singlet–singlet annihilation-mediated triplet exciton formation. The Journal of Physical Chemistry C 126, 10762–10771 (2022).

Data Availability Statement

• The HAppend code for adding missing hydrogen atoms to molecular crystal structures is available in the GitHub repository HAppend (10.5281/zenodo.15093246), together with the pseudopotentials used in our calculations and scripts for making band structure and absorption plots.
• Scripts for calculating the SISSO primary features and for processing SISSO results are available in the GitHub repository MLfeat_FHI-aims (10.5281/zenodo.15093306).
• The BerkeleyGW code for performing GW+BSE calculations133 is available at the BerkeleyGW website.
• The FHI-aims code128, used to perform some relaxations and calculate DFT features, is available at the FHI-aims website. Version 18.06.07 was used here.
• The Quantum ESPRESSO code132, used to calculate the mean-field wave functions for subsequent GW+BSE calculations, is available at the Quantum ESPRESSO website.
• The SISSO code72, used to perform sure independent screening and sparsifying operator model training, is available at the GitHub repository SISSO. SISSO version 3.3 dated July 2023 was used here.
• Scripts for preparing the input for SISSO, running the training and model evaluation, analyzing the SISSO output, and making Pareto plots and correlation plots between the SISSO model predictions and the true labels are provided in the GitHub repository SISSOonPAH (10.5281/zenodo.15093308).


Articles from Scientific Data are provided here courtesy of Nature Publishing Group
