Scientific Data. 2020 Sep 8;7:295. doi: 10.1038/s41597-020-00634-8

Experimental database of optical properties of organic compounds

Joonyoung F Joung 1,#, Minhi Han 1,#, Minseok Jeong 1, Sungnam Park 1
PMCID: PMC7478979  PMID: 32901041

Abstract

Experimental databases on the optical properties of organic chromophores are important for the implementation of data-driven chemistry using machine learning. Herein, we present a series of experimental data including various optical properties such as the first absorption and emission maximum wavelengths and their bandwidths (full width at half maximum), extinction coefficient, photoluminescence quantum yield, and fluorescence lifetime. A database of 20,236 data points was developed by collecting the optical properties of organic compounds already reported in the literature. A dataset of 7,016 unique organic chromophores in 365 solvents or in solid state is available in CSV format.

Subject terms: Fluorescence spectroscopy, Excited states, Cheminformatics, Chemical physics


Measurement(s) absorption wavelength • emission wavelength • fluorescence lifetime • bandwidth • extinction coefficient • electronic absorption spectrum • fluorescence spectrum • optical property of organic compound
Technology Type(s) ultraviolet-visible spectrophotometry • spectrofluorimeter • time-resolved fluorescence spectroscopy • digital curation
Factor Type(s) organic molecule • solvent

Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12808424

Background & Summary

Organic chromophores for optoelectronics, organic light-emitting diodes (OLEDs), staining, fluorescent dyes, and bioimaging have been developed steadily. It would therefore be useful to predict the optical properties of newly designed organic chromophores reliably and quickly before they are synthesized. Theoretical calculations based on ab initio and density functional theory methods have been used extensively to characterize the optical properties of newly designed organic chromophores, but such calculations are computationally expensive. Data-driven approaches based on machine learning have emerged as a promising alternative and have been applied in many research areas [1–3]. These approaches, however, require databases as a prerequisite, so databases for specific applications need to be available or collected.

The optical properties, such as absorption and emission maximum wavelengths and their bandwidths, extinction coefficient, photoluminescence quantum yield (PLQY), and lifetime, are important factors in characterizing organic chromophores. Therefore, databases on optical properties can be used to model the quantitative structure–property relationship for designing new organic chromophores with desired optical properties. Recently, the absorption peaks and extinction coefficients of small organic molecules have been obtained using quantum chemical calculations and used for machine learning [4–6]. In addition, Beard et al. have reported datasets of experimental and computational ultraviolet–visible (UV–Vis) absorption spectra [7]. However, no databases are currently available for the experimental absorption, emission, and fluorescence properties of organic chromophores.

As illustrated in Fig. 1, the absorption properties of organic chromophores are characterized by the first maximum absorption wavelength (λabs, max), bandwidth (σabs), and extinction coefficient (εmax) (Fig. 1a), which are important parameters for the design of chromophores for specific applications in various research fields such as photovoltaics, dyes, and optical filters. Similarly, the emission and fluorescent properties, which are characterized by the maximum emission wavelength (λemi, max), bandwidth (σemi), PLQY (ΦQY), and excited state lifetime (τ) (Fig. 1b, c), are essential for the development of emitters in OLEDs, fluorescent bioimaging dyes, and fluorescent sensors. In this study, we present a reliable and high-quality database of the optical properties of organic compounds that can be used for various purposes in diverse research fields.

Fig. 1.

(a) Absorption spectrum displaying λabs, max, σabs (in FWHM), and εmax. (b) Emission spectrum displaying λemi, max, σemi (in FWHM), and ΦQY. (c) Time-resolved fluorescence (TRF) signal and τflu.

Methods

A total of 1,358 articles containing organic compounds were downloaded from journals of Nature Research, American Chemical Society, Royal Society of Chemistry, Springer, and Elsevier by exploring keywords such as fluorescence, luminescence, emission, OLED, fluorescence lifetime, or PLQY.

In our database, the organic compounds and solvent molecules are limited to a maximum number of 150 atoms (except hydrogen atoms) consisting of C, N, O, S, F, Cl, Br, I, Se, Te, Si, P, B, Sn, and Ge. Binary or ternary solvent systems are not included in our database. Data points in the solid state include one component systems (either amorphous or crystalline) and the solid solution such as dopant (chromophore) – host (solvent) systems in our database.

All the optical properties in our database are based on the absorption and emission spectra reported in the originally published papers. To extract the optical properties, the absorption and emission spectra were carefully examined to exclude unreliable experimental results. For the extinction coefficient, absorption maximum, and absorption bandwidth, only background-corrected absorption spectra within the dynamic range (typically, absorbance < 2) were selected. Similarly, for the emission maximum, bandwidth, and quantum yield, only properly measured emission spectra were selected. ΦQY values exceeding 1 were not included. In the absorption spectrum of a given molecule, the first absorption peak was selected and its λabs, max, σabs (in full width at half maximum (FWHM)), and εmax values were obtained. Likewise, the λemi, max and σemi (in FWHM) values were obtained from the emission (or fluorescence) spectra. The bandwidths (σabs and σemi) are reported in cm−1 or nm, as provided in the published papers.
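Because the bandwidths are stored in either nm or cm−1, a user may want to bring them onto a common scale. The following is a minimal sketch and not part of the original workflow. It assumes that the two half-maximum points lie symmetrically at peak ± FWHM/2 on the wavelength axis, which is only an approximation for real band shapes; the function name is hypothetical.

```python
def fwhm_nm_to_wavenumber(peak_nm: float, fwhm_nm: float) -> float:
    """Convert a FWHM given in nm to cm^-1.

    Assumption: the half-maximum points sit at peak_nm - fwhm_nm/2 and
    peak_nm + fwhm_nm/2 (a band that is symmetric in wavelength).
    """
    if fwhm_nm <= 0 or fwhm_nm / 2 >= peak_nm:
        raise ValueError("fwhm_nm must be positive and smaller than 2 * peak_nm")
    blue_edge_nm = peak_nm - fwhm_nm / 2
    red_edge_nm = peak_nm + fwhm_nm / 2
    # Wavenumber in cm^-1 equals 1e7 divided by the wavelength in nm.
    return 1e7 / blue_edge_nm - 1e7 / red_edge_nm


# Example: a band peaking at 500 nm with a 50 nm FWHM.
width_cm = fwhm_nm_to_wavenumber(500.0, 50.0)
```

The same nm width corresponds to a larger cm−1 width at shorter wavelengths, so widths given in different units should not be compared without a conversion of this kind.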

Furthermore, the PLQY (ΦQY) measured in degassed media was preferentially collected, if reported. Otherwise, the PLQY measured in air was collected. The fluorescence (or excited state) lifetimes (τ) measured by time-resolved fluorescence (TRF) experiments were also collected. When the TRF signal was fit using a multi-exponential function, $S(t) = \sum_i A_i \exp(-t/\tau_i)$, where $A_i$ and $\tau_i$ are the amplitude and time constant of the i-th component, the average lifetime, $\tau = \sum_i A_i \tau_i / \sum_i A_i$, was obtained and recorded. The molecular structures are reported in the canonicalized simplified molecular input line entry system (SMILES) [8–11]. Each data point is given for a pair of chromophore and solvent; for solid states, the chromophore is listed as both chromophore and solvent. For chromophores in a solid matrix, the solid matrix is listed as the solvent.
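The amplitude-weighted average lifetime described above can be sketched as follows. This is an illustrative helper, not the authors' code; the function name and the input checks are assumptions.

```python
from typing import Sequence


def average_lifetime(amplitudes: Sequence[float], lifetimes: Sequence[float]) -> float:
    """Amplitude-weighted average lifetime: sum(A_i * tau_i) / sum(A_i)."""
    if len(amplitudes) != len(lifetimes) or len(amplitudes) == 0:
        raise ValueError("need one lifetime per amplitude and at least one component")
    total_amplitude = sum(amplitudes)
    if total_amplitude == 0:
        raise ValueError("amplitudes must not sum to zero")
    weighted_sum = sum(a * tau for a, tau in zip(amplitudes, lifetimes))
    return weighted_sum / total_amplitude


# Example: a bi-exponential decay with 70% of a 1 ns component
# and 30% of a 5 ns component.
tau_avg_ns = average_lifetime([0.7, 0.3], [1.0, 5.0])
```

The result has the same unit as the input time constants, and a single-exponential decay returns its own time constant unchanged.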

Data Records

The developed database is available at figshare [12], and its format is described in Table 1. The database comprises 20,236 combinations of 7,016 chromophores in 365 solvents and 17 solid matrices (or hosts) or solid states. The SMILES strings of the chromophores and solvents are provided to indicate their molecular structures. All experimental data from the literature are presented with the corresponding reference, given as a digital object identifier (DOI). An example of benzene in cyclohexane is presented in Table 2. Data that are not reported in the references are indicated as NaN (not a number).

Table 1.

Description of the database.

| No. | Column name | Unit | Data type | Description |
|---|---|---|---|---|
| 1 | Tag | | Float | The numbering of data points |
| 2 | Chromophore | | String | SMILES of chromophore structure |
| 3 | Solvent | | String | SMILES of solvent structure |
| 4 | Absorption max (nm) | nm | Float | Maximum absorption wavelength, λabs,max |
| 5 | Emission max (nm) | nm | Float | Maximum emission wavelength, λemi,max |
| 6 | Lifetime (ns) | ns | Float | Fluorescence lifetime, τflu |
| 7 | Quantum yield | | Float | Photoluminescence quantum yield, ΦQY |
| 8 | log(e/mol-1 dm3 cm-1) | | Float | Extinction coefficient at λabs,max, log10(εmax) |
| 9 | abs FWHM (cm-1) | cm−1 | Float | Absorption bandwidth (FWHM), σabs |
| 10 | emi FWHM (cm-1) | cm−1 | Float | Emission bandwidth (FWHM), σemi |
| 11 | abs FWHM (nm) | nm | Float | Absorption bandwidth (FWHM), σabs |
| 12 | emi FWHM (nm) | nm | Float | Emission bandwidth (FWHM), σemi |
| 13 | Molecular weight (g mol-1) | g mol−1 | Float | Molecular weight of chromophore |
| 14 | Reference | | String | Source document DOI |

Table 2.

Optical properties of benzene in cyclohexane.

| Column name | Value |
|---|---|
| Tag | 4461 |
| Chromophore | c1ccccc1 |
| Solvent | C1CCCCC1 |
| Absorption max (nm) | 252.5253 |
| Emission max (nm) | 287.3563 |
| Lifetime (ns) | 60 |
| Quantum yield | 0.14 |
| log(e/mol-1 dm3 cm-1) | NaN |
| abs FWHM (cm-1) | NaN |
| emi FWHM (cm-1) | NaN |
| abs FWHM (nm) | NaN |
| emi FWHM (nm) | NaN |
| Molecular weight (g mol-1) | 78.11364 |
| Reference | 10.1016/S1386-1425(02)00207-X |
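A record with this layout can be parsed with the Python standard library alone. The sketch below is hypothetical: it assumes that the header row of the CSV file matches the column names in Table 1 and that missing values are written as NaN. The file name is not stated here, so an in-memory sample built from the Table 2 values stands in for the real file.

```python
import csv
import io
import math

# Columns holding text; every other column is treated as numeric.
TEXT_COLUMNS = {"Chromophore", "Solvent", "Reference"}

# In-memory stand-in for the real file: a subset of the Table 1 columns
# filled with the benzene-in-cyclohexane values from Table 2.
SAMPLE_CSV = (
    "Tag,Chromophore,Solvent,Absorption max (nm),Emission max (nm),"
    "Lifetime (ns),Quantum yield,log(e/mol-1 dm3 cm-1),Reference\n"
    "4461,c1ccccc1,C1CCCCC1,252.5253,287.3563,60,0.14,NaN,"
    "10.1016/S1386-1425(02)00207-X\n"
)


def read_records(handle):
    """Return one dict per row, with numeric fields as float (NaN if missing)."""
    records = []
    for row in csv.DictReader(handle):
        record = {}
        for name, text in row.items():
            if name in TEXT_COLUMNS:
                record[name] = text
            else:
                # float() accepts "NaN"; an empty or absent cell also maps to NaN.
                record[name] = float(text) if text and text.strip() else math.nan
        records.append(record)
    return records


records = read_records(io.StringIO(SAMPLE_CSV))
benzene = records[0]
# Difference between the emission and absorption maxima, in nm.
shift_nm = benzene["Emission max (nm)"] - benzene["Absorption max (nm)"]
```

For a file on disk, the same function can be given a handle from `open(path, newline="")`. Keeping missing values as NaN means they must be tested with `math.isnan` rather than with `==`.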

Technical Validation

The main purpose of our database is to provide the optical properties of chromophores to the scientific and industrial communities with high quality and reliability.

The validation of the data we collected relies on the validation of peer-reviewed articles. To reduce potential errors, we built our database using the following procedure. Two people with sufficient background in spectroscopic measurements separately collected the optical properties from the published papers. A third person cross-checked these two datasets and added them to the database. In addition, outliers, such as λabs, max or λemi, max > 950 nm or < 200 nm, λabs, max > λemi, max, σabs or σemi > 7000 cm−1, τflu < 0.1 ns, and log10(εmax) < 2.5, were double-checked. All the values in the final version of our database were therefore carefully checked against the values and the spectra in the originally published papers.
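The outlier criteria listed above can be expressed as a simple screening function. This is a sketch of how such a check might look, not the procedure actually used by the authors; the function name, argument names, and flag texts are hypothetical. Missing values are passed as NaN and never raise a flag, because any comparison with NaN is false.

```python
import math
from typing import List


def outlier_flags(
    abs_max_nm: float = math.nan,
    emi_max_nm: float = math.nan,
    abs_fwhm_cm: float = math.nan,
    emi_fwhm_cm: float = math.nan,
    lifetime_ns: float = math.nan,
    log_eps: float = math.nan,
) -> List[str]:
    """Return the reasons why a data point deserves a second look."""
    flags = []
    for label, value in (("absorption", abs_max_nm), ("emission", emi_max_nm)):
        if value > 950 or value < 200:
            flags.append(f"{label} maximum outside 200-950 nm")
    if abs_max_nm > emi_max_nm:
        flags.append("absorption maximum above emission maximum")
    for label, value in (("absorption", abs_fwhm_cm), ("emission", emi_fwhm_cm)):
        if value > 7000:
            flags.append(f"{label} bandwidth above 7000 cm^-1")
    if lifetime_ns < 0.1:
        flags.append("lifetime below 0.1 ns")
    if log_eps < 2.5:
        flags.append("log10(eps_max) below 2.5")
    return flags


# The benzene entry from Table 2 passes every check.
benzene_flags = outlier_flags(abs_max_nm=252.5253, emi_max_nm=287.3563, lifetime_ns=60)
```

A flagged value is not necessarily wrong; in the text above such values were re-checked against the source paper rather than discarded.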

A summary of the developed database is provided in Fig. 2. Among the 7,016 chromophores that can be found in our database, 95.2% have molecular weights lower than 1000 g/mol. Moreover, the chromophores contain diverse core structures such as pyrene, coumarin, perylene, porphyrin, boron-dipyrromethene (BODIPY), stilbene, azobenzene, and so on. In addition, the chromophores with molecular weight higher than 1000 g/mol generally comprise long alkyl chains or sugar units, which are introduced to improve the solubility without affecting the optical properties.

Fig. 2.

Histograms of (a) the molecular weight of chromophores, (b) λabs, max, (c) λemi, max, (d) ΦQY, (e) σabs, (f) σemi, (g) τflu, (h) log10(εmax), and (i) solvents (CH2Cl2: dichloromethane, CH3CN: acetonitrile, Tol: toluene, THF: tetrahydrofuran, CHCl3: chloroform, MeOH: methanol, EtOH: ethanol, DMSO: dimethyl sulfoxide, CH: cyclohexane, and EA: ethyl acetate). The number of data points and the number of unique molecules are included in each graph. The box plots are drawn inside (b,c,e,f). The numbers in parentheses in (e,f) represent the number of bandwidths reported in cm−1 and the number of corresponding molecules, respectively.

The histograms of λabs, max and λemi, max in Fig. 2b,c are divided into bins with a 20-nm width, covering a wide range of λabs, max and λemi, max. For example, 63% of λabs, max and 88% of λemi, max are in the visible range (380–700 nm), whereas more than 93% of the chromophores can absorb sunlight (310–750 nm), indicating their potential use as dyes and light-harvesting molecules. Furthermore, our database contains fluorophores covering a wide range of emission wavelengths from UV to near infrared (NIR), which are applicable to OLEDs, fluorescence imaging dyes, and fluorescence sensors. In addition, chromophores with various functional groups [13,14] and single chromophores measured in various solvents [15,16] are included, so that the effects of the functional groups and the solvents (solvatochromism) on the optical properties of the chromophores are well documented.
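Percentages such as the visible-range fractions quoted above can be recomputed from a column of wavelengths. The sketch below is illustrative only: the function name is hypothetical, NaN entries are skipped, and both bounds are treated as inclusive, which is an assumption because the exact convention is not stated in the text.

```python
import math
from typing import Iterable


def fraction_in_range(values: Iterable[float], low: float, high: float) -> float:
    """Fraction of reported (non-NaN) values with low <= value <= high."""
    reported = [v for v in values if not math.isnan(v)]
    if not reported:
        raise ValueError("no reported values")
    inside = sum(1 for v in reported if low <= v <= high)
    return inside / len(reported)


# Toy example: two of the four reported maxima lie in the visible range.
visible_fraction = fraction_in_range([350.0, 400.0, 500.0, 720.0, math.nan], 380.0, 700.0)
```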

The histogram of the collected ΦQY values is also divided into bins with a width of 0.05 (Fig. 2d). The standards for PLQYs, such as quinine sulfate and rhodamine 6G in solution, are also included [17]. Among the obtained QY data, the ΦQY of 91 data points is 0, and that of 137 data points is 1, whereas for approximately 23% of the QY data, the ΦQY is less than 0.05. Furthermore, the PLQYs of 803 samples in solid state were obtained mainly from OLED molecules [18,19]. In addition, molecules exhibiting aggregation-induced emission were also collected for our database [20,21].

Figure 2e,f displays the σabs and σemi values, which were extracted from the absorption and emission spectra of over 1,600 and 2,800 molecules, respectively. Our database contains 3,292 and 7,198 data points of σabs and σemi reported in nm, and 747 and 627 data points of σabs and σemi reported in cm−1. Compared with the other optical properties, σabs and σemi values were rarely reported in the published papers. Most of the σabs and σemi values in nm were extracted directly from the absorption and emission spectra reported in the originally published papers.

The standards for the fluorescence lifetime (τflu) values reported by Boens et al. [22], as well as other τflu measurements, were also included in our database. The histogram of the collected τflu is divided into bins with a width of 1 ns (Fig. 2g) and shows that approximately 5% of the τflu values are longer than 20 ns.

The εmax values at λabs, max are recorded as log10(εmax), and their distribution is shown in Fig. 2h as a histogram divided into bins with a width of 0.2. In our database, most of the εmax values are in the range of 10^3 to 10^6 mol−1 dm3 cm−1 (mol−1 L cm−1). Note that the product of ΦQY and εmax is proportional to the brightness, which is the fluorescence intensity per fluorophore. In addition, 6,663 data points report both ΦQY and εmax and can therefore be used to estimate the brightness.
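Since the extinction coefficient is stored as log10(εmax), the brightness estimate mentioned above requires undoing the logarithm first. The following is a minimal sketch with a hypothetical function name; it returns the product ΦQY × εmax in mol−1 dm3 cm−1, which is proportional to the brightness rather than an absolute intensity.

```python
def relative_brightness(quantum_yield: float, log_eps: float) -> float:
    """Product of the quantum yield and eps_max, with eps_max = 10 ** log_eps."""
    if not 0.0 <= quantum_yield <= 1.0:
        raise ValueError("quantum yield must lie between 0 and 1")
    return quantum_yield * 10.0 ** log_eps


# Example: a dye with a quantum yield of 0.5 and log10(eps_max) = 4.
value = relative_brightness(0.5, 4.0)
```

Values computed in this way are meaningful only for comparing data points that report both quantities; a NaN in either column propagates to the result.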

Finally, the optical properties of chromophores are solvent-dependent. In our database, 365 solvents are included. Among the 12 most common solvents presented in Fig. 2i, dichloromethane is the most frequently used. Moreover, alkanes with a number of carbon atoms ranging from 2 (1,1-dichloroethane) to 16 (1-chlorohexadecane) and 99 alcohols with one (methanol) to 12 (dodecanol) carbon atoms are reported as solvents. In addition, solid solutions and host molecules, such as 4,4ʹ-bis(carbazol-9-yl)biphenyl, bis[2-(diphenylphosphino)phenyl] ether oxide (DPEPO), and 1,3-bis(N-carbazolyl)benzene (mCP), are included in our database.

Estimation of experimental uncertainties of the optical properties in the database

The optical properties of organic compounds were collected from published peer-reviewed papers. In most original papers, the experimental uncertainties of the seven optical properties were not reported. In addition, the optical properties were measured under different experimental conditions. It is therefore very difficult to estimate the experimental uncertainties accurately. However, the experimental uncertainties of the optical properties are roughly estimated in the following way.

Experimental uncertainty of λabs, max and λemi, max

Most of the UV–visible absorption and emission spectra reported in the published papers were measured with spectrophotometers and spectrofluorometers from Agilent, Ocean Optics, Hitachi, and JASCO. These instruments, like typical modern absorption and emission spectrometers, have a wavelength resolution of less than 1 nm. The maximum wavelengths (λabs, max and λemi, max) of absorption and emission spectra can be readily determined within an experimental error of 1 nm. Therefore, the experimental uncertainty of λabs, max and λemi, max is estimated to be less than 1 nm.

Experimental uncertainty of σabs and σemi

The absorption and emission bandwidths (σabs and σemi), expressed as the full width at half maximum (FWHM), were extracted from the absorption and emission spectra in the published papers when they were not reported directly. The extraction error is much smaller than the thickness of the lines in the plotted spectra. The experimental uncertainty of σabs and σemi in FWHM is estimated to be a maximum of 2 nm.

Experimental uncertainty of ΦQY

The photoluminescence quantum yield (ΦQY) is the most error-prone quantity among the seven optical properties. The experimental error in ΦQY is affected by several factors such as experimental instruments, measuring methods (absolute vs relative), and molecular oxygen (O2). The IUPAC technical report is useful for estimating the error in ΦQY [17]. For example, the ΦQY of 9,10-diphenylanthracene in cyclohexane is in the range of 0.9 to 0.97. Because ΦQY is error-prone, the experimental uncertainty in ΦQY is conservatively estimated to be a maximum of 0.1.

Experimental uncertainty of τflu

The fluorescence lifetime (τflu) is determined by an exponential fit to the time-resolved fluorescence (TRF) signal. The experimental uncertainty of τflu results mainly from the instrument response function (IRF) and the multi-exponential fitting process. Since the IRF determines the time resolution of the TRF spectrometer, the experimental error of τflu is significant when τflu is shorter than the IRF. In most cases, the multi-exponential fitting error does not exceed 1%. We collected τflu values that were substantially larger than the IRF. Therefore, the experimental uncertainty of τflu is conservatively estimated to be 1%.

Experimental uncertainty of log10(εmax)

To determine the extinction coefficient (εmax), the absorbance (A) and the concentration (c) of the chromophore must be known, according to Beer’s law (A = εbc, where b is the path length). Considering that the published papers are peer-reviewed, the experimental error in the concentration is assumed to be less than 5%. Therefore, the experimental uncertainty of log10(εmax) is estimated to be approximately 0.02, which corresponds to log10(1.05).
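The step from a 5% concentration error to an uncertainty in log10(εmax) can be written out as a short worked calculation. It assumes, as a simplification not stated in the text, that the absorbance A and the path length b are exact, so that the relative error δ in the concentration is the only source of error.

```latex
\varepsilon_{\max} = \frac{A}{b\,c}
\quad\Longrightarrow\quad
\log_{10}\varepsilon_{\max} = \log_{10}A - \log_{10}b - \log_{10}c

c \to c\,(1 \pm \delta)
\quad\Longrightarrow\quad
\Delta\log_{10}\varepsilon_{\max} = \left|\log_{10}(1 \pm \delta)\right|

\delta = 0.05:\qquad
\log_{10}(1.05) \approx 0.021,
\qquad
-\log_{10}(0.95) \approx 0.022
```

Both signs of the concentration error therefore shift log10(εmax) by roughly 0.02.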

Acknowledgements

This study was supported by grants from the National Research Foundation of Korea (NRF) funded by the Korean government (MSIP) (No. 2019R1H1A2079968 and 2019R1A6A1A11044070).

Author contributions

J.F. Joung, M. Han, and M. Jeong collected the data and designed research under the supervision of S. Park. J.F. Joung and S. Park wrote the manuscript. J.F. Joung and M. Han contributed equally to this work.

Code availability

The optical properties of the chromophores were extracted from the scientific literature and are available at 10.6084/m9.figshare.12045567.v2 [12]. We have opened a user-friendly webpage (http://Deep4Chem.korea.ac.kr/search) where users can search for chromophores in the database. The database on this webpage will be updated regularly.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: Joonyoung F. Joung, Minhi Han.

References

  1. Gao H, et al. Using Machine Learning To Predict Suitable Conditions for Organic Reactions. ACS Cent. Sci. 2018;4:1465–1476. doi: 10.1021/acscentsci.8b00357.
  2. Ryu S, Kwon Y, Kim WY. A Bayesian graph convolutional network for reliable prediction of molecular properties with uncertainty quantification. Chem. Sci. 2019;10:8438–8446. doi: 10.1039/C9SC01992H.
  3. Sahu H, et al. Designing promising molecules for organic solar cells via machine learning assisted virtual screening. J. Mater. Chem. A. 2019;7:17480–17488. doi: 10.1039/C9TA04097H.
  4. Blum LC, Reymond JL. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 2009;131:8732–8733. doi: 10.1021/ja902302h.
  5. Montavon G, et al. Machine learning of molecular electronic properties in chemical compound space. New J. Phys. 2013;15.
  6. Ghosh K, et al. Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra. Adv. Sci. 2019;6:1801367. doi: 10.1002/advs.201801367.
  7. Beard EJ, Sivaraman G, Vazquez-Mayagoitia A, Vishwanath V, Cole JM. Comparative dataset of experimental and computational attributes of UV/vis absorption spectra. Sci. Data. 2019;6:307. doi: 10.1038/s41597-019-0306-0.
  8. Weininger D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Model. 1988;28:31–36. doi: 10.1021/ci00057a005.
  9. Weininger D, Weininger A, Weininger JL. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Model. 1989;29:97–101. doi: 10.1021/ci00062a008.
  10. Weininger D. SMILES. 3. DEPICT. Graphical depiction of chemical structures. J. Chem. Inf. Model. 1990;30:237–243. doi: 10.1021/ci00067a005.
  11. O’Boyle NM. Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. J. Cheminform. 2012;4:22. doi: 10.1186/1758-2946-4-22.
  12. Joung JF, Han M, Jeong M, Park S. DB for chromophore. figshare. 2020. doi: 10.6084/m9.figshare.12045567.v2.
  13. Zhengneng J, et al. Synthesis and fluorescence property of some novel 1,8-naphthalimide derivatives containing a thiophene ring at the C-4 position. Dyes Pigm. 2013;96:204–210. doi: 10.1016/j.dyepig.2012.07.018.
  14. Gawale Y, Sekar N. Investigating the excited state optical properties and origin of large stokes shift in Benz[c,d]indole N-Heteroarene BF2 dyes with ab initio tools. J. Photochem. Photobiol. B. 2018;178:472–480. doi: 10.1016/j.jphotobiol.2017.12.006.
  15. Vekshin N, Savintsev I, Kovalev A, Yelemessov R, Wadkins RM. Solvatochromism of the Excitation and Emission Spectra of 7-Aminoactinomycin D: Implications for Drug Recognition of DNA Secondary Structures. J. Phys. Chem. B. 2001;105:8461–8467. doi: 10.1021/jp011168p.
  16. Zoto CA, Connors RE. Photophysical properties of an asymmetrical 2,5-diarylidene-cyclopentanone dye possessing electron donor and acceptor substituents. J. Mol. Struct. 2010;982:121–126. doi: 10.1016/j.molstruc.2010.08.016.
  17. Brouwer AM. Standards for photoluminescence quantum yield measurements in solution (IUPAC Technical Report). Pure Appl. Chem. 2011;83:2213–2228. doi: 10.1351/PAC-REP-10-09-31.
  18. Uoyama H, Goushi K, Shizu K, Nomura H, Adachi C. Highly efficient organic light-emitting diodes from delayed fluorescence. Nature. 2012;492:234–238. doi: 10.1038/nature11687.
  19. Zhang QS, et al. Design of Efficient Thermally Activated Delayed Fluorescence Materials for Pure Blue Organic Light Emitting Diodes. J. Am. Chem. Soc. 2012;134:14706–14709. doi: 10.1021/ja306538w.
  20. Guo J, et al. Robust Luminescent Materials with Prominent Aggregation-Induced Emission and Thermally Activated Delayed Fluorescence for High-Performance Organic Light-Emitting Diodes. Chem. Mater. 2017;29:3623–3631. doi: 10.1021/acs.chemmater.7b00450.
  21. Yuan Y-X, et al. Exceptional aggregation-induced emission from one totally planar molecule. Dyes Pigm. 2019;170.
  22. Boens N, et al. Fluorescence lifetime standards for time and frequency domain fluorescence spectroscopy. Anal. Chem. 2007;79:2137–2149. doi: 10.1021/ac062160k.
