A database of thermally activated delayed fluorescent molecules auto-generated from scientific literature with ChemDataExtractor.
Dingyun Huang, Jacqueline M. Cole. Published in: Scientific Data (2024)
A database of thermally activated delayed fluorescent (TADF) molecules was automatically generated from the scientific literature. It consists of 25,482 data records with an overall precision of 82%. Of these, 5,349 records have chemical names in the form of SMILES strings, which are represented with 91% accuracy; these records are grouped in a subsidiary database. Each data record contains one of the following four properties: maximum emission wavelength (λ_EM), photoluminescence quantum yield (PLQY), singlet-triplet energy splitting (ΔE_ST), and delayed lifetime (τ_D). The databases were created through text mining using ChemDataExtractor, a chemistry-aware natural-language-processing toolkit that has been adapted for TADF research. The text-mined corpus consisted of 2,733 papers from the Royal Society of Chemistry and Elsevier. To the best of our knowledge, these are the first databases to be auto-generated for TADF molecules from existing publications. They have been publicly released for experimental and computational applications in the TADF research field.
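The record structure described in the abstract (one property per record, with a SMILES-bearing subset split off into a subsidiary database) can be illustrated with a minimal Python sketch. This is not the authors' pipeline and not ChemDataExtractor's API. The field names (`compound_name`, `prop`, `value`, `unit`, `smiles`), the property keys, and the helpers `extract_emission` and `split_by_smiles` are hypothetical. The single regular expression is a toy stand-in for ChemDataExtractor's far more capable grammar-based parsers, and the input sentence is invented for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical keys for the four properties named in the abstract:
# maximum emission wavelength, PLQY, singlet-triplet splitting, delayed lifetime.
PROPERTIES = ("lambda_em", "plqy", "delta_est", "tau_d")


@dataclass
class TadfRecord:
    # Illustrative schema only; not the released database's actual columns.
    compound_name: str
    prop: str                     # exactly one of PROPERTIES per record
    value: float
    unit: str
    smiles: Optional[str] = None  # present only for the subsidiary subset

    def __post_init__(self) -> None:
        if self.prop not in PROPERTIES:
            raise ValueError(f"unknown property: {self.prop}")


# Toy pattern for sentences like "<name> shows an emission maximum at <x> nm".
# It only demonstrates turning free text into a structured record.
EMISSION = re.compile(
    r"(?P<name>[\w\-]+)\s+shows an emission maximum at\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*nm"
)


def extract_emission(sentence: str) -> Optional[TadfRecord]:
    """Return a lambda_em record if the toy pattern matches, else None."""
    m = EMISSION.search(sentence)
    if m is None:
        return None
    return TadfRecord(m.group("name"), "lambda_em", float(m.group("value")), "nm")


def split_by_smiles(
    records: List[TadfRecord],
) -> Tuple[List[TadfRecord], List[TadfRecord]]:
    """Return (all records, the subset that carries a SMILES string)."""
    with_smiles = [r for r in records if r.smiles]
    return list(records), with_smiles
```

For example, `extract_emission("Emitter-A shows an emission maximum at 480 nm in toluene.")` yields a `lambda_em` record with value 480.0 in nm, while a sentence without that phrasing yields `None`. Keeping one property per record mirrors the abstract's description and keeps filtering by property a simple comparison on `prop`.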