Transfer learning for a foundational chemistry model.
Emma King-Smith
Published in: Chemical Science (2023)
Data-driven chemistry has garnered much interest concurrent with improvements in hardware and the development of new machine learning models. However, obtaining sufficiently large, accurate datasets of a desired chemical outcome for data-driven chemistry remains a challenge. The community has made significant efforts to democratize and curate available information for more facile machine learning applications, but the limiting factor is usually the laborious nature of generating large-scale data. Transfer learning has been noted in certain applications to alleviate some of the data burden, but this protocol is typically carried out on a case-by-case basis, with the transfer learning task expertly chosen to fit the finetuning. Herein, I develop a machine learning framework capable of accurate chemistry-relevant prediction amid general sources of low data. First, a chemical "foundational model" is trained using a dataset of ∼1 million experimental organic crystal structures. A task-specific module is then stacked atop this foundational model and subjected to finetuning. This approach achieves state-of-the-art performance on a diverse set of tasks: toxicity prediction, yield prediction, and odor prediction.
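The two-stage pattern described in the abstract (a pretrained foundational model with a task-specific module stacked on top and finetuned on a small dataset) can be illustrated with a minimal Python sketch. This is not the author's architecture or code. The encoder here is a fixed random projection standing in for a pretrained model, the data are synthetic, and every name (`FrozenEncoder`, `LinearHead`, `finetune`, `make_toy_data`) is hypothetical. Keeping the encoder frozen during finetuning is also an assumption made for simplicity; the abstract does not say whether the foundational model's weights are updated.

```python
import math
import random


class FrozenEncoder:
    """Stand-in for a pretrained foundational model (assumption: frozen weights).

    A real model would be pretrained on a large dataset; here the weights are
    just a fixed random projection so the example stays self-contained.
    """

    def __init__(self, in_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)]
                        for _ in range(hidden_dim)]

    def embed(self, x):
        # Map an input descriptor vector to a bounded feature vector.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.weights]


class LinearHead:
    """Hypothetical task-specific module: a linear regressor on the embedding."""

    def __init__(self, hidden_dim):
        self.w = [0.0] * hidden_dim
        self.b = 0.0

    def predict(self, h):
        return sum(w * hi for w, hi in zip(self.w, h)) + self.b

    def sgd_step(self, h, y, lr):
        # One stochastic gradient step on squared error; only the head changes.
        err = self.predict(h) - y
        self.w = [w - lr * err * hi for w, hi in zip(self.w, h)]
        self.b -= lr * err
        return err * err  # squared error measured before the update


def finetune(encoder, head, data, epochs=200, lr=0.05):
    """Train the head on a small labeled dataset; the encoder is never updated."""
    feats = [(encoder.embed(x), y) for x, y in data]
    losses = []
    for _ in range(epochs):
        total = sum(head.sgd_step(h, y, lr) for h, y in feats)
        losses.append(total / len(feats))
    return losses


def make_toy_data(n, seed=1):
    """Synthetic low-data task: 4 made-up descriptors and a made-up target."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in range(4)]
        data.append((x, x[0] - 0.5 * x[1]))
    return data
```

A typical use would build one `FrozenEncoder`, then create a separate `LinearHead` per downstream task and call `finetune` on each task's small dataset. Swapping only the head reflects the idea of reusing one foundational model across several prediction tasks.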
Keyphrases
- machine learning
- big data
- electronic health record
- artificial intelligence
- drug discovery
- randomized controlled trial
- mental health
- healthcare
- deep learning
- squamous cell carcinoma
- risk factors
- working memory
- gold nanoparticles
- resistance training
- electron transfer
- radiation therapy
- single cell
- mass spectrometry
- body composition
- highly efficient
- reduced graphene oxide
- high intensity
- oxide nanoparticles