Transferring a Molecular Foundation Model for Polymer Property Predictions.

Pei Zhang, Logan Kearney, Debsindhu Bhowmik, Zachary Fox, Amit K Naskar, John Gounley
Published in: Journal of chemical information and modeling (2023)
Transformer-based large language models have remarkable potential to accelerate design optimization for applications such as drug development and material discovery. Self-supervised pretraining of transformer models requires large-scale data sets, which are often sparsely populated in topical areas such as polymer science. State-of-the-art approaches for polymers conduct data augmentation to generate additional samples but unavoidably incur extra computational costs. In contrast, large-scale open-source data sets are available for small molecules and provide a potential solution to data scarcity through transfer learning. In this work, we show that transformers pretrained on small molecules and fine-tuned on polymer properties achieve accuracy comparable to that of models trained on augmented polymer data sets for a series of benchmark prediction tasks.
Keyphrases
  • electronic health record
  • big data
  • machine learning
  • public health
  • magnetic resonance
  • computed tomography
  • magnetic resonance imaging
  • high throughput
  • single cell
  • virtual reality
  • resistance training
  • wound healing