BERT2DAb: a pre-trained model for antibody representation based on amino acid sequences and 2D-structure.
Xiaowei Luo, Fan Tong, Wenbin Zhao, Xiangwen Zheng, Jiangyu Li, Jing Li, Dongsheng Zhao. Published in: mAbs (2023)
Prior research has generated a vast number of antibody sequences, which has enabled the pre-training of language models on amino acid sequences to improve the efficiency of antibody screening and optimization. However, far fewer pre-trained language models are available for antibody sequences than for proteins. Additionally, existing pre-trained models rely solely on embedding representations of amino acids or k-mers and do not explicitly take into account secondary structure features. Here, we present a new pre-trained model called BERT2DAb. This model incorporates secondary structure information based on self-attention to learn representations of antibody sequences. Our model achieves state-of-the-art performance on three downstream tasks, including two antigen-antibody binding classification tasks (precision: 85.15%/94.86%; recall: 87.41%/86.15%) and one antigen-antibody complex mutation binding free energy prediction task (Pearson correlation coefficient: 0.77). Moreover, we propose a novel method to analyze the relationship between attention weights and contact states of pairs of subsequences in tertiary structures, which enhances the interpretability of BERT2DAb. Overall, our model demonstrates strong potential for improving antibody screening and design through downstream applications.
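One way to make secondary structure explicit in a sequence model is to fuse each residue with its structure label into a joint token before embedding. The sketch below is an illustrative assumption, not BERT2DAb's published tokenization: it pairs amino acids with DSSP-style labels (H = helix, E = strand, C = coil) to build structure-aware tokens.

```python
# Hypothetical sketch of structure-aware tokenization: each residue is
# paired with a secondary-structure label (DSSP-style H/E/C) to form a
# joint token. The "aa|ss" scheme is an illustrative assumption, not
# the tokenization described in the BERT2DAb paper.

def make_structure_tokens(sequence: str, ss_labels: str) -> list:
    """Fuse amino acids with 2D-structure labels into joint tokens."""
    if len(sequence) != len(ss_labels):
        raise ValueError("sequence and structure labels must have equal length")
    return [f"{aa}|{ss}" for aa, ss in zip(sequence, ss_labels)]

# Example: the first five residues of an antibody heavy chain with
# hypothetical structure labels.
tokens = make_structure_tokens("EVQLV", "CCEEH")
# tokens -> ["E|C", "V|C", "Q|E", "L|E", "V|H"]
```

Each joint token would then map to a single embedding in the model's vocabulary, so the attention layers see residue identity and local structure together rather than inferring structure implicitly.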