Login / Signup

Elucidation of Genome-wide Understudied Proteins targeted by PROTAC-induced degradation using Interpretable Machine Learning.

Li XieLei Xie
Published in: bioRxiv : the preprint server for biology (2023)
Many human diseases remain incurable because disease-causing genes cannot by selectively and effectively targeted by small molecules. Proteolysis-targeting chimera (PROTAC), an organic compound that binds to both a target and a degradation-mediating E3 ligase, has emerged as a promising approach to selectively target disease-driving genes that are not druggable by small molecules. Nevertheless, not all of proteins can be accommodated by E3 ligases, and be effectively degraded. Knowledge on the degradability of a protein will be crucial for the design of PROTACs. However, only hundreds of proteins have been experimentally tested if they are amenable to the PROTACs. It remains elusive what other proteins can be targeted by the PROTAC in the entire human genome. In this paper, we propose an intepretable machine learning model PrePROTAC that takes advantage of powerful protein language modeling. PrePROTAC achieves high accuracy when evaluated by an external dataset which comes from different gene families from the proteins in the training data, suggesting the generalizability of PrePROTAC. We apply PrePROTAC to the human genome, and identify more than 600 understudied proteins that are potentially responsive to the PROTAC. Furthermore, we design three PROTAC compounds for novel drug targets associated with Alzheimer's disease.
Keyphrases