A feature extraction free approach for protein interactome inference from co-elution data.
Yu-Hsin Chen, Kuan-Hao Chao, Jin Yung Wong, Chien-Fu Liu, Jun-Yi Leu, Huai-Kuang Tsai
Published in: Briefings in Bioinformatics (2023)
Protein complexes are key functional units in cellular processes. High-throughput techniques, such as co-fractionation coupled with mass spectrometry (CF-MS), have advanced protein complex studies by enabling global interactome inference. However, defining true interactions from complex fractionation characteristics is not a simple task, since CF-MS is prone to false positives caused by the chance co-elution of non-interacting proteins. Several computational methods have been designed to analyze CF-MS data and construct probabilistic protein-protein interaction (PPI) networks. Current methods usually first infer PPIs based on handcrafted CF-MS features, and then use clustering algorithms to form potential protein complexes. While powerful, these methods have two weaknesses: handcrafted features based on domain knowledge might introduce bias, and the severely imbalanced PPI data make the models prone to overfitting. To address these issues, we present a balanced end-to-end learning architecture, Software for Prediction of Interactome with Feature-extraction Free Elution Data (SPIFFED), which integrates feature representation from raw CF-MS data and interactome prediction in a convolutional neural network. SPIFFED outperforms the state-of-the-art methods in predicting PPIs under conventional imbalanced training. When trained with balanced data, SPIFFED shows greatly improved sensitivity for true PPIs. Moreover, the ensemble SPIFFED model provides different voting schemes to integrate predicted PPIs from multiple CF-MS datasets. Using clustering software (i.e. ClusterONE), SPIFFED allows users to infer high-confidence protein complexes depending on the CF-MS experimental designs. The source code of SPIFFED is freely available at: https://github.com/bio-it-station/SPIFFED.
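Two ideas in the abstract, balanced training and voting across CF-MS experiments, can be illustrated with a minimal Python sketch. This is not the SPIFFED implementation: the function names, the undersampling strategy, the scheme names ("any", "majority", "all"), and the 0.5 score threshold are all assumptions made for illustration, since the abstract does not specify them.

```python
import random


def balance_by_undersampling(positives, negatives, seed=0):
    """Hypothetical helper: build a training list with equal numbers of
    positive (interacting) and negative (non-interacting) protein pairs
    by randomly undersampling the larger class. This is one common way
    to balance data; the abstract does not say which way SPIFFED uses."""
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    pairs = rng.sample(positives, n) + rng.sample(negatives, n)
    labels = [1] * n + [0] * n
    combined = list(zip(pairs, labels))
    rng.shuffle(combined)
    return combined


def vote(scores_per_experiment, scheme="majority", threshold=0.5):
    """Hypothetical helper: combine per-experiment PPI scores.

    scores_per_experiment is a list of dicts, one per CF-MS experiment,
    mapping a protein pair to a predicted interaction score in [0, 1].
    A pair missing from an experiment counts as a vote against it."""
    n_exp = len(scores_per_experiment)
    all_pairs = set().union(*scores_per_experiment)
    accepted = set()
    for pair in all_pairs:
        votes = sum(
            1 for scores in scores_per_experiment
            if scores.get(pair, 0.0) >= threshold
        )
        if scheme == "any":
            keep = votes >= 1
        elif scheme == "majority":
            keep = votes > n_exp / 2
        elif scheme == "all":
            keep = votes == n_exp
        else:
            raise ValueError(f"unknown voting scheme: {scheme}")
        if keep:
            accepted.add(pair)
    return accepted


# Toy scores for three experiments (made-up numbers, not real data).
exp1 = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.7}
exp2 = {("A", "B"): 0.8, ("A", "C"): 0.6, ("B", "C"): 0.4}
exp3 = {("A", "B"): 0.95, ("A", "C"): 0.1, ("B", "C"): 0.6}
experiments = [exp1, exp2, exp3]
```

With these toy scores, a stricter scheme keeps fewer pairs: "all" retains only pairs supported by every experiment, while "any" retains every pair supported at least once. Which scheme is appropriate would depend on how the CF-MS experiments were designed.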
Keyphrases
- mass spectrometry
- protein protein
- cystic fibrosis
- electronic health record
- multiple sclerosis
- ms ms
- machine learning
- small molecule
- deep learning
- single cell
- liquid chromatography
- healthcare
- high throughput
- convolutional neural network
- data analysis
- gas chromatography
- high performance liquid chromatography
- high intensity
- binding protein
- amino acid
- artificial intelligence