PIFiA: Self-supervised Approach for Protein Functional Annotation from Single-Cell Imaging Data.
Anastasiia Razdaibiedina, Alexander V Brechalov, Helena Friesen, Mojca Mattiazzi Usaj, Myra Paz David Masinas, Harsha Garadi Suresh, Kyle Wang, Charles Boone, Jimmy Ba, Brenda J Andrews
Published in: bioRxiv: the preprint server for biology (2023)
Fluorescence microscopy data describe protein localization patterns at single-cell resolution and have the potential to reveal whole-proteome functional information with remarkable precision. Yet extracting biologically meaningful representations from cell micrographs remains a major challenge. Existing approaches often fail to learn robust, noise-invariant features or rely on supervised labels for accurate annotations. We developed PIFiA (Protein Image-based Functional Annotation), a self-supervised approach for protein functional annotation from single-cell imaging data. We imaged the global yeast ORF-GFP collection and applied PIFiA to generate protein feature profiles from single-cell images of fluorescently tagged proteins. We show that PIFiA outperforms existing approaches for molecular representation learning and describe a range of downstream analysis tasks to explore the information content of the feature profiles. Specifically, we cluster extracted features into a hierarchy of functional organization, study cell population heterogeneity, and develop techniques to distinguish multi-localizing proteins and identify functional modules. Finally, we confirm new PIFiA predictions using a colocalization assay, suggesting previously unappreciated biological roles for several proteins. Paired with a fully interactive website (https://thecellvision.org/pifia/), PIFiA is a resource for the quantitative analysis of protein organization within the cell.