node2vec2rank: Large Scale and Stable Graph Differential Analysis via Multi-Layer Node Embeddings and Ranking.
Panagiotis Mandros, Ian Gallagher, Viola Fanfani, Chen Chen, Jonas Fischer, Anis Ismail, Lauren Hsu, Enakshi Saha, Derrick K DeConti, John Quackenbush
Published in: bioRxiv : the preprint server for biology (2024)
Computational methods in biology can infer large molecular interaction networks from multiple data sources and at different resolutions, creating unprecedented opportunities to explore the mechanisms driving complex biological phenomena. Networks can be built to represent distinct conditions and compared to uncover graph-level differences, such as patterns of gene-gene interactions that change between biological states. Given the importance of the graph comparison problem, there is a clear and growing need for robust and scalable methods that can identify meaningful differences. We introduce node2vec2rank (n2v2r), a method for graph differential analysis that ranks nodes according to the disparities of their representations in joint latent embedding spaces. Improving upon previous bag-of-features approaches, we take advantage of recent advances in machine learning and statistics to compare graphs in terms of higher-order structures and in a data-driven manner. Formulated as a multi-layer spectral embedding algorithm, n2v2r is computationally efficient, incorporates stability as a key feature, and can provably identify the correct ranking of differences between graphs in an overall procedure that adheres to veridical data science principles. By better adapting to the data, node2vec2rank clearly outperformed the commonly used node degree in finding complex differences in simulated data. In real-world applications to breast cancer subtype characterization, analysis of the cell cycle in single-cell data, and the search for sex differences in lung adenocarcinoma, node2vec2rank found meaningful biological differences that enabled hypothesis generation for therapeutic candidates. The software and analysis pipelines that implement n2v2r and were used for the analyses presented here are publicly available.
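The core idea of ranking nodes by how far their representations move between graphs in a joint embedding space can be illustrated with a minimal sketch. It assumes one common multi-layer spectral construction (an unfolded adjacency embedding: the adjacency matrices are placed side by side and factorized once) and a plain Euclidean distance. The abstract does not spell out these details, so the function names, the choice of distance, and the toy graphs below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def joint_spectral_embedding(adjacencies, dim):
    """Embed the same node set from several graphs into one shared space.

    Assumed construction (not taken from the abstract): concatenate the
    adjacency matrices column-wise and take a truncated SVD, which yields
    one embedding per graph, all expressed in a common basis.
    """
    n = adjacencies[0].shape[0]
    unfolded = np.hstack(adjacencies)                 # shape (n, n * K)
    _, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    # Right singular vectors scaled by sqrt(singular value); one block per graph.
    right = vt[:dim].T * np.sqrt(s[:dim])             # shape (n * K, dim)
    return [right[k * n:(k + 1) * n] for k in range(len(adjacencies))]


def rank_nodes_by_disparity(emb_a, emb_b):
    """Return node indices sorted from most to least changed, plus distances."""
    dist = np.linalg.norm(emb_a - emb_b, axis=1)      # per-node Euclidean distance
    order = np.argsort(-dist)
    return order, dist


def block_graph(groups, n):
    """Toy adjacency (with self-loops): nodes are linked iff they share a group."""
    adj = np.zeros((n, n))
    for group in groups:
        adj[np.ix_(group, group)] = 1.0
    return adj


# Hypothetical toy data: node 0 switches community between the two graphs.
graph_a = block_graph([[0, 1, 2], [3, 4, 5]], n=6)
graph_b = block_graph([[1, 2], [0, 3, 4, 5]], n=6)

emb_a, emb_b = joint_spectral_embedding([graph_a, graph_b], dim=3)
order, dist = rank_nodes_by_disparity(emb_a, emb_b)
print("nodes ranked by change:", order)
```

Because both embeddings come from a single factorization, they share one coordinate system and can be compared directly without an alignment step. In this toy example the node that switches community is the one expected at the top of the ranking.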
Keyphrases
- machine learning
- lymph node
- cell cycle
- electronic health record
- big data
- single cell
- neural network
- convolutional neural network
- data analysis
- cell proliferation
- deep learning
- public health
- healthcare
- genome wide
- young adults
- mass spectrometry
- computed tomography
- artificial intelligence
- minimally invasive
- magnetic resonance
- dna methylation
- sentinel lymph node
- quality improvement
- locally advanced
- childhood cancer