In silico transcriptome dissection of neocortical excitatory neurogenesis via joint matrix decomposition and transfer learning.
Shreyash Sonthalia, Guangyan Li, Xoel Mato Blanco, Alex Casella, Jinrui Liu, Genevieve Stein-O'Brien, Brian Caffo, Ricky S Adkins, Joshua Orvis, Ronna Hertzano, Anup Mahurkar, Jesse Gillis, Jonathan Werner, Shaojie Ma, Nicola Micali, Nenad Sestan, Pasko Rakic, Gabriel Santpere, Seth A Ament, Carlo Colantuoni
Published in: bioRxiv : the preprint server for biology (2024)
The rising quality and quantity of multi-omic data across biomedical science demand that we build innovative solutions to harness their collective discovery potential. From publicly available repositories, we have assembled and curated a compendium of gene-level transcriptomic data focused on mammalian excitatory neurogenesis in the neocortex. This collection is open for exploration by both computational and cell biologists at nemoanalytics.org, and this report demonstrates its utility. Applying our novel structured joint decomposition approach to mouse, macaque and human data from the collection, we define transcriptome dynamics that are conserved across mammalian excitatory neurogenesis and that map onto the genetics of human brain structure and disease. Leveraging additional data within NeMO Analytics via projection methods, we chart the dynamics of these fundamental molecular elements of neurogenesis across developmental time and space and into postnatal life. Reversing the direction of our investigation, we use transcriptomic data from laminar-specific dissection of adult human neocortex to define molecular signatures specific to excitatory neuronal cell types resident in individual layers of the mature neocortex, and trace their emergence across development. We show that while many lineage-defining transcription factors are most highly expressed at early fetal ages, the laminar neuronal identities that they drive take years to decades to reach full maturity. Finally, we interrogate data from stem cell-derived cerebral organoid systems, demonstrating that many fundamental elements of in vivo development are recapitulated with high fidelity in vitro, while specific transcriptomic programs in neuronal maturation are absent.
We propose these analyses as specific applications of the general approach of combining joint decomposition with large curated collections of analysis-ready multi-omics data matrices focused on particular cell and disease contexts. Importantly, these open environments are accessible to, and must be fueled with emerging data by, cell biologists with and without coding expertise.
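The general pattern of joint decomposition followed by projection can be illustrated with a minimal sketch. This is not the authors' structured joint decomposition; it assumes a plain truncated SVD on column-stacked, gene-matched matrices as a stand-in, and all data, names (`joint_decompose`, `project`) and parameters are hypothetical and synthetic.

```python
import numpy as np

def joint_decompose(matrices, n_components):
    """Jointly decompose gene-matched matrices (genes x samples).

    Assumption: each gene is centered within each dataset, the datasets are
    stacked column-wise, and a truncated SVD yields shared gene loadings.
    """
    centered = [m - m.mean(axis=1, keepdims=True) for m in matrices]
    stacked = np.hstack(centered)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    loadings = u[:, :n_components]                        # genes x components
    scores = s[:n_components, None] * vt[:n_components]   # components x samples
    # Split the sample scores back into one block per input dataset.
    split_points = np.cumsum([m.shape[1] for m in matrices])[:-1]
    return loadings, np.split(scores, split_points, axis=1)

def project(loadings, new_matrix):
    """Project a new gene-matched dataset onto previously learned loadings."""
    centered = new_matrix - new_matrix.mean(axis=1, keepdims=True)
    return loadings.T @ centered                          # components x samples

# Synthetic stand-in data: one shared gene program that rises over "time"
# in three datasets with different sample counts (hypothetical values).
rng = np.random.default_rng(0)
n_genes = 200
program = rng.normal(size=n_genes)

def simulate(n_samples):
    t = np.linspace(0.0, 1.0, n_samples)
    data = np.outer(program, t) + rng.normal(scale=0.5, size=(n_genes, n_samples))
    return data, t

datasets = [simulate(n)[0] for n in (20, 30, 25)]
loadings, scores = joint_decompose(datasets, n_components=2)

# Transfer step: a held-out dataset is projected onto the shared loadings.
held_out, t_new = simulate(40)
projected = project(loadings, held_out)
corr = abs(np.corrcoef(projected[0], t_new)[0, 1])
print(f"|correlation| of projected component 1 with time: {corr:.3f}")
```

In this toy setting the first shared component tracks the simulated time course in the held-out data; the sign of an SVD component is arbitrary, hence the absolute correlation.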
Keyphrases
- single cell
- electronic health record
- big data
- rna seq
- cerebral ischemia
- transcription factor
- cell therapy
- magnetic resonance
- minimally invasive
- gene expression
- single molecule
- artificial intelligence
- machine learning
- data analysis
- blood brain barrier
- heavy metals
- young adults
- deep learning
- dna binding
- molecular docking
- high density
- genome wide analysis