Branching topology of the human embryo transcriptome revealed by entropy sort feature weighting.
Arthur Radley, Stefan Boeing, Austin G Smith. Published in: Development (Cambridge, England) (2024)
Analysis of single-cell transcriptomics (scRNA-seq) data is typically performed after sub-setting to highly variable genes (HVGs). Here we show that Entropy Sorting provides an alternative mathematical framework for feature selection. On synthetic datasets, continuous entropy sort feature weighting (cESFW) outperforms HVG selection in distinguishing cell-state-specific genes. We apply cESFW to six merged scRNA-seq datasets spanning human early embryo development. Without smoothing or augmenting the raw counts matrices, cESFW generates a high-resolution embedding displaying coherent developmental progression from 8-cell to post-implantation stages and delineating 15 distinct cell states. The embedding highlights sequential lineage decisions during blastocyst development, while unsupervised clustering identifies branch-point populations obscured in previous analyses. The first branching region, where morula cells become specified for inner cell mass or trophectoderm, includes cells previously asserted to lack a developmental trajectory. We quantify the relatedness of different pluripotent stem cell cultures to distinct embryo cell types and identify marker genes of naïve and primed pluripotency. Finally, by revealing genes with dynamic lineage-specific expression, we provide markers for staging progression from morula to blastocyst.
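For readers unfamiliar with the conventional baseline the abstract compares against, the following is a minimal sketch of HVG-style selection, assuming a simple Fano-factor (variance divided by mean) ranking. It is not the cESFW method and not the authors' code; the function name `select_hvgs`, the toy matrix, and the gene labels are all hypothetical, and production pipelines typically use more elaborate, mean-normalised dispersion estimates.

```python
from statistics import mean, pvariance


def select_hvgs(counts, gene_names, n_top):
    """Rank genes by Fano factor (variance / mean) and return the top n_top names.

    counts: list of per-cell rows, each a list of per-gene counts.
    This is an illustrative baseline only, not Entropy Sorting / cESFW.
    """
    scores = []
    for j, name in enumerate(gene_names):
        column = [row[j] for row in counts]
        mu = mean(column)
        # Genes with zero mean carry no signal here; score them as 0.
        fano = pvariance(column) / mu if mu > 0 else 0.0
        scores.append((fano, name))
    # Sort by score descending; ties are broken by gene name for determinism.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:n_top]]


# Toy matrix (hypothetical): 4 cells x 3 genes.
toy_counts = [
    [5, 0, 2],
    [5, 10, 2],
    [5, 0, 3],
    [5, 10, 1],
]
toy_genes = ["gene_flat", "gene_bimodal", "gene_noisy"]
top_genes = select_hvgs(toy_counts, toy_genes, 2)
print(top_genes)
```

In this toy case the constantly expressed gene is discarded, while the gene that switches between two expression levels ranks first. A variance-based score of this kind rewards any dispersion, whether or not it reflects cell state, which is the limitation that motivates alternative feature-selection frameworks.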
Keyphrases
- single cell
- rna seq
- genome wide
- high throughput
- stem cells
- machine learning
- cell therapy
- endothelial cells
- induced apoptosis
- high resolution
- dna methylation
- deep learning
- gene expression
- oxidative stress
- bioinformatics analysis
- pregnant women
- long non coding rna
- artificial intelligence
- signaling pathway
- pluripotent stem cells
- mesenchymal stem cells
- binding protein
- pregnancy outcomes
- embryonic stem cells