Branching topology of the human embryo transcriptome revealed by Entropy Sort Feature Weighting.
Arthur Radley, Stefan Boeing, Austin G Smith
Published in: Development (Cambridge, England) (2024)
Analysis of single cell transcriptomics (scRNA-seq) data is typically performed after subsetting to highly variable genes (HVGs). Here, we show that Entropy Sorting provides an alternative mathematical framework for feature selection. On synthetic datasets, continuous Entropy Sort Feature Weighting (cESFW) outperforms HVG selection in distinguishing cell-state-specific genes. We apply cESFW to six merged scRNA-seq datasets spanning human early embryo development. Without smoothing or augmenting the raw counts matrices, cESFW generates a high-resolution embedding displaying coherent developmental progression from eight-cell to post-implantation stages and delineating 15 distinct cell states. The embedding highlights sequential lineage decisions during blastocyst development, while unsupervised clustering identifies branch point populations obscured in previous analyses. The first branching region, where morula cells become specified for inner cell mass or trophectoderm, includes cells previously asserted to lack a developmental trajectory. We quantify the relatedness of different pluripotent stem cell cultures to distinct embryo cell types and identify marker genes of naïve and primed pluripotency. Finally, by revealing genes with dynamic lineage-specific expression, we provide markers for staging progression from morula to blastocyst.
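For orientation, the highly variable gene (HVG) baseline that the abstract contrasts with cESFW can be sketched roughly as below. This is not the authors' cESFW method and not any particular library's HVG routine. It is a minimal illustration that ranks genes by the Fano factor (variance divided by mean), one common dispersion measure. The function name `select_hvgs`, the choice of Fano factor, and the toy Poisson data are all assumptions made for the example.

```python
import numpy as np


def select_hvgs(counts, n_top=2):
    """Return sorted column indices of the n_top genes with the highest
    Fano factor (variance / mean). Hypothetical helper for illustration."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0)
    # Genes with zero mean get a Fano factor of 0 instead of a division error.
    fano = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    order = np.argsort(fano)[::-1]  # highest dispersion first
    return np.sort(order[:n_top])


# Toy data (assumed, not from the paper): 200 cells in two states, 5 genes.
rng = np.random.default_rng(0)
n_cells = 200
state = np.repeat([0, 1], n_cells // 2)

# Genes 0-2: uninformative background with the same rate in every cell.
background = rng.poisson(5.0, size=(n_cells, 3))
# Genes 3-4: state-specific markers, high in one state and low in the other.
marker_a = rng.poisson(np.where(state == 0, 20.0, 1.0))
marker_b = rng.poisson(np.where(state == 1, 20.0, 1.0))

counts = np.column_stack([background, marker_a, marker_b])
hvg_idx = select_hvgs(counts, n_top=2)
print("Selected gene indices:", hvg_idx.tolist())
```

In this toy setting the state-specific genes are also the most dispersed, so a variance-based ranking recovers them. The abstract's claim is that on more realistic synthetic data an entropy-based weighting distinguishes cell-state-specific genes better than this kind of selection.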
Keyphrases
- single cell
- rna seq
- genome wide
- high throughput
- stem cells
- cell therapy
- high resolution
- machine learning
- endothelial cells
- induced apoptosis
- cell cycle arrest
- dna methylation
- poor prognosis
- lymph node
- transcription factor
- pregnant women
- artificial intelligence
- induced pluripotent stem cells
- genome wide identification
- deep learning
- pet ct
- mass spectrometry