Inferring expressed genes by whole-genome sequencing of plasma DNA.
Peter Ulz, Gerhard G Thallinger, Martina Auer, Ricarda Graf, Karl Kashofer, Stephan W Jahn, Luca Abete, Gunda Pristauz, Edgar Petru, Jochen B Geigl, Ellen Heitzer, Michael R Speicher
Published in: Nature Genetics (2016)
The analysis of cell-free DNA (cfDNA) in plasma represents a rapidly advancing field in medicine. cfDNA consists predominantly of nucleosome-protected DNA shed into the bloodstream by cells undergoing apoptosis. We performed whole-genome sequencing of plasma DNA and identified two discrete regions at transcription start sites (TSSs) where nucleosome occupancy results in different read depth coverage patterns for expressed and silent genes. By employing machine learning for gene classification, we found that the plasma DNA read depth patterns from healthy donors reflected the expression signature of hematopoietic cells. In patients with cancer having metastatic disease, we were able to classify expressed cancer driver genes in regions with somatic copy number gains with high accuracy. We were able to determine the expressed isoform of genes with several TSSs, as confirmed by RNA-seq analysis of the matching primary tumor. Our analyses provide functional information about cells releasing their DNA into the circulation.
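The idea of comparing read depth in two regions around a TSS can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the window bounds (`BROAD_WINDOW`, `NARROW_WINDOW`), the normalization by the mean depth of the supplied region, the direction of the effect (lower coverage assumed for expressed genes, as expected where nucleosomes are depleted), and the fixed-threshold rule in `classify`, which is a toy stand-in for the machine learning classifier the authors describe.

```python
from statistics import mean

# Hypothetical window bounds in bp relative to the TSS (illustrative only).
BROAD_WINDOW = (-1000, 1000)
NARROW_WINDOW = (-150, 50)


def window_mean(depth, tss_index, window):
    """Mean read depth over [tss+start, tss+end), clipped to the array bounds."""
    start = max(0, tss_index + window[0])
    end = min(len(depth), tss_index + window[1])
    if start >= end:
        raise ValueError("window falls outside the depth array")
    return mean(depth[start:end])


def tss_features(depth, tss_index):
    """Return (broad, narrow) window means, each divided by the overall mean depth."""
    baseline = mean(depth)
    if baseline == 0:
        raise ValueError("no coverage in region")
    return (
        window_mean(depth, tss_index, BROAD_WINDOW) / baseline,
        window_mean(depth, tss_index, NARROW_WINDOW) / baseline,
    )


def classify(features, threshold=0.8):
    """Toy rule (assumption): 'expressed' when both normalized coverages are below threshold."""
    broad, narrow = features
    return "expressed" if broad < threshold and narrow < threshold else "silent"


# Synthetic per-base depth arrays, not real data.
flat = [10] * 6000                                 # uniform coverage
dip = [10] * 2000 + [5] * 2000 + [10] * 2000       # reduced coverage around the TSS

print(classify(tss_features(flat, 3000)))
print(classify(tss_features(dip, 3000)))
```

In practice the per-base depth would come from aligned plasma sequencing reads, and a trained classifier would replace the fixed threshold; the sketch only shows how two coverage features per TSS could be derived and compared.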
Keyphrases
- copy number
- genome wide
- cell cycle arrest
- single molecule
- circulating tumor
- induced apoptosis
- machine learning
- cell free
- rna seq
- endoplasmic reticulum stress
- mitochondrial dna
- genome wide identification
- cell death
- dna methylation
- bioinformatics analysis
- nucleic acid
- squamous cell carcinoma
- healthcare
- poor prognosis
- pi3k/akt
- artificial intelligence
- binding protein
- papillary thyroid
- health information
- gene expression
- young adults
- lymph node metastasis