Causal inference in genetic trio studies.
Stephen Bates, Matteo Sesia, Chiara Sabatti, Emmanuel Candès
Published in: Proceedings of the National Academy of Sciences of the United States of America (2020)
We introduce a method to draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.
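The idea of comparing an observed offspring with synthetic offspring from the same parents can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes phased parental haplotypes coded as 0/1, a single constant switch probability between adjacent sites as a stand-in for a recombination model, and it resamples the whole chromosome rather than one region conditional on its flanks. All function names and parameters (`sample_transmitted`, `twin_p_value`, `switch_prob`, `max_abs_cov`) are hypothetical.

```python
import numpy as np

def sample_transmitted(hap_pair, switch_prob, rng):
    """Sample one transmitted haplotype from a parent's two haplotypes.

    hap_pair has shape (2, p). Assumption: recombination is a two-state
    Markov chain that switches source haplotype between adjacent sites
    with a constant probability switch_prob.
    """
    p = hap_pair.shape[1]
    switches = rng.random(p) < switch_prob
    switches[0] = False                      # no switch before the first site
    start = rng.integers(2)                  # which haplotype meiosis starts on
    source = (start + np.cumsum(switches)) % 2
    return hap_pair[source, np.arange(p)]

def sample_offspring(mother, father, switch_prob, rng):
    """Synthetic offspring genotype (0, 1, or 2 per site) from two parents."""
    return (sample_transmitted(mother, switch_prob, rng)
            + sample_transmitted(father, switch_prob, rng))

def max_abs_cov(genotypes, phenotype):
    """Example statistic: largest absolute genotype-phenotype covariance.

    Any function of (genotypes, phenotype) could be used here instead.
    """
    centered = genotypes - genotypes.mean(axis=0)
    return np.abs(centered.T @ (phenotype - phenotype.mean())).max()

def twin_p_value(observed, mothers, fathers, phenotype, statistic,
                 n_twins=200, switch_prob=0.01, seed=0):
    """Randomization p-value comparing observed offspring with synthetic ones.

    observed: (n, p) genotypes; mothers, fathers: (n, 2, p) haplotypes.
    """
    rng = np.random.default_rng(seed)
    t_obs = statistic(observed, phenotype)
    exceed = 0
    for _ in range(n_twins):
        twins = np.stack([sample_offspring(m, f, switch_prob, rng)
                          for m, f in zip(mothers, fathers)])
        if statistic(twins, phenotype) >= t_obs:
            exceed += 1
    # The +1 terms count the observed data among the draws, so the
    # p-value is never exactly zero.
    return (1 + exceed) / (1 + n_twins)
```

In this sketch the phenotype enters only through the `statistic` argument, which mirrors the point that validity rests on the inheritance model and not on any assumed genotype-phenotype relationship.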