Multivariate association analysis with somatic mutation data.
Qianchuan He, Yang Liu, Ulrike Peters, Li Hsu
Published in: Biometrics (2017)
Somatic mutations are the driving forces of tumor development, and recent advances in cancer genome sequencing have made it feasible to evaluate the association between somatic mutations and cancer-related traits in large samples. However, despite increasingly large sample sizes, statistical analysis of somatic mutations remains challenging because the vast majority of them occur at very low frequencies. Furthermore, cancer is a complex disease that is often accompanied by multiple traits reflecting various aspects of the disease; how to combine the information from these traits to identify important somatic mutations poses additional challenges. In this article, we introduce a statistical approach, called SOMAT, for detecting somatic mutations associated with multiple cancer-related traits. Our approach provides a flexible framework for analyzing continuous traits, binary traits, or a mixture of both, and is statistically powerful and computationally efficient. In addition, we propose a data-adaptive, grid-search-free procedure for effectively combining test statistics to enhance statistical power. We conduct an extensive study and show that the proposed approach maintains the correct type I error rate and is more powerful than existing approaches under the scenarios considered. We also apply our approach to an exome-sequencing study of liver tumors for illustration.