Methodology for good machine learning with multi-omics data.
Thibaud Coroller, Berkman Sahiner, Anup Amatya, Alexej Gossmann, Konstantinos Karagiannis, Conor Moloney, Ravi K Samala, Luis Santana-Quintero, Nadia Solovieff, Craig Wang, Laleh Amiri-Kordestani, Qian Cao, Kenny H Cha, Rosane Charlab, Frank H Cross, Tingting Hu, Ruihao Huang, Jeffrey Kraft, Peter Krusche, Yutong Li, Zheng Li, Ilya Mazo, Rahul Paul, Susan Schnakenberg, Paolo Serra, Sean Smith, Chi Song, Fei Su, Mohit Tiwari, Colin Vechery, Xin Xiong, Juan Pablo Zarate, Hao Zhu, Arunava Chakravartty, Qi Liu, David Ohlssen, Nicholas Petrick, Julie A Schneider, Mark Walderhaug, Emmanuel Zuber
Published in: Clinical pharmacology and therapeutics (2023)
In 2020, Novartis Pharmaceuticals Corporation and the U.S. Food and Drug Administration (FDA) began a 4-year scientific collaboration to tackle complex new data modalities and advanced analytics. The scientific question, addressed under a Research Collaboration Agreement, was to identify novel radiogenomics-based prognostic and predictive factors for HR+/HER2- metastatic breast cancer. The collaboration has yielded valuable insights for successfully implementing future scientific projects, particularly those using artificial intelligence (AI) and machine learning (ML). This tutorial aims to provide tangible guidelines for multi-omics projects that involve multidisciplinary expert teams spanning different institutions. We cover key ideas such as "maintaining effective communication" and "following good data science practices", followed by the four steps of exploratory projects: 1) plan, 2) design, 3) develop, and 4) disseminate. We break each step into smaller concepts with implementation strategies and illustrate them with examples from our collaboration to give readers actionable guidance.