Relating enhancer genetic variation across mammals to complex phenotypes using machine learning.
Irene M Kaplow, Alyssa J Lawler, Daniel E Schäffer, Chaitanya Srinivasan, Heather H Sestili, Morgan E Wirthlin, BaDoi N Phan, Kavya Prasad, Ashley R Brown, Xiaomeng Zhang, Kathleen Foley, Diane P Genereux, Elinor K Karlsson, Kerstin Lindblad-Toh, Wynn K Meyer, Andreas R Pfenning, Gregory Andrews, Joel C Armstrong, Matteo Bianchi, Bruce W Birren, Kevin R Bredemeyer, Ana M Breit, Matthew J Christmas, Hiram Clawson, Joana Damas, Federica Di Palma, Mark Diekhans, Michael X Dong, Eduardo Eizirik, Kaili Fan, Cornelia Fanter, Nicole M Foley, Karin Forsberg-Nilsson, Carlos J Garcia, John Gatesy, Steven Gazal, Diane P Genereux, Linda Goodman, Jenna Grimshaw, Michaela K Halsey, Andrew J Harris, Glenn Hickey, Michael Hiller, Allyson G Hindle, Robert M Hubley, Graham M Hughes, Jeremy Johnson, David Juan, Irene M Kaplow, Elinor K Karlsson, Kathleen C Keough, Bogdan Kirilenko, Klaus-Peter Koepfli, Jennifer M Korstian, Amanda Kowalczyk, Sergey V Kozyrev, Alyssa J Lawler, Colleen Lawless, Thomas Lehmann, Danielle L Levesque, Harris A Lewin, Xue Li, Abigail Lind, Kerstin Lindblad-Toh, Ava Mackay-Smith, Voichita D Marinescu, Tomas Marques-Bonet, Victor C Mason, Jennifer R S Meadows, Wynn K Meyer, Jill E Moore, Lucas R Moreira, Diana D Moreno-Santillan, Kathleen M Morrill, Gerard Muntané, William J Murphy, Arcadi Navarro, Martin Nweeia, Sylvia Ortmann, Austin Osmanski, Benedict Paten, Nicole S Paulat, Andreas R Pfenning, BaDoi N Phan, Katherine S Pollard, Henry E Pratt, David A Ray, Steven K Reilly, Jeb R Rosen, Irina Ruf, Louise Ryan, Oliver A Ryder, Pardis C Sabeti, Daniel E Schäffer, Aitor Serres, Beth Shapiro, Arian F A Smit, Mark Springer, Chaitanya Srinivasan, Cynthia Steiner, Jessica M Storer, Kevin A M Sullivan, Patrick F Sullivan, Elisabeth Sundström, Megan A Supple, Ross Swofford, Joy-El Talbot, Emma Teeling, Jason Turner-Maier, Alejandro Valenzuela, Franziska Wagner, Ola Wallerman, Chao Wang, Juehan Wang, Zhiping Weng, Aryn P Wilder, Morgan E Wirthlin, James R Xue, Xiaomeng Zhang
Published in: Science (New York, N.Y.) (2023)
Protein-coding differences between species often fail to explain phenotypic diversity, suggesting the involvement of genomic elements that regulate gene expression, such as enhancers. Identifying associations between enhancers and phenotypes is challenging because enhancer activity can be tissue-dependent and functionally conserved despite low sequence conservation. We developed the Tissue-Aware Conservation Inference Toolkit (TACIT) to associate candidate enhancers with species' phenotypes using predictions from machine learning models trained on specific tissues. Applying TACIT to associate motor cortex and parvalbumin-positive interneuron enhancers with neurological phenotypes revealed dozens of enhancer-phenotype associations, including brain size-associated enhancers that interact with genes implicated in microcephaly or macrocephaly. TACIT provides a foundation for identifying enhancers associated with the evolution of any convergently evolved phenotype in any large group of species with aligned genomes.
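As a rough illustration of the kind of enhancer-phenotype association the abstract describes, the sketch below correlates hypothetical per-species predicted enhancer activity with a hypothetical phenotype value and assesses the correlation with a plain permutation test. The data values, the function names `pearson` and `permutation_pvalue`, and the choice of Pearson correlation are all assumptions made for illustration. In particular, this simplified test treats species as independent and ignores their phylogenetic relatedness, so it should not be read as TACIT's actual procedure.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def permutation_pvalue(activity, phenotype, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the activity-phenotype correlation.

    Assumption: species are exchangeable. Real cross-species data violate
    this because related species share ancestry.
    """
    rng = random.Random(seed)
    observed = abs(pearson(activity, phenotype))
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(activity, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical inputs: one model-predicted activity score per species for a
# single candidate enhancer, and one phenotype value per species.
predicted_activity = [0.91, 0.85, 0.78, 0.40, 0.33, 0.20, 0.15, 0.10]
phenotype = [2.1, 1.9, 1.7, 1.0, 0.9, 0.6, 0.5, 0.4]

r = pearson(predicted_activity, phenotype)
p = permutation_pvalue(predicted_activity, phenotype)
print(f"correlation = {r:.3f}, permutation p-value = {p:.4f}")
```

Repeating such a test across many candidate enhancers would also call for multiple-testing correction, which this sketch omits.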
Keyphrases
- gene expression
- transcription factor
- machine learning
- binding protein
- zika virus
- single cell
- functional connectivity
- intellectual disability
- resting state
- multiple sclerosis
- blood brain barrier
- body composition
- autism spectrum disorder
- resistance training
- deep learning
- subarachnoid hemorrhage
- bioinformatics analysis