Ribonanza: deep learning of RNA structure through dual crowdsourcing.

Shujun He, Rui Huang, Jill Townley, Rachael C Kretsch, Thomas G Karagianes, David B T Cox, Hamish Blair, Dmitry Penzar, Valeriy Vyaltsev, Elizaveta Aristova, Arsenii Zinkevich, Artemy Bakulin, Hoyeol Sohn, Daniel Krstevski, Takaaki Fukui, Fumiya Tatematsu, Yusuke Uchida, Donghoon Jang, Jun Seong Lee, Roger Shieh, Tom Ma, Eduard Martynov, Maxim V Shugaev, Habib S T Bukhari, Kazuki Fujikawa, Kazuki Onodera, Christof Henkel, Shlomo Ron, Jonathan Romano, John J Nicol, Grace P Nye, Yuan Wu, Christian A Choe, Walter Reade, Rachel J Hagey
Published in: bioRxiv : the preprint server for biology (2024)
Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.
Keyphrases
  • deep learning
  • nucleic acid
  • machine learning
  • air pollution
  • artificial intelligence
  • big data
  • virtual reality