Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.

Francisco J Pardo-Palacios, Dingjie Wang, Fairlie Reese, Mark E Diekhans, Silvia Carbonell-Sala, Brian A Williams, Jane E Loveland, Maite De María, Matthew S Adams, Gabriela Balderrama-Gutierrez, Amit K Behera, Jose Manuel Gonzalez, Toby Hunt, Julien Lagarde, Cindy E Liang, Haoran Li, Marcus Jerryd Meade, David A Moraga Amador, Andrey D Prjibelski, Inanc Birol, Hamed Bostan, Ashley M Brooks, Muhammed Hasan Çelik, Ying Chen, Mei R M Du, Colette Felton, Jonathan Goke, Saber Hafezqorani, Ralf Herwig, Hideya Kawaji, Joseph Lee, Jian-Liang Li, Matthias Lienhard, Alla Mikheenko, Dennis Mulligan, Ka Ming Nip, Mihaela Pertea, Matthew E Ritchie, Andre D Sim, Alison D Tang, Yuk Kei Wan, Changqing Wang, Brandon Y Wong, Chen Yang, If Barnes, Andrew E Berry, Salvador Capella, Alyssa Cousineau, Namrita Dhillon, José María Fernandez, Luis Ferrández-Peral, Natàlia Garcia-Reyero, Stefan Götz, Carles Hernandez-Ferrer, Liudmyla Kondratova, Tianyuan Liu, Alessandra Martinez-Martin, Carlos Menor, Jorge Mestre-Tomás, Jonathan M Mudge, Nedka G Panayotova, Alejandro Paniagua, Dmitry Repchevsky, Xingjie Ren, Eric C Rouchka, Brandon Saint-John, Enrique Sapena, Leon Sheynkman, Melissa Laird Smith, Marie-Marthe Suner, Hazuki Takahashi, Ingrid A Youngworth, Piero Carnici, Nancy D Denslow, Roderic Guigo, Margaret E Hunter, René Maehr, Yin Shen, Hagen U Tilgner, Barbara J Wold, Christopher Vollmers, Adam Frankish, Kin Fai Au, Gloria M Sheynkman, Ali Mortazavi, Ana Conesa, Angela N Brooks

Published in: Nature Methods (2024)

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
