Systematic assessment of long-read RNA-seq methods for transcript identification and quantification.
Francisco J Pardo-Palacios, Dingjie Wang, Fairlie Reese, Mark E Diekhans, Silvia Carbonell-Sala, Brian A Williams, Jane E Loveland, Maite De María, Matthew S Adams, Gabriela Balderrama-Gutierrez, Amit K Behera, Jose Manuel Gonzalez, Toby Hunt, Julien Lagarde, Cindy E Liang, Haoran Li, Marcus Jerryd Meade, David A Moraga Amador, Andrey D Prjibelski, Inanc Birol, Hamed Bostan, Ashley M Brooks, Muhammed Hasan Çelik, Ying Chen, Mei R M Du, Colette Felton, Jonathan Goke, Saber Hafezqorani, Ralf Herwig, Hideya Kawaji, Joseph Lee, Jian-Liang Li, Matthias Lienhard, Alla Mikheenko, Dennis Mulligan, Ka Ming Nip, Mihaela Pertea, Matthew E Ritchie, Andre D Sim, Alison D Tang, Yuk Kei Wan, Changqing Wang, Brandon Y Wong, Chen Yang, If Barnes, Andrew E Berry, Salvador Capella, Alyssa Cousineau, Namrita Dhillon, José María Fernandez, Luis Ferrández-Peral, Natàlia Garcia-Reyero, Stefan Götz, Carles Hernandez-Ferrer, Liudmyla Kondratova, Tianyuan Liu, Alessandra Martinez-Martin, Carlos Menor, Jorge Mestre-Tomás, Jonathan M Mudge, Nedka G Panayotova, Alejandro Paniagua, Dmitry Repchevsky, Xingjie Ren, Eric C Rouchka, Brandon Saint-John, Enrique Sapena, Leon Sheynkman, Melissa Laird Smith, Marie-Marthe Suner, Hazuki Takahashi, Ingrid A Youngworth, Piero Carnici, Nancy D Denslow, Roderic Guigo, Margaret E Hunter, René Maehr, Yin Shen, Hagen U Tilgner, Barbara J Wold, Christopher Vollmers, Adam Frankish, Kin Fai Au, Gloria M Sheynkman, Ali Mortazavi, Ana Conesa, Angela N Brooks
Published in: Nature methods (2024)
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.