Restrander: rapid orientation and artefact removal for long-read cDNA data.
Jakob SchusterMatthew E RitchieQuentin A GouilPublished in: NAR genomics and bioinformatics (2023)
In transcriptomic analyses, it is helpful to keep track of the strand of the RNA molecules. However, the Oxford Nanopore long-read cDNA sequencing protocols generate reads that correspond to either the first or second-strand cDNA, therefore the strandedness of the initial transcript has to be inferred bioinformatically. Reverse transcription and PCR can also introduce artefacts which should be flagged in data pre-processing. Here we introduce Restrander , a lightning-fast and highly accurate tool for restranding and removing artefacts in long-read cDNA sequencing data. Thanks to its C++ implementation, Restrander was faster than Oxford Nanopore Technologies' existing tool Pychopper , and correctly restranded more reads due to its strategy of searching for polyA/T tails in addition to primer sequences from the reverse transcription and template-switch steps. We found that restranding improved the process of visualising and exploring data, and increased the number of novel isoforms discovered by bambu , particularly in regions where sense and anti-sense transcripts co-occur. The artefact detection implemented in Restrander quantifies reads lacking the correct 5' and 3' ends, a useful feature in quality control for library preparation. Restrander is pre-configured for all major cDNA protocols, and can be customised with user-defined primers. Restrander is available at https://github.com/mritchielab/restrander.