Generic Repeat Finder: A High-Sensitivity Tool for Genome-Wide De Novo Repeat Detection.
Jieming ShiChun LiangPublished in: Plant physiology (2019)
Comprehensive and accurate annotation of the repeatome, including transposons, is critical for deepening our understanding of repeat origins, biogenesis, regulatory mechanisms, and roles. Here, we developed Generic Repeat Finder (GRF), a tool for genome-wide repeat detection based on fast, exhaustive numerical calculation algorithms integrated with optimized dynamic programming strategies. GRF sensitively identifies terminal inverted repeats (TIRs), terminal direct repeats (TDRs), and interspersed repeats that bear both inverted and direct repeats. GRF also detects DNA or RNA transposable elements characterized by these repeats in plant and animal genomes. For TIRs and TDRs, GRF identifies spacers in the middle and mismatches/insertions or deletions in terminal repeats, showing their alignment or base-pairing information. GRF helps improve the annotation for various DNA transposons and retrotransposons, such as miniature inverted-repeat transposable elements (MITEs), long terminal repeat (LTR) retrotransposons, and non-LTR retrotransposons, including long interspersed nuclear elements and short interspersed nuclear elements in plants. We used GRF to perform TIR/TDR, interspersed-repeat, and MITE detection in several species, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and mouse (Mus musculus). As a generic bioinformatics tool in repeat finding implemented as a parallelized C++ program, GRF was faster and more sensitive than the existing inverted repeat/MITE detection tools based on numerical approaches (i.e. detectIR and detectMITE) in Arabidopsis and mouse. GRF is more sensitive than Inverted Repeat Finder in TIR detection, LTR_FINDER in short TDR detection (≤1,000 nt), and phRAIDER in interspersed repeat detection in Arabidopsis and rice. GRF is an open source available from Github.