Deriving genotypes from RAD-seq short-read data using Stacks.
Nicolas C Rochette, Julian M Catchen
Published in: Nature Protocols (2017)
Restriction site-associated DNA sequencing (RAD-seq) allows for the genome-wide discovery and genotyping of single-nucleotide polymorphisms in hundreds of individuals at a time in model and nonmodel species alike. However, converting short-read sequencing data into reliable genotype data remains a nontrivial task, especially as RAD-seq is used in systems that have very diverse genomic properties. Here, we present a protocol to analyze RAD-seq data using the Stacks pipeline. This protocol will be of use in areas such as ecology and population genetics. It covers the assessment and demultiplexing of the sequencing data, read mapping, inference of RAD loci, genotype calling, and filtering of the output data, as well as providing two simple examples of downstream biological analyses. We place special emphasis on checking the soundness of the procedure and choosing the main parameters, given the properties of the data. The procedure can be completed in 1 week, but determining definitive methodological choices will typically take up to 1 month.
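As an illustration of the demultiplexing step mentioned in the abstract, the following is a minimal Python sketch that assigns reads to samples by exact match of an inline barcode at the start of each read. It is not the Stacks implementation and does not reproduce the protocol's commands. The function name `demultiplex`, the input layout, and the example barcodes are all hypothetical, and the sketch assumes that no barcode is a prefix of another and that no sequencing errors occur in the barcode.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample by exact match of its leading barcode.

    reads:    iterable of (read_name, sequence) tuples (hypothetical layout).
    barcodes: dict mapping barcode string -> sample name (hypothetical).
    Returns (per_sample, unassigned); the barcode is trimmed from assigned reads.
    """
    per_sample = defaultdict(list)
    unassigned = []
    for name, seq in reads:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                # Keep the read without its barcode prefix.
                per_sample[sample].append((name, seq[len(barcode):]))
                break
        else:
            # No barcode matched: keep the read aside for inspection.
            unassigned.append((name, seq))
    return dict(per_sample), unassigned


# Toy example with made-up reads and barcodes.
reads = [
    ("r1", "ACGTTGCAGGAT"),
    ("r2", "TTAGTGCAGGCC"),
    ("r3", "GGGGTGCAGGCC"),
]
barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}
per_sample, unassigned = demultiplex(reads, barcodes)
```

Tracking the number of unassigned reads per run is one simple way to check the soundness of this step, in the spirit of the checks the abstract emphasizes.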
Keyphrases
- genome wide
- single cell
- electronic health record
- big data
- dna damage
- dna repair
- rna seq
- dna methylation
- single molecule
- machine learning
- data analysis
- minimally invasive
- small molecule
- gene expression
- radiation therapy
- high resolution
- artificial intelligence
- copy number
- cell free
- mass spectrometry
- high density
- circulating tumor
- rectal cancer