High-throughput sequencing on preservative ethanol is effective at jointly examining infraspecific and taxonomic diversity, although bioinformatics pipelines do not perform equally.
Marjorie Couton, Aurélien Baud, Claire Daguin-Thiébaut, Erwan Corre, Thierry Comtet, Frédérique Viard. Published in: Ecology and Evolution (2021)
High-throughput sequencing of amplicons (HTSA) has been proposed as an effective approach to evaluate taxonomic and genetic diversity at the same time. However, it remains uncertain how the results produced by different bioinformatics treatments affect the conclusions drawn from biodiversity and population genetics indices.
We evaluated the ability of six bioinformatics pipelines to recover taxonomic and genetic diversity from HTSA data obtained from controlled assemblages. To that end, 20 assemblages were produced using 354 colonies of Botrylloides spp., sampled in the wild in ten marinas around Brittany (France). We used DNA extracted from preservative ethanol (ebDNA) after various storage times (3, 6, and 12 months), and from a bulk of preserved specimens (bulkDNA). DNA was amplified with primers designed to target this ascidian genus. Results obtained from HTSA data were compared with Sanger sequencing of individual zooids (i.e., individual barcoding).
Species identification and relative abundance determined with HTSA data from either ebDNA or bulkDNA were similar to those obtained with traditional individual barcoding. However, after 12 months of storage, the correlation between HTSA and individual-based data was lower than after shorter durations. The six bioinformatics pipelines accurately depicted genetic diversity using standard population genetics indices (HS and FST), despite producing false positives and missing rare haplotypes. However, they did not perform equally, and dada2 was the only pipeline able to retrieve all expected haplotypes.
This study showed that ebDNA is a nondestructive alternative for both species identification and haplotype recovery, provided that storage does not last more than 6 months before DNA extraction. Choosing the bioinformatics pipeline is a matter of compromise, aiming to retrieve all true haplotypes while avoiding false positives.
We recommend processing HTSA data with dada2, including a chimera-removal step. Although the use of multiplexed primer sets deserves further investigation to expand taxonomic coverage in future similar studies, we showed that primers targeting a particular genus allowed this genus to be reliably analyzed within a complex community.
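To make the two population genetics indices named above more concrete, here is a minimal Python sketch of how HS and FST could be computed from haplotype counts per assemblage. The abstract does not state which estimators the authors used, so the formulas below are assumptions: Nei's sample-size-corrected gene diversity for HS, and the simple ratio FST = (HT - HS) / HT with uncorrected, equally weighted populations. The function names `gene_diversity` and `fst` are hypothetical, and treating read counts as if they were sampled gene copies is itself a simplification.

```python
def gene_diversity(counts):
    """Nei's gene diversity with sample-size correction:
    n / (n - 1) * (1 - sum(p_i^2)), where p_i are haplotype frequencies.
    `counts` maps haplotype name -> number of copies (assumed input format)."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled copies")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def fst(populations):
    """FST = (HT - HS) / HT using uncorrected diversities and giving each
    population equal weight (an assumption, not the paper's stated method).
    `populations` is a list of haplotype -> count mappings."""
    if not populations:
        raise ValueError("need at least one population")
    haplotypes = set().union(*populations)
    freqs = []
    for counts in populations:
        n = sum(counts.values())
        freqs.append({h: counts.get(h, 0) / n for h in haplotypes})
    # HS: mean within-population diversity.
    hs = sum(1.0 - sum(p * p for p in f.values()) for f in freqs) / len(freqs)
    # HT: diversity of the pooled (mean) haplotype frequencies.
    mean = {h: sum(f[h] for f in freqs) / len(freqs) for h in haplotypes}
    ht = 1.0 - sum(p * p for p in mean.values())
    return 0.0 if ht == 0 else (ht - hs) / ht


if __name__ == "__main__":
    # Made-up counts for two assemblages, for illustration only.
    marina_a = {"hap1": 8, "hap2": 2}
    marina_b = {"hap1": 3, "hap2": 7}
    print(gene_diversity(marina_a), fst([marina_a, marina_b]))
```

Under these assumptions, a false-positive haplotype inflates HS slightly, while a missed rare haplotype lowers it slightly, which is one way to see why the indices can stay close to the expected values even when a pipeline's haplotype list is imperfect.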