Login / Signup

Lessons from assembling UCEs: A comparison of common methods and the case of Clavinomia (Halictidae).

Silas BossertAlain PaulyBryan N DanforthMichael C OrrElizabeth A Murray
Published in: Molecular ecology resources (2024)
Sequence data assembly is a foundational step in high-throughput sequencing, with untold consequences for downstream analyses. Despite this, few studies have interrogated the many methods for assembling phylogenomic UCE data for their comparative efficacy, or for how outputs may be impacted. We study this by comparing the most commonly used assembly methods for UCEs in the under-studied bee lineage Nomiinae and a representative sampling of relatives. Data for 63 UCE-only and 75 mixed taxa were assembled with five methods, including ABySS, HybPiper, SPAdes, Trinity and Velvet, and then benchmarked for their relative performance in terms of locus capture parameters and phylogenetic reconstruction. Unexpectedly, Trinity and Velvet trailed the other methods in terms of locus capture and DNA matrix density, whereas SPAdes performed favourably in most assessed metrics. In comparison with SPAdes, the guided-assembly approach HybPiper generally recovered the highest quality loci but in lower numbers. Based on our results, we formally move Clavinomia to Dieunomiini and render Epinomia once more a subgenus of Dieunomia. We strongly advise that future studies more closely examine the influence of assembly approach on their results, or, minimally, use better-performing assembly methods such as SPAdes or HybPiper. In this way, we can move forward with phylogenomic studies in a more standardized, comparable manner.
Keyphrases
  • electronic health record
  • big data
  • gene expression
  • high throughput sequencing
  • machine learning
  • genome wide
  • artificial intelligence