
Towards a better future for DNA barcoding: Evaluating monophyly- and distance-based species identification using COI gene fragments of Dacini fruit flies.

Camiel Doorenweerd, Michael San Jose, Luc Leblanc, Norman Barr, Scott M Geib, Arthur Y C Chung, Julian R Dupuis, Arni Ekayanti, Elaida Fiegalan, Kennantudawage S Hemachandra, Mohammad Aftab Hossain, Chia-Lung Huang, Yu-Feng Hsu, Kimberly Y Morris, Andi Maryani A Mustapeng, Jerome Niogret, Thai Hong Pham, Nhien Thi Nguyen, Uda G A I Sirisena, Terrence Todd, Daniel Rubinoff
Published in: Molecular Ecology Resources (2024)
A universal DNA 'barcode' fragment (658 base pairs of the cytochrome c oxidase subunit I [COI] gene) has been established as a useful tool for species identification, but has been widely criticized as a tool for understanding the evolutionary history of a group. Large amounts of COI sequence data have been produced that hold promise for rapid species identification, for example in biosecurity. The fruit fly tribe Dacini comprises about a thousand species, of which 80 are pests of economic concern. Using circular consensus sequencing, we generated a COI reference library for 265 species of Dacini containing 5601 sequences that span most of the COI gene. We compared distance metrics with monophyly assessments for species identification. Although we found a 'soft' barcode gap around 2% pairwise distance, the exceptions to this rule dictate that a monophyly assessment is the only reliable method for species identification. We found that all fragments longer than 450 base pairs that are regularly used for Dacini fruit fly identification provide similar resolution. In our dataset, 11.3% of the species were non-monophyletic in a COI tree, mostly because of species complexes. We conclude with recommendations for the future generation and use of COI libraries. We revise the generic assignment of Dacus transversus stat. rev. Hardy 1982 and Dacus perpusillus stat. rev. Drew 1971, and we establish Dacus maculipterus White 1998 syn. nov. as a junior synonym of Dacus satanas Liang et al. 1993.
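The abstract contrasts two identification criteria: a distance threshold (the 'soft' barcode gap near 2% pairwise distance) and monophyly of a species in a COI tree. The sketch below illustrates both on toy data. It is not the authors' pipeline, and several choices are assumptions: distances are uncorrected p-distances on pre-aligned sequences (the abstract does not state which distance model was used), the tree is a rooted tree written as nested tuples of tip labels, and all function names are hypothetical.

```python
from itertools import combinations

VALID = set("ACGT")  # unambiguous bases; gaps and IUPAC ambiguity codes are skipped


def p_distance(a: str, b: str) -> float:
    """Uncorrected pairwise distance between two aligned sequences (assumed model)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            diffs += x != y  # bool counts as 0 or 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def barcode_gap(seqs: dict[str, str], species_of: dict[str, str]) -> dict[str, tuple[float, float]]:
    """Per species: (max intraspecific distance, min interspecific distance).

    A species shows a barcode gap when the first value is below the second.
    """
    intra: dict[str, float] = {}
    inter: dict[str, float] = {}
    for i, j in combinations(seqs, 2):
        d = p_distance(seqs[i], seqs[j])
        si, sj = species_of[i], species_of[j]
        if si == sj:
            intra[si] = max(intra.get(si, 0.0), d)
        else:
            for s in (si, sj):
                inter[s] = min(inter.get(s, float("inf")), d)
    return {s: (intra.get(s, 0.0), inter.get(s, float("inf")))
            for s in set(species_of[k] for k in seqs)}


def closest_match(query: str, seqs: dict[str, str], species_of: dict[str, str],
                  threshold: float = 0.02):
    """Distance-based ID: species of the nearest reference, or None beyond the threshold."""
    best = min(seqs, key=lambda k: p_distance(query, seqs[k]))
    if p_distance(query, seqs[best]) <= threshold:
        return species_of[best]
    return None


def clades(tree) -> list[frozenset]:
    """Leaf sets of every clade in a rooted nested-tuple tree; the last entry is the root clade."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found: list[frozenset] = []
    own: set[str] = set()
    for child in tree:
        child_clades = clades(child)
        found.extend(child_clades)
        own |= child_clades[-1]  # the child's own leaf set is its last entry
    found.append(frozenset(own))
    return found


def is_monophyletic(tree, tips: set[str]) -> bool:
    """True if the given tips form exactly one clade of the rooted tree."""
    return frozenset(tips) in clades(tree)
```

For example, with tips `a1` and `a2` of one species, `is_monophyletic((("a1", "b1"), ("a2", "b2")), {"a1", "a2"})` is false even if both sequences fall within 2% of each other, which is the kind of case where a distance threshold alone can mislead. In practice the tree would come from a phylogenetic inference program; the nested-tuple form only keeps the sketch dependency-free.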