Mge-cluster: a reference-free approach for typing bacterial plasmids.
Sergio Arredondo-Alonso, Rebecca A Gladstone, Anna K Pöntinen, João A Gama, Anita C Schürch, Val F Lanza, Pål Jarle Johnsen, Ørjan Samuelsen, Gerry Q Tonkin-Hill, Jukka Corander
Published in: NAR genomics and bioinformatics (2023)
Extrachromosomal elements of bacterial cells such as plasmids are notorious for their importance in evolution and adaptation to changing ecology. However, high-resolution population-wide analysis of plasmids has only become accessible recently with the advent of scalable long-read sequencing technology. Current typing methods for the classification of plasmids remain limited in their scope, which motivated us to develop a computationally efficient approach to simultaneously recognize novel types and classify plasmids into previously identified groups. Here, we introduce mge-cluster, which can easily handle thousands of input sequences that are compressed using a unitig representation in a de Bruijn graph. Our approach offers a faster runtime than existing algorithms, with moderate memory usage, and enables an intuitive visualization, classification and clustering scheme that users can explore interactively within a single framework. The mge-cluster platform for plasmid analysis can be easily distributed and replicated, enabling a consistent labelling of plasmids across past, present, and future sequence collections. We underscore the advantages of our approach by analysing a population-wide plasmid data set obtained from the opportunistic pathogen Escherichia coli, studying the prevalence of the colistin resistance gene mcr-1.1 within the plasmid population, and describing an instance of resistance plasmid transmission within a hospital environment.
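To give a rough feel for what comparing sequences through shared graph-derived features can look like, the sketch below builds a presence/absence matrix over plain k-mers and computes pairwise Jaccard distances. It is an illustration under stated assumptions, not mge-cluster's implementation: k-mers stand in for unitigs (a unitig is a non-branching path of k-mers in a de Bruijn graph), reverse complements are ignored, and the function names, the toy sequences and the choice of k=4 are all hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of k-mers (length-k substrings) found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence(seqs, k):
    """Build a binary matrix: one row per sequence, one column per k-mer.

    Assumption: plain k-mers are used as a simplified stand-in for unitigs.
    """
    sets = {name: kmers(s, k) for name, s in seqs.items()}
    features = sorted(set().union(*sets.values()))
    matrix = {name: [int(f in ks) for f in features] for name, ks in sets.items()}
    return features, matrix


def jaccard_distance(a, b):
    """Jaccard distance between two binary presence/absence rows."""
    shared = sum(x & y for x, y in zip(a, b))
    total = sum(x | y for x, y in zip(a, b))
    return 1.0 - shared / total if total else 0.0


# Hypothetical toy "plasmids": p1 and p2 are identical, p3 shares no 4-mer with them.
toy = {"p1": "ACGTACGTGA", "p2": "ACGTACGTGA", "p3": "TTTTGGGGCC"}
features, matrix = presence_absence(toy, k=4)
distances = {
    (a, b): jaccard_distance(matrix[a], matrix[b])
    for a, b in combinations(sorted(matrix), 2)
}
```

A matrix of this kind could then be passed to a dimensionality-reduction or clustering step; which methods are used for that in the published tool is not stated in the abstract, so none is shown here.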
Keyphrases
- escherichia coli
- klebsiella pneumoniae
- machine learning
- deep learning
- high resolution
- biofilm formation
- induced apoptosis
- single cell
- healthcare
- mass spectrometry
- risk factors
- emergency department
- cell proliferation
- genetic diversity
- pseudomonas aeruginosa
- rna seq
- copy number
- gene expression
- high intensity
- signaling pathway
- adverse drug
- acinetobacter baumannii
- data analysis
- liquid chromatography