Large genomic deletions delineate Mycobacterium tuberculosis L4 sublineages in South American countries.
Andres Baena, Felipe Cabarcas, Juan C Ocampo, Luis F Barrera, Juan Fernando Alzate. Published in: PLOS ONE (2023)
Mycobacterium tuberculosis (Mtb) remains one of the primary human pathogens, causing tuberculosis (TB). Mtb comprises nine well-defined phylogenetic lineages with biological and geographical differences. Lineage L4 is the most globally widespread of all lineages and was introduced to the Americas with European colonization. Taking advantage of the many genome projects available in public repositories, we undertook an evolutionary and comparative genomic analysis of 522 Latin American L4 Mtb genomes. First, we performed careful quality control of public read datasets and applied several thresholds to filter out low-quality data. Using a de novo genome assembly strategy and phylogenomic methods, we identified novel South American clades that had not been described before. Additionally, we describe the genomic deletion profiles of these strains from an evolutionary perspective and report signature-like gene deletions for Mtb L4 sublineages, some of them novel. The first novel deletion spans 6.5 kbp and is present only in sublineage 4.1.2.1. It affects a complex group of 10 genes whose putative products are annotated as, among others, a lipoprotein, a transmembrane protein, and toxin/antitoxin system proteins. The second novel deletion spans 4.9 kbp, is specific to a particular clade of sublineage 4.8, and affects 7 genes. The last novel deletion spans 4.8 kbp, affects 4 genes, and is specific to some strains within sublineage 4.1.2.1 found in Colombia, Peru, and Brazil.
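This is a minimal sketch of what "specific to a sublineage" means operationally. It is not the authors' pipeline. It assumes deletions have already been called for each strain and stored as a set of deletion IDs, and that each strain carries a sublineage label. The function name `signature_deletions` and all strain and deletion IDs are hypothetical. A deletion counts as a signature if every strain in the target group carries it and no strain outside the group does.

```python
from typing import Dict, List, Set


def signature_deletions(
    deletions: Dict[str, Set[str]],  # hypothetical: strain ID -> deletion IDs called in that strain
    sublineage: Dict[str, str],      # hypothetical: strain ID -> sublineage label, e.g. "4.1.2.1"
    target: str,
) -> List[str]:
    """Return deletions present in every target-group strain and absent from all others."""
    inside = [s for s in deletions if sublineage.get(s) == target]
    outside = [s for s in deletions if sublineage.get(s) != target]
    if not inside:
        return []
    # Keep only deletions shared by every strain in the target group (a new set).
    shared = set.intersection(*(deletions[s] for s in inside))
    # Discard any deletion that also occurs in a strain outside the group.
    for s in outside:
        shared -= deletions[s]
    return sorted(shared)


# Toy, made-up data for illustration only.
toy_deletions = {"s1": {"delA", "delB"}, "s2": {"delA"}, "s3": {"delB"}}
toy_labels = {"s1": "4.1.2.1", "s2": "4.1.2.1", "s3": "4.8"}
print(signature_deletions(toy_deletions, toy_labels, "4.1.2.1"))  # ['delA']
```

The all-in/none-out rule is deliberately strict. A deletion found in only some strains of a sublineage, such as the third one described above, would be detected by passing a finer clade label as `target` instead of the sublineage label.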