The Essential Genome of Escherichia coli K-12.
Emily C A Goodall, Ashley Robinson, Iain G Johnston, Sara Jabbari, Keith A Turner, Adam F Cunningham, Peter A Lund, Jeffrey A Cole, Ian R Henderson
Published in: mBio (2018)
Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. The TraDIS method has not yet been benchmarked against such libraries, so it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provide new insights into bacterial physiology and biochemistry.

IMPORTANCE Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in Escherichia coli, we constructed a transposon mutant library of unprecedented density. Initial automated analysis of the resulting data revealed many discrepancies compared to the literature. We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone, and provides a new standard for the analysis of TraDIS data.
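The abstract mentions corrections for gene length and genome length without giving formulas. The minimal sketch below illustrates why such corrections matter; it is not the authors' pipeline. It assumes transposon insertions land uniformly at random across the genome (a simplifying Poisson model), and the function names and all numbers are hypothetical.

```python
import math


def insertion_index(n_insertions: int, gene_length_bp: int) -> float:
    """Insertions per base pair of a gene (normalises for gene length)."""
    return n_insertions / gene_length_bp


def prob_insertion_free(region_length_bp: int,
                        total_insertions: int,
                        genome_length_bp: int) -> float:
    """Chance that a region of the given length carries no insertion at all,
    assuming insertions are uniformly random (Poisson approximation).

    A short gene in a sparse library can be insertion-free by chance alone,
    which is one way a non-essential gene gets called essential.
    """
    density = total_insertions / genome_length_bp  # insertions per bp, genome-wide
    return math.exp(-density * region_length_bp)


if __name__ == "__main__":
    # Hypothetical values chosen for illustration only.
    genome_bp = 4_600_000
    for library_size in (10_000, 100_000, 1_000_000):
        p = prob_insertion_free(300, library_size, genome_bp)
        print(f"{library_size:>9} insertions: P(300 bp gene untouched) = {p:.3g}")
```

Under this assumption, the probability that a short gene escapes insertion purely by chance falls steeply as library density rises, which is consistent with the abstract's emphasis on a high-density library and length-aware statistics.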
Keyphrases
- genome wide
- genome wide identification
- bioinformatics analysis
- Escherichia coli
- high throughput
- genome wide analysis
- data analysis
- high density