Improving the Accuracy of Bulk Fitness Assays by Correcting Barcode Processing Biases.
Ryan Seamus McGee, Grant Kinsler, Dmitri A Petrov, Mikhail Tikhonov
Published in: Molecular Biology and Evolution (2024)
Measuring the fitnesses of genetic variants is a fundamental objective in evolutionary biology. A standard approach for measuring microbial fitnesses in bulk involves labeling a library of genetic variants with unique sequence barcodes, competing the labeled strains in batch culture, and using deep sequencing to track changes in the barcode abundances over time. However, idiosyncratic properties of barcodes can induce nonuniform amplification or uneven sequencing coverage that causes some barcodes to be over- or under-represented in samples. This systematic bias can result in erroneous read count trajectories and misestimates of fitness. Here, we develop a computational method, named REBAR (Removing the Effects of Bias through Analysis of Residuals), for inferring the effects of barcode processing bias by leveraging the structure of systematic deviations in the data. We illustrate this approach by applying it to two independent data sets, and demonstrate that this method estimates and corrects for bias more accurately than standard proxies, such as GC-based corrections. REBAR mitigates bias and improves fitness estimates in high-throughput assays without introducing additional complexity to the experimental protocols, with potential applications in a range of experimental evolution and mutation screening contexts.
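As a rough illustration of the standard bulk approach described above (not of REBAR itself, whose algorithm the abstract does not specify), the sketch below estimates each barcode's relative fitness as the slope of its log frequency ratio against a reference barcode over time. The function name `estimate_fitness`, the choice of a single reference barcode, the pseudocount, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def estimate_fitness(counts, times, ref_index=0, pseudocount=0.5):
    """Hypothetical helper: per-barcode fitness relative to a reference barcode.

    counts: array of shape (n_barcodes, n_timepoints) with raw read counts.
    times:  array of shape (n_timepoints,), e.g. generations elapsed.
    """
    # A pseudocount (an assumption here) avoids log(0) for unobserved barcodes.
    counts = np.asarray(counts, dtype=float) + pseudocount
    times = np.asarray(times, dtype=float)

    # Convert read counts to within-sample frequencies.
    freqs = counts / counts.sum(axis=0, keepdims=True)

    # Log frequency ratio of every barcode to the reference barcode.
    log_ratio = np.log(freqs) - np.log(freqs[ref_index])

    # Ordinary least-squares slope of log_ratio against time, per barcode.
    t_centered = times - times.mean()
    y_centered = log_ratio - log_ratio.mean(axis=1, keepdims=True)
    return (y_centered @ t_centered) / (t_centered @ t_centered)

# Illustrative, made-up data: three barcodes growing exponentially without noise.
times = np.array([0.0, 8.0, 16.0, 24.0])
true_fitness = np.array([0.0, 0.05, -0.03])
counts = 1000.0 * np.exp(np.outer(true_fitness, times))
estimates = estimate_fitness(counts, times, ref_index=0, pseudocount=0.0)
```

In this picture, a barcode that is over- or under-represented in only some samples shifts individual points of its log-ratio trajectory, which is how processing bias can distort the fitted slope.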
Keyphrases
- high throughput
- body composition
- physical activity
- single cell
- escherichia coli
- big data
- healthcare
- microbial community
- gene expression
- genome wide
- computed tomography
- dna methylation
- single molecule
- artificial intelligence
- deep learning
- risk assessment
- peripheral blood
- high resolution
- data analysis
- liquid chromatography