EnsMOD: A Software Program for Omics Sample Outlier Detection.
Nathan P Manes, Jian Song, Aleksandra Nita-Lazar
Published in: Journal of computational biology : a journal of computational molecular cell biology (2023)
Detection of omics sample outliers is important for preventing erroneous biological conclusions, developing robust experimental protocols, and discovering rare biological states. Two recent publications describe robust algorithms for detecting transcriptomic sample outliers, but neither algorithm had been incorporated into a software tool for scientists. Here we describe Ensemble Methods for Outlier Detection (EnsMOD) which incorporates both algorithms. EnsMOD calculates how closely the quantitation variation follows a normal distribution, plots the density curves of each sample to visualize anomalies, performs hierarchical cluster analyses to calculate how closely the samples cluster with each other, and performs robust principal component analyses to statistically test if any sample is an outlier. The probabilistic threshold parameters can be easily adjusted to tighten or loosen the outlier detection stringency. EnsMOD can be used to analyze any omics dataset with normally distributed variance. Here it was used to analyze a simulated proteomics dataset, a multiomic (proteome and transcriptome) dataset, a single-cell proteomics dataset, and a phosphoproteomics dataset. EnsMOD successfully identified all of the simulated outliers, and subsequent removal of a detected outlier improved data quality for downstream statistical analyses.
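To make the idea of an adjustable probabilistic threshold concrete, the following is a minimal Python sketch and not EnsMOD's actual implementation. The abstract describes hierarchical clustering and robust principal component analysis; this sketch swaps in a much simpler stand-in: each sample is scored by its median Pearson correlation with all other samples, and a sample is flagged when its robust z-score (median and MAD based) exceeds a normal quantile. The function names, the `prob_threshold` parameter, and the simulated data are all hypothetical and chosen only for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_outlier_samples(samples, prob_threshold=0.975):
    """Flag samples that correlate unusually poorly with the rest.

    samples: dict mapping sample name -> list of quantitation values,
             with features in the same order for every sample.
    prob_threshold: hypothetical stringency knob; raising it loosens
             detection (fewer flags), lowering it tightens detection.
    """
    names = list(samples)
    # Score = median correlation of a sample with every other sample.
    score = {
        a: median(pearson(samples[a], samples[b]) for b in names if b != a)
        for a in names
    }
    center = median(score.values())
    # MAD scaled by 1.4826 estimates the standard deviation under normality.
    mad = 1.4826 * median(abs(s - center) for s in score.values())
    if mad == 0:
        return []  # all samples agree equally well; nothing to flag
    cutoff = NormalDist().inv_cdf(prob_threshold)
    # One-sided test: only unusually LOW agreement counts as an outlier.
    return sorted(n for n in names if (center - score[n]) / mad > cutoff)


# Simulated data (assumption): eight samples share one profile with small
# noise, and one extra sample carries much larger noise.
rng = random.Random(0)
profile = [rng.gauss(20, 2) for _ in range(200)]
samples = {
    f"S{i}": [x + rng.gauss(0, 0.3) for x in profile] for i in range(1, 9)
}
samples["S9_outlier"] = [x + rng.gauss(0, 3) for x in profile]

flagged = flag_outlier_samples(samples)
print(flagged)
```

With so few samples, a MAD-based cutoff can also flag borderline samples, which is one reason a tool like EnsMOD combines several complementary methods rather than relying on a single score.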
Keyphrases
- single cell
- label free
- rna seq
- machine learning
- loop mediated isothermal amplification
- real time pcr
- mass spectrometry
- high throughput
- quality improvement
- gene expression
- neural network
- dna methylation
- data analysis
- artificial intelligence
- electronic health record
- sensitive detection
- liquid chromatography tandem mass spectrometry
- high performance liquid chromatography