A simple strategy for sample annotation error detection in cytometry datasets.

Megan E Smithmyer Alice E Wiedeman David A G Skibinski Adam K Savage Carolina Acosta-Vega Sheila Scheiding Vivian H GersukColin O'RourkeSarah Alice Long Jane H Buckner Cate Speake

Published in: Cytometry. Part A : the journal of the International Society for Analytical Cytology (2021)

Mislabeling samples or data with the wrong participant information can affect study integrity and lead investigators to draw inaccurate conclusions. Quality control to prevent these types of errors is commonly embedded into the analysis of genomic datasets, but a similar identification strategy is not standard for cytometric data. Here, we present a method for detecting sample identification errors in cytometric data using expression of human leukocyte antigen (HLA) class I alleles. We measured HLA-A*02 and HLA-B*07 expression in three longitudinal samples from 41 participants using a 33-marker CyTOF panel designed to identify major immune cell types. 3/123 samples (2.4%) showed HLA allele expression that did not match their longitudinal pairs. Furthermore, these same three samples' cytometric signature did not match qPCR HLA class I allele data, suggesting that they were accurately identified as mismatches. We conclude that this technique is useful for detecting sample-labeling errors in cytometric analyses of longitudinal data. This technique could also be used in conjunction with another method, like GWAS or PCR, to detect errors in cross-sectional data. We suggest widespread adoption of this or similar techniques will improve the quality of clinical studies that utilize cytometry.

Keyphrases