Restricting datasets to classifiable samples augments discovery of immune disease biomarkers.
Gunther Glehr, Paloma Riquelme, Katharina Kronenberg, Robert Lohmayer, Victor J López-Madrona, Michael Kapinsky, Hans Jürgen Schlitt, Edward K Geissler, Rainer Spang, Sebastian Haferkamp, James A Hutchinson
Published in: Nature Communications (2024)
Immunological diseases are typically heterogeneous in clinical presentation, severity and response to therapy. Biomarkers of immune diseases often reflect this variability, especially compared to their regulated behaviour in health. This leads to a common difficulty that frustrates biomarker discovery and interpretation: unequal dispersion of immune disease biomarker expression between patient classes necessarily limits a biomarker's informative range. To solve this problem, we introduce dataset restriction, a procedure that splits datasets into classifiable and unclassifiable samples. Applied to synthetic flow cytometry data, restriction identifies biomarkers that are otherwise disregarded. In advanced melanoma, restriction finds biomarkers of immune-related adverse event risk after immunotherapy and enables us to build multivariate models that accurately predict immunotherapy-related hepatitis. Hence, dataset restriction augments discovery of immune disease biomarkers, increases predictive certainty for classifiable samples and improves multivariate models incorporating biomarkers with a limited informative range. This principle can be directly extended to any classification task.
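The idea of splitting a dataset into classifiable and unclassifiable samples can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify; it assumes a single biomarker, binary labels (1 = patient class of interest), and a simple purity rule for choosing the cutoff. The function name `restrict`, the `purity` parameter and the toy values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def restrict(
    values: Sequence[float], labels: Sequence[int], purity: float = 0.9
) -> Tuple[List[int], List[int]]:
    """Split sample indices into (classifiable, unclassifiable).

    Illustrative rule only: walk down from the highest biomarker value and
    keep the largest top group in which the fraction of label-1 samples is
    still at least `purity`. Everything below that cutoff is treated as
    unclassifiable by this biomarker.
    """
    # Sample indices ordered from highest to lowest biomarker value.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    best = 0
    positives = 0
    for rank, i in enumerate(order, start=1):
        positives += labels[i]
        # Only place a cutoff between distinct values, never inside a tie.
        end_of_tie = rank == len(order) or values[order[rank]] != values[i]
        if end_of_tie and positives / rank >= purity:
            best = rank
    return sorted(order[:best]), sorted(order[best:])


# Toy data: class 0 is tightly regulated, class 1 is widely dispersed.
values = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0, 3.5, 0.9, 4.2, 1.1]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

classifiable, unclassifiable = restrict(values, labels)
print("classifiable:", classifiable)
print("unclassifiable:", unclassifiable)
```

In this toy example only the two samples with clearly elevated values fall in the classifiable subset; the remaining class-1 samples overlap with class 0 and are set aside, which mirrors the limited informative range described above.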
Keyphrases
- small molecule
- flow cytometry
- public health
- high throughput
- poor prognosis
- machine learning
- emergency department
- stem cells
- gene expression
- genome wide
- case report
- risk assessment
- data analysis
- social media
- dna methylation
- long non coding rna
- health information
- transcription factor
- replacement therapy
- binding protein
- smoking cessation