Machine learning to optimize automated RH genotyping using whole-exome sequencing data.
Ti-Cheng Chang, Jing Yu, Zhaoming Wang, Jane Silva Hankins, Mitchell J Weiss, Gang Wu, Connie Marie Westhoff, Stella T Chou, Yan Zheng
Published in: Blood Advances (2024)
Rh phenotype matching reduces but does not eliminate alloimmunization in patients with sickle cell disease (SCD) because of RH genetic diversity that serological typing cannot distinguish. RH genotype matching can potentially mitigate Rh alloimmunization, but comprehensive and accessible genotyping methods are needed. We developed RHtyper as an automated algorithm to predict RH genotypes from whole-genome sequencing (WGS) data with high accuracy. Here, we adapted RHtyper for whole-exome sequencing (WES) data, which are more affordable but are challenged by uneven sequencing coverage and exacerbated sequencing read misalignment, resulting in uncertain predictions for 1) RHD zygosity and hybrid alleles, 2) RHCE*C versus RHCE*c alleles, 3) RHD c.1136C>T zygosity, and 4) RHCE c.48G>C zygosity. We optimized RHtyper to accurately predict RHD and RHCE genotypes from WES data by leveraging machine learning models, improving the concordance of WES with WGS predictions from 90.8% to 97.2% for RHD and from 96.3% to 98.2% for RHCE among 396 patients in the Sickle Cell Clinical Research and Intervention Program (SCCRIP). In a second validation cohort of 3030 cancer survivors (15.2% Black or African American) from the St. Jude Lifetime Cohort Study (SJLIFE), the optimized RHtyper reached concordance rates between WES and WGS predictions of 96.3% for RHD and 94.6% for RHCE. In conclusion, machine learning improved the accuracy of RH prediction from WES data. RHtyper has the potential, once implemented, to provide a precision medicine-based approach to facilitate RH genotype-matched transfusion and improve transfusion safety for patients with SCD.
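The concordance figures quoted above are per-gene agreement rates between WES-based and WGS-based genotype calls for the same individuals. The following is a minimal sketch of how such a rate could be computed. It is not RHtyper's code: the function names, the sample IDs, the "allele/allele" string format, and the example calls are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def normalize(genotype: str) -> Tuple[str, ...]:
    """Treat a diploid genotype as an unordered pair of alleles.

    Assumes a hypothetical "allele1/allele2" string format.
    """
    return tuple(sorted(allele.strip() for allele in genotype.split("/")))


def concordance(wes: Dict[str, str], wgs: Dict[str, str]) -> float:
    """Fraction of shared samples whose WES and WGS genotype calls agree."""
    shared = sorted(set(wes) & set(wgs))
    if not shared:
        raise ValueError("no samples in common")
    matches = sum(normalize(wes[s]) == normalize(wgs[s]) for s in shared)
    return matches / len(shared)


# Illustrative (made-up) RHCE calls for four hypothetical samples.
wgs_calls = {
    "S1": "RHCE*Ce/RHCE*ce",
    "S2": "RHCE*ce/RHCE*ce",
    "S3": "RHCE*Ce/RHCE*ce",
    "S4": "RHCE*cE/RHCE*ce",
}
wes_calls = {
    "S1": "RHCE*ce/RHCE*Ce",  # same alleles, different order: counts as a match
    "S2": "RHCE*ce/RHCE*ce",
    "S3": "RHCE*ce/RHCE*ce",  # C versus c discrepancy: counts as a mismatch
    "S4": "RHCE*cE/RHCE*ce",
}

print(f"RHCE concordance: {concordance(wes_calls, wgs_calls):.1%}")
```

Comparing genotypes as unordered allele pairs avoids counting a call as discordant only because the two alleles are listed in a different order. Whether the study scored concordance this way is not stated in the abstract.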
Keyphrases
- machine learning
- big data
- genetic diversity
- electronic health record
- artificial intelligence
- deep learning
- randomized controlled trial
- high throughput
- end stage renal disease
- cardiac surgery
- sickle cell disease
- healthcare
- chronic kidney disease
- newly diagnosed
- ejection fraction
- acute kidney injury
- prognostic factors
- quality improvement
- single molecule
- gene expression
- human health