Machine learning to optimize automated RH genotyping using whole-exome sequencing data.
Ti-Cheng Chang, Jing Yu, Zhaoming Wang, Jane Silva Hankins, Mitchell J Weiss, Gang Wu, Connie M Westhoff, Stella T Chou, Yan Zheng. Published in: Blood Advances (2024)
Rh phenotype matching reduces but does not eliminate alloimmunization in patients with sickle cell disease (SCD) because of RH genetic diversity that serological typing cannot distinguish. RH genotype matching could mitigate Rh alloimmunization, but comprehensive and accessible genotyping methods are needed. We developed RHtyper, an automated algorithm that predicts RH genotypes from whole-genome sequencing (WGS) data with high accuracy. Here, we adapted RHtyper for whole-exome sequencing (WES) data, which are more affordable but suffer from uneven sequencing coverage and more severe read misalignment. These problems lead to uncertain predictions for (1) RHD zygosity and hybrid alleles, (2) RHCE∗C vs. RHCE∗c alleles, (3) RHD c.1136C>T zygosity, and (4) RHCE c.48G>C zygosity. We optimized RHtyper to accurately predict RHD and RHCE genotypes from WES data by leveraging machine learning models, improving the concordance of WES with WGS predictions from 90.8% to 97.2% for RHD and from 96.3% to 98.2% for RHCE among 396 patients in the Sickle Cell Clinical Research and Intervention Program. In a second validation cohort of 3030 cancer survivors (15.2% Black or African American) from the St. Jude Lifetime Cohort Study, the optimized RHtyper reached concordance rates between WES and WGS predictions of 96.3% for RHD and 94.6% for RHCE. Machine learning improved the accuracy of RH prediction from WES data. Once implemented, RHtyper could provide a precision medicine-based approach to facilitate RH genotype-matched transfusion and improve transfusion safety for patients with SCD. This study used data from clinical trials registered at ClinicalTrials.gov as #NCT02098863 and #NCT00760656.
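The concordance rates reported in the abstract compare, sample by sample, the genotype predicted from WES with the genotype predicted from WGS for the same person. The following is a minimal Python sketch of that calculation. It assumes that concordance means an exact match of the unordered allele pair, that a missing call counts as discordant, and that the rate is computed separately for each gene. The names `concordance` and `normalize_call` and the toy calls are hypothetical and do not come from RHtyper, whose handling of unresolved or ambiguous calls may differ.

```python
from typing import Dict, Optional


def normalize_call(call: str) -> str:
    """Order-independent form of a diploid call such as "RHCE*Ce/RHCE*ce"."""
    return "/".join(sorted(call.split("/")))


def concordance(wes_calls: Dict[str, Optional[str]],
                wgs_calls: Dict[str, Optional[str]]) -> float:
    """Percentage of shared samples whose WES call matches the WGS call.

    Assumptions (illustrative, not RHtyper's documented rules):
    - the WGS call is the reference; samples missing from either dict are skipped
    - a missing (None) call on either side counts as discordant
    - allele order within a genotype is ignored
    """
    shared = [s for s in wgs_calls if s in wes_calls]
    if not shared:
        raise ValueError("no samples in common between WES and WGS calls")
    matches = 0
    for sample in shared:
        wes, wgs = wes_calls[sample], wgs_calls[sample]
        if wes is not None and wgs is not None and normalize_call(wes) == normalize_call(wgs):
            matches += 1
    return 100.0 * matches / len(shared)


# Hypothetical toy calls for one gene (RHCE); not data from the study.
# s2 mimics a C/c miscall; s5 has no WES call.
wgs_demo = {"s1": "RHCE*ce/RHCE*ce", "s2": "RHCE*Ce/RHCE*ce", "s3": "RHCE*ce/RHCE*Ce",
            "s4": "RHCE*Ce/RHCE*Ce", "s5": "RHCE*ce/RHCE*ce"}
wes_demo = {"s1": "RHCE*ce/RHCE*ce", "s2": "RHCE*ce/RHCE*ce", "s3": "RHCE*Ce/RHCE*ce",
            "s4": "RHCE*Ce/RHCE*Ce", "s5": None}

print(concordance(wes_demo, wgs_demo))  # 3 of 5 samples agree -> 60.0
```

Sorting the two alleles before the comparison means that RHCE*ce/RHCE*Ce and RHCE*Ce/RHCE*ce count as the same diploid call. Separate RHD and RHCE rates, like the ones reported above, come from calling the function once per gene.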