Proteomic signatures improve risk prediction for common and rare diseases.
Julia Carrasco-Zanini, Maik Pietzner, Jonathan Davitte, Praveen Surendran, Damien C Croteau-Chonka, Chloe Robins, Ana Torralbo, Christopher Tomlinson, Florian Grünschläger, Natalie Fitzpatrick, Cai Ytsma, Tokuwa Kanno, Stephan Gade, Daniel Freitag, Frederik Ziebell, Simon Haas, Spiros Denaxas, Joanna C Betts, Nicholas J Wareham, Harry Hemingway, Robert A Scott, Claudia Langenberg
Published in: Nature Medicine (2024)
For many diseases there are delays in diagnosis due to a lack of objective biomarkers for disease onset. Here, in 41,931 individuals from the United Kingdom Biobank Pharma Proteomics Project, we integrated measurements of ~3,000 plasma proteins with clinical information to derive sparse prediction models for the 10-year incidence of 218 common and rare diseases (81-6,038 cases). We then compared prediction models developed using proteomic data with models developed using either basic clinical information alone or clinical information combined with data from 37 clinical assays. The predictive performance of sparse models including as few as 5 to 20 proteins was superior to the performance of models developed using basic clinical information for 67 pathologically diverse diseases (median delta C-index = 0.07; range = 0.02-0.31). Sparse protein models further outperformed models developed using basic information combined with clinical assay data for 52 diseases, including multiple myeloma, non-Hodgkin lymphoma, motor neuron disease, pulmonary fibrosis and dilated cardiomyopathy. For multiple myeloma, single-cell RNA sequencing from bone marrow in newly diagnosed patients showed that four of the five predictor proteins were expressed specifically in plasma cells, consistent with the strong predictive power of these proteins. External replication of sparse protein models in the EPIC-Norfolk study showed good generalizability for prediction of the six diseases tested. These findings show that sparse plasma protein signatures, including both disease-specific proteins and protein predictors shared across several diseases, offer clinically useful prediction of common and rare diseases.
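The model comparisons above are reported as differences in the concordance index (C-index). The following is a minimal, illustrative sketch of Harrell's C-index for right-censored follow-up data and of a "delta C-index" between two sets of risk scores. It is not the study's pipeline: the abstract does not state which C-index estimator was used. The function name `harrell_c_index`, the toy follow-up times, and the hypothetical score vectors `clinical_risk` and `protein_risk` are assumptions made purely for illustration. This simplified version skips pairs with tied follow-up times.

```python
from itertools import permutations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Simplified Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's follow-up time. The pair is
    concordant if the model gives i the higher risk score; tied risk scores
    count as half-concordant. Pairs with tied follow-up times are skipped.
    """
    concordant = 0.0
    comparable = 0
    # Iterate over ordered pairs so each subject can play the role of "i".
    for i, j in permutations(range(len(times)), 2):
        if events[i] == 1 and times[i] < times[j]:
            comparable += 1
            if risk[i] > risk[j]:
                concordant += 1.0
            elif risk[i] == risk[j]:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, 1 = disease onset, 0 = censored.
times = [2.0, 4.0, 5.0, 7.0, 9.0]
events = [1, 1, 0, 1, 0]
# Hypothetical risk scores from two competing models (higher = riskier).
clinical_risk = [0.6, 0.3, 0.5, 0.2, 0.4]
protein_risk = [0.9, 0.7, 0.8, 0.3, 0.1]

c_clinical = harrell_c_index(times, events, clinical_risk)
c_protein = harrell_c_index(times, events, protein_risk)
delta_c = c_protein - c_clinical
print(f"C clinical = {c_clinical:.3f}, C protein = {c_protein:.3f}, "
      f"delta C = {delta_c:.3f}")
```

On these made-up numbers there are 8 comparable pairs. The second score vector orders 7 of them correctly, against 5 for the first, which gives a delta C-index of 0.25. A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering. A delta of 0.07, like the median reported above, is therefore a shift in how often the model ranks the subject who developed disease earlier as higher risk.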