Comparing natural language processing representations of coded disease sequences for prediction in electronic health records.

Thomas Beaney, Sneha Jha, Asem Alaa, Alexander Smith, Jonathan Clarke, Thomas Woodcock, Azeem Majeed, Paul P Aylin, Mauricio Barahona

Published in: Journal of the American Medical Informatics Association : JAMIA (2024)

Patient representations produced by sequence-based NLP algorithms from sequences of disease codes demonstrate improved predictive content for patient outcomes compared with representations generated by co-occurrence-based algorithms. This suggests transformer models may be useful for generating multi-purpose representations, even without fine-tuning.
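
The distinction between co-occurrence-based and sequence-based representations can be illustrated with a toy sketch. This is not the paper's method or data: the two patients, the ICD-10-style codes, and the helper names `bag_of_codes` and `ordered_bigrams` are all hypothetical, and adjacent-pair counts stand in for a far richer sequence model such as a transformer. The sketch only shows that an order-free view cannot tell apart two patients who acquire the same diseases in a different order, whereas an order-aware view can.

```python
from collections import Counter

# Hypothetical toy data: each patient is an ordered list of disease codes.
patient_a = ["E11", "I10", "N18"]
patient_b = ["N18", "I10", "E11"]  # same codes as patient_a, reverse order


def bag_of_codes(seq):
    """Order-free view: how often each code occurs for the patient."""
    return Counter(seq)


def ordered_bigrams(seq):
    """Order-aware view: counts of adjacent code pairs, keeping direction."""
    return Counter(zip(seq, seq[1:]))


# The order-free view treats both patients as identical.
same_bag = bag_of_codes(patient_a) == bag_of_codes(patient_b)

# The order-aware view separates them.
same_bigrams = ordered_bigrams(patient_a) == ordered_bigrams(patient_b)

print(same_bag, same_bigrams)
```

In this toy case the two patients are indistinguishable by code counts alone but differ once the order of adjacent codes is kept, which is the kind of information a sequence-based model can exploit.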

Keyphrases

working memory
electronic health record
machine learning
deep learning
clinical decision support
adverse drug
amino acid