Comparing natural language processing representations of coded disease sequences for prediction in electronic health records.
Thomas Beaney, Sneha Jha, Asem Alaa, Alexander Smith, Jonathan Clarke, Thomas Woodcock, Azeem Majeed, Paul P Aylin, Mauricio Barahona
Published in: Journal of the American Medical Informatics Association: JAMIA (2024)
Patient representations produced by sequence-based NLP algorithms from sequences of disease codes carry more predictive information about patient outcomes than representations generated by co-occurrence-based algorithms. This suggests that transformer models, a class of sequence-based algorithm, may be useful for generating multi-purpose patient representations, even without fine-tuning.
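The distinction between co-occurrence-based and sequence-based representations can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the toy corpus, the code labels, and the function names (`cooccurrence_embeddings`, `bag_of_codes`, `recurrent_summary`) are hypothetical. The co-occurrence step is a plain count matrix rather than any specific embedding algorithm, and the order-sensitive summary is a toy recurrent update standing in for a sequence model. It is not a transformer and not one of the models evaluated in the paper.

```python
import math
from itertools import combinations

# Hypothetical toy corpus: each patient is a time-ordered list of disease codes.
# The labels are illustrative only, not codes from a real clinical terminology.
corpus = [
    ["hypertension", "diabetes", "ckd"],
    ["diabetes", "hypertension", "stroke"],
    ["asthma", "copd"],
    ["hypertension", "stroke"],
]

vocab = sorted({code for seq in corpus for code in seq})
index = {code: i for i, code in enumerate(vocab)}


def cooccurrence_embeddings(corpus):
    """Code vectors from within-patient co-occurrence counts (order ignored)."""
    counts = [[0.0] * len(vocab) for _ in vocab]
    for seq in corpus:
        # Each unordered pair of distinct codes in a record counts once, symmetrically.
        for a, b in combinations(set(seq), 2):
            counts[index[a]][index[b]] += 1
            counts[index[b]][index[a]] += 1
    return {code: counts[index[code]] for code in vocab}


def bag_of_codes(seq, emb):
    """Patient vector as the mean of its code vectors: ignores code order."""
    total = [0.0] * len(vocab)
    for code in seq:
        total = [t + v for t, v in zip(total, emb[code])]
    return [t / len(seq) for t in total]


def recurrent_summary(seq, emb, decay=0.5):
    """Toy order-sensitive summary: h_t = tanh(decay * h_{t-1} + x_t)."""
    h = [0.0] * len(vocab)
    for code in seq:
        h = [math.tanh(decay * hi + xi) for hi, xi in zip(h, emb[code])]
    return h


emb = cooccurrence_embeddings(corpus)
history = ["hypertension", "diabetes", "stroke"]
reversed_history = list(reversed(history))

same_bag = bag_of_codes(history, emb) == bag_of_codes(reversed_history, emb)
same_sequence = recurrent_summary(history, emb) == recurrent_summary(reversed_history, emb)
print(same_bag, same_sequence)
```

Averaging code vectors gives an identical representation for a history and its reversal, whereas the recurrent summary changes when the order of diagnoses changes. Sensitivity to order is the property that separates sequence-based representations from co-occurrence-based ones in this toy setting.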