Projection Word Embedding Model With Hybrid Sampling Training for Classifying ICD-10-CM Codes: Longitudinal Observational Study.
Chin Lin, Yu-Sheng Lou, Dung-Jang Tsai, Chia-Cheng Lee, Chia-Jung Hsu, Ding-Chung Wu, Mei-Chuen Wang, Wen-Hui Fang
Published in: JMIR Medical Informatics (2019)
Word embeddings trained on EHR and PubMed text captured medical semantics better than those trained on Wikipedia, and the proposed projection word2vec model improved the extraction of medical semantics from Wikipedia embeddings. Although the projection word2vec model did not substantially improve performance on the real ICD-10-CM coding task, the models could effectively handle emerging diseases. The proposed hybrid sampling method enables the model to behave more like a human expert.