De-identifying free text of Japanese electronic health records.

Kohei Kajiyama, Hiromasa Horiguchi, Takashi Okumura, Mizuki Morita, Yoshinobu Kano

Published in: Journal of Biomedical Semantics (2020)

Our LSTM-based machine learning method extracted the named entities to be de-identified with generally better performance than our rule-based methods. However, machine learning methods are inadequate for processing expressions that occur rarely. Our future work will examine combining LSTM and rule-based methods to achieve better performance. The performance achieved so far is sufficiently higher than that of publicly available Japanese de-identification tools; our system will therefore be applied to actual de-identification tasks in hospitals.

Keyphrases

machine learning
electronic health record
artificial intelligence
healthcare
neural network
risk assessment
oxidative stress
big data
bioinformatics analysis
smoking cessation