De-identifying free text of Japanese electronic health records.
Kohei Kajiyama, Hiromasa Horiguchi, Takashi Okumura, Mizuki Morita, Yoshinobu Kano
Published in: Journal of Biomedical Semantics (2020)
Our LSTM-based machine learning method generally extracted the named entities to be de-identified with better performance than our rule-based methods. However, machine learning methods handle low-frequency expressions poorly. Our future work will therefore examine combining LSTM and rule-based methods to achieve better performance. The level of performance achieved so far is sufficiently high relative to publicly available Japanese de-identification tools that our system will be applied to actual de-identification tasks in hospitals.