Unlocking the Secrets Behind Advanced Artificial Intelligence Language Models in Deidentifying Chinese-English Mixed Clinical Text: Development and Validation Study.

You-Qian Lee, Ching-Tai Chen, Chien-Chang Chen, Chung-Hong Lee, Peitsz Chen, Chi-Shin Wu, Hong-Jie Dai
Published in: Journal of Medical Internet Research (2024)
The study contributes to understanding how pretrained language models (PLMs) handle deidentification in a code-mixed context and highlights the importance of including code-mixed instances in model training. To support further research, we created a manipulated subset of the resynthesized data set and made it available for research purposes. Based on the compiled data set, we found that deidentification based on large language models (LLMs) is a feasible approach, but carefully crafted prompts are essential to avoid unwanted output. However, using such methods in a hospital setting requires careful consideration of data security and privacy. Further research could explore augmenting PLMs and LLMs with external knowledge to improve their ability to recognize rare protected health information (PHI).
Keyphrases
  • big data
  • artificial intelligence
  • electronic health record
  • machine learning
  • healthcare
  • deep learning
  • autism spectrum disorder
  • virtual reality
  • health information
  • drug induced