Development and Validation of a Natural Language Processing Algorithm to Pseudonymize Documents in the Context of a Clinical Data Warehouse.

Xavier Tannier, Perceval Wajsbürt, Alice Calliger, Basile Dura, Alexandre Mouchet, Martin Hilka, Romain Bey
Published in: Methods of information in medicine (2024)
Our results show an overall F1-score of 0.99. We discuss implementation choices and present experiments to better understand the effort involved in such a task, covering dataset size, document types, language models, and rule addition. We share guidelines and code under a 3-Clause BSD license.
Keyphrases
  • autism spectrum disorder
  • primary care
  • machine learning
  • healthcare
  • electronic health record
  • deep learning
  • big data
  • quality improvement
  • artificial intelligence