Application of Efficient Data Cleaning Using Text Clustering for Semistructured Medical Reports to Large-Scale Stool Examination Reports: Methodology Study.
Hyun-Ki Woo, Kyunga Kim, KyeongMin Cha, Jin-Young Lee, Han Song Mun, Soo Jin Cho, Ji In Chung, Jeung Hui Pyo, Kun-Chul Lee, Mira Kang
Published in: Journal of Medical Internet Research (2019)
Our data cleaning process, which combines key collision and nearest neighbor clustering methods, cleans large-scale text data efficiently and thereby improves data accuracy.
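The following minimal Python sketch illustrates the general idea of pairing the two clustering methods. It is an illustration, not the study's exact configuration. It assumes an OpenRefine-style fingerprint key for the key collision step and Levenshtein edit distance for the nearest neighbor step. The function names, the `radius` parameter, the union-find merging, the "most frequent spelling wins" rule, and the sample report values are all hypothetical.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Key-collision key loosely modeled on OpenRefine's "fingerprint" method (assumption)."""
    # NFKC folds full-width/compatibility forms but keeps Hangul and other scripts
    text = unicodedata.normalize("NFKC", value).strip().lower()
    # Drop punctuation so "Negative." and "negative" share a key
    text = re.sub(r"[^\w\s]", "", text)
    # Sort unique tokens so word order and repeated words do not change the key
    return " ".join(sorted(set(text.split())))


def key_collision_clusters(values):
    """Group raw values whose fingerprint keys collide exactly."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return list(groups.values())


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def nearest_neighbor_merge(clusters, radius=1):
    """Single-linkage merge of clusters whose keys lie within `radius` edits (hypothetical)."""
    # Hypothetical choice: each cluster is represented by its shared fingerprint
    keys = [fingerprint(c[0]) for c in clusters]
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); large-scale use would need blocking
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if levenshtein(keys[i], keys[j]) <= radius:
                parent[find(i)] = find(j)

    merged = defaultdict(list)
    for idx, cluster in enumerate(clusters):
        merged[find(idx)].extend(cluster)
    return list(merged.values())


def clean(values, radius=1):
    """Replace every value with the most frequent spelling in its final cluster."""
    clusters = nearest_neighbor_merge(key_collision_clusters(values), radius)
    mapping = {}
    for cluster in clusters:
        # Ties go to the spelling seen first within the cluster
        canonical = Counter(cluster).most_common(1)[0][0]
        for v in cluster:
            mapping[v] = canonical
    return [mapping[v] for v in values]


# Invented sample values, not data from the study
reports = ["Negative", "negative", "Negative.", "Negatve", "Positive", "positive "]
print(clean(reports))
# ['Negative', 'Negative', 'Negative', 'Negative', 'Positive', 'Positive']
```

In this sketch, key collision cheaply absorbs case, whitespace, and punctuation variants. The nearest neighbor pass then catches typos such as "Negatve", which produce a different key. This is one plausible reading of why combining the two methods helps. Key collision shrinks the number of distinct keys before the more expensive pairwise distance comparison runs.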