You-Qian Lee, Bo-Hong Wang, Chu-Hsien Su, Pei-Tsz Chen, Wu-Qing Lin, Chi-Shin Wu, Hong-Jie Dai
Abstract
Electronic health records (EHRs) at medical institutions are a valuable source for research in both the clinical and biomedical domains. However, before such records can be used for research purposes, protected health information (PHI) mentioned in the unstructured text must be removed. In Taiwan’s EHR systems, the unstructured text is usually written in a mixture of English and Chinese, which poses challenges for de-identification. This paper presents, to the best of our knowledge, the first study on the construction of a code-mixed EHR de-identification corpus and the evaluation of several mature entity recognition methods on the code-mixed PHI recognition task.