Abstract
De-identification of clinical notes is a special case of named entity recognition. Supervised machine-learning (ML) algorithms have achieved promising results for this task. However, ML-based de-identification systems often require annotating a large number of clinical notes of interest, which is costly. Domain adaptation (DA) is a technique that enables learning from annotated datasets from different sources, thereby reducing the annotation cost required for ML training in the target domain. In this study, we investigate the use of DA methods for de-identification of psychiatric notes. Three state-of-the-art DA methods (instance pruning, instance weighting, and feature augmentation) are applied to three source corpora of annotated hospital discharge summaries, outpatient notes, and a mixture of different note types written for diabetic patients. Our results show that DA can increase de-identification performance over the baselines, indicating that it can effectively reduce the annotation cost for the target psychiatric notes. Feature augmentation increases performance the most among the three DA methods. Performance also varies among the different types of source notes; the mixture of different note types brings the biggest increase in performance.
INTRODUCTION
Clinical narratives contain detailed information about patients and are a valuable data source for clinical research. However, they often contain protected health information (PHI), such as names and addresses, that risks revealing the patient’s identity. Therefore, de-identification (removal of PHI) from clinical documents is often required. As manual review to remove PHI is time-consuming and costly, significant effort has gone into developing automatic de-identification methods1-18.
Early de-identification systems were often rule-based, using hand-coded rules and specialized dictionaries to identify PHI mentions in clinical notes1-6,19. More recent systems have adopted supervised machine learning (ML) algorithms, treating de-identification as a token classification or sequence labeling problem. Various ML algorithms have been studied, including conditional random fields7-9, support vector machines10, decision trees11,12, hidden Markov models20, and recurrent neural networks13. Often, the ML-based systems are augmented with rule-based components, forming hybrid approaches21-23. Such ML-based de-identification systems have shown high performance, comparable to the quality of manual annotation14.
Community challenges on de-identification have also been held. In Task 1 of the 2006 i2b2 challenge17, a set of discharge summaries was provided for training and testing de-identification systems. The discharge summaries were annotated with eight PHI categories, a subset of the eighteen categories listed by HIPAA (the Health Insurance Portability and Accountability Act)24. Track 1 of the 2014 i2b2/UTHealth shared task15 also addressed de-identification, this time in clinical notes of diabetes patients. Unlike the 2006 challenge, it used a stricter set of PHI categories, defining seven main categories and 30 subcategories. The most recent community challenge on de-identification is Track 1 of the 2016 CEGS N-GRID challenge18, where a set of psychiatric notes was provided for system development. In these challenges, ML-based systems have shown the best performance15,17,18. We participated in the 2016 challenge and developed an ML-based hybrid system that achieved the second-best performance25.
However, ML-based systems depend heavily on large amounts of training data that is highly similar to the test data. This is problematic for clinical text, which consists of diverse types of clinical documents with different writing styles across institutions. For instance, a de-identification system trained on discharge summaries may show much lower performance on psychiatric notes, or a system trained on discharge summaries from one hospital may show much lower performance on discharge summaries from a different hospital. To achieve optimal performance, it is ideal to annotate individual training sets for each type of clinical note. However, manual annotation of clinical text is expensive and complicated16, thus making such extensive annotation approaches unrealistic and prohibitive for wide adoption of ML-based de-identification systems. Therefore, methods that can leverage existing annotated corpora to quickly build a de-identification system for any target domain of interest are highly desirable.
Here we propose to use domain adaptation (DA) technology to address this problem. Domain adaptation maximizes the use of existing data (the source) for the data of interest (the target) by learning the useful aspects of the source data while largely ignoring the aspects that cannot contribute to the model. Through domain adaptation, ML-based de-identification systems can efficiently leverage existing de-identification corpora from other sources to quickly build high-performance models using fewer annotated samples from the target domain, thus reducing the annotation cost.
Domain adaptation has been applied for various natural language processing tasks in the biomedical domain. Dahlmeier and Ng26 evaluated three domain adaptation methods for semantic role labeling (SRL) task in biomedical literature. A general domain SRL corpus was used as the source, and a biomedical domain SRL corpus was used as the target. Their result shows that domain adaptation can leverage existing general domain SRL resource and greatly improve the performance of SRL in the biomedical domain. Zhang et al.27 also conducted a similar set of experiments for SRL of clinical notes. An SRL corpus of biomedical literature as well as general domain corpora were used as sources. Their result also shows that domain adaptation can boost the SRL performance through utilization of existing out-of-domain data. Ramesh et al.28 applied domain adaptation for automatic discourse connective detection in biomedical text. Here, a general domain corpus was used as the source. They also showed that combination of different domain adaptation methods can further improve the performance.
In this study, we investigate how to leverage three existing corpora (source domains) to improve de-identification in psychiatric notes (the target domain). We implement three domain adaptation algorithms for this task: instance pruning, instance weighting, and feature augmentation. The de-identification system that we developed for the 2016 challenge is utilized for this study25. Our study shows that domain adaptation can boost de-identification performance over baselines that use only the psychiatric notes or simply combine annotated data from target and source, indicating that it can effectively reduce annotation cost. Moreover, we find that feature augmentation performs better than the other two DA methods and that existing data from different source domains contribute differently to de-identification of psychiatric notes. To the best of our knowledge, this is the first detailed study on domain adaptation for automatic de-identification of clinical text.
METHODS
Datasets
As the target data, a corpus of psychiatric notes provided by Track 1 of the 2016 CEGS N-GRID shared task18 is used (Psychiatric notes). The corpus is divided into a training set, a development set, and a validation set, for training, parameter optimization, and measurement of final performance, respectively. Three different source corpora are used: 1) discharge summaries from Task 1 of the 2006 i2b2 de-identification challenge17 (Discharge summaries); 2) outpatient notes from the University of Texas Health Science Center at Houston (UTHealth) with manually annotated PHI (Outpatient notes); and 3) diabetes patients’ notes from Track 1 of the 2014 i2b2/UTHealth shared task on de-identification15, consisting of a mixture of different types of clinical notes such as admission notes, emergency visit notes, and discharge summaries (Mixture notes). These three source corpora have different sample sizes (Table 1), which may affect performance. To compare them fairly, we use an equal sample size for each source corpus. As the UTHealth outpatient corpus is the smallest, we reduced the sizes of the other two source corpora to contain similar numbers of tokens by randomly sampling notes from the original corpora. Table 1 shows the statistics of the four corpora used in this study. As the annotation guidelines for these four corpora differ slightly, we define a common set of eight PHI categories for all four datasets: Date, Doctor, Phone, Patient, Location, ID, Hospital, and Age.
Table 1.
Statistics of the source and the target corpora. The sizes of original datasets, as provided by the challenges or the institutes, are compared to the sizes of sampled subsets used in this study.
| Description | Outpatient notes (source) | Discharge summaries (source) | Mixture notes (source) | Psychiatric notes, target (training) | Psychiatric notes, target (development) | Psychiatric notes, target (validation) |
|---|---|---|---|---|---|---|
| # Documents in original dataset | 325 | 889 | 1,304 | 600 | 400 (development and validation combined) | |
| # Documents used in this study | 325 | 604 | 470 | 600 | 100 | 300 |
| # PHI mentions used in this study | 10,454 | 13,193 | 8,863 | 11,937 | 2,073 | 5,668 |
| # Tokens used in this study | 413,657 | 413,841 | 414,944 | 1,430,390 | 241,154 | 714,119 |
ML-based de-identification system
A modified version of the de-identification system that we developed for the 2016 CEGS N-GRID challenge18 is used in this study. The system employs a single conditional random fields (CRF) model; CRF is a standard model widely used in many state-of-the-art de-identification systems7-9,21-23. Our system first pre-processes the notes using the CLAMP tokenizer (http://clamp.uth.edu), the OpenNLP POS tagger (http://opennlp.sourceforge.net), and a dictionary-based section parser that uses a dictionary of standard section names in clinical notes (a modified version of the CLAMP section parser). After pre-processing, a token-based CRF tagger is employed to identify PHI mentions using the BIO tagging scheme. CRFSuite (http://www.chokkan.org/software/crfsuite/) is used as the implementation of the CRF algorithm. Features shown to be effective in the 2016 challenge are employed, excluding those applicable only to psychiatric notes or relevant only to PHI categories not targeted in this work25. Table 2 shows the features.
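As an illustration of the BIO tagging scheme mentioned above, the sketch below converts PHI mention spans into per-token labels. The helper name and example text are ours, not part of the system:

```python
def bio_tags(tokens, phi_spans):
    """Convert PHI mention spans (start, end, category) over a token list
    into per-token BIO labels: B-<category> for the first token of a
    mention, I-<category> for the rest, and O for non-PHI tokens."""
    tags = ["O"] * len(tokens)
    for start, end, category in phi_spans:
        tags[start] = "B-" + category
        for i in range(start + 1, end):
            tags[i] = "I-" + category
    return tags

tokens = ["Seen", "by", "Dr.", "John", "Smith", "today"]
print(bio_tags(tokens, [(2, 5, "Doctor")]))
# ['O', 'O', 'B-Doctor', 'I-Doctor', 'I-Doctor', 'O']
```

The CRF tagger then predicts one such label per token, from which PHI spans are reconstructed.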
Table 2.
Features used for our ML-based de-identification system
| Feature | Description |
|---|---|
| Word shape | Orthographic forms of tokens (substituting uppercase letters, lowercase letters, and numbers with ‘A’, ‘a’, and ‘#’, respectively) |
| Surface N-grams | Token, POS, and word-shape N-grams |
| Prefix/suffix | Prefix and suffix of each token |
| Token regex | Token-level regular expression matching results |
| Sentence info. | Sentence length and shape (i.e., whether it ends with an enumeration indicator) |
| Section info. | Section name |
| Dictionary matching | Matching results against a dictionary of frequent PHI terms |
| Word representations | Brown clusters29, random indexing30, word2vec embeddings31, and GloVe embeddings32 |
| General domain NER | Outputs of Stanford NER33 |
| Semantic role labeling | Outputs of SENNA semantic role labeling34 |
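The word-shape feature in Table 2 can be sketched as a simple character-class substitution (the function name is ours):

```python
import re

def word_shape(token: str) -> str:
    """Map uppercase letters to 'A', lowercase letters to 'a', and digits
    to '#', leaving other characters (e.g. punctuation) unchanged."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "#", shape)
    return shape

# e.g. a phone-number-like token and a capitalized name
print(word_shape("713-500-3900"))  # ###-###-####
print(word_shape("Houston"))       # Aaaaaaa
```

Shapes like `###-###-####` let the CRF generalize over surface patterns (phone numbers, IDs, capitalized names) without memorizing specific strings.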
Domain adaptation methods
Three state-of-the-art domain adaptation methods are implemented, as described below. Note that we focus on supervised domain adaptation methods, which require a small amount of labeled (annotated) target data for training, as opposed to unsupervised domain adaptation methods, which utilize unlabeled target data.
Instance weighting: Instance weighting35 assigns higher weights to instances from the target data than to instances from the source data. The weighted instances from both the target and the source are combined into a single training dataset to train a classifier. The assumption is that the source data distribution does not match the target data distribution well, and the weights guide the ML algorithm to learn more from the target domain than from the source domain.
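A minimal sketch of the weighted training-set construction, assuming a weight-aware learner consumes the (instance, label, weight) arrays downstream; the function name is ours, and in our setup the source weight w (< 1) is tuned on the development set:

```python
import numpy as np

def weighted_training_set(X_src, y_src, X_tgt, y_tgt, w=0.3):
    """Pool source and target instances into one training set, attaching
    weight w (< 1) to every source instance and 1.0 to every target
    instance, so the learner is guided toward the target distribution."""
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    weights = np.concatenate([np.full(len(y_src), w),
                              np.ones(len(y_tgt))])
    return X, y, weights
```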
Instance pruning: Instance pruning35 first trains a classifier using only the labeled target data and uses it to predict labels for the source data. It then selects the k misclassified source instances with the highest prediction confidence and removes them from the source data. The pruned source data is then directly combined with the target training data to train the final classifier. The intuition is that the most confidently misclassified source instances are the ones most different from the target data.
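A minimal sketch of instance pruning in a binary-label toy setting, with a nearest-centroid classifier standing in for the CRF and the distance margin as a confidence proxy; all names are ours:

```python
import numpy as np

def prune_source(X_src, y_src, X_tgt, y_tgt, k):
    """Train on target only, find the source instances the target-trained
    classifier misclassifies, and drop the k misclassified instances with
    the highest confidence (largest margin between class centroids)."""
    c0 = X_tgt[y_tgt == 0].mean(axis=0)   # centroid of target class 0
    c1 = X_tgt[y_tgt == 1].mean(axis=0)   # centroid of target class 1
    d0 = np.linalg.norm(X_src - c0, axis=1)
    d1 = np.linalg.norm(X_src - c1, axis=1)
    pred = (d1 < d0).astype(int)
    conf = np.abs(d0 - d1)                # margin as confidence
    wrong = np.where(pred != y_src)[0]
    # among misclassified source instances, drop the k most confident
    drop = wrong[np.argsort(-conf[wrong])[:k]]
    keep = np.setdiff1d(np.arange(len(y_src)), drop)
    return X_src[keep], y_src[keep]
```

The pruned source set returned here would then be merged with the target training data to train the final model.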
Feature augmentation: A feature augmentation algorithm named EasyAdapt36 is employed. The algorithm augments features from the target and the source data with general versions of the features. Formally, given a target feature vector Xt and a source feature vector Xs, EasyAdapt generates augmented features EA(Xt) and EA(Xs) as follows:

EA(Xt) = ⟨Xt, 0, Xt⟩,  EA(Xs) = ⟨Xs, Xs, 0⟩,

where 0 is a zero vector of length |X|. As a result, three versions of the feature set are produced: general, source-specific, and target-specific. The general features are expected to receive higher weights for instances that are common to both target and source. On the other hand, the target-specific or source-specific features are expected to gain weight for instances unique to the target or the source.
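The EasyAdapt mapping can be sketched directly, assuming source instances map to ⟨general, source-specific, 0⟩ copies and target instances to ⟨general, 0, target-specific⟩ copies (the function name is ours):

```python
import numpy as np

def easyadapt(x, domain):
    """EasyAdapt feature mapping: triple the feature space into
    <general, source-specific, target-specific> blocks, zeroing out
    the block that does not apply to the instance's domain."""
    zero = np.zeros_like(x)
    if domain == "source":
        return np.concatenate([x, x, zero])
    return np.concatenate([x, zero, x])

x = np.array([1.0, 2.0])
print(easyadapt(x, "source"))  # [1. 2. 1. 2. 0. 0.]
print(easyadapt(x, "target"))  # [1. 2. 0. 0. 1. 2.]
```

A single model trained on the augmented vectors can then assign weight to the general block for behavior shared across domains and to the domain-specific blocks for behavior unique to one domain.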
Experimental setup
In order to validate our CRF-based de-identification system, in-domain performance of the system is measured. The original datasets of the three sources (without subset sampling) are used to perform 10-fold cross validation of the system on source domains. In-domain performance on the target psychiatric notes is measured by training the system on the training set and evaluating on the validation set, to provide a baseline before any DA method is applied.
The three domain adaptation algorithms are evaluated on each of the three sources. The algorithms are compared to three baselines: “target only” (TO), “source only” (SO), and “source & target” (S&T). For TO, no source data is used and the CRF is trained on the psychiatric notes training set only. For SO, only the source data is used for training, without any labeled target data. Finally, for S&T, the source data and the target data are simply merged together without applying any domain adaptation method.
To determine the extent of annotation savings conferred by the domain adaptation methods, the performance of the domain adaptation methods as well as the TO and S&T baselines is reported at different amounts of labeled target data, in increments of 10% of the psychiatric training data. Learning curves plotting the amount of labeled target data against de-identification performance (F-score) are reported.
For all the experiments, the psychiatric notes training set is used as the labeled target data, the validation set is used for measuring the performance, and the development set is used for parameter selection, including k for instance pruning (top k instances to remove), and w for instance weighting (weight w<1 to be assigned to the source instances).
Evaluation
For evaluation, we follow the method used in previous de-identification challenges15,17. Micro-averaged precision, recall, and F-score are reported. System outputs are considered to be correct only when both the type and the character offsets match the gold standard. Statistical significance is determined by approximate randomization test37 using N=9999 and α=0.1.
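A minimal sketch of the approximate randomization test, with paired per-document scores standing in for the micro-averaged statistic used in the paper; the function name and the (r + 1)/(n + 1) p-value form are ours:

```python
import random

def approximate_randomization(scores_a, scores_b, n=9999, seed=0):
    """Paired approximate randomization test: randomly swap the two
    systems' per-document scores and count how often the absolute mean
    difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b)) / len(scores_a)
    at_least = 0
    for _ in range(n):
        sa, sb = 0.0, 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:   # swap this pair with probability 0.5
                a, b = b, a
            sa += a
            sb += b
        if abs(sa - sb) / len(scores_a) >= observed:
            at_least += 1
    return (at_least + 1) / (n + 1)  # a common conservative p-value form
```

The difference between two systems is declared significant when the returned p-value falls below the chosen α.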
In-domain de-identification performance
Table 3 shows the performance of the CRF-based de-identification system when trained and tested on data from the same domain. For outpatient notes, discharge summaries, and mixture notes, the system shows performance comparable to results from previous studies (e.g., an F-score of 93.23 for HIPAA-PHI categories by the third best system in the 2014 i2b2 challenge15).
Table 3.
In-domain performance of CRF-based de-identification system on UT, 2006, 2014, and 2016 datasets.
| Corpus | P | R | F |
|---|---|---|---|
| Outpatient notes | 98.11 | 96.72 | 97.40 |
| Discharge summaries | 98.69 | 97.46 | 98.07 |
| Mixture notes | 96.84 | 92.54 | 94.64 |
| Psychiatric notes | 93.17 | 86.82 | 89.88 |
De-identification performance using domain adaptation
Table 4 shows the de-identification performance with domain adaptation methods using the three sources. When trained only on the psychiatric notes training set (i.e., the TO baseline), the system shows an F-score of 89.88. When only the source corpora are used (i.e., the SO baseline), the performance drops to as low as 59.51 (outpatient notes). Simply merging source and target (i.e., the S&T baseline) improves the performance over TO in the case of mixture notes, but is worse than TO for outpatient notes and discharge summaries. The DA methods vary in performance, but feature augmentation shows a statistically significant increase over both the S&T baseline (with outpatient notes or discharge summaries as source) and the TO baseline (for all three sources), achieving the best F-score for all three sources. Overall, the highest F-score, 90.40, is achieved when feature augmentation is applied using mixture notes as the source.
Table 4.
Performance with and without domain adaptation using outpatient notes, discharge summaries, and mixture notes. Statistical significance is marked with * when the performance is higher than the target only baseline, and with ◊ when higher than the source & target baseline. Best F-score for each source is marked in bold.
| Source | DA method | P | R | F |
|---|---|---|---|---|
| - | Target only | 93.17 | 86.82 | 89.88 |
| Outpatient notes | Source only | 92.94 | 43.77 | 59.51 |
| | Source & target | 93.64 | 85.89 | 89.60 |
| | Instance pruning | 93.49 | 86.61 | 89.92◊ |
| | Instance weighting | 93.14 | 86.73 | 89.82◊ |
| | Feature augmentation | 93.30 | 87.24 | 90.17*◊ |
| Discharge summaries | Source only | 70.54 | 56.18 | 62.54 |
| | Source & target | 93.22 | 85.89 | 89.40 |
| | Instance pruning | 93.18 | 86.29 | 89.60 |
| | Instance weighting | 92.45 | 86.24 | 89.24 |
| | Feature augmentation | 93.32 | 87.26 | 90.19*◊ |
| Mixture notes | Source only | 82.82 | 74.26 | 78.31 |
| | Source & target | 93.37 | 86.96 | 90.05 |
| | Instance pruning | 93.32 | 86.98 | 90.04 |
| | Instance weighting | 92.96 | 87.63 | 90.22 |
| | Feature augmentation | 93.46 | 87.53 | 90.40* |
Table 4 also shows that the three sources exhibit quite different performance, even though they are equal-sized. With the SO baseline, mixture notes shows the highest performance, followed by discharge summaries, and then outpatient notes (statistically significant difference by approximate randomization test with N=9999 and α=0.1). This performance gap among equal-sized sources indicates that mixture notes may be the most similar to the psychiatric notes and outpatient notes may be the most different. The performance difference between outpatient notes and discharge summaries disappears when S&T or feature augmentation domain adaptation is applied. However, mixture notes still has significantly higher performance than outpatient notes or discharge summaries even when S&T or feature augmentation is applied.
Table 5 compares the performance of the TO baseline and feature augmentation for each PHI category. When outpatient notes or mixture notes are used as the source, feature augmentation improves performance for all PHI categories. In particular, the ID category shows the biggest improvement over the TO baseline (a 4.04 F-score increase with outpatient notes, and 6.04 with mixture notes). Note that ID is one of the sparsest categories in the psychiatric notes training set (0.58% of all PHI mentions). Discharge summaries show a somewhat different pattern than outpatient notes and mixture notes: while most PHI categories improve, ID and Phone drop. Lastly, the F-score for the Age category is 0.00 regardless of which domain adaptation method or source is used. We conjecture that this is due to the severe sparsity of Age PHI mentions even after the addition of source data through domain adaptation; Age constitutes less than 0.1% of all PHI mentions in all four corpora.
Table 5.
Per-category F-score comparison of feature augmentation (FA) using each of the three sources against the target only baseline. The performance change relative to the target only baseline is shown in parentheses.
| PHI category | Target only | FA w/ outpatient notes | FA w/ discharge summaries | FA w/ mixture notes |
|---|---|---|---|---|
| Age | 0.00 | 0.00 (+0.00) | 0.00 (+0.00) | 0.00 (+0.00) |
| Date | 96.37 | 96.61 (+0.24) | 96.72 (+0.35) | 96.85 (+0.48) |
| Doctor | 94.63 | 94.79 (+0.16) | 95.00 (+0.37) | 94.89 (+0.26) |
| Hospital | 80.33 | 81.10 (+0.77) | 80.80 (+0.47) | 81.53 (+1.20) |
| ID | 58.82 | 62.86 (+4.04) | 52.94 (-5.88) | 64.86 (+6.04) |
| Location | 83.95 | 84.20 (+0.25) | 84.34 (+0.39) | 84.48 (+0.53) |
| Patient | 82.37 | 82.47 (+0.10) | 82.45 (+0.08) | 82.47 (+0.10) |
| Phone | 96.59 | 96.59 (+0.00) | 96.05 (-0.54) | 96.59 (+0.00) |
| Overall | 89.88 | 90.17 (+0.29) | 90.19 (+0.31) | 90.40 (+0.52) |
Figure 1 compares the de-identification performance of feature augmentation with TO and S&T, for the three sources, with increasing amounts of target training data. When outpatient notes or discharge summaries are used as the source, S&T outperforms TO when the amount of target training data is small (30% to 40% of the whole target training dataset), but falls below TO as the amount of target training data increases. When mixture notes are used as the source, S&T shows higher performance than the TO baseline regardless of the amount of target training data, but the performance of S&T converges to that of TO as the amount of target training data approaches 100%. Feature augmentation improves performance over both the TO and S&T baselines regardless of the amount of target training data when outpatient notes or discharge summaries are used as the source. Interestingly, with mixture notes as the source, S&T shows higher performance than feature augmentation when 60% or less of the target training data is used, but feature augmentation overtakes S&T beyond 60%. According to Figure 1, achieving an F-score of 89 for the target domain requires annotating 80% of the training samples when only target domain data is used. However, applying feature augmentation to an existing annotated source such as discharge summaries requires only approximately 65% of annotated training samples from the target domain, an 18.8% saving in annotation effort.
Figure 1.
Learning curves of the de-identification systems for TO, S&T, and feature augmentation (FA) for three sources: outpatient notes, discharge summaries, and mixture notes. X-axis denotes the percentage of labeled target training data, and Y-axis denotes de-identification F-score.
DISCUSSION
In this study, we proposed to use domain adaptation techniques to maximize the use of existing annotated datasets for de-identification of clinical notes. Psychiatric notes, an important but understudied type of notes, are used as the target. Three existing corpora, each consisting of discharge summaries, outpatient notes, and a mixture of various types of notes written for diabetes patients, are used as sources. Three state-of-the-art domain adaptation methods, instance weighting, instance pruning, and feature augmentation, are tested. It is shown that feature augmentation can increase de-identification F-score over the TO and the S&T baselines, indicating the potential of reducing annotation cost for building ML-based de-identification systems. We also find that the performance increase by domain adaptation could depend on which type of clinical notes is used as source.
In previous work employing domain adaptation for natural language processing (NLP) tasks in the biomedical domain, general domain datasets or biomedical literature were used as sources26-28. In contrast, this work uses clinical notes of types other than psychiatric notes as sources, which might be regarded as being “in the same domain” as the psychiatric notes. However, in most cases, simply combining annotated notes of different types shows no improvement over the TO baseline, whereas the feature augmentation domain adaptation method improves over both the TO and S&T baselines. This indicates that different note types should indeed be treated as separate domains for the de-identification task.
Based on our experiments, feature augmentation is the best of the three domain adaptation algorithms for the de-identification task. While instance pruning and instance weighting do not show performance improvement over the TO baseline, feature augmentation achieves a statistically significant improvement over the TO baseline regardless of the type of source notes. This difference may stem from the fact that feature augmentation adapts on a per-feature basis, as opposed to the other two methods, which adapt on a per-instance basis. However, previous work evaluating similar sets of domain adaptation algorithms on different biomedical NLP tasks reports different results: instance pruning performed best for semantic role labeling (SRL) of biomedical literature26, feature augmentation for SRL of clinical notes27, and instance weighting for discourse connective detection in biomedical literature28. Thus, the performance of domain adaptation algorithms appears to vary with the characteristics of both the source and target datasets and the NLP task at hand.
Different types of notes are observed to contribute differently to de-identification of psychiatric notes. In this task, mixture notes show the biggest performance improvement through domain adaptation, achieving significantly better performance than outpatient notes or discharge summaries. Moreover, when the target data size is small, simply merging mixture notes and target data works better than the domain adaptation methods. This finding is interesting and requires further investigation to provide more insight. One notable fact is that the psychiatric notes and mixture notes both come from the same organization, the Partners Healthcare System, while the discharge summaries and outpatient notes contain documents from institutions other than Partners. We leave further investigation of domain similarities among different types of clinical notes as future work.
The target corpus used in this study is much larger than the source corpora (in terms of number of tokens). This differs from the typical use case of domain adaptation, where the source is much larger than the target. While it is interesting that domain adaptation can increase de-identification performance even when the source sample size is much smaller than the target's, we expect a larger source dataset to yield an even higher performance increase through domain adaptation. In fact, an additional experiment using feature augmentation with all existing datasets combined as the source shows an F-score of 90.67, the highest among all the experiments in this study. However, the effect of a much larger source on de-identification of psychiatric notes through domain adaptation remains to be investigated.
We compared the de-identification errors produced by systems with domain adaptation to those produced by the TO baseline system. While fewer errors are produced when domain adaptation is used, the distribution of error types does not change much. For instance, with or without domain adaptation, acronyms of hospital names such as “MBH” or “SAH” are among the major sources of errors. For future work, we plan to test combinations of different domain adaptation algorithms28 as well as to incorporate domain similarity38 between target and source into the domain adaptation process.
CONCLUSION
In this paper, we investigated the use of domain adaptation methods for de-identification of psychiatric notes. Three state-of-the-art domain adaptation methods were evaluated and three source datasets consisting of discharge summaries, outpatient notes, and a mixture of different types of clinical notes were studied. Our results show that domain adaptation can achieve better performance than simply merging the source and the target data, indicating the potential of domain adaptation to reduce the annotation cost when building automatic de-identification systems for new types of clinical notes.
Acknowledgements
The authors were supported by NIH grants 5R01LM010681, 1R01GM102282, NU24 CA194215, 1U24CA194215-01A1, and 4R00LM012104. The CEGS N-GRID challenge was supported by NIH grants P50MH106933, 4R13LM011411.
References
- 1.Sweeney L. Replacing personally-identifying information in medical records, the Scrub system. In proceedings of AMIA symposium; 1996. pp. 333–337. [PMC free article] [PubMed] [Google Scholar]
- 2.Ruch P, Baud RH, Rassinoux AM, Bouillon P, Robert G. Medical document anonymization with a semantic lexicon. In proceedings of AMIA symposium; 2000. pp. 729–733. [PMC free article] [PubMed] [Google Scholar]
- 3.Thomas SM, Mamlin B, Schadow G, McDonald C. A successful technique for removing names in pathology reports using an augmented search and replace method. In proceedings of AMIA symposium; 2002. pp. 777–781. [PMC free article] [PubMed] [Google Scholar]
- 4.Gupta D, Saul M, Gilbertson J. Evaluation of a deidentification (De-Id) software engine to share pathology reports and clinical documents for research. Am J Clin Pathol. 2004;121(2):176–186. doi: 10.1309/E6K3-3GBP-E5C2-7FYU. [DOI] [PubMed] [Google Scholar]
- 5.Beckwith BA, Mahaadevan R, Balis UJ, Kuo F. Development and evaluation of an open source software tool for deidentification of pathology reports. BMC Medical Informatics and Decision Making. 2006;6(1):12. doi: 10.1186/1472-6947-6-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Neamatullah I, Douglass MM, Lehman L-WH, et al. Automated de-identification of free-text medical records. BMC Medical Informatics and Decision Making. 2008;8(1):641–17. doi: 10.1186/1472-6947-8-32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Gardner J, Xiong L. HIDE: An Integrated System for Health Information DE-identification. In proceedings of CBMS. 2008:254–259.
- 8.Aberdeen J, Bayer S, Yeniterzi R, et al. The MITRE Identification Scrubber Toolkit: Design, training, and assessment. International Journal of Medical Informatics. 2010;79(12):849–859. doi: 10.1016/j.ijmedinf.2010.09.007.
- 9.Benton A, Hill S, Ungar L, et al. A system for de-identifying medical message board text. BMC Bioinformatics. 2011;12 Suppl 3(3):S2. doi: 10.1186/1471-2105-12-S3-S2.
- 10.Uzuner O, Sibanda TC, Luo Y, Szolovits P. A de-identifier for medical discharge summaries. Artificial Intelligence in Medicine. 2008;42(1):13–35. doi: 10.1016/j.artmed.2007.10.001.
- 11.McMurry AJ, Fitch B, Savova G, Kohane IS, Reis BY. Improved de-identification of physician notes through integrative modeling of both public and private medical text. BMC Medical Informatics and Decision Making. 2013;13(1):112. doi: 10.1186/1472-6947-13-112.
- 12.Szarvas G, Farkas R, Busa-Fekete R. State-of-the-art Anonymization of Medical Records Using an Iterative Machine Learning Framework. J Am Med Inform Assoc. 2007;14(5):574–580. doi: 10.1197/j.jamia.M2441.
- 13.Dernoncourt F, Lee JY, Szolovits P, Uzuner O. De-identification of Patient Notes with Recurrent Neural Networks. J Am Med Inform Assoc. 2017. doi: 10.1093/jamia/ocw156. arXiv:1606.03475 [cs.CL]
- 14.Deleger L, Molnár K, Savova G, et al. Large-scale evaluation of automated clinical note de-identification and its impact on information extraction. J Am Med Inform Assoc. 2013;20(1):84–94. doi: 10.1136/amiajnl-2012-001012.
- 15.Stubbs A, Kotfila C, Uzuner O. Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1. Journal of Biomedical Informatics. 2015;58:S11–S19. doi: 10.1016/j.jbi.2015.06.007.
- 16.Stubbs A, Uzuner O. Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. Journal of Biomedical Informatics. 2015;58(Suppl):S20–S29. doi: 10.1016/j.jbi.2015.07.020.
- 17.Uzuner O, Luo Y, Szolovits P. Evaluating the state-of-the-art in automatic de-identification. J Am Med Inform Assoc. 2007;14(5):550–563. doi: 10.1197/jamia.M2444.
- 18.Stubbs A, Filannino M, Uzuner O. De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID Shared Tasks Track 1. Journal of Biomedical Informatics. 2017. doi: 10.1016/j.jbi.2017.06.011. (to be published)
- 19.Friedlin FJ, McDonald CJ. A software tool for removing patient identifying information from clinical documents. J Am Med Inform Assoc. 2008;15(5):601–610. doi: 10.1197/jamia.M2702.
- 20.Chen T, Cullen RM, Godwin M. Hidden Markov model using Dirichlet process for de-identification. Journal of Biomedical Informatics. 2015;58(Suppl):S60–S66. doi: 10.1016/j.jbi.2015.09.004.
- 21.Liu Z, Chen Y, Tang B, et al. Automatic de-identification of electronic medical records using token-level and character-level conditional random fields. Journal of Biomedical Informatics. 2015;58:S47–S52. doi: 10.1016/j.jbi.2015.06.009.
- 22.Yang H, Garibaldi JM. Automatic detection of protected health information from clinic narratives. Journal of Biomedical Informatics. 2015;58:S30–S38. doi: 10.1016/j.jbi.2015.06.015.
- 23.Dehghan A, Kovacevic A, Karystianis G, Keane JA, Nenadic G. Combining knowledge- and data-driven methods for de-identification of clinical narratives. Journal of Biomedical Informatics. 2015;58:S53–S59. doi: 10.1016/j.jbi.2015.06.029.
- 24.Health Insurance Portability and Accountability Act (HIPAA) Washington, D.C.: 2004. http://purl.fdlp.gov/GPO/gpo10291.
- 25.Lee H-J, Wu Y, Zhang Y, Xu J, Roberts K, Xu H. A hybrid approach to automatic de-identification of psychiatric notes. Journal of Biomedical Informatics. 2017. doi: 10.1016/j.jbi.2017.06.006. (to be published)
- 26.Dahlmeier D, Ng HT. Domain adaptation for semantic role labeling in the biomedical domain. Bioinformatics. 2010;26(8):1098–1104. doi: 10.1093/bioinformatics/btq075.
- 27.Zhang Y, Tang B, Jiang M, Wang J, Xu H. Domain adaptation for semantic role labeling of clinical text. J Am Med Inform Assoc. 2015;22(5):967–979. doi: 10.1093/jamia/ocu048.
- 28.Ramesh BP, Prasad R, Miller T, Harrington B, Yu H. Automatic discourse connective detection in biomedical text. J Am Med Inform Assoc. 2012;19(5):800–808. doi: 10.1136/amiajnl-2011-000775.
- 29.Brown PF, deSouza PV, Mercer RL, Pietra VJD, Lai JC. Class-based n-gram models of natural language. Computational Linguistics. 1992;18(4):467–479.
- 30.Lund K, Burgess C. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers. 1996;28(2):203–208.
- 31.Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed Representations of Words and Phrases and their Compositionality. In proceedings of NIPS. 2013:3111–3119.
- 32.Pennington J, Socher R, Manning CD. GloVe: Global Vectors for Word Representation. In proceedings of EMNLP. 2014:1532–1543.
- 33.Finkel JR, Grenager T, Manning C. Incorporating non-local information into information extraction systems by Gibbs sampling. In proceedings of ACL. 2005:363–370. doi: 10.3115/1219840.1219885.
- 34.Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. Natural Language Processing (Almost) from Scratch. JMLR. 2011;12(Aug):2493–2537.
- 35.Jiang J, Zhai C. Instance Weighting for Domain Adaptation in NLP. In proceedings of ACL. 2007:264–271.
- 36.Daumé H, III. Frustratingly Easy Domain Adaptation. arXiv. 2009.
- 37.Chinchor N. The statistical significance of the MUC-4 results. In proceedings of MUC. 1992:30–50.
- 38.Plank B, van Noord G. Effective Measures of Domain Similarity for Parsing. In proceedings of ACL. 2011:1566–1576.