Abstract
Background
Full syntactic parsing of clinical text as a part of clinical natural language processing (NLP) is critical for a wide range of applications, such as identification of adverse drug reactions, patient cohort identification, and gene interaction extraction. Several robust syntactic parsers are publicly available to produce linguistic representations for sentences. However, these existing parsers are mostly trained on general English text and often require adaptation for optimal performance on clinical text. Our objective was to adapt an existing general English parser for the clinical text of operative reports via lexicon augmentation, statistics adjustment, and grammar rule modification, and to test the generalizability of this approach on a set of biomedical text.
Method
The Stanford unlexicalized probabilistic context-free grammar (PCFG) parser lexicon was expanded with the SPECIALIST lexicon along with statistics collected from a limited set of operative notes tagged with two POS taggers (the GENIA tagger and MedPost). The most frequently occurring verb entries of the SPECIALIST lexicon were adjusted based on manual review of verb usage in operative notes. Stanford parser grammar production rules were also modified based on linguistic features of operative reports. An analogous approach was then applied to the GENIA corpus to test the generalizability of this approach to biomedical text.
Results
The new unlexicalized PCFG parser, extended with the extra lexicon from SPECIALIST along with accurate statistics collected from an operative note corpus tagged with the GENIA POS tagger, improved parser performance by 2.26%, from 87.64% to 89.90%. There was a progressive improvement with the addition of multiple approaches. Most of the improvement occurred with lexicon augmentation combined with statistics from the operative notes corpus. Application of this approach to the GENIA corpus showed that parsing performance was boosted by 3.81% with a simple new grammar and the addition of the GENIA corpus lexicon.
Conclusion
Using statistics collected from clinical text tagged with POS taggers along with proper modification of grammars and lexicons of an unlexicalized PCFG parser can improve parsing performance.
Keywords: probabilistic context-free grammar (PCFG), unlexicalized parser, parser adaptation, natural language processing, operative reports, SPECIALIST
1 Introduction
In the clinical domain, the rapid proliferation of patient documents within electronic health record (EHR) systems and the need to utilize these documents for secondary purposes such as disease surveillance, population health assessment, clinical research, and quality measurement has increased the need for automated information extraction and other natural language processing (NLP) techniques. A large amount of detailed information in EHRs is stored in narrative documents, which are not directly accessible to computerized applications without specialized clinical NLP and text mining tools. NLP research to process clinical text effectively aims to improve these techniques for the specific intricacies of clinical documents.
Full syntactic parsing is a very important step towards automated natural language understanding. Full syntactic parsing of texts provides deep linguistic features such as predicate-argument structure, voice, phrasal categories, position, and path. Moreover, incorporation of full syntactic parsing into information extraction systems has been shown to improve their performance[1–7]. Over the past decade, parsing systems have improved dramatically. Several robust parsers such as Charniak/Johnson’s parser[8] and Stanford unlexicalized probabilistic context-free grammar (PCFG) parser[9] are available to produce linguistic representations for narrative text. Most of these modern parsers rely on large corpora and tag sets such as the Penn Treebank[10] to obtain a grammar with reasonable coverage and to acquire an accurate estimation of an appropriate statistical parsing model.
While they perform well on general English, these parsers require special development and adaptation for clinical text because the clinical sublanguage differs from general English[11]. For instance, specialized domain terms and syntactic structures not typically found in general English are prevalent in clinical documents. Physicians who create clinical notes often have limited time and therefore frequently omit information that can be inferred from context. Out-of-the-box parsers are highly tuned to formal general English text[12–18], and their performance is suboptimal for highly specialized clinical documents. Since manually annotating large numbers of parse trees is costly and may not be practical for fully supervised training within a new domain or subdomain, parser adaptation is one approach proposed by researchers to improve parser performance in a domain of interest.
Various methodologies have been proposed for parser domain adaptation. These methods fall broadly into three categories: supervised domain adaptation[19–21], semi-supervised domain adaptation[22] and unsupervised domain adaptation[12–15, 18, 23]. In supervised domain adaptation, a limited amount of labeled data from the new target domain is used to adapt models trained on larger out-of-domain datasets. In the semi-supervised setting, the goal is to use a small amount of labeled target-domain data together with a large amount of unlabeled data. In contrast, unsupervised domain adaptation relies on unlabeled data only, which is usually easy to acquire from the target domain.
In principle, using a combination of limited labeled source data together with unlabeled target data is an effective approach to adapting an existing general English parser to a target domain. This is a much more realistic approach, as labeled data are expensive to obtain. Over the last decade, a number of techniques have been proposed for parser adaptation without large amounts of manually labeled target text. Self-training parses unlabeled target text with an existing parser and adds the parses to the training corpus to create a new parsing model. Work by McClosky[18, 24] has demonstrated that the performance of the Charniak/Johnson lexicalized PCFG parser on a target domain can be improved by including extra target-domain data labeled by the existing parser, from the Brown corpus[24] and Medline[18]. Lexicon augmentation is another frequently used technique for parser adaptation: extra lexical items from domain sources (e.g., the Unified Medical Language System (UMLS) SPECIALIST lexicon[25]) are added to the existing parser lexicon. Several efforts have improved parsing performance by extending the lexicons of parsers such as the Stanford PCFG parser, the Link Grammar parser and the Combinatory Categorial Grammar parser. Finally, full parsing based on a part-of-speech (POS) tagger adapted to the target domain has also proved helpful for domain adaptation[12, 14]: a POS tagger retrained on the target domain, which is usually less expensive than retraining a parser, can provide more accurate POS tags for the back-end parsing process.
2 Background
2.1 Unlexicalized parsing and lexicalized parsing
Full syntactic parsing results in a hierarchical tree-like representation of the syntactic structure of a piece of text according to some formal grammar such as, for example, a constituency grammar[26]. Figure 1 shows the constituency parse tree of the sentence: “The eye was patched with hyoscine ophthalmic drops.”
Figure 1.

Constituent (phrase structure) tree for the sentence: “The eye was patched with hyoscine ophthalmic drops.” *S: Sentence; NP: Noun phrase; VP: Verb phrase; DT: Determiner; NN: Noun, singular or mass; VBD: Verb, past tense; IN: Preposition or subordinating conjunction; JJ: Adjective.
As shown in Figure 1, the tree representation of the input sentence from a parser conveys useful information such as the constituent boundaries, the grammatical relationship between constituents, which is expressed by the path from one constituent to another, the head word of each candidate constituent and a number of other features.
In formal linguistics, Context-Free Grammars (CFGs)[27] are formal systems used to model natural languages. CFGs contain a set of production rules (or recursive rewrite rules) that are used to generate linguistic expressions from underlying constituent building blocks. Formally, a CFG is represented as a 4-tuple consisting of 4 sets: G = (N, Σ, R, S) where:
N is a finite set of non-terminal symbols.
Σ is a finite set of terminal symbols.
R is a finite set of rules of the form X →Y1Y2…Yn, where X ∈ N, n ≥ 0, and Yi ∈ (N ∪ Σ) for i =1…n.
S ∈ N is a distinguished start symbol.
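As a minimal sketch, the 4-tuple can be written out directly in code. The toy grammar below is hypothetical and covers only the example sentence “The patient left the OR”:

```python
# A toy CFG written as the 4-tuple G = (N, Sigma, R, S); the grammar is
# hypothetical and only covers the sentence "The patient left the OR".
CFG = {
    "N": {"S", "NP", "VP", "DT", "NN", "VBD"},   # non-terminal symbols
    "Sigma": {"the", "patient", "left", "OR"},   # terminal symbols
    "R": [                                        # rules X -> Y1 Y2 ... Yn
        ("S", ("NP", "VP")),
        ("NP", ("DT", "NN")),
        ("VP", ("VBD", "NP")),
        ("DT", ("the",)),
        ("NN", ("patient",)),
        ("NN", ("OR",)),
        ("VBD", ("left",)),
    ],
    "S": "S",                                     # start symbol
}

def is_valid_rule(grammar, lhs, rhs):
    """Check the CFG condition on rules: X in N and each Yi in N union Sigma."""
    symbols = grammar["N"] | grammar["Sigma"]
    return lhs in grammar["N"] and all(y in symbols for y in rhs)
```

The well-formedness check mirrors the condition on R above: every left-hand side must be a non-terminal, and every right-hand-side symbol must be a terminal or non-terminal.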
For an input sequence of words, a parse tree can be derived according to the CFG production rules. Figure 2 exemplifies a set of simple production rules. For an input sentence ‘The patient left the OR’, a parse tree can be derived from the production rules as shown below in Figure 3.
Figure 2.

Production rules example. *S: Sentence; NP: Noun phrase; VP: Verb phrase; DT: Determiner; NN: Noun, singular or mass; VBD: Verb, past tense; OR=Operating room.
Figure 3.

Two syntactic trees for the sentence: ‘The I&A removed the viscoelastic with a tip.’ *I&A=Irrigation and aspiration.
When dealing with complex natural language text, more than one production rule may apply to a sequence of words, which results in syntactic ambiguity. Figure 3 shows two syntactic trees derived for the same sentence “The I&A removed the viscoelastic with a tip.”
The sentences in Figure 3 illustrate the classic phenomenon of prepositional attachment ambiguity where the interpretation of the sentence depends on whether the prepositional phrase “with a tip” attaches to the verb phrase node “removed …” or the lower noun phrase node “the viscoelastic.”
Probabilistic context-free grammars (PCFGs) are an attempt to deal with the ambiguity encountered when applying CFG production rules to complex natural language text. A PCFG is a probabilistic version of a CFG in which each production has a probability, as shown in Figure 4. In a PCFG, the probability of a parse tree is the product of the probabilities of its rule productions. The parse tree with the greatest probability is picked from a number of alternatives with varying likelihoods. The probabilities of a PCFG model are typically estimated from a set of training texts (e.g., the Penn Treebank[10]). Formally, a PCFG is defined as follows:
A context-free grammar G = (N, Σ, R, S)
Parameters q(α → β) for each rule α → β ∈ R, giving the conditional probability of choosing that rule
Figure 4.

A syntactic tree with production probabilities for sentence ‘The I&A removed the viscoelastic with a tip.’ *I&A=Irrigation and aspiration.
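The product-of-productions computation can be sketched as follows; the rule probabilities here are made-up illustrative values, not figures from Figure 4 or any treebank:

```python
# Illustrative PCFG rule probabilities q(alpha -> beta); the numbers are
# invented for this sketch and are not estimated from any corpus.
Q = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.6,
    ("VP", ("VBD", "NP")): 0.7,
    ("DT", ("the",)): 0.5,
    ("NN", ("patient",)): 0.1,
    ("NN", ("OR",)): 0.05,
    ("VBD", ("left",)): 0.2,
}

def tree_prob(tree, q=Q):
    """p(t): the product of the probabilities of the productions used in t.
    A tree is a nested tuple (label, child1, ...); leaves are plain strings."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = q[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child, q)
    return p

def best_parse(candidate_trees, q=Q):
    """Pick the maximum-likelihood tree from a set of alternatives."""
    return max(candidate_trees, key=lambda t: tree_prob(t, q))

t = ("S", ("NP", ("DT", "the"), ("NN", "patient")),
          ("VP", ("VBD", "left"),
                 ("NP", ("DT", "the"), ("NN", "OR"))))
```

Given several candidate trees for one sentence, `best_parse` performs the maximum-likelihood selection described in the text.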
Given a PCFG with all parameters estimated from a corpus such as the Penn Treebank, a parse tree for a sentence s is chosen from all possible alternative parse trees by finding the parse tree with maximum likelihood:

t*(s) = arg max_{t ∈ T(s)} p(t)

Here t is a parse tree for s; T(s) is the set of all possible parse trees for sentence s; and p(t) is the probability of parse tree t, calculated from the parameters collected from the corpus. Out-of-the-box and unenhanced PCFGs usually do not perform optimally on text from new domains[28]. Unlexicalized PCFGs with special linguistic annotations[9] and lexicalized PCFGs are two approaches that have been used to address the weaknesses of basic PCFGs.
Klein and Manning constructed an unlexicalized PCFG parser using a set of linguistic annotations that split syntactic categories to encode the vertical and horizontal history of tree nodes[9]. For example, the UNARY-INTERNAL annotation marks any nonterminal node in the Penn Treebank with only one child. Similarly, the TAG-PA annotation marks all preterminals with their parent category, as shown in Figure 5. As shown in Klein’s work, the TAG-PA annotation significantly improves parsing accuracy[9]. The unlexicalized Stanford PCFG parser, trained on the Penn Treebank corpus and enriched with these additional annotations, achieved performance similar to state-of-the-art lexicalized PCFG parsers without relying heavily upon lexical dependencies.
Figure 5.

Adding parent annotation to trees
The lexicon of an unlexicalized PCFG parser trained on treebanks with these additional annotations therefore stores not only lexical entries, but also the frequency with which each lexical item is associated with a POS tag and its parent tag, such as “NNˆNP” and “VBNˆADJP”. The grammars of an unlexicalized PCFG parser also incorporate these annotations; for example, the unary rule “NPˆS-U -> PRNˆNP” specifies that the node has only one child. One advantage of using the unlexicalized Stanford parser is that the text format of its lexicon and grammar can be easily extended and reloaded into the original parser.
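The parent-annotation scheme behind entries such as “NNˆNP” can be sketched as a simple tree transformation, using a hypothetical nested-tuple tree representation in which leaves are plain strings:

```python
def annotate_parents(tree, parent=None):
    """Rewrite each non-terminal label L as 'L^PARENT' (TAG-PA style);
    leaf strings (the words themselves) are left unchanged."""
    label, *children = tree
    new_label = label if parent is None else f"{label}^{parent}"
    new_children = [child if isinstance(child, str)
                    else annotate_parents(child, label)
                    for child in children]
    return (new_label, *new_children)

t = ("S", ("NP", ("NN", "hemostasis")), ("VP", ("VBN", "obtained")))
annotated = annotate_parents(t)
# e.g. the NN node becomes 'NN^NP' and the VBN node becomes 'VBN^VP'
```

Note that each child is annotated with its parent's original (unannotated) category, matching the examples in Figure 5.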
A lexicalized PCFG specializes its production rules for specific words by including their head word in the trees, as shown in Figure 6. In this way, a lexicalized PCFG largely resolves ambiguities such as the prepositional phrase (PP) attachment problem. Additionally, Collins[29] and Charniak[30] used a discriminative re-ranking technique to select a better parse from the list of parses generated by the original parser for each sentence. However, the performance of lexicalized PCFGs is limited by the sparseness of lexical dependency information available in the Penn Treebank. Moreover, modeling word-to-word dependencies is difficult, especially if these dependencies are domain-specific.
Figure 6.

Adding headtags to trees
2.2 Domain adaptation for unlexicalized parsing and lexicalized parsing
A number of groups have reported and evaluated methods to improve the parsing performance of existing unlexicalized parsers. Xu and colleagues[31] reported that manually annotated POS tags could be used to produce a POS tagger for the medical domain, improving Stanford parser performance by 2 to 4% on a small set of sentences from clinical reports. The evaluation of these enhancements revealed an improvement on the higher-level NLP task of noun phrase identification. Similarly, Huang et al.[32] enriched the Stanford lexicon with unambiguous entries from the SPECIALIST lexicon and customized the Stanford parser grammar based on a review of clinical reports, although no formal evaluation of these modifications was performed.
We observed from preliminary experiments on clinical text that, even with correct POS tags, general English parsers were sometimes unable to produce a correct parse tree. Figure 7 shows parse trees of POS-tagged sentence (1) produced by the Stanford parser with and without an enriched lexicon: parse tree (7a) was produced by the enriched Stanford parser and parse tree (7b) by the original Stanford parser, with correct POS tags provided in both cases.
Figure 7.

Parse trees of a POS tagged sentence (1) produced by Stanford parser (a) with and (b) without enriched lexicon.
(1). “The/DT wound/NN was/VBD extended/VBN proximally/RB and/CC distally/RB.”
Self-training is a technique used to adapt a lexicalized parser to a new target domain: an existing parser is retrained with data that it has itself parsed as extra training data[18, 24]. While some early reports on self-training for parsing reported negative results, McClosky[18, 24] and Bacchiani[33] have shown that this technique can improve the parsing performance of the new parser on a target domain. In McClosky’s work, the standard Charniak/Johnson parser was trained on a corpus of biomedical abstracts, which was labeled with the existing Charniak/Johnson parser, along with the Penn Treebank. The resulting parser showed a performance improvement on a standard test set, the GENIA Treebank[34].
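The self-training loop itself is simple. The sketch below is generic: `existing_parser` and `train` are stand-ins for a real parser (e.g., Charniak/Johnson) and its training procedure, and the toy functions exist only to make the sketch runnable:

```python
def self_train(existing_parser, train, labeled_trees, unlabeled_sentences):
    """Self-training sketch: label target-domain sentences with the existing
    parser, then retrain on the union of gold and automatically parsed trees."""
    auto_trees = [existing_parser(sentence) for sentence in unlabeled_sentences]
    return train(labeled_trees + auto_trees)

# Toy stand-ins: the "parser" wraps a sentence in a flat tree, and the
# "trainer" just records how many trees it was trained on.
toy_parser = lambda sentence: ("S", *sentence.split())
toy_trainer = lambda trees: {"model_trained_on": len(trees)}

model = self_train(toy_parser, toy_trainer,
                   labeled_trees=[("S", "gold", "tree")],
                   unlabeled_sentences=["hemostasis obtained", "wound closed"])
```

In McClosky’s setting, `labeled_trees` would correspond to the Penn Treebank and `unlabeled_sentences` to the biomedical abstracts.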
2.3 Procedure description section of operative reports
The narrative description of a procedure is the core portion of an operative note, which provides specific documentation of what occurred during an operative procedure. The following text exemplifies an excerpt from a ‘procedure description’ section from a transurethral prostatectomy:
“After adequate anesthesia, the patient in the dorsal lithotomy position was prepped and draped in the usual manner. A 28 French continuous flow resectoscope sheath was inserted. Inspection showed that the patient had significant regrowth of his prostatic tissue. This patient in the past had undergone transurethral resection of the prostate elsewhere. The verumontanum and both ureteral orifices were noted to be intact. All the prostatic chips were irrigated from the bladder. A total of 46 grams of prostate was resected. Good hemostasis obtained. A 22 French three-way Foley catheter was inserted and continuous bladder irrigation was started. Sponge and needle correct × 2. The patient tolerated the procedure well.”
Effective computerized NLP systems for operative reports require an understanding of this text and ideally address the domain-specific features of operative notes. Automatic processing of such text is challenging due to the high frequency of certain surgical terms (e.g., “extubate”, “prep”), domain-specific words (e.g., “preperitoneally”, “free-up”) and special grammar (e.g., “Good hemostasis obtained”).
2.4 GENIA Corpus
The GENIA corpus is a collection of MEDLINE articles on biological reactions of transcription factors in human blood cells. The articles were extracted from the MEDLINE database with the MeSH terms human, blood cell and transcription factor. Each article was annotated with parse trees following the Penn Treebank II (PTB) bracketing guidelines. The text in Figure 8 shows an example of GENIA syntactic annotation.
Figure 8.

GENIA syntactic annotation example
2.5 SPECIALIST lexicon
The SPECIALIST Lexicon consists of a set of lexical entries including multi-word terms with spelling variants, part(s) of speech, and other information for biomedical domain terms. SPECIALIST consists of over 200,000 biomedical terms, as well as common English words. It has been successfully used to adapt parsers for general English to the biomedical domain as it contains important syntactic, morphological, and orthographic information for each entry[13, 14, 32]. For instance, a lexical record for a term in SPECIALIST contains base forms of the term, the part-of-speech, a unified identifier, spelling variants, and inflection for nouns, verbs and adjectives. As presented in our previous work[35], the SPECIALIST lexicon has very good coverage of both verb predicates (89.9%) and nominal predicates (100%) occurring in operative notes. Table 1 shows the number of entries of four important POS categories in SPECIALIST lexicon and Stanford lexicon, demonstrating that the SPECIALIST lexicon contains many more word entries than the Stanford lexicon.
Table 1.
Entries of 4 POS categories in SPECIALIST lexicon and Stanford lexicon.
| POS category | SPECIALIST | Stanford |
|---|---|---|
|
| ||
| Verb | 56859 | 8477 |
| Noun | 280482 | 27832 |
| Adjective | 90884 | 9032 |
| Adverb | 12467 | 1422 |
In the clinical domain, only a small amount of research has focused on parser adaptation for clinical text, and no previous work has focused on operative notes. Therefore, in this paper we describe our experiments on adapting the Stanford parser to clinical text. We hypothesized that the addition of more accurate statistics from our clinical corpus of operative reports and use of the SPECIALIST lexicon could improve the parsing performance of the Stanford parser on operative notes. We extended the lexicon of the Stanford unlexicalized parser with new entries from the SPECIALIST lexicon that occurred in our operative notes corpus, and modified the parser grammar. We tested the performance of parsers augmented with statistics collected from the corpus POS-tagged with two state-of-the-art POS taggers, the GENIA tagger and the MedPost tagger. We also self-trained a new lexicalized parser with extra parse trees produced by the Charniak/Johnson lexicalized parser and tested the performance of the resulting parser.
3 Methods
Figure 9 provides an overview of this study. Overall, we enriched the Stanford lexicon with the SPECIALIST lexicon and with statistics collected from POS-tagged operative notes from our clinical note repository, and customized the Stanford grammar to the special syntactic structure of operative note text. The output of the resulting enhanced Stanford parser was then evaluated on a set of manually annotated sentences, comparing versions of the corpus POS-tagged with different POS taggers.
Figure 9.

Overview of operative notes parser adaptation
3.1 Dataset and Overview
A total of 362,310 operative notes from University of Minnesota-affiliated Fairview Health Services, with data from four metropolitan hospitals in the Twin Cities including both community and tertiary-referral settings, were used for this study. The corpus includes operative reports created by 2,300 surgeons covering 4,333 different procedure types defined by Current Procedural Terminology (CPT) codes. The procedure description was extracted from each note and split into sentences with a locally developed heuristic text-processing tool (see Figure 9, Pre-processing Pipeline). We randomly selected a dataset of 70,000 sentences, similar in size to the Penn Treebank, from the repository of operative note sentences.
3.2 Stanford unlexicalized PCFG parser adaptation for operative notes
The SPECIALIST Lexicon contains far more entries than the Stanford lexicon, as shown previously in Table 1. To selectively expand the Stanford lexicon for operative notes, we added only SPECIALIST Lexicon entries contained within the overall operative note corpus. This approach was taken because words not in the operative note corpus have no associated frequency statistics, and because it decreases the computational overhead of loading the parser and parsing text that is incurred when a large lexicon is added.
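This selective expansion amounts to a set intersection. The sketch below assumes hypothetical dictionary-based lexicon representations (word → list of categories), not the actual SPECIALIST or Stanford file formats:

```python
def select_new_entries(specialist_lexicon, parser_lexicon, corpus_tokens):
    """Keep only the SPECIALIST entries that occur in the operative-note
    corpus and are not already present in the parser's lexicon."""
    corpus_vocab = {token.lower() for token in corpus_tokens}
    return {word: tags for word, tags in specialist_lexicon.items()
            if word.lower() in corpus_vocab and word not in parser_lexicon}

# Hypothetical miniature lexicons and corpus for illustration.
specialist = {"hemostasis": ["noun"], "extubate": ["verb"],
              "photosynthesis": ["noun"]}
stanford = {"hemostasis": ["NN"]}
tokens = ["Hemostasis", "was", "obtained", "extubate"]

new_entries = select_new_entries(specialist, stanford, tokens)
```

Here only “extubate” survives: “hemostasis” is already known to the parser, and “photosynthesis” never occurs in the corpus.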
In adding entries to the Stanford lexicon, we had to take into account that the SPECIALIST Lexicon uses a set of syntactic categories that differ from the Penn Treebank tags. For unambiguous entries in the SPECIALIST lexicon, the same set of mapping rules used in Huang’s work[32] was used to convert the SPECIALIST Lexicon syntactic categories into Penn Treebank tags. For ambiguous entries (about 20,000 words with multiple syntactic categories), we converted the entries into Stanford entries using statistics collected from the tagged corpus combined with several heuristic rules. As introduced above, an unlexicalized PCFG model requires statistics for the usage of each POS tag under different parents; for instance, the word “callus” can be both a noun and a verb. To collect frequencies for the tags of each word, we first created a corpus of a size similar to the Penn Treebank used for Stanford parser training, using 70,000 randomly selected sentences from operative note “procedure description” section text. The sentence set was then tagged with the POS taggers (GENIA tagger and MedPost). Heuristic rules based on the Stanford lexicon were also used, where we observed that some parents of a particular POS tag were more frequent than others. Using adjectives as an example, in the Stanford lexicon the incidence of adjectives (68,090 in total) used within an adjective phrase (11,498) or a noun phrase (54,211) was significantly greater than in other phrase types. As an example, Table 2 gives the frequency of each POS tag with a different parent for the word “inject” in the Stanford lexicon. To determine the frequency distribution over each possible parent, we collected the frequencies from the POS-tagged sentences.
Table 2.
Frequency of each POS tag of word “inject” with different parents in the Stanford lexicon
| POS tag | Parent tag | Frequency |
|---|---|---|
| VBD | VP | 2 |
| VBN | VP | 2 |
| JJ | ADJP | 7 |
| JJ | NP | 2 |
| JJ | WHADJP | 1 |
| JJ | WHNP | 1 |
| JJ | UCP | 1 |
| JJ | QP | 1 |
We also observed that some POS tag and parent combinations were associated with only one or a few specific words. For example, the word “only” in sentence (2) is the only adjective that could be used in a conjunction phrase:
(2). “The biceps tendon, long head intra-articular portion, was not only split, but remarkably frayed.”
Thus, for each POS tag such as “JJ”, “NN” and “VBD”, we defined a heuristic parent distribution and split the collected frequency according to that distribution. For example, for the POS tag “JJS” (superlative adjective), we defined a distribution as shown in Figure 10. From each POS-tagged corpus, the frequency of POS tags associated with each SPECIALIST lexicon entry within the set of 70,000 sentences was collected and used to adapt the Stanford lexicon and create a new adapted lexicon. For example, the new Stanford lexicon extended with the MedPost lexicon contained 172,636 entries, whereas the original Stanford lexicon had 101,703 entries.
Figure 10.

Parent phrase type distribution of the POS tag superlative adjective.
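The frequency collection and heuristic splitting described above can be sketched as follows; the parent-distribution values below are illustrative placeholders, not the actual distribution shown in Figure 10:

```python
from collections import Counter

# Hypothetical heuristic parent distributions per POS tag; the shares are
# illustrative, not the values used in the study.
PARENT_DIST = {
    "JJS": {"NP": 0.7, "ADJP": 0.3},
    "NN":  {"NP": 0.9, "ADJP": 0.1},
}

def collect_tag_counts(tagged_sentences):
    """Count (word, POS tag) occurrences in a corpus of [(word, tag), ...]."""
    counts = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[(word.lower(), tag)] += 1
    return counts

def split_by_parent(counts, dist=PARENT_DIST):
    """Split each (word, tag) frequency across parent phrase tags, yielding
    parent-annotated lexicon entries such as ('largest', 'JJS^NP')."""
    entries = {}
    for (word, tag), n in counts.items():
        for parent, share in dist.get(tag, {"NP": 1.0}).items():
            entries[(word, f"{tag}^{parent}")] = n * share
    return entries

sents = [[("largest", "JJS"), ("incision", "NN")], [("Largest", "JJS")]]
entries = split_by_parent(collect_tag_counts(sents))
```

Tags with no heuristic distribution fall back to a noun-phrase parent here, mirroring the observation that NP parents dominate for most open-class tags.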
Based on our previous observation that physicians tend to use the passive voice in the procedure description section[36], we manually adjusted the frequencies of the VBD (verb, past tense) and VBN (verb, past participle) tags for verb entries that could be either. Additionally, the POS tag of some verbs, such as “appeared”, “tolerated” and “revealed”, can be either VBD or VBN in the SPECIALIST lexicon; after review of a random set of sentences with these words, we found that these verbs were mostly used as VBD, whereas other verbs such as “incised” and “dissected” tended to be used mostly as VBN. To assign frequencies that better reflect the actual usage of verbs, we used the 200 verbs previously reported from our operative note corpus, which cover 92% of all verb occurrences in operative notes, to help provide reasonable frequencies for potentially ambiguous POS tags.
Finally, we addressed the omission of auxiliary verbs, another feature previously observed in the sublanguage of operative notes. For example, in the following sentences (3) and (4), the auxiliary verb “was” is omitted in the operative note text.
(3). “A transverse incision was made in the popliteal fossa and the lesser saphenous vein identified, ligated proximally.”
(4). “Good hemostasis obtained.”
Syntactic information such as the voice of verbs is also critical for many NLP tasks. To address this problem in operative notes, we modified the grammar of the Stanford parser by adding more production rules. For example, for the sentence “Gelfoam applied and hemostasis confirmed”, the original parser gives the parse tree in (5). After adding a new rule “VPˆS-VBF-v -> VBNˆVP”, the parser gives the correct parse in (6). The new parse assigns correct phrase tags and POS tags for the verbs, which are very important for NLP tasks such as semantic role labeling[37]: as shown in Gildea’s work, phrase tags and POS tags are used to extract the voice and parse tree path features for semantic role labeling.
(5). (ROOT (S (NP (NNP Gelfoam)) (VP (VP (VBD applied)) (CC and) (ADVP (RB hemostasis)) (VP (VBD confirmed)))))
(6). (ROOT (S (NP (NP (NNP Gelfoam)) (VP (VP (VBN applied)) (CC and) (VP (NN hemostasis)))) (VP (VBN confirmed))))
3.3 Stanford unlexicalized PCFG parser adaptation for the GENIA corpus
A similar approach was used to adapt the parser for the GENIA corpus. We used 14,325 training trees from the GENIA Treebank as a training corpus and collected statistics from them. Since we did not have sufficient domain knowledge in biology, we simply ported the words that occur in GENIA into the Stanford unlexicalized PCFG parser lexicon. Since GENIA trees have parent labels for each word, we tested our approach with two sets of lexicons: one with the accurate parent statistics and one with parent statistics generated from the heuristic rules. We removed old entries from the original Stanford lexicon when the entry existed in the GENIA corpus. A simple grammar was also added for testing. The resulting new parsers have about 129,600 entries.
3.4 Evaluation
To evaluate the performance of the parsers adapted from the corpus POS-tagged with different POS taggers, we created a reference standard of 200 manually annotated parse trees of randomly selected operative note sentences. The reference standard parse trees were annotated by two separate annotators, both with linguistics and informatics backgrounds and experience in clinical NLP. Annotations followed the Penn Treebank II bracketing guidelines[38]. For comparison with the adapted parsers, we also examined parse trees produced by the Charniak/Johnson parser, by the original Stanford parser, and by the original Stanford parser with POS tags from MedPost. In addition, we tested the performance of the parser on a random set of GENIA parse trees; since the GENIA corpus is from a slightly different domain, we wanted to evaluate the same parser adaptation technique on that domain.
Parsing performance was evaluated following the PARSEVAL standards[39] for parsing accuracy evaluation. Each constituent in the parse was represented as a labeled span; a constituent is counted as correct only if both the label and the text span are correct. Given two parses, the precision and recall of constituents were calculated. Precision and recall can be formally defined in terms of the numbers of true positives (TP), false positives (FP) and false negatives (FN): Precision = TP / (TP + FP) and Recall = TP / (TP + FN). The F-score is the weighted harmonic mean of precision and recall: F = 2 × Precision × Recall / (Precision + Recall).
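The labeled-span scoring can be sketched as below. This is a simplified PARSEVAL (labeled spans compared as a set, preterminal POS tags excluded from scoring, no punctuation handling), using the nested-tuple tree convention where leaves are plain strings:

```python
def labeled_spans(tree, start=0):
    """Return ([(label, start, end), ...], end) for a nested-tuple tree.
    Preterminals (nodes whose children are all words) are not scored."""
    label, *children = tree
    pos, result = start, []
    for child in children:
        if isinstance(child, str):
            pos += 1                                   # a word: advance by one
        else:
            child_spans, pos = labeled_spans(child, pos)
            result.extend(child_spans)
    if any(not isinstance(c, str) for c in children):  # skip preterminals
        result.insert(0, (label, start, pos))
    return result, pos

def parseval(gold_tree, test_tree):
    """Precision, recall, and F-score over labeled constituent spans."""
    gold = set(labeled_spans(gold_tree)[0])
    test = set(labeled_spans(test_tree)[0])
    tp = len(gold & test)                              # correct constituents
    precision = tp / len(test)
    recall = tp / len(gold)
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f

gold = ("S", ("NP", ("DT", "The"), ("NN", "eye")),
             ("VP", ("VBD", "was"), ("VBN", "patched")))
test = ("S", ("NP", ("DT", "The"), ("NN", "eye")),
             ("VP", ("VBD", "was"), ("ADJP", ("VBN", "patched"))))
p, r, f = parseval(gold, test)
```

In this toy example the test parse introduces a spurious ADJP, so precision drops while recall stays perfect.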
4 Results
The mean precision, recall, and F-score for each of the parsers evaluated are summarized in Table 3 for parsers adapted for operative notes and in Table 4 for those adapted for the GENIA corpus. As shown in Table 3, at baseline the Charniak/Johnson parser had slightly better parsing performance for operative notes than the Stanford parser. Expansion of the lexicon yielded a moderate improvement in parsing performance, and grammar modification combined with statistics adjustment brought an additional performance gain. The F-score of the final adapted Stanford parser on the operative notes test set improved from 87.64% to 89.90% (adapted with the corpus pre-tagged with the GENIA tagger).
Table 3.
Evaluation results of parser adaptation for operative notes.
| Parser | Precision | Recall | F-score |
|---|---|---|---|
| Baseline (Stanford unlexicalized parser) | 87.54% | 87.74% | 87.64% |
| Charniak/Johnson | 88.43% | 88.46% | 88.45% |
| Adapted Stanford unlexicalized parser (New grammars) | 87.73% | 87.94% | 87.83% |
| Adapted Stanford unlexicalized parser (New grammars + lexicon expansion) | 89.27% | 89.84% | 89.55% |
| Adapted Stanford unlexicalized parser (New grammars + lexicon expansion + statistics adjustment) | 89.65% | 90.13% | 89.90% |
Table 4.
Evaluation results of parser adaptation for GENIA.
| Parser | Precision | Recall | F-score |
|---|---|---|---|
| Baseline (Stanford unlexicalized parser) | 78.18% | 73.52% | 75.78% |
| Adapted Stanford unlexicalized parser (New lexicon with parent statistics by rules and new grammar) | 82.92% | 76.52% | 79.59% |
| Adapted Stanford unlexicalized parser (New lexicon with actual parent statistics and new grammar) | 84.08% | 78.60% | 81.25% |
Table 4 shows the results of parser adaptation on the GENIA corpus. When the same technique was applied to the GENIA corpus, the parsing result of the adapted parser on the GENIA test set improved from 75.78% to 79.59% with the parent distribution from heuristic rules, and to 81.25% with the parent distribution collected from the GENIA Treebank annotations.
5 Discussion
Full syntactic parsing of text provides deep linguistic information (e.g., voice, phrase type) useful for many NLP tasks. Parsers developed for general English text have benefited from large treebanks and training corpora (e.g., the Penn Treebank) and have achieved high parsing performance. Clinical documents are known to have special sublanguage features (e.g., domain vocabulary, telegraphic text, special grammar), which often require adaptation of general English NLP tools, and parsers often suffer a decrement in performance when applied to scientific texts[40]. Domain NLP experts have investigated methods to adapt parsers trained on general English to new target domains[12–15, 18, 41, 42], but these approaches have been attempted to only a limited extent on different types of clinical texts. In this work, we investigated the adaptation of a general unlexicalized PCFG parser to a specific type of clinical text, operative reports, using tag statistics collected from operative reports and other sublanguage features of operative notes. We applied the approach to two different domains, clinical operative notes and the GENIA corpus, and the results show that it can improve parsing performance on both. Interestingly, the baseline performance of the parsers was already good on operative notes, although domain adaptation improved parser performance further.
To compare our results with previous work on parser domain adaptation, we applied our approach to the GENIA corpus, which is publicly available. Our evaluations show that the performance of the new parser adapted to the GENIA corpus is close to the state-of-the-art parser performance of 80.7%[43], without training the parser on domain parse trees, which requires a large annotated corpus and is not feasible for parser adaptation in most cases.
To extend the Stanford parser lexicon, we incorporated only the SPECIALIST entries that existed in our corpus. Another option to consider for future enhancements would be to add all tokens in the operative notes corpus, rather than limiting ourselves to the ones contained in the SPECIALIST lexicon. We observed that about 75% of the tokens in our corpus were contained in the SPECIALIST lexicon. Of the tokens not in the SPECIALIST lexicon, a large portion (about 85%) were nouns. Since the Stanford parser treats unknown words as nouns by default, we chose to ignore these tokens. However, we did include adjective and adverb tokens that appear in our corpus but not in the SPECIALIST lexicon because their first letter is capitalized when these words occur at the beginning of a sentence.
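As a rough illustration of the selection policy just described, the sketch below filters candidate lexicon entries: it keeps SPECIALIST entries observed in the corpus, skips unknown nouns (the parser defaults unknown words to nouns anyway), and retains adjective/adverb readings for sentence-initially capitalized forms. All data structures, the POS-set convention, and the `select_new_entries` helper are hypothetical, not the actual implementation.

```python
# Hypothetical sketch of the lexicon-filtering policy described above.

def select_new_entries(corpus_tokens, specialist_pos, parser_lexicon):
    """Return {token: POS set} entries to add to the parser lexicon.

    corpus_tokens  -- tokens observed in the operative-notes corpus
    specialist_pos -- dict mapping SPECIALIST entries to POS tag sets
    parser_lexicon -- tokens the parser already knows
    """
    new_entries = {}
    for tok in corpus_tokens:
        if tok in parser_lexicon:
            continue
        if tok in specialist_pos:
            new_entries[tok] = specialist_pos[tok]
        elif tok[:1].isupper() and tok.lower() in specialist_pos:
            # Sentence-initial capitalization: keep only the
            # adjective/adverb readings, per the policy above.
            pos = specialist_pos[tok.lower()] & {"JJ", "RB"}
            if pos:
                new_entries[tok] = pos
        # Everything else is skipped: the parser treats unknown
        # words as nouns by default.
    return new_entries

specialist = {"anastomosis": {"NN"}, "sharply": {"RB"}, "hemostatic": {"JJ"}}
corpus = ["Sharply", "anastomosis", "retractorX"]
print(select_new_entries(corpus, specialist, parser_lexicon=set()))
# {'Sharply': {'RB'}, 'anastomosis': {'NN'}}
```

The unknown token `retractorX` is dropped in this toy run, matching the decision to let the parser's noun default handle out-of-lexicon words.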
In this work, we used a set of heuristic rules to specify the parent distribution of each entry depending on the POS tag of the token, as shown in section 3.2. As shown in Table 4, when using the real parent phrase tag distribution collected from the GENIA treebank, the adapted parser performance improved by another 1.66%. However, the real parent phrase tag distribution is not always available for other domains such as clinical text. To acquire a better estimate of the parent distribution statistics, features such as the POS tags of the words before and after the word of interest may help to decide the parent phrase tag. More work will be needed in the future to refine the algorithm for estimating the parent distribution. When we tested the new unlexicalized PCFG parser adapted with clinical text on the GENIA treebank, we found, as expected, no performance improvement: since the GENIA corpus is a domain with very different sublanguage features, the statistics of GENIA text differ from those of clinical text.
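For the GENIA variant, the real parent phrase tag distribution can be collected directly from treebank annotations. The following is a sketch of that collection step, assuming Penn-style bracketed trees; the minimal s-expression reader and the helper names are illustrative, not the format handling of any specific toolkit.

```python
# Illustrative collection of parent phrase-tag distributions per POS tag
# from Penn-style bracketed treebank annotations.

from collections import defaultdict, Counter
import re

def parse_sexpr(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    def read(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = read(i)
                children.append(child)
            else:
                children.append(tokens[i])  # terminal word
                i += 1
        return (label, children), i + 1
    return read(0)[0]

def parent_distributions(trees):
    """Map each preterminal POS tag to a Counter of parent phrase tags."""
    dist = defaultdict(Counter)
    def walk(node, parent_label):
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            dist[label][parent_label] += 1  # preterminal under its parent
        else:
            for child in children:
                if isinstance(child, tuple):
                    walk(child, label)
    for tree in trees:
        walk(tree, "TOP")
    return dist

tree = parse_sexpr("(S (NP (DT The) (NN wound)) (VP (VBD was) (VBN closed)))")
dist = parent_distributions([tree])
print(dict(dist["NN"]))  # {'NP': 1}
```

Normalizing each Counter gives the empirical parent distribution for that POS tag, which can then replace the heuristic rules where treebank annotations are available.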
Since the Stanford PCFG parser is unlexicalized, no head word information is incorporated in the associated production rules. Thus, we observed that the adapted Stanford parser was unable to resolve prepositional phrase (PP) attachment ambiguity, which is an issue often observed in general English. In the procedure description text, we observed that the average sentence length (86 characters) is less than that of Wall Street Journal sentences (126 characters). As shown in the example procedure description in the introduction, surgeons tend to describe actions that occurred during a procedure using short, simple sentences. Thus, this ambiguity is potentially less of a problem in operative notes than in general English and other clinical texts.
We also noted that in operative notes, procedures are usually described with short, simple sentences, so the parsing performance of off-the-shelf parsers is better than on some other types of clinical text, such as the corpus presented in Xu's work[31]. Other areas for further study include increasing the parse tree training set, which we purposefully did not do here, with the goal instead of enhancing the parser with corpus statistics and other sublanguage characteristics. Subjectively, the overall parsing performance improvement observed with these enhancements was good despite the small magnitude of increase, since the baseline performance of the unenhanced Stanford parser was fairly high. Furthermore, the magnitude of increase in accuracy found in this study is consistent with that found in other similar studies of parser adaptation[12, 14, 31, 41].
In placing this study in the overall context of clinical NLP, we concentrated only on the clinical text of the procedure description of operative notes. Additional work will be needed to determine whether the approach used here with operative reports generalizes to other types of clinical texts, such as discharge summaries and radiology reports. These approaches will also likely require an understanding, and ideally consideration, of other unique syntactic structures and language features seen in clinical documents, such as the irregular sentence structures observed in Xu's work[31]. We suspect that by including additional grammar rules for irregular structures in the Stanford parser and extending the parser lexicon with a lexicon specific to those texts, the performance of the Stanford parser can be improved on other clinical text in an analogous manner.
6 Conclusion
In this study, we adapted the Stanford parser by extending the parser lexicon with new entries from operative notes. Syntactic statistics for each new entry were collected from POS-tagged clinical text. The entries of the 200 most frequent verbs were adjusted based on their usage in operative notes. The Stanford parser unary and binary grammars were also customized to handle the special syntactic structure of operative notes. Our experiments showed that statistics collected from the GENIA-tagged corpus of operative notes, together with new production rules, best augmented the parsing performance of the Stanford parser. When applying a similar approach to the GENIA Treebank corpus, we observed a similar improvement with the adapted unlexicalized parser by augmenting the lexicon and grammar production rules.
Acknowledgments
The authors would like to thank Fairview Health Services and acknowledge support from the American Surgical Association Foundation (GM), National Library of Medicine grant #R01 LM009623-01 (SP), an Institute for Health Informatics Seed Grant (GM/SP), and University of Minnesota Clinical Translational Science Award 8UL1 TR000114-02.
References
- 1.Kilicoglu H, Bergler S. Syntactic dependency based heuristics for biological event extraction. Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task; Boulder, Colorado. Association for Computational Linguistics; 2009. pp. 119–27. 1572361. [Google Scholar]
- 2.Rinaldi F, Schneider G, Kaljurand K, Hess M, Romacker M. An environment for relation mining over richly annotated corpora: the case of GENIA. BMC Bioinformatics [Evaluation Studies] 2006;7(Suppl 3):S3. doi: 10.1186/1471-2105-7-S3-S3. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Rinaldi F, Schneider G, Kaljurand K, Hess M, Andronis C, Konstandi O, et al. Mining of relations between proteins over biomedical scientific literature using a deep-linguistic approach. Artif Intell Med. 2007 Feb;39(2):127–36. doi: 10.1016/j.artmed.2006.08.005. [DOI] [PubMed] [Google Scholar]
- 4.Song M, Yu H, Han WS. Combining active learning and semi-supervised learning techniques to extract protein interaction sentences. BMC Bioinformatics. 2011;12(Suppl 12):S4. doi: 10.1186/1471-2105-12-S12-S4. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Daraselia N, Yuryev A, Egorov S, Novichkova S, Nikitin A, Mazo I. Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics. 2004 Mar 22;20(5):604–11. doi: 10.1093/bioinformatics/btg452. [DOI] [PubMed] [Google Scholar]
- 6.Bui Q-C, Nuallain B, Boucher C, Sloot P. Extracting causal relations on HIV drug resistance from literature. BMC Bioinformatics. 2010;11(1):101. doi: 10.1186/1471-2105-11-101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Rinaldi F, Schneider G, Clematide S. Relation mining experiments in the pharmacogenomics domain. J Biomed Inform. 2012 Oct;45(5):851–61. doi: 10.1016/j.jbi.2012.04.014. [DOI] [PubMed] [Google Scholar]
- 8.Charniak E. A maximum-entropy-inspired parser. Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference; Seattle, Washington. Association for Computational Linguistics; 2000. pp. 132–9. 974323. [Google Scholar]
- 9.Klein D, Manning C. Accurate unlexicalized parsing. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics; 7–12 July 2003; Sapporo. pp. 423–30. [Google Scholar]
- 10.Marcus MP, Marcinkiewicz MA, Santorini B. Building a large annotated corpus of English: the penn treebank. Comput Linguist. 1993;19(2):313–30. [Google Scholar]
- 11.Friedman C, Kra P, Rzhetsky A. Two biomedical sublanguages: a description based on the theories of Zellig Harris. J Biomed Inform. 2002 Aug;35(4):222–35. doi: 10.1016/s1532-0464(03)00012-1. [DOI] [PubMed] [Google Scholar]
- 12.Rimell L, Clark S. Porting a lexicalized-grammar parser to the biomedical domain. J Biomed Inform. 2009 Oct;42(5):852–65. doi: 10.1016/j.jbi.2008.12.004. [DOI] [PubMed] [Google Scholar]
- 13.Szolovits P. Adding a medical lexicon to an English Parser. AMIA Annu Symp Proc. 2003:639–43. [PMC free article] [PubMed] [Google Scholar]
- 14.Pyysalo S, Salakoski T, Aubin S, Nazarenko A. Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches. BMC Bioinformatics. 2006;7(Suppl 3):S2. doi: 10.1186/1471-2105-7-S3-S2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Lease M, Charniak E, editors. Parsing biomedical literature. the Second International Joint Conference on Natural Language Processing (IJCNLP-05); 2005; Jeju Island, Korea. [Google Scholar]
- 16.Clegg AB, Shepherd AJ, editors. the ACL Workshop on Software. 2005. Evaluating and Integrating Treebank Parsers on a Biomedical Corpus. [Google Scholar]
- 17.Clegg A, Shepherd A. Benchmarking natural-language parsers for biological applications using dependency graphs. BMC Bioinformatics. 2007;8(1):24. doi: 10.1186/1471-2105-8-24. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.McClosky D, Charniak E. Self-Training for Biomedical Parsing. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers; Columbus, Ohio. 2008. pp. 101–4. [Google Scholar]
- 19.Hara T, Miyao Y, Tsujii J. Adapting a Probabilistic Disambiguation Model of an HPSG Parser to a New Domain. In: Dale R, Wong K-F, Su J, Kwong O, editors. Natural Language Processing – IJCNLP 2005. Springer; Berlin Heidelberg: 2005. pp. 199–210. [Google Scholar]
- 20.Daumé H, III. Frustratingly Easy Domain Adaptation. Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics; Prague, Czech Republic. Association for Computational Linguistics; 2007. pp. 256–63. [Google Scholar]
- 21.McClosky D, Charniak E, Johnson M. Effective self-training for parsing. HLT-NAACL ’06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics; pp. 152–9. [Google Scholar]
- 22.Daumé H, III, Kumar A, Saha A. Frustratingly easy semi-supervised domain adaptation. Proceedings of the 2010 Workshop on Domain Adaptation for Natural Language Processing; Uppsala, Sweden. Association for Computational Linguistics; 2010. pp. 53–9. [Google Scholar]
- 23.Huang Y, Lowe HJ, Klein D, Cucina RJ. Improved identification of noun phrases in clinical radiology reports using a high-performance statistical natural language parser augmented with the UMLS specialist lexicon. J Am Med Inform Assoc. 2005;12(3):275–85. doi: 10.1197/jamia.M1695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.McClosky D, Charniak E, Johnson M. Reranking and self-training for parser adaptation. Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics; Sydney, Australia. 2006. pp. 337–44. [Google Scholar]
- 25.SPECIALIST Lexicon. Available from: http://www.nlm.nih.gov/pubs/factsheets/umlslex.html, last access 2013 August 10th.
- 26.Sipser M. Introduction to the Theory of Computation. International Thomson Publishing; 1996. [Google Scholar]
- 27.Chomsky N. Syntactic structures. Mouton de Gruyter; 2002. [Google Scholar]
- 28.Collins M. Head-Driven Statistical Models for Natural Language Parsing. Comput Linguist. 2003;29(4):589–637. [Google Scholar]
- 29.Collins M, Koo T. Discriminative Reranking for Natural Language Parsing. Comput Linguist. 2005;31(1):25–70. [Google Scholar]
- 30.Charniak E, Johnson M, editors. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. ACL’05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics; 2005. [Google Scholar]
- 31.Xu H, AbdelRahman S, Jiang M, Fan J-W, Huang Y, editors. An initial study of full parsing of clinical text using the Stanford Parser. Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on; 2011 12–15 Nov; 2011. [Google Scholar]
- 32.Huang Y, Lowe H, Klein D, Cucina R. Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon. Journal of the American Medical Informatics Association. 2005;12(3):275–85. doi: 10.1197/jamia.M1695. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Bacchiani M, Riley M, Roark B, Sproat R. MAP adaptation of stochastic grammars. Comput Speech Lang. 2006;20(1):41–68. [Google Scholar]
- 34.Genia treebank. Available from: http://www.nactem.ac.uk/genia/genia-corpus/treebank, last access 2013 December 6th.
- 35.Wang Y, Pakhomov S, Burkart N, Ryan J, Melton G. A Study of Actions in Operative Notes. Proceedings of the American Medical Informatics Association Symposium. 2012 [PMC free article] [PubMed] [Google Scholar]
- 36.Wang Y, Pakhomov S, Burkart NE, Ryan JO, Melton GB. A study of actions in operative notes. AMIA Annu Symp Proc. 2012;2012:1431–40. [PMC free article] [PubMed] [Google Scholar]
- 37.Gildea D, Palmer M. The necessity of parsing for predicate argument recognition. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics; Philadelphia, Pennsylvania. Association for Computational Linguistics; 2002. pp. 239–46. 1073124. [Google Scholar]
- 38.the Penn Treebank II Bracketing Guide. Available from: ftp://ftp.cis.upenn.edu/pub/treebank/doc/manual/, last access 2013 December 6th.
- 39.Abney S, Flickenger S, Gdaniec C, Grishman C, Harrison P, Hindle D, et al. Procedure for quantitatively comparing the syntactic coverage of English grammars. In: Black E, editor. Proceedings of the workshop on Speech and Natural Language; Pacific Grove, California. Association for Computational Linguistics; 1991. pp. 306–11. 112467. [Google Scholar]
- 40.Clegg AB, Shepherd AJ. Evaluating and Integrating Treebank Parsers on a Biomedical Corpus. Proceedings of the ACL Workshop on Software. 2005 [Google Scholar]
- 41.Rush AM, Reichart R, Collins M, Globerson A. Improved parsing and POS tagging using inter-sentence consistency constraints. 2012 [Google Scholar]
- 42.Miller JE, Torii M, Vijay-Shanker K, editors. Subdomain adaptation of a POS tagger with a small corpus. BioNLP ’07 Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing; 2007. [Google Scholar]
- 43.McClosky D. Any Domain Parsing: Automatic Domain Adaptation for Natural Language Parsing. Brown University; 2010. [Google Scholar]
