AMIA Summits on Translational Science Proceedings. 2019 May 6;2019:182–191.

Information Extraction of Behavior Change Intervention Descriptions

Debasis Ganguly, Yufang Hou, Léa A Deleris, Francesca Bonin
PMCID: PMC6568066  PMID: 31258970

Abstract

We describe an information extraction (IE) approach for knowledge base population from behavior change intervention research findings. In this paper, we focus on building a system able to characterize the specific intervention techniques used within behavior change intervention studies. We investigate three configurations of a general information retrieval based framework for information extraction: a) an unsupervised approach that relies on a query specified for each attribute to be extracted and a few parameters for rule-based post-processing; b) a semi-supervised approach, which uses part of the ground-truth annotations as a training set to automatically learn an optimal representation of the queries; and c) a supervised approach that replaces the rule-based post-processing with a text classifier. To train and evaluate our system, we use a ground-truth dataset annotated by behavior science experts, consisting of 226 research papers on smoking cessation.

1. Introduction

Many global threats to human health and well-being can only be solved by people, organizations and governments changing their behavior. For example, obesity, antimicrobial resistance, and hospital-acquired infections can be mitigated respectively by healthier eating, appropriate antibiotic prescribing, and improved hand hygiene. Behavioral change interventions (BCIs) are policies, activities, services or products designed to cause people to act differently from how they would have done otherwise. They involve attempting to change either members of the target population (in terms of their knowledge, skills, beliefs, feelings or habits), or their social or physical environment.

Research findings have the potential to provide invaluable knowledge to help with developing or selecting BCIs, but such evidence needs to be synthesised and interpreted [1]. Since the scientific literature on behaviour change is vast and accumulating at a rapidly accelerating rate, it is difficult to do this manually. An automated Information Extraction (IE) approach that extracts relevant pieces of information from BCI reports is essential for providing navigable interfaces that allow domain experts to easily find relevant information in previously reported studies. The extracted information can also be used as features to develop predictive models capable of suggesting likely BCI outcomes when planning trials. This is the objective of the Human Behaviour Change Project, the broader research endeavor that motivates this specific study.

Extending our initial work, which looked at a variety of attributes to be extracted [2], we report here further investigations into the extraction of BCI descriptions. Specifically, a BCI comprises the set of behavior change techniques (BCTs) that are prescribed to influence the target behaviour; e.g. ‘Goal setting (behavior)’ is a particular BCT which, in the context of smoking cessation, is about setting a quit date for smokers who are ready to quit. Table 1 shows a few additional examples of BCTs.

Table 1.

Examples of BCT attributes from a collection of behavior change research reports.

| BCT type | Examples |
|---|---|
| Goal setting (behavior) | 1) Individualised step count targets |
| | 2) Kept a diary to record the number of cigarettes smoked daily over one week. |
| Social support | 1) Smoking cessation counselling of at most 10 min. |
| | 2) Tailored support was offered for up to 1 month before and after quitting. |

Being able to describe a BCT accurately is a prerequisite for reasoning about which actions are effective at influencing behaviors. In that context, we rely on the BCT taxonomy, which was developed to provide a standardised method of classifying intervention techniques. Specifically, we leverage a taxonomy of 93 BCTs [3, 4], organised in 16 groups.

Given the lack of a readily available annotated corpus and the effort associated with annotation, our objective is to develop a system that does not require a large set of annotations. For this reason, our starting point is an unsupervised IE approach, where data is needed only for evaluation purposes.

Another aspect of the challenge lies in the number of attributes to be extracted, which precludes a hand-crafted approach for each individual item and instead calls for a more holistic approach. Indeed, the full taxonomy contains more than 90 BCTs. It is thus essential to design a system that is based on a general framework and whose parameterization for each BCT is as lightweight as possible. Our exploration of semi-supervised and supervised approaches that learn from a small quantity of annotated data is a further effort to reduce the amount of manual input and increase the scalability of the extraction approach.

The rest of this paper is organized as follows. After a review of the background literature in Section 2, we introduce the general passage retrieval based framework of our system in Section 3, followed by a description of three different methods of BCT prediction in Section 4. In Section 5 we describe our experimental setup, including the dataset and the evaluation metrics. This is followed by Section 6, where we report the results of our experiments. Section 7 concludes the paper with directions for future work.

2. Related Work

Overall, research on information extraction from the medical literature is still in its infancy and faces a number of limitations, such as the lack of common benchmark datasets and of a general consensus on which classes of approaches perform well on such benchmarks. The study in [5] provides an exhaustive survey of IE approaches on BCI literature. The survey reveals considerable variation across studies, for example in:

  • the document collection used for the experiments ranging from randomly chosen articles from clinical journals [6] to manually chosen abstracts suitable to the study from PubMed [7];

  • the datasets used for training and testing, e.g. training on one corpus and testing on a different one; for instance, while the ‘bmjcardio’ corpus was used for training supervised models in both [7] and [5], the latter used a different corpus of 44 articles as a test set instead of the test set of [7];

  • the text processed for the IE experiments, ranging from abstracts [8] to full articles [9];

  • the approaches used in the experiments, ranging from unsupervised rule-based [8] to supervised approaches involving sentence classification [9];

  • the granularity of the approaches, some being a single-step multi-class sentence classification, e.g. with classes corresponding to the PICO ontology [10], whereas others use the classified sentences for further processing by either rule-based [6] or supervised approaches [7].

Most of the methods described above are rule-based. We elected to follow an information retrieval (IR) based approach, in both unsupervised and supervised settings, which presents multiple advantages over rule-based methods. First, it allows term importance (in the form of collection statistics) to be incorporated, so that a match on a rare term counts for more than a match on a frequent one, which is not possible with regular expression based matching. Second, IR model parameters can be tuned, e.g. the λ parameter in language modeling with Jelinek-Mercer smoothing (LM-JM) [11], to yield different rankings of retrieved passages, whereas regular expressions do not support graded (preference) matching. Third, supervised approaches do not require a handcrafted query, so they can be generalized more easily.

Some work has been conducted on supervised approaches for medical information extraction, with several studies concentrating on medical abstracts. In [12], the authors propose a conditional random field (CRF) classification method for labelling medical abstract sentences according to medical categories such as outcome, intervention and population. Hansen et al., 2008 [13] developed a Support Vector Machine algorithm for extracting the number of trial participants from medical abstracts, while in [14] the authors use a machine learning approach to classify abstract sentences according to the PICO scheme.

Other studies have exploited the entire article, for example to extract paper metadata [15]: the authors propose a preliminary CRF-based system for extracting formulaic text (authors' names, emails and institutions) as well as some key study parameters in free-text form from PubMed Central articles. They report promising results for the formulaic text, but only moderate success for the free-text attributes. The study in [16] involves finding key phrases in scientific articles and classifying them into categories. However, these categories (e.g. ‘process’, ‘task’) are much broader (coarse-grained) than the fine-grained BCT categories in our task.

In contrast to previous work, we a) use the entire article for detecting the interventions applied in the study and b) compare an unsupervised and a supervised approach for BCT extraction.

3. A focused passage retrieval framework

As mentioned in Section 1, the set of possible BCTs to consider for a particular type of behaviour change (e.g. smoking cessation) is pre-defined within the taxonomy. The aim of the extractor is thus to find sufficient evidence within the text to support the presence of a particular BCT. A concrete example is detecting whether a study prescribed the BCT ‘self monitoring of behaviour’ for its participants. The passage retrieval based framework also makes it possible to extract the evidence (in the form of a sentence or part thereof) that supports the prediction.

In contrast to the standard text classification task, where the bag-of-words representation of a whole document is used to determine its category, in BCT extraction the pieces of text indicating the presence or absence of a BCT are limited to small passages within a document. The first part of our work thus involves finding the passages within a document that may indicate the presence of a BCT. Our starting point is therefore a generic information retrieval (IR) extraction pipeline, which is then parametrized for each BCT attribute.

In order to restrict our attention to relevant pieces of text for detecting BCT presence, the key idea is to associate a query (manually specified or automatically constructed from a training set of annotations) with the BCT to be extracted. Using this query, the next step is then to obtain a list of top K passages (small fragments of text from a document) ranked in descending order by their similarities with the query. In particular, we use a standard retrieval model, namely the language model with Jelinek-Mercer smoothing (LM-JM) [11], for computing the similarities between passages and queries.
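The LM-JM similarity can be sketched as below. This is a minimal illustration under stated assumptions, not the Lucene implementation used in this work: it uses the common rank-equivalent form of Jelinek-Mercer smoothing, in which only query terms that occur in the passage contribute, and λ weights the passage model (matching the λ/(1−λ) factor of the term selection score in Section 4.2; Lucene's own λ convention may differ). Repeated query terms are counted once, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def lmjm_score(query_terms, passage_terms, coll_freq, coll_size, lam=0.5):
    """Rank-equivalent LM-JM score of one passage for a query (assumed form).

    query_terms: pre-processed query terms (duplicates are ignored here)
    passage_terms: pre-processed terms of the passage
    coll_freq: mapping term -> frequency in the whole collection
    coll_size: total number of term occurrences in the collection
    lam: weight of the passage model, 0 < lam < 1 (hypothetical default)
    """
    tf = Counter(passage_terms)
    plen = len(passage_terms)
    score = 0.0
    for t in set(query_terms):
        if tf[t] == 0 or coll_freq.get(t, 0) == 0:
            continue  # non-matching terms add nothing in the rank-equivalent form
        p_passage = tf[t] / plen           # maximum-likelihood P(t | passage)
        p_coll = coll_freq[t] / coll_size  # background P(t | collection)
        score += math.log(1 + (lam / (1 - lam)) * p_passage / p_coll)
    return score


# Toy collection of three passages: 'smoke' is frequent, 'quit' is rare.
passages = [["quit", "smoke"], ["smoke", "week"], ["smoke", "plan"]]
cf = Counter(t for p in passages for t in p)
size = sum(cf.values())
print(lmjm_score(["quit"], passages[0], cf, size))   # log(4): rare-term match
print(lmjm_score(["smoke"], passages[0], cf, size))  # log(2): frequent-term match
```

The toy example shows the property argued for in Section 2: a match on the rare term contributes more to the score than a match on the frequent one.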

Given a query, the retrievable units are passages of text defined by word windows of a pre-defined length (set to 20 words in our experiments). Retrieving passages restricts the extraction of factoid answers to small, potentially relevant semantic units of text rather than the text of the whole document. For BCT extraction in this paper, we set K (the number of passages to be retrieved) to 1, i.e. we retrieve only the top-ranked passage.
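Passage construction and top-K selection might look like the following sketch. The text does not say whether the 20-word windows overlap, so this assumes consecutive, non-overlapping windows; the scoring function is passed in so that any similarity (such as LM-JM) can be plugged in. The function names and the overlap scorer are hypothetical.

```python
def make_passages(tokens, window=20):
    """Split a document's tokens into consecutive, non-overlapping word windows."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def top_k_passages(query_terms, passages, score_fn, k=1):
    """Return the k best (score, passage) pairs, ranked by descending score."""
    scored = [(score_fn(query_terms, p), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Illustration with a simple term-overlap score standing in for LM-JM.
def overlap(query_terms, passage):
    return len(set(query_terms) & set(passage))


doc = ("participants were asked to set a quit date " * 5).split()
best = top_k_passages(["quit", "date"], make_passages(doc), overlap, k=1)
print(best[0][0])  # 2: both query terms occur in the top passage
```

With K = 1, as in the experiments here, only `best[0]` is passed on to the prediction step.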

From an implementation point of view, the system constructs a transient in-memory index of passages while processing each document in turn. As components of our pipeline, we use Apache Tika (a text extraction tool) to extract text from PDF files and Apache Lucene (an indexing/retrieval tool) to index the extracted text into separate fields. We apply standard pre-processing steps before indexing, i.e. stop-word removal and stemming with the Porter stemmer.
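A per-document transient index can be pictured with the sketch below, which stands in for the Tika/Lucene pipeline described above and uses neither library. It assumes that collection statistics are computed over the current document only, uses a tiny hypothetical stop-word list, and omits Porter stemming for brevity; a real system would plug in a full stop-word list and a stemmer.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list used only for illustration.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "or",
             "was", "were", "is", "with"}


def preprocess(text):
    """Lower-case, tokenize on alphanumeric runs and drop stop words (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_transient_index(text, window=20):
    """Index one document: passages, per-passage term counts and document-level statistics."""
    tokens = preprocess(text)
    passages = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    passage_tf = [Counter(p) for p in passages]
    coll_freq = Counter(tokens)  # collection = this document only (assumption)
    return passages, passage_tf, coll_freq, len(tokens)


text = "The participants set a quit date. The quit date was set in week one."
passages, passage_tf, coll_freq, size = build_transient_index(text, window=4)
print(passages)
# [['participants', 'set', 'quit', 'date'], ['quit', 'date', 'set', 'week'], ['one']]
```

The index is discarded after each document is processed, which keeps memory use bounded by the size of a single article.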

3.1. Normalizing retrieval scores

Note that the retrieval scores from a standard retrieval model such as LM-JM are not normalized to the range [0, 1]. However, to use the retrieval scores for BCT prediction, we need the scores to lie within a fixed range. To this end, we normalize the retrieval score of each retrieved passage r by dividing it by a value Z(r), defined as the LM-JM similarity score of the passage r with itself. Since the query can only match a subset of the terms of r, Z(r) is always at least as high as the LM-JM similarity between r and the query, which ensures that the normalized values lie in the range [0, 1].
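The normalization can be sketched as follows, reusing an assumed rank-equivalent LM-JM form (function names and toy data are hypothetical, and this is not Lucene's scorer). Because each distinct query term that matches r is also a term of r and contributes a positive amount, the query score cannot exceed Z(r), so the ratio stays in [0, 1].

```python
import math
from collections import Counter


def lmjm_score(query_terms, passage_terms, coll_freq, coll_size, lam=0.5):
    """Rank-equivalent LM-JM score; each distinct query term is counted once."""
    tf = Counter(passage_terms)
    score = 0.0
    for t in set(query_terms):
        if tf[t] and coll_freq.get(t, 0):
            ratio = (tf[t] / len(passage_terms)) / (coll_freq[t] / coll_size)
            score += math.log(1 + (lam / (1 - lam)) * ratio)
    return score


def normalized_score(query_terms, passage_terms, coll_freq, coll_size, lam=0.5):
    """Divide the query score by Z(r), the score of the passage against itself."""
    z = lmjm_score(passage_terms, passage_terms, coll_freq, coll_size, lam)
    if z == 0.0:
        return 0.0
    return lmjm_score(query_terms, passage_terms, coll_freq, coll_size, lam) / z


passages = [["quit", "smoke"], ["smoke", "week"], ["smoke", "plan"]]
cf = Counter(t for p in passages for t in p)
print(normalized_score(["quit"], passages[0], cf, 6))           # about 0.667 = log(4) / (log(4) + log(2))
print(normalized_score(["quit", "smoke"], passages[0], cf, 6))  # 1.0: query covers the whole passage
```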

3.2. BCT prediction

We make the final prediction of the absence or presence of a BCT with a similarity threshold function. As shown in Equation 1, the function equals 1 if the similarity between the top retrieved passage and the BCT query, denoted by sim(p, q), is higher than a threshold τ, and 0 otherwise.

$$\phi(p, q, a, \tau) = \begin{cases} 1, & \text{if } \mathrm{sim}(p, q) > \tau, \\ 0, & \text{otherwise,} \end{cases} \qquad \tau \in [0, 1] \tag{1}$$

When using values of K > 1 (i.e. retrieving more than one top ranked passage), one needs to compute an aggregate score (e.g. average or maximum) over the normalized values of K passage scores. The threshold is then applied on the aggregate score.
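With K = 1 the aggregate is simply the normalized score of the single retrieved passage. A minimal sketch of the threshold decision with an aggregate, assuming maximum and mean as the two options mentioned above and τ = 0.2 as the default (the function name is hypothetical):

```python
def predict_bct(normalized_scores, tau=0.2, aggregate="max"):
    """Equation 1 applied to an aggregate of the K normalized passage scores.

    Returns 1 (BCT present) if the aggregate score is strictly above tau, else 0.
    """
    if not normalized_scores:
        return 0  # nothing retrieved: treat the BCT as absent
    if aggregate == "max":
        agg = max(normalized_scores)
    elif aggregate == "mean":
        agg = sum(normalized_scores) / len(normalized_scores)
    else:
        raise ValueError(f"unknown aggregate: {aggregate}")
    return 1 if agg > tau else 0


scores = [0.5, 0.0, 0.0]  # K = 3 normalized passage scores
print(predict_bct(scores, aggregate="max"))   # 1, since 0.5 > 0.2
print(predict_bct(scores, aggregate="mean"))  # 0, since 0.5 / 3 < 0.2
```

The choice of aggregate matters once K > 1: the maximum rewards a single strong passage, while the mean requires consistent evidence across all K passages.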

Overall, for each BCT to be extracted, we need to define two main parameters: i) a query q, and ii) a threshold τ on the normalized similarity. After initial empirical tuning, we set the similarity threshold τ to 0.2.

4. Proposed Methods for BCT Extraction

4.1. Unsupervised Approach

In this approach, the query used to retrieve candidate passages for each BCT is manually configured into the system. These queries are developed by domain experts and combine their constituent terms with Boolean operators. For example, the query for extracting the ‘Goal setting (behavior)’ BCT is ‘(goal OR target) AND (quit OR plan)’, i.e. we are interested in retrieving passages that contain the word ‘goal’ or the word ‘target’ and, at the same time, either ‘quit’ or ‘plan’.
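A Boolean query of this kind can be read as a conjunction of OR-groups. The sketch below checks whether a passage satisfies such a query; it is an illustrative assumption about how the Boolean constraint filters candidate passages (which would then be ranked by LM-JM), not Lucene's query parser. The list-of-sets representation and the names are hypothetical.

```python
def satisfies(query, passage_terms):
    """True if every OR-group of the query has at least one term in the passage.

    query: list of sets; the sets are ANDed, the terms inside each set are ORed.
    """
    present = set(passage_terms)
    return all(any(term in present for term in group) for group in query)


# '(goal OR target) AND (quit OR plan)' from the 'Goal setting (behavior)' example
GOAL_SETTING = [{"goal", "target"}, {"quit", "plan"}]

print(satisfies(GOAL_SETTING, "set a target quit date".split()))  # True
print(satisfies(GOAL_SETTING, "make a plan to quit".split()))     # False: no goal/target
```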

4.2. Semi-Supervised Approach: Learning the Queries

The main limitation of the approach proposed in Section 4.1 is that it requires a query to be specified for each BCT. In a completely unsupervised setting, this query has to be specified manually. In a semi-supervised setting, we instead seek to learn the query representation from a seed set of annotated data. The advantage is that no manual intervention is required to add a new set of target attributes. The drawback is that learned queries may not be as effective as those provided by domain experts, especially in the absence of a sufficiently large annotated corpus.

To learn the query, we use a set of training documents (shown by the green rectangle in Figure 1) from which we compute multiple term frequency vectors. First, we compute the collection frequency CF(t), i.e. the frequency of each term t in the collection. Second, for each BCT attribute i, we tally term frequencies over the positive annotations of that attribute in our training set. We denote this vector by $TF_i$, such that $TF_i(t)$ is the number of times the term t appears in the annotated text associated with attribute i.

Figure 1. Schematic flow of the semi-supervised (top) and supervised (bottom) approaches. The semi-supervised approach exploits information only from the annotated documents in the training set (denoted by the green rectangle). The supervised approach (bottom) also exploits the unannotated documents (denoted by the yellow rectangle) to extract negative examples. Positive and negative examples are then used to train a classifier, which, once trained, predicts the presence or absence of a BCT during the evaluation phase.

As a next step, for each attribute i, we select the most informative terms using a term selection score based on language modeling with Jelinek-Mercer smoothing (LM-JM) [11]:

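$$w(t) = \log\left(1 + \frac{\lambda}{1-\lambda} \times \frac{TF_i(t)}{\sum_{t'} TF_i(t')} \times \frac{\sum_{t'} CF(t')}{CF(t)}\right) \tag{2}$$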

where $TF_i(t)/\sum_{t'} TF_i(t')$ denotes the maximum likelihood probability of sampling $t$ from the set of terms in $TF_i$; $CF(t)$ denotes the collection frequency of $t$ in the set $D_{train}$, indicative of how discriminating the term is (normalized by the collection size $\sum_{t'} CF(t')$); and $\lambda$ is a linear combination parameter that controls the relative importance of the term frequency and collection statistics components.

Intuitively, Equation 2 favours terms that are frequently highlighted and relatively rare in the collection, since such terms are likely to be more informative. Favouring relatively rare terms filters out terms that occur frequently throughout the corpus, as these are less likely to represent a particular BCT exclusively. Terms in $TF_i$ are sorted in decreasing order of $w(t)$, and the top $M$ terms are used to represent the query $q$ of the associated attribute.
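Query learning with Equation 2 might be implemented as in the sketch below. It is written under assumptions: tokens are taken as already pre-processed (the experiments here use Porter stems), CF is computed over the training documents, a guard avoids division by zero for terms absent from them, and ties are broken alphabetically. The function name and toy data are hypothetical.

```python
import math
from collections import Counter


def learn_query(positive_snippets, training_docs, m=2, lam=0.5):
    """Rank the terms of the positive annotations by Equation 2 and keep the top m.

    positive_snippets: token lists of the annotated text for one BCT (gives TF_i)
    training_docs: token lists of the training documents (gives CF)
    """
    tf_i = Counter(t for s in positive_snippets for t in s)
    cf = Counter(t for d in training_docs for t in d)
    tf_total = sum(tf_i.values())
    cf_total = sum(cf.values())

    def weight(t):
        cf_t = max(cf[t], 1)  # guard for terms missing from training_docs (assumption)
        return math.log(1 + (lam / (1 - lam)) * (tf_i[t] / tf_total) * (cf_total / cf_t))

    ranked = sorted(tf_i, key=lambda t: (-weight(t), t))  # ties broken alphabetically
    return ranked[:m]


positives = [["quit", "date"], ["quit", "date", "date", "target"]]
docs = positives + [["quit", "smoke"], ["quit", "target"]]
print(learn_query(positives, docs, m=1))  # ['date']
```

In the toy data, ‘quit’ appears in the annotations but is also common elsewhere in the training documents, so the more discriminative ‘date’ is ranked first.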

As a side note, we selected the LM-JM retrieval model because, in initial experiments with other retrieval models such as BM25 and LM with Dirichlet smoothing, LM-JM yielded the best results.

The parameters associated with the semi-supervised approach are λ and M, which influence the query learning process, and τ, which is used to evaluate whether the retrieved passages indicate the presence of the BCT with enough confidence. A natural default value for λ is 0.5, which means equal importance for term presence and term discriminativeness.

4.3. Supervised Approach: Learning to predict BCTs without similarity threshold

This final variation of our general framework explores whether, despite the modest size of our annotated dataset, a supervised approach can improve performance. To this end, we replace the threshold-based method with a classifier that determines whether the top retrieved passages provide sufficient evidence of the presence of a BCT, as illustrated in Figure 1.

To train a text classifier to predict the presence or absence of a BCT, we need both positive and negative examples for each BCT attribute. As mentioned previously, the annotations only provide positive examples, i.e. evidence of the presence of a BCT. For papers that are not annotated with a BCT, however, there is no explicit evidence supporting the absence of that BCT.

One solution is to construct negative examples for classifier training automatically, as follows. For each training document that is not annotated with a given BCT (the unannotated documents shown by the yellow rectangle in Figure 1), we use the query learned from the positive examples (the green set in Figure 1), as described in Section 4.2, to retrieve a list of top-ranked passages. Each such passage is similar to the query in terms of overlap in informative words. Since these passages were not annotated by a human assessor during the annotation process, they can likely be treated as text indicating the absence of that BCT, i.e. as negative examples for classifier training.
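The negative-example construction can be sketched as follows. The scoring function is passed in (LM-JM in the pipeline described here), the window argument corresponds to the negative-passage length varied between 10 and 200 words in Section 6.3, and the names are hypothetical. As an assumption, the sketch does not drop passages with a zero score, which a real system might want to do.

```python
def build_negatives(query_terms, docs_without_bct, score_fn, window=20, k=1):
    """Collect the top-k passages of each training document not annotated with the BCT.

    query_terms: query learned from the positive annotations (Section 4.2)
    docs_without_bct: token lists of training documents lacking this BCT
    score_fn: similarity such as LM-JM; any (query, passage) -> float works
    window: passage length in words for the negative examples
    """
    negatives = []
    for tokens in docs_without_bct:
        passages = [tokens[i:i + window] for i in range(0, len(tokens), window)]
        passages.sort(key=lambda p: score_fn(query_terms, p), reverse=True)
        negatives.extend(passages[:k])  # assumed evidence of absence, labelled 0
    return negatives


def overlap(query_terms, passage):
    return len(set(query_terms) & set(passage))


doc = ["a"] * 5 + ["quit", "plan"] + ["b"] * 3
print(build_negatives(["quit", "plan"], [doc], overlap, window=5))
# [['quit', 'plan', 'b', 'b', 'b']]
```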

For the supervised approach, the relevant parameters are thus λ and M for query learning. For the classifier, we conduct our experiments with the standard naive Bayes model, a well-established baseline for text classification.
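The naive Bayes variant is not specified above; the following self-contained sketch assumes a multinomial model with add-one (Laplace) smoothing, trained on positive passages (label 1) and automatically built negatives (label 0). The class name and toy data are hypothetical, and a library implementation would normally be used instead.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Minimal multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d)
            denom = sum(counts.values()) + len(self.vocab)  # Laplace smoothing
            self.log_lik[c] = {t: math.log((counts[t] + 1) / denom) for t in self.vocab}
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior; unseen terms are ignored."""
        def log_posterior(c):
            return self.log_prior[c] + sum(
                self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=log_posterior)


positives = [["set", "quit", "date"], ["quit", "date", "target"]]  # label 1
negatives = [["quit", "rate", "week"], ["smoke", "rate", "week"]]  # label 0
clf = TinyMultinomialNB().fit(positives + negatives, [1, 1, 0, 0])
print(clf.predict(["quit", "date"]))  # 1
print(clf.predict(["rate", "week"]))  # 0
```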

5. Experiment Setup

5.1. Data

The ground-truth dataset used for evaluation comprises 226 published papers on BCI studies focused on smoking cessation, as shown in Table 2. The papers were annotated by a team of 4 behaviour science domain experts using the EPPI tool. Each attribute value in each document was annotated (by freely highlighting a region of the text) by two annotators; disagreements between the annotators were then reconciled and a single version of the annotations was accepted as the correct data. Annotations were carried out in four distinct sprints covering 43, 57, 47 and 79 papers, respectively. A total of 10 different BCT types were annotated across the four sprints. The objective of our experiments is to correctly identify, with an automated method, the BCTs annotated in a given paper. The annotation process involved highlighting relevant pieces of text of arbitrary length and assigning them to the corresponding attribute. Additionally, to disambiguate the highlighted text, the annotators were asked to annotate the entire sentence containing the highlighted piece as additional context.

Table 2.

Dataset Characteristics.

| Statistic | Value |
|---|---|
| # Papers | 226 |
| # Papers in training set | 100 |
| # Papers in test set | 126 |
| # BCTs | 10 |
| # BCT annotation instances | 642 |

For Boolean attributes such as BCTs, the highlighted text is evidence in the text that supports a given claim, e.g. the highlighted text ‘set a target quit date’ provides evidence of the presence of the BCT ‘Goal setting (behaviour)’ in the reported intervention (see Table 1 for more examples). Note that when a BCT is absent from a behaviour study article, there is no associated highlighted text. Indeed, most BCI studies describe interventions by listing only the behavior change techniques that are present, on the understanding that a BCT that is not mentioned is absent from the intervention. This technicality matters for machine learning approaches, which require both positive and negative examples.

5.2. Evaluation Metrics

For BCT attributes, our objective is to detect their presence in a given document d. We define y as the Boolean indicating whether there is an annotation h for attribute i in document d, and ŷ as the corresponding predicted value. We evaluate our models using accuracy, a standard evaluation metric for classification.
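Per-BCT accuracy, together with the precision, recall and F-score averaged over BCTs in Table 7, can be computed as in this sketch, where each test document contributes one (y, ŷ) pair for a given BCT. Treating an undefined precision or recall as 0 is an assumption, since the handling of such cases is not stated; the function name is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F-score for one BCT over a set of documents.

    y_true, y_pred: lists of 0/1 values (annotation present / predicted present)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score}


print(evaluate([1, 0, 1, 0, 0], [1, 1, 0, 0, 0]))
# {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5, 'f_score': 0.5}
```

Because most BCTs are absent from most papers, accuracy can stay high even when recall is low, which is why the precision, recall and F-score figures are a useful complement.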

To allow a fair comparison among our three variants (unsupervised, semi-supervised and supervised), we require a train-test split. Instead of splitting the collection arbitrarily, we use the annotation sprints to define the train-test partition. The reason for splitting this way is to investigate how well the system generalizes to a document collection (a test set from a different annotation sprint) that may differ considerably from the training set. In particular, we use the papers from annotation sprints 1 and 2 for training (100 papers in total) and the papers from annotation sprints 3 and 4 for testing (126 papers in total). For all experiments, we define the retrievable units as passages of 20 words.

6. Results

6.1. Unsupervised BCT Extraction

Table 3 reports the accuracy scores for the 10 smoking cessation BCTs. We report two sets of values: one with a fixed threshold τ (here 0.2), denoted ‘A’, and the other with an optimal value of τ (found by grid search over the range 0.01 to 0.5), denoted ‘A*’. The optimal-threshold results represent an idealized scenario that assumes an oracle which correctly chooses the optimal setting (reported in the τ column of the table). These results thus serve as an estimate of the upper bound on the performance achievable with the proposed method of BCT extraction.
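The oracle setting can be sketched as a grid search that picks the τ maximizing accuracy on the evaluation labels themselves, which is why ‘A*’ is an optimistic estimate rather than a deployable result. The grid step is not reported, so 0.01 is an assumption here, as is the function name.

```python
def oracle_threshold(scores, labels, lo=0.01, hi=0.5, step=0.01):
    """Grid-search tau in [lo, hi]; returns the first tau with the best accuracy.

    scores: normalized similarity of the top passage for each test document
    labels: 1 if the BCT is annotated in that document, else 0
    """
    best_tau, best_acc = None, -1.0
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        tau = round(lo + i * step, 10)  # avoid accumulating floating-point drift
        preds = [1 if s > tau else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_tau, best_acc = tau, acc
    return best_tau, best_acc


print(oracle_threshold([0.05, 0.3, 0.45, 0.6], [0, 0, 1, 1]))  # (0.3, 1.0)
```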

Table 3.

BCT extraction effectiveness obtained with the unsupervised method (i.e. manually formulated queries with fixed threshold values). Queries are shown alongside each BCT attribute. ‘A*’ represents an oracle-driven setting where the value of the threshold is chosen optimally. These optimal values are reported in the last column.

| BCT | Query representation | A | A* | Optimal τ |
|---|---|---|---|---|
| 1.1 Goal setting (behavior) | (goal OR target) AND (quit OR plan) | 0.619 | 0.651 | 0.15 |
| 1.2 Problem solving | cope overcome identify problem relapse | 0.595 | 0.603 | 0.15 |
| 1.4 Action planning | action plan intention quit | 0.722 | 0.968 | 0.50 |
| 2.2 Feedback on behaviour | patient feedback | 0.976 | 0.984 | 0.25 |
| 2.3 Self-monitoring of behavior | self monitor diary track | 0.937 | 0.944 | 0.15 |
| 3.1 Social support (unspecified) | quit instruction advice training | 0.254 | 0.778 | 0.01 |
| 5.1 Information about health consequences | hazard smoking | 0.540 | 0.762 | 0.50 |
| 5.3 Information about social and environmental consequences | harmful chemical environmental consequences | 0.738 | 0.770 | 0.25 |
| 11.1 Pharmacological support | nicotine gum patch NRT transdermal | 0.579 | 0.651 | 0.10 |
| 11.2 Reduce negative emotions | negative emotion stress | 0.865 | 0.873 | 0.25 |
| Mean accuracy | | 0.683 | 0.798 | |

Overall accuracy with a fixed τ = 0.2 is satisfactory across the various BCTs, with an average of 0.683. The accuracy increases to 0.798 with the oracle-based optimized τ settings. As expected, the further the optimized τ values in ‘A*’ deviate from the fixed value of 0.2 in ‘A’, the larger the gains in accuracy tend to be. For example, the accuracy of the BCT ‘3.1 Social support (unspecified)’ (optimal τ = 0.01) increases from 0.254 to 0.778. The oracle-based setting shows that an optimal selection of thresholds can lead to promising results, e.g. 3 of the 10 BCTs in ‘A*’ achieve an accuracy above 94%.

6.2. Semi-supervised BCT Extraction

Table 4 reports the accuracy of the BCT extraction task for the semi-supervised approach, where instead of manually specifying the queries we learn them automatically from the annotated data in the training set. We report results for different values of M (the number of query terms). As in the experiments reported in Table 3, the threshold τ was set to 0.2. The second column of Table 4 shows the top 10 query terms (ranked in decreasing order of the term selection score of Equation 2) for each BCT. For M = 1, the query consists of the first term listed in that column, for M = 2 of the first two terms, and so on. Note that the query terms shown are word stems (produced by the Porter stemmer) rather than the original words.

Table 4.

Performance of the semi-supervised approach where queries are learned automatically. We vary the number of terms included in the query.

| BCT | Top 10 query terms (stemmed) | M=1 | M=2 | M=5 | M=10 |
|---|---|---|---|---|---|
| 1.1 | quit date set smoke particip week target dai patient 2 | 0.651 | 0.579 | 0.429 | 0.381 |
| 1.2 | quit smoke relaps strategi cope situat prevent develop plan skill | 0.524 | 0.524 | 0.524 | 0.500 |
| 1.4 | plan smoke quit reduct first cigarett hardest set smoker al | 0.968 | 0.968 | 0.968 | 0.960 |
| 2.2 | smoke feedback includ assess patient quit report base baselin individu | 0.984 | 0.984 | 0.889 | 0.540 |
| 2.3 | smoke monitor diari record daili week alcohol number quit behavior | 0.929 | 0.897 | 0.675 | 0.135 |
| 3.1 | counsel quit support smoke session cessat intervent week social face | 0.802 | 0.786 | 0.778 | 0.778 |
| 5.1 | smoke quit health inform risk relat benefit cessat stop consequ | 0.762 | 0.762 | 0.762 | 0.770 |
| 5.3 | smoke quit cessat benefit effect discuss health inform provid motiv | 0.770 | 0.754 | 0.619 | 0.429 |
| 11.1 | nicotin patch week mg quit receiv smoke particip gum nrt | 0.500 | 0.643 | 0.532 | 0.500 |
| 11.2 | manag mood skill smoke stress quit behavior cope intervent neg | 0.873 | 0.873 | 0.873 | 0.754 |
| Mean accuracy | | 0.776 | 0.777 | 0.705 | 0.575 |

Most importantly, we observe from Table 4 that the automated learning of the query proves to be useful in that it outperforms the ‘A’ (fixed threshold) results of Table 3. This has an important implication from the scalability perspective indicating that BCT extraction does not necessarily need to rely on human specified queries. Instead one can learn an effective representation of the BCT from a set of annotated data (in our experiments less than 50% of the data was used for obtaining the query representations).

Automatically formulated queries with M = 2 terms achieve the best overall accuracy of around 0.777, which is close to the oracle accuracy of the unsupervised system (i.e. the overall ‘A*’ result of Table 3). Short learned queries tend to perform better than longer ones, which suggests that some terms in the manually specified queries may in fact be redundant.

6.3. Supervised BCT Extraction

We now report the performance of the supervised pipeline, where in addition to learning the query from the positive examples in the training set, we also use automatically constructed negative examples to train a text classifier for BCT prediction instead of using a fixed threshold value (see Section 4.3). To retrieve passages from unannotated documents in the training set, we used a variable number of words to define the retrievable units; specifically, we vary the length of a passage of negative evidence between 10 and 200 words.

It can be observed from Table 5 that the supervised approach is less effective than both the unsupervised and the semi-supervised approaches. This suggests that a semi-supervised setting that only uses information from the positive examples works better than a classifier-based approach that uses automatically generated negative examples. One likely reason for the worse performance of the supervised approach is the small amount of training data (less than 50% of the whole collection). However, this quantity of training data reflects a realistic scenario for BCT extraction in systematic reviews, which supports the use of semi-supervised rather than supervised methods for this task.

Table 5.

Effectiveness of the text classifier based supervised approach for BCT prediction with different passage lengths for the negative samples.

| BCT / passage length of negative samples (#words) | 10 | 50 | 100 | 150 | 200 |
|---|---|---|---|---|---|
| 1.1 Goal setting (behavior) | 0.381 | 0.484 | 0.595 | 0.635 | 0.627 |
| 1.2 Problem solving | 0.452 | 0.484 | 0.492 | 0.516 | 0.500 |
| 1.4 Action planning | 0.056 | 0.135 | 0.198 | 0.286 | 0.357 |
| 2.2 Feedback on behaviour | 0.040 | 0.111 | 0.206 | 0.333 | 0.373 |
| 2.3 Self-monitoring of behavior | 0.071 | 0.214 | 0.357 | 0.468 | 0.619 |
| 3.1 Social support (unspecified) | 0.706 | 0.294 | 0.230 | 0.222 | 0.222 |
| 5.1 Information about health consequences | 0.270 | 0.595 | 0.714 | 0.754 | 0.770 |
| 5.3 Information about social and environmental consequences | 0.246 | 0.444 | 0.556 | 0.675 | 0.690 |
| 11.1 Pharmacological support | 0.667 | 0.675 | 0.595 | 0.556 | 0.532 |
| 11.2 Reduce negative emotions | 0.421 | 0.500 | 0.587 | 0.667 | 0.722 |
| Mean accuracy | 0.331 | 0.394 | 0.453 | 0.511 | 0.541 |

Table 6 reports a comparison between the unsupervised approach with a fixed threshold (set to 0.2) and the best-performing settings of the semi-supervised and supervised approaches. In addition to the absolute accuracy values for each BCT, we also report two ratios measuring the relative performance of the semi-supervised and supervised methods with respect to the unsupervised one.

Table 6.

Comparison between the unsupervised method (A_unsup, with τ = 0.2) and the best-performing settings of the semi-supervised (A_semi) and supervised (A_sup) methods.

| BCT | A_unsup | A_semi | A_sup | A_semi/A_unsup | A_sup/A_unsup |
|---|---|---|---|---|---|
| 1.1 Goal setting (behavior) | 0.619 | 0.579 | 0.627 | 0.935 | 1.013 |
| 1.2 Problem solving | 0.595 | 0.524 | 0.500 | 0.881 | 0.840 |
| 1.4 Action planning | 0.722 | 0.968 | 0.357 | 1.341 | 0.494 |
| 2.2 Feedback on behaviour | 0.976 | 0.984 | 0.373 | 1.008 | 0.382 |
| 2.3 Self-monitoring of behavior | 0.937 | 0.897 | 0.619 | 0.957 | 0.661 |
| 3.1 Social support (unspecified) | 0.254 | 0.786 | 0.222 | 3.094 | 0.874 |
| 5.1 Information about health consequences | 0.540 | 0.762 | 0.770 | 1.411 | 1.426 |
| 5.3 Information about social and environmental consequences | 0.738 | 0.754 | 0.690 | 1.022 | 0.935 |
| 11.1 Pharmacological support | 0.579 | 0.643 | 0.532 | 1.111 | 0.919 |
| 11.2 Reduce negative emotions | 0.865 | 0.873 | 0.722 | 1.009 | 0.835 |
| Mean values | 0.683 | 0.777 | 0.541 | 1.138 | 0.792 |

The Asemi/Aunsup ratios show that the semi-supervised method outperforms the unsupervised one on 7 of the 10 BCTs and raises mean accuracy from 0.683 to 0.777. This is encouraging because it suggests that the system can scale to new BCTs in new domains with little supervision (in the form of seed annotations) and without manual specification of the queries. The supervised method improves over the unsupervised one for only two BCTs, ‘1.1’ and ‘5.1’. Overall it performs considerably worse, with a mean accuracy of 0.541 versus 0.683. Table 7 reports the average Precision, Recall and F-score of the three systems. Here too the semi-supervised approach performs best, with the highest recall (0.35) and F-score (0.33). Its precision (0.32) is on par with that of the unsupervised approach (0.33).

Table 7.

Precision, Recall and F-score of the three systems, averaged over the 10 BCTs.

System Avg. Precision Avg. Recall Avg. F-Score
Unsupervised 0.33 0.18 0.23
Semisupervised 0.32 0.35 0.33
Supervised 0.21 0.32 0.25
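Each F-score in Table 7 is consistent with the harmonic mean of the averaged precision and recall, F = 2PR/(P+R). The paper does not state whether F was averaged per BCT or computed from the averaged P and R, so the Python check below only tests that consistency using the table's values.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (avg. precision, avg. recall, reported avg. F-score) copied from Table 7
TABLE7 = {
    "Unsupervised": (0.33, 0.18, 0.23),
    "Semisupervised": (0.32, 0.35, 0.33),
    "Supervised": (0.21, 0.32, 0.25),
}

for system, (p, r, reported) in TABLE7.items():
    computed = f_score(p, r)
    print(f"{system:15s} computed F = {computed:.3f}  reported F = {reported:.2f}")
```

All three computed values agree with the reported ones to two decimals. This makes explicit that the semi-supervised system's advantage in F-score comes from recall (0.35 vs 0.18) rather than precision.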

7. Conclusions and Future Work

This paper presents initial results from the development of an information extraction system that extracts relevant features (in the form of behaviour change techniques, or BCTs) from behaviour change studies on smoking cessation. Our BCT extraction system first uses a passage retrieval framework to identify pieces of text within a large document (a behaviour change intervention report) that are candidates for indicating the presence or absence of a BCT. We proposed three methodologies: a) an unsupervised approach, where queries are handcrafted and the presence or absence of a BCT is determined by a threshold on the similarity between the retrieved passage and the query; b) a semi-supervised approach, where query terms are automatically constructed from a seed set of annotated examples and the prediction is again based on the similarity threshold; and c) a supervised approach, where queries are constructed automatically as in the semi-supervised approach, but the prediction is made by a text classifier trained on the positive annotations and on negative examples automatically constructed from the training set.
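The threshold-based decision shared by the unsupervised and semi-supervised approaches can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' system:

- The query is built from the k most frequent terms in the seed passages. This is a hypothetical stand-in for the paper's automatically constructed queries.
- Passages are scored with term-frequency cosine similarity instead of the actual system's retrieval scoring.
- τ = 0.2 mirrors the fixed threshold used in Table 6.
- The stopword list and all function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Iterable, List

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for",
             "with", "was", "were", "is", "are"}


def terms(text: str) -> List[str]:
    """Lower-cased word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_query(seed_passages: Iterable[str], k: int = 5) -> Counter:
    """Hypothetical semi-supervised step: keep the k most frequent terms
    across annotated seed passages as a weighted query."""
    counts = Counter()
    for passage in seed_passages:
        counts.update(terms(passage))
    return Counter(dict(counts.most_common(k)))


def cosine(query: Counter, passage: str) -> float:
    """Cosine similarity between the query and a passage's term-frequency vector."""
    doc = Counter(terms(passage))
    dot = sum(weight * doc[t] for t, weight in query.items())
    norm = (math.sqrt(sum(w * w for w in query.values()))
            * math.sqrt(sum(c * c for c in doc.values())))
    return dot / norm if norm else 0.0


def bct_present(query: Counter, passages: Iterable[str], tau: float = 0.2) -> bool:
    """Predict that the BCT is present if the best-matching passage scores >= tau."""
    return max((cosine(query, p) for p in passages), default=0.0) >= tau
```

For the unsupervised variant, a handcrafted `Counter` of query terms can be passed to `bct_present` in place of the output of `build_query`. In the supervised variant, the threshold test would instead be replaced by a trained text classifier applied to the retrieved passages.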

Experiments conducted on a set of 226 documents show that the semi-supervised approach outperforms both the unsupervised and the supervised approaches, in mean accuracy as well as in average F-score. This result suggests that the BCT extraction approach can scale to new attributes given only a small set of seed annotations. It also suggests that a supervised approach may not perform well when trained on automatically constructed negative examples, and may instead require explicit annotations of the absence of a BCT.

There are three avenues along which we intend to extend this work. First, we intend to evaluate BCT extraction over the whole taxonomy of 93 BCTs, a task that requires annotation effort from domain experts. Second, we intend to investigate more sophisticated supervised approaches based on deep learning (e.g. jointly learning to rank and extract from the passages). Third, we intend to use the BCTs extracted from selected passages to construct a knowledge base that will enable users to retrieve similar pieces of information (i.e. passages indicating the presence of similar BCTs) from other documents in the collection. This could help researchers conduct effective systematic reviews by supporting BCT-focused navigation through a collection of behaviour change intervention studies.

Acknowledgements

This work was supported by a Wellcome Trust collaborative award as a part of the Human Behaviour-Change Project (HBCP): Building the science of behaviour change for complex intervention development (grant no. 201524/Z/16/Z). The authors would like to thank the Behaviour Science Team at UCL for the annotations and the interesting discussions. Thanks to Dr. Ailbhe Finnerty for proofreading.

Footnotes

3 tika.apache.org

4 lucene.apache.org/core

5 eppi.ioe.ac.uk/CMS/Default.aspx?alias=eppi.ioe.ac.uk/cms/er4&

6 Accuracy is reported on the positive class, as we do not have explicit annotations of the negative class.


References

  • [1] Michie Susan, Thomas James, Johnston Marie, Aonghusa Pol Mac, Shawe-Taylor John, Kelly Michael P, Deleris Léa A, Finnerty Ailbhe N, Marques Marta M, Norris Emma, et al. The human behaviour-change project: harnessing the power of artificial intelligence and machine learning for evidence synthesis and interpretation. Implementation Science. 2017;12(1):121. doi: 10.1186/s13012-017-0641-5.
  • [2] Ganguly Debasis, Deleris Léa A, Mac P Aonghusa, Wright Alison J, Finnerty Ailbhe N, Norris Emma, Marques Marta M, Michie Susan. Unsupervised information extraction from behaviour change literature. Studies in Health Technology and Informatics. 2018;247:680–684.
  • [3] Michie Susan, Richardson Michelle, Johnston Marie, Abraham Charles, Francis Jill, Hardeman Wendy, Eccles Martin P, Cane James, Wood Caroline E. The behavior change technique taxonomy (v1) of 93 hierarchically clustered techniques: building an international consensus for the reporting of behavior change interventions. Annals of Behavioral Medicine. 2013;46(1):81–95. doi: 10.1007/s12160-013-9486-6.
  • [4] Wood Caroline E, Richardson Michelle, Johnston Marie, Abraham Charles, Francis Jill, Hardeman Wendy, Michie Susan. Applying the behaviour change technique (BCT) taxonomy v1: a study of coder training. Translational Behavioral Medicine. 2015 Jun;5(2):134–148. doi: 10.1007/s13142-014-0290-z.
  • [5] Jasch Dennis. Information Extraction from Clinical Trials. Master's thesis, Australian Institute of Health Innovation (AIHI), Sydney, Australia; 2016.
  • [6] Kiritchenko Svetlana, de Bruijn Berry, Carini Simona, Martin Joel, Sim Ida. ExaCT: automatic extraction of clinical trial characteristics from journal publications. BMC Medical Informatics and Decision Making. 2010;10(56). doi: 10.1186/1472-6947-10-56.
  • [7] Summerscales Rodney L. Automatic Summarization of Clinical Abstracts for Evidence-Based Medicine. PhD thesis, Illinois Institute of Technology, Chicago, Illinois; 2013.
  • [8] Hara Kazuo, Matsumoto Yuji. Extracting clinical trial design information from MEDLINE abstracts. New Generation Computing. 2007 May;25(3):263–275.
  • [9] Rosemblat Graciela, Graham Laurel, Tse Tony. Extractive summarization in clinical trials protocol summaries: A case study. Proc. of IICAI'12. 2007; pp. 1571–1586.
  • [10] Verbeke Mathias, Van Asch Vincent, Morante Roser, Frasconi Paolo, Daelemans Walter, De Raedt Luc. A statistical relational learning approach to identifying evidence based medicine categories. Proc. of EMNLP'12. 2012; pp. 579–589.
  • [11] Hiemstra Djoerd. Using Language Models for Information Retrieval. PhD thesis, CTIT, Enschede; 2000.
  • [12] Kim Su Nam, Martinez David, Cavedon Lawrence, Yencken Lars. Automatic classification of sentences to support evidence based medicine. BMC Bioinformatics. 2011 Mar;12(2):S5. doi: 10.1186/1471-2105-12-S2-S5.
  • [13] Hansen Marie J, Rasmussen Nana, Chung Grace. A method of extracting the number of trial participants from abstracts describing randomized controlled trials. Journal of Telemedicine and Telecare. 2008;14(7):354–358. doi: 10.1258/jtt.2008.007007. PMID: 18852316.
  • [14] Hassanzadeh Hamed, Groza Tudor, Hunter Jane. Identifying scientific artefacts in biomedical literature: The evidence based medicine use case. Journal of Biomedical Informatics. 2014;49:159–170. doi: 10.1016/j.jbi.2014.02.006.
  • [15] Lin Sein, Ng Jun-Ping, Pradhan Shreyasee, Shah Jatin, Pietrobon Ricardo, Kan Min-Yen. Extracting formulaic and free text clinical research articles metadata using conditional random fields. Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents, Louhi '10. Stroudsburg, PA, USA: Association for Computational Linguistics; 2010; pp. 90–95.
  • [16] Luan Yi, Ostendorf Mari, Hajishirzi Hannaneh. Scientific information extraction with semi-supervised neural tagging. CoRR. 2017;abs/1708.06075.

Articles from AMIA Summits on Translational Science Proceedings are provided here courtesy of American Medical Informatics Association
