AMIA Annual Symposium Proceedings. 2018 Apr 16;2017:1322–1331.

The Role of Surface, Semantic and Grammatical Features on Simplification of Spanish Medical Texts: A User Study

Partha Mukherjee 1, Gondy Leroy 1, David Kauchak 2, Brianda Armenta Navarrete 1, Damian Y Diaz 1, Sonia Colina 1
PMCID: PMC5977682  PMID: 29854201

Abstract

Simplifying medical texts facilitates readability and comprehension. While most simplification work focuses on English, we investigate whether features important for simplifying English text are similarly helpful for simplifying Spanish text. We conducted a user study on 15 Spanish medical texts using Amazon Mechanical Turk and measured perceived and actual difficulty. Using the median of the difficulty scores, we split the texts into easy and difficult groups and extracted 10 surface, 2 semantic and 4 grammatical features. Using t-tests, we identified those features that significantly distinguish easy text from difficult text in Spanish and compared them with prior work in English. We found that easy Spanish texts use more repeated words and adverbs, fewer negations and more familiar words, similar to English. Also like English, difficult Spanish texts use more nouns and adjectives. However, in contrast to English, easier Spanish texts contained longer sentences and used grammatical structures that were more varied.

Introduction

Providing simplified text to patients and health information consumers facilitates understanding. Using clear and understandable language is especially relevant in the medical domain as better text comprehension helps create a health-literate patient group. Easier texts help patients remember medical information1 and motivate them to read and understand the text2. Well-informed patients can manage their conditions better.

Readers’ education level, language skills and reasoning abilities play a role in their ability to understand, digest and act on information3. Studies that evaluated readability grade levels, presentation and format, and education levels4, 5 show that the education level influences understanding. Demographic differences also impact patient understanding of health education and instructions6; there are significant differences in health literacy with respect to age, gender, academic background and household income, with higher educational attainment and higher household income associated with more adequate health literacy. Advanced age had a negative correlation with adequate health literacy.

Several approaches exist for improving patient health literacy7; however, providing text-based materials is one of the most cost- and time-efficient solutions. Albright et al.8 recommend providing patients and caretakers with written information as it is vital in reinforcing verbal communication between patients and doctors. However, written materials for patient education must be carefully matched to patient reading levels9. Different text features affect text difficulty10 and different approaches have been suggested to increase comprehension. For example, studies advise replacing technical jargon with more common terms or improving the organization of the text11: avoiding long sentences, keeping length to no more than eight to ten words12; organizing the content in question and answer format; and creating a logical flow of information13. Interdisciplinary team collaboration to write and provide information for patients in plain language has also been suggested14. Finally, the use of information technology tools has been advocated to assist with this process and improve the understanding of clinical information and data15, 16.

In this study, we examine how different text features interact with text difficulty in Spanish medical texts. Most previous studies have focused on English texts, even though Spanish is the second most used language in the United States17. We leverage our previous work on English features to motivate our feature selection and to help understand how Spanish text simplification might differ from English simplification. Identifying parallel simplification approaches is important for our final goal, which is the development of a multi-lingual text simplification toolkit. To evaluate our set of potentially interesting features, we conducted a user study to estimate the difficulty of texts. We separated the texts into ‘easy’ and ‘difficult’ based on perceived and actual difficulty scores. We then examined feature occurrence in the texts and report on those that were significantly different between the easy and difficult texts. We also compare the Spanish text features with those found important for English text simplification.

Literature Review

Text Simplification

Text simplification is a procedure that transforms text into a clearer and more understandable version while preserving the meaning of the text18. Simplifying texts can facilitate readability for people with disabilities19–21, low literacy22, 23, non-native backgrounds24, 25 or non-expert knowledge26, 27. Text simplification may also help improve the performance of many natural language processing (NLP) tasks, such as parsing28, summarization29, 30, semantic role labeling31 and machine translation32.

Formulas to measure text readability include Reading Ease Score (RES), SMOG (Simple Measure of Gobbledygook), Flesch-Kincaid score and Gunning Fog index33. However, these formulas rely on simple, surface-level features such as sentence length, percentage of familiar words and word length and, although they have been used in readability research over the past two decades34, there is little evidence that they are related to actual text difficulty.

We have developed a new feature to measure text difficulty, term familiarity2, which can be used to both measure text difficulty and guide text simplification. Term familiarity is measured based on a term/word’s frequency in a large corpus, with low frequency terms being more difficult. A text can be simplified by replacing difficult words with synonyms that are less difficult, as measured by term familiarity35, 36. Similar to term familiarity, we developed grammar familiarity as a new feature, measured by the frequency of the parse tree structure of the sentence in a large corpus. Grammar familiarity has a similar effect to term familiarity: more commonly used grammatical structures (i.e. higher frequency in the corpus) are considered more familiar and have been shown to be easier to understand37. We have also demonstrated that different types of negation are used in easy and difficult texts: morphological negation is used more frequently in difficult text38. Furthermore, we have shown that splitting long noun phrases is beneficial for improving perceived difficulty, but only affects actual difficulty when it can be done while maintaining the natural flow of the sentence39.

Even though Spanish is one of the most common languages in the US, there is little information about how to determine text difficulty and simplify text in Spanish. Even traditional readability formulas do not entirely fit Spanish text40. Some previous Spanish studies have focused on structural simplification such as sentence splitting, lexical substitution of functional multi-word units41 and re-ordering of syntactic units42. For example, the Simplext project focuses on simplifying vocabulary and syntactic structures43 for people with cognitive disabilities. Others focus on different aspects of the text such as the treatment of idioms and collocations11, the way that the syntactic structure looks13, and the font and colors14.

Actual and Perceived Difficulty

Two components play a role in the difficulty of a text: 1) whether the text looks easy for readers (perceived difficulty) and 2) whether the reader can understand it (actual difficulty). Perceived and actual difficulty independently influence intention to read and the comprehension of materials44. Perceived difficulty is often measured on a 5-point Likert scale by asking participants how difficult the text ‘looks’. Actual difficulty requires measuring understanding and has been accomplished with cloze tests, multiple-choice tests and other measures of comprehension45. A text conveys information effectively when people both want to read it and can understand the content. The willingness to read is related to a text’s perceived difficulty, while comprehension is related to its actual difficulty and specific characteristics. Previously we showed that lexical simplification reduces perceived difficulty, while coherence enrichment reduces actual difficulty45.

Evidence for distinguishing between perceived and actual difficulty comes from two research models. The Health Belief Model (HBM) explains and predicts health-related behaviors. It is defined in terms of perception of four constructs: 1) perceived susceptibility, 2) perceived severity, 3) perceived benefits, and 4) perceived barriers. These concepts were proposed to account for people’s ‘readiness to act’46. It was assumed that diverse demographic, socio-psychological and structural variables might, in any given instance, affect an individual’s perception and thus indirectly influence health-related behavior. The second model is the Theory of Planned Behavior (TPB), which is an extension of the Theory of Reasoned Action (TRA)47. TPB is used to explain deliberate and planned behavior. According to this theory, human action is guided by three kinds of considerations: 1) behavioral beliefs, 2) normative beliefs and 3) control beliefs. Human actions are guided by three concepts that particularly impact willingness to change behavior. The general rule is that the greater the perceived control, the stronger a person’s intention to perform the behavior in question.

Both models highlight the importance of perceived text difficulty. Many readers do not read a text if they feel that it is difficult48. Naturally, perceived difficulty does not tell the entire story. It is insufficient for a text to be perceived as easy; actual difficulty plays a major role in the resulting comprehension of information. We want to both reduce the perceived difficulty and increase text comprehension. Adjusting text based on reading levels will encourage people to read more45.

Methods

To evaluate which text features are important for simplifying Spanish text, we conducted a user study to measure both perceived and actual difficulty of the texts. We then split the texts into easy and difficult based on the perceived and actual difficulty scores and evaluated which text features are indicative of difficult and easy texts.

Text and Content Questions

To analyze the characteristics of medical texts, we created a corpus with 15 texts. The texts are paragraphs from abstracts of scientific articles related to seven different diseases that are the most frequent causes of death in the world49: asthma, cancer, diabetes, hepatitis, hypertension, influenza, leukemia. We chose scientific articles since the content is more difficult to understand50. Each disease was discussed in at least one text, with some diseases represented in as many as six.

For each text, we generated four multiple-choice questions. The first three multiple-choice questions tested content knowledge in the article and participants had to select from four answer options. The fourth multiple-choice question is a conclusion question that asked participants to select the sentence that would best finish the text. We removed the actual last sentence of the text and included this as an option, along with three other incorrect options. We used Amazon Mechanical Turk (MTurk) to recruit participants, with the restrictions that they were US residents and had a 95% approval rate on tasks previously performed for other requesters.

After the study, we pre-processed the text and extracted the different features. The word parts-of-speech (POS) were identified using the Freeling parser51 and the full parse trees were generated using the Stanford parser52. Word frequencies were looked up in the LEXESP53 corpus which contains approximately 120,000 words. The frequency of words not found in LEXESP was assumed to be zero.
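The frequency lookup described above, including the zero default for words missing from the corpus, can be illustrated with a minimal sketch. The frequency table below is a hypothetical toy dictionary, not real LEXESP counts, and the function name `mean_term_familiarity` and the choice to average over the supplied terms are assumptions made for illustration only.

```python
from typing import Dict, Iterable


def mean_term_familiarity(terms: Iterable[str], corpus_freq: Dict[str, int]) -> float:
    """Average corpus frequency of the given terms; terms not in the corpus count as 0."""
    lowered = [t.lower() for t in terms]
    if not lowered:
        return 0.0
    return sum(corpus_freq.get(t, 0) for t in lowered) / len(lowered)


# Hypothetical toy counts standing in for a real frequency resource.
toy_freq = {"diabetes": 40, "tratamiento": 120, "paciente": 200}

# "primoinfección" is absent from the toy table, so it contributes a frequency of 0.
content_terms = ["Diabetes", "tratamiento", "primoinfección"]
score = mean_term_familiarity(content_terms, toy_freq)
print(score)
```

A higher average indicates more familiar vocabulary under this approximation; rare technical terms pull the average down because they contribute zero or very low counts.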

Metrics

To measure perceived difficulty, we used a 5-point Likert scale and asked people “How difficult is the text to read?”. The answers were quantified using a 0-4 range, with 0 representing the most difficult and 4 representing the least difficult.

To measure actual difficulty, we calculated the average accuracy of the answers to the multiple-choice questions.
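As a rough sketch of the two metrics, the snippet below averages 0-4 Likert ratings for perceived difficulty and computes the fraction of correct multiple-choice answers for actual difficulty. The data layout (one list of ratings per text, one list of answers per participant) and the function names are assumptions for illustration; the paper does not describe its implementation.

```python
from statistics import mean


def perceived_difficulty(likert_scores):
    """Mean of 0-4 Likert ratings (0 = most difficult, 4 = least difficult)."""
    if any(not 0 <= s <= 4 for s in likert_scores):
        raise ValueError("Likert ratings must lie in the 0-4 range")
    return mean(likert_scores)


def actual_difficulty(answers, key):
    """Fraction of answers that match the answer key, pooled over all participants."""
    correct = sum(a == k for participant in answers for a, k in zip(participant, key))
    return correct / (len(answers) * len(key))


# Hypothetical responses for one text with four questions and three participants.
ratings = [3, 4, 2, 3]
key = ["a", "c", "b", "d"]
answers = [
    ["a", "c", "b", "d"],  # 4 correct
    ["a", "b", "b", "d"],  # 3 correct
    ["b", "c", "b", "a"],  # 2 correct
]
print(perceived_difficulty(ratings), actual_difficulty(answers, key))
```

Note that both scores increase as a text gets easier, which is why higher values are treated as "easy" in the later median split.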

Features

We identified surface, semantic and grammatical features motivated by prior work in English37, 54. The surface and semantic features are averaged over the number of words per text, while the grammatical features are aggregated over the number of sentences per text. “Proportion” below denotes the number of words that have that feature (e.g. words that are nouns) divided by the total number of words in a text.

  • Surface features (10):
    • Number of sentences.
    • Average words per sentence.
    • Function word ratio: proportion of words in the text that are function words (determiners, prepositions and conjunctions).
    • Punctuation ratio: proportion of words in the text that are punctuation characters.
    • Number ratio: proportion of words in the text that are numerals.
    • Negation ratio: proportion of words in the text that are negation words. The negation words are identified by the Freeling parser.
    • Noun/adjective/verb/adverb ratio: proportion of words in the text that are nouns/adjectives/verbs/adverbs in the text, each as a separate feature.
  • Semantic features (2):
    • Repeated words ratio: proportion of the words in the text that are repeated.
    • Term familiarity: We use the frequency of content bearing terms (i.e., nouns, adjectives, verbs, adverbs) in the LEXESP53 database as an approximation of term familiarity.
  • Grammatical features (4): We denote the “grammar structure” of the sentence as the top two levels of the parse tree of a sentence37.
    • Average grammar frequency: Number of different sentence grammar structures in a text, averaged over the texts.
    • Average grammar question frequency: Average number of different grammar structures for the multiple-choice question per text.
    • Average minimum edit distance: Edit distance is the minimum number of edit operations (deletions, substitutions and insertions) to transform one grammar structure into another. We calculate the edit distances of the grammar structures for all adjacent sentence pairs in the text and calculate the average of these values. This measure quantifies how dissimilar (i.e., distant) two grammar strings are55, so lower values indicate more similarity. We consider this measure complementary to the average grammar frequency since it also measures grammatical diversity in a text.
    • Average cosine similarity: We created frequency vectors for each of the constituents in a grammar structure. We then calculated the cosine similarity56 between these vectors for all adjacent sentences and averaged these values per text.
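The grammatical features above can be sketched as follows, assuming each sentence's grammar structure is represented as a list of constituent labels (for example `["S", "NP", "VP", "."]`). That representation, the function names, and the sample structures are illustrative assumptions; the code is not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between two sequences of constituent labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (x != y)))    # substitution (0 if labels match)
        prev = curr
    return prev[-1]


def cosine_similarity(a, b):
    """Cosine similarity between the constituent frequency vectors of two structures."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[k] * cb[k] for k in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def adjacent_average(structures, measure):
    """Average of `measure` over all adjacent sentence pairs in a text."""
    pairs = list(zip(structures, structures[1:]))
    if not pairs:
        return 0.0
    return sum(measure(a, b) for a, b in pairs) / len(pairs)


def distinct_structures(structures):
    """Number of different grammar structures in a text."""
    return len({tuple(s) for s in structures})


# Hypothetical grammar structures for a three-sentence text.
text = [["S", "NP", "VP", "."], ["S", "NP", "VP", "PP", "."], ["S", "NP", "VP", "."]]
print(distinct_structures(text),
      adjacent_average(text, edit_distance),
      adjacent_average(text, cosine_similarity))
```

Lower average edit distance and higher average cosine similarity both indicate that neighboring sentences share similar structures, so the two measures offer complementary views of grammatical diversity.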

Participants

A total of 66 workers participated in our study with 12 participants per text on average. They were paid $1.50 for completing each text and the associated questions. As is customary with MTurk studies, each participant could choose to complete from 1 to all 15 texts. Overall, an average of 3.9 minutes was needed to read the texts and answer the associated questions. The shortest time spent to complete reading the text and answering the questions was 1.4 minutes and the longest was 14 minutes.

Table 1 provides the participants’ self-reported demographic information. Most participants (95%) were 50 years old or younger. The majority were female (52%), white (80%) and, by ethnicity1, Hispanic or Latino (70%). Most had moderate education: 20% had a high school diploma, 26% an associate’s degree and 36% a bachelor’s degree. Thirty-nine percent of the participants spoke Spanish about half the time at home and 35% spoke mostly Spanish.

Table 1. Demographic information (N=66)

| Characteristic | Category | N (%) |
|---|---|---|
| Age | Less than 30 years | 30 (45.4) |
| | 31 – 40 years | 16 (24.2) |
| | 41 – 50 years | 17 (25.7) |
| | 51 – 60 years | 2 (3) |
| | 61 – 70 years | 1 (1.5) |
| | Greater than 70 years | - |
| Gender | Male | 32 (48.5) |
| | Female | 34 (51.5) |
| Race | American Indian/Alaska Native | 5 (7.5) |
| | Asian | 1 (1.5) |
| | Black | 6 (9) |
| | Native Hawaiian/Pacific Islander | 1 (1.5) |
| | White | 53 (80.3) |
| Ethnicity | Hispanic or Latino | 46 (69.7) |
| | Not Hispanic or Latino | 20 (30.3) |
| Education | Less than high school | 1 (1.5) |
| | High school diploma | 13 (19.7) |
| | Associate’s degree | 17 (25.7) |
| | Bachelor’s degree | 24 (36.3) |
| | Master’s degree | 10 (15.1) |
| | Doctorate degree | 1 (1.5) |
| Language spoken at home | Never Spanish | 2 (3) |
| | Rarely Spanish | 8 (12.1) |
| | Half Spanish | 26 (39.3) |
| | Mostly Spanish | 23 (34.8) |
| | Only Spanish | 7 (10.6) |

Results

Perceived and Actual Difficulty

To analyze the text features, the 15 texts were classified into three categories (easy, middle and difficult) using the median perceived and actual difficulty scores, which are calculated from the participant responses to the questions. Texts with scores higher than the median are considered easy; texts with scores lower than the median are considered difficult. Texts with scores nearly the same as the median (i.e., median ± 0.05) were placed in the middle category. To get a clear distinction between ‘easy’ and ‘difficult’ texts, we excluded the middle category, which contained two texts, from our analysis. Table 2 shows the descriptive statistics for the easy and difficult texts: 6 texts were easy and 7 texts were difficult based on perceived difficulty, and 7 texts were easy and 6 texts were difficult based on actual difficulty. A t-test confirmed that the easy and difficult splits were significantly different. For the perceived difficulty split, the easy texts were perceived as significantly less difficult (i.e., they received higher scores on the 0-4 scale), with an average difference of almost a full point (p < 0.001). For the actual difficulty split, the easy texts had significantly better accuracy, with question scores that were 23% (absolute) better (p < 0.001).
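The median-based split with a ± 0.05 middle band can be sketched as below. The function name, the dictionary layout and the sample scores are hypothetical; only the rule itself (above the median is easy, below is difficult, within 0.05 of the median is middle) comes from the text.

```python
from statistics import median


def split_by_median(scores, band=0.05):
    """Label each text 'easy', 'difficult' or 'middle' relative to the median score.

    Higher scores mean easier texts for both metrics (0-4 Likert where 4 is
    least difficult, or question accuracy).
    """
    m = median(scores.values())
    labels = {}
    for text_id, s in scores.items():
        if abs(s - m) <= band:
            labels[text_id] = "middle"      # too close to the median; excluded later
        elif s > m:
            labels[text_id] = "easy"
        else:
            labels[text_id] = "difficult"
    return labels


# Hypothetical perceived-difficulty scores for five texts.
scores = {"t1": 3.2, "t2": 2.1, "t3": 2.7, "t4": 2.72, "t5": 3.0}
labels = split_by_median(scores)
kept = {t: lab for t, lab in labels.items() if lab != "middle"}
print(labels, kept)
```

Dropping the middle band trades a slightly smaller sample for a cleaner contrast between the two groups.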

Table 2. Classification of texts

| | Perceived difficulty: Easy | Perceived difficulty: Difficult | Actual difficulty: Easy | Actual difficulty: Difficult |
|---|---|---|---|---|
| Number of texts | 6 | 7 | 7 | 6 |
| Average perceived difficulty | 3.130 | 2.206 | NA | NA |
| Average actual difficulty | NA | NA | 0.801 | 0.571 |

Examples of sentences from easy and difficult texts are:

  1. Perceived difficulty
    • Easy: “El manejo de la diabetes tipo 1 en la infancia y adolescencia ha evolucionado en los últimos años, como consecuencia de la intensificación del tratamiento insulínico y la aceptación de nuevos objetivos en el tratamiento.”
    • Difficult: “La primoinfección por los virus herpes simple (VHS), varicela-zóster (VVZ), citomegalovirus (CMV), herpesvirus humano 6 y virus de Epstein-Barr (VEB) ocasiona hepatitis generalmente leve y autolimitada en pacientes inmunocompetentes.”
  2. Actual difficulty
    • Easy: “Aproximadamente el 75% de las mujeres con cáncer de mama avanzado presenta metástasis óseas, lo que les ocasiona una importante morbilidad y deterioro en su calidad de vida.”
    • Difficult: “Con la creciente posibilidad de ocurrencia de una pandemia de influenza en las próximas décadas, es necesario que los niveles subnacionales de gobierno (estados, provincias, municipios) estén adecuadamente preparados con planes operativos basados en los lineamientos generales establecidos por organizaciones multinacionales o gobiernos nacionales.”

Feature Comparison

Features that impact perceived difficulty

To evaluate the differences in feature occurrences between easy and difficult texts, we carried out t-tests for all features. Since the easy/difficult document sets were different for perceived and actual difficulty, we conducted a separate analysis for each. The t-test results for the perceived difficulty partitioning are shown in Table 3. We used Bonferroni correction with significance level (α = 0.05) to reduce Type 1 errors. With 17 features, the level required for statistical significance then becomes 0.05/17 = 0.003.
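The Bonferroni adjustment is simple arithmetic: the significance level is divided by the number of tests, here 0.05/17 ≈ 0.003. The sketch below applies that threshold to a handful of p-values reported in Tables 3 and 4; the function names are hypothetical and the dictionary is only a small illustrative subset of the reported features.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_features(p_values, n_tests, alpha=0.05):
    """Names of features whose p-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return [name for name, p in p_values.items() if p < threshold]


threshold = bonferroni_threshold(0.05, 17)

# A few p-values as reported in the results tables (illustrative subset only).
reported = {
    "Functional words ratio (perceived)": 0.924,
    "Percent accuracy (perceived)": 0.024,
    "Average words per sentence (actual)": 0.002,
    "Repeated words ratio (actual)": 0.009,
}
flagged = significant_features(reported, n_tests=17)
print(round(threshold, 3), flagged)
```

Under this correction a p-value such as 0.024 or 0.009, which would pass an uncorrected 0.05 cutoff, is no longer counted as significant.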

Table 3. The t-test results for easy and difficult texts based on perceived difficulty (‘*’ indicates significance with Bonferroni adjustment)

| Feature Type | Feature Name | Easy | Difficult | p-value |
|---|---|---|---|---|
| Surface | Average number of sentences | 2.814 | 4.07 | 0 (*) |
| | Average words per sentence | 34.741 | 25.8 | 0 (*) |
| | Functional words ratio | 0.402 | 0.403 | 0.924 |
| | Punctuations ratio | 0.121 | 0.095 | 0 (*) |
| | Numbers ratio | 0.017 | 0.002 | 0 (*) |
| | Negations ratio | 0 | 0.003 | 0 (*) |
| | Nouns ratio | 0.303 | 0.301 | 0.892 |
| | Adjectives ratio | 0.121 | 0.162 | 0 (*) |
| | Adverbs ratio | 0.029 | 0.015 | 0 (*) |
| | Verbs ratio | 0.101 | 0.101 | 0.986 |
| Semantic | Repeated words ratio | 0.12 | 0.102 | 0 (*) |
| | Term familiarity ratio | 1344.225 | 796.768 | 0 (*) |
| Grammatical | Average grammar frequency | 2.815 | 3.47 | 0 (*) |
| | Average grammar question frequency | 0.350 | 0.621 | 0 (*) |
| | Average minimum edit distance | 3.451 | 3.463 | 0.327 |
| | Average cosine similarity | 0.473 | 0.464 | 0.848 |
| Other | Percent accuracy | 0.652 | 0.571 | 0.024 |

Table 3 shows that there are several significant differences between the perceived easy and difficult texts. For surface features, we show, as has been seen in English text, that easy documents contain more adverbs while difficult documents contain more adjectives11. Interestingly, unlike in English text, we did not see any significant differences between the use of function words, nouns and verbs. We also saw similarities with English text for the semantic and grammatical features, with easy documents containing more frequent words and using fewer unique grammatical structures. We also saw more repeated words, punctuation and numbers in easy texts, while difficult texts had more negations.

Features that impact actual difficulty

Table 4 shows the analysis when splitting the text based on actual difficulty. Many of the same patterns appear, with numbers occurring more in easy texts and adjectives and negations occurring more frequently in difficult texts, though a number of the features are not significant in this split. Difficult texts do use more nouns, though, which has been seen in English as well11. Interestingly, while the easy texts had fewer structures (not significant), the structures were more similar in the difficult texts.

Table 4. The t-test results for easy and difficult texts based on actual difficulty (‘*’ indicates significance with Bonferroni adjustment)

| Feature Type | Feature Name | Easy | Difficult | p-value |
|---|---|---|---|---|
| Surface | Average number of sentences | 3.510 | 3.750 | 0.106 |
| | Average words per sentence | 27.735 | 34.024 | 0.002 (*) |
| | Functional words ratio | 0.392 | 0.396 | 0.523 |
| | Punctuations ratio | 0.114 | 0.107 | 0.222 |
| | Numbers ratio | 0.013 | 0.003 | 0 (*) |
| | Negations ratio | 0.001 | 0.003 | 0 (*) |
| | Nouns ratio | 0.288 | 0.309 | 0 (*) |
| | Adjectives ratio | 0.133 | 0.155 | 0 (*) |
| | Adverbs ratio | 0.024 | 0.022 | 0.39 |
| | Verbs ratio | 0.106 | 0.104 | 0.523 |
| Semantic | Repeated words ratio | 0.114 | 0.104 | 0.009 |
| | Term familiarity ratio | 1127.095 | 1035.268 | 0.207 |
| Grammatical | Average grammar frequency | 3.202 | 3.434 | 0.111 |
| | Average grammar question frequency | 0.524 | 0.482 | 0.119 |
| | Average minimum edit distance | 3.877 | 2.939 | 0 (*) |
| | Average cosine similarity | 0.49 | 0.527 | 0.328 |
| Other | Perceived difficulty | 0.671 | 0.634 | 0.236 |

Discussion

Many features are significantly different between easy and difficult texts, particularly when perceived difficulty is used to make the distinction. The surface features such as the proportions of numerals, negations and adjectives are significant for both the perceived and actual difficulty and follow the same direction in terms of high and low between easy and difficult text, while average words per sentence follows the opposite direction. Relating to perceived difficulty, the easy texts are shorter in length but have longer sentences and difficult texts have more different grammar structures. Surprisingly, for actual difficulty, sentences in the easy texts have more dissimilar grammar structures. This means the sentences of difficult texts have more similar grammar structures. Neither of the similarity measures plays an important role in identifying difficulty of Spanish texts from the perceived difficulty perspective.

Comparing these findings with those for English texts57, we see that from the perceived difficulty perspective easier Spanish texts have more punctuation and longer sentences. Similar to English, difficult Spanish texts contain more negations38. In English more nouns are seen in difficult texts57, but the proportion of adjectives is significantly higher in difficult Spanish texts. Surprisingly, function word use is not significantly different between easy and difficult text. In contrast, in English texts the higher frequency of function words identifies the text as easy58. Furthermore, easy texts in Spanish contain more adverbs, while easier English texts have more verbs57. Regarding semantic features, easy Spanish texts have a higher term familiarity, which is similar to easier English texts. Unlike English, for Spanish texts the grammar frequency does not have any effect on evaluating how easy a sentence is to understand (i.e., actual difficulty)37.

Some of these feature differences between Spanish and English can be partially accounted for by language differences. Though Spanish word order is similar to English (Subject-Verb-Object), Spanish generally places words that are emphasized at the end of the sentence. This may result in more grammatical variation. Long noun groups (modifier-noun-qualifier) are commonly used in English text, but are troublesome in Spanish59. To express the same meaning, Spanish sentences instead use prepositions or other constructions, which may partially account for longer sentences.

Although preliminary, some initial practical observations can be made. Many of the practices already used in English can be continued in Spanish, e.g. use repetition, avoid negations and use familiar words. However, not all practices can be directly followed. For example, it is common in English to shorten sentences (e.g. by splitting long sentences) when simplifying text. In Spanish, we find that longer sentences are easier. Relatedly, extra caution should be used when using readability metrics, since they often incorporate sentence length, with a bias towards shorter sentences.

Looking forward, we plan to incorporate our findings into a tool to assist writers. Directly, the features above can be measured before and after manual simplification to help the writer understand the changes they’ve made and how the changes correlate with these findings. Additionally, suggestions can be made such as identifying more familiar (i.e. higher frequency) synonyms.

Conclusion

We measured perceived and actual difficulty separately in our study and evaluated the effect of different linguistic features that are indicative of easy and difficult Spanish text. Difficult texts have more nouns and adjectives, while easy texts have more adverbs and longer sentences. A Spanish text perceived as less difficult has higher term familiarity. Features identifying difficult texts exhibit somewhat similar behavior for both Spanish and English texts.

Acknowledgement

Research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under Award Number R01LM011975. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

1

https://grants.nih.gov/grants/guide/notice-files/NOT-OD-15-089.html

References

  • 1. Burgers C, Beukeboom CJ, Sparks L, Diepeveen V. How (not) to inform patients about drug use: use and effects of negations in Dutch patient information leaflets. Pharmacoepidemiology and drug safety. 2015;24(2):137–43. doi: 10.1002/pds.3679.
  • 2. Leroy G, Kauchak D. The effect of word familiarity on actual and perceived text difficulty. Journal of the American Medical Informatics Association. 2014;21(e1):e169–e72. doi: 10.1136/amiajnl-2013-002172.
  • 3. Leroy G, Endicott JE, Mouradi O, Kauchak D, Just M, editors. Improving perceived and actual text difficulty for health information consumers using semi-automated methods. AMIA. 2012
  • 4. Mazor KM, Dodd KS, Kunches L. Communicating hospital infection data to the public: a study of consumer responses and preferences. American Journal of Medical Quality. 2009;24(2):108–15. doi: 10.1177/1062860608330827.
  • 5. Yan X, Song D, Li X, editors. Proceedings of the 15th ACM international conference on Information and knowledge management. ACM; 2006. Concept-based document readability in domain specific information retrieval.
  • 6. Chen G-D, Huang C-N, Yang Y-S, Lew-Ting C-Y. Patient perception of understanding health education and instructions has moderating effect on glycemic control. BMC public health. 2014;14(1):683. doi: 10.1186/1471-2458-14-683.
  • 7. Egbert N, Nanna KM. Health literacy: Challenges and strategies. The Online Journal of Issues in Nursing. 2009;14(3)
  • 8. Albright J, de Guzman C, Acebo P, Paiva D, Faulkner M, Swanson J. Readability of patient education materials: implications for clinical practice. Applied Nursing Research. 1996;9(3):139–43. doi: 10.1016/s0897-1897(96)80254-0.
  • 9. Cooley ME, Moriarty H, Berger MS, Selm-Orr D, Coyle B, Short T, editors. Patient literacy and the readability of written cancer educational materials. Oncology Nursing Forum. 1995
  • 10. Kauchak D, Mouradi O, Pentoney C, Leroy G, editors. Text simplification tools: using machine learning to discover features that identify difficult text. 2014 47th Hawaii International Conference on System Sciences; 2014; IEEE.
  • 11. Nagel K, Wizowski L, Duckworth J, Cassano J, Hahn SA, Neal M. Using plain language skills to create an educational brochure about sperm banking for adolescent and young adult males with cancer. Journal of Pediatric Oncology Nursing. 2008;25(4):220–6. doi: 10.1177/1043454208319973.
  • 12. Jackson RH, Davis TC, Bairnsfather LE, George RB, Crouch MA, Gault H. Patient reading ability: an overlooked problem in health care. Southern medical journal. 1991;84(10):1172–5. doi: 10.1097/00007611-199110000-00004.
  • 13. Doak CC, Doak LG, Root JH. Teaching patients with low literacy skills. Lippincott; 1985
  • 14. Wizowski L, Harper T, Hutchings T. Writing health information for patients and families. Hamilton, ON: Hamilton Health Sciences. 2008
  • 15. Johnson SB, Farach FJ, Pelphrey K, Rozenblit L. Data management in clinical research: Synthesizing stakeholder perspectives. Journal of biomedical informatics. 2016;60:286–93. doi: 10.1016/j.jbi.2016.02.014.
  • 16. Morid MA, Fiszman M, Raja K, Jonnalagadda SR, Del Fiol G. Classification of clinically useful sentences in clinical evidence resources. Journal of biomedical informatics. 2016;60:14–22. doi: 10.1016/j.jbi.2016.01.003.
  • 17. González-Barrera A, López MH. Spanish is the most spoken non-English language in US homes, even among non-Hispanics. Pewresearch org. 2013;13
  • 18. Bott S, Saggion H, editors. Proceedings of the Workshop on Monolingual Text-To-Text Generation. Association for Computational Linguistics; 2011. An unsupervised alignment algorithm for text simplification corpus construction.
  • 19. Carroll J, Minnen G, Pearce D, Canning Y, Devlin S, Tait J, editors. Simplifying text for language-impaired readers. Proceedings of EACL. 1999
  • 20. Canning Y, Tait J, Archibald J, Crawley R, editors. International Workshop on Text, Speech and Dialogue. Springer; 2000. Cohesive generation of syntactically simplified newspaper text.
  • 21. Inui K, Fujita A, Takahashi T, Iida R, Iwakura T, editors. Proceedings of the second international workshop on Paraphrasing-Volume 16. Association for Computational Linguistics; 2003. Text simplification for reading assistance: a project note.
  • 22. Watanabe WM, Junior AC, Uzeda VR, Fortes RPdM, Pardo TAS, Aluisio SM, editors. Proceedings of the 27th ACM international conference on Design of communication. ACM; 2009. Facilita: reading assistance for low-literacy readers.
  • 23. De Belder J, Moens M-F, editors. Proceedings of the SIGIR workshop on accessible search systems. ACM; 2010. Text simplification for children.
  • 24. Petersen SE, Ostendorf M, editors. Text simplification for language learners: a corpus analysis. SLaTE. 2007
  • 25. Allen D. A study of the role of relative clauses in the simplification of news texts for learners of English. System. 2009;37(4):585–99.
  • 26. Elhadad N, Sutaria K, editors. Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing. Association for Computational Linguistics; 2007. Mining a lexicon of technical terms and lay equivalents.
  • 27. Siddharthan A, Katsos N, editors. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics; 2010. Reformulating discourse connectives for non-expert readers.
  • 28. Chandrasekar R, Doran C, Srinivas B, editors. Proceedings of the 16th conference on Computational linguistics-Volume 2. Association for Computational Linguistics; 1996. Motivations and methods for text simplification.
  • 29. Siddharthan A, Nenkova A, McKeown K, editors. Proceedings of the 20th international conference on Computational Linguistics. Association for Computational Linguistics; 2004. Syntactic simplification for improving content selection in multi-document summarization.
  • 30.Klebanov BB, Knight K, Marcu D. OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer; 2004. Text simplification for information-seeking applications. [Google Scholar]
  • 31.Vickrey D, Koller D, editors. Sentence Simplification for Semantic Role Labeling. ACL. 2008 [Google Scholar]
  • 32.Chen H-B, Huang H-H, Chen H-H, Tan C-T, editors. A simplification-translation-restoration framework for cross-domain SMT applications. COLING. 2012 [Google Scholar]
  • 33.Barrio-Cantalejo I, Simon-Lorda P, Melguizo M, Escalona I, Marijuan M, Hernando P, editors. Validation of the INFLESZ scale to evaluate readability of texts aimed at the patient. Anales del sistema sanitario de Navarra. 2007 doi: 10.4321/s1137-66272008000300004. [DOI] [PubMed] [Google Scholar]
  • 34.Benjamin RG. Reconstructing readability: Recent developments and recommendations in the analysis of text difficulty. Educational Psychology Review. 2012;24(1):63–88. [Google Scholar]
  • 35.Leroy G, Endicott JE, Kauchak D, Mouradi O, Just M. User evaluation of the effects of a text simplification algorithm using term familiarity on perception, understanding, learning, and information retention. Journal of medical Internet research. 2013;15(7):e144. doi: 10.2196/jmir.2569. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Drndarevic B, Saggion H, editors. Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations. Association for Computational Linguistics; 2012. Towards automatic lexical simplification in Spanish: an empirical study. [Google Scholar]
  • 37.Kauchak D, Leroy G, Hogue A. Measuring Text Difficulty Using parse-tree frequency. Journal of the Association for Information Science and Technology. 2017 doi: 10.1002/asi.23855. Forthcoming. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Mukherjee P, Leroy G, Kauchak D, Rajanarayanan S, Romero Diaz DY, Yuan NP, et al. NegAIT: A new parser for medical text simplification using morphological, sentential and double negation. Journal of Biomedical Informatics. 2017;69:55–62. doi: 10.1016/j.jbi.2017.03.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Leroy G, Kauchak D, Hogue A. Effects on text simplification: Evaluation of splitting up noun phrases. Journal of Health Communication. 2016;21(sup1):18–26. doi: 10.1080/10810730.2015.1131775. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Who’s reading your writing: How difficult is your text. 2007:[1-13 pp.]. Available from: https://medicine.osu.edu/orgs/ahec/chcp/modulecontent/pages/who’sreadmgyourwritmghowdifficultyisyourtext.aspx.
  • 41.Bott S, Saggion H. Spanish text simplification: An exploratory study. Procesamiento del lenguaje natural. 2011;47:87–95. [Google Scholar]
  • 42.Bott S, Saggion H, Mille S, editors. Text simplification tools for Spanish. LREC. 2012 [Google Scholar]
  • 43.Saggion H, Martínez EG, Etayo E, Anula A, Bourg L. Text simplification in simplext. making text more accessible. Procesamiento del lenguaje natural. 2011;47:341–2. [Google Scholar]
  • 44.Leroy G, Helmreich S, Cowie JR. The influence of text characteristics on perceived and actual difficulty of health information. International journal of medical informatics. 2010;79(6):438–49. doi: 10.1016/j.ijmedinf.2010.02.002. [DOI] [PubMed] [Google Scholar]
  • 45.Leroy G, Kauchak D, Mouradi O. A user-study measuring the effects of lexical simplification and coherence enhancement on perceived and actual text difficulty. International journal of medical informatics. 2013;82(8):717–30. doi: 10.1016/j.ijmedinf.2013.03.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Janz NK, Becker MH. The health belief model: A decade later. Health education quarterly. 1984;11(1):1–47. doi: 10.1177/109019818401100101. [DOI] [PubMed] [Google Scholar]
  • 47.Ajzen I, Fishbein M. Understanding attitudes and predicting social behaviour. 1980 [Google Scholar]
  • 48.Beaver K, Luker K. Readability of patient information booklets for women with breast cancer. Patient Education and Counseling. 1997;31(2):95–102. doi: 10.1016/s0738-3991(96)00988-3. [DOI] [PubMed] [Google Scholar]
  • 49.Mathers C, Fat DM, Boerma JT. The global burden of disease: 2004 update. World Health Organization. 2008 [Google Scholar]
  • 50.Pirozzoli AMC. Application of a readability score in informed consent forms for clinical studies. 2013 [Google Scholar]
  • 51.Lloberes M, Castellón I, Padró L, editors. Spanish FreeLing Dependency Grammar. LREC. 2010 [Google Scholar]
  • 52.Stanford CoreNLP - a suite of core NLP tools Stanford, USA: Stanford CoreNLP. Available from: http://stanfordnlp.github.io/CoreNLP/
  • 53.Sebastián-Gallés N, Martí M, Carreiras M, Cuetos F. Barcelona: Universitat de Barcelona; 2000. LEXESP: Una base de datos informatizada del español. [Google Scholar]
  • 54.Mouradi O, Leroy G, Kauchak D, Endicott JE, editors. System Sciences (HICSS), 2013 46th Hawaii International Conference on. IEEE; 2013. Influence of text and participant characteristics on perceived and actual text difficulty. [Google Scholar]
  • 55.Punyakanok V, Roth D, Yih W-t. Mapping dependencies trees: An application to question answering. Proceedings of AI&Math 2004. 2004 [Google Scholar]
  • 56.Canhasi E. Measuring the sentence level similarity. 2013 [Google Scholar]
  • 57.Leroy G, Rodriguez E, Armenta B. University of Arizona, editor. Determining text difficulty by word frequency and parts-of-speech analysis in health information. Latin American Summer Research Program; 2014. [Google Scholar]
  • 58.Leroy G, Endicott JE, editors. International Conference on Asian Digital Libraries. Springer; 2011. Term familiarity to indicate perceived and actual difficulty of text in medical digital libraries. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Coe N, Swan M, Smith B. Cambridge University Press; 1987. Learner English: a teacher’s guide to interference and other problems. [Google Scholar]

Articles from AMIA Annual Symposium Proceedings are provided here courtesy of American Medical Informatics Association
