Abstract
This study explores the extent to which Grammarly can be a reliable assessment tool for academic English writing. Ten articles published in high-status scholarly Q1 journals and written by specialist English native speakers were used to evaluate the accuracy of Grammarly's flagged issues. The results showed that Grammarly tends to over-flag issues, resulting in many false positives, and that it does not take optional usage in English into consideration. The study concluded that although Grammarly can identify many ambiguous instances of language use that writers would do well to review and consider for revision, it does not seem to be a reliable tool for assessing academic written English.
Keywords: Academic writing, Assessment, False positives, Grammarly
1. Introduction
Automated Writing Evaluation (AWE) tools are widely used nowadays by many students and teachers [1]. Grammarly is one of these tools; it is used by 30 million people and 30,000 teams and is claimed to be the world's most accurate grammar checker [2]. In fact, Grammarly claims to provide correct and reliable instant feedback on different aspects of writing, including correctness and clarity. Still, little research has been done to examine the extent to which Grammarly is reliable in assessing academic writing. Earlier studies focused on non-native speakers' written work, where mistakes are expected to occur, e.g., Ref. [3]. Other studies compared the performance of Grammarly with other automated engines such as Word and Ginger, e.g., Refs. [4,5]. Still other studies compared the feedback given by Grammarly with that given by human raters, e.g., Refs. [6,7,8]. However, no study has examined how Grammarly deals with well-written English produced by competent native speakers. In this study, we attempt to determine the reliability of Grammarly's feedback by examining whether the flagged errors are actual errors or not (see Section 2).
Grammarly checks the language of written work and lists all issues in the order of occurrence within it, and it scores each written work on a 100-point scale after comparing it with other documents written with the same goals and in the same domain [9]. Scores are given depending on the number of words and the types and numbers of detected issues [9].
Grammarly comes in two versions: a free one and a premium one. The difference between them lies in the number and types of issues they detect. The premium version offers over 400 types of checks and features, while the free version checks 150.
Upon submitting a file to Grammarly, users can select their goals, the intended audience of their writing, and other features that fit their documents, such as the level of formality, the academic domain, and the language preference (e.g., American/British) [9].
The suggestions/issues given by Grammarly fall into four types [3]. First, correctness issues relate to grammar, spelling, and punctuation. Second, clarity and conciseness issues (e.g., passive voice use and unclear/wordy sentences) relate to the degree to which the written material is smooth and easy to understand. Third, engagement issues offer synonyms for bland and overused words. Finally, delivery issues relate to formality, politeness, usage conventions, and friendliness [9]. Examples illustrating how Grammarly works are available on Grammarly's website (Grammarly.com).
Assessment of academic writing is not restricted to surface-level issues such as grammar, punctuation, and spelling; it involves other areas such as diction, coherence, and ideas/content [9]. Grammarly claims that it can help not only with surface-level issues but also with clarity, engagement, and delivery issues (but not content issues, which are still beyond the capabilities of any tool) [11].
Differentiating between errors and other issues relating to style or optional use is in order here. In this study, errors refer to “morphological, syntactic, and lexical forms that deviate from rules of the target language, violating the expectations of literate adult native speakers” [10]. This means that issues relating to style, such as the use of a serial comma (see Section 4.2), are not errors, and they cannot be considered as such in assessing a piece of writing. If Grammarly flags these cases as language issues/errors, this is considered a case of over-flagging, which refers to flagged errors that are not actual errors [11]. That said, we believe that teachers or AWE tools should not penalize a writer for using or not using an optional case. Giving a score (by a teacher or an AWE tool) that counts such cases is problematic (see Section 4.1). Finding out how reliable Grammarly can be as an assessment tool is crucial, as it has several repercussions in the field of English as a second/foreign language [3,6,12]. A learner of English using Grammarly to check the accuracy of their writing would probably assume that the feedback given by Grammarly is accurate and would accept all suggestions, which could have negative consequences if the feedback were not totally correct [10]. A teacher of English, under the pressure of time constraints and especially with the wide use of online teaching, is very likely to use Grammarly to check their students’ writing assignments. If Grammarly were not accurate, it would confuse, and even harm, learners and teachers by giving them wrong feedback [9].
Moreover, Grammarly might be used to assess the writing of non-native English researchers and teachers [7,8,13,14]. It could be the case that journal reviewers, especially some non-native speakers of English, might be tempted to use Grammarly to assess the English language of articles (personal experience).1 If Grammarly incorrectly flags many issues and therefore gives an article a misleadingly low score, reviewers might reject the paper assuming that Grammarly's feedback was accurate. Finally, students, or even colleagues, might use Grammarly to assess the accuracy of the language of English teachers. We believe that any score that is not close to 100 % might be interpreted by students or colleagues as poor competence on the part of teachers. To recapitulate, if Grammarly were not a reliable assessment tool with accurate feedback, it would cause serious problems for all stakeholders [6,7,13].
The current study thus aims at exploring the accuracy of Grammarly's feedback by analyzing Grammarly reports for ten articles written by specialist English native speakers and published in high-status scholarly linguistics journals. Given that these researchers use academic English professionally, and that the articles were peer-reviewed and proofread by the journals, we expect the articles to receive a very high score on Grammarly and to be without any noticeable writing issues. The two main questions that the current study attempts to answer are: 1) Is the overall score given to a piece of writing by Grammarly reliable? and 2) Are all flagged issues actual errors? Answering these two questions will show to what extent Grammarly is a reliable assessment tool for academic written English. More specifically, the main research objectives of this study are:
1. Find the overall score given by Grammarly to articles written by specialist English native speakers.
2. Determine the types of writing issues Grammarly flags and the extent to which these flagged issues are actual errors.
By achieving these objectives, our study aims to contribute to identifying the extent to which Grammarly can be a reliable assessment tool and, more importantly, to identifying the areas where Grammarly needs to improve its performance.
2. Literature review
There seems to be a consensus that using Grammarly is beneficial to English language users in general, although it has some weaknesses. Grammarly can catch typos, grammatical errors, and punctuation mistakes that human eyes might miss, especially in long documents; in addition, Grammarly can suggest improvements to sentence structure, word choice, and clarity, leading to easier-to-understand writing [6,12,15,16,17]. For example, Ref. [3], exploring the reactions of three Indonesian postgraduate students, reported that the three students believed that Grammarly had several strengths, including valuable feedback with explanations and examples, high speed, and ease of access, but that it sometimes provided misleading feedback.
It has also been found that Grammarly could help even low-level English learners to improve their writing, although it was not beneficial in avoiding errors related to word forms and word usage [15]. Grammarly may suggest unnecessary changes or miss genuine errors, requiring user judgment. Also, Grammarly primarily focuses on mechanics and may not address higher-level writing concerns such as argument flow or logical fallacies [18].
Comparing the feedback given by Grammarly with the feedback given by writing centers and academic writing consultants, researchers argued that the feedback given by human raters was more useful and effective than that of Grammarly [7,19]. For instance, Ref. [19] compared Grammarly's feedback with that of 10 consultants on three essays (about 700 words each) written by first-year students in a writing center. She argued that the feedback given by human raters was better than Grammarly's feedback because of the number of repeated comments and the complex and inaccurate terms given by Grammarly. She added that while Grammarly offers valuable feedback, human raters still outperform automated tools in providing subtle feedback and understanding the context of the text. In addition, human raters can provide more tailored suggestions, taking into account the specific requirements of the writing task and the intended audience.
Likewise, Ref. [8] compared the accuracy and type of Grammarly's feedback with that given by human raters using 56 essays written by first-year students majoring in English at an Armenian university. The results showed that Grammarly's feedback was mostly accurate with few inconsistencies, while some errors detected by human raters went undetected by Grammarly. The study recommended using the feedback given by both Grammarly and human raters.
Studies that compared Grammarly with other engines such as Word and Ginger concluded that Grammarly tended to outperform them [e.g., 18, 19]. For example, Ref. [7] explored academic advisors' perceptions of Grammarly's feedback compared with the feedback given by Word. They reported that Grammarly's feedback was perceived to be more effective than that of Word; however, Grammarly was not without problems, and the authors therefore recommended that Grammarly be used together with academic advisors' feedback. They added that Grammarly offers a more advanced error-checking system than Word's basic spellchecker; Grammarly goes beyond basic grammar, suggesting improvements for sentence clarity and conciseness.
Likewise, Ref. [4] examined the efficiency of five free grammar checkers, namely Grammarly, ProwritingAid, Ginger, After the Deadline, and Language Tool. To check their efficiency, the study used a collection of 500 sentences containing grammatical errors, and the results showed that Grammarly did better than the other checkers in detecting errors. The researchers concluded that Grammarly was the best grammar checker, a finding that agrees with the results of Ref. [12], which reported that Grammarly was useful in detecting and reducing grammatical errors.
Finally, Ref. [5], comparing Grammarly with Microsoft Word's spelling and grammar checker, reported that Grammarly was generally better than Microsoft Word, with a precision rate of 0.88 (i.e., 88 % of the errors it identified were, in fact, errors), while its correction rate was slightly lower (0.83); that is, 17 % of the actual errors were not flagged.
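To make these two rates concrete, the short Python sketch below computes a precision rate and a recall-like correction rate from counts of correctly flagged errors, wrongly flagged items, and missed errors. The counts are hypothetical and chosen only to reproduce rates of roughly the magnitude reported in Ref. [5]; they are not data from that study.

```python
# Hypothetical counts, for illustration only (not data from Ref. [5]).
true_positives = 88   # flagged items that were real errors
false_positives = 12  # flagged items that were not errors
false_negatives = 18  # real errors that were never flagged

# Precision: the share of flagged items that were real errors.
precision = true_positives / (true_positives + false_positives)

# Correction rate, read here as a recall-like measure:
# the share of real errors that the checker actually flagged.
correction_rate = true_positives / (true_positives + false_negatives)

print(f"precision = {precision:.2f}")              # 0.88
print(f"correction rate = {correction_rate:.2f}")  # 0.83
```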
To sum up, Grammarly tends to be a useful engine that helps improve its users’ English, albeit with some weaknesses. However, no study has been wholly devoted to examining the reliability of Grammarly as an assessment tool.
Bearing in mind that AWE tools are in constant development, this study is an attempt to establish the extent to which Grammarly is a reliable assessment tool for academic writing. Unlike previous studies, it uses authentic academic English written by specialist native speakers of English and published in top-tier refereed academic journals (cf. Section 3).
3. Methodology
Ten articles were used in this study.2 The criteria for selecting these articles were as follows. First, they had to be written by English native speaker researchers; second, they had to be published in Q1 Scopus-indexed, well-established, high-status journals in the field of linguistics. We used Scopus as a database and selected Q1 journals in the field of linguistics. We then selected papers written by presumed native speakers (judging by name and affiliation). This was further verified by checking their profiles and CVs on the Internet. When it was not evident that the writer was a native speaker, s/he was not selected. Note that selecting native speakers rather than non-native speakers does not entail that all native speakers are better academic writers than non-native speakers. Rather, we assume that specialist native speakers are more likely to be competent academic writers due to their long exposure to the language [20]. More than 30 papers were first checked, all of them published between 2004 and 2021. Of these, we randomly selected 10 papers: five written by American native speakers and five by British ones.
We selected articles published in Q1 Scopus-indexed, well-established, high-status journals in the field of linguistics to ensure that the articles were of the highest quality. The 10 articles appeared in the following journals: Applied Linguistics, International Journal of Bilingualism, Journal of Writing Research, Language Learning and Technology, Language Teaching, Natural Language and Linguistic Theory, Sage journals, Studies in Second Language Acquisition, and The Modern Language Journal. Note that we limited our analysis to the field of linguistics for three reasons: first, the field of linguistics is concerned with language, and therefore researchers writing in this field are expected to pay particular attention to language issues. Second, the paper is concerned with academic writing, which is closely related to the field of linguistics. The final reason relates to space limitations: covering other disciplines would render the study rather long, and we therefore leave this for future research.
The PDF articles were converted into Word format using online converting tools, as Grammarly does not accept PDF files. This was followed by careful cross-checking by the researchers for any conversion issues. The researchers uploaded the articles (excluding tables, footnotes, endnotes, and references) one by one to Grammarly Premium to receive a complete analysis covering correctness, clarity, delivery, and engagement. Recall that correctness covers grammatical, spelling, and punctuation mistakes. Clarity relates to the use of wordy and unclear sentences, including the use of the passive voice. Delivery issues refer to the appropriate use of English with respect to the level of formality, politeness, usage conventions, and friendliness. Engagement issues relate to the use of synonyms and the avoidance of bland and overused words.
The following goals were selected: knowledgeable audience, formal language, academic domain, and American or British English (according to the writer's choice). The authors selected formal language and the academic domain because this study aimed to assess Grammarly on academic writing within the field of linguistics, where formal, academic language is used. Note that the references were not included as they contain proper names and might be written in different formatting styles. A few footnotes in the articles were also excluded for technical reasons.
All the issues in the reports were checked one by one to verify that no errors were caused by the conversion of the PDF files. A dataset of all issues/suggestions given by Grammarly for each writer was compiled and classified. The 10 writers are referred to in this study as W1, W2 ⋯ W10. No noticeable differences were found between the American and British writers; therefore, all issues were combined and analyzed for all writers.
No sophisticated statistical tools were needed, as the paper's focus was on frequencies and percentages and no comparisons between writers were required. We counted the flagged issues and categorized them using an Excel sheet. The researchers then checked the flagged issues to find out whether they were correctly identified or not. False positives are used in this study to refer to flagged errors that are not actual errors and for which accepting Grammarly's suggestion would result in an error. For example, flagging the use of ‘in’ as an error in the phrase ‘our focus in teaching grammar should be on functions rather than on forms’ is a false positive, and changing the preposition would yield a grammatical error. Optional usage refers to issues that are not true errors but represent cases where writers have more than one choice, e.g., spelling ‘non-native’ with or without a hyphen. Note that false negatives (cases where Grammarly fails to identify errors) were not dealt with in this study. This is because the articles were written by specialist English native speakers and proofread by the journals; in fact, we did not come across any undetected errors in these articles. Besides, false positives are more detrimental than false negatives in second/foreign language acquisition, and AWE developers give priority to false positives [5,21].
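To illustrate the bookkeeping described above (the actual tallying was done in an Excel sheet), the minimal Python sketch below shows one way flagged issues could be counted per classification and converted into percentages. The records and labels are hypothetical examples, not the study's data.

```python
from collections import Counter

# Hypothetical records: (issue type flagged by Grammarly, our classification).
flagged_issues = [
    ("comma misuse", "optional usage"),
    ("wrong preposition", "false positive"),
    ("misspelled word", "actual error"),
    ("determiner use", "optional usage"),
    ("subject-verb agreement", "false positive"),
]

# Count how many flagged issues fall into each classification.
counts = Counter(classification for _, classification in flagged_issues)
total = sum(counts.values())

for classification, count in counts.items():
    print(f"{classification}: {count} ({100 * count / total:.1f} %)")
```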
Issues that were not easily classified were checked in grammar usage references and in online corpora. Google Scholar and the Corpus of British Academic Written English (BAWE) [22] were used to further explore the usage of articles, hyphenation, wordy phrases, and collocations/concordances. BAWE is a British corpus of eight million words representing academic English written at UK universities. All issues were checked on Google Scholar and BAWE on March 10 and 11, 2022. Note that the use of Google Scholar here does not mean that written work on Google Scholar is always correct. We resort to Google Scholar to show cases where we know that the flagged item is not an error and to show that it is commonly used on academic search engines. To further check such cases, we asked three English native speakers who teach academic writing at university level to judge whether these usages are correct in academic English, to ensure that they are not common mistakes. This was done without telling them that these cases came from Grammarly or Google Scholar.
4. Results and discussion
In this section, we first present the overall results of all articles and then detail the results according to the type of issue.
4.1. Overall results
The scores given to each article and the number of writing issues in each article are shown in Table 1.
Table 1.
Overall results.
Writers | Score | All writing issues | Correctness issues | Clarity issues | Engagement issues | Delivery issues |
---|---|---|---|---|---|---|
W1 | 85 | 255 | 92 | 112 | 19 | 32 |
W2 | 76 | 294 | 166 | 98 | 13 | 17 |
W3 | 86 | 255 | 83 | 127 | 28 | 17 |
W4 | 86 | 329 | 98 | 173 | 37 | 21 |
W5 | 83 | 357 | 62 | 215 | 24 | 56 |
W6 | 86 | 218 | 93 | 98 | 16 | 11 |
W7 | 80 | 512 | 174 | 260 | 58 | 29 |
W8 | 83 | 479 | 181 | 220 | 38 | 40 |
W9 | 83 | 300 | 109 | 157 | 19 | 15 |
W10 | 82 | 401 | 138 | 204 | 37 | 22 |
Total/average | 830/83 | 3400/340 | 1196/120 | 1664/166 | 289/29 | 260/26 |
Range | 76–86 | 218–512 | 62–181 | 98–260 | 13–58 | 11–56 |
As is clear from Table 1, Grammarly reports show that all the writers had writing issues that affected their overall scores. The average score for all writers stood at 83, which is unexpected given that the writers represent well-qualified, accomplished English writers. It is very likely that such a score would be disappointing to non-native speakers, let alone specialist English native speakers.
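As a quick check on the aggregate figures, the averages and ranges in Table 1 can be reproduced directly from the per-writer values. The short Python sketch below does this for the overall scores and the total number of writing issues, with the values copied from Table 1.

```python
# Scores and total writing issues per writer (W1-W10), copied from Table 1.
scores = [85, 76, 86, 86, 83, 86, 80, 83, 83, 82]
all_issues = [255, 294, 255, 329, 357, 218, 512, 479, 300, 401]

# Totals, averages, and ranges, matching the Total/average and Range rows.
print(sum(scores), sum(scores) / len(scores))              # 830 83.0
print(min(scores), max(scores))                            # 76 86
print(sum(all_issues), sum(all_issues) / len(all_issues))  # 3400 340.0
print(min(all_issues), max(all_issues))                    # 218 512
```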
4.2. Correctness issues
The total number of correctness issues was 1196. This number is alarming; had these issues been true errors, it would suggest that the writers of these articles have serious problems with their grammatical competence. We will show here that only a few of these issues were actual errors (57 errors, or 4.8 %) and that Grammarly has a serious over-flagging problem. For ease of exposition, we group these issues according to their types.
4.2.1. Punctuation issues
Punctuation issues were the most common, totaling 522. Table 2 presents the most common punctuation issues with their frequencies for all writers. In the last three columns, we present the number of actual errors in the articles, the number of issues relating to optional usage, and the number of false positives.
As is clear from Table 2, on closer inspection, we found that of all these 522 cases, only two were true errors, and one of them appeared in a participant's quotation, which means it was not an error made by the article's writer. All the other flagged issues related to optional uses of punctuation marks, except for four cases that were false positives. One false positive appeared in ‘with recent events regarding ICE and the camps immigrants are being put into, the mindset … ’ (W4). Grammarly incorrectly suggested using a comma after ‘camps’, which would render the sentence ungrammatical. An example of a false positive with the semicolon appeared in ‘[a]n expository essay and a discussion essay are classified together in the Essay genre family; a book review and … in the Critique genre family; and an annotated bibliography … classified in the Literature Survey genre family’ (W10). Here, the semicolon is used correctly to link independent clauses, and the use of ‘and’ is necessary to link the third clause to the other clauses. Grammarly incorrectly suggested using a comma before ‘and’ and a semicolon after it, which would result in an error. The closing punctuation false positive related to a heading in a paper, where Grammarly suggested using a period after the title, mistaking it for a sentence in the body of the article.
Table 2.
Punctuation issues.
Punctuation issues | Total | Actual errors/percentage | Optional usage/percentage | False positives/percentage |
---|---|---|---|---|
Punctuation in compound/complex sentences | 365 | 1/0.3 % | 364/99.7 % | 0/0 % |
Comma misuse | 102 | 0/0 % | 101/99 % | 1/1 % |
Misuse of semicolons, quotation marks, etc. | 53 | 0/0 % | 51/96 % | 2/4 % |
Closing punctuation | 2 | 1/50 % | 0/0 % | 1/50 % |
Total | 522 | 2/0.3 % | 516/99 % | 4/0.7 % |
All the other 516 cases can be related to optional usage of punctuation marks. That is, these issues flagged by Grammarly are not true errors; however, accepting Grammarly's suggestions would not result in errors either. Table 3 presents illustrative examples of the most common optional cases.
As is clear from Table 3, in the first four examples, Grammarly did not take into consideration that using commas here is optional [23,24]. Ref. [23] explained that using a comma after an introductory word/phrase is optional, that using a comma with short prepositional phrases also depends on the writer's preference, and that using a comma with sentence adverbs like ‘therefore’ depends on the writer's judgment as to whether these adverbs are parenthetical insertions or well integrated into the sentence. In example 3, although the comma is not needed, it cannot be considered an error, especially if the dependent clause is not closely related to the proposition in the main clause [23]. All these uses of commas are optional and depend on ‘the writer's judgment and aesthetic considerations' [23]. Hence, they cannot be considered errors. In the other examples, the punctuation marks were correctly used by the writers, but Grammarly flagged them as errors without giving any suggestions.
Table 3.
Punctuation optional cases.
Writer's use | Grammarly's suggestion | Remarks |
---|---|---|
1. In this section I consider (W1). | →section, | In the first four examples here, the use of the comma is optional. |
2. Coefficients therefore represent (W4). | →, therefore, | |
3. This is a puzzling pattern, since neither the (W1), | →pattern since | |
4. Certain types of student writing, for example proposals and (W10). | →writing for example | |
5. The hidden power of media discourse and the capacity of… power-holders to exercise this power (W7). | … (Grammarly spotted the use of the three dots used by the writer for ellipsis as an issue.) | This use is the norm in English to show ellipsis and cannot be considered an error. Here, we classify this as an optional case, rather than a false positive because Grammarly did not provide a suggestion that can result in an error. |
6. In class, participants were instructed to: (a) (W9). | : (The use of the colon was spotted as an issue.) | The writers correctly used the colon to introduce a list, but Grammarly flagged this as an error without providing any suggestions. |
7. Comparative forms ansab ‘more appropriate’, (W2). | appropriate’,→ appropriate,' | Grammarly considered the quotation marks around words given as translations of words from other languages to be errors. These are not errors; they follow the standard way of presenting words from other languages in academic journals. Note that Grammarly sometimes flagged such cases as ‘improper formatting’ (cf. Section 4.2.4). |
The optional use of these punctuation marks needs to be taken into consideration, and such cases should not be counted as errors that affect the overall score. This is because the rules governing the use of punctuation marks are much debated in English, even among professional editors and major style guides and dictionaries; very often, using or omitting a punctuation mark is a matter of choice or style [23].
4.2.2. Grammatical issues
The total number of grammatical issues was 320. Table 4 gives more details on grammatical issues.
Table 4.
Grammatical issues.
Grammatical issues | Total | Actual errors/percentage | Optional cases/percentage | False positives/percentage |
---|---|---|---|---|
Determiner use | 145 | 2/1.3 % | 119/82 % | 24/16.5 % |
Misplaced words | 35 | 4/11 % | 27/78 % | 4/11 % |
Wrong prepositions | 24 | 5/20.8 % | 12/50 % | 7/29.2 % |
Faulty S–V agreement | 21 | 8/38 % | 3/14 % | 10/47.6 % |
Pronoun use | 15 | 6/40 % | 8/53 % | 1/6.6 % |
Incorrect verb forms | 22 | 7/31.8 % | 5/22.7 % | 10/45.5 % |
Incorrect noun number | 18 | 2/11 % | 9/50 % | 7/39 % |
Faulty tense sequence | 13 | 2/14 % | 10/79 % | 1/7 % |
Incomplete sentences | 10 | 1/10 % | 0/0 % | 9/90 % |
Conjunction use | 8 | 2/25 % | 4/50 % | 2/25 % |
Misuse of modifiers/quantifiers | 6 | 1/16.7 % | 0/0 % | 5/83.3 % |
Faulty parallelism | 3 | 0/0 % | 0/0 % | 3/100 % |
Total | 320 | 40/12.5 % | 197/61.5 % | 83/25.9 % |
As can be seen from Table 4, only 40 issues were true errors. Of these, 22 were made by the writers (many of them typos), and 18 appeared in participants’ quotations or resulted from the word-by-word translation of example sentences from other languages.
Errors appearing in participants' quotes cannot be counted as errors made by the writers, and they should not affect the overall score. The writers almost always indicated that the quoted sentence contained an error by using [sic], as in ‘[i]t contributed to stress but resulted [sic] a simple rough draft’ (W10). Grammarly identified this as an error but failed to take into consideration that this was a quotation given by a study participant and cannot be changed in research. Similarly, the word-by-word translations cannot be considered errors because the writers used them on purpose, following journals' guidelines that require example sentences from other languages to be reported in this way. For example, writer 2 used the following sentence from Arabic: ‘Ahmed taller from Basem’. This sentence is not grammatically correct in English; nevertheless, counting it as an error affecting the writer's score is not justified. The writer did not make an error here and did not mean to write an English sentence; in fact, the writer gave the correct English translation next to the word-by-word translation. If this had been checked by a human rater, it is very likely that it would not have been marked as an error.
The remaining 83 issues were false positives. Grammarly incorrectly flagged these as errors and offered suggestions to fix them, which would result in explicit grammatical errors. For example, Grammarly incorrectly flagged the use of ‘who’ as a pronoun use error in ‘if he were teaching students who all spoke English … ’ (W5). Grammarly suggested using ‘whom’, which would render the sentence ungrammatical. More illustrative examples of false positives are given in Table 5.
Table 5.
Grammatical false positives.
Writer's use | Grammarly's suggestion | Remarks |
---|---|---|
1. Among these are constraints that mandate particular types of prosodic structure (universally preferred syllables, feet, and prosodic words) and universally preferred patterns of alignment between prosodic and morphological categories. (W1) | → particular mandate | Grammarly flagged ‘mandate particular’ as a misplaced word thinking that ‘mandate’ was used as a noun, not a verb. |
2. A case in point is a recent relevant edited collection (Fairclough et al., 2007), in which almost one in five articles is informed by corpus analysis. (W7) | is→ are | Grammarly was misled by the presence of interrupting phrases between the subject and the verb. |
3. Fig. (2) shows how the disciplines of Food Science, Chemistry, Engineering, and Meteorology cluster at the positive end of Dimension 1, together with the Methodology Recount and Design Specification genre families, all of which have means greater than +8. (W10) | means→ mean | Grammarly assumed that a verb form was needed after ‘have’, failing to recognize that ‘means’ is a noun functioning as its direct object. |
4. A sudden transition from teaching in their native language to teaching in English (W5) | teaching→ teach | Grammarly mistakenly assumed that ‘to’ must be followed by the infinitive, failing to realize that ‘to’ here is a preposition that needs a noun/gerund. |
5. The participants in this study were 20 adult (mean age: 20 years, range: 19–23; 17 female) native-English-speaking undergraduate students (W6) | adult→ adults | Grammarly was misled by the parenthetical information and suggested ‘adults’. |
6. He found that social reading created a zone of proximal development for less expert readers, …(W3) | less→ fewer | Grammarly assumed that the quantifier ‘less’ modifies the countable noun ‘readers’ rather than the adjective ‘expert’. |
7. Ferré et al. (2010) later found an advantage for both positive and negative emotional words relative to neutral words for adult early and late proficient bilinguals. (W4) | adult→ an adult | Grammarly incorrectly suggested adding the indefinite article ‘an’ before ‘adult’, thinking that ‘adult’ here is used as a noun, not as an adjective modifying ‘bilinguals’. |
8. Flecken et al. (2015) find French L1 speakers carrying across their L1 motion description to L2 German but not path; Stam (2015) highlights differences in both L1 and L2 handling of motion events in L2 English by a Spanish-speaking L2 user … (W6) | path→ the path | Adding ‘the’ affects the technical use of the word ‘path’, which refers to a semantic role here. |
9. How might a word's emotional valence (positive, neutral, negative) influence novel vocabulary immediate recall and retention in a heritage or foreign language? (W4) | a heritage→ heritage | Grammarly incorrectly thought that the indefinite article modifies the uncountable noun ‘heritage’, not the noun ‘language’. |
10. Stance in dimension 1 is used to evaluate the work of others, typically in Essays, while stance and evaluation features in dimension 2 cluster with first-person pronouns and are typically found in Narrative Recounts and Empathy Writing. (W10) | features→ feature | The false positive resulted from the presence of ‘and’ in the compound subject. Grammarly incorrectly assumed that the verb is ‘features’ (not cluster). |
11. Interestingly, the present results resonate with previous research in psychology (e.g., Kensinger, 2004; McGaugh, 2018; Steidl & Anderson, 2006) that suggests that emotional arousal (i.e., emotional activation or stimulation) occurring before, during, or even shortly after learning may be associated with enhanced memory of target items. (W4) | occurring→ occurs | Grammarly failed to realize that the verb is ‘may be’ and ‘occurring’ begins a reduced relative clause. |
12. An initial methodological point is that the participants studied in these articles are mostly instructed students of an L2, for instance Tomczak and Ewert's (2015) Polish university students of English, rather than so-called natural L2 learners with little contact with language teaching, such as …. (W6) | teaching→ teachings | Grammarly incorrectly suggested ‘teachings’, which changes the meaning from ‘the practice of language teaching’ to ‘language principles’. |
The examples in Table 5 clearly show that accepting Grammarly's suggestions results in explicit grammatical errors. The rest of the grammatical issues can be classified as optional cases. The majority related to determiner use (119 cases), as in ‘the thinking of L2 users is not interpreted as failure to achieve … ’ (W6). Here, ‘failure’ in the sense of lack of success can be used with or without an article, as it can be countable or uncountable. In another example, Grammarly suggested adding the definite article ‘the’ before ‘transfer’ in ‘a direct result of transfer of the native language’ (W1). Such cases represent optional usage in English and therefore cannot be counted as true errors. To further explore the use of the zero article with singular nouns, the researchers randomly selected five nouns that were used without articles and flagged as errors in the ten articles and searched for their frequency on Google Scholar and BAWE; these nouns were frequently used without articles (Table 6). This was further confirmed by the three English native speaker consultants who were asked to judge the correctness of these cases.
Table 6.
Nouns used without articles (frequencies of the checked phrases).
Phrase checked | Google Scholar | BAWE |
---|---|---|
their courses and percentage of time | 751,000 | 6
arise only after confrontation with input data | 1,160 | 12
as a direct result of transfer of the native | 679,000 | 22
in which a monosyllabic input is augmented by addition of a vowel | 3,480,000 | 12
mental representations … must be captured in language before they are lost. | 2,340,000 | 111
As is clear from Table 6, the use of such nouns without articles, especially the definite article, cannot be counted as an error, particularly because there are no watertight rules about using the definite article in English; the question of definiteness in English is rather complicated and requires considerable semantic, pragmatic, and contextual information [25,26]. More illustrative examples of optional cases are given in Table 7.
Table 7.
Optional grammatical issues.
Writer's use | Grammarly's suggestion | Remarks |
---|---|---|
1. There is general consensus (W9) | → a general | Both are correct and used in English as reflected by the high frequency of both on Google Scholar: 43,200 (without a); 89,300 (with a). |
2. Structures which violate (W1) | which→ that | Although the use of ‘that’ in restrictive clauses is recommended (cf. APA Publication Manual), using ‘which’ is optional and cannot be considered an error. |
3. The articles included in this issue divide between (W6). | divide→ are divided | Although the suggestion is grammatically correct, the use of ‘divide’ as an ergative verb is perfectly acceptable. |
4. I argued that this ranking can be seen (W1) | can→ could | Using the modals ‘can, will, may’ in past-tense sentences should not be considered an error, as mixing tenses here is accepted to convey different levels of tentativeness and hedging [27]. The phrases ‘I argued that this can’, ‘explained how they will’, and ‘explained how he can’ appeared 54, 136, and 68 times, respectively, on Google Scholar. |
5. Patients with diabetes and who require (W10) | and | This phrase appeared 96 times on Google Scholar. |
4.2.3. Spelling/word issues
The total number of issues here was 312. Only nine were true errors; many of them appeared in participants’ quotes. More details are presented in Table 8.
Table 8.
Spelling/word issues.
Spelling issues | Total | Actual errors/percentage | Optional usage/percentage | False positives/percentage |
---|---|---|---|---|
Misspelled words | 147 | 6/4 % | 60/40.8 % | 81/55 % |
Confused words | 107 | 3/2.8 % | 74/69.2 % | 30/28 % |
Unknown words | 58 | 0/0 % | 58/100 % | 0/0 % |
Total | 312 | 9/2.9 % | 192/61.5 % | 111/35.5 % |
From Table 8, we can see that 111 of the flagged words were false positives. Of these, 48 were examples from other languages, such as Arabic waati ‘low’ and Mandarin gao ‘tall’. Sometimes Grammarly flagged them as misspelled words; other times it flagged them as confused words. These words were used as examples in the articles, and Grammarly was unable to recognize them and treated them as misspelled words. The problem here is that Grammarly offered other words to replace them, which would result in wrong, out-of-context words. For example, Grammarly suggested replacing gao with ‘ago/go’ and waati ‘low’ with ‘water’. Had Grammarly flagged these as unknown words, as it did sometimes, it would have been safer and better for its users.
Another 41 words were technical words, acronyms, or abbreviations such as ‘rhotics’, ‘exponence’, ‘RASIM’, and ‘CaCiiC’. Again, these were flagged as misspelled or confused words, and Grammarly suggested replacing them with ‘orthotics’, ‘experience’, ‘RACISM’, and ‘classic’, respectively. Accepting these suggestions would distort the intended meaning.
The other 22 cases related to misspelling individual words or confusing them, as shown in Table 9.
Table 9.
Spelling/word issues false positives.
Writer's use | Grammarly's suggestion |
---|---|
1. Emotional arousal, and not just valence as more commonly investigated, may be a telling facilitator of vocabulary learning (W4) | may be→ maybe |
2. This sequence to minimize any one subtest (W4) | any one→ anyone |
3. A higher total score is predicted on the immediate posttest (W4) | predicted→ predicated |
4. Though few studies explore questions specifically surrounding affect within the context (W4) | affect→ effect |
5. Are marked in boxes (W2) | → inboxes |
6. The methodology was extended to investigate variation among university registers, where marked differences emerged (W10) | where→ were |
7. Extracts 2a–c are (W10) | c are→ Care |
8. Should contain a foot (W1) | →afoot |
The rest of the issues (192) were optional cases. Of these, 51 related to spelling compound words with a hyphen when used attributively, as in ‘Additionally, journalistic features, for example, the order of the information, agenda setting and space allocation, in general, and quotation patterns, in particular, play an important role in implementing particular perspectives, and hence, ideologies.’ (W7). Grammarly always required a hyphen in such cases (→agenda-setting). Although this is correct, it cannot be considered an error. It is well known that the spelling of compound words in English goes through three overlapping stages: spelled as two separate words, spelled with a hyphen, and finally spelled as one word [23]. It is very common to find the same compound word spelled with and without a hyphen interchangeably. All 51 flagged words were checked for their frequency on BAWE (Table 10), and most of them were used both with and without a hyphen, with a higher frequency for spellings without a hyphen. This is in line with the advice in Ref. [24] not to overuse hyphens in familiar phrases and when there is no risk of confusion. Note that Google Scholar was not used here as it lists words with and without a hyphen simultaneously.
Table 10.
Words spelled with/without a hyphen on BAWE.
Word | With hyphen | Without hyphen |
---|---|---|
well known + noun | 42 | 39 |
one half of | 4 | 13 |
one third of | 17 | 34 |
agendasetting | 8 | 11 |
lower level + noun | 3 | 13 |
first time + noun | 5 | 21 |
first person + noun | 22 | 36 |
A further 58 optional issues related to unknown words. The majority were words from other languages such as Arabic, Spanish, and Mandarin; some were technical words such as ‘templatic’, ‘bisyllabicity’, ‘genre’, and ‘ethnoracial’. Recall that Grammarly detected these words as confused words in some cases and as unknown words in others. Note that Grammarly did not provide suggestions to replace these words; it only flagged them as unknown words. Therefore, we classify them as optional cases rather than false positives (unlike confused words, for which Grammarly provided replacement suggestions).
Another common optional issue with confused words related to the writers' use of single quotation marks, as in “Mautner (2007) argues that ‘what large-scale data are not well suited for, on the other hand, is making direct, text-by-text links between the linguistic evidence and the contextual framework it is embedded in’. Theses …” (W7). Strangely, Grammarly usually flagged the use of one of the quotation marks. Finally, a few cases related to variant spellings of some words, cf. ‘focussing’, which is a less common variant of ‘focusing’ in British English and should not be considered an error.
4.2.4. Other issues
A small number of other issues related to improper formatting (21 issues), text inconsistencies (15 issues), and mixed dialects of English (6 issues).
Improper formatting issues were not errors; all of them related to word-by-word translations from other languages or to spelling out numbers, as in ‘patients with diabetes and who require long-term (at least 1 month)’ (W10). Grammarly flagged ‘1’ as an error; however, this use is not wrong, as numerals are acceptable when the number precedes a unit of measurement or is used with statistical functions, percentages, and ratios [28].
Of the 15 text inconsistency issues, six were flagged correctly by Grammarly, as the writers used both American and British spellings or used compound words both with and without a hyphen (e.g., nonnative and non-native). The other nine cases were not errors. Some words came at the beginning of new sentences, so they were capitalized, and some words came at the end of a line, so the writer hyphenated them. In other cases, some words were used technically, e.g., ‘condition’ in ‘for each dependent variable, two nested models were built … and the model adding a fixed effect for Condition (control vs. intervention)’ (W6). The six issues of mixed dialects of English are not errors either, as they relate to variant spellings of words in British English, e.g., ‘percent/per cent’.
To conclude our discussion of correctness issues, the large number of false positives and optional issues flagged by Grammarly shows that Grammarly tends to over-flag, and its reliability as an assessment tool seems to be questionable.
4.3. Clarity (wordy sentences) issues
Grammarly flagged 578 clarity issues, ranging from 37 to 81 per article. All these issues related to using phrases where single words could be used, e.g., using ‘a number of’, as in ‘the literature on second language acquisition contains a number of reports’, and ‘with respect to’, as in ‘we can compare the rankings of the faithfulness constraints with respect to the two markedness constraints’. Grammarly flagged these phrases and suggested replacing them with ‘some/many’ and ‘concerning’, respectively. Another example is the use of ‘so as to’, as in ‘[i]t was explained to the participant that the goal of this manipulation was to provide feedback … and encourage them to write fluently so as to avoid losing sight of their text’ (W6), where Grammarly suggested ‘to’ instead of ‘so as to’. Although ‘to’ can be used here, it does not express the same level of formality, and it would result in many repetitions of ‘to’. In fact, this use of ‘so as to’ is very common in English (see Table 11 below).
Table 11.
Frequency of wordy phrases.
Wordy phrases in the articles | Frequency in the articles | Frequency on BAWE | Frequency on Google Scholar |
---|---|---|---|
in order to | 24 | 4,003 | 6,130,000
a number of | 19 | 1,165 | 8,300,000
with respect to | 14 | 337 | 5,810,000
in relation to | 9 | 664 | 5,190,000
so as to avoid | 16 | 266 | 442,000
by means of | 4 | 152 | 5,800,000
Even though shorter wording is often preferred in academic English, using these longer counterparts should not be regarded as an ‘error’, especially when they are used to add variety and avoid repetition. To find out whether Grammarly's suggestions to avoid wordiness reflect actual use in academic English, we further checked the frequency of the most common flagged phrases on BAWE and Google Scholar (Table 11).
The frequencies in Table 11 show that these wordy phrases are common in academic English. Also, the three native speaker consultants confirmed that the use of these phrases in academic English is not wrong. Given that such phrases are frequent in academic English and are used by native English writers, it is not justified to adopt an overly prescriptive attitude toward them; they should not be considered errors affecting the overall score of writing submitted to Grammarly.
Moreover, some of these wordy phrases are used purposefully by the writers to add emphasis or variety. Therefore, replacing these phrases as per Grammarly's suggestions may result in a loss of emphasis or variety. Consider the use of the underlined words to add emphasis in the following example: ‘[t]he idea of linguistic relativity put forward by Benjamin Lee Whorf often comes up in discussions of bilingual cognition, though he himself is actually cited in only two of the articles here’ (W6). Likewise, some wordy phrases can be used to avoid repetition. For example, ‘a number of’ may be used in place of its one-word synonyms to add variety when the need arises.
Before closing this section, it is worth mentioning that Grammarly flagged 638 uses of the passive voice. Grammarly flags any use of the passive voice and suggests rewriting it in the active voice. We will not analyze the legitimacy of the passive voice uses in these articles, as this is a complicated issue beyond the scope of this paper; it suffices to say that the passive voice is an integral part of academic written English [29], although its use has declined over the years [30,31].
4.4. Word choice/engagement issues
Grammarly flagged 289 word-choice issues, supposedly to avoid overused words and repetition. However, we found that in some cases accepting Grammarly's suggestion could change the meaning of the sentence, as in ‘because two words of each type of emotion were included in each text, the maximum total score for each word type (positive, neutral, negative) within each text is 6’ (W4). Grammarly suggested replacing ‘negative’ with ‘harmful’. In another instance, Grammarly suggested using ‘antagonistic’ instead of ‘negative’ in ‘the CDA notions described earlier enabled the assignation of more explicit and finer semantic/discourse prosody values than merely assigning a general positive/negative bias’ (W7). A final example is Grammarly's suggestion to replace ‘representative’ with ‘figurative’ in ‘which may not be a representative language for many reasons’ (W6). These examples show that some of Grammarly's suggestions are misleading and could result in unnatural English phrases.
To further examine Grammarly's engagement suggestions, we compared the frequency of some phrases in the articles with the frequency of the phrases suggested by Grammarly on Google Scholar and BAWE (Table 12).
Table 12.
Frequency of phrases with engagement issues (phrases in the articles are on the left).
Phrases used by the writers | Grammarly's suggestions | Frequency on BAWE | Frequency on Google Scholar |
---|---|---|---|
particularly scarce | exceptionally scarce | 2 → 0 (the first frequency relates to the writers' phrases, while the second relates to Grammarly's phrases) | 13,500 → 898 |
largely found | broadly found | 1 → 0 | 25,700 → 3010 |
emotional words | inspirational words | 1 → 0 | 33,400 → 2980 |
normally used | generally used | 14 → 14 | 383,000 → 837,000 |
wider issues | broader issues | 6 → 2 | 72,000 → 135,000 |
simplest level | most superficial level | 3 → 0 | 56,400 → 6110 |
A full analysis of such patterns | complete analysis | 2725 → 2669 | 230,000 → 439,000 |
From Table 12, it is clear that four of the phrases used in the articles are more common than those suggested by Grammarly, while the other three phrases are in very common use. Moreover, the three English native speaker consultants confirmed that the use of these phrases in the articles is correct in academic English. This shows that the use of these words is not erroneous; hence, they should not be regarded as issues that affect the overall score of written work.
4.5. Delivery issues
Grammarly flagged 238 delivery issues related to inappropriate colloquialisms (231 issues), tone suggestions (five issues), and sensitive language (two issues).
Most inappropriate colloquialism issues were about using first- and second-person pronouns such as ‘I’ and ‘you’. Although such pronouns are generally avoided in academic writing, many publishers and journals tolerate their use, and therefore these should not be regarded as errors [32]. More importantly, most of the issues flagged by Grammarly here are of a different kind, as these first/second-person pronouns appeared in quotations from the study participants, as in ‘[y]our wife also works at Jal’ (W6). Sometimes, the writers used ‘I’ in example sentences to explain an issue, as in ‘[t]he relationships between the elements of the scene are expressed in terms of physical motion, the manner in which it takes place, and the path it follows, say, in I walked along the road or I ran around the track’ (W6).
Other flagged issues related to using the coordinating conjunctions ‘and’, ‘yet’, and ‘but’ at the beginning of sentences, where Grammarly suggested using other conjunctions such as ‘however’, ‘nevertheless’, and ‘furthermore’. Grammarly explains that using these conjunctions at the beginning of sentences is not a grammatical error but a matter of style. This further confirms that counting these issues toward the overall score awarded to pieces of writing is unwarranted. Ref. [33] reports that using conjunctions to start sentences is quite common in written English when writers want to add something new to what they have just said. Other flagged issues concerned contractions such as ‘don't’, ‘didn't’, and ‘isn't’; these words again appeared in participants' quotes.
For tone suggestions, Grammarly suggested that using ‘sort of’, ‘simply’, and ‘just’ weakens the writer's message. This is true; however, not all cases where such phrases/words appear are mistakes. Consider the use of ‘sort of’ in ‘[l]anguages frequently impose restrictions on what sorts of consonants may occur in syllable codas, ranging from a prohibition against any sort of coda to a prohibition’ (W1). Here, Grammarly could not differentiate between the noun ‘sort’ (meaning a type) and the adverbial ‘sort of’ (meaning ‘somewhat’).
Finally, Grammarly flagged ‘foreign’ and ‘freshman’ in ‘foreign students’ and ‘university freshman’ as issues of sensitive language. Grammarly advised that such terms might be considered offensive, outdated, non-inclusive and disrespectful. In fact, these words were not used by the writers themselves; rather they appeared in participants' quotations.
5. Concluding remarks and implications
It has been shown that Grammarly over-flagged a considerable number of writing issues. The huge number of over-flagged issues is misleading, as it suggests that these writers have serious problems in their writing. Concerning correctness issues, only 57 (4.8 %) were true errors; most of the other cases were either optional cases (911, or 76 %) or false positives (228, or 19 %). Concerning clarity, Grammarly's performance was also not without problems; the flagged issues were not errors, and accepting some of Grammarly's suggestions would change the meaning or could affect variety or emphasis. Therefore, these issues should not be counted toward the overall score of written work.
For engagement issues, most issues were suggestions that were supposed to avoid overused words and repetition. Again, these cases should not be regarded as issues that affect the overall score especially because the words used by the writers are in common use in academic English. Moreover, accepting some suggestions would result in unidiomatic language. Finally, the most common delivery issues flagged by Grammarly related to inappropriate colloquialisms that were used in participants’ quotations and therefore they do not represent writing issues and should not be counted toward the overall score.
As an answer to the first research question (is the overall score given to a piece of writing by Grammarly reliable?), our results show that the score given by Grammarly does not seem to be reliable for assessing academic written English. Concerning the second question (are all flagged issues actual errors?), the results confirm that Grammarly over-flags many writing issues. Three factors might explain why Grammarly tends to over-flag such issues. First, Grammarly was misled by examples from other languages. In linguistics research, it is very common to use such words and examples, and Grammarly's developers should find a way to deal with this case. Second, Grammarly failed to recognize technical terms and special uses of certain words in reporting research findings (e.g., Condition, Section 4.2.4). Third, many issues related to optional cases of usage. These optional cases should not be considered errors, as they are more linked to style, which Grammarly does not seem to be able to account for.
Our results are similar to previous research on Grammarly in that they show that Grammarly gives instant feedback on many language issues but that this feedback can sometimes be wrong ([4,6,12]; see Section 2). Moreover, our results lend support to Refs. [8,19], and [7], who advised that Grammarly should be used along with academic advisors’ feedback as it has some inconsistencies.
However, our findings contradict those of [5], who reported that Grammarly had a high precision rate of error detection. The results of our study are also different from all previous research in that they show precisely the extent to which Grammarly can be reliable and how much its feedback is correct.
To recap, our results suggest that using Grammarly as a summative evaluation tool is questionable [13], but it can be used in conjunction with human raters to yield better results [8,34]. This does not mean that Grammarly cannot be a useful tool. Grammarly does a good job of identifying many ambiguous instances of language use that writers would do well to review and consider for revision. However, Grammarly tends to over-flag, and its users should therefore be cautious about interpreting its scores and accepting its suggestions. Accepting all suggestions blindly could result in ungrammatical or unidiomatic sentences and distortion of the intended meaning. If Grammarly's developers take this into consideration and give more room to optional usage, Grammarly could be an invaluable tool. Furthermore, Grammarly should clearly point out that its suggestions do not mean that the words/structures in question are incorrect, unless they relate to explicit, indisputable grammatical errors. More importantly, these suggestions should not count toward the overall score given to a piece of writing, as this could have undesirable, unjustified consequences. We suggest that Grammarly give more than one score to a piece of writing, based on the type of issues involved. In this way, users will not be misled by a single overall score given to submitted writing.
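As a purely hypothetical illustration of this multi-score proposal (Grammarly's actual scoring method is proprietary and is not modeled here), a report could show one score per issue category instead of a single aggregate, for example by scaling each category's issue count by text length. The penalty weight, the word count, and the scoring formula below are invented for illustration; only the per-category counts are taken from W1's row in Table 1.

```python
# Hypothetical per-category scoring: issues per 1,000 words, capped at 0-100.
# This illustrates the proposal only; it is not Grammarly's actual method.
def category_score(issue_count: int, word_count: int, penalty_per_issue: float = 2.0) -> float:
    issues_per_1000 = 1000 * issue_count / word_count
    return max(0.0, 100.0 - penalty_per_issue * issues_per_1000)

# Per-category counts from W1 in Table 1; the 9,000-word length is assumed.
counts = {"correctness": 92, "clarity": 112, "engagement": 19, "delivery": 32}
word_count = 9000

report = {category: round(category_score(n, word_count), 1) for category, n in counts.items()}
print(report)  # {'correctness': 79.6, 'clarity': 75.1, 'engagement': 95.8, 'delivery': 92.9}
```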
These findings have pedagogical implications as well. Educators and users of Grammarly should treat Grammarly's scores and suggestions cautiously. They need to double-check whether flagged issues are true errors before accepting and integrating Grammarly's suggestions into their written work.
One limitation of this study is that it restricted itself to false positives and optional cases of usage. A future study focusing on false negatives (undetected errors) and the reasons behind them is highly recommended; this could be achieved by submitting texts containing different types of mistakes to determine which of them Grammarly fails to detect. Another limitation relates to disciplinary scope: this paper was concerned with writing in linguistics, and the performance of Grammarly in other fields such as science, business, and medicine remains to be explored.
Ethics statements
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation. This study was approved by the Ethics Committee of King Faisal University, KSA, with ethics approval reference [35,243]. The standards are also in line with the Helsinki Declaration of 1975, as revised in 2000. Informed consent was obtained from all participants for being included in the study.
Funding statement
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia (project no. 6194).
Data availability statement
The research data are made available at: https://PACS-global.com/OpenSharingStudy.aspx?id=EDQXKDYCQVRXKWNYOQAA.
CRediT authorship contribution statement
Abdallah Abu Qub'a: Writing – original draft, Data curation. Mohammed Nour Abu Guba: Writing – review & editing, Methodology. Shehdeh Fareh: Writing – review & editing, Methodology.
Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Abdallah Abu Qub'a reports that financial support was provided by King Faisal University and reports a relationship with King Faisal University that includes funding grants. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Footnotes
One of the researchers of this study submitted a paper to a well-established linguistics journal. One of the reviewers (a non-native speaker of English) used Grammarly to check the accuracy of the paper's language and rejected the paper, claiming that it had received a low score on Grammarly (which was misleading). Fortunately, the paper received positive feedback from the other reviewer, and the researcher had the opportunity to defend it.
Some papers were written by more than one author, but at least the main/first author was a native speaker.
Contributor Information
Abdallah Abu Qub'a, Email: aabuquba@kfu.edu.sa.
Mohammed Nour Abu Guba, Email: mabu-gub@sharjah.ac.ae.
Shehdeh Fareh, Email: shfareh@sharjah.ac.ae.
References
- 1.Link S., Mehrzad M., Rahimi M. Impact of automated writing evaluation on teacher feedback, student revision, and writing improvement. Comput. Assist. Lang. Learn. 2020 doi: 10.1080/09588221.2020.1743323. [DOI] [Google Scholar]
- 2.Fitria T.N. Grammarly as AI-powered English writing assistant: students' alternative for writing English. Metathesis: Journal of English Language, Literature, and Teaching. 2021;5(1):65–78. [Google Scholar]
- 3.Nova M. Utilizing Grammarly in evaluating academic writing: a narrative research on EFL students' experience. Premise: Journal of English Education. 2018;7(1):80–97. [Google Scholar]
- 4.Sahu S., Vishwakarma Y., Kori J., Thakur J. Evaluating performance of different grammar checking tools. Int. J. Adv. Trends Comput. Sci. Eng. 2020;9 doi: 10.30534/ijatcse/2020/201922020. [DOI] [Google Scholar]
- 5.Ranalli J., Yamashita T. Automated written corrective feedback: error correction performance and timing of delivery. Lang. Learn. Technol. 2022;26(1):1–25. http://hdl.handle.net/10125/73465 [Google Scholar]
- 6.Dewi N.A. The effectiveness of Grammarly checker toward student's writing quality of English Department students at IAIN Tulungagung. IAIN Tulungagung. 2019. [Google Scholar]
- 7.O'Neill R., Russell A. Grammarly: help or hindrance? Academic Learning Advisors' perceptions of an online grammar checker. Journal of Academic Language & Learning. 2019;13(1):A88–A107. ISSN 1835-5196. [Google Scholar]
- 8.Dodigovic M., Tovmasyan A. Automated writing evaluation: the accuracy of Grammarly's feedback on form. International Journal of TESOL Studies. 2021;3(2):71–87. doi: 10.46451/ijts.2021.06.06. [DOI] [Google Scholar]
- 9.Grammarly. (n.d.). Grammarly support. https://support.grammarly.com/hc/en-us. Retrieved on 10 March 2022.
- 10.Ferris D. University of Michigan Press; 2011. Treatment of Error in Second Language Student Writing. [Google Scholar]
- 11.Wael A. AI in the foreign language classroom: a pedagogical overview of automated writing assistance tools. Educ. Res. Int. 2023;2023(1):1–15. doi: 10.1155/2023/4253331. [DOI] [Google Scholar]
- 12.Ghufron M.A., Rosyida F. The role of Grammarly in assessing English as a foreign language (EFL) writing. Lingua Cult. 2018;12:395. doi: 10.21512/lc.v12i4.4582. [DOI] [Google Scholar]
- 13.Hockly N. Automated writing evaluation. ELT J. 2019;73(1):82–88. doi: 10.1093/elt/ccy044. [DOI] [Google Scholar]
- 14.Cho H. The use of English articles by non-native speaker of English. Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology (AJMAHS) 2018;8(5):127–139. [Google Scholar]
- 15.Fadhilah U., Lizawati, Hotmarai J. Effectiveness of Grammarly application for writing English abstract. Int. J. Sci. Res. 2018;8(12):163–166. [Google Scholar]
- 16.Karyuatry L., Rizqan M., Darayani N. Grammarly as a tool to improve students' writing quality: free online-proofreader across the boundaries. JSSH (Jurnal Sains Sosial dan Humaniora) 2018;2:83. doi: 10.30595/jssh.v2i1.2297. [DOI] [Google Scholar]
- 17.O'Neill R., Russell A. Stop! Grammar time: university students' perceptions of the automated feedback program Grammarly. Australas. J. Educ. Technol. 2019;35(1):42–56. doi: 10.14742/ajet.3795. [DOI] [Google Scholar]
- 18.Abu Guba M.N., Abu Qub’a A., Awad A. Grammarly in teaching writing to EFL learners at low levels: how useful is it? World J. Engl. Lang. 2024;14(3) doi: 10.5430/wjel.v14n3p1. [DOI] [Google Scholar]
- 19.Dembsey J.M. Closing the Grammarly® gaps: a study of claims and feedback from an online grammar program. Writ. Cent. J. 2017:63–100. [Google Scholar]
- 20.Lin Z. The linguistic elephant in scientific publishing. 2023. [DOI]
- 21.Nagata R., Nakatani K. Proceedings of the 23rd International Conference on Computational Linguistics. Beijing, China; 2010. Evaluating performance of grammatical error detection to maximize learning effect.https://aclanthology.org/C10-2103/ [Google Scholar]
- 22.BAWE (n.d.) https://app.sketchengine.eu/#concordance. Retrieved on 11 March 2022.
- 23.Casagrande J. Ten Speed Press; 2014. The Best Punctuation Book, Period: A Comprehensive Guide for Every Writer, Editor, Student, and Businessperson. [Google Scholar]
- 24.Straus J., Kaufman L., Stern T. John Wiley & Sons; 2014. The Blue Book of Grammar and Punctuation. [Google Scholar]
- 25.Raimes A. In: State of the Art TESOL Essays. Silberstein S., editor. TESOL; Alexandria, VA: 1993. Out of the woods: emerging traditions in the teaching of writing; pp. 237–260. [Google Scholar]
- 26.Gamon M., Leacock C., Brockett C., Dolan W.B., Gao J., Belenko D., Klementiev A. Using statistical techniques and web search to correct ESL errors. CALICO Journal. 2009;26(3):491–511. [Google Scholar]
- 27.Carter R., McCarthy M. CUP; Cambridge, UK: 2006. Cambridge Grammar of English: A Comprehensive Guide: Spoken and Written English Grammar and Usage. [Google Scholar]
- 28.APA Style. (n.d.). https://apastyle.apa.org. Retrieved on 10 March 2022.
- 29.Minton T.D. In defense of the passive voice in medical writing. Keio J. Med. 2015;64(1):1–10. doi: 10.2302/kjm.2014-0009-RE. [DOI] [PubMed] [Google Scholar]
- 30.Leong P.A. The passive voice in scientific writing through the ages: a diachronic study. Text Talk. 2020;40(4):467–489. [Google Scholar]
- 31.Leong P.A. The passive voice in scholarly writing: a diachronic look at science and history. Finnish Journal of Linguistics. 2021;34:77–102. [Google Scholar]
- 32.Wang S., Tseng W.-T., Johanson R. To we or not to we: corpus-based research on first-person pronoun use in abstracts and conclusions. Sage Open. 2021;11(2) doi: 10.1177/21582440211008893. [DOI] [Google Scholar]
- 33.Biber D., Johansson S., Leech G., Conrad S., Finegan E., Quirk R. Longman; London: 1999. Longman Grammar of Spoken and Written English. [Google Scholar]
- 34.Wilson J., Olinghouse N.G., Andrada G.N. Does automated feedback improve writing quality? Learning Disabilities: A Contemporary Journal. 2014;12(1):93–118. [Google Scholar]