Heliyon. 2025 Jan 15;11(2):e42016. doi: 10.1016/j.heliyon.2025.e42016

A systematic review of corpus-based instruction in EFL classroom

Dandan Li 1, Nooreen Noordin 1, Lilliati Ismail 1, Dan Cao 1
PMCID: PMC11786821  PMID: 39897805

Abstract

This systematic review provides a comprehensive analysis of corpus-based instruction (CBI) in English as a foreign language (EFL) classrooms. Corpus-based instruction is a teaching method based on actual data from natural language use, representing an important branch of corpus research. It serves as an auxiliary tool for foreign language teaching and is considered an effective instructional method. The review analyzes empirical studies on CBI in EFL classrooms, exploring research contexts, theoretical foundations, types of classroom activities, research methods, corpus tools, influencing factors, and the advantages and challenges of implementing CBI. It also proposes effective coping strategies. Following the PRISMA guidelines, the research team conducted an extensive search across four reputable databases (ScienceDirect, Google Scholar, Wiley Online Library, and the Web of Science) to identify empirical studies on Data-Driven Learning (DDL) and the use of corpora in language learning contexts from 2011 to 2024. A total of forty-four studies met the inclusion criteria for the final review. This systematic review underscores CBI's effectiveness in developing various English skills and promoting learner autonomy but also identifies key challenges, such as its focus on higher-level learners, the complexity of corpus use, and instructional limitations. The review suggests solutions such as comprehensive training, simplified resources and tasks, personalized learning paths, and increased resource allocation. These strategies aim to enhance the effectiveness of CBI in EFL classrooms and improve language learning outcomes.

Keywords: Corpus-based instruction (CBI), Foreign language learning (EFL), Systematic review, Data-driven learning (DDL)

1. Introduction

In the dynamic field of English as a Foreign Language (EFL) education, corpus-based instruction (CBI) has risen as a transformative approach, enriching language learning and teaching through its diverse applications and a solid theoretical grounding. In linguistic research, corpus linguistics involves the collection and analysis of authentic texts to provide evidence for describing the nature, structure, and use of languages [1]. In the 1990s, European linguists and educators proposed the use of corpora as an auxiliary means for foreign language teaching, and corpus use has since come to be considered an effective teaching method. Leech [2] argues that the utilization of corpora in language teaching can occur in two distinct ways: indirectly or directly. The indirect use of corpora refers to the application of corpus data in reference publishing, materials development, and language testing, such as the creation of dictionaries, syllabi, and teaching materials, and the construction, compilation, and selection of language tests. The direct use of corpora involves the integration of corpus data in the actual teaching process through "teaching about, teaching to exploit, and exploiting to teach". In recent years, corpora have also become a valuable resource in the field of language teaching and learning, positively impacting curriculum design, testing, and material development. They provide authentic language input, enable evidence-based teaching, and foster learner autonomy by offering students opportunities to scrutinize corpus data, formulate hypotheses, and develop rules to gain inductive insights into language. By using corpora, teachers can provide learners with more accurate and reliable information about language use and structure, while also promoting active learning and engagement.

In addition, there is a close relationship between corpus-based instruction (CBI) and data-driven learning (DDL), with both methods relying on authentic language data from corpora. DDL involves learners directly interacting with corpus data to discover linguistic patterns and rules. This approach encourages inductive learning and fosters greater learner autonomy and analytical skills [3]. While CBI is typically teacher-led, using corpus data to inform lesson content and instruction, DDL places learners in the role of language researchers, engaging them directly with corpus analysis to promote discovery-based learning [4]. CBI leverages large-scale electronic texts (corpora) to support language teaching by analyzing real language usage data. By presenting examples of actual usage, CBI encourages learners to inductively discover language rules and conventions through the analysis of real contexts, leading to more effective language acquisition. Furthermore, integrating corpus-based approaches into language teaching has the potential to revolutionize grammar instruction by providing empirical evidence of language use. CBI enables teachers to demonstrate language features as they occur in real-life contexts, bridging the gap between theoretical descriptions and practical usage, thereby enhancing learners' grammatical competence and communicative skills [5]. The integration of CBI and DDL offers a comprehensive approach to language education. CBI provides structured exposure to authentic language use under the guidance of teachers, while DDL empowers learners to take charge of their own learning through corpus exploration and analysis [6]. Incorporating corpus data into language teaching not only offers authentic examples of language use but also highlights linguistic variations and changes, providing learners with tools for their own linguistic investigations. Thus, CBI and DDL together support a more learner-centered and exploratory approach to language learning.

Early research on corpus-based instruction primarily focused on vocabulary and lexical competence, demonstrating its effectiveness in expanding learners' vocabulary through authentic language data [7,8]. This approach enhances lexical awareness and retention by exploring word meanings and collocations. Building upon this foundation, CBI has revolutionized grammar learning and instruction by using corpora to exemplify grammatical structures with real-world examples, aiding in the understanding of language mechanics and fostering a more intuitive grasp of grammar [9]. Additionally, CBI has been shown to substantially improve students' writing abilities, particularly in academic contexts, by providing models of proficient language use and supporting the development of syntactic awareness and well-organized texts [7,10]. The integration of reading materials into writing tasks has further bolstered language exposure and writing proficiency, highlighting the mutually beneficial relationship between these skills [11]. In the realm of collocation learning, CBI has proven advantageous in teaching conventional word pairings, essential for achieving fluency and naturalness in language use [12,13]. Moreover, CBI has significantly enhanced pragmatics and communicative competence by improving learners' ability to produce pragmatic routines through explicit instruction and authentic language materials [14]. Furthermore, CBI fosters learner autonomy and self-directed learning by empowering learners to take charge of their educational journey through engaging with corpora [15]. Broadly, the application of corpus linguistics in language teaching is essential for developing teaching materials and enhancing discourse awareness, providing a more empirical and autonomous approach to language learning [16,17].

Despite the increasing research on the use of corpora in classrooms, our understanding of what actually happens in the classroom remains limited. Against this backdrop, this study adopts a systematic literature review, conducting extensive searches of relevant databases and analyzing the academic literature that met the criteria. The systematic review aims to thoroughly analyze the empirical studies on corpus-based instruction (CBI) in EFL classrooms, exploring research contexts, theoretical foundations, types of classroom activities, research methods, corpus tools, influencing factors, and the advantages and challenges of implementing CBI. It also proposes effective coping strategies.

The following research questions guide this systematic review.

  1. What is the research context of CBI studies?
  2. What theories underpin the use of CBI in EFL classrooms?
  3. What classroom activities are employed in CBI studies?
  4. What types of corpus tools are used in CBI studies?
  5. What research methods are used in CBI studies?
  6. What factors influence the implementation of CBI in EFL classrooms?
  7. What are the advantages and challenges of implementing CBI in EFL classrooms?

2. Methodology

The research methodology of this study involved identifying relevant studies addressing the research questions and conducting a comprehensive analysis and synthesis of the selected articles. This systematic review adhered to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols (PRISMA-P) guidelines, which provide a checklist of essential elements for conducting systematic reviews and meta-analyses [18]. Following these guidelines ensures consistency, transparency, accountability, and integrity in the review process, thereby enhancing the overall quality of the analysis [19].

2.1. Data collection and selection process

The research team conducted an extensive search across four reputable databases (ScienceDirect, Google Scholar, Wiley Online Library, and the Web of Science) to identify empirical studies on Data-Driven Learning (DDL) and the use of corpora in language learning contexts from 2011 to 2024. The search strategy integrated keywords such as "corpus," "corpus-based instruction," "corpus linguistics," and "corpus pedagogy," alongside "EFL" and "foreign language learning." The selection of journals was based on four key criteria: the journal's prestige and editorial standards, inclusion in prominent databases, adherence to a stringent peer-review process, and the avoidance of predatory journals. Specifically, the research team considered the impact factor and ranking of journals within their respective fields, particularly those recognized in language education and applied linguistics, to ensure that the research included in the review met the highest standards of rigor and credibility. To standardize the selection process and minimize potential bias, a systematic and objective protocol was established. The initial search yielded a large pool of scholarly articles, which were then subjected to a set of inclusion and exclusion criteria.
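As an illustration of how the listed keywords might be combined into a single database query, the following is a minimal sketch. The review does not report the exact search string or the Boolean operators used, so the OR/AND structure, the function name `build_query`, and the quoting style are assumptions made here for illustration only.

```python
# Keywords quoted in the review's search strategy.
CORPUS_TERMS = [
    "corpus",
    "corpus-based instruction",
    "corpus linguistics",
    "corpus pedagogy",
]
CONTEXT_TERMS = ["EFL", "foreign language learning"]


def build_query(corpus_terms, context_terms):
    """Join each group with OR, then combine the two groups with AND.

    Assumption: the review says the corpus keywords were used "alongside"
    the context keywords; AND between groups is one plausible reading.
    """
    left = " OR ".join(f'"{term}"' for term in corpus_terms)
    right = " OR ".join(f'"{term}"' for term in context_terms)
    return f"({left}) AND ({right})"


query = build_query(CORPUS_TERMS, CONTEXT_TERMS)
print(query)
```

Individual databases differ in their query syntax, so a string like this would usually need adapting per database.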

To address potential biases related to length and accessibility, dissertations and other non-article formats were excluded. Furthermore, studies that merely referenced tools without evaluation or failed to adequately detail the implementation of Corpus-Based Instruction (CBI) were excluded. The decision to include only studies published in English was made to ensure consistency and comparability across the studies, as English is the dominant language in academic publishing, particularly in the field of EFL research. Additionally, since previous reviews have covered earlier periods, selecting 2011 as a starting point allows researchers to supplement or update existing reviews, thereby ensuring the study's uniqueness and contribution. Consequently, studies published before 2011 were excluded. This paper establishes the inclusion and exclusion criteria (Table 1) and reviews each paper to determine its eligibility for analysis.

Table 1. Inclusion and exclusion criteria.

Criterion | Inclusion | Exclusion
Type of article | Education-related journal articles | Book, book chapter, systematic review, proceedings
Language | English | Non-English
Year | 2011–2024 | <2011
Peer review | Peer-reviewed | Non-peer-reviewed
Methodology | Quantitative, qualitative, mixed method | Text analysis, ambiguously described
Term defined | Consistent with the selected terms | Inconsistent
Instruction | Experimental or case study | Not specified
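The criteria in Table 1 can be read as a screening predicate applied to each candidate record. The sketch below is illustrative only: the `Record` fields, the label strings, and the example records are hypothetical, and the "Term defined" criterion is left out because it requires a reviewer's judgment rather than a mechanical check.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical metadata for one candidate study."""
    doc_type: str        # e.g. "journal article", "book chapter"
    language: str        # e.g. "English"
    year: int
    peer_reviewed: bool
    methodology: str     # "quantitative", "qualitative", "mixed", or other
    design: str          # "experimental", "case study", or "not specified"


ACCEPTED_METHODS = {"quantitative", "qualitative", "mixed"}
ACCEPTED_DESIGNS = {"experimental", "case study"}


def is_eligible(record):
    """Return True if the record passes every mechanical criterion in Table 1."""
    return (
        record.doc_type == "journal article"
        and record.language == "English"
        and 2011 <= record.year <= 2024
        and record.peer_reviewed
        and record.methodology in ACCEPTED_METHODS
        and record.design in ACCEPTED_DESIGNS
    )


# Hypothetical records, not taken from the reviewed studies.
candidates = [
    Record("journal article", "English", 2017, True, "mixed", "experimental"),
    Record("book chapter", "English", 2019, True, "qualitative", "case study"),
    Record("journal article", "English", 2009, True, "quantitative", "experimental"),
]
included = [r for r in candidates if is_eligible(r)]
print(len(included))
```

In practice, screening of this kind is done by human reviewers; a predicate like this only documents the decision rule.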

2.2. Data analysis

The articles selected underwent a meticulous review to extract pertinent details, including full references, abstracts, publication specifics, participant demographics, study characteristics, methodologies, data handling, and research designs. A thorough examination and summary of this information were conducted to ascertain the extent to which each study addressed the research questions. On this basis, the researchers performed the PRISMA review process (Fig. 1), including identification, screening, qualification and analysis. A total of forty-four studies met the inclusion criteria for the final review. The initial categorization was verified independently by two researchers, with an additional reviewer assessing one of the studies for quality assurance. Discrepancies were addressed through collaborative discussion among the researchers and, if necessary, consultation with a third reviewer.

Fig. 1. Literature selection process.

Guided by the framework proposed by Littell et al. [20], the review process encompassed several steps.

  1. Transcribing key information from the studies into an analytical matrix.
  2. Examining data pertinent to CBI, such as the context of CBI (the distribution of the studies by year, geography, characteristics in the reviewed studies), theories, tools, activities, advantages, challenges, and factors influencing the adoption of corpora and DDL.
  3. Comparing and contrasting codes and data within categories to discern patterns, similarities, and divergences.
  4. Presenting a descriptive account of the aggregated results.
  5. Offering an interpretive synthesis of the findings.

This structured and rigorous approach ensures that the review is both methodologically sound and aligned with the highest standards of academic research.

3. Results

3.1. Research context

To establish the research context of CBI in EFL classrooms, this systematic review first analyzes the distribution of empirical studies across three key dimensions: time, geographic regions, and affiliated research institutions. These aspects provide insights into the evolution, global reach, and institutional support of CBI research, setting the stage for a deeper exploration of its theoretical foundations, classroom practices, and outcomes.

3.1.1. Distribution of articles based on time

The distribution of publications per year is shown in Fig. 2, which indicates the number of published papers related to corpus-based instruction in EFL classrooms from 2011 to 2024. From 2011 to 2014, the number of publications remained relatively stable, with each year producing one to two papers. This initial phase suggests a nascent interest in CBI within EFL contexts, likely reflecting early exploratory research in this area. The period from 2015 to 2018 marks a significant increase in publications, peaking in 2017 with five papers. This upward trend indicates growing recognition and academic interest in the potential of CBI for enhancing EFL instruction. Following 2018, the publication rate maintained a relatively high level, with a slight dip in 2020, when the number of published papers decreased to three. Despite this, the overall trend from 2018 to 2023 shows sustained interest and consistent research activity, highlighting the continued relevance and importance of CBI. In 2021, there was another peak with five publications, suggesting a rebound and renewed focus on CBI research post-pandemic. The following years, 2022 and 2023, sustained this interest with four papers each year, indicating a steady stream of research contributions. In 2024, there is only one publication. Since 2024 is not yet complete, this decline may reflect an incomplete dataset rather than a true decrease in research activity. Overall, the data suggest that the field of CBI in EFL classrooms experienced significant growth and sustained interest from 2011 to 2023, despite occasional fluctuations likely due to external factors such as the pandemic.

Fig. 2. Distribution of articles based on the timeline considered.

3.1.2. Distribution of articles based on regions

In EFL contexts, corpus-based instruction has emerged as a pivotal approach, leveraging the wealth of authentic language data to enhance teaching and learning. Fig. 3, which shows research frequency by country, underscores the global interest in and implementation of CBI. Notably, China and Turkey account for the most studies, with 11 and 8 respectively, indicating significant engagement with CBI. Meanwhile, countries such as Egypt, South Korea, Thailand, Iran, and Spain are moderately represented, with at least 2 studies each, suggesting a growing yet steady integration into EFL classrooms. The spectrum ranges from the robust presence in China and Turkey to the minimal representation of countries like Germany, where the method seems to be nascent or underexplored.

Fig. 3. Distribution of articles based on territory.

3.1.3. Distribution of research institutions

As shown in Fig. 4, the research participants can be divided into two educational levels: secondary schools and tertiary institutions. The pie chart shows that the majority of the studies (83 %) focused on tertiary institutions, indicating that most research effort in CBI has been directed at higher education settings. Secondary schools account for the remaining 17 %. This indicates that while the focus is on tertiary education, there is also a recognized need to explore CBI in secondary schools, so that this form of instruction is not overlooked at the pre-university level. The distribution underlines the need for a more balanced exploration across educational levels to enrich the understanding of CBI applications in language teaching.

Fig. 4. Distribution of research institutions.

3.2. Theoretical foundations

In the exploration of corpus-based instruction (CBI) in EFL classrooms, several theoretical frameworks guide both research and practice. As shown in Table 2, Discovery Learning and Learner Autonomy Theory is the most commonly adopted framework [21–24]. These studies promote a learner-centered approach that fosters exploration, inquiry, and the development of autonomous learning skills. This aligns with CBI's principle of positioning learners as autonomous researchers in the language learning process. Lin [23] applies discovery learning and the noticing hypothesis to design activities that enhance language acquisition. Similarly, Luo and Liao [24] emphasize inductive and discovery-oriented learning, showing how these approaches help learners uncover linguistic rules. Çalışkan and Gönen [21] highlight the synergy between discovery learning and learner autonomy, framing learners as active participants, which is key to CBI. Lee and Lin [22] further integrate levels of processing theory with discovery learning, showing that both inductive and deductive methods enhance vocabulary retention. Aşık et al. [15] explore Data-Driven Learning (DDL), reinforcing that discovery learning combined with corpus tools allows learners to engage with authentic language data and deepen their understanding of language patterns.

Table 2. Summary of theories used in CBI studies.

Theory | Count | Corresponding Paper
Discovery Learning and Learner Autonomy Theory | 7 | ID35, ID4, ID6, ID12, ID21, ID2, ID3
Corpus Linguistics Theory | 5 | ID26, ID18, ID24, ID25, ID37
Constructivism | 2 | ID1, ID27
Social Constructivism | 2 | ID7, ID33
Sociocultural Theory | 2 | ID34, ID29
Usage-Based Theory | 1 | ID30
Metalinguistic Awareness and Metacognition Theory | 1 | ID22

Corpus Linguistics Theory is also prominent, guiding five studies that rely on the empirical analysis of language use [25–27]. These studies provide valuable insights into authentic language usage. Oktavianti et al. [26] emphasize the role of corpus linguistics in revealing patterns of real-world language use. Jamal et al. [25] underscore its empirical nature, focusing on the use of large collections of natural texts, computational tools, and a blend of quantitative and qualitative techniques. Phoocharoensil [27] highlights corpus linguistics as a reliable guide for language use and research.

Constructivism and Social Constructivism, each represented in two studies, emphasize learners' active role in constructing knowledge through interaction and collaboration. Eman and Hossam [7] highlight the strategic use of corpora to build cognitive and metacognitive processes. Ozer and Özbay [9] extend this by framing learners as researchers, emphasizing their agency in the learning journey. Social constructivism further enriches this by emphasizing the social nature of knowledge construction and the teacher's role in facilitating active learning [8]. Zhong and Wakat [10] integrate schema theory, underscoring the importance of interaction and cooperation in contextual learning.

Vygotsky's sociocultural theory explains the mediated process of learning through social interaction, particularly scaffolding within the Zone of Proximal Development (ZPD), which is critical for CBI. It provides a structured yet flexible learning approach, guiding learners through challenging concepts [28,29]. Usage-Based theories and Second Language Acquisition (SLA) perspectives, as explored by Rodríguez-Fuentes and Swatek [30], align with CBI by promoting language learning through experience and use, reinforcing the importance of frequency and exposure in language acquisition. Lastly, Meta-Linguistic Awareness and Metacognitive theories focus on the role of higher-order thinking skills in language learning, enhancing learners' self-regulation [31].

In summary, the integration of these theoretical frameworks in CBI offers a comprehensive, learner-centered, socially interactive, and empirically informed educational paradigm. These theories provide a solid foundation for ongoing innovation in EFL pedagogy, equipping learners with the skills needed to navigate the complexities of language learning in the 21st century.

3.3. Classroom activities

This review classifies corpus-based instructional activities into direct and indirect uses of data-driven learning (DDL), highlighting the flexibility of corpus-based instruction (CBI) in English as a foreign language (EFL) settings. According to Leech [2], direct DDL (hands-on) involves learners interacting directly with corpus data through tools like concordance lines, corpus software, and data analysis exercises. In contrast, indirect DDL (hands-off) uses corpus-informed resources such as textbooks and syllabi, allowing learners to benefit from corpus-derived materials without directly engaging with the corpus.
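To make the notion of concordance lines concrete, the following is a minimal keyword-in-context (KWIC) sketch of the kind of output learners inspect in hands-on DDL. It is not the implementation of any tool named in this review; the function name `kwic`, the character-based context window, and the sample sentences are assumptions for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every whole-word match.

    Each line shows up to `width` characters of left and right context,
    padded so that the keyword lines up in a single column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Hypothetical mini-corpus; a real activity would draw on a full corpus.
sample = (
    "She made a decision quickly. They made progress on the task. "
    "He made an effort to help."
)
for line in kwic(sample, "made", width=20):
    print(line)
```

Aligning the keyword in one column lets learners scan the words to its right (here "a decision", "progress", "an effort") and infer collocation patterns inductively.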

As presented in Table 3, most studies reviewed favor the hands-on approach, where learners actively work with corpora. For example, Abdel-Haq et al. [7] and Smirnova [32] designed activities involving error correction and language exploration using corpus tools. Luo and Liao [24] focused on self-correction with BFSU CQP web, and Moon and Oh [33] examined learners’ interaction with corpora to develop language hypotheses. These studies illustrate how direct manipulation of corpora enhances learners' understanding of vocabulary, grammar, and collocations through structured exercises. Additionally, some studies, like Fang et al. [34] and Poole [35], emphasized guided DDL practice to help students transition from instructional support to independent corpus use.

Table 3. The learning activities of CBI studies.

Category | Corresponding Paper
DDL Hands-on (Direct Use) | ID1, ID2, ID3, ID4, ID5, ID6, ID10, ID12, ID13, ID16, ID17, ID27, ID28, ID29, ID42, ID44, ID26, ID40, ID41, ID43, ID32
DDL Hands-off (Indirect Use) | ID7, ID9, ID11, ID30, ID33, ID34, ID35, ID37, ID38, ID36

Conversely, the indirect hands-off approach focuses on using corpus data for material development rather than direct learner engagement. Youssef [36] used corpus-based materials in the classroom without requiring direct corpus interaction from students, and Zhong and Wakat [10] developed grammar lessons based on corpus analysis without involving learners in corpus searches. This approach demonstrates how corpus-based instruction can enhance teaching materials without direct learner manipulation. Some studies combine both approaches. For example, Boontam [37] involved learners in corpus data analysis but used pre-compiled materials.

Overall, most studies favor direct DDL, where learners actively engage with corpus data, enhancing their language proficiency. However, the indirect approach also contributes significantly by informing material development. This dual approach shows that direct corpus interaction fosters learner autonomy and inquiry, while indirect use through material development complements traditional teaching methods. The reviewed studies highlight the adaptability of DDL in EFL instruction, offering diverse strategies for incorporating corpora into pedagogy. Both direct and indirect applications provide valuable methods for enhancing language competency, with hands-on activities promoting independent language discovery and hands-off DDL enriching teaching materials.

3.4. Type of corpus tools

As seen in Table 4, the analysis reveals a diverse deployment of technological applications across studies. Concordancing software, such as AntConc, emerges as a widely implemented tool, facilitating language analysis and learner engagement with linguistic data. This is evidenced by its use in multiple studies [11,38]. The British National Corpus (BNC) and the Corpus of Contemporary American English (COCA) are frequently cited resources, each utilized in several studies to enhance vocabulary acquisition and grammatical understanding. For example, Qoura [39] and Mahbashi [8] demonstrate the effective use of these resources. Other tools, such as Sketch Engine and Lextutor, are also noteworthy for supporting DDL activities and corpus-assisted language learning [37,40,41]. The array of tools employed, such as Word Smith, BFSU CQP web, Schoology LMS, Web Para News, and Lago Word Profiler, reflects a commitment to harnessing technology to enhance EFL instruction, providing learners with rich, data-driven learning experiences.

Table 4. Distribution of corpus tools.

Corpus Tools | Count | Corresponding Paper
AntConc | 6 | ID14, ID7, ID18, ID27, ID29, ID2
BNC | 1 | ID7
ACAT | 1 | ID31
Wikipedia corpus | 1 | ID18
Word Smith | 1 | ID5
Sketch Engine | 1 | ID13
BFSU CQP web | 1 | ID4
Schoology LMS | 1 | ID27
Web Para News | 1 | ID28
Lago Word Profiler | 1 | ID28
Lex tutor | 2 | ID10, ID36
Collins Collocation Dictionary | 1 | ID39
Lancs Box | 1 | ID38

3.5. Research method

The reviewed empirical studies used a range of research methods. As shown in Fig. 5, mixed-methods research (MMR) predominates, followed by quantitative and qualitative approaches. Mixed-methods research accounts for 56 % of the reviewed studies. This methodological preference underscores the value placed on integrating both qualitative and quantitative data to gain a holistic understanding of CBI's impact. Quantitative methods constitute 33 % of the studies, emphasizing the importance of measurable, statistical data in assessing educational outcomes. Qualitative methods, while less frequent at 11 %, are crucial for capturing in-depth, contextual insights into learners' and educators' experiences.

Fig. 5. The design of the study.

3.6. Influencing factors in the application of CBI in EFL classrooms

This literature review aims to synthesize and analyze the key determinants that facilitate the effective integration of CBI within EFL instructional settings. Drawing from an extensive review of scholarly studies, five overarching factors have been identified as critical to the success of CBI in EFL contexts: individual learner factors, teacher-related factors, technological and resource factors, curriculum and pedagogical factors, and social and cultural factors.

3.6.1. Individual learner factors

The direct approach to CBI, where learners engage with corpus data autonomously, presents unique challenges. Abdel-Haq and Ali [7] found that some learners find corpus consultation daunting due to the sheer volume of data, and not all are predisposed to inductive learning. Moon and Oh [33] noted that learners' unfamiliarity with DDL and the cognitive demands of data analysis can impede effectiveness, particularly for lower-level learners who may benefit more from a guided approach. Li [42] emphasized that prior knowledge significantly influences the use of corpora and DDL. Luo and Liao [24] highlighted the importance of language proficiency and error types in CBI effectiveness. Al-Mahbashi et al. [8] pointed out that access to technology and familiarity with corpus use are essential for effective DDL. To address these challenges, the indirect approach to CBI, where teachers mediate corpus findings, might be more appropriate for some contexts, as it aligns with Boulton's [4] theory of scaffolding learner autonomy.

3.6.2. Teacher-related factors

Teacher training and familiarity with corpus tools are foundational for the successful integration of CBI. Çalışkan et al. [21] and Qoura et al. [39] stressed the importance of teacher training in corpus tool usage and activity design. Smirnova [32] cautioned against over-reliance on corpus data without adequate teacher guidance. Jamal et al. [25] found that teachers' decisions to incorporate CBI are influenced by student needs, curriculum design, and classroom organization. Lin [23] and Ma et al. [40] noted that without proper training, teachers may be less inclined to adopt corpus-based methods. The indirect approach to CBI, which relies on teacher mediation, might be more suitable in contexts where teacher training is limited, as it leverages the teacher's role in guiding learners through the corpus analysis process.

3.6.3. Technological and resource factors

Access to technology and the practicality of integrating corpus tools into the classroom are significant factors. Kartal [12] and Li [42] highlighted the importance of technological tools and the novelty of the CBI approach. Youssef [36] emphasized the need for training on corpus use and the integration of CBI with conventional teaching methods. Boontam [37] noted that the quality and appropriateness of corpus data for different proficiency levels are crucial for CBI effectiveness. Zhong and Wakat [10] added that student motivation, prior knowledge, and learning styles can influence the success of corpus-integrated lessons.

3.6.4. Curriculum and pedagogical factors

The integration of CBI into the curriculum and its alignment with traditional teaching methods are essential. Fang et al. [34] identified limited access to electronic devices and the need for prolonged training as challenges. Boulton et al. [43] emphasized the importance of corpus size, representativity, and learners' L1 background in corpus-informed learning. Rodríguez-Fuentes and Swatek [30] highlighted the significance of textbook design, information presentation based on frequency, and attention to register patterns in the effectiveness of corpus-informed materials.

3.6.5. Social and cultural factors

Social recognition and the cultural context in which CBI is implemented also influence its effectiveness. Girgin [44] noted that learners' familiarity with technology and the availability of appropriate corpus resources are crucial. Tokdemir et al. [45] mentioned that the first language background significantly affects collocation use.

In conclusion, the successful implementation of CBI in EFL classrooms is influenced by a range of factors, including individual learner characteristics, teacher-related factors, technological and resource availability, curriculum integration, and social and cultural contexts. Boulton's concept of direct and indirect approaches to CBI provides a theoretical framework for understanding how these factors can be addressed to enhance the effectiveness of CBI in improving EFL learners' language skills. Comprehensive training, appropriate resource allocation, and careful curriculum design, informed by both direct and indirect CBI methodologies, can significantly enhance the integration and outcomes of CBI in EFL settings.

3.7. Advantages and challenges of CBI in EFL classrooms

3.7.1. Advantages

Corpus-based instruction and data-driven learning have garnered considerable attention in the field of English as a Foreign Language education due to their potential to enhance language learning through authentic language use and discovery learning. This section synthesizes the advantages identified in the reviewed studies, categorizing them into themes that reflect their contributions to language skills, learner autonomy, and instructional efficacy. Fig. 6 shows the distribution of references across the subcategories of advantages associated with corpus-based instruction in EFL classrooms. Each bar corresponds to a specific subcategory and indicates the number of references supporting it.

Fig. 6. Advantages of corpus-based instruction.

One of the primary advantages of corpus-based instruction is its ability to improve language knowledge and writing skills. Several studies found that this approach can improve students' writing performance, vocabulary usage, and motivation [34,46,47]. Furthermore, previous studies showed that EFL learners' vocabulary and collocation skills also improved after corpus-based instruction [21,42,45].

In addition to improving specific language skills, corpus-based instruction also fosters autonomous learning and discovery. Several studies report improved learner autonomy [27,48,49]. Moreover, Lin [23] and Yoon et al. [41] emphasize that corpus-based instruction raises learners' grammar awareness and supports self-directed study. Previous research has shown that by creating an active, discovery-based learning environment, learners can engage in autonomous learning through the use of authentic language data from online corpora [9,24,37,50]. Furthermore, Smirnova [32] and Jamal et al. [25] report that these methods increase learning motivation and interest.

Another significant advantage of corpus-based instruction is the access to and use of authentic language data. Research shows that learners improve their language abilities through exposure to real language examples, which are crucial for effective vocabulary learning [12,15,36]. Additionally, Pérez-Paredes et al. [49] and Phoocharoensil [27] highlight the benefits of using corpus data to increase learner engagement and provide real learning experiences [26,43,51].

In addition, corpus-based instruction significantly enhances learners' awareness of language patterns. Several studies demonstrate how this approach leads to a better understanding of language patterns and raises overall language awareness [31,33]. Similarly, Zhong and Wakat [10] emphasize the enhancement of language awareness, learning abilities, and critical thinking skills through DDL. Finally, CBI increases learner engagement and provides valuable feedback. Some studies show that using authentic examples and data significantly boosts learner engagement [52,53]. Pérez-Paredes et al. [49] and Phoocharoensil [27] echo these findings, emphasizing the role of real language use in enhancing student participation and motivation.

To conclude, corpus-based instruction and DDL offer numerous advantages in EFL education. These methods enhance language skills, promote autonomous learning, provide access to authentic language data, raise language awareness, and increase learner engagement. Addressing these advantages through targeted training, support, and instructional design can help maximize the effectiveness of corpus-based instruction in EFL classrooms, ultimately leading to improved language learning outcomes.

3.7.2. Challenges

Corpus-based instruction has become increasingly popular in English education. However, the literature reveals several challenges that affect both teachers and students. As shown in Table 5, technical challenges are at the forefront of these issues. The complexity and time-consuming nature of using corpus tools are often cited as barriers. Learners find corpus consultation difficult because of the large volume of data, and preparing materials takes considerable time; several studies [12,21,46,54] echo these concerns, emphasizing the complexity of interfaces and the need for effective use. Additionally, Li [42] and Al-Mahbashi et al. [8] highlight the necessity of initial training and ongoing support for effective implementation, which is a significant challenge in itself. Moreover, the integration of technology into the classroom further complicates matters [39,41].

Table 5. Challenges of corpus-based instruction.

Challenges | Details | Corresponding Paper
Technical Challenges | Complexity and Time Consumption | ID1, ID6, ID8, ID42
Technical Challenges | Initial Training and Support | ID2, ID9, ID11
Technical Challenges | Teacher Training and Presentation | ID7, ID36
Learner-Related Challenges | Suitability for Lower-Level Learners | ID3, ID13
Learner-Related Challenges | Understanding and Analyzing Data | ID5, ID20
Instructional Design and Implementation Challenges | Data Analysis and Task Execution | ID4, ID21, ID23
Instructional Design and Implementation Challenges | Course Design and Support | ID26, ID29
Other Challenges | Individual Differences and Long-Term Effectiveness | ID12, ID37
Other Challenges | Increased Workload and Limited Resources | ID35, ID30
Other Challenges | Vocabulary Development | ID32, ID39

For EFL learners, the suitability of CBI is also a concern, particularly for those at lower proficiency levels. Smirnova [32] and Boontam [37] discuss the potential inapplicability of corpus-based methods for lower-level students and the need for teacher support to overcome initial difficulties. Moon and Oh [33] and Girgin [52] also point out the challenges students face in understanding and analyzing corpus data, which can lead to information overload. Furthermore, the complexity and time requirements for data analysis and task execution are significant instructional challenges. Luo [55] and Aşık et al. [15] note the time-consuming nature of data analysis and the complexity of using corpora for task execution. Pérez-Paredes et al. [49] add that learners struggle with formulating queries and managing search pattern complexity. Oktavianti et al. [26] and Qiu [29] discuss the complexities of task design and the need for instructional support in DDL courses. In addition, individual learner differences and the long-term effectiveness of DDL are critical considerations. Two studies highlight the importance of considering these factors, as well as the potential need for greater emphasis on comprehension and explicit instruction to achieve long-term retention [22,56]. Some scholars also draw attention to the increased workload for teachers and the limitations of resources, which can hinder the development of vocabulary depth among EFL learners [23,30,42].

While corpus-based instruction and DDL offer numerous benefits for EFL learning, these challenges must be addressed to maximize their effectiveness. By incorporating targeted training, support, and instructional design, we can help overcome these obstacles and ensure that the potential of CBI and DDL is fully realized in EFL classrooms.

4. Implications

This systematic review demonstrates the significant potential of Corpus-Based Instruction in enhancing English as a Foreign Language education. The findings suggest that CBI can substantially improve learners' vocabulary, grammar, writing, reading, and communicative skills while promoting autonomous learning. However, several challenges were identified, and strategies to address these issues were proposed. To begin with, technical simplification is essential for overcoming challenges such as complexity and time consumption. Developing user-friendly corpus tools and interfaces, along with integrating automation technologies to expedite data processing and material preparation, can make CBI more accessible and efficient for both teachers and learners [12,21,46,54]. Furthermore, comprehensive training and support are crucial. Initial training and ongoing professional development opportunities should be provided to teachers, and establishing online resources and communities can offer continuous support and facilitate experience sharing among educators [8,13,36]. Additionally, specialized training on integrating technology into the classroom, combined with workshops and seminars showcasing best practices, can significantly enhance teachers' capabilities [39,41].

Moreover, adapting tasks to be more accessible for lower proficiency learners is imperative. Tasks should avoid over-reliance on data and provide guidance frameworks and examples to help these learners understand and apply corpus data effectively [32,37]. Guided tasks and progressive learning activities can alleviate difficulties in data analysis and information overload, while visualizing data can simplify its presentation and comprehension [33,52]. In terms of instructional design and implementation, leveraging automated analysis tools and pre-set task templates can simplify the data analysis process [15,24,49]. Establishing standardized task execution processes and guidelines can reduce the burden on both teachers and students. Furthermore, integrating support mechanisms such as tutorials and consultation times into course design can manage task complexity and provide additional assistance [26,29]. Collaborative learning environments can also encourage mutual support among students during task execution.

In addition, allocating sufficient resources and designing personalized learning paths can further enhance the effectiveness of CBI. These strategies cater to the diverse needs of learners, ensuring that all students, regardless of proficiency level, benefit from corpus-based instruction. The review also underscores the importance of continued research in several areas. Future studies should include broader samples encompassing learners from different regions, languages, and educational levels, providing a more comprehensive understanding of CBI's effectiveness across various contexts. Interdisciplinary research should explore the application of CBI in different educational fields and disciplines, uncovering new insights and innovative applications of corpus-based instruction. Future research should also consider the impact of cultural and social factors on the implementation of CBI. Designing CBI activities that are culturally relevant and suitable for different socio-cultural backgrounds can enhance their effectiveness and acceptance. The dominance of mixed-methods research in current studies indicates the value of combining quantitative and qualitative approaches. Future research should continue leveraging mixed-methods approaches to provide a balanced and nuanced understanding of CBI's applications and implications in EFL classrooms. Furthermore, exploring innovative tools and techniques can further enrich the understanding of CBI's impact on language education.

5. Limitations

The scope of this review is limited as it concentrates exclusively on English-language papers published between 2011 and 2024, sourced from specific databases using predetermined keywords. This approach may have inadvertently excluded relevant research published in other languages, across different time periods, or found in other databases. Additionally, the review may overlook variations in educational policies, curriculum standards, and teaching practices across different countries and regions, which could affect the implementation and evaluation of CBI. Furthermore, due to the rapid pace of technological advancement, certain aspects of CBI, such as learning tools, strategies, or activities, may quickly become outdated or evolve. Consequently, ongoing research is essential to document these changes and adapt to the evolving nature of CBI.

6. Conclusion

This systematic review seeks to provide a comprehensive analysis of the application of Corpus-Based Instruction in English as a Foreign Language classrooms. The analysis, addressing a series of research questions, reveals that CBI has significant potential for enhancing students' English proficiency, including vocabulary, grammar, writing, reading, and communicative skills, as well as fostering students' autonomous learning abilities. However, challenges and limitations in CBI implementation are also evident. While various classroom activities have been developed to target different skills and purposes, most studies focus on writing and vocabulary development among proficient learners in higher education. This uneven focus on specific proficiency levels restricts the broader application of CBI. Moreover, the complexity and time-consuming nature of corpus use necessitate initial training and support for both teachers and learners. This requirement may be challenging for lower-level students, who could struggle with understanding and analyzing data and may become overly reliant on corpus data. Instructional design and implementation also present challenges. In response, the review suggests several strategies to address these issues, including comprehensive training, simplified resources and tasks, personalized learning paths, and increased resource allocation. These strategies aim to enhance the effectiveness of CBI in EFL classrooms and improve language learning outcomes.

Nonetheless, the review acknowledges several limitations. Future research should involve broader samples, including learners from diverse regions, languages, and educational levels. Interdisciplinary studies are encouraged to explore CBI's application across different educational fields and disciplines. Addressing these limitations will allow for a deeper and more comprehensive exploration of corpus and Data-Driven Learning in language education, providing a stronger theoretical and empirical foundation for practice.

CRediT authorship contribution statement

Dandan Li: Writing – review & editing, Writing – original draft, Visualization, Validation, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization, Project administration. Nooreen Noordin: Writing – review & editing, Supervision, Conceptualization. Lilliati Ismail: Writing – review & editing, Supervision, Conceptualization. Dan Cao: Writing – review & editing, Supervision.

Data availability statement

Data included in article/supplementary material/referenced in article.

Additional information

No additional information is available for this paper.

Funding statement

This systematic review received no specific funding or financial support.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Details of the reviewed studies

ID | Reference | Country | Institution | Learning Activities | Research method
ID1 | [7] | Egypt | Tertiary School | Concordance software, language analysis | Mixed-methods approach
ID2 | [13] | China | Tertiary School | Corpus-assisted method, inductive learning | Mixed-methods approach
ID3 | [32] | Russia | Tertiary School | Vocabulary teaching, writing instruction | Mixed-methods approach
ID4 | [24] | China | Tertiary School | Error correction, concordance analysis | Mixed-methods approach
ID5 | [33] | South Korea | Secondary School | DDL induction, concordance line analysis | Mixed-methods approach
ID6 | [21] | Turkey | Tertiary School | Concordance line analysis, language pattern | Qualitative research design
ID7 | [39] | Egypt | Tertiary School | Corpus-Based Program, writing skills | Quasi-experimental design (Quantitative)
ID8 | [12] | Turkey | Tertiary School | Not provided | Mixed-methods approach
ID9 | [8] | Yemen | Not specified | Concordance programs, active learning | Quantitative
ID10 | [34] | China | Secondary School | Corpus training, collocation errors | Quantitative
ID11 | [36] | Saudi Arabia | Tertiary School | Direct corpora use, language pattern | Quantitative
ID12 | [22] | China | Tertiary School | Vocabulary learning, inductive and deductive approaches | Quantitative
ID13 | [37] | Thailand | Tertiary School | Concordance line analysis, self-discovery | Quantitative
ID14 | [11] | Malaysia | Secondary School | Not provided | Mixed-method
ID15 | [48] | Iran | Secondary School | Corpus-based instruction, vocabulary enhancement | Quantitative
ID16 | [53] | Turkey | Tertiary School | Concordancing, collocation exercises | Quantitative
ID17 | [40] | China | Not specified | Corpus-based lesson design, inductive discovery | Mixed methods design
ID18 | [25] | Pakistan | Tertiary School | Not provided | Qualitative
ID19 | [14] | Algeria | Tertiary School | Not provided | Experimental
ID20 | [52] | Turkey | Tertiary School | Not provided | Mixed method approach
ID21 | [57] | Turkey | Tertiary School | Not provided | Mixed method
ID22 | [31] | China | Tertiary School | Not provided | Quantitative
ID23 | [49] | Spain | Tertiary School | Not provided | Mixed-methods approach
ID24 | [27] | Thailand | Tertiary School | Not provided | Mixed-methods approach
ID25 | [43] | Not specified | Tertiary School | Not provided | Mixed-methods approach
ID26 | [26] | Indonesia | Tertiary School | Not provided | Survey approach
ID27 | [38] | Turkey | Secondary School | ACAT, self-guided learning | Mixed-methods
ID28 | [58] | Japan | Tertiary School | KWIC concordance, lexical profiling | Quantitative
ID29 | [29] | China | Tertiary School | Discipline-specific corpus, linguistic features | Mixed methods
ID30 | [30] | Colombia | Tertiary School | Not provided | Quantitative
ID31 | [9] | Turkey | Tertiary School | Corpus-based teaching, AntConc software | Mixed-methods
ID32 | [59] | China | Secondary School | Not provided | Empirical study
ID33 | [10] | China | Secondary School | Corpus-integrated lessons, grammar instruction | Mixed methods
ID34 | [60] | Ethiopia | Tertiary School | Corpus-based teaching, academic writing | Quasi-experimental research design
ID35 | [23] | China | Tertiary School | DDL, linguistic data observation | Mixed approach
ID36 | [41] | South Korea | Tertiary School | Error correction, language detective role | Mixed methods
ID37 | [56] | Turkey | Tertiary School | Dialogue recognition and production | Quasi-experimental
ID38 | [61] | Italy | Tertiary School | Corpus analysis, concordance searches | Case studies
ID39 | [62] | Iran | Not specified | Concordance practice, collocations | Quantitative
ID40 | [63] | Egypt | Secondary School | Concordances, functional lexical bundles | Qualitative instrument
ID41 | [64] | China | Tertiary School | Corpus consultation, self-access learning | Mixed-methods approach
ID42 | [54] | Turkey | Tertiary School | Corpus searches, corpus-based activities | Mixed-methods
ID43 | [65] | Germany | Tertiary School | Concordance lines exercises, business lexical items | Mixed-methods
ID44 | [66] | Indonesia | Tertiary School | Corpora teaching, vocabulary, grammar, writing | Mixed methods

References

  • 1. McEnery T., Gabrielatos C. English corpus linguistics. Handb. Engl. Linguist. 2006:33–71.
  • 2. Leech G. Routledge; 2014. Teaching and Language Corpora: A Convergence; pp. 1–24.
  • 3. Johns T. 1991. Should You Be Persuaded: Two Samples of Data-Driven Learning Materials, Na.
  • 4. Boulton A. Data‐driven learning: taking the computer out of the equation. Lang. Learn. 2010;60:534–572. doi: 10.1111/j.1467-9922.2010.00566.x.
  • 5. Conrad S. Will corpus linguistics revolutionize grammar teaching in the 21st century? Tesol Q. 2000;34:548–560.
  • 6. O'Keeffe A., McCarthy M., Carter R. Cambridge University Press; 2007. From corpus to classroom: language use and language teaching; pp. 58–79. https://www.google.com/books?hl=zh-CN&lr=&id=O3k0tWLp5LYC&oi=fnd&pg=PA10&dq=%E2%80%A2%09O%E2%80%99Keeffe,+A.,+McCarthy,+M.,+%26+Carter,+R.+(2007).+From+Corpus+to+Classroom:+Language+Use+and+Language+Teaching.+&ots=4tiEMl7d1A&sig=p-kQU2dDUENGSZiHUb8J1OZcmYE (accessed June 5, 2024).
  • 7. Abdel-Haq E.M., Ali H.S. Utilizing the corpus approach in developing EFL writing skills. J. Res. Curric. Instr. Educ. Technol. 2017;3:11–44.
  • 8. Al-Mahbashi A., Noor N.M., Amir Z. The effect of data driven learning on receptive vocabulary knowledge of Yemeni University learners. 3L Lang. Linguist. Lit. 2015;21
  • 9. Özer M., Özbay A.S. Exploring the data-driven approach to grammar instruction in the ELT context of Turkey. Aust. J. Appl. Linguist. 2022;5:35–63.
  • 10. Zhong X., Wakat G. Enhancing grammar proficiency of EFL learners through corpus-integrated lessons. Ampersand. 2023;11 doi: 10.1016/j.amper.2023.100139.
  • 11. Habibi H., Salleh A.H., Sarjit Singh M.K. The effect of reading on improving the writing of EFL students. Pertanika J. Soc. Sci. Humanit. 2015;23
  • 12. Kartal G. The effects of using corpus tools on EFL student teachers' learning and production of Verb-Noun collocations. PASAA. 2018;55:100–125. doi: 10.58837/CHULA.PASAA.55.1.5.
  • 13. Li S. 2017. Using Corpora to Develop Learners' Collocational Competence.
  • 14. Bouzekria H., Mashaqba B., Al Khalaf E., Huneety A. Production of pragmatic routines by Algerian EFL learners: the effect of corpus-based instruction. Ampersand. 2023;10 doi: 10.1016/j.amper.2023.100122.
  • 15. Aşık A., Vural A.Ş., Akpınar K.D. Lexical awareness and development through data driven learning: attitudes and beliefs of EFL learners. J. Educ. Train. Stud. 2015;4:87–96. doi: 10.11114/jets.v4i3.1223.
  • 16. Li S., Xu M. Corpus literacy empowerment: taking stock of research to look forward for practice. J. China Comput.-Assist. Lang. Learn. 2022;2:126–155.
  • 17. Vyatkina N., Boulton A. Corpora in language teaching and learning. Lang. Learn. Technol. 2017;21:66–89.
  • 18. Moher D., Liberati A., Tetzlaff J., Altman D.G., the PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann. Intern. Med. 2009;151:264–269. doi: 10.7326/0003-4819-151-4-200908180-00135.
  • 19. Page M.J., McKenzie J.E., Bossuyt P.M., Boutron I., Hoffmann T.C., Mulrow C.D., Shamseer L., Tetzlaff J.M., Akl E.A., Brennan S.E. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372 doi: 10.1136/bmj.n71. https://www.bmj.com/content/372/bmj.n71.short
  • 20. Littell J.H., Corcoran J., Pillai V. Oxford University Press; 2008. Systematic reviews and meta-analysis; pp. 85–120. https://www.google.com/books?hl=zh-CN&lr=&id=UpsRDAAAQBAJ&oi=fnd&pg=PR5&dq=J.H.+Littell,+J.+Corcoran,+V.+Pillai,+Systematic+Reviews+and+Meta-Analysis,+Oxford+University+Press,+Oxford,+United+Kingdom,+2008.&ots=ViU45jlpfb&sig=Be93CmPrxKDhwISW5xtwPYphClg
  • 21. Çalışkan G., Gönen S.İ.K. Training teachers on corpus-based language pedagogy: perceptions on using concordance lines in vocabulary instruction. J. Lang. Linguist. Stud. 2018;14:190–210.
  • 22. Lee P., Lin H. The effect of the inductive and deductive data-driven learning (DDL) on vocabulary acquisition and retention. System. 2019;81:14–25. doi: 10.1016/j.system.2018.12.011.
  • 23. Lin M.H. Effects of corpus‐aided language learning in the EFL grammar classroom: a case study of students' learning attitudes and teachers' perceptions in Taiwan. Tesol Q. 2016;50:871–893.
  • 24. Luo Q., Liao Y. Using corpora for error correction in EFL learners' writing. J. Lang. Teach. Res. 2015;6:1333.
  • 25. Jamal J., Shafqat A., Afzal E. Teachers' perceptions of incorporation of corpus-based approach in English language teaching classrooms in Karachi, Pakistan. Lib. Arts Soc. Sci. Int. J. LASSIJ. 2021;5:611–629.
  • 26. Oktavianti I.N., Eriani E., Rolyna I., Prayogi I. Investigating the use of corpus-informed grammar materials in Indonesian EFL classrooms. Indones. J. Engl. Lang. Teach. Appl. Linguist. 2023;7:417–438.
  • 27. Phoocharoensil S. Language corpora for EFL teachers: an exploration of English grammar through concordance lines. Procedia - Soc. Behav. Sci. 2012;64:507–514. doi: 10.1016/j.sbspro.2012.11.060.
  • 28. Amare T. 2022. Effects of Corpus-Based Instruction on EFL Students' Academic Writing Skills, Critical Thinking Skills, Engagement, and Perception.
  • 29. Qiu X. 2024. Exploring the Effect of Corpus‐Based Writing Instruction on Learner‐Corpus Interaction in L2 Revision: A Study of Chinese EFL Disciplinary Writers. TESOL Q.
  • 30. Rodríguez-Fuentes R.A., Swatek A.M. Exploring the effect of corpus-informed and conventional homework materials on fostering EFL students' grammatical construction learning. System. 2022;104 doi: 10.1016/j.system.2021.102676.
  • 31. Yang Y.-F., Wong W.-K., Yeh H.-C. Learning to construct English (L2) sentences in a bilingual corpus-based system. System. 2013;41:677–690. doi: 10.1016/j.system.2013.07.014.
  • 32. Smirnova E.A. Using corpora in EFL classrooms: the case study of IELTS preparation. RELC J. 2017;48:302–310.
  • 33. Moon S., Oh S.-Y. Unlearning overgenerated be through data-driven learning in the secondary EFL classroom. ReCALL. 2018;30:48–67.
  • 34. Fang L., Ma Q., Yan J. The effectiveness of corpus-based training on collocation use in L2 writing for Chinese senior secondary school students. J. China Comput.-Assist. Lang. Learn. 2021;1:80–109.
  • 35. Poole R. “Corpus can be tricky”: revisiting teacher attitudes towards corpus-aided language learning and teaching. Comput. Assist. Lang. Learn. 2022;35:1620–1641.
  • 36. Youssef A.F.F.-A. The effectiveness of corpus-based approach on vocabulary learning gains and retention in Saudi tertiary EFL context. J. Engl. Lang. Teach. Appl. Linguist. 2020;2:1–15.
  • 37. Boontam P. The effect of teaching English synonyms through data-driven learning (DDL) on Thai EFL students' vocabulary learning. Shanlax Int. J. Educ. 2022;10:80–91.
  • 38. Ozer M., Ozbay A.S. 2021. Exploring the Effectiveness of DDL in an L2 Context through a Non-control-group Asynchronous Experimental Design.
  • 39. Qoura Y.A., Hassan B., Mostafa A. The impact of corpus-based program on enhancing the EFL student teachers' writing skills and self-autonomy. J. Res. Curric. Instr. Educ. Technol. 2018;4:11–53.
  • 40. Ma Q., Tang J., Lin S. The development of corpus-based language pedagogy for TESOL teachers: a two-step training approach facilitated by online collaboration. Comput. Assist. Lang. Learn. 2022;35:2731–2760. doi: 10.1080/09588221.2021.1895225.
  • 41. Yoon H., Jo J. 2014. Direct and indirect access to corpora: an exploratory case study comparing students' error correction and learning strategy use in L2 writing.
  • 42. Li H. Research on task-driven lexical chunk teaching model based on corpus. J. Liaoning Univ. Technol. Soc. Sci. Ed. 01. 2017:59–61. doi: 10.15916/j.issn1674-327x.2017.01.018.
  • 43. Boulton A., Carter-Thomas S., Rowley-Jolivet E. Issues in corpus-informed research and learning in ESP. Corpus Inf. Res. Learn. ESP. 2012:1–14.
  • 44. Girgin U. The effectiveness of using corpus-based activities on the learning of some phrasal-prepositional verbs. Turk. Online J. Educ. Technol.-TOJET. 2019;18:118–125.
  • 45. Demirel E.T., Kazazoğlu S. The comparison of collocation use by Turkish and Asian learners of English: the case of TCSE corpus and ICNALE corpus. Procedia - Soc. Behav. Sci. 2015;174:2278–2284. doi: 10.1016/j.sbspro.2015.01.887.
  • 46. Abdel-Haq D.E.M. vol. 3. 2017. (Utilizing the Corpus Approach in Developing EFL Writing Skills).
  • 47. Syed N.M., Quraishi U., Kazi A.S. 2019. English Language Textbook and Development of Oral Communicative Competence in Grade VIII Students of Public Sector Schools in Punjab.
  • 48. Ashkan L., Seyyedrezaei S.H. The effect of corpus-based language teaching on Iranian EFL learners' vocabulary learning and retention. Int. J. Engl. Ling. 2016;6:190–196.
  • 49. Pérez-Paredes P., Sánchez-Tornel M., Calero J.M.A. Learners' search patterns during corpus-based focus-on-form activities: a study on hands-on concordancing. Int. J. Corpus Linguist. 2012;17:482–515.
  • 50. Kuzminykh I.A., Khoroshilova S.P. Investigating the impact of corpus-based classroom activities in English phonetics classes on students' academic progress. Novosib. State Pedagog. Univ. Bull. 2017;7:40–51. doi: 10.15293/2226-3365.1704.03.
  • 51. Ozer M., Ozbay A.S. Exploring the effectiveness of DDL in an L2 context through a non-control-group asynchronous experimental design. 2021.
  • 52. Girgin U. Perceptions of Turkish EFL student teachers towards learning phrasal-prepositional verbs through corpus-based materials. Lang. Teach. Educ. Res. 2019;2:1–19.
  • 53. Uçar S., Yüksel C. The effect of corpus-based activities on verb-noun collocations in EFL classes. Turk. Online J. Educ. Technol. 2015;14
  • 54. Emir G., Yangın-Ekşi G. Corpus used as a data-driven learning tool in L2 academic writing: evidence from Turkish contexts. Teflin J. 2023;34:209–225.
  • 55. Luo Q. The effects of data-driven learning activities on EFL learners' writing development. SpringerPlus. 2016;5:1255. doi: 10.1186/s40064-016-2935-5.
  • 56. Yilmaz N., Koban Koç D. Developing pragmatic comprehension and production: corpus-based teaching of formulaic sequences in an EFL setting. Dil Ve Dilbilimi Çalışmaları Derg. 2020;16:474–488. doi: 10.17263/jlls.712880.
  • 57. Asik A., Vural A.S., Akpinar K.D. Lexical awareness and development through data driven learning: attitudes and beliefs of EFL learners. J. Educ. Train. Stud. 2016;4:67–96.
  • 58. Chujo K., Kobayashi Y., Mizumoto A., Oghigian K. Exploring the effectiveness of combined web-based corpus tools for beginner EFL DDL. Linguist. Lit. Stud. 2016;4:262–274.
  • 59. Li X. Proc. Int. Conf. Contemp. Educ. Soc. Sci. Ecol. Stud. CESSES 2018. Atlantis Press; Moscow, Russia: 2018. Exploring Chinese EFL learners' acquisition development of depth of vocabulary knowledge based on input corpus and output corpus.
  • 60. Birhan A.T., Teka M., Asrade N. Effects of using Corpus-Based instructional mediation on EFL Students' academic writing skills improvement. Theory Pract. Second Lang. Acquis. 2021;7:133–153.
  • 61. Corino E., Onesti C. Frontiers Media SA; 2019. Data-driven Learning: A Scaffolding Methodology for CLIL and LSP Teaching and Learning; p. 7.
  • 62. Foomani E.M., Khalaji K. Corpus-based versus traditional collocation learning: the case of Iranian EFL learners. J. Soc. Sci. Stud. 2016;3:103–116.
  • 63. Nour D. Corpus-based instruction and the acquisition of functional lexical bundles: EFL Egyptian school learners' perspective. هرمس. 2021;10:89–116.
  • 64. Chen L. Corpus-Aided business English collocation pedagogy: an empirical study in Chinese EFL learners. Engl. Lang. Teach. 2017;10:181–197.
  • 65. O'Donoghue J., Jung C.K. Corpus pedagogy: analyzing corpus use in the classroom and EFL business student attitudes towards corpora. 영어교육연구. 2013;25:51–74.
  • 66. Oktavianti I.N., Triyoga A., Prayogi I. Corpus for language teaching: students' perceptions and difficulties. Proj. Prof. J. Engl. Educ. 2022;5:441–455.


Articles from Heliyon are provided here courtesy of Elsevier
