Perspectives in Clinical Research. 2025 Sep 20;16(4):211–217. doi: 10.4103/picr.picr_226_24

Artificial intelligence enabled audio-to-text transcription and translation for streamlined pharmacovigilance data collection and adverse drug reaction reporting

Tasneem Hussain 1, Pooja Solanki Mishra 1, Ashutosh Tiwari 1, Manan Parmar 1, Shubham Gadaria 1, Pinkey Kannoj 1
PMCID: PMC12591525  PMID: 41209780

Abstract

Context:

Adverse drug reaction (ADR) reporting in pharmacovigilance is critical for patient safety but often limited by resource constraints and manual inefficiencies. The integration of artificial intelligence (AI) has the potential to address these challenges by streamlining the reporting process.

Aims:

The aim of the study was to assess the performance of an AI-enabled system for audio-to-text transcription, translation, ADR form completion, and causality assessment based on the World Health Organization-Uppsala Monitoring Centre scale.

Settings and Design:

A computational, comparative, cross-sectional study involving healthcare professionals and patients was conducted to evaluate the AI system’s functionality in a real-world pharmacovigilance setting.

Methodology:

One hundred participants (50 healthcare professionals and 50 patients) provided audio-recorded ADR reports. These recordings were processed through the AI system to generate transcriptions, translations, and ADR forms. The system’s performance was assessed using transcription metrics (word error rate [WER], character error rate [CER], sentence error rate [SER]), translation metrics (bilingual evaluation understudy [BLEU] score, translation edit rate [TER]), and ADR form accuracy. Causality assessments by the AI were compared against expert evaluations.

Statistical Analysis Used:

Descriptive and analytical statistics (unpaired t-test) were applied to evaluate the performance metrics and compare results between the two participant groups.

Results:

The AI system demonstrated high accuracy in transcription (WER <0.05, CER <0.04, and SER <0.35) and translation (BLEU >0.85 and TER <0.05). ADR form completion achieved near-perfect accuracy with minor discrepancies. Causality assessments were consistent across healthcare professional and patient data (P = 1).

Conclusions:

The AI-enabled system effectively streamlined ADR reporting, ensuring accuracy in transcription, translation, and causality assessment while maintaining consistency across groups. Its integration into pharmacovigilance processes can reduce workloads, enhance reporting rates, and improve global health outcomes.

Keywords: Adverse drug reaction reporting, artificial intelligence, causality assessment tools, natural language processing, pharmacovigilance

INTRODUCTION

Adverse drug reactions (ADRs) are a major concern in the field of health care, affecting patient safety, treatment outcomes, and healthcare costs globally. Systematic reporting and analysis of ADRs form the backbone of pharmacovigilance – the discipline focused on detecting, assessing, understanding, and preventing adverse effects or any other drug-related problems.[1] However, in India, ADR reporting remains alarmingly low. It is estimated that only about 1% of ADRs are reported in comparison to the global average of approximately 5%.[2] This significant disparity highlights the urgent need for improvements in India’s pharmacovigilance system.

The under-reporting of ADRs in India can be attributed to multiple factors, including a lack of awareness among healthcare professionals, insufficient training, and the absence of a robust reporting infrastructure. In addition, cultural and systemic barriers often discourage voluntary reporting, resulting in a significant gap in the detection of potential drug-related risks.[3] This under-reporting can lead to undetected safety signals and preventable harm to patients.

To standardize the evaluation of ADRs, the World Health Organization (WHO) established the WHO-Uppsala Monitoring Centre (WHO-UMC) causality assessment scale, which categorizes causality from “certain” to “unlikely” based on factors such as the temporal relationship between drug administration and the reaction, response to drug withdrawal (dechallenge), and the absence of alternative causes.[4] Widely used in pharmacovigilance, this scale ensures consistency, reliability, and accuracy in ADR assessment, serving as a critical tool for clinicians and regulatory authorities.

The integration of artificial intelligence (AI) into pharmacovigilance offers a transformative approach to addressing the persistent issues of under-reporting and inefficiencies in ADR monitoring. AI technologies are increasingly being utilized to automate the detection, collection, and analysis of ADR data, enhancing the speed and accuracy of reporting. In addition, AI can assist in causality assessment by processing complex data sets and supporting decision-making processes, thereby reducing the workload on healthcare professionals and improving the overall reliability of pharmacovigilance efforts.[5]

This study explores the integration of AI to enhance pharmacovigilance by automating transcription, translation, ADR form completion, and causality assessment using the WHO-UMC scale. AI technologies, including machine learning and natural language processing (NLP), can improve the accuracy, efficiency, and consistency of ADR reporting while reducing manual workload and errors. By converting audio ADR reports into structured text, translating them into English, and automating causality assessment, AI has the potential to streamline pharmacovigilance workflows. Given the significant under-reporting of ADRs in India, this study aims to assess AI’s reliability in addressing these challenges and strengthening the pharmacovigilance system.

METHODOLOGY

This study evaluated the performance of an AI system in transcribing and translating ADR reports, completing ADR forms, and assessing causality using the WHO-UMC scale. The primary objective was to compare the accuracy, reliability, and quality of AI-generated outputs against manual methods using established metrics. The study was initiated after obtaining permission from the Institutional Ethics Committee (approval reference number: EC/MGM/August24/193). A visual representation of the study methodology is provided in Figure 1.

Figure 1.

Study methodology flowchart. A detailed flowchart illustrating the steps of the study methodology, including participant recruitment, data collection, transcription, translation, adverse drug reaction form filling, and data analysis. ADR = Adverse drug reaction

The AI system used in this study is an NLP-based large language model designed for audio-to-text transcription, translation, ADR form completion, and causality assessment using the WHO-UMC scale. It utilizes deep learning algorithms to process and analyze pharmacovigilance data efficiently, facilitating streamlined ADR reporting.

Informed consent was obtained from all participants before their enrollment. Participants were assured of the confidentiality of their data and were informed of their right to withdraw from the study at any time without any repercussions.

The study included 100 participants, equally divided between healthcare professionals and patients, to evaluate the AI system’s functionality within logistical constraints. This pilot study provides a foundation for larger-scale research. Participants were selected based on inclusion criteria: individuals aged 18 or older, healthcare professionals with at least 1 year of clinical experience, and patients who had reported or experienced an ADR within the past 6 months. All participants had to provide verbal ADR reports in any language. Exclusion criteria included unwillingness to provide informed consent or speech impairments affecting audio clarity.

Data collection focused on capturing audio recordings of ADR reports. Participants were asked to verbally report ADRs associated with specific drugs. These audio recordings served as the primary input for the AI system.

The AI system processed these recordings in two stages: transcription followed by translation of transcribed data to English. During transcription, the AI converted audio data into textual format.

The AI system was validated by comparing its transcription and translation with human-generated ADR reports, assessing accuracy, completeness, consistency, and error rates. Cross-checking with manually transcribed forms ensured reliability, while evaluation using the WHO-UMC causality assessment criteria confirmed its applicability in pharmacovigilance reporting.

Expert-prepared reference texts served as the gold standard for this validation. The accuracy of both transcription and translation was assessed quantitatively by comparing the AI-generated outputs with these expert-validated references.

Transcription accuracy was evaluated using three quantitative metrics: word error rate (WER), which measured the percentage of incorrectly transcribed words; character error rate (CER), which assessed character-level transcription errors; and sentence error rate (SER), which represented the percentage of sentences containing at least one error. The metrics and their formulae are detailed in Table 1.

Table 1.

Metrics for evaluating transcription accuracy

Metric | Formula
WER | [formula image not reproduced: PCR-16-211-g002.jpg]
CER | [formula image not reproduced: PCR-16-211-g003.jpg]
SER | [formula image not reproduced: PCR-16-211-g004.jpg]

WER=Word error rate, CER=Character error rate, SER=Sentence error rate
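
The formulae of Table 1 are not available in the text, but all three metrics have widely used standard definitions: WER and CER divide the Levenshtein edit distance (substitutions + deletions + insertions) by the number of words or characters in the reference, and SER is the fraction of sentences that contain at least one error. The Python sketch below assumes those standard definitions and simple whitespace tokenization; the function names and example sentences are illustrative and are not taken from the study's system.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / words in the reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def ser(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Sentence error rate: share of sentences that differ from the reference."""
    wrong = sum(r != h for r, h in zip(references, hypotheses))
    return wrong / len(references)


# Hypothetical example: one misspelled drug name in a six-word report.
reference = "patient developed rash after taking amoxicillin"
hypothesis = "patient developed rash after taking amoxicilin"
print(wer(reference, hypothesis))  # one wrong word out of six
print(cer(reference, hypothesis))  # one missing character out of 47
```

Because a single misspelled word marks the whole sentence as wrong, SER is expected to be much larger than WER or CER on the same data.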

The AI system translated the transcribed text into English, and translation quality was assessed using the bilingual evaluation understudy (BLEU) score and translation edit rate (TER). BLEU measured n-gram accuracy against expert-curated reference translations, while TER quantified the edits needed to align AI-generated translations with the reference. BLEU scores were computed using Python’s NLTK library, ensuring a standardized evaluation of translation accuracy.[6] The metrics and their formulae are detailed in Table 2.

Table 2.

Metrics for evaluating translation accuracy

Metric | Formula
BLEU score | [formula image not reproduced: PCR-16-211-g005.jpg]
TER | [formula image not reproduced: PCR-16-211-g006.jpg]

BLEU=Bilingual evaluation understudy score, TER=Translation edit rate
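
To make the two translation metrics concrete, the following Python sketch implements a simplified sentence-level BLEU (single reference, uniform weights over 1- to 4-grams, brevity penalty, no smoothing) and an approximate TER (word-level edit distance divided by reference length, without the phrase-shift operation of full TER). It is a standard-library illustration under those assumptions, not the NLTK routine used in the study, and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Simplified single-reference sentence BLEU without smoothing."""
    ref, hyp = reference.split(), hypothesis.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty discourages translations shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / max_n)


def word_edit_distance(ref, hyp) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1]


def ter_approx(reference: str, hypothesis: str) -> float:
    """Approximate TER: edits / reference length (shift operation omitted)."""
    ref = reference.split()
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


# Hypothetical example: the translation drops one word of a ten-word reference.
ref_text = "the patient developed a skin rash after the second dose"
hyp_text = "the patient developed skin rash after the second dose"
print(bleu(ref_text, hyp_text))
print(ter_approx(ref_text, hyp_text))
```

Higher BLEU and lower TER both indicate a translation closer to the reference, which is why the two thresholds used in this study point in opposite directions.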

The AI system completed ADR forms using transcribed and translated data, with accuracy and completeness assessed across multiple parameters, including patient and drug information, ADR description, reaction outcome, and consistency. AI-generated forms were systematically compared to expert-reviewed forms, which served as the standard for evaluating precision, consistency, and reliability. This approach ensured a robust assessment of the AI’s performance in replicating expert-level ADR documentation. The parameters and scoring system are detailed in Table 3.

Table 3.

Parameters for evaluating adverse drug reaction form filling

Parameter | Description | Scoring system
Patient information | Accuracy of demographic and clinical details | 0: Incomplete, 1: Partially complete, 2: Complete
Drug information | Correct identification of suspected drugs | 0: Incorrect, 1: Partially accurate, 2: Accurate
Adverse reaction description | Completeness of ADR narrative | 0: Incomplete, 1: Partially complete, 2: Complete
Outcome of the reaction | Accurate reporting of ADR outcomes | 0: Incorrect, 1: Partially accurate, 2: Accurate
Date and time | Precision in recording timing details | 0: Missing, 1: Partially accurate, 2: Accurate
Field consistency | Uniformity of data across sections | 0: Inconsistent, 1: Partially consistent, 2: Consistent
Terminology consistency | Use of standardized medical terms |
Missing entries | Identification of omitted data fields | 0: Present, 1: Minor omissions, 2: None
Incorrect entries | Detection of inaccurate or inconsistent information | 0: Frequent, 1: Minor errors, 2: None
Translation errors | Analysis of errors introduced during translation |

ADR=Adverse drug reaction

Causality assessment performed by the AI system, based on the WHO-UMC scale, was compared to expert evaluations to assess alignment. The expert reviewers’ assessments served as the reference standard for validating AI-generated results. A scoring system was employed to evaluate the consistency and reliability of the AI in replicating expert judgment, with scores assigned as follows: 0 for inconsistent, 1 for partially consistent, and 2 for consistent assessments.
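
As an illustration of how 0/1/2 rubric scores of this kind can be summarized as mean ± SD per parameter, consider the Python sketch below. The scores are hypothetical, the function name is illustrative, and the use of the sample standard deviation is an assumption, since the text does not state which SD formula was applied.

```python
from statistics import mean, stdev

RUBRIC_LEVELS = {0, 1, 2}  # 0 = worst, 1 = partial, 2 = best


def summarise_scores(scores_by_parameter):
    """Return {parameter: (mean, sample SD)} after validating rubric values."""
    summary = {}
    for parameter, scores in scores_by_parameter.items():
        if not set(scores) <= RUBRIC_LEVELS:
            raise ValueError(f"{parameter}: scores must be 0, 1, or 2")
        summary[parameter] = (mean(scores), stdev(scores))
    return summary


# Hypothetical scores for five reports (not data from the study).
example_scores = {
    "Patient information": [2, 2, 2, 2, 2],
    "Date and time": [2, 1, 2, 2, 1],
    "Causality assessment": [2, 2, 2, 2, 2],
}

summary = summarise_scores(example_scores)
for parameter, (avg, sd) in summary.items():
    print(f"{parameter}: {avg:.2f} ± {sd:.2f}")
```

A parameter scored 2 on every report yields a mean of 2.0 with an SD of 0.0, the pattern that indicates complete agreement with the expert reference.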

Data analysis included descriptive statistics to summarize performance metrics (WER, CER, SER, BLEU, and TER) and ADR form accuracy, while t-tests assessed differences between healthcare professionals and patients. Statistical significance was determined at P < 0.05.
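
For readers unfamiliar with the unpaired t-test, the sketch below computes the Student (pooled-variance) t statistic and its degrees of freedom for two independent groups using only the Python standard library. The input values are hypothetical, the function name is illustrative, and the equal-variance form is an assumption, as the text does not specify which variant was used.

```python
import math
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Student's unpaired t statistic and degrees of freedom (pooled variance)."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    if pooled == 0:
        # Both groups are constant, so the statistic is degenerate
        # (0/0 when the two means are also equal).
        return math.nan, df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical per-participant WER values (not data from the study).
patients = [0.05, 0.03, 0.04, 0.06, 0.02]
professionals = [0.03, 0.02, 0.04, 0.03, 0.03]

t_stat, dof = unpaired_t(patients, professionals)
print(f"t = {t_stat:.3f}, df = {dof}")
```

Converting the statistic into a P value requires the t distribution; if SciPy is available, `scipy.stats.ttest_ind(patients, professionals)` returns both the statistic and the two-sided P value. When every score in both groups is identical, the statistic is undefined, which is the situation in which a difference of zero is reported.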

RESULTS

This study evaluates the performance of an AI-enabled system in transcribing, translating, and processing ADR reports provided by healthcare professionals and patients. The results are presented across key metrics including transcription accuracy, translation quality, ADR form completion, and causality assessment along with statistical analyses to compare the system’s performance across both groups.

Transcription accuracy

The AI-enabled system demonstrated high transcription accuracy for ADR reports, with comparable performance between healthcare professionals and patients. The WER averaged 0.04 ± 0.0189 for patients and 0.03 ± 0.0267 for healthcare professionals, while the CER was 0.033 ± 0.0342 and 0.021 ± 0.0167, respectively. The SER was higher for patients (0.32 ± 0.1794) than for healthcare professionals (0.26 ± 0.1280), reflecting greater transcription complexity in patient-generated audio. All metrics remained within accepted thresholds for ADR reporting (WER <0.05, CER <0.04, and SER <0.35), with no statistically significant differences (P > 0.05) [Table 4]. These findings confirm the AI system’s consistent and accurate performance across both groups. A line graph [Figure 2] visualizes the average transcription metrics, while a scatter plot [Figure 3] illustrates the nonsignificant P values (all P > 0.05).

Table 4.

Transcription and translation metrics

Metrics | Patients (mean±SD) | Healthcare professionals (mean±SD) | P
WER | 0.04±0.0189 | 0.03±0.0267 | 0.061778*
CER | 0.033±0.0342 | 0.021±0.0167 | 0.068266*
SER | 0.32±0.1794 | 0.26±0.1280 | 0.138757*
BLEU score | 0.90±0.1017 | 0.93±0.0789 | 0.308569*
TER | 0.04±0.0313 | 0.032±0.0250 | 0.283692*

*P>0.05 (not significant). WER=Word error rate, CER=Character error rate, SER=Sentence error rate, BLEU=Bilingual evaluation understudy, TER=Translation edit rate, SD=Standard deviation

Figure 2.

Trend of mean transcription and translation metrics for patients and healthcare professionals. A graph showing the average values of transcription and translation metrics (word error rate, character error rate, sentence error rate, bilingual evaluation understudy, translation edit rate) for patients and healthcare professionals over the study period. WER = Word error rate, CER = Character error rate, SER = Sentence error rate, BLEU = Bilingual evaluation understudy, TER = Translation edit rate

Figure 3.

P value analysis across transcription and translation metrics. Statistical comparison of transcription and translation accuracy metrics for patients versus healthcare professionals, represented by P values

Translation quality

The AI system’s translation quality, assessed using BLEU and TER scores, showed slightly higher performance for healthcare professionals (BLEU: 0.93 ± 0.0789, TER: 0.032 ± 0.0250) compared to patients (BLEU: 0.90 ± 0.1017, TER: 0.04 ± 0.0313). Both metrics remained within accepted benchmarks (BLEU >0.85, TER <0.05) for practical application, with no statistically significant differences between groups (P > 0.05) [Table 4]. These findings confirm the AI system’s consistent and reliable translation performance. Figure 2 illustrates the comparison of BLEU and TER scores, while Figure 3 confirms the absence of statistical significance (P > 0.05).
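
The threshold comparisons reported for transcription and translation can be expressed compactly in code. The Python sketch below hard-codes the group means from Table 4 and the thresholds stated in the text (WER <0.05, CER <0.04, SER <0.35, BLEU >0.85, TER <0.05); the variable and function names are illustrative.

```python
# Thresholds as stated in the text: (comparison direction, limit).
THRESHOLDS = {
    "WER": ("<", 0.05),
    "CER": ("<", 0.04),
    "SER": ("<", 0.35),
    "BLEU": (">", 0.85),
    "TER": ("<", 0.05),
}

# Group means as reported in Table 4.
TABLE_4_MEANS = {
    "Patients": {
        "WER": 0.04, "CER": 0.033, "SER": 0.32, "BLEU": 0.90, "TER": 0.04,
    },
    "Healthcare professionals": {
        "WER": 0.03, "CER": 0.021, "SER": 0.26, "BLEU": 0.93, "TER": 0.032,
    },
}


def meets_threshold(metric: str, value: float) -> bool:
    """Error rates must fall below their limit; BLEU must exceed its limit."""
    direction, limit = THRESHOLDS[metric]
    return value < limit if direction == "<" else value > limit


results = {
    group: {metric: meets_threshold(metric, value) for metric, value in means.items()}
    for group, means in TABLE_4_MEANS.items()
}

for group, flags in results.items():
    print(group, flags)
```

Only BLEU uses a "greater than" comparison, because it is a similarity score, whereas the other four metrics are error rates.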

Adverse drug reaction form completion

The AI system demonstrated high accuracy and completeness in filling ADR forms for both groups, with most fields, including patient and drug information and adverse reaction descriptions, recorded with precision. Minor errors in certain fields, such as “Outcome of the Reaction,” did not significantly impact overall quality. Accuracy and consistency scores showed no statistically significant differences between groups (P > 0.05) [Table 5], confirming the system’s reliability across both. Figure 4 compares average scores for various ADR form fields, highlighting a high degree of accuracy.

Table 5.

Adverse drug reaction form completion metrics

Metrics | Patients (mean±SD) | Healthcare professionals (mean±SD) | P
Patient information | 2.0±0.0 | 2.0±0.0 | 1$
Drug information | 1.8±0.4 | 1.93±0.25 | 0.133*
Adverse reaction description | 1.93±0.25 | 2.0±0.0 | 0.155*
Outcome of the reaction | 2.0±0.0 | 2.0±0.0 | 1$
Date and time | 1.67±0.60 | 1.8±0.54 | 0.376*
Field consistency | 1.9±0.3 | 1.93±0.25 | 0.647*
Terminology consistency | 2.0±0.0 | 2.0±0.0 | 1$
Missing entries | 2.0±0.0 | 2.0±0.0 | 1$
Incorrect entries | 1.8±0.6 | 1.93±0.25 | 0.274*
Translation errors | 1.93±0.25 | 2.0±0.0 | 0.16*
Causality assessment (WHO-UMC Scale) | 2.0±0.0 | 2.0±0.0 | 1$

*P>0.05 is not significant, $P=1 means no difference. SD=Standard deviation, WHO-UMC=World Health Organization-Uppsala Monitoring Centre

Figure 4.

Comparison of adverse drug reaction form completion metrics mean for patients and healthcare professionals. A bar chart or comparative graph depicting average adverse drug reaction form completion metrics (e.g., accuracy, consistency) between patients and healthcare professionals

Causality assessment

The causality assessment scores based on the WHO-UMC scale were identical for both groups (2.0 ± 0.0), demonstrating perfect accuracy and agreement with expert evaluations. A P value of 1 confirmed no difference between groups [Table 5 and Figure 5], indicating that the AI system provides reliable and consistent assessments across both patient and healthcare professional data.

Figure 5.

P value trends across adverse drug reaction (ADR) form completion metrics for patients and healthcare professionals. P value trends highlighting the statistical significance of differences in ADR form completion metrics between patient and professional groups

Overall, the AI system demonstrated high accuracy, consistency, and completeness in transcription, translation, ADR form completion, and causality assessment across both patient and healthcare professional data. The transcription and translation metrics consistently met or exceeded the accepted thresholds, and the statistical analyses confirmed that there were no significant differences between the groups. This suggests that the AI system can be used reliably across diverse groups.

DISCUSSION

The aim of this study was to evaluate the effectiveness of AI tools in streamlining pharmacovigilance processes, specifically transcription, translation, ADR reporting, and causality assessment using the WHO-UMC scale by comparing data collected from patients and healthcare professionals. The study findings confirm that AI can significantly enhance these processes, providing accurate, consistent, and efficient outputs.

AI is increasingly utilized in pharmacovigilance for automated data processing, signal detection, adverse event identification, and causality assessment. Whereas previous AI applications have focused on ADR signal detection and predictive modeling, the novelty of this study lies in applying AI-enabled transcription and translation to pharmacovigilance reporting, an area that has received limited exploration. By integrating audio-to-text conversion, multilingual translation, and structured data processing, it addresses challenges such as language barriers, underreporting, and documentation inefficiencies. This innovation enhances global pharmacovigilance by improving the speed, accuracy, and accessibility of ADR data collection and reporting.

Existing evidence supports the integration of AI into pharmacovigilance workflows. For instance, Salas et al. highlighted the predominant use of AI in ADR identification, safety report processing, and drug interaction prediction.[5] In this study, AI demonstrated high transcription accuracy (low WER) and high translation quality (high BLEU scores), reflecting its reliability in processing pharmacovigilance data.

Generative AI’s transformative potential, as discussed by Mishra and Gupta, is evident in its ability to ensure data uniformity and consistency.[7] Similarly, this study found that AI tools ensured uniformity in data processing, achieving comparable performance metrics for transcription, translation, ADR form completion, and causality assessment across both groups.

The application of machine learning models for automating ADR coding and seriousness assessment, as validated by Martin et al., aligns with this study’s findings.[8] By minimizing human error and enhancing efficiency, AI demonstrated its utility in standardizing pharmacovigilance reporting.

However, challenges exist in fully integrating AI into pharmacovigilance. As noted by Kompa et al., limited adherence to best practices in machine learning has constrained the broader impact of these technologies.[9] This study addresses this gap by validating AI performance using a standardized approach and diverse datasets. Still, real-world application is necessary to assess the reliability of these tools under dynamic and large-scale conditions, such as public health emergencies.

In summary, AI has the potential to make pharmacovigilance processes faster and more accurate, improving patient safety. The study findings add to the growing evidence that AI tools can support pharmacovigilance tasks, and with further development, they could transform the way ADRs are detected and reported globally.[10,11,12,13]

CONCLUSIONS

This study highlights the potential of AI-enabled tools in pharmacovigilance, demonstrating consistent performance in transcription, translation, ADR form completion, and causality assessment. AI integration could streamline ADR reporting, reduce manual workloads, and improve reporting rates. However, larger studies and system optimization are needed for broader applicability. With advancements, AI can enhance ADR detection, reporting, and prevention, improving global patient safety.

Limitations of the study

This study has some limitations. The small sample size limits generalizability, requiring larger studies for validation. The AI system was tested on a narrow range of ADRs, potentially affecting performance in complex cases. In addition, the lack of longitudinal data leaves its long-term reliability untested. Future studies should address these gaps for real-world pharmacovigilance applications.

Conflicts of interest

There are no conflicts of interest.

Funding Statement

Nil.

REFERENCES

1. World Health Organization. Geneva: World Health Organization; 2013. [Last accessed on 2024 Nov 17]. Pharmacovigilance. Available from: https://www.who.int/teams/regulation-prequalification/regulation-and-safety/pharmacovigilance
2. Shukla S, Sharma P, Gupta P, Pandey S, Agrawal R, Rathour D, et al. Current scenario and future prospects of adverse drug reactions (ADRs) monitoring and reporting mechanisms in the rural areas of India. Curr Drug Saf. 2024;19:172–90. doi: 10.2174/1574886318666230428144120.
3. Dutta A, Banerjee A, Basu S, Chaudhry S. Analysis of under-reporting of adverse drug reactions: Scenario in India and neighbouring countries. IP Int J Compr Adv Pharmacol. 2020;5:118–24.
4. World Health Organization. Geneva: World Health Organization; 2013. [Last accessed on 2024 Nov 17]. The Use of the WHO-UMC System for Standardised Case Causality Assessment. Available from: https://www.who.int/docs/default-source/medicines/pharmacovigilance/whocausality-assessment.pdf
5. Salas M, Petracek J, Yalamanchili P, Aimer O, Kasthuril D, Dhingra S, et al. The use of artificial intelligence in pharmacovigilance: A systematic review of the literature. Pharmaceut Med. 2022;36:295–306. doi: 10.1007/s40290-022-00441-z.
6. Eric-Urban. Test Accuracy of a Custom Speech Model – Speech Service – Azure AI Services. 2023. [Last accessed on 2024 Nov 17]. Available from: https://learn.microsoft.com
7. Mishra HP, Gupta R. Leveraging Generative AI for Drug Safety and Pharmacovigilance. Curr Rev Clin Exp Pharmacol. 2025;20:89–97. doi: 10.2174/0127724328311400240823062829.
8. Martin GL, Jouganous J, Savidan R, Bellec A, Goehrs C, Benkebil M, et al. Validation of artificial intelligence to support the automatic coding of patient adverse drug reaction reports, using nationwide pharmacovigilance data. Drug Saf. 2022;45:535–48. doi: 10.1007/s40264-022-01153-8.
9. Kompa B, Hakim JB, Palepu A, Kompa KG, Smith M, Bain PA, et al. Artificial intelligence based on machine learning in pharmacovigilance: A scoping review. Drug Saf. 2022;45:477–91. doi: 10.1007/s40264-022-01176-1.
10. Hauben M. Artificial intelligence and data mining for the pharmacovigilance of drug-drug interactions. Clin Ther. 2023;45:117–33. doi: 10.1016/j.clinthera.2023.01.002.
11. Liang L, Hu J, Sun G, Hong N, Wu G, He Y, et al. Artificial intelligence-based pharmacovigilance in the setting of limited resources. Drug Saf. 2022;45:511–9. doi: 10.1007/s40264-022-01170-7.
12. Aronson JK. Artificial intelligence in pharmacovigilance: An introduction to terms, concepts, applications, and limitations. Drug Saf. 2022;45:407–18. doi: 10.1007/s40264-022-01156-5.
13. Sorbello A, Haque SA, Hasan R, Jermyn R, Hussein A, Vega A, et al. Artificial intelligence-enabled software prototype to inform opioid pharmacovigilance from electronic health records: Development and usability study. JMIR AI. 2023;2:e45000. doi: 10.2196/45000.

Articles from Perspectives in Clinical Research are provided here courtesy of Wolters Kluwer -- Medknow Publications
