Skip to main content
EFSA Journal logoLink to EFSA Journal
. 2022 Dec 14;20(Suppl 2):e200913. doi: 10.2903/j.efsa.2022.e200913

Emerging risk identification by applying data analytical tools

Elisa Palmas 1,, Tekla Engelhardt 1, Zsuzsa Farkas 1, Szilveszter Csorba 1, Erika Országh 1, Ákos Bernard Józwiak 1
PMCID: PMC9749435  PMID: 36531278

Abstract

The working programme ‘Emerging risk identification by applying data analytical tools’ was delivered by the Digital Food Chain Education, Research, Development, and Innovation Institute (Digital Food Institute, DFI) on the field of emerging risks at the University of Veterinary Medicine Budapest, Hungary. The Institute is the University's research and education unit that provides data analysis and research along the whole food chain and takes networking in this area to a new level. The Fellow joined the hub of experts and researchers in the field of food chain safety data analysis, responsible for protecting public health concerning food in Hungary. The programme consisted of several different activities to provide an overview of the different tools that can be employed in the emerging risk identification process and prepare various stakeholders for new food chain safety issues. The programme was split into four modules to run over the one‐year fellowship covering different areas of data analysis and emerging risk identification. The aim was to be fully integrated with the organisation's work experience, increase knowledge of scientific aspects relevant in the field of data analysis and visualisation tools in the emerging risk identification area, and implement the results into various EU stakeholders' environments assessments.

Keywords: data analysis, visualisation tool, emerging risk identification, text mining

1. Introduction

The working programme ‘Emerging risk identification by applying data analytical tools’ was proposed for the EUFORA fellowship programme by the Digital Food Institute. DFI is a centre for risk assessment, data analysis, and research, and provides practical support and development of digitalisation possibilities to the actors of the agricultural and food economy, and ensures complex veterinary training based on digital technology with a food chain approach. As such, the Institute is actively involved in the promotion of digitalisation efforts in Hungary, including the Digital Success Programme (DSP) and the Digital Food Strategy (DFS). The main objective of the DFS is to promote the digitalisation potential of data in the food industry, thereby helping the sector to operate more efficiently. This requires the processing of the large volumes of data available in the food industry, the use of artificial intelligence, and the development of human resources. The Institute also works closely with the Hungarian National Food Chain Safety Authority (NÉBIH) in several areas, including the identification of emerging risks. The most important foreign partner of the Institute is the European Food Safety Authority (EFSA). DFI staff represent Hungary in several working groups and networks of EFSA. The Fellow was part of the DFI group throughout the fellowship. She collaborated with the staff as well as attended several events such as presentations, training courses, conferences, EFSA meetings and project‐related meetings. The working programme covered different data analysis tools that can be applied in a wider range of food safety risk assessment areas and were organised into four different modules.

2. Description of work programme

The EU‐FORA work programme was comprised of four modules. The fellow had the opportunity to implement the emerging risk identification (ERI) results in various stakeholders' environments. She had an overview of the different data analysis tools that can be applied also in a wider range of food safety risk assessment areas, to acquire skills and competencies that can be used in her future career. The fellow worked within our expert teams attending Teams and Unit meetings, as well as all associated social events. In addition, as an integral member of the EFSA, the fellow followed a wide range of continuous internal risk assessment training programmes, workshops, presentations and webinars.

2.1. Aims

The working programme prepared for the fellow covered different areas of data analysis and ERI.

The main goals of the fellowship project were:

  • Learning best practices on data analysis principles (including transparency, validation and documentation).

  • Getting insight into tasks of the ERI, the various procedures and possible outcomes.

  • Getting familiar with software tools for data mining and data visualisation (Knime, InfraNodus).

2.2. Activities/Methods

Module 1 : Application of data science in food safety: data science basics, overview of data science tools

Under the first module, the fellow got an overview of data science and data science tools in food science, including data collection, preparation, modelling (linear and non‐linear methods, network analysis, etc.), limitations and interpretation. The fellow learned how to see food chain safety problems with a data analytical‐focused perspective, define the problem, structure it, and select and apply the possible and appropriate data analytical tools to solve them. She participated in the ‘Application possibilities of data analysis in the field of food safety’ course organised by the Digital Food Institute. The course focused on the evaluation of the safety of the food chain, risk assessment, and in many cases risk management and computation‐intensive analyses and methods. The general objective was to acquaint the main computational methods applied in the field of food chain safety, their basics, application possibilities and limitations.

Module 2 : Emerging Risk Identification: basic concept, procedures, overview of ERI tools

In the second module, the fellow got an overview of the concept of ERI workflow, the various procedures and possible outcomes (Figure 1). An emerging risk is a risk resulting from a newly identified hazard to which a significant exposure may occur, or from an unexpected new or increased significant exposure and/or susceptibility to a known hazard (EFSA, 2018). The successful identification of emerging risks is at the heart of protecting public health and the environment. By identifying emerging risks in the food chain early, risks are anticipated and allow to take effective and timely prevention measures to protect consumers, animals, plants and the environment. Identifying emerging risks also helps to meet future risk assessment challenges, for example by mining new sources of data, developing new analytical tools and methods, and broadening networks of scientific knowledge. A large number of data (e.g. scientific literature, news, patents, etc.) is arising continuously, and it is a real challenge to derive meaningful information from this huge noise regarding emerging risks. Early and prompt identification of emerging risks enables better preparedness and policymaking in the food safety decision‐making system. The fellow was deeply involved in the weekly identification of potential emerging risks through the help of data analysis tools for screening and prioritising large data sets to capture dependent changes in the food system. By analysing graphs and topics, she identified potential emerging issues. Then, a multi‐step selection procedure was conducted by the fellow and our expert group to select the emerging risks that need further action to mitigate the adverse effects. In particular, the criteria considered are novelty, soundness, imminence and scale based on limited data and expert knowledge with high levels of uncertainty and low reproducibility. Additional information regarding the nature of the hazard identified, or associated drivers and trends are also included. Important objectives for the procedure for identification of emerging risks are to raise awareness of risk managers for emerging risks and improve preparedness for risk assessors. Indeed, the emerging risk that needed further measures were selected and forwarded to the relevant stakeholders such as the industry, the authorities and the research community, depending on the issues and depending on the risks themselves identified. In particular, the Fellow identified more than 100 potential issues (Table 1), from which more than 70 were uploaded to our management tool that helps us in tracking and monitoring emerging issues. She was also involved in preparing the whole documentation for the identified emerging issues and risks to be published on our DFI website.

Figure 1.

Figure 1

Emerging issue identification (ERI) workflow

Table 1.

Selection of the emerging issues identified by the Fellow

Description Link of the issue Date of the issue Action
High levels of toxins found in more baby food brands https://www.cbsnews.com/news/baby-food-toxins-government-report/ 2021.9.29 New card in the tool Trello. To be monitored
Alcohol content of fermented beverages https://www.longfordleader.ie/news/national-news/668712/on-the-kombucha-craze-new-alert-issued-over-alcohol-content-problem.html 2021.9.23 Attached to ‘home fermentation’ card
Another gene edited tomato https://www.bullfrag.com/tomatoes-that-reduce-stress-and-lower-blood-pressure-go-on-sale-in-japan-life/ 2021.9.30 Attached to ‘GM tomato’ card
Palmitic acid promotes a prometastatic memory https://www.nature.com/articles/s41586-021-04075-0 2021.11.10 Communicated to EFSA and shared in our website
Bacteria‐killing, biodegradable food packaging material https://www.straitstimes.com/singapore/environment/spore-us-scientists-create-bacteria-killing-biodegradable-food-packaging-material 2021.12.29 New card in the tool Trello
Pseudomonas aeruginosa in liquid probiotics https://kswfoodmicro.com/2021/12/02/usa-livia-global-announces-voluntary-recall-of-two-lots-of-its-liviaone-liquid-probiotics-because-of-the-potential-for-contamination-with-pseudomonas-aeruginosa/ 2021.12.2 New card in the tool Trello. To be monitored
Demand for crocodile meat doubles following rise in pork prices https://www.laprensalatina.com/thais-turn-to-crocodile-meat-as-price-of-pork-soars/ 2022.1.22 Communicated in LinkedIn
Cadmium and risk of diabetes https://www.sciencedirect.com/science/article/pii/S0160412021005456 2022.1.26 New card and to be monitored
Artificial sweeteners and cancer risk https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1003950 2022.3.24 Communicated in LinkedIn
Dietary intake with urinary melamine https://www.mdpi.com/1660-4601/19/9/4964/htm 2022.4.19 New card and to be monitored

Module 3 : Data analytic and visualisation methods for ERI: theory, method development and practical application

In the third module, the fellow acquired knowledge on the use of data science tools in ERI. She got useful experiences in data analytic and visualisation methods, to show their results to various stakeholders. Indeed, in our Emerging Risk Identification system, we retrieved food safety news collected by Europe Media Monitor, which is a freely accessible news database with an RSS feed that can be used as an input for data analysis purposes (Meijer et al., 2020). Then, a text mining application software such as Knime or InfraNodus is used. The steps of data preparation and analysis which help us to find clusters of words and/or topics are:

  • Data retrieval: The news feed is downloaded, parsed, and transformed into documents through the node RSS feed reader.

  • Text preprocessing (e.g. stop word filtering and punctuation erasure): Preprocessing component uses extremely fast text processing to remove/filter specific types of characters from a string column.

  • Creating bag of words: This node creates a bag of words which consisting of one column containing the terms occurring in the corresponding document and copied to the output table.

  • Co‐occurrence analysis: The node counts the number of co‐occurrences for the given list of terms within the selected parts, e.g. sentence, paragraph, section and title of the corresponding document.

  • Network analysis with clustering algorithms: This component counts incoming/outgoing edges and the edge with the highest value is the most central node. The clustering algorithm includes the neighbourhood's analysis of the node and the group's identification in the data.

The resulted graph (Figure 2) is based on the co‐occurrence network of words and identifies different topics with the Latent Dirichlet Allocation algorithm. By analysing the graph and selecting topics (Figure 3), potential emerging issues can be identified. The fellow was involved in the entire emerging risk identification process, where she assisted the process to derive meaningful information from the huge noise regarding emerging risks. Communication to stakeholders involved the collaboration with the EFSA Emerging Risk Exchange Network (EREN), a very successful network for the sharing, analysis, and dissemination of emerging issues via a strong and committed membership including EFSA and the other Member States. The fellow attended the 26th and 27th EREN meetings. She was also involved in developing preparatory materials (short issues, briefing notes and presentations). One of the outcomes of her daily work of identification and analysis of news and data brought to the development of a briefing note for the 27th EREN meeting. Indeed, she found a relevant article about Palmitic acid that stimulates metastasis in a long‐term stable manner (Pascual et al., 2021). She prepared the briefing note to be presented to EFSA and Member states. After further analysis and evaluation, the topic was evaluated as an emerging issue because new sound evidence was found, and it was proposed to initiate a new risk assessment. The fellow presented her results also at the Agrostat 2022 Conference in Lyon, France, as a poster titled ‘Applying a text mining software for emerging risk identification in the food chain’. 1

Figure 2.

Figure 2

Topic detection and visualisation with InfraNodus software

Figure 3.

Figure 3

Topic detection and visualisation with InfraNodus software after the selection of the topic ‘food safety’

Module 4 : Practical applications: early warning systems in various stakeholders ‘environment.

Our institution has ongoing projects with both governmental and industrial partners, so the fellow had a great opportunity to join these projects and see the role of result implementation in practical applications, and how ERI or other data analysis‐based systems can be turned into practical applications.

Drug repositioning Data lake project

One of the data analysis‐based projects the fellow joined, was developing a data science‐assisted veterinary drug repositioning platform to combat antimicrobial resistance, where different data analytical pipelines will be used to suggest drug candidate molecules already used in other indications which potentiate the effects of antimicrobials in poultry. With the help of the newly forming platform, the goal is to nominate several drug candidates for repositioning and to test them in vitro and in vivo. The platform could serve as a base for the establishment of a future food chain safety Data lake to be developed which would be valuable for the integration and complex analysis of animal health, veterinary public health, food safety and human health data from different sources. The fellow supported the project by creating a list of potential databases useful for the creation of this platform. She went through several data sources to establish which database was a potential candidate and created a detailed list of more than 70 databases with information about the data owner, the public availability, how much data was stored, data format, and other information shown in Table 2. These databases collected are now being evaluated by the project consortium and will be used to develop a data model. The project provides an opportunity to develop a cross‐sectoral data storage strategy that will later provide for the complete analysis of the food chain and an integrated approach to risk assessment. The resulting data lake can serve as a model for other countries as well as for the common European veterinary public health profession, proving great market opportunities of the project.

Table 2.

Selection of the data sources assessed for the Drug repositioning Data lake project

Domain Database Technology (data access) Refresh methodology Data size Access link
Microbiology Food‐borne pathogens XLSX, PDF, CSV Frequently updated 4,588 results https://zenodo.org/search?page=1&size=20&q=foodborne%20pathogens
Veterinary drugs Veterinary drug residues XLSX, PDF, CSV Frequently updated 420 results https://zenodo.org/communities/efsa-kj/search?page=1&size=20&q=veterinary%20drugs%20residues
Chemicals REACH ‐ Registration, Evaluation, Authorisation and Restriction of Chemicals Regulation PDF, XML, CSV, XLSX Frequently updated 32 current consultations https://echa.europa.eu/regulations/reach/understanding-reach
Medicines European public assessment reports (EPAR) PDF, XLSX, HTML 2 weeks after EC decision 1,601 reports https://www.ema.europa.eu/en/medicines/field_ema_web_categories%253Aname_field/Human/ema_group_types/ema_medicine
Chemical structure data ChEMBL TXT, CSV, JSON, XLSX, MySQL, TSV, SDF Every 3–4 months more than 65,000 journal articles https://www.ebi.ac.uk/chembl/

RASFF project

Another task is linked to collecting short‐term, early warning signs from the Rapid Alert System for Food and Feed (RASFF). RASFF has been put in place by the European Commission to provide food and feed control authorities with an effective tool to exchange information about measures taken to respond to serious risks detected concerning food or feed. This exchange of information helps EU countries to act more rapidly and in a coordinated manner in response to a health threat caused by food or feed. RASFF tool is for these reasons an important portal to be connected to our emerging risk identification. In our internal project, the aim was to connect the RASFF data to our emerging risk data, making it possible to analyse risk profile from an emerging risk perspective combined with the RASFF perspective. To enable this analysis, an alignment of RASFF data structure with FoodEx2 (a comprehensive EFSA food classification and description system) and EFSA parameter codes is necessary. For this reason, the project idea was to align food and hazard categories through applying an easy and convenient own categorisation system. The fellow was involved in the creation of a workflow in Knime software to make automated category mapping as a part of data preprocessing. Knime is an open‐source data analytics, reporting, and integration platform that allows assembly of nodes blending different data sources, including preprocessing for modelling, data analysis and visualisation without, or with only minimal, programming. With the help of this tool, she created the workflow of the product (food, feed and live animals and relative subgroups) and hazards (physical, chemicals and biological and relative subgroups) classification.

3. Conclusions

The working programme ‘Emerging risk identification by applying data analytical tools’ offered the fellow an opportunity to familiarise herself with emerging risk identification, data analysis and data visualisation. The four modules included activities that successfully trained the EU‐FORA fellow such as attending the meetings of several scientific national advisory committees that ensured the fellow's understanding of the interaction between different international organisations. The fellow was successfully integrated into the day‐by‐day workflow of Digital Food Institute, gaining first‐hand experience in a scientific and interdisciplinary context. This experience provided the fellow the opportunity to develop her data science‐related skills, which will benefit her professional development as a data analyst. Moreover, the fellow gained an overview of several topics related to food safety risk assessment by attending the EU‐FORA training modules.

Abbreviations

DFI

Digital Food Institute

DFS

Digital Food Strategy

ERI

emerging risk identification

EU‐FORA

European Union Food Risk Assessment

NÉBIH

National Food Chain Safety Authority

RASFF

Rapid Alert System for Food and Feed

Appendix A – Trainings and activities

Type Title Date
Course training Data analysis and computational science in general. Defining computational methods, general applications, timeliness, benefits and limitations. 2022.10.20
Modelling basics. Problems solvable by modelling and the limitations of modelling. Linear models. 2022.10.20
Modelling. Markov models, game theory. 2022.10.20
Modelling. Non‐linear models, complexity science, their role and importance in food chain safety. 2022.11.3
Network analysis. Basics of network analysis and application possibilities in the field of food chain safety. 2022.11.3
Network analysis. Microbial metabolic pathways as networks. Epidemies and foodborne incidents as networks. 2022.11.3
Epidemiological modelling. Diffusion, compartment, agent‐based, spatio‐temporal and network models. 2022.11.17
Applications: KNIME, R, Python 2022.11.17
Applications: Gephi, STEM, GleamViz 2022.11.17
Data mining, text mining. Basics and application possibilities. Case study: identifying emerging risks with text mining. 2022.12.1
Predictive microbiology. Basics and application possibilities from industrial, policy and research perspective. 2022.12.1
Traceability. Role of data and IT in traceability systems. Investigation of foodborne outbreaks with Food Chain Lab. Blockchain‐based traceability systems. 2022.12.1
Food chain data analysis, driver analysis. Process mining, Bayesian network analysis. Case study: milk production chain automated driver analysis and alert system. 2022.12.1
Decision support. Data visualization, interpretation and communication of results. Communicating limitations and uncertainties. Ethical considerations. Decision making processes 2022.12.15
The future food data scientist: challenges and new areas. Big data and artificial intelligence (AI) application possibilities and limitations. 2022.12.15
Case study: literature review with the help of AI. 2022.12.15
Data analysis lecture on KNIME and text mining with practical exercises 2022.12.15
Seminars and conferences 26th EREN meeting 2021.10.21–10‐22
The Bilateral symposia serials with Huazhong Agricultural University 2022.2.16–2.17
27th EREN meeting 2022.5.11–5.12
Agrostat 2022 – presentation of the poster ‘Applying a text mining software for emerging risk identification in the food chain’ 2022.6.16–6.17
One Health Conference 2022 2022.6.21–6.24

Suggested citation: Palmas E, Engelhardt T, Farkas ZS, Csorba SZ, Országh E and Józwiak Á, 2022. Emerging risk identification by applying data analytical tools. EFSA Journal 2022;20(S2):e200913, 12 pp. 10.2903/j.efsa.2022.e200913

Declarations of interest If you wish to access the declaration of interests of any expert contributing to an EFSA scientific assessment, please contact interestmanagement@efsa.europa.eu.

Acknowledgements This report is funded by EFSA as part of the EU‐FORA programme.

Approved: 31 August 2022

Notes

References

  1. EFSA (European Food Safety Authority) , Donohoe T, Garnett K, Lansink AO, Afonso A and Noteborn H, 2018. Scientific report on the emerging risks identification on food and feed – EFSA. EFSA Journal 2018;16(7):5359, 37 pp. 10.2903/j.efsa.2018.5359 [DOI] [PMC free article] [PubMed]
  2. Meijer N, Filter M, Jóźwiak Á, Willems D, Frewer L, Fischer A, Liu N, Bouzembrak Y, Valentin L, Fuhrmann M, Mylord T, Kerekes K, Farkas ZS, Hadjigeorgiou E, Clark B, Coles D, Comber R, Simpson E and HJP Marvin, 2020. Determination and Metrics for Emerging Risks Identification DEMETER: Final Report. https://efsa.onlinelibrary.wiley.com/doi/abs/10.2903/sp.efsa.2020.EN-1889
  3. Pascual G, Domínguez D, Elosúa‐Bayes M, Beckedorff F, Laudanna C, Bigas C, Douillet D, Greco C, Symeonidi A, Hernández I, Gil SR, Prats N, Bescós C, Shiekhattar R, Amit M, Heyn H, Shilatifard A and Benitah SA, 2021. Dietary palmitic acid promotes a prometastatic memory via Schwann cells. Nature, 599, 485–490. https://www.nature.com/articles/s41586-021-04075-0 [DOI] [PubMed] [Google Scholar]

Articles from EFSA Journal are provided here courtesy of Wiley

RESOURCES