LEAP4FNSSA lexicon: Towards a new dataset of keywords dealing with food security

Mathieu Roche; Agneta Lindsten; Tomas Lundén; Thierry Helmer

2022 Oct 17;45:108680. doi: 10.1016/j.dib.2022.108680

PMCID: PMC9679707 PMID: 36425989

Abstract

The main objective of the LEAP4FNSSA project (Long-term EU-AU Research and Innovation Partnership for Food and Nutrition Security and Sustainable Agriculture) is to provide a tool for European and African institutions to engage in a sustainable partnership platform for research and innovation on Food and Nutrition Security and Sustainable Agriculture (FNSSA). The FNSSA roadmap facilitates the involvement of stakeholders in addressing food security issues and in linking research to innovation. In this context, the LEAP4FNSSA project helps drive the roadmap. Research and innovation activities were captured in different data sources, i.e. the LEAP4FNSSA database and heterogeneous textual data including project reports, websites, scientific publications, workshop reports and student theses. The Knowledge Extractor Pipeline System (KEOPS) was implemented to support the processing and analysis of textual data associated with FNSSA activities. KEOPS is based on the LEAP4FNSSA lexicon presented in this data paper. The LEAP4FNSSA lexicon, composed of 331 keywords associated with 12 concepts of the food security domain, is the result of 3 steps of work and brainstorming. The lexicon captures research and innovation topics dealing with food security and conducted by African and European partners. This data paper presents the resulting lexicon and a summary of the method used to build it.

Keywords: Food security, Semantic resources, Agrovoc, Text Mining

Specifications Table

Subject	Agricultural Sciences; Computer and Information Science; Social Sciences
Specific subject area	Lexicon in English dealing with food security
Type of data	Table
How data were acquired	Data are manually acquired by combining 3 types of resources (primary source): Pretoria vocabulary (obtained during a workshop organised in Pretoria in 2019), Agrovoc terms (https://www.fao.org/agrovoc/), and terms obtained by text mining. The LEAP4FNSSA lexicon is obtained with 3 iterative steps based on surveys and brainstorming with experts.
Data format	Filtered (LEAP4FNSSA lexicon) and raw (description of the process to build this lexicon).
Description of data collection	The dataset consists of (i) one table file with lexicon, (ii) a document describing the steps to obtain the final lexicon.
Data source location	The data are hosted on the CIRAD Dataverse. The data were built in the context of the LEAP4FNSSA project¹.
Data accessibility	Repository name: CIRAD Dataverse. Data identification number: 10.18167/DVN1/D1C53L. Direct URL to data: https://www.doi.org/10.18167/DVN1/D1C53L

Value of the Data

• This dataset contributes to the available resources for Natural Language Processing (NLP) and data mining on specialized domains, and more precisely in the field of food security.
• This dataset is useful to computer scientists for enriching thesauri and ontologies.
• This dataset can be used for indexing databases (for instance, these keywords could be proposed as metadata).
• This dataset can be used for analysing textual data dealing with agricultural sciences and social sciences.
• This list of keywords can be used as part of a search strategy protocol for systematic review research in areas related to food security.
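
As an illustration of the search-strategy use case, the following minimal sketch joins a handful of keywords into a Boolean OR query. The keywords are taken from the “food security” concept (Table 2); the function name `build_query` and the quoting convention are assumptions made for this example, and real bibliographic databases may expect a different query syntax.

```python
def build_query(keywords):
    """Join keywords into a Boolean OR query, quoting multi-word terms."""
    quoted = [f'"{k}"' if " " in k else k for k in keywords]
    return " OR ".join(quoted)


# A few keywords from the "food security" concept (Table 2).
food_security_terms = ["food security", "food insecurity", "hunger", "malnutrition"]

query = build_query(food_security_terms)
print(query)
```

Quoting only the multi-word terms keeps phrases such as "food security" from being split into independent words by the search engine.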

Data Description

In order to analyse textual data dealing with food security, the different topics related to this issue have to be considered. The proposed lexicon takes into account the multifactorial nature of food security, with 331 keywords associated with 12 concepts summarized in Table 1. Examples for the concepts “food security” and “water management” are given in Tables 2 and 3. Note that these examples cover only 2 of the 12 concepts. All these concepts refer to different aspects of food security and sustainable agriculture in Africa and Europe. The full set of 12 concepts is available in the Dataverse repository: https://doi.org/10.18167/DVN1/D1C53L.

Table 1.

Number of keywords by concept.

Concept	Number of keywords
Food security	20
Agroecology	22
Climate change	14
Water management	37
Crops	61
Livestock and animal production	26
One Health	34
Agricultural intensification and innovation	31
Food value chains and market	29
Agricultural systems	20
Partnerships in agricultural research development	11
Research + Training	26
TOTAL	331
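
The figures in Table 1 can be checked with a few lines of code. The sketch below simply copies the per-concept counts from the table into a dictionary (the variable names are chosen for this example) and recomputes the number of concepts and the total number of keywords.

```python
# Per-concept keyword counts copied from Table 1.
counts = {
    "Food security": 20,
    "Agroecology": 22,
    "Climate change": 14,
    "Water management": 37,
    "Crops": 61,
    "Livestock and animal production": 26,
    "One Health": 34,
    "Agricultural intensification and innovation": 31,
    "Food value chains and market": 29,
    "Agricultural systems": 20,
    "Partnerships in agricultural research development": 11,
    "Research + Training": 26,
}

total = sum(counts.values())
largest = max(counts, key=counts.get)

print(len(counts), "concepts,", total, "keywords")
print("Largest concept:", largest)
```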

Table 2.

Keywords associated with the “food security” concept.

food security	food access	food insecurity
household food security	food aid	food sovereignty
hunger	nutrition security	right to food
self-sufficiency	novel food	resource management
early warning	nutritional quality	malnutrition
socioeconomic sustainability	sustainable intensification	sustainable food security
urban nutrition security

Table 3.

Keywords associated with the “water management” concept.

water management	flood control	freshwater management
hydrological restoration	rain water management	water accounting
water auditing	water conservation	water extraction
water management in lowland	water management in upland	water security
water supply	water treatment	water conservation zone
drainage	hydraulic structure	water reuse
water storage	water use	agricultural hydraulics
watershed management	resource management	water resource
rural planning	water exploration	water rights
irrigation	groundwater storage	ground water storage
water quality	water governance	water harvesting
ict-based irrigation	drought	water constraint
hydrological monitoring

Experimental Design, Materials and Methods

The LEAP4FNSSA lexicon combines 3 semantic resources, which serve as inputs for constructing the final lexicon:

• Pretoria vocabulary (list 1): This first lexicon, composed of 8 concepts, was obtained during a workshop organised in Pretoria in 2019 in the context of the LEAP4FNSSA project. The process and the resulting lexicon are described in [1], [2].
• Agrovoc vocabulary (list 2): Based on these 8 concepts (with one additional concept), Agrovoc terms associated with these concepts were manually extracted from the online¹ resource. Agrovoc is a multilingual thesaurus dedicated to the agricultural domain and developed by the FAO (Food and Agriculture Organization) [3]. This thesaurus is used for different applications, e.g. indexing, annotation, data linking, etc.
• Terms obtained by text mining (list 3): Terminology was extracted from the LEAP4FNSSA corpus using generic parameters of the BioTex tool [4]. The LEAP4FNSSA corpus consists of documents and web pages relating to the FNSSA project database². BioTex uses both statistical and linguistic information to extract terminology from free text. The process applied is described in [5].

The initial terms (i.e. Pretoria vocabulary, Agrovoc vocabulary, terms obtained by text mining) are given in the document ‘LEAP4FNSSA_LEXICON_method_v2.pdf’ available in the Dataverse repository: https://doi.org/10.18167/DVN1/D1C53L.

The LEAP4FNSSA lexicon was obtained in 3 iterative steps. Across these steps, 4 types of experts and skills were involved: a research scientist in text mining³, an IT engineer⁴, experts in database indexing⁵, and experts in food security issues (i.e. members of the LEAP4FNSSA project).

1. The first step, based on the three inputs (i.e. lists 1, 2 and 3), involves the actions summarized below:
  • Starting point: the Agrovoc vocabulary (i.e. list 2) with 9 initial concepts and terms associated with FNSSA.
  • Based on a survey addressed to Work Package 3 members of the LEAP4FNSSA project (10 answers), a term associated with 2 or more irrelevant labels is removed (strict pruning). Irrelevant labels are assigned by the LEAP4FNSSA members from the point of view of their own work and expertise.
  • For each concept, the Pretoria terms (i.e. list 1) are added to obtain a new lexicon.
  • The irrelevant terms of this new lexicon (based on a survey with 12 answers) are removed (strict pruning applied).
  • New terms proposed in surveys and brainstorming (i.e. the LEAP4FNSSA workshop) are added.
  • Terms extracted by text mining (i.e. list 3) and labeled as relevant by Work Package 3 members (via a survey with 5 answers) are selected.
  • Final suggestions from the surveys are taken into account (e.g. remarks, new concepts, concepts to delete).
2. The second step is based on the following process:
  • Starting point: the lexicon obtained at step 1.
  • Improvement of concepts:
    • The ‘Project management’ concept is deleted because it is not a major focus of the LEAP4FNSSA project or of food security issues.
    • The ‘Agroecology’ concept is added with terms proposed by Work Package 3 members.
  • Improvement of terms:
    • Terms are manually lemmatized.
    • Animals are added to the ‘Agriculture and animal production’ concept.
    • Diseases are added to the ‘One Health’ concept.
3. The last step is summarized below:
  • Starting point: the lexicon obtained at step 2.
  • Improvement of concepts:
    • The names of specific concepts are changed.
    • Two new concepts are added: ‘Food value chains and market’ and ‘Agricultural systems’. These concepts contain new terms as well as terms moved from other concepts.
  • Improvement of terms:
    • New keywords are added following work conducted by the experts in charge of data indexing of the FNSSA project database. For instance, keywords extracted from the FNSSA project database and manually validated by the experts are added.
    • Some terms are moved between concepts.
    • Ambiguous terms are deleted (e.g. capacity, agriculture, etc.).
    • The word ‘crop’ is deleted in the 2-word terms of the ‘Crops’ concept.
    • New keywords are integrated after a final check by the experts in charge of data indexing.

These modifications to consolidate the LEAP4FNSSA lexicon (e.g. addition and/or deletion of concepts and/or terms) are detailed in the document ‘LEAP4FNSSA_LEXICON_method_v2.pdf’.
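
The strict pruning rule used in step 1 (a term is removed once it has received 2 or more irrelevant labels) can be sketched as follows. This is only an illustration of the rule as stated: the function name `strict_prune`, the data layout, and the survey labels are all hypothetical and do not reproduce the actual survey data.

```python
from collections import Counter


def strict_prune(terms, irrelevant_labels, threshold=2):
    """Keep the terms that received fewer than `threshold` irrelevant labels."""
    flagged = Counter(irrelevant_labels)
    return [term for term in terms if flagged[term] < threshold]


# Hypothetical input: two real keywords plus a placeholder candidate term.
terms = ["irrigation", "drought", "placeholder term"]

# Hypothetical survey outcome: one entry per "irrelevant" label given by a respondent.
irrelevant_labels = ["placeholder term", "placeholder term", "drought"]

kept = strict_prune(terms, irrelevant_labels)
print(kept)
```

With a threshold of 2, a single dissenting respondent is not enough to remove a term, which is why "drought" survives in this toy example.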

Note that variations of terms could be automatically extracted with NLP approaches in dedicated corpora [6], [7]. This will be integrated as future work to extend the current lexicon.
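
To make the notion of term variation concrete, the sketch below groups spelling variants that differ only in spacing, hyphenation or case, such as “groundwater storage” and “ground water storage” from Table 3. It relies on plain string normalisation, which is an assumption made for this example and is much simpler than the variation-recognition approaches of [6], [7].

```python
from collections import defaultdict


def variant_key(term):
    """Collapse case, spaces and hyphens so that spelling variants share one key."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


def group_variants(terms):
    """Return the groups of terms that share the same normalised key."""
    groups = defaultdict(list)
    for term in terms:
        groups[variant_key(term)].append(term)
    return [group for group in groups.values() if len(group) > 1]


# Keywords from the "water management" concept (Table 3).
terms = ["groundwater storage", "ground water storage", "water quality", "water reuse"]

variants = group_variants(terms)
print(variants)
```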

The resulting LEAP4FNSSA lexicon is integrated into the KEOPS (Knowledge ExtractOr Pipeline System) tool, which uses text mining approaches to highlight knowledge from heterogeneous textual data [5]. KEOPS is currently applied to LEAP4FNSSA data in order to extract, visualise and analyse food security themes with maps, graphs, curves, and Venn diagrams [8].
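
A lexicon of this kind can be used to tag a text with the concepts it mentions. The sketch below counts keyword occurrences per concept with whole-word matching; it is a minimal illustration and not the KEOPS implementation. The lexicon excerpt uses keywords from Tables 2 and 3, while the function name `tag_concepts` and the sample sentence are invented for this example.

```python
import re
from collections import Counter

# Small excerpt of the lexicon (keywords from Tables 2 and 3).
LEXICON = {
    "Food security": ["food security", "hunger", "malnutrition"],
    "Water management": ["irrigation", "drought", "water quality"],
}


def tag_concepts(text, lexicon):
    """Count, for each concept, how many times its keywords occur in the text."""
    counts = Counter()
    lowered = text.lower()
    for concept, keywords in lexicon.items():
        for keyword in keywords:
            # Word boundaries avoid matching a keyword inside a longer word.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            counts[concept] += len(re.findall(pattern, lowered))
    return counts


# Invented sample sentence.
doc = "Drought and poor irrigation increase hunger and malnutrition."
result = tag_concepts(doc, LEXICON)
print(dict(result))
```

Per-concept counts like these are the kind of input that could feed the maps, graphs and Venn diagrams mentioned above.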

Ethics Statement

No conflict of interest exists in this submission. The authors declare that the work described in this paper is original and not under consideration for publication elsewhere, in whole or in part. Its publication is approved by all the authors listed.

CRediT authorship contribution statement

Mathieu Roche: Data curation, Methodology, Formal analysis, Writing – original draft. Agneta Lindsten: Data curation, Formal analysis, Writing – review & editing. Tomas Lundén: Data curation, Formal analysis, Writing – review & editing. Thierry Helmer: Data curation, Formal analysis, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the LEAP4FNSSA H2020 project, SFS-33-2018, grant agreement 817663. We thank the WP3 leaders, Petronella Chaminuka and Ioannis Dimitriou, and the WP3 members for their contribution to analysing the first results obtained.

¹ https://www.fao.org/agrovoc/
² https://www.library.wur.nl/WebQuery/leap4fnssa-projects
³ first author
⁴ last author
⁵ second and third authors

Contributor Information

Mathieu Roche, Email: mathieu.roche@cirad.fr.

Agneta Lindsten, Email: agneta.lindsten@slu.se.

Tomas Lundén, Email: tomas.lunden@slu.se.

Thierry Helmer, Email: thierry.helmer@cirad.fr.

Data Availability

LEAP4FNSSA lexicon (Original data) (Dataverse).

References

1. M. Roche, P. Martin, T. Helmer, PRETORIA lexicon - CIRAD Dataverse, 2022, doi: 10.18167/DVN1/WJT7U2.
2. M. Roche, T. Helmer, P. Martin, A. Csorba, P. Chaminuka, I. Dimitriou, P. van Boheemen, V. Carrasco, V. Joutsjoki, A. Lindsten, T. Lundon, E. Okalany, S. Rokka, KEOPS - LEAP4FNSSA - Indexing - CIRAD Dataverse, 2021, doi: 10.18167/DVN1/MLFIPV.
3. Caracciolo C., Stellato A., Morshed A., Johannsen G., Rajbhandari S., Jaques Y., Keizer J. The AGROVOC linked dataset. Semantic Web. 2013;4(3):341–348. doi: 10.3233/SW-130106.
4. Lossio-Ventura J.A., Jonquet C., Roche M., Teisseire M. Biomedical term extraction: overview and a new methodology. Inf. Retr. J. 2016;19(1-2):59–99. doi: 10.1007/s10791-015-9262-2.
5. Martin P., Helmer T., Rabatel J., Roche M. In: Research Challenges in Information Science. Cherfi S., Perini A., Nurcan S., editors. Springer International Publishing; Cham: 2021. Keops: Knowledge extractor pipeline system; pp. 561–567.
6. Bourigault D., Jacquemin C. Ninth Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics; Bergen, Norway: 1999. Term extraction + term clustering: An integrated platform for computer-aided terminology; pp. 15–22. https://www.aclanthology.org/E99-1003
7. Nenadic G., Ananiadou S., McNaught J. COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics. COLING; Geneva, Switzerland: 2004. Enhancing automatic term recognition through recognition of variation; pp. 604–610. https://www.aclanthology.org/C04-1087
8. Ho S.Y., Tan S., Sze C.C., Wong L., Goh W.W.B. What can venn diagrams teach us about doing data science better? Int. J. Data Sci. Anal. 2021;11(1):1–10. doi: 10.1007/s41060-020-00230-4.
