Figure - PMC

Skip to main content

An official website of the United States government

Here's how you know

Here's how you know

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Secure .gov websites use HTTPS
A lock ( ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

View full-text article in PMC

. 2019 May 22;10:8. doi: 10.1186/s13326-019-0200-x

Search in PMC
Search in PubMed
View in NLM Catalog
Add to search

© The Author(s) 2019

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

PMC Copyright notice

Fig. 1 — Corpus compilation pipeline. This pipeline, from bottom to top, shows the steps that were taken to compile the corpus (171 sentence pairs) that was later evaluated by annotators regarding the semantic similarity between the sentence pairs. First, two subsets, the anaerobiosis-FNR and the more general one, were compiled using different strategies. Then, a basic STS process was applied to both subsets in order to have a preliminary semantic similarity evaluation. This preliminary evaluation was used to select candidate sentences, creating a corpus that ended up with 40% of sentences from the anaerobiosis subset and 60% from the general subset