PLoS One. 2016 Feb 3;11(2):e0148434. doi: 10.1371/journal.pone.0148434

Table 1. Number of news articles that were annotated with extracted images, entities and topics.

At each step of our extraction process, we use high-precision computational methods to ensure that the information we extract from the news articles is of high quality. As a result, we do not extract faces, entities or topics from every news article.

Label | Description | Count
No Image | No image URL present in the news article header meta information. | 846,980
Missing Image | No article image could be retrieved from the URL. | 129,848
No Face | The article image does not contain a face. | 904,638
Face | The article image does contain a face. | 472,186
No People | No person entities could be fully resolved within the full text of the article. | 1,287,882
People | Person entities could be fully resolved within the full text of the article. | 1,065,770
No Topic | No topic category was assigned to the news article. | 878,683
Topic | One or more topic categories were assigned to the news article. | 1,474,969
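
The counts in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis: the grouping of labels into three steps (`image`, `people`, `topic`) and the helper name `share` are assumptions made here, as is the reading that the labels within each step are mutually exclusive and exhaustive.

```python
# Counts copied from Table 1. The grouping into three annotation steps
# is an assumption made for this sketch, not stated in the table itself.
counts = {
    "image": {
        "No Image": 846_980,
        "Missing Image": 129_848,
        "No Face": 904_638,
        "Face": 472_186,
    },
    "people": {"No People": 1_287_882, "People": 1_065_770},
    "topic": {"No Topic": 878_683, "Topic": 1_474_969},
}

# Total number of articles accounted for by each step.
totals = {step: sum(labels.values()) for step, labels in counts.items()}


def share(step, label):
    """Percentage of articles in `step` that carry `label` (hypothetical helper)."""
    return 100.0 * counts[step][label] / totals[step]


for step, labels in counts.items():
    print(f"{step}: total = {totals[step]:,}")
    for label in labels:
        print(f"  {label}: {share(step, label):.1f}%")
```

Under this grouping, each of the three steps sums to the same total of 2,353,652 articles, which suggests that every article received exactly one label per step.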