PLoS One. 2013 Oct 15;8(10):e75541. doi: 10.1371/journal.pone.0075541

Figure 1. Outline view of the data extraction process.

(1) First, we download the complete dataset for a given database version in flat file format. (2) We then extract the comment lines (lines beginning with ‘CC’, the comment indicator). (3) We remove the comment blocks and properties (as defined in the UniProtKB manual [17]) along with the ‘CC’ identifier. (4) We then extract sentences using LingPipe. (5) Finally, we add all of the identified sentences to the MySQL database.
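
Steps (2) to (4) can be illustrated with a minimal Python sketch. It is not the pipeline used in the paper: the function names `extract_comment_text` and `split_sentences` are hypothetical, the regular-expression sentence splitter is only a naive stand-in for LingPipe, and the handling of topic headers (‘-!- TOPIC:’) and of the dashed licence block assumes the usual UniProtKB flat file layout. Loading the sentences into MySQL (step 5) is omitted.

```python
import re


def extract_comment_text(flat_file_text):
    """Collect the free text of each 'CC' topic block (assumed flat file layout)."""
    comments = []
    in_licence_block = False
    for line in flat_file_text.splitlines():
        # Step 2: keep only comment lines, identified by the 'CC' line code.
        if not line.startswith("CC"):
            continue
        # Step 3: drop the 'CC' identifier and surrounding whitespace.
        body = line[2:].strip()
        # Assumption: a dashed rule opens and closes the licence block.
        if body.startswith("----"):
            in_licence_block = not in_licence_block
            continue
        if in_licence_block or not body:
            continue
        if body.startswith("-!-"):
            # A new topic block: remove the '-!- TOPIC:' header, keep the text.
            comments.append(re.sub(r"^-!-\s*[A-Z][A-Z ]*:\s*", "", body))
        elif comments:
            # Continuation line: append to the current topic block.
            comments[-1] += " " + body
    return comments


def split_sentences(text):
    """Naive splitter used here in place of LingPipe (step 4)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


if __name__ == "__main__":
    entry = (
        "ID   EXAMPLE_ENTRY\n"
        "CC   -!- FUNCTION: Binds DNA. Acts as a\n"
        "CC       repressor.\n"
        "CC   -!- SUBUNIT: Homodimer.\n"
    )
    for block in extract_comment_text(entry):
        for sentence in split_sentences(block):
            print(sentence)
```

A splitter based on punctuation alone will mis-handle abbreviations and species names, which is one reason to use a dedicated sentence detector in practice.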