Figure 2. An automatic pipeline for malaria literature mining.
Approach A, full-text search with literature search engines: A1) all P. falciparum and P. yoelii locus names were downloaded from PlasmoDB and submitted as queries to Google Scholar and SCIRUS, one at a time; A2) the URLs returned as hits were then mapped to PubMed entries. Approach B, NCBI database mining: B1) mappings between GenBank sequence entries and PubMed entries were systematically retrieved from NCBI for four Plasmodium species; B2) the sequences were mapped to malaria locus names by BLAST alignment. The pipeline yielded 6,428 functional associations between 3,262 malaria proteins and 1,278 PubMed papers.
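The joining step that turns the two approaches into a single set of protein-paper associations can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the accession and locus identifiers, and the PubMed IDs are all hypothetical placeholders, and the network-dependent steps (search engine queries, NCBI retrieval, BLAST) are assumed to have already produced the input tables.

```python
def join_associations(accession_to_pmids, accession_to_locus):
    """Approach B sketch: compose accession->PubMed IDs (step B1) with
    accession->locus (step B2) into a set of (locus, pmid) pairs."""
    associations = set()
    for accession, pmids in accession_to_pmids.items():
        locus = accession_to_locus.get(accession)
        if locus is None:
            # Sequence had no BLAST match to a known locus; skip it.
            continue
        for pmid in pmids:
            associations.add((locus, pmid))
    return associations


def summarize(associations):
    """Return (number of associations, distinct loci, distinct papers)."""
    loci = {locus for locus, _ in associations}
    pmids = {pmid for _, pmid in associations}
    return len(associations), len(loci), len(pmids)


# Hypothetical placeholder inputs, not real identifiers.
accession_to_pmids = {"ACC1": [101, 102], "ACC2": [102], "ACC3": [103]}
accession_to_locus = {"ACC1": "LOCUS_A", "ACC2": "LOCUS_B"}

approach_b = join_associations(accession_to_pmids, accession_to_locus)
# Approach A is assumed to deliver (locus, pmid) pairs directly.
approach_a = {("LOCUS_A", 101), ("LOCUS_C", 104)}

# Using sets means an association found by both approaches is counted once.
merged = approach_a | approach_b
print(summarize(merged))
```

Representing each association as a (locus, PubMed ID) pair makes the three totals reported in the caption (associations, proteins, papers) simple counts over one merged set.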
