MULTBLAST: A web application for multiple BLAST searches

Taliah Mittler; Marcel Levy; Chad Feller; Karen Schlauch

Bioinformation. 2010 Nov 1;5(5):224–226. doi: 10.6026/97320630005224

PMCID: PMC3040504 PMID: 21364803

Abstract

The Basic Local Alignment Search Tool (BLAST) compares one or more query sequences to a database of sequences and identifies those sequences that are similar to the query above a user-defined threshold. We have developed a user-friendly web application, MULTBLAST, that runs a series of BLAST searches on a user-supplied list of proteins against one or more target protein or nucleotide databases. The application pre-processes the data, launches each individual BLAST search on the University of Nevada, Reno's TimeLogic DeCypher® system (available from Active Motif, Inc.), and retrieves and combines all the results into a simple, easy-to-read output file. The output file presents the list of query proteins, followed by the BLAST results for the matching sequences from each target database in consecutive columns. This format is especially useful either for comparing the results from the different target databases or for analyzing the results while keeping the results from each target database separately identified.

Availability

The application is available at http://blastpipe.biochem.unr.edu/

Keywords: BLAST, Web application, Multiple BLAST searches

Background

The Basic Local Alignment Search Tool (BLAST) [1, 2] is the most frequently used algorithm for computing sequence similarity. It compares one or more query sequences to one or more sequence databases and identifies sequences that are similar to the query above a defined threshold.

A few BLAST applications allow searching against several target databases, such as TimeLogic's DeCypher® server [3] and ViroBLAST (which provides only a limited number of target databases) [4], but they perform a single BLAST search against a combination of target databases. The present application instead launches a series of BLAST searches, comparing one query list of proteins against each of several requested target protein or nucleotide databases. The application merges the results into one output file, presenting the results from each target database in consecutive columns. This format is especially useful either for comparing the results from the different target databases or for analyzing the results while keeping the results from each target database separately identified. MULTBLAST provides further advantages by pre-processing the query file and by offering useful formatting options for the output file.

TimeLogic's DeCypher system (available from Active Motif, Inc.) offers a hardware-accelerated implementation of the BLAST algorithm (TeraBLAST™) [3]. MULTBLAST utilizes these accelerated searches, thus allowing the completion of a large number of BLAST searches against large databases within a reasonable amount of time.

Methodology

The application includes several Perl scripts (Perl v5.10.0) that perform the following tasks: upload and process the query file (including sorting, deleting duplicate identifiers, and, when requested, retrieving sequences from the appropriate database); incorporate the user-defined parameters into command files to be sent to the DeCypher server (ver. 8.5.0); launch a series of BLAST searches on the DeCypher server [3] using DeCypher Client 8.5; and retrieve and combine the results into the requested output file. The searches are either protein to protein (Tera-BLASTP) or protein to DNA (Tera-TBLASTN, which compares protein sequences to nucleotide sequences translated in all six reading frames).
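The sorting and duplicate-removal step of the query pre-processing can be illustrated with a minimal sketch. It is written in Python rather than the Perl used by MULTBLAST, and the function name `preprocess_identifiers` is hypothetical; the actual scripts are not shown in this article and may differ in detail.

```python
def preprocess_identifiers(lines):
    """Return sorted, de-duplicated identifiers from raw text lines.

    Assumption: one identifier per line; surrounding whitespace and
    blank lines are ignored. Identifiers are compared case-sensitively.
    """
    unique = set()
    for raw in lines:
        ident = raw.strip()
        if ident:  # skip blank lines
            unique.add(ident)
    return sorted(unique)


if __name__ == "__main__":
    # Hypothetical TAIR-style identifiers, including one duplicate.
    example = ["AT1G01020\n", "AT1G01010\n", "\n", "AT1G01020\n"]
    for ident in preprocess_identifiers(example):
        print(ident)
```

Using a set makes duplicate removal independent of input order, and sorting afterwards gives a stable query order for the later searches.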

Program input

A web form prompts the user to enter job parameters and upload a query file used for all BLAST searches (Figure 1). The query file can be a FASTA-formatted file with protein sequences from any species, or a plain text file containing only a list of sequence identifiers, one identifier per line. If the user uploads a list of identifiers, the application retrieves the corresponding protein sequences from the requested database. Currently the application can retrieve sequences from TAIR or, for Caenorhabditis elegans, from Ensembl (future plans include adding other databases). The user selects the target databases against which the query will be compared from a list of 10 protein databases, including NCBI NR, Swiss-Prot, TrEMBL, UniRef100 and TAIR8, or 7 nucleotide databases, including NCBI NT.

Figure 1. A screenshot of a portion of the web application form.
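The two accepted query file types described above (a FASTA file or a list of identifiers) could be told apart and read as in the following sketch. It is a Python illustration under stated assumptions, not the MULTBLAST code: the function names are hypothetical, a file is treated as FASTA when its first non-blank line starts with `>`, and the sequence identifier is taken to be the first whitespace-delimited token of each header line.

```python
def detect_query_format(text):
    """Classify query text as 'fasta', 'id_list', or 'empty'."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            # Assumption: a FASTA file begins with a '>' header line.
            return "fasta" if stripped.startswith(">") else "id_list"
    return "empty"


def parse_fasta(text):
    """Map each sequence identifier to its concatenated sequence."""
    chunks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {ident: "".join(parts) for ident, parts in chunks.items()}


if __name__ == "__main__":
    sample = ">P1 example protein\nMKV\nLAA\n>P2\nMSS\n"
    print(detect_query_format(sample))
    print(parse_fasta(sample))
```

In this sketch a list of identifiers would be passed on to a sequence-retrieval step (TAIR or Ensembl in the article), which is not reproduced here.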

The user can define the following BLAST search parameters: the maximum number of matches printed for each query id, the significance threshold (only results at or better than this threshold, i.e., with a lower P value, are reported) and the gapped alignment processing method. Additionally, the following default settings are used: Weight matrix: BLOSUM62; Word size: 4; Query increment: 1; Extension threshold: 20; Open penalty: -11; Extend penalty: -1; Query filtered (masks repetitive elements in the data).
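The meaning of the two main user parameters, the significance threshold and the maximum number of matches per query, can be sketched as follows. In MULTBLAST these values are written into command files and applied by the DeCypher server itself; the Python below only illustrates the semantics. The dictionary keys and the function `select_hits` are hypothetical names and do not reflect DeCypher command-file syntax.

```python
# Default settings as listed in the text; key names are illustrative only.
DEFAULT_SETTINGS = {
    "weight_matrix": "BLOSUM62",
    "word_size": 4,
    "query_increment": 1,
    "extension_threshold": 20,
    "open_penalty": -11,
    "extend_penalty": -1,
    "query_filter": True,  # masks repetitive elements in the query
}


def select_hits(hits, threshold, max_matches):
    """Keep hits at or better than the threshold, best first.

    hits: list of (target_id, significance) tuples, where a lower
    significance value means a better match.
    """
    kept = [hit for hit in hits if hit[1] <= threshold]
    kept.sort(key=lambda hit: hit[1])
    return kept[:max_matches]


if __name__ == "__main__":
    hits = [("t1", 1e-5), ("t2", 0.5), ("t3", 1e-30), ("t4", 1e-10)]
    # Report at most two matches with significance of 1e-3 or better.
    print(select_hits(hits, threshold=1e-3, max_matches=2))
```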

Program output

The user receives, based on his or her request, an email with either attachments of or links to the following two results files: 1) A final tab-delimited text file (Table 1, see Supplementary material) with a query identifier column (QUERYTEXT) followed by three columns for each target database: rank, significance level (E value) and the target sequence identifier (TARGETLOCUS). These three columns are the BLAST search results for the comparison between the respective query and target sequences. Results are based on matches between the query and each individual target database; no claims are made for matches between sequences of the different target databases. The file can be opened in Microsoft Excel for easy viewing and further analysis. For a given query sequence, BLAST results can contain the same target sequence more than once, and the user can request to remove or keep these duplicate target sequences. Additionally, the user can request either to print the query id in each row (useful for further analysis) or to print only the first instance of each query id, leaving the query id column empty for all following rows of the same query id (allowing clearer viewing). 2) A log file for that run, containing the names of the input and output files, the target databases and the BLAST search parameters.
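The layout of the merged output file described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MULTBLAST implementation: the function `merge_results` and its arguments are hypothetical, the per-database column headers are invented for the example (only QUERYTEXT and TARGETLOCUS are named in the article), and rank is assumed to be the position of a hit in its database's hit list after optional duplicate removal.

```python
def merge_results(queries, databases, results,
                  repeat_query_id=True, drop_duplicates=True):
    """Build tab-delimited text with three columns per target database.

    results: {database: {query_id: [(evalue, target_id), ...]}} with
    each hit list already ordered from best to worst match.
    """
    header = ["QUERYTEXT"]
    for db in databases:
        header += [f"{db}_RANK", f"{db}_EVALUE", f"{db}_TARGETLOCUS"]
    rows = [header]
    for query in queries:
        per_db = []
        for db in databases:
            hits = results.get(db, {}).get(query, [])
            if drop_duplicates:
                seen, unique = set(), []
                for evalue, target in hits:
                    if target not in seen:  # keep first (best) occurrence
                        seen.add(target)
                        unique.append((evalue, target))
                hits = unique
            per_db.append(hits)
        depth = max((len(hits) for hits in per_db), default=0)
        # A query without any hits still gets one (otherwise empty) row.
        for i in range(max(depth, 1)):
            label = query if (repeat_query_id or i == 0) else ""
            row = [label]
            for hits in per_db:
                if i < len(hits):
                    evalue, target = hits[i]
                    row += [str(i + 1), f"{evalue:.2e}", target]
                else:
                    row += ["", "", ""]
            rows.append(row)
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    # Hypothetical hits for one query against two target databases.
    results = {
        "NR": {"q1": [(1e-50, "t1"), (1e-20, "t1"), (1e-5, "t2")]},
        "TAIR8": {"q1": [(1e-30, "a1")]},
    }
    print(merge_results(["q1"], ["NR", "TAIR8"], results,
                        repeat_query_id=False))
```

Keeping each database in its own block of columns means a row never implies a relationship between targets from different databases, which matches the caveat stated in the text.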

Future development

Future developments will include adding other databases from which query sequences can be retrieved and additional target databases to search against, as well as incorporating an option to paste the query data into the application form in addition to the ability to upload a query file.

Supplementary material

Data 1: 97320630005224S1.pdf (172.1 KB, pdf)

Acknowledgments

We would like to thank Dr. Ron Mittler for initiating the idea behind this application as well as for his helpful comments and feedback and Dr. John Cushman and Richard Tillett for their helpful input and feedback.

Footnotes

Citation: Schlauch et al., Bioinformation 5(5): 224-226 (2010)

References

1. Altschul SF, et al. J Mol Biol. 1990;215:403. doi: 10.1016/S0022-2836(05)80360-2
2. Altschul SF, et al. Nucleic Acids Res. 1997;25:3389. doi: 10.1093/nar/25.17.3389
3. www.timelogic.com
4. Deng W, et al. Bioinformatics. 2007;23:2334. doi: 10.1093/bioinformatics/btm331
