Acharya KK1,2, Darshan SC1,2, Neelima Ch1, Paul D1, Akhilesh B1,2, Sravanthi D1,2, Sreelakshmi KS1, Deepti H1, Dey P2, Pamuru P2 and Vasan SS3
1IBAB (Institute of Bioinformatics and Applied Biotechnology) and 2Shodhaka Life Sciences Pvt. Ltd., Biotech Park, Electronic City, Bengaluru; 3Ankur Healthcare Pvt. Ltd., Rajaji Nagar, Bengaluru, India
Genome-wide expression profiling techniques, particularly microarrays, were expected to make biomarker discovery more efficient. However, variation in results across experiments has become an obstacle, and translational research has progressed more slowly than expected. Although next-generation sequencing now offers better prospects, the large body of existing transcriptomics data should not be neglected. We developed a new algorithm and a database (MGEx-Tdb: http://resource.ibab.ac.in/MGEx-Tdb/) to make the best use of existing data for identifying new biomarkers for various diseases, addressing one tissue at a time (BMC Genomics (2011) 11: 467). The computational process distinguishes genes whose expression pattern is more reliable from the rest. The approach involved manual extraction (biocuration) of most available microarray data sets, development of a new database, and derivation of a ‘consensus’ expression status for each gene from data across multiple ‘comparable’ studies (i.e., studies addressing the same location and condition). Microarray experiments have now been conducted with clinical samples to validate the consensus expression statuses and their ‘reliability scores’. A strong correlation was found between the reliability scores and the reproducibility of the corresponding expression status. A comparative in silico analysis confirmed that the database is more informative and easier to use than the other bioinformatics resources routinely used by scientists. We have further improved the database by revising the algorithm: a new scoring method developed by Shodhaka has been adopted to assess the reliability of gene expression status. The method not only improves microarray-based reliability scoring but also allows incorporation of EST and other types of gene expression data.
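The idea of a consensus expression status with a reliability score can be illustrated with a minimal sketch. This is not the published MGEx-Tdb algorithm or the Shodhaka scoring method, neither of which is described here. It assumes a simple majority vote over present/absent calls from comparable studies, with the net number of agreeing studies used as a stand-in reliability score. The function name `consensus_status` and the labels `"present"`, `"absent"` and `"uncertain"` are hypothetical.

```python
from collections import Counter


def consensus_status(calls):
    """Derive a consensus call for one gene from comparable studies.

    `calls` is a list of "present"/"absent" strings, one per study.
    Returns (status, score). The score is the net number of agreeing
    studies, an assumed stand-in for a reliability score.
    """
    if not calls:
        return None, 0  # no data for this gene

    counts = Counter(calls)
    present, absent = counts["present"], counts["absent"]

    if present == absent:
        return "uncertain", 0  # studies contradict each other evenly

    status = "present" if present > absent else "absent"
    return status, abs(present - absent)


# Hypothetical calls for one gene across four comparable studies.
status, score = consensus_status(["present", "present", "present", "absent"])
print(status, score)
```

Under this assumed scheme, a gene reported as present in three studies and absent in one receives the status "present" with a score of 2, whereas a gene with one call each way is flagged as uncertain.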
The new database and associated software can serve as a powerful ‘platform for gene expression prediction and biomarker discovery’ for any condition related to the testis. It permits identification of genes that are consistently reported to be present or absent in disease conditions but are well established to have the opposite expression status in the normal tissue. Currently, a list of genes co-expressed in the context of testicular cancer has been marked as targets for further research in diagnostics, prognostics and therapeutics, and is being analyzed at the level of systems biology and transcriptional regulation. The strategy can be adopted for any other tissue and disease. The new approach is also being used to identify more dependable potential biomarkers for cervical cancer. Similar work has also been initiated for cancers of the liver and prostate.
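The selection of genes with opposite expression status in disease and normal tissue could be sketched as follows. This is an illustration under stated assumptions, not the database's actual query logic. Each gene is assumed to map to a `(status, score)` pair per condition, and the function name `candidate_biomarkers`, the `min_score` threshold and the gene labels are all hypothetical.

```python
def candidate_biomarkers(normal, disease, min_score=2):
    """List genes whose reliable status in disease is opposite to normal.

    `normal` and `disease` map gene -> (status, score), where status is
    "present" or "absent" and score is an assumed reliability score.
    """
    hits = []
    for gene, (d_status, d_score) in disease.items():
        if gene not in normal:
            continue  # no normal-tissue call to compare against
        n_status, n_score = normal[gene]
        # Require one "present" and one "absent" call, both reliable enough.
        opposite = {d_status, n_status} == {"present", "absent"}
        if opposite and min(d_score, n_score) >= min_score:
            hits.append(gene)
    return sorted(hits)


# Hypothetical example data; gene labels are placeholders, not real results.
normal = {"GENE_A": ("absent", 5), "GENE_B": ("present", 4), "GENE_C": ("absent", 1)}
disease = {"GENE_A": ("present", 3), "GENE_B": ("present", 4), "GENE_C": ("present", 6)}
print(candidate_biomarkers(normal, disease))
```

Here only `GENE_A` is returned: `GENE_B` has the same status in both conditions, and `GENE_C` switches status but its normal-tissue call falls below the assumed reliability threshold.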

