American Journal of Physiology - Renal Physiology. 2014 Jul 23;307(6):F747–F755. doi: 10.1152/ajprenal.00012.2014

A knowledge base of vasopressin actions in the kidney

Akshay Sanghi 1,*, Matthew Zaringhalam 1,*, Callan C Corcoran 1,*, Fahad Saeed 3, Jason D Hoffert 1, Pablo Sandoval 1, Trairak Pisitkun 2, Mark A Knepper 1
PMCID: PMC4166727  PMID: 25056354

Abstract

Biological information is growing at a rapid pace, making it difficult for individual investigators to be familiar with all information that is relevant to their own research. Computers are beginning to be used to extract and curate biological information; however, the complexity of human language used in research papers continues to be a critical barrier to full automation of knowledge extraction. Here, we report a manually curated knowledge base of vasopressin actions in renal epithelial cells that is designed to be readable either by humans or by computer programs using natural language processing algorithms. The knowledge base consists of three related databases accessible at https://helixweb.nih.gov/ESBL/TinyUrls/Vaso_portal.html. One of the component databases reports vasopressin actions on individual proteins expressed in renal epithelia, including effects on phosphorylation, protein abundances, protein translocation from one subcellular compartment to another, protein-protein binding interactions, etc. The second database reports vasopressin actions on physiological measures in renal epithelia, and the third reports specific mRNA species whose abundances change in response to vasopressin. We illustrate the application of the knowledge base by using it to generate a protein kinase network that connects vasopressin binding in collecting duct cells to physiological effects to regulate the water channel protein aquaporin-2.

Keywords: aquaporin-2, database, ENaC, urea channel


The treasure trove of published biological information is vast and continues to grow exponentially. The availability of so much information is beneficial but presents a problem for the physiological researcher. No single investigator can hope to have familiarity with more than a small fraction of the existing information that is relevant to his or her area of research focus. Even if investigators were able to read every relevant paper (itself an impossible goal), the human mind is limited in its ability to remember details and to identify complex relationships among biological variables discussed in multiple related papers.

In recognition of this problem, computers are being put to work (12, 27). Computers can compensate for human limitations such as limited memory, slow data acquisition, and the inability to carry out complex calculations. However, computers do not match many of the integrative capacities of the human mind, including the ability to read and understand natural human language. Because the standard method for conveying information in the scientific world is written text, usually in English, there have been efforts to develop computational tools that extract information from publications, a field called “natural language processing” (12). The task has proved difficult because “good writing” is complex, both syntactically and semantically. As a result, computer programs that categorize the elements of sentences and determine the relationships among them (automated text processors) can misidentify relationships (27). Thus, although computational extraction of information from text is a promising direction, we cannot yet rely on computers to archive information from publications without error. Knowledge bases are therefore often built by manual curation, as with commercial knowledge bases such as MetaBase (2) and Ingenuity Pathway Analysis (37). Such manually curated databases are useful for identifying biological relationships in a broad biological context, but they are less helpful in specialized areas of research. In this paper, we have compiled a prototype knowledge base aimed at archiving published information about one such specialized area, viz., the actions of the peptide hormone vasopressin in the kidney.

Vasopressin is the major regulator of water excretion by the kidney. It has actions in several renal tubule segments, including the collecting duct, the connecting tubule, the distal convoluted tubule, and the thick ascending limb. Vasopressin actions in these segments are mediated by binding to the V2 receptor (Gene symbol: Avpr2), a G protein-coupled receptor. In general, signaling pathways in these segments are incompletely understood despite many years of investigation. Recently, much information has been accruing from proteomic studies that identify proteome-wide actions of vasopressin, describing vasopressin-regulated phosphorylation, protein abundance changes, protein translocation from one subcellular compartment to another, protein-protein binding interactions, etc. Most of these studies have associated online databases (https://helixweb.nih.gov/ESBL/Database/), allowing users to freely access the data for their own analysis and experimental planning. These resources add to a growing literature based on reductionist studies of vasopressin action.

A major goal in systems biology is large-scale integration of data from diverse studies (19). In this paper, we describe a new knowledge base, reporting a comprehensive list of documented vasopressin actions at the level of individual proteins. We also provide two ancillary databases of known physiological actions of vasopressin at an epithelial level and mRNAs whose abundances are regulated by vasopressin in renal epithelia. We structured the three databases in a “syntactical triplet” format, which presents the information in a “subject-verb-object” syntax (12). This structure provides a uniform format that can be read easily either by human beings or by computers using standard parsing algorithms discussed by Quest et al. (25). Because the databases are about vasopressin actions, all subjects are either “AVP” (arginine vasopressin) or “DDAVP” (a synthetic V2 receptor-selective agonist).
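The triplet format can be pictured as a small record type whose fields, read left to right, form a sentence. The sketch below is a minimal illustration in Python, not the authors' implementation; the class name `Triplet`, the field names, and the `as_sentence` helper are hypothetical. The only rule taken from the text is that the subject must be AVP or DDAVP.

```python
from dataclasses import dataclass

# Per the text, every subject in the knowledge base is AVP or DDAVP.
ALLOWED_SUBJECTS = frozenset({"AVP", "DDAVP"})


@dataclass(frozen=True)
class Triplet:
    """One subject-verb-object entry (hypothetical representation)."""

    subject: str            # "AVP" or "DDAVP"
    verb: str               # predicate phrase, e.g. "increases abundance of"
    obj: str                # official gene symbol or physiological measure
    qualifiers: tuple = ()  # optional trailing phrases, e.g. ("in rat IMCD",)

    def __post_init__(self) -> None:
        # Reject subjects outside the controlled vocabulary.
        if self.subject not in ALLOWED_SUBJECTS:
            raise ValueError(f"unsupported subject: {self.subject!r}")

    def as_sentence(self) -> str:
        """Read the elements left to right as one English sentence."""
        return " ".join([self.subject, self.verb, self.obj, *self.qualifiers]) + "."
```

For example, `Triplet("AVP", "increases abundance of", "Grk4", ("in rat IMCD",)).as_sentence()` returns `"AVP increases abundance of Grk4 in rat IMCD."` (the subject is chosen only for illustration). Freezing the record keeps curated entries immutable once validated.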

METHODS

Protein Targets database.

Publications that report vasopressin's effects on specific proteins in kidney epithelial cells were found using PubMed [search terms: (DDAVP OR AVP OR vasopressin) AND (collecting OR thick OR Henle) AND (kidney OR renal)]. Forty-two of these were found to report effects of vasopressin or DDAVP in which a change in vasopressin dose or concentration was an independent variable. Publication dates ranged from 1992 to 2013. The subject-verb-object triplets were manually extracted.
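The Methods give the search string but not the tool used to run it. As a hedged sketch, the same query could be reproduced programmatically through NCBI's E-utilities `esearch` service; the use of E-utilities, the helper name `esearch_url`, the `retmax` value, and the date window are our assumptions. The block only builds the request URL. Fetching it, and the manual screening for studies in which vasopressin dose or concentration was an independent variable, are not shown.

```python
from urllib.parse import urlencode

# Search string for the Protein Targets database, as given in the Methods.
PROTEIN_QUERY = (
    "(DDAVP OR AVP OR vasopressin) AND (collecting OR thick OR Henle) "
    "AND (kidney OR renal)"
)

# NCBI E-utilities search endpoint (assumed tool; the paper names only PubMed).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, mindate: str, maxdate: str, retmax: int = 500) -> str:
    """Build a PubMed esearch URL limited to a publication-date window."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,    # maximum number of PMIDs returned (assumed value)
    }
    return f"{ESEARCH}?{urlencode(params)}"


print(esearch_url(PROTEIN_QUERY, "1992", "2013"))
```

The date window here simply mirrors the publication range of the retained papers; the original search may not have been date-limited. The same helper would serve the mRNA query (`mRNA AND vasopressin AND kidney`).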

To curate the data, we initially organized the details about each study in an electronic spreadsheet. The elements of the syntactical triplets used are the “subject” (either “AVP” or “DDAVP”), the “verb” or “action” (predicate phrases), and the “object” (official gene symbols of the regulated target proteins). Data are also recorded for the location of phosphorylated sites (if relevant), the experimental system studied (tissue or cell type), a description of the responsive protein, the vasopressin dose and duration of treatment, the RefSeq identifier number of the responsive protein, the magnitude of the recorded change, the title of the publication, the year of publication, and the unique identifier (PMID) of the publication. The magnitudes of vasopressin responses were recorded as the percentage of control. The online database [The Database of Vasopressin Regulation in Kidney (Protein Targets)] can be entered through the portal at https://helixweb.nih.gov/ESBL/TinyUrls/Vaso_portal.html and is an abridged version of this spreadsheet. It excludes the vasopressin concentration and duration of treatment, publication information, and protein RefSeq identifier. To allow the user to access the RefSeq records, the online database has hyperlinks for each official gene symbol to its entry in the NCBI protein catalog. In addition, hyperlinks are provided to the publication abstracts in PubMed. The unabridged spreadsheet can be downloaded from a link on the webpage.
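The relationship between the full curation spreadsheet and the abridged online table can be sketched as a projection plus two generated links. Column names (`dose`, `refseq`, `pmid`, and so on) and helper names are hypothetical; the URL patterns are the standard NCBI Protein and PubMed record URLs, which we assume approximate the hyperlinks described above.

```python
# Columns kept in the spreadsheet but hidden in the online table (per the
# Methods). Column names are hypothetical.
DROPPED = frozenset({"dose", "duration", "title", "year", "refseq"})


def percent_of_control(treated: float, control: float) -> float:
    """Express a vasopressin response as a percentage of the control value."""
    if control == 0:
        raise ValueError("control value must be nonzero")
    return 100.0 * treated / control


def abridge(row: dict) -> dict:
    """Return the public view of one curated spreadsheet row."""
    public = {key: value for key, value in row.items() if key not in DROPPED}
    # The RefSeq accession is hidden but still used for the gene-symbol link.
    public["gene_link"] = f"https://www.ncbi.nlm.nih.gov/protein/{row['refseq']}"
    public["pubmed_link"] = f"https://pubmed.ncbi.nlm.nih.gov/{row['pmid']}/"
    return public
```

With this convention, a response that rises from 100 to 150 units is recorded as `percent_of_control(150.0, 100.0) == 150.0`, and a fall to half of control as 50.0, so increases and decreases share one scale.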

Physiological Targets database.

A second database, The Database of Vasopressin Regulation in Kidney (Physiological Targets), was curated from the same PubMed search described above plus review articles (20, 21). This database includes measurements reported for effects of vasopressin or vasopressin analogs on physiological variables at a cellular or epithelial level. These observations, gleaned from 49 publications, are heavily weighted toward measurements made in the pregenomic era, i.e., before 1990, using in vitro methods. The year of publication ranged from 1966 to 2012. This database also uses a subject-verb-object syntax to represent the observations. The elements of the syntactical triplets used are the subject (either AVP or DDAVP), the verb or action (either decreased or increased), and the object (the physiological measure).

mRNA Targets database.

A third database, The Database of Vasopressin Regulation in Kidney (mRNA Targets), was curated from a PubMed search (search terms: mRNA AND vasopressin AND kidney) with inclusion of data from three large-scale transcriptomic studies (5, 17, 28). This database includes measurements of effects of vasopressin or vasopressin analogs on specific mRNA species in renal epithelial tissues. These observations were obtained from 21 papers covering the years 1995–2011. This database also uses a subject-verb-object syntax to represent the observations. The elements of the syntactical triplets used are the subject (either AVP or DDAVP), the verb or action (either increases, does not change, or decreases), and the object (the official gene symbol corresponding to the regulated mRNA).
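Because the two ancillary databases restrict their verbs to small fixed sets, their entries can be checked mechanically. A minimal validator might look like the following; the database keys `"physiological"` and `"mrna"` and the function name are hypothetical. Note that the stated vocabularies differ in tense ("increased"/"decreased" versus "increases"/"does not change"/"decreases").

```python
# Verb vocabularies stated in the Methods. The Protein Targets database uses
# open-ended verb phrases, so only the two ancillary databases are checked.
VERBS = {
    "physiological": frozenset({"increased", "decreased"}),
    "mrna": frozenset({"increases", "does not change", "decreases"}),
}
SUBJECTS = frozenset({"AVP", "DDAVP"})


def validate(database: str, subject: str, verb: str) -> list:
    """Return a list of problems for one entry; an empty list means it passes."""
    problems = []
    if subject not in SUBJECTS:
        problems.append(f"subject {subject!r} is not AVP or DDAVP")
    allowed = VERBS.get(database)
    if allowed is not None and verb not in allowed:
        problems.append(f"verb {verb!r} is not allowed in the {database} database")
    return problems
```

Returning a list rather than raising lets a curator see every problem in an entry at once.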

Protein kinase network.

Construction of a protein kinase network from the data utilized Pfam (http://pfam.sanger.ac.uk/search); the Uniprot-SwissProt Protein Knowledge Base (http://www.uniprot.org/docs/pkinfam); KinBase (http://kinase.com/); ABE (http://helixweb.nih.gov/ESBL/ABE); BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download); STRING (http://string-db.org/); Medusa (http://sourceforge.net/projects/graph-medusa/files/); and Cytoscape (http://www.cytoscape.org/) as described in results.

RESULTS

Protein Targets database.

A screenshot image of the Database of Vasopressin Regulation in Kidney (Protein Targets) at https://helixweb.nih.gov/ESBL/Database/Meta_Analysis/VasoProtein.html is shown in Fig. 1. The data are displayed in a tabular format. The first three columns (red box, A) show the subject, verb (action), and object of each syntactical triplet, respectively. The objective of the database is to enumerate the protein targets of vasopressin, extracted from the scientific literature. Accordingly, all subject terms are either AVP or DDAVP. The verb column contains action phrases. The objects are the targeted proteins, designated by their official gene symbols. The gene symbols are hyperlinked to appropriate RefSeq records. The elements together form an English sentence when read from left to right, including the additional columns that provide ancillary information. The last column indicates the data-source publication, which is hyperlinked to the appropriate PubMed entry.

Fig. 1.


Screenshot image of the Database of Vasopressin Regulation in Kidney (Protein Targets) shows its major features. A: red box indicates locations of the subject, verb (action), and object of each syntactical triplet. B: user may filter data using dropdown list of filters. C: user can download data in space-delimited file that can be opened with spreadsheet programs. D: user can link to two ancillary databases. The image was taken from the display at the URL: https://helixweb.nih.gov/ESBL/Database/Meta_Analysis/VasoProtein.html.

The database has other features, including a dropdown list of filters (Fig. 1, red B) that allow the user to isolate subsets of the data. Filters are based on verbs (actions). Users can download an expanded version of the database in the form of a space-delimited file that can be opened with spreadsheet programs (Fig. 1, red C). The downloadable spreadsheet includes additional experimental details. The red D in Fig. 1 indicates a link to two other databases called Database of Vasopressin Regulation in Kidney (Physiological Targets) and Database of Vasopressin Regulation in Kidney (mRNA Targets). In addition, we have provided a link that allows users to submit new information to the knowledge base curators for inclusion in any of the three databases.
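The verb-based dropdown filter amounts to grouping rows by their action phrase. A minimal sketch, assuming rows are dictionaries with hypothetical `verb` and `object` keys; the sample rows are the Pak2 and Pak3 entries of Table 1, with subjects omitted.

```python
from collections import defaultdict

# Sample rows: the Pak2 and Pak3 entries of Table 1 (subjects omitted).
ROWS = [
    {"verb": "Decreases phosphorylation of", "object": "Pak2"},
    {"verb": "Increases abundance of", "object": "Pak2"},
    {"verb": "Increases translation rate of", "object": "Pak2"},
    {"verb": "Increases abundance of", "object": "Pak3"},
]


def build_verb_index(rows):
    """Group rows by lower-cased verb phrase, as a dropdown filter would."""
    index = defaultdict(list)
    for row in rows:
        index[row["verb"].lower()].append(row)
    return dict(index)


def filter_by_verb(rows, verb):
    """Return only the rows whose action matches the chosen verb phrase."""
    return build_verb_index(rows).get(verb.lower(), [])


print(sorted(build_verb_index(ROWS)))
# ['decreases phosphorylation of', 'increases abundance of', 'increases translation rate of']
```

The sorted keys of the index are exactly the options such a dropdown would list.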

Figure 2 summarizes the Database of Vasopressin Regulation in Kidney (Protein Targets). Figure 2A shows the frequency of each verb (action). The most frequent entries describe vasopressin's effects on protein abundance and phosphorylation. Figure 2B shows the distribution of objects (official gene symbols); most appear only once. The most frequent target of vasopressin regulation, with 43 entries, is aquaporin-2 (Aqp2) (Fig. 2C). Other frequent entries are Ahnak (a PDZ domain-containing nuclear protein); Ctnnb1 (β-catenin, a multifunctional protein that translocates to the nucleus in response to vasopressin) (30); Tns (tensin, a protein-tyrosine phosphatase-like scaffold protein with SH2, PH-like, and C2 domains); Lrba (a single-pass integral membrane protein with multiple WD40 repeats, a BEACH domain, and an armadillo domain); and Slc14a2 (the vasopressin-regulated urea transporter of the renal collecting duct) (31). Figure 2D presents the frequency of experimental systems (cell types) appearing in the database. Most entries came from studies using cultured mouse cells (mpkCCD) or native rat inner medullary collecting ducts.
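Summaries like those in Fig. 2, A and B, reduce to counting a column of the downloadable spreadsheet. A minimal sketch using Python's `Counter`; the function name is hypothetical, and the sample list is only the object column of a few Table 1 rows, not the full database.

```python
from collections import Counter


def summarize(objects):
    """Return per-object counts and the distribution of those counts.

    `distribution[k]` is the number of distinct objects that occur exactly k times.
    """
    per_object = Counter(objects)
    distribution = Counter(per_object.values())
    return per_object, distribution


# Sample: object column for the Akt1, Grk4, Prkcd and Mylk rows of Table 1.
objects = ["Akt1", "Akt1", "Grk4", "Prkcd", "Mylk", "Mylk", "Mylk", "Mylk"]
per_object, distribution = summarize(objects)
print(per_object.most_common(1))     # [('Mylk', 4)]
print(sorted(distribution.items()))  # [(1, 2), (2, 1), (4, 1)]
```

Applying the same function to the verb column would give Fig. 2A-style tallies; the `distribution[1]` entry corresponds to objects that "appear only once."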

Fig. 2.


Summary of elements of the Database of Vasopressin Regulation in Kidney (Protein Targets). A: relative frequencies of specified actions (verb phrases). B: frequency distribution of objects, i.e., protein targets. C: most frequent objects, i.e., protein targets (represented by official gene symbols). D: relative frequencies of experimental systems used.

Physiological Targets database.

The Database of Vasopressin Regulation in Kidney (Physiological Targets) (https://helixweb.nih.gov/ESBL/Database/Meta_Analysis/VasoPhysio.html) is organized in a format similar to the Protein Targets database. The syntactic triplet structure is maintained. The objects in this database are physiological functions rather than official gene symbols. Figure 3 describes characteristics of the database. The pie chart in Fig. 3A presents the frequency of physiological targets represented. Permeability and intracellular concentrations of signaling molecules are the most frequently represented object types in the database. Figure 3B shows the frequency of experimental systems. Studies using isolated, perfused tubules from rats or mice have the greatest frequency.

Fig. 3.


Summary of elements of the Database of Vasopressin Regulation in Kidney (Physiological Targets). A: relative frequencies of specified objects (physiological targets). B: distribution of experimental systems.

mRNA Targets database.

The Database of Vasopressin Regulation in Kidney (mRNA Targets) (https://helixweb.nih.gov/ESBL/Database/Meta_Analysis/VasomRNA.html) is organized in a format similar to the Protein Targets and Physiological Targets databases with the syntactic triplet structure maintained. The objects in this database are official gene symbols.

Protein kinase network.

In the remainder of this paper, we present an example of how this knowledge base can be used. We extracted all protein kinases from the Database of Vasopressin Regulation in Kidney (Protein Targets) using Gene Ontology Molecular Function terms, the Swiss-Prot Protein Knowledgebase (http://www.uniprot.org/docs/pkinfam), individual Swiss-Prot protein records, and the listing of mammalian protein kinases on KinBase (http://kinase.com/kinbase/). These vasopressin-regulated protein kinases (all in the collecting duct) are shown in Table 1. There were 48 different entries for 36 distinct protein kinases. The vasopressin-regulated kinases in this table were members of seven different protein kinase families (22), viz., the AGC, CAMK, CK1, CMGC, STE, TKL, and TK kinases, as well as three kinases classified as “Other.” A kinase's family classification provides information about its likely substrate specificity. AGC and CAMK members are usually basophilic serine/threonine kinases. CK1 family members are acidophilic serine/threonine kinases. CMGC members are generally proline-directed serine/threonine kinases, and TK members are tyrosine kinases. The STE and TKL kinases tend to have more variable specificities. The last column of Table 1 shows predicted changes in whole cell protein kinase activity induced by vasopressin, as inferred from the measured response and the kinase's annotation. Phosphorylation changes known to alter kinase activity were identified from PhosphoSitePlus (http://www.phosphosite.org/) and/or individual Swiss-Prot records. In addition, we assumed that changes in the total abundance of a given protein kinase are associated with changes in whole cell kinase activity, as for Grk4, Mylk, Pak2, and Pak3.
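The inference rule behind the last column of Table 1 can be sketched as follows. This is our reading of the rule, not the authors' code. Abundance changes are assumed to move activity in the same direction, phosphorylation changes are interpreted only at sites annotated as activating or inhibitory, and everything else (including translation-rate changes) is reported as unknown. The two-entry annotation table is illustrative: Akt1 Ser-473 is a well-known activating site, and Raf1 Ser-259 is the inhibitory site discussed later in the text. The Ca2+/calmodulin-independence annotations for Camk2b and Camk2d are not modeled.

```python
from typing import Optional

INCREASES = "Increases total cellular activity"
DECREASES = "Decreases total cellular activity"
UNKNOWN = "Unknown"

# Illustrative site annotations: +1 if phosphorylation at the site activates
# the kinase, -1 if it inhibits it. A real table would be filled from
# PhosphoSitePlus and Swiss-Prot records.
SITE_EFFECT = {
    ("Akt1", "S473"): +1,
    ("Raf1", "S259"): -1,
}


def _direction(verb: str) -> int:
    """+1 for 'increases ...', -1 for 'decreases ...', 0 otherwise."""
    v = verb.lower()
    if v.startswith("increases"):
        return 1
    if v.startswith("decreases"):
        return -1
    return 0


def infer_activity(verb: str, gene: str, site: Optional[str] = None) -> str:
    """Predict the change in whole cell kinase activity for one entry."""
    sign = _direction(verb)
    v = verb.lower()
    if sign == 0:
        return UNKNOWN
    if "abundance" in v:
        effect = sign  # assumed: more (or less) kinase means more (or less) activity
    elif "phosphorylation" in v and (gene, site) in SITE_EFFECT:
        effect = sign * SITE_EFFECT[(gene, site)]
    else:
        return UNKNOWN  # e.g. unannotated sites or translation-rate changes
    return INCREASES if effect > 0 else DECREASES
```

For instance, `infer_activity("Increases phosphorylation of", "Raf1", "S259")` returns `"Decreases total cellular activity"`, matching the Raf1 rows of Table 1.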

Table 1.

Protein kinases found in Database of Vasopressin Regulation in Kidney (Protein Targets)

Vasopressin Action | Gene Symbol | Phospho Site (if any) | Experimental System | Kinase Class | PMID No. | Vasopressin Effect on Activity
Increases phosphorylation of | Akt1 | At S473 | In rat IMCD suspensions | AGC | 18667481 | Increases total cellular activity
Increases phosphorylation of | Akt1 | At T308 | In rat IMCD suspensions | AGC | 18667481 | Increases total cellular activity
Decreases phosphorylation of | Cdc42bpb | At S970 | In rat IMCD | AGC | 22108457 | Unknown
Increases phosphorylation of | Cdk11b | At T238 | In rat IMCD | AGC | 22108457 | Unknown
Increases abundance of | Grk4 |  | In rat IMCD | AGC | 14532164 | Increases total cellular activity
Increases translation rate of | Prkcd |  | In mouse mpkCCD | AGC | 24029424 | Unknown
Increases phosphorylation of | Camk2b | At T287 | In mouse mpkCCD | CAMK | 20139300 | Allows kinase to be Ca2+/calmodulin independent
Increases phosphorylation of | Camk2d | At T287 | In rat IMCD | CAMK | 20139300 | Allows kinase to be Ca2+/calmodulin independent
Decreases abundance of | Mylk |  | In mouse mpkCCD | CAMK | 20940332 | Decreases total cellular activity
Decreases phosphorylation of | Mylk | At S364 | In nuclear pellet of mouse mpkCCD | CAMK | 22992673 | Unknown
Decreases phosphorylation of | Mylk | At S364 | In nuclear extract of mouse mpkCCD | CAMK | 22992673 | Unknown
Decreases translation rate of | Mylk |  | In mouse mpkCCD | CAMK | 24029424 | Unknown
Decreases phosphorylation of | Csnk1e | At T362 or S363 | In rat IMCD | CK1 | 22108457 | Unknown
Increases phosphorylation of | Cdk18 | At S66 | In rat IMCD | CMGC | 20075062 | Unknown
Decreases phosphorylation of | Gsk3a | At Y279 | In rat IMCD | CMGC | 22108457 | Decreases total cellular activity
Decreases phosphorylation of | Mapk1 | At T183 and at Y185 | In rat IMCD suspensions | CMGC | 18667481 | Decreases total cellular activity
Decreases phosphorylation of | Mapk1 | At T183 and at Y185 | In mouse mpkCCD | CMGC | 20139300 | Decreases total cellular activity
Decreases phosphorylation of | Mapk14 | At Y182 | In rat IMCD | CMGC | 22108457 | Decreases total cellular activity
Decreases phosphorylation of | Mapk3 | At T203 and at Y205 | In rat IMCD suspensions | CMGC | 18667481 | Decreases total cellular activity
Decreases phosphorylation of | Mapk3 | At T203 and at Y205 | In mouse mpkCCD | CMGC | 20139300 | Decreases total cellular activity
Decreases phosphorylation of | Mapk8 | At T183 and at Y185 | In mouse mpkCCD | CMGC | 20139300 | Unknown
Decreases phosphorylation of | Mapk9 | At T183 and at Y185 | In mouse mpkCCD | CMGC | 20139300 | Unknown
Decreases phosphorylation of | Mapk9 | At T183 and at Y185 | In rat IMCD | CMGC | 22108457 | Decreases total cellular activity
Decreases phosphorylation of | Aak1 | At T618 or at S622 | In mouse mpkCCD | Other | 20139300 | Unknown
Increases phosphorylation of | Camkk2 | At S494 | In rat IMCD | Other | 22108457 | Unknown
Increases phosphorylation of | Ulk3 | At S219 | In rat IMCD | Other | 22108457 | Unknown
Decreases phosphorylation of | Map2k1 | At S218 and at S222 | In rat IMCD suspensions | STE | 18667481 | Decreases total cellular activity
Decreases phosphorylation of | Map2k2 | At S222 and at S226 | In rat IMCD suspensions | STE | 18667481 | Decreases total cellular activity
Decreases phosphorylation of | Map2k3 | At S218 | In rat IMCD | STE | 22108457 | Decreases total cellular activity
Decreases phosphorylation of | Map2k5 | At S311 | In rat IMCD | STE | 22108457 | Decreases total cellular activity
Decreases phosphorylation of | Map2k6 | At S207 | In rat IMCD | STE | 22108457 | Decreases total cellular activity
Decreases phosphorylation of | Map3k2 | At S347 or at S349 | In rat IMCD | STE | 22108457 | Unknown
Increases phosphorylation of | Map3k7 | At S439 | In rat IMCD | STE | 22108457 | Unknown
Increases phosphorylation of | Map4k5 | At T400 | In mouse mpkCCD | STE | 20139300 | Unknown
Increases phosphorylation of | Mink1 | At S797 | In rat IMCD | STE | 22108457 | Unknown
Increases phosphorylation of | Mink1 | At S714 | In rat IMCD | STE | 22108457 | Unknown
Decreases phosphorylation of | Pak2 | At S55 | In rat IMCD | STE | 22108457 | Unknown
Increases abundance of | Pak2 |  | In mouse mpkCCD | STE | 20940332 | Increases total cellular activity
Increases translation rate of | Pak2 |  | In mouse mpkCCD | STE | 24029424 | Unknown
Increases abundance of | Pak3 |  | In mouse mpkCCD | STE | 20940332 | Increases total cellular activity
Decreases phosphorylation of | Tnik | At S680 | In rat IMCD | STE | 22108457 | Unknown
Decreases phosphorylation of | Hck | At Y409 | In rat IMCD | TK | 22108457 | Decreases total cellular activity
Increases phosphorylation of | Lmtk2 | At S704 | In rat IMCD | TK | 22108457 | Unknown
Increases phosphorylation of | Ptk2 | At S913 | In rat IMCD | TK | 22108457 | Unknown
Increases phosphorylation of | Ptk2 | At T575 | In rat IMCD | TK | 22108457 | Unknown
Decreases phosphorylation of | Araf | At S255 | In rat IMCD | TKL | 22108457 | Decreases total cellular activity
Increases phosphorylation of | Raf1 | At S259 | In rat IMCD suspensions | TKL | 18667481 | Decreases total cellular activity
Increases phosphorylation of | Raf1 | At S29 | In rat IMCD | TKL | 22108457 | Decreases total cellular activity

PMID, PubMed identification number; CCD, cortical collecting duct; IMCD, inner medullary collecting duct.
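As a quick arithmetic check, the counts quoted for Table 1 (48 entries, 36 distinct kinases, seven named families plus “Other”) can be recomputed from the gene symbols as transcribed in the table. The dictionary below simply restates the Gene Symbol and Kinase Class columns; the `tally` helper is hypothetical.

```python
# Gene symbols for the 48 rows of Table 1, grouped by kinase class.
TABLE1 = {
    "AGC": "Akt1 Akt1 Cdc42bpb Cdk11b Grk4 Prkcd",
    "CAMK": "Camk2b Camk2d Mylk Mylk Mylk Mylk",
    "CK1": "Csnk1e",
    "CMGC": "Cdk18 Gsk3a Mapk1 Mapk1 Mapk14 Mapk3 Mapk3 Mapk8 Mapk9 Mapk9",
    "Other": "Aak1 Camkk2 Ulk3",
    "STE": "Map2k1 Map2k2 Map2k3 Map2k5 Map2k6 Map3k2 Map3k7 Map4k5 "
           "Mink1 Mink1 Pak2 Pak2 Pak2 Pak3 Tnik",
    "TK": "Hck Lmtk2 Ptk2 Ptk2",
    "TKL": "Araf Raf1 Raf1",
}


def tally(table):
    """Return (entries, distinct kinases, distinct kinases per class)."""
    rows = {cls: genes.split() for cls, genes in table.items()}
    n_entries = sum(len(genes) for genes in rows.values())
    all_genes = set().union(*rows.values())
    per_class = {cls: len(set(genes)) for cls, genes in rows.items()}
    return n_entries, len(all_genes), per_class


entries, distinct, per_class = tally(TABLE1)
print(entries, distinct)   # 48 36
print(per_class["Other"])  # 3
```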

We used the list of protein kinases from Table 1, plus several other proteins previously documented to play important roles in vasopressin signaling, to generate a relational network using STRING (http://string-db.org/) (Fig. 4). For the STRING analysis, we used human gene symbols rather than rat or mouse gene symbols to take advantage of the richer annotation in many human records. The additional proteins were as follows: Avpr2 (the vasopressin V2 receptor), Aqp2 (aquaporin-2), Prkar1a (a protein kinase A regulatory subunit), Prkacb (a protein kinase A catalytic subunit), RhoA (a small GTP-binding protein), CDC42 (a small GTP-binding protein similar to RhoA), Rock2 (the most abundant Rho-associated protein kinase in the collecting duct), Gnas (a heterotrimeric G protein α-subunit that activates adenylyl cyclases), Gna11 (a heterotrimeric G protein α-subunit that activates signaling through phospholipase C β isoforms), Gna12 (a heterotrimeric G protein α-subunit that activates RhoA-dependent signaling), Ctnnb1 (β-catenin), Gsk3β (glycogen synthase kinase-3β), and Actb (β-actin). These additions are based on observations from several laboratories (3, 14, 16, 23, 26, 29, 30, 39). Some kinases identified in our database had no connections within the generated kinase network (Cdk11b, Cdk18, Ulk3, Map4k, Lmtk2, Aak1, Mink1, Map4k5, and Camkk2) and thus do not appear in Fig. 4. Where multiple genes code for similar proteins (e.g., Gna12 and Gna13), we used the one expressed at the highest level in transcriptomic studies (38, 40). The resulting network was edited in Cytoscape for display (Fig. 4). In this diagram, the vasopressin V2 receptor (Avpr2) is at the top and the regulated processes are at the bottom, so that the hypothesized flow of signaling information is from top to bottom. The different kinase classes are indicated by node color (see the figure legend), and the direction of change in total cellular kinase activity is indicated by the color of the node border (blue = increased; red = decreased).
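STRING returns an undirected association network; the top-to-bottom arrangement in Fig. 4 came from manual editing in Cytoscape. One way to automate a first draft of such an arrangement is longest-path layering of a hand-directed graph, sketched below with Kahn's topological sort. The edge list is a small hypothetical example built from relationships mentioned in the text (e.g., protein kinase A acting on Raf1 and on aquaporin-2, and Gnas standing in for cAMP-mediated activation); it is not STRING output and not the Fig. 4 layout. Nodes with no edges are reported separately, mirroring how unconnected kinases were left out of Fig. 4.

```python
from collections import defaultdict, deque


def layer_nodes(nodes, edges):
    """Longest-path layering of a directed acyclic signaling graph.

    Returns (layer, unconnected): layer 0 is the top of the drawing, and
    nodes that appear in no edge are listed separately and not drawn.
    """
    connected = {n for edge in edges for n in edge}
    succ = defaultdict(list)
    indeg = {n: 0 for n in connected}
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
    layer = {n: 0 for n in connected}
    queue = deque(n for n in connected if indeg[n] == 0)
    processed = 0
    while queue:
        node = queue.popleft()  # all predecessors done, so its layer is final
        processed += 1
        for nxt in succ[node]:
            layer[nxt] = max(layer[nxt], layer[node] + 1)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if processed != len(connected):
        raise ValueError("graph contains a cycle; cannot layer top to bottom")
    unconnected = sorted(n for n in nodes if n not in connected)
    return layer, unconnected


# Hypothetical, hand-directed example (not STRING output).
NODES = ["Avpr2", "Gnas", "Prkacb", "Raf1", "Map2k1", "Mapk1", "Akt1", "Aqp2", "Ulk3"]
EDGES = [("Avpr2", "Gnas"), ("Gnas", "Prkacb"), ("Prkacb", "Raf1"),
         ("Raf1", "Map2k1"), ("Map2k1", "Mapk1"), ("Mapk1", "Aqp2"),
         ("Prkacb", "Aqp2"), ("Akt1", "Aqp2")]
layers, dropped = layer_nodes(NODES, EDGES)
print(layers["Aqp2"], dropped)  # 6 ['Ulk3']
```

Placing each node one layer below its deepest predecessor keeps the receptor at the top and the regulated target at the bottom, which is the reading order described for Fig. 4.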

Fig. 4.


Protein-kinase network for the action of vasopressin in collecting duct cells was constructed from data in Table 1 using STRING (http://string-db.org/), followed by manual editing using Cytoscape (http://www.cytoscape.org/). Several additional proteins were added (see text) at the level of the STRING input to incorporate well-established knowledge about vasopressin signaling, viz., Avpr2, Aqp2, Prkar1a, Prkacb, RhoA, CDC42, Rock2, Gnas, Gna11, Gna12, Ctnnb1, Gsk3β, and Actb. Node colors indicate protein kinase family (see figure key). Node borders designate whether the protein kinase was inferred to increase (blue) or decrease (red) in activity. AQP2, aquaporin-2.

Several MAP kinases [Mapk1 (ERK2), Mapk3 (ERK1), Mapk8 (JNK1), Mapk9 (JNK2), and Mapk14 (p38α)] are represented in Fig. 4, and all show decreases in activity, accounting ultimately for the vasopressin-induced decrease in aquaporin-2 phosphorylation at Ser-261 (13, 15, 24). Notably, Ser-259 of Raf1, at the head of the MAP kinase cascade, can be phosphorylated by protein kinase A (Prkacb) (9). Because phosphorylation at this site inhibits kinase activity (1), this observation suggests a mechanism by which vasopressin could inhibit MAP kinases through protein kinase A activation. We found no direct evidence that protein kinase A is activated in collecting duct cells in response to vasopressin. However, this seems a reasonable assumption, given that vasopressin increases cAMP levels in collecting duct cells (Physiological Targets database) and that cAMP activates protein kinase A isoforms by binding to their inhibitory regulatory subunits (here, Prkar1a). Protein kinase A has been shown to be capable of phosphorylating aquaporin-2 at Ser-256 (10, 16), a modification necessary for its trafficking to the apical plasma membrane. Akt1, another AGC-family kinase shown in Fig. 4, is also capable of phosphorylating aquaporin-2 at Ser-256 (10).
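The argument above is a chain of signed steps, and the net predicted effect on any downstream node is the product of the signs along the path. The sketch below encodes a simplified reading of that chain; it treats each step as purely activating (+1) or inhibiting (-1), the node labels are informal, and the final MAP kinase to aquaporin-2 Ser-261 link reflects the interpretation in the text rather than a curated entry.

```python
from math import prod

# (source, target, sign): +1 activates or increases, -1 inhibits or decreases.
# A simplified reading of the hypothesized pathway, not a measured result.
PATH = [
    ("vasopressin", "cAMP", +1),       # cAMP rises (Physiological Targets)
    ("cAMP", "PKA", +1),               # cAMP frees catalytic subunits from Prkar1a
    ("PKA", "Raf1", -1),               # phosphorylation at Ser-259 inhibits Raf1
    ("Raf1", "Map2k1/2", +1),          # Raf phosphorylates and activates MEK
    ("Map2k1/2", "Mapk1/3", +1),       # MEK activates ERK
    ("Mapk1/3", "AQP2 pSer-261", +1),  # assumed MAP kinase action on AQP2 Ser-261
]


def net_sign(path) -> int:
    """Multiply edge signs along a contiguous path; -1 means a net decrease."""
    for (_, tgt, _), (src, _, _) in zip(path, path[1:]):
        if tgt != src:
            raise ValueError(f"path breaks between {tgt!r} and {src!r}")
    return prod(sign for _, _, sign in path)


print(net_sign(PATH[:3]))  # -1 (Raf1 activity predicted to fall)
print(net_sign(PATH))      # -1 (AQP2 Ser-261 phosphorylation predicted to fall)
```

A single inhibitory step (PKA on Raf1) is enough to flip the sign of everything downstream, which is why the model predicts decreased MAP kinase activity despite increased protein kinase A activity.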

Figure 4 also includes several calcium-dependent signaling elements. Vasopressin mobilizes intracellular calcium in the collecting duct through the V2 receptor (11, 32, 34), but the mechanism of the increase remains uncertain. Although the type 1 ryanodine receptor (Ryr1) has been proposed to be involved in this calcium mobilization (7), its expression level in the rat inner medullary collecting duct and mouse mpkCCD cells is at the noise level on microarrays (38, 40). In contrast, the high expression levels of the inositol 1,4,5-trisphosphate receptors (Itpr1, Itpr2, and Itpr3) suggest a role for phospholipase C-β-mediated effects, presumably via the heterotrimeric G protein α-subunit Gna11, which is much more highly expressed in collecting duct cells than the alternative, Gnaq (38, 40). The calcium mobilization is thought to control multiple calmodulin-dependent processes in the collecting duct cell, including activation of three members of the CAMK family, namely, myosin light-chain kinase (Mylk), calcium/calmodulin-dependent protein kinase IIβ (Camk2b), and calcium/calmodulin-dependent protein kinase IIδ (Camk2d), all represented in Fig. 4. Myosin light-chain kinase activity has been shown to be necessary for trafficking of aquaporin-2-containing vesicles to the plasma membrane in collecting duct cells, through regulation of blebbistatin-sensitive conventional myosins (myosin II-A, II-B, and/or II-C) (6, 8).

It has also been demonstrated that vasopressin depolymerizes apical cortical F-actin in the inner medullary collecting duct (33) and that this occurs through RhoA inactivation (18, 35). RhoA inactivation has been proposed to result from protein kinase A-mediated phosphorylation of RhoA at Ser-188, an inhibitory site (36). The effect on actin polymerization is hypothesized to be due to decreased Rock2 activity, as indicated on the network diagram, although a role for Cdc42bpb, a CDC42-activated protein kinase, cannot be ruled out.

Note that several of the kinases in the network shown in Fig. 4 have not been studied with regard to their roles in vasopressin signaling in the collecting duct. Thus archiving data and building network models from them can help generate hypotheses for future reductionist studies of vasopressin action in the collecting duct.

DISCUSSION

This paper describes a knowledge base consisting of three databases that report renal epithelial actions of vasopressin. The first database, the Database of Vasopressin Regulation in Kidney (Protein Targets), reports vasopressin-induced changes in specific proteins. The second database, the Database of Vasopressin Regulation in Kidney (Physiological Targets), lists vasopressin-induced changes in physiological measures in renal epithelia. The third database, the Database of Vasopressin Regulation in Kidney (mRNA Targets), lists vasopressin-induced changes in mRNA abundances for specific genes. These databases are formatted to be read either by humans or by computers (25, 27); the latter is achieved through use of standard sentence-parsing algorithms (for example, http://nlp.stanford.edu:8080/parser/). Specifically, we curated the databases based on a syntactical triplet structure. Each data entry reads from left to right: <subject><verb phrase><object><prepositional phrases><parenthetical information>. This structure is built around simplified conventional English grammar. This allows readers or computers to extract the information from the database by reading entries from left to right without knowledge of the column-header text.
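Because each entry has a fixed left-to-right order, and the subject and object come from controlled vocabularies, a simple rule-based splitter can often recover the elements without a full statistical parser. The sketch below is such a splitter, not the Stanford parser cited above; the function name and the preposition list are assumptions. A known limitation: a compound phrase such as "at T183 and at Y185" would be split into two phrases.

```python
import re

SUBJECTS = ("AVP", "DDAVP")
PREPOSITIONS = ("at", "in")  # assumed set of phrase-opening words


def parse_entry(text: str, gene_symbols: set) -> dict:
    """Split '<subject> <verb phrase> <object> <phrases> (<parenthetical>)'."""
    text = text.strip().rstrip(".")
    parenthetical = re.findall(r"\(([^)]*)\)", text)
    text = re.sub(r"\s*\([^)]*\)", "", text)
    tokens = text.split()
    if not tokens or tokens[0] not in SUBJECTS:
        raise ValueError("entry must start with AVP or DDAVP")
    # The object is the first token drawn from the gene-symbol vocabulary.
    obj_idx = next((i for i in range(1, len(tokens)) if tokens[i] in gene_symbols), None)
    if obj_idx is None or obj_idx == 1:
        raise ValueError("no verb phrase followed by a known gene symbol")
    phrases = []
    for tok in tokens[obj_idx + 1:]:
        # A preposition starts a new phrase; other words extend the current one.
        if tok.lower() in PREPOSITIONS or not phrases:
            phrases.append([tok])
        else:
            phrases[-1].append(tok)
    return {
        "subject": tokens[0],
        "verb": " ".join(tokens[1:obj_idx]),
        "object": tokens[obj_idx],
        "phrases": [" ".join(p) for p in phrases],
        "parenthetical": parenthetical,
    }
```

Applied to "AVP increases phosphorylation of Akt1 at S473 in rat IMCD suspensions (PMID 18667481)." with the vocabulary `{"Akt1"}` (subject chosen only for illustration), it yields the verb phrase "increases phosphorylation of", the object "Akt1", the phrases "at S473" and "in rat IMCD suspensions", and the parenthetical "PMID 18667481".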

Automated extraction of information from text depends on parser algorithms that use syntactical context, along with other factors, to identify the relationships among individual words (12). The design of our databases is based on the idea that parser programs can extract and categorize information from databases structured in a simple, unambiguous sentence format. Our databases used data from a variety of biological publications. In theory, computers could extract similar information from the original publications. However, the syntax and vocabulary used in most publications tend to be highly variable and complex. This complexity impedes automated information extraction and, in our experience, can lead to frequent misidentification of word relationships. By contrast, the simple sentence structure of these databases should allow text-mining programs to retrieve the information accurately.

A possible objective for the future would be to incorporate the above concept in the publication process by including annotations expressing the major conclusions of a given paper in a simplified format similar to that used in our databases. Ideally, these annotations would be generated by the authors themselves and reviewed along with the paper. This would allow databases, like the ones in this paper, to be automatically updated from the biological literature as it accrues. It would also allow researchers some control in how their findings are assimilated into the general body of knowledge.

Another important feature of the protein database and the mRNA database is the use of official gene symbols to represent objects. Protein nomenclature in publications is difficult to decipher because many gene products have multiple names, and in some cases different gene products share the same name. The use of official gene symbols almost completely removes this redundancy and ambiguity. In other words, the list of official gene symbols is a useful controlled vocabulary that greatly reduces the risk of extracting false conclusions.
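Mapping the many names found in publications onto official gene symbols is essentially a lookup table that must refuse ambiguous names. The sketch below is illustrative: the alias table contains only names that appear in this paper's own text, and the helper names are hypothetical. A real table would be built from official nomenclature resources.

```python
# Illustrative alias table; every alias here appears in this paper's text.
ALIASES = {
    "Aqp2": ["aquaporin-2", "AQP2"],
    "Mapk1": ["ERK2"],
    "Mapk3": ["ERK1"],
    "Mapk8": ["JNK1"],
    "Mapk9": ["JNK2"],
    "Mapk14": ["p38α"],
    "Ctnnb1": ["β-catenin"],
}


def build_lookup(aliases: dict) -> dict:
    """Map case-folded names to official symbols, refusing ambiguous names."""
    lookup = {}
    for symbol, names in aliases.items():
        for name in [symbol, *names]:
            key = name.casefold()
            if lookup.get(key, symbol) != symbol:
                raise ValueError(f"{name!r} maps to both {lookup[key]} and {symbol}")
            lookup[key] = symbol
    return lookup


def normalize(name: str, lookup: dict):
    """Return the official gene symbol for a name, or None if unknown."""
    return lookup.get(name.strip().casefold())
```

Case-folding conveniently merges the human (AQP2) and rodent (Aqp2) forms used in this paper, but a cross-species table would need to keep them apart; the ambiguity check is what enforces the one-name, one-symbol property of a controlled vocabulary.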

In all three databases, the verb (action) terms are drawn from an open set of English verbs or verb-containing phrases, and conventional English vocabulary generally suffices for them. Although some of these phrases are not technically verbs in the conventional sense, this liberal definition helps keep the object terms constrained to a defined list of official gene symbols. In practice, the way the information is presented makes questions such as “What constitutes a verb?” unimportant: as long as an entry can be read from left to right as a conventional English sentence, the information it encodes is likely to be conveyed correctly to either an automated parser or a human reader.
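A minimal sketch of how a parser could exploit this left-to-right structure, assuming each entry is a single line of the form subject, verb phrase, object, with the subject drawn from the terms used in these databases (AVP, DDAVP) and the object drawn from a list of official gene symbols. The small gene-symbol set, the example entry, and the function name `parse_entry` are hypothetical, and real entries may carry additional fields not modeled here. Because both ends of the sentence are controlled, the open-ended verb phrase is simply whatever lies between them.

```python
"""Split a sentence-format database entry into subject, action, and object."""

from typing import NamedTuple

# Assumed controlled vocabularies: the subject set follows the text (AVP, DDAVP);
# the gene-symbol set is a small hypothetical stand-in for the full official list.
SUBJECTS = {"AVP", "DDAVP"}
GENE_SYMBOLS = {"Aqp2", "Scnn1b", "Slc14a2"}


class Entry(NamedTuple):
    subject: str
    action: str
    obj: str


def parse_entry(sentence: str) -> Entry:
    """Parse '<subject> <verb phrase> <gene symbol>', read left to right.

    The verb phrase comes from an open set, so it is taken to be everything
    between the controlled subject term and the controlled object term.
    """
    tokens = sentence.split()
    if len(tokens) < 3:
        raise ValueError(f"too short to be an entry: {sentence!r}")
    subject, obj = tokens[0], tokens[-1]
    if subject not in SUBJECTS:
        raise ValueError(f"unknown subject term: {subject!r}")
    if obj not in GENE_SYMBOLS:
        raise ValueError(f"object is not an official gene symbol: {obj!r}")
    return Entry(subject, " ".join(tokens[1:-1]), obj)


if __name__ == "__main__":
    # Hypothetical entry used only to exercise the parser.
    print(parse_entry("AVP increases phosphorylation of Aqp2"))
```

No grammatical analysis of the verb phrase is needed, which is the practical payoff of not worrying about what technically counts as a verb.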

In all three databases, the subject column is currently limited to two terms, AVP (arginine vasopressin) and DDAVP (desmopressin, a synthetic vasopressin analog). This practical choice allowed us to limit the initial curation task to ∼1,700 entries. Although the example presented shows how the current database can aid physiological modeling of a system such as the renal collecting duct cell, the literature contains much more information relevant to how vasopressin regulates water transport in the collecting duct than is represented in the current database. Future expansion of the database to include additional subject terms, such as cAMP and forskolin, therefore seems warranted.

Beyond facilitating automated data extraction, the organization of our databases also helps human users understand the information conveyed. The column headers can even be viewed as superfluous, since a reader can readily deduce the meaning of each element from its syntactic context. A human user can thus read an entry from left to right as she would any sentence, and the simple sentence structure of each entry makes clear what information is being conveyed.

Table 1 and Fig. 4 illustrate one use of the knowledge base: they show protein kinases and their relationships to two regulated biological processes in collecting duct cells, namely regulation of aquaporin-2 phosphorylation and actin dynamics. The resulting kinase network (Fig. 4) can be used to identify the frontiers of knowledge in vasopressin signaling and to develop novel hypotheses. In particular, several of the protein kinases shown in Table 1 have not been studied directly with regard to their roles in vasopressin signaling and would be appropriate targets for future studies using reductionist techniques. The databases can also support systems biology applications (19); for example, large-scale data integration techniques such as Bayesian modeling can combine different types of data to further our understanding of how integrated physiological systems, such as the collecting duct epithelium, work (4).
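One way such a network could be assembled and queried is sketched below in Python, assuming the curated entries have first been reduced to (regulator, relation, target) triples. The node names are placeholders rather than actual database entries, and this is not necessarily the procedure used to build Fig. 4. The breadth-first search lists every signaling path from the stimulus to a chosen endpoint, which is the kind of query that exposes intermediate kinases whose roles have not yet been tested directly.

```python
"""Assemble a directed signaling network from curated triples and query it."""

from collections import defaultdict, deque

# Hypothetical triples (regulator, relation, target); node names are placeholders,
# not entries from the actual databases.
TRIPLES = [
    ("AVP", "activates", "KinaseA"),
    ("KinaseA", "phosphorylates", "KinaseB"),
    ("KinaseB", "phosphorylates", "Aqp2"),
    ("AVP", "inhibits", "KinaseC"),
    ("KinaseC", "regulates", "actin dynamics"),
]


def build_network(triples):
    """Return an adjacency map: node -> list of (relation, downstream node)."""
    graph = defaultdict(list)
    for source, relation, target in triples:
        graph[source].append((relation, target))
    return graph


def paths_from(graph, start, goal):
    """Breadth-first search for every cycle-free path from `start` to `goal`."""
    found = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            found.append(path)
            continue
        for _, nxt in graph.get(node, []):
            if nxt not in path:  # skip nodes already on this path to avoid cycles
                queue.append(path + [nxt])
    return found


if __name__ == "__main__":
    net = build_network(TRIPLES)
    for path in paths_from(net, "AVP", "Aqp2"):
        print(" -> ".join(path))  # AVP -> KinaseA -> KinaseB -> Aqp2
```

Keeping the relation label on each edge means the same structure could later be filtered, for instance to phosphorylation edges only, without re-curating the data.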

Ideally, databases like those reported here would be created and maintained by communities of scientists rather than by individual laboratories. Accordingly, we have set up a mechanism for updating our databases with user-submitted data: a downloadable data submission form that can be completed in the prescribed format and submitted for incorporation into the online version of the database.
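Submissions in a prescribed format can be checked automatically before they are merged. The sketch below assumes a hypothetical CSV layout with `subject`, `action`, `object`, and `pmid` columns; the actual submission form may be organized differently. It reports problems rather than silently correcting them, so that a curator stays in the loop.

```python
"""Check rows of a user-submitted CSV table before incorporation (sketch)."""

import csv
import io

# Hypothetical submission layout; the real form's columns may differ.
REQUIRED_COLUMNS = ("subject", "action", "object", "pmid")
ALLOWED_SUBJECTS = {"AVP", "DDAVP"}


def validate_submission(text: str, gene_symbols: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the rows look valid."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # DictReader fills absent trailing fields with None, so normalize first.
        cell = {col: (row.get(col) or "").strip() for col in REQUIRED_COLUMNS}
        if cell["subject"] not in ALLOWED_SUBJECTS:
            problems.append(f"line {line_no}: subject {cell['subject']!r} not allowed")
        if not cell["action"]:
            problems.append(f"line {line_no}: empty action")
        if cell["object"] not in gene_symbols:
            problems.append(f"line {line_no}: {cell['object']!r} is not an official gene symbol")
        if not cell["pmid"].isdigit():
            problems.append(f"line {line_no}: PMID must be numeric")
    return problems
```

For example, a row whose subject is `cAMP` would be flagged under the current two-term subject vocabulary but would pass if `ALLOWED_SUBJECTS` were extended, as the expansion to additional subject terms would require.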

In the future, other members of the physiology community could readily create new databases, for example by reporting observations with subject terms other than those used here (AVP and DDAVP). In this way, experts in different areas can develop and publish online databases reflecting their own bodies of knowledge. Once the data are in hand, this is a surprisingly straightforward task that can be accomplished in a few days by following the procedure described in the appendix.

GRANTS

This work was funded by the operating budget of the Division of Intramural Research, National Heart, Lung, and Blood Institute (projects ZO1-HL001285 and ZO1-HL 006129 to M. A. Knepper). T. Pisitkun is currently supported by CU Research Cluster: 2014 Ratchadapisek Sompoch Endowment Fund (Chulalongkorn University). A. Sanghi (Johns Hopkins University) and C. Corcoran (Duke University) were undergraduate students supported by the National Institutes of Health (NIH) Summer Internship Program. M. Zaringhalam was an undergraduate student supported by the Colgate University Off-Campus Study Program at the NIH.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

AUTHOR CONTRIBUTIONS

Author contributions: A.S., M.Z., J.D.H., T.P., and M.A.K. provided conception and design of research; A.S., M.Z., C.C.C., F.S., J.D.H., P.C.S., T.P., and M.A.K. analyzed data; A.S., M.Z., C.C.C., F.S., J.D.H., P.C.S., and T.P. interpreted results of experiments; A.S., M.Z., and C.C.C. prepared figures; A.S., M.Z., C.C.C., and M.A.K. drafted manuscript; A.S., M.Z., C.C.C., F.S., J.D.H., T.P., and M.A.K. edited and revised manuscript; A.S., M.Z., C.C.C., F.S., J.D.H., P.C.S., T.P., and M.A.K. approved final version of manuscript.

ACKNOWLEDGMENTS

Present address of T. Pisitkun: Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand.

Present address of F. Saeed: Depts. of Electrical and Computer Engineering and Computer Science, Western Michigan University, Kalamazoo, MI.

Appendix: CREATING A DATABASE WEBPAGE

Setting up a database webpage is easier than many physiologists realize. Here, we present a simple step-by-step protocol for setting up expert-curated database webpages like the ones presented in this paper.

Step 1. Organize the Data Using a Spreadsheet Program

Use a standard spreadsheet program such as Microsoft Excel to organize the information in the desired layout. Keep the table narrow enough to fit within the width of a single page; the number of rows, however, is unlimited. Column headers, a page header, and explanatory information can be placed at the top of the spreadsheet to define the desired appearance of the webpage.

Step 2. Convert the Spreadsheet File to .html Format

Once editing of the spreadsheet is finalized, the user can convert it to an .html file that can be displayed by a browser. In Microsoft Excel, for example, this is done by selecting “Save as” and choosing “Web Page” from the “Save as Type” dropdown menu. If the spreadsheet contains multiple tabs, the user should choose “Selection: Single Sheet” and then follow the remaining prompts to “Publish.” The .html file can be viewed in the user's default browser by clicking on the file name. Corrections can be made either in the original spreadsheet or by editing the .html code in Notepad++ or a similar text editor.
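As an alternative to Excel's “Web Page” export, a spreadsheet saved as CSV can be converted with a short script that uses only Python's standard library (`csv` and `html`). This is a minimal sketch under that assumption: it produces a plain, unstyled table with the first row as column headers, and the script and file names are placeholders. Escaping each cell with `html.escape` keeps characters such as `<` or `&` in the data from breaking the page.

```python
"""Convert a spreadsheet saved as CSV into a minimal standalone HTML page."""

import csv
import html
import sys


def csv_to_html(rows: list[list[str]], title: str) -> str:
    """Render the first row as column headers and the remaining rows as data."""
    if not rows:
        raise ValueError("the spreadsheet is empty")
    header, *body = rows
    esc = html.escape  # escapes <, >, & and quotes so cell text cannot break the markup
    lines = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{esc(title)}</title></head><body>",
        f"<h1>{esc(title)}</h1>",
        "<table>",
        "<tr>" + "".join(f"<th>{esc(cell)}</th>" for cell in header) + "</tr>",
    ]
    for row in body:
        lines.append("<tr>" + "".join(f"<td>{esc(cell)}</td>" for cell in row) + "</tr>")
    lines += ["</table>", "</body></html>"]
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder names): python csv_to_html.py database.csv database.html
    src, dst = sys.argv[1], sys.argv[2]
    with open(src, newline="", encoding="utf-8") as infile:
        rows = list(csv.reader(infile))
    with open(dst, "w", encoding="utf-8") as outfile:
        outfile.write(csv_to_html(rows, title="Database"))
```

The output is a single self-contained file, so it can be previewed locally and then transferred to a web server exactly like an Excel-generated page.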

Step 3. Transfer .html File to Web-Hosting Server

Once the user is satisfied with the appearance of the webpage, the .html file can be transferred to a computer running a web server such as Apache Tomcat. Most institutions have IT departments that can help with this step; alternatively, commercial web-hosting companies provide this service.

REFERENCES

1. Abraham D, Podar K, Pacher M, Kubicek M, Welzel N, Hemmings BA, Dilworth SM, Mischak H, Kolch W, Baccarini M. Raf-1-associated protein phosphatase 2A as a positive regulator of kinase activation. J Biol Chem 275: 22300–22304, 2000.
2. Bessarabova M, Ishkin A, JeBailey L, Nikolskaya T, Nikolsky Y. Knowledge-based analysis of proteomics data. BMC Bioinformatics 13, Suppl 16: S13, 2012.
3. Boone M, Deen PM. Physiology and pathophysiology of the vasopressin-regulated renal water reabsorption. Pflügers Arch 456: 1005–1024, 2008.
4. Bradford D, Raghuram V, Wilson JL, Chou CL, Hoffert JD, Knepper MA, Pisitkun T. Use of LC-MS/MS and Bayes' theorem to identify protein kinases that phosphorylate aquaporin-2 at Ser256. Am J Physiol Cell Physiol 307: C123–C139, 2014.
5. Cai Q, McReynolds MR, Keck M, Greer KA, Hoying JB, Brooks HL. Vasopressin receptor subtype 2 activation increases cell proliferation in the renal medulla of AQP1 null mice. Am J Physiol Renal Physiol 293: F1858–F1864, 2007.
6. Chou CL, Christensen BM, Frische S, Vorum H, Desai RA, Hoffert JD, de Lanerolle P, Nielsen S, Knepper MA. Non-muscle myosin II and myosin light chain kinase are downstream targets for vasopressin signaling in the renal collecting duct. J Biol Chem 279: 49026–49035, 2004.
7. Chou CL, Yip KP, Michea L, Kador K, Ferraris J, Wade JB, Knepper MA. Regulation of aquaporin-2 trafficking by vasopressin in renal collecting duct: roles of ryanodine-sensitive Ca2+ stores and calmodulin. J Biol Chem 275: 36839–36846, 2000.
8. Chou CL, Yu MJ, Kassai EM, Morris RG, Hoffert JD, Wall SM, Knepper MA. Roles of basolateral solute uptake via NKCC1 and of myosin II in vasopressin-induced cell swelling in inner medullary collecting duct. Am J Physiol Renal Physiol 295: F192–F201, 2008.
9. Dhillon AS, Pollock C, Steen H, Shaw PE, Mischak H, Kolch W. Cyclic AMP-dependent kinase regulates Raf-1 kinase mainly by phosphorylation of serine 259. Mol Cell Biol 22: 3237–3246, 2002.
10. Douglass J, Gunaratne R, Bradford D, Saeed F, Hoffert JD, Steinbach PJ, Knepper MA, Pisitkun T. Identifying protein kinase target preferences using mass spectrometry. Am J Physiol Cell Physiol 303: C715–C727, 2012.
11. Ecelbarger CA, Chou CL, Lolait SJ, Knepper MA, DiGiovanni SR. Evidence for dual signaling pathways for V2 vasopressin receptor in rat inner medullary collecting duct. Am J Physiol Renal Fluid Electrolyte Physiol 270: F623–F633, 1996.
12. Evans JA, Rzhetsky A. Advancing science through mining libraries, ontologies, and communities. J Biol Chem 286: 23659–23666, 2011.
13. Hoffert JD, Nielsen J, Yu MJ, Pisitkun T, Schleicher SM, Nielsen S, Knepper MA. Dynamics of aquaporin-2 serine-261 phosphorylation in response to short-term vasopressin treatment in collecting duct. Am J Physiol Renal Physiol 292: F691–F700, 2007.
14. Hoffert JD, Pisitkun T, Saeed F, Song JH, Chou CL, Knepper MA. Dynamics of the G protein-coupled vasopressin V2 receptor signaling network revealed by quantitative phosphoproteomics. Mol Cell Proteomics 11: M111, 2012.
15. Hoffert JD, Pisitkun T, Wang G, Shen RF, Knepper MA. Quantitative phosphoproteomics of vasopressin-sensitive renal cells: regulation of aquaporin-2 phosphorylation at two sites. Proc Natl Acad Sci USA 103: 7159–7164, 2006.
16. Katsura T, Gustafson CE, Ausiello DA, Brown D. Protein kinase A phosphorylation is involved in regulated exocytosis of aquaporin-2 in transfected LLC-PK1 cells. Am J Physiol Renal Physiol 272: F817–F822, 1997.
17. Khositseth S, Pisitkun T, Slentz DH, Wang G, Hoffert JD, Knepper MA, Yu MJ. Quantitative protein and mRNA profiling shows selective post-transcriptional control of protein expression by vasopressin in kidney cells. Mol Cell Proteomics 10: M110, 2011.
18. Klussmann E, Tamma G, Lorenz D, Wiesner B, Maric K, Hofmann F, Aktories K, Valenti G, Rosenthal W. An inhibitory role of Rho in the vasopressin-mediated translocation of aquaporin-2 into cell membranes of renal principal cells. J Biol Chem 276: 20451–20457, 2001.
19. Knepper MA. Systems biology in physiology: the vasopressin signaling network in kidney. Am J Physiol Cell Physiol 303: C1115–C1124, 2012.
20. Knepper MA, Nielsen S, Chou CL, DiGiovanni SR. Mechanism of vasopressin action in the renal collecting duct. Semin Nephrol 14: 302–321, 1994.
21. Knepper MA, Rector FC Jr. Urinary concentration and dilution. In: The Kidney, edited by Brenner BM. Philadelphia, PA: Saunders, 1995, p. 532–570.
22. Manning G, Whyte DB, Martinez R, Hunter T, Sudarsanam S. The protein kinase complement of the human genome. Science 298: 1912–1934, 2002.
23. Moeller HB, Fenton RA. Cell biology of vasopressin-regulated aquaporin-2 trafficking. Pflügers Arch 464: 133–144, 2012.
24. Nedvetsky PI, Tabor V, Tamma G, Beulshausen S, Skroblin P, Kirschner A, Mutig K, Boltzen M, Petrucci O, Vossenkamper A, Wiesner B, Bachmann S, Rosenthal W, Klussmann E. Reciprocal regulation of aquaporin-2 abundance and degradation by protein kinase A and p38-MAP kinase. J Am Soc Nephrol 21: 1645–1656, 2010.
25. Quest DJ, Land ML, Brettin TS, Cottingham RW. Next generation models for storage and representation of microbial biological annotation. BMC Bioinformatics 11, Suppl 6: S15, 2010.
26. Rao R. Glycogen synthase kinase-3 regulation of urinary concentrating ability. Curr Opin Nephrol Hypertens 21: 541–546, 2012.
27. Rebholz-Schuhmann D, Oellrich A, Hoehndorf R. Text-mining solutions for biomedical research: enabling integrative biology. Nat Rev Genet 13: 829–839, 2012.
28. Robert-Nicoud M, Flahaut M, Elalouf JM, Nicod M, Salinas M, Bens M, Doucet A, Wincker P, Artiguenave F, Horisberger JD, Vandewalle A, Rossier BC, Firsov D. Transcriptome of a mouse kidney cortical collecting duct cell line: effects of aldosterone and vasopressin. Proc Natl Acad Sci USA 98: 2712–2716, 2001.
29. Sasaki S, Yui N, Noda Y. Actin directly interacts with different membrane channel proteins and influences channel activities: AQP2 as a model. Biochim Biophys Acta 1838: 514–520, 2014.
30. Schenk LK, Bolger SJ, Luginbuhl K, Gonzales PA, Rinschen MM, Yu MJ, Hoffert JD, Pisitkun T, Knepper MA. Quantitative proteomics identifies vasopressin-responsive nuclear proteins in collecting duct cells. J Am Soc Nephrol 23: 1008–1018, 2012.
31. Shayakul C, Knepper MA, Smith CP, DiGiovanni SR, Hediger MA. Segmental localization of urea transporter mRNAs in rat kidney. Am J Physiol Renal Physiol 272: F654–F660, 1997.
32. Siga E, Champigneulle A, Imbert-Teboul M. cAMP-dependent effect of vasopressin and calcitonin on cytosolic calcium in rat CCD. Am J Physiol Renal Fluid Electrolyte Physiol 267: F354–F365, 1994.
33. Simon H, Gao Y, Franki N, Hays RM. Vasopressin depolymerizes apical F-actin in rat inner medullary collecting duct. Am J Physiol Cell Physiol 265: C757–C762, 1993.
34. Star RA, Nonoguchi H, Balaban R, Knepper MA. Calcium and cyclic adenosine monophosphate as second messengers for vasopressin in the rat inner medullary collecting duct. J Clin Invest 81: 1879–1888, 1988.
35. Tamma G, Klussmann E, Maric K, Aktories K, Svelto M, Rosenthal W, Valenti G. Rho inhibits cAMP-induced translocation of aquaporin-2 into the apical membrane of renal cells. Am J Physiol Renal Physiol 281: F1092–F1101, 2001.
36. Tamma G, Klussmann E, Procino G, Svelto M, Rosenthal W, Valenti G. cAMP-induced AQP2 translocation is associated with RhoA inhibition through RhoA phosphorylation and interaction with RhoGDI. J Cell Sci 116: 1519–1525, 2003.
37. Thomas S, Bonchev D. A survey of current software for network analysis in molecular biology. Hum Genomics 4: 353–360, 2010.
38. Uawithya P, Pisitkun T, Ruttenberg BE, Knepper MA. Transcriptional profiling of native inner medullary collecting duct cells from rat kidney. Physiol Genomics 32: 229–253, 2008.
39. Valenti G, Procino G, Tamma G, Carmosino M, Svelto M. Minireview: aquaporin 2 trafficking. Endocrinology 146: 5063–5070, 2005.
40. Yu MJ, Miller RL, Uawithya P, Rinschen MM, Khositseth S, Braucht DW, Chou CL, Pisitkun T, Nelson RD, Knepper MA. Systems-level analysis of cell-specific AQP2 gene expression in renal collecting duct. Proc Natl Acad Sci USA 106: 2441–2446, 2009.

Articles from American Journal of Physiology - Renal Physiology are provided here courtesy of American Physiological Society
