Matrix Biology Plus. 2019 Feb 21;1:100001. doi: 10.1016/j.mbplus.2018.11.001

The in-silico characterization of the Caenorhabditis elegans matrisome and proposal of a novel collagen classification

Alina C Teuscher a, Elisabeth Jongsma a, Martin N Davis b, Cyril Statzer a, Jan M Gebauer c, Alexandra Naba b,d, Collin Y Ewald a
PMCID: PMC7852208  PMID: 33543001

Abstract

Proteins are the building blocks of life. While proteins and their localization within cells and sub-cellular compartments are well defined, the proteins predicted to be secreted to form the extracellular matrix - or matrisome - remain elusive in the model organism C. elegans. Here, we used a bioinformatic approach combining gene orthology and protein structure analysis with an extensive curation of the literature to define the C. elegans matrisome. Similar to the human genome, we found that 719 out of ~20,000 genes (~4%) of the C. elegans genome encode matrisome proteins, including 181 collagens, 35 glycoproteins, 10 proteoglycans, and 493 matrisome-associated proteins. We report that 173 out of the 181 collagen genes are unique to nematodes and are predicted to encode cuticular collagens, which we propose to group into five clusters. To facilitate the use of our lists and classification by the scientific community, we developed an automated annotation tool to identify ECM components in large datasets. We also established a novel database of all C. elegans collagens (CeColDB). Last, we provide examples of how the newly defined C. elegans matrisome can be used for annotations and gene ontology analyses of transcriptomic, proteomic, and RNAi screening data. Because C. elegans is a widely used model organism for high throughput genetic and drug screens, and to study biological and pathological processes, the conserved matrisome genes may aid in identifying potential drug targets. In addition, the nematode-specific matrisome may be exploited for targeting parasitic infections of humans and crops.

Keywords: Nematode, Extracellular matrix, Collagen, Cuticle, Basement membrane

Highlights

  • Pipeline combining gene- and protein-sequence analysis to predict the C. elegans matrisome.

  • The in-silico C. elegans matrisome comprises 719 genes.

  • The 185 C. elegans collagen-domain-containing proteins are classified into 4 groups.

  • The 173 cuticular collagens are further classified into 5 clusters based on their domain organization.

  • The C. elegans Matrisome Annotator is an online tool to identify matrisome genes and proteins in large datasets.

Introduction

Around one third of the human world population, including a majority of children, is infected by parasitic nematodes [1,2]. In addition, plant-parasitic nematodes are among the most infectious species in agriculture, causing economic losses of about 100 billion dollars per year [3]. The major barriers that drugs must penetrate in parasitic nematodes are their collagenous cuticle, an exoskeleton, and their extracellular matrix (ECM). The free-living nematode C. elegans has been widely used as a surrogate model organism for parasitic nematodes [4], as well as for host-pathogen interactions [5] and other fundamental biological processes [6]. C. elegans is also used as a pioneering in-vivo model for biomedical research because about 40% of C. elegans genes are conserved in the human genome [7] and, conversely, between 60 and 80% of human genes have a corresponding orthologue in the C. elegans genome [8]. In addition, 40% of human genes associated with diseases are well conserved in C. elegans [9]. C. elegans is genetically tractable for high throughput screens and is one of the best curated organisms for genetic, genomic, and phenotypic data. The vast array of openly shared molecular tools has paved the way to gain molecular, functional, and mechanistic insights into gene and protein functions [8]. In particular, the two major extracellular matrices of C. elegans, the cuticle [10] and basement membrane [[11], [12], [13]], have recently become models to study cancer cell invasion [14] and aging [15]. However, a precise Gene Ontology term or a comprehensive compendium of genes predicted to form the C. elegans matrisome remains to be defined.

Using characteristic features of ECM proteins and a computational pipeline combining interrogation of protein and gene databases, we previously defined the matrisome as the ensemble of ECM and ECM-associated proteins [[16], [17], [18]]. In mammals, the matrisome represents 4% of the genome, or approximately 1000 genes. We further classified these genes into core matrisome components, consisting of collagens, proteoglycans, and glycoproteins (including laminins, fibronectins, etc.), and matrisome-associated components, including proteins that could incorporate into ECMs or are co-purified with ECM proteins. These components are further subdivided into ECM-affiliated proteins (e.g., C-type lectins, galectins, annexins, semaphorins, syndecans, and glypicans), ECM regulators (e.g., MMPs, ADAMs, and crosslinking enzymes), and secreted factors (e.g., TGF-β, BMPs, FGFs, Wnt proteins, and chemokines) [[16], [17], [18]]. More recently, we employed a computational approach to predict the in-silico matrisome of the zebrafish [19]. Defining the matrisome of organisms has been instrumental to annotate transcriptomic and proteomic data and has permitted the identification of ECM signatures of biological processes [20] and of human diseases including cancers and fibrosis [[21], [22], [23], [24], [25]].

Here, we devised a novel bioinformatic pipeline combining gene orthology and de-novo identification to define the C. elegans matrisome. We report the identification of 719 genes potentially encoding ECM and ECM-associated proteins, including 181 collagens of which 173 are predicted to be components of the cuticle. Based on their collagen-domain organization, we propose to group these cuticular collagens into five novel clusters and further divide them in sub-clusters. In addition, we demonstrate that the newly defined C. elegans matrisome can be used to annotate data from high throughput RNAi screens, transcriptomic, and proteomic data, and can assist with the identification of ECM genes or signatures relevant in the context of various physiological and pathological processes.

Computational approach to define the C. elegans matrisome

The workflow and steps for defining the C. elegans matrisome are outlined in Fig. 1.

Fig. 1.

Workflow of the pipeline devised to define the in-silico C. elegans matrisome.

Identification of C. elegans orthologues of human matrisome genes

The orthologue list was created by comparing the human matrisome gene list downloaded from the Matrisome Project website (http://matrisome.org/) [26] with the C. elegans genome using the Greenwald Lab OrthoList website (http://greenwaldlab.org/ortholist/; accessed 07.04.2017, [7]). The OrthoList uses four different orthology-prediction programs (Ensembl Compara, InParanoid, Homologene, and OrthoMCL) to obtain the C. elegans orthologues from human Ensembl ID numbers. We included all genes that were found by at least one prediction program from the OrthoList. The human Ensembl IDs were then translated back into HUGO gene names using Ensembl BioMart [27]. This approach allowed the identification of 348 C. elegans genes orthologous to human matrisome genes (Supplementary Table 1).
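
The inclusion rule described above (keep a C. elegans gene if at least one of the four programs predicts it as an orthologue of a human matrisome gene) can be illustrated with a minimal sketch. The identifiers and the in-memory data layout below are hypothetical placeholders, not the OrthoList file format or the authors' script.

```python
# Hypothetical example data: for each prediction program, a mapping from a
# human Ensembl gene ID to the C. elegans genes it predicts as orthologues.
# All identifiers are placeholders chosen for illustration only.
predictions = {
    "Ensembl Compara": {"ENSG_A": {"gene-1"}, "ENSG_B": {"gene-2"}},
    "InParanoid": {"ENSG_A": {"gene-1"}},
    "Homologene": {"ENSG_C": {"gene-3"}},
    "OrthoMCL": {"ENSG_B": {"gene-2", "gene-4"}},
}


def orthologues_with_support(predictions, human_ids, min_programs=1):
    """Collect worm genes predicted for the given human IDs.

    Returns a dict mapping each worm gene to the sorted list of programs
    supporting it, keeping genes supported by at least `min_programs`.
    """
    support = {}
    for program, mapping in predictions.items():
        for human_id in human_ids:
            for worm_gene in mapping.get(human_id, ()):
                support.setdefault(worm_gene, set()).add(program)
    return {
        gene: sorted(programs)
        for gene, programs in support.items()
        if len(programs) >= min_programs
    }


# Only ENSG_A and ENSG_B play the role of human matrisome genes here.
human_matrisome_ids = ["ENSG_A", "ENSG_B"]
hits = orthologues_with_support(predictions, human_matrisome_ids)
strict_hits = orthologues_with_support(predictions, human_matrisome_ids, min_programs=2)
```

Setting `min_programs=1` reproduces the permissive "at least one program" criterion; raising it shows how a stricter consensus would shrink the list.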

Domain-based definition and Gene Ontology annotations of matrisome proteins

We initially defined the mammalian matrisome by using the presence of characteristic protein domains commonly found in ECM proteins [17,18]. To verify that the orthology approach identified key components of the C. elegans matrisome, we focused on a specific category of matrisome proteins: the ECM-affiliated proteins, which are proteins that share structural and functional homologies with ECM components [17,18]. To do so, we retrieved the C. elegans reference proteome (UP000001940 downloaded August 14, 2017; Supplementary Table 2A) from the UniProt database [28] and identified proteins containing domains previously defined as characteristic of the 6 families of ECM-affiliated components (Fig. 1 and Supplementary Table 2B): the transmembrane proteoglycans syndecans and glypicans, the galectins, the plexins and semaphorins, and the annexins. Comparison of the list of proteins obtained using this approach (Supplementary Table 2C–H) and the list of genes identified by orthology revealed that all but 2 ECM-affiliated proteins were found by both approaches (Supplementary Table 1), suggesting that both approaches may be used to define the complete matrisome. However, the orthology-based approach does not permit the identification of nematode-specific ECM proteins. One ECM structure specific to C. elegans is the cuticle, which is made of cuticular collagens and cuticlins [10,29,30]. To define the ensemble of proteins potentially contributing to the cuticle ECM, we used an InterPro domain termed “Nematode cuticle collagen, N-terminal” (InterPro domain IPR002486; [31]). This domain retrieved 171 UniProt entries in the C. elegans proteome, of which 128 also contain the canonical collagen triple helix repeat (IPR008160) (Supplementary Table 2I and J). Close examination of the list of collagen genes obtained revealed that some collagens might not have been found using the InterPro domain. We thus further sought to define C. elegans collagens using a more rigorous structural domain-based approach.
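
As an illustration of this kind of domain-based filtering, the sketch below selects proteome entries by their InterPro annotations. The two InterPro identifiers are the ones named above; the accessions and the dictionary layout are hypothetical and do not reflect the UniProt download format.

```python
N_TERMINAL_CUTICLE = "IPR002486"  # Nematode cuticle collagen, N-terminal
TRIPLE_HELIX = "IPR008160"        # Collagen triple helix repeat

# Hypothetical proteome excerpt: accession -> set of InterPro domain IDs.
annotations = {
    "P_0001": {"IPR002486", "IPR008160"},
    "P_0002": {"IPR002486"},
    "P_0003": {"IPR008160"},
    "P_0004": set(),
}


def with_domains(annotations, required):
    """Return the sorted accessions annotated with every required domain."""
    required = set(required)
    return sorted(
        accession
        for accession, domains in annotations.items()
        if required <= domains
    )


# Entries carrying the N-terminal cuticle collagen domain.
n_terminal = with_domains(annotations, [N_TERMINAL_CUTICLE])
# Subset that also carries the canonical triple helix repeat.
both = with_domains(annotations, [N_TERMINAL_CUTICLE, TRIPLE_HELIX])
```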

De-novo identification of C. elegans collagens

To identify all the collagen proteins in C. elegans de-novo and in an unbiased manner, we downloaded all reviewed and unreviewed protein entries from the UniProt database (release 2018_01; [28]). Using HMMER3 [32] and the standard HMM profile for collagens (PF01391) from the Pfam website [33], we identified 219 sequences, which included several duplicated entries. However, this approach missed various bona fide collagens in C. elegans, such as col-51 and col-142, probably because their small collagenous domains are interrupted by non-collagenous stretches. Collagen domains are characterized by their glycine-X-Y amino acid triplet repeats, in which X and Y are frequently proline and 4-hydroxyproline residues, respectively. In vitro, 10 Gly-X-Y repeats are typically sufficient to form stable triple helices, with melting temperatures depending on the content of proline and hydroxyproline residues at the X and Y positions. Therefore, for a more sensitive approach, we generated a simple regular expression in Python matching at least 10 Gly-X-Y repeats (regex = r'(G..){10,}') and used it against the above-mentioned dataset. In total, we found 243 entries in the UniProt database matching this pattern. After deleting duplicated entries and cross-referencing against WormBase (WS262) [34], we obtained a list of 201 unique entries. However, besides a repeating Gly-X-Y pattern, collagens also need frequent proline residues at the X and Y positions, as this amino acid is important for the formation of single poly-proline II helices, which are the backbone of the collagen triple helix. In vertebrates, the percentage of proline residues at the X and Y positions is approximately 30%. To avoid missing any potential sequences, we decided to use a cut-off of 10% proline at both positions, which still represents double the normal frequency of proline residues in the C. elegans proteome [35]. With this criterion, the list was narrowed to 190 potential collagen sequences, which were all curated manually for the likelihood of being a collagen. Five sequences (UniProt Q4W4Y5, G5EDS0, B3GWA1, O61209, Q9N3I0) were excluded, since they contain short glycine-rich repeats with only very few proline residues and no apparent collagen structures. These five proteins were also not recognized by the Pfam collagen profile. On the other hand, various sequences were not predicted to be collagens by the Pfam profile but, upon manual inspection, have proper collagen domains (COL-51, COL-103, COL-161, COL-142, and COL-183). Finally, after manual curation, we identified 185 genes in total that encode collagen-domain-containing proteins in C. elegans (Supplementary Table 4). Of these, 4 genes could be classified as gliomedins or collectins based on their small collagenous domain and the presence of further signature domains (see below). The remaining 181 proteins define the collagens present in C. elegans.
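
The two sequence filters described above can be sketched in a few lines of Python. The pattern below encodes one Gly-X-Y triplet as a G followed by any two residues and requires at least 10 consecutive triplets. How the 10% proline cut-off was applied per sequence is not specified in detail, so pooling all matching runs of a sequence is an assumption of this sketch, and the toy sequences are invented for illustration.

```python
import re

# At least 10 consecutive Gly-X-Y triplets; "." stands for any residue.
GXY_RUN = re.compile(r"(?:G..){10,}")


def collagen_runs(sequence):
    """Return every stretch of >= 10 consecutive Gly-X-Y triplets."""
    return [match.group(0) for match in GXY_RUN.finditer(sequence)]


def proline_fractions(run):
    """Fraction of proline at the X and at the Y position of a Gly-X-Y run."""
    triplets = [run[i:i + 3] for i in range(0, len(run), 3)]
    x_fraction = sum(t[1] == "P" for t in triplets) / len(triplets)
    y_fraction = sum(t[2] == "P" for t in triplets) / len(triplets)
    return x_fraction, y_fraction


def is_collagen_candidate(sequence, cutoff=0.10):
    """Assumed reading of the filter: pooled over all runs, proline must
    reach the cut-off at both the X and the Y position."""
    runs = collagen_runs(sequence)
    if not runs:
        return False
    # Each run is a whole number of triplets, so joining keeps the frame.
    x_fraction, y_fraction = proline_fractions("".join(runs))
    return x_fraction >= cutoff and y_fraction >= cutoff


# Invented toy sequences, not real C. elegans proteins.
toy_collagen = "MK" + "GPP" * 12 + "AA"  # 12 proline-rich triplets
glycine_rich = "GAS" * 12                # matches the regex, but no proline
too_short = "GPP" * 9                    # only 9 triplets
```

In this sketch `glycine_rich` passes the regular expression but fails the proline filter, which mirrors why glycine-rich, proline-poor sequences need a second criterion.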

The C. elegans matrisome consists of 719 genes

After combining the lists of genes and proteins identified above, we manually curated each entry and assigned them to matrisome divisions and categories. Last, in order to identify putative matrisome genes and proteins that have not been captured by the gene orthology approach or the structural domain-based approach, we searched both WormBase (http://www.wormbase.org/, release WS263; [34]) and the C. elegans reference proteome from UniProt to identify genes and proteins annotated as ECM genes by a selection of Gene Ontology – Cellular Component terms (Supplementary Table 3). This last step allowed us to identify an additional 11 genes that had not been identified otherwise and may be considered as matrisome components (Supplementary Table 1; see Column A, with the exception of col-78, which was identified earlier by the structural domain-based approach).

Altogether, we identified 719 C. elegans matrisome genes out of the total ~20,000 C. elegans protein-encoding genes, suggesting that 4% of the C. elegans genome is dedicated to ECM genes (Table 1; Supplementary Table 1). This is comparable to the 1027 human matrisome genes, which also represent about 4% of the human genome [17,26]. We further classified these 719 genes into the divisions and categories proposed to classify the mammalian matrisomes. We found 226 genes for the C. elegans core-matrisome (ECM glycoproteins, collagens, proteoglycans). However, 181 out of the 226 C. elegans core-matrisome genes are collagen genes (Table 1), of which 173 are predicted to be nematode-specific cuticular collagens (see below). We found that the C. elegans genome encodes 35 ECM glycoproteins compared to 195 found in humans (Table 1). All 35 C. elegans ECM glycoproteins have mammalian orthologues and thus far no C. elegans-specific ECM glycoprotein has been identified (Table 2). By contrast, 7 out of the 10 C. elegans proteoglycans are nematode-specific and several are sulfate-less-chondroitin-binding proteoglycans (cpg-1-4, cpg-7-9) [36]. The remaining three proteoglycans are similar to the heparan sulfate proteoglycan perlecan (unc-52; Hspg2 in mammals) [13], a SPOCK/Testican (test-1), and a leucine-rich proteoglycan nyctalopin (lron-8) (Supplementary Table 1).

Table 1.

Comparison of the number of human to C. elegans matrisome genes. Corresponding genes for each category are found in Supplementary Table 1.

Division              Category                 Human  C. elegans
Complete matrisome                             1027   719
Core matrisome        ECM glycoproteins        195    35
                      Collagens                44     181
                      Proteoglycans            35     10
                      Cuticlins                0      12
                      Total                    274    238
Matrisome-associated  ECM-affiliated proteins  171    301
                      ECM regulators           238    128
                      Secreted factors         344    52
                      Total                    753    481
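
The totals in Table 1 can be recomputed from the per-category counts. The sketch below copies the numbers from the table; the dictionary layout and function name are illustrative choices, and the genome size of 20,000 is the approximate figure quoted in the text.

```python
# (human, C. elegans) gene counts per category, copied from Table 1.
table1 = {
    "Core matrisome": {
        "ECM glycoproteins": (195, 35),
        "Collagens": (44, 181),
        "Proteoglycans": (35, 10),
        "Cuticlins": (0, 12),
    },
    "Matrisome-associated": {
        "ECM-affiliated proteins": (171, 301),
        "ECM regulators": (238, 128),
        "Secreted factors": (344, 52),
    },
}


def division_totals(table):
    """Sum the human and C. elegans counts within each division."""
    return {
        division: (
            sum(human for human, _ in categories.values()),
            sum(worm for _, worm in categories.values()),
        )
        for division, categories in table.items()
    }


totals = division_totals(table1)
complete = (
    sum(human for human, _ in totals.values()),
    sum(worm for _, worm in totals.values()),
)

# Share of the ~20,000 protein-encoding genes (approximate genome size).
worm_share_percent = round(100 * complete[1] / 20_000, 1)
```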

Table 2.

Comparison of conserved versus nematode-specific matrisome genes. Corresponding genes for each category are found in Supplementary Table 1.

Division              Category                 Conserved matrisome:              Nematode-specific matrisome:
                                               human to C. elegans orthologues   not found in mammals
                                               [# found/total (percentage)]      [# found/total (percentage)]
Complete matrisome                             467/1027 (45%)                    252/719 (35%)
Core matrisome        ECM glycoproteins        35/195 (18%)                      0/35 (0%)
                      Collagens                4/44 (9%)                         177/181 (98%)
                      Proteoglycans            3/35 (9%)                         7/10 (70%)
                      Cuticlins                -                                 12/12 (100%)
                      Total                    42/274 (15%)                      196/238 (82%)
Matrisome-associated  ECM-affiliated proteins  290/171 (170%)                    11/301 (4%)
                      ECM regulators           97/238 (41%)                      31/128 (24%)
                      Secreted factors         38/344 (11%)                      14/52 (27%)
                      Total                    425/753 (56%)                     56/481 (12%)
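
The "found/total (percentage)" cells of Table 2 follow from a simple ratio rounded to the nearest whole percent. The sketch below reproduces the three summary rows from the counts given in the table; the helper names are illustrative. In these rows, the conserved and nematode-specific "found" counts add up to the C. elegans total of the same row.

```python
# ((conserved found, human total), (nematode-specific found, C. elegans total))
table2 = {
    "Complete matrisome": ((467, 1027), (252, 719)),
    "Core matrisome (total)": ((42, 274), (196, 238)),
    "Matrisome-associated (total)": ((425, 753), (56, 481)),
}


def percentage(found, total):
    """Whole-number percentage of found over total."""
    return round(100 * found / total)


def format_cell(found, total):
    """Render one table cell as 'found/total (percentage%)'."""
    return f"{found}/{total} ({percentage(found, total)}%)"


cells = {
    row: (format_cell(*conserved), format_cell(*specific))
    for row, (conserved, specific) in table2.items()
}

# Check that conserved + nematode-specific counts equal the C. elegans total.
counts_add_up = all(
    conserved[0] + specific[0] == specific[1]
    for conserved, specific in table2.values()
)
```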

The C. elegans genome comprises 493 matrisome-associated genes (ECM-affiliated proteins, ECM regulators, and secreted factors) compared to the 753 human matrisome-associated genes. The majority of these 493 C. elegans matrisome-associated genes are C-type lectins (240 genes; Table 1 and Supplementary Table 1) [37].

Orthology relationship between human and C. elegans matrisome genes

Next, we determined the conserved versus nematode-specific matrisome genes for each matrisome category (Table 2). We compared the human matrisome genes to the C. elegans matrisome genes and vice-versa using OrthoList [7], or directly aligned them and examined the conservation of domains. In agreement with previous reports [10,38], most of the C. elegans collagens are predicted to be cuticular collagens that share no or little orthology to mammalian collagens (Table 2). However, other collagens and ECM proteins that originated in basal metazoans are found to be well conserved in C. elegans (Fig. 2). These include basement membrane proteins (laminins, collagen type IV, nidogen, perlecan), transmembrane proteoglycans classified as ECM-affiliated proteins, syndecan and glypican, other collagens (type IX, XVIII, and XXV collagens), and axon guidance proteins (netrins, slits, agrin, fibrillin) (Fig. 2). Although thrombospondins are found in metazoans and we found many C. elegans proteins containing thrombospondin domains, we did not find a thrombospondin orthologue in agreement with previous reports [39]. Furthermore, ECM proteins that evolved during the vertebrate expansions, such as fibronectin, complex collagens, LINK proteins, and hyalectans (Fig. 2), were not identified in the in-silico searches in C. elegans, consistent with previous reports [16]. Last, some ECM proteins identified are shared between nematodes and humans, but not with other organisms like yeasts or Drosophila. These proteins include hemicentin (him-4) [40], SPARC/osteonectin (ost-1) [41,42], fibulin (fbl-1) [43], spondin (spon-1) [44], and olfactomedin (unc-122) [45].

Fig. 2.

Conserved C. elegans matrisome in the context of the evolution of ECM proteins. Figure is adapted from [16]. Corresponding C. elegans orthologues are italicized and indicated in parentheses, e.g. (lam-1). Individual genes and corresponding orthologues are found in Supplementary Table 1.

Taken together, our survey of the C. elegans genome and proteome provides the first comprehensive compendium of the C. elegans matrisome. To facilitate the use of our lists of predicted genes encoding ECM and ECM-associated proteins in the C. elegans genome, we have deposited them on a dedicated page of the Matrisome Project website (http://matrisome.org) [26]. In addition, we have built an online tool, the C. elegans Matrisome Annotator, which, provided a list of genes, returns it annotated for matrisome components (http://ce-matrisome-annotator.permalink.cc/; tutorial provided as Supplementary Data).
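
To make the idea of list annotation concrete, the sketch below shows a minimal lookup-based annotator. It is not the implementation of the C. elegans Matrisome Annotator; the function, the output layout, and the "Not in matrisome" label are assumptions, and the lookup table contains only a handful of genes whose categories are stated in the text above.

```python
# Tiny excerpt of a gene -> (division, category) lookup, limited to genes
# whose category is given in the text; a real table would hold all 719 genes.
MATRISOME = {
    "emb-9": ("Core matrisome", "Collagens"),
    "let-2": ("Core matrisome", "Collagens"),
    "unc-52": ("Core matrisome", "Proteoglycans"),
    "test-1": ("Core matrisome", "Proteoglycans"),
    "lron-8": ("Core matrisome", "Proteoglycans"),
}


def annotate(genes, table=MATRISOME):
    """Return (gene, division, category) for every gene in the input list.

    Genes missing from the table are kept and labelled so that the output
    has the same length and order as the input.
    """
    annotated = []
    for gene in genes:
        key = gene.strip().lower()
        division, category = table.get(key, ("Not in matrisome", ""))
        annotated.append((gene, division, category))
    return annotated


# "hypothetical-1" is a made-up name standing in for a non-matrisome gene.
result = annotate(["emb-9", "unc-52", "hypothetical-1"])
```

Keeping unmatched genes in the output makes it straightforward to compute which fraction of a dataset belongs to the matrisome.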

Proposal of a novel classification of C. elegans collagens

In order to better classify and study the 185 collagen-domain-containing proteins in C. elegans, we propose a novel nomenclature based on their collagen-domain organization and the presence of other characteristic protein domains (e.g. C-type lectin; C4, the collagen IV NC1 domain; TSP; FNIII), similar to the mammalian collagen classification [46]. To do so, we clustered the 181 collagens and the 4 collagen-domain-containing proteins into four major groups: (1) the vertebrate-like collagens (similar to mammalian type IV, XVIII, XXV), (2) the collagen-domain-containing proteins with mammalian orthologues (collectins and gliomedin), (3) the non-cuticular collagens with no clear orthology to mammalian collagens, and (4) the cuticular collagens. This last group is the largest, with 173 collagens, which we further propose to subdivide into five main clusters (A to E). For detailed comparison and to facilitate the diffusion of this proposed classification, we constructed the C. elegans collagen database, CeColDB, available at: http://CeColDB.permalink.cc/.

Group 1: the conserved vertebrate-like collagens in C. elegans

Although fibrillar collagens are found in metazoans [47,48], our computational approach did not find any genes encoding fibrillar collagens in the C. elegans genome, which is in agreement with previous reports [49]. It has been hypothesized that C. elegans might have lost fibrillar collagens, since no evidence for an interstitial matrix is found in C. elegans [14,49,50]. However, the basement membrane type IV collagens are well conserved in C. elegans [13]. The C. elegans collagen-IV-like proteins are encoded by two genes, emb-9 and let-2, which have collagenous domains of 1488 and 1487 amino acids, respectively, similar to their vertebrate counterparts (1398 amino acids) (Fig. 3A). The C-terminal domain (C4) is well conserved, and the sequence identities of 52% and 69% among the nematode and human domains are similar to the variance among the 6 existing human genes. In phylogenetic analyses, the C-terminal domain of LET-2 consistently clusters with the even-numbered collagen alpha chains (α2(IV), α4(IV) and α6(IV)), while EMB-9 groups with the odd-numbered collagen IV chains (α1(IV), α3(IV) and α5(IV)) (Fig. 3B). In humans, heterotrimers of collagen IV are formed by one even-numbered and two odd-numbered chains ([α1]2[α2], [α3][α4][α5], [α5]2[α6]). Thus, we speculate that the collagen IV in C. elegans is an [EMB-9]2[LET-2] heterotrimer. In addition, let-2 is alternatively spliced, whereby one version is predominantly found in embryos and the other version in larval stages [51]. Both emb-9 and let-2 are essential genes, and glycine mutations in the Gly-X-Y repeats result in retention of the mutant collagen in the endoplasmic reticulum and arrest in embryonic development [52,53].

Fig. 3.

Group 1: C. elegans collagens with orthologues in vertebrates.

(A) Collagen type IV. Alignment of collagen type IV α1 from Homo sapiens with EMB-9 and LET-2 from Caenorhabditis elegans.

(B) Phylogenetic analysis of collagen type IV. The C-terminal collagen type IV domains (C4) of humans and C. elegans were analyzed using ClustalO [70] followed by Neighbor Joining. The numbers indicate the bootstrap values of 100 replicates. The nematode-specific sequences are indicated in bold (EMB-9 and LET-2).

(C) Endostatins. Comparison of CLE-1 from C. elegans with endostatin-containing collagens type XV and XVIII from H. sapiens. Asterisks (*) indicate domain predictions with weak significance.

(D) Transmembrane collagens. Comparison of C. elegans COL-99, the only non-cuticular transmembrane collagen, with its human orthologues (collagens type XIII, XXIII, XXV). For human proteins, dark outlines group collagenous stretches recognized as collagen domains in earlier publications. All panels are drawn to scale. Colour codes are as follows: light blue: signal peptides; pink: transmembrane region; orange: frizzled domain or collagen C4 domain; yellow: Laminin G-like domain; purple: endostatin domain; blue: collagenous Gly-X-Y repeats.

The C. elegans CLE-1 protein has similarities to collagen type XV [54] and XVIII [55]. CLE-1 is also found in basement membranes, but is predominantly localized around neurons. Similar to the phenotype of the Col18a1-null mice [55], reduction of C. elegans cle-1 function results in defects in the organization of the nervous system but, in contrast, also results in a partially penetrant embryonic lethality, which may be due to failure of epidermal cell migration [56]. CLE-1 has one or two fibronectin-type-III-like domains, a laminin G-like domain, a very short interrupted collagenous domain, and an endostatin domain (Fig. 3C). Based on the last domains, CLE-1 is classified as an orthologue of collagen type XV [54] or XVIII [55]. However, it is worth noting that the overall sequence identity is only 14% and 18% for collagen XV and XVIII, respectively, and that the collagenous domain of CLE-1 is short in contrast to that of collagen XV or XVIII. The C. elegans collagen COL-99 is also not an essential protein, but is important for the organization of the nervous system [57]. COL-99 is a type II transmembrane-domain-containing protein with a smaller cytoplasmic region and a larger extracellular region containing 10 smaller collagenous domains. It therefore formally groups with the vertebrate Membrane-Associated Collagens with Interrupted Triple-helices (MACITs: collagen types XIII, XXIII, and XXV; Fig. 3D) [58].

Group 2: collagen-domain-containing proteins with mammalian orthologues

We identified four collagens with additional non-collagenous domains, which, based on their domain organization, resemble mammalian gliomedins and collectins. The group of gliomedin-like proteins consists of cof-2 and unc-122. Both have a predicted N-terminal transmembrane domain, followed by a collagenous domain of 15 triplets and a C-terminal olfactomedin-like domain (Fig. 4A). However, despite their similar domain organization, both proteins only share approximately 26% sequence identity with each other. Mutations in unc-122 cause an uncoordinated locomotory behavior, the so-called Unc phenotype [59]. The group of collectins harbours two genes (clec-222 and clec-223), which are oriented in a head-to-tail fashion on chromosome V. Both have very short collagenous domains (10 triplets), which might still permit trimerization, and 1 or 3 C-type lectin domains (Fig. 4B).

Fig. 4.

Domain organization of C. elegans collagens from Group 2 and 3.

(A-B) Group 2: Collagen-domain containing proteins with mammalian orthologues.

(A) Gliomedin-like collagens are characterized by an olfactomedin-like domain.

(B) Collectins carry C-type lectin domains and only small collagenous domains.

(C) Group 3: The C. elegans non-cuticular collagens with no clear orthology to mammalian collagens. MEC-5 is a neuron-specific collagen. For COL-55 and ROL-8 the hypothetical Nematode cuticle collagen N-terminal domain (PF01484) overlaps with the predicted transmembrane domain. COL-135 is high in GxK and GxD repeats (Supplementary Fig. 1).

All panels are drawn to a common scale. Colour codes are as follows: light blue: signal peptides; pink: transmembrane region; dark violet: N-cuticular domain (PF01484); green: C-type lectin; red: olfactomedin-like domain; blue: collagenous Gly-X-Y repeats.

Group 3: the C. elegans non-cuticular collagens with no clear orthology to mammalian collagens

This group consists of four collagens that cluster neither with basement membrane or vertebrate-like collagens nor with cuticular collagens. We speculate that they might have specialized functions or localize to ECMs other than the basement membrane or the cuticle. MEC-5 is a collagen of medium size with a short collagen domain, similar to the N-terminal pro-helices found in fibrillar collagens, followed by a major uninterrupted collagenous domain. No further domain is predicted and there are no obvious similarities to vertebrate collagens (Fig. 4C). The MEC-5 collagen is produced and secreted by hypodermal cells to anchor the ion channel/degenerin complex (MEC-4/10), which is expressed in touch receptor neurons, to the ECM; thus, MEC-5 is essential for the mechanosensory response to gentle touch [60,61]. COL-55 and ROL-8 are similar to the cuticular collagens discussed below, but are missing certain features, like the N-Pro-helix or the characteristic cysteine knots. COL-55 and ROL-8 are predicted to have a transmembrane domain, which overlaps with the predicted N-cuticular domain (Fig. 4C). Mutations in rol-8 cause a left-handed rolling phenotype (a helically twisted body), suggesting its importance in cuticle assembly and/or chirality [62]. COL-135 has been predicted to be a collagen but has a very particular composition. Its sequence contains a signal peptide, three short collagen domains, and a rather large domain of Gly-X-Y repeats. Proline residues are under-represented compared to other C. elegans collagens, especially at the Y position (17.0% and 3.6% for X and Y in COL-135, 43.3% and 23.9% in all collagens), but lysine and aspartate residues are over-represented (28.6% and 22.2%, respectively, in COL-135, compared to 4.3% and 8.1% in all collagens) (Supplementary Fig. 1A). Furthermore, 109 out of the 198 triplets are Gly-X-Lys repeats (Supplementary Fig. 1B). Although COL-135 meets the criteria stated above for being recognized as a collagen, it is uncertain whether it is able to form a bona fide collagen triple helix.

Group 4: the cuticular collagens

Approximately 80% of the cuticle is made of collagenous proteins [29]. Previously, cuticular collagens were grouped according to their cysteine knots into 6 groups based on 20 collagen sequences known at that time [63]. Furthermore, these 20 known collagen sequences showed four shared amino-acid-sequence motifs (termed “homology blocks”) in the N-terminal region before the Gly-X-Y domains [63]. Expanding the analysis from 20 to the 173 cuticular collagen genes identified in our study, we did not identify these shared homology blocks. We found three conserved features which occur in various combinations in many but not all cuticular collagens: a serine (in position 21 in BLI-6 and position 78 in the alignment; conserved in 75% of all cuticular collagens), a potential furin cleavage site with an RxxR consensus (in position 71–74 and 195–198, respectively; 93% conservation), and a tyrosine (in position 78 and 273, respectively; approx. 50% conservation) that may be important for tyrosine-tyrosine crosslinking (Supplementary Fig. 2).

Here, we defined the cuticular collagens based on their characteristic collagenous domains consisting of 37 to 43 Gly-X-Y triplets, which are flanked by N- and C-terminal cysteine knots (Fig. 5; Supplementary Table 4). As in fibrillar collagens found in vertebrates, there is an additional N-Pro-Helix, usually 10 Gly-X-Y repeats long, located between 12 and 31 residues (in 97% of the cases between 13 and 23) N-terminal to the collagenous domain. This N-Pro-Helix is often stabilized by an additional cysteine knot (Fig. 5; Supplementary Table 4). Based on the interruptions in their main collagenous domain, we grouped the cuticular collagens into five main clusters (A to E), having either 0, 1, 2, 3, or >3 interruptions (Fig. 5; Supplementary Table 4). We further classified these five main clusters based on the length of their collagenous domains, the positions of interruptions, and their prediction of being transmembrane or secreted (Supplementary Figs. 3–7; Supplementary Table 4). These sub-clusters are numbered based on the length of their uninterrupted collagenous stretches, counting from the C-terminus in an ascending manner (Supplementary Table 4). Members of a sub-cluster often have the same cysteine knot, although the same cysteine knot might also occur in different clusters (Supplementary Table 4). One type of cysteine knot, reported to be important for tyrosine-crosslinking [63], can be found in 35 collagens (Supplementary Table 4/CysKnot C-Col domain, red-marked Y). Many, but not all, of the predicted transmembrane collagens have predicted furin cleavage sites, potentially enabling the shedding of these collagens (Supplementary Table 4).
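
The main-cluster rule (0, 1, 2, 3, or more than 3 interruptions mapping to clusters A to E) is simple enough to express directly. In the sketch below, the function names are illustrative, and deriving the number of interruptions as the number of collagenous stretches minus one is an assumption about how a domain layout would be represented.

```python
def cluster_for_interruptions(n_interruptions):
    """Map the number of interruptions in the main collagenous domain
    to a main cluster: 0 -> A, 1 -> B, 2 -> C, 3 -> D, more than 3 -> E."""
    if n_interruptions < 0:
        raise ValueError("number of interruptions cannot be negative")
    return "ABCD"[n_interruptions] if n_interruptions <= 3 else "E"


def cluster_from_stretches(triplet_counts):
    """Assign a cluster from the lengths (in triplets) of the uninterrupted
    stretches of the main collagenous domain, listed in order.

    Assumption of this sketch: k stretches imply k - 1 interruptions.
    """
    if not triplet_counts:
        raise ValueError("at least one collagenous stretch is required")
    return cluster_for_interruptions(len(triplet_counts) - 1)


# Two stretches of 20 and 21 triplets are separated by one interruption.
example_cluster = cluster_from_stretches([20, 21])
```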

Fig. 5.

Organization of the cuticular collagen clusters.

The 173 cuticular collagens were grouped into 5 clusters based on the interruptions in their main collagenous domain (blue boxes). All cuticular collagens contain a shorter N-terminal helical domain (red box) and typically three cysteine-rich regions (yellow) flanking the collagenous domains. Numbers in the boxes correspond to the number of Gly-X-Y repeats.

Cluster A

Cluster A comprises six members and is subdivided into four sub-clusters (A1–4; Supplementary Fig. 3; Supplementary Table 4). Members of sub-clusters A2 and A4 are predicted to be transmembrane proteins. Interestingly, sub-clusters A3 and A4 have the same length of their collagenous domain (Supplementary Fig. 3; Supplementary Table 4).

Cluster B

Cluster B comprises 84 members divided into 18 sub-clusters (Supplementary Fig. 4; Supplementary Table 4). The two collagenous domains are typically 20 or 21 triplets long and the non-collagenous interruption only differs by a few amino acids.

Cluster C

Cluster C comprises 53 members divided into 27 sub-clusters. Nineteen of the 27 C sub-clusters consist of only one cuticular collagen gene (Supplementary Fig. 5; Supplementary Table 4).

Cluster D

Cluster D comprises 28 members forming 17 sub-clusters (Supplementary Fig. 6; Supplementary Table 4). Of note, COL-51 is predicted to be a multispan-membrane protein with a total of four transmembrane (TM) regions, with the collagenous domain between the second and the third TM. If this prediction is correct, it would be interesting to know how the collagenous domain of COL-51 forms.

Cluster E

Cluster E comprises two members, with five interruptions in their collagenous domain (Supplementary Fig. 7; Supplementary Table 4).

Taken together, the newly proposed classification of cuticular collagens is based on the structural similarity of the collagenous domain. This will help identify similar collagens with similar, redundant or compensatory functions. Furthermore, it is likely that heterotrimers exist within the cuticular collagen family. We hope that our system will help to identify potential candidates for heterotrimerization.

Low amino acid sequence similarity among cuticular collagens

Among the 173 cuticular collagens, approximately 30 genes form 9 similarity groups that share high sequence similarity to each other (>90%, Supplementary Fig. 8, pink/white).
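One way to derive such similarity groups from a pairwise percent identity matrix is single-linkage grouping: two collagens fall into the same group if they are connected by a chain of pairs above the threshold. The sketch below assumes this reading; the text does not state how the groups were delineated, and the names and identity values are invented toy data, not taken from Supplementary Fig. 8.

```python
from collections import defaultdict


def similarity_groups(identity, threshold=90.0):
    """Group names linked by pairwise identity above threshold (single linkage)."""
    neighbours = defaultdict(set)
    for (a, b), pct in identity.items():
        if pct > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this starting name.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Invented toy identities (percent) between hypothetical collagens.
toy = {
    ("colA", "colB"): 95.0,
    ("colB", "colC"): 92.5,
    ("colA", "colC"): 88.0,  # below threshold, but linked through colB
    ("colD", "colE"): 100.0,
    ("colA", "colD"): 35.0,
}
print(similarity_groups(toy))
```

Single linkage is permissive, since one high-identity pair is enough to merge two groups. A stricter criterion, such as requiring every pair in a group to exceed the threshold, would give smaller groups.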

Sequences with a similarity of over 90% normally belong to the same sub-cluster, with the minor exception of COL-146 and COL-147, which belong to C21a and C21c, respectively. However, not all members of a sub-cluster group by sequence similarity. For example, although many members of the sub-cluster B9 group together based on the sequence alignment (Supplementary Fig. 8, green bars), some members, like COL-152 or COL-123, are separated and only show weak sequence similarity to the sub-cluster (approx. 35%, Supplementary Fig. 8, or visit the http://CeColDB.permalink.cc/ website and use “recursive on” with cluster B9). Additionally, some collagens show sequence similarity with members of B9, but group differently based on their collagen domain organization (e.g. COL-148 and COL-150). A similar pattern can be observed in the sub-cluster B14 (Supplementary Fig. 8, blue bars). Overall, the cuticular collagens only show a relatively low sequence similarity (81% with <40% identity; Supplementary Fig. 8), with the exception of COL-126 and COL-127. These two proteins are identical not only at the amino acid level but also at the nucleotide level (in both exons and introns), which raises the question of whether this is one gene misannotated as two collagens. We confirmed by PCR that col-126 and col-127 are indeed two distinct genes located next to each other in inverse orientation (Supplementary Fig. 9), suggesting a very recent gene duplication event.

As the structural similarity of cuticular collagens is striking, it is very likely that they originate from a common evolutionary ancestor. The low sequence identity further suggests that upon gene multiplication cuticular collagens diversified to fulfill various important functions in C. elegans. However, as the prerequisites for collagen helices are relatively low, there is also the possibility that evolutionary pressure was mostly directed towards domain organization and less to the primary sequence.

Functions of cuticular collagens

Twenty-one out of the 173 cuticular collagens have been isolated in genetic mutagenesis screens [10] (bli-1, bli-2, bli-6, dpy-2, dpy-3, dpy-4, dpy-5, dpy-7, dpy-8, dpy-9, dpy-10, dpy-13, dpy-14, dpy-17, lon-3, ram-2, rol-1, rol-6, sqt-1, sqt-2, sqt-3; Supplementary Table 4). Mutations in these cuticular collagens affect the synthesis or assembly of the cuticle and thereby alter the body morphology. These cuticular collagens are named based on their phenotype: long (lon-#) are about 1.5 times the length of wild type, dumpy (dpy-#) are short and fat-looking, roller (rol-#) roll around their helical axis instead of the sinusoid-curve crawling of wild type, blister (bli-#) show detachment of the cuticular layer forming blisters along the body, Ray abnormal (ram-#) affects the morphology of the male tail, and squat (sqt-#) can lengthen, shorten, or helically twist C. elegans [10] (Supplementary Table 4). For instance, the sqt-1(e1350) mutation leads to an R69C substitution altering the predicted furin RVRR cleavage site to RVRC before the collagen domains. These sqt-1(e1350) mutant C. elegans show stage-specific phenotypes: larval stage L1–2 are wild-type, L3 or dauer are rolling, and L4 are dumpy [64]. By contrast, sqt-1 null mutants are wild type [64], suggesting that the absence of SQT-1 collagen has no effect on the cuticular structure and there is redundancy among the cuticular collagens to compensate for the absence of SQT-1. However, the genotype to phenotype interpretation of how these collagens interact to form and integrate into their ECM is complex [65]. For instance, sqt-1 null mutations suppress the rolling phenotype of rol-6 mutants, suggesting that the collagen ROL-6 gene product depends on the presence of collagen SQT-1 [66], whereas both null mutations of sqt-1 or rol-6 suppress the lon-3 mutant phenotype [67]. Taken together, with the complete matrisome list, it now becomes possible to start dissecting out the complex genetics underlying the formation of ECM structures in vivo.
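The effect of the R69C substitution on the predicted furin site can be made concrete with a small motif search. The sketch below uses the RxxR pattern given for the furin sites in Supplementary Table 4; the flanking residues in the two toy peptides are hypothetical placeholders, not the SQT-1 sequence, and the function name is an assumption.

```python
import re

# Lookahead so that overlapping RxxR motifs are all reported.
FURIN = re.compile(r"(?=(R[A-Z]{2}R))")


def furin_sites(seq):
    """Return (1-based position, motif) for every RxxR match in seq."""
    return [(m.start() + 1, m.group(1)) for m in FURIN.finditer(seq)]


# Hypothetical flanking residues (AAA) around the motif named in the text.
wild_type = "AAARVRRAAA"  # RVRR present
mutant = "AAARVRCAAA"     # final Arg replaced by Cys, as in R69C

print(furin_sites(wild_type))
print(furin_sites(mutant))
```

In the toy wild-type peptide the motif is found once, while the Arg-to-Cys change removes the only match. This is consistent with the description of the e1350 allele as altering the predicted cleavage site.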

Utilizing the in-silico C. elegans matrisome to annotate large datasets

RNA sequencing and proteomics are standard techniques used by many C. elegans research laboratories to elucidate physiological and pathological processes. In addition, genome-wide RNA interference (RNAi) screens are commonly used to identify the mechanism(s) underlying phenotypes of interest.

To demonstrate the applicability and power of our matrisome definition and classification, we used the Matrisome Annotator we developed here (http://ce-matrisome-annotator.permalink.cc/) to re-annotate existing datasets. We first re-analyzed our previously published study using transcriptomics to identify genes involved in longevity [15]. We found 79 matrisome genes out of the total 426 transcriptionally upregulated genes when comparing long-lived C. elegans under reduced insulin/IGF-1 signaling conditions with short-lived C. elegans that lack the oxidative stress transcription factor SKN-1/Nrf1,2,3 (Supplementary Table 5) [15]. Although we previously recognized the upregulated collagens and potentially secreted proteases [15], the re-annotation of this data set paints a more complete picture of ECM remodeling in long-lived C. elegans. Our list can also be used to annotate proteomic datasets. Here, we re-annotated a proteomic dataset from a recently published study aimed at studying longevity in C. elegans [68]. In contrast to the 11 collagens highlighted in their study, we found 25 matrisome proteins out of the 177 total upregulated proteins when comparing long-lived germ stem cell mutant glp-1 with wild-type C. elegans (Supplementary Table 6). Our additionally identified matrisome proteins include laminin A and B (EPI-1 and LAM-1), prolyl 4-hydroxylase (DPY-18), and secreted proteases (Supplementary Table 6). Together with the 11 previously identified collagens [68], this suggests a potential remodeling of the ECM in long-lived C. elegans, consistent with the findings from the mRNA expression profile [15]. Last, we set out to re-annotate data from a whole-genome RNAi screen aimed at identifying antifungal innate immunity genes [69], since this would help to identify the functional importance of matrisome genes. We found that 18 out of the 297 gene hits that regulate antimicrobial peptide gene expression are matrisome genes (Supplementary Table 7) [69]. These 18 matrisome genes include six cuticular collagens, three secreted proteases, and one collagen cross-linking enzyme (Supplementary Table 7), suggesting a potential role for strengthening or stiffening of the ECM to form a protective barrier against fungal infections.
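Conceptually, this kind of re-annotation is a lookup of each input identifier in the matrisome list, followed by a tally per category. The sketch below illustrates that idea only; it is not the code behind the Matrisome Annotator. The function name, the lookup table, and the category labels are illustrative assumptions, and the authoritative assignments are those in Supplementary Table 1.

```python
from collections import Counter


def annotate(genes, matrisome):
    """Return matrisome hits, per-category counts, and non-matrisome inputs."""
    hits = {g: matrisome[g] for g in genes if g in matrisome}
    counts = Counter(hits.values())
    not_matrisome = [g for g in genes if g not in matrisome]
    return hits, counts, not_matrisome


# Illustrative lookup table; category labels are assumptions for this sketch.
toy_matrisome = {
    "col-126": "collagen",
    "col-127": "collagen",
    "epi-1": "glycoprotein",
    "dpy-18": "matrisome-associated",
}

# A toy input list containing one gene that is not in the table.
upregulated = ["col-126", "epi-1", "dpy-18", "act-1"]
hits, counts, rest = annotate(upregulated, toy_matrisome)
print(hits)
print(dict(counts))
print(rest)
```

A real input list would first need its identifiers harmonized, for example to WormBase gene IDs, because retired or alternative gene names would otherwise be reported as non-matrisome.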

By using the C. elegans Matrisome Annotator tool, we found substantial enrichment for matrisome genes in these data sets. Thus, re-analyzing -omic datasets with the C. elegans Matrisome Annotator tool may be useful to generate novel hypotheses about the role of the C. elegans matrisome for various biological processes.
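As a rough worked example, the proportion of matrisome genes in each re-annotated data set can be compared with the genome-wide proportion of 719 out of ~20,000 genes. The sketch below computes a simple fold enrichment under the assumption that the whole genome is the appropriate background. That is a simplification, since each assay only detects a subset of genes, and the function name and default arguments are illustrative.

```python
def fold_enrichment(hits, total, bg_hits=719, bg_total=20000):
    """Ratio of the matrisome fraction in a data set to the genome-wide fraction."""
    return (hits / total) / (bg_hits / bg_total)


# Counts as reported in the text for the three re-annotated data sets.
datasets = {
    "transcriptome [15]": (79, 426),
    "proteome [68]": (25, 177),
    "RNAi screen [69]": (18, 297),
}

for name, (hits, total) in datasets.items():
    print(f"{name}: {hits}/{total} = {hits / total:.1%}, "
          f"fold enrichment ~{fold_enrichment(hits, total):.1f}")
```

With these inputs the fold enrichment is roughly 5.2 for the transcriptome, 3.9 for the proteome, and 1.7 for the RNAi screen. A formal statement of significance would require a test such as the hypergeometric test against a background matched to each assay.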

Conclusions

Defining proteins in cellular compartments has helped us understand their functions and their implication in various processes. The ECM has been implicated in many biological processes. Components of the ECM have essential roles in C. elegans development, cell migration, and aging. In this study, we defined the C. elegans matrisome, an ensemble of ECM proteins and associated factors. We identified conserved and nematode-specific components, which informs biomedical research and provides potential targets to fight pathogenic nematodes. The categorization and clustering of C. elegans collagens lays the foundation to experimentally test, for example, whether cuticular collagens might form heterotrimers. Using the C. elegans Matrisome Annotator tool, we found enrichment of ECM genes at the mRNA, protein, and phenotypic level. This will assist researchers in delineating genotype-to-phenotype relationships for ECM genes. Modern science is hypothesis-driven. We hope that our contribution in defining the C. elegans matrisome and providing tools to analyze -omic data will aid in generating novel hypotheses to propel science forward.

The following are the supplementary data related to this article.

Supplementary Fig. 1. COL-135 has a prolonged Gly-X-Y repeat with unusual amino acid composition compared to all other collagens.

(A) Amino acid occurrence in X and Y positions of the collagenous Gly-X-Y domains of COL-135 are indicated by dark and light colours, respectively. Amino acid composition of all collagens is depicted in green and of COL-135 in orange colours.

(B) Protein sequence of the longest COL-135 transcript. The collagenous domain is indicated by the pinkish box with imperfections shown as breaks. Glycine residues involved in the collagenous Gly-X-Y repeats are highlighted by a yellow background. Positively and negatively charged amino acids are marked by a blue or red background, respectively. Proline residues belonging to the collagenous domain are shown by a violet background.

Supplementary Fig. 2. Sequence logo of the alignment of the N-terminal region of cuticular collagens.

The 173 C. elegans cuticular collagens were aligned using the MUSCLE algorithm and manually curated. A sequence logo was created with the Weblogo 2.8 service [71].

Supplementary Fig. 3. Cluster A of the cuticular collagens.

Cluster A contains collagens with no imperfections in the main collagenous domain (blue). Sequences were grouped based on the length of their collagen domain and the existence of transmembrane domains. Predicted signal peptides, indicating secreted collagens, are depicted in light blue, transmembrane regions are coloured in pink. Only one representative collagen per cluster is shown, indicated by bold lettering. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 4. Cluster B of the cuticular collagens.

Cluster B contains collagens with one imperfection in the main collagenous domain (blue). Sequences were grouped based on the length of their collagen domain and sorted from the shortest to the longest non-interrupted C-terminal Gly-X-Y repeat. Only one representative collagen per cluster is shown, indicated by bold lettering. Sub-clusters were formed when the structure of the collagenous domain was identical, but either certain sequence conservation in the non-collagenous domain or the presence of transmembrane domains justified further classification. Predicted signal peptides, indicating secreted collagens, are depicted in light blue, transmembrane regions are coloured in pink. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 5. Cluster C of the cuticular collagens.

Cluster C contains collagens with two imperfections in the main collagenous domain (dark blue). Sequences were grouped based on the length of their collagen domain and sorted from the shortest to the longest non-interrupted C-terminal Gly-X-Y repeat. Only one representative collagen per cluster is shown, indicated by bold lettering. Sub-clusters were formed when the structure of the collagenous domain was identical, but either certain sequence conservation in the non-collagenous domains or the presence of transmembrane domains justified further classification. The sub-clusters C12 and C21 contain collagenous domains in which the central collagenous domain is only shifted by a single triplet. Predicted signal peptides, indicating secreted collagens, are depicted in light blue, transmembrane regions are coloured in pink. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 6. Cluster D of the cuticular collagens.

Cluster D contains collagens with three imperfections in the main collagenous domain (blue). Sequences were grouped based on the length of their collagen domain and sorted from the shortest to the longest non-interrupted C-terminal Gly-X-Y repeat. Only one representative collagen per cluster is shown, indicated by bold lettering. The D1 sub-clusters were formed based on the perfect alignment of the main collagenous domain, with the exception that one glycine is missing at the beginning of the second collagenous domain. Predicted signal peptides, indicating secreted collagens, are depicted in light blue, transmembrane regions are coloured in pink. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 7. Cluster E of the cuticular collagens.

Cluster E contains collagens with >3 imperfections. Pink: transmembrane regions; green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 8. Sequence similarity of cuticular collagens.

Cuticular collagens were aligned and sorted using the Clustal Omega web service with default parameters (version 1.2.4, mBed algorithm, output order: aligned) [70]. The resultant percent identity matrix for pairwise comparisons was directly plotted using the heatmap.2 function from gplots in R (https://cran.r-project.org/web/packages/gplots/index.html) and coloured to indicate sequence identity above stated values. Dashed drop lines mark groups of genes in the heatmap with higher similarity. Genes belonging to sub-cluster B9 (green) and B14 (blue) are indicated by green and blue bars, respectively.

Supplementary Fig. 9. Cuticular collagens col-126 and col-127 are identical but separate genes.

(A) Genomic representation of the col-126 and col-127 genes. The black line represents the DNA and the two genes are shown as blue boxes with the arrowhead showing the direction of transcription (col-126: light blue; col-127: dark blue). The red lines represent the gene sequences, which are reverse complementary but 100% identical in nucleotide sequence. The green lines show the three sequencing PCRs, while the arrows on the lines indicate the primers. Under each green line the size of the PCR product is listed.

(B) Agarose gel with the three PCR products (A, B, C) from wild type N2 genomic DNA and the positive control (glp-1 PCR product of ~1.4 kb) and negative control (N2 genomic DNA without primers).

(C) The sequences of the primers used.

(D) Sequencing results for PCR product A and C.

mmc1.pdf (3.4MB, pdf)
Supplementary Table 1

The C. elegans matrisome.

A. Complete matrisome list.

B. List of identified matrisome gene families

mmc2.xlsx (101.9KB, xlsx)
Supplementary Table 2

Domain-based identification of matrisome proteins.

A. C. elegans reference proteome (UP000001940) downloaded from UniProt.

B. List of InterPro (IPR) domains to identify ECM-affiliated proteins and cuticle components.

C–J. Lists of proteins containing the listed InterPro (IPR) domain.

mmc3.xlsx (2MB, xlsx)
Supplementary Table 3

Gene Ontology terms used to identify putative matrisome components.

A. List of Gene Ontology terms used.

B. List of UniProt entries annotated by a Gene Ontology-Cellular Component term related to the ECM.

mmc4.xlsx (34.2KB, xlsx)
Supplementary Table 4

C. elegans collagens.

Complete list of cuticular collagens in C. elegans and their most interesting features. Column headers are as follows. Gene ID: Official protein name; Wormbase: Wormbase gene ID; Isoforms: currently known alternative splice forms; Clade: new classification; Aa: total length in amino acids; Philius Topology: Topology predicted by Philius server; Signalpeptide/resp. transmembrane: Position of either signal peptide or transmembrane regions predicted by Philius server [72]; Furin sites: Position of potential furin cleavage sites (RxxR); RGD sites: Position of potential integrin binding sequences RGD; N-Pro-Peptide: Sequence of the N-Pro-helix; CysKnot C-ProPep: Cysteines present after the N-Pro-helix; CysKnot N-Col: Sequence of cysteine knot N-terminally of the collagenous domain; Collagen domain/start: Start-position of the collagenous domain; Collagen/Col1–4&NC1–4: Length of nth collagenous and following non-collagenous domain; Collagen/total length: Total length of the collagen domain including interruptions; CysKnot-C-Col: Sequence of the C‑terminal cysteine knot; Length of C-Terminus: Sequence length after the collagenous domain (including the CysKnot C-Col).

mmc5.xlsx (36.9KB, xlsx)
Supplementary Table 5

Analysis of microarray transcriptomic data using the C. elegans Matrisome Annotator.

We used the list of differentially upregulated genes from the Supplementary Table 3 published in Ewald et al. [15]. We copied the 429 genes found in column “Public Name” of the “SKN-1-dependent upregulated genes under reduced IIS (ranked by SAM score)” into the C. elegans Matrisome Annotator. 24 out of the 429 genes were not mapped due to retired genes or sequence names. For this reason, WormBase gene IDs are preferred for gene input. We used WormBase to convert these 24 genes into WormBase gene IDs manually and used this updated list of 427 genes as the input list for the C. elegans Matrisome Annotator.

mmc6.xlsx (802.6KB, xlsx)
Supplementary Table 6

Analysis of proteomics data using the C. elegans Matrisome Annotator.

We used the 177 proteins up-regulated in glp-1(e2141) mutants vs. wild type N2 from the Supplementary Table S1 published in Pu et al. [68] as the input list for UniProt ID for the C. elegans Matrisome Annotator.

mmc7.xlsx (1.4MB, xlsx)
Supplementary Table 7

Analysis of genome-wide RNA interference screen using the C. elegans Matrisome Annotator.

We used the 297 candidate gene hits from the RNAi screen from the Supplementary Table S2 published in Zugasti et al. [69] as the input list for WormBase gene IDs for the C. elegans Matrisome Annotator.

mmc8.xlsx (676.2KB, xlsx)

Supplementary material

mmc9.pdf (63.8KB, pdf)

Acknowledgments

We thank Gary Williams from WormBase and Paolo Bazzicalupo for helpful discussions about col-126/-127 and cuticlin-like genes, respectively, and members of the Naba and Ewald labs for discussion and comments on the manuscript.

Funding sources

This work was supported by the Swiss National Science Foundation [163898] to ACT, CS, and CYE. The work of AN and MND was supported by a start-up fund from the Department of Physiology and Biophysics at the University of Illinois at Chicago. JMG was supported by the Deutsche Forschungsgemeinschaft SFB829/B11.

Author contributions

All authors participated in analyzing and interpreting the data.

AN, ACT, MND, EJ, and CYE defined the in-silico matrisome and analyzed expression data. CS developed the online matrisome annotation script.

JMG identified and established the C. elegans collagen classification.

JMG, AN, and CYE wrote the manuscript in consultation with the other authors.

Author information

The authors have no competing interests to declare.

Contributor Information

Jan M. Gebauer, Email: jan.gebauer@uni-koeln.de.

Alexandra Naba, Email: anaba@uic.edu.

Collin Y. Ewald, Email: collin-ewald@ethz.ch.

References

  • 1.Stepek G., Buttle D.J., Duce I.R., Behnke J.M. Human gastrointestinal nematode infections: are new control methods required? Int. J. Exp. Pathol. 2006;87:325–341. doi: 10.1111/j.1365-2613.2006.00495.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Partners for Parasite Control Meeting 3rd 2004 Geneva, World Health Organization. 2005. Deworming for health and development: report of the third global meeting of the partners for parasite control; pp. 1–64.http://www.who.int/iris/handle/10665/69005 [Google Scholar]
  • 3.Bernard G.C., Egnin M., Bonsi C. Nematology-Concepts, Diagnosis and Control. InTech; 2017. The impact of plant-parasitic nematodes on agriculture and methods of control; pp. 1–33. [DOI] [Google Scholar]
  • 4.Holden-Dye L., Walker R.J. Parasitic Helminths. Wiley-VCH Verlag GmbH & Co. KGaA; Weinheim, Germany: 2012. How relevant is Caenorhabditis elegans as a model for the analysis of parasitic nematode biology? pp. 23–41. [DOI] [Google Scholar]
  • 5.Ewald C.Y. Redox signaling of NADPH oxidases regulates oxidative stress responses, immunity and aging. Antioxidants. 2018;7:130–136. doi: 10.3390/antiox7100130. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Corsi A.K., Wightman B., Chalfie M. A transparent window into biology: a primer on Caenorhabditis elegans. Genetics. 2015;200:387–407. doi: 10.1534/genetics.115.176099. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Shaye D.D., Greenwald I. OrthoList: a compendium of C. elegans genes with human orthologues. PLoS ONE. 2011;6 doi: 10.1371/journal.pone.0020085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Kaletta T., Hengartner M.O. Finding function in novel targets: C. elegans as a model organism. Nat. Rev. Drug Discov. 2006;5:387–398. doi: 10.1038/nrd2031. [DOI] [PubMed] [Google Scholar]
  • 9.Culetto E., Sattelle D.B. A role for Caenorhabditis elegans in understanding the function and interactions of human disease genes. Hum. Mol. Genet. 2000;9:869–877. doi: 10.1093/hmg/9.6.869. [DOI] [PubMed] [Google Scholar]
  • 10.Page A.P., Johnstone I.L. The cuticle, WormBook: the online review of C. elegans. Biology. 2007:1–15. doi: 10.1895/wormbook.1.138.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Keeley D.P., Sherwood D.R. Tissue linkage through adjoining basement membranes: the long and the short term of it. Matrix Biol. 2018 doi: 10.1016/j.matbio.2018.05.009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Adams J.C. Matricellular proteins: functional insights from non-mammalian animal models. Curr. Top. Dev. Biol. 2018;130:39–105. doi: 10.1016/bs.ctdb.2018.02.003. [DOI] [PubMed] [Google Scholar]
  • 13.Kramer J.M. Basement membranes, WormBook: the online review of C. elegans. Biology. 2005:1–15. doi: 10.1895/wormbook.1.16.1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Sherwood D.R., Plastino J. Invading, leading and navigating cells in Caenorhabditis elegans: insights into cell movement in vivo. Genetics. 2018;208:53–78. doi: 10.1534/genetics.117.300082. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Ewald C.Y., Landis J.N., Porter Abate J., Murphy C.T., Blackwell T.K. Dauer-independent insulin/IGF-1-signalling implicates collagen remodelling in longevity. Nature. 2015;519:97–101. doi: 10.1038/nature14021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Hynes R.O., Naba A. Overview of the matrisome—an inventory of extracellular matrix constituents and functions. Cold Spring Harb. Perspect. Biol. 2012;4:a004903. doi: 10.1101/cshperspect.a004903. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Naba A., Clauser K.R., Hoersch S., Liu H., Carr S.A., Hynes R.O. The matrisome: in silico definition and in vivo characterization by proteomics of normal and tumor extracellular matrices. Mol. Cell. Proteomics. 2012;11 doi: 10.1074/mcp.M111.014647. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Naba A., Hoersch S., Hynes R.O. Towards definition of an ECM parts list: an advance on GO categories. Matrix Biol. 2012;31:371–372. doi: 10.1016/j.matbio.2012.11.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Nauroy P., Hughes S., Naba A., Ruggiero F. The in-silico zebrafish matrisome: a new tool to study extracellular matrix gene and protein functions. Matrix Biol. 2018;65:5–13. doi: 10.1016/j.matbio.2017.07.001. [DOI] [PubMed] [Google Scholar]
  • 20.Nauroy P., Guiraud A., Chlasta J., Malbouyres M., Gillet B., Hughes S. Gene profile of zebrafish fin regeneration offers clues to kinetics, organization and biomechanics of basement membrane. Matrix Biol. 2018 doi: 10.1016/j.matbio.2018.07.005. [DOI] [PubMed] [Google Scholar]
  • 21.Naba A., Clauser K.R., Lamar J.M., Carr S.A., Hynes R.O. Extracellular matrix signatures of human mammary carcinoma identify novel metastasis promoters. elife. 2014;3 doi: 10.7554/eLife.01308. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Socovich A.M., Naba A. The cancer matrisome: from comprehensive characterization to biomarker discovery. Semin. Cell Dev. Biol. 2018 doi: 10.1016/j.semcdb.2018.06.005. [DOI] [PubMed] [Google Scholar]
  • 23.Zhou Y., Horowitz J.C., Naba A., Ambalavanan N., Atabai K., Balestrini J. Extracellular matrix in lung development, homeostasis and disease. Matrix Biol. 2018 doi: 10.1016/j.matbio.2018.03.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Massey V.L., Dolin C.E., Poole L.G., Hudson S.V., Siow D.L., Brock G.N. The hepatic “matrisome” responds dynamically to injury: characterization of transitional changes to the extracellular matrix in mice. Hepatology. 2017;65:969–982. doi: 10.1002/hep.28918. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Staiculescu M.C., Kim J., Mecham R.P., Wagenseil J.E. Mechanical behavior and matrisome gene expression in the aneurysm-prone thoracic aorta of newborn lysyl oxidase knockout mice. Am. J. Physiol. Heart Circ. Physiol. 2017;313:H446–H456. doi: 10.1152/ajpheart.00712.2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Naba A., Clauser K.R., Ding H., Whittaker C.A., Carr S.A., Hynes R.O. The extracellular matrix: tools and insights for the “omics” era. Matrix Biol. 2016;49:10–24. doi: 10.1016/j.matbio.2015.06.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Kinsella R.J., Kähäri A., Haider S., Zamora J., Proctor G., Spudich G. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database (Oxford) 2011 doi: 10.1093/database/bar030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.T. UniProt Consortium UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018;46:2699. doi: 10.1093/nar/gky092. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Cox G.N., Kusch M., Edgar R.S. Cuticle of Caenorhabditis elegans: its isolation and partial characterization. J. Cell Biol. 1981;90:7–17. doi: 10.1083/jcb.90.1.7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Sebastiano M., Lassandro F., Bazzicalupo P. cut-1 a Caenorhabditis elegans gene coding for a dauer-specific noncollagenous component of the cuticle. Dev. Biol. 1991;146:519–530. doi: 10.1016/0012-1606(91)90253-y. [DOI] [PubMed] [Google Scholar]
  • 31.Finn R.D., Attwood T.K., Babbitt P.C., Bateman A., Bork P., Bridge A.J. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 2017;45:D190–D199. doi: 10.1093/nar/gkw1107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Eddy S.R. Accelerated profile HMM searches. PLoS Comput. Biol. 2011;7 doi: 10.1371/journal.pcbi.1002195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Finn R.D., Coggill P., Eberhardt R.Y., Eddy S.R., Mistry J., Mitchell A.L. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44:D279–D285. doi: 10.1093/nar/gkv1344. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Lee R.Y.N., Howe K.L., Harris T.W., Arnaboldi V., Cain S., Chan J. WormBase 2017: molting into a new stage. Nucleic Acids Res. 2018;46:D869–D874. doi: 10.1093/nar/gkx998. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Zhuang Y., Ma F., Li-Ling J., Xu X., Li Y. Comparative analysis of amino acid usage and protein length distribution between alternatively and non-alternatively spliced genes across six eukaryotic genomes. Mol. Biol. Evol. 2003;20:1978–1985. doi: 10.1093/molbev/msg203. [DOI] [PubMed] [Google Scholar]
  • 36.Olson S.K., Bishop J.R., Yates J.R., Oegema K., Esko J.D. Identification of novel chondroitin proteoglycans in Caenorhabditis elegans: embryonic cell division depends on CPG-1 and CPG-2. J. Cell Biol. 2006;173:985–994. doi: 10.1083/jcb.200603003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Drickamer K., Dodd R.B. C-type lectin-like domains in Caenorhabditis elegans: predictions from the complete genome sequence. Glycobiology. 1999;9:1357–1369. doi: 10.1093/glycob/9.12.1357. [DOI] [PubMed] [Google Scholar]
  • 38.Johnstone I.L. Cuticle collagen genes. Expression in Caenorhabditis elegans. Trends Genet. 2000;16:21–27. doi: 10.1016/s0168-9525(99)01857-0. [DOI] [PubMed] [Google Scholar]
  • 39.Bentley A.A., Adams J.C. The evolution of thrombospondins and their ligand-binding activities. Mol. Biol. Evol. 2010;27:2187–2197. doi: 10.1093/molbev/msq107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Whittaker C.A., Hynes R.O. Distribution and evolution of von Willebrand/integrin A domains: widely dispersed domains with roles in cell adhesion and elsewhere. Mol. Biol. Cell. 2002;13:3369–3387. doi: 10.1091/mbc.e02-05-0259. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Schwarzbauer J.E., Spencer C.S. The Caenorhabditis elegans homologue of the extracellular calcium binding protein SPARC/osteonectin affects nematode body morphology and mobility. Mol. Biol. Cell. 1993;4:941–952. doi: 10.1091/mbc.4.9.941. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Morrissey M.A., Jayadev R., Miley G.R., Blebea C.A., Chi Q., Ihara S. SPARC promotes cell invasion in vivo by decreasing type IV collagen levels in the basement membrane. PLoS Genet. 2016;12 doi: 10.1371/journal.pgen.1005905. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Segade F. Molecular evolution of the fibulins: implications on the functionality of the elastic fibulins. Gene. 2010;464:17–31. doi: 10.1016/j.gene.2010.05.003. [DOI] [PubMed] [Google Scholar]
  • 44.Woo W.-M., Berry E., Hudson M.L., Swale R.E., Goncharov A., Chisholm A.D. The C. elegans F-spondin family protein SPON-1 maintains cell adhesion in neural and non-neural tissues. Development. 2008;135:2747–2756. doi: 10.1242/dev.015289. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Karavanich C.A., Anholt R.R. Molecular evolution of olfactomedin. Mol. Biol. Evol. 1998;15:718–726. doi: 10.1093/oxfordjournals.molbev.a025975. [DOI] [PubMed] [Google Scholar]
  • 46.Ricard-Blum S. The collagen family. Cold Spring Harb. Perspect. Biol. 2011;3:a004978. doi: 10.1101/cshperspect.a004978. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Fidler A.L., Darris C.E., Chetyrkin S.V., Pedchenko V.K., Boudko S.P., Brown K.L. Collagen IV and basement membrane at the evolutionary dawn of metazoan tissues. eLife. 2017;6 doi: 10.7554/eLife.24176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Ozbek S., Balasubramanian P.G., Chiquet-Ehrismann R., Tucker R.P., Adams J.C. The evolution of extracellular matrix. Mol. Biol. Cell. 2010;21:4300–4305. doi: 10.1091/mbc.E10-03-0251. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Hutter H., Vogel B.E., Plenefisch J.D., Norris C.R., Proenca R.B., Spieth J. Conservation and novelty in the evolution of cell adhesion and extracellular matrix genes. Science. 2000;287:989–994. doi: 10.1126/science.287.5455.989. [DOI] [PubMed] [Google Scholar]
  • 50.Boot-Handford R.P., Tuckwell D.S. Fibrillar collagen: the key to vertebrate evolution? A tale of molecular incest. Bioessays. 2003;25:142–151. doi: 10.1002/bies.10230. [DOI] [PubMed] [Google Scholar]
  • 51.Sibley M.H., Johnson J.J., Mello C.C., Kramer J.M. Genetic identification, sequence, and alternative splicing of the Caenorhabditis elegans alpha 2(IV) collagen gene. J. Cell Biol. 1993;123:255–264. doi: 10.1083/jcb.123.1.255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Guo X.D., Johnson J.J., Kramer J.M. Embryonic lethality caused by mutations in basement membrane collagen of C. elegans. Nature. 1991;349:707–709. doi: 10.1038/349707a0. [DOI] [PubMed] [Google Scholar]
  • 53.Sibley M.H., Graham P.L., von Mende N., Kramer J.M. Mutations in the alpha 2(IV) basement membrane collagen gene of Caenorhabditis elegans produce phenotypes of differing severities. EMBO J. 1994;13:3278–3285. doi: 10.1002/j.1460-2075.1994.tb06629.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Ramchandran R., Dhanabal M., Volk R., Waterman M.J., Segal M., Lu H. Antiangiogenic activity of restin, NC10 domain of human collagen XV: comparison to endostatin. Biochem. Biophys. Res. Commun. 1999;255:735–739. doi: 10.1006/bbrc.1999.0248. [DOI] [PubMed] [Google Scholar]
  • 55.Heljasvaara R., Aikio M., Ruotsalainen H., Pihlajaniemi T. Collagen XVIII in tissue homeostasis and dysregulation - lessons learned from model organisms and human patients. Matrix Biol. 2017;57–58:55–75. doi: 10.1016/j.matbio.2016.10.002. [DOI] [PubMed] [Google Scholar]
  • 56.Ackley B.D., Crew J.R., Elamaa H., Pihlajaniemi T., Kuo C.J., Kramer J.M. The NC1/endostatin domain of Caenorhabditis elegans type XVIII collagen affects cell migration and axon guidance. J. Cell Biol. 2001;152:1219–1232. doi: 10.1083/jcb.152.6.1219. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57.Taylor J., Unsoeld T., Hutter H. The transmembrane collagen COL-99 guides longitudinally extending axons in C. elegans. Mol. Cell. Neurosci. 2018;89:9–19. doi: 10.1016/j.mcn.2018.03.003. [DOI] [PubMed] [Google Scholar]
  • 58.Tu H., Huhtala P., Lee H.-M., Adams J.C., Pihlajaniemi T. Membrane-associated collagens with interrupted triple-helices (MACITs): evolution from a bilaterian common ancestor and functional conservation in C. elegans. BMC Evol. Biol. 2015;15:281. doi: 10.1186/s12862-015-0554-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59.Loria P.M., Hodgkin J., Hobert O. A conserved postsynaptic transmembrane protein affecting neuromuscular signaling in Caenorhabditis elegans. J. Neurosci. 2004;24:2191–2201. doi: 10.1523/JNEUROSCI.5462-03.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Emtage L., Gu G., Hartwieg E., Chalfie M. Extracellular proteins organize the mechanosensory channel complex in C. elegans touch receptor neurons. Neuron. 2004;44:795–807. doi: 10.1016/j.neuron.2004.11.010. [DOI] [PubMed] [Google Scholar]
  • 61.Cueva J.G., Mulholland A., Goodman M.B. Nanoscale organization of the MEC-4 DEG/ENaC sensory mechanotransduction channel in Caenorhabditis elegans touch receptor neurons. J. Neurosci. 2007;27:14089–14098. doi: 10.1523/JNEUROSCI.4179-07.2007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Bergmann D.C., Crew J.R., Kramer J.M., Wood W.B. Cuticle chirality and body handedness in Caenorhabditis elegans. Dev. Genet. 1998;23:164–174. doi: 10.1002/(SICI)1520-6408(1998)23:3<164::AID-DVG2>3.0.CO;2-C. [DOI] [PubMed] [Google Scholar]
  • 63.Kramer J.M. Structures and functions of collagens in Caenorhabditis elegans. FASEB J. 1994;8:329–336. doi: 10.1096/fasebj.8.3.8143939. [DOI] [PubMed] [Google Scholar]
  • 64.Kramer J.M., Johnson J.J., Edgar R.S., Basch C., Roberts S. The sqt-1 gene of C. elegans encodes a collagen critical for organismal morphogenesis. Cell. 1988;55:555–565. doi: 10.1016/0092-8674(88)90214-0. [DOI] [PubMed] [Google Scholar]
  • 65.Page A.P., Winter A.D. Enzymes involved in the biogenesis of the nematode cuticle. Adv. Parasitol. 2003;53:85–148. doi: 10.1016/s0065-308x(03)53003-2. [DOI] [PubMed] [Google Scholar]
  • 66.Kramer J.M., Johnson J.J. Analysis of mutations in the sqt-1 and rol-6 collagen genes of Caenorhabditis elegans. Genetics. 1993;135:1035–1045. doi: 10.1093/genetics/135.4.1035. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Nyström J., Shen Z.-Z., Aili M., Flemming A.J., Leroi A., Tuck S. Increased or decreased levels of Caenorhabditis elegans lon-3, a gene encoding a collagen, cause reciprocal changes in body length. Genetics. 2002;161:83–97. doi: 10.1093/genetics/161.1.83. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Pu Y.-Z., Wan Q.-L., Ding A.-J., Luo H.-R., Wu G.-S. Quantitative proteomics analysis of Caenorhabditis elegans upon germ cell loss. J Proteomics. 2017;156:85–93. doi: 10.1016/j.jprot.2017.01.011. [DOI] [PubMed] [Google Scholar]
  • 69.Zugasti O., Thakur N., Belougne J., Squiban B., Kurz C.L., Soulé J. A quantitative genome-wide RNAi screen in C. elegans for antifungal innate immunity genes. BMC Biol. 2016;14:35. doi: 10.1186/s12915-016-0256-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70.Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011;7:539. doi: 10.1038/msb.2011.75. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Crooks G.E., Hon G., Chandonia J.-M., Brenner S.E. WebLogo: a sequence logo generator. Genome Res. 2004;14:1188–1190. doi: 10.1101/gr.849004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Reynolds S.M., Käll L., Riffle M.E., Bilmes J.A., Noble W.S. Transmembrane topology and signal peptide prediction using dynamic bayesian networks. PLoS Comput. Biol. 2008;4 doi: 10.1371/journal.pcbi.1000213. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Supplementary Fig. 1. COL-135 has a prolonged Gly-X-Y repeat with unusual amino acid composition compared to all other collagens.

(A) The occurrence of amino acids in the X and Y positions of the collagenous Gly-X-Y domains is indicated by dark and light colours, respectively. The amino acid composition of all collagens is depicted in green and that of COL-135 in orange.

(B) Protein sequence of the longest COL-135 transcript. The collagenous domain is indicated by the pinkish box with imperfections shown as breaks. Glycine residues involved in the collagenous Gly-X-Y repeats are highlighted by a yellow background. Positively and negatively charged amino acids are marked by a blue or red background, respectively. Proline residues belonging to the collagenous domain are shown by a violet background.

Supplementary Fig. 2. Sequence logo of the alignment of the N-terminal region of cuticular collagens.

The 173 C. elegans cuticular collagens were aligned using the MUSCLE algorithm and manually curated. A sequence logo was created with the WebLogo 2.8 service [71].

Supplementary Fig. 3. Cluster A of the cuticular collagens.

Cluster A contains collagens with no imperfections in the main collagenous domain (blue). Sequences were grouped based on the length of their collagen domain and the existence of transmembrane domains. Predicted signal peptides, indicating secreted collagens, are depicted in light blue; transmembrane regions are coloured in pink. Only one representative collagen per cluster is shown, indicated by bold lettering. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 4. Cluster B of the cuticular collagens.

Cluster B contains collagens with one imperfection in the main collagenous domain (blue). Sequences were grouped based on the length of their collagen domain and sorted from the shortest to the longest non-interrupted C-terminal Gly-X-Y repeat. Only one representative collagen per cluster is shown, indicated by bold lettering. Sub-clusters were formed when the structure of the collagenous domain was identical, but either certain sequence conservation in the non-collagenous domain or the presence of transmembrane domains justified further classification. Predicted signal peptides, indicating secreted collagens, are depicted in light blue; transmembrane regions are coloured in pink. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 5. Cluster C of the cuticular collagens.

Cluster C contains collagens with two imperfections in the main collagenous domain (dark blue). Sequences were grouped based on the length of their collagen domain and sorted from the shortest to the longest non-interrupted C-terminal Gly-X-Y repeat. Only one representative collagen per cluster is shown, indicated by bold lettering. Sub-clusters were formed when the structure of the collagenous domain was identical, but either certain sequence conservation in the non-collagenous domains or the presence of transmembrane domains justified further classification. The sub-clusters C12 and C21 contain collagenous domains in which the central collagenous domain is shifted by only a single triplet. Predicted signal peptides, indicating secreted collagens, are depicted in light blue; transmembrane regions are coloured in pink. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 6. Cluster D of the cuticular collagens.

Cluster D contains collagens with three imperfections in the main collagenous domain (blue). Sequences were grouped based on the length of their collagen domain and sorted from the shortest to the longest non-interrupted C-terminal Gly-X-Y repeat. Only one representative collagen per cluster is shown, indicated by bold lettering. The D1 sub-clusters were formed based on the perfect alignment of the main collagenous domain, with the exception that one glycine is missing at the beginning of the second collagenous domain. Predicted signal peptides, indicating secreted collagens, are depicted in light blue; transmembrane regions are coloured in pink. Green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.

Supplementary Fig. 7. Cluster E of the cuticular collagens.

Cluster E contains collagens with >3 imperfections. Pink: transmembrane regions; green: furin-like protease sites; yellow: cysteine knots; dark-blue: collagen domain; red: N-terminal helical Gly-X-Y repeat; numbers in the boxes correspond to the number of Gly-X-Y repeats.
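
The legends of Supplementary Figs. 3 to 7 define the five clusters by the number of imperfections in the main collagenous domain (none for A, one for B, two for C, three for D, more than three for E). A minimal Python sketch of that rule follows. The function name `assign_cluster` is hypothetical, and the imperfection count is taken as given, because the procedure used to count imperfections is not described in these legends.

```python
def assign_cluster(n_imperfections: int) -> str:
    """Map the number of imperfections in the main collagenous domain
    to a cluster letter (A-E), following the figure legends."""
    if n_imperfections < 0:
        raise ValueError("number of imperfections cannot be negative")
    # 0, 1, 2 and 3 imperfections map to A, B, C and D; more than 3 maps to E.
    return "ABCD"[n_imperfections] if n_imperfections <= 3 else "E"


if __name__ == "__main__":
    for n in range(6):
        print(n, assign_cluster(n))
```

The sketch covers only the top-level cluster letter; the sub-clusters (for example C12 or D1) depend on domain lengths and sequence conservation and are not modelled here.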

Supplementary Fig. 8. Sequence similarity of cuticular collagens.

Cuticular collagens were aligned and sorted using the Clustal Omega web service with default parameters (version 1.2.4, mBed algorithm, output order: aligned) [70]. The resultant percent identity matrix for pairwise comparisons was directly plotted using the heatmap.2 function from gplots in R (https://cran.r-project.org/web/packages/gplots/index.html) and coloured to indicate sequence identity above stated values. Dashed drop lines mark groups of genes in the heatmap with higher similarity. Genes belonging to sub-cluster B9 (green) and B14 (blue) are indicated by green and blue bars, respectively.

Supplementary Fig. 9. Cuticular collagens col-126 and col-127 are identical but separate genes.

(A) Genomic representation of the col-126 and col-127 genes. The black line represents the DNA, and the two genes are shown as blue boxes with the arrowhead showing the direction of transcription (col-126: light blue; col-127: dark blue). The red lines represent the gene sequences, which are reverse complementary but 100% identical in nucleotide sequence. The green lines show the three sequencing PCRs, and the arrows on the lines indicate the primers. The size of each PCR product is listed under its green line.

(B) Agarose gel with the three PCR products (A, B, C) from wild type N2 genomic DNA and the positive control (glp-1 PCR product of ~1.4 kb) and negative control (N2 genomic DNA without primers).

(C) The sequences of the primers used.

(D) Sequencing results for PCR products A and C.
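
Panel (A) describes two gene sequences that are reverse complementary to each other yet identical in nucleotide sequence. The following Python sketch shows how such a relationship could be checked computationally. The function names and the toy sequences are illustrative assumptions, not the actual col-126 or col-127 sequences.

```python
# Translation table for the four DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def is_inverted_copy(seq_a: str, seq_b: str) -> bool:
    """True if seq_b, read on the opposite strand, is identical to seq_a."""
    return seq_a.upper() == reverse_complement(seq_b).upper()


if __name__ == "__main__":
    # Toy sequences only; "GCAT" is the reverse complement of "ATGC".
    print(reverse_complement("ATGC"))
    print(is_inverted_copy("ATGC", "GCAT"))
```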

mmc1.pdf (3.4MB, pdf)
Supplementary Table 1

The C. elegans matrisome.

A. Complete matrisome list.

B. List of identified matrisome gene families

mmc2.xlsx (101.9KB, xlsx)
Supplementary Table 2

Domain-based identification of matrisome proteins.

A. C. elegans reference proteome (UP000001940) downloaded from UniProt.

B. List of InterPro (IPR) domains to identify ECM-affiliated proteins and cuticle components.

C–J. Lists of proteins containing the listed InterPro (IPR) domain.

mmc3.xlsx (2MB, xlsx)
Supplementary Table 3

Gene Ontology terms used to identify putative matrisome components.

A. List of Gene Ontology terms used.

B. List of UniProt entries annotated by a Gene Ontology-Cellular Component term related to the ECM.

mmc4.xlsx (34.2KB, xlsx)
Supplementary Table 4

C. elegans collagens.

Complete list of cuticular collagens in C. elegans and their most interesting features. Column headers are as follows. Gene ID: Official protein name; Wormbase: Wormbase gene ID; Isoforms: currently known alternative splice forms; Clade: new classification; Aa: total length in amino acids; Philius Topology: Topology predicted by Philius server; Signalpeptide/resp. transmembrane: Position of either signal peptide or transmembrane regions predicted by Philius server [72]; Furin sites: Position of potential furin cleavage sites (RxxR); RGD sites: Position of potential integrin-binding RGD sequences; N-Pro-Peptide: Sequence of the N-Pro-helix; CysKnot C-ProPep: Cysteines present after the N-Pro-helix; CysKnot N-Col: Sequence of cysteine knot N-terminally of the collagenous domain; Collagen domain/start: Start-position of the collagenous domain; Collagen/Col1–4&NC1–4: Length of nth collagenous and following non-collagenous domain; Collagen/total length: Total length of the collagen domain including interruptions; CysKnot-C-Col: Sequence of the C‑terminal cysteine knot; Length of C-Terminus: Sequence length after the collagenous domain (including the CysKnot C-Col).
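
The "Furin sites" and "RGD sites" columns record positions of the motifs RxxR and RGD. As an illustration only, the Python sketch below scans a protein sequence for both motifs with regular expressions. The helper name `motif_positions`, the use of 1-based positions, the reporting of overlapping matches, and the toy sequence are all assumptions; the table legend does not state how the positions were derived.

```python
import re

# Lookahead patterns so that overlapping motifs are all reported.
FURIN_SITE = re.compile(r"(?=R..R)")  # RxxR, x = any residue
RGD_SITE = re.compile(r"(?=RGD)")


def motif_positions(seq: str, pattern: re.Pattern) -> list[int]:
    """Return 1-based start positions of every match of pattern in seq."""
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "MRAARGDKRSSRRGDA"  # made-up sequence, not a real collagen
    print("RxxR:", motif_positions(toy, FURIN_SITE))
    print("RGD: ", motif_positions(toy, RGD_SITE))
```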

mmc5.xlsx (36.9KB, xlsx)
Supplementary Table 5

Analysis of microarray transcriptomic data using the C. elegans Matrisome Annotator.

We used the list of differentially upregulated genes from Supplementary Table 3 published in Ewald et al. [15]. We copied the 429 genes found in column “Public Name” of the “SKN-1-dependent upregulated genes under reduced IIS (ranked by SAM score)” into the C. elegans Matrisome Annotator. Of the 429 genes, 24 were not mapped because of retired gene names or sequence names. For this reason, WormBase gene IDs are preferred for gene input. We used WormBase to convert these 24 genes into WormBase gene IDs manually and used this updated list of 427 genes as the input list for the C. elegans Matrisome Annotator.
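
The workflow above (submit a gene list, find that some identifiers do not map, convert them, resubmit) can be pictured as a lookup against a reference table. The Python sketch below is not the implementation of the C. elegans Matrisome Annotator. It is a hypothetical stand-in in which `annotate`, `known_ids`, `matrisome` and every identifier are placeholders invented for illustration.

```python
def annotate(gene_ids, known_ids, matrisome):
    """Split an input list into matrisome hits, recognised non-matrisome
    genes, and identifiers that could not be mapped at all."""
    hits, other, unmapped = {}, [], []
    for gid in gene_ids:
        if gid in matrisome:
            hits[gid] = matrisome[gid]   # gene with a matrisome category
        elif gid in known_ids:
            other.append(gid)            # valid gene, not in the matrisome
        else:
            unmapped.append(gid)         # e.g. a retired or unknown name
    return hits, other, unmapped


if __name__ == "__main__":
    # Placeholder identifiers and categories, not real WormBase entries.
    matrisome = {"ID_001": "category_1", "ID_002": "category_2"}
    known_ids = {"ID_001", "ID_002", "ID_003"}
    hits, other, unmapped = annotate(
        ["ID_001", "ID_003", "old_name_x"], known_ids, matrisome
    )
    print(hits, other, unmapped)
```

Reporting unmapped identifiers separately, rather than silently dropping them, is what makes the manual conversion step described above possible.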

mmc6.xlsx (802.6KB, xlsx)
Supplementary Table 6

Analysis of proteomics data using the C. elegans Matrisome Annotator.

We used the 177 proteins up-regulated in glp-1(e2141) mutants vs. wild type N2 from Supplementary Table S1 published in Pu et al. [68] as the UniProt ID input list for the C. elegans Matrisome Annotator.

mmc7.xlsx (1.4MB, xlsx)
Supplementary Table 7

Analysis of genome-wide RNA interference screen using the C. elegans Matrisome Annotator.

We used the 297 candidate gene hits from the RNAi screen from Supplementary Table S2 published in Zugasti et al. [69] as the WormBase gene ID input list for the C. elegans Matrisome Annotator.

mmc8.xlsx (676.2KB, xlsx)

Supplementary material

mmc9.pdf (63.8KB, pdf)

Articles from Matrix Biology Plus are provided here courtesy of Elsevier
