Data in Brief. 2023 Nov 19;52:109831. doi: 10.1016/j.dib.2023.109831

Data analysis of polygalacturonase inhibiting proteins (PGIPs) from agriculturally important proteomes

Sudha Acharya a,b, Hallie A Troell c,Ɵ, Rebecca L Billingsley d,ʘ, Katherine S Lawrence e, Daniel S McKirgan a, Nadim W Alkharouf a, Vincent P Klink b,Ω,Ψ,¥
PMCID: PMC10698527  PMID: 38076472

Abstract

The plant cell wall can be altered by pathogen-secreted polygalacturonases (PGs) that cleave the α-(1→4) linkages between D-galacturonic acid residues in homogalacturonan. PG activity leads to cell wall maceration, facilitating infection. Plant PG inhibiting proteins (PGIPs) impede pathogen PGs, impairing infection and allowing the plant to resist it. Analyses show the Glycine max PGIP11 (GmPGIP11) is expressed within a root cell that is parasitized by the pathogenic nematode Heterodera glycines, the soybean cyst nematode (SCN), while that cell undergoes a defence response that leads to the nematode's demise. Transgenic experiments show GmPGIP11 overexpression leads to a successful defence response, while the overexpression of a related G. max PGIP, GmPGIP1, does not, indicating a level of specificity. The analyses presented here have identified PGIPs from 51 additional studied proteomes, many of agricultural importance. The analyses include the computational identification of signal peptides and their cleavage sites, and of O- and N-glycosylation. Artificial intelligence analyses predict where the processed proteins localize. The identified PGIPs are presented as a resource for functional transgenic studies to determine whether they have a role in plant-pathogen interactions.

Keywords: Plant interactions, Polygalacturonase inhibiting protein (PGIP), Soybean, Heterodera glycines, Beta vulgaris, Sugar beet


Specifications Table

Subject: Biological sciences
Specific subject area: Omics: Genomics
Data format: Raw, Analysed, Filtered
Type of data: Table, Figure
Data collection:
Analysed proteomes
The 11 G. max PGIP protein sequences are used as queries in Basic Local Alignment Search Tool (BLAST) protein searches (BLASTP) of the studied proteomes, using the default parameters at Phytozome (http://www.phytozome.net/). The results of the individual queries for GmPGIP1 through GmPGIP11 are stored in individual tabs in Excel. The hits that have Bitscores of 140 or higher are then compiled across all of the GmPGIP queries, and duplicate PGIPs are removed in Excel. The resulting list of PGIP proteins includes the products of alternate splicing, so in some cases the number of proteins is higher than the number of genes in the genome.
Signal peptide prediction
Signal peptide prediction is done using SignalP 6.0. The default parameters are used.
O-glycosylation determination
O-glycosylation is determined using NetOGlyc 4.0. The default parameters are used.
N-glycosylation determination
N-glycosylation is determined using NetNGlyc 1.0. The default parameters are used.
Protein alignment
Protein alignment is performed using CLUSTAL Omega, CLUSTAL O(1.2.4) multiple sequence alignment. The analysis is performed using default parameters.
Artificial intelligence
Prediction of eukaryotic protein subcellular localization using deep learning is done using DeepLoc-1.0 with default settings.
Data source location: Data obtained from Phytozome (http://www.phytozome.net/)
Data accessibility: Direct URL to data: https://data.mendeley.com/datasets/66r9pkckjz/1
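
The hit-compilation step described under Data collection (keep hits with a Bitscore of 140 or higher for any of the 11 queries, then remove duplicates) was carried out in Excel. The following is only a minimal Python sketch of the same logic, not the authors' workflow. The `Hit` record, the function name `compile_pgips`, and the example identifiers and scores are hypothetical and are used purely for illustration.

```python
from typing import Dict, Iterable, NamedTuple

BITSCORE_CUTOFF = 140.0  # cutoff stated in the Data collection text


class Hit(NamedTuple):
    """One BLASTP hit (hypothetical record layout, for illustration only)."""
    query: str       # query protein, e.g. "GmPGIP1"
    subject: str     # identifier of the matched protein in the searched proteome
    bitscore: float  # BLAST bit score of the hit


def compile_pgips(hits: Iterable[Hit], cutoff: float = BITSCORE_CUTOFF) -> Dict[str, float]:
    """Pool hits from all queries, keep those at or above the cutoff,
    and list each subject protein once with its best bit score."""
    best: Dict[str, float] = {}
    for hit in hits:
        if hit.bitscore < cutoff:
            continue  # below the cutoff: not compiled
        if hit.subject not in best or hit.bitscore > best[hit.subject]:
            best[hit.subject] = hit.bitscore
    return best


# Made-up hits: identifiers and scores are NOT taken from the dataset.
example_hits = [
    Hit("GmPGIP1", "protA.1", 310.0),
    Hit("GmPGIP11", "protA.1", 295.5),  # same subject found by a second query
    Hit("GmPGIP1", "protB.1", 139.9),   # below the cutoff, dropped
    Hit("GmPGIP11", "protC.2", 140.0),  # exactly at the cutoff, kept
]
compiled = compile_pgips(example_hits)
print(sorted(compiled))
```

Because splice variants carry distinct protein identifiers, a de-duplication keyed on the protein identifier keeps them as separate entries, which is consistent with the note that protein counts can exceed gene counts.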

1. Value of the Data

  • Why are these data valuable?

Plants have a 2-tiered defence platform that allows them to defend themselves from pathogens [1]. The plant recognizes epitopes produced directly or indirectly as a consequence of the plant-pathogen interaction [1]. These epitopes, collectively called pathogen-associated molecular patterns (PAMPs), act within the 2-tiered system, which comprises PAMP (pattern) triggered immunity (PTI) and effector triggered immunity (ETI) [1].

Plant cell walls are an important barrier to pathogen infection. Up to 60% of the cell wall pectic moieties of dicot and nongraminaceous monocot primary cell walls are homogalacturonans (HGs), the major component of the middle lamella [2]. Pathogen polygalacturonases (PGs) are effective in facilitating pathogenicity because they break down cell wall polymers, permitting infection [3]. The study presented here is valuable to those interested in understanding plant defence, the evolutionary processes behind defence processes, cell signalling, and an understanding of basic cellular processes.

  • Who can benefit from these data?

In order to impede pathogen PGs, plants secrete polygalacturonase inhibiting proteins (PGIPs). PGIPs have a bimodal function. Firstly, PGIPs directly inhibit PGs. Secondly, PG activity leads to oligogalacturonide (OG) accumulation, eliciting a defence response [4]. Therefore, PGIPs deactivate the pathogen effector while also leading to the production and amplification of a signalling cascade. This signalling cascade further impairs the pathogen, leading to its demise. For example, a Beta vulgaris (sugar beet) PGIP, when expressed in Nicotiana benthamiana, limits the pathogenicity of Rhizoctonia solani, Fusarium solani, and Botrytis cinerea, whose pathogenicity is normally driven by their PGs [5]. Previous work has functionally examined G. max PGIPs (GmPGIPs) [6], benefitting stakeholders interested in the development of pathogen-resistant crops, including Beta vulgaris ssp. vulgaris (sugar beet). Novel signalling events can also be determined through the study presented here.

  • How can these data be reused by other researchers?

Using 11 G. max PGIP protein sequences, the analysis presented here extracts the PGIPs that exist in 51 additional genomes of important crops and other flowering plants. Analyses determine whether the 469 proteins have signal sequences (compatible with them being secreted proteins) and a cleavage site, and whether they are O- and/or N-glycosylated. Artificial intelligence analyses show in which cellular locale the proteins can be expected to exist, complementing recent transgenic studies of GmPGIP11. The provided analysis and accompanying data can be re-used to study basic aspects of plant cell biology and to generate pathogen resistance in a wide spectrum of agriculturally important crops. The evolution of defence and signalling processes can also be examined.

2. Data Description

A total of 469 proteins obtained from Phytozome, not including G. max, are analysed, spanning 51 proteomes (Table 1, Supplemental Data File 1) [7]. Most proteins annotated as probable PGIPs are between 300 and 399 AAs long, within the range of known PGIPs: 394 of the 469 putative PGIPs (84%) fall in this range. A further 45 (9.6%) are shorter than 300 AAs, and 30 (6.4%) are 400 AAs or larger. LRRs have a low overall homology based on the LRR composition. For example, BvPGIP6 (EL10Ac4g07809.1) is annotated as being 1,383 AAs. When a BLASTP analysis is run, it is shown to be homologous to the 1,249 AA GASSHO1 (GSO1) (OAO97463.1) as well as the 332 AA polygalacturonase inhibiting protein 1 (AAM65836.1). A re-annotation of the PGIPs is beyond the scope of the study.
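
The length classes quoted above can be checked with a few lines of arithmetic. The sketch below is only a worked example: the three counts are the ones reported in the text, while the class labels and the helper `length_class` are assumptions introduced here to show how a protein length would be assigned to a class.

```python
TOTAL_PROTEINS = 469  # proteins analysed, excluding G. max

# Counts per length class as reported in the text.
length_counts = {
    "<300 AAs": 45,
    "300-399 AAs": 394,
    ">=400 AAs": 30,
}


def length_class(length_aa: int) -> str:
    """Assign a protein length (in amino acids) to one of the three classes."""
    if length_aa < 300:
        return "<300 AAs"
    if length_aa < 400:
        return "300-399 AAs"
    return ">=400 AAs"


# The three classes should account for every protein.
assert sum(length_counts.values()) == TOTAL_PROTEINS

# Share of each class, in percent, rounded to one decimal place.
shares = {label: round(100 * n / TOTAL_PROTEINS, 1) for label, n in length_counts.items()}
for label, share in shares.items():
    print(f"{label}: {length_counts[label]} proteins ({share}%)")

# BvPGIP6 is annotated as 1,383 AAs, so it falls in the largest class.
print(length_class(1383))
```

The three counts sum to 469, and the rounded shares (84.0%, 9.6%, 6.4%) agree with the percentages given in the text.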

Table 1.

The proteomes under study.

Genome number Species genome Order Family
1 Amborella trichopoda Amborella trichopoda v1.0 Amborellales Amborellaceae
2 Amaranthus hypochondriacus Amaranthus hypochondriacus v2.1 Caryophyllales Amaranthaceae
3 Beta vulgaris Beta vulgaris EL10_1.0 Caryophyllales Amaranthaceae
4 Chenopodium quinoa Chenopodium quinoa v1.0 Caryophyllales Amaranthaceae
5 Spinacia oleracea Spinacia oleracea Spov3 Caryophyllales Amaranthaceae
6 Coffea arabica Coffea arabica v0.5 Gentianales Rubiaceae
7 Daucus carota Daucus carota v2.0 Apiales Apiaceae
8 Helianthus annuus Helianthus annuus r1.2 Asterales Asteraceae
9 Lactuca sativa Lactuca sativa V8 Asterales Asteraceae
10 Mimulus guttatus M.guttatus_TOL v5.0 Lamiales Phrymaceae
11 Olea europaea Olea europaea v1.0 Lamiales Oleaceae
12 Solanum lycopersicum Solanum lycopersicum ITAG4.0 Solanales Solanaceae
13 Solanum tuberosum Solanum tuberosum v6.1 Solanales Solanaceae
14 Vaccinium darrowii V.darrowii v1.2 Ericales Ericaceae
15 Eucalyptus grandis Eucalyptus grandis v2.0 Myrtales Myrtaceae
16 Vitis vinifera Vitis vinifera v2.1 Vitales Vitaceae
17 Arachis hypogaea Arachis hypogaea v1.0 Fabales Fabaceae
18 Castanea dentata Castanea dentata v1.1 Fagales Fagaceae
19 Cicer arietinum Cicer arietinum v1.0 Fabales Fabaceae
20 Cucumis sativus Cucumis sativus v1.0 Cucurbitales Cucurbitaceae
21 Fragaria vesca Fragaria vesca v4.0.a2 Rosales Rosaceae
22 Malus domestica Malus domestica v1.1 Rosales Rosaceae
23 Medicago truncatula Medicago truncatula Mt4.0v1 Fabales Fabaceae
24 Phaseolus vulgaris Phaseolus vulgaris v2.1 Fabales Fabaceae
25 Prunus persica Prunus persica v2.1 Rosales Rosaceae
26 Quercus rubra Quercus rubra v2.1 Fagales Fagaceae
27 Trifolium pratense Trifolium pratense v2 Fabales Fabaceae
28 Carya illinoinensis Carya illinoinensis v1.1 Fagales Juglandaceae
29 Vigna unguiculata Vigna unguiculata v1.2 Fabales Fabaceae
30 Linum usitatissimum Linum usitatissimum v1.0 Malpighiales Linaceae
31 Manihot esculenta Manihot esculenta v8.1 Malpighiales Euphorbiaceae
32 Carica papaya Carica papaya ASGPBv0.4 Brassicales Caricaceae
33 Theobroma cacao Theobroma cacao v2.1 Malvales Malvaceae
34 Arabidopsis thaliana Arabidopsis thaliana TAIR10 Brassicales Brassicaceae
35 Schrenkiella parvula Schrenkiella parvula v2.2 Brassicales Brassicaceae
36 Brassica oleracea capitata Brassica oleracea capitata v1.0 Brassicales Brassicaceae
37 Brassica rapa Brassica rapa FPsc v1.3 Brassicales Brassicaceae
38 Sinapis alba Sinapis alba v3.1 Brassicales Brassicaceae
39 Gossypium hirsutum Gossypium hirsutum v2.1 Malvales Malvaceae
40 Citrus sinensis Citrus sinensis v1.1 Sapindales Rutaceae
41 Ananas comosus Ananas comosus v3 Poales Bromeliaceae
42 Dioscorea alata Dioscorea alata v2.1 Dioscoreales Dioscoreaceae
43 Musa acuminata Musa acuminata v1 Zingiberales Musaceae
44 Hordeum vulgare Hordeum vulgare r1 Poales Poaceae
45 Oryza sativa Oryza sativa v7.0 Poales Poaceae
46 Triticum aestivum Triticum aestivum v2.2 Poales Poaceae
47 Brachypodium distachyon Brachypodium distachyon v3.2 Poales Poaceae
48 Miscanthus sinensis Miscanthus sinensis v7.1 Poales Poaceae
49 Sorghum bicolor Sorghum bicolor v3.1.1 Poales Poaceae
50 Zea mays Zea mays RefGen_V4 Poales Poaceae
51 Panicum hallii Panicum hallii v3.2 Poales Poaceae
52 Glycine max G.max Wm82.a2.v1 Fabales Fabaceae

2.1. Signal peptide prediction

Signal peptide prediction is performed to determine whether the 469 identified proteins exhibiting homology to PGIP have characteristics of secreted proteins (Supplemental Data File 2). The protein sequences are imported into SignalP 6.0 [8,9]. The putative PGIPs with predicted signal peptides are identified (Table 2; Supplemental Data File 3).

Table 2.

The identified PGIPs.

Genome number Species genome PGIP proteins
1 Amborella trichopoda Amborella trichopoda v1.0 2
2 Amaranthus hypochondriacus Amaranthus hypochondriacus v2.1 6
3 Beta vulgaris Beta vulgaris EL10_1.0 9
4 Chenopodium quinoa Chenopodium quinoa v1.0 22
5 Spinacia oleracea Spinacia oleracea Spov3 6
6 Coffea arabica Coffea arabica v0.5 15
7 Daucus carota Daucus carota v2.0 9
8 Helianthus annuus Helianthus annuus r1.2 8
9 Lactuca sativa Lactuca sativa V8 13
10 Mimulus guttatus M.guttatus_TOL v5.0 10
11 Olea europaea Olea europaea v1.0 5
12 Solanum lycopersicum Solanum lycopersicum ITAG4.0 5
13 Solanum tuberosum Solanum tuberosum v6.1 3
14 Eucalyptus grandis Eucalyptus grandis v2.0 9
15 Vitis vinifera Vitis vinifera v2.1 4
16 Arachis hypogaea Arachis hypogaea v1.0 17
17 Castanea dentata Castanea dentata v1.1 5
18 Cicer arietinum Cicer arietinum v1.0 8
19 Cucumis sativus Cucumis sativus v1.0 3
20 Fragaria vesca Fragaria vesca v4.0.a2 4
21 Malus domestica Malus domestica v1.1 8
22 Medicago truncatula Medicago truncatula Mt4.0v1 23
23 Phaseolus vulgaris Phaseolus vulgaris v2.1 9
24 Prunus persica Prunus persica v2.1 5
25 Quercus rubra Quercus rubra v2.1 7
26 Trifolium pratense Trifolium pratense v2 16
27 Carya illinoinensis Carya illinoinensis v1.1 3
28 Vigna unguiculata Vigna unguiculata v1.2 10
29 Linum usitatissimum Linum usitatissimum v1.0 9
30 Manihot esculenta Manihot esculenta v8.1 6
31 Carica papaya Carica papaya ASGPBv0.4 3
32 Theobroma cacao Theobroma cacao v2.1 4
33 Arabidopsis thaliana Arabidopsis thaliana TAIR10 6
34 Schrenkiella parvula Schrenkiella parvula v2.2 6
35 Brassica oleracea capitata Brassica oleracea capitata v1.0 19
36 Brassica rapa Brassica rapa FPsc v1.3 19
37 Sinapis alba Sinapis alba v3.1 26
38 Gossypium hirsutum Gossypium hirsutum v2.1 6
39 Citrus sinensis Citrus sinensis v1.1 4
40 Ananas comosus Ananas comosus v3 5
41 Dioscorea alata Dioscorea alata v2.1 9
42 Musa acuminata Musa acuminata v1 5
43 Hordeum vulgare Hordeum vulgare r1 13
44 Oryza sativa Oryza sativa v7.0 9
45 Triticum aestivum Triticum aestivum v2.2 19
46 Brachypodium distachyon Brachypodium distachyon v3.2 7
47 Miscanthus sinensis Miscanthus sinensis v7.1 13
48 Sorghum bicolor Sorghum bicolor v3.1.1 8
49 Zea mays Zea mays RefGen_V4 9
50 Panicum hallii Panicum hallii v3.2 7
51 Vaccinium darrowii V.darrowii v1.2 13
TOTAL 469

2.2. Comparison of O- and N-glycosylation of GmPGIPs

A companion analysis demonstrates that GmPGIP11 but not GmPGIP1 functions in the defence response that G. max has toward H. glycines parasitism. A comparative analysis of G. max PGIPs is undertaken to determine whether O- and/or N-glycosylation could be correlated to these differences. The O-glycosylation analysis demonstrates that while GmPGIP1 is O-glycosylated, GmPGIP11 is not (Table 3; Supplemental Data File 4).

Table 3.

The signal peptide prediction, O-, N-glycosylation prediction, cellular location prediction.

Species/PGIP protein SP O-glyc N-glyc Location Species/PGIP protein SP O-glyc N-glyc Location
Amborella trichopoda Vigna unguiculata
AmtPGIP1 y Y y Extracellular VuPGIP1 y y y Extracellular
AmtPGIP2 y Y y Extracellular VuPGIP2 y n y Extracellular
Amaranthus hypochondriacus VuPGIP3 y n y Extracellular
AhyPGIP1 y N y Extracellular VuPGIP4 y n y Extracellular
AhyPGIP2 y N y Extracellular VuPGIP5 y n y Extracellular
AhyPGIP3 y N y Extracellular VuPGIP6 n y y Extracellular
AhyPGIP4 n N y Cytoplasm VuPGIP7 y n y Extracellular
AhyPGIP5 y Y y Extracellular VuPGIP8 y n y Extracellular
AhyPGIP6 n N y Extracellular VuPGIP9 y n y Extracellular
Beta vulgaris VuPGIP10 y n y Extracellular
BvPGIP1 y Y y Extracellular Linum usitatissimum
BvPGIP2 y Y y Extracellular LuPGIP1 y y y Extracellular
BvPGIP3 y Y y Extracellular LuPGIP2 y y y Extracellular
BvPGIP4 y Y y Extracellular LuPGIP3 y y y Extracellular
BvPGIP5 y N y Extracellular LuPGIP4 y n y Extracellular
BvPGIP6 y Y y Extracellular LuPGIP5 y y y Extracellular
BvPGIP7 y N y Extracellular LuPGIP6 y n y Extracellular
BvPGIP8 n Y y Extracellular LuPGIP7 n n y Cytoplasm
BvPGIP9 y Y y Extracellular LuPGIP8 y y y Extracellular
Chenopodium quinoa LuPGIP9 y y y Extracellular
CqPGIP1 n Y y Extracellular Manihot esculenta
CqPGIP2 y Y y Extracellular MePGIP1 y n y Extracellular
CqPGIP3 y Y y Extracellular MePGIP2 y n y Extracellular
CqPGIP4 y Y y Extracellular MePGIP3 y n y Extracellular
CqPGIP5 y N y Extracellular MePGIP4 y n y Extracellular
CqPGIP6 y Y y Extracellular MePGIP5 n y y Extracellular
CqPGIP7 y N y Extracellular MePGIP6 y y y Cell membrane
CqPGIP8 n N y Extracellular Carica papaya
CqPGIP9 n N y Extracellular CpPGIP1 y n y Extracellular
CqPGIP10 y N y Extracellular CpPGIP2 y n y Extracellular
CqPGIP11 n Y y Nucleus CpPGIP3 y n y Extracellular
CqPGIP12 n N y Cytoplasm Theobroma cacao
CqPGIP13 n N y Cytoplasm TcPGIP1 y n y Extracellular
CqPGIP14 y N y Extracellular TcPGIP2 y n y Extracellular
CqPGIP15 n N y Extracellular TcPGIP3 y y y Extracellular
CqPGIP16 n N y Nucleus TcPGIP4 y n y Extracellular
CqPGIP17 y N y Extracellular Arabidopsis thaliana
CqPGIP18 y N y Extracellular AtPGIP1 y n y Extracellular
CqPGIP19 y Y y Lysosome AtPGIP2 y y y Extracellular
CqPGIP20 n N y Extracellular AtPGIP3 y y y Extracellular
CqPGIP21 n N y Extracellular AtPGIP4 y y n Extracellular
CqPGIP22 y Y y Extracellular AtPGIP5 y n y Extracellular
Spinacia oleracea AtPGIP6 y n y Extracellular
SoPGIP1 y Y y Extracellular Schrenkiella parvula
SoPGIP2 y Y y Extracellular SpPGIP1 y n y Extracellular
SoPGIP3 n Y y Cytoplasm SpPGIP2 y n y Extracellular
SoPGIP4 y Y y Extracellular SpPGIP3 y y y Extracellular
SoPGIP5 y N y Extracellular SpPGIP4 y n y Extracellular
SoPGIP6 y N n Extracellular SpPGIP5 y n y Extracellular
Coffea arabica SpPGIP6 y y y Extracellular
CaPGIP1 y Y y Extracellular Brassica oleracea capitata
CaPGIP2 y N y Extracellular BoPGIP1 y n y Extracellular
CaPGIP3 y Y y Extracellular BoPGIP2 n y y Extracellular
CaPGIP4 y Y y Extracellular BoPGIP3 y n y Extracellular
CaPGIP5 y Y y Extracellular BoPGIP4 y y y Extracellular
CaPGIP6 n N y Cytoplasm BoPGIP5 y y y Extracellular
CaPGIP7 n Y y Cytoplasm BoPGIP6 n n y Extracellular
CaPGIP8 y N y Extracellular BoPGIP7 n n y Extracellular
CaPGIP9 y Y n Extracellular BoPGIP8 y n y Extracellular
CaPGIP10 y N y Extracellular BoPGIP9 y n y Extracellular
CaPGIP11 y N y Extracellular BoPGIP10 y y y Extracellular
CaPGIP12 y N y Extracellular BoPGIP11 n n y Cytoplasm
CaPGIP13 y N y Extracellular BoPGIP12 y y y Extracellular
CaPGIP14 y N y Extracellular BoPGIP13 y n y Extracellular
CaPGIP15 y N y Extracellular BoPGIP14 y y y Extracellular
Daucus carota BoPGIP15 y n y Extracellular
DcPGIP1 y N y Extracellular BoPGIP16 y n y Extracellular
DcPGIP2 y N y Extracellular BoPGIP17 y n n Extracellular
DcPGIP3 y N y Extracellular BoPGIP18 y y y Extracellular
DcPGIP4 n N y Extracellular BoPGIP19 y n y Extracellular
DcPGIP5 y N y Extracellular Brassica rapa
DcPGIP6 y Y y Extracellular BrPGIP1 y n y Extracellular
DcPGIP7 y N n Extracellular BrPGIP2 y n y Extracellular
DcPGIP8 n N y Not found BrPGIP3 y n y Extracellular
DcPGIP9 y Y y Extracellular BrPGIP4 y n y Extracellular
Helianthus annuus BrPGIP5 y n y Extracellular
HaPGIP1 y Y y Extracellular BrPGIP6 y y y Extracellular
HaPGIP2 y Y y Extracellular BrPGIP7 y n y Extracellular
HaPGIP3 y N y Extracellular BrPGIP8 y y y Extracellular
HaPGIP4 y Y y Extracellular BrPGIP9 y n y Extracellular
HaPGIP5 y N y Extracellular BrPGIP10 y y y Extracellular
HaPGIP6 y N y Extracellular BrPGIP11 y y y Extracellular
HaPGIP7 y N y Extracellular BrPGIP12 y n y Extracellular
HaPGIP8 y N y Extracellular BrPGIP13 y y y Extracellular
Lactuca sativa BrPGIP14 y y y Extracellular
LsPGIP1 y Y y Extracellular BrPGIP15 y n y Extracellular
LsPGIP2 y Y y Extracellular BrPGIP16 y n n Extracellular
LsPGIP3 y Y y Extracellular BrPGIP17 y n y Extracellular
LsPGIP4 y N y Extracellular BrPGIP18 y y y Extracellular
LsPGIP5 y Y y Extracellular BrPGIP19 y y y Extracellular
LsPGIP6 y N y Extracellular Sinapis alba
LsPGIP7 n N y Cytoplasm SaPGIP1 y n y Extracellular
LsPGIP8 y N y Extracellular SaPGIP2 y n y Extracellular
LsPGIP9 y N y Extracellular SaPGIP3 y n y Extracellular
LsPGIP10 y N y Extracellular SaPGIP4 y n y Extracellular
LsPGIP11 y N y Extracellular SaPGIP5 y n y Extracellular
LsPGIP12 y N y Extracellular SaPGIP6 y y y Extracellular
LsPGIP13 y N y Extracellular SaPGIP7 n y y Nucleus
Mimulus guttatus SaPGIP8 y y y Extracellular
MgPGIP1 y N y Extracellular SaPGIP9 y y y Extracellular
MgPGIP2 y Y y Extracellular SaPGIP10 y n y Extracellular
MgPGIP3 y N y Extracellular SaPGIP11 y y y Extracellular
MgPGIP4 y Y y Extracellular SaPGIP12 y n y Extracellular
MgPGIP5 y Y y Extracellular SaPGIP13 y y y Extracellular
MgPGIP6 y Y y Extracellular SaPGIP14 y n y Extracellular
MgPGIP7 n Y y Cytoplasm SaPGIP15 y n y Extracellular
MgPGIP8 n Y y Cytoplasm SaPGIP16 y n y Extracellular
MgPGIP9 y Y y Extracellular SaPGIP17 y n y Extracellular
MgPGIP10 y Y y Cell membrane SaPGIP18 y n y Extracellular
Olea europaea SaPGIP19 y n y Extracellular
OePGIP1 y N y Extracellular SaPGIP20 y y n Extracellular
OePGIP2 y N y Extracellular SaPGIP21 n n y Extracellular
OePGIP3 y N y Extracellular SaPGIP22 n n y Lysosome
OePGIP4 n N y Nucleus SaPGIP23 y n y Extracellular
OePGIP5 n Y y Cell membrane SaPGIP24 y n y Extracellular
Solanum lycopersicum SaPGIP25 y n y Extracellular
SlPGIP1 y N y Extracellular SaPGIP26 y y n Extracellular
SlPGIP2 y N y Extracellular Gossypium hirsutum
SlPGIP3 y N y Extracellular GhPGIP1 y n y Extracellular
SlPGIP4 y Y y Extracellular GhPGIP2 y y y Extracellular
SlPGIP5 y N y Extracellular GhPGIP3 y n y Extracellular
Solanum tuberosum GhPGIP4 y y y Extracellular
StPGIP1 y N y Extracellular GhPGIP5 y n y Extracellular
StPGIP2 y Y y Extracellular GhPGIP6 y n y Extracellular
StPGIP3 y Y y Extracellular Citrus sinensis
Eucalyptus grandis CsPGIP1 y n y Extracellular
EgPGIP1 n N y Nucleus CsPGIP2 y n y Extracellular
EgPGIP2 n Y y Cytoplasm CsPGIP3 y y y Extracellular
EgPGIP3 y N y Extracellular CsPGIP4 y n y Extracellular
EgPGIP4 n N y Extracellular Ananas comosus
EgPGIP5 n N y Extracellular AcPGIP1 y y y Extracellular
EgPGIP6 y N y Extracellular AcPGIP2 y y y Extracellular
EgPGIP7 y N y Extracellular AcPGIP3 y n y Extracellular
EgPGIP8 y Y y Extracellular AcPGIP4 y n y Extracellular
EgPGIP9 y Y y Extracellular AcPGIP5 y n y Extracellular
Vitis vinifera Dioscorea alata
VvPGIP1 y N y Extracellular DaPGIP1 y y y Extracellular
VvPGIP2 y N y Extracellular DaPGIP2 y n y Extracellular
VvPGIP3 y N y Extracellular DaPGIP3 y y y Extracellular
VvPGIP4 y N y Extracellular DaPGIP4 y n y Extracellular
Arachis hypogaea DaPGIP5 n y y Extracellular
AhPGIP1 y N y Extracellular DaPGIP6 y n y Extracellular
AhPGIP2 y N y Extracellular DaPGIP7 y y y Extracellular
AhPGIP3 y N y Extracellular DaPGIP8 y y y Extracellular
AhPGIP4 y N y Extracellular DaPGIP9 y y y Extracellular
AhPGIP5 y Y y Extracellular Musa acuminata
AhPGIP6 y Y y Extracellular MaPGIP1 y y y Extracellular
AhPGIP7 y N y Extracellular MaPGIP2 y n y Extracellular
AhPGIP8 y N y Extracellular MaPGIP3 n y y Cytoplasm
AhPGIP9 y Y y Extracellular MaPGIP4 y y y Extracellular
AhPGIP10 y Y y Extracellular MaPGIP5 n y y Extracellular
AhPGIP11 y Y y Extracellular Hordeum vulgare
AhPGIP12 y Y y Extracellular HvPGIP1 y y y Extracellular
AhPGIP13 y Y n Extracellular HvPGIP2 y y y Extracellular
AhPGIP14 y Y n Extracellular HvPGIP3 n y y Extracellular
AhPGIP15 n Y y Cytoplasm HvPGIP4 y y y Extracellular
AhPGIP16 n Y y Cytoplasm HvPGIP5 y y y Extracellular
AhPGIP17 y Y y Extracellular HvPGIP6 y y y Extracellular
Castanea dentata HvPGIP7 y y y Extracellular
CdPGIP1 y Y y Extracellular HvPGIP8 n y y Extracellular
CdPGIP2 y N y Extracellular HvPGIP9 n y y Cytoplasm
CdPGIP3 y N y Extracellular HvPGIP10 y y y Extracellular
CdPGIP4 y Y y Extracellular HvPGIP11 y y y Extracellular
CdPGIP5 n N y Lysosome HvPGIP12 y y y Extracellular
Cicer arietinum HvPGIP13 y y y Extracellular
CiaPGIP1 y Y y Extracellular Oryza sativa
CiaPGIP2 y N y Extracellular OsPGIP1 y y y Extracellular
CiaPGIP3 y N y Extracellular OsPGIP2 y n y Extracellular
CiaPGIP4 n Y y Extracellular OsPGIP3 n n y Cell membrane
CiaPGIP5 y Y y Extracellular OsPGIP4 y y y Extracellular
CiaPGIP6 y Y y Extracellular OsPGIP5 y y y Extracellular
CiaPGIP7 y N y Extracellular OsPGIP6 y y y Extracellular
CiaPGIP8 y N y Extracellular OsPGIP7 n y y Nucleus
Cucumis sativus OsPGIP8 y n y Extracellular
CusPGIP1 y N y Extracellular OsPGIP9 y y y Extracellular
CusPGIP2 y N y Extracellular Triticum aestivum
CusPGIP3 y N y Extracellular TaPGIP1 y y y Extracellular
Fragaria vesca TaPGIP2 y y y Extracellular
FvPGIP1 y Y y Extracellular TaPGIP3 y y y Extracellular
FvPGIP2 y N y Extracellular TaPGIP4 y y y Extracellular
FvPGIP3 y Y y Extracellular TaPGIP5 y y y Extracellular
FvPGIP4 y N y Extracellular TaPGIP6 y y y Extracellular
Malus domestica TaPGIP7 y n y Extracellular
MdPGIP1 y N y Extracellular TaPGIP8 y n y Extracellular
MdPGIP2 y Y y Extracellular TaPGIP9 y y y Extracellular
MdPGIP3 n N y Lysosome TaPGIP10 y y y Extracellular
MdPGIP4 y N y Extracellular TaPGIP11 y y y Extracellular
MdPGIP5 y N y Extracellular TaPGIP12 y y y Extracellular
MdPGIP6 y N y Extracellular TaPGIP13 y y y Extracellular
MdPGIP7 y Y y Extracellular TaPGIP14 y y y Extracellular
MdPGIP8 y Y y Extracellular TaPGIP15 y y y Extracellular
Medicago truncatula TaPGIP16 y y y Extracellular
MtPGIP1 y N y Extracellular TaPGIP17 y y y Extracellular
MtPGIP2 y Y y Extracellular TaPGIP18 n n y Cytoplasm
MtPGIP3 y Y y Extracellular TaPGIP19 n n y Cytoplasm
MtPGIP4 y N y Extracellular Brachypodium distachyon
MtPGIP5 y Y y Extracellular BdPGIP1 y y y Extracellular
MtPGIP6 y Y y Extracellular BdPGIP2 y y y Extracellular
MtPGIP7 y N y Extracellular BdPGIP3 y y y Extracellular
MtPGIP8 y Y y Extracellular BdPGIP4 y y y Extracellular
MtPGIP9 n N y Not found BdPGIP5 y y y Extracellular
MtPGIP10 y Y y Extracellular BdPGIP6 y y y Extracellular
MtPGIP11 y Y y Extracellular BdPGIP7 y y y Extracellular
MtPGIP12 y Y y Extracellular Miscanthus sinensis
MtPGIP13 y Y y Extracellular MsPGIP1 y n y Extracellular
MtPGIP14 y N y Extracellular MsPGIP2 y y y Extracellular
MtPGIP15 y Y y Extracellular MsPGIP3 y y y Extracellular
MtPGIP16 y Y y Extracellular MsPGIP4 y n y Extracellular
MtPGIP17 y Y y Extracellular MsPGIP5 n y y Extracellular
MtPGIP18 n Y y Chloroplast MsPGIP6 y y y Extracellular
MtPGIP19 y Y y Extracellular MsPGIP7 y y y Extracellular
MtPGIP20 y N y Extracellular MsPGIP8 y n y Extracellular
MtPGIP21 y Y y Extracellular MsPGIP9 y n y Extracellular
MtPGIP22 y Y y Cell membrane MsPGIP10 n y y Cytoplasm
MtPGIP23 y Y y Extracellular MsPGIP11 y n y Extracellular
Phaseolus vulgaris MsPGIP12 n y y Extracellular
PvPGIP1 y Y y Extracellular MsPGIP13 y y y Extracellular
PvPGIP3 y Y y Extracellular Sorghum bicolor
PvPGIP4 y Y y Extracellular SbPGIP1 y y y Extracellular
PvPGIP5 y Y y Extracellular SbPGIP2 y y y Extracellular
PvPGIP6 y N y Extracellular SbPGIP3 y y y Extracellular
PvPGIP7 y N y Extracellular SbPGIP4 y y y Extracellular
PvPGIP8 n Y y Extracellular SbPGIP5 y y n Extracellular
PvPGIP9 y N y Extracellular SbPGIP6 y y y Extracellular
Prunus persica SbPGIP7 y y y Extracellular
PpPGIP1 y N y Extracellular SbPGIP8 y y y Extracellular
PpPGIP2 y N y Extracellular Zea mays
PpPGIP3 y N y Extracellular ZmPGIP1 y y y Extracellular
PpPGIP4 y Y y Extracellular ZmPGIP2 y y y Extracellular
PpPGIP5 y N y Extracellular ZmPGIP3 y n y Extracellular
Quercus rubra ZmPGIP4 y y y Extracellular
QrPGIP1 y Y y Extracellular ZmPGIP5 y y y Extracellular
QrPGIP2 y Y y Extracellular ZmPGIP6 y y y Extracellular
QrPGIP3 y Y y Extracellular ZmPGIP7 y y y Extracellular
QrPGIP4 y Y y Nucleus ZmPGIP8 y y y Extracellular
QrPGIP5 y N y Extracellular ZmPGIP9 y y y Extracellular
QrPGIP6 y N y Extracellular Panicum hallii
QrPGIP7 y Y y Cytoplasm PhPGIP1 y y y Extracellular
Trifolium pratense PhPGIP2 y y y Extracellular
TpPGIP1 y Y y Extracellular PhPGIP3 y y y Extracellular
TpPGIP2 y Y y Extracellular PhPGIP4 y y y Extracellular
TpPGIP3 y Y y Extracellular PhPGIP5 y y y Extracellular
TpPGIP4 y Y y Extracellular PhPGIP6 y y y Extracellular
TpPGIP5 y Y y Extracellular PhPGIP7 y y y Cell membrane
TpPGIP6 y N y Extracellular Vaccinium darrowii
TpPGIP7 y Y y Extracellular VdPGIP1 y y y Extracellular
TpPGIP8 y N y Extracellular VdPGIP2 y y y Extracellular
TpPGIP9 y N y Extracellular VdPGIP3 y y y Extracellular
TpPGIP10 y Y y Extracellular VdPGIP4 y y y Extracellular
TpPGIP11 n Y y Cell membrane VdPGIP5 y n y Extracellular
TpPGIP12 y Y y Extracellular VdPGIP6 y y y Extracellular
TpPGIP13 y N y Extracellular VdPGIP7 y y y Extracellular
TpPGIP14 y Y y Cell membrane VdPGIP8 y n y Extracellular
TpPGIP15 y Y y Extracellular VdPGIP9 n n y Cytoplasm
TpPGIP16 n Y y Extracellular VdPGIP10 y n y Extracellular
Carya illinoinensis VdPGIP11 y y y Extracellular
CiPGIP1 y Y n Extracellular VdPGIP12 y y y Extracellular
CiPGIP2 y Y n Extracellular VdPGIP13 n y y Extracellular
CiPGIP3 y Y n Extracellular Glycine max
GmPGIP1 y y y Extracellular
GmPGIP2 n y y Cytoplasm
GmPGIP3 y y y Extracellular
GmPGIP4 y y y Extracellular
GmPGIP5 n y y Nucleus
GmPGIP6 y y y Extracellular
GmPGIP7 y y y Extracellular
GmPGIP8 y y y Extracellular
GmPGIP9 n y y Cell membrane
GmPGIP10 y y y Extracellular
GmPGIP11 y n y Extracellular
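
As a small illustration of how the rows of Table 3 can be summarized, the sketch below tallies the 11 Glycine max rows exactly as printed in the table. The parsing code and its names (`parse_row`, `records`) are assumptions made for this example; the Supplemental Data Files remain the authoritative source for the predictions.

```python
from collections import Counter

# The 11 Glycine max rows of Table 3: name, SP, O-glyc, N-glyc, location.
GMAX_ROWS = """\
GmPGIP1 y y y Extracellular
GmPGIP2 n y y Cytoplasm
GmPGIP3 y y y Extracellular
GmPGIP4 y y y Extracellular
GmPGIP5 n y y Nucleus
GmPGIP6 y y y Extracellular
GmPGIP7 y y y Extracellular
GmPGIP8 y y y Extracellular
GmPGIP9 n y y Cell membrane
GmPGIP10 y y y Extracellular
GmPGIP11 y n y Extracellular
"""


def parse_row(line: str) -> dict:
    """Split one table row; the location may contain a space ("Cell membrane")."""
    name, sp, o_glyc, n_glyc, location = line.split(maxsplit=4)
    return {
        "name": name,
        "sp": sp.lower() == "y",          # predicted signal peptide
        "o_glyc": o_glyc.lower() == "y",  # predicted O-glycosylation
        "n_glyc": n_glyc.lower() == "y",  # predicted N-glycosylation
        "location": location.strip(),
    }


records = [parse_row(line) for line in GMAX_ROWS.splitlines() if line.strip()]

with_signal_peptide = sum(r["sp"] for r in records)
not_o_glycosylated = [r["name"] for r in records if not r["o_glyc"]]
n_glycosylated = sum(r["n_glyc"] for r in records)
locations = Counter(r["location"] for r in records)

print(f"signal peptide predicted: {with_signal_peptide} of {len(records)}")
print(f"not O-glycosylated: {not_o_glycosylated}")
print(f"N-glycosylated: {n_glycosylated} of {len(records)}")
print(dict(locations))
```

In this tally GmPGIP11 is the only G. max PGIP without a predicted O-glycosylation site, which is the contrast with GmPGIP1 drawn in the text.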

In contrast to the above-presented findings, both GmPGIP1 and GmPGIP11 are predicted to be N-glycosylated. However, some of their predicted N-glycosylation sites are not at homologous aa positions (Fig. 1; Supplemental Data File 5). For example, the NPTT site found in GmPGIP1 and starting at aa position 41 is not identified in GmPGIP11 (Fig. 1). In contrast, an NLSG site found in GmPGIP11 and starting at aa position 101 is not found in GmPGIP1. Similarly, an NLSG predicted N-glycosylation site found in GmPGIP11 and starting at aa position 174 is not found in GmPGIP1 (Fig. 1). Furthermore, an NKTT predicted N-glycosylation site found in GmPGIP11 and starting at aa position 258 is not found in GmPGIP1 (Fig. 1). However, N-glycosylation sites that are in homologous positions between GmPGIP1 and GmPGIP11 do exist (Fig. 1). GmPGIP1 has an NVSG predicted N-glycosylation site starting at aa position 132, while GmPGIP11 has an NVSG predicted N-glycosylation site starting at aa position 150 (Fig. 1). Consequently, while experimentation has not proven that these sites are important to the functional differences occurring between GmPGIP1 and GmPGIP11, the sites differ and provide a basis for future experimentation.
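
The sites above are reported as a start position plus a four-residue string (for example NVSG at position 132). The sketch below shows how candidate N-glycosylation sequons of the general form N-X-S/T can be located and reported in that style. It is a plain motif scan and not NetNGlyc, which additionally scores each candidate. The function name `find_sequons`, the optional proline filter, and the toy sequence are assumptions for illustration; the toy sequence is not a real PGIP.

```python
import re
from typing import List, Tuple


def find_sequons(seq: str, exclude_proline: bool = True) -> List[Tuple[int, str]]:
    """Return (1-based position of N, four-residue context) for each N-X-[ST] match.

    With exclude_proline=True, sequons with proline at the X position are skipped.
    """
    middle = "[^P]" if exclude_proline else "."
    # A lookahead is used so that overlapping sequons are all reported.
    pattern = re.compile(rf"(?=(N{middle}[ST]))")
    seq = seq.upper()
    return [(m.start() + 1, seq[m.start():m.start() + 4]) for m in pattern.finditer(seq)]


# Toy sequence (hypothetical, for illustration only).
toy = "MANVSGKLNPTTQNNTSA"

print(find_sequons(toy))                         # proline at X excluded
print(find_sequons(toy, exclude_proline=False))  # every N-X-[ST] reported
```

In the toy sequence the candidates at positions 14 and 15 overlap, which is the situation marked in yellow in the glycosylation figure. Comparing positions between two proteins, as done for GmPGIP1 and GmPGIP11, additionally requires a sequence alignment, since raw positions shift with insertions and deletions.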

Fig. 1.


Beta vulgaris BvPGIP4 protein analysis. A. Signal peptide prediction. B. Amino acid relative importance. C. Hierarchical tree, showing the localization likelihood in numerical value: Extracellular, 0.9701; Lysosome/Vacuole, 0.0263; Endoplasmic reticulum, 0.0024; Cell membrane, 0.0007; Cytoplasm, 0.0003; Mitochondrion, 0.0001; Golgi apparatus, 0; Plastid, 0; Nucleus, 0; Peroxisome, 0; Soluble, 0.9994; Membrane, 0.0006. Prediction: Extracellular, Soluble.
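
The likelihood values in the caption above can be turned into a call by taking the highest-scoring label in each group. The sketch below does only that; it is a simplification and an assumption of this example, since DeepLoc arrives at its prediction through a hierarchical tree rather than a flat maximum. For BvPGIP4 the flat maximum agrees with the reported prediction.

```python
# Localization likelihoods for BvPGIP4, copied from the figure caption.
location_scores = {
    "Extracellular": 0.9701,
    "Lysosome/Vacuole": 0.0263,
    "Endoplasmic reticulum": 0.0024,
    "Cell membrane": 0.0007,
    "Cytoplasm": 0.0003,
    "Mitochondrion": 0.0001,
    "Golgi apparatus": 0.0,
    "Plastid": 0.0,
    "Nucleus": 0.0,
    "Peroxisome": 0.0,
}

# Soluble versus membrane-bound likelihoods from the same caption.
type_scores = {"Soluble": 0.9994, "Membrane": 0.0006}


def top_label(scores: dict) -> str:
    """Return the label with the highest likelihood."""
    return max(scores, key=scores.get)


prediction = (top_label(location_scores), top_label(type_scores))
print(prediction)
```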

2.3. Artificial intelligence

The 469 identified PGIP proteins spanning the 51 genomes are assessed by artificial intelligence analyses to produce a sequence position file (Supplemental Data File 6). A second file maps each protein to the cellular destination where it is predicted to function (Table 3; Supplemental Data File 7). An example for Beta vulgaris BvPGIP4, shown to function in defence to various pathogens in N. tabacum, is presented (Fig. 2) [5].

Fig. 2.


Predicted O- and N-glycosylation sites of G. max PGIP proteins. Cyan, predicted O-glycosylation site. Magenta, N-glycosylation site. Yellow, an aa that overlaps between two predicted N-glycosylation sites. Blue, an aa that overlaps between an O- and N-glycosylation site. Gray, possible mis-annotated N-terminal sequence.

The secretion of plant proteins is an important cellular property used for a variety of processes including development and disease resistance [10]. The data presented here provide computational support that PGIPs from taxa positioned at the base of angiosperm evolution are predicted to have signal peptides, have O- and/or N-glycosylation, and undergo secretion into the apoplast. Further assessment identifies PGIPs from both monocot and dicot lineages with predicted signal peptides, along with the subcellular or supracellular compartment to which they are targeted [5,11].

2.4. Analysed proteomes

The study analyses the proteomes of 51 plants not including G. max, many important to agriculture. The 51 proteomes include one from the base of angiosperm evolution, A. trichopoda, which belongs to a monotypic genus of Amborellaceae, is the only member of the Amborellales, and has 2 predicted PGIP proteins [12]. Each of these 2 PGIPs is predicted to have a signal peptide, experience O- and N-glycosylation, and undergo secretion into the apoplast. The monocots presented here are represented by A. comosus, D. alata, M. acuminata, H. vulgare, O. sativa, T. aestivum, B. distachyon, M. sinensis, S. bicolor, Z. mays and P. hallii, with the remaining plants belonging to the Eudicots. All of the studied species have at least one putative PGIP that is predicted to have a signal peptide, have O- and/or N-glycosylation, and be secreted into the apoplast.

Local duplication of plant genes, including PGIPs, results in the generation of genes whose protein products perform an important function in defence [6]. The PGIP proteins identified here also appear to be products of localized gene duplications. Consequently, the identified genes may relate to the birth and death model that is proposed for PGIPs [6]. Possible localized gene duplication is identified from the analysis of the 51 proteomes. Based on the annotations, the analysis identifies direct tandem duplications of at least one PGIP gene in 29 of the 51 proteomes: A. hypochondriacus, B. vulgaris, C. quinoa, C. arabica, D. carota, M. guttatus, O. europaea, E. grandis, C. arietinum, M. domestica, M. truncatula, P. vulgaris, P. persica, Q. rubra, V. unguiculata, L. usitatissimum, M. esculenta, T. cacao, A. thaliana, S. parvula, B. oleracea capitata, B. rapa, S. alba, D. alata, V. darrowii, O. sativa, M. sinensis, S. bicolor and P. hallii.
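One way such annotation-based screening could be carried out is sketched below in Python. It is only an illustration: the record layout (gene identifier, chromosome, position in gene order), the function name `tandem_clusters`, and the rule that a direct tandem duplication is two or more PGIP genes with no intervening annotated gene are all assumptions, not the procedure used in the study.

```python
from itertools import groupby

def tandem_clusters(genes, pgip_ids):
    """Return runs of two or more PGIP genes that are direct neighbours.

    genes    : list of (gene_id, chromosome, order_index) covering ALL annotated
               genes, so that a non-PGIP gene between two PGIPs breaks the run
               (hypothetical record layout).
    pgip_ids : set of gene_ids identified as PGIPs.
    """
    clusters = []
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    for _, chrom_genes in groupby(ordered, key=lambda g: g[1]):
        run = []
        for gene_id, _, _ in chrom_genes:
            if gene_id in pgip_ids:
                run.append(gene_id)
            else:
                if len(run) >= 2:
                    clusters.append(run)
                run = []
        # close a run that reaches the end of the chromosome
        if len(run) >= 2:
            clusters.append(run)
    return clusters

# Hypothetical annotation: g1 and g2 are adjacent PGIPs; g4 and g5 are isolated PGIPs.
example_genes = [
    ("g1", "chr1", 1), ("g2", "chr1", 2), ("g3", "chr1", 3), ("g4", "chr1", 4),
    ("g5", "chr2", 1), ("g6", "chr2", 2),
]
example_pgips = {"g1", "g2", "g4", "g5"}
print(tandem_clusters(example_genes, example_pgips))
```

A proteome would then count toward the 29 if the returned list is non-empty. Looser definitions of tandem duplication, for example allowing one or more intervening genes, would require a different rule.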

2.5. Glycosylation

Computational studies are performed to identify O- and/or N-glycosylation of the PGIP proteins. Glycosylation is an important feature of proteins that imparts new function and, in plants, is important in both development and defence [13]. Glycosylation is not a random event and occurs on more than 50% of eukaryotic proteins [14]. Pyrus communis (pear) PGIP exhibits heterogeneous glycosylation that relates to pathogen defence [15]. The results presented here provide computational support that plant PGIPs are broadly glycosylated.

2.6. Sequence alignments identify glycosylation variation that may explain functional differences

Transgenic studies show GmPGIP11 functions in defence to H. glycines while GmPGIP1 does not. A computational study analysing the O- and N-glycosylation sites shows that GmPGIP1 is predicted to be O-glycosylated, whereas GmPGIP11 is not. Furthermore, GmPGIP11 has predicted N-glycosylation sites that GmPGIP1 lacks, while GmPGIP1 has predicted N-glycosylation sites that are lacking in GmPGIP11. Glycosylation performs important defence roles [16].

2.7. Alternate splicing of PGIP mRNAs

What has not been presented is the possible importance of alternate RNA splicing in PGIP biology. The analysis presented here identifies 4 proteomes (L. sativa, P. persica, H. vulgare, and T. aestivum) that are annotated to contain products of alternate splicing. Alternatively spliced genes encode transcripts that perform important defence functions, including defence against parasitic nematodes [8].

3. Supplemental Data

Supplemental Data Set 1: The PGIP accessions obtained by the BLAST queries according to the described protocol. Supplemental data file 1 - https://data.mendeley.com/datasets/66r9pkckjz/1.

Supplemental Data Set 2: The protein sequences obtained from Phytozome. Supplemental data file 2 - https://data.mendeley.com/datasets/66r9pkckjz/1.

Supplemental Data Set 3: The signal peptide prediction made by SignalP 6.0. Supplemental data file 3 - https://data.mendeley.com/datasets/66r9pkckjz/1.

Supplemental Data Set 4: The O-glycosylation prediction made by NetOGlyc - 4.0. Supplemental data file 4 - https://data.mendeley.com/datasets/66r9pkckjz/1.

Supplemental Data Set 5: The N-glycosylation prediction made by NetNGlyc - 1.0. Supplemental data file 5 - https://data.mendeley.com/datasets/66r9pkckjz/1.

Supplemental Data Set 6: The amino acid relative importance predicted by DeepLoc-1.0. Supplemental data file 6 - https://data.mendeley.com/datasets/66r9pkckjz/1.

Supplemental Data Set 7: The hierarchical trees predicted by DeepLoc-1.0. Supplemental data file 7 - https://data.mendeley.com/datasets/66r9pkckjz/1.

4. Experimental Design, Materials and Methods

4.1. Data access

https://data.mendeley.com/datasets/66r9pkckjz/1.

4.2. Analysed proteomes

The 11 G. max PGIP protein sequences are used in Basic Local Alignment Search Tool (BLAST) searches of the proteomes (BLASTP) using the default parameters at Phytozome (http://www.phytozome.net/) [7]. There are 52 total proteomes analysed [7]. The default BLAST parameters used in querying, listed in the order in which they appear in the BLAST program, are Target type: Proteome; Program: BLASTP-protein query to protein database; Expect (E) threshold: -1; Comparison matrix: BLOcks SUbstitution Matrix (BLOSUM) 62 (BLOSUM62); Word (W) length: default = 3; number of alignments to show: 100, allowing for gaps and filter query. Through these analyses it is possible to extract the genomic DNA, transcript, cDNA, protein accessions, their sequences, and gene family members. The analyses also permit the extraction of protein homologs and splice variants from the selected agricultural crops of international importance, those with importance in the U.S., and those important biologically according to [8].

The identified PGIP proteins are compiled using a Bitscore of 140 as a cutoff. To identify the PGIP proteins, each of the 11 G. max PGIP protein sequences is queried against the studied proteomes. The individual queries for GmPGIP1 through GmPGIP11 are stored in individual tabs in Excel. Then, the PGIPs that have Bitscores of 140 or higher are compiled for all of the queries for the individual GmPGIPs. The duplicate PGIPs are then removed in Excel. The analysis results in a list of PGIP proteins that includes the products of alternate splicing, so the numbers in some cases are higher than the numbers of genes in some genomes.
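The compilation above is performed in Excel. For readers who prefer a scripted equivalent, a minimal Python sketch of the same filter-and-deduplicate logic follows. The input layout (a dictionary of hit lists keyed by query name), the function name `compile_pgips`, and the accession names are hypothetical; only the cutoff of 140 comes from the protocol.

```python
BITSCORE_CUTOFF = 140  # cutoff stated in the protocol

def compile_pgips(hits_by_query, cutoff=BITSCORE_CUTOFF):
    """Pool BLASTP hits from all queries, keep hits at or above the cutoff,
    and keep each subject protein only once.

    hits_by_query : dict mapping a query name (e.g. "GmPGIP1") to a list of
                    (subject_accession, bitscore) tuples (hypothetical layout).
    Returns a dict mapping subject_accession -> (best_query, best_bitscore).
    """
    compiled = {}
    for query, hits in hits_by_query.items():
        for subject, bitscore in hits:
            if bitscore < cutoff:
                continue  # below the Bitscore cutoff, discard
            previous = compiled.get(subject)
            if previous is None or bitscore > previous[1]:
                compiled[subject] = (query, bitscore)
    return compiled

# Hypothetical hits: one protein is found by two queries, one falls below the cutoff.
example_hits = {
    "GmPGIP1": [("Hypo.001G000100.1", 310.0), ("Hypo.002G000200.1", 139.9)],
    "GmPGIP11": [("Hypo.001G000100.1", 295.5), ("Hypo.003G000300.1", 140.0)],
}
for accession, (query, score) in compile_pgips(example_hits).items():
    print(accession, query, score)
```

Because splice variants carry distinct protein accessions, they remain as separate entries after deduplication, which is consistent with the protein counts exceeding the gene counts in some genomes.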

4.3. Signal peptide prediction

Signal peptide prediction is done using SignalP 6.0 [9]. SignalP 6.0 is based on protein language models (LMs). The models use information from millions of unannotated protein sequences that have been analysed across all domains of life. LMs create logical protein representations capturing their biological structure and properties. SignalP 6.0, thus, predicts additional signal peptide (SP) types not possible in earlier iterations of SignalP (e.g., SignalP 5.0) and better extrapolates to proteins distantly related to those used to create the model and to metagenomic data of unknown origin. SignalP 6.0 also identifies SP subregions. The default parameters are used.

4.4. O-glycosylation determination

O-glycosylation is determined using NetOGlyc - 4.0 [17]. The default parameters are used. The output is imported into Excel.

4.5. N-glycosylation determination

N-glycosylation is determined using NetNGlyc - 1.0 [18]. The default parameters are used. The output is imported into Excel.
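As a rough illustration of the kind of site NetNGlyc evaluates, the Python sketch below scans a sequence for the consensus N-glycosylation sequon Asn-X-Ser/Thr, where X is any residue except proline. This is a simplified pattern scan and not the NetNGlyc model, which additionally scores each sequon; the function name `find_sequons` and the test sequence are hypothetical.

```python
import re

# N-X-S/T sequon, X being any residue except proline. The lookahead makes the
# match zero-width so that overlapping sequons (e.g. "NNTS") are all reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(sequence):
    """Return the 1-based positions of the Asn residue of every N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]

# Hypothetical peptide: overlapping sequons at N2 and N3, a proline-blocked
# candidate at N7 (N-P-S), and a further sequon at N11.
example_sequence = "MNNTSANPSQNAT"
print(find_sequons(example_sequence))  # [2, 3, 11]
```

Overlapping matches such as those at positions 2 and 3 correspond to the case flagged in the figure legend, where one aa falls within two predicted N-glycosylation sites.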

4.6. Protein alignment

Protein alignment is performed using CLUSTAL Omega, CLUSTAL O(1.2.4) multiple sequence alignment [19]. The analysis is performed using default parameters. The output file is imported into MS Word.

4.7. Artificial intelligence

Prediction of eukaryotic protein subcellular localization using deep learning is done using DeepLoc-1.0 [20] with default settings. The DeepLoc-1.0 analysis determines the importance of each amino acid along a protein chain for the prediction of its subcellular location (attention). DeepLoc-1.0 then predicts the subcellular localization of eukaryotic proteins, differentiating between 10 localizations: the nucleus, cytoplasm, extracellular, mitochondrion, cell membrane, endoplasmic reticulum (ER), chloroplast, Golgi apparatus, lysosome/vacuole, and peroxisome. The output of the analysis is presented as a graphic that shows the relative importance of each aa along the polypeptide chain as well as a hierarchical tree that shows where the protein is expected to be located within a cell [20].

Limitations

Not applicable.

Ethics Statement

The authors have read and follow the ethical requirements for publication in Data in Brief and confirm that the current work does not involve human subjects, animal experiments, or any data collected from social media platforms.

CRediT authorship contribution statement

Sudha Acharya: Validation, Formal analysis, Investigation. Hallie A. Troell: Validation, Formal analysis, Investigation. Rebecca L. Billingsley: Validation, Formal analysis, Investigation. Katherine S. Lawrence: Validation, Formal analysis, Investigation, Resources, Writing – original draft, Supervision, Project administration, Funding acquisition. Daniel S. McKirgan: Validation, Formal analysis, Investigation. Nadim W. Alkharouf: Methodology, Validation, Investigation, Resources, Writing – original draft, Supervision, Project administration. Vincent P. Klink: Conceptualization, Methodology, Validation, Investigation, Resources, Writing – original draft, Writing – review & editing, Visualization, Supervision, Project administration, Funding acquisition.

Acknowledgements

VK is thankful to the Department of Biological Sciences and the Department of Biochemistry, Molecular Biology, Entomology and Plant Pathology (BMBEPP) at Mississippi State University. Furthermore, VK is thankful to Gary Lawrence (retired) (BMBEPP) for all his support over the years. Robert Nichols and Kater Hake of Cotton Incorporated are thanked for their support during this project. Yixiu (Jan) Pinnix (BMBEPP) is thanked for her technical support. Jeff Dean (BMBEPP) is thanked for his generosity in providing greenhouse, headhouse, storage, and field space for the experiments and maintenance of plant stocks. The authors thank Scott Willard, Wes Burger, George Hopper, and Reuben Moore, Mississippi Agricultural and Forestry Experiment Station (MAFES), and Mississippi State University for support. The College of Arts and Sciences and MAFES at Mississippi State University have each provided Special Research Initiative (SRI) funding for this work.

USDA-ARS NP301- 8042-21220-233; Cotton Incorporated, grants 17-603, 19-603; MAFES-Special Research Initiative (SRI); SRI-01; Alabama Hatch Grant ALA015-2-14003.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability

References

  • 1. Jones J.D., Dangl J.L. The plant immune system. Nature. 2006;444:323–329. doi: 10.1038/nature05286.
  • 2. Waldron K.W., Faulds C.B. Cell Wall Polysaccharides: Composition and Structure. In: Kamerling H., editor. Comprehensive Glycoscience. Elsevier; 2007. pp. 181–201.
  • 3. Phaff H.J. The production of exocellular pectic enzymes by Penicillium chrysogenum; on the formation and adaptive nature of polygalacturonase and pectinesterase. Arch. Biochem. 1947;13:67–81.
  • 4. Cervone F., Castoria R., Leckie F., De Lorenzo G. Perception of fungal elicitors and signal transduction. In: Aducci P., editor. Signal Transduction in Plants. 1997. doi: 10.1007/978-3-0348-9183-7_8.
  • 5. Li H., Smigocki A.C. Sugar beet polygalacturonase-inhibiting proteins with 11 LRRs confer Rhizoctonia, Fusarium and Botrytis resistance in Nicotiana plants. Physiol. Mol. Plant Pathol. 2018;102:200–208. doi: 10.1016/J.PMPP.2018.03.001.
  • 6. Kalunke R.M., Cenci A., Volpi C., O'Sullivan D.M., Sella L., Favaron F., Cervone F., De Lorenzo G., D'Ovidio R. The pgip family in soybean and three other legume species: evidence for a birth-and-death model of evolution. BMC Plant Biol. 2014;14:189. doi: 10.1186/s12870-014-0189-3.
  • 7. Goodstein D., Shu S., Howson R. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40:D1178–D1186. doi: 10.1093/nar/gkr944.
  • 8. Klink V.P., Darwish O., Alkharouf N.W., Lawaju B.R., Khatri R., Khatri R., Lawrence K.S. Conserved oligomeric Golgi (COG) complex genes functioning in defense are expressed in root cells undergoing a defense response to a pathogenic infection. PLoS One. 2021. doi: 10.1371/journal.pone.0256472.
  • 9. Teufel F., Almagro Armenteros J.J., Johansen A.R., Gíslason M.H., Pihl S.I., Tsirigos K.D., Winther O., Brunak S., von Heijne G., Nielsen H. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 2022. doi: 10.1038/s41587-021-01156-3.
  • 10. Collins N.C., Thordal-Christensen H., Lipka V., Bau S., Kombrink E., Qiu J.L., Hückelhoven R., Stein M., Freialdenhoven A., Somerville S.C., Schulze-Lefert P. SNARE-protein-mediated disease resistance at the plant cell wall. Nature. 2003;425:973–977. doi: 10.1038/nature02076.
  • 11. Ferrari S., Sella L., Janni M., De Lorenzo G., Favaron F., D'Ovidio R. Transgenic expression of polygalacturonase-inhibiting proteins in Arabidopsis and wheat increases resistance to the flower pathogen Fusarium graminearum. Plant Biol. (Stuttg.) 2012;14(Suppl 1):31–38. doi: 10.1111/j.1438-8677.2011.00449.x.
  • 12. Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG III. Bot. J. Linnean Soc. 2009;161:105–121. doi: 10.1111/j.1095-8339.2009.00996.x.
  • 13. Tan L., Qiu F., Lamport D.T., Kieliszewski M.J. Structure of a hydroxyproline (Hyp)-arabinogalactan polysaccharide from repetitive Ala-Hyp expressed in transgenic Nicotiana tabacum. J. Biol. Chem. 2004;279:13156–13165. doi: 10.1074/jbc.M311864200.
  • 14. Apweiler R., Hermjakob H., Sharon N. On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database. Biochimica Biophysica Acta. 1999;1473:4–8. doi: 10.1016/0031-9422(81)83083-X.
  • 15. Lim J.M., Aoki K., Angel P., Garrison D., King D., Tiemeyer M., Bergmann C., Wells L. Mapping glycans onto specific N-linked glycosylation sites of Pyrus communis PGIP redefines the interface for EPG-PGIP interactions. J. Proteome Res. 2009;8:673–680. doi: 10.1021/pr800855f.
  • 16. Zhang Y., Held M.A., Kaur D., Showalter A.M. CRISPR-Cas9 multiplex genome editing of the hydroxyproline-O-galactosyltransferase gene family alters arabinogalactan-protein glycosylation and function in Arabidopsis. BMC Plant Biol. 2021;21:16. doi: 10.1186/s12870-020-02791-9.
  • 17. Steentoft C., Vakhrushev S.Y., Joshi H.J., Kong Y., Vester-Christensen M.B., Schjoldager K.T., Lavrsen K., Dabelsteen S., Pedersen N.B., Marcos-Silva L., Gupta R., Bennett E.P., Mandel U., Brunak S., Wandall H.H., Levery S.B., Clausen H. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J. 2013;32:1478–1488. doi: 10.1038/emboj.2013.79.
  • 18. Gupta R., Brunak S. Prediction of glycosylation across the human proteome and the correlation to protein function. Pac. Symp. Biocomput. 2002;2002:310–322.
  • 19. Sievers F., Wilm A., Dineen D., Gibson T.J., Karplus K., Li W., Lopez R., McWilliam H., Remmert M., Söding J., Thompson J.D., Higgins D.G. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 2011;7:539. doi: 10.1038/msb.2011.75.
  • 20. Almagro Armenteros J.J., Sønderby C.K., Sønderby S.K., Nielsen H., Winther O. DeepLoc: prediction of protein subcellular localization using deep learning. Bioinformatics. 2017;33:3387–3395. doi: 10.1093/bioinformatics/btx431.

Articles from Data in Brief are provided here courtesy of Elsevier
