Cereal allergen/alpha-amylase inhibitor, rice type
Type:
Family
Description:
Seeds of cereals contain a variety of serine protease and alpha-amylase inhibitors. These inhibitors can be grouped into families based on structural similarities. Rice seed allergenic proteins (RA) have sequence homology to seed trypsin/alpha-amylase inhibitors. Some of these inhibitors act on serine peptidases, others on alpha-amylase, and a few are bifunctional. The proteins contain ~10 cysteine residues, all of which are involved in disulphide bond formation.
The majority of sequences in this family are from Oryza sativa (Rice); the exceptions are from Hordeum vulgare (Barley) and Triticum aestivum (Wheat). The majority are annotated either as alpha-amylase inhibitors or seed allergens, and all belong to MEROPS inhibitor family I6, clan IJ. There is no direct evidence to suggest that they can inhibit serine peptidases belonging to MEROPS peptidase family S1, and studies on a closely related alpha-amylase inhibitor from Secale cereale (Rye) demonstrate no activity against trypsin, illustrating the need for caution when assigning function based on sequence comparisons.
The rice seed allergenic proteins are encoded by a multigene family consisting of at least four members. A conserved sequence similar to a motif identified in rice glutelin promoters was observed in the 5' region of the two genes. RA genes are specifically expressed in ripening seeds, and the proteins accumulate maximally 15-20 days after flowering.
The jacalin-like mannose-binding lectin domain has a β-prism fold consisting of three 4-stranded β-sheets, with an internal pseudo 3-fold symmetry. Some proteins with this domain stimulate distinct T- and B-cell functions, such as the plant lectin jacalin, which binds to the T-antigen and acts as an agglutinin. The domain can occur in tandem-repeat arrangements with up to six copies, and in architectures combined with a variety of other functional domains. While the family was initially named after an abundant protein found in the jackfruit seed, taxonomic distribution is not restricted to plants. The domain is also found in the salt-stress induced protein from rice and an animal prostatic spermine-binding protein. Proteins containing this domain include:
Jacalin, a tetrameric plant seed lectin and agglutinin from Artocarpus heterophyllus (jackfruit), which is specific for galactose.
Artocarpin, a tetrameric plant seed lectin from A. heterophyllus.
Lectin MPA, a tetrameric plant seed lectin and agglutinin from Maclura pomifera (Osage orange).
Heltuba lectin, a plant seed lectin and agglutinin from Helianthus tuberosus (Jerusalem artichoke).
Agglutinin from Calystegia sepium (Hedge bindweed).
Griffithsin, an anti-viral lectin from red algae (Griffithsia species).
Jacalin-like lectins are sugar-binding protein domains mostly found in plants. They adopt a β-prism topology consistent with a circularly permuted three-fold repeat of a structural motif. Proteins containing this domain may bind mono- or oligosaccharides with high specificity. The domain can occur in tandem-repeat arrangements with up to six copies, and in architectures combined with a variety of other functional domains. While the family was initially named after an abundant protein found in the jackfruit seed, taxonomic distribution is not restricted to plants. The domain is also found in the salt-stress induced protein from rice and an animal prostatic spermine-binding protein. Proteins containing this domain include:
Jacalin, a tetrameric plant seed lectin and agglutinin from Artocarpus heterophyllus (jackfruit), which is specific for galactose.
Artocarpin, a tetrameric plant seed lectin from A. heterophyllus.
Lectin MPA, a tetrameric plant seed lectin and agglutinin from Maclura pomifera (Osage orange).
Heltuba lectin, a plant seed lectin and agglutinin from Helianthus tuberosus (Jerusalem artichoke).
Agglutinin from Calystegia sepium (Hedge bindweed).
Griffithsin, an anti-viral lectin from red algae (Griffithsia species).
Ipomoelin, a Jacalin-related lectin from sweet potato (Ipomoea batatas cv. Tainung 57).
This entry refers to jacalin-like lectin domains found in plants.
This entry represents a group of plant seed storage proteins, including Napin, 2S seed storage protein and Conglutin.
Napins are low-molecular weight, basic storage proteins synthesised in rape-seed embryos during seed maturation. Sequence comparisons have revealed that Napin belongs to a diverse protein family, which includes major allergens, trypsin inhibitors and natural anti-fungal proteins. Napin comprises 2 polypeptide chains (MW 9000 and 4000) held together by disulphide bonds. The protein is initially synthesised as a precursor of 178 residues, which is proteolytically cleaved to generate mature Napin chains, with 86 and 29 residues respectively.
Some of the proteins in this family are allergens, with cores that are very resistant to proteolytic digestion and to elevated temperatures of up to 100 degrees C. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee; King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806 (1994)]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation.
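The designation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not an official WHO/IUIS tool: the function name `allergen_designation` and its `genus_letters` and `species_letters` parameters are hypothetical, and treating the "add one or more letters" rule as simply lengthening the abbreviation is an assumption about how disambiguation is applied.

```python
def allergen_designation(genus: str, species: str, number: int,
                         genus_letters: int = 3, species_letters: int = 1) -> str:
    """Build an allergen designation: genus abbreviation, space,
    species abbreviation, space, arabic number.

    genus_letters / species_letters are hypothetical knobs for the
    disambiguation rule (extra letters added when two names collide).
    """
    if number < 1:
        raise ValueError("allergen number must be a positive integer")
    genus_part = genus.strip()[:genus_letters].capitalize()
    species_part = species.strip()[:species_letters].lower()
    return f"{genus_part} {species_part} {number}"


if __name__ == "__main__":
    # Default rule: three genus letters, one species letter.
    print(allergen_designation("Brassica", "napus", 1))
    # Assumed disambiguation: one extra species letter.
    print(allergen_designation("Oryza", "sativa", 1, species_letters=2))
```

With the default settings, the genus Brassica and species napus with number 1 give the designation "Bra n 1".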
This family consists of several galactinol-sucrose galactosyltransferase proteins, also known as raffinose synthases. Raffinose is a widespread oligosaccharide in plant seeds and other tissues, and raffinose synthase is the key enzyme that channels sucrose into the raffinose oligosaccharide pathway. Raffinose family oligosaccharides (RFOs) are ubiquitous in plant seeds and are thought to play critical roles in the acquisition of tolerance to desiccation and seed longevity. Raffinose synthases are alkaline alpha-galactosidases and are solely responsible for RFO breakdown in germinating maize seeds, whereas acidic galactosidases appear to have other functions. Glycoside hydrolase family 36 can be split into 11 families, GH36A to GH36K. This family includes enzymes from GH36C.
The Kunitz-type soybean trypsin inhibitor (STI) family consists mainly of proteinase inhibitors from Leguminosae seeds. They belong to MEROPS inhibitor family I3, clan IC. They inhibit serine proteinases such as trypsin (MEROPS peptidase family S1) and subtilisin (MEROPS peptidase family S8), thiol proteinases (MEROPS peptidase family C1) and aspartic proteinases (MEROPS peptidase family A1). Inhibitors from cereals are active against subtilisin and endogenous alpha-amylases, while some also inhibit tissue plasminogen activator. The inhibitors are usually specific for either trypsin or chymotrypsin, and some are effective against both. They are thought to protect the seeds against consumption by animal predators, while at the same time existing as seed storage proteins themselves. All the actively inhibitory members contain 2 disulphide bridges. The existence of a member with no inhibitory activity, winged bean albumin 1, suggests that the inhibitors may have evolved from seed storage proteins.
Proteins from the Kunitz family contain from 170 to 200 amino acid residues and one or two intra-chain disulphide bonds. The best conserved region is found in their N-terminal section. The crystal structures of soybean trypsin inhibitor (STI), trypsin inhibitor DE-3 from the Kaffir tree Erythrina caffra (ETI) and the bifunctional proteinase K/alpha-amylase inhibitor from wheat (PK13) have been solved, showing them to share the same 12-stranded β-sheet structure as those of interleukin-1 and heparin-binding growth factors. The β-sheets are arranged in 3 similar lobes around a central axis, with 6 strands forming an anti-parallel β-barrel. Despite the structural similarity, STI shows no interleukin-1 bioactivity, presumably as a result of their primary sequence disparities. The active inhibitory site containing the scissile bond is located in the loop between β-strands 4 and 5 in STI and ETI.
The STIs belong to a superfamily that also contains the interleukin-1 proteins, heparin binding growth factors (HBGF) and histactophilin, all of which have very similar structures, but share no sequence similarity with the STI family.
Small hydrophilic plant seed protein, conserved site
Type:
Conserved_site
Description:
This entry represents a conserved site in hydrophilic plant seed proteins that are structurally related:
Arabidopsis thaliana proteins GEA1 and GEA6
Cotton late embryogenesis abundant (LEA) protein D-19
Carrot EMB-1 protein
Barley LEA proteins B19.1A, B19.1B, B19.3 and B19.4
Maize late embryogenesis abundant protein Emb564
Radish late seed maturation protein p8B6
Rice embryonic abundant protein Emp1
Sunflower 10 Kd late embryogenesis abundant protein (DS10)
Wheat Em proteins
These proteins may play a role in equipping the seed for survival, maintaining a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They may also play a role during imbibition by controlling water uptake.
This is a family of plant seed-specific proteins identified in Arabidopsis thaliana (Mouse-ear cress). ATS3 (Arabidopsis thaliana seed gene 3) is expressed in a pattern similar to the Arabidopsis seed storage protein genes.
Members of this protein family are HemK, a protein once thought to be involved in heme biosynthesis but now recognised to be a protein-glutamine methyltransferase that modifies the peptide chain release factors. All members of the seed alignment are encoded next to the release factor 1 gene (prfA), and their membership is confirmed by phylogenetic analysis. However, the family is diverse enough that even many members of the seed alignment do not score above the score cutoff, which was set high enough to exclude all instances of PrmB.
Secondary dormancy is an adaptive trait arising in previously non-dormant seeds due to unfavourable environmental conditions during germination. The SEED MATURATION PROTEIN1 (SMP1; AT3G12960) is involved in seed maturation and dormancy maintenance after high temperature fluctuation.
Late embryogenesis abundant protein, LEA_4 subgroup
Type:
Family
Description:
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
This entry represents Pfam LEA_4, or D-7, D-29 from Dure. Proteins in this entry include LEA3 from wheat, and ECP63 and At3g53040 from Arabidopsis. Their function is not clear. However, ECP63 has been linked to BHLH109-mediated regulation of somatic embryogenesis, and At3g53040 may be involved in the re-establishment of desiccation tolerance in germinated seeds. This entry also includes uncharacterised proteins from bacteria.
The entry represents a group of proteins mainly found in plants, including MUCI70 and TOD1 from Arabidopsis. They share a Rossmann-like fold found in glycosyltransferases.
MUCI70 is a predicted glycosyltransferase essential for the accumulation of seed mucilage, a gelatinous wall rich in unbranched rhamnogalacturonan I (RG I), and for shaping the surface morphology of seeds. Together with IRX14, it is required for xylan and pectin synthesis in seed coat epidermal (SCE) cells.
TOD1 is an endoplasmic reticulum ceramidase that catalyses the hydrolysis of ceramides into sphingosine and free fatty acids at alkaline pH (e.g. pH 9.5). It is involved in the regulation of turgor pressure in guard cells and pollen tubes.
The BURP domain was named after the proteins in which it was first identified: BNM2, USP, RD22, and PG1beta. It is found in the C terminus of a number of plant cell wall proteins, which are defined not only by the BURP domain, but also by the overall similarity in their modular construction. The BURP domain-containing proteins consist of either three or four modules: (i) an N-terminal hydrophobic domain - a presumptive transit peptide, joined to (ii) a short conserved segment or other short segment, (iii) an optional segment consisting of repeated units which is unique to each member, and (iv) the C-terminal BURP domain. Although the BURP domain proteins share primary structural features, their expression patterns and the conditions under which they are expressed differ. The presence of the conserved BURP domain in diverse plant proteins suggests an important and fundamental functional role for this domain. It is possible that the BURP domain represents a general motif for localization of proteins within the cell wall matrix. The other structural domains associated with the BURP domain may specify other target sites for intermolecular interactions.
Some proteins known to contain a BURP domain are listed below:
Brassica protein BNM2, which is expressed during the induction of microspore embryogenesis.
Field bean USPs, abundant non-storage seed proteins with unknown function.
Soybean USP-like proteins ADR6 (or SALI5-4A), an auxin-repressible, aluminium-inducible protein, and SALI3-2, a protein that is up-regulated by aluminium.
Soybean seed coat BURP-domain protein 1 (SCB1). It might play a role in the differentiation of the seed coat parenchyma cells.
Arabidopsis RD22 drought induced protein.
Maize ZRP2, a protein of unknown function in cortex parenchyma.
Tomato PG1beta, the beta-subunit of polygalacturonase isozyme 1 (PG1), which is expressed in ripening fruits.
Cereal RAFTIN. It is essential specifically for the maturation phase of pollen development.
Cereal seed allergen/grain softness/trypsin and alpha-amylase inhibitor
Type:
Family
Description:
The seeds of cereals contain numerous serine protease and alpha-amylase inhibitors. These inhibitors can be grouped into families based on structural similarities, and many are described as seed allergens. This family of cereal (monocotyledon) allergens and trypsin/alpha-amylase inhibitors belongs to MEROPS inhibitor family I6, clan IJ. Some are known to be serine protease inhibitors, active against S1 peptidases. For others there is no direct evidence to suggest that they can inhibit serine peptidases, and studies on the alpha-amylase inhibitor from Secale cereale (Rye) demonstrate no activity against trypsin, illustrating the need for caution when assigning function based on sequence comparisons.
The family consists of proteins of about 120 amino acids which contain 10 cysteine residues, all of which are involved in disulphide bonds. Some of these inhibitors are specific to trypsin, others to alpha-amylase, and a few are bifunctional. The schematic representation of the structure of these inhibitors is shown below:
                     +----------------------------+
         +----------+|   +-+                      |
         |          ||   | |                      |
xxCxxxxxxCxxxCxxxxxxCCxxxCxCxxxxxxxxxxxxxCxxxxxxxxCxxxxxxxCxxxx
  |          |                           |                |
  |          +---------------------------+                |
  +-------------------------------------------------------+
'C': conserved cysteine involved in a disulphide bond.
The 3D structure of the bifunctional alpha-amylase/trypsin inhibitor (RBI) from seeds of Eleusine coracana (Indian finger millet) has been determined in solution using multidimensional 1H and 15N NMR spectroscopy. The inhibitor forms a globular 4-helix motif with a simple 'up-and-down' topology, and includes a short anti-parallel β-sheet.
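As a quick consistency check on the schematic above, the cysteine pattern can be scanned programmatically. The sketch below is illustrative only: the pattern string is copied from the schematic (it is a schematic, not a real 120-residue sequence), and the helper names `cysteine_positions` and `max_disulphide_bonds` are hypothetical. It confirms that the pattern carries 10 cysteines, which is the number needed for 5 disulphide bonds if every cysteine is paired.

```python
from typing import List

# Schematic cysteine pattern copied from the entry text; 'x' stands for any residue.
PATTERN = "xxCxxxxxxCxxxCxxxxxxCCxxxCxCxxxxxxxxxxxxxCxxxxxxxxCxxxxxxxCxxxx"


def cysteine_positions(pattern: str) -> List[int]:
    """Return the 1-based positions of every 'C' in a schematic pattern."""
    return [i for i, residue in enumerate(pattern, start=1) if residue == "C"]


def max_disulphide_bonds(pattern: str) -> int:
    """Each disulphide bond uses two cysteines, so n cysteines allow n // 2 bonds."""
    return len(cysteine_positions(pattern)) // 2


if __name__ == "__main__":
    print("cysteine count:", len(cysteine_positions(PATTERN)))
    print("bonds if all cysteines are paired:", max_disulphide_bonds(PATTERN))
```

The check only counts residues; which cysteines pair with which is given by the connector lines of the schematic, not by this code.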
Cereal seed allergen/trypsin and alpha-amylase inhibitor, conserved site
Type:
Conserved_site
Description:
The seeds of cereals contain numerous serine protease and alpha-amylase inhibitors. These inhibitors can be grouped into families based on structural similarities. This domain identifies sequences belonging to the cereal (monocotyledon) trypsin/alpha-amylase inhibitor family. It includes those annotated solely as seed allergens or alpha-amylase inhibitors. Many belong to MEROPS inhibitor family I6, clan IJ. Some are known to inhibit trypsin (an S1 peptidase). For others there is no direct evidence to suggest that they can inhibit trypsin or any other serine peptidase. Studies on the alpha-amylase inhibitor from Secale cereale (Rye) have demonstrated no activity against trypsin, illustrating the need for caution when assigning function based on sequence comparisons. The cereal trypsin/alpha-amylase inhibitor family consists of proteins of about 120 amino acids which contain 10 cysteine residues, all of which are involved in disulphide bonds. The schematic representation of the structure of these inhibitors is shown below:
                     +----------------------------+
         +----------+|   +-+                      |
         |          ||   | |                      |
xxCxxxxxxCxxxCxxxxxxCCxxxCxCxxxxxxxxxxxxxCxxxxxxxxCxxxxxxxCxxxx
  |          |                           |                |
  |          +---------------------------+                |
  +-------------------------------------------------------+
'C': conserved cysteine involved in a disulphide bond.
The 3D structure of the bifunctional alpha-amylase/trypsin inhibitor (RBI) from seeds of Eleusine coracana (Indian finger millet) has been determined in solution using multidimensional 1H and 15N NMR spectroscopy. The inhibitor forms a globular 4-helix motif with a simple 'up-and-down' topology, and includes a short anti-parallel β-sheet.
This entry contains the plant LEA (late embryogenesis abundant) proteins, which are small hydrophilic plant seed proteins that are structurally related.
These proteins contain from 83 to 153 amino acid residues and may play a role in equipping the seed for survival, maintaining a minimal level of hydration in the dry organism and preventing the denaturation of cytoplasmic components. They may also play a role during imbibition by controlling water uptake.
This entry represents the conserved β-barrel fold of the 'cupin' superfamily ('cupa' is the Latin term for a small barrel). This family contains 11S and 7S plant seed storage proteins, and germins. Plant seed storage proteins provide the major nitrogen source for the developing plant.
This domain can also be found as a central component of many microbial proteins including certain types of phosphomannose isomerase, polyketide synthase, epimerase, and dioxygenase.
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
Late embryogenesis abundant protein, SMP subgroup domain
Type:
Domain
Description:
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
This entry represents Pfam SMP, or D-34 from Dure, or group 6 from Bray.
C2H2 zinc finger proteins (ZFPs) constitute an abundant family of nucleic acid binding proteins in the genomes of higher and lower eukaryotes. This entry represents a group of plant ZFPs, including ZFP3, KNUCKLES and LATE FLOWERING from Arabidopsis. ZFP3 is a putative transcriptional regulator negatively regulating ABA suppression of seed germination in Arabidopsis. Together with other ZFPs, it regulates light and ABA responses during germination and early seedling development. KNUCKLES plays an important role in the termination of floral meristem activity, and LATE FLOWERING controls the transition to flowering.
This is a family of late embryogenesis-abundant proteins. The protein accumulates to high levels in dry seeds, and in the roots of full-grown plants in response to dehydration and abscisic acid (ABA) treatments. This LEA protein disappears after germination. It accumulates in growing regions of well irrigated hypocotyls and meristems, suggesting a role in seedling growth resumption on rehydration. As a group the LEA proteins are highly hydrophilic, contain a high percentage of glycine residues, lack Cys and Trp residues and do not coagulate upon exposure to high temperature, and for these reasons are considered to be members of a group of proteins called hydrophilins. Expression of the protein is negatively regulated during etiolating growth, particularly in roots, in contrast to its expression patterns during normal growth.
Late embryogenesis abundant protein, LEA_5 subgroup
Type:
Family
Description:
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
This entry represents Pfam LEA_5, or D-19 from Dure, or group 1 from Bray. Proteins in this entry include EM1 (At3g51810) and EM6 (At2g40170) from Arabidopsis. This entry also includes some bacterial hydrophilins. Some proteins in this entry also contain the KGG motif.
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
This family includes Lea14-like proteins which contain a Water Stress and Hypersensitive response (WHy) domain, a region of unknown function found in several plant proteins involved in either the response to water stress or the response to bacterial infection. This entry also includes WHy domain-containing proteins from bacteria. Their function is not clear.
Late embryogenesis abundant protein, LEA_1 subgroup
Type:
Family
Description:
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
This entry represents Pfam LEA_1, or D-113 from Dure, or group 4 from Bray. Proteins in this entry include LEA6, LEA18 and LEA46 from Arabidopsis. They may play roles in the adaptive process to water deficit in higher plants.
Late embryogenesis abundant protein, LEA_2 subgroup
Type:
Domain
Description:
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
This entry represents Pfam LEA_2, or LEA14 (D-95) from Dure. The structure of Arabidopsis LEA14 has been determined.
Late embryogenesis abundant protein, LEA_3 subgroup
Type:
Family
Description:
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various other species. They have been classified into several subgroups in Pfam and according to Bray and Dure.
This entry represents Pfam LEA_3, or LEA5 (D-73) from Dure. Proteins in this entry include LEA-5 from Citrus sinensis, whose expression is induced by salt, drought and heat stress. This entry also includes At4g02380 (SAG21), At1g02820 (LEA2), At3g53770 (LEA37) and At4g15910 (LEA41) from Arabidopsis.
The family represents plant sugar transport proteins (STPs), also known as hexose transporters, including STP1-STP14 from Arabidopsis and sugar transport proteins MST1-8 from Oryza sativa subsp. japonica. They mediate the active uptake of hexoses such as glucose, 3-O-methylglucose, fructose, xylose, mannose, galactose, fucose, 2-deoxyglucose and arabinose, by sugar/hydrogen symport. Several STP family transporters are expressed in a tissue-specific manner, or at specific developmental stages. STP1 has the highest expression level of all members; high expression is detected in photosynthetic tissues, such as leaves and stems, while roots, siliques, and flowers show lower expression levels. It plays a major role in the uptake and response of Arabidopsis seeds and seedlings to sugars.
Plant seed storage proteins, whose principal function appears to be the major nitrogen source for the developing plant, can be classified, on the basis of their structure, into different families. 11S-type globulins are non-glycosylated proteins which form hexameric structures. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain derived from a single precursor and linked by a disulphide bond. This structure is shown in the following representation.
           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
|------Acidic-subunit-------------||-----Basic-subunit------|
|-----------------About-480-to-500-residues-----------------|
'C': conserved cysteine involved in a disulphide bond.
Members of the 11-S family include pea and broad bean legumins, oil seed rape cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins, pumpkin 11-S globulin, oat globulin, sunflower helianthinin G3, etc.
This family represents the precursor protein which is cleaved into the two chains. These proteins contain two β-barrel domains.
This family is a member of the 'cupin' superfamily on the basis of their conserved barrel domain ('cupa' is the Latin term for a small barrel).
This family includes mannan endo-1,4-beta-mannosidases from plants and fungi. They catalyse the random hydrolysis of (1->4)-beta-D-mannosidic linkages in mannans, galactomannans and glucomannans and are crucial for depolymerization of seed galactomannans and wood galactoglucomannans. The deconstruction of the plant cell wall has had increasing importance as a key biological process in the development of a sustainable biofuel industry. In plants, they are related to the process of weakening the tissues surrounding the embryo during seed germination.
This entry also includes cellulase domain-containing proteins from bacteria.
Ginkbilobin-2 (Gnk2) is an antifungal protein found in the endosperm of Ginkgo seeds, which inhibits the growth of phytopathogenic fungi such as Fusarium oxysporum. Gnk2 has considerable homology (~85%) to embryo-abundant proteins (EAP) from the gymnosperms Picea abies and P. glauca. Plant EAP are expressed in the late stage of seed maturation and are involved in protection against environmental stresses such as drought. The sequence of Gnk2 is also 28-31% identical to the extracellular domain of cysteine-rich receptor-like kinases (CRK) from the angiosperm Arabidopsis. The CRK members are induced by pathogen infection and treatment with reactive oxygen species or salicylic acid and are involved in the hypersensitive reaction, which is a typical system of programmed cell death. In addition, there are at least 60 genes in Arabidopsis encoding the cysteine-rich secreted proteins (CRSP) with a Gnk2-homologous domain. Therefore, the proteins with a Gnk2-homologous domain are regarded as one of the largest protein superfamilies, although the role of the conserved Gnk2-homologous domain remains unclear [
,
]. The Gnk2-homologous domain is composed of two α-helices and a five-stranded β-sheet, which forms a compact single-domain architecture with an α+β fold. It contains a C-X(8)-C-X(2)-C motif. Cysteine residues form three intramolecular disulphide bridges: C1-C5, C2-C3, and C4-C6 [].
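As an illustration of how a motif written in this notation can be searched for in a sequence, the following minimal sketch translates a simple pattern such as C-X(8)-C-X(2)-C into a regular expression. The function names `motif_to_regex` and `find_motif` are hypothetical, only the element forms "residue letter", "X", "X(n)" and "X(m,n)" are handled, and the test sequence is synthetic rather than a real Gnk2 sequence.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a simple motif such as 'C-X(8)-C-X(2)-C' into a regex string.

    Supported elements (an assumption of this sketch, not a full pattern
    grammar): a single residue letter, 'X', 'X(n)' and 'X(m,n)'.
    """
    parts = []
    for element in motif.split("-"):
        m = re.fullmatch(r"X(?:\((\d+)(?:,(\d+))?\))?", element, flags=re.IGNORECASE)
        if m:
            low, high = m.group(1), m.group(2)
            if low is None:
                parts.append(".")                      # X: any one residue
            elif high is None:
                parts.append(".{%s}" % low)            # X(n): exactly n residues
            else:
                parts.append(".{%s,%s}" % (low, high))  # X(m,n): m to n residues
        elif len(element) == 1 and element.isalpha():
            parts.append(element.upper())              # fixed residue
        else:
            raise ValueError(f"unsupported motif element: {element!r}")
    return "".join(parts)


def find_motif(sequence: str, motif: str = "C-X(8)-C-X(2)-C") -> list:
    """Return 0-based start positions of non-overlapping motif matches."""
    pattern = re.compile(motif_to_regex(motif))
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Synthetic example: three cysteines spaced 8 and 2 residues apart.
example = "AAC" + "G" * 8 + "C" + "GG" + "C" + "AA"
positions = find_motif(example)
```

A match only indicates that the cysteine spacing is present; it does not by itself show that a sequence contains a Gnk2-homologous domain.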
Oleosins [
] are the proteinaceous components of plants' lipid storage bodies called oil bodies. Oil bodies are small droplets (0.2 to 1.5 µm in diameter) containing mostly triacylglycerol that are surrounded by a phospholipid/oleosin annulus. Oleosins may have a structural role in stabilising the lipid body during desiccation of the seed, by preventing coalescence of the oil. They may also provide recognition signals for specific lipase anchorage in lipolysis during seedling growth. Oleosins are found in the monolayer lipid/water interface of oil bodies and probably interact with both the lipid and phospholipid moieties.
Oleosins are proteins of 16 kDa to 24 kDa and are composed of three domains: an N-terminal hydrophilic region of variable length (from 30 to 60 residues); a central hydrophobic domain of about 70 residues; and a C-terminal amphipathic region of variable length (from 60 to 100 residues). The central hydrophobic domain is proposed to be made up of β-strand structure and to interact with the lipids []. It is the only domain whose sequence is conserved.
Alpha-prolamins are the major seed storage proteins of species of the grass tribe Andropogoneae. They are unusually rich in glutamine, proline, alanine, and leucine residues and their sequences show a series of tandem repeats presumed to be the result of multiple intragenic duplications [
]. In Zea mays (Maize), the 22 kDa and 19 kDa zeins are encoded by a large multigene family and are the major seed storage proteins, accounting for 70% of the total zein fraction. Structurally, the 22 kDa and 19 kDa zeins are composed of nine adjacent, topologically antiparallel helices clustered within a distorted cylinder. The 22 kDa alpha-zeins are encoded by 23 genes [
]; twenty-two of the members are found in a roughly tandem array forming a dense gene cluster. The expressed genes in the cluster are interspersed with nonexpressed genes. Interestingly, some of the expressed genes differ in their transcriptional regulation. Gene amplification appears to occur in blocks of genes, explaining the rapid and compact expansion of the cluster during the evolution of maize.
Plant seed storage proteins, whose principal function appears to be to serve as the major nitrogen source for the developing plant, can be classified, on the basis of their structure, into different families. 11-S proteins are non-glycosylated and form hexameric structures [,
]. Each of the subunits in the hexamer is itself composed of an acidic and a basic chain derived from a single precursor and linked by a disulphide bond. This structure is shown in the following representation.
           +-------------------------+
           |                         |
xxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxNGxCxxxxxxxxxxxxxxxxxxxxxxx
|------Acidic-subunit-------------||-----Basic-subunit------|
|-----------------About-480-to-500-residues-----------------|
'C': conserved cysteine involved in a disulphide bond.
Members of the 11-S family include pea and broad bean legumins, oilseed rape cruciferin, rice glutelins, cotton beta-globulins, soybean glycinins, pumpkin 11-S globulin, oat globulin, sunflower helianthinin G3, etc. This family represents the precursor protein which is cleaved into the two chains. These proteins contain two β-barrel domains. This family is a member of the 'cupin' superfamily on the basis of their conserved barrel domain ('cupa' is the Latin term for a small barrel).
The signature pattern for this family includes the conserved cleavage site between the acidic and basic subunits (Asn-Gly) and a proximal cysteine residue which is involved in the inter-chain disulphide bond.
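The cleavage described above can be sketched as a simple heuristic: locate an Asn-Gly pair that is followed shortly by a cysteine, and cut between the Asn and the Gly. This is only an illustration. The function name `split_precursor` and the `max_gap` parameter are hypothetical, the spacing between Asn-Gly and the cysteine is an assumption (the schematic is not to scale), and the test sequence is synthetic.

```python
import re


def split_precursor(precursor: str, max_gap: int = 10):
    """Split a precursor at an Asn-Gly site that has a nearby cysteine.

    max_gap is an assumed upper bound on the number of residues between
    the Gly and the proximal cysteine; it is not taken from the source.
    Returns (acidic_chain, basic_chain), or None if no site is found.
    """
    # Match the Asn only; the lookahead requires Gly, then a cysteine
    # within max_gap residues, without consuming them.
    site = re.compile(r"N(?=G.{0,%d}C)" % max_gap)
    match = site.search(precursor.upper())
    if match is None:
        return None
    cut = match.end()  # cut after Asn, so Gly starts the basic chain
    return precursor[:cut], precursor[cut:]


# Synthetic example laid out like the schematic: ...C...NGxC...
chains = split_precursor("AAACAAANGKCAAA")
```

Because Asn-Gly pairs can occur by chance, the first hit is not guaranteed to be the real processing site; a curated signature or experimental data should take precedence.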
This entry represents a group of plant transcription factors, including ASIL1/2, FIP2 and ENAP1/2 from Arabidopsis. ASIL1 functions as a negative regulator of embryonic traits in seedlings and contributes to the maintenance of precise temporal control of seed filling []. ASIL2 is a trihelix transcription factor that acts downstream of miRNAs to repress the maturation program early in embryogenesis []. FIP2 is a FRI interacting protein. FRI up-regulates expression of the floral repressor FLOWERING LOCUS C (FLC) []. ENAP1 associates with chromatin regions linked to ethylene responses, preserving these regions in the absence of ethylene, while in its presence, ENAP1 interacts with EIN2. This promotes histone acetylation mainly on H3K14 and H3K23, which positively regulates ethylene-responsive genes [,
].
Sugar transport protein STP/Polyol transporter PLT, plant
Type:
Family
Description:
This entry includes sugar transport protein STP and polyol transporter PLT from plants.
Sugar transport proteins (STPs), also known as hexose transporters, include STP1-STP14 from Arabidopsis and sugar transport proteins MST1-8 from Oryza sativa subsp. japonica []. They mediate the active uptake of hexoses such as glucose, 3-O-methylglucose, fructose, xylose, mannose, galactose, fucose, 2-deoxyglucose and arabinose, by sugar/hydrogen symport. Several STP family transporters are expressed in a tissue-specific manner, or at specific developmental stages. STP1 has the highest expression level of all members; high expression is detected in photosynthetic tissues, such as leaves and stems, while roots, siliques, and flowers show lower expression levels. It plays a major role in the uptake and response of Arabidopsis seeds and seedlings to sugars [,
]. The plant Polyol transporter (PLT) subfamily includes PLT1-6 from Arabidopsis thaliana and similar transporters [
]. The best characterised member of the group is Polyol transporter 5, also called Sugar-proton symporter PLT5, which mediates the H+ symport of numerous substrates including linear polyols (such as sorbitol, xylitol, erythritol or glycerol), cyclic polyol myo-inositol, and different hexoses, pentoses (including ribose), tetroses, and sugar alcohols. It functions to transport a wide range of substrates into specific sink tissues in the plant [,
]. The PLT subfamily belongs to the Glucose transporter-like (GLUT-like) family of the Major Facilitator Superfamily (MFS) of membrane transport proteins. MFS proteins are thought to function through a single substrate binding site, alternating-access mechanism involving a rocker-switch type of movement [].
LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. Later, LEA homologues have also been found in various species [
,
]. They have been classified into several subgroups in Pfam and according to Bray and Dure []. Dehydrin has been classified as part of the LEA family (D-11 from Dure, or group 2 from Bray) [
]. Dehydrins contribute to freezing stress tolerance in plants and it was suggested that this could be partly due to their protective effect on membranes []. Dehydrins share a number of structural features. One of the most notable features is the presence, in their central region, of a continuous run of five to nine serines followed by a cluster of charged residues. Such a region has been found in all known dehydrins so far with the exception of pea dehydrins. A second conserved feature is the presence of two copies of a lysine-rich octapeptide; the first copy is located just after the cluster of charged residues that follows the poly-serine region and the second copy is found at the C-terminal extremity.
This domain is named after Bet v 1, the major birch pollen allergen. Bet v 1 belongs to family 10 of plant pathogenesis-related proteins (PR-10), cytoplasmic proteins of 15-17 kDa that are widespread among dicotyledonous plants [
]. In recent years, a number of diverse plant proteins with low sequence similarity to Bet v 1 were identified. A classification by sequence similarity yielded several subfamilies related to PR-10 []:
Pathogenesis-related proteins PR-10: These proteins were identified as major tree pollen allergens in birch and related species (hazel, alder), as plant food allergens expressed at high levels in fruits, vegetables and seeds (apple, celery, hazelnut), and as pathogenesis-related proteins whose expression is induced by pathogen infection, wounding, or abiotic stress. Hyp-1, an enzyme involved in the synthesis of the bioactive naphthodianthrone hypericin in St. John's wort (Hypericum perforatum), also belongs to this family. Most of these proteins were found in dicotyledonous plants. In addition, related sequences were identified in monocots and conifers.
Cytokinin-specific binding proteins: These legume proteins bind cytokinin plant hormones [].
(S)-Norcoclaurine synthases: These enzymes catalyse the condensation of dopamine and 4-hydroxyphenylacetaldehyde to (S)-norcoclaurine, the first committed step in the biosynthesis of benzylisoquinoline alkaloids such as morphine [
].
Major latex proteins and ripening-related proteins: These are proteins of unknown biological function that were first discovered in the latex of opium poppy (Papaver somniferum) and later found to be upregulated during ripening of fruits such as strawberry and cucumber [
].
The occurrence of Bet v 1-related proteins is confined to seed plants, with the exception of a cytokinin-binding protein from the moss Physcomitrella patens.
Plant cells contain proteins, called non-specific lipid-transfer proteins (nsLTPs) [
,
,
,
], which transfer phospholipids, glycolipids, fatty acids and sterols from liposomes or microsomes to mitochondria [] and are thought to be involved in plant defense. These proteins, expressed throughout the plant tissues but predominantly found in edible parts [], could play a major role in membrane biogenesis by conveying phospholipids such as waxes or cutin from their site of biosynthesis to membranes unable to form these lipids. LTPs exist in animal and plant tissues, including rat liver cytosol, potato tuber, castor bean, maize seedlings, spinach, barley and wheat. While there is no sequence similarity between animal and plant LTPs, similarity between the plant proteins is high. Plant LTPs are proteins of about 9 kDa (90 amino acids), containing eight conserved cysteine residues forming four disulphide bridges, which allow the formation of a stable, compact barrel-like structure that is essential for lipid binding []. Plant LTPs are also similar to alpha-amylase inhibitor I2 from the seeds of Indian finger millet and amylase/protease inhibitors from rice and barley.
Some of the proteins in this family are allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [WHO/IUIS Allergen Nomenclature Subcommittee King T.P., Hoffmann D., Loewenstein H., Marsh D.G., Platts-Mills T.A.E., Thomas W. Bull. World Health Organ. 72:797-806(1994)]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation. The allergens in this family include allergens with the following designations: Par j 1 and Par j 2.
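The designation rule described above can be sketched in a few lines. The helper names `allergen_designation` and `parse_designation` are hypothetical, and the example assumes that "Par j" refers to the species Parietaria judaica, which the text itself does not state. The disambiguation rule for clashing designations (adding extra letters) is not implemented when building a designation; the parser merely tolerates longer genus and species parts.

```python
import re


def allergen_designation(genus: str, species: str, number: int) -> str:
    """Build a designation: first three letters of the genus, a space,
    the first letter of the species name, a space, and an arabic number."""
    if len(genus) < 3 or not species:
        raise ValueError("genus needs at least 3 letters and species 1 letter")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


def parse_designation(designation: str):
    """Split a designation such as 'Par j 2' into (genus_part, species_part, number)."""
    m = re.fullmatch(r"([A-Z][a-z]{2,}) ([a-z]+) (\d+)", designation)
    if m is None:
        raise ValueError(f"not a valid designation: {designation!r}")
    return m.group(1), m.group(2), int(m.group(3))


# Assumed source organism for the 'Par j' allergens named in the text.
first = allergen_designation("Parietaria", "judaica", 1)
second = parse_designation("Par j 2")
```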
Ricin is a legume lectin from the seeds of the castor bean plant,
Ricinus communis. The seeds are poisonous to people, animals and insects and just one milligram of ricin can kill an adult. Primary structure analysis has shown the presence of a similar domain in many carbohydrate-recognition proteins like plant and bacterial AB-toxins, glycosidases or proteases [
,
,
]. This domain, known as the ricin B lectin domain, can be present in one or more copies and has been shown in some instances to bind simple sugars, such as galactose or lactose. The ricin B lectin domain is composed of three homologous subdomains of 40 amino acids (alpha, beta and gamma) and a linker peptide of around 15 residues (lambda). It has been proposed that the ricin B lectin domain arose by gene triplication from a primitive 40 residue galactoside-binding peptide [
,
]. The most characteristic, though not completely conserved, sequence feature is the presence of a Q-W pattern. Consequently, the ricin B lectin domain has also been referred to as the (QxW)3 domain and the three homologous regions as the QxW repeats [
,
]. A disulphide bond is also conserved in some of the QxW repeats []. The 3D structure of the ricin B chain has shown that the three QxW repeats pack around a pseudo threefold axis that is stabilised by the lambda linker [
]. The ricin B lectin domain has no major segments of α-helix or β-sheet but each of the QxW repeats contains an ω-loop []. An idealized ω-loop is a compact, contiguous segment of polypeptide that traces a 'loop-shaped' path in three-dimensional space; the main chain resembles a Greek omega.
Diacylglycerol O-acyltransferase 1 (DGAT1) catalyses the final step in triacylglycerol synthesis by using diacylglycerol and fatty acyl CoA as substrates. In plants, diacylglycerol O-acyltransferase 1 (DGAT1, TAG1) is a major enzyme for oil accumulation in seeds. It has complementary functions with PDAT1 acyltransferase that are essential for triacylglycerol synthesis and normal development of both seeds and pollen [
,
,
,
,
]. In mammals, DGAT1 is a multifunctional acyltransferase capable of synthesizing diacylglycerol, retinyl, and wax esters in addition to triacylglycerol [
]. In liver, it plays a role in esterifying exogenous fatty acids to glycerol []. It also functions as the major acyl-CoA retinol acyltransferase in the skin, where it acts to maintain retinoid homeostasis and prevent retinoid toxicity leading to skin and hair disorders [].
This family is composed of root UVB sensitive proteins and their homologues. In Arabidopsis thaliana, proteins in this family are involved in UVB-sensing and in early seedling morphogenesis and development [
].
This family represents a group of plant E3 ubiquitin protein ligases, including SIS3 (Protein SUGAR INSENSITIVE 3) from Arabidopsis. SIS3 is a RING finger protein involved in a ubiquitination pathway that acts as positive regulator of sugar signalling during early seedling development [
].
Late embryogenesis abundant proteins (LEAs) are late embryonic proteins abundant in higher plant seed embryos and their function is not known. This entry includes LEA1/2 from Cicer arietinum (Chickpea) and LEAD7 from Gossypium hirsutum (Upland cotton).
This entry represents a group of WD and tetratricopeptide repeat proteins, including DCAF5/6/8 and DCAF8L1/DCAF8L2/WDTC1 from humans. They may function as a substrate receptor for the CUL4-DDB1 E3 ubiquitin-protein ligase complex [
]. Proteins in this entry also include protein ALTERED SEED GERMINATION 2 (ASG2) from Arabidopsis. It is a negative regulator of ABA signalling [
].
Protein BIG BROTHER is an E3 ubiquitin-protein ligase that limits organ size, and possibly seed size, in a dose-dependent manner. It may limit the duration of organ growth and ultimately organ size by actively degrading critical growth stimulators [].
This family includes human protein GLE1 and its homologues. This protein is localised at the nuclear pore complexes and functions in poly(A)+ RNA export to the cytoplasm [
,
]. In Arabidopsis, it is required for seed viability [].
Members of this family belong to a clade of helix-turn-helix DNA-binding proteins. Members are similar in sequence to the HipB protein of Escherichia coli. Genes for members of the seed alignment for this protein family were found to be closely linked to genes encoding proteins related to HipA. The HipBA operon appears to have some features in common with toxin-antitoxin post-segregational killing systems.
This entry includes RFI2, IPI1 and related proteins. In Arabidopsis RFI2 is an E3 ubiquitin-protein ligase that mediates phytochrome (phyA and phyB)-controlled seedling deetiolation responses such as hypocotyl elongation in response to red and far-red light [,
]. In rice IPI1 affects plant architecture through precisely tuning IPA1 protein levels in different tissues [].
This entry includes a group of fungal and plant proteins, including Sip5 from budding yeasts and DA2 from Arabidopsis. The function of Sip5 is not clear. It interacts with both the Reg1/Glc7 protein phosphatase and the Snf1 protein kinase [
]. DA2 (At1g78420) is an E3 ubiquitin-protein ligase involved in the regulation of organ and seed size [
,
]. This entry also includes DA2-like proteins such as At1g17145 and GW2 [].
This domain has a four-helix bundle structure. It contains four disulfide bonds, of which three function to keep the C- and N-terminal parts of the molecule in place [
].
Alpha-Amylase Inhibitors (AAI), Lipid Transfer (LT) and Seed Storage (SS) Protein
Type:
Family
Description:
This entry represents a protein family unique to higher plants that includes cereal-type alpha-amylase inhibitors [
], lipid transfer proteins [], seed storage proteins, and similar proteins []. Proteins in this family are known to play important roles in defending plants from insects and pathogens, lipid transport between intracellular membranes, and nutrient storage []. Many proteins of this family have been identified as allergens in humans []. These proteins contain a common pattern of eight cysteines that form four disulfide bridges.
This entry represents a group of WD repeat-containing proteins, including WDR89 from humans and GTS1 from Arabidopsis. Human WD repeat-containing protein 89 (WDR89) is an uncharacterized protein containing six WD repeats. GTS1 (also known as protein GIGANTUS 1) is highly expressed during embryo development and is involved in the control of plant growth and development by acting as a negative regulator of seed germination, cell division in meristematic regions, plant growth and overall biomass accumulation [
].
This entry represents a domain found twice in the protein DUF642 L-galactono-1,4-lactone-responsive gene 2 from Arabidopsis thaliana (DGR2) and similar proteins found in plants and bacteria. DGR2 is involved in the regulation of testa rupture during seed germination [
] and in the development of roots and rosettes []. This domain was previously known as DUF642.
Bifunctional inhibitor/plant lipid transfer protein/seed storage helical domain
Type:
Domain
Description:
This entry represents a structural domain consisting of four helices with a folded leaf topology, forming a right-handed superhelix. This domain occurs in several proteins, including:
Plant lipid-transfer proteins, such as the non-specific lipid-transfer proteins ns-LTP1 and ns-LTP2 [
,
].
Proteinase/alpha-amylase inhibitors, such as trypsin/alpha-amylase inhibitor RBI from Eleusine coracana (Indian finger millet) [
] and Hageman factor/amylase inhibitor from Zea mays (Maize) [].
Seed storage proteins, such as napin from Brassica napus (Rape) [
] and 2S albumin from Ricinus communis (Castor bean) [].
Heat shock factor binding protein 1 (HSBP1) interacts with the oligomerization domain of heat shock factor 1 (Hsf1), suppressing Hsf1's transcriptional activity following stress. It plays an essential role during early mouse and zebrafish embryonic development [
]. In the plant Arabidopsis, heat shock factor-binding protein (HSBP) is required for acquired thermotolerance but not basal thermotolerance [] and for seed development [].
Bifunctional inhibitor/plant lipid transfer protein/seed storage helical domain superfamily
Type:
Homologous_superfamily
Description:
This entry represents a superfamily of structural domains consisting of four helices with a folded leaf topology, forming a right-handed superhelix. This domain occurs in several proteins, including:
Plant lipid-transfer proteins, such as the non-specific lipid-transfer proteins ns-LTP1 and ns-LTP2 [
,
].
Proteinase/alpha-amylase inhibitors, such as trypsin/alpha-amylase inhibitor RBI from Eleusine coracana (Indian finger millet) [
] and Hageman factor/amylase inhibitor from Zea mays (Maize) [].
Seed storage proteins, such as napin from Brassica napus (Rape) [
] and 2S albumin from Ricinus communis (Castor bean) [].
The ISWI chromatin remodeling complexes are widely present in eukaryotic species. This entry represents the plant ISWI binding proteins, the DDT domain proteins, including RLT1 and RLT2 from Arabidopsis [
]. AtISWI physically interacts with RLTs, and this prevents plants from activating the vegetative-to-reproductive transition early by regulating several key genes that contribute to flower timing []. RLT2 may also be involved in the transcriptional regulation of endogenous seed genes [].
This family represents the plant-specific proteins PSI 1-3 (Protein PSK SIMULATOR 1-3) from Arabidopsis. These proteins promote plant growth, especially at the vegetative stage, probably via the regulation of phytosulfokine (PSK) signalling; PSKs are peptide phytohormones acting as growth factors [
]. They are required during vegetative growth and reproduction: PSI1 acts predominantly at the seedling stage, whereas PSI2 and PSI3 are required for vegetative growth, where they act synergistically. PSI1 and PSI3 may also have a function in carbohydrate metabolism [].
CROWDED NUCLEI (CRWN) proteins, also known as LITTLE NUCLEI, are important architectural components of plant nuclei and are thought to play diverse roles in heterochromatin organization and the control of nuclear morphology [
,
]. CRWN proteins regulate ABA-controlled seed germination by modulating the degradation of protein ABI5. CRWN3 has been shown to colocalize with ABI5 in nuclear bodies, where it might participate in its degradation []. Proteins in this entry also include NMCP1A/B from rice. NMCP1B, also known as OsNMCP1, regulates drought resistance and root growth through chromatin accessibility [
].
This entry represents the GEM and GEM-like (GER 1-8) proteins from plants. They contain a GRAM domain. GEM binds to phospholipids and its expression can be regulated by ABA [
]. GER5 has been shown to be involved in seed development and inflorescence architecture [].
This entry includes beta-glucuronosyltransferase GlcAT14A/B/C from Arabidopsis. They belong to the glycosyltransferase family 14 (GT14) [
]. GlcAT14A is involved in the biosynthesis of type II arabinogalactan (AG) and has a role in cell elongation during seedling growth []. This entry also includes uncharacterised bacterial proteins.
This is an N-terminal domain that is found adjacent to the patatin/phospholipase A2-related domain in a group of proteins. Proteins containing this domain include Tgl3/4/5 from Saccharomyces cerevisiae. Tgl3 is a bifunctional triacylglycerol lipase and lysophospholipid acyltransferase [
]. Tgl4/5 are multifunctional lipase/hydrolase/phospholipases [,
]. They are generally involved in triacylglycerol mobilisation and localized to lipid particles [,
]. This entry also includes plant sugar-dependent 1 (SDP1) protein, which is a triacylglycerol lipase that initiates storage oil breakdown in germinating Arabidopsis seeds [
,
].
This entry represents a domain found in the early nodulin-like protein (OsENODL1) from Oryza sativa and similar proteins. OsENODL1 belongs to the phytocyanin family of blue copper proteins, a ubiquitous family of plant cupredoxins. Phytocyanin is involved in electron transfer reactions with the Cu centre transitioning between the oxidized Cu(II) form and the reduced Cu(I) form. OsENODL1 expression occurs specifically at the late developmental stage of the seeds. Members of this subgroup appear to have lost the T1 copper binding site [
,
,
].
Zinc finger protein 830 (ZNF830; also known as coiled-coil domain-containing protein 16 or CCDC16) contains a C2H2-type zinc finger and a coiled-coil region. It is a component of the XAB2 complex, which binds RNA [
], and a component of a pentameric intron-binding complex that is pre-assembled and then incorporated into the spliceosome []. This entry also includes protein ABA AND ROS SENSITIVE 1 (ARS1) from Arabidopsis. It is essential for seed germination and ROS homeostasis in response to ABA and oxidative stress [
].
Nuclear pore complex protein NUP214 (also known as Protein LONO1 and Protein EMBRYO DEFECTIVE 1011) is a component of the nuclear pore complex in Arabidopsis. It is required for mature mRNA export from the nucleus to the cytoplasm and is essential for normal embryogenesis and seed viability [
]. It is important during early embryogenesis, being involved in the first asymmetrical cell division of the zygote and regulating the number and planes of cell divisions required for generating the normal embryo proper and suspensor, apical-basal axis, cotyledons and meristem [,
].
This region is found in plant seed storage proteins, N-terminal to the Cupin domain. In Macadamia integrifolia (Macadamia nut), this region is processed into peptides of approximately 50 amino acids containing a C-X-X-X-C-(10-12)X-C-X-X-X-C motif. These peptides exhibit antimicrobial activity in vitro [
].
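The four-cysteine motif quoted above, C-X-X-X-C-(10-12)X-C-X-X-X-C, maps directly onto a regular expression with a variable-length spacer. The sketch below is only an illustration: the function name `scan_cysteine_motif` is hypothetical, the test sequences are synthetic rather than real macadamia peptides, and only non-overlapping matches are reported.

```python
import re

# C, three residues, C, a spacer of 10 to 12 residues, C, three residues, C.
CYS_MOTIF = re.compile(r"C.{3}C(.{10,12})C.{3}C")


def scan_cysteine_motif(sequence: str) -> list:
    """Return (start, spacer_length, matched_segment) for each motif hit.

    start is 0-based; spacer_length is the number of residues between
    the second and third cysteines of the motif.
    """
    hits = []
    for m in CYS_MOTIF.finditer(sequence.upper()):
        hits.append((m.start(), len(m.group(1)), m.group(0)))
    return hits


# Synthetic sequences: one with an 11-residue spacer, one with a
# 9-residue spacer that falls outside the allowed 10-12 range.
with_motif = "MK" + "CAAAC" + "G" * 11 + "CAAAC" + "KK"
too_short = "CAAAC" + "G" * 9 + "CAAAC"
hits = scan_cysteine_motif(with_motif)
```

Reporting the spacer length alongside each hit makes it easy to see which of the permitted spacings (10, 11 or 12 residues) a given peptide uses.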
Basic leucine-zipper (bZIP) proteins are found in eukaryotes. They are typically between 174 and 411 amino acids in length. Various bZIP proteins have been found and shown to play a role in seed-specific gene expression. bZIP binds to the alpha-globulin gene promoter, but not to promoters of other major storage genes such as glutelin, prolamin and albumin [
]. This entry represents a C-terminal domain found in bZIP proteins. It is found in association with
. There is a conserved KVK sequence motif and a single completely conserved residue K that may be functionally important.
The plant augmin complex is involved in the assembly of microtubule (MT) arrays during mitosis and contains eight subunits (AUG1-AUG8). Among them, AUG1 to AUG6 share similarity with their animal counterparts, but AUG7 and AUG8 share homology only with proteins of plant origin [
]. AUG8 belongs to the plant QWRF motif-containing protein family, which also includes microtubule-associated protein ENDOSPERM DEFECTIVE 1 [,
] and SNOWY COTYLEDON 3 []. AUG8 binds the microtubule and participates in the reorientation of microtubules in hypocotyls (the stem of a germinating seedling) [].
This entry represents the DNA-binding domain of EIN3 and EIN3-like proteins.
Ethylene insensitive 3 (EIN3) proteins are a family of plant DNA-binding proteins that regulate transcription in response to the gaseous plant hormone ethylene, and are essential for ethylene-mediated responses [
]. In the presence of ethylene, dark-grown dicotyledonous seedlings undergo dramatic morphological changes collectively known as the 'triple response'. In Arabidopsis, these changes consist of a radial swelling of the hypocotyl, an exaggeration in the curvature of the apical hook, and the inhibition of cell elongation in the hypocotyl and root [,
].
Bacteriocin, class IIb, lactobin A/cerein 7B family
Type:
Family
Description:
Members of this protein family are described variably as bacteriocins per se, one chain of a two-chain bacteriocin, or bacteriocin enhancer proteins. All members of the seed alignment occur in paired gene contexts with another member of the same protein family. This family includes bacteriocins that appear not to undergo post-translational modification, other than cleavage at a Gly-Gly motif coupled to sec-independent export. For many members, the N-terminal bacteriocin cleavage motif region is recognised by
. C-terminal to the cleavage motif, these proteins are hydrophobic and low in complexity, consistent with pore-forming activity as a mechanism of bacteriocin action.
Members of this protein family are GNAT family acetyltransferases, based on a seed alignment in which every member is associated with a lysine 2,3-aminomutase family protein, usually as the adjacent gene. This family includes AblB, the enzyme beta-lysine acetyltransferase that completes the two-step synthesis of the osmolyte (compatible solute) N-epsilon-acetyl-beta-lysine; all members of the family may have this function [
]. Note that N-epsilon-acetyl-beta-lysine has been observed only in methanogenic archaea (e.g. Methanosarcina) but that this model, paired with Lysine-2,3-aminomutase (), suggests a much broader distribution.
In Brassica species, S locus glycoproteins (SLGs) are involved in self-incompatibility, a mechanism that regulates the acceptance or rejection of pollen to prevent self-fertilization [
,
]. SLGs accumulate in papillar cells of the stigma, at the site where inhibition of self-pollen tube development has been shown to occur []. This family also includes related proteins from Arabidopsis and the epidermis-specific secreted glycoprotein EP1 from carrot, which has sequence homology to Brassica SLGs. EP1 is not involved in self-incompatibility, and its expression in seedlings rules out a specific role in pollination. Its role is not clear [
].
Proteins containing this domain include protein DA1 and its homologues.
In Arabidopsis thaliana, DA1 is a ubiquitin-activated endopeptidase that limits final seed and organ size by restricting the period of cell proliferation [
]. DA1 is activated by the RING E3 ligases Big Brother and DA2, both of which are then inactivated by cleavage by the active peptidase. DA1 also cleaves and inactivates deubiquitinase UBP15 and transcription factors TCP15 and TCP22, all of which promote cell proliferation. Presence of an HEXXH motif, which when mutated leads to inactivity, suggests that DA1 is a metalloendopeptidase [].
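As a minimal sketch of how the HEXXH motif mentioned above might be located in a protein sequence, the following scans for His-Glu-X-X-His with a regular expression. The function name and the toy sequence are hypothetical; a match only flags a candidate site and is not evidence of metallopeptidase activity.

```python
import re

# His-Glu-any-any-His; the lookahead lets overlapping candidates be reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence):
    """Return (1-based position, matched residues) for each HEXXH candidate."""
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(sequence.upper())]


# Made-up fragment, not a real DA1 sequence.
hits = find_hexxh("AAHEIGHAA")
```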
This entry represents a group of plant proteins, including ANGUSTIFOLIA (AN) from Arabidopsis. AN is related to C-terminal binding protein/brefeldin A ADP-ribosylated substrate (CtBP/BARS), which has an important role in animal development. AN plays a major role in microtubule-dependent cell morphogenesis. The phenotypes of AN mutants include narrow cotyledons, narrow rosette leaves, twisted seed pods (siliques) and less-branched trichomes. Moreover, it has been shown to be involved in a wide range of biological processes, including facilitating both abiotic and biotic stress tolerance through ROS-mediated redox activity [
].
Members of this protein family are putative glutamate/gamma-aminobutyrate antiporters. Each member of the seed alignment is found adjacent to a glutamate decarboxylase, which converts glutamate (Glu) to gamma-aminobutyrate (GABA). However, the majority belong to genome contexts with a glutaminase (converts Gln to Glu) as well as the decarboxylase that converts Glu to GABA. The specificity of the transporter remains uncertain.
This entry is mostly composed of known or predicted PIN proteins from plants, though some homologous prokaryotic proteins are also included. The PIN proteins are components of auxin efflux systems from plants. These carriers are saturable, auxin-specific, and localized to the basal ends of auxin transport-competent cells [
,
]. Plants typically possess several of these proteins, each displaying a unique tissue-specific expression pattern. They are expressed in almost all plant tissues including vascular tissues and roots, and influence many processes including the establishment of embryonic polarity, plant growth, apical hook formation in seedlings and the photo- and gravitropic responses. These plant proteins are typically 600-700 amino acyl residues long and exhibit 8-12 transmembrane segments.
Barwin is a basic protein isolated from aqueous extracts of barley seeds. It is 125 amino acids in length, and contains six cysteine residues that combine to form three disulphide bridges [
,
]. Comparative analysis shows the sequence to be highly similar to a 122 amino acid stretch in the C-terminal region of the products of two wound-induced genes (win1 and win2) from potato, the product of the hevein gene of rubber trees, and pathogenesis-related protein 4 from tobacco. The high levels of similarity to these proteins, and their ability to bind saccharides, suggest that the barwin domain may be involved in a common defence mechanism in plants.
Certain Gram-negative bacteria express proteins that enable them to promote nucleation of ice at relatively high temperatures (above -5 °C) [
,
]. These proteins are localised at the outer membrane surface and can cause frost damage to many plants. The primary structure of the proteins contains a highly repetitive domain that dominates the sequence. The domain comprises a number of 48-residue repeats, which themselves contain 3 blocks of 16 residues, the first 8 of which are identical. It is thought that the repetitive domain may be responsible for aligning water molecules in the seed crystal.
[.........48.residues.repeated.domain..........]
AGYGSTxTagxxssli AGYGSTxTagxxsxlt AGYGSTxTaqxxsxlt
[16.residues...] [16.residues...] [16.residues...]
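The repeat layout above can be sketched in code: a 48-residue repeat is cut into three 16-residue blocks, and each block is checked for the shared leading octapeptide AGYGSTxT, where x is treated as any residue. The function names and the toy repeat are hypothetical; the toy simply fills the variable positions of the consensus with arbitrary residues.

```python
import re

# Leading octapeptide of each 16-residue block; "." stands for the variable x.
OCTAPEPTIDE = re.compile(r"AGYGST.T")


def split_repeat(repeat48):
    """Cut one 48-residue repeat into its three 16-residue blocks."""
    if len(repeat48) != 48:
        raise ValueError("expected exactly 48 residues")
    return [repeat48[i:i + 16] for i in range(0, 48, 16)]


def blocks_share_octapeptide(repeat48):
    """True if every 16-residue block starts with the AGYGSTxT consensus."""
    return all(OCTAPEPTIDE.match(block) for block in split_repeat(repeat48.upper()))


# Toy repeat built from the consensus shown above (variable positions made up).
toy_repeat = "AGYGSTQTAGEDSSLI" + "AGYGSTQTAGEESRLT" + "AGYGSTGTAQADSSLT"
blocks = split_repeat(toy_repeat)
```

Real repeats deviate from the consensus, so a profile or position-specific scoring approach would be more appropriate than an exact pattern for genuine sequences.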
This entry includes a group of VQ motif-containing proteins from plants, including VQ5/9/14 from Arabidopsis. VQ14, also known as HAIKU1, regulates endosperm growth and seed size in Arabidopsis [
]. In general, Arabidopsis VQPs interact specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY transcription factors [
,
]. Arabidopsis VQPs reported to control stress responses include the calmodulin (CaM)-binding protein CamBP25 and VQ9, which regulate osmotic and salinity tolerance, respectively, the sigma factor binding proteins SIB1 and SIB2, which act as activators of WRKY33 in plant defence, and the negative regulator of the jasmonate defence pathway [].
This entry includes animal NOSIP (nitric oxide synthase-interacting protein) and plant CSU1. They are ubiquitin E3 ligases [
,
]. Human NOSIP negatively regulates nitric oxide production by inducing NOS1 and NOS3 translocation to actin cytoskeleton and inhibiting their enzymatic activity [,
,
]. Arabidopsis CSU1 plays an important role in maintaining COP1 homeostasis by targeting COP1 for ubiquitination and degradation in dark-grown seedlings [
].
EMBRYONIC FLOWER1 (EMF1) is a plant-specific transcriptional regulator. It is involved in plant Polycomb-mediated gene repression and also targets flower homeotic genes directly [
,
,
]. EMF1 regulates developmental phase transitions and specifies cell fates during vegetative development [,
,
]. It also regulates additional gene programs, including photosynthesis, seed development, hormone, stress, and cold signalling [].
This entry represents a group of BTB/POZ and MATH domain-containing proteins mostly from plants, including BPM1-6 from Arabidopsis. They are part of the Cullin E3 ubiquitin ligase complexes and are known to bind at least three families of transcription factors: ERF/AP2 class I, homeobox-leucine zipper and R2R3 MYB. BPMs play an important role in plant flowering, seed development and abiotic stress response [
].
Mediator of RNA polymerase II transcription subunit 15a/b/c-like
Type:
Family
Description:
MED15 is a component of the Mediator complex, a coactivator involved in transcriptional activation. MED15 is part of the tail module of this complex and is involved in diverse metabolic and developmental pathways. In Arabidopsis, although five paralogues have been reported, expression data are available for only three of them (AtMed15a, b and c) [
]. AtMed15a (At1g15780) is expressed in several tissues including root, leaf, flower and seeds. AtMed15a has a KIX domain at its N terminus, which interacts with different proteins such as transcription factors, single strand nucleic acid-binding proteins and splicing factors to mediate their transcriptional responses [].
This entry represents a group of RING finger proteins from animals and plants, including Unkempt and related proteins.
Unkempt is an evolutionarily conserved RNA-binding protein that regulates translation of its target genes and is required for the establishment of the early bipolar neuronal morphology. It carries six CCCH zinc fingers (Znfs) forming two compact clusters, Znf1-3 and Znf4-6, that recognise distinct trinucleotide RNA substrates. These clusters recognise an unexpectedly short stretch of RNA sequence, only three consecutive ribonucleotides, with a varying degree of specificity. Znf1-3 binds to the UUA motif of RNA substrates []. Proteins in this entry also include zinc finger CCCH domain-containing proteins from Arabidopsis. They are involved in regulating stress responses [
], light-dependent seed germination [] and embryogenesis [].
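As a small illustration of what a three-nucleotide recognition element means in practice, the sketch below lists every UUA trinucleotide in an RNA string. The function name and the example sequence are hypothetical, and an occurrence of UUA is only a candidate site; it does not imply binding by Unkempt.

```python
def find_uua_sites(rna):
    """Return 1-based start positions of every UUA trinucleotide.

    DNA-style input is accepted by converting T to U first.
    """
    seq = rna.upper().replace("T", "U")
    sites = []
    start = seq.find("UUA")
    while start != -1:
        sites.append(start + 1)
        start = seq.find("UUA", start + 1)
    return sites


# Made-up RNA fragment for illustration.
sites = find_uua_sites("GGUUACCUUAUUA")
```

Because the motif is so short, it occurs frequently by chance, which is one reason the source describes the recognised stretch as unexpectedly short.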
This entry represents a mostly uncharacterised family of membrane transport proteins found in eukaryotes, bacteria and archaea. Most characterised members of this family are the PIN components of auxin efflux systems from plants. These carriers are saturable, auxin-specific, and localized to the basal ends of auxin transport-competent cells [
,
]. Plants typically possess several of these proteins, each displaying a unique tissue-specific expression pattern. They are expressed in almost all plant tissues including vascular tissues and roots, and influence many processes including the establishment of embryonic polarity, plant growth, apical hook formation in seedlings and the photo- and gravitropic responses. These plant proteins are typically 600-700 amino acyl residues long and exhibit 8-12 transmembrane segments.
Members of this protein family are radical SAM family enzymes, maturases that prepare the oxygen-sensitive radical required in the active site of anaerobic sulphatases. This maturase role has led to many misleading legacy annotations suggesting that the maturase is instead a sulphatase regulatory protein. All members of the seed alignment are radical SAM enzymes encoded next to or near an anaerobic sulphatase. Note that a single genome may encode more than one sulphatase/maturase pair. Proteins in this entry include ChuR from Bacteroides thetaiotaomicron and AnSME (also known as CPF_0616) from Clostridium perfringens. ChuR is involved in 'Ser-type' sulphatase maturation under anaerobic conditions [
], while AnSME is involved in 'Cys-type' sulphatase maturation under anaerobic conditions [].
Proteins in this entry are 3-carboxy-cis,cis-muconate cycloisomerases, which catalyse the second step in the degradation of protocatechuate to beta-ketoadipate and then to succinyl-CoA and acetyl-CoA. 4-hydroxybenzoate, 3-hydroxybenzoate, and vanillate can all be converted in one step to protocatechuate. All members of the seed alignment for this entry were chosen from within protocatechuate degradation operons of at least three genes of the pathway, and from genomes with the complete pathway through beta-ketoadipate [
].
Thionins are small, basic plant proteins, 45 to 50 amino acids in length, which include three or four conserved disulphide linkages. The proteins are toxic to animal cells, presumably attacking the cell membrane and rendering it permeable: this results in the inhibition of sugar uptake and allows potassium and phosphate ions, proteins, and nucleotides to leak from cells [
]. Thionins are mainly found in seeds where they may act as a defence against consumption by animals. A barley (Hordeum vulgare) leaf thionin that is highly toxic to plant pathogens and is involved in the mechanism of plant defence against microbial infections has also been identified []. The hydrophobic protein crambin from the Abyssinian kale (Crambe abyssinica) is also a member of the thionin family [].
Members of this protein family are the product of one of seven genes regularly clustered in operons to encode the proteins of the tol-pal system, which is critical for maintaining the integrity of the bacterial outer membrane. The gene for this periplasmic protein has been designated orf2 and ybgF, and the protein was later renamed CpoB (Coordinator of PG synthesis and OM constriction, associated with PBP1B). All members of the seed alignment were from unique tol-pal gene regions from completed bacterial genomes. The architecture of this protein is a signal sequence, a low-complexity region usually rich in Asn and Gln, and a well-conserved region with tandem repeats that resemble the tetratricopeptide (TPR) repeat, involved in protein-protein interaction []. Escherichia CpoB coordinates PBP1B and the Tol machines to maintain cell envelope integrity during division [
].
Proteins with this domain are found to be expressed in the plant embryo sac and are regulated by the Myb98 transcription factor. Computational analysis has revealed that they are homologous to the plant prolamin superfamily (Protease inhibitor-seed storage-LTP family,
) [
]. In contrast to the typical prolamin members that have eight conserved Cys residues forming four pairs of disulphide bonds, proteins with this domain only contain six conserved Cys residues that may form three pairs of disulphide bonds. They may have potential functions in lipid transfer or protection during plant embryo sac development and reproduction []. This domain includes both previous DUF784 and DUF1278 domains.
This entry includes DA1 and DAR1-7 from Arabidopsis and LIM domain-containing protein HDR3 from rice [
]. This entry also includes uncharacterised proteins from bacteria. In Arabidopsis thaliana, DA1 is a ubiquitin-activated endopeptidase that limits final seed and organ size by restricting the period of cell proliferation [
]. DA1 is activated by the RING E3 ligases Big Brother and DA2, both of which are then inactivated by cleavage by the active peptidase. DA1 also cleaves and inactivates deubiquitinase UBP15 and transcription factors TCP15 and TCP22, all of which promote cell proliferation. Presence of an HEXXH motif, which when mutated leads to inactivity, suggests that DA1 is a metalloendopeptidase [].
This entry consists of several plant wound-induced protein sequences related to WI12 from Mesembryanthemum crystallinum (Common ice plant) (
). Wounding, methyl jasmonate, and pathogen infection are known to induce local WI12 expression. WI12 expression is also thought to be developmentally controlled in the placenta and developing seeds. WI12 preferentially accumulates in the cell wall and it has been suggested that it plays a role in the reinforcement of cell wall composition after wounding and during plant development [
].
Centrosome-associated protein 350 (CEP350 or CAP350) plays an essential role in centriole growth by stabilising a procentriolar seed composed of, at least, SASS6 and CENPJ [
]. It is required for anchoring microtubules to the centrosomes and for the integrity of the microtubule network []. It also stabilises Golgi-associated microtubules and maintains a continuous pericentrosomal Golgi ribbon []. CEP350 possesses a CAP-Gly domain which is targeted to the centrosome or the Golgi-like network and binds microtubules through an N-terminal basic region [
].