Search results 12301 to 12400 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: Choice-of-anchor A domain
Type: Domain
Description: This domain may span essentially the full length of a protein, except for an N-terminal sequence and a C-terminal protein-sorting signal such as PEP-CTERM or LPXTG. More often, the putative surface protein is longer, in which case this domain is found N-terminal to repeating stalk domains in bacterial surface proteins. This is one of very few domains that occur with both anchoring signals, hence the designation choice-of-anchor A domain. The structure model of this domain is highly similar to the Ice_binding adhesive proteins. However, some of the bacterial species in which this domain is found are less likely to be water-dwelling, so the function seems unlikely to be ice binding. This domain is found in a Bacillus anthracis protein with the gene name BA_0871 or BASH2_04951, which has been described as collagen binding and as involved in bacterial pathogenicity. BA0871 has five CNA-family protein B-type repeats toward the C terminus and an LPXTG cell wall attachment motif.
Protein Domain
Name: ShET2 enterotoxin, N-terminal
Type: Domain
Description: This domain is present in the N-terminal region of the ShET2 enterotoxin produced by Shigella flexneri and Escherichia coli. This protein was found to confer toxigenicity in Ussing chamber assays, and the N-terminal region was found to be important for its enterotoxic effect. The N-terminal domain is a cysteine-type peptidase with a Cys/His/Asp catalytic triad. It cleaves within the receptor-interacting protein homotypic interaction motifs of host adaptor proteins, such as the receptor-interacting serine/threonine protein kinases RIPK1 and RIPK3, TIR-domain-containing adapter-inducing interferon beta and Z-DNA-binding protein 1, inactivating them and thus inhibiting necroptosis and inflammatory signalling. The toxin is injected into the host cell by the type III secretion system. Most proteins containing this domain are annotated as putative enterotoxins, but one member is a regulator of acetyl CoA synthetase, and another two members are annotated as ankyrin-like regulatory proteins and contain Ank repeats.
Protein Domain
Name: Minor tail T
Type: Domain
Description: This entry represents the 'T' domain of the tail assembly protein GT from lambda-like viruses and their prophage. In bacteriophage lambda, the overlapping open reading frames G and T are expressed by a programmed translational frameshift to produce the tail assembly proteins G and GT. Tail assembly protein GT shares its N-terminal residues with tail assembly protein G, followed by residues of unique sequence. An analogous frameshift is widely conserved among other dsDNA tailed phages in their corresponding 'G' and 'GT' tail genes, even in the absence of detectable sequence homology. The lambda tail assembly protein G and the frameshift product GT are produced in a molar ratio of approximately 30:1. The correct molar ratio of these two related proteins, normally determined by the efficiency of the frameshift, is crucial for efficient assembly of functional tails. Although tail assembly proteins G and GT are both required for assembly of functional tails, neither is present in mature tails.
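Illustrative note (not part of the database record): the 30:1 molar ratio quoted above can be turned into an approximate frameshift efficiency. The sketch below assumes, hypothetically, that every translation event yields either G or GT and that both products are equally stable; the function name frameshift_fraction is invented for this example.

```python
def frameshift_fraction(g_to_gt_ratio: float) -> float:
    """Estimate the fraction of translation events that frameshift.

    Assumption (not from the source record): each ribosome produces either
    G (no frameshift) or GT (frameshift), and both proteins are equally
    stable, so the molar ratio mirrors the ratio of translation outcomes.
    """
    if g_to_gt_ratio <= 0:
        raise ValueError("ratio must be positive")
    # With G:GT = r:1, GT accounts for 1 out of every (r + 1) products.
    return 1.0 / (g_to_gt_ratio + 1.0)


fraction = frameshift_fraction(30.0)
print(f"Estimated frameshift efficiency: {fraction:.1%}")
```

Under these assumptions a 30:1 ratio corresponds to roughly one frameshift in every 31 translation events, or about 3%.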
Protein Domain
Name: ERp29, N-terminal
Type: Domain
Description: ERp29 is a ubiquitously expressed endoplasmic reticulum protein that is involved in protein maturation and protein secretion in this organelle. The protein exists as a homodimer, with each monomer composed of two domains. The N-terminal domain featured in this family is organised into a thioredoxin-like fold that resembles the a domain of human protein disulphide isomerase (PDI). However, this domain lacks the C-X-X-C motif required for the redox function of PDI; it is therefore thought that the function of ERp29 is similar to the chaperone function of PDI. The N-terminal domain is exclusively responsible for the homodimerisation of the protein, without covalent linkages or additional contacts with other domains. The Drosophila homologue, Wind, is the product of windbeutel, an essential gene in the development of dorsal-ventral patterning. Wind is required for correct targeting of Pipe, a Golgi-resident type II transmembrane protein with homology to 2-O-sulfotransferase.
Protein Domain
Name: LppX/LprAFG lipoprotein family
Type: Family
Description: This entry consists of several lipoproteins, mainly from Mycobacterium species, collectively known as the LppX/LprAFG family. Proteins in this entry include: LprG from Mycobacterium tuberculosis, an immunogenic 27 kDa membrane-associated lipoprotein. Expression of LprG is essential for the growth of M. tuberculosis in immunocompetent mice. Studies with purified LprG showed that it inhibits MHC-II antigen processing in primary human macrophages, providing a mechanism to avoid the host MHC-II-restricted CD4+ T cell response, which is considered essential for control of M. tuberculosis infection. LppX, a lipoprotein required for the translocation of complex lipids to the outer membrane of Mycobacterium tuberculosis. Its structure consists of a U-shaped β-half-barrel with a large hydrophobic cavity. LprF, a membrane lipoprotein involved in the kdp signal transduction pathway, thought to be the primary response to osmotic stress. LprA, a lipoprotein agonist of TLR2 that regulates innate immunity and APC function.
Protein Domain
Name: Endoplasmic reticulum vesicle transporter, N-terminal
Type: Domain
Description: This entry represents the N-terminal domain of the endoplasmic reticulum vesicle transporter proteins (Ervs), also known as endoplasmic reticulum-Golgi intermediate compartment (ERGIC) proteins. Proteins included in this entry are conserved from plants and fungi to humans. Erv46 (ERGIC3) works in close conjunction with Erv41 (ERGIC2) and together they form a complex which cycles between the endoplasmic reticulum and Golgi complex. Erv46-41 interacts strongly with the endoplasmic reticulum glucosidase II. Mammalian glucosidase II comprises a catalytic alpha subunit and a 58 kDa beta subunit, which is required for ER localisation. All proteins identified biochemically as Erv41p-Erv46p interactors are localised to the early secretory pathway and are involved in protein maturation and processing in the ER and/or sorting into COPII vesicles for transport to the Golgi. Proteins containing this domain also include protein disulfide isomerase (PDI)-C subfamily members from Arabidopsis. They are chimeric proteins containing the thioredoxin (Trx) domain of PDIs, and the conserved N- and C-terminal domains of Erv cargo receptors.
Protein Domain
Name: TRIM34, PRY/SPRY domain
Type: Domain
Description: This domain, consisting of the distinct N-terminal PRY subdomain followed by the SPRY subdomain, is found at the C terminus of E3 ubiquitin-protein ligase TRIM34, also known as RING finger protein 21 (RNF21) or interferon-responsive finger protein (IFP1). TRIM proteins are defined by the presence of the tripartite motif RING/B-box/coiled-coil region and are also known as RBCC proteins. The TRIM34 cDNA occurs in at least three isoforms as a result of alternative splicing, of which only the long and medium forms contain the SPRY domain. TRIM34 is an interferon-induced protein, predominantly expressed in the testis, kidney, and ovary. It functions as an antiviral protein and contributes to the defense against retroviral infections. The SPRY domain provides the capsid recognition motif that dictates the specificity of retroviral restriction. Although the PRY/SPRY domain provides this specificity, TRIM34 binds the HIV-1 capsid but does not restrict HIV-1 infection.
Protein Domain
Name: BET1, SNARE domain
Type: Domain
Description: This entry represents the SNARE domain of BET1-like protein. BET1-like protein (also known as Golgi SNARE with a size of 15 kDa, GS15 or GOS-15) is required for movement of vesicles within the Golgi. In one hypothesis of how proteins move through the Golgi, vesicles containing proteins bud from one cisterna and then fuse with the next cisterna in the stack. The fusion of the vesicle with the cisterna membrane requires soluble N-ethylmaleimide-sensitive factor attachment protein receptors (SNAREs). SNAREs on a transport vesicle are known as v-SNAREs, and those on the target compartment are known as t-SNAREs; a v-SNARE and t-SNAREs together form a trans-SNARE complex. GS15 is a v-SNARE protein found in the medial cisternae of the Golgi apparatus, and it forms a complex with three t-SNAREs (Syntaxin5, GS28, and Ykt6) to form a distinct GS15/Syn5/GS28/Ykt6 SNARE complex. This complex is required for endoplasmic reticulum to Golgi transport and intra-Golgi transport.
Protein Domain
Name: Ferritin-like superfamily
Type: Homologous_superfamily
Description: Ferritin is one of the major non-haem iron storage proteins in animals, plants, and microorganisms. It is a multisubunit protein with a hollow interior, which contains a mineral core of hydrated ferric oxide, thereby ensuring its solubility in an aqueous environment. Each subunit consists of a closed, four-helical bundle with a left-handed twist and one crossover connection. This entry represents the ferritin-like superfamily. Proteins with this structure include ferritin and other ferritin-like proteins such as bacterioferritin (cytochrome b1) that binds haem between two subunits, non-haem ferritin, dodecameric ferritin homologue (DPS) that binds to and protects DNA, and the N-terminal domain of rubrerythrin that is found in many air-sensitive bacteria and archaea. In addition, ribonucleotide reductase-like proteins show a structure similar to the ferritin-like fold; these di-iron carboxylate proteins constitute a diverse class of non-haem iron enzymes performing a multitude of redox reactions. The superfamily also includes the alpha and beta subunits of methane monooxygenase hydroxylase, delta 9-stearoyl-acyl carrier protein desaturase and manganese catalase.
Protein Domain
Name: PAS domain
Type: Domain
Description: PAS domains occur in many signalling proteins, where they are used as a signal sensor domain. PAS domains appear in archaea, bacteria and eukaryotes. Several PAS-domain proteins are known to detect their signal by way of an associated cofactor. Heme, flavin, and a 4-hydroxycinnamyl chromophore are used in different proteins. The PAS domain was named after three proteins that it occurs in: Per (period circadian protein), Arnt (Ah receptor nuclear translocator protein) and Sim (single-minded protein). PAS domains are often associated with PAC domains. It appears that these domains are directly linked, and that together they form the conserved 3D PAS fold. The division between the PAS and PAC domains is caused by major differences in sequences in the region connecting these two motifs. In human PAS kinase, this region has been shown to be very flexible, and adopts different conformations depending on the bound ligand. Probably the most surprising identification of a PAS domain was that in EAG-like K-channels.
Protein Domain
Name: Radical-activating enzyme, conserved site
Type: Conserved_site
Description: In Escherichia coli and related bacteria, the pflA protein (or act) is involved in the activation of pyruvate formate-lyase (gene pflB) under anaerobic conditions by generation of an organic free radical, using S-adenosylmethionine and reduced flavodoxin as cosubstrates to produce 5'-deoxy-adenosine. The activity of pflA is iron-dependent. A protein highly similar to pflA and termed pflC is probably involved in the activation of a second pyruvate formate-lyase (gene pflD). The pflA/pflC proteins belong to a family that also includes Bacteriophage T4 and E. coli nrdG, which are involved in the generation of the free radical for the anaerobic ribonucleoside-triphosphate reductase (gene nrdD or sunY). It also includes E. coli hypothetical protein yjjW, Haemophilus influenzae hypothetical protein HI0520 and Methanocaldococcus jannaschii (Methanococcus jannaschii) hypothetical protein MJ0021. All these proteins possess, in their N-terminal section, a highly conserved region which contains three clustered cysteines that are involved in iron binding.
Protein Domain
Name: Pumilio, RNA binding domain
Type: Domain
Description: Puf repeats (also labelled PUM-HD or Pumilio homology domain) mediate sequence-specific RNA binding in fly Pumilio, worm FBF-1 and FBF-2, and many other proteins such as vertebrate Pumilio. These proteins function as translational repressors in early embryonic development by binding to sequences in the 3' UTR of target mRNAs, such as the nanos response element (NRE) in fly Hunchback mRNA, or the point mutation element (PME) in worm fem-3 mRNA. Other proteins that contain Puf domains are also plausible RNA binding proteins. Yeast PUF1 (JSN1), for instance, appears to contain a single RNA-recognition motif (RRM) domain. Puf repeat proteins have been observed to function asymmetrically and may be responsible for creating protein gradients involved in the specification of cell fate and differentiation. Puf domains usually occur as a tandem repeat of 8 domains. This domain encompasses all 8 tandem repeats. Some proteins may have fewer (canonical) repeats.
Protein Domain
Name: PAS domain superfamily
Type: Homologous_superfamily
Description: PAS domains occur in many signalling proteins, where they are used as a signal sensor domain. PAS domains appear in archaea, bacteria and eukaryotes. Several PAS-domain proteins are known to detect their signal by way of an associated cofactor. Heme, flavin, and a 4-hydroxycinnamyl chromophore are used in different proteins. The PAS domain was named after three proteins that it occurs in: Per (period circadian protein), Arnt (Ah receptor nuclear translocator protein) and Sim (single-minded protein). PAS domains are often associated with PAC domains. It appears that these domains are directly linked, and that together they form the conserved 3D PAS fold. The division between the PAS and PAC domains is caused by major differences in sequences in the region connecting these two motifs. In human PAS kinase, this region has been shown to be very flexible, and adopts different conformations depending on the bound ligand. Probably the most surprising identification of a PAS domain was that in EAG-like K-channels.
Protein Domain
Name: RGS12, RGS domain
Type: Domain
Description: The RGS (Regulator of G-protein Signaling) domain is an essential part of the RGS12 protein. RGS12 is a member of the RA/RGS subfamily of the RGS protein family, a diverse group of multifunctional proteins that regulate cellular signaling events downstream of G-protein coupled receptors (GPCRs). As major G-protein regulators, RGS domain-containing proteins are involved in many crucial cellular processes such as regulation of intracellular trafficking, glial differentiation, embryonic axis formation, skeletal and muscle development, and cell migration during early embryogenesis. RGS12 belongs to the R12 RGS subfamily, which includes RGS10 and RGS14, all of which are highly selective for G-alpha-i1 over G-alpha-q. RGS12 exists in multiple splice variants: RGS12s (short) contains the core RGS/RBD/GoLoco domains, while RGS12L (long) has additional N-terminal PDZ and PTB domains. RGS12 splice variants show distinct expression patterns, suggesting that they have discrete functions during mouse embryogenesis. RGS12 may also play a critical role in coordinating Ras-dependent signals that are required for promoting and maintaining neuronal differentiation.
Protein Domain
Name: ALMS motif
Type: Domain
Description: This domain is found at the C terminus of centrosome-associated protein ALMS1, centrosomal protein of 295 kDa (CEP295, also known as KIAA1731) and (E2-independent) E3 ubiquitin-conjugating enzyme FATS (also known as C10orf90). These proteins play a role in centrosomal functions. ALMS1 is implicated in ciliary function, cell cycle control, and intracellular transport. It interacts with alpha-actinin and components of the endosomal recycling pathway. CEP295 is a centriole-enriched microtubule-binding protein involved in centriole biogenesis, essential for the generation of the distal portion of newborn centrioles and recruitment of centriolar proteins, such as POC1B, POC5 and CEP135. Centrosome-associated protein Alms1a, the Drosophila homologue of centrosome-associated protein ALMS1, plays a critical role in ensuring centrosome duplication in asymmetrically dividing germline stem cells (GSCs), which is essential for the production of centrosomes and centrioles in all downstream germ cells. This domain directly interacts with Klp10, a microtubule-depolymerizing kinesin of the kinesin-13 family, at the GSC centrosome. Alms1a might recruit SAK for daughter centriole duplication.
Protein Domain
Name: Monothiol glutaredoxin-related
Type: Family
Description: Glutaredoxins, also known as thioltransferases (disulphide reductases), are small proteins of approximately one hundred amino-acid residues which utilise glutathione and NADPH as cofactors. Oxidized glutathione is regenerated by glutathione reductase. Together these components compose the glutathione system. Glutaredoxin functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin (TRX), which functions in a similar way, glutaredoxin possesses an active centre disulphide bond. It exists in either a reduced form or an oxidized form in which the two cysteine residues are linked by an intramolecular disulphide bond. It contains a redox-active CXXC motif in a TRX fold and uses a dithiol mechanism similar to that employed by TRXs for intramolecular disulfide bond reduction of protein substrates. Unlike TRX, GRX has a preference for mixed GSH disulfide substrates, for which it uses a monothiol mechanism where only the N-terminal cysteine is required. The flow of reducing equivalents in the GRX system goes from NADPH -> GSH reductase -> GSH -> GRX -> protein substrates. By altering the redox state of target proteins, GRX is involved in many cellular functions including DNA synthesis, signal transduction and the defense against oxidative stress. Glutaredoxin has been sequenced in a variety of species. On the basis of extensive sequence similarity, it has been proposed that Vaccinia virus protein O2L is most probably a glutaredoxin. Finally, it must be noted that Bacteriophage T4 thioredoxin also seems to be evolutionarily related. In position 5 of the pattern, T4 thioredoxin has Val instead of Pro. This family groups proteins from different organisms which are related to monothiol glutaredoxins. Monothiol glutaredoxins occur in different subcellular compartments; for instance, yeast Grx3 and Grx4 are nuclear proteins, whereas Grx5 is mitochondrially localized.
They share a common basic structural motif and biochemical mechanism of action, while participating in a diversity of cellular functions as protein redox regulators.
Protein Domain
Name: Clathrin, heavy chain, propeller repeat
Type: Repeat
Description: This entry represents the propeller repeat found in clathrin heavy chains. The N terminus of the heavy chain is known as the globular domain, and is composed of seven repeats which form a beta propeller. Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (which acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. Clathrin is a trimer composed of three heavy chains and three light chains, each monomer projecting outwards like a leg; this three-legged structure is known as a triskelion. The heavy chains form the legs, their N-terminal β-propeller regions extending outwards, while their C-terminal α-α-superhelical regions form the central hub of the triskelion. Peptide motifs can bind between the β-propeller blades. The light chains appear to have a regulatory role, and may help orient the assembly and disassembly of clathrin coats as they interact with hsc70 uncoating ATPase. Clathrin triskelia self-polymerise into a curved lattice by twisting individual legs together. The clathrin lattice forms around a vesicle as it buds from the TGN, plasma membrane or endosomes, acting to stabilise the vesicle and facilitate the budding process.
The multiple blades created when the triskelia polymerise are involved in multiple protein interactions, enabling the recruitment of different cargo adaptors and membrane attachment proteins.
Protein Domain
Name: Clathrin heavy chain, N-terminal
Type: Homologous_superfamily
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (which acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. Clathrin is a trimer composed of three heavy chains and three light chains, each monomer projecting outwards like a leg; this three-legged structure is known as a triskelion. The heavy chains form the legs, their N-terminal β-propeller regions extending outwards, while their C-terminal α-α-superhelical regions form the central hub of the triskelion. Peptide motifs can bind between the β-propeller blades. The light chains appear to have a regulatory role, and may help orient the assembly and disassembly of clathrin coats as they interact with hsc70 uncoating ATPase. Clathrin triskelia self-polymerise into a curved lattice by twisting individual legs together. The clathrin lattice forms around a vesicle as it buds from the TGN, plasma membrane or endosomes, acting to stabilise the vesicle and facilitate the budding process. The multiple blades created when the triskelia polymerise are involved in multiple protein interactions, enabling the recruitment of different cargo adaptors and membrane attachment proteins.
This entry represents a region covering the N-terminal β-propeller region of clathrin heavy chains that extends away from the hub of triskelia, and which is responsible for peptide binding, as well as the core motif for the α-helical zigzag linker region connecting the conserved N-terminal β-propeller region to the C-terminal α-α-superhelical region in clathrin heavy chains.
Protein Domain
Name: Photosystem II PsbR
Type: Family
Description: Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection. This family represents the low molecular weight intrinsic protein PsbR found in PSII, which is also known as the 10 kDa polypeptide. The PsbR gene is found only in the nucleus of green algae and higher plants. PsbR may provide a binding site for the extrinsic oxygen-evolving complex protein PsbP to the thylakoid membrane.
PsbR has a transmembrane domain to anchor it to the thylakoid membrane, and a charged N-terminal domain capable of forming ion bridges with extrinsic proteins, allowing PsbR to act as a docking protein. PsbR may be a pH-dependent stabilising protein that functions at both donor and acceptor sides of PSII.
Protein Domain
Name: GrpE nucleotide exchange factor
Type: Family
Description: Molecular chaperones are a diverse family of proteins that function to protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme that couples cycles of ATP binding, hydrolysis, and ADP release by an N-terminal ATP-hydrolysing domain to cycles of sequestration and release of unfolded proteins by a C-terminal substrate binding domain. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. In prokaryotes the dimeric GrpE is the co-chaperone for DnaK, and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold. GrpE participates actively in the response to heat shock by preventing aggregation of stress-denatured proteins: unfolded proteins initially bind to DnaJ, the J-domain ATPase-activating protein (Hsp40 family), whereupon DnaK hydrolyzes its bound ATP, resulting in a stable complex. The GrpE dimer binds to the ATPase domain of Hsp70, catalyzing the dissociation of ADP, which enables rebinding of ATP, one step in the Hsp70 reaction cycle in protein folding. Thus the co-chaperones DnaJ and GrpE are capable of tightly regulating the nucleotide-bound and substrate-bound state of DnaK in ways that are necessary for the normal housekeeping functions and stress-related functions of the DnaK molecular chaperone cycle. In eukaryotes, only the mitochondrial Hsp70, not the cytosolic form, is GrpE dependent.
Over-expression of Hsp70 molecular chaperones is important in suppressing toxicity of aberrantly folded proteins that occur in Alzheimer's disease (AD), Parkinson's disease (PD), amyotrophic lateral sclerosis, as well as several polyQ diseases such as Huntington's disease and ataxias. The X-ray crystal structure of GrpE in complex with the ATPase domain of DnaK revealed that GrpE is an asymmetric homodimer, bent in a manner that favours extensive contacts with only one DnaK ATPase monomer. GrpE does not actively compete for the atomic positions occupied by the nucleotide. GrpE and ADP mutually reduce one another's affinity for DnaK 200-fold, and ATP instantly dissociates GrpE from DnaK.
Protein Domain
Name: MT-A70-like
Type: Family
Description: N6-methyladenosine (m6A) is present at internal sites in some mRNAs. m6A affects different aspects of mRNA metabolism, such as half-life, splicing, and translation. MT-A70 (also known as METTL3) is the S-adenosylmethionine-binding subunit of human mRNA N6-adenosine-methyltransferase (MTase), an enzyme that sequence-specifically methylates adenines in pre-mRNAs. Proteins with sequence similarity to MT-A70 have been identified in eukaryotes and prokaryotes. The resulting family is defined by sequence similarity in the carboxyl-proximal regions of the respective proteins. The amino-proximal regions of the eukaryotic proteins are highly diverse, often Pro-rich, and are conserved only within individual subfamilies. Corresponding regions are not present in prokaryotic members of the family. MT-A70-like proteins contain examples of some of the consensus methyltransferase motifs that have been derived from mutational and structural studies of bacterial DNA methyltransferases, including the universally conserved motif IV catalytic residues and a proposed motif I (AdoMet binding) element. The MT-A70-like family comprises four subfamilies with varying degrees of interrelatedness. One subfamily is a small group of bacterial DNA:m6A MTases. The other three are paralogous eukaryotic lineages, two of which have not been associated with MTase activity but include proteins that regulate mRNA levels via unknown mechanisms apparently not involving methylation. Some proteins known to belong to the MT-A70-like family are listed below: Human N6-adenosine-methyltransferase 70 kDa subunit (MT-A70 or METTL3), the catalytic component of the METTL3-METTL14 heterodimer that forms the N6-methyltransferase complex that methylates adenosine residues at the N6 position of some RNAs.
Human N6-adenosine-methyltransferase non-catalytic subunit (METTL14), the non-catalytic component of the METTL3-METTL14 heterodimer. Yeast N6-adenosine-methyltransferase IME4, which is important for induction of sporulation. Yeast karyogamy protein KAR4, a phosphoprotein that is required for expression of karyogamy-specific genes during mating and that also acts during mitosis and meiosis. It has been suggested that KAR4 is inactive for methyltransfer and may not even bind AdoMet.
Protein Domain
Name: Signal transduction response regulator, C-terminal effector
Type: Homologous_superfamily
Description: Two-component signal transduction systems enable bacteria to sense, respond, and adapt to a wide range of environments, stressors, and growth conditions. Some bacteria can contain as many as 200 two-component systems that need tight regulation to prevent unwanted cross-talk. These pathways have been adapted to respond to a wide variety of stimuli, including nutrients, cellular redox state, changes in osmolarity, quorum signals, antibiotics, and more. Two-component systems comprise a sensor histidine kinase (HK) and its cognate response regulator (RR). The HK catalyses its own auto-phosphorylation followed by the transfer of the phosphoryl group to the receiver domain on the RR; phosphorylation of the RR usually activates an attached output domain, which can then effect changes in cellular physiology, often by regulating gene expression. Some HKs are bifunctional, catalysing both the phosphorylation and dephosphorylation of their cognate RR. The input stimuli can regulate either the kinase or phosphatase activity of the bifunctional HK. A variant of the two-component system is the phospho-relay system. Here a hybrid HK auto-phosphorylates and then transfers the phosphoryl group to an internal receiver domain, rather than to a separate RR protein. The phosphoryl group is then shuttled to histidine phosphotransferase (HPT) and subsequently to a terminal RR, which can evoke the desired response. This entry represents a structural domain usually found at the C terminus of bipartite response regulators. These proteins are known to bind to DNA and RNA polymerases, and their N-terminal receiver domain belongs to the CheY family. The C-terminal effector domain consists of a 3-helical bundle in an up-and-down arrangement with a right-handed twist.
This domain occurs in: PhoB-like proteins, which include PhoB, OmpR, and DrrB; these proteins contain a 4-stranded meander β-sheet in the N-terminal extension. GerE-like proteins from the LuxR/UhpA family of proteins, which include GerE, TraR (quorum-sensing), NarL (nitrate/nitrite response regulator), and RcsB transcriptional regulator; these proteins contain an additional fourth helix in the C-terminal extension. Spo0A proteins, which are elaborated with additional helices.
Protein Domain
Name: CRISPR-associated protein, Csx11
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and most archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. Members of this uncommon, sporadically distributed Cas protein family are large (>900 amino acids) and strictly associated, so far, with CRISPR-associated (Cas) gene clusters.
Nearby Cas genes always include members of the RAMP superfamily and the six-gene CRISPR-associated RAMP module. Species in which it is found, so far, include three archaea (Methanosarcina mazei, Methanosarcina barkeri and Methanobacterium thermoautotrophicum) and two bacteria (Thermodesulfovibrio yellowstonii DSM 11347 and Sulfurihydrogenibium azorense).
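The three stages of the defense reaction described above (adaptation, expression, interference) can be pictured with a toy string model. This is only an illustrative sketch in Python: the class `ToyCrisprLocus`, its method names, the fixed spacer length and the example sequences are all hypothetical, and real systems involve spacer selection rules, complementary base pairing and Cas nucleases that are not modelled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCrisprLocus:
    """Toy model of a CRISPR locus: an ordered list of stored spacers."""
    spacer_len: int = 8  # assumed fixed spacer length, for illustration only
    spacers: List[str] = field(default_factory=list)

    def adapt(self, invader_dna: str) -> None:
        """Adaptation: store a piece of the invader DNA as a new spacer."""
        spacer = invader_dna[: self.spacer_len]
        if spacer not in self.spacers:
            self.spacers.append(spacer)

    def express(self) -> List[str]:
        """Expression: turn each spacer into an RNA-like guide (T -> U)."""
        return [spacer.replace("T", "U") for spacer in self.spacers]

    def interferes(self, invader_dna: str) -> bool:
        """Interference: True if any guide matches the invader sequence.

        Matching is plain substring identity, a simplification of
        base-pairing between crRNA and target.
        """
        target = invader_dna.replace("T", "U")
        return any(guide in target for guide in self.express())


locus = ToyCrisprLocus()
phage = "ATGCCGTAAGCTTAGG"          # hypothetical invader sequence
first_encounter = locus.interferes(phage)   # no stored spacer yet
locus.adapt(phage)                          # locus records the invader
second_encounter = locus.interferes(phage)  # now recognised
unrelated = locus.interferes("GGGGCCCCAAAATTTT")
print(first_encounter, second_encounter, unrelated)
```

In this sketch the locus only recognises an invader after a spacer derived from it has been stored, which is the sense in which the spacer acts as an identity tag for future attacks.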
Protein Domain
Name: CALCOCO1/2, zinc finger UBZ1-type
Type: Domain
Description: The ubiquitin-binding zinc finger (UBZ) is a type of zinc-coordinating β-β-α fold domain found mainly in proteins involved in DNA repair and transcriptional regulation. UBZ domains coordinate a zinc ion with cysteine or histidine residues; depending on their amino acid sequence, UBZ domains are classified into several families. Type 1 UBZs are CCHH-type zinc fingers found in the tandem UBZ domains of TAX1-binding protein 1 (TAX1BP1); type 2 UBZs are CCHC-type zinc fingers found in FAAP20, which is a subunit of the Fanconi anemia (FA) core complex; type 3 UBZs are CCHH-type zinc fingers found only in the Y-family translesion polymerase eta; and type 4 UBZs are CCHC-type zinc fingers found in Y-family translesion polymerase kappa, Werner helicase-interacting protein 1 (WRNIP1), and Rad18. The UBZ domain consists of two short antiparallel β-strands followed by one α-helix. The α-helix packs against the β-strands with a zinc ion sandwiched between the α-helix and the β-strands. The zinc ion is coordinated by two cysteines located on the fingertip formed by the β-strands and by two histidines, or one histidine and one cysteine, on the α-helix. This entry represents the UBZ1-type zinc finger domain found in calcium-binding and coiled-coil domain 1/2 (CALCOCO1/2), Tax1-binding protein 1 and protein spindle-F. This domain is a typical C2H2-type zinc finger which specifically recognizes mono-ubiquitin or poly-ubiquitin chains. The overall ubiquitin-binding mode utilizes the C-terminal α-helix to interact with the solvent-exposed surface of the central β-sheet of ubiquitin, similar to that observed in the RABGEF1/Rabex-5 or POLH/Pol-eta zinc finger. CALCOCO2 (also known as NDP52) is a ubiquitin-binding autophagy receptor involved in the selective autophagic degradation of invading pathogens.
Tax1-binding protein 1 is a ubiquitin-binding protein, and protein spindle-F plays a role in oocyte axis determination and microtubule organization during oogenesis in Drosophila.
Protein Domain
Name: Major vault protein, N-terminal
Type: Repeat
Description: Vaults are the largest ribonucleoprotein particles known, having a mass of approximately 13 MDa. They are multi-subunit structures that may act as scaffolds for proteins involved in signal transduction and may also play a role in nucleo-cytoplasmic transport. Vaults are present in most normal tissues, but are more highly expressed in epithelial cells with secretory and excretory functions, as well as in cells chronically exposed to xenobiotics, such as bronchial cells and cells lining the intestine. Overexpression of these proteins is linked with multidrug resistance in cancer cells. The mammalian vault structure is highly regular and consists of approximately 96 molecules of the 100 kDa major vault protein (MVP), 2 molecules of the 240 kDa minor vault protein TEP1, 8 molecules of the 193 kDa minor vault protein VPARP and at least 6 copies of a small untranslated RNA of 88-141 bases. The MVP molecules form the core of the complex, which is a barrel-like structure with an invaginated waist and two protruding caps. The complex can unfold into two symmetrical flower-like structures with 8 petals, each supposedly consisting of 6 MVP molecules. MVP is composed of two distinct domains. The N-terminal domain contains ~8 copies of the vault repeat (or MVP repeat) in tandem. The MVP repeat is composed of ~53 amino acids and forms a structural part of the vault wall. The C-terminal part of MVP may be involved in oligomerization and be located in the vault cap, while the MVP repeats in the N-terminal part can be packed like staves in a barrel to form the vault wall. The 3D structure of the repeat forms a fold that consists of a three-stranded (B) antiparallel β-sheet in a unique topology B2-B1-B3 and two loops. MVP repeats can be interaction-mediating modules, as MVP repeats 3 and 4 bind VPARP, which is one of the other vault proteins.
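The stoichiometry quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that uses only the copy numbers and subunit masses stated in the description; the names `components` and `protein_mass_kda` are illustrative and not part of any database API. The small vault RNA is left out because its mass is not given.

```python
# Copy number and approximate subunit mass (kDa), as quoted in the description.
components = {
    "MVP": (96, 100),
    "TEP1": (2, 240),
    "VPARP": (8, 193),
}


def protein_mass_kda(parts):
    """Sum copies x mass over all protein components, in kDa."""
    return sum(copies * mass for copies, mass in parts.values())


total_kda = protein_mass_kda(components)

# Two half-vaults ("flowers"), 8 petals each, 6 MVP molecules per petal.
mvp_from_petals = 2 * 8 * 6

print(total_kda)         # 11624
print(mvp_from_petals)   # 96
```

Under these assumptions the three proteins account for roughly 11.6 MDa of the approximately 13 MDa quoted for the whole particle, and the petal geometry reproduces the 96 MVP copies. The quoted masses and copy numbers are approximate, so the remaining difference should not be over-interpreted.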
Protein Domain
Name: CRISPR-associated protein, MJ0385
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and most archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a family of Cas proteins that tends to be found near CRISPR repeats. The species range for family members, so far, is exclusively archaeal.
It is found so far in only four different species, and includes two tandem genes in Pyrococcus furiosus DSM 3638. This family is also known as Csa4 and Cas8a2 Type I-A.
Protein Domain
Name: CRISPR-associated protein, CsaX
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and most archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a minor CRISPR-associated protein.
So far, members are found only in the context of the (strictly archaeal) Apern subtype of CRISPR/Cas system, and are further restricted to the Sulfolobales, including Metallosphaera sedula DSM 5348 and multiple species of the genus Sulfolobus.
Protein Domain
Name: CRISPR-associated protein, TM1812
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and most archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a family of Cas proteins, including TM1812 from Thermotoga maritima and MJ1674 from Methanocaldococcus jannaschii.
Family members are also found in Vibrio vulnificus (strain YJ016), Nitrosomonas europaea (strain ATCC 19718), a large plasmid of Synechocystis sp. (strain PCC 6803), and Fibrobacter succinogenes subsp. succinogenes S85.
Protein Domain
Name: Putative CRISPR-associated protein, VVA1548 family
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and most archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a conserved region of about 95 amino acids found exclusively in species with CRISPR repeats.
In all bacterial species that contain this entry, the genes encoding the proteins are in the midst of a cluster of cas genes.
Protein Domain
Name: Helicase Cas3, CRISPR-associated, Yersinia-type
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and most archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents the Yersinia-type Cas3 family of helicases.
The Yersinia-type Cas3 helicases differ from the more common Cas3 proteins by being considerably larger, though they still share a number of motifs. They replace Cas3 in some CRISPR/cas loci in a number of Proteobacteria, including Yersinia pestis, Chromobacterium violaceum, Erwinia carotovora subsp. atroseptica SCRI1043, Photorhabdus luminescens subsp. laumondii TTO1, and Legionella pneumophila.
Protein Domain
Name: CRISPR system CMR subunit Cmr7 1
Type: Family
Description: In the CRISPR-Cas system, RNA is targeted by the CMR complex. In Sulfolobus solfataricus, this complex is composed of seven Cas protein subunits (Cmr1-7) and carries a diverse "payload" of targeting crRNA. This entry represents the Cmr7 subunit of the CMR complex. The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and most archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense).
The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability.
Protein Domain
Name: HOOK, N-terminal
Type: Domain
Description: The Hook family consists of several proteins from different eukaryotic organisms, first identified in Drosophila melanogaster, in which they play a role in endocytic cargo sorting. In Drosophila and fungi there is a single Hook gene, whereas mammals have three Hook genes, Hook1, Hook2 and Hook3. Endogenous Hook3 binds to Golgi membranes while both Hook1 and Hook2 are localised to discrete but unidentified cellular structures. In mice the Hook1 gene is predominantly expressed in the testis. Hook1 function is necessary for the correct positioning of microtubule structures within the haploid germ cell. Disruption of Hook1 function in mice causes abnormal sperm head shape and fragile attachment of the flagellum to the sperm head. Hook proteins are a widely expressed class of dynein-associated cargo adaptor proteins that contain several domains. The N-terminal part of these proteins is sufficient to form a stable complex with dynein-dynactin and includes the most conserved region within the first 160 amino acids, termed the Hook domain. This domain is followed by three coiled-coil domains, important for dimerization and activation of dynein-dynactin complex motility, and then a C-terminal domain that binds a variety of proteins specific for each Hook isoform, involved in binding to specific organelles (organelle-binding domains). All mammalian Hook isoforms form a complex with Fused Toes and the Fused Toes- and Hook-interacting protein; fungal homologues of these proteins are important for dynein-mediated early endosome transport by linking Hook to the cargo. This entry includes residues in the first 160 amino acids at the N terminus of Hook, which is the most conserved region and is necessary for dynein-dynactin interaction. It interacts with dynein light intermediate chain 1 (LIC1).
This domain is also found in the proteins Daple and Girdin, which are G-protein modulators involved in ciliogenesis and cilium morphology, integrity of the actin cytoskeleton, formation of actin stress fibres and lamellipodia, and membrane sorting in the early endosome.
Protein Domain
Name: Lipocalin, OBP-like
Type: Family
Description: The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel β-barrel fold well conserved within the family. Sequence similarity within the family is at a much lower level and would seem to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes. The anti-parallel β-barrel fold is also exploited by the fatty acid-binding proteins (which function similarly by binding small hydrophobic molecules), by avidin and the closely related metalloprotease inhibitors, and by triabin. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif. The lipocalin family can be subdivided into kernel and outlier sets. The kernel lipocalins form the largest self-consistent group, comprising the subfamily of odour-binding proteins. The outlier lipocalins form several smaller distinct subgroups: the OBPs, the von Ebner's gland proteins, alpha-1-acid glycoproteins, tick histamine binding proteins and the nitrophorins. Odour Binding Proteins (OBPs) are associated with olfactory tissue, and seem able to bind odorant molecules with high specificity. Rattus norvegicus (Rat) OBP is localised to the lateral nasal, or Steno's, gland, the largest of the 20 discrete nasal glands of the rat. A similar protein from the olfactory tissue of Rana pipiens (Northern leopard frog), which was named protein BG (Bowman's gland), has been identified, cloned and sequenced. It is thought that the OBPs may function by concentrating and delivering odorant molecules to their receptors. Aphrodisin is the major macromolecular component of hamster vaginal discharge, and is secreted by vaginal tissue and the Bartholin's gland.
These secretions, acting via the vomeronasal organ, are known to elicit a copulatory response in male hamsters. Aphrodisin is a mammalian proteinaceous pheromone. Probasin is a lipocalin originally isolated from the nuclei of rat dorsolateral prostate epithelial cells. Probasin mRNA expression, which is regulated by androgens, gives rise to both a secreted and a nuclear form of probasin, the relative abundance of the two forms being correlated with cell type. Probasin concentration also seems to be closely linked with cell age and state of differentiation. Bos taurus (Bovine) lipocalin allergen Bos d 2 is found in the secretory cells of skin apocrine sweat glands and the basement membranes of the epithelium and hair follicles. Immunohistochemistry with a monoclonal anti-Bos d 2 antibody has confirmed that skin is the only tissue where mRNA encoding Bos d 2 is detected. This suggests that Bos d 2 is produced in sweat glands and transported to the skin surface as a carrier of a pheromone. Because dander allergens of several mammalian species are lipocalins, the biological function of pheromone transport appears to be a common feature of an important group of aeroallergens. These lipocalins belong to the OBP group and may also act as odorant/pheromone carriers.
Protein Domain
Name: DNA mismatch repair MutS family
Type: Family
Description: Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of the newly synthesized DNA strand containing the mispaired base. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.
MutS is a modular protein with a complex structure, and is composed of:
  • N-terminal mismatch-recognition domain, which is similar in structure to tRNA endonuclease.
  • Connector domain, which is similar in structure to Holliday junction resolvase RuvC.
  • Core domain, which is composed of two separate subdomains that join together to form a helical bundle; from within the core domain, two helices act as levers that extend towards (but do not touch) the DNA.
  • Clamp domain, which is inserted between the two subdomains of the core domain at the top of the lever helices; the clamp domain has a β-sheet structure.
  • ATPase domain (connected to the core domain), which has a classical Walker A motif.
  • HTH (helix-turn-helix) domain, which is involved in dimer contacts.
The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair. Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. Human MSH has been implicated in hereditary non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts.
Protein Domain
Name: Intein N-terminal splicing region
Type: PTM
Description: Inteins, or protein introns, are parts of protein sequences that are post-translationally excised, their flanking regions (exteins) being spliced together to yield an additional protein product. This process is believed to be self-catalysed, apparently initiating at the C-terminal splice junction, where a conserved asparagine residue mediates the nucleophilic attack of the peptide bond between it and its neighbouring residue. Most inteins consist of two domains: one is involved in autocatalytic splicing, and the other is an endonuclease that is important in the spread of inteins. Inteins are between 134 and 608 amino acids long and are found in eukaryotes, bacteria, and archaea, although most frequently in archaea. Inteins are found in proteins with diverse functions, including metabolic enzymes, DNA and RNA polymerases, proteases, ribonucleotide reductases, and the vacuolar-type ATPase. However, enzymes involved in DNA replication and repair appear to dominate. Inteins are found in conserved regions of conserved proteins and can be regarded as parasitic genetic elements. The splicing of inteins initiates at the C-terminal splice junction. The delta-nitrogen group of a conserved asparagine residue makes a nucleophilic attack on the peptide bond that links this asparagine to the next residue. The next residue (a Cys, Ser or Thr) is then free to attack the peptide bond at the N-terminal splice junction by a transpeptidation reaction that releases the intein and creates a new peptide bond.
Such a mechanism is briefly schematised in the following figures.

1) Primary translation product

        +---------------+  +-------------+  +--------------+
    NH2-| Extein 1     x--y   Intein    N--z   Extein 2    |-COOH
        +---------------+  +-------------+  +--------------+

2) Breakage of the peptide bond at the C-terminal splice junction by nucleophilic attack of the asparagine.

        +---------------+  +-------------+      +--------------+
    NH2-| Extein 1     x--y   Intein    N   NH2-z   Extein 2   |-COOH
        +---------------+  +-------------+      +--------------+

3) Transpeptidation to produce the final products.

        +---------------+  +--------------+          +-------------+
    NH2-| Extein 1     x--z   Extein 2    |-COOH NH2-y   Intein    N
        +---------------+  +--------------+          +-------------+

Most inteins are bifunctional proteins mediating both protein splicing and DNA cleavage. The domain involved in splicing is formed by the two terminal splicing regions, which are separated by a small linker in mini-inteins or a homing endonuclease of 200-250 amino acids in larger inteins. The N-terminal splicing region spans about the first 100 N-terminal amino acids and contains the conserved intein blocks A and B, which are similar to the motifs found in the C-terminal autoprocessing domain of the hedgehog protein. The C-terminal splicing region is composed of the two conserved blocks F and G located in about the last 50 C-terminal amino acids. Although no single residue is invariant, the Ser and Cys in block A, the His in block B, and the His, Asn and Ser/Cys/Thr in block G are the most conserved residues in the splicing motifs. Protein splicing requires neither cofactors nor auxiliary enzymes and involves a series of four intramolecular reactions in which several of these most conserved residues are implicated. This entry represents the N-terminal splicing region that covers the intein blocks A and B. It starts with the first N-terminal amino acid of the intein.
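The net outcome of the splicing scheme above (the intein is excised and the two exteins are joined) can be illustrated with a toy string model. This is a minimal sketch in Python, not a model of the chemistry: the function `splice`, its coordinates and the example sequence are hypothetical, and the only rules it checks are the two stated in the description, namely that the intein ends with the conserved Asn and that the first residue of the C-terminal extein is Cys, Ser or Thr.

```python
def splice(precursor: str, start: int, end: int):
    """Excise precursor[start:end] (the intein) and ligate the exteins.

    Returns (mature_protein, intein). Coordinates are 0-based, end-exclusive.
    """
    intein = precursor[start:end]
    n_extein = precursor[:start]
    c_extein = precursor[end:]

    if not intein.endswith("N"):
        raise ValueError("intein is expected to end with the conserved Asn (N)")
    if not c_extein or c_extein[0] not in "CST":
        raise ValueError("first C-extein residue is expected to be Cys, Ser or Thr")

    # Transpeptidation outcome: extein 1 joined directly to extein 2.
    return n_extein + c_extein, intein


# Hypothetical precursor: extein 1 + intein + extein 2 (one-letter codes).
toy_precursor = "MKVLA" + "CLSGDTAN" + "SGHEW"
mature, intein = splice(toy_precursor, 5, 13)
print(mature)   # MKVLASGHEW
print(intein)   # CLSGDTAN
```

The two checks mirror the residues named in the description; real inteins are far longer (134 to 608 residues) and are recognised by their conserved blocks rather than by these two positions alone.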
Protein Domain
Name: Ulp1 protease family, C-terminal catalytic domain
Type: Domain
Description: This entry represents the C-terminal part of ubiquitin-like proteases that displays full proteolytic activity. Deubiquitinating enzymes (DUBs) form a large family of cysteine proteases that can deconjugate ubiquitin or ubiquitin-like proteins from ubiquitin-conjugated proteins. They can be classified into three families according to sequence homology: ubiquitin carboxyl-terminal hydrolases (UCH), ubiquitin-specific processing proteases (UBP), and ubiquitin-like proteases (ULP), which are specific for deconjugating ubiquitin-like proteins. In contrast to the UBP pathway, which is very redundant (16 UBP enzymes in yeast), there are few ubiquitin-like proteases (only one in yeast, ULP1). ULP1 catalyses two critical functions in the SUMO/Smt3 pathway via its cysteine protease activity. ULP1 processes the Smt3 C-terminal sequence (-GGATY) to its mature form (-GG), and it deconjugates Smt3 from the lysine ε-amino group of the target protein. The crystal structure of yeast ULP1 bound to Smt3 revealed that the catalytic and interaction interface is situated in a shallow and narrow cleft where conserved residues recognise the Gly-Gly motif at the C-terminal extremity of the Smt3 protein. Ulp1 adopts a novel architecture despite some structural similarity with other cysteine proteases. The secondary structure is composed of seven alpha helices and seven beta strands. The catalytic domain includes the central alpha helix, β-strands 4 to 6, and the catalytic triad (Cys-His-Asp). A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. 
In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. 
The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One subdomain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices, with the active site at one end of the barrel. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
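The Smt3 maturation step mentioned in this entry (the C-terminal -GGATY sequence trimmed to -GG) can be illustrated as a simple string operation. This is a minimal sketch, not a model of the enzyme: the function name `mature_smt3` and the toy precursor sequence are hypothetical, and only the -GGATY and -GG tails come from the description above.

```python
def mature_smt3(precursor: str) -> str:
    """Trim a precursor ending in GGATY so that it ends in the Gly-Gly motif.

    Illustrative only: the residues after the Gly-Gly motif (ATY) are
    removed, which is the net effect of the processing step described in
    the entry.
    """
    tail = "GGATY"
    if not precursor.endswith(tail):
        raise ValueError("precursor does not end with GGATY")
    # Keep everything up to and including the two glycines.
    return precursor[: len(precursor) - 3]


# Toy precursor (made-up N-terminal residues, real C-terminal tail).
print(mature_smt3("MSDEQIGGATY"))
```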
Protein Domain
Name: Accessory gene regulator B
Type: Family
Description: This entry represents the accessory gene regulator protein B (AgrB) family. Proteins in this family include AgrB from Staphylococcus aureus and FsrB from Enterococcus faecalis. The accessory gene regulator (agr) of Staphylococcus aureus is the central regulatory system that controls the gene expression for a large set of virulence factors. The agr locus consists of two transcripts: RNAII and RNAIII. RNAII encodes four genes (agrA, B, C, and D) whose gene products assemble a quorum sensing system. At low cell density, the agr genes are continuously expressed at basal levels. A signal molecule, autoinducing peptide (AIP), produced and secreted by the bacteria, accumulates outside of the cells. When the cell density increases and the AIP concentration reaches a threshold, it activates the agr response, i.e. activation of secreted protein gene expression and subsequent repression of cell wall-associated protein genes. AgrB and AgrD are essential for the production of the autoinducing peptide, which functions as a signal for quorum sensing. AgrB is a transmembrane protein involved in the proteolytic processing of AgrD, and may have both proteolytic and transporter activities, facilitating the export of the processed AgrD peptide. FsrB may be involved in the proteolytic processing of a quorum sensing system signal molecule precursor required for the regulation of the virulence genes for gelatinase (gelE) and a serine protease (sprE). A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. 
In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. 
The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One subdomain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices, with the active site at one end of the barrel. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: APC10/DOC domain
Type: Domain
Description: The anaphase-promoting complex (APC) or cyclosome is a multi-subunit E3 protein ubiquitin ligase that regulates important events in mitosis, such as the initiation of anaphase and exit from telophase. The APC, in conjunction with other enzymes, assembles multi-ubiquitin chains on a variety of regulatory proteins, thereby targeting them for proteolysis by the 26S proteasome. One of the subunits of the APC that is required for ubiquitination activity is APC10, a one-domain protein homologous to a sequence element, termed the DOC domain, found in several hypothetical proteins that may also mediate ubiquitination reactions, because they contain combinations of either RING finger, cullin or HECT domains. The DOC domain consists of a β-sandwich, in which a five-stranded antiparallel β-sheet is packed on top of a three-stranded antiparallel β-sheet, exhibiting a 'jellyroll' fold. Proteins known to contain a DOC domain include:
  • Eukaryotic Doc1/Apc10.
  • Mammalian protein associated with the transcription factor Myc (PAM).
  • Mouse runty-jerky-sterile (RJS) protein.
  • Human HERC2, the ortholog of RJS.
Protein Domain
Name: Generative cell specific-1/HAP2 domain
Type: Domain
Description: The gene encoding Arabidopsis HAP2 is allelic with GCS1 (Generative cell-specific protein 1). HAP2 is expressed only in the haploid sperm and is required for efficient guidance of the pollen tube to the ovules. In Arabidopsis the protein is a predicted membrane protein with an N-terminal secretion signal, a single transmembrane domain and a C-terminal histidine-rich domain. HAP2-GCS1 is found from plants to lower eukaryotes and is necessary for the fusion of the gametes in fertilisation. It is involved in a novel mechanism for gamete fusion in which a first, species-specific protein binds male and female gamete membranes together, after which a second, broadly conserved protein, either directly or indirectly, causes fusion of the two membranes. The broadly conserved protein is represented by this HAP2-GCS1 domain, conserved from plants to lower eukaryotes. In Plasmodium berghei the protein is expressed only in male gametocytes and gametes, having a male-specific function during the interaction with female gametes, and being indispensable for parasite fertilisation. The gene in plants and eukaryotes might well have originated from acquisition of plastids from red algae.
Protein Domain
Name: Cullin homology domain
Type: Domain
Description: Cullins are a family of hydrophobic proteins that act as scaffolds for ubiquitin ligases (E3). Cullins are found throughout eukaryotes. Humans express seven cullins (Cul1, 2, 3, 4A, 4B, 5 and 7), each forming part of a multi-subunit ubiquitin complex. Cullin-RING ubiquitin ligases (CRLs), such as Cul1 (SCF), play an essential role in targeting proteins for ubiquitin-mediated destruction; as such, they are diverse in terms of composition and function, regulating many different processes from glucose sensing and DNA replication to limb patterning and circadian rhythms. The catalytic core of CRLs consists of a RING protein and a cullin family member. For Cul1, the C-terminal cullin-homology domain binds the RING protein. The RING protein appears to function as a docking site for ubiquitin-conjugating enzymes (E2s). Other proteins contain a cullin-homology domain, such as the APC2 subunit of the anaphase-promoting complex/cyclosome and the p53 cytoplasmic anchor PARC; both APC2 and PARC have ubiquitin ligase activity. The N-terminal region of cullins is more variable, and is used to interact with specific adaptor proteins.
Protein Domain
Name: EYA domain
Type: Domain
Description: The Eyes absent proteins are members of a conserved regulatory network implicated in the development of the eye, muscle, kidney and ear. Eyes absent is a nuclear transcription factor, acting through interaction with homeodomain-containing Sine oculis (also known as Six) proteins. Eyes absent is also a protein tyrosine phosphatase. It does not resemble the classical tyrosine phosphatases that use cysteine as a nucleophile and proceed by means of a thiol-phosphate intermediate. Rather, Eyes absent is the prototype for a class of protein tyrosine phosphatases that use a nucleophilic aspartic acid in a metal-dependent reaction. Furthermore, the phosphatase activity of Eyes absent contributes to its ability to induce eye formation in Drosophila. Thus, Eyes absent belongs to the phosphatase subgroup of the haloacid dehalogenase (HAD) superfamily and appears to act as a nuclear transcriptional coactivator with intrinsic phosphatase activity. The Eyes absent proteins contain a divergent 200-300 residue long N-terminal region and a conserved C-terminal domain of approximately 270 residues, the EYA domain, which is critical for activity and believed to participate in protein-protein interactions.
Protein Domain
Name: OST-HTH/LOTUS domain
Type: Domain
Description: This predicted RNA-binding domain, found in insect Oskar and vertebrate TDRD5/TDRD7 proteins that nucleate or organise structurally related ribonucleoprotein (RNP) complexes (the polar granule and nuage), is poorly understood. The domain adopts the winged helix-turn-helix fold and binds RNA with a potential specificity for dsRNA. In eukaryotes, this domain is often combined in the same polypeptide with protein-protein or lipid interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures. Thus, proteins with this domain might have a key role in the recognition and localisation of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridised to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains, indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterised RNase domain belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains.
Protein Domain
Name: Spen paralogue and orthologue SPOC, C-terminal
Type: Domain
Description: Spen (split end) proteins regulate the expression of key transcriptional effectors in diverse signalling pathways. They are large proteins characterised by N-terminal RNA-binding motifs and a highly conserved C-terminal SPOC (Spen paralog and ortholog C-terminal) domain. The function of the SPOC domain is unknown, but the SPOC domain of the SHARP Spen protein has been implicated in the interaction of SHARP with the SMRT/NcoR corepressor, where SHARP plays an essential role in the repressor complex. The SPOC domain is folded into a single compact domain consisting of a β-barrel with seven strands framed by six alpha helices. A number of deep grooves and clefts in the surface, plus two nonpolar loops, render the SPOC domain well suited to protein-protein interactions; most of the conserved residues occur on the protein surface rather than in the core. Other proteins containing a SPOC domain include Drosophila Split ends, which promotes sclerite development in the head and restricts it in the thorax, and mouse MINT (homologue of SHARP), which is involved in skeletal and neuronal development via its repression of Msx2.
Protein Domain
Name: Primosome PriB/single-strand DNA-binding
Type: Family
Description: The Escherichia coli single-strand binding protein (gene ssb), also known as the helix-destabilising protein, is a protein of 177 amino acids. It binds tightly, as a homotetramer, to single-stranded DNA (ss-DNA) and plays an important role in DNA replication, recombination and repair. Closely related variants of SSB are encoded in the genomes of a variety of large self-transmissible plasmids. SSB has also been characterised in bacteria such as Proteus mirabilis and Serratia marcescens. Eukaryotic mitochondrial proteins that bind ss-DNA and are probably involved in mitochondrial DNA replication are structurally and evolutionarily related to prokaryotic SSB. Primosomal replication protein N (PriB) is a specialist protein from bacteria that binds single-stranded DNA at the primosome assembly site (PAS). The primosome is a mobile multiprotein replication priming complex which is believed to operate on the lagging-strand template at the E. coli DNA replication fork. The primosome consists of one monomer each of PriC and DnaT, two monomers of PriA, two dimers of PriB and one hexamer of DnaB.
Protein Domain
Name: Survival motor neuron, Tudor domain
Type: Domain
Description: This entry includes eukaryotic survival motor neuron (SMN) and survival of motor neuron-related-splicing factor 30 (SPF30) proteins. The Survival of Motor Neurons (SMN) protein, the product of the spinal muscular atrophy-determining gene, is part of a large macromolecular complex (SMN complex) that functions in the assembly of spliceosomal small nuclear ribonucleoproteins (snRNPs). The SMN complex functions as a specificity factor essential for the efficient assembly of Sm proteins on U snRNAs and likely protects cells from illicit, and potentially deleterious, non-specific binding of Sm proteins to RNAs. SMN has three highly conserved domains: a short N-terminal segment responsible for binding with high affinity to Gemin2 (a protein involved in snRNP assembly) and therefore called the Gemin2-binding domain (G2-BD), a central Tudor domain, and a C-terminal YG-box domain. This entry covers the Gemin2-binding domain (G2-BD) and the Tudor domain of SMN and SPF30, in which an aromatic cage mediates recognition of symmetric dimethylarginine modifications, by electrostatic stabilisation through cation-pi interactions, in a number of proteins involved in RNA processing.
Protein Domain
Name: Zinc finger, UBR-type
Type: Domain
Description: It has been observed that the identity of the N-terminal residue of a protein is related to the half-life of the protein. This observation yields a rule, called the N-end rule. Similar but distinct versions of the N-end rule operate in all organisms examined, from mammals to fungi and bacteria. In eukaryotes, the N-end rule pathway is a part of the ubiquitin degradation system. Some proteins that have a very short half-life contain a specific motif at their N terminus, the N-degron. It consists of a destabilising N-terminal residue and an internal Lys, which is the site of poly-Ub chain attachment. The UBR1 protein was shown to bind specifically to proteins bearing N-terminal residues that are destabilising according to the N-end rule, but not to otherwise identical proteins bearing stabilising N-terminal residues. UBR1 contains an N-terminal conserved region (the UBR-type zinc finger) which is also found in various proteins implicated in N-degron recognition. The UBR-type zinc finger defines a unique E3 class, most likely N-degron specific.
Protein Domain
Name: Citron homology (CNH) domain
Type: Domain
Description: Based on sequence similarities, a domain of homology has been identified in the following proteins:
  • Citron and Citron kinase. These two proteins interact with the GTP-bound forms of the small GTPases Rho and Rac but not with Cdc42.
  • Myotonic dystrophy kinase-related Cdc42-binding kinase (MRCKalpha). This serine/threonine kinase interacts with the GTP-bound form of the small GTPase Cdc42 and to a lesser extent with that of Rac.
  • NCK Interacting Kinase (NIK), a serine/threonine protein kinase.
  • ROM-1 and ROM-2, from yeast. These proteins are GDP/GTP exchange proteins (GEPs) for the small GTP binding protein Rho1.
This domain, called the citron homology domain (CNH), is often found after cysteine-rich and pleckstrin homology (PH) domains at the C-terminal end of a group of eukaryotic proteins. It is thought to act as a regulatory domain and could be involved in macromolecular interactions. Its structure has been solved in Rho guanyl nucleotide exchange factor (Rom2) from Neosartorya fumigata (Aspergillus fumigatus), where it shows a canonical β-propeller fold containing seven blades connected by small loops and arranged in a circular fashion.
Protein Domain
Name: La-type HTH domain
Type: Domain
Description: Human Ro ribonucleoproteins (RNPs) are composed of one of the four small Y RNAs and at least two proteins, Ro60 and La. The La protein is a 47 kDa polypeptide that frequently acts as an autoantigen in systemic lupus erythematosus and Sjogren's syndrome. In the nucleus, La acts as an RNA polymerase III (RNAP III) transcription factor, while in the cytoplasm, La acts as a translation factor. In the nucleus, La binds to the 3'UTR of nascent RNAP III transcripts to assist in folding and maturation. In the cytoplasm, La recognises specific classes of mRNAs that contain a 5'-terminal oligopyrimidine (5'TOP) motif known to control protein synthesis. The specific recognition is mediated by the N-terminal domain of La, which comprises a La motif and an RNA recognition motif (RRM). The La motif adopts an alpha/beta fold that comprises a winged-helix motif. Homologous La domain-containing proteins have been identified in a wide range of organisms except Archaea, bacteria and viruses. This domain is found at the N terminus of La RNA-binding proteins as well as other proteins.
Protein Domain
Name: LRP chaperone MESD
Type: Family
Description: LRP chaperone MESD (also known as mesoderm development candidate 2) represents a set of highly conserved proteins found from nematodes to humans. It is a chaperone that specifically assists with the folding of β-propeller/EGF modules within the family of low-density lipoprotein receptors (LDLRs). It also acts as a modulator of the Wnt pathway, since some LDLRs are coreceptors for the canonical Wnt pathway, and is essential for specification of embryonic polarity and mesoderm induction. The Drosophila homologue, known as boca, is an endoplasmic reticulum protein required for wingless signalling and trafficking of LDL receptor family members. The final C-terminal residues, KEDL, are the endoplasmic reticulum retention sequence, as MESD is an ER protein specifically required for the intracellular trafficking of members of the low-density lipoprotein family of receptors (LDLRs). The N- and C-terminal sequences are predicted to adopt a random coil conformation, with the exception of an isolated predicted helix within the N-terminal region. The central folded domain flanked by natively unstructured regions is the necessary structure for facilitating maturation of LRP6 (low-density lipoprotein receptor-related protein 6).
Protein Domain
Name: EYA domain superfamily
Type: Homologous_superfamily
Description: The Eyes absent proteins are members of a conserved regulatory network implicated in the development of the eye, muscle, kidney and ear. Eyes absent is a nuclear transcription factor, acting through interaction with homeodomain-containing Sine oculis (also known as Six) proteins. Eyes absent is also a protein tyrosine phosphatase. It does not resemble the classical tyrosine phosphatases that use cysteine as a nucleophile and proceed by means of a thiol-phosphate intermediate. Rather, Eyes absent is the prototype for a class of protein tyrosine phosphatases that use a nucleophilic aspartic acid in a metal-dependent reaction. Furthermore, the phosphatase activity of Eyes absent contributes to its ability to induce eye formation in Drosophila. Thus, Eyes absent belongs to the phosphatase subgroup of the haloacid dehalogenase (HAD) superfamily and appears to act as a nuclear transcriptional coactivator with intrinsic phosphatase activity. The Eyes absent proteins contain a divergent 200-300 residue long N-terminal region and a conserved C-terminal domain of approximately 270 residues, the EYA domain, which is critical for activity and believed to participate in protein-protein interactions.
Protein Domain
Name: WH1/EVH1 domain
Type: Domain
Description: The EVH1 (WH1, RanBP1-WASP) domain is found in multi-domain proteins implicated in a diverse range of signalling, nuclear transport and cytoskeletal events. This domain of around 115 amino acids is present in species ranging from yeast to mammals. Many EVH1-containing proteins associate with actin-based structures and play a role in cytoskeletal organisation. EVH1 domains recognise and bind the proline-rich motif FPPPP with low affinity; further interactions then form between flanking residues. WASP family proteins contain an EVH1 (WH1) domain at their N terminus, which binds proline-rich sequences in the WASP interacting protein. Proteins of the RanBP1 family contain a WH1 domain in their N-terminal region, which seems to bind a different sequence motif present in the C-terminal part of RanGTP protein. The tertiary structure of the WH1 domain of the Mena protein revealed structural similarities with the pleckstrin homology (PH) domain. The overall fold consists of a compact parallel β-sandwich, closed along one edge by a long α-helix. A highly conserved cluster of three surface-exposed aromatic side-chains forms the recognition site for the molecule's target ligands.
Protein Domain
Name: Tellurite resistance predicted, YeaR
Type: Family
Description: Proteins in this group are stand-alone YeaR proteins. The prediction of their involvement in tellurite resistance is based on domain association. Tellurite resistance protein TehB is encoded by the tellurite-reducing operon tehAB. TehB exists in a two-domain form, with a C-terminal SAM-dependent methyltransferase domain and an N-terminal domain of unknown function, and in the form of two stand-alone proteins: YeaR protein (this group) and single-domain TehB methyltransferase. When upregulated or present in high copy number, TehB of Streptococcus pneumoniae is responsible for potassium tellurite resistance; this is achieved by increasing the reduction rate of tellurite to metallic tellurium within the bacterium. TehB is a cytoplasmic protein that possesses three conserved motifs (I, II, and III) found in S-adenosyl-L-methionine (SAM)-dependent non-nucleic acid methyltransferases. Conformational changes in TehB are observed upon binding of both tellurite and SAM, suggesting that TehB utilises a methyltransferase activity in the detoxification of tellurite. The role of YeaR protein, and of the corresponding domain in two-domain TehB, in tellurite resistance is not known.
Protein Domain
Name: Dok-7, PTB domain
Type: Domain
Description: Dok-7 is a cytoplasmic adaptor protein and a member of the Dok family. It is a substrate of MuSK (a receptor tyrosine kinase) and an activator of MuSK's kinase activity. Mutations of the Dok-7 gene cause Myasthenic syndrome, congenital, 10 (CMS10), a form of congenital myasthenic syndrome, a group of disorders characterised by failure of neuromuscular transmission, including pre-synaptic, synaptic, and post-synaptic disorders that are not of autoimmune origin. The Dok family adapters are phosphorylated by different protein tyrosine kinases. Dok proteins are involved in processes such as modulation of cell differentiation and proliferation, as well as in control of cell spreading and migration. The Dok protein contains an N-terminal pleckstrin homology (PH) domain followed by a central phosphotyrosine binding (PTB) domain, which has a PH-like fold, and a proline- and tyrosine-rich C-terminal tail. The PH domain binds to acidic phospholipids and localises proteins to the plasma membrane, while the PTB domain mediates protein-protein interactions by binding to phosphotyrosine-containing motifs. This entry represents the PTB domain of Dok-7.
Protein Domain
Name: Bacteriophage phiKZ, Orf92, internal head
Type: Family
Description: Phage internal head proteins (IP) are proteins that are encoded by a bacteriophage and assembled into the mature virion inside the capsid head. The most analogous characterised IP proteins are those of bacteriophage T4, which are known to be proteolytically processed during phage maturation, and then subsequently injected into the host cell during infection. The phiKZ_IP family consists of internal head proteins encoded by phiKZ-like phages. Each phage encodes three to six members of this family. Members of the family reside in the head and are cleaved during phage maturation to separate an N-terminal propeptide from a C-terminal domain. The C-terminal domain remains in the mature capsid. The N-terminal propeptide domain is either mostly or completely removed from the mature capsid. In one case, an unrelated polypeptide is embedded in the propeptide and also remains in the mature capsid. The phiKZ-like IP proteins are not discernibly homologous to the T4 IP proteins, and it is not known if the phiKZ-like IP proteins are injected into the host cell, or have some other function within the head.
Protein Domain
Name: Angiomotin
Type: Family
Description: Angiogenesis, the process whereby new blood vessels are formed by a mechanism of sprouting from existing vessels, has recently been the subject of intense research. Angiostatin, a proteolytically generated fragment of plasminogen consisting of the first four kringle domains, is a potent angiogenesis inhibitor. Whilst this protein has been known for some time, until recently its mechanism of action was unknown. A functional angiostatin-binding protein has been described that inhibits endothelial cell motility and proliferation, key stages in angiogenesis. This protein has been named angiomotin. Angiomotin was identified via a yeast two-hybrid screen for proteins that interact with angiostatin. It is a 72 kDa cell-surface associated protein expressed in endothelium and other tissues where angiogenesis occurs, such as placenta and tumours. It appears that angiomotin stimulates angiogenesis by increasing cell motility and that binding of angiostatin inhibits this process. Two paralogues of angiomotin have recently been cloned and their amino acid sequences determined. Together with angiomotin, these proteins form a novel family bearing little sequence similarity to other known proteins. Sequence analysis has revealed putative coiled-coil and PDZ-binding domains.
Protein Domain
Name: Spen paralogue/orthologue C-terminal, metazoa
Type: Domain
Description: Spen (split end) proteins regulate the expression of key transcriptional effectors in diverse signalling pathways. They are large proteins characterised by N-terminal RNA-binding motifs and a highly conserved C-terminal SPOC (Spen paralog and ortholog C-terminal) domain. The function of the SPOC domain is unknown, but the SPOC domain of the SHARP Spen protein has been implicated in the interaction of SHARP with the SMRT/NcoR corepressor, where SHARP plays an essential role in the repressor complex. The SPOC domain is folded into a single compact domain consisting of a β-barrel with seven strands framed by six α-helices. A number of deep grooves and clefts in the surface, plus two nonpolar loops, render the SPOC domain well suited to protein-protein interactions; most of the conserved residues occur on the protein surface rather than in the core. Other proteins containing a SPOC domain include Drosophila Split ends, which promotes sclerite development in the head and restricts it in the thorax, and mouse MINT (homologue of SHARP), which is involved in skeletal and neuronal development via its repression of Msx2.
Protein Domain
Name: Formin, FH3 domain
Type: Domain
Description: Formin homology (FH) proteins play a crucial role in the reorganisation of the actin cytoskeleton, which mediates various functions of the cell cortex including motility, adhesion, and cytokinesis. Formins are multidomain proteins that interact with diverse signalling molecules and cytoskeletal proteins, although some formins have been assigned functions within the nucleus. Formins are characterised by the presence of three FH domains (FH1, FH2 and FH3), although members of the formin family do not necessarily contain all three domains. The proline-rich FH1 domain mediates interactions with a variety of proteins, including the actin-binding protein profilin, SH3 (Src homology 3) domain proteins, and WW domain proteins. The FH2 domain is required to inhibit actin polymerisation. The FH3 domain is less well conserved and is required for directing formins to the correct intracellular location, such as the mitotic spindle or the projection tip during conjugation. In addition, some formins can contain a GTPase-binding domain (GBD) required for binding to Rho small GTPases, and a C-terminal conserved Dia-autoregulatory domain (DAD). This entry represents the FH3 domain.
Protein Domain
Name: Quinoprotein relay system zinc metallohydrolase 2
Type: Domain
Description: This entry represents a domain found in a group of putative Zn metallohydrolases, which share homology with the sulfate metabolism associated protein SoxH, Bacillus cereus beta-lactamase II (see PDB:1bc2), and, more distantly, hydroxyacylglutathione hydrolase (glyoxalase II). Proteins containing this domain occur in genomes with both PQQ biosynthesis and a PQQ-dependent (quinoprotein) dehydrogenase that has a motif of two consecutive Cys residues. The Cys-Cys motif is associated with electron transfer by specialised cytochromes such as c551. All these genomes also include a fusion protein whose domains resemble SoxY and SoxZ from thiosulfate oxidation. A conserved Cys in this fusion protein aligns to the Cys residue in SoxY that carries sulfur cycle intermediates. In many genomes, the genes for PQQ biosynthesis enzymes, PQQ-dependent enzymes, their associated cytochromes, and the proteins in this entry are clustered. Note that one to three closely related Zn metallohydrolases may occur; this entry represents a specific clade among them. Some proteins in this entry have a short additional N-terminal domain with four conserved Cys residues.
Protein Domain
Name: Bacteriophage T4, Gp59, helicase assembly protein, N-terminal
Type: Domain
Description: The bacteriophage T4 gene 59 helicase assembly protein (Gp59) is required for recombination-dependent DNA replication and repair, which is the predominant mode of DNA replication in the late stage of T4 infection. Gp59 accelerates the loading of the T4 gene 41 helicase during DNA synthesis by the T4 replication system in vitro. This protein binds to both T4 gene 41 helicase and T4 gene 32 single-stranded DNA binding protein, and to single- and double-stranded DNA. The structure of the Gp59 helicase assembly protein reveals a novel α-helical bundle fold with two domains of similar size. Surface residues are predominantly basic (pI 9.37) with clusters of acidic residues, but exposed hydrophobic residues suggest sites for potential contact with DNA and with other protein molecules. The N-terminal domain shares structural homology with the high mobility group (HMG) proteins from eukaryotic organisms, and it has been suggested that it plays a role in duplex DNA binding ahead of the fork. The C-terminal domain interacts with the helicase (T4 gp41) and with SSB (single-stranded binding protein T4 gp32).
Protein Domain
Name: Olfactomedin-like domain
Type: Domain
Description: The olfactomedin-like or OLF domain is a module of ~260 residues present in metazoan secreted glycoproteins with a characteristic tissue-specific expression. The domain is named after bullfrog olfactomedin, an extracellular matrix protein of olfactory neuroepithelium, of which it forms the C-terminal part. Other proteins of the olfactomedin family contain the OLF domain in the C-terminal part, while the N terminus is more variable. Proteins of the latrophilin subfamily have an OLF domain in the N-terminal extracellular part, C-terminal to a SUEL-type lectin domain, and their C-terminal part contains domains of the G-protein coupled receptors family 2. Some OLF domain proteins are involved in the formation of the extracellular matrix, e.g. bullfrog olfactomedin, sea urchin amassin and C. elegans unc-122. In addition, OLF domain proteins can function in developmental processes, e.g. noelin and tiarin. Secondary structure predictions show that the OLF domain contains several β-strands. A disulfide bond between two conserved cysteines within the OLF domain of human myocilin is implicated in mutations associated with severe forms of primary open angle glaucoma.
Protein Domain
Name: HAP2/GCS1
Type: Family
Description: The gene encoding Arabidopsis HAP2 is allelic with GCS1 (Generative cell-specific protein 1). HAP2 is expressed only in the haploid sperm and is required for efficient guidance of the pollen tube to the ovules. In Arabidopsis the protein is a predicted membrane protein with an N-terminal secretion signal, a single transmembrane domain and a C-terminal histidine-rich domain. HAP2-GCS1 is found from plants to lower eukaryotes and is necessary for the fusion of the gametes in fertilisation. It is involved in a novel mechanism for gamete fusion in which a first, species-specific protein binds male and female gamete membranes together, after which a second, broadly conserved protein causes, either directly or indirectly, fusion of the two membranes. The broadly conserved protein is represented by this HAP2-GCS1 domain, conserved from plants to lower eukaryotes. In Plasmodium berghei the protein is expressed only in male gametocytes and gametes, having a male-specific function during the interaction with female gametes, and being indispensable for parasite fertilisation. The gene in plants and eukaryotes might well have originated from acquisition of plastids from red algae.
Protein Domain
Name: TRIM21, PRY/SPRY domain
Type: Domain
Description: This domain, consisting of the distinct N-terminal PRY subdomain followed by the SPRY subdomain, is found at the C terminus of TRIM21, which is also known as Sjogren Syndrome Antigen A (SSA), SSA1, 52 kDa Ribonucleoprotein Autoantigen (Ro52, Ro/SSA, SS-A/Ro) or RING finger protein 81 (RNF81). TRIM proteins are defined by the presence of the tripartite motif RING/B-box/coiled-coil region and are also known as RBCC proteins. As an E3 ligase, TRIM21 mediates target specificity in ubiquitination; it regulates type 1 interferon and proinflammatory cytokines via ubiquitination of interferon regulatory factors (IRFs). It is up-regulated at the site of autoimmune inflammation, such as cutaneous lupus lesions, indicating a central role in the tissue destructive inflammatory process. It interacts with auto-antigens in patients with Sjogren syndrome and systemic lupus erythematosus, a chronic systemic autoimmune disease characterized by the presence of autoantibodies against the protein component of the human intracellular ribonucleoprotein-RNA complexes and more specifically TRIM21, Ro60/TROVE2 and La/SSB proteins. It binds the Fc part of IgG molecules via its PRY-SPRY domain with unexpectedly high affinity.
Protein Domain
Name: Helix-turn-helix, type 11
Type: Domain
Description: Winged helix DNA-binding proteins share a related winged helix-turn-helix DNA-binding motif, where the "wings", or loops, are small β-sheets. The winged helix motif consists of two wings (W1, W2), three α-helices (H1, H2, H3) and three β-sheets (S1, S2, S3) arranged in the order H1-S1-H2-H3-S2-W1-S3-W2. The DNA-recognition helix makes sequence-specific DNA contacts with the major groove of DNA, while the wings make different DNA contacts, often with the minor groove or the backbone of DNA. Several winged-helix proteins display an exposed patch of hydrophobic residues thought to mediate protein-protein interactions. This entry represents a subset of the winged helix domain superfamily which is predominantly found in bacterial proteins, though there are also some archaeal and eukaryotic examples. This domain is commonly found in the biotin (vitamin H) repressor protein BirA, which regulates transcription of the biotin operon. It is also found in other proteins, including regulators of amino acid biosynthesis such as LysM, and regulators of carbohydrate metabolism such as LicR and FrvR.
Protein Domain
Name: Penicillin-binding protein, transpeptidase
Type: Domain
Description: This signature identifies a large group of proteins, which include: beta-lactamase precursor (penicillinase); peptidoglycan synthetase ftsI (peptidoglycan glycosyltransferase 3); methicillin resistance mecR1 protein; and methicillin resistance blaR1 protein. The large number of penicillin binding proteins, which are represented in this group of sequences, are responsible for the final stages of peptidoglycan biosynthesis for cell wall formation. The proteins synthesise cross-linked peptidoglycan from lipid intermediates, and contain a penicillin-sensitive transpeptidase carboxy-terminal domain. The active site serine (residue 337 in ) is conserved in all members of this family. MecR1 and BlaR1 are metallopeptidases belonging to MEROPS peptidase family M56, clan M-. BlaR1 and MecR1 cleave their cognate transcriptional repressors BlaI and MecI, respectively, activating the synthesis of MecA. MecR1 is present in Staphylococcus aureus and Staphylococcus sciuri, whereas BlaR1 (also known as BlaR, PenR1, or PenJ) has been found in Bacillus licheniformis, Staphylococcus epidermidis, Staphylococcus haemolyticus, and several S. aureus strains. These proteins are either plasmid-encoded, chromosomal, or transposon-mediated. MecR1/BlaR1 proteins are made up of homologous N-terminal 330-residue transmembrane metallopeptidase domains linked to extracellular 260-residue homologous PBP-like penicillin sensor moieties.
Protein Domain
Name: Runt domain
Type: Domain
Description: The AML1 gene is rearranged by the t(8;21) translocation in acute myeloid leukemia. The gene is highly similar to the Drosophila melanogaster segmentation gene runt and to the mouse transcription factor PEBP2 alpha subunit gene. The region of shared similarity, known as the Runt domain, is responsible for DNA-binding and protein-protein interaction. In addition to the highly-conserved Runt domain, the AML-1 gene product carries a putative ATP-binding site (GRSGRGKS), and has a C-terminal region rich in proline and serine residues. The protein (known as acute myeloid leukemia 1 protein, oncogene AML-1, core-binding factor (CBF), alpha-B subunit, etc.) binds to the core site, 5'-pygpyggt-3', of a number of enhancers and promoters. The protein is a heterodimer of alpha- and beta-subunits. The alpha-subunit binds DNA as a monomer, and appears to have a role in the development of normal hematopoiesis. CBF is a nuclear protein expressed in numerous tissue types, except brain and heart; highest levels have been found to occur in thymus, bone marrow and peripheral blood. This domain occurs towards the N terminus of the proteins in this entry.
Protein Domain
Name: Dok-7, PH domain
Type: Domain
Description: Dok-7 is a cytoplasmic adaptor protein and a member of the Dok family. It is a substrate of MuSK (a receptor tyrosine kinase) and an activator of MuSK's kinase activity. Mutations of the Dok-7 gene cause Myasthenic syndrome, congenital, 10 (CMS10), a form of congenital myasthenic syndrome. Congenital myasthenic syndromes are a group of disorders characterised by failure of neuromuscular transmission, including pre-synaptic, synaptic, and post-synaptic disorders that are not of autoimmune origin. The Dok family adapters are phosphorylated by different protein tyrosine kinases. Dok proteins are involved in processes such as modulation of cell differentiation and proliferation, as well as in control of cell spreading and migration. Dok proteins contain an N-terminal pleckstrin homology (PH) domain followed by a central phosphotyrosine binding (PTB) domain, which has a PH-like fold, and a proline- and tyrosine-rich C-terminal tail. The PH domain binds to acidic phospholipids and localizes proteins to the plasma membrane, while the PTB domain mediates protein-protein interactions by binding to phosphotyrosine-containing motifs. This entry represents the PH domain of Dok-7.
Protein Domain
Name: Arterivirus NSP4 peptidase domain
Type: Domain
Description: Arteriviruses are enveloped, positive-stranded RNA viruses and include pathogens of major economic concern to the swine- and horse-breeding industries: equine arteritis virus (EAV); porcine reproductive and respiratory syndrome virus (PRRSV); mouse lactate dehydrogenase-elevating virus; and simian hemorrhagic fever virus. The arterivirus replicase gene encodes two large precursor polyproteins that are processed by the viral main proteinase, nonstructural protein 4 (NSP4). The structure of the enzyme reveals two chymotrypsin-like antiparallel β-barrels and an extra C-terminal α/β domain that may play a role in mediating protein-protein interactions. The N-terminal barrel consists of six β-strands (A1 to F1), while the C-terminal barrel is composed of seven (A2 to G2). The core of both β-barrels is composed of conserved hydrophobic residues. The additional C-terminal domain consists of two pairs of short antiparallel β-sheets and two α-helices. It interacts with the C-terminal barrel through an interface consisting of conserved hydrophobic residues. A canonical catalytic triad composed of Ser, His, and Asp is located in the open cleft between the two β-barrel domains. The NSP4 proteinase domain is a member of peptidase family S32. This entry represents the NSP4 proteinase domain.
Protein Domain
Name: Cullin
Type: Family
Description: Cullins are a family of hydrophobic proteins that act as scaffolds for ubiquitin ligases (E3). Cullins are found throughout eukaryotes. Humans express seven cullins (Cul1, 2, 3, 4A, 4B, 5 and 7), each forming part of a multi-subunit ubiquitin complex. Cullin-RING ubiquitin ligases (CRLs), such as Cul1 (SCF), play an essential role in targeting proteins for ubiquitin-mediated destruction; as such, they are diverse in terms of composition and function, regulating many different processes from glucose sensing and DNA replication to limb patterning and circadian rhythms. The catalytic core of CRLs consists of a RING protein and a cullin family member. For Cul1, the C-terminal cullin-homology domain binds the RING protein. The RING protein appears to function as a docking site for ubiquitin-conjugating enzymes (E2s). Other proteins contain a cullin-homology domain, such as the APC2 subunit of the anaphase-promoting complex/cyclosome and the p53 cytoplasmic anchor PARC; both APC2 and PARC have ubiquitin ligase activity. The N-terminal region of cullins is more variable, and is used to interact with specific adaptor proteins.
Protein Domain
Name: EV matrix protein, N-terminal
Type: Homologous_superfamily
Description: Ebola virus species are non-segmented, negative-strand RNA viruses that cause severe haemorrhagic fever in humans with high rates of mortality. The virus matrix protein VP40 is a major structural protein that plays a central role in virus assembly and budding at the plasma membrane of infected cells. VP40 proteins associate with cellular membranes, interact with the cytoplasmic tails of glycoproteins, and bind to the ribonucleoprotein complex. The VP40 monomer consists of two domains, the N-terminal oligomerization domain and the C-terminal membrane-binding domain, connected by a flexible linker. Both the N- and C-terminal domains fold into beta sandwich structures of similar topology. Within the N-terminal domain are two overlapping L-domains with the sequences PTAP and PPEY at residues 7 to 13, which are required for efficient budding. L-domains are thought to mediate their function in budding through their interaction with specific host cellular proteins, such as tsg101 and vps-4. This entry describes the VP40 N-terminal domain. It is the region of the protein where the two VP40 monomers bind.
Protein Domain
Name: Disulphide bond isomerase, DsbC/G
Type: Family
Description: Disulfide bond isomerases DsbC and DsbG are V-shaped homodimeric proteins containing a redox active CXXC motif embedded in a TRX fold. They function as protein disulfide isomerases and chaperones in the bacterial periplasm to correct non-native disulfide bonds formed by DsbA and prevent aggregation of incorrectly folded proteins. DsbC and DsbG are kept in their reduced state by the cytoplasmic membrane protein DsbD, which utilizes the TRX/TRX reductase system in the cytosol as a source of reducing equivalents. DsbG differs from DsbC in that it has a more limited substrate specificity, and it may preferentially act later in the folding process to catalyze disulfide rearrangements in folded or partially folded proteins. Also included in this entry is the predicted protein TrbB, whose gene was sequenced from the enterohemorrhagic E. coli type IV pilus gene cluster, which is required for efficient plasmid transfer. TrbB may be a disulfide bond isomerase that functions in the conjugative process by facilitating proper folding of a subset of F-plasmid-encoded proteins in the periplasm.
Protein Domain
Name: EYA domain, metazoan
Type: Domain
Description: The Eyes absent proteins are members of a conserved regulatory network implicated in the development of the eye, muscle, kidney and ear. Eyes absent is a nuclear transcription factor, acting through interaction with homeodomain-containing Sine oculis (also known as Six) proteins. Eyes absent is also a protein tyrosine phosphatase. It does not resemble the classical tyrosine phosphatases that use cysteine as a nucleophile and proceed by means of a thiol-phosphate intermediate. Rather, Eyes absent is the prototype for a class of protein tyrosine phosphatases that use a nucleophilic aspartic acid in a metal-dependent reaction. Furthermore, the phosphatase activity of Eyes absent contributes to its ability to induce eye formation in Drosophila. Thus, Eyes absent belongs to the phosphatase subgroup of the haloacid dehalogenase (HAD) superfamily and appears to act as a nuclear transcriptional coactivator with intrinsic phosphatase activity. The Eyes absent proteins contain a divergent 200-300 residue long N-terminal region and a conserved C-terminal domain of approximately 270 residues, the EYA domain, which is critical for activity and believed to participate in protein-protein interactions.
Protein Domain
Name: YukD-like
Type: Family
Description: YukD adopts a ubiquitin-like fold. Usually, ubiquitin covalently binds to proteins and flags them for degradation; however, conjugation assays have indicated that the classical YukD lacks the capacity for covalent bond formation with other proteins. In firmicutes (Gram-positive bacteria), the YukD-like proteins are standalone versions and the YukD-like family is associated in conserved gene neighbourhoods with members of the ESAT-6 export pathway (also called Type VII secretion system or ESX), suggesting a role for YukD in regulating this export system. In actinobacteria, YukD-like proteins are often fused to a transporter involved in the ESAT-6/ESX secretion pathway. Members of the YukD family are also associated in gene neighbourhoods with other enzymatic members of the ubiquitin signaling and degradation pathway, such as the E1, E2 and E3 trienzyme complex that catalyses ubiquitin transfer to substrates, and the JAB family metallopeptidases that are involved in its release. This suggests that a subset of the YukD family in bacteria are conjugated to target proteins and released from proteins as in the eukaryotic ubiquitin-mediated signaling and degradation pathway.
Protein Domain
Name: Phlebovirus glycoprotein G2, C-terminal domain
Type: Domain
Description: Heartland virus (HRTV) is an insect-borne virus found in America that can infect humans. It belongs to the newly defined family Phenuiviridae, order Bunyavirales. HRTV contains three single-stranded RNA segments (L, M, and S). The M segment of the virus encodes a polyprotein precursor that is cleaved into two glycoproteins, Gn and Gc. Gc is a fusion protein facilitating virus entry into host cells. G2 (also known as Gc) is necessary for optimal expression of glycoprotein G1 (also known as Gn) and for efficient production of virus-like particles, and possibly for cell infection, as G2 is a determinant of cell fusion. The phleboviral G2 fusion glycoprotein is both functionally and structurally analogous to the fusion glycoproteins of alphaviruses and flaviviruses; all of them recognize the DC-SIGN receptor to enable viral attachment. This domain is found at the C terminus of several Phlebovirus glycoprotein G2 sequences. Two prefusion conformation structures of bunyavirus Gc have been solved: one from Rift Valley fever virus (RVFV) in the Phenuiviridae family and one from Hantaan virus (HTNV) in the Hantaviridae family.
Protein Domain
Name: Glutaredoxin, PICOT-like
Type: Domain
Description: This entry represents a glutaredoxin (GRX) domain found in PICOT (also known as glutaredoxin-3 or PKC-interacting cousin of thioredoxin) and GRX-PICOT-like proteins. The non-PICOT members of this group contain only the GRX-like domain, whereas PICOT contains an N-terminal TRX-like domain followed by one to three GRX-like domains. PICOT from plants contains three repeats of the GRX-like domain, metazoan proteins (except for those of insects) have two repeats, and fungal sequences contain only one copy of the domain. PICOT is a protein that interacts with protein kinase C (PKC) theta, a calcium-independent PKC isoform selectively expressed in skeletal muscle and T lymphocytes. PICOT inhibits the activation of c-Jun N-terminal kinase and the transcription factors AP-1 and NF-kB, induced by PKC theta or T-cell activating stimuli. Both GRX and TRX domains of PICOT are required for its activity. Characterized non-PICOT proteins with this domain include CXIP1, a CAX-interacting protein in Arabidopsis thaliana, and PfGLP-1, a GRX-like protein from Plasmodium falciparum.
Protein Domain
Name: Nisin biosynthesis protein, NisC
Type: Family
Description: The LanC-like protein superfamily encompasses a highly divergent group of peptide-modifying enzymes, including the eukaryotic and bacterial lanthionine synthetase C-like proteins (LanC); subtilin biosynthesis protein SpaC from Bacillus subtilis; epidermin biosynthesis protein EpiC from Staphylococcus epidermidis; nisin biosynthesis protein NisC from Lactococcus lactis; GCR2 from Arabidopsis thaliana; and many others. The 3D structure of the lantibiotic cyclase from L. lactis has been determined by X-ray crystallography to 2.5 Å resolution. The globular structure is characterised by an all-α fold, in which an outer ring of helices envelops an inner toroid composed of 7 shorter, hydrophobic helices. This 7-fold hydrophobic periodicity has led several authors to claim various members of the family, including eukaryotic LanC-1 and GCR2, to be novel G protein-coupled receptors; some of these claims have since been corrected. This entry represents nisin biosynthesis protein NisC, which is believed to be involved in the cyclisation of the lantibiotic nisin: specifically, nisin contains 5 cyclic thioethers, which are installed by the NisC enzyme.
Protein Domain
Name: ZNRF4/RNF13/RNF167, PA domain
Type: Domain
Description: This PA domain is found in ZNRF4/RNF13/RNF167 from animals and RMR1-6 from Arabidopsis. These proteins contain a C3H2C3 RING finger and some of them are predicted to be membrane-anchored RING finger ubiquitin ligases. ZNRF4 is an ER-localized RING finger ubiquitin ligase and a regulator of calnexin stability and ER homeostasis. RNF13 is highly enriched in the ER and serves as a critical mediator for facilitating ER stress-induced apoptosis through the activation of the IRE1alpha-TRAF2-JNK signaling pathway. RMR1 (also known as ReMembR-H2) is a sorting receptor involved in transport of storage proteins to protein storage vacuoles. The significance of the PA domain to these proteins has not been ascertained. It may be a protein-protein interaction domain. At peptidase active sites, the PA domain may participate in substrate binding and/or promoting conformational changes, which influence the stability and accessibility of the site to substrate. It has been suggested that this domain forms a lid-like structure that covers the active site in active proteases, and is involved in protein recognition in vacuolar sorting receptors.
Protein Domain
Name: Fungal SNX3, PX domain
Type: Domain
Description: This entry represents the PX domain found in SNX3 (also known as Grd19) from fungi. Grd19 has been shown to associate with early endosomes through a PX domain-mediated interaction with phosphatidylinositol-3-phosphate (PI3P). Grd19 is involved in the localization of late Golgi membrane proteins in yeast. It associates with the retromer complex, a membrane coat multimeric complex required for endosomal retrieval of lysosomal hydrolase receptors to the Golgi, and functions as a cargo-specific adaptor for the retromer. The Phox Homology (PX) domain is a phosphoinositide (PI) binding module present in many proteins with diverse functions. Sorting nexins (SNXs) make up the largest group among PX domain-containing proteins. They are involved in regulating membrane traffic and protein sorting in the endosomal system. The PX domain of SNXs binds PIs and targets the protein to PI-enriched membranes. SNXs differ from each other in PI-binding specificity and affinity, and in the presence of other protein-protein interaction domains, which help determine subcellular localization and specific function in the endocytic pathway.
Protein Domain
Name: Helicase superfamily 1/2, ATP-binding domain, DinG/Rad3-type
Type: Domain
Description: Helicases have been classified in 5 superfamilies (SF1-SF5). All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B (Mg2+-binding aspartic acid) motifs. For the two largest groups, commonly referred to as SF1 and SF2, a total of seven characteristic motifs has been identified. These two superfamilies encompass a large number of DNA and RNA helicases from archaea, eubacteria, eukaryotes and viruses that seem to be active as monomers or dimers. RNA and DNA helicases are considered to be enzymes that catalyze the separation of double-stranded nucleic acids in an energy-dependent manner. The various structures of SF1 and SF2 helicases present a common core with two α-β RecA-like domains. The structural homology with the RecA recombination protein covers the five contiguous parallel beta strands and the tandem alpha helices. ATP binds to the amino proximal α-β domain, where the Walker A (motif I) and Walker B (motif II) are found. The N-terminal domain also contains motif III (S-A-T), which was proposed to participate in linking ATPase and helicase activities. The carboxy-terminal α-β domain is structurally very similar to the proximal one even though it is bereft of an ATP-binding site, suggesting that it may have originally arisen through gene duplication of the first one. Some members of helicase superfamilies 1 and 2 are listed below:
  • DEAD-box RNA helicases. The prototype of DEAD-box proteins is the translation initiation factor eIF4A. The eIF4A protein is an RNA-dependent ATPase which functions together with eIF4B as an RNA helicase.
  • DEAH-box RNA helicases. Mainly pre-mRNA-splicing factor ATP-dependent RNA helicases.
  • Eukaryotic DNA repair helicase RAD3/ERCC-2, an ATP-dependent 5'-3' DNA helicase involved in nucleotide excision repair of UV-damaged DNA.
  • Eukaryotic TFIIH basal transcription factor complex helicase XPB subunit. An ATP-dependent 3'-5' DNA helicase which is a component of the core-TFIIH basal transcription factor, involved in nucleotide excision repair (NER) of DNA and, when complexed to CAK, in RNA transcription by RNA polymerase II. It acts by opening DNA either around the RNA transcription start site or the DNA.
  • Eukaryotic ATP-dependent DNA helicase Q. A DNA helicase that may play a role in the repair of DNA that is damaged by ultraviolet light or other mutagens.
  • Bacterial and eukaryotic antiviral SKI2-like helicase. SKI2 has a role in the 3'-mRNA degradation pathway, repressing dsRNA virus propagation by specifically blocking translation of viral mRNAs, perhaps recognizing the absence of CAP or poly(A).
  • Bacterial DNA-damage-inducible protein G (DinG). A probable helicase involved in DNA repair and perhaps also replication.
  • Bacterial primosomal protein N' (PriA). PriA protein is one of seven proteins that make up the restart primosome, an apparatus that promotes assembly of replisomes at recombination intermediates and stalled replication forks.
  • Bacterial ATP-dependent DNA helicase recG. It has a critical role in recombination and DNA repair, helping process Holliday junction intermediates to mature products by catalyzing branch migration. It has a DNA unwinding activity characteristic of helicases with a 3' to 5' polarity.
  • A variety of DNA and RNA virus helicases and transcription factors.
This entry represents the ATP-binding domain found within bacterial DinG and eukaryotic Rad3 proteins, differing from other SF1 and SF2 helicases by the presence of a large insert after the Walker A motif.
Protein Domain
Name: Helicase superfamily 1/2, DinG/Rad3-like
Type: Family
Description: Helicases have been classified into five superfamilies (SF1-SF5). All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B (Mg2+-binding aspartic acid) motifs. For the two largest groups, commonly referred to as SF1 and SF2, a total of seven characteristic motifs has been identified. These two superfamilies encompass a large number of DNA and RNA helicases from archaea, eubacteria, eukaryotes and viruses that seem to be active as monomers or dimers. RNA and DNA helicases are considered to be enzymes that catalyze the separation of double-stranded nucleic acids in an energy-dependent manner. The various structures of SF1 and SF2 helicases present a common core with two α-β RecA-like domains. The structural homology with the RecA recombination protein covers the five contiguous parallel beta strands and the tandem alpha helices. ATP binds to the amino-proximal α-β domain, where the Walker A (motif I) and Walker B (motif II) motifs are found. The N-terminal domain also contains motif III (S-A-T), which was proposed to participate in linking ATPase and helicase activities. The carboxy-terminal α-β domain is structurally very similar to the proximal one even though it is bereft of an ATP-binding site, suggesting that it may have originally arisen through gene duplication of the first one. Some members of helicase superfamilies 1 and 2 are listed below:
  • DEAD-box RNA helicases. The prototype of DEAD-box proteins is the translation initiation factor eIF4A. The eIF4A protein is an RNA-dependent ATPase which functions together with eIF4B as an RNA helicase.
  • DEAH-box RNA helicases. Mainly pre-mRNA-splicing factor ATP-dependent RNA helicases.
  • Eukaryotic DNA repair helicase RAD3/ERCC-2, an ATP-dependent 5'-3' DNA helicase involved in nucleotide excision repair of UV-damaged DNA.
  • Eukaryotic TFIIH basal transcription factor complex helicase XPB subunit. An ATP-dependent 3'-5' DNA helicase which is a component of the core-TFIIH basal transcription factor, involved in nucleotide excision repair (NER) of DNA and, when complexed to CAK, in RNA transcription by RNA polymerase II. It acts by opening DNA either around the RNA transcription start site or the DNA.
  • Eukaryotic ATP-dependent DNA helicase Q. A DNA helicase that may play a role in the repair of DNA that is damaged by ultraviolet light or other mutagens.
  • Bacterial and eukaryotic antiviral SKI2-like helicase. SKI2 has a role in the 3'-mRNA degradation pathway, repressing dsRNA virus propagation by specifically blocking translation of viral mRNAs, perhaps recognizing the absence of CAP or poly(A).
  • Bacterial DNA-damage-inducible protein G (DinG). A probable helicase involved in DNA repair and perhaps also replication.
  • Bacterial primosomal protein N' (PriA). PriA protein is one of seven proteins that make up the restart primosome, an apparatus that promotes assembly of replisomes at recombination intermediates and stalled replication forks.
  • Bacterial ATP-dependent DNA helicase recG. It has a critical role in recombination and DNA repair, helping process Holliday junction intermediates to mature products by catalyzing branch migration. It has a DNA unwinding activity characteristic of helicases with a 3' to 5' polarity.
  • A variety of DNA and RNA virus helicases and transcription factors.
This entry includes bacterial DinG and eukaryotic Rad3 proteins, which differ from other SF1 and SF2 helicases by the presence of a large insert after the Walker A motif.
Protein Domain
Name: Bicarbonate transporter-like, transmembrane domain
Type: Domain
Description: Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes. They share ~25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains. This entry represents transmembrane segments of bicarbonate transporters and related proteins. In animals, this domain is found at the C terminus of many bicarbonate and similar multifunctional transporters. The crystal structure of Band 3 anion transport protein, the founding member of the solute carrier 4 (SLC4) family of bicarbonate transporters, has been solved. This protein functions both as a transporter that mediates electroneutral anion exchange across the cell membrane and as a structural protein. Boron transporters from plants and yeast comprise only transmembrane segments, as confirmed by the solved structures. In plants, boron is essential for maintaining the integrity of cell walls; the plant boron transporter mediates boron translocation from roots to shoots under boron limitation. Boron transporter 1 from Saccharomyces cerevisiae protects yeast cells from boron toxicity and is involved in the trafficking of proteins to the vacuole. The mechanism of its activity seems to be consistent with that described for other members of the family.
Protein Domain
Name: Clathrin light chain
Type: Family
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (which acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. Clathrin is a trimer composed of three heavy chains and three light chains, each monomer projecting outwards like a leg; this three-legged structure is known as a triskelion. The heavy chains form the legs, their N-terminal β-propeller regions extending outwards, while their C-terminal α-α-superhelical regions form the central hub of the triskelion. Peptide motifs can bind between the β-propeller blades. The light chains appear to have a regulatory role, and may help orient the assembly and disassembly of clathrin coats as they interact with hsc70 uncoating ATPase. Clathrin triskelia self-polymerise into a curved lattice by twisting individual legs together. The clathrin lattice forms around a vesicle as it buds from the TGN, plasma membrane or endosomes, acting to stabilise the vesicle and facilitate the budding process. The multiple blades created when the triskelia polymerise are involved in multiple protein interactions, enabling the recruitment of different cargo adaptors and membrane attachment proteins. This entry represents clathrin light chains, which are more divergent in sequence than the heavy chains. In higher eukaryotes, two genes encode distinct but related light chains, each of which can yield two separate forms via alternative splicing. In yeast there is a single light chain whose sequence is only distantly related to that of higher eukaryotes. Clathrin light chains have a conserved acidic N-terminal domain, a central coiled-coil domain and a conserved C-terminal domain.
Protein Domain
Name: Photosystem II PsbY
Type: Family
Description: Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection. This family represents the low molecular weight transmembrane protein PsbY found in PSII. In higher plants, two related PsbY proteins exist, PsbY-1 and PsbY-2, which appear to function as a heterodimer. In spinach and Arabidopsis, these two proteins arise from a single-copy nuclear gene that is processed in the chloroplast. By contrast, prokaryotic and organellar chromosomes encode a single PsbY protein, as found in cyanobacteria and red algae, indicating a duplication event in the evolution of higher plants. PsbY has two low manganese-dependent activities: a catalase-like activity and an L-arginine metabolising activity that converts L-arginine into ornithine and urea. In addition, a redox-active group is thought to be present in the protein. In cyanobacteria, PsbY deletion mutants have a slightly impaired PSII that is less capable of coping with low levels of calcium ions than the wild-type.
Protein Domain
Name: CRISPR-associated protein, Cas6-related
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. Members of this entry resemble previously described Cas6 proteins in having a C-terminal motif GXGXXXXXGXG, where the single X of each GXG is hydrophobic and the spacer XXXXX has at least one Lys or Arg. Examples are found in cas gene operons of CRISPR regions in Anabaena variabilis (strain ATCC 29413/PCC 7937), Leptospira interrogans, Gemmata obscuriglobus UQM 2246, and twice in Myxococcus xanthus (strain DK 1622). Oddly, an orphan member is found in Thiobacillus denitrificans (strain ATCC 25259), whose genome does not seem to contain other evidence of CRISPR repeats or cas genes.
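The C-terminal motif described above (GXGXXXXXGXG, with a hydrophobic residue inside each GXG and at least one Lys or Arg in the five-residue spacer) can be written as a regular expression. The following is a minimal sketch only: the set of residues counted as hydrophobic and the function name are assumptions made for illustration, not part of the entry definition, and the sketch does not check that the match lies near the C terminus.

```python
import re

# Assumption: residues treated as "hydrophobic" for this illustration.
HYDROPHOBIC = "AVILMFWY"

# G-X-G, a 5-residue spacer containing at least one K or R, then G-X-G.
# The lookahead checks that a K/R occurs within the next five residues.
CAS6_LIKE_MOTIF = re.compile(
    rf"G[{HYDROPHOBIC}]G(?=[A-Z]{{0,4}}[KR])[A-Z]{{5}}G[{HYDROPHOBIC}]G"
)

def find_cas6_like_motif(sequence):
    """Return (0-based start, matched text) for each motif occurrence."""
    return [(m.start(), m.group()) for m in CAS6_LIKE_MOTIF.finditer(sequence.upper())]

if __name__ == "__main__":
    # Hypothetical peptide, not a real Cas6 sequence.
    print(find_cas6_like_motif("MKTGLGAAKAAGVGEE"))
```

A spacer without Lys or Arg (for example GLGAAAAAGVG) is rejected by the lookahead, which mirrors the constraint stated in the description.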
Protein Domain
Name: Alcohol dehydrogenase, Ceratitis-type
Type: Family
Description: The short-chain dehydrogenases/reductases family (SDR) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type', or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate specific domains. Insect ADH is very different from yeast and mammalian ADHs. The enzyme from Drosophila lebanonensis (Fruit fly) has been characterised by protein analysis and was found to have a 254-residue protein chain with an acetyl-blocked N-terminal Met. Comparisons with the enzyme from other species reveal that they have diverged considerably. The structural variation within Drosophila is about as large as that for mammalian zinc-containing alcohol dehydrogenase. The crystal structure of the apo form of D. lebanonensis ADH has been solved to 1.9 Å resolution. Three structural features characterise the active site architecture: (i) a deep cavity, covered by a flexible 33-residue loop and an 11-residue C-terminal tail of the neighbouring subunit, whose hydrophobic surface is likely to increase the specificity of the enzyme for secondary aliphatic alcohols; (ii) the Ser-Tyr-Lys residues of the catalytic triad, which are known to be involved in enzymatic catalysis; and (iii) three well-ordered water molecules within hydrogen-bonding distance of side-chains of the catalytic triad, which may be significant for the proton release steps in the catalysis. A number of proteins within the SDR family share a strong phylogenetic relationship with insect ADH. Amongst these are Drosophila ADH-related protein (duplicate of Adh or Adh-dup); Drosophila fat body protein; and development-specific 25 kDa protein from Sarcophaga peregrina (Flesh fly). This group specifically identifies proteins related to Ceratitis capitata (Mediterranean fruit fly).
Protein Domain
Name: Competence operon G, ComGF
Type: Family
Description: This entry represents the competence protein ComGF (also known as ComG operon protein 6), which is required for transformation and DNA binding. Competence is the ability of a cell to take up exogenous DNA from its environment, resulting in transformation. It is widespread among bacteria and is probably an important mechanism for the horizontal transfer of genes. DNA usually becomes available by the death and lysis of other cells. Competent bacteria use components of extracellular filaments called type 4 pili to create pores in their membranes and pull DNA through the pores into the cytoplasm. This process, including the development of competence and the expression of the uptake machinery, is regulated in response to cell-cell signalling and/or nutritional conditions. The development of genetic competence in Bacillus subtilis is a highly regulated adaptive response to stationary-phase stress. For competence to develop, the transcriptional regulator, ComK, must be activated. ComK is required for the expression of genes encoding proteins that function in DNA uptake. In log-phase cultures, ComK is inactive in a complex with MecA and ClpC. The comS gene is induced in response to high culture cell density and nutritional stress and its product functions to release active ComK from the complex. ComK then stimulates the transcription initiation of its own gene as well as that of the late competence operons. The comG operon of Bacillus subtilis encodes seven membrane-associated proteins which function in binding of transforming DNA to the competent cell surface. ComGC, GD, GE and GG have N-terminal sequence motifs typical of type 4 pre-pilins and are processed by a pathway that requires the product of comC, also an essential competence gene. They form pilin-like structures that are localised to the cytoplasmic membrane and cell wall. The comG operon also encodes ComGF, a small integral membrane protein, as well as ComGA and ComGB, which are predicted to be a nucleotide binding protein and an integral membrane protein respectively. When strains lacking each of the seven proteins were created, all were found to be nontransformable and failed to bind transforming DNA to the cell surface.
Protein Domain
Name: Peptidase C16, coronavirus
Type: Domain
Description: This entry contains coronavirus (CoV) cysteine endopeptidases that belong to MEROPS peptidase family C16 (subfamilies C16A and C16B, clan CA). These peptidases are involved in viral polyprotein processing, releasing NSP1, NSP2 and NSP3 proteins, and they also function as deubiquitinating and deISG15ylating (interferon-induced gene 15) enzymes, disrupting the host antiviral immune response to facilitate viral proliferation and replication. Therefore, this is an important target for the development of antiviral treatments. All coronaviruses encode one or two accessory cysteine proteinases that recognise and process one or three sites in the amino-terminal half of the replicase polyprotein during assembly of the viral replication complex. MHV, HCoV and TGEV encode two accessory proteinases, called coronavirus papain-like proteinase 1 and 2 (PL1-PRO and PL2-PRO). IBV and SARS-CoV encode only one, called PL-PRO (PL2-PRO, conserved in all CoVs). The structures of both PL-PROs are similar and they also have restricted specificities. The PL1-PRO of TGEV cleaves the polyprotein between Nsp2-Nsp3 recognising the Lys-Met-Gly-Gly motif, and recognises Leu-Arg-Gly-Gly in ubiquitin (Ub), which shows that it is able to accommodate residues as different as Lys and Leu. In contrast, PL-PRO from SARS-CoV recognises Leu-Xaa-Gly-Gly (Xaa could be any amino acid) and cleaves peptide bonds between Nsp1-Nsp2, Nsp2-Nsp3 and between Nsp3-Nsp4. In Ub and ISG15 proteins, it recognises Leu-Arg-Gly-Gly motifs. SARS-CoV and SARS-CoV-2 are closely related but exhibit different host substrate preferences: SARS-CoV-2 PL-PRO preferentially cleaves the ubiquitin-like ISG15, whereas SARS-CoV PL-PRO predominantly targets ubiquitin chains. The peptidase family C16 domain is about 260 amino acids in length, and the solved structures show that it consists of thumb, palm, and fingers subdomains. The thumb is composed of six α-helices and a small β-hairpin; the fingers subdomain is made of six β-strands and two α-helices and includes a zinc binding site, in which the zinc ion is coordinated by four cysteine residues. Zinc binding is essential for structural integrity and protease activity, and the zinc-binding region is the part whose conformation varies most between different PL-PRO structures. The palm subdomain is composed of six β-strands and includes the catalytic residues Cys-His-Asp, located at the interface between the thumb and palm subdomains.
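The Leu-Xaa-Gly-Gly recognition motif described for SARS-CoV PL-PRO can be located in a sequence with a short scan. This is a rough sketch under stated assumptions: the function name and the example peptide are hypothetical, and reporting the scissile bond immediately after the second Gly is an assumption of this sketch rather than something stated in the entry text. A motif match alone does not show that a site is actually cleaved.

```python
import re

# Lookahead-wrapped pattern so that overlapping candidate sites are all reported.
LXGG = re.compile(r"(?=(L[A-Z]GG))")

def find_lxgg_sites(sequence):
    """Return (0-based motif start, motif text, 0-based index of the residue
    following the motif) for each Leu-Xaa-Gly-Gly occurrence.

    Assumption: the third element marks a putative cleavage position
    directly after the second Gly.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1), m.start() + 4) for m in LXGG.finditer(seq)]

if __name__ == "__main__":
    # Hypothetical peptide containing two candidate sites.
    for start, motif, after in find_lxgg_sites("MKLRGGLKGGA"):
        print(start, motif, after)
```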
Protein Domain
Name: TRIM32, B-box type 1 zinc finger
Type: Domain
Description: TRIM32 (also known as 72 kDa Tat-interacting protein, or zinc finger protein HT2A, or BBS11) is an E3 ubiquitin-protein ligase that promotes degradation of several targets, including actin, PIASgamma, Abl interactor 2, dysbindin, X-linked inhibitor of apoptosis (XIAP) and p73 transcription factor. It plays important roles in neuronal differentiation of neural progenitor cells, as well as in controlling cell fate in skeletal muscle progenitor cells. It reduces PI3K-Akt-FoxO signalling in muscle atrophy by promoting plakoglobin-PI3K dissociation. It also functions as a pluripotency-reprogramming roadblock that facilitates cellular transition towards differentiation via modulating the levels of Oct4 and cMyc. Moreover, TRIM32 is an intrinsic influenza A virus (IAV) restriction factor which senses and targets the polymerase basic protein 1 (PB1) for ubiquitination and protein degradation. It also plays a significant role in mediating the biological activity of the HIV-1 Tat protein in vivo, binding specifically to the activation domain of HIV-1 Tat; it can also interact with the HIV-2 and EIAV Tat proteins. TRIM32 also regulates myoblast proliferation by controlling turnover of NDRG2 (N-myc downstream-regulated gene) and it negatively regulates tumour suppressor p53 to promote tumorigenesis. It also facilitates degradation of MYCN on spindle poles and induces asymmetric cell division in human neuroblastoma cells. In addition, TRIM32 plays important roles in regulation of hyperactivities and positively regulates the development of anxiety and depression disorders induced by chronic stress. It also plays a role in regeneration by affecting satellite cell cycle progression via modulation of the SUMO ligase PIASy (PIAS4). Defects in TRIM32 lead to limb-girdle muscular dystrophy type 2H (LGMD2H), sarcotubular myopathies (STM) and Bardet-Biedl syndrome. TRIM32 belongs to the C-VII subclass of the TRIM (tripartite motif)-NHL family, which is defined by N-terminal RBCC (RING, Bbox, and coiled coil) domains, including three consecutive zinc-binding domains, a RING finger, Bbox1, and a coiled coil domain, as well as an NHL (named after proteins NCL-1, HT2A and Lin-41 that contain repeats folded into a six-bladed β-propeller) repeat domain positioned C-terminal to the RBCC domain. This entry represents the type 1 B-box (Bbox1) zinc finger, which is characterised by a C6H2 zinc-binding consensus motif.
Protein Domain
Name: Thermosome, archaeal
Type: Family
Description: Members of this eukaryotic family are part of the group II chaperonin complex called CCT (chaperonin containing TCP-1 or Tailless Complex Polypeptide 1) or TRiC. Chaperonins are involved in productive folding of proteins. They share a common general morphology, a double toroid of two stacked rings. The archaeal equivalent group II chaperonin is often called the thermosome. Both the thermosome and the TCP-1 family of proteins are weakly, but significantly, related to the cpn60/groEL chaperonin family. The TCP-1 protein was first identified in mice where it is especially abundant in testis but present in all cell types. It has since been found and characterised in many other animal species, as well as in yeast, plants and protists. The TCP1 complex has a double-ring structure with central cavities where protein folding takes place. TCP-1 is a highly conserved protein of about 60 kDa (556 to 560 residues) which participates in a hetero-oligomeric 900 kDa double-torus shaped particle with 6 to 8 other different, but homologous, subunits. These subunits, the chaperonin containing TCP-1 (CCT) subunit beta, gamma, delta, epsilon, zeta and eta, are evolutionarily related to TCP-1 itself. Non-native proteins are sequestered inside the central cavity and folding is promoted by using energy derived from ATP hydrolysis. The CCT is known to act as a molecular chaperone for tubulin, actin and probably some other proteins. Thermosome (or cpn60) is the name given to the archaeal rather than eukaryotic form of the group II chaperonin (counterpart to the group I chaperonin, GroEL/GroES, in bacteria), a toroidal, ATP-dependent molecular chaperone that assists in the folding or refolding of nascent or denatured proteins. Cpn60 consists of two stacked octameric rings, which are composed of one or two different subunits. Various homologous subunits, one to five per archaeal genome, may be designated alpha, beta, etc., but phylogenetic analysis does not show distinct alpha subunit and beta subunit lineages traceable to ancient paralogs. TF55 from thermophilic bacteria is also included in this entry.
Protein Domain
Name: NSP3, second ubiquitin-like (Ubl) domain, coronavirus
Type: Domain
Description: Non-structural protein NSP3 (also known as nsp3) is a multi-domain protein, the largest product of ORF1a, which encodes the polyprotein 1a/1ab (pp1a/1ab). NSP3 comprises up to 16 different domains and regions, whose organisation differs between coronaviruses (CoV). However, eight domains and two transmembrane regions are conserved in all known CoVs: the ubiquitin-like domain 1 (Ubl1), the Glu-rich acidic domain (hypervariable region), a macrodomain (X domain), the ubiquitin-like domain 2 (Ubl2), the papain-like protease 2 (PL2pro, depending on the CoV there are one or two PLpro), the NSP3 ectodomain (3Ecto, also called "zinc-finger domain"), as well as the domains Y1 and CoV-Y of unknown function. NSP3 is released from pp1a/1ab by the papain-like protease domain(s), which is (are) part of NSP3 itself. NSP3 is an essential component of the replication/transcription complex (RTC) as it acts as a scaffold protein to interact with itself and to bind other viral NSPs or host proteins. The RTC associates with host ER membranes producing convoluted membranes and double-membrane vesicles. It is also involved in post-translational modifications of host proteins to antagonise the host innate immune response. Nsp3 comprises various domains of functional and structural importance for virus replication, the organization of which differs between CoV genera, due to duplication or absence of some domains. Two ubiquitin-like domains, Ubl1 and Ubl2 (Nsp3a and the N-terminal domain of Nsp3d), exist within Nsp3 of all CoVs. The known functional roles of Nsp3a Ubl in CoVs are related to single-stranded RNA (ssRNA) binding and interacting with the nucleocapsid (N) protein. Nsp3d Ubl is immediately adjacent to the N terminus of the PLpro (or PL2pro) domain in CoV polyproteins, and it may play a critical role in protease regulation and stability as well as in viral infection. In addition to the four β-strands and two α-helices that are common to ubiquitin-like folds, the Nsp3a Ubl domain contains two short helices. The Nsp3d Ubl domain comprises five β-strands, one α-helix, and one 3(10)-helix. This entry represents a domain covering the entire CoV Nsp3a and Nsp3d second Ubl domain. Its exact function is unclear, but it might act as a modulator helping the PL2pro recognise its specific targets during coronavirus infection.
Protein Domain
Name: Peptidase S8A, PatG
Type: Family
Description: Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Limited proteolysis of most large protein precursors is carried out in vivo by the subtilisin-like pro-protein convertases. Many important biological processes such as peptide hormone synthesis, viral protein processing and receptor maturation involve proteolytic processing by these enzymes. The subtilisin-serine protease (SRSP) family hormone and pro-protein convertases (furin, PC1/3, PC2, PC4, PACE4, PC5/6, and PC7/7/LPC) act within the secretory pathway to cleave polypeptide precursors at specific basic sites, generating their biologically active forms. Serum proteins, pro-hormones, receptors, zymogens, viral surface glycoproteins and bacterial toxins, amongst others, are activated by this route. The SRSPs share the same domain structure, including a signal peptide, the pro-peptide, the catalytic domain, the P/middle or homo B domain, and the C terminus. This entry contains serine endopeptidases belonging to the MEROPS peptidase family S8 (subtilisin family, clan SB), which include PatA and the C-terminal peptidase domain of the PatG protein of Prochloron didemni, TenA and a region of TenG from Nostoc spongiaeforme var. tenue, etc. These peptidases are associated with the maturation of various members of the cyanobactin family of ribosomally produced, heavily modified bioactive metabolites.
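The relationship between the linear order of the catalytic triad and the clan (HDS in PA, DHS in SB, SDH in SC) can be expressed as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the residue numbers in the usage example are supplied by the caller as assumptions, not taken from this entry.

```python
# Linear order of the catalytic residues -> clan, as listed in the description.
CLAN_BY_TRIAD_ORDER = {"HDS": "PA", "DHS": "SB", "SDH": "SC"}

def triad_order(positions):
    """Given {'S': pos, 'D': pos, 'H': pos}, return the residue letters
    sorted by their position in the sequence, e.g. 'DHS'."""
    if set(positions) != {"S", "D", "H"}:
        raise ValueError("expected exactly the keys 'S', 'D' and 'H'")
    return "".join(sorted(positions, key=positions.get))

def clan_for_triad(positions):
    """Return the clan suggested by the triad order, or None if the order
    is not one of the three listed above."""
    return CLAN_BY_TRIAD_ORDER.get(triad_order(positions))

if __name__ == "__main__":
    # Illustrative residue numbers (assumed for this example).
    print(clan_for_triad({"D": 32, "H": 64, "S": 221}))   # subtilisin-like order
    print(clan_for_triad({"H": 57, "D": 102, "S": 195}))  # chymotrypsin-like order
```

The order alone is only a hint at clan membership; as the description notes, the arrangements "commonly" reflect clan relationships rather than defining them.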
Protein Domain
Name: E3 ubiquitin-protein ligase ZNRF3, Zinc finger, RING-type
Type: Domain
Description: This entry represents the RING-type zinc finger domain of E3 ubiquitin-protein ligase ZNRF3 (Zinc/RING finger protein 3), a transmembrane enzyme and homologue of Ring finger protein 43 (RNF43). It is predominantly found in vertebrates. In humans, ZNRF3 acts as a negative regulator of the Wnt signaling pathway by mediating the ubiquitination and subsequent degradation of Wnt receptor complex components Frizzled and LRP6. ZNRF3 also functions as a tumour suppressor by restricting the size of the intestinal stem cell zone. In frogs (Xenopus), ZNRF3 and RNF43 were seen to play a key role in limb specification, constituting a master switch along with RSPO2, which may have implications for regenerative medicine. Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.
Protein Domain
Name: TFIIS/LEDGF domain superfamily
Type: Homologous_superfamily
Description: This superfamily is composed of the Transcription factor IIS (TFIIS) and the Lens epithelium-derived growth factor (LEDGF) domains.
Transcription factor IIS (TFIIS) is a transcription elongation factor that increases the overall transcription rate of RNA polymerase II by reactivating transcription elongation complexes that have arrested transcription. The three structural domains of TFIIS are conserved from yeast to human. The 80 or so N-terminal residues form a protein interaction domain containing a conserved motif, which has been called the LW motif because of the invariant leucine and tryptophan residues it contains. This N-terminal domain is not required for transcriptional activity. A similar sequence has been identified in other transcription factors and in proteins that are predominantly nuclear localized, but the domain is also found in proteins not directly involved in transcription. This domain is found in (amongst others):
  • MED26 (also known as CRSP70 and ARC70), a subunit of the Mediator complex, which is required for the activity of the enhancer-binding protein Sp1.
  • Elongin A, a subunit of a transcription elongation factor previously known as SIII. It increases the rate of transcription by suppressing transient pausing of the elongation complex.
  • PPP1R10, a nuclear regulatory subunit of protein phosphatase 1 that was previously known as p99, FB19 or PNUTS.
  • IWS1, which is thought to function in both transcription initiation and elongation.
The TFIIS N-terminal domain is a compact four-helix bundle. The hydrophobic core residues of helices 2, 3, and 4 are well conserved among TFIIS domains, although helix 1 is less conserved.
Lens epithelium-derived growth factor (LEDGF), also known as transcriptional co-activator p75, is a chromatin-associated protein that protects cells from stress-induced apoptosis. It is the binding partner of HIV-1 integrase in human cells. The integrase binding domain (IBD) of LEDGF is a compact right-handed bundle composed of five α-helices. The residues essential for the interaction with the integrase are present in the inter-helical loop regions of the bundle structure. The integrase binding domain is not unique to LEDGF, as a second human protein, hepatoma-derived growth factor-related protein 2 (HRP2), contains a homologous sequence.
Protein Domain
Name: Secretoglobin superfamily
Type: Homologous_superfamily
Description: Secretoglobins are relatively small, secreted, disulphide-bridged dimeric proteins with encoding genes sharing substantial sequence similarity. Members of this family include:
  • Uteroglobin, a mammalian, steroid-inducible, secreted anti-inflammatory/immunomodulatory protein.
  • Mammaglobin, expressed in ovarian cancer cells.
  • Lipophilin B, which exists as a complex with mammary-specific mammaglobin A.
  • Clara cell 17 kDa protein, which inhibits phospholipase A2 and papain, and also binds to progesterone.
  • Allergen Fel d 1 (Felis silvestris catus (Cat) allergen 1) chains 1 and 2, a tetrameric glycoprotein formed by two heterodimers that elicits IgE responses in people with allergy to cats.
Secretoglobin proteins have a four-helical structure, and in the case of uteroglobin, form homodimers, whereas allergen Fel d 1 forms a tetramer of two heterodimers (chains 1 and 2). The conservation of this primary and quaternary structure indicates that the genome of the eutherian common ancestor of cats, rodents, and primates contained a similar gene pair.
Uteroglobin (blastokinin or Clara cell protein CC10) is a mammalian steroid-inducible secreted protein originally isolated from the uterus of rabbits during early pregnancy. The mucosal epithelia of several organs that communicate with the external environment express uteroglobin. Its tissue-specific expression is regulated by steroid hormones, and is augmented in the uterus by non-steroidal prolactin. Uteroglobin may be a multi-functional protein with anti-inflammatory/immunomodulatory properties, acting to inhibit phospholipase A2 activity, and binding to (and possibly sequestering) several hydrophobic ligands such as progesterone, retinols, polychlorinated biphenyls, phospholipids and prostaglandins. In addition, uteroglobin has anti-chemotactic, anti-allergic, anti-tumourigenic and embryo growth-stimulatory properties. Uteroglobin may have a homeostatic role against oxidative damage, inflammation, autoimmunity and cancer. However, the true biological function of uteroglobin is poorly understood.
Uteroglobin consists of a disulphide-linked homodimer with a large hydrophobic pocket located between the two monomers. Each monomer is composed of four helices that do not form a canonical four-helix bundle motif but rather a boomerang-shaped structure in which helices H1, H3, and H4 are able to bind a homodimeric partner. The hydrophobic pocket binds steroids, particularly progesterone, with high specificity. Uteroglobin is a member of the secretoglobin superfamily.
Protein Domain
Name: CD209-like, C-type lectin-like domain
Type: Domain
Description: This entry represents the C-type lectin-like domain (CTLD) of the type found in human dendritic cell (DC)-specific intercellular adhesion molecule 3-grabbing non-integrin (DC-SIGN), also known as CD209 antigen, and the related receptor, DC-SIGN receptor (DC-SIGNR), also known as CD209 antigen-like protein 1 or C-type lectin domain family 4 member M. This group also contains proteins similar to human hepatic asialoglycoprotein receptor (ASGP-R) and langerin (also known as CD207 or C-type lectin domain family 4 member K). These proteins are type II membrane proteins with a CTLD ectodomain. CTLD refers to a domain homologous to the carbohydrate-recognition domains (CRDs) of the C-type lectins.
DC-SIGN is thought to mediate the initial contact between dendritic cells and resting T cells, and may also mediate the rolling of DCs on epithelium. DC-SIGN and DC-SIGNR bind to oligosaccharides present on human tissues, as well as on pathogens including parasites, bacteria, and viruses. DC-SIGN and DC-SIGNR bind to HIV, enhancing viral infection of T cells. DC-SIGN and DC-SIGNR are homotetrameric, and contain four CTLDs stabilized by a coiled coil of alpha helices.
The hepatic ASGP-R is an endocytic recycling receptor which binds and internalizes desialylated glycoproteins having terminal galactose or N-acetylgalactosamine residues on their N-linked carbohydrate chains, via the clathrin-coated pit mediated endocytic pathway, and delivers them to lysosomes for degradation. It has been proposed that glycoproteins bearing terminal 'Sia (sialic acid) alpha2,6GalNAc' and 'Sia alpha2,6Gal' are endogenous ligands for ASGP-R, and that ASGP-R participates in regulating the relative concentration of serum glycoproteins bearing alpha 2,6-linked Sia. The human ASGP-R is a hetero-oligomer composed of two subunits, both of which are found within this group.
Langerin is expressed in a subset of dendritic leukocytes, the Langerhans cells (LC). Langerin induces the formation of Birbeck Granules (BGs) and associates with these BGs following internalization. Langerin binds, in a calcium-dependent manner, to glyco-conjugates containing mannose and related sugars, mediating their uptake and degradation. Langerin molecules oligomerize as trimers with three CTLDs held together by a coiled coil of alpha helices.
Protein Domain
Name: Secretoglobin
Type: Family
Description: Secretoglobins are relatively small, secreted, disulphide-bridged dimeric proteins with encoding genes sharing substantial sequence similarity. Members of this family include:
  • Uteroglobin, a mammalian, steroid-inducible, secreted anti-inflammatory/immunomodulatory protein.
  • Mammaglobin, expressed in ovarian cancer cells.
  • Lipophilin B, which exists as a complex with mammary-specific mammaglobin A.
  • Clara cell 17 kDa protein, which inhibits phospholipase A2 and papain, and also binds to progesterone.
  • Allergen Fel d 1 (Felis silvestris catus (Cat) allergen 1) chains 1 and 2, a tetrameric glycoprotein formed by two heterodimers that elicits IgE responses in people with allergy to cats.
Secretoglobin proteins have a four-helical structure, and in the case of uteroglobin, form homodimers, whereas allergen Fel d 1 forms a tetramer of two heterodimers (chains 1 and 2). The conservation of this primary and quaternary structure indicates that the genome of the eutherian common ancestor of cats, rodents, and primates contained a similar gene pair.
Uteroglobin (blastokinin or Clara cell protein CC10) is a mammalian steroid-inducible secreted protein originally isolated from the uterus of rabbits during early pregnancy. The mucosal epithelia of several organs that communicate with the external environment express uteroglobin. Its tissue-specific expression is regulated by steroid hormones, and is augmented in the uterus by non-steroidal prolactin. Uteroglobin may be a multi-functional protein with anti-inflammatory/immunomodulatory properties, acting to inhibit phospholipase A2 activity, and binding to (and possibly sequestering) several hydrophobic ligands such as progesterone, retinols, polychlorinated biphenyls, phospholipids and prostaglandins. In addition, uteroglobin has anti-chemotactic, anti-allergic, anti-tumourigenic and embryo growth-stimulatory properties. Uteroglobin may have a homeostatic role against oxidative damage, inflammation, autoimmunity and cancer. However, the true biological function of uteroglobin is poorly understood.
Uteroglobin consists of a disulphide-linked homodimer with a large hydrophobic pocket located between the two monomers. Each monomer is composed of four helices that do not form a canonical four-helix bundle motif but rather a boomerang-shaped structure in which helices H1, H3, and H4 are able to bind a homodimeric partner. The hydrophobic pocket binds steroids, particularly progesterone, with high specificity. Uteroglobin is a member of the secretoglobin superfamily.
Protein Domain
Name: Pentatricopeptide repeat
Type: Repeat
Description: This entry represents the PPR repeat.
Pentatricopeptide repeat (PPR) proteins are characterised by tandem repeats of a degenerate 35 amino acid motif. PPR proteins are sequence-specific RNA-binding proteins that are involved in multiple aspects of RNA metabolism. They can bind diverse sequences, which accounts for the variability in their functions. Most have roles in mitochondria or plastids. PPR repeats were discovered while screening Arabidopsis proteins for those predicted to be targeted to mitochondria or chloroplasts. Some of these proteins have been shown to play a role in post-transcriptional processes within organelles. Plant genomes have between one hundred and five hundred PPR genes, whereas non-plant genomes encode two to six PPR proteins.
The plant PPR protein family has been divided into two subfamilies on the basis of their motif content and organisation.
The crystal structure of maize chloroplast PPR10 has been reported. The nineteen repeats of PPR10 are assembled into a right-handed superhelical spiral. PPR10 forms a homodimer and exhibits considerable conformational changes upon binding to its ssRNA target, with six nucleotides being specifically recognized by six corresponding PPR10 repeats.
Protein Domain
Name: Glycosyl transferase, family 13
Type: Family
Description: The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These enzymes catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates (EC 2.4.1.-) and related proteins into distinct sequence-based families has been described. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'.
Alpha-1,3-mannosyl-glycoprotein beta-1,2-N-acetylglucosaminyltransferase (GNT-I, GLCNAC-T I) transfers N-acetyl-D-glucosamine from UDP to high-mannose glycoprotein N-oligosaccharide. This is an essential step in the synthesis of complex or hybrid-type N-linked oligosaccharides. The enzyme is an integral membrane protein localized to the Golgi apparatus, and is probably distributed in all tissues. Protein O-linked-mannose beta-1,2-N-acetylglucosaminyltransferase 1 (POMGNT1, GNT-I.2) participates in O-mannosyl glycosylation by catalyzing the addition of N-acetylglucosamine to O-linked mannose on glycoproteins. These proteins are members of the glycosyl transferase family 13.
Protein Domain
Name: BSD domain
Type: Domain
Description: The BSD domain is a domain of about 60 residues named after the BTF2-like transcription factors, Synapse-associated proteins and DOS2-like proteins in which it is found. It is also found in several hypothetical proteins. The BSD domain occurs in one or two copies in a variety of species ranging from primitive protozoa to humans. It can be found associated with other domains such as the BTB domain or the U-box in multidomain proteins. The function of the BSD domain is unknown.
Secondary structure prediction indicates the presence of three predicted alpha helices, which probably form a three-helical bundle in small domains. The third predicted helix contains neighbouring phenylalanine and tryptophan residues, less common amino acids that are invariant in all the BSD domains identified and that are the most striking sequence features of the domain.
Some proteins known to contain one or two BSD domains are listed below:
  • Mammalian TFIIH basal transcription factor complex p62 subunit (GTF2H1).
  • Yeast RNA polymerase II transcription factor B 73 kDa subunit (TFB1), the homologue of BTF2.
  • Yeast DOS2 protein. It is involved in single-copy DNA replication and ubiquitination.
  • Drosophila synapse-associated protein SAP47.
  • Mammalian SYAP1.
  • Arabidopsis thaliana (Mouse-ear cress) TFB1-1 (TFB1A) and TFB1-3 (TFB1C).
Protein Domain
Name: Type III secretion system substrate exporter
Type: Family
Description: Salmonella and related proteobacteria secrete large amounts of proteins into the culture medium. The major secreted proteins are either flagellar proteins or virulence factors, secreted through the flagellar or virulence export structures respectively. Both secretion systems penetrate the inner and outer membranes and their components bear substantial sequence similarity. The flagellum and the needle-like pilus also look fairly similar to each other. The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. It is believed that the type III flagellar and pilus inner membrane proteins are used as structural moieties in a complex with several other subunits. One such set of inner membrane proteins, labeled "S" here for nomenclature purposes, includes the Salmonella and Shigella SpaS, the Yersinia YscU, Rhizobium Y4YO, and the Erwinia HrcU genes, Salmonella FlhB and Escherichia coli EscU.
Many of the proteins in this entry undergo autocatalytic cleavage promoted by cyclization of a conserved asparagine. These proteins belong to the MEROPS peptidase family N6.
Protein Domain
Name: Herpesvirus glycoprotein L, N-terminal domain superfamily
Type: Homologous_superfamily
Description: Herpesviruses are enveloped by a lipid bilayer that contains at least a dozen glycoproteins. The virion surface glycoproteins mediate recognition of susceptible cells and promote fusion of the viral envelope with the cell membrane, leading to virus entry. No single glycoprotein associated with the virion membrane has been identified as the fusogen.
Glycoprotein L (gL) forms a non-covalently linked heterodimer with glycoprotein H (gH). This heterodimer is essential for virus-cell and cell-cell fusion since the association of gH and gL is necessary for correct localisation of gH to the virion or cell surface. gH anchors the heterodimer to the plasma membrane through its transmembrane domain. gL lacks a transmembrane domain and is secreted from cells when expressed in the absence of gH.
This entry represents Herpesvirus glycoprotein L (gL), which is a virion-associated envelope glycoprotein. Heterodimer formation between gH and gL has been demonstrated in both virions and infected cells. Heterodimer formation between gL and gH is important for the proper folding of gH and its insertion into the membrane because the anti-gH conformation-dependent monoclonal antibodies (mAbs) 53S and LP11 bind gH only when gL is present.
The Legume Information System (LIS) is a research project of the USDA-ARS: Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom