Search results 11701 to 11800 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: Transglutaminase, C-terminal domain superfamily
Type: Homologous_superfamily
Description: Transglutaminases catalyse the post-translational modification of proteins at glutamine residues, with formation of isopeptide bonds. Members of the transglutaminase family usually have three domains: N-terminal, middle and C-terminal. The middle domain is usually well conserved, but family members can display major differences in their N- and C-terminal domains, although their overall structure is conserved. This entry represents the C-terminal domain found in transglutaminases, which consists of an immunoglobulin-like β-sandwich of seven strands in two sheets with a Greek key topology. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilising the fibrin clot. Protein-glutamine gamma-glutamyltransferases are calcium-dependent enzymes that catalyse the cross-linking of proteins by promoting the formation of isopeptide bonds between the γ-carboxamide group of a glutamine in one polypeptide chain and the ε-amino group of a lysine in a second polypeptide chain. TGases also catalyse the conjugation of polyamines to proteins. Proteins containing this domain also include Protein 4.2 (also known as Epb42), which is one of the most abundant protein components of the erythrocyte membrane. Protein 4.2 shares significant sequence homology with transglutaminases, but lacks the catalytic triad residues required for transglutaminase activity. The complete or nearly complete absence of protein 4.2 is associated with an atypical form of hereditary spherocytosis (HS).
Protein Domain
Name: PSTPIP1, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of proline-serine-threonine phosphatase-interacting protein 1 (PSTPIP1). PSTPIP1 belongs to the PCH family. It interacts with Wiskott-Aldrich syndrome protein (WASP) and PTPN12, which are important regulators of the cytoskeleton and cell migration, suggesting that PSTPIP1 functions in these pathways. PSTPIP1 has been identified as a component of the leukocyte uropod that regulates endocytosis and cell migration. It also controls extracellular matrix degradation and filopodia formation in macrophages. It interacts with pyrin, a protein that associates with the cytoskeleton in myeloid/monocytic cells and modulates IL-1beta processing, NF-kappaB activation, and apoptosis. Mutations in the PSTPIP1 gene have been linked to PAPA syndrome, an inflammatory disease. Pombe Cdc15 homology (PCH) family proteins were initially identified as adaptor proteins involved in the regulation of cytokinesis and actin dynamics. They share a similar domain architecture, consisting of an N-terminal FCH domain followed by a coiled coil (CC) region and by one or two C-terminal SH3 domains. However, in some family members the SH3 domain is absent (FCHO1, FCHO2 and PSTPIP2) or there are tissue-specific alternatively spliced isoforms with and without an SH3 domain (CIP4b, CIP4c, CIP4V, Fbp17b). PCH family proteins interact with receptors, adaptors, enzymes and structural proteins to regulate their localisation and activity. Through these interactions, PCH proteins regulate cell morphology and motility, organelle integrity, protein trafficking and the organisation of the actin cytoskeleton.
Protein Domain
Name: Hemerythrin-like superfamily
Type: Homologous_superfamily
Description: The hemerythrin family is composed of hemerythrin proteins found in invertebrates, and a broader collection of bacterial and archaeal homologues. Hemerythrin is an oxygen-binding protein found in the vascular system and coelomic fluid, or in muscles (myohemerythrin), in invertebrates. Many of the homologous proteins found in prokaryotes are multi-domain proteins with signal-transducing domains such as the GGDEF diguanylate cyclase domain and the methyl-accepting chemotaxis protein (MCP) signalling domain. Most hemerythrins are oxygen-carriers with a bound non-haem iron, but at least one example is a cadmium-binding protein, apparently with a role in sequestering toxic metals rather than in binding oxygen. The prokaryote with the most instances of this domain is Magnetococcus sp. MC-1, a magnetotactic bacterium. Hemerythrins and myohemerythrins are small proteins of about 110 to 129 amino acid residues that bind two iron atoms. They are left-twisted 4-α-helical bundles, which provide a hydrophobic pocket where dioxygen binds as a peroxo species, interacting with adjacent aliphatic side chains via van der Waals forces. In both hemerythrins and myohemerythrins, the active centre is a binuclear iron complex, bound directly to the protein via 7 amino acid side chains: 5 His, 1 Glu and 1 Asp. Ovohemerythrin, a yolk protein from the leech Theromyzon tessulatum, seems to belong to this family of proteins; it may play a role in the detoxification of free iron after a blood meal.
Protein Domain
Name: Transglutaminase, C-terminal
Type: Domain
Description: Transglutaminases catalyse the post-translational modification of proteins at glutamine residues, with formation of isopeptide bonds. Members of the transglutaminase family usually have three domains: N-terminal, middle and C-terminal. The middle domain is usually well conserved, but family members can display major differences in their N- and C-terminal domains, although their overall structure is conserved. This entry represents the C-terminal domain found in transglutaminases, which consists of an immunoglobulin-like β-sandwich of seven strands in two sheets with a Greek key topology. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilising the fibrin clot. Protein-glutamine gamma-glutamyltransferases are calcium-dependent enzymes that catalyse the cross-linking of proteins by promoting the formation of isopeptide bonds between the γ-carboxamide group of a glutamine in one polypeptide chain and the ε-amino group of a lysine in a second polypeptide chain. TGases also catalyse the conjugation of polyamines to proteins. Proteins containing this domain also include Protein 4.2 (also known as Epb42), which is one of the most abundant protein components of the erythrocyte membrane. Protein 4.2 shares significant sequence homology with transglutaminases, but lacks the catalytic triad residues required for transglutaminase activity. The complete or nearly complete absence of protein 4.2 is associated with an atypical form of hereditary spherocytosis (HS).
Protein Domain
Name: Cytochrome P450, mitochondrial
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents mitochondrial cytochrome P450 proteins. In the mitochondrial system, cytochrome P450 can be reduced by the 2Fe-2S iron-sulphur protein adrenodoxin, which can accept electrons from NADPH-dependent adrenodoxin reductase. Both adrenodoxin and adrenodoxin reductase are soluble, and located in the mitochondrial matrix.
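The CYP naming rule described in this entry (CYP, a family number, a subfamily letter, then a protein number) can be illustrated with a small parser. This is only a sketch: the function name `parse_cyp_name` and the `CypName` type are hypothetical, and accepting more than one subfamily letter is an assumption that goes beyond the single letter stated in the description.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    """Parts of a cytochrome P450 name such as CYP3A4 (hypothetical helper type)."""
    family: int      # number after "CYP"
    subfamily: str   # letter(s) after the family number
    protein: int     # trailing number identifying the individual protein


# CYP + family number + subfamily letter(s) + protein number.
# Allowing several letters is an assumption, not something the entry states.
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp_name(name: str) -> Optional[CypName]:
    """Split a full CYP protein name into its parts, or return None if it does not fit."""
    match = _CYP_PATTERN.match(name.strip().upper())
    if match is None:
        return None
    family, subfamily, protein = match.groups()
    return CypName(int(family), subfamily, int(protein))


print(parse_cyp_name("CYP3A4"))  # family 3, subfamily A, protein 4
print(parse_cyp_name("CYP55"))   # a family-only name does not fit the full pattern
```

A family-level name such as CYP55 returns `None` here because it carries no subfamily or protein part; a caller that needs family-level names would have to relax the pattern.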
Protein Domain
Name: STAT transcription factor homologue, coiled coil
Type: Domain
Description: The STAT protein (Signal Transducers and Activators of Transcription) family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors; hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signalling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation. Signalling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at the specific receptor tyrosine residues, which then serve as docking sites for STATs and other signalling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs, on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerise, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers. The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids. The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggests that this family arose from a single primordial gene. STATs share 6 structurally and functionally conserved domains: an N-terminal domain (ND) that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain (CCD) that is implicated in protein-protein interactions; a DNA-binding domain (DBD) with an immunoglobulin-like fold similar to p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain that acts as a phosphorylation-dependent switch to control receptor recognition and DNA-binding; and a C-terminal transactivation domain. The crystal structure of the N terminus of Stat4 reveals a dimer. The interface of this dimer is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerisation promotes cooperativity of binding to tandem GAS elements and with the transcriptional coactivator CBP/p300. This entry represents a domain found in Dictyostelium STAT proteins. This domain adopts a structure consisting of four long α-helices, folded into a coiled coil. It is responsible for nuclear export of the protein.
Protein Domain
Name: PA2G4 family
Type: Family
Description: Proteins of this family have been identified in a number of species as nuclear proteins with a cell cycle dependence. Various names have been given to members of this family, including cell cycle protein p38-2G4, also known as proliferation-associated protein PA2G4, curved DNA-binding protein, proliferation-associated protein A, and ErbB3-binding protein 1 (Ebp1). They constitute the proliferation-associated PA2G4 family, which is structurally homologous to the type II methionine aminopeptidases, but without methionine aminopeptidase activity. Ebp1 is a potential regulator of ErbB3 signaling, and is implicated in cell growth, apoptosis and differentiation in many cell types. These proteins are classified as non-peptidase homologues in the MEROPS peptidase family M24 (clan MG).
Protein Domain
Name: PKD/Chitinase domain
Type: Domain
Description: The PKD (Polycystic Kidney Disease) domain was first identified in the Polycystic Kidney Disease protein, polycystin-1 (PKD1 gene), and contains an Ig-like fold consisting of a β-sandwich of seven strands in two sheets with a Greek key topology, although some members have additional strands. Polycystin-1 is a large cell-surface glycoprotein involved in adhesive protein-protein and protein-carbohydrate interactions; however, it is not clear if the PKD domain mediates any of these interactions. PKD domains are also found in other proteins, usually in the extracellular parts of proteins involved in interactions with other proteins. For example, domains with a PKD-type fold are found in archaeal surface layer proteins that protect the cell from extreme environments, and in the human VPS10 domain-containing receptor SorCS2.
Protein Domain
Name: MCAfunc domain
Type: Domain
Description: The MCAfunc domain (MCAfunc) is located in the N-terminal region of the mechanosensitive channel protein MID1-COMPLEMENTING ACTIVITY (MCA). MCAfunc represents the provisionally advocated ARPK domain (Amino-terminal domain of Rice putative Protein Kinases), overlapping with the EF hand-like region at the N terminus. In MCA proteins, MCAfunc has Ca2+ influx activity and is proposed to be a functional domain of MCAs. MCAfunc is exclusively observed in streptophytes and exists not only in MCA but also in E3 ubiquitin ligase-like proteins, ARO3-like proteins and protein kinases. In the most basal plant lineage, charophytes, MCAfunc is only found in E3 ubiquitin ligase-like proteins and proteins of presently unknown function, while MCA proteins are exclusively found in land plants, from bryophytes to angiosperms.
Protein Domain
Name: CHCH
Type: Domain
Description: A conserved motif identified in the LOC118487 protein was called the CHCH motif. Alignment of this protein with related members showed the presence of three subgroups of proteins, which are called the S (Small), N (N-terminal extended) and C (C-terminal extended) subgroups. All three subgroups have in common that they contain a predicted conserved [coiled coil 1]-[helix 1]-[coiled coil 2]-[helix 2] domain (CHCH domain). Within each helix of the CHCH domain, there are two cysteines present in a C-X9-C motif. The N-group contains an additional double helix domain, and each helix contains the C-X9-C motif. This family contains a number of characterised proteins. Cox19, encoded by a nuclear gene of Saccharomyces cerevisiae, is an 11kDa protein (Cox19p) required for expression of cytochrome oxidase. Because cox19 mutants are able to synthesise the mitochondrial and nuclear gene products of cytochrome oxidase, Cox19p probably functions post-translationally during assembly of the enzyme. Cox19p is present in the cytoplasm and mitochondria, where it exists as a soluble intermembrane protein. This dual location is similar to what was previously reported for Cox17p, a low molecular weight copper protein thought to be required for maturation of the CuA centre of subunit 2 of cytochrome oxidase. Cox19p has four conserved potential metal ligands: three cysteines and one histidine. Mrp10 belongs to the class of yeast mitochondrial ribosomal proteins that are essential for translation. The family also includes the eukaryotic NADH-ubiquinone oxidoreductase 19kDa (NDUFA8) subunit. The CHCH domain was previously called DUF657.
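The C-X9-C motif mentioned in this entry (two cysteines separated by nine residues) can be located in a sequence with a regular expression. The sketch below is illustrative only: the function name `find_cx9c` and the test sequence are made up, and the scan treats any nine residues between the cysteines as acceptable, without checking that the motif sits inside a helix.

```python
import re

# C, any nine residues, then C. The lookahead lets overlapping hits be reported.
CX9C = re.compile(r"(?=(C[A-Z]{9}C))")


def find_cx9c(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every C-X9-C occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CX9C.finditer(seq)]


# Made-up sequence with two C-X9-C motifs, as expected for a twin Cx9C arrangement.
example = "MK" + "C" + "A" * 9 + "C" + "GGGG" + "C" + "D" * 9 + "C"
for start, segment in find_cx9c(example):
    print(start, segment)
```

A sequence with at least two non-overlapping hits is only a candidate for a CHCH-like arrangement; confirming the domain would need a profile-based search rather than a plain pattern.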
Protein Domain
Name: SAND-like domain superfamily
Type: Homologous_superfamily
Description: The SAND domain (named after Sp100, AIRE-1, NucP41/75, DEAF-1) is a conserved ~80 residue region found in a number of nuclear proteins, many of which function in chromatin-dependent transcriptional control. These include proteins linked to various human diseases, such as the Sp100 (Speckled protein 100kDa), NUDR (Nuclear DEAF-1 related), GMEB (Glucocorticoid Modulatory Element Binding) and AIRE-1 (Autoimmune regulator 1) proteins. Proteins containing the SAND domain have a modular structure; the SAND domain can be associated with a number of other modules, including the bromodomain, the PHD finger and the MYND finger. Because no SAND domain has been found in yeast, it is thought that the SAND domain could be restricted to animal phyla. Many SAND domain-containing proteins, including NUDR, DEAF-1 (Deformed epidermal autoregulatory factor-1) and GMEB, have been shown to bind DNA sequences specifically. The SAND domain has been proposed to mediate the DNA binding activity of these proteins. Structurally, the SAND domain consists of a novel alpha/beta fold, which has a core of three short helices packed against a barrel-like β-sheet; it is structurally similar to the SH3-like fold. Other proteins display domains that are structurally similar to the SAND domain. One such example is the SMAD4-binding domain of the oncoprotein Ski, which is stabilised by a bound zinc atom, and resembles a SAND domain, in which the corresponding I loop is responsible for DNA binding. Ski is able to disrupt the formation of a functional complex between the Co- and R-SMADs, leading to the repression of TGF-beta, Activin and BMP responses, resulting in the repression of TGF-signalling.
Protein Domain
Name: Tetratricopeptide, SHNi-TPR domain
Type: Domain
Description: The tetratricopeptide repeat (TPR) is a structural motif present in a wide range of proteins. It mediates protein-protein interactions and the assembly of multiprotein complexes. The TPR motif consists of 3-16 tandem repeats of 34 amino acid residues, although individual TPR motifs can be dispersed in the protein sequence. Sequence alignment of the TPR domains reveals a consensus sequence defined by a pattern of small and large amino acids. TPR motifs have been identified in various different organisms, ranging from bacteria to humans. Proteins containing TPRs are involved in a variety of biological processes, such as cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding. The X-ray structure of a domain containing three TPRs from protein phosphatase 5 revealed that TPR adopts a helix-turn-helix arrangement, with adjacent TPR motifs packing in a parallel fashion, resulting in a spiral of repeating anti-parallel α-helices. The two helices are denoted helix A and helix B. The packing angle between helix A and helix B is ~24 degrees within a single TPR and generates a right-handed superhelical shape. Helix A interacts with helix B and with helix A' of the next TPR. Two protein surfaces are generated: the inner concave surface is contributed mainly by residues on helices A, and the other surface presents residues from both helices A and B. This entry represents SHNi-TPR (Sim3-Hif1-NASP interrupted TPR), a sequence that is an interrupted form of the TPR repeat.
Protein Domain
Name: Hemerythrin, metal-binding domain
Type: Domain
Description: The hemerythrin family is composed of hemerythrin proteins found in invertebrates, and a broader collection of bacterial and archaeal homologues. Hemerythrin is an oxygen-binding protein found in the vascular system and coelomic fluid, or in muscles (myohemerythrin), in invertebrates. Many of the homologous proteins found in prokaryotes are multi-domain proteins with signal-transducing domains such as the GGDEF diguanylate cyclase domain and the methyl-accepting chemotaxis protein (MCP) signalling domain. Most hemerythrins are oxygen-carriers with a bound non-haem iron, but at least one example is a cadmium-binding protein, apparently with a role in sequestering toxic metals rather than in binding oxygen. The prokaryote with the most instances of this domain is Magnetococcus sp. MC-1, a magnetotactic bacterium. Hemerythrins and myohemerythrins are small proteins of about 110 to 129 amino acid residues that bind two iron atoms. They are left-twisted 4-α-helical bundles, which provide a hydrophobic pocket where dioxygen binds as a peroxo species, interacting with adjacent aliphatic side chains via van der Waals forces. In both hemerythrins and myohemerythrins, the active centre is a binuclear iron complex, bound directly to the protein via 7 amino acid side chains: 5 His, 1 Glu and 1 Asp. Ovohemerythrin, a yolk protein from the leech Theromyzon tessulatum, seems to belong to this family of proteins; it may play a role in the detoxification of free iron after a blood meal. This entry represents a metal-binding domain found in the hemerythrin family. In hemerythrin this domain binds iron.
Protein Domain
Name: Gamma-carboxyglutamic acid-rich (GLA) domain
Type: Domain
Description: The GLA (gamma-carboxyglutamic acid-rich) domain contains glutamate residues that have been post-translationally modified by vitamin K-dependent carboxylation to form gamma-carboxyglutamate (Gla). All glutamic acid (Glu) residues present in the GLA domain are potential carboxylation sites; in coagulation proteins, all Glu residues are modified to Gla, while in osteocalcin and matrix Gla proteins only some Glu residues are modified to Gla. The GLA domain is responsible for the high-affinity binding of calcium ions. It starts at the N-terminal extremity of the mature form of proteins and ends with a conserved aromatic residue; a conserved Gla-x(3)-Gla-x-Cys motif, found in the middle of the domain, seems to be important for substrate recognition by the carboxylase. The 3D structure of the GLA domain has been solved. Calcium ions induce conformational changes in the GLA domain and are necessary for its proper folding. A common structural feature of functional GLA domains is the clustering of N-terminal hydrophobic residues into a hydrophobic patch that mediates interaction with the cell surface membrane. Proteins known to contain a GLA domain include:
  • Coagulation factor X
  • Coagulation factor VII
  • Coagulation factor IX
  • Coagulation factor XIV (vitamin K-dependent protein C)
  • Vitamin K-dependent protein S
  • Vitamin K-dependent protein Z
  • Prothrombin
  • Transthyretin
  • Osteocalcin (also known as bone-Gla protein, BGP)
  • Matrix Gla protein (MGP)
  • Inter-alpha-trypsin inhibitor heavy chain H2
  • Growth arrest-specific protein 6 (Gas-6)
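The Gla-x(3)-Gla-x-Cys motif in this entry is written in a PROSITE-like notation, where x(3) means any three residues. Because Gla is a modified glutamate, a search of an unmodified sequence would look for E-x(3)-E-x-C. The sketch below converts such a pattern to a Python regular expression; the function name `prosite_to_regex` and the example sequence are hypothetical, and only a small subset of the notation is handled.

```python
import re

# One pattern element: x, a single residue, [allowed set] or {excluded set},
# optionally followed by a repeat count such as (3) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a dash-separated PROSITE-style pattern into a regex string."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            token = "."                          # any residue
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"   # excluded residues
        else:
            token = residue                      # literal residue or [set]
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


gla_regex = prosite_to_regex("E-x(3)-E-x-C")
print(gla_regex)

# Made-up peptide containing one instance of the motif.
hit = re.search(gla_regex, "AAEKLLEGCAA")
print(hit.start(), hit.group())
```

A hit only shows that the spacing of Glu and Cys residues fits the motif; whether those glutamates are actually carboxylated cannot be read from the sequence.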
Protein Domain
Name: Gamma-carboxyglutamic acid-rich (GLA) domain superfamily
Type: Homologous_superfamily
Description: The GLA (gamma-carboxyglutamic acid-rich) domain contains glutamate residues that have been post-translationally modified by vitamin K-dependent carboxylation to form gamma-carboxyglutamate (Gla). All glutamic acid (Glu) residues present in the GLA domain are potential carboxylation sites; in coagulation proteins, all Glu residues are modified to Gla, while in osteocalcin and matrix Gla proteins only some Glu residues are modified to Gla. The GLA domain is responsible for the high-affinity binding of calcium ions. It starts at the N-terminal extremity of the mature form of proteins and ends with a conserved aromatic residue; a conserved Gla-x(3)-Gla-x-Cys motif, found in the middle of the domain, seems to be important for substrate recognition by the carboxylase. The 3D structure of the GLA domain has been solved. Calcium ions induce conformational changes in the GLA domain and are necessary for its proper folding. A common structural feature of functional GLA domains is the clustering of N-terminal hydrophobic residues into a hydrophobic patch that mediates interaction with the cell surface membrane. Proteins known to contain a GLA domain include:
  • Coagulation factor X
  • Coagulation factor VII
  • Coagulation factor IX
  • Coagulation factor XIV (vitamin K-dependent protein C)
  • Vitamin K-dependent protein S
  • Vitamin K-dependent protein Z
  • Prothrombin
  • Transthyretin
  • Osteocalcin (also known as bone-Gla protein, BGP)
  • Matrix Gla protein (MGP)
  • Inter-alpha-trypsin inhibitor heavy chain H2
  • Growth arrest-specific protein 6 (Gas-6)
Protein Domain
Name: IMD/I-BAR domain
Type: Domain
Description: The I-BAR domain (also known as the IMD domain, IRSp53 and MIM homology domain) is a BAR-like domain of approximately 250 amino acids found at the N terminus of IRSp53 (insulin receptor tyrosine kinase substrate p53) and in the evolutionarily related IRSp53/MIM family. The BAR domain forms an anti-parallel all-helical dimer, with a curved (banana-like) shape, that promotes membrane tubulation. BAR domain-containing proteins can be classified into three types: BAR, F-BAR and I-BAR. BAR and F-BAR proteins generate positive membrane curvature, while I-BAR proteins induce negative curvature. I-BAR domain-containing proteins include:
  • Vertebrate MIM (missing in metastasis), an actin-binding scaffold protein that may be involved in cancer metastasis.
  • Vertebrate ABBA, a MIM-related protein.
  • Vertebrate insulin receptor tyrosine kinase substrate p53 (IRSp53), a multifunctional adaptor protein that links Rac1 with Wiskott-Aldrich syndrome family verprolin-homologous protein 2 (WAVE2) to induce lamellipodia, or Cdc42 with Mena to induce filopodia.
  • Vertebrate IRTKS.
  • Vertebrate Pinkbar.
  • Drosophila melanogaster (Fruit fly) CG32082-PA.
  • Caenorhabditis elegans M04F3.5 protein.
The vertebrate I-BAR family is divided into two major groups: the IRSp53/IRTKS/Pinkbar subfamily and the MIM/ABBA subfamily. The putative invertebrate homologues are positioned between them. The IRSp53/IRTKS/Pinkbar subfamily members contain an SH3 domain, and the MIM/ABBA subfamily proteins contain a WH2 (WASP-homology 2) domain. The vertebrate SH3-containing subfamily is further divided into three groups according to the presence or absence of the WWB and the half-CRIB motif. The BAR domain binds phosphoinositide-rich vesicles with high affinity and does not display strong actin filament binding/bundling activity.
Protein Domain
Name: Alphavirus E2 glycoprotein, domain B
Type: Homologous_superfamily
Description: Alphaviruses are enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Semliki Forest and Sindbis viruses. Alphaviruses consist of three structural proteins: the core nucleocapsid protein C, and the envelope proteins P62 and E1 that associate as a heterodimer. The viral membrane-anchored surface glycoproteins are responsible for receptor recognition and entry into target cells through membrane fusion. The proteolytic maturation of P62 into E2 and E3 causes a change in the viral surface. Together the E1, E2, and sometimes E3 glycoprotein "spikes" form an E1/E2 dimer or an E1/E2/E3 trimer, where E2 extends from the centre to the vertices, E1 fills the space between the vertices, and E3, if present, is at the distal end of the spike. Upon exposure of the virus to the acidity of the endosome, E1 dissociates from E2 to form an E1 homotrimer, which is necessary for the fusion step to drive the cellular and viral membranes together. This entry represents the alphaviral E2 glycoprotein. The E2 glycoprotein interacts with the nucleocapsid through its cytoplasmic domain, while its ectodomain is responsible for binding a cellular receptor. It is an all-beta protein belonging to the immunoglobulin superfamily, with three immunoglobulin domains labelled A, B and C in amino- to carboxy-terminal order. This superfamily represents the central domain, known as domain B, of the alphavirus E2 glycoprotein.
Protein Domain
Name: Alphavirus E2 glycoprotein, domain C
Type: Homologous_superfamily
Description: Alphaviruses are enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Semliki Forest and Sindbis viruses. Alphaviruses consist of three structural proteins: the core nucleocapsid protein C, and the envelope proteins P62 and E1 that associate as a heterodimer. The viral membrane-anchored surface glycoproteins are responsible for receptor recognition and entry into target cells through membrane fusion. The proteolytic maturation of P62 into E2 and E3 causes a change in the viral surface. Together the E1, E2, and sometimes E3 glycoprotein "spikes" form an E1/E2 dimer or an E1/E2/E3 trimer, where E2 extends from the centre to the vertices, E1 fills the space between the vertices, and E3, if present, is at the distal end of the spike. Upon exposure of the virus to the acidity of the endosome, E1 dissociates from E2 to form an E1 homotrimer, which is necessary for the fusion step to drive the cellular and viral membranes together. This entry represents the alphaviral E2 glycoprotein. The E2 glycoprotein interacts with the nucleocapsid through its cytoplasmic domain, while its ectodomain is responsible for binding a cellular receptor. It is an all-beta protein belonging to the immunoglobulin superfamily, with three immunoglobulin domains labelled A, B and C in amino- to carboxy-terminal order. This superfamily represents the C-terminal domain, known as domain C, of the alphavirus E2 glycoprotein.
Protein Domain
Name: MIP18 family-like
Type: Domain
Description: This domain (previously known as DUF59) is found in proteins that are mostly defined as members of the MIP18 family. This includes iron-sulfur cluster carrier proteins, where the domain is found at the N terminus. This domain is also found in protein AE7 from Arabidopsis and its homologues. Protein AE7 is thought to be a central member of the cytosolic iron-sulfur (Fe-S) protein assembly (CIA) pathway; however, protein AE7-like 1 and 2 (which also contain this domain) are probably not involved in this pathway []. MIP18 family protein YHR122W (CIA2) from S. cerevisiae is a component of the CIA machinery, and acts at a late step of Fe-S cluster assembly []. The SufT protein from Staphylococcus aureus consists solely of this domain and is involved in the maturation of Fe-S proteins [].
Protein Domain
Name: Specific amino acids and opine-binding periplasmic protein, ABC transporter
Type: Family
Description: Bacterial high affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria the solute-binding proteins are membrane-anchored lipoproteins [ , ]. On the basis of sequence similarities, the vast majority of these solute-binding proteins can be grouped into eight families or clusters, which generally correlate with the nature of the solute bound. Family 3 groups together periplasmic proteins that bind specific amino acids, such as HisJ [ ], and opine-binding periplasmic proteins, such as NocT []. This entry includes a subset of family 3.
Protein Domain
Name: KID repeat
Type: Repeat
Description: This group of proteins contains the KID repeat as found in Borrelia and spirochete repeat motif-containing proteins including RepA/Rep+, RepU and various Bdr proteins. RepA and related Borrelia proteins have been suggested to play an important genus-wide role in the biology of Borrelia [ ]. Bdr proteins are polymorphic inner membrane proteins produced by most Borrelia species []. The bdr genes of encode proteins that form three distinct subfamilies: BdrD, BdrE, and BdrF. bdr orthologues appear to be present in all Borrelia species that have been analysed []. It is thought that BdrF2 and the other proteins encoded by the operon form an inner-membrane-associated protein complex that may interact with DNA and which may act during transmission or in the early stages of infection [].
Protein Domain
Name: CS domain
Type: Domain
Description: The bipartite CS domain, which was named after CHORD-containing proteins and SGT1 [ ], is a ~100-residue protein-protein interaction module. The CS domain can be found in stand-alone form, as well as fused with other domains, such as CHORD (), SGS ( ), TPR ( ), cytochrome b5 ( ) or b5 reductase, in multidomain proteins [ ]. The CS domain has a compact antiparallel β-sandwich fold consisting of seven β-strands [, ]. Some proteins known to contain a CS domain are listed below []: eukaryotic proteins of the SGT1 family; eukaryotic Rar1, related to pathogen resistance in plants and to development in animals; eukaryotic nuclear movement protein nudC; eukaryotic proteins of the p23/wos2 family, which act as co-chaperones; animal b5+b5R flavo-hemo cytochrome NAD(P)H oxidoreductase type B; and mammalian integrin beta-1-binding protein 2 (melusin).
Protein Domain
Name: Mitochondrial import inner membrane translocase subunit Tim54
Type: Family
Description: Mitochondrial function depends on the import of hundreds of different proteins synthesised in the cytosol. Protein import is a multi-step pathway which includes the binding of precursor proteins to surface receptors, translocation of the precursor across one or both mitochondrial membranes, and folding and assembly of the imported protein inside the mitochondrion. Most precursor proteins carry amino-terminal targeting signals, called pre-sequences, and are imported into mitochondria via import complexes located in both the outer and the inner membrane (IM). The IM complex, TIM, is made up of at least two proteins which mediate translocation of proteins into the matrix by removing their signal peptide, and another pair of proteins, Tim54 and Tim22, that insert polytopic proteins carrying internal targeting information into the inner membrane [ ].
Protein Domain
Name: NAD-protein ADP-ribosyltransferase ModA-like
Type: Family
Description: This entry represents a family of NAD-protein ADP-ribosyltransferases found in Myoviridae, including ModA from the bacteriophage T4 [ ]. Bacteriophage T4 codes for three ADP-ribosyltransferases: Alt, ModA, and ModB. The ADP-ribosylating activity of each is directed to a specific set of host proteins. ModA is known to modify subunits of RNA polymerase []. Protein ADP-ribosylation is an important post-translational modification catalysed by a group of enzymes known as ADP-ribosyltransferases (ADP-RTs) [ ]. ADP-RTs transfer single or multiple ADP-ribose moieties from NAD to a specific amino acid residue within a target protein, forming mono ADP-ribosylation or poly ADP-ribosylation (PARylation) []. ADP-ribosylation changes the electrostatic potential of a target protein by introducing two phosphate groups and may affect protein-DNA as well as protein-protein interactions []. Protein ADP-ribosylation plays versatile roles in multiple biological processes.
Protein Domain
Name: Norovirus peptidase C37
Type: Domain
Description: Noroviruses (NVs, formerly "Norwalk-like viruses"), which belong to the Caliciviridae, are the major causative agents of nonbacterial acute gastroenteritis in humans. The NV genome, which consists of positive-sense, single-stranded RNA, contains three open reading frames (ORFs). The first ORF produces a polyprotein that is processed by the viral 3C-like protease into six nonstructural proteins. The six NV ORF1 nonstructural proteins are homologous to picornaviral nonstructural proteins and are named accordingly: N-terminal protein, 2C-like nucleoside triphosphatase, 3A-like protein, 3B VPg (genome-linked viral protein), 3C-like protease, and a 3D RNA-dependent RNA polymerase. NV 3C-like proteases are the key enzymes for ORF1 polyprotein processing and also cleave the poly(A)-binding protein, causing cellular translation inhibition. NV 3C-like proteases belong to the chymotrypsin-like protease family, in that they appear to have chymotrypsin-like folds. Whether the 3C-like protease domain has a catalytic dyad composed of histidine and cysteine or a triad of histidine, glutamate and cysteine remains controversial [, ]. The NV 3C-like protease domain forms MEROPS peptidase family C37. The NV 3C-like protease domain adopts a serine protease-like fold that consists of two β-barrels separated by a cleft within which lie the active site catalytic residues. The N-terminal β-barrel has two α-helices and seven β-strands. The β-strands form a twisted antiparallel β-sheet resembling an incomplete β-barrel. The core of the incomplete β-barrel contains hydrophobic residues. The active site histidine residue is found in the N-terminal β-barrel, as is the glutamate. The C-terminal six-stranded antiparallel β-barrel contains the active site cysteine. 
The catalytic site formed is situated deep within a cleft between the N- and C-terminal β-barrels [, ]. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [ ]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. 
One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [ ]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [ ]. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One subdomain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [ ]. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. 
Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: Zinc finger, ZPR1-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents ZPR1-type zinc finger domains. An orthologous protein found once in each of the completed archaeal genomes corresponds to a zinc finger-containing domain repeated as the N-terminal and C-terminal halves of the mouse protein ZPR1. 
ZPR1 is an experimentally proven zinc-binding protein that binds the tyrosine kinase domain of the epidermal growth factor receptor (EGFR); binding is inhibited by EGF stimulation and tyrosine phosphorylation, and activation by EGF is followed by some redistribution of ZPR1 to the nucleus. By analogy, other proteins with the ZPR1 zinc finger domain may be regulatory proteins that sense protein phosphorylation state and/or participate in signal transduction (see also ). Deficiencies in ZPR1 may contribute to neurodegenerative disorders. ZPR1 appears to be down-regulated in patients with spinal muscular atrophy (SMA), a disease characterised by degeneration of the alpha-motor neurons in the spinal cord that can arise from mutations affecting the expression of Survival Motor Neurons (SMN) [ ]. ZPR1 interacts with complexes formed by SMN [], and may act as a modifier that affects the severity of SMA.
Protein Domain
Name: Dishevelled family
Type: Family
Description: Wnt proteins constitute a large family of secreted signalling molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Wg) (Drosophila) []. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease [ ]. Wnt signal transduction proceeds initially via binding of Wnt proteins to their cell surface receptors, the so-called frizzled proteins. This activates the signalling functions of β-catenin and regulates the expression of specific genes important in development []. More recently, however, several non-canonical Wnt signalling pathways have been elucidated that act independently of β-catenin. In both cases, the transduction mechanism requires dishevelled protein (Dsh), a cytoplasmic phosphoprotein that acts directly downstream of frizzled []. In addition to its role in Wnt signalling, Dsh is also involved in generating planar polarity in Drosophila and has been implicated in the Notch signal transduction cascade. Three human and mouse homologues of Dsh have been cloned (DVL-1 to 3); it is believed that these proteins, like their Drosophila counterpart, are involved in signal transduction. Human and murine orthologues share more than 95% sequence identity and are each 40-50% identical to Drosophila Dsh. 
Sequence similarity amongst Dsh proteins is concentrated around three conserved domains: at the N terminus lies a DIX domain (mutations mapping to this region reduce or completely disrupt Wg signalling); a PDZ (or DHR) domain, often found in proteins involved in protein-protein interactions, lies within the central portion of the protein (point mutations within this module have been shown to have little effect on Wg-mediated signal transduction); and a DEP domain is located towards the C terminus and is conserved among a set of proteins that regulate various GTPases (whilst genetic and molecular assays have shown this module to be dispensable for Wg signalling, it is thought to be important in planar polarity signalling in flies []). The requirement for these domains therefore varies between signalling pathways: the DIX domain is essential for β-catenin activation, the DEP domain is implicated in the activation of the JNK pathway, while the PDZ domain is required for both [ ].
Protein Domain
Name: Peripherin/rom-1, conserved site
Type: Conserved_site
Description: Tetraspanins are a distinct family of cell surface proteins, containing four conserved transmembrane domains: a small outer loop (EC1), a larger outer loop (EC2), a small inner loop (IL) and short cytoplasmic tails. They contain characteristic structural features, including 4-6 conserved extracellular cysteine residues, and polar residues within transmembrane domains. A fundamental role of tetraspanins appears to be organising other proteins into a network of multimolecular membrane microdomains, sometimes called the 'tetraspanin web'. Within this web there are primary complexes in which tetraspanins show robust, specific, and direct lateral associations with other proteins. The strong tendency of tetraspanins to associate with each other probably contributes to the assembly of a network of secondary interactions in which non-tetraspanin proteins are associated with each other via palmitoylated tetraspanins acting as linker proteins. In addition, the association of lipids, such as gangliosides and cholesterol, probably contributes to the assembly of even larger tetraspanin complexes, which have some lipid raft-like properties (e.g. resistance to solubilization in non-ionic detergents). Within the tetraspanin web, tetraspanin proteins can associate not only with integrins and other transmembrane proteins, but also with signalling enzymes such as protein kinase C and phosphatidylinositol-4 kinase. Thus, the tetraspanin web provides a mechanistic framework by which membrane protein signalling can be expanded into a lateral dimension [ ]. The outer segments of vertebrate rod photoreceptor cells are specialised organelles that function in the transduction of light into electrical signals as part of the visual excitation process. These organelles contain thousands of closely-stacked disk membranes, which have distinctly different protein compositions in their lamellar and rim regions []. 
Peripherin (or RDS) and rom-1 are related retinal-specific members of the tetraspanin family which are located at the rims of the photoreceptor disks, where they may act jointly in disk morphogenesis []. Both peripherin and rom-1 form disulphide-linked homodimers. Defects in the peripherin gene (RDS) cause various human diseases such as autosomal dominant retinitis pigmentosa, autosomal dominant punctata albescens and butterfly-shaped pigment dystrophy. In mice, such defects cause a retinopathy known as 'retinal degeneration slow' (rds). These proteins contain about 350 amino acid residues. Structurally they consist of a short cytoplasmic N-terminal domain, followed by four transmembrane segments that delimit two lumenal loops and one cytoplasmic loop; the C-terminal domain is cytoplasmic. The second lumenal loop is very large (about 140 amino acid residues) and contains seven conserved cysteines.
Protein Domain
Name: Apolipoprotein C-I
Type: Family
Description: Exchangeable apolipoproteins are water-soluble protein components of lipoproteins that solubilise lipids and regulate their metabolism by binding to cell receptors or activating specific enzymes. Apolipoprotein C-I (ApoC-1) is the smallest exchangeable apolipoprotein and transfers among HDL (high-density lipoprotein), VLDL (very low-density lipoprotein) and chylomicrons. ApoC-1 activates lecithin:cholesterol acyltransferase (LCAT), inhibits cholesteryl ester transfer protein, can inhibit hepatic lipase and phospholipase 2 and can stimulate cell growth. ApoC-1 delays the clearance of beta-VLDL by inhibiting its uptake via the LDL receptor-related pathway [ ]. ApoC-1 has been implicated in hypertriglyceridemia [], and Alzheimer's disease []. ApoC-1 is believed to comprise two dynamic helices that are stabilised by interhelical interactions and are connected by a short linker region. The minimal folding unit in the lipid-free state of this and other exchangeable apolipoproteins comprises the helix-turn-helix motif formed of four 11-mer sequence repeats.
Protein Domain
Name: Apolipoprotein C-I domain superfamily
Type: Homologous_superfamily
Description: Exchangeable apolipoproteins are water-soluble protein components of lipoproteins that solubilise lipids and regulate their metabolism by binding to cell receptors or activating specific enzymes. Apolipoprotein C-I (ApoC-1) is the smallest exchangeable apolipoprotein and transfers among HDL (high-density lipoprotein), VLDL (very low-density lipoprotein) and chylomicrons. ApoC-1 activates lecithin:cholesterol acyltransferase (LCAT), inhibits cholesteryl ester transfer protein, can inhibit hepatic lipase and phospholipase 2 and can stimulate cell growth. ApoC-1 delays the clearance of beta-VLDL by inhibiting its uptake via the LDL receptor-related pathway [ ]. ApoC-1 has been implicated in hypertriglyceridemia [], and Alzheimer's disease []. ApoC-1 is believed to comprise two dynamic helices that are stabilised by interhelical interactions and are connected by a short linker region. The minimal folding unit in the lipid-free state of this and other exchangeable apolipoproteins comprises the helix-turn-helix motif formed of four 11-mer sequence repeats.
Protein Domain
Name: Tissue inhibitor of metalloproteinases-like, OB-fold
Type: Homologous_superfamily
Description: Tissue inhibitors of metalloproteinases (TIMP) are a family of proteins that can form complexes with extracellular matrix metalloproteinases (such as collagenases) and irreversibly inactivate them [ ]. TIMP and related proteins contain a five-stranded antiparallel β-sheet that is rolled over on itself to form a closed β-barrel, and two short helices, which pack close to one another on the same barrel face. A comparison of the delta TIMP-2 structure with other known protein folds reveals that the β-barrel topology is homologous to that seen in proteins of the oligosaccharide/oligonucleotide binding (OB) fold family, a five-stranded β-sheet coiled to form a closed β-barrel capped by an α-helix located between the third and fourth strands []. Other proteins contain domains with a similar OB-like fold: the netrin-like domain (NTR/C345C module), found in procollagen C-proteinase enhancer protein PCOLCE and in the complement C5 domain; and the laminin-binding domain, found in agrin.
Protein Domain
Name: Cytochrome P450, E-class, group I, CYP2E-like
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [ ], which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). 
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [ , , ]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents the CYP2E family from group I, class E, cytochrome P450 proteins, as well as other CYP2 family proteins. The CYP2 family comprises 15 subfamilies (A-H, J-N, P and Q). The first five (A-E) are present in mammalian liver, but in differing amounts and with different inducibilities. These five subfamilies show varied substrate specificities, with some degree of overlap. CYP2E1 metabolises several therapeutic drugs, precarcinogens and solvents to reactive metabolites, and as such appears to be involved in cancer susceptibility [ ].
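The CYP naming rule described above (CYP, then a family number, a subfamily letter and a protein number) can be sketched in Python. This is only an illustration under stated assumptions: `CypName`, `parse_cyp` and `expected_min_identity` are hypothetical names that do not come from this database, the pattern accepts only simple names of the form shown in the text, and the 40% and 55% figures are the rule-of-thumb thresholds quoted in the description, not a formal classification procedure.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    """Parts of a cytochrome P450 name such as CYP3A4 (hypothetical helper type)."""
    family: int      # number after "CYP", e.g. 3
    subfamily: str   # letter(s) after the family number, e.g. "A"
    protein: int     # final number, e.g. 4


# Assumption: only simple names of the form CYP<number><letters><number> are handled.
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name: str) -> CypName:
    """Split a CYP name into family, subfamily and protein number."""
    match = _CYP_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a CYP-style name: {name!r}")
    family, subfamily, protein = match.groups()
    return CypName(int(family), subfamily, int(protein))


def expected_min_identity(a: CypName, b: CypName) -> Optional[int]:
    """Rule-of-thumb lower bound on % identity implied by the names alone.

    Returns 55 for the same family and subfamily, 40 for the same family
    only, and None when the families differ (the rule says nothing then).
    """
    if a.family != b.family:
        return None
    if a.subfamily == b.subfamily:
        return 55
    return 40


if __name__ == "__main__":
    print(parse_cyp("CYP3A4"))
    print(expected_min_identity(parse_cyp("CYP2E1"), parse_cyp("CYP2A6")))
```

For example, `parse_cyp("CYP3A4")` yields family 3, subfamily A, protein 4, matching the worked example in the description; two names from the same family but different subfamilies map to the 40% threshold.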
Protein Domain
Name: Macro domain-like
Type: Homologous_superfamily
Description: The Macro or A1pp domain is a module of about 180 amino acids which can bind ADP-ribose (an NAD metabolite) or related ligands. Binding to ADP-ribose could be either covalent or non-covalent [ ]: in certain cases it is believed to bind non-covalently []; while in other cases (such as Aprataxin) it appears to bind both non-covalently through a zinc finger motif, and covalently through a separate region of the protein []. The domain was described originally in association with ADP-ribose 1''-phosphate (Appr-1''-P) processing activity (A1pp) of the yeast YBR022W protein []. The domain is also called Macro domain as it is the C-terminal domain of mammalian core histone macro-H2A [, ]. Macro domain proteins can be found in eukaryotes, in (mostly pathogenic) bacteria, in archaea and in ssRNA viruses, such as coronaviruses [, ], Rubella and Hepatitis E viruses. In vertebrates the domain occurs e.g. in histone macroH2A, in predicted poly-ADP-ribose polymerases (PARPs) and in B aggressive lymphoma (BAL) protein. The macro domain can be associated with catalytic domains, such as PARP, or sirtuin. The Macro domain can recognise ADP-ribose or in some cases poly-ADP-ribose, which can be involved in ADP-ribosylation reactions that occur in important processes, such as chromatin biology, DNA repair and transcription regulation []. The human macroH2A1.1 Macro domain binds an NAD metabolite O-acetyl-ADP-ribose []. The Macro domain has been suggested to play a regulatory role in ADP-ribosylation, which is involved in inter- and intracellular signaling, transcriptional regulation, DNA repair pathways and maintenance of genomic stability, telomere dynamics, cell differentiation and proliferation, and necrosis and apoptosis. The 3D structure of the SARS-CoV Macro domain has a mixed α/β fold consisting of a central seven-stranded twisted mixed β-sheet sandwiched between two α-helices on one face, and three on the other. 
The final α-helix, located on the edge of the central β-sheet, forms the C terminus of the protein [ ]. The crystal structure of AF1521 (a Macro domain-only protein from Archaeoglobus fulgidus) has also been reported and compared with other Macro domain-containing proteins. Several Macro domain-only proteins are shorter than AF1521, and appear to lack either the first strand of the β-sheet or the C-terminal helix 5. Well conserved residues form a hydrophobic cleft and cluster around the AF1521-ADP-ribose binding site [, , , ]. Aminopeptidases are exopeptidases involved in the processing and regular turnover of intracellular proteins, although their precise role in cellular metabolism is unclear [ , ]. Leucine aminopeptidases cleave leucine residues from the N terminus of polypeptide chains; in general they are involved in the processing, catabolism and degradation of intracellular proteins [ , , ]. Leucyl aminopeptidase forms a homohexamer containing two trimers stacked on top of one another []. Each monomer binds two zinc ions. The zinc-binding and catalytic sites are located within the C-terminal catalytic domain []. Leucine aminopeptidase has been shown to be identical with prolyl aminopeptidase () in mammals [ ]. The N-terminal domain of these proteins has been shown in Escherichia coli PepA to function as a DNA-binding protein in Xer site-specific recombination and in transcriptional control of the carAB operon [ , ]. This superfamily represents the Macro domain as well as the N-terminal domain of Leucine aminopeptidase.
Protein Domain
Name: Small-subunit processome, Utp12
Type: Domain
Description: A large ribonuclear protein complex is required for the processing of the small-ribosomal-subunit rRNA - the small-subunit (SSU) processome [ , ]. This preribosomal complex contains the U3 snoRNA and at least 40 proteins, which have the following properties: they are nucleolar; they are able to coimmunoprecipitate with the U3 snoRNA and Mpp10 (a protein specific to the SSU processome); and they are required for 18S rRNA biogenesis. There appears to be a linkage between polymerase I transcription and the formation of the SSU processome, as some, but not all, of the SSU processome components are required for pre-rRNA transcription initiation. These SSU processome components have been termed t-Utps. They form a pre-complex with pre-18S rRNA in the absence of snoRNA U3 and other SSU processome components. It has been proposed that the t-Utp complex proteins are both rDNA and rRNA binding proteins that are involved in the initiation of pre-18S rRNA transcription, initially binding to rDNA and then associating with the 5' end of the nascent pre-18S rRNA. The t-Utp complex forms the nucleus around which the rest of the SSU processome components, including snoRNA U3, assemble [ ]. From electron microscopy the SSU processome may correspond to the terminal knobs visualized at the 5' ends of nascent 18S rRNA. This domain is found at the C terminus of proteins containing WD40 repeats. These proteins are part of the U3 ribonucleoprotein. In yeast, these proteins are called Utp5, Utp1 or Pwp2, Utp12 or DIP2. They interact with snoRNA U3 and with MPP10 [ ]. Pwp2 is an essential Saccharomyces cerevisiae (Baker's yeast) protein involved in cell separation.
Protein Domain
Name: 26S proteasome non-ATPase regulatory subunit Rpn12
Type: Family
Description: Intracellular proteins, including short-lived proteins such as cyclin, Mos, Myc, p53, NF-kappaB, and IkappaB, are degraded by the ubiquitin-proteasome system. The 26S proteasome is a self-compartmentalising protease responsible for the regulated degradation of intracellular proteins in eukaryotes [ , ]. This giant intracellular protease is formed by several subunits arranged into two 19S polar caps, where protein recognition and ATP-dependent unfolding occur, flanking a 20S central barrel-shaped structure with an inner proteolytic chamber. This overall structure is highly conserved among eukaryotes and is essential for cell viability. Proteins targeted to the 26S proteasome are conjugated with a polyubiquitin chain by an enzymatic cascade before delivery to the 26S proteasome for degradation into oligopeptides. The 26S proteasome can be divided into two subcomplexes: the 19S regulatory particle (RP) and the 20S core particle (CP) [ ]. The 19S component is divided into a "base" subunit containing six ATPases (Rpt proteins) and two non-ATPases (Rpn1, Rpn2), and a "lid" subunit composed of eight stoichiometric proteins (Rpn3, Rpn5, Rpn6, Rpn7, Rpn8, Rpn9, Rpn11, Rpn12) [ ]. Additional non-essential and species-specific proteins may also be present. The 19S unit performs several essential functions including binding the specific protein substrates, unfolding them, cleaving the attached ubiquitin chains, opening the 20S subunit, and driving the unfolded polypeptide into the proteolytic chamber for degradation. The 26S proteasome and 19S regulator are of medical interest due to their involvement in burn rehabilitation []. This entry represents Rpn12 (also often annotated as 26S proteasome non-ATPase regulatory subunit 8). This protein has been shown to be important for the transition from metaphase to anaphase and the activation of Cdc28p kinase in yeast [ , ].
Protein Domain
Name: Calreticulin
Type: Family
Description: Calreticulin is a ubiquitous protein found in a wide range of species and in all nucleated cell types. It is an ancient and highly conserved protein with an exceptionally wide scope and variety of functions. Initially known as the high-affinity calcium-binding endoplasmic reticulum (ER) and sarcoplasmic reticulum (SR) protein "calregulin", calreticulin is now known to associate with proteins in the cytoplasm, nucleus and extracellular compartment. Calreticulin is a major Ca2+-binding/storage chaperone residing in the ER lumen [ ]. Molecular chaperones residing in the ER facilitate the folding and prevent the aggregation of newly synthesized proteins. Interaction between the molecular chaperone and the misfolded protein leads to the retention, retranslocation and eventual degradation of the misfolded protein by the proteasome after ubiquitination []. Calreticulin binds (buffers) Ca2+ with high capacity and participates in folding newly synthesized proteins and glycoproteins. It is an important component of the calreticulin/calnexin cycle and quality control pathways in the ER [ ]. Studies on calreticulin-deficient and calreticulin-transgenic mice revealed that calreticulin is a new cardiac embryonic gene and is essential during cardiac development [, ]. Calreticulin has also been characterised as an extracellular lectin, an intracellular mediator of integrin-mediated cell adhesion, an inhibitor of steroid hormone-regulated gene expression and a C1q-binding protein []. A proposed model of calreticulin domains includes a globular N-domain, a central proline-rich P-domain and an acidic C-domain. A detailed structure of the central P-domain was revealed by NMR studies, while a model of the globular N-domain of calreticulin is based on crystallographic data reported for the highly similar calnexin [ ]. Calreticulin is also known as calregulin, Erp60, CRP55, CAB-63 and CaBP3 [ ].
Protein Domain
Name: BEN domain
Type: Domain
Description: The BEN domain is found in diverse proteins including: SMAR1 (Scaffold/Matrix attachment region-binding protein 1; also known as BANP), a tumour-suppressor MAR-binding protein that down-regulates Cyclin D1 expression by recruiting the HDAC1-mSin3A co-repressor complex at the Cyclin D1 promoter locus; SMAR1 is the target of prostaglandin A2 (PGA2) induced growth arrest [ , ]. NAC1, a novel member of the POZ/BTB (Pox virus and Zinc finger/Bric-a-brac Tramtrack Broad complex) family, which varies from other proteins of this class in that it lacks the characteristic DNA-binding motif [ ]. Mod(mdg4) isoform C, the modifier of the mdg4 locus in Drosophila melanogaster (Fruit fly), where mdg4 encodes chromatin proteins which are involved in position effect variegation, establishment of chromatin boundaries, nerve path finding, meiotic chromosome pairing and apoptosis [ ]. Trans-splicing of Mod(mdg4) produces at least 26 transcripts. E5R protein from Chordopoxvirus virosomes, which is found in cytoplasmic sites of viral DNA replication [ ]. Several proteins of polydnaviruses. BEN-containing proteins were characterised to have chromatin-related functions as an adaptor for the higher-order structuring of chromatin, and recruitment of chromatin modifying factors in transcriptional regulation. The BEN domain is an intrinsic sequence-specific DNA-binding domain [ ]. The presence of BEN domains in a poxviral early virosomal protein and in polydnavirus proteins also suggests a possible role in the organisation of viral DNA during replication or transcription. They are generally linked to other globular domains with functions related to transcriptional regulation and chromatin structure, such as BTB, C4DM, and C2H2 fingers []. This domain forms an all-α fold with four conserved helices. Its conservation pattern reveals several conserved residues, most of which have hydrophobic side-chains and are likely to stabilise the fold through helix-helix packing [, ].
Protein Domain
Name: Geranylgeranyl transferase type-1 subunit beta
Type: Family
Description: This entry represents the beta subunit (alpha 6 - alpha 6 barrel fold) of geranylgeranyltransferase type I (GGTase-I)-like proteins containing the protein prenyltransferase (PTase) domain. GGTase-I is a subgroup of the protein prenyltransferase family of lipid-modifying enzymes (PTases), which catalyze the carboxyl-terminal lipidation of Ras, Rab, and several other cellular signal transduction proteins, facilitating membrane associations and specific protein-protein interactions. Prenyltransferases employ a Zn2+ ion to alkylate a thiol group, catalyzing the formation of thioether linkages between cysteine residues at or near the C terminus of protein acceptors and the C1 atom of isoprenoid lipids (geranylgeranyl (20-carbon) in the case of GGTase-I) [ ]. GGTase-I prenylates the cysteine in the terminal sequence 'CAAX' when X is Leu or Phe. Substrates for GGTase-I include the gamma subunit of neural G-proteins and several Ras-related G-proteins [, ]. PTases are heterodimeric with both alpha and beta subunits required for catalytic activity.
Protein Domain
Name: RAD23A/RAD23B, UBA1 domain
Type: Domain
Description: Proteins containing this domain include mammalian orthologues of yeast nucleotide excision repair (NER) proteins Rad23 (in Saccharomyces cerevisiae) and Rhp23 (in Schizosaccharomyces pombe). Rad23 proteins play dual roles in DNA repair as well as in proteasomal degradation. They have affinity for both the proteasome and ubiquitinylated proteins and participate in translocating polyubiquitinated proteins to the proteasome. Rad23 proteins carry a ubiquitin-like (UBL) and two ubiquitin-associated (UBA) domains, as well as a xeroderma pigmentosum group C (XPC) protein-binding domain. The UBL domain is responsible for binding to the proteasome. Both the UBL domain and the XPC-binding domain are necessary for efficient NER function of Rad23 proteins [ , , ]. UBA domains are important for binding of ubiquitin (Ub) or multi-ubiquitinated substrates, which suggests Rad23 proteins might be involved in certain pathways of ubiquitin metabolism [ , ]. This entry represents the first UBA domain (UBA1).
Protein Domain
Name: Biopterin transporter family
Type: Family
Description: Members of this family are transmembrane proteins. Several are Leishmania putative proteins that are thought to be pteridine transporters. One such protein, previously termed (and still annotated as) ORFG, was shown to encode a biopterin transport protein using null mutants [ ], and was subsequently renamed BT1. The significant similarity of ORFG/BT1 to Trypanosoma brucei ESAG10 (a putative transmembrane protein and another member of this family) was previously noted []. This family also contains folate-biopterin transporters (FBTs) from blue-green algae and plants, including Slr0642 protein from Synechocystis and its plastidial orthologue At2g32040 from Arabidopsis []. Both Slr0642 protein and At2g32040 mediate folate monoglutamate transport involved in tetrahydrofolate biosynthesis. However, this entry also includes seven other Arabidopsis FBT proteins that lack conserved critical residues and may not have folate or pterin transport activity []. In addition, it also contains two predicted prokaryotic proteins (from the cyanobacteria Synechocystis and Synechococcus).
Protein Domain
Name: Apolipoprotein N-acyltransferase
Type: Family
Description: Apolipoprotein N-acyltransferase (Lnt) transfers the acyl group to lipoproteins and is involved in lipoprotein biosynthesis in Gram-negative bacteria. It is an integral membrane protein [ ]. In the last step of lipoprotein maturation, N-acylation by apolipoprotein N-acyltransferase of the plasma membrane is required for recognition of the outer membrane lipoproteins by the Lol system, which transports the lipoproteins from the plasma to the outer membrane []. This is a reverse amidase (i.e. condensation) reaction. This entry also includes bifunctional apolipoprotein N-acyltransferase/polyprenol monophosphomannose synthase from Mycobacterium tuberculosis. This subgroup belongs to a larger nitrilase superfamily comprised of nitrile- or amide-hydrolyzing enzymes and amide-condensing enzymes, which depend on a Glu-Lys-Cys catalytic triad. This superfamily has been classified in the literature, based on global and structure-based sequence analysis, into thirteen different enzyme classes (referred to as 1-13); this subgroup corresponds to class 9 [ , ].
Protein Domain
Name: Alpha crystallin/Hsp20 domain
Type: Domain
Description: Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp) [ ]. Amongst them is a family of proteins with an average molecular weight of 20 kDa, known as the hsp20 proteins []. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric aggregates. These low-molecular-weight proteins are evolutionarily related to alpha-crystallin [ ]. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species. Its main function appears to be to maintain the correct refractive index and transparency of the lens. It is also found in other tissues where it seems to act as a chaperone [, ]. Other related proteins include certain surface antigens []. This entry represents a conserved C-terminal domain of about 100 residues characteristic of this group of proteins [ ].
Protein Domain
Name: Probable glycine dehydrogenase (decarboxylating) subunit 2
Type: Family
Description: The P protein is part of the glycine decarboxylase multienzyme complex (GDC), also annotated as glycine cleavage system or glycine synthase. GDC consists of four proteins P, H, L and T [ ]. The P protein () binds the alpha-amino group of glycine through its pyridoxal phosphate cofactor; carbon dioxide is released and the remaining methylamine moiety is then transferred to the lipoamide cofactor of the H protein. The reaction catalysed by this protein is: Glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2. The subunit composition of glycine cleavage system P proteins has been classified into two types. Those from eukaryotes and some of the P proteins from prokaryotes (e.g. Escherichia coli) are in the homodimeric form. The rest of those from prokaryotes are heterotetrameric, with two different subunits which, based on sequence similarities, correspond respectively to the N- and C-terminal halves of the eukaryotic subunit [ ]. This entry represents the probable glycine dehydrogenase (decarboxylating) subunit 2 from prokaryotes.
Protein Domain
Name: Surface antigen D15-like
Type: Family
Description: This entry includes proteins with a domain similar to that of bacterial surface antigen (D15) from Haemophilus influenzae. Proteins include: Outer membrane protein assembly factor BamA, which forms part of the outer membrane protein assembly complex involved in assembly and insertion of β-barrel proteins into the outer membrane in Gram-negative bacteria [ ]. Sorting and assembly machinery component 50, which is required for the maintenance of the structure of mitochondrial cristae and the proper assembly of the mitochondrial respiratory chain complexes [ , ]. Translocation and assembly module subunit TamA, which is a component of the translocation and assembly module (TAM) autotransporter assembly complex which translocates autotransporters across the outer membrane [ ]. Outer envelope protein 80, chloroplastic, for which knockout is embryonic lethal [ ]. Outer envelope protein 39, chloroplastic [ ]. Protein TOC75, chloroplastic (translocon at the outer envelope membrane of chloroplasts, 75 kD), which functions as a preprotein translocation channel during chloroplast import [ ].
Protein Domain
Name: P-type trefoil, conserved site
Type: Conserved_site
Description: A cysteine-rich domain of approximately forty five amino-acid residues has been found in some extracellular eukaryotic proteins [ , , , ]. It is known as either the 'P', 'trefoil' or 'TFF' domain, and contains six cysteines linked by three disulphide bonds with connectivity 1-5, 2-4, 3-6. The domain has been found in a variety of extracellular eukaryotic proteins [, , ], including protein pS2 (TFF1), a protein secreted by the stomach mucosa; spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that inhibits gastrointestinal motility and gastric acid secretion; intestinal trefoil factor (ITF) (TFF3); Xenopus laevis stomach proteins xP1 and xP4; Xenopus integumentary mucins A.1 (FIM-A.1 or preprospasmolysin) and C.1 (FIM-C.1), proteins which may be involved in defence against microbial infections by protecting the epithelia from the external environment; Xenopus skin protein xp2 (or APEG); Zona pellucida sperm-binding protein B (ZP-B); intestinal sucrase-isomaltase (/ ), a vertebrate membrane bound, multifunctional enzyme complex which hydrolyses sucrose, maltose and isomaltose; and lysosomal alpha-glucosidase ( ).
Protein Domain
Name: Bacteriophage, G3P, N2-domain superfamily
Type: Homologous_superfamily
Description: The G3P protein (also known as attachment protein or coat protein A) of filamentous phage such as M13, phage fd and phage f1, is an essential coat protein for the infection of Escherichia coli. The G3P protein consists of three domains: two N-terminal domains (N1 and N2) with a similar β-barrel fold, and a C-terminal domain [ ]. The N-terminal domains protrude from the phage surface, while the C-terminal domain acts as an anchor embedded in the phage coat, together forming a horseshoe-like structure []. The G3P protein exists as 3-5 copies at the tip of the phage particle. Infection by filamentous phage occurs in two steps, both of which are mediated by the G3P protein: phage attachment to the F-pilus of the host cell as the primary receptor, followed by attachment to the C-terminal domain of the periplasmic protein TolA as a co-receptor. This superfamily represents the second N-terminal domain, N2, of the filamentous phage coat protein G3P.
Protein Domain
Name: Alpha-2-macroglobulin RAP, domain 3
Type: Domain
Description: This entry is the C-terminal domain (D3) of receptor-associated protein, RAP, also known as alpha-2-macroglobulin receptor-associated protein. RAP is a three domain ER (endoplasmic reticulum)-resident protein that is a chaperone for the LRP (low-density lipoprotein receptor-related protein). RAP is an antagonist and a specialized chaperone that binds tightly to members of the low-density lipoprotein (LDL) receptor family and prevents them from associating with other ligands [ ]. D3 is required for folding and trafficking of low-density lipoprotein receptor-related protein (LRP) [, ]. In the mildly acidic pH of the Golgi, unfolding of RAP-D3 helical bundle facilitates dissociation of RAP from the LDL receptor type A (LA) repeats of LDLR family proteins []. Also, RAP has 3 regions that interact weakly with heparin, two regions located in D3 and one in RAP domain 2 (D2) []. The double module of complement type repeats, CR56, of LRP binds many ligands including alpha2-macroglobulin, which promotes the catabolism of the Abeta-peptide implicated in Alzheimer's disease [].
Protein Domain
Name: PMP-22/EMP/MP20
Type: Family
Description: Several vertebrate small integral membrane glycoproteins are evolutionarily related [ , , ], including eye lens specific membrane protein 20 (MP20 or MP19); epithelial membrane protein-1 (EMP-1), which is also known as tumor-associated membrane protein (TMP) or as squamous cell-specific protein Cl-20; epithelial membrane protein-2 (EMP-2), which is also known as XMP; epithelial membrane protein-3 (EMP-3), also known as YMP; and peripheral myelin protein 22 (PMP-22), which is expressed in many tissues but mainly by Schwann cells as a component of myelin of the peripheral nervous system (PNS). PMP-22 probably plays a role both in myelinization and in cell proliferation. Mutations affecting PMP-22 are associated with hereditary motor and sensory neuropathies such as Charcot-Marie-Tooth disease type 1A (CMT-1A) in human or the trembler phenotype in mice. The proteins of this family are about 160 to 173 amino acid residues in size, and contain four transmembrane segments. PMP-22, EMP-1, -2 and -3 are highly similar, while MP20 is more distantly related.
Protein Domain
Name: YwkD-like domain
Type: Domain
Description: Proteins containing this domain include Bacillus subtilis YwkD and related proteins. They belong to an uncharacterised subfamily of the vicinal oxygen chelate (VOC) family. The VOC superfamily is composed of structurally related proteins with paired beta.alpha.beta.beta.beta motifs that provide a metal coordination environment with two or three open or readily accessible coordination sites to promote direct electrophilic participation of the metal ion in catalysis [ , ]. The VOC domain is found in a variety of structurally related metalloproteins, including the bleomycin resistance protein, glyoxalase I, and type I ring-cleaving dioxygenases. A bound metal ion is required for the activity of members of this superfamily. A variety of metal ions have been found in the catalytic centres of these proteins including Fe(II), Mn(II), Zn(II), Ni(II) and Mg(II). The protein superfamily contains members with or without domain swapping. The proteins of this family share three conserved metal binding amino acids with the type I extradiol dioxygenases, which show no domain swapping [ , ].
Protein Domain
Name: WAP-type 'four-disulfide core' domain
Type: Domain
Description: The four-disulfide core (4-DSC) or WAP domain comprises eight cysteine residues involved in disulfide bonds in a conserved arrangement [ ]. The four-disulfide-core-containing Whey Acidic Proteins (WAP) are the major whey proteins in the milk of many mammals and are considered to be the prototypic members of the family. However, the WAP domain is not exclusive to WAP proteins; it is found in many other proteins, a number of which have been shown to exhibit antiproteinase function []. One or more WAP domains occur in the WDNM1 protein, which is involved in the metastatic potential of adenocarcinomas in rats [ ]; Kallmann syndrome protein []; caltrin-like protein II from guinea pig [], which inhibits calcium transport into spermatozoa; elafin, a serine elastase inhibitor which belongs to MEROPS inhibitor family I17 []; and papilin, a metalloendopeptidase inhibitor which belongs to MEROPS inhibitor family I2 and is effective against procollagen N-proteinase [].
Protein Domain
Name: VPS9 domain
Type: Domain
Description: Rab proteins form a family of signal-transducing GTPases that cycle between active GTP-bound and inactive GDP-bound forms. The Rab5 GTPase is an essential regulator of endocytosis and endosome biogenesis. Rab5 is activated by GDP-GTP exchange factors (GEFs) that contain a VPS9 domain and generate the Rab5-GTP complex [ ]. The VPS9 domain catalyzes nucleotide exchange on Rab5 or the yeast homologue VPS21. The domain has a length of ~140 residues and forms the central part of the yeast VPS9 (vacuolar protein sorting-associated) protein, which acts as a GEF for VPS21. Some domains which can occur in combination with the VPS9 domain are CUE, A20-type zinc finger, Ras-associating (RA), SH2, RCC1, DH, PH, rasGAP, MORN and ankyrin repeat. Structurally, the VPS9 domain adopts a layered fold of six alpha helices. Conserved residues from the fourth and sixth helices and the loops N-terminal to these helices form the surface that interacts with Rab5 and Rab21 [ ]. Some proteins known to contain a VPS9 domain: Fungal Vacuolar Protein Sorting-associated protein VPS9, a guanine nucleotide exchange factor for the Rab-like GTPase VPS21. VPS9 is needed for the transport of proteins from biosynthetic and endocytic pathways into the vacuole. Mammalian Rab5 GDP/GTP exchange factor or Rabex-5 (Rabaptin-5 associated exchange factor for Rab5), which catalyzes nucleotide exchange on RAB5A. Rabex-5 promotes endocytic membrane fusion and is involved in membrane trafficking of recycling endosomes. Mammalian Ras and Rab interactor 1 (RIN1), 2 (RIN2) and 3 (RIN3). Mammalian alsin or Amyotrophic Lateral Sclerosis protein 2 (ALS2). Different juvenile-onset forms of neurodegenerative diseases (ALS2, JPLS, IAHSP) are caused by mutations in the ALS2 gene, which result in truncated alsin lacking the C-terminal part of the VPS9 domain. Fruit fly protein sprint, which is a RIN homologue. Caenorhabditis elegans RME-6 protein, which is conserved among metazoans.
Protein Domain
Name: 26S proteasome regulatory subunit P45-like
Type: Family
Description: Intracellular proteins, including short-lived proteins such as cyclin, Mos, Myc, p53, NF-kappaB, and IkappaB, are degraded by the ubiquitin-proteasome system. The 26S proteasome is a self-compartmentalising protease responsible for the regulated degradation of intracellular proteins in eukaryotes [ , ]. This giant intracellular protease is formed by several subunits arranged into two 19S polar caps, where protein recognition and ATP-dependent unfolding occur, flanking a 20S central barrel-shaped structure with an inner proteolytic chamber. This overall structure is highly conserved among eukaryotes and is essential for cell viability. Proteins targeted to the 26S proteasome are conjugated with a polyubiquitin chain by an enzymatic cascade before delivery to the 26S proteasome for degradation into oligopeptides. The 26S proteasome can be divided into two subcomplexes: the 19S regulatory particle (RP) and the 20S core particle (CP) []. The 19S component is divided into a "base" subunit containing six ATPases (Rpt proteins) and two non-ATPases (Rpn1, Rpn2), and a "lid" subunit composed of eight stoichiometric proteins (Rpn3, Rpn5, Rpn6, Rpn7, Rpn8, Rpn9, Rpn11, Rpn12) [ ]. Additional non-essential and species-specific proteins may also be present. The 19S unit performs several essential functions including binding the specific protein substrates, unfolding them, cleaving the attached ubiquitin chains, opening the 20S subunit, and driving the unfolded polypeptide into the proteolytic chamber for degradation. The 26S proteasome and 19S regulator are of medical interest due to their involvement in burn rehabilitation []. This family includes the six paralogous AAA-ATPases, termed Rpt1-Rpt6, of the 19S component base [ , ]. Members of this family may be phosphorylated within the proteasome.
This phosphorylation event may play a key role in ATP-dependent proteolysis because a good correlation exists between the inhibition pattern of protein kinase inhibitors against the phosphorylation of p45 (Rpt6) and that against the ATP-dependent proteolytic activity [, ].
Protein Domain
Name: Apoptosis regulator, Mcl-1
Type: Family
Description: Apoptosis, or programmed cell death (PCD), is a common and evolutionarily conserved property of all metazoans [ ]. In many biological processes, apoptosis is required to eliminate supernumerary or dangerous (such as pre-cancerous) cells and to promote normal development. Dysregulation of apoptosis can, therefore, contribute to the development of many major diseases including cancer, autoimmunity and neurodegenerative disorders. In most cases, proteins of the caspase family execute the genetic programme that leads to cell death. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [ ]. At least 20 Bcl-2 proteins have been reported in mammals, and several others have been identified in viruses. Bcl-2 family proteins fall roughly into three subtypes, which either promote cell survival (anti-apoptotic) or trigger cell death (pro-apoptotic). All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Bcl-2 subfamily proteins, which contain at least BH1 and BH2, promote cell survival by inhibiting the adapters needed for the activation of caspases. Pro-apoptotic members potentially exert their effects by displacing the adapters from the pro-survival proteins; these proteins belong either to the Bax subfamily, which contain BH1-BH3, or to the BH3 subfamily, which mostly only feature BH3 [ ]. Thus, the balance between antagonistic family members is believed to play a role in determining cell fate. Members of the wider Bcl-2 family, which also includes Bcl-x, Bcl-w and Mcl-1, are described by their similarity to Bcl-2 protein, a member of the pro-survival Bcl-2 subfamily []. Full-length Bcl-2 proteins feature all four BH domains, seven α-helices, and a C-terminal hydrophobic motif that targets the protein to the outer mitochondrial membrane, ER and nuclear envelope.
Protein Domain
Name: Zinc finger, AD-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The zinc finger-associated domain (Znf-AD domain, also known as ZAD), is found at the N terminus of the C2H2 proteins of many arthropods [ ]. In vertebrates, only one protein containing an N-terminal structure similar to the ZAD has been found, Zinc finger protein 276. This domain forms an atypical treble-cleft-like zinc co-ordinating fold. 
The Znf-AD domain is a protein-protein interaction module with the capability to form homodimers, but does not bind to DNA [, , ].
Protein Domain
Name: Ras guanine-nucleotide exchange factor, catalytic domain superfamily
Type: Homologous_superfamily
Description: Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyze GTP to GDP [ ] in fundamental events such as signal transduction, cytoskeleton dynamics and intracellular trafficking. The balance between the GTP bound (active) and GDP bound (inactive) states is regulated by the opposite action of proteins activating the GTPase activity and that of proteins which promote the loss of bound GDP and the uptake of fresh GTP [, ]. The latter proteins are known as guanine-nucleotide exchange (or releasing) factors (GEFs or GRFs) (or also as guanine-nucleotide dissociation stimulators (GDSs)). GEFs catalyze the dissociation of GDP from the inactive GTP-binding proteins. GTP can then bind and induce structural changes that allow interaction with effectors [, ]. The crystal structure of the GEF region of human Sos1 complexed with Ras has been solved [ ]. The structure consists of two distinct alpha helical structural domains: the N-terminal domain, which seems to have a purely structural role, and the C-terminal domain, which is sufficient for catalytic activity and contains all residues that interact with Ras. A main feature of the catalytic domain is the protrusion of a helical hairpin important for the nucleotide-exchange mechanism. The N-terminal domain is likely to be important for the stability and correct placement of the hairpin structure. Some proteins known to contain a Ras-GEF domain are listed below: Cdc25 from yeast. Scd25 from yeast. Ste6 from fission yeast. Son of sevenless (gene sos) from Drosophila and mammals. p140-RAS GRF (cdc25Mm) from mammals; this protein possesses both a domain belonging to the CDC25 family and one belonging to the CDC24 family. Bud5 from yeast, which may interact with the ras-like protein RSR1/BUD1. Lte1 from yeast, whose target protein is not yet known. ralGDS from mammals, which interacts with the ras-like proteins ralA and ralB [ ]. This entry represents the C-terminal catalytic domain of the Ras guanine-nucleotide exchange factors.
Protein Domain
Name: Protease inhibitor
Type: Family
Description: This family of proteins represents monomeric serralysin inhibitors of about 125 residues, which interact with specific metalloproteases synthesised by serralysin-secreting organisms, which are characterised as plant, insect and animal pathogens. It is probable that the serralysin inhibitors protect the host from proteolysis during export of the protease. The members of this family belong to MEROPS proteinase inhibitor family I38, clan IK. X-ray crystallography of a complex between the Serratia marcescens protease, SmaPI, and the inhibitor of Erwinia chrysanthemi, Inh, reveals that Inh is folded into an eight-stranded β-barrel with an N-terminal trunk of 10 residues. Residues 1-5 occupy part of the extended active site of the proteinase, thereby preventing access of the substrate. Residues 6-10 form a linker that connects the N-terminal proteinase-binding peptide to the body of the β-barrel. The backbone carbonyl of Ser-1 interacts with the catalytic zinc; the Ser-2 side chain occupies the S1'-binding site and also forms a hydrogen bond to the carboxyl end of the catalytic Glu, whereas Leu-3 occupies the S2' recognition site. Penetration of the trunk region further than 5 residues into the substrate binding cleft appears to be prevented by the β-barrel, which itself interacts with the proteinase near its Met turn (19). Peptide mimetics of the trunk at concentrations up to about 100 mM do not inhibit the protease, demonstrating that the barrel is essential for inhibitory activity [ , ]. Structurally and functionally these inhibitors are closely related to the lipocalins, fatty acid-binding proteins, avidins and the enigmatic triabin. Together these five protein families constitute the calycin superfamily [ ]. The proteins are characterised by their high specificity for small hydrophobic molecules and by their ability to form complexes with soluble macromolecules either through intramolecular disulphides or protein-protein interactions [ ].
Protein Domain
Name: EutK-Ctail domain
Type: Domain
Description: Bacterial microcompartments (BMCs) are large proteinaceous structures comprised of a roughly icosahedral shell and a series of encapsulated enzymes. They function as organelles by sequestering particular metabolic processes within the cell. A shell or capsid, which is composed of a few thousand protein subunits, surrounds a series of sequentially acting enzymes and controls the diffusion of substrates and products (including toxic or volatile intermediates) into and out of the lumen. One BMC, which is dedicated to ethanolamine utilization (Eut), is present in several bacteria, including Salmonella enterica and Escherichia coli. The cellular function of the Eut microcompartment is to metabolize ethanolamine without allowing the release of acetaldehyde into the cytosol, thus mitigating the potentially toxic effects of excess aldehyde in the bacterial cytosol and also preventing the volatile acetaldehyde from diffusing across the cell membrane and leading to a loss of carbon. The shells of BMCs are made primarily of a family of proteins whose structural core is the BMC domain. Four homologous proteins (EutS, EutL, EutK, and EutM) are thought to be the major shell constituents of a functionally complex Eut microcompartment. A fifth protein, EutN, is a minor shell component but is not homologous to the BMC shell proteins [ ]. The EutK shell protein is distinct among the shell proteins in the Eut microcompartment. First, although all other BMC proteins studied to date form hexamers (or pseudohexamers, like EutL), EutK is a monomer in solution. The apparent inability of EutK to assemble into a hexamer by itself suggests that different BMC paralogs might form mixed hexamers during assembly of the shell. Second, EutK has an extra protein domain (EutK-Ctail) of about 60 amino acids following the conserved BMC domain. The EutK-Ctail domain is a helix-turn-helix domain that might bind nucleic acids [].
Protein Domain
Name: Intermediate filament, rod domain
Type: Domain
Description: Intermediate filaments (IF) [ , ] are proteins which are primordial components of the cytoskeleton and the nuclear envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene family of proteins which has been subdivided into six types:
  • Type I: Acidic cytokeratins.
  • Type II: Basic cytokeratins.
  • Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, and plasticin.
  • Type IV: Neurofilaments L, H and M, alpha-internexin and nestin.
  • Type V: Nuclear lamins A, B1, B2 and C.
  • Type VI: 'Orphan' IF proteins, which are more distant in terms of their amino acid sequences.
All IF proteins are structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged in coiled-coil α-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence homologies except in several regions of the rod domain. The IF rod domain is approximately 310 residues long in all cytoplasmic IF proteins and close to 350 residues in the nuclear ones. The IF rod domain exhibits an interrupted α-helical conformation and reveals a pronounced seven-residue periodicity in the distribution of apolar residues. The heptad periodicity within the rod domain is interrupted in several places, which generates four consecutive α-helical segments: 1A and 1B, which together form the so-called coil 1, and 2A and 2B, which form coil 2. The four α-helical segments are interconnected by relatively short, variable linkers L1, L12 and L2 [ , ]. IF proteins have a very strong tendency to dimerize via the formation of an α-helical coiled coil (CC) by their rod domains [].
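The seven-residue periodicity of apolar residues described above can be illustrated with a short Python sketch. It is only a toy calculation, not the method used to define this entry: the function name `heptad_profile`, the choice of residues counted as apolar, and the synthetic test sequence are all assumptions made for illustration. The function assigns each residue to one of the seven positions of the repeat (conventionally labelled a to g) and reports the fraction of apolar residues at each position.

```python
# Residues treated as apolar here; this set is an illustrative assumption.
APOLAR = set("AVLIMFW")


def heptad_profile(sequence, phase=0):
    """Return the fraction of apolar residues at each of the 7 heptad positions.

    `phase` shifts the register, i.e. which heptad position the first
    residue of `sequence` is assigned to (0 = position 'a').
    """
    apolar_counts = [0] * 7
    totals = [0] * 7
    for index, residue in enumerate(sequence.upper()):
        position = (index + phase) % 7
        totals[position] += 1
        if residue in APOLAR:
            apolar_counts[position] += 1
    return [
        apolar / total if total else 0.0
        for apolar, total in zip(apolar_counts, totals)
    ]


# Synthetic (hypothetical) sequence: five copies of a heptad with apolar
# residues only at the first and fourth positions.
synthetic = "LKELEEK" * 5
profile = heptad_profile(synthetic)
for label, fraction in zip("abcdefg", profile):
    print(label, round(fraction, 2))
```

On a real rod domain the profile would be noisier, and the interruptions between segments 1A, 1B, 2A and 2B would shift the register, so each segment would need to be analysed separately with its own `phase`.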
Protein Domain
Name: Ras guanine-nucleotide exchange factors catalytic domain
Type: Domain
Description: Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyze GTP to GDP [ ] in fundamental events such as signal transduction, cytoskeleton dynamics and intracellular trafficking. The balance between the GTP-bound (active) and GDP-bound (inactive) states is regulated by the opposing actions of proteins that activate the GTPase activity and of proteins that promote the loss of bound GDP and the uptake of fresh GTP [, ]. The latter proteins are known as guanine-nucleotide exchange (or releasing) factors (GEFs or GRFs), or also as guanine-nucleotide dissociation stimulators (GDSs). GEFs catalyze the dissociation of GDP from the inactive GTP-binding proteins. GTP can then bind and induce structural changes that allow interaction with effectors [, ]. The crystal structure of the GEF region of human Sos1 in complex with Ras has been solved [ ]. The structure consists of two distinct alpha-helical structural domains: the N-terminal domain, which seems to have a purely structural role, and the C-terminal domain, which is sufficient for catalytic activity and contains all residues that interact with Ras. A main feature of the catalytic domain is the protrusion of a helical hairpin important for the nucleotide-exchange mechanism. The N-terminal domain is likely to be important for the stability and correct placement of the hairpin structure. Some proteins known to contain a Ras-GEF domain are listed below:
  • Cdc25 from yeast.
  • Scd25 from yeast.
  • Ste6 from fission yeast.
  • Son of sevenless (gene sos) from Drosophila and mammals.
  • p140-RAS GRF (cdc25Mm) from mammals. This protein possesses both a domain belonging to the CDC25 family and one belonging to the CDC24 family.
  • Bud5 from yeast, which may interact with the ras-like protein RSR1/BUD1.
  • Lte1 from yeast, whose target protein is not yet known.
  • ralGDS from mammals, which interacts with the ras-like proteins ralA and ralB [ ].
This entry represents the catalytic domain of the Ras guanine-nucleotide exchange factors.
Protein Domain
Name: Toll/interleukin-1 receptor homology (TIR) domain superfamily
Type: Homologous_superfamily
Description: Toll proteins or Toll-like receptors (TLRs) and the interleukin-1 receptor (IL-1R) superfamily are both involved in innate antibacterial and antifungal immunity in insects as well as in mammals. These receptors share a conserved cytoplasmic domain of approximately 200 amino acids, known as the Toll/IL-1R homologous region (TIR). The similarity between TLRs and IL-1Rs is not restricted to sequence homology since these proteins also share a similar signalling pathway. They both induce the activation of a Rel type transcription factor via an adaptor protein and a protein kinase []. Interestingly, MyD88, a cytoplasmic adaptor protein found in mammals, contains a TIR domain associated with a DEATH domain [, , ]. Besides the mammalian and Drosophila proteins, a TIR domain is also found in a number of plant cytoplasmic proteins implicated in host defense []. Site-directed mutagenesis and deletion analysis have shown that the TIR domain is essential for Toll and IL-1R activities. Sequence analyses have revealed the presence of three highly conserved regions among the different members of the family: box 1 (FDAFISY), box 2 (GYKLC-RD-PG), and box 3 (a conserved W surrounded by basic residues). It has been proposed that boxes 1 and 2 are involved in the binding of proteins involved in signalling, whereas box 3 is primarily involved in directing localization of the receptor, perhaps through interactions with cytoskeletal elements [ ]. Resolution of the crystal structures of the TIR domains of human Toll-like receptors 1 and 2 has shown that they contain a central five-stranded parallel β-sheet that is surrounded by a total of five helices on both sides, with connecting loop structures [ ]. The loop regions appear to play an important role in mediating the specificity of protein-protein interactions [, ].
Protein Domain
Name: Apoptosis regulator, Bcl-2/ BclX
Type: Family
Description: Apoptosis, or programmed cell death (PCD), is a common and evolutionarily conserved property of all metazoans [ ]. In many biological processes, apoptosis is required to eliminate supernumerary or dangerous (such as pre-cancerous) cells and to promote normal development. Dysregulation of apoptosis can, therefore, contribute to the development of many major diseases including cancer, autoimmunity and neurodegenerative disorders. In most cases, proteins of the caspase family execute the genetic programme that leads to cell death. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [ ]. At least 20 Bcl-2 proteins have been reported in mammals, and several others have been identified in viruses. Bcl-2 family proteins fall roughly into three subtypes, which either promote cell survival (anti-apoptotic) or trigger cell death (pro-apoptotic). All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Bcl-2 subfamily proteins, which contain at least BH1 and BH2, promote cell survival by inhibiting the adapters needed for the activation of caspases. Pro-apoptotic members potentially exert their effects by displacing the adapters from the pro-survival proteins; these proteins belong either to the Bax subfamily, which contain BH1-BH3, or to the BH3 subfamily, which mostly only feature BH3 [ ]. Thus, the balance between antagonistic family members is believed to play a role in determining cell fate. Members of the wider Bcl-2 family, which also includes Bcl-x, Bcl-w and Mcl-1, are described by their similarity to Bcl-2 protein, a member of the pro-survival Bcl-2 subfamily []. Full-length Bcl-2 proteins feature all four BH domains, seven α-helices, and a C-terminal hydrophobic motif that targets the protein to the outer mitochondrial membrane, ER and nuclear envelope.
Protein Domain
Name: Apoptosis regulator, Bcl-2
Type: Family
Description: Apoptosis, or programmed cell death (PCD), is a common and evolutionarily conserved property of all metazoans [ ]. In many biological processes, apoptosis is required to eliminate supernumerary or dangerous (such as pre-cancerous) cells and to promote normal development. Dysregulation of apoptosis can, therefore, contribute to the development of many major diseases including cancer, autoimmunity and neurodegenerative disorders. In most cases, proteins of the caspase family execute the genetic programme that leads to cell death. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [ ]. At least 20 Bcl-2 proteins have been reported in mammals, and several others have been identified in viruses. Bcl-2 family proteins fall roughly into three subtypes, which either promote cell survival (anti-apoptotic) or trigger cell death (pro-apoptotic). All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Bcl-2 subfamily proteins, which contain at least BH1 and BH2, promote cell survival by inhibiting the adapters needed for the activation of caspases. Pro-apoptotic members potentially exert their effects by displacing the adapters from the pro-survival proteins; these proteins belong either to the Bax subfamily, which contain BH1-BH3, or to the BH3 subfamily, which mostly only feature BH3 [ ]. Thus, the balance between antagonistic family members is believed to play a role in determining cell fate. Members of the wider Bcl-2 family, which also includes Bcl-x, Bcl-w and Mcl-1, are described by their similarity to Bcl-2 protein, a member of the pro-survival Bcl-2 subfamily []. Full-length Bcl-2 proteins feature all four BH domains, seven α-helices, and a C-terminal hydrophobic motif that targets the protein to the outer mitochondrial membrane, ER and nuclear envelope.
Protein Domain
Name: CD9, extracellular domain
Type: Domain
Description: This entry represents the extracellular domain or large extracellular loop (LEL) of CD9. This extracellular domain lies between the 3rd and 4th transmembrane segments. CD9 belongs to the tetraspanin family of membrane proteins and is found in virtually all tissues; it is potentially involved in developmental processes. It associates with the tetraspanins CD81 and CD63, as well as with some integrins, and has been shown to be involved in a variety of activation, adhesion, and cell motility functions, as well as cell-cell interactions, such as during fertilization [ , , ]. Tetraspanins are a distinct family of cell surface proteins, containing four conserved transmembrane domains: a small outer loop (EC1), a larger outer loop (EC2), a small inner loop (IL) and short cytoplasmic tails. They contain characteristic structural features, including 4-6 conserved extracellular cysteine residues, and polar residues within transmembrane domains. A fundamental role of tetraspanins appears to be organising other proteins into a network of multimolecular membrane microdomains, sometimes called the 'tetraspanin web'. Within this web there are primary complexes in which tetraspanins show robust, specific, and direct lateral associations with other proteins. The strong tendency of tetraspanins to associate with each other probably contributes to the assembly of a network of secondary interactions in which non-tetraspanin proteins are associated with each other via palmitoylated tetraspanins acting as linker proteins. In addition, the association of lipids, such as gangliosides and cholesterol, probably contributes to the assembly of even larger tetraspanin complexes, which have some lipid raft-like properties (e.g. resistance to solubilization in non-ionic detergents). 
Within the tetraspanin web, tetraspanin proteins can associate not only with integrins and other transmembrane proteins, but also with signalling enzymes such as protein kinase C and phosphatidylinositol-4 kinase. Thus, the tetraspanin web provides a mechanistic framework by which membrane protein signalling can be expanded into a lateral dimension [ ].
Protein Domain
Name: Alphavirus nsP2 protease domain superfamily
Type: Homologous_superfamily
Description: The superfamily of alphaviruses includes 26 known members. They infect a variety of hosts including mosquitoes, birds, rodents and other mammals with worldwide distribution. Alphaviruses also pose a potential threat to human health in many areas. For example, Venezuelan Equine Encephalitis Virus (VEEV) causes encephalitis in humans as well as livestock in Central and South America, and some variants of Sindbis Virus (SIN) and Semliki Forest Virus (SFV) have been found to cause fever and arthritis in humans [ ]. Alphaviruses possess a single-stranded RNA genome of approximately 12 kb. The genomic RNA of alphaviruses is translated into two polyproteins that, respectively, encode structural proteins and nonstructural proteins. The nonstructural proteins may be translated as one or two polyproteins, nsp123 or nsp1234, depending on the virus. These polyproteins are cleaved to generate nsp1, nsp2, nsp3 and nsp4 by a protease activity that resides within nsp2. The nsp2 protein of alphaviruses has multiple enzymatic activities. Its N-terminal domain has been shown to possess ATPase and GTPase activity, RNA helicase activity and RNA 5'-triphosphatase activity. The C-terminal nsp2pro domain of nsp2 is responsible for the regulation of 26S subgenome RNA synthesis, switching between negative- and positive-strand RNA synthesis, targeting nsp2 for nuclear transport and proteolytic processing of the nonstructural polyprotein [ , ]. The nsp2pro domain is a member of peptidase family C9 of clan CA. The nsp2pro domain consists of two distinct subdomains. The nsp2pro N-terminal subdomain is largely α-helical and contains the catalytic dyad cysteine and histidine residues organised in a protein fold that differs significantly from any known cysteine protease or protein folds. 
The nsp2pro C-terminal subdomain displays structural similarity to S-adenosyl-L-methionine-dependent RNA methyltransferases and provides essential elements that contribute to substrate recognition and may also regulate the structure of the substrate binding cleft []. This domain covers the entire nsp2pro domain.
Protein Domain
Name: Mediator complex, subunit Med6, metazoa/plant
Type: Family
Description: The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11. The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation. The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. 
Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP. Regulation of mRNA synthesis requires intermediary proteins that transduce regulatory signals from upstream transcriptional activator proteins to basal transcription machinery at the core promoter. Three types of intermediary factors that enable the basal transcription machinery to respond to transcriptional activator proteins bound to regulatory DNA sequences have been identified: (i) TAFIIs, which associate with TATA-binding protein (TBP) to form TFIID; (ii) mediator, which associates with RNA polymerase II to form a holo-polymerase; and (iii) coactivators such as human upstream stimulatory activity (USA), mammalian CBP/P300, yeast ADA complex, and HMG proteins. The interaction of these multiprotein complexes with activators and general transcription factors is essential for transcriptional regulation. This family of proteins represents the transcriptional mediator protein that is required for activation of many RNA polymerase II promoters and is conserved from yeast to humans [ ]. This group represents an RNA polymerase II mediator complex, subunit 6, metazoa/plant types.
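The module composition listed in the description above lends itself to a simple lookup table. The following Python sketch is illustrative only: the names `MEDIATOR_MODULES` and `module_of` are hypothetical, only the MED-style names are kept where the description gives aliases such as SRB4/MED17, and the membership is copied from the text rather than from any curated resource.

```python
# Module membership as listed in the description (MED-style names only).
MEDIATOR_MODULES = {
    "head": ["MED6", "MED8", "MED11", "MED17", "MED18", "MED19", "MED20", "MED22"],
    "middle": ["MED1", "MED4", "MED5", "MED7", "MED9", "MED10", "MED21", "MED31"],
    "tail": ["MED2", "MED3", "MED14", "MED15", "MED16"],
    "CDK8": ["MED12", "MED13", "CCNC", "CDK8"],
}


def module_of(subunit):
    """Return the module a subunit is listed under, or None if it is unassigned."""
    name = subunit.upper()
    for module, members in MEDIATOR_MODULES.items():
        if name in members:
            return module
    return None


print(module_of("MED6"))   # head
print(module_of("MED23"))  # None: named as a subunit, but not placed in a module above
```

Subunits such as MED23 appear in the list of at least 31 subunits but are not assigned to a module in the description, so the lookup returns `None` for them rather than guessing.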
Protein Domain
Name: Mediator complex, subunit Med6
Type: Family
Description: The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11. The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation. The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. 
Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP. Regulation of mRNA synthesis requires intermediary proteins that transduce regulatory signals from upstream transcriptional activator proteins to basal transcription machinery at the core promoter. Three types of intermediary factors that enable the basal transcription machinery to respond to transcriptional activator proteins bound to regulatory DNA sequences have been identified: (i) TAFIIs, which associate with TATA-binding protein (TBP) to form TFIID; (ii) mediator, which associates with RNA polymerase II to form a holo-polymerase; and (iii) coactivators such as human upstream stimulatory activity (USA), mammalian CBP/P300, yeast ADA complex, and HMG proteins. The interaction of these multiprotein complexes with activators and general transcription factors is essential for transcriptional regulation. This family of proteins represents the transcriptional mediator protein subunit 6 that is required for activation of many RNA polymerase II promoters and is conserved from yeast to humans [].
Protein Domain
Name: Zinc finger, CHY-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. Pirh2 is a eukaryotic ubiquitin protein ligase, which has been shown to promote p53 degradation in mammals. Pirh2 physically interacts with p53 and promotes ubiquitination of p53 independently of MDM2. Like MDM2, Pirh2 is thought to participate in an autoregulatory feedback loop that controls p53 function. 
Pirh2 proteins contain three distinct zinc fingers: the CHY-type, the CTCHY-type (which is C-terminal to the CHY-type zinc finger) and a RING finger. The CHY-type zinc finger has no currently known function [ ]. As well as Pirh2, the CHY-type zinc finger is also found in the following proteins:
  • Yeast helper of Tim protein 13 (Hot13). Hot13 may have a role in the assembly and recycling of the small Tims, a complex of the mitochondrial intermembrane space that participates in the TIM22 import pathway for assembly of the inner membrane [ ].
  • Several plant hypothetical proteins that also contain haemerythrin cation binding domains.
  • Several protozoan hypothetical proteins that also contain a Myb domain.
The solution structure of this zinc finger has been solved; it binds three zinc atoms. The conserved zinc-binding residues are arranged as follows:
CXHYxxxxxxxxxCCxxxxxCxxCHxxxxxHxxxxxxxxxxxCxxCxxxxxxxxxCxxC
'C': conserved cysteine involved in the binding of one zinc atom.
'H': conserved histidine involved in the binding of one zinc atom.
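The residue pattern quoted in the description above can be turned into a regular expression for a rough scan of a protein sequence. This is a minimal Python sketch under stated assumptions: it reads the schematic literally, treating every `x` or `X` as exactly one arbitrary residue, whereas real CHY-type fingers may differ in their spacer lengths. The names `schematic_to_regex`, `find_chy_like` and `count_ligands` are hypothetical, and the test sequence is synthetic.

```python
import re

# Residue pattern copied from the description; 'x'/'X' mark unconserved positions.
CHY_SCHEMATIC = "CXHYxxxxxxxxxCCxxxxxCxxCHxxxxxHxxxxxxxxxxxCxxCxxxxxxxxxCxxC"


def schematic_to_regex(schematic):
    """Translate the schematic into a regex: x/X -> any residue, others literal."""
    parts = ["." if ch in "xX" else re.escape(ch) for ch in schematic]
    return re.compile("".join(parts))


def count_ligands(schematic):
    """Count the conserved cysteines and histidines in the schematic."""
    return schematic.count("C"), schematic.count("H")


CHY_REGEX = schematic_to_regex(CHY_SCHEMATIC)


def find_chy_like(sequence):
    """Return 0-based start offsets of literal matches to the schematic."""
    return [match.start() for match in CHY_REGEX.finditer(sequence.upper())]


# Synthetic (hypothetical) sequence: the schematic with alanine at every
# unconserved position, flanked by a few extra residues.
core = "".join("A" if ch in "xX" else ch for ch in CHY_SCHEMATIC)
hits = find_chy_like("MK" + core + "GG")
print(hits)
print(count_ligands(CHY_SCHEMATIC))
```

The schematic contains nine conserved cysteines and three conserved histidines, twelve ligands in total, which is consistent with three zinc atoms if each is coordinated by four residues. A scan of this kind is far less sensitive than the profile-based methods that domain databases normally use, so it should be treated as a quick filter at most.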
Protein Domain
Name: Mediator complex, subunit Med6, fungi
Type: Family
Description: The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11. The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation. The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. 
Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP. Regulation of mRNA synthesis requires intermediary proteins that transduce regulatory signals from upstream transcriptional activator proteins to basal transcription machinery at the core promoter. Three types of intermediary factors that enable the basal transcription machinery to respond to transcriptional activator proteins bound to regulatory DNA sequences have been identified: (i) TAFIIs, which associate with TATA-binding protein (TBP) to form TFIID; (ii) mediator, which associates with RNA polymerase II to form a holo-polymerase; and (iii) coactivators such as human upstream stimulatory activity (USA), mammalian CBP/P300, yeast ADA complex, and HMG proteins. The interaction of these multiprotein complexes with activators and general transcription factors is essential for transcriptional regulation. This family of proteins represents the transcriptional mediator protein that is required for activation of many RNA polymerase II promoters and is conserved from yeast to humans [ ]. This entry represents the Med6 subunit of the Mediator complex in fungi.
Protein Domain
Name: NSP3, first ubiquitin-like (Ubl) domain, coronavirus
Type: Domain
Description: Non-structural protein NSP3 (also known as nsp3) is a multi-domain protein, the largest product of ORF1a which encodes the polyprotein 1a/1ab (pp1a/1ab). NSP3 comprises up to 16 different domains and regions; their organisation differs between coronaviruses (CoV). However, eight domains and two transmembrane regions are conserved in all known CoVs: the ubiquitin-like domain 1 (Ubl1), the Glu-rich acidic domain (hypervariable region), a macrodomain (X domain), the ubiquitin-like domain 2 (Ubl2), the papain-like protease 2 (PL2pro; depending on the CoV, there are one or two PLpro domains), the NSP3 ectodomain (3Ecto, also called "zinc-finger domain"), as well as the domains Y1 and CoV-Y of unknown function. NSP3 is released from pp1a/1ab by the papain-like protease domain(s), which is (are) part of NSP3 itself. NSP3 is an essential component of the replication/transcription complex (RTC) as it acts as a scaffold protein to interact with itself and to bind other viral NSPs or host proteins. The RTC associates with host ER membranes producing convoluted membranes and double-membrane vesicles. It is also involved in post-translational modifications of host proteins to antagonise the host innate immune response [ , ]. Nsp3 comprises various domains of functional and structural importance for virus replication, the organization of which differs between CoV genera, due to duplication or absence of some domains [ ]. Two ubiquitin-like domains, Ubl1 and Ubl2 (Nsp3a and the N-terminal domain of Nsp3d), exist within Nsp3 of all CoVs. The known functional roles of Nsp3a Ubl in CoVs are related to single-stranded RNA (ssRNA) binding and interacting with the nucleocapsid (N) protein [, ]. Nsp3d Ubl is immediately adjacent to the N terminus of the PLpro (or PL2Pro) domain in CoV polyproteins, and it may play a critical role in protease regulation and stability as well as in viral infection [, , , ]. 
In addition to the four β-strands and two α-helices that are common to ubiquitin-like folds, the Nsp3a Ubl domain contains two short helices [, ]. The Nsp3d Ubl domain comprises five β-strands, one α-helix, and one 3(10)-helix [, , ]. This entry represents a domain covering the entire CoV Nsp3a and Nsp3d first Ubl domain. NSP3a interacts with numerous other proteins involved in replication and transcription and may serve as a scaffolding protein for these processes. The Ubl1 domain interacts with N protein to co-localise genomic RNA with the nascent replicase-transcriptase complex at the earliest stages of infection, essential for the virus []. This Ubl1 domain may bind ssRNA with AUA patterns. The C-terminal Glu-rich subdomain is best described as a flexible tail attached to the globular UB1 subdomain [].
Protein Domain
Name: Zinc finger, CHY-type superfamily
Type: Homologous_superfamily
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. Pirh2 is an eukaryotic ubiquitin protein ligase, which has been shown to promote p53 degradation in mammals. Pirh2 physically interacts with p53 and promotes ubiquitination of p53 independently of MDM2. Like MDM2, Pirh2 is thought to participate in an autoregulatory feedback loop that controls p53 function. 
Pirh2 proteins contain three distinct zinc fingers: the CHY-type, the CTCHY-type (which is C-terminal to the CHY-type zinc finger) and a RING finger. The CHY-type zinc finger has no currently known function [ ]. As well as Pirh2, the CHY-type zinc finger is also found in the following proteins:
  • Yeast helper of Tim protein 13 (Hot13). Hot13 may have a role in the assembly and recycling of the small Tims, a complex of the mitochondrial intermembrane space that participates in the TIM22 import pathway for assembly of the inner membrane [ ]
  • Several plant hypothetical proteins that also contain haemerythrin cation binding domains
  • Several protozoan hypothetical proteins that also contain a Myb domain
The solution structure of this zinc finger has been solved; it binds three zinc atoms through the conserved residues of the following pattern:
CXHYxxxxxxxxxCCxxxxxCxxCHxxxxxHxxxxxxxxxxxCxxCxxxxxxxxxCxxC
'C': conserved cysteine involved in the binding of one zinc atom. 'H': conserved histidine involved in the binding of one zinc atom.
Protein Domain
Name: Apoptosis regulator, Bcl-X
Type: Family
Description: Apoptosis, or programmed cell death (PCD), is a common and evolutionarily conserved property of all metazoans [ ]. In many biological processes, apoptosis is required to eliminate supernumerary or dangerous (such as pre-cancerous) cells and to promote normal development. Dysregulation of apoptosis can, therefore, contribute to the development of many major diseases including cancer, autoimmunity and neurodegenerative disorders. In most cases, proteins of the caspase family execute the genetic programme that leads to cell death.Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [ ]. At least 20 Bcl-2 proteins have been reported in mammals, and several others have been identified in viruses. Bcl-2 family proteins fall roughly into three subtypes, which either promote cell survival (anti-apoptotic) or trigger cell death (pro-apoptotic). All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Bcl-2 subfamily proteins, which contain at least BH1 and BH2, promote cell survival by inhibiting the adapters needed for the activation of caspases.Pro-apoptotic members potentially exert their effects by displacing the adapters from the pro-survival proteins; these proteins belong either to the Bax subfamily, which contain BH1-BH3, or to the BH3 subfamily, which mostly only feature BH3 [ ]. Thus, the balance between antagonistic family members is believed to play a role in determining cell fate. Members of the wider Bcl-2 family, which also includes Bcl-x, Bcl-w and Mcl-1, are described by their similarity to Bcl-2 protein, a member of the pro-survival Bcl-2 subfamily []. Full-length Bcl-2 proteins feature all four BH domains, seven α-helices, and a C-terminal hydrophobic motif that targets the protein to the outer mitochondrial membrane, ER and nuclear envelope. 
Bcl-X is a dominant regulator of programmed cell death in mammalian cells. The long form (Bcl-X(L)) displays cell death repressor activity, but the short isoform (Bcl-X(S)) and the b-isoform (Bcl-Xb) promote cell death. Bcl-X(L), Bcl-X(S) and Bcl-Xb are three isoforms derived by alternative RNA splicing. Bcl-X(S) forms heterodimers with Bcl-2. Homologues of Bcl-X include the rat Bax and mouse Bak proteins, which also influence apoptosis. In healthy cells, Bcl-x resides in the cytoplasm and becomes membrane-associated in response to cytotoxic insults. Its 3D structure comprises a bundle of five amphipathic α-helices surrounding two central hydrophobic helical regions. A hydrophobic groove, formed by residues from BH1-BH3, is capable of binding the BH3 α-helix from a pro-apoptotic BH3-only family member [ ].
Protein Domain
Name: Cytochrome P450, E-class, group IV
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus).Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [ ], which has haem and flavin domains.Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity.Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). 
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [ , , ]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (eg, CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes.This entry represents class E cytochrome P450 proteins that fall into sequence cluster group IV. Group IV comprises the CYP7 (cholesterol 7-alpha-hydroxylase) and CYP51 (lanosterol 14-alpha-demethylase) families, which show significant sequence similarity even though there is no apparent functional resemblance. The CYP8 (prostacyclin synthase) family also falls into this group, and shows high sequence similarity to CYP7 members [ ]. Proteins required in the biosynthesis of fungal mycotoxins are also included: cytochrome P450 monooxygenases gloO and gloP from Glarea lozoyensis are required for synthesis of lipohexapeptides of the echinocandin family that prevent fungal cell wall formation by non-competitive inhibition of beta-1,3-glucan synthase [].
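The CYP naming rule described in this entry (CYP, family number, subfamily letter, protein number) can be illustrated with a small parser. This is only a sketch: the names `CypName`, `parse_cyp` and `classify_by_identity` are hypothetical helpers, not part of this database, and the identity cut-offs simply restate the ">40%" and ">55%" guideline quoted above rather than a strict rule.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    family: int      # number after "CYP", e.g. 3 in CYP3A4
    subfamily: str   # letter(s) after the family number, e.g. "A"
    protein: int     # final number, e.g. 4


# Assumed pattern: "CYP" + family digits + subfamily letters + protein digits.
_CYP_RE = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name: str) -> Optional[CypName]:
    """Split a full CYP protein name into its parts, or return None."""
    match = _CYP_RE.match(name.strip().upper())
    if match is None:
        return None
    return CypName(int(match.group(1)), match.group(2), int(match.group(3)))


def classify_by_identity(percent_identity: float) -> str:
    """Apply the general guideline: >55% subfamily, >40% family."""
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family"
    return "different family"


print(parse_cyp("CYP3A4"))
print(classify_by_identity(48.0))
```

A family-level label such as CYP51 carries no subfamily letter or protein number, so `parse_cyp` returns `None` for it; only full protein names are handled by this sketch.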
Protein Domain
Name: Cytochrome P450, E-class, group I, CYP1
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus).Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [ ], which has haem and flavin domains.Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity.Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). 
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [ , , ]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (eg, CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes.This entry represents the CYP1 family from group I, class E, cytochrome P450 proteins. CYP1 enzymes mainly metabolise exogenous substrates and are found in mammalia (1A, 1B), bony fishes (1A), sharks, skates, rays (1A) and birds (1A). CYP1A contains five proteins, namely, CYP1A1, 1A2, 1A3, 1A4 and 1A5. CYP1 family members are closely associated with the metabolic activation of pro-carcinogens and mutagens [ ]. For example, CYP1B1 may play an important role in susceptibility to mammary and ovarian cancer through its involvement in oestrogen metabolism [], as well as with various cancers associated with the activation of polycyclic aromatic hydrocarbons in cigarette smoke [].
Protein Domain
Name: Cytochrome P450, E-class, group I, CYP2D-like
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus).Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [ ], which has haem and flavin domains.Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity.Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). 
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [ , , ]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (eg, CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes.This entry represents the CYP2D family from group I, class E, cytochrome P450 proteins, as well as other CYP2 family proteins. The CYP2 family comprises 15 subfamilies (A-H, J-N, P and Q). The first five (A-E) are present in mammalian liver, but in differing amounts and with different inducibilities. These five subfamilies show varied substrate specificities, with some degree of overlap. CYP2D6 metabolises several classes of therapeutic drugs, endogenous neurochemicals and toxins, and can be detected in liver, intestine and kidney [ ]. CYP2D6 is non-inducible, and genomic defects in CYP2D6 that inhibit or induce activity can have serious effects with respect to drug efficacy and clearance [].
Protein Domain
Name: Peptidase C30/C16, Betacoronavirus
Type: Domain
Description: This entry represents a domain found in betacoronavirus cysteine endopeptidases that belong to MEROPS peptidase families C30 (clan PA) and C16 (subfamilies C16A and C16B, clan CA). These peptidases are involved in viral polyprotein processing. All coronaviruses encode one or two accessory cysteine proteinases that recognise and process one or two sites in the amino-terminal half of the replicase polyprotein during assembly of the viral replication complex. MHV, HCoV and TGEV encode two accessory proteinases, called coronavirus papain-like proteinase 1 and 2 (PL1-PRO and PL2-PRO). IBV and SARS encode only one, called PL-PRO [ ]. Coronavirus papain-like proteinases 1 and 2 have restricted specificities, cleaving respectively two and one bond(s) in the polyprotein. This restricted activity may be due to extended specificity sites: Arg or Lys at the cleavage site position P5 is required for PL1-PRO [], and Phe at the cleavage site position P6 is required for PL2-PRO []. PL1-PRO releases p28 and p65 from the N terminus of the polyprotein; PL2-PRO cleaves between p210 and p150. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. 
From sequence similarities, cysteine peptidases can be clustered into over 80 different families [ ]. Clans CF, CM, CN, CO, CP and PD contain only one family.Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [ ].Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [ ]. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds.Clan CE includes proteins with an adenain-like fold. 
The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other.Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [ ]. The active site consists of a His/Cys catalytic dyad.Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: Josephin domain
Type: Domain
Description: The Josephin domain is a eukaryotic protein module of about 180 residues, which occurs in stand-alone form in Josephin-like proteins, and as an amino-terminal domain associated with two or three copies of the ubiquitin-interacting motif (UIM) in ataxin 3-like proteins. Josephin domain-containing proteins function as de-ubiquitination enzymes []. Although it was originally proposed that the Josephin domain could be an all-alpha helical domain distantly related to ENTH and VHS domains involved in membrane trafficking and regulatory adaptor function [], it is now believed that it is a mainly alpha helical cysteine-protease domain predicted to be active against ubiquitin chains or related substrates [, ]. The Josephin domain contains two conserved histidines and one cysteine that is required for the ubiquitin protease activity [ , ] and two ubiquitin-binding sites []. Some proteins known to contain a Josephin domain are:
  • Animal Machado-Joseph disease protein 1 (Ataxin 3). It interacts with key regulators (CBP, p300 and PCAF) of transcription and represses transcription.
  • Plant Machado-Joseph disease-like protein (MJD1a-like) (Ataxin 3 homologue).
  • Mammalian Josephin 1 and 2.
  • Drosophila melanogaster Josephin-like protein.
  • Arabidopsis thaliana Josephin-like protein.
Protein Domain
Name: Helix-hairpin-helix motif, class 2
Type: Conserved_site
Description: The helix-hairpin-helix (HhH) motif is a domain of around 20 amino acids present in prokaryotic and eukaryotic non-sequence-specific DNA binding proteins. The HhH motif is similar to, but distinct from, the helix-turn-helix (HtH) and the helix-loop-helix (HLH) motifs. All three motifs have two helices (H1 and H2) connected by a short turn. DNA-binding proteins with a HhH structural motif are involved in non-sequence-specific DNA binding that occurs via the formation of hydrogen bonds between protein backbone nitrogens and DNA phosphate groups. These HhH motifs are observed in DNA repair enzymes and in DNA polymerases. By contrast, proteins with a HtH motif bind DNA in a sequence-specific manner through the binding of H2 with the major groove; these proteins are primarily gene regulatory proteins. DNA-binding proteins with the HLH structural motif are transcriptional regulatory proteins and are principally related to a wide array of developmental processes [ ]. Examples of proteins that contain a HhH motif include the eukaryotic/prokaryotic RAD2 family of 5'-3' exonucleases such as T4 RNase H and T5 [ , ], eukaryotic 5' endonucleases such as FEN-1 (Flap) [], and some viral exonucleases.
Protein Domain
Name: Flavivirus/Alphavirus glycoprotein, immunoglobulin-like domain superfamily
Type: Homologous_superfamily
Description: This immunoglobulin-like domain superfamily is found in polyproteins from Flavivirus and Alphavirus. Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Yellow fever virus, West Nile virus, Tick-borne encephalitis virus, Japanese encephalitis virus, and Dengue virus 2 [ ]. Flaviviruses consist of three structural proteins: the core nucleocapsid protein C (), and the envelope glycoproteins M ( ) and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from class I fusion proteins from influenza-type viruses and retroviruses. Glycoprotein E is comprised of three domains: domain I (dimerisation domain) is an 8-stranded beta barrel, domain II (central domain) is an elongated domain composed of twelve beta strands and two alpha helices, and domain III (immunoglobulin-like domain) is an IgC-like module with ten beta strands. This entry represents the Ig-like domain III, which contains a putative receptor-binding loop [ ].
Protein Domain
Name: CD44 antigen
Type: Family
Description: CD44, also known as the hyaluronan receptor [ ], is a polymorphic cell-surface glycoprotein synthesised in a variety of cells. The protein interacts with actin-based cytoskeletons, and co-localises with ERM proteins (ezrin, radixin and moesin) at actin filament-plasma membrane interaction sites. CD44 may be involved in cell migration, adhesion and differentiation in normal cells, as well as in metastasis in cancer cells. It is a receptor for extracellular materials, such as soluble or cell-bound hyaluronic acid, collagen, fibronectin and serglycin. The protein has a single membrane-spanning domain and a heavily glycosylated extracellular domain; its cytoplasmic domain is reportedly associated with an ankyrin-like protein []. Some of the proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook. Academic Press, London / San Diego, (1997)]. CD44 antigen (Phagocytic glycoprotein I) belongs to the Indian blood group system and is associated with In(a/b) antigen.
Protein Domain
Name: MPP1, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of MPP1, which is a ubiquitously-expressed scaffolding protein that plays roles in regulating neutrophil polarity, cell shape, hair cell development, and neural development and patterning of the retina [ , ]. It was originally identified as an erythrocyte protein that stabilizes the actin cytoskeleton to the plasma membrane by forming a complex with 4.1R protein and glycophorin C [, ]. MPP1 belongs to the membrane-associated guanylate kinase (MAGUK) p55 subfamily. The membrane-associated guanylate kinase (MAGUK) p55 subfamily (also known as MPP subfamily) members include the Drosophila Stardust protein and its vertebrate homologues, MPP1-7. They contain the core of three domains characteristic of MAGUK (membrane-associated guanylate kinase) proteins: PDZ, SH3, and guanylate kinase (GuK). In addition, they also contain the Hook (Protein 4.1 Binding) motif in between the SH3 and GuK domains [ ]. MPP2-7 have two additional L27 domains at their N terminus. The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain [].
Protein Domain
Name: MPP4, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of MPP4. MPP4, also called Disks Large homologue 6 (DLG6) or amyotrophic lateral sclerosis 2 chromosomal region candidate gene 5 protein (ALS2CR5), is a retina-specific scaffolding protein that plays a role in organizing presynaptic protein complexes in the photoreceptor synapse, where it localizes to the plasma membrane [ ]. It is required in the proper localization of calcium ATPases and for maintenance of calcium homeostasis []. MPP4 belongs to the membrane-associated guanylate kinase (MAGUK) p55 subfamily. The membrane-associated guanylate kinase (MAGUK) p55 subfamily (also known as MPP subfamily) members include the Drosophila Stardust protein and its vertebrate homologues, MPP1-7. They contain the core of three domains characteristic of MAGUK (membrane-associated guanylate kinase) proteins: PDZ, SH3, and guanylate kinase (GuK). In addition, they also contain the Hook (Protein 4.1 Binding) motif in between the SH3 and GuK domains [ ]. MPP2-7 have two additional L27 domains at their N terminus. The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain [].
Protein Domain
Name: Retroviral Gag polyprotein, M
Type: Domain
Description: The Gag polyprotein directs the assembly and release of virus particles from infected cells. The Gag polyprotein has three domains required for activity: an N-terminal membrane-binding (M) domain that directs Gag to the plasma membrane, an interaction (I) domain involved in Gag aggregation, and a late assembly (L) domain that mediates the budding process [ ]. During viral maturation, the Gag polyprotein is then cleaved into major structural proteins by the viral protease, yielding the matrix, capsid, nucleoprotein, and some smaller peptides. In Rous sarcoma virus (RSV), the M domain consists of the first 85 residues of the matrix protein. However, unlike other Gag polyproteins, the M domain of RSV Gag is not myristylated, but retains full activity []. This domain forms an alpha helical bundle structure []. This entry represents the M domain of the Gag polyprotein found in avian retroviruses. This entry also identifies Gag polyproteins from several avian endogenous retroviruses, which arise when one or more copies of the retroviral genome become integrated into the host genome [ ].
Protein Domain
Name: WW domain
Type: Domain
Description: Synonym(s): Rsp5 or WWP domain. The WW domain is a short conserved region in a number of unrelated proteins, which folds as a stable, triple stranded β-sheet. This short domain of approximately 40 amino acids may be repeated up to four times in some proteins [ , , , ]. The name WW or WWP derives from the presence of two signature tryptophan residues that are spaced 20-23 amino acids apart and are present in most WW domains known to date, as well as that of a conserved Pro. The WW domain binds to proteins with particular proline-motifs, [AP]-P-P-[AP]-Y, and/or phosphoserine-phosphothreonine-containing motifs [, ]. It is frequently associated with other domains typical for proteins in signal transduction processes. A large variety of proteins containing the WW domain are known. These include: dystrophin, a multidomain cytoskeletal protein; utrophin, a dystrophin-like protein of unknown function; vertebrate YAP protein, substrate of an unknown serine kinase; Mus musculus (Mouse) NEDD-4, involved in the embryonic development and differentiation of the central nervous system; Saccharomyces cerevisiae (Baker's yeast) RSP5, similar to NEDD-4 in its molecular organisation; Rattus norvegicus (Rat) FE65, a transcription-factor activator expressed preferentially in liver; Nicotiana tabacum (Common tobacco) DB10 protein, amongst others.
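The proline-rich ligand motif quoted in this entry, [AP]-P-P-[AP]-Y, translates directly into a regular expression. The sketch below scans a candidate binding partner for that motif; `find_ww_ligand_motifs` is a hypothetical helper and the peptide in the usage line is made up for illustration. A match only flags a candidate site and says nothing about whether binding actually occurs.

```python
import re
from typing import List, Tuple

# [AP]-P-P-[AP]-Y written as a regex over one-letter amino acid codes.
# The lookahead lets overlapping occurrences be reported as well.
_WW_LIGAND_RE = re.compile(r"(?=([AP]PP[AP]Y))")


def find_ww_ligand_motifs(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched residues) for each motif occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in _WW_LIGAND_RE.finditer(seq)]


# Made-up peptide containing a single APPPY site starting at index 2.
print(find_ww_ligand_motifs("MKAPPPYLL"))
```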
Protein Domain
Name: Geranylgeranyl transferase type-2 subunit beta
Type: Family
Description: This entry includes the beta subunit of geranylgeranyl transferase type-2 (GGTase-II), which is also known as Rab geranyl-geranyltransferase subunit beta, Bet2 and Ptb1. GGTase-II catalyses the transfer of a geranyl-geranyl moiety from geranyl-geranyl pyrophosphate to proteins having the C-terminal motif -XCC or -XCXC [ , , ]. GGTase-IIs are a subgroup of the protein prenyltransferase (PTase) family of lipid-modifying enzymes. PTases catalyze the carboxyl-terminal lipidation of Ras, Rab, and several other cellular signal transduction proteins, facilitating membrane associations and specific protein-protein interactions. Prenyltransferases employ a Zn2+ ion to alkylate a thiol group, catalyzing the formation of thioether linkages between cysteine residues at or near the C terminus of protein acceptors and the C1 atom of isoprenoid lipids (geranylgeranyl (20-carbon) in the case of GGTase-II). GGTase-II catalyzes alkylation of both cysteine residues in Rab proteins containing carboxy-terminal "CC", "CXCX" or "CXC" motifs. PTases are heterodimeric, with both alpha and beta subunits required for catalytic activity. The beta subunit has an alpha 6 - alpha 6 barrel fold. In contrast to other prenyltransferases, GGTase-II requires an escort protein to bring the substrate protein to the catalytic heterodimer and to escort the geranylgeranylated product to the membrane [ , ].
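The carboxy-terminal "CC", "CXC" and "CXCX" motifs named in the description above can be checked at the end of a sequence with anchored regular expressions. This is an illustrative sketch only: the function name and the test sequences are hypothetical, X is treated as any residue, and a motif match does not by itself establish that a protein is a GGTase-II substrate.

```python
import re

# C-terminal motifs from the description; "$" anchors each at the C terminus.
# "." stands in for X (any residue), which is an assumption of this sketch.
C_TERMINAL_MOTIFS = {
    "CC": re.compile(r"CC$"),
    "CXC": re.compile(r"C.C$"),
    "CXCX": re.compile(r"C.C.$"),
}

def c_terminal_motifs(sequence):
    """Return the names of all listed motifs found at the C terminus."""
    seq = sequence.strip().upper()
    return [name for name, pattern in C_TERMINAL_MOTIFS.items() if pattern.search(seq)]

# Hypothetical sequences, invented for illustration.
for seq in ("MAAAGGCC", "MAAACSC", "MAAACGCA", "MAAAKL"):
    print(seq, c_terminal_motifs(seq))
```

Returning a list rather than a single label keeps ambiguous endings visible, for example a sequence ending in "CCC" satisfies both "CC" and "CXC".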
Protein Domain
Name: Peptidase, FtsH
Type: Family
Description: AAA proteases are ATP-dependent metallopeptidases present in eubacteria as well as in organelles of bacterial origin, i.e., mitochondria and chloroplasts. The AAA proteases are also known as FtsH, referring to the Escherichia coli enzyme (Filamentous temperature sensitive H). Most bacteria have a single gene encoding FtsH; three genes are present in yeast and humans, while 12 orthologs have been found in the genome of plants []. E. coli FtsH is a membrane-anchored ATP-dependent protease that degrades misfolded or misassembled membrane proteins as well as a subset of cytoplasmic regulatory proteins. FtsH is a 647-residue protein of 70kDa, with two putative transmembrane segments towards its N terminus which anchor the protein to the membrane, giving rise to a periplasmic domain of 70 residues and a cytoplasmic segment of 520 residues containing the ATPase and protease domains [ ]. The main function of organellar AAA/FtsH proteases is selective degradation of non-assembled, incompletely assembled and/or damaged membrane-anchored proteins. Additional functions of the AAA/FtsH proteases that are not directly connected with protein quality control are processing of pre-proteins, dislocation of membrane proteins and degradation of regulatory proteins [ , , ].
Protein Domain
Name: Fumarate reductase, flavoprotein subunit
Type: Family
Description: In bacteria, two distinct membrane-bound enzyme complexes are responsible for the interconversion of fumarate and succinate ( ): fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulphur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B. In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) ( ) is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulphur protein. The flavoprotein subunit is a protein of about 60 to 70 kDa in which FAD is covalently bound to a histidine residue located in the N-terminal section of the protein [ ]. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species []. The terms succinate dehydrogenase and fumarate reductase may be used interchangeably in certain systems. However, a number of species have distinct complexes, with the fumarate reductase active under anaerobic conditions. This model represents the fumarate reductase flavoprotein subunit from several such species in which a distinct succinate dehydrogenase is also found.
Protein Domain
Name: LOTUS-like domain
Type: Homologous_superfamily
Description: This superfamily represents an RNA-binding domain identified when studying Tudor domain-containing proteins and was designated LOTUS after Limkain, Oskar and TUdor-containing proteins 5 and 7. This predicted RNA-binding domain, found in insect Oskar and vertebrate TDRD5/TDRD7 proteins that nucleate or organise structurally related ribonucleoprotein (RNP) complexes, the polar granule and nuage, is poorly understood [ , ]. The domain adopts the winged helix-turn-helix fold and binds RNA with a potential specificity for dsRNA []. In eukaryotes, this domain is often combined in the same polypeptide with protein-protein or lipid interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures. Thus, proteins with this domain might have a key role in the recognition and localisation of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridized to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains, indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterised RNAse domain belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains [ ].
Protein Domain
Name: Arrestin-like, C-terminal
Type: Homologous_superfamily
Description: Arrestins comprise a family of closely related proteins. In addition to the inactivation of G protein-coupled receptors, arrestins have been implicated in the endocytosis of receptors and cross talk with other signalling pathways. S-Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin [ ]. Other family members include beta-arrestin-1 and -2, which regulate the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity to activate G(S) proteins; cone photoreceptor C-arrestin (arrestin-X) [], which could bind to phosphorylated red/green opsins; and Drosophila phosrestins I and II, which undergo light-induced phosphorylation and probably play a role in photoreceptor transduction [, , ]. The crystal structure of bovine retinal arrestin comprises two domains of antiparallel β-sheets connected through a hinge region and one short α-helix on the back of the amino-terminal fold []. This superfamily represents the C-terminal domain of arrestin and of Vacuolar protein sorting protein 26 (VPS26), consisting of an immunoglobulin-like β-sandwich structure. VPS26 assembles into a multimeric complex with other vacuolar protein sorting proteins (VPSs) and plays a role in vesicular protein sorting [ ].
Protein Domain
Name: WW domain superfamily
Type: Homologous_superfamily
Description: Synonym(s): Rsp5 or WWP domain. The WW domain is a short conserved region in a number of unrelated proteins, which folds as a stable, triple-stranded β-sheet. This short domain of approximately 40 amino acids may be repeated up to four times in some proteins [ , , , ]. The name WW or WWP derives from the presence of two signature tryptophan residues that are spaced 20-23 amino acids apart and are present in most WW domains known to date, as well as that of a conserved Pro. The WW domain binds to proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and/or phosphoserine- or phosphothreonine-containing motifs [, ]. It is frequently associated with other domains typical of proteins in signal transduction processes. A large variety of proteins containing the WW domain are known. These include: dystrophin, a multidomain cytoskeletal protein; utrophin, a dystrophin-like protein of unknown function; vertebrate YAP protein, substrate of an unknown serine kinase; Mus musculus (Mouse) NEDD-4, involved in the embryonic development and differentiation of the central nervous system; Saccharomyces cerevisiae (Baker's yeast) RSP5, similar to NEDD-4 in its molecular organisation; Rattus norvegicus (Rat) FE65, a transcription-factor activator expressed preferentially in liver; and Nicotiana tabacum (Common tobacco) DB10 protein, amongst others.
Protein Domain
Name: Herpesvirus glycoprotein D/GG/GX domain
Type: Domain
Description: Herpesviruses are dsDNA viruses with no RNA stage. This entry represents a conserved domain found in several herpesvirus glycoproteins, including: Glycoprotein D (gD or gIV), which is common to Human herpesvirus 1 (HHV-1) and Human herpesvirus 2 (HHV-2), as well as Equid herpesvirus 1, Bovine herpesvirus 1 (BHV-1) and Meleagrid herpesvirus 1 (MeHV-1). Glycoprotein D has been found on the viral envelope and the plasma membrane of infected cells. gD immunisation can produce an immune response to BHV-1. This response is stronger than that of the other major glycoproteins gB (gI) and gC (gIII) in BHV-1 [ , , , ]. Glycoprotein G (gG), which is one of the seven external glycoproteins of Human herpesvirus 1 (HHV-1) and Human herpesvirus 2 (HHV-2) [ ]. In the HHV-2 virus-infected cell, gG-2 is cleaved into a secreted amino-terminal portion (sgG-2) and a carboxy-terminal portion. The latter protein is further O-glycosylated, generating the cell membrane-associated mature gG-2 (mgG-2). The mgG-2 protein has widely been used as a prototype antigen for detection of type-specific antibodies against HHV-2 []. Glycoprotein GX (gX), which was initially identified in Suid herpesvirus 1 (Pseudorabies virus).
Protein Domain
Name: Butyrophylin-like, SPRY domain
Type: Domain
Description: Several proteins that contain RING fingers also contain a well-conserved 40-residue cysteine-rich domain termed a B-box zinc finger. Often, one or two copies of the B-box are associated with a coiled-coil domain in addition to the RING finger, forming a tripartite motif. The tripartite motif is found in transcription factors, ribonucleoproteins and proto-oncoproteins, but no function has yet been ascribed to the domain []. The solution structure of the B-box motif has been determined by NMR. The protein is a monomer, with 2 β-strands, 2 helical turns and 3 extended loop regions packed in a novel topology [ ]. Of 7 potential zinc ligands, only 4 are used, binding a single zinc atom in a C2-H2 tetrahedral arrangement. The B-box structure differs in tertiary fold from all other known zinc-binding motifs. A group of proteins that contain the B-box motif also host a well-conserved domain of unknown function. Proteins that include this domain are, e.g., butyrophilin, the RET finger protein, the 52kDa Ro protein and the Xenopus nuclear factor protein. The C-terminal portion of this region has been termed the SPRY domain (after SPla and the RYanodine Receptor) [ ].
Protein Domain
Name: APOBEC/CMP deaminase, zinc-binding
Type: Binding_site
Description: Cytidine deaminase ( ) (cytidine aminohydrolase) catalyzes the hydrolysis of cytidine into uridine and ammonia, while deoxycytidylate deaminase ( ) (dCMP deaminase) hydrolyzes dCMP into dUMP. Both enzymes are known to bind zinc and to require it for their catalytic activity [ , ]. These two enzymes do not share any sequence similarity, with the exception of a region that contains three conserved histidine and cysteine residues which are thought to be involved in the binding of the catalytic zinc ion. Such a region is also found in other proteins [ , ]: yeast cytosine deaminase ( ) (gene FCY1), which transforms cytosine into uracil; mammalian apolipoprotein B mRNA editing protein, responsible for the post-transcriptional editing of a CAA codon into a UAA (stop) codon in the APOB mRNA; riboflavin biosynthesis protein ribG, which converts 2,5-diamino-6-(ribosylamino)-4(3H)-pyrimidinone 5'-phosphate into 5-amino-6-(ribosylamino)-2,4(1H,3H)-pyrimidinedione 5'-phosphate; Bacillus cereus blasticidin-S deaminase ( ), which catalyzes the deamination of the cytosine moiety of the antibiotics blasticidin S, cytomycin and acetylblasticidin S; Bacillus subtilis protein comEB, which is required for the binding and uptake of transforming DNA; B. subtilis hypothetical protein yaaJ; Escherichia coli hypothetical protein yfhC; and yeast hypothetical protein YJL035c.
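The CAA to UAA editing mentioned above amounts to a single cytidine-to-uridine change that turns a glutamine codon into a stop codon. The sketch below shows only that arithmetic on an RNA string. The function names and the nine-base fragment are hypothetical and are not taken from the APOB transcript.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def deaminate_c_to_u(mrna, position):
    """Return the mRNA with the cytidine at a 0-based position changed to uridine."""
    if mrna[position] != "C":
        raise ValueError("position does not hold a cytidine")
    return mrna[:position] + "U" + mrna[position + 1:]

def codons(mrna):
    """Split an in-frame mRNA into complete codons, dropping any trailing partial codon."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]

# Hypothetical in-frame fragment, invented for illustration.
fragment = "AUGCAAGGU"
edited = deaminate_c_to_u(fragment, 3)
print(codons(fragment), "->", codons(edited))
print("second codon is now a stop:", codons(edited)[1] in STOP_CODONS)
```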
Protein Domain
Name: Coronavirus spike glycoprotein S1, C-terminal
Type: Domain
Description: The coronavirus spike is an envelope glycoprotein which aids viral entry into the host cell. The precursor spike protein can be cleaved into three chains: spike protein S1, S2 and S2'. Spike protein S1 attaches the virion to the cell membrane by interacting with the host receptor, initiating the infection. Spike protein S2 mediates fusion of the virion and cellular membranes by acting as a class I viral fusion protein. Spike protein S2' acts as a viral fusion peptide which is unmasked following S2 cleavage occurring upon virus endocytosis [ , ]. S1 contains two major domains: an N-terminal domain (NTD) and a receptor-binding domain (S1 RBD), also referred to as the S1 CTD or domain B. Either the S1 NTD or S1 RBD, or occasionally both, are involved in binding the host receptors [ ]. This entry represents an additional, short domain found at the very C terminus of the S1 protein. It is found across a range of alpha, beta and gamma coronaviruses. This small all-beta-stranded domain is known as subdomain 2 in the structure of the porcine epidemic diarrhea virus spike protein [ ].
Protein Domain
Name: HSPH1, nucleotide-binding domain
Type: Domain
Description: Human HSPH1 (also known as heat shock 105kDa/110kDa protein 1, HSP105; HSP105A; HSP105B) suppresses the aggregation of denatured proteins caused by heat shock in vitro, and may substitute for HSP70 family proteins to suppress the aggregation of denatured proteins in cells under severe stress [ ]. It reduces the protein aggregation and cytotoxicity associated with polyglutamine (PolyQ) diseases, including Huntington's disease, which are a group of inherited neurodegenerative disorders sharing the characteristic feature of having insoluble protein aggregates in neurons []. The expression of HSPH1 is elevated in various malignant tumors, including malignant melanoma, and there is a direct correlation between HSPH1 expression and the aggressiveness and proliferation of B-cell non-Hodgkin lymphomas (B-NHLs) [, , ]. HSPH1 belongs to the 105/110kDa heat shock protein (HSP105/110) subfamily of the HSP70-like family. HSP105/110s are believed to function generally as co-chaperones of HSP70 chaperones, acting as nucleotide exchange factors (NEFs) to remove ADP from their HSP70 chaperone partners during the ATP hydrolysis cycle. HSP70 chaperones assist in protein folding and assembly, and can direct incompetent 'client' proteins towards degradation. Like HSP70 chaperones, HSP105/110s have an N-terminal nucleotide-binding domain (NBD) and a C-terminal substrate-binding domain (SBD) []. This entry represents the N-terminal nucleotide-binding domain of HSPH1.
Protein Domain
Name: Disintegrin domain superfamily
Type: Homologous_superfamily
Description: Disintegrins are a family of small proteins from viper venoms that function as potent inhibitors of both platelet aggregation and integrin-dependent cell adhesion [ , ]. Integrin receptors are involved in cell-cell and cell-extracellular matrix interactions, serving as the final common pathway leading to aggregation via formation of platelet-platelet bridges, which are essential in thrombosis and haemostasis. Disintegrins contain an RGD (Arg-Gly-Asp) or KGD (Lys-Gly-Asp) sequence motif that binds specifically to integrin IIb-IIIa receptors on the platelet surface, thereby blocking the binding of fibrinogen to the receptor-glycoprotein complex of activated platelets. Disintegrins act as receptor antagonists, inhibiting aggregation induced by ADP, thrombin, platelet-activating factor and collagen []. The role of disintegrins in preventing blood coagulation makes them of medical interest, particularly with regard to their use as anti-coagulants []. Disintegrins from different snake species have been characterised: albolabrin, applagin, barbourin, batroxostatin, bitistatin, obtustatin [ ], schistatin [], echistatin [], elegantin, eristicophin, flavoridin [], halysin, kistrin, tergeminin, salmosin [] and triflavin. Disintegrin-like proteins are found in various species ranging from slime mold to humans. Some other proteins known to contain a disintegrin domain are: Some snake venom zinc metalloproteinases [ ], which consist of an N-terminal catalytic domain fused to a disintegrin domain. Such is the case for trimerelysin I (HR1B), atrolysin-e (Ht-e) and trigramin. It has been suggested that these proteinases are able to cleave themselves from the disintegrin domains and that the latter may arise from such post-translational processing. The beta subunit of guinea pig sperm surface protein PH30 [ ]. PH30 is a protein involved in sperm-egg fusion. The beta subunit contains a disintegrin at the N-terminal extremity. Mammalian epididymal apical protein 1 (EAP I) [ ]. EAP I is associated with the sperm membrane and may play a role in sperm maturation. Structurally, EAP I consists of an N-terminal domain, followed by a zinc metalloproteinase domain, a disintegrin domain, and a large C-terminal domain that contains a transmembrane region.
Protein Domain
Name: CRISPR-associated protein, TM1791
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a family of Cas proteins that includes TM1791. This family is both closely related to and frequently encoded next to the TM1792 family of Cas proteins. The two proteins are fused in an example from Methanopyrus kandleri ( ), where the fusion is described as a predicted component of a thermophile-specific DNA repair system that contains two domains of the RAMP family [ ].
Protein Domain
Name: CRISPR-associated protein, Csn2-type
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability []. This entry represents the Csn2 family of Cas proteins, which are found only in CRISPR-containing species, near other CRISPR-associated proteins (cas), as part of the NMENI subtype of CRISPR/Cas loci. The species range so far for this subtype is animal pathogens and commensals only. This protein is present in some but not all NMENI CRISPR/Cas loci.
Protein Domain
Name: CRISPR-associated protein, Cas5a type
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a minor family of CRISPR-associated proteins, which includes MJ0382 from Methanocaldococcus jannaschii (Methanococcus jannaschii). These proteins are found adjacent to a characteristic short, palindromic repeat cluster termed CRISPR, a probable mobile DNA element. The family is designated Cas5a, for CRISPR-associated protein Cas5, Apern subtype.
Protein Domain
Name: EutN/Ccml superfamily
Type: Homologous_superfamily
Description: The ethanolamine utilization protein EutN is involved in the cobalamin-dependent degradation of ethanolamine [ ]. The crystal structure of EutN contains a central five-stranded β-barrel, with an α-helix at the open end of this barrel ( ). The structure also contains three additional β-strands, which help the formation of a tight hexamer with a hole in the centre. This suggests that EutN forms a pore, with an opening 26 Å in diameter on one face and 14 Å on the other face [ ]. Besides the Escherichia coli ethanolamine utilization protein EutN and the Synechocystis sp. carboxysome (beta-type) structural protein CcmL, this family also includes the alpha-type carboxysome structural proteins CsoS4A and CsoS4B (previously known as OrfA and OrfB), the propanediol utilization protein PduN, and some hypothetical homologues from various bacterial microcompartments. The carboxysome, a polyhedral organelle, participates in carbon fixation by sequestering enzymes. It is the prototypical bacterial microcompartment. Its enzymatic components, ribulose bisphosphate carboxylase/oxygenase (RuBisCO) and carbonic anhydrase (CA), are surrounded by a polyhedral protein shell. Similarly, the ethanolamine utilization (eut) microcompartment and the 1,2-propanediol utilization (pdu) microcompartment encapsulate, within their polyhedral protein shells, the enzymes necessary for cobalamin-dependent ethanolamine degradation and coenzyme B12-dependent degradation of 1,2-propanediol, respectively. Both carboxysome structural proteins CcmL and CsoS4A assemble as pentamers in the crystal structures, which might constitute the twelve pentameric vertices of a regular icosahedral carboxysome. However, the reported EutN structure is hexameric rather than pentameric. The absence of pentamers in Eut microcompartments might lead to less-regular icosahedral shell shapes. Due to the lack of structural evidence, the functional roles of the CsoS4A adjacent paralog, CsoS4B, and of the propanediol utilization protein PduN are not yet clear [ ]. This entry represents a superfamily of related bacterial proteins with roles in ethanolamine and carbon dioxide metabolism. The structure of these domains has a closed or partly open barrel fold and a Greek key topology.
Protein Domain
Name: CRISPR-associated RAMP Cmr4
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a Cas family of proteins that includes TM1792 from Thermotoga maritima. It is part of the broad RAMP superfamily collection of CRISPR-associated proteins. It is the fourth of a recurring set of six proteins, four of which are in the RAMP superfamily, that we designate the CRISPR RAMP module.
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom