The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. It is a domain of around 70 amino acids that is present in a wide variety of diverse nucleic acid-binding proteins []. It has been shown to bind RNA []. Like many other RNA-binding motifs, KH motifs are found in one or multiple copies (14 copies in chicken vigilin). At least for hnRNP K (three copies) and FMR-1 (two copies), each motif is necessary for in vitro RNA binding activity. This suggests that the motifs may function cooperatively or, in the case of single KH motif proteins (for example, Mer1p), independently []. According to structural analysis [], the KH domain can be separated into two groups. The first group, type-1, contains a β-α-α-β-β-α structure, whereas in type-2 the last two β-strands are located in the N-terminal part of the domain (α-β-β-α-α-β). Sequence similarity between these two folds is limited to a short region (VIGXXGXXI) in the RNA-binding motif. This motif is located between helices 1 and 2 in type-1 and between helices 2 and 3 in type-2. Proteins known to contain a type-1 KH domain include bacterial polyribonucleotide nucleotidyltransferase; vertebrate Fragile X messenger ribonucleoprotein 1 (FMR1); eukaryotic heterogeneous nuclear ribonucleoprotein K (hnRNP K), one of at least 20 major proteins that are part of hnRNP particles in mammalian cells; mammalian poly(rC)-binding proteins; Artemia salina glycine-rich protein GRP33; yeast PAB1-binding protein 2 (PBP2); vertebrate vigilin; and human high-density lipoprotein binding protein (HDL-binding protein).
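The short VIGXXGXXI region is the only sequence feature shared by the two KH folds. Below is a minimal sketch, assuming that consensus can be treated as a plain regular expression in which X stands for any residue. The function name `find_kh_core` and the test fragment are hypothetical. Real KH domains often deviate from this exact consensus, so profile-based methods would be needed in practice.

```python
import re

# KH signature from the text: VIGXXGXXI, where X is any amino acid.
# The lookahead group lets overlapping occurrences be reported as well.
KH_CORE = re.compile(r"(?=(VIG..G..I))")


def find_kh_core(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each exact consensus hit."""
    return [(m.start() + 1, m.group(1)) for m in KH_CORE.finditer(sequence.upper())]


# Hypothetical fragment written for illustration; it is not a real protein.
print(find_kh_core("MSTRLVIGKGGENIKRLQ"))  # [(6, 'VIGKGGENI')]
```

An exact-match scan like this only flags textbook instances. A diverged KH domain with, for example, an isoleucine-to-leucine substitution would be missed.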
The KLHL (Kelch-like) proteins generally have a BTB/POZ domain, a BACK domain, and five to six Kelch motifs. They constitute a subgroup at the intersection between the BTB/POZ domain and Kelch domain superfamilies. The BTB/POZ domain facilitates protein binding [], while the Kelch repeats form a β-propeller. The Kelch superfamily of proteins can be subdivided into five groups: (1) N-propeller, C-dimer proteins, (2) N-propeller proteins, (3) propeller proteins, (4) N-dimer, C-propeller proteins, and (5) C-propeller proteins. KLHL family members belong to the N-dimer, C-propeller subclass of Kelch repeat proteins []. In addition to BTB/POZ and Kelch domains, KLHL family members contain a BACK domain, first described as a 130-residue region of conservation observed amongst BTB-Kelch proteins []. Many of the Kelch-like proteins have been identified as adaptors for the recruitment of substrates to Cul3-based E3 ubiquitin ligases []. Gigaxonin (also known as KLHL16) regulates microtubule-associated protein 1B (MAP1B), which is involved in maintaining the integrity of cytoskeletal structures and promoting neuronal stability []. Gigaxonin belongs to the KLHL family. Mutations in the gigaxonin gene cause giant axonal neuropathy (GAN), a devastating sensory and motor neuropathy []. Gigaxonin binds to the ubiquitin-activating enzyme E1 through its N-terminal BTB domain, while its C-terminal kelch repeat domain interacts directly with the light chain (LC) of MAP1B. It may serve as a ubiquitin substrate adaptor protein that controls MAP1B-LC degradation []. This entry represents the BACK domain of gigaxonin, located between the BTB domain and the kelch repeat domain.
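The five Kelch-superfamily groups are named by where the propeller sits relative to a dimerisation module. Below is a minimal sketch of that naming rule. It assumes a simplified, hypothetical encoding in which each module is labelled 'dimer', 'propeller' or 'other', and the BTB domain of KLHL proteins is treated as the dimerisation module. The function `kelch_group` is illustrative and not part of any database tool.

```python
from typing import Sequence


def kelch_group(modules: Sequence[str]) -> str:
    """Assign one of the five Kelch-superfamily groups from an N-to-C module list.

    Each entry is 'dimer', 'propeller' or 'other'. This encoding is an assumed
    simplification: a run of Kelch repeats counts as a single 'propeller'.
    """
    modules = list(modules)
    if "propeller" not in modules:
        raise ValueError("no Kelch propeller present")
    p = modules.index("propeller")
    if "dimer" in modules:
        d = modules.index("dimer")
        return "N-dimer, C-propeller" if d < p else "N-propeller, C-dimer"
    if len(modules) == 1:
        return "propeller"
    if p == 0:
        return "N-propeller"
    if p == len(modules) - 1:
        return "C-propeller"
    raise ValueError("propeller flanked on both sides; not covered by the five groups")


# A KLHL protein such as gigaxonin: BTB (treated as the dimerisation module),
# then BACK, then the Kelch repeats.
print(kelch_group(["dimer", "other", "propeller"]))  # N-dimer, C-propeller
```

Architectures that the five-group scheme does not name are rejected rather than forced into a group.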
MPP5 is also called PALS1 (protein associated with Lin7), Nagie oko in zebrafish [] or Stardust in Drosophila []. It is a scaffolding protein that associates with Crumbs homologue 1 (CRB1), CRB2 or CRB3 through its PDZ domain. It also associates with PALS1-associated tight junction protein (PATJ) or multi-PDZ domain protein 1 (MUPP1) through its L27 domain []. The resulting tri-protein complexes are core proteins of the Crumbs complex, which localizes at tight junctions or subapical regions. This complex is involved in the maintenance of apical-basal polarity in epithelial cells and in the morphogenesis and function of photoreceptor cells []. MPP5 is critical for the proper stratification of the retina. It is also expressed in T lymphocytes, where it is important for TCR-mediated activation of NF-κB []. Drosophila Stardust exists in several isoforms, some of which show opposing functions in photoreceptor cells, suggesting that the relative ratio of different Crumbs complexes regulates photoreceptor homeostasis []. MPP5 belongs to the membrane-associated guanylate kinase (MAGUK) p55 subfamily. The MAGUK p55 subfamily (also known as the MPP subfamily) includes the Drosophila Stardust protein and its vertebrate homologues, MPP1-7. Its members contain the three core domains characteristic of MAGUK proteins: PDZ, SH3 and guanylate kinase (GuK). In addition, they contain the Hook (protein 4.1-binding) motif between the SH3 and GuK domains []. MPP2-7 have two additional L27 domains at their N terminus. The GuK domain in MAGUK proteins is enzymatically inactive; instead, it mediates protein-protein interactions and associates intramolecularly with the SH3 domain [].
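The domain order of the MPP subfamily follows directly from the description of the shared core plus the extra N-terminal L27 domains. Below is a minimal sketch that encodes that rule. The domain labels are informal strings, the function name `mpp_architecture` is hypothetical, and the Drosophila Stardust isoforms are not modelled. For MPP5 (PALS1), the PDZ domain is the one that binds CRB1-3, and an L27 domain binds PATJ or MUPP1.

```python
# Core MAGUK domains plus the Hook motif, in N-to-C order as described in the text.
CORE = ("PDZ", "SH3", "HOOK", "GuK")


def mpp_architecture(member: int) -> list[str]:
    """Return the N-to-C domain order of vertebrate MPP1-7 (informal labels)."""
    if not 1 <= member <= 7:
        raise ValueError("MPP subfamily members are numbered 1-7")
    # MPP2-7 carry two additional L27 domains at the N terminus; MPP1 does not.
    prefix = ["L27", "L27"] if member >= 2 else []
    return prefix + list(CORE)


print(mpp_architecture(1))  # ['PDZ', 'SH3', 'HOOK', 'GuK']
print(mpp_architecture(5))  # ['L27', 'L27', 'PDZ', 'SH3', 'HOOK', 'GuK']
```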
The Ajuba-like LIM domain-containing protein family includes three highly homologous proteins: Ajuba, LIMD1 and WTIP. These adapter or scaffold proteins participate in the assembly of numerous protein complexes. They are involved in several cellular processes, such as cell fate determination, cytoskeletal organization, repression of gene transcription, mitosis, cell-cell adhesion, cell differentiation, proliferation and migration []. They bind to Ago1/2, RCK, Dcp2 and eIF4E in vivo and are required for miRNA-mediated gene silencing. All three proteins bind to the mRNA 5' m(7)GTP cap-protein complex []. Members of this family negatively regulate the Hippo signalling pathway []. WTIP, the Wt1-interacting protein, was originally identified as an interaction partner of the Wilms tumour protein 1 (WT1). WTIP is involved in kidney and neural crest development. It interacts with the receptor tyrosine kinase Ror2 and inhibits canonical Wnt signaling. LIMD1 has been reported to inhibit cell growth and metastasis. It functions as a tumour suppressor, blocking the growth of lung tumour cell lines in vitro and in vivo []. The inhibition may be mediated through an interaction with the protein barrier-to-autointegration factor (BAF), a component of the SWI/SNF chromatin-remodeling complex. Alternatively, it may be mediated through an interaction with the retinoblastoma protein (pRB), resulting in inhibition of E2F-mediated transcription and of the expression of most genes with E2F1-responsive elements. Recently, LIMD1 was shown to interact with the p62/sequestosome protein. It influences IL-1 and RANKL signalling by facilitating the assembly of a p62/TRAF6/aPKC multi-protein complex. Members of the family contain three tandem C-terminal LIM domains and a proline-rich N-terminal region.
Leucine-rich repeats (LRRs) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape []. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions []. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors and extracellular matrix-binding glycoproteins. They are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis and the immune response []. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur together in the same protein and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel β-sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins. The remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each variable part contains two half-turns at both ends and a "linear" segment in the middle, usually formed by a helix (so called because the chain follows a linear path overall). The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with α-helices. This supports earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions []. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats []. This entry includes some LRRs that fail to be detected by the model.
Leucine-rich repeats (LRRs) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape []. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions []. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors and extracellular matrix-binding glycoproteins. They are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis and the immune response []. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur together in the same protein and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel β-sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins. The remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each variable part contains two half-turns at both ends and a "linear" segment in the middle, usually formed by a helix (so called because the chain follows a linear path overall). The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with α-helices. This supports earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions []. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats []. This entry includes some LRRs that fail to be detected by [].
Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes, which work sequentially in a cascade: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2) and a ubiquitin ligase (E3). There are many different E3 ligases, which are responsible for the type of ubiquitin chain formed, the specificity of the target protein, and the regulation of the ubiquitinylation process []. Ubiquitinylation is an important regulatory tool that controls the concentration of key signalling proteins, such as those involved in cell cycle control. It also removes misfolded, damaged or mutant proteins that could be harmful to the cell. Several ubiquitin-like molecules have been discovered, such as Ufm1, SUMO1, NEDD8, Rad23, Elongin B and Parkin, the latter being involved in Parkinson's disease []. Ubiquitin is a protein of 76 amino acid residues, found in all eukaryotic cells, whose sequence is extremely well conserved from protozoa to vertebrates. Ubiquitin acts through its post-translational attachment (ubiquitinylation) to other proteins. These modifications alter the function, location or trafficking of the protein, or target it for destruction by the 26S proteasome []. The terminal glycine in the C-terminal 4-residue tail of ubiquitin can form an isopeptide bond with a lysine residue in the target protein. Alternatively, it can bond with a lysine in another ubiquitin molecule to form a ubiquitin chain that attaches itself to a target protein. Ubiquitin has seven lysine residues, any one of which can be used to link ubiquitin molecules together, resulting in different structures that alter the target protein in different ways. It appears that Lys(11)-, Lys(29)- and Lys(48)-linked poly-ubiquitin chains target the protein to the proteasome for degradation. In contrast, mono-ubiquitinylation and Lys(6)- or Lys(63)-linked poly-ubiquitin chains signal reversible modifications in protein activity, location or trafficking []. For example, Lys(63)-linked poly-ubiquitinylation is known to be involved in DNA damage tolerance, the inflammatory response, protein trafficking and signal transduction through kinase activation []. In addition, the length of the ubiquitin chain alters the fate of the target protein. Regulatory proteins such as transcription factors and histones are frequent targets of ubiquitinylation [].
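The statements about the 76-residue length, the C-terminal 4-residue tail and the seven lysines can be checked directly against a sequence. Below is a minimal sketch using the 76-residue human ubiquitin sequence, supplied here for illustration. The linkage-to-fate table is a deliberately simplified encoding of the generalisation in the text. It is not a rule from any library, and Lys27 and Lys33 are left unclassified because the text does not assign them a role. The helper names are hypothetical.

```python
# 76-residue human ubiquitin, supplied here for illustration.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)

# Simplified reading of the text: which chain linkages signal which outcome.
PROTEASOMAL = {11, 29, 48}
REVERSIBLE = {6, 63}


def lysine_positions(seq: str) -> list[int]:
    """Return 1-based positions of every lysine (K) in the sequence."""
    return [i + 1 for i, aa in enumerate(seq) if aa == "K"]


def linkage_fate(lys: int) -> str:
    """Map a chain-linkage lysine to the outcome suggested in the text."""
    if lys in PROTEASOMAL:
        return "proteasomal degradation"
    if lys in REVERSIBLE:
        return "reversible modification"
    return "not specified in the text"


print(len(UBIQUITIN), UBIQUITIN[-4:])  # 76 LRGG: tail ending in the conjugated Gly76
for k in lysine_positions(UBIQUITIN):
    print(f"K{k}: {linkage_fate(k)}")
```

The loop lists K6, K11, K27, K29, K33, K48 and K63, matching the "seven lysine residues" in the description.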
Ubiquitinylation is an ATP-dependent process that involves the action of at least three enzymes, which work sequentially in a cascade: a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2) and a ubiquitin ligase (E3). There are many different E3 ligases, which are responsible for the type of ubiquitin chain formed, the specificity of the target protein, and the regulation of the ubiquitinylation process []. Ubiquitinylation is an important regulatory tool that controls the concentration of key signalling proteins, such as those involved in cell cycle control. It also removes misfolded, damaged or mutant proteins that could be harmful to the cell. Several ubiquitin-like molecules have been discovered, such as Ufm1, SUMO1, NEDD8, Rad23, Elongin B and Parkin, the latter being involved in Parkinson's disease []. Ubiquitin is a protein of 76 amino acid residues, found in all eukaryotic cells, whose sequence is extremely well conserved from protozoa to vertebrates. Ubiquitin acts through its post-translational attachment (ubiquitinylation) to other proteins. These modifications alter the function, location or trafficking of the protein, or target it for destruction by the 26S proteasome []. The terminal glycine in the C-terminal 4-residue tail of ubiquitin can form an isopeptide bond with a lysine residue in the target protein. Alternatively, it can bond with a lysine in another ubiquitin molecule to form a ubiquitin chain that attaches itself to a target protein. Ubiquitin has seven lysine residues, any one of which can be used to link ubiquitin molecules together, resulting in different structures that alter the target protein in different ways. It appears that Lys(11)-, Lys(29)- and Lys(48)-linked poly-ubiquitin chains target the protein to the proteasome for degradation. In contrast, mono-ubiquitinylation and Lys(6)- or Lys(63)-linked poly-ubiquitin chains signal reversible modifications in protein activity, location or trafficking []. For example, Lys(63)-linked poly-ubiquitinylation is known to be involved in DNA damage tolerance, the inflammatory response, protein trafficking and signal transduction through kinase activation []. In addition, the length of the ubiquitin chain alters the fate of the target protein. Regulatory proteins such as transcription factors and histones are frequent targets of ubiquitinylation []. This entry represents the conserved region at the centre of the ubiquitin sequence.
Leucine-rich repeats (LRRs) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape []. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions []. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors and extracellular matrix-binding glycoproteins. They are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis and the immune response []. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur together in the same protein and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel β-sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins. The remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each variable part contains two half-turns at both ends and a "linear" segment in the middle, usually formed by a helix (so called because the chain follows a linear path overall). The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with α-helices. This supports earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions []. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats []. This signature describes a leucine-rich repeat variant (LRV), which has a novel repetitive structural motif of alternating α- and 3(10)-helices arranged in a right-handed superhelix. Unlike other LRRs, it lacks β-sheets [].
Leucine-rich repeats (LRRs) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape []. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions []. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors and extracellular matrix-binding glycoproteins. They are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis and the immune response []. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur together in the same protein and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel β-sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins. The remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each variable part contains two half-turns at both ends and a "linear" segment in the middle, usually formed by a helix (so called because the chain follows a linear path overall). The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with α-helices. This supports earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions []. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats []. LRRs are often flanked by cysteine-rich domains: an N-terminal LRR domain and a C-terminal LRR domain. This entry represents the N-terminal LRR domain.
Bacterial microcompartments (BMCs) are large proteinaceous structures composed of a roughly icosahedral shell and a series of encapsulated enzymes. They are found across bacteria, where they play functionally diverse roles including CO(2) fixation and the catabolism of a range of organic compounds. They function as organelles by sequestering particular metabolic processes within the cell. The shell, or capsid, is composed of a few thousand protein subunits and surrounds a series of sequentially acting enzymes. It controls the diffusion of substrates and products (including toxic or volatile intermediates) into and out of the lumen. Although functionally distinct BMCs vary in their encapsulated enzymes, all are defined by homologous shell proteins. The shells of BMCs are made primarily of a family of proteins whose structural core is the BMC domain, and variations upon this core provide functional diversity []. Three classes of constituent proteins form a shell with icosahedral symmetry: hexamer-forming proteins containing a single BMC domain (BMC-H); trimer/pseudohexamer-forming proteins consisting of a fusion of two BMC domains (BMC-T); and pentamer-forming proteins containing a bacterial microcompartment vertex (BMV) domain (BMC-P). The BMC-H and BMC-T proteins form the facets, and the BMC-P proteins form the vertices of the icosahedron. These three protein types form cyclic homo-oligomers with pores at the centre of symmetry that enable metabolite transport across the shell []. The BMC domain fold consists of three α-helices (designated A, B and C) and four β-strands (designated β1, β2, β3 and β4). Some BMC shell proteins show a circular permutation, in which a highly similar tertiary structure is built from secondary structure elements occurring in a different order. In these proteins, the secondary structure elements contributed by the C-terminal region of the typical BMC fold are instead contributed by the N-terminal region of the circularly permuted BMC domain []. This entry represents the BMC domain found in CsoS1/CcmK and related proteins. CsoS1 and CcmK are the shell proteins of the carboxysome, a polyhedral inclusion in which RuBisCO (ribulose bisphosphate carboxylase, ccbL-ccbS) is sequestered [].
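Because BMC-P pentamers occupy the vertices and BMC-H/BMC-T oligomers tile the facets, shell composition can be estimated from icosahedral geometry. Below is a minimal sketch, assuming an idealised Caspar-Klug quasi-equivalent shell. Under that assumption, an icosahedron always has 12 vertices (hence 12 pentamers) and 10(T - 1) hexamers for triangulation number T = h^2 + hk + k^2. Real BMC shells, carboxysomes in particular, are only roughly icosahedral, so these are idealised counts. The function names are hypothetical.

```python
def caspar_klug_t(h: int, k: int) -> int:
    """Triangulation number T = h^2 + hk + k^2 for non-negative integers h, k."""
    if h < 0 or k < 0 or (h == 0 and k == 0):
        raise ValueError("h and k must be non-negative and not both zero")
    return h * h + h * k + k * k


def bmc_shell_counts(h: int, k: int) -> dict[str, int]:
    """Idealised oligomer counts for an icosahedral shell of triangulation number T."""
    t = caspar_klug_t(h, k)
    pentamers = 12            # one BMC-P pentamer at each of the 12 vertices
    hexamers = 10 * (t - 1)   # facet hexamers (BMC-H) or pseudohexamers (BMC-T)
    return {
        "T": t,
        "pentamers": pentamers,
        "hexamers": hexamers,
        # 5 positions per pentamer + 6 per (pseudo)hexamer, which equals 60T
        "protomer_positions": 5 * pentamers + 6 * hexamers,
    }


print(bmc_shell_counts(1, 1))
# {'T': 3, 'pentamers': 12, 'hexamers': 20, 'protomer_positions': 180}
```

A BMC-T trimer is counted here as one pseudohexamer, because its three subunits each carry two BMC domains.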
Structural maintenance of chromosomes protein, prokaryotic
Type: Family
Description:
The SMC (structural maintenance of chromosomes) family of proteins exists in virtually all organisms, including bacteria and archaea. SMC proteins are essential for successful chromosome transmission during replication and segregation of the genome in all organisms. They function together with other proteins in a range of chromosomal transactions, including chromosome condensation, sister-chromatid cohesion, recombination, DNA repair and epigenetic silencing of gene expression []. SMCs are generally present as single proteins in bacteria, and as at least six distinct proteins in eukaryotes. The proteins range in size from approximately 110 to 170 kDa and share a five-domain structure. Globular N- and C-terminal domains are separated by a long (circa 100 nm or 900 residues) coiled-coil segment. In the centre of this segment is a globular 'hinge' domain, characterised by a set of four highly conserved glycine residues that are typical of flexible regions in a protein. The amino-terminal domain contains a 'Walker A' nucleotide-binding motif (GxxGxGKS/T), which has been shown by mutational studies to be essential in several proteins. The carboxy-terminal domain contains a sequence (the DA-box) that resembles a 'Walker B' motif (XXXXD, where X is any hydrophobic residue). It also contains an LSGG motif with homology to the signature sequence of the ATP-binding cassette (ABC) family of ATPases []. All SMC proteins appear to form dimers: homodimers in the case of prokaryotic SMC proteins, or heterodimers between different but related SMC proteins. The dimers form core components of large multiprotein complexes. The best known complexes are cohesin, which is responsible for sister-chromatid cohesion, and condensin, which is required for full chromosome condensation in mitosis. SMC dimers are arranged in an antiparallel alignment. This orientation brings the N- and C-terminal globular domains (from either different or identical protomers) together. It thereby unites an ATP-binding site (Walker A motif) within the N-terminal domain with a Walker B motif (DA box) within the C-terminal domain to form a potentially functional ATPase. Protein interaction and microscopy data suggest that SMC dimers form a ring-like structure that might embrace DNA molecules. Non-SMC subunits associate with the SMC amino- and carboxy-terminal domains. This entry represents the SMC protein from bacteria and archaea [].
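The three ATPase motifs named above (Walker A, the Walker B-like DA-box and the LSGG signature) are short enough to scan with regular expressions. Below is a minimal sketch built from the consensus strings in the text. The "hydrophobic" residue set is an assumption, since definitions vary between sources. The toy sequence is hypothetical and simply concatenates one instance of each motif; in a real SMC protein, the Walker A motif lies in the N-terminal domain and the other two in the C-terminal domain.

```python
import re

HYDROPHOBIC = "AVLIMFWC"  # assumed hydrophobic set; definitions vary between sources

MOTIFS = {
    # Walker A as given in the text: GxxGxGKS/T
    "Walker A": re.compile(r"G..G.GK[ST]"),
    # DA-box / Walker B-like motif: four hydrophobic residues followed by D
    "Walker B": re.compile(rf"[{HYDROPHOBIC}]{{4}}D"),
    # ABC-family signature core
    "ABC signature": re.compile(r"LSGG"),
}


def find_smc_motifs(seq: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif (non-overlapping matches)."""
    return {name: [m.start() + 1 for m in pat.finditer(seq)] for name, pat in MOTIFS.items()}


# Hypothetical toy sequence, not a real SMC protein.
toy = "MKGPNGSGKSNLLDALAFVLGLSGGQKQRVAILLVIDE"
print(find_smc_motifs(toy))
# {'Walker A': [3], 'Walker B': [33], 'ABC signature': [22]}
```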
MAM is an acronym derived from meprin, A-5 protein, and receptor protein-tyrosine phosphatase mu. The MAM domain consists of approximately 170 amino acids. It occurs in several cell surface proteins and is likely to have an adhesive function []. The domain has been shown to play a role in homodimerization of protein-tyrosine phosphatase mu [] and appears to help determine the specificity of these interactions. It contains four conserved cysteines, which probably form two disulfide bridges. It has been reported that certain cysteine mutations in the MAM domain of murine meprin A result in the formation of monomeric meprin, which has altered stability and activity []. This indicates that these domain-domain interactions are critical for the structure and function of the enzyme. Proteins containing this domain are listed below.
Meprin. This cell surface glycoprotein contains a zinc-metalloprotease domain capable of degrading a variety of polypeptides. Meprin is composed of two structurally related subunits (alpha and beta) that form homo- or heterotetramers by the non-covalent association of two disulfide-linked dimers. In both subunits, the MAM domain is located after the catalytic domain. The MAM domain of meprins has also been shown to be necessary for correct folding and transport through the secretory pathway [].
Neuropilin (A5 antigen), a calcium-independent cell adhesion molecule that functions during the formation of certain neuronal circuits. The sequence contains 2 CUB domains and a MAM domain.
Receptor-like tyrosine protein phosphatases Mu, Kappa and PCP-2. These PTPases have an extracellular region that consists of a MAM domain followed by an Ig-like domain and four fibronectin type III domains.
Vertebrate enteropeptidase, a type II membrane protein of the intestinal brush border, which activates trypsinogen. It consists of at least a catalytic light chain and a multidomain heavy chain that has 2 LDL receptor class A domains, a MAM domain, a SRCR domain and a CUB domain.
Apical endosomal glycoprotein from rat, a protein probably involved in the sorting and selective transport of receptors and ligands across polarized epithelia. This protein contains 6 MAM domains.
Xenopus laevis thyroid hormone induced protein B. This protein contains 4 MAM domains.
Pig zonadhesin, a protein that binds in a species-specific manner to the zona pellucida of the egg.
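The proteins listed above are mosaics that reuse a handful of modules (MAM, CUB, Ig-like, fibronectin type III and others). Below is a minimal, hypothetical lookup table that records only the domain counts stated in the list, so shared modules can be queried. Domain order is not encoded. Meprin and zonadhesin are omitted because the list does not give their full domain content. The labels and the function `with_domain` are illustrative.

```python
from collections import Counter

# Domain counts as stated in the list above (order is deliberately not encoded).
ARCHITECTURES = {
    "neuropilin": Counter({"CUB": 2, "MAM": 1}),
    "RPTP mu/kappa/PCP-2 (extracellular region)": Counter({"MAM": 1, "Ig-like": 1, "FN3": 4}),
    "enteropeptidase (heavy chain)": Counter({"LDLR-A": 2, "MAM": 1, "SRCR": 1, "CUB": 1}),
    "apical endosomal glycoprotein": Counter({"MAM": 6}),
    "thyroid hormone induced protein B": Counter({"MAM": 4}),
}


def with_domain(domain: str, min_copies: int = 1) -> list[str]:
    """Names of entries carrying at least `min_copies` of `domain` (Counter gives 0 if absent)."""
    return sorted(name for name, counts in ARCHITECTURES.items() if counts[domain] >= min_copies)


print(with_domain("MAM", 2))  # ['apical endosomal glycoprotein', 'thyroid hormone induced protein B']
print(with_domain("CUB"))     # ['enteropeptidase (heavy chain)', 'neuropilin']
```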
This entry includes proteins with a JmjC domain. These proteins are bifunctional, acting as histone lysine demethylases and ribosomal histidine hydroxylases. Proteins include:
Bifunctional lysine-specific demethylase and histidyl-hydroxylase NO66 (also known as Jumonji domain protein 1) [].
Ribosomal oxygenase 1 (also known as ribosomal oxygenase NO66), which specifically demethylates 'Lys-4' (H3K4me) and 'Lys-36' (H3K36me) of histone H3 [].
Ribosomal oxygenase 2, which demethylates trimethylated 'Lys-9' on histone H3 (H3K9me3), leading to an increase in ribosomal RNA expression []. It also hydroxylates 60S ribosomal protein L27a on 'His-39' [].
50S ribosomal protein L16 3-hydroxylase from Escherichia coli, which catalyzes the hydroxylation of 50S ribosomal protein L16 on 'Arg-81' [].
The JmjN and JmjC domains are two non-adjacent domains that have been identified in the jumonji family of transcription factors. It was originally suggested that the JmjN and JmjC domains always co-occur and might form a single functional unit within the folded protein. However, the JmjC domain was later found without the JmjN domain in organisms from bacteria to human []. Proteins containing a JmjC domain are predicted to be metalloenzymes that adopt the cupin fold and are candidates for enzymes that regulate chromatin remodelling []. The cupin fold is a flattened β-barrel structure containing two sheets of five antiparallel β-strands that form the walls of a zinc-binding cleft. Based on the crystal structures of the JmjC domain-containing proteins FIH and JHDM3A/JMJD2A, the JmjC domain forms an enzymatically active pocket that coordinates Fe(II) and α-ketoglutarate (αKG). Three amino acid residues within the JmjC domain bind to the Fe(II) cofactor and two additional residues bind to αKG []. JmjC domains have been identified in numerous eukaryotic proteins containing domains typical of transcription factors, such as PHD, C2H2, ARID/BRIGHT and zinc fingers []. The JmjC domain has been shown to function in a histone demethylation mechanism that is conserved from yeast to human []. JmjC domain proteins may be protein hydroxylases that catalyse a novel histone modification []. The human JmjC protein Tyw5p unexpectedly acts in the biosynthesis of a hypermodified nucleoside, hydroxy-wybutosine, in tRNA-Phe by catalysing hydroxylation [].
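The text states that the JmjC pocket binds Fe(II) and αKG (2-oxoglutarate, 2OG) and that demethylation proceeds via hydroxylation. Below is a minimal bookkeeping sketch of what repeated turnovers imply, taking H3K9me3 removal as the example. It assumes the generally accepted 2OG-oxygenase chemistry, which the text does not spell out. In each turnover, O2 and 2OG are consumed and succinate and CO2 are released, while Fe(II) stays bound. The hydroxymethyl intermediate then releases formaldehyde, removing one methyl group per turnover. Not every JmjC enzyme processes all three methylation states, and the class name is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DemethylationLedger:
    """Co-substrate and product tally for successive JmjC demethylation steps.

    Each step is modelled as one Fe(II)/2OG-dependent hydroxylation of a single
    methyl group: O2 and 2OG are consumed, succinate and CO2 are released, and
    the hydroxymethyl intermediate decomposes to release formaldehyde (HCHO).
    """

    methyl_groups: int
    consumed: dict[str, int] = field(default_factory=lambda: {"2OG": 0, "O2": 0})
    released: dict[str, int] = field(
        default_factory=lambda: {"succinate": 0, "CO2": 0, "HCHO": 0}
    )

    def step(self) -> None:
        """Remove one methyl group and book one turnover's worth of chemistry."""
        if self.methyl_groups <= 0:
            raise ValueError("lysine is already unmethylated")
        self.methyl_groups -= 1
        for key in self.consumed:
            self.consumed[key] += 1
        for key in self.released:
            self.released[key] += 1


ledger = DemethylationLedger(methyl_groups=3)  # e.g. an H3K9me3 substrate
while ledger.methyl_groups:
    ledger.step()
print(ledger.consumed, ledger.released)
# {'2OG': 3, 'O2': 3} {'succinate': 3, 'CO2': 3, 'HCHO': 3}
```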
The CUB domain (for complement C1r/C1s, Uegf, Bmp1) is a structural motif of approximately 110 residues found almost exclusively in extracellular and plasma membrane-associated proteins, many of which are developmentally regulated [, ]. These proteins are involved in a diverse range of functions, including complement activation, developmental patterning, tissue repair, axon guidance and angiogenesis, cell signalling, fertilisation, haemostasis, inflammation, neurotransmission, receptor-mediated endocytosis, and tumour suppression []. Many CUB-containing proteins are peptidases belonging to MEROPS peptidase families M12A (astacin) and S1A (chymotrypsin). Proteins containing a CUB domain include:
Mammalian complement subcomponents C1s/C1r, which form the calcium-dependent complex C1, the first component of the classical pathway of the complement system.
Cricetidae sp. (Hamster) serine protease Casp, which degrades type I and IV collagen and fibronectin in the presence of calcium.
Mammalian complement-activating component of Ra-reactive factor (RARF), a protease that cleaves the C4 component of complement.
Vertebrate enteropeptidase, a type II membrane protein of the intestinal brush border, which activates trypsinogen.
Vertebrate bone morphogenetic protein 1 (BMP-1), a protein which induces cartilage and bone formation and has metalloendopeptidase activity.
Sea urchin blastula proteins BP10 and SpAN.
Caenorhabditis elegans hypothetical proteins F42A10.8 and R151.5.
Neuropilin (A5 antigen), a calcium-independent cell adhesion molecule that functions during the formation of certain neuronal circuits.
Fibropellins I and III from Strongylocentrotus purpuratus (Purple sea urchin).
Mammalian hyaluronate-binding protein TSG-6 (or PS4), a serum- and growth factor-induced protein.
Mammalian spermadhesins.
Xenopus laevis embryonic protein UVS.2, which is expressed during dorsoanterior development.
Several of the above proteins consist of a catalytic domain together with several CUB domains interspersed with calcium-binding EGF domains. Some CUB domains appear to be involved in oligomerisation and/or recognition of substrates and binding partners. For example, in the complement proteases, the CUB domains mediate dimerisation and binding to collagen-like regions of target proteins (e.g. C1q for C1r/C1s). CUB domains adopt a β-sandwich structure with a jelly-roll fold. Almost all CUB domains contain four conserved cysteines that probably form two disulphide bridges (C1-C2, C3-C4). The CUB1 domains of C1s and Map19 have calcium-binding sites [].
Structurally, the spermadhesins consist of a CUB domain [].
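The probable C1-C2, C3-C4 disulphide pattern can be made concrete with a short Python sketch. It assumes the input is an isolated CUB domain sequence containing exactly the four conserved cysteines and simply pairs them in order of occurrence; as the text says "probably", this encodes the presumed pattern rather than a verified one. The function name cub_disulphide_pairs and the toy sequence are hypothetical.

```python
def cub_disulphide_pairs(domain_seq: str) -> list[tuple[int, int]]:
    """Pair the four conserved cysteines of a CUB domain as C1-C2 and C3-C4.

    Hypothetical helper. Assumes domain_seq covers only the CUB domain and
    contains exactly four cysteines. Returned positions are 1-based.
    """
    cys = [i + 1 for i, aa in enumerate(domain_seq.upper()) if aa == "C"]
    if len(cys) != 4:
        raise ValueError(f"expected 4 cysteines, found {len(cys)}")
    # Presumed bridges: first cysteine with second, third with fourth.
    return [(cys[0], cys[1]), (cys[2], cys[3])]


# Toy sequence for illustration only, not a real CUB domain.
print(cub_disulphide_pairs("ACGGCTTCAAC"))  # prints [(2, 5), (8, 11)]
```

Pairing strictly by order is the simplest reading of "C1-C2, C3-C4"; a real analysis would rely on an alignment to decide which cysteines are the conserved ones.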
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] []. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries capable of synthesising all types of [Fe-S] clusters have been identified: the ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [, ]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds that accept S and Fe atoms, assemble clusters and transfer them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [].
The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and a single protein (SufA). SufS is a pyridoxal phosphate (PLP)-dependent protein with cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA []. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [] and acts as a scaffold protein on which Fe and S atoms are assembled into [FeS] clusters, which can then be readily transferred to target apoproteins.
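The compact operon notation used above (iscRSUA-hscBA-fdx-iscX, sufABCDSE) follows the usual bacterial convention of a shared lowercase prefix followed by one capital letter per gene. A minimal Python sketch, assuming that convention holds for every hyphen-separated block (expand_operon is a hypothetical helper, not part of any library), expands the shorthand into individual gene names in operon order:

```python
import re


def expand_operon(notation: str) -> list[str]:
    """Expand compact operon shorthand such as 'iscRSUA-hscBA-fdx-iscX'.

    Assumed convention: each hyphen-separated block is a lowercase gene
    prefix followed by one uppercase letter per gene; a block with no
    uppercase letters (e.g. 'fdx') names a single gene.
    """
    genes = []
    for block in notation.split("-"):
        match = re.fullmatch(r"([a-z]+)([A-Z]*)", block)
        if match is None:
            raise ValueError(f"unrecognised operon block: {block!r}")
        prefix, suffixes = match.groups()
        if suffixes:
            # One gene per capital letter, preserving operon order.
            genes.extend(prefix + letter for letter in suffixes)
        else:
            genes.append(prefix)
    return genes


print(expand_operon("iscRSUA-hscBA-fdx-iscX"))
print(expand_operon("sufABCDSE"))
```

The second call returns six names (sufA, sufB, sufC, sufD, sufS, sufE), matching the six Suf proteins described above.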
In the NIF system, NifS and NifU are required for the formation of the metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins []. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen [].
This entry represents the HscA chaperone protein from the ISC system. HscA (or Hsc66) is a specialised bacterial Hsp70-class molecular chaperone that participates in the assembly of iron-sulphur cluster proteins. HscA resembles DnaK, but belongs to a separate clade. HscA interacts with IscU, which is believed to serve as a template for Fe-S cluster formation. The HscA-IscU interaction is facilitated by the J-type co-chaperone HscB (or Hsc20), which binds both HscA and IscU, bringing them into contact with each other. HscA recognises a conserved LPPVK sequence motif at positions 99-103 of IscU [].
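Residue coordinates such as "positions 99-103" are 1-based and inclusive, whereas Python indexes strings from 0. The minimal sketch below locates an exact motif and reports it in the 1-based convention. find_motif is a hypothetical helper, and the sequence is a synthetic placeholder built so the motif lands at 99-103; it is not the real IscU sequence.

```python
def find_motif(seq: str, motif: str) -> list[tuple[int, int]]:
    """Return (start, end) 1-based inclusive coordinates of every exact
    match of motif in seq, including overlapping matches."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        # Convert the 0-based index to 1-based, inclusive coordinates.
        hits.append((start + 1, start + len(motif)))
        start = seq.find(motif, start + 1)
    return hits


# Placeholder sequence, NOT real IscU: 'M', 97 alanines, then the motif.
toy_iscu = "M" + "A" * 97 + "LPPVK" + "G" * 10
print(find_motif(toy_iscu, "LPPVK"))  # prints [(99, 103)]
```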
The viral polyprotein of parechoviruses contains: coat protein VP0 (P1AB); coat protein VP3 (P1C); coat protein VP1 (P1D); picornain 2A (core protein P2A); core protein P2B; core protein P2C; core protein P3A; genome-linked protein VPg (P3B); and picornain 3C (MEROPS peptidase subfamily 3CF: parechovirus picornain 3C (P3C)) [].
This entry consists of the parechovirus P3A protein. P3A has been identified as a genome-linked protein (VPg), which is involved in replication [].
The ClpA/B family of ATP-binding proteins includes ClpA, the regulatory subunit of the ATP-dependent protease Clp; the heat shock proteins ClpB, Hsp104 and Hsp78; and the chloroplast proteins CD4a (ClpC) and CD4b [, ]. These proteins are thought to protect cells from stress by controlling the aggregation and denaturation of vital cellular structures. They vary in size, but share two conserved regions of about 200 amino acids, each of which contains an ATP-binding site [].
This entry represents a conserved site found in the second conserved region. Proteins containing this site are listed below:
Escherichia coli clpA, which acts as the regulatory subunit of the ATP-dependent protease clp.
Rhodopseudomonas blastica clpA homolog.
Escherichia coli heat shock protein clpB and homologues in other bacteria.
Bacillus subtilis protein mecB.
Yeast heat shock protein 104 (gene HSP104), which is vital for tolerance to heat, ethanol and other stresses.
Neurospora heat shock protein hsp98.
Yeast mitochondrial heat shock protein 78 (gene HSP78) [].
CD4a and CD4b, two highly related tomato proteins that seem to be located in the chloroplast.
Trypanosoma brucei protein clp.
Porphyra purpurea chloroplast-encoded clpC.
A group of microtubule-associated proteins called +TIPs (plus end tracking proteins), including EB1 (end-binding protein 1) family proteins, specifically label growing microtubule ends in diverse organisms and are implicated in spindle dynamics, chromosome segregation, and directing microtubules toward cortical sites. EB1 members have a bipartite composition: an N-terminal CH domain that mediates microtubule plus end localization, and a C-terminal cargo-binding domain (EB1-C) that captures cell polarity determinants. The EB1-C domain comprises a unique EB1-like sequence motif that acts as a binding site for other +TIP proteins. It interacts with the carboxy terminus of the adenomatous polyposis coli (APC) tumor suppressor, a well-conserved +TIP phosphoprotein with a pivotal function in cell cycle regulation. Another binding partner of the EB1-C domain is the well-conserved +TIP protein dynactin, a component of the large cytoplasmic dynein/dynactin complex [, , ].
The ~80-residue EB1-C domain starts with a long, smoothly curved helix (alpha1), which is followed by a hairpin connection leading to a short second helix (alpha2) running antiparallel to alpha1. The two parallel alpha1 helices of the EB1-C domain dimer wrap around each other in a slightly left-handed supercoil. The two alpha2 helices run antiparallel to the alpha1 helices and form a similar fork in the opposite orientation, rotated by 90 degrees. As a result, two helical segments from each monomer form a four-helix bundle. The side chains forming the hydrophobic core of this bundle are highly conserved [, , ].
Some proteins known to contain an EB1-C domain are listed below:
Yeast protein BIM1.
Fission yeast microtubule integrity protein mal3.
Vertebrate microtubule-associated protein RP/EB family member 1 (EB1).
Vertebrate microtubule-associated protein RP/EB family member 2 (EB2 or RP1).
Vertebrate microtubule-associated protein RP/EB family member 3 (EB3).
This entry represents a domain found at the C terminus of the coatomer beta subunit proteins (beta-coat proteins). It is a platform domain on the appendage that carries a highly conserved tryptophan [, ].
Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer []. While clathrin mediates endocytic protein transport and transport from the TGN, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi-to-ER transport of dilysine-tagged proteins []. For example, the coatomer COPI (coat protein complex I) is responsible for the reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargoes. As coat proteins polymerise, vesicles are formed and bud from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits.
This family consists of beta-microseminoprotein/prostate-associated microseminoprotein from humans and small serum proteins from snakes. Prostatic secretory protein of 94 amino acids (PSP94), also called beta-microseminoprotein, is a small, nonglycosylated, cysteine-rich protein. It was first isolated as a major protein of human seminal plasma []. The exact function of this protein is unknown. The snake small serum proteins may serve as self-defense proteins against the toxic effects of snake venom during accidental envenomation.
Guanylate kinase (GK) [] catalyzes the ATP-dependent phosphorylation of GMP to GDP. It is essential for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast) and vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [, , ] to be structurally similar to protein A57R (or SalG2R) from various strains of Vaccinia virus.
Proteins containing one or more copies of the DHR domain, an SH3 domain and a C-terminal GK-like domain are collectively termed MAGUKs (membrane-associated guanylate kinase homologues) [], and include Drosophila lethal(1)discs large-1 tumor suppressor protein (gene dlg1); mammalian tight junction protein Zo-1; a family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits (SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102); vertebrate 55kDa erythrocyte membrane protein (p55); Caenorhabditis elegans protein lin-2; rat protein CASK; and human proteins DLG2 and DLG3.
There is an ATP-binding site (P-loop) in the N-terminal section of GK, which is not conserved in the GK-like domain of the above proteins. However, these proteins retain the residues known, in GK, to be involved in the binding of GMP.
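Written as a balanced reaction, the ATP-dependent phosphorylation of GMP by GK transfers the terminal phosphate of ATP to GMP. It is shown here as reversible, as is usual for nucleoside monophosphate kinases:

```latex
\mathrm{ATP} + \mathrm{GMP} \rightleftharpoons \mathrm{ADP} + \mathrm{GDP}
```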
WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD40 repeats usually assume a 7-8 bladed β-propeller fold, but proteins have been found with 4 to 16 repeated units, which also form a circularised β-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. Repeated WD40 motifs act as a site for protein-protein or protein-DNA interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or as mediators of transient interplay among other proteins []. The specificity of these proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (whose beta subunit is a β-propeller), the TAFII transcription factor, and E3 ubiquitin ligases [, ]. In Arabidopsis spp., several WD40-containing proteins act as key regulators of plant-specific developmental events.
This entry represents a region that spans the WD-40 repeats in members of the WD repeat G protein beta family.
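Because WD40 repeats are ~40 residues long and often end in Trp-Asp, the spacing between WD dipeptides gives a rough first impression of repeat periodicity. The Python sketch below is only a naive illustration of that idea under this assumption: many genuine repeats do not end in WD, and real repeat detection relies on profile- or HMM-based methods rather than dipeptide scanning. The helper names and the synthetic sequence are hypothetical.

```python
import re


def wd_positions(seq: str) -> list[int]:
    """Return 1-based start positions of every Trp-Asp ('WD') dipeptide."""
    return [m.start() + 1 for m in re.finditer("WD", seq.upper())]


def spacings(positions: list[int]) -> list[int]:
    """Return distances between consecutive WD dipeptides."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Synthetic sequence: three 40-residue units, each ending in 'WD'.
toy = ("A" * 38 + "WD") * 3
pos = wd_positions(toy)
print(pos)            # prints [39, 79, 119]
print(spacings(pos))  # prints [40, 40]
```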
Clusterin (Clu), also known as apolipoprotein J, is a vertebrate glycoprotein []. Clusterin expression is complex, with different forms appearing in different cell compartments. One set of proteins is directed for secretion, while other clusterin species are expressed in the cytoplasm and nucleus. The secretory form of the clusterin protein (sCLU) is targeted to the ER by an initial leader peptide. This ~60kDa pre-sCLU protein is proteolytically cleaved into alpha- and beta-subunits and further glycosylated to form mature, disulfide-linked, heterodimeric secretory CLU (sCLU). sCLU is an 80kDa protein that acts as a molecular chaperone, scavenging denatured proteins outside cells [, ]. sCLU binds nonspecifically to hydrophobic regions of various non-native proteins [], binds to some bacteria and bacterial proteins [], and interacts with different immune molecules [].
A specific nuclear form of CLU (nCLU) acts as a pro-death signal, inhibiting cell growth and survival. The nCLU protein has two coiled-coil domains: an N-terminal one that is unable to bind Ku70, and a C-terminal one that is uniquely able to associate with Ku70 and is minimally required for cell death. The sCLU protein is cytoprotective and anti-apoptotic, whereas the nCLU protein is pro-apoptotic [, , ].
Ran is an evolutionarily conserved member of the Ras superfamily of small GTPases that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Import receptors bind their cargoes in the cytoplasm, where the concentration of RanGTP is low, and release them in the nucleus, where the concentration of RanGTP is high []. Export receptors respond to RanGTP in the opposite manner, binding their cargoes in the nucleus and releasing them in the cytoplasm.
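The directional logic of the RanGTP gradient can be captured in a toy two-state model. This Python sketch is a qualitative illustration only: it encodes "RanGTP low in the cytoplasm, high in the nucleus" as booleans, treats import and export receptors as responding in opposite ways, and ignores how the gradient is maintained and all kinetics. The names Compartment, holds_cargo and loading_and_release are hypothetical.

```python
from enum import Enum


class Compartment(Enum):
    CYTOPLASM = "cytoplasm"
    NUCLEUS = "nucleus"


# Qualitative RanGTP levels: low in the cytoplasm, high in the nucleus.
RANGTP_HIGH = {Compartment.CYTOPLASM: False, Compartment.NUCLEUS: True}


def holds_cargo(receptor: str, where: Compartment) -> bool:
    """Toy rule: import receptors hold cargo where RanGTP is low,
    export receptors hold cargo where RanGTP is high."""
    high = RANGTP_HIGH[where]
    if receptor == "import":
        return not high
    if receptor == "export":
        return high
    raise ValueError(f"unknown receptor type: {receptor!r}")


def loading_and_release(receptor: str) -> tuple[Compartment, Compartment]:
    """Return (compartment where cargo is bound, compartment where it is released)."""
    load = next(c for c in Compartment if holds_cargo(receptor, c))
    release = next(c for c in Compartment if not holds_cargo(receptor, c))
    return load, release


for kind in ("import", "export"):
    load, release = loading_and_release(kind)
    print(f"{kind}: binds in {load.value}, releases in {release.value}")
```

Expressing both receptor types through one gradient table makes the "opposite manner" explicit: flipping a single boolean test reverses the direction of transport.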
Nuclear transport factor 2 (NTF2) is a homodimer of approximately 14kDa subunits that stimulates efficient nuclear import of a cargo protein. NTF2 binds to both RanGDP and FxFG repeat-containing nucleoporins. NTF2 binds RanGDP strongly enough for the complex to remain intact during transport through nuclear pore complexes (NPCs), but the interaction between NTF2 and FxFG nucleoporins is much more transient, which would enable NTF2 to move through the NPC by hopping from one repeat to another [, ]. NTF2 folds into a cone with a deep hydrophobic cavity, the opening of which is surrounded by several negatively charged residues. RanGDP binds to NTF2 by inserting a conserved phenylalanine residue into the hydrophobic pocket of NTF2 and making electrostatic interactions with the conserved negatively charged residues that surround the cavity.
This entry contains predominantly eukaryotic proteins. The following proteins contain a region similar to NTF2:
Eukaryotic NXF proteins []. These are nuclear mRNA export factors. In addition to an NTF2 domain, these proteins contain a number of leucine-rich repeats and a UBA domain.
Eukaryotic NXT1/NXT2 proteins []. These proteins stimulate the export of NES-containing proteins. They also play a role in mRNA nuclear export and heterodimerize with NXF proteins. In contrast to NTF2, NXT1 and NXT2 preferentially bind RanGTP.
Eukaryotic Ras-GTPase-activating protein (GAP)-binding proteins (G3BPs). These proteins contain one NTF2 domain and one RRM.
Acyl-CoA-binding protein (ACBP) is a small (10kDa) protein that binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters []. ACBP is also known as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts as a neuropeptide to modulate the action of the GABA receptor []. ACBP is a highly conserved protein of about 90 residues that is found in all four eukaryotic kingdoms (Animalia, Plantae, Fungi and Protista) and in some eubacterial species [].
Although ACBP occurs as a completely independent protein, intact ACB domains have been identified in a number of large, multifunctional proteins in a variety of eukaryotic species. These include large membrane-associated proteins with N-terminal ACB domains, multifunctional enzymes with both ACB and peroxisomal Delta(3),Delta(2)-enoyl-CoA isomerase domains, and proteins with both an ACB domain and ankyrin repeats [].
The ACB domain consists of four α-helices arranged in a bowl shape with a highly exposed acyl-CoA-binding site. The ligand is bound through specific interactions with residues on the protein, most notably several conserved positive charges that interact with the phosphate group on the adenosine-3'-phosphate moiety, and the acyl chain is sandwiched between the hydrophobic surfaces of CoA and the protein [].
Other proteins containing an ACB domain include:
Endozepine-like peptide (ELP) (gene DBIL5) from mouse []. ELP is a testis-specific ACBP homologue that may be involved in the energy metabolism of the mature sperm.
MA-DBI, a transmembrane protein of unknown function that has been found in mammals. MA-DBI contains an N-terminal ACB domain.
DRS-1 [], a human protein of unknown function that contains an N-terminal ACB domain and a C-terminal enoyl-CoA isomerase/hydratase domain.
Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer []. While clathrin mediates endocytic protein transport and transport from the TGN, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi-to-ER transport of dilysine-tagged proteins []. For example, the coatomer COPI (coat protein complex I) is responsible for the reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargoes. As coat proteins polymerise, vesicles are formed and bud from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This group represents coatomer subunit alpha. Structural studies show that homo-oligomerisation of this protein plays a key role in the stability of the coat complex [, ]. In humans, defects in its expression have been associated with primary immunodeficiencies that lead to immune dysregulation, arthritis and interstitial lung disease []. This protein has also been linked to Alzheimer's disease [].
The G-patch domain is an approximately 48 amino acid domain, which is found in a single copy in several RNA-associated proteins and in type D retroviral polyproteins. It is widespread among eukaryotes but is absent in archaea and bacteria. The G-patch domain is named after its most notable feature, the presence of six highly conserved glycine residues. The position following the first conserved glycine is occupied almost invariably by an aromatic residue, and several other positions are occupied predominantly by either hydrophobic or small residues. Several groups of G-patch-containing proteins are conserved in animals, plants and fungi. In some of these proteins the G-patch is the only recognisable domain, but in most of them it is combined with other domains, which include well-defined RNA-binding domains such as the RRM, dsRBD, SURP and R3H. It has been suggested that the G-patch domain has a specific function in RNA processing and, in particular, that it might be a previously undetected RNA-binding domain mediating a distinct type of RNA-protein interaction. Secondary structure prediction indicates that the G-patch domain probably contains two α-helices, with four of the six glycines located within an intervening loop.
Proteins known to contain a G-patch domain include:
Eukaryotic 45kDa splicing factor (SPF-45).
Mammalian SON protein, a DNA-binding protein.
Human LUCA15, a multidomain RNA-binding protein that is the product of a gene deleted in certain lung tumors.
Human DAN26/EPROT, a multidomain protein which, in addition to the G-patch domain, contains an RNA polymerase II C-terminal repeat-binding domain seen in many proteins of the polyA-addition machinery.
Arabidopsis thaliana DRT111, a protein which has been shown to partially restore recombination proficiency and DNA-damage resistance to E. coli mutants.
Type D retroviral polyprotein, in which the G-patch domain is found directly downstream of the protease domain.
Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors and require a chaperone subunit to direct them to the translocase component []. From there, the mature proteins are either targeted to the outer membrane or remain as periplasmic proteins []. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) []. Other cytoplasmic/periplasmic proteins play a part in preprotein translocase activity, namely YidC and YajC []. The latter is bound in a complex to SecD and SecF, and plays a part in stabilising and regulating secretion through the SecYEG integral membrane component via SecA [].
Homologues of the YajC gene have been found in a range of pathogenic and commensal microbes. Brucella abortus YajC- and SecD-like proteins were shown to stimulate a Th1 cell-mediated immune response in mice, and conferred protection against challenge with B. abortus []. These proteins may therefore have an antigenic role as well as a secretory one in virulent bacteria []. A number of previously uncharacterised "hypothetical" proteins also show similarity to E. coli YajC, suggesting that this family is wider than first thought [].
More recently, the precise interactions between the E. coli SecYEG complex, SecD, SecF, YajC and YidC have been studied []. Rather than acting individually, the four proteins form a heterotetrameric complex and associate with the SecYEG heterotrimeric complex []. The SecF and YajC subunits link the complex to the integral membrane translocase. In Bacillus subtilis, YajC is known as YrbF.
This short bi-helical repeat is related to HEAT repeats and is present in phycocyanobilin lyases and other proteins.
Cyanobacteria and red algae harvest light energy using macromolecular complexes known as phycobilisomes (PBS), which are peripherally attached to the photosynthetic membrane. The major components of PBS are the phycobiliproteins. These heterodimeric proteins carry covalently attached phycobilins, open-chain tetrapyrrole chromophores that function as the photosynthetic light-harvesting pigments. Phycobiliproteins differ in sequence and in the nature and number of phycobilins attached to each of their subunits. This family includes the lyase enzymes that specifically attach particular phycobilins to apophycobiliprotein subunits. The most comprehensively studied of these is the CpcE/F lyase, which attaches phycocyanobilin (PCB) to the alpha subunit of apophycocyanin []. Similarly, MpeU/V attaches phycoerythrobilin to phycoerythrin II, while CpeY/Z is thought to be involved in phycoerythrobilin (PEB) attachment to phycoerythrin (PE) I (PEs I and II differ in sequence and in the number of attached molecules of PEB: PE I has five, PE II has six) [].
All the reactions of the above lyases involve the addition of an apoprotein cysteine SH group to a terminal delta 3,3'-double bond. Such a reaction is not possible in the case of phycoviolobilin (PVB), the phycobilin of alpha-phycoerythrocyanin (alpha-PEC). It is thought that in this case PCB, not PVB, is first added to apo-alpha-PEC and is then isomerized to PVB. The addition reaction has been shown to occur in the presence of either component of the alpha-PEC-PVB lyase, PecE or PecF, or both. The isomerisation reaction occurs only when both PecE and PecF are present, i.e. the PecE/F phycobiliprotein lyase is also a phycobilin isomerase []. Another member of this family is the NblB protein, whose similarity to the phycobiliprotein lyases was previously noted []. This constitutively expressed protein is not known to have any lyase activity. It is thought to be involved in coordinating PBS degradation with environmental nutrient limitation. It has been suggested that the similarity of NblB to the phycobiliprotein lyases is due to the ability to bind tetrapyrrole phycobilins via the common repeated motif []. This repeat is also found in proteins not related to the phycobilisomes, such as archaeal proteins that are essential for chemotaxis and phototaxis [], epoxyqueuosine reductases [] and deoxyhypusine hydroxylases [].
Signal recognition particle, SRP54 subunit, GTPase domain
Type:
Domain
Description:
The signal recognition particle (SRP) is a multimeric ribonucleoprotein complex which, along with its cognate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes [, ]. SRP recognises the signal sequence of the nascent polypeptide on the ribosome. In eukaryotes, this retards elongation until SRP docks the ribosome-polypeptide complex to the RER membrane via the SR receptor []. Eukaryotic SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300-nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor []. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain, linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane. In archaea, the SRP complex contains a 7S RNA like its eukaryotic counterpart, yet includes only two of the six protein subunits found in the eukaryotic complex: SRP19 and SRP54 [].
This entry represents the GTPase domain of the 54kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M-domain, which binds the 7S RNA and also binds the signal sequence. The extreme C-terminal region is glycine-rich, of low complexity and poorly conserved between species. The GTPase domain is evolutionarily related to P-loop NTPase domains found in a variety of other proteins [].
These proteins include Escherichia coli and Bacillus subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homologue of ftsY; and bacterial flagellar biosynthesis protein flhF.
Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer [
]. Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors [
]. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain and is responsible for recruiting accessory proteins that regulate clathrin-mediated endocytosis []. While clathrin mediates endocytic protein transport and transport from the TGN to endosomes, coatomers primarily mediate transport in the early secretory pathway: COPII mediates ER-to-Golgi transport, and COPI mediates intra-Golgi transport as well as the retrograde Golgi-to-ER transport of dilysine-tagged proteins [
]. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the small sigma and mu subunits of various adaptins from different AP clathrin adaptor complexes (including AP1, AP2, AP3 and AP4), and the zeta and delta subunits of various coatomer (COP) adaptors. The small sigma subunit of AP proteins has been characterised in several species [
,
,
,
]. The sigma subunit plays a role in protein sorting in the late-Golgi/trans-Golgi network (TGN) and/or endosomes. The zeta subunit of coatomers (zeta-COP) is required for coatomer binding to Golgi membranes and for coat-vesicle assembly [,
].
Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer [
]. Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors [
]. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain and is responsible for recruiting accessory proteins that regulate clathrin-mediated endocytosis []. While clathrin mediates endocytic protein transport and transport from the TGN to endosomes, coatomers primarily mediate transport in the early secretory pathway: COPII mediates ER-to-Golgi transport, and COPI mediates intra-Golgi transport as well as the retrograde Golgi-to-ER transport of dilysine-tagged proteins [
]. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the small sigma subunit of various adaptins from different AP clathrin adaptor complexes (including AP1, AP2, AP3 and AP4), and the zeta subunit of various coatomer (COP) adaptors. The small sigma subunit of AP proteins has been characterised in several species [
,
,
,
]. The sigma subunit plays a role in protein sorting in the late-Golgi/trans-Golgi network (TGN) and/or endosomes. The zeta subunit of coatomers (zeta-COP) is required for coatomer binding to Golgi membranes and for coat-vesicle assembly [,
].
Although Hepatitis A virus, Hepatitis B virus and Hepatitis C virus have similar names because they all cause liver inflammation, they are distinct viruses, both genetically and clinically. The Hepatitis C virus (HCV) is a small (50-80 nm in diameter), enveloped, single-stranded, positive-sense RNA virus. It is a member of the family Flaviviridae. There are seven genotypes and a number of subtypes with diverse geographic distributions. The genome of HCV consists of a single open reading frame. The 5' and 3' ends of the RNA carry untranslated regions (UTRs), which are not translated into proteins but are important for translation and replication of the viral RNA. The 5' UTR contains a ribosome binding site (IRES, internal ribosome entry site) that initiates translation of a single polyprotein, which is later cleaved by cellular and viral proteases into 10 smaller, active structural and non-structural proteins [
]. The HCV core protein is located at the N terminus of the polyprotein and is followed by a signal sequence that lies between the core protein and the E1 envelope glycoprotein. This signal sequence targets the nascent HCV polyprotein to the endoplasmic reticulum (ER), allowing the translocation of E1 to the ER lumen. Cleavage by a signal peptidase in the ER lumen releases the N-terminal end of E1, leaving the 191-amino acid (aa) core protein anchored by its C-terminal signal peptide [
,
]. This 191-aa polypeptide, also known as p23, is the immature form of the core protein; p23 is further processed by an intramembrane protease, signal peptide peptidase (SPP), which removes the ER anchor, releasing p21, the N-terminal 179-aa mature form of the core protein []. Core protein (p21) is responsible for packaging viral RNA to form a viral nucleocapsid, and it also promotes virion budding []. Two domains have been identified in the mature form of the HCV core protein, based on predicted structural and functional characteristics [
]. Domain I, corresponding to the N-terminal region of approximately 120 aa, is a highly basic domain that is probably involved in the recruitment of viral RNA during particle morphogenesis. Domain II, located between aa 120 and aa 175, is a hydrophobic region predicted to form one or two α-helices that are probably involved in the association of core with the ER membrane and lipid droplets. This entry represents domain II and domain III (ER anchor sequence) of the core protein p23.
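The processing steps above reduce to simple residue arithmetic: a 191-aa precursor (p23) loses its C-terminal ER anchor to give the 179-aa mature form (p21). The sketch below is illustrative only. It models SPP processing as a single cut after residue 179, which is an assumption implied by the two lengths given in the text, and the `Segment`/`spp_cleave` names are hypothetical helpers. The domain I and II boundaries are approximate; they share residue 120 because the text gives domain I as "approximately 120 aa" and domain II as lying "between aa 120 and aa 175".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A residue range on the core protein, 1-based and inclusive at both ends."""
    name: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Coordinates taken from the text; domain boundaries are approximate.
P23 = Segment("p23 (immature core)", 1, 191)
DOMAIN_I = Segment("domain I (basic, RNA recruitment)", 1, 120)
DOMAIN_II = Segment("domain II (hydrophobic, ER membrane/lipid droplets)", 120, 175)


def spp_cleave(precursor: Segment, last_mature_residue: int) -> tuple[Segment, Segment]:
    """Hypothetical model: split the precursor after `last_mature_residue`,
    returning (mature core, removed ER anchor)."""
    if not precursor.start <= last_mature_residue < precursor.end:
        raise ValueError("cleavage site must lie inside the precursor")
    mature = Segment("p21 (mature core)", precursor.start, last_mature_residue)
    anchor = Segment("ER anchor (removed)", last_mature_residue + 1, precursor.end)
    return mature, anchor


p21, anchor = spp_cleave(P23, 179)
for seg in (P23, p21, anchor, DOMAIN_I, DOMAIN_II):
    print(f"{seg.name}: residues {seg.start}-{seg.end} ({seg.length} aa)")
```

Under these assumptions, the removed anchor is simply the 12 residues left over from 191 minus 179. The sketch does not model how SPP actually cleaves within the membrane.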
Although Hepatitis A virus, Hepatitis B virus and Hepatitis C virus have similar names because they all cause liver inflammation, they are distinct viruses, both genetically and clinically. The Hepatitis C virus (HCV) is a small (50-80 nm in diameter), enveloped, single-stranded, positive-sense RNA virus. It is a member of the family Flaviviridae. There are seven genotypes and a number of subtypes with diverse geographic distributions. The genome of HCV consists of a single open reading frame. The 5' and 3' ends of the RNA carry untranslated regions (UTRs), which are not translated into proteins but are important for translation and replication of the viral RNA. The 5' UTR contains a ribosome binding site (IRES, internal ribosome entry site) that initiates translation of a single polyprotein, which is later cleaved by cellular and viral proteases into 10 smaller, active structural and non-structural proteins [
]. The HCV core protein is located at the N terminus of the polyprotein and is followed by a signal sequence that lies between the core protein and the E1 envelope glycoprotein. This signal sequence targets the nascent HCV polyprotein to the endoplasmic reticulum (ER), allowing the translocation of E1 to the ER lumen. Cleavage by a signal peptidase in the ER lumen releases the N-terminal end of E1, leaving the 191-amino acid (aa) core protein anchored by its C-terminal signal peptide [,
]. This 191-aa polypeptide, also known as p23, is the immature form of the core protein; p23 is further processed by an intramembrane protease, signal peptide peptidase (SPP), which removes the ER anchor, releasing p21, the N-terminal 179-aa mature form of the core protein []. Core protein (p21) is responsible for packaging viral RNA to form a viral nucleocapsid, and it also promotes virion budding []. Two domains have been identified in the mature form of the HCV core protein, based on predicted structural and functional characteristics [
]. Domain I, corresponding to the N-terminal region of approximately 120 aa, is a highly basic domain that is probably involved in the recruitment of viral RNA during particle morphogenesis. Domain II, located between aa 120 and aa 175, is a hydrophobic region predicted to form one or two α-helices that are probably involved in the association of core with the ER membrane and lipid droplets. This entry covers domain I of the core protein.
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] [
]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries capable of synthesising all types of [Fe-S] clusters have been identified: the ISC (iron-sulphur cluster), SUF (sulphur assimilation) and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [
]. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal phosphate (PLP)-dependent protein with cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA [
]. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [] and acts as a scaffold protein in which Fe and S atoms are assembled into [FeS] clusters, which can then readily be transferred to target apoproteins.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N-terminal domain, assembling an FeS cluster that is transferred to nitrogenase apoproteins [
]. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen []. This entry represents SufD proteins that form part of the SufBCD complex in the SUF system. No specific functions have been assigned to these proteins.
Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions, which make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents B-box-type zinc finger domains, which are around 40 residues in length. B-box zinc fingers can be divided into two groups, type 1 and type 2, which differ in their consensus sequence and in the spacing of the 7-8 zinc-binding residues. Several proteins contain both type 1 and type 2 B-boxes, suggesting some level of cooperativity between these two domains. B-box domains are found in over 1500 proteins from a variety of organisms. They are found in TRIM (tripartite motif) proteins, which consist of an N-terminal RING finger (originally called an A-box), followed by 1-2 B-box domains and a coiled-coil domain (hence the alternative name RBCC, for RING, B-box, Coiled-Coil). TRIM proteins contain a type 2 B-box domain, and may also contain a type 1 B-box. In proteins that do not contain RING or coiled-coil domains, the B-box domain is primarily type 2. Many type 2 B-box proteins are involved in ubiquitination.
Proteins containing a B-box zinc finger domain include transcription factors, ribonucleoproteins and proto-oncoproteins; for example, MID1, MID2, TRIM9, TNL, TRIM36, TRIM63, TRIFIC, NCL1 and CONSTANS-like proteins [
]. The microtubule-associated E3 ligase MID1 (
) contains a type 1 B-box zinc finger domain. MID1 specifically binds Alpha-4, which in turn recruits the catalytic subunit of phosphatase 2A (PP2Ac). This complex is required for targeting of PP2Ac for proteasome-mediated degradation. The MID1 B-box coordinates two zinc ions and adopts a β/β/α cross-brace structure similar to that of ZZ, PHD, RING and FYVE zinc fingers [
,
].
A variety of substrate carrier proteins that are involved in energy transfer are found in the inner mitochondrial membrane or integral to the membrane of other eukaryotic organelles such as the peroxisome [
,
,
,
,
]. Such proteins include: ADP/ATP carrier protein (ADP/ATP translocase); 2-oxoglutarate/malate carrier protein; phosphate carrier protein; tricarboxylate transport protein (or citrate transport protein); Graves disease carrier protein; yeast mitochondrial proteins MRS3 and MRS4; yeast mitochondrial FAD carrier protein; and many others. Structurally, these proteins can consist of up to three tandem repeats of a domain of approximately 100 residues, each domain containing two transmembrane regions.
This domain represents the N-terminal transmembrane region of the apolipoprotein N-acyltransferase enzyme. Proteins containing this domain also include bifunctional apolipoprotein N-acyltransferase/polyprenol monophosphomannose synthase from Mycobacterium tuberculosis. Apolipoprotein N-acyltransferase (Lnt) transfers the acyl group to lipoproteins and is involved in lipoprotein biosynthesis in Gram-negative bacteria. It is an integral membrane protein [
]. In the last step of lipoprotein maturation, N-acylation by the plasma membrane enzyme apolipoprotein N-acyltransferase is required for recognition of the outer membrane lipoproteins by the Lol system, which transports the lipoproteins from the plasma membrane to the outer membrane []. This is a reverse amidase (i.e. condensation) reaction.
This entry refers to the DHHC domain, found in DHHC proteins which are palmitoyltransferases [
]. Palmitoylation or, more specifically, S-acylation plays important roles in the regulation of protein localization, stability and activity. It is a post-translational protein modification that involves the attachment of palmitic acid to Cys residues through a thioester linkage. Protein acyltransferases (PATs), also known as palmitoyltransferases, catalyse this reaction by transferring the palmitoyl group from palmitoyl-CoA to the thiol group of Cys residues. They are characterised by the presence of a 50-residue-long domain called the DHHC domain, which in most but not all cases is also cysteine-rich and gets its name from a highly conserved DHHC signature tetrapeptide (Asp-His-His-Cys). The Cys residue within the DHHC domain forms a stable acyl intermediate and transfers the acyl chain to the Cys residues of a target protein [
,
]. Some of the proteins containing a DHHC domain are listed below:
Drosophila DNZ1 protein [
]
Mouse Abl-philin 2 (Aph2) protein. Interacts with c-Abl. May play a role in apoptosis [
]
Mammalian ZDHHC9, an integral membrane protein [
]
Yeast ankyrin repeat-containing protein AKR1 [
]
Yeast Erf2 protein. This protein localizes to the endoplasmic reticulum and seems to be important for Ras function [
]
Arabidopsis thaliana tip growth defective 1 [
]
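Because the DHHC domain is named after its conserved Asp-His-His-Cys signature, a first-pass screen of a protein sequence can simply look for that tetrapeptide in one-letter code. The sketch below is a minimal illustration under that assumption. The `find_dhhc` helper and the toy sequence are invented for this example and do not come from any real protein or annotation tool. A string match alone does not establish a DHHC domain, which spans about 50 residues and is not always cysteine-rich. Real annotation relies on profile-based domain models.

```python
import re

# The conserved signature tetrapeptide Asp-His-His-Cys in one-letter code.
DHHC_MOTIF = re.compile(r"DHHC")


def find_dhhc(sequence: str) -> list[int]:
    """Return the 1-based start position of every DHHC tetrapeptide in a protein sequence."""
    return [match.start() + 1 for match in DHHC_MOTIF.finditer(sequence.upper())]


# Invented toy sequence for illustration, not a real protein.
toy = "MSTACDHHCAWLKQCDHHC"
print(find_dhhc(toy))
```

Positions are reported 1-based to match the usual convention for residue numbering in protein sequences.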
This entry represents a domain with a β/α/β/α-β(2) structure found in the C-terminal region of many Gram-negative bacterial outer membrane proteins [
], including porin-like integral membrane proteins (such as OmpA) [], small lipid-anchored proteins (such as Pal) [], and MotB proton channels []. The N-terminal half is variable, although some of the proteins in this group have the OmpA-like transmembrane domain at the N terminus. OmpA from Escherichia coli is required for pathogenesis and can interact with host receptor molecules [
]. MotB (together with MotA) serves two functions in E. coli: the MotA(4)-MotB(2) complex attaches to the cell wall via MotB to form the stator of the flagellar motor, and the MotA-MotB complex couples the flow of ions across the cell membrane to movement of the rotor [].
Other Gram-negative outer membrane proteins with this domain include:
Outer membrane protein P5 from Haemophilus influenzae.
Outer membrane protein P.III/class IV from Neisseria.
Outer membrane porin F (gene oprF) from Pseudomonas.
Protein TpN50 from Treponema pallidum [].
Peptidoglycan-associated lipoprotein (gene pal) from Escherichia coli, Haemophilus influenzae, Legionella pneumophila and Pseudomonas putida.
Outer membrane lipoprotein P6 from Haemophilus influenzae.
Escherichia coli hypothetical lipoprotein yiaD.
Vibrio parahaemolyticus sodium-type flagellar protein motY [,].
The OmpA-like domain is thought to be responsible for non-covalent interactions with peptidoglycan [].
This entry represents a domain superfamily with a β/α/β/α-β(2) structure found in the C-terminal region of many Gram-negative bacterial outer membrane proteins [
], including porin-like integral membrane proteins (such as OmpA) [], small lipid-anchored proteins (such as Pal) [], and MotB proton channels []. The N-terminal half is variable, although some of the proteins in this group have the OmpA-like transmembrane domain at the N terminus. OmpA from Escherichia coli is required for pathogenesis and can interact with host receptor molecules [
]. MotB (together with MotA) serves two functions in E. coli: the MotA(4)-MotB(2) complex attaches to the cell wall via MotB to form the stator of the flagellar motor, and the MotA-MotB complex couples the flow of ions across the cell membrane to movement of the rotor [].
Other Gram-negative outer membrane proteins with this domain include:
Outer membrane protein P5 from Haemophilus influenzae.
Outer membrane protein P.III/class IV from Neisseria.
Outer membrane porin F (gene oprF) from Pseudomonas.
Protein TpN50 from Treponema pallidum [].
Peptidoglycan-associated lipoprotein (gene pal) from Escherichia coli, Haemophilus influenzae, Legionella pneumophila and Pseudomonas putida.
Outer membrane lipoprotein P6 from Haemophilus influenzae.
Escherichia coli hypothetical lipoprotein yiaD.
Vibrio parahaemolyticus sodium-type flagellar protein motY [,].
The OmpA-like domain is thought to be responsible for non-covalent interactions with peptidoglycan [].
RGS (Regulator of G protein Signalling) proteins are multi-functional, GTPase-accelerating proteins that promote GTP hydrolysis by the alpha subunit of heterotrimeric G proteins, thereby inactivating the G protein and rapidly switching off G protein-coupled receptor signalling pathways. Upon activation by GPCRs, heterotrimeric G proteins exchange GDP for GTP, are released from the receptor, and dissociate into free, active GTP-bound alpha subunit and beta-gamma dimer, both of which activate downstream effectors. The response is terminated upon GTP hydrolysis by the alpha subunit, which can then bind the beta-gamma dimer and the receptor. RGS proteins markedly reduce the lifespan of GTP-bound alpha subunits by stabilising the G protein transition state [
,
]. The human genome encodes more than three dozen RGS domain-containing proteins with varied G-alpha substrate specificities, which are classified into eight subfamilies: A or RZ; B or R4; C or R7; D or R12; E or RA; F or GEF; G or GRK (GPCR kinase); and H or SNX (sorting nexin) []. This entry represents the R12 RGS subfamily, which includes RGS10, RGS12 and RGS14 proteins, all of which are highly selective for G-alpha-i1 over G-alpha-q. This entry also includes regulator of G-protein signalling loco from Drosophila melanogaster, which is an RGS14 orthologue.
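The claim that RGS proteins "markedly reduce the lifespan of GTP-bound alpha subunits" can be made concrete with a deliberately simple model. The sketch below treats GTP hydrolysis as a first-order process with rate constant k, so the fraction still GTP-bound after time t is exp(-k t) and the mean lifetime is 1/k. RGS action is represented only as a larger k. Both the first-order simplification and the two rate constants are assumptions chosen for illustration. They are not measured values and do not describe any particular G-alpha/RGS pair.

```python
import math


def fraction_gtp_bound(t: float, k_hydrolysis: float) -> float:
    """Fraction of G-alpha still GTP-bound after time t (s), assuming first-order hydrolysis."""
    if k_hydrolysis <= 0:
        raise ValueError("rate constant must be positive")
    return math.exp(-k_hydrolysis * t)


def mean_lifetime(k_hydrolysis: float) -> float:
    """Mean lifetime (s) of the GTP-bound state under first-order decay: 1/k."""
    if k_hydrolysis <= 0:
        raise ValueError("rate constant must be positive")
    return 1.0 / k_hydrolysis


# Hypothetical rate constants (per second), chosen only to illustrate the effect.
K_INTRINSIC = 0.02
K_WITH_RGS = 2.0

for label, k in (("intrinsic", K_INTRINSIC), ("with RGS", K_WITH_RGS)):
    print(f"{label}: mean lifetime {mean_lifetime(k):.1f} s, "
          f"fraction GTP-bound at 1 s = {fraction_gtp_bound(1.0, k):.3f}")
```

In this toy model, a 100-fold increase in k shortens the mean lifetime of the active state 100-fold. Real RGS kinetics involve binding equilibria and transition-state stabilisation that the model ignores.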
Molecular chaperones, or heat shock proteins (Hsps), are ubiquitous proteins that act to maintain proper protein folding within the cell [
]. They assist in the folding of nascent polypeptide chains, and are also involved in the re-folding of denatured proteins following proteotoxic stress. As their name implies, the heat shock proteins were first identified as proteins that were up-regulated under conditions of elevated temperature. However, subsequent studies have shown that increased Hsp expression is induced by a variety of cellular stresses, including oxidative stress and inflammation. Five major Hsp families have been defined, categorised according to their molecular size (Hsp100, Hsp90, Hsp70, Hsp60 and the small Hsps). Hsps are involved in a variety of ATP-dependent cellular processes, including prevention of protein aggregation, protein degradation, protein trafficking, and maintenance of signalling proteins in a conformation that permits activation. Hsp90 chaperones are unique in their ability to regulate a specific subset of cellular signalling proteins that have been implicated in disease processes, including intracellular protein kinases, steroid hormone receptors and growth factor receptors [
]. This entry represents the C-terminal domain of Hsp90. Structurally, this domain has an α-β(3)-α(3) fold arranged in two layers (α/β) with a mixed β-sheet and crossing loops.
Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component [
]. From there, the mature proteins are either targeted to the outer membrane or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecCY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF) [
]. The chaperone protein SecB [] is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion [
]. The tertiary structure of Haemophilus influenzae SecB (
) was resolved by X-ray crystallography to 2.5 Å [
]. The chaperone comprises four chains, forming a tetramer, each chain of which has a simple alpha+beta fold arrangement. While one binding site on the homotetramer recognises unfolded polypeptides by hydrophobic interactions, the second binds to SecA through the latter's C-terminal 22 residues.
Apolipoprotein B receptor (also known as apolipoprotein B-100 receptor) is a macrophage receptor that binds to apolipoprotein (apo) B48 of dietary triglyceride (TG)-rich lipoproteins, or to an apoB100-like domain in hypertriglyceridemic very low-density lipoproteins [
]. The receptor has been shown to be capable of binding and internalising triglyceride-rich lipoproteins, even in the absence of other macrophage-specific proteins, such as apoE or lipoprotein lipase, that can enhance macrophage uptake of lipoproteins []. Apolipoprotein B receptor may be involved in provision of essential lipids to reticuloendothelial cells. It also has a role in remnant lipoprotein-induced macrophage foam cell formation, and thus may be involved in atherosclerosis [].
Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha (
), beta (
) and gamma (
) [
]. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation []. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound (previously inactive) trimeric G protein to be released from the receptor and to dissociate into the active, GTP-bound alpha subunit and the beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C and ion channels. These effectors in turn regulate the intracellular concentrations of second messengers, such as cAMP, diacylglycerol, and sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal []. The length of the G protein signal is controlled by how long the alpha subunit remains GTP-bound, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications []. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals. These fall into four main groups on the basis of both sequence similarity and function: alpha-S (), alpha-Q (), alpha-I () and alpha-12 () []. The specific combination of subunits in heterotrimeric G proteins affects not only which receptor they can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli [
,
]. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions.This family consists of the G protein alpha subunit group S (stimulatory) which transduces signals from various cell surface receptors to the cAMP-generating enzyme adenylyl cyclase. The G alpha-S subunit is encoded by GNAS, a complex imprinted gene that uses multiple promoters to generate several gene products. G alpha-S is imprinted in a tissue-specific manner, and is expressed primarily from the maternal allele in renal proximal tubules, thyroid, pituitary and ovary [
]. Several disease states are linked to the G alpha-S, including McCune-Albright syndrome, pseudohypoparathyroidism, adenomas, testotoxicosis and the action of cholera toxin. G alpha-olf is a specialised form of G alpha-S expressed in olfactory neuroepithelial cells, brain and pancreas. In addition to its interaction with adenylyl cyclase, G alpha-S also activates ion channels, such as atrial voltage gated sodium channels and dihydropyridine-sensitive calcium channels in skeletal muscle.
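The G protein activation/inactivation cycle described for this family can be expressed as a small state machine. The sketch below is a simplified, hypothetical model: the state names, event names and the `step`/`run` helpers are invented for illustration. After hydrolysis it returns directly to the inactive GDP-bound heterotrimer, which folds re-association and receptor re-binding into a single step. RGS proteins would accelerate the hydrolysis transition but do not change where it leads.

```python
from enum import Enum, auto


class GState(Enum):
    """Simplified states of a heterotrimeric G protein (hypothetical model)."""
    INACTIVE_GDP = auto()    # alpha-GDP associated with beta/gamma
    RECEPTOR_BOUND = auto()  # heterotrimer engaged by a ligand-activated GPCR
    ACTIVE_GTP = auto()      # alpha-GTP and free beta/gamma, both able to signal


# Allowed (state, event) -> next-state transitions, following the cycle in the text.
TRANSITIONS = {
    (GState.INACTIVE_GDP, "receptor_activated"): GState.RECEPTOR_BOUND,
    (GState.RECEPTOR_BOUND, "gdp_gtp_exchange"): GState.ACTIVE_GTP,
    # Hydrolysis returns alpha to GDP and lets it re-associate with beta/gamma.
    (GState.ACTIVE_GTP, "gtp_hydrolysis"): GState.INACTIVE_GDP,
}


def step(state: GState, event: str) -> GState:
    """Apply one event, rejecting transitions the cycle does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


def run(events: list[str], start: GState = GState.INACTIVE_GDP) -> list[GState]:
    """Return the sequence of states visited while applying events in order."""
    trace = [start]
    for event in events:
        trace.append(step(trace[-1], event))
    return trace


def is_signalling(state: GState) -> bool:
    """In this model, only the dissociated, GTP-bound form activates downstream effectors."""
    return state is GState.ACTIVE_GTP


cycle = run(["receptor_activated", "gdp_gtp_exchange", "gtp_hydrolysis"])
print([s.name for s in cycle])
```

Using an explicit transition table makes illegal orderings fail loudly. For example, hydrolysis cannot occur before nucleotide exchange, which mirrors the dependency order described in the text.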
SUF system FeS cluster assembly, SufBD, N-terminal
Type:
Domain
Description:
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] [
]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries capable of synthesising all types of [Fe-S] clusters have been identified: the ISC (iron-sulphur cluster), SUF (sulphur assimilation) and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [
]. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal phosphate (PLP)-dependent protein with cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA [
]. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [], acting as a scaffold protein in which Fe and S atoms are assembled into [FeS]cluster forms, which can then easily be transferred to apoproteins targets.
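IscS, SufS and NifS are all described above as cysteine desulphurases. As a simplified summary of that shared step (omitting the PLP-bound intermediates), the net chemistry can be written as below. The shorthand E-SH for the desulphurase's active-site cysteine and A-SH for a cysteine on an acceptor such as IscU or SufE is notation introduced here for illustration, not taken from the entries themselves.

```latex
\begin{align*}
\text{L-cysteine} + \text{E-SH} &\longrightarrow \text{L-alanine} + \text{E-S-SH} \\
\text{E-S-SH} + \text{A-SH} &\longrightarrow \text{E-SH} + \text{A-S-SH}
\end{align*}
```

The first line shows the conversion of cysteine to alanine mentioned for IscS. The second shows the persulphide sulphur being handed on to an acceptor, consistent with the role of IscS and SufS as S donors.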
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins []. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen []. This entry represents SufB and SufD proteins, which are homologous and form part of the SufBCD complex in the SUF system [
]. SufB accepts sulfur transferred from SufE [], whereas SufD may play a role in iron acquisition []. This domain, found in the N-terminal part of the SufB and SufD proteins, has a right-handed parallel β-helix structure [
].
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. Ribosome-binding factor A [
] (gene rbfA) is a bacterial protein that associates with free 30S ribosomal subunits. It does not associate with 30S subunits that are part of 70S ribosomes or polysomes. It is essential for efficient processing of 16S rRNA. Ribosome-binding factor A is a protein of 13 to 15 kDa that is found in most bacteria. A putative chloroplastic form seems to exist in plants. This entry represents a conserved site located in the second half of the protein.
This superfamily includes ribosomal proteins with CxxC motifs, including L28 and L31 from the 50S subunit. In early ribosomal structures, L28p had been misinterpreted as L31p. In the ribosomal protein L28p family, there are sequences containing two CxxC pairs. Threading these sequences into this fold brings the four cysteines into a position similar to that of the zinc-binding site of glucocorticoid receptor-like zinc fingers. The ribosomal protein L31p family also has members with two CxxC pairs; however, these do not form a putative zinc-binding site in this fold.
A key aspect of eukaryotic intracellular trafficking is the sorting of cell-surface proteins into multi-vesicular endosomes or bodies (MVBs), which eventually fuse with the lysosome, where they are degraded by lipases and peptidases. This is the primary mechanism for down-regulation of signaling via transmembrane receptors and removal of misfolded or defective membrane proteins. This process is also utilised by several viruses (e.g. HIV-1) to facilitate budding of their virions from the cell membrane. Studies in animals and fungi have shown that it depends on an intricate series of interactions, which is initiated via ubiquitination (typically one or more mono-ubiquitinations) of the cytoplasmic tails of membrane proteins by specific E3 ligases. Ubiquitinated membrane proteins are then captured into endosomes by the ESCRT system and prevented from being recycled back to the plasma membrane via the retrograde trafficking system. The ESCRT system also folds the endosomal membranes into invaginations that are concentrated in these ubiquitinated targets and catalyzes their abscission into intra-luminal vesicles inside the endosome. This largely seals the fate of these membrane proteins as targets for lysosomal degradation. The ESCRT system comprises 4 major protein complexes, ESCRT-0 to ESCRT-III, which are successively involved in the above-described steps [
]. ESCRT-I contains three subunits that are conserved between yeast and animals, namely the inactive E2-ligase protein TSG101/VPS23, VPS28 and VPS37. Additionally, both yeast and metazoan ESCRT-I contain a fourth subunit termed MVB12 (multivesicular body sorting factor of 12 kDa); however, the MVB12 subunits from the two lineages do not show significant sequence similarity. The metazoan MVB12 proteins contain two distinct conserved domains that occur independently in various proteins. The C-terminal region of MVB12, which is shared with ubiquitin-associated protein 1 (UBAP1), forms the UBAP1-MVB12 associated (UMA) domain. Human UBAP1 has been implicated in nasopharyngeal carcinoma risk and frontotemporal lobar degeneration. The UMA domain is also found in several other poorly characterised proteins, including at least one orthologous group of proteins conserved in vertebrates, prototyped by the human protein LOC390595, and another group conserved across Metazoa, typified by human tcag7.903. The UMA domain found in MVB12 and UBAP1 defines a novel adaptor that might recruit diverse targets to ESCRT-I. The different UMA proteins might function as alternative MVB12-like subunits that recruit different targets to the ESCRT-I complex via their specific interaction modules (such as MABP or UBA domains, or lineage-specific extensions) [
]. This entry represents the UMA domain, which contains a conserved proline followed by a hydrophobic residue in the N terminus and a nearly absolutely conserved glutamate at the C terminus. It is predicted to adopt an alpha+beta fold [
].
This entry represents a motif that corresponds to residues 19 to 37 of mammalian ACBP and contains a conserved site. Acyl-CoA-binding protein (ACBP) is a small (10 kDa) protein that binds medium- and long-chain acyl-CoA esters with very high affinity and may function as an intracellular carrier of acyl-CoA esters [
]. ACBP is also known as diazepam binding inhibitor (DBI) or endozepine (EP) because of its ability to displace diazepam from the benzodiazepine (BZD) recognition site located on the GABA type A receptor. It is therefore possible that this protein also acts as a neuropeptide to modulate the action of the GABA receptor []. ACBP is a highly conserved protein of about 90 residues that is found in all four eukaryotic kingdoms, Animalia, Plantae, Fungi and Protista, and in some eubacterial species [
]. Although ACBP occurs as a completely independent protein, intact ACB domains have been identified in a number of large, multifunctional proteins in a variety of eukaryotic species. These include large membrane-associated proteins with N-terminal ACB domains, multifunctional enzymes with both ACB and peroxisomal Δ3,Δ2-enoyl-CoA isomerase domains, and proteins with both an ACB domain and ankyrin repeats [
]. The ACB domain consists of four α-helices arranged in a bowl shape with a highly exposed acyl-CoA-binding site. The ligand is bound through specific interactions with residues on the protein, most notably several conserved positive charges that interact with the phosphate group on the adenosine 3'-phosphate moiety, and the acyl chain is sandwiched between the hydrophobic surfaces of CoA and the protein []. Other proteins containing an ACB domain include:
Endozepine-like peptide (ELP) (gene DBIL5) from mouse [
]. ELP is a testis-specific ACBP homologue that may be involved in the energy metabolism of mature sperm.
MA-DBI, a transmembrane protein of unknown function that has been found in mammals. MA-DBI contains an N-terminal ACB domain.
DRS-1 [
], a human protein of unknown function that contains an N-terminal ACB domain and a C-terminal enoyl-CoA isomerase/hydratase domain.
This globular domain is named fido after the Fic and Doc proteins where it is found. It is approximately 125 to 150 residues long, and is present in proteins from all kingdoms of life [
,
,
,
], including:
Fic (filamentation induced by cAMP) from diverse bacteria. It contains a longer insert in the fido domain.
Doc (death on curing) proteins from phage P1 and several bacteria. All these proteins contain a minimal stand-alone version of the fido domain.
HypE (Huntingtin-associated protein E) from animals. In humans, HypE is thought to interact with Huntingtin, one of the major proteins in the Huntington's disease protein interaction network. Proteins related to HypE are also found in several bacteria and some archaea. HypE proteins contain a longer insert in their fido domain and are typically multidomain proteins.
AnkX, a type IV secretion system effector from Legionella.
VopS, a type III secretion system effector from Vibrio that causes eukaryotic cell cytotoxicity.
IbpA (virulence factor p76) from Haemophilus somnus. It includes an N-terminal haemagglutination activity domain, two fido domains and a peptidase C58 domain.
BepA, an anti-apoptotic bacterial effector protein, which is a type IV secretion system substrate.
The fido domain of Vibrio VopS covalently modifies a threonine residue of Rho GTPases with AMP, inhibiting downstream signaling events in host cells. The AMPylation activity extends to a eukaryotic fido domain in the Drosophila Fic homologue CG9523. AMPylation is a newly discovered posttranslational modification used to stably modify proteins with AMP. This signaling mechanism is predicted to be functionally similar to other posttranslational modifications such as phosphorylation, SUMOylation or acetylation, because the added moiety changes the activity of the modified protein. The covalent attachment of AMP through a phosphodiester bond is predicted to be reversible, and the AMP moiety is bulky enough to provide a docking site for a putative AMP-binding domain [
]. The fido domain contains a central motif, conserved in most sequences (H-x-F-x-[DE]-[AG]-N-[GK]-R), whose histidine contributes to Fic AMPylation activity. The fido domain adopts an α-helical fold arranged as an up-and-down six-helix bundle [,
,
].
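The central fido motif above is written in PROSITE-style pattern notation. As a minimal sketch, assuming the standard PROSITE conventions (x for any residue, square brackets for allowed residues, braces for excluded residues, parentheses for repeat counts), the pattern can be translated into a regular expression and used to locate candidate motif occurrences in a protein sequence. The helper names prosite_to_regex and find_motif are hypothetical, anchors (< and >) are not handled, and a regex hit is only a sequence-level match, not evidence of a functional fido domain.

```python
import re
from typing import List, Tuple

# Fido core motif as quoted in the text (PROSITE-style notation)
FIDO_MOTIF = "H-x-F-x-[DE]-[AG]-N-[GK]-R"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regular expression.

    Supported elements: single residues, 'x' (any residue), [..] (allowed set),
    {..} (forbidden set) and repeat counts such as x(2) or x(2,4).
    Anchors ('<', '>') are not handled in this sketch.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate an optional repeat count, e.g. 'x(2,4)' -> ('x', '2,4')
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a single residue or a [..] set is already valid regex
        if count:
            regex += "{" + count + "}"
        parts.append(regex)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched segment) for every motif hit, overlaps included."""
    # Wrapping the pattern in a lookahead lets finditer report overlapping hits
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex(FIDO_MOTIF))  # H.F.[DE][AG]N[GK]R
    # Made-up sequence for illustration only, not a real fido protein
    print(find_motif("MKAHPFVDGNGRLE", FIDO_MOTIF))  # [(4, 'HPFVDGNGR')]
```

A lookahead is used so that overlapping occurrences are all reported; for this particular motif overlaps are unlikely, but the same helper can then be reused for shorter, more permissive patterns.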
Apoptosis, or programmed cell death (PCD), is a common and evolutionarily conserved property of all metazoans [
]. In many biological processes, apoptosis is required to eliminate supernumerary or dangerous (such as pre-cancerous) cells and to promote normal development. Dysregulation of apoptosis can, therefore, contribute to the development of many major diseases including cancer, autoimmunity and neurodegenerative disorders. In most cases, proteins of the caspase family execute the genetic programme that leads to cell death. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [
]. At least 20 Bcl-2 proteins have been reported in mammals, and several others have been identified in viruses. Bcl-2 family proteins fall roughly into three subtypes, which either promote cell survival (anti-apoptotic) or trigger cell death (pro-apoptotic). All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Bcl-2 subfamily proteins, which contain at least BH1 and BH2, promote cell survival by inhibiting the adapters needed for the activation of caspases. Pro-apoptotic members potentially exert their effects by displacing the adapters from the pro-survival proteins; these proteins belong either to the Bax subfamily, whose members contain BH1-BH3, or to the BH3 subfamily, most of whose members feature only BH3 [
]. Thus, the balance between antagonistic family members is believed to play a role in determining cell fate. Members of the wider Bcl-2 family, which also includes Bcl-x, Bcl-w and Mcl-1, are described by their similarity to Bcl-2 protein, a member of the pro-survival Bcl-2 subfamily []. Full-length Bcl-2 proteins feature all four BH domains, seven α-helices, and a C-terminal hydrophobic motif that targets the protein to the outer mitochondrial membrane, ER and nuclear envelope. This entry represents the N-terminal region of several mammal-specific Bim proteins. The Bim protein is one of the BH3-only proteins, members of the Bcl-2 family that have only one of the Bcl-2 homology regions, BH3.
This entry represents a conserved, unique core sequence shared by a large number of proteins. It is occasionally found in the Archaea (including Methanosarcina barkeri) but is common in bacteria and eukaryotes. Most of these proteins fall into two large classes. One class consists of long proteins in which two classes of repeats are abundant: FG-GAP repeats, and RHS or YD repeats. This class includes secreted bacterial insecticidal toxins and intercellular signalling proteins such as the teneurins in animals. The other class consists of uncharacterised proteins shorter than 400 amino acids, in which this core domain of about 75 amino acids tends to occur in the N-terminal half. Over twenty such proteins are found in Pseudomonas putida alone; little sequence similarity or repeat structure is found among these proteins outside of this domain.
The C protein of bacterial ABC toxin complexes has a conserved RHS-repeat-containing N-terminal region and a variable C-terminal region, which is the main cytotoxic component. The C protein forms a large hollow shell structure with the B protein, encapsulating the divergent C-terminal domain. This structure may allow the toxic load to remain sequestered until a change in pH triggers its release. The RHS repeat-associated core domain forms a short strip of β-sheets that spirals into the shell structure, forming a plug at the C end of the shell. The RHS core domain also functions as a self-cleaving protease, cleaving the C-terminal domain from the rest of the protein [
]. YD repeats are found in many bacterial and eukaryotic proteins, notably in the extracellular domains of teneurin proteins, which are developmental signalling proteins conserved from flies to mammals [
]. It has been suggested that RHS and YD repeats may represent the same conserved structural motif contributing to a similar shell structure that encapsulates the teneurin C-terminal region [].
Amyloid beta A4 precursor protein-binding family B member 1-interacting protein (APBB1IP) consists of a Ras-associated (RA) domain, a PH domain, a family-specific BPS region, and a C-terminal SH2 domain. Grb7, Grb10 and Grb14 are paralogues that are also present in this entry [
]. These adapter proteins bind a variety of receptor tyrosine kinases, including the insulin and insulin-like growth factor-1 (IGF1) receptors. Grb10 and Grb14 are important tissue-specific negative regulators of insulin and IGF1 signaling and may contribute to type 2 (non-insulin-dependent) diabetes in humans. The RA and PH domains function as a single structural unit, which is dimerized via a helical extension of the PH domain. The PH domains here are proposed to bind phosphoinositides non-canonically and are unlikely to bind an activated GTPase [
]. The tandem RA-PH domains are also present in a second adapter-protein family, the MRL proteins: the Caenorhabditis elegans protein MIG-10, the mammalian proteins RIAM and lamellipodin, and the Drosophila melanogaster protein Pico, all of which are Ena/VASP-binding proteins involved in actin-cytoskeleton rearrangement.
PH domains have diverse functions, but in general are involved in targeting proteins to the appropriate cellular location or in the interaction with a binding partner. They share little sequence conservation, but all have a common fold, which is electrostatically polarized. Fewer than 10% of PH domains bind phosphoinositide phosphates (PIPs) with high affinity and specificity. PH domains are distinguished from other PIP-binding domains by their specific high-affinity binding to PIPs with two vicinal phosphate groups: PtdIns(3,4)P2, PtdIns(4,5)P2 or PtdIns(3,4,5)P3, which results in targeting some PH domain proteins to the plasma membrane [
]. A few display strong specificity in lipid binding. Any specificity is usually determined by loop regions or insertions in the N terminus of the domain, which are not conserved across all PH domains. PH domains are found in cellular signaling proteins such as serine/threonine kinases, tyrosine kinases, regulators of G-proteins, endocytic GTPases and adaptors, as well as in cytoskeleton-associated molecules and lipid-associated enzymes.
G protein-coupled receptors are a large family of signalling molecules that respond to a wide variety of extracellular stimuli. The receptors relay the information encoded by the ligand through the activation of heterotrimeric G proteins and intracellular effector molecules. To ensure the appropriate regulation of the signalling cascade, it is vital to properly inactivate the receptor. This inactivation is achieved, in part, by the binding of a soluble protein, arrestin, which uncouples the receptor from the downstream G protein after the receptors are phosphorylated by G protein-coupled receptor kinases. In addition to the inactivation of G protein-coupled receptors, arrestins have also been implicated in the endocytosis of receptors and cross talk with other signalling pathways. Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin [
]. The protein binds calcium, and shows similarity in its C terminus to alpha-transducin and other purine nucleotide-binding proteins. In mammals, arrestin is associated with autoimmune uveitis. Arrestins comprise a family of closely related proteins that includes beta-arrestin-1 and -2, which regulate the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity to activate G(S) proteins; cone photoreceptor C-arrestin (arrestin-X) [
], which could bind to phosphorylated red/green opsins; and Drosophila phosrestins I and II, which undergo light-induced phosphorylation, and probably play a role in photoreceptor transduction [,
,
]. The crystal structure of bovine retinal arrestin comprises two domains of antiparallel β-sheets connected through a hinge region and one short α-helix on the back of the amino-terminal fold [
]. The binding region for phosphorylated light-activated rhodopsin is located at the N-terminal domain, as indicated by the docking of the photoreceptor to the three-dimensional structure of arrestin. The N-terminal domain consists of an immunoglobulin-like β-sandwich structure and is found in arrestin and related proteins. For example, thioredoxin-interacting protein (TXNIP) matches the arrestin N domain [
,
].
Guanylate kinase (GK) [
] catalyzes the ATP-dependent phosphorylation of GMP into GDP. It is essential for recycling GMP and, indirectly, cGMP. In prokaryotes (such as Escherichia coli), lower eukaryotes (such as yeast) and vertebrates, GK is a highly conserved monomeric protein of about 200 amino acids. GK has been shown [
,
,
] to be structurally similar to protein A57R (or SalG2R) from various strains of Vaccinia virus. Proteins containing one or more copies of the DHR domain, an SH3 domain and a C-terminal GK-like domain are collectively termed MAGUKs (membrane-associated guanylate kinase homologues) [
], and include Drosophila lethal(1) discs large-1 tumor suppressor protein (gene dlg1); mammalian tight junction protein ZO-1; a family of mammalian synaptic proteins that seem to interact with the cytoplasmic tail of NMDA receptor subunits (SAP90/PSD-95, CHAPSYN-110/PSD-93, SAP97/DLG1 and SAP102); vertebrate 55 kDa erythrocyte membrane protein (p55); Caenorhabditis elegans protein lin-2; rat protein CASK; and human proteins DLG2 and DLG3. The N-terminal section of GK contains an ATP-binding site (P-loop) that is not conserved in the GK-like domain of these proteins. However, they retain the residues known, in GK, to be involved in GMP binding.
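The guanylate kinase reaction described at the start of this entry is a phosphoryl transfer from ATP to GMP. As a simplified summary showing only the overall stoichiometry (metal-ion cofactor and catalytic mechanism omitted), it can be written as:

```latex
\[
\text{ATP} + \text{GMP} \rightleftharpoons \text{ADP} + \text{GDP}
\]
```

This also makes clear why the P-loop matters: it belongs to the ATP (donor) side of the reaction, whereas the GMP-binding residues retained in the MAGUK GK-like domains belong to the acceptor side.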
This signature pattern covers a highly conserved region that contains two arginines and a tyrosine that are involved in GMP binding.
The polypeptide-transport-associated (POTRA) domain is a module found in members of the FtsQ/DivIB, ShlB, CGI51 (with one repeat), Toc75, YTFM (with three repeats) and D15 (with five repeats) protein families. The POTRA domains are hypothesized to mediate protein-protein interactions, to nucleate β-strand formation in nascent outer membrane proteins (OMPs) and to have chaperone-like activity [,
,
,
,
]. The POTRA domain fold comprises a three-stranded β-sheet overlaid with a pair of antiparallel helices. The order of secondary-structure elements is β-α-α-β-β; the first and second β-strands form the two edges of the sheet, with the β3 strand sandwiched between them. The conserved residues that define the POTRA domains are located primarily in the hydrophobic core or loop regions, suggesting that they are important for the structural integrity of the POTRA domain [,
,
,
]. Some proteins known to contain a POTRA domain are listed below:
Escherichia coli outer membrane protein assembly factor BamA, involved in the assembly and insertion of β-barrel proteins into the outer membrane.
Escherichia coli cell division protein FtsQ.
Bacillus subtilis cell division protein DivIB.
Serratia marcescens hemolysin transporter protein ShlB, a virulence-factor transporter present in the outer membrane.
Arabidopsis thaliana chloroplastic protein TOC75, which mediates the insertion of proteins targeted to the outer membrane of chloroplasts.
Human sorting and assembly machinery component 50 (SAM50), which plays a crucial role in maintaining the structure of mitochondrial cristae and in the proper assembly of the mitochondrial respiratory chain complexes.
This entry represents a family of NAD-protein ADP-ribosyltransferases, mostly from Myoviridae, including ModB from bacteriophage T4 [
,
]. Bacteriophage T4 encodes three ADP-ribosyltransferases: Alt, ModA and ModB. The ADP-ribosylating activity of each is directed to a specific set of host proteins. ModB ADP-ribosylates a number of host proteins, including ribosomal protein S1 [
]. Protein ADP-ribosylation is an important posttranslational modification catalyzed by a group of enzymes known as ADP-ribosyltransferases (ADP-RTs) [
]. ADP-RTs transfer single or multiple ADP-ribose moieties from NAD to a specific amino acid residue within a target protein, resulting in mono-ADP-ribosylation or poly-ADP-ribosylation (PARylation) []. ADP-ribosylation changes the electrostatic potential of a target protein by introducing two phosphate groups and may affect protein-DNA as well as protein-protein interactions []. Protein ADP-ribosylation plays diverse roles in multiple biological processes.
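As a simplified summary of the transfer described above, the mono- and poly-ADP-ribosylation reactions can be written as follows. The equations are generic, not specific to ModB, since the acceptor residue is not stated in this entry; "protein" stands for the unspecified target residue.

```latex
\begin{align*}
\text{NAD}^+ + \text{protein} &\longrightarrow \text{ADP-ribosyl-protein} + \text{nicotinamide} + \text{H}^+ \\
\text{protein-(ADP-ribose)}_n + \text{NAD}^+ &\longrightarrow \text{protein-(ADP-ribose)}_{n+1} + \text{nicotinamide} + \text{H}^+
\end{align*}
```

Each transferred ADP-ribose unit carries two phosphate groups, which is the source of the change in electrostatic potential noted in the text; in PARylation this charge accumulates with every round of the second reaction.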
This entry represents the kinase associated domain 1 (KA1) and the C-terminal domain of the SNF1-like protein kinase Ssp2. Members of the KIN1/PAR-1/MARK kinase subfamily are conserved from yeast to human and share the same domain organisation: an N-terminal kinase domain and a C-terminal kinase associated domain 1 (KA1). Some members of the KIN1/PAR-1/MARK family also contain a UBA domain [
]. Members of this kinase subfamily are involved in various biological processes such as cell polarity, cell cycle control, intracellular signalling, microtubule stability and protein stability []. The function of the KA1 domain is not yet known. Some proteins known to contain this domain are listed below:
Mammalian MAP/microtubule affinity-regulating kinases (MARK1, MARK2 and MARK3). They regulate polarity in neuronal cell models and appear to function redundantly in phosphorylating microtubule (MT)-associated proteins and in regulating MT stability [
].
5'-AMP-activated protein kinase catalytic subunit alpha-1/2 (AMPK subunit alpha-1/2).
SNF1-like protein kinase Ssp2 [
] and protein kinase domain-containing protein ppk9 from Schizosaccharomyces pombe.
Mammalian maternal embryonic leucine zipper kinase (MELK). It phosphorylates ZNF622 and may contribute to its redirection to the nucleus. It may be involved in the inhibition of spliceosome assembly during mitosis.
Caenorhabditis elegans and Drosophila PAR-1 proteins. PAR-1 is required for establishing polarity in embryos, where it is asymmetrically distributed [
].
Fungal Kin1 and Kin2 protein kinases, involved in the regulation of exocytosis. They localise to the cytoplasmic face of the plasma membrane [
].
CBL-interacting protein kinases and CBL-interacting serine/threonine-protein kinases from plants.
Blue (type 1) copper proteins constitute a diverse class of proteins, including small blue proteins and multicopper oxidases. They bind copper and are characterised by an intense electronic absorption band near 600 nm [
,
]. The best-known members of this class are the small blue proteins, which include azurins and plastocyanins. These are monomeric proteins that contain one copper ion per molecule. The plant chloroplastic plastocyanins exchange electrons with cytochrome c6, and the distantly related bacterial azurins exchange electrons with cytochrome c551. This group also includes amicyanin from bacteria such as Methylobacterium extorquens or Paracoccus versutus (Thiobacillus versutus) that can grow on methylamine; auracyanins A and B from Chloroflexus aurantiacus []; blue copper protein from Alcaligenes faecalis; cupredoxin (CPC) from Cucumis sativus (cucumber) peelings []; cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber; halocyanin, a membrane-associated copper-binding protein, from Natronomonas pharaonis (Natronobacterium pharaonis) []; pseudoazurin from Pseudomonas; rusticyanin from Thiobacillus ferrooxidans []; stellacyanin from Rhus vernicifera (Japanese lacquer tree); umecyanin from the roots of Armoracia rusticana (horseradish); and allergen Ra3 from ragweed. This pollen protein is evolutionarily related to the above proteins, but seems to have lost the ability to bind copper. Although there is an appreciable amount of divergence in the sequences of all these proteins, the copper ligand sites are conserved. This entry represents a conserved site that includes two of the ligands: a cysteine and a histidine.
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] [
]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries capable of synthesising all types of [Fe-S] clusters have been identified: the ISC (iron-sulphur cluster), SUF (sulphur assimilation) and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [
].The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA [
]. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [], acting as a scaffold protein in which Fe and S atoms are assembled into [FeS]cluster forms, which can then easily be transferred to apoproteins targets.
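The operon names above use a compact notation in which several genes share one three-letter prefix (iscRSUA stands for iscR, iscS, iscU and iscA). A minimal Python sketch that expands such names, assuming the simple convention of a lowercase three-letter prefix followed by one uppercase letter per gene, with blocks joined by hyphens; the helper name `expand_operon` is hypothetical and real gene nomenclature has exceptions this does not handle.

```python
import re

# Assumed convention: a block such as "iscRSUA" is a lowercase three-letter
# prefix followed by one uppercase letter per gene; blocks are joined by "-".
BLOCK = re.compile(r"([a-z]{3})([A-Z]+)")


def expand_operon(notation: str) -> list[str]:
    """Expand compact operon notation into individual gene names (hypothetical helper)."""
    genes: list[str] = []
    for block in notation.split("-"):
        match = BLOCK.fullmatch(block)
        if match:
            prefix, letters = match.groups()
            # One gene per uppercase letter, e.g. "isc" + "R" -> "iscR".
            genes.extend(prefix + letter for letter in letters)
        else:
            # Blocks without the prefix+letters pattern (e.g. "fdx") are kept as-is.
            genes.append(block)
    return genes


print(expand_operon("iscRSUA-hscBA-fdx-iscX"))
print(len(expand_operon("sufABCDSE")))
```

Expanding sufABCDSE yields six gene names, matching the six SUF proteins described in the text.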
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins []. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen [].
This entry describes IscR, an iron-sulphur binding transcription factor of the ISC iron-sulphur cluster assembly system []. The HTH-type transcriptional regulator IscR (iron-sulphur cluster regulator) regulates the transcription of several operons and genes involved in the biogenesis of Fe-S clusters and Fe-S-containing proteins. It is a transcriptional repressor of the iscRSUA operon, which is involved in the assembly of Fe-S clusters into Fe-S proteins. In its apo form, under conditions of oxidative stress or iron deprivation, it activates the suf operon, a second operon involved in Fe-S cluster assembly. It represses its own transcription [, ] and is induced by oxidative stress and iron starvation [].
Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, ensuring that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer []. Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors [].
All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to those of AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain, being responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis [].
While clathrin mediates endocytic protein transport and transport from the TGN, coatomer (COPI) primarily mediates intra-Golgi transport, as well as the retrograde Golgi-to-ER transport of dilysine-tagged proteins []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. The alpha and beta2 adaptins of AP2 can each be divided into a trunk domain and an appendage domain (or ear domain), separated by a linker region. Clathrin polymerisation is promoted by its binding to the beta2 appendage and hinge domains. The alpha appendage domain interacts with a number of accessory proteins, including eps15, epsin, amphiphysin, AP180, auxilin, numb and Dab2, thereby regulating the translocation of these proteins to the bud site. This entry represents a subdomain of the appendage (ear) domain of alpha- and beta-adaptin from AP clathrin adaptor complexes, and the appendage domain of the gamma subunit of coatomer complexes.
These domains have a three-layer α-β-α arrangement with a bifurcated antiparallel β-sheet [, , ]. Although the appendage domains of AP adaptins and coatomer share a similar fold, they have little sequence identity; they do, however, share similar motif-based cargo recognition and accessory factor recruitment mechanisms.
This entry represents members of the SMP-30/CGR1 family that act as lactonases, such as regucalcin. Regucalcin (RGN) is a gluconolactonase () that converts D-glucono-1,5-lactone to D-gluconate, but also hydrolyses other carbohydrate lactones. This enzyme is required for the penultimate step in vitamin C biosynthesis. Its crystal structure shows that regucalcin has a six-bladed β-propeller fold and binds a single metal ion (either Ca2+ or Zn2+) []. Homologues with similar catalytic activity have been isolated and characterised from bacteria []. There are other bacterial homologues: L-arabinolactonase () from Azospirillum brasilense converts L-arabino-gamma-lactone to L-arabonate, allowing the bacterium to utilise L-arabinose as a sole carbon source []; lactonase drp35 from Staphylococcus species acts as a lactonase on dihydrocoumarin or 2-coumaranone [].
A homologue from the squid Loligo vulgaris can act as a diisopropyl-fluorophosphatase (), but its physiological substrate is unknown [].
Regucalcin, also known as senescence marker protein-30 (SMP30), was discovered in 1978 as a Ca2+-binding protein that does not contain EF-hand motifs, suggesting a novel class of Ca2+-binding protein. It is primarily localised to the liver and kidney cortex of animals. Expression of its mRNA in the liver and renal cortex of rats is stimulated by an increase in cellular Ca2+ levels [, ]. As a regulatory protein of Ca2+, regucalcin has a pivotal role in the control of many cell functions. The protein has a reversible effect on Ca2+-induced activation and inhibition of many enzymes in both liver and renal cortex cells []. It has also been shown to inhibit various protein kinases (including Ca2+/calmodulin-dependent protein kinase [], protein kinase C [] and tyrosine kinase) and protein phosphatases, indicating a regulatory role in signal transduction within the cell. In addition, regucalcin regulates intracellular Ca2+ homeostasis by enhancing Ca2+-pumping activity in the plasma membrane through activation of the pump enzymes []. Moreover, it can inhibit RNA synthesis in the nuclei of normal and regenerating rat livers in vitro [].
Hydropathy profiles indicate hydrophobic domains in both the N- and C-terminal regions of the regucalcin molecule, although the protein also exhibits hydrophilic characteristics. Human and rodent regucalcins share 89% sequence identity; this high degree of conservation between species suggests that the complete structure is required for physiological function. SMP30 sequences also share a high level of similarity with proteins from a wide taxonomic range. These include fly anterior fat body proteins; firefly luciferin-regenerating enzyme; putative calcium-binding transcriptional regulatory proteins from Rhizobium meliloti and Streptomyces coelicolor; gluconolactonase from Brucella melitensis; cell growth protein CGR1 from Candida albicans; and homologues from Thermoplasma acidophilum, Thermoplasma volcanium, Sulfolobus tokodaii, Sulfolobus solfataricus, Bacillus subtilis and Rhizobium loti. As such, a number of lactonases are included in this family.
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] []. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] clusters, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified that are capable of synthesising all types of [Fe-S] clusters: the ISC (iron-sulphur cluster), SUF (sulphur assimilation) and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [, ]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [].
The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal 5'-phosphate (PLP)-dependent protein displaying cysteine desulphurase activity. SufE acts as a sulphur acceptor, receiving S from SufS and donating it to SufA []. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [], acting as a scaffold protein in which Fe and S atoms are assembled into [Fe-S] clusters, which can then easily be transferred to target apoproteins.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins []. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen [].
This entry represents the NifU protein from the NIF system, which is involved in nitrogenase maturation.
Signal recognition particle, SRP54 subunit, M-domain
Type:
Domain
Description:
The signal recognition particle (SRP) is a multimeric ribonucleoprotein complex which, along with its cognate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes [, ]. SRP recognises the signal sequence of the nascent polypeptide on the ribosome. In eukaryotes, this retards elongation of the polypeptide until SRP docks the ribosome-polypeptide complex to the RER membrane via the SR receptor []. Eukaryotic SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300-nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor []. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain, linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane. In archaea, the SRP complex contains 7S RNA like its eukaryotic counterpart, yet includes only two of the six protein subunits found in the eukaryotic complex: SRP19 and SRP54 [].
This entry represents the M domain of the 54 kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M domain, which binds both the 7S RNA and the signal sequence. The extreme C-terminal region is glycine-rich, of low complexity and poorly conserved between species.
These proteins include Escherichia coli and Bacillus subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homologue of FtsY; and bacterial flagellar biosynthesis protein flhF.
Signal recognition particle, SRP54 subunit, M-domain superfamily
Type:
Homologous_superfamily
Description:
The signal recognition particle (SRP) is a multimeric ribonucleoprotein complex which, along with its cognate receptor (SR), is involved in targeting secretory proteins to the rough endoplasmic reticulum (RER) membrane in eukaryotes, or to the plasma membrane in prokaryotes [, ]. SRP recognises the signal sequence of the nascent polypeptide on the ribosome. In eukaryotes, this retards elongation of the polypeptide until SRP docks the ribosome-polypeptide complex to the RER membrane via the SR receptor []. Eukaryotic SRP consists of six polypeptides (SRP9, SRP14, SRP19, SRP54, SRP68 and SRP72) and a single 300-nucleotide 7S RNA molecule. The RNA component catalyses the interaction of SRP with its SR receptor []. In higher eukaryotes, the SRP complex consists of the Alu domain and the S domain, linked by the SRP RNA. The Alu domain consists of a heterodimer of SRP9 and SRP14 bound to the 5' and 3' terminal sequences of SRP RNA. This domain is necessary for retarding the elongation of the nascent polypeptide chain, which gives SRP time to dock the ribosome-polypeptide complex to the RER membrane. In archaea, the SRP complex contains 7S RNA like its eukaryotic counterpart, yet includes only two of the six protein subunits found in the eukaryotic complex: SRP19 and SRP54 [].
This entry represents the M domain superfamily of the 54 kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M domain, which binds both the 7S RNA and the signal sequence. The extreme C-terminal region is glycine-rich, of low complexity and poorly conserved between species.
These proteins include Escherichia coli and Bacillus subtilis ffh protein (P48), which seems to be the prokaryotic counterpart of SRP54; signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures, in conjunction with SRP, the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; bacterial FtsY protein, which is believed to play a similar role to that of the docking protein in eukaryotes; the pilA protein from Neisseria gonorrhoeae, the homologue of FtsY; and bacterial flagellar biosynthesis protein flhF.
This superfamily represents the N-terminal domain of the 50S ribosomal protein L15 and 60S ribosomal protein L27a. L15 binds to the 23S rRNA.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [, ]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [, ].
This entry represents viral capsid proteins from group I dsDNA viruses, including Papovaviridae-like polyomaviruses and papillomaviruses. Virus-encoded capsid proteins play a major role in the life cycles of all viruses. Structures have been determined for the major capsid protein VP1 (viral protein 1) from Murine polyomavirus (strain P16 small-plaque) (MPyV) [] and the major late protein L1 from Human papillomavirus (HPV) []. These capsid proteins share a β-sandwich topology. Characteristic interactions between the domains of this fold allow the formation of 5-fold and pseudo-6-fold assemblies.
Polyomaviruses are dsDNA viruses with no RNA stage in their life cycle. The icosahedral virus capsid is composed of 72 units, each of which is composed of five copies of VP1. The virus attaches to the cell surface by recognition of oligosaccharides terminating in alpha(2,3)-linked sialic acid. The capsid protein VP1 forms a pentamer, and the complete capsid is composed of 72 VP1 pentamers, with a minor capsid protein, VP2 or VP3, inserted into the centre of each pentamer like a hairpin. This structure restricts the exposure of internal proteins during viral entry. Polyomavirus coat assembly is tightly controlled by chaperones: during viral infection, the heat shock chaperone hsc70 binds VP1 and co-localises it in the nucleus, thereby regulating capsid assembly [].
Papillomaviruses are members of the papovavirus superfamily. More than 70 different types of papillomavirus have been discovered in humans, some of which have been shown to cause genital carcinomas and cutaneous warts []. The viruses contain a circular dsDNA genome surrounded by a non-enveloped icosahedral capsid. Two proteins are involved in capsid formation, a major (L1) and a minor (L2) protein, in an approximate ratio of 95:5. L1 forms a pentameric assembly unit of the viral shell in a manner that closely resembles VP1 from polyomaviruses. Intermolecular disulphide bonding holds the L1 capsid proteins together []. L1 capsid proteins can bind via their nuclear localisation signal (NLS) to the karyopherins Kapbeta(2) and Kapbeta(3) and inhibit the Kapbeta(2) and Kapbeta(3) nuclear import pathways during the productive phase of the viral life cycle []. Surface loops on L1 pentamers contain sites of sequence variation between HPV types.
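For the polyomavirus capsid described in this entry, the stated composition fixes the subunit counts. A simple worked check, assuming exactly five VP1 copies and one minor capsid protein (VP2 or VP3) per pentamer, as the text describes:

```latex
\begin{aligned}
N_{\mathrm{VP1}} &= 72\ \text{pentamers} \times 5\ \text{VP1 per pentamer} = 360,\\
N_{\mathrm{VP2/VP3}} &= 72\ \text{pentamers} \times 1\ \text{minor protein per pentamer} = 72.
\end{aligned}
```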
Twin-arginine translocation pathway, signal sequence, bacterial/archaeal
Type:
Conserved_site
Description:
The twin-arginine translocation (Tat) pathway transports folded proteins across energy-transducing membranes []. Homologues of the genes that encode the transport apparatus occur in archaea, bacteria, chloroplasts and plant mitochondria []. In bacteria, the Tat pathway catalyses the export of proteins from the cytoplasm across the inner/cytoplasmic membrane. In chloroplasts, the Tat components are found in the thylakoid membrane and direct the import of proteins from the stroma. The Tat pathway acts separately from the general secretory (Sec) pathway, which transports proteins in an unfolded state [].
It is generally accepted that the primary role of the Tat system is to translocate fully folded proteins across membranes. Examples of proteins that must be exported in their folded conformation are redox proteins that have acquired complex multi-atom cofactors in the bacterial cytoplasm (or the chloroplast stroma or mitochondrial matrix). They include hydrogenases, formate dehydrogenases, nitrate reductases, trimethylamine N-oxide (TMAO) reductases and dimethyl sulphoxide (DMSO) reductases [, ]. The Tat system can also export whole hetero-oligomeric complexes in which some proteins have no Tat signal, as is the case for the DMSO reductase and formate dehydrogenase complexes. In other cases, however, the physiological rationale for targeting a protein to the Tat pathway is less obvious. Indeed, some homologous proteins are targeted to the Tat pathway in some cases and to the Sec apparatus in others; examples include copper nitrite reductases, flavin domains of flavocytochrome c and N-acetylmuramoyl-L-alanine amidases [].
In halophilic archaea such as Halobacterium, almost all secreted proteins appear to be Tat targeted. This has been proposed to be a response to the difficulties these organisms would otherwise face in folding proteins extracellularly at high ionic strength [].
The Tat signal peptide consists of three regions: the positively charged N-terminal region, the hydrophobic region (h-region) and the C-terminal region, which generally ends with a short consensus motif (A-x-A) specifying cleavage by signal peptidase. Sequence analysis revealed that signal peptides capable of targeting proteins to the Tat pathway contain the consensus sequence [ST]-R-R-x-F-L-K. The nearly invariant twin arginine gave rise to the pathway's name. In addition, the h-region of Tat signal peptides is typically less hydrophobic than that of Sec-specific signal peptides [, ].
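The consensus quoted above, [ST]-R-R-x-F-L-K, can be turned into a simple sequence scan. A minimal Python sketch, assuming a strict reading of the consensus (x = any residue) and a plain one-letter amino acid string; the function names are hypothetical, and because real Tat signal peptides often deviate from the full consensus, a strict match is neither necessary nor sufficient evidence of Tat targeting.

```python
import re

# Strict reading of the consensus [ST]-R-R-x-F-L-K, where x is any residue.
TAT_CONSENSUS = re.compile(r"[ST]RR.FLK")


def find_tat_consensus(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched residues) for each strict consensus hit."""
    return [(m.start(), m.group()) for m in TAT_CONSENSUS.finditer(seq.upper())]


def has_twin_arginine(seq: str) -> bool:
    """Looser check: only the nearly invariant R-R pair is required."""
    return "RR" in seq.upper()


# Hypothetical N-terminal fragment, not a real protein sequence.
fragment = "MTDSRRDFLKGAAA"
print(find_tat_consensus(fragment))  # [(3, 'SRRDFLK')]
print(has_twin_arginine(fragment))   # True
```

Keeping a separate twin-arginine check reflects the point that the R-R pair is the nearly invariant part of the signal, while the remaining positions are only a consensus.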
This is a family of glycine cleavage H-proteins, part of the glycine cleavage system (GCS) found in bacteria, archaea and the mitochondria of eukaryotes. GCS is a multienzyme complex consisting of four different components (P-, H-, T- and L-proteins) that catalyses the oxidative cleavage of glycine []. The H-protein shuttles the methylamine group of glycine from the P-protein (glycine dehydrogenase) to the T-protein (aminomethyltransferase) via a lipoyl group attached to a completely conserved lysine residue [].
This entry represents the glycine cleavage system H protein. The genome of Aquifex aeolicus contains one protein belonging to this group and four more related proteins not included here; it seems doubtful that all of these homologues are authentic H proteins. The chlamydial homologue of the H protein is nearly as divergent as the Aquifex outgroup, is not accompanied by P and T proteins, and is not included in this entry.
The PET domain is a ~110 amino acid motif in the N-terminal part of LIM domain proteins. The domain was described in Drosophila proteins involved in cell differentiation and is named after Prickle, Espinas and Testin. PET domain proteins typically contain three zinc-binding LIM domains (see , ) and are found among metazoans. The PET domain has been suggested to play a role in protein-protein interactions with proteins involved in planar polarity signalling or organisation of the cytoskeleton []. Some proteins known to contain a PET domain:
Mammalian testin protein (), which may function as a tumour suppressor.
Mammalian LIM domain only protein 6 (LMO6/Prickle3, ).
Fruit fly prickle () and espinas () proteins, encoded by the tissue polarity gene prickle (pk), involved in the control of the orientation of bristles and hairs.
Mammalian prickle-like proteins 1 () and 2 ().
This entry includes Hobbit protein from Drosophila melanogaster and its homologues, such as Bridge-like lipid transfer protein family member 2 from human (BLTP2/KIAA0100) and FMP27 from yeast, referred to as the Hob proteins [, ]. These conserved lipid-binding proteins localise to endoplasmic reticulum-plasma membrane (ER-PM) contact sites. They have a long hydrophobic groove and can mediate bulk transport of lipids between organelles. This entry also includes maize protein APT1 and its Arabidopsis homologues SABRE and KIP []. The Hob family belongs to the repeating β-groove (RBG) superfamily together with VPS13, ATG2, SHIP164 and Csf1/BLTP1, which are all conserved lipid transfer proteins containing long hydrophobic grooves []. They share a common structure made up of multiple repeating modules, each consisting of five β-strands followed by a loop. FMP27 (Found in mitochondrial proteome protein 27) is a tube-forming lipid transport protein that binds phosphatidylinositols and affects the distribution of phosphatidylinositol-4,5-bisphosphate (PtdIns-4,5-P2) [, ]. APT1 (Aberrant pollen transmission 1) is required for pollen tube growth; it is a Golgi-localised protein and appears to regulate vesicular trafficking []. SABRE (Hypersensitive to Pi starvation 4) and KIP (Kinky pollen) are APT1 homologues involved in the elongation of root cortex cells and pollen tubes, respectively.
This entry represents the C-terminal region of the Hobbit family of proteins.
The Drosophila Tudor protein, the founder of the Tudor domain family, is encoded by a 'posterior group' gene which, when mutated, disrupts normal abdominal segmentation and pole cell formation. Another Drosophila gene, homeless, is required for RNA localisation during oogenesis. The Tudor protein contains multiple repeats of a domain that is also found in homeless [, ].
The Tudor domain is found in many proteins that colocalise with ribonucleoprotein or single-strand DNA-associated complexes in the nucleus, in the mitochondrial membrane, or at kinetochores. At first it was not clear whether the domain binds directly to RNA and ssDNA or controls interactions with the nucleoprotein complexes, but it is now known to recognise and bind methylated arginine or lysine residues, playing important roles in epigenetics, gene expression and the regulation of various small RNAs [, , ]. The Tudor-containing protein homeless also contains a zinc finger typical of RNA-binding proteins [].
This domain has been implicated in protein-protein interactions in which methylated protein substrates bind to it. One example is the Tudor domain of Survival of Motor Neuron (SMN), linked to spinal muscular atrophy, which binds to symmetrically dimethylated arginines of arginine-glycine (RG) rich sequences found in the C-terminal tails of Sm proteins. The solution structure of the Tudor domain of human SMN revealed a strongly bent antiparallel β-sheet of five strands forming a barrel-like fold. The structure exhibits a conserved negatively charged surface that interacts with the C-terminal Arg- and Gly-rich tails of the spliceosomal Sm D1 and D3 proteins [, ].
Sequence analysis of the products of the GRAS (GAI, RGA, SCR) gene family indicates that they share a variable N terminus and a highly conserved C terminus that contains five recognizable motifs []. Proteins in the GRAS family are major players in gibberellin (GA) signaling, which regulates various aspects of plant growth and development []. Mutation of the SCARECROW (SCR) gene results in a radial patterning defect in the root: the loss of a ground tissue layer. The PAT1 protein is involved in phytochrome A signal transduction [].
A sequence, structure and evolutionary analysis showed that the GRAS family emerged in bacteria and belongs to the Rossmann-fold, AdoMet (SAM)-dependent methyltransferase superfamily []. All bacterial, and a subset of plant, GRAS proteins are predicted to be active and to function as small-molecule methylases. Several plant GRAS proteins lack one or more AdoMet (SAM)-binding residues while preserving their substrate-binding residues. Although GRAS proteins are thought to function as transcription factors, this analysis suggests that they might instead modify or bind small molecules [].
Some proteins known to belong to the GRAS family are listed below:
Arabidopsis thaliana SCARECROW (SCR) protein. It regulates asymmetric cell divisions of cortex/endodermal initial cells during root development.
Arabidopsis thaliana SCARECROW-LIKE (SCL) protein.
Arabidopsis thaliana GIBBERELLIC ACID INSENSITIVE (GAI) and REPRESSOR OF GA1 (RGA), two closely related proteins involved in gibberellin signaling.
Arabidopsis thaliana SHORT ROOT (SHR) protein. It is necessary for cell division and endodermis specification.
Arabidopsis thaliana PAT1 protein. It inhibits light signaling via phytochrome A (phyA).
LATERAL SUPPRESSOR (LS), a protein from tomato that controls the formation of lateral branches during vegetative development.
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents ZZ-type zinc finger domains, named because of their ability to bind two zinc ions [
]. These domains contain 4-6 Cys residues that participate in zinc binding (plus additional Ser/His residues), including a Cys-X2-Cys motif found in other zinc finger domains. These zinc fingers are thought to be involved in protein-protein interactions. The structure of the ZZ domain shows that it belongs to the family of cross-brace zinc finger motifs that include the PHD, RING, and FYVE domains []. ZZ-type zinc finger domains are found in:
Transcription factors P300 and CBP.
Plant proteins involved in light responses, such as Hrb1.
E3 ubiquitin ligases MEX and MIB2 (
).
Dystrophin and its homologues.
Single copies of the ZZ zinc finger occur in the transcriptional adaptor/coactivator proteins P300, cAMP response element-binding protein (CREB)-binding protein (CBP) and ADA2. CBP provides several binding sites for transcriptional coactivators. The site at which the tumour suppressor protein p53 and the oncoprotein E1A interact with CBP/P300 is a Cys-rich region that incorporates two zinc-binding motifs: ZZ-type and TAZ2-type. The ZZ-type zinc finger of CBP contains two twisted anti-parallel β-sheets and a short α-helix, and binds two zinc ions [
]. One zinc ion is coordinated by four cysteine residues via two Cys-X2-Cys motifs, and the second zinc ion via a Cys-X-Cys motif and a His-X-His motif. The first zinc cluster is strictly conserved, whereas the second zinc cluster displays variability in the position of the two His residues. In Arabidopsis thaliana (Mouse-ear cress), the hypersensitive to red and blue 1 (Hrb1) protein, which regulates both red and blue light responses, contains a ZZ-type zinc finger domain [
]. ZZ-type zinc finger domains have also been identified in the testis-specific E3 ubiquitin ligase MEX, which promotes death receptor-induced apoptosis [
]. MEX has four putative zinc finger domains: one ZZ-type, one SWIM-type and two RING-type. The region containing the ZZ-type and RING-type zinc fingers is required for interaction with UbcH5a and for MEX self-association, whereas the SWIM domain is critical for MEX ubiquitination. In addition, the Cys-rich domains of dystrophin, utrophin and an 87 kDa post-synaptic protein contain a ZZ-type zinc finger with high sequence identity to the P300/CBP ZZ-type zinc fingers. In dystrophin and utrophin, the ZZ-type zinc finger lies between a WW domain (flanked by an EF hand) and the C-terminal coiled-coil domain. Dystrophin is thought to act as a link between the actin cytoskeleton and the extracellular matrix, and perturbations of the dystrophin-associated complex, for example of the interaction between dystrophin and the transmembrane glycoprotein beta-dystroglycan, may lead to muscular dystrophy. Dystrophin and its autosomal homologue utrophin interact with beta-dystroglycan via their C-terminal regions, which comprise a WW domain, an EF hand domain and a ZZ-type zinc finger domain [
]. The WW domain is the primary site of interaction between dystrophin or utrophin and dystroglycan, while the EF hand and ZZ-type zinc finger domains stabilise and strengthen this interaction.
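The Cys-X2-Cys, Cys-X-Cys and His-X-His spacings described above for the CBP ZZ domain can be located in a sequence with a simple pattern search. The Python sketch below is illustrative only. The function name `find_spaced_pairs` and the toy sequences are hypothetical. A real ZZ-domain assignment relies on profile or HMM matching rather than on isolated motifs.

```python
import re

def find_spaced_pairs(seq: str, residue: str, spacing: int) -> list[int]:
    """Return 0-based start positions of residue-X(spacing)-residue motifs.

    Hypothetical helper for illustration. The pattern is wrapped in a
    zero-width lookahead so that overlapping hits (for example
    C-X2-C-X2-C, where the middle Cys is shared) are all reported.
    """
    res = re.escape(residue.upper())
    # For residue "C" and spacing 2 this builds the regex (?=(C.{2}C)).
    pattern = re.compile(rf"(?=({res}.{{{spacing}}}{res}))")
    return [m.start() for m in pattern.finditer(seq.upper())]
```

For the made-up sequence "MACAKCGHCPRC", `find_spaced_pairs("MACAKCGHCPRC", "C", 2)` returns [2, 5, 8]. Each consecutive pair of cysteines is reported even though the hits share residues. The same helper with `spacing=1` covers the Cys-X-Cys and His-X-His motifs of the second zinc cluster.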
A number of prokaryotic and eukaryotic enzymes, which all probably act via ATP-dependent covalent binding of AMP to their substrate, have been shown to share a region of sequence similarity [
,
,
,
]. These enzymes are:
Insect luciferase (luciferin 4-monooxygenase) (
). Luciferase produces light by catalysing the oxidation of luciferin in the presence of ATP and molecular oxygen.
Alpha-aminoadipate reductase (
) from yeast (gene LYS2). This enzyme catalyses the activation of alpha-aminoadipate by ATP-dependent adenylation and the reduction of activated alpha-aminoadipate by NADPH.
Acetate--CoA ligase (
) (acetyl-CoA synthetase), an enzyme that catalyses the formation of acetyl-CoA from acetate and CoA.
Long-chain-fatty-acid--CoA ligase (
), an enzyme that activates long-chain fatty acids for both the synthesis of cellular lipids and their degradation via beta-oxidation.
4-coumarate--CoA ligase (
) (4CL), a plant enzyme that catalyses the formation of 4-coumaroyl-CoA from 4-coumarate and coenzyme A; this is the branch-point reaction between general phenylpropanoid metabolism and the pathways leading to various specific end products.
O-succinylbenzoic acid--CoA ligase (
) (OSB-CoA synthetase) (gene menE) [
], a bacterial enzyme involved in the biosynthesis of menaquinone (vitamin K2).
4-Chlorobenzoate--CoA ligase (
) (4-CBA--CoA ligase) [
], a Pseudomonas enzyme involved in the degradation of 4-CBA.
Indoleacetate--lysine ligase (
) (IAA-lysine synthetase) [
], an enzyme from Pseudomonas syringae that converts indoleacetate to IAA-lysine.
Bile acid-CoA ligase (gene baiB) from Eubacterium sp. (strain VPI 12708) [
]. This enzyme catalyses the ATP-dependent formation of a variety of C-24 bile acid-CoA thioesters.
Crotonobetaine/carnitine-CoA ligase (
) from Escherichia coli (gene caiC).
L-(alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase (ACV synthetase) from various fungi (gene acvA or pcbAB). This enzyme catalyzes the first step in the biosynthesis of penicillin and cephalosporin, the formation of ACV from the constituent amino acids. The amino acids seem to be activated by adenylation. It is a protein of around 3700 amino acids that contains three related domains of about 1000 amino acids.
Gramicidin S synthetase I (gene grsA) from Brevibacillus brevis (Bacillus brevis). This enzyme catalyzes the first step in the biosynthesis of the cyclic antibiotic gramicidin S, the ATP-dependent racemization of phenylalanine (
).
Tyrocidine synthetase I (gene tycA) from B. brevis. The reaction carried out by TycA is identical to that catalyzed by GrsA.
Gramicidin S synthetase II (gene grsB) from B. brevis. This enzyme is a multifunctional protein that activates and polymerises proline, valine, ornithine and leucine. GrsB consists of four related domains.
Enterobactin synthetase components E (gene entE) and F (gene entF) from E. coli. These two enzymes are involved in the ATP-dependent activation of 2,3-dihydroxybenzoate and serine, respectively, during enterobactin (enterochelin) biosynthesis.
Cyclic peptide antibiotic surfactin synthase subunits 1, 2 and 3 from Bacillus subtilis. Subunits 1 and 2 contain three related domains, while subunit 3 contains only a single domain.
HC-toxin synthetase (gene HTS1) from Cochliobolus carbonum (Bipolaris zeicola). This enzyme activates the four amino acids (Pro, L-Ala, D-Ala and 2-amino-9,10-epoxy-8-oxodecanoic acid) that make up HC-toxin, a cyclic tetrapeptide. HTS1 consists of four related domains.
There are also some proteins whose exact function is not yet known but which are very probably also AMP-binding enzymes. These proteins are:
ORA (octapeptide-repeat antigen), a Plasmodium falciparum protein whose function is not known but which shows a high degree of similarity with the above proteins.
AngR, a Vibrio anguillarum (Listonella anguillarum) protein. AngR is thought to be a transcriptional activator that modulates the anguibactin (an iron-binding siderophore) biosynthesis gene cluster operon. However, we believe that AngR is not a DNA-binding protein but rather an enzyme involved in the biosynthesis of anguibactin. This conclusion is based on three facts: the presence of the AMP-binding domain; the size of AngR (1048 residues), which is far bigger than any bacterial transcriptional protein; and the presence of a probable S-acyl thioesterase immediately downstream of the gene for AngR.
A hypothetical protein in the MmsB 3' region in Pseudomonas aeruginosa.
E. coli hypothetical protein YdiD.
Yeast hypothetical protein YBR041w.
Yeast hypothetical protein YBR222c.
Yeast hypothetical protein YER147c.
All these proteins contain a highly conserved region, very rich in glycine, serine and threonine, which is followed by a conserved lysine. A parallel can be drawn between this type of domain and the G-x(4)-G-K-[ST] ATP-/GTP-binding 'P-loop' domain or the protein kinase G-x-G-x(2)-[SG]-x(10,20)-K ATP-binding domain (see
and
).
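The two patterns quoted above use PROSITE notation:

- elements are separated by hyphens;
- x stands for any residue;
- [ST] lists the allowed residues and {P} a forbidden one;
- x(4) or x(10,20) gives a repeat count or range.

The minimal Python sketch below translates this subset of the notation into a regular expression. The helper name `prosite_to_regex` is hypothetical. Anchors (< and >) and other parts of the full PROSITE syntax are deliberately not handled.

```python
import re

# One PROSITE element: x, a single residue, [allowed] or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Hypothetical helper for illustration; anchors and other advanced
    PROSITE features are not supported and raise ValueError.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(core)
    return "".join(parts)
```

With this sketch, the P-loop pattern G-x(4)-G-K-[ST] becomes `G.{4}GK[ST]`. `re.search` with that regex finds a match starting at position 1 in the made-up sequence "MGAGGSGKSL". Pattern matches of this kind only flag candidate sites; they do not by themselves establish ATP binding.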
This entry represents the delta subunit of the coatomer complex, which is involved in the regulation of intracellular protein trafficking between the endoplasmic reticulum and the Golgi complex [
]. Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer [
]. While clathrin mediates endocytic protein transport and transport from the TGN to endosomes, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi-to-ER transport of dilysine-tagged proteins []. For example, the coatomer COPI (coat protein complex I) is responsible for the reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits.
This entry represents the epsilon subunit of the coatomer complex, which is involved in the regulation of intracellular protein trafficking between the endoplasmic reticulum and the Golgi complex [
]. Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer [
]. While clathrin mediates endocytic protein transport and transport from the TGN to endosomes, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi-to-ER transport of dilysine-tagged proteins []. For example, the coatomer COPI (coat protein complex I) is responsible for the reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits.
The papillomavirus E6 oncoproteins are small zinc-binding proteins that share a conserved zinc-binding CXXC motif and have no identified intrinsic enzymatic activity. E6 proteins are thought to act as adapter proteins, thereby altering the function of E6-associated cellular proteins. This model for E6 function is best supported by observations of human papillomavirus type 16 (HPV-16) E6 (16E6), which can alter the metabolism of the p53 tumor suppressor through association with a cellular E3 ubiquitin ligase called E6AP. HPV-16 E6 interacts with an 18-amino-acid sequence in E6AP, and, in an as yet ill-defined fashion, the E6AP-16E6 complex binds to p53, inducing the ubiquitin-dependent degradation of the trimolecular complex. 16E6 apparently functions as an adapter protein in the complex with p53, since E6AP does not interact with p53 in the absence of E6 and since the degradation of p53 requires both E6 and E6AP [
,
,
]. Despite the structural similarity of the E6 oncoproteins, studies have indicated surprising biochemical diversity among E6 oncoproteins of different papillomavirus types. E6 proteins from the cancer-associated human papillomaviruses (HPVs) form a complex with a cellular protein termed E6-AP and, together with E6-AP, bind to the p53 tumor suppressor protein, thereby targeting p53 for degradation through ubiquitin-mediated proteolysis. E6 proteins from the non-cancer-associated HPV types do not bind E6-AP or degrade p53. Bovine papillomavirus E6 (BE6) binds E6-AP but fails either to complex with p53 or to degrade associated proteins, implying that BE6 might transform cells through a mechanism different from that of the HPVs. In addition to targeting p53, the E6 proteins of both cancer-associated HPVs and BPV-1 have been shown to associate with a cellular calcium-binding protein localized to the endoplasmic reticulum [
,
].
The hemerythrin family is composed of hemerythrin proteins found in invertebrates, and a broader collection of bacterial and archaeal homologues. In invertebrates, hemerythrin is an oxygen-binding protein found in the vascular system and coelomic fluid, or in muscle (myohemerythrin) [
]. Many of the homologous proteins found in prokaryotes are multi-domain proteins with signal-transducing domains such as the GGDEF diguanylate cyclase domain () and methyl-accepting chemotaxis protein (MCP) signalling domain (
). Most hemerythrins are oxygen carriers with bound non-haem iron, but at least one example is a cadmium-binding protein, apparently with a role in sequestering toxic metals rather than in binding oxygen. The prokaryote with the most instances of this domain is Magnetococcus sp. MC-1, a magnetotactic bacterium.
Hemerythrins and myohemerythrins [
,
] are small proteins of about 110 to 129 amino acid residues that bind two iron atoms. They are left-twisted 4-α-helical bundles, which provide a hydrophobic pocket where dioxygen binds as a peroxo species, interacting with adjacent aliphatic side chains via van der Waals forces []. In both hemerythrins and myohemerythrins, the active centre is a binuclear iron complex, bound directly to the protein via seven amino acid side chains []: five His, one Glu and one Asp []. Ovohemerythrin [], a yolk protein from the leech Theromyzon tessulatum, seems to belong to this family of proteins; it may play a role in the detoxification of free iron after a blood meal []. This entry represents the iron-binding site found in hemerythrin and in related proteins. This site is located in the central region of these proteins and contains four of the iron ligands: three histidines and a glutamate or glutamine.
The viral polyprotein of parechoviruses contains: coat protein VP0 (P1AB); coat protein VP3 (P1C); coat protein VP1 (P1D); picornain 2A (
, core protein P2A); core protein P2B; core protein P2C; core protein P3A; genome-linked protein VPg (P3B); picornain 3C (
, MEROPS peptidase subfamily 3CF: parechovirus picornain 3C (P3C)) [
]. This entry consists of the genome-linked protein VPg (P3B).
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA. The new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. Ribosome-binding factor A [
] (gene rbfA) is a bacterial protein that associates with free 30S ribosomal subunits. It does not associate with 30S subunits that are part of 70S ribosomes or polysomes. It is essential for efficient processing of 16S rRNA. Ribosome-binding factor A is a protein of 13 to 15 kDa that is found in most bacteria. A putative chloroplastic form seems to exist in plants. The structural domain of RbfA has an α-β fold containing three helices and three β-strands: alpha1-beta1-beta2-alpha2-alpha3-beta3. The structure has type-II KH-domain fold topology, related to conserved KH sequence family proteins whose β-α-α-β subunits are characterised by a helix-turn-helix motif with the sequence signature GxxG at the turn. In RbfA, this β-α-α-β subunit is characterised by a helix-kink-helix motif in which the GxxG sequence is replaced by a conserved AxG sequence [
].
TAZ (Transcription Adaptor putative Zinc finger) domains are zinc-containing domains found in the homologous transcriptional co-activators CREB-binding protein (CBP) and P300. CBP and P300 are histone acetyltransferases (
) that catalyse the reversible acetylation of all four histones in nucleosomes, acting to regulate transcription via chromatin remodelling. These large nuclear proteins interact with numerous transcription factors and viral oncoproteins, including p53 tumour suppressor protein, E1A oncoprotein, MyoD, and GATA-1, and are involved in cell growth, differentiation and apoptosis [
]. Both CBP and P300 have two copies of the TAZ domain, one in the N-terminal region and the other in the C-terminal region. The TAZ1 domain of CBP and P300 forms a complex with CITED2 (CBP/P300-interacting transactivator with ED-rich tail). This complex inhibits the activity of hypoxia-inducible factor (HIF-1alpha), thereby attenuating the cellular response to low tissue oxygen concentration []. Adaptation to hypoxia is mediated by transactivation of hypoxia-responsive genes by hypoxia-inducible factor-1 (HIF-1) in complex with the CBP and p300 transcriptional coactivators []. Proteins containing this domain also include a group of land-plant-specific proteins known as the BTB/POZ and TAZ domain-containing (BT) proteins. Reports of their interaction with CUL3 are contradictory. They are multifunctional scaffold proteins essential for male and female gametophyte development [
]. The TAZ domain adopts an all-alpha fold with zinc-binding sites in the loops connecting the helices. The TAZ1 domain in P300 and the TAZ2 (CH3) domain in CBP have each been shown to have four amphipathic helices, organised by three zinc-binding clusters with HCCC-type coordination [
,
,
]. Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions, which make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead, they bind other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), but they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] [
]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified that are capable of synthesising all types of [Fe-S] clusters: the ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds that accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly []. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal 5'-phosphate (PLP)-dependent protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA [
]. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [] and acts as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of nitrogenase metalloclusters in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins [
]. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen []. This entry represents proteins that belong to the Rrf2 family of transcriptional regulators and are typically encoded by the first gene of the SUF operon. They are found only in a subset of the genomes that encode the SUF system, including the genus Xanthomonas. The conserved location suggests an autoregulatory role.
SUF system FeS cluster assembly, SufBD superfamily
Type:
Homologous_superfamily
Description:
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] [
]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified that are capable of synthesising all types of [Fe-S] clusters: the ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds that accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [
]. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal 5'-phosphate (PLP)-dependent protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA [
]. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [] and acts as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of nitrogenase metalloclusters in Azotobacter vinelandii and other organisms, as well as for the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins [
]. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen []. This entry represents SufB and SufD, two homologous proteins that form part of the SufBCD complex in the SUF system [
]. SufB accepts sulfur transferred from SufE [], whereas SufD may play a role in iron acquisition [].
The mitochondrial protein translocase (MPT) family, which brings nuclear-encoded preproteins into mitochondria, is very complex, with 19 currently identified protein constituents. These proteins include several chaperone proteins, four proteins of the outer membrane translocase (Tom) import receptor, five proteins of the Tom channel complex, five proteins of the inner membrane translocase (Tim) and three "motor" proteins. This family is specific for the Tom70 proteins.
WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD40 repeats usually assume a 7-8 bladed β-propeller fold, but proteins have been found with 4 to 16 repeated units, which also form a circularised β-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. Repeated WD40 motifs act as a site for protein-protein or protein-DNA interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins [
]. The specificity of the proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (the beta subunit is a β-propeller), TAFII transcription factor, and E3 ubiquitin ligase [,
]. In Arabidopsis spp., several WD40-containing proteins act as key regulators of plant-specific developmental events.
WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD40 repeats usually assume a 7-8 bladed β-propeller fold, but proteins have been found with 4 to 16 repeated units, which also form a circularised β-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. Repeated WD40 motifs act as a site for protein-protein or protein-DNA interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins [
]. The specificity of the proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (the beta subunit is a β-propeller), TAFII transcription factor, and E3 ubiquitin ligase [,
]. In Arabidopsis spp., several WD40-containing proteins act as key regulators of plant-specific developmental events.This entry represents a conserved site found in the WD40 repeat.
The EXS domain is named after ERD1/XPR1/SYG1. Proteins containing this domain include the C-terminal region of the SYG1 G-protein-associated signal transduction protein from Saccharomyces cerevisiae, and sequences that are thought to be Murine leukemia virus (MLV) receptors (XPR1). The N-terminal regions of these proteins often contain an SPX domain () []. While the N-terminal region is thought to be involved in signal transduction, the role of the C-terminal region is not known. This region of similarity contains several predicted transmembrane helices. This family also includes the S. cerevisiae ERD1 (ERD: ER retention defective) proteins. ERD1 proteins are involved in the localization of endogenous endoplasmic reticulum (ER) proteins. Erd1 null mutants secrete such proteins even though they possess the C-terminal HDEL ER lumen localization signal. Null mutants also exhibit defects in the Golgi-dependent processing of several glycoproteins, which led to the suggestion that the sorting of luminal ER proteins actually occurs in the Golgi, with subsequent return of these proteins to the ER via 'salvage' vesicles [].
Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes []. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved by the viral protease into major structural proteins, yielding the matrix (MA), capsid (CA) and nucleocapsid (NC) proteins and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, predominantly consisting of four closely packed α-helices interconnected by loops, their primary sequences can be very different []. This entry represents matrix proteins from immunodeficiency lentiviruses, such as Human immunodeficiency virus (HIV) and Simian immunodeficiency virus (SIV-cpz) []. The structure of the HIV protein consists of five α-helices, a short 3₁₀ helix and a three-stranded mixed β-sheet [].
Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport []. Clathrin coats contain both clathrin (which acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors [, ]. AP complexes are found in coated vesicles and clathrin-coated pits. They connect cargo proteins and lipids to clathrin at vesicle budding sites, and also bind accessory proteins that regulate coat assembly and disassembly (such as AP180, epsins and auxilin). Mammals have several distinct AP complexes. AP1 is responsible for the transport of lysosomal hydrolases between the TGN and endosomes []. AP2 associates with the plasma membrane and is responsible for endocytosis []. AP3 is responsible for protein trafficking to lysosomes and other related organelles []. AP4 is less well characterised. AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). For example, in AP1 these subunits are gamma-1-adaptin, beta-1-adaptin, mu-1 and sigma-1, while in AP2 they are alpha-adaptin, beta-2-adaptin, mu-2 and sigma-2. Each subunit has a specific function. Adaptins recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal ear (appendage) domains. Mu recognises tyrosine-based sorting signals within the cytoplasmic domains of transmembrane cargo proteins []. One function of clathrin- and AP2-mediated endocytosis is to regulate the number of GABA(A) receptors available at the cell surface []. The AP2 subunit alpha-adaptin can be divided into a trunk domain and a C-terminal appendage (ear) domain, separated by a linker region. The C-terminal appendage domain regulates translocation of endocytic accessory proteins to the bud site []. This entry represents a subdomain of the appendage (ear) domain of alpha-adaptin from AP clathrin adaptor complexes. This domain has a three-layer α-β-α arrangement with a bifurcated antiparallel β-sheet [].
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP, while the deacylated tRNA is released from the ribosome through one or more exit sites [, ]. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: small (S1 to S31) and large (L1 to L44). They usually decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about one-third of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way, proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have functions 'outside' the ribosome [, ]. Ribosome-binding factor A [] (gene rbfA) is a bacterial protein that associates with free 30S ribosomal subunits. It does not associate with 30S subunits that are part of 70S ribosomes or polysomes. It is essential for efficient processing of 16S rRNA. Ribosome-binding factor A is a 13 to 15 kDa protein found in most bacteria. A putative chloroplastic form seems to exist in plants.
Twin-arginine translocation pathway, signal sequence
Type:
Conserved_site
Description:
The twin-arginine translocation (Tat) pathway transports folded proteins across energy-transducing membranes []. Homologues of the genes that encode the transport apparatus occur in archaea, bacteria, chloroplasts and plant mitochondria []. In bacteria, the Tat pathway catalyses the export of proteins from the cytoplasm across the inner (cytoplasmic) membrane. In chloroplasts, the Tat components are found in the thylakoid membrane and direct the import of proteins from the stroma. The Tat pathway acts separately from the general secretory (Sec) pathway, which transports proteins in an unfolded state []. It is generally accepted that the primary role of the Tat system is to translocate fully folded proteins across membranes. Examples of proteins that must be exported in their folded 3D conformation are redox proteins that have acquired complex multi-atom cofactors in the bacterial cytoplasm (or the chloroplast stroma or mitochondrial matrix). They include hydrogenases, formate dehydrogenases, nitrate reductases, trimethylamine N-oxide (TMAO) reductases and dimethyl sulphoxide (DMSO) reductases [, ]. The Tat system can also export whole hetero-oligomeric complexes in which some proteins have no Tat signal, as is the case for the DMSO reductase and formate dehydrogenase complexes. There are also cases where the physiological rationale for targeting a protein to the Tat pathway is less obvious. Indeed, some homologous proteins are targeted to the Tat pathway in some cases and to the Sec apparatus in others; examples include copper nitrite reductases, flavin domains of flavocytochrome c and N-acetylmuramoyl-L-alanine amidases []. In halophilic archaea such as Halobacterium, almost all secreted proteins appear to be Tat-targeted. This has been proposed to be a response to the difficulties these organisms would otherwise face in folding proteins extracellularly at high ionic strength []. The Tat signal peptide consists of three regions: the positively charged N-terminal motif, the hydrophobic (h-) region and the C-terminal region, which generally ends with a short consensus motif (A-x-A) specifying cleavage by signal peptidase. Sequence analysis revealed that signal peptides capable of targeting the Tat pathway contain the consensus sequence [ST]-R-R-x-F-L-K. The nearly invariant twin-arginine pair gave rise to the pathway's name. In addition, the h-region of Tat signal peptides is typically less hydrophobic than that of Sec-specific signal peptides [, ]. This entry represents the Tat signal, from the methionine to the A-x-A short motif.
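The consensus motifs of the Tat signal peptide lend themselves to simple pattern matching. The following Python sketch scans the N-terminal part of a protein sequence for the [ST]-R-R-x-F-L-K twin-arginine consensus and then for the first downstream A-x-A cleavage motif. It is illustrative only: the function name `scan_tat_signal`, the 60-residue search window and the relaxed "RR-only" mode are hypothetical choices, not part of this entry, and because real Tat signals often deviate from the strict consensus, dedicated signal peptide predictors are better suited to actual annotation.

```python
import re
from typing import Optional, Tuple

# Strict twin-arginine consensus: [ST]-R-R-x-F-L-K (x = any residue).
STRICT_TAT = re.compile(r"[ST]RR.FLK")
# Relaxed pattern (hypothetical, for illustration): only the near-invariant RR pair.
RELAXED_TAT = re.compile(r"RR")
# Short A-x-A motif that specifies cleavage by signal peptidase.
AXA = re.compile(r"A.A")


def scan_tat_signal(
    seq: str, window: int = 60, strict: bool = True
) -> Optional[Tuple[int, Optional[int]]]:
    """Return 0-based start positions of the twin-arginine motif and of the
    first downstream A-x-A motif, or None if no twin-arginine motif is found.

    `window` (a hypothetical parameter) limits the search to N-terminal residues.
    """
    region = seq[:window].upper()
    pattern = STRICT_TAT if strict else RELAXED_TAT
    rr = pattern.search(region)
    if rr is None:
        return None
    # Only look for A-x-A after the end of the twin-arginine motif.
    axa = AXA.search(region, rr.end())
    return rr.start(), (axa.start() if axa else None)


# Made-up test sequences, not real proteins.
print(scan_tat_signal("MTSRRQFLKGLVLAGLGLSAVAQAEQ"))  # (2, 19)
print(scan_tat_signal("MARRLLAVAGA", strict=False))   # (2, 6)
```

Taking the first A-x-A after the twin arginines is a deliberate simplification: in a real signal peptide the cleavage motif lies at the end of the h-region, so a more careful scan would first skip past the hydrophobic stretch.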
Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer []. While clathrin mediates endocytic protein transport and transport between the TGN and endosomes, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi-to-ER transport of dilysine-tagged proteins []. For example, the coatomer COPI (coat protein complex I) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargoes. As coat proteins polymerise, vesicles are formed and bud from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the WD-associated region found in the coatomer alpha, beta and beta' subunits. The alpha subunit (RET1P) of the coatomer complex in Saccharomyces cerevisiae (Baker's yeast) participates in membrane transport between the endoplasmic reticulum and Golgi apparatus. The protein contains six WD-40 repeat motifs in its N-terminal region [].
The 'baculovirus inhibitor of apoptosis protein repeat' (BIR) [, ] is a domain of about 70 residues, arranged in tandem repeats separated by a variable-length linker, that seems to confer cell death-preventing activity. It is found in proteins belonging to the IAP (inhibitor of apoptosis proteins) family. The BIRs are the critical motifs required for the anti-apoptotic activity of IAP proteins. All IAP proteins contain from one to three BIRs, and all known interactions between IAPs and other proteins are mediated by one or more BIRs []. In higher eukaryotes, BIR domains inhibit apoptosis by acting as direct inhibitors of the caspase family of protease enzymes. Proteins with BIR domains are considered peptidase inhibitors in family I32. In yeast, BIR domains are involved in regulating cytokinesis. The BIR domain has a novel fold that is stabilised by zinc tetrahedrally coordinated by one histidine and three cysteine residues, and that resembles a classical zinc finger []. The structure consists of three short α-helices and turns, with the zinc packed in an unusually hydrophobic environment created by residues that are highly conserved among all BIRs. A subclass of repeats, comprising those at the C terminus of a series of BIR repeats from IAP proteins bearing RING finger domains, is likely to contain a C-terminal region that forms an α-helix []. Proteins known to contain this domain include: baculovirus apoptosis inhibitors (IAPs); mammalian apoptosis inhibitors 1 and 2 (IAP1 and IAP2; BIRC-2 and BIRC-3; MEROPS identifiers I32.002 and I32.003, respectively); mammalian X-linked inhibitor of apoptosis protein (X-linked IAP; MEROPS identifier I32.004); chicken IAP (ITA); human neuronal apoptosis inhibitory protein (NAIP, BIRC-1; MEROPS identifier I32.001); Drosophila apoptosis inhibitors 1 and 2 (Iap1 and Iap2; MEROPS identifiers I32.009 and I32.011, respectively); and African swine fever virus (ASFV) protein p27.
H/ACA ribonucleoprotein particles (RNPs) are a family of RNA pseudouridine synthases that specify modification sites through guide RNAs. The function of these H/ACA RNPs is essential for biogenesis of the ribosome, splicing of precursor mRNAs (pre-mRNAs), maintenance of telomeres and probably additional cellular processes []. All H/ACA RNPs contain a specific RNA component (snoRNA or scaRNA) and at least four proteins common to all such particles: Cbf5, Gar1, Nhp2 and Nop10. These proteins are highly conserved from yeast to mammals, and homologues are also present in archaea []. The H/ACA protein complex contains a stable core composed of Cbf5 and Nop10, to which Gar1 and Nhp2 subsequently bind []. This entry represents H/ACA ribonucleoprotein complex subunit NHP2 and similar proteins from eukaryotes, including NHP2-like protein 1 from mammals (SNU13 homologue) and 13 kDa ribonucleoprotein-associated protein (SNU13) from yeast. Nhp2 is part of a complex that catalyses pseudouridylation of rRNA and is required for rRNA biogenesis. Pseudouridylation involves the isomerisation of uridine such that the ribose becomes attached to C5 of the uracil base instead of the usual N1. Pseudouridine ("psi") residues may serve to stabilise the conformation of rRNAs. Nhp2 associates non-specifically with RNA secondary structures instead of binding directly to a specific RNA motif. This protein seems to have evolved from the archaeal ribosomal L7Ae protein family []. The human SNU13 homologue is involved in pre-mRNA splicing as a component of the spliceosome []. The protein undergoes a conformational change upon RNA binding []. SNU13 from Saccharomyces cerevisiae (Baker's yeast) is also a component of the spliceosome and rRNA processing machinery, required for splicing of pre-mRNA and essential for the accumulation and stability of U4 snRNA, U6 snRNA and box C/D snoRNAs [, , ].