Nucleolar GTP-binding protein 1, Rossman-fold domain
Type:
Domain
Description:
This domain represents a conserved region of approximately 60 residues within nucleolar GTP-binding protein 1 (NOG1). The NOG1 family includes eukaryotic, bacterial and archaeal proteins. In Saccharomyces cerevisiae, the NOG1 gene has been shown to be essential for cell viability, suggesting that NOG1 may play an important role in nucleolar functions. In particular, NOG1 is believed to be functionally linked to ribosome biogenesis, which occurs in the nucleolus. In eukaryotes, NOG1 mutants were found to disrupt the biogenesis of the 60S ribosomal subunit. The DRG and OBG proteins, as well as the prokaryotic NOG-like proteins, are homologous throughout their length to the amino-terminal half of eukaryotic NOG1, which contains the GTP-binding motifs; the N-terminal GTP-binding motif is required for function.
This entry represents a family of conserved proteins found from nematodes to humans. Cytochrome c oxidase assembly protein Pet191 carries six highly conserved cysteine residues. Pet191 is required for the assembly of active cytochrome c oxidase but does not form part of the final assembled complex.
Vac14 is a component of the PI(3,5)P2 regulatory complex, which regulates both the synthesis and turnover of phosphatidylinositol 3,5-bisphosphate (PI(3,5)P2). In yeast, this complex is composed of Atg18, Fig4, Fab1, Vac14 and Vac7; in mammals it consists of PIKfyve, FIG4 and VAC14. Fab1 (PIKfyve in mammals) is a kinase that converts PI(3)P into PI(3,5)P2 by phosphorylation at the 5-position; conversely, Fig4 dephosphorylates PI(3,5)P2 back to PI(3)P. Vac14 is a scaffolding adaptor protein whose multimerisation is a prerequisite for assembly and function of the PI(3,5)P2 regulatory complex.
Vacuolar protein 14, C-terminal Fig4-binding domain
Type:
Domain
Description:
Vac14 is a scaffold for the Fab1 kinase complex, which allows the dynamic interconversion of PI3P and PI(3,5)P2 (phosphoinositide phosphate (PIP) lipids that are generated transiently on the cytoplasmic face of selected intracellular membranes). This interconversion is regulated by at least five proteins in yeast: the lipid kinase Fab1p, the lipid phosphatase Fig4p, the Fab1p activator Vac7p, the Fab1p inhibitor Atg18p, and Vac14p, a protein required for the activity of both Fab1p and Fig4p. Full-length Vac14 in yeasts is likely to carry a succession of HEAT repeats, most of which have now degenerated. This regulatory system is crucial for the proper functioning of the mammalian nervous system. This entry represents the C-terminal domain of Vac14, which binds to Fig4p.
Ribosomal protein S5 is one of the proteins from the small ribosomal subunit, and is a protein of 166 to 254 amino-acid residues. In Escherichia coli, S5 is known to be important in the assembly and function of the 30S ribosomal subunit. Mutations in S5 have been shown to increase translational error frequencies. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups bacterial, cyanelle, red algal chloroplast, archaeal and fungal mitochondrial S5; mammalian, Caenorhabditis elegans, Drosophila and plant S2; and yeast S4 (SUP44). This entry represents the conserved site of the ribosomal protein S5. This entry represents the N-terminal domain of ribosomal protein S5, which has an α-β(3)-α structure that folds into two layers, alpha/beta.
Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of the newly synthesized DNA strand containing the mispaired base. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.
MutS is a modular protein with a complex structure, and is composed of:
N-terminal mismatch-recognition domain, which is similar in structure to tRNA endonuclease.
Connector domain, which is similar in structure to Holliday junction resolvase RuvC.
Core domain, which is composed of two separate subdomains that join together to form a helical bundle; from within the core domain, two helices act as levers that extend towards (but do not touch) the DNA.
Clamp domain, which is inserted between the two subdomains of the core domain at the top of the lever helices; the clamp domain has a β-sheet structure.
ATPase domain (connected to the core domain), which has a classical Walker A motif.
HTH (helix-turn-helix) domain, which is involved in dimer contacts.
The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair. Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts. This entry represents the N-terminal domain of proteins in the MutS family of DNA mismatch repair proteins, as well as closely related proteins. The N-terminal domain of MutS is responsible for mismatch recognition and forms a 6-stranded mixed β-sheet surrounded by three α-helices, which is similar to the structure of tRNA endonuclease. Yeast MSH3, bacterial proteins involved in DNA mismatch repair, and the predicted protein product of the Rep-3 gene of mouse share extensive sequence similarity. Human MSH has been implicated in hereditary non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein.
Ribosome maturation protein SDO1/SBDS, central domain
Type:
Domain
Description:
This entry represents the central domain of proteins that are highly conserved in species ranging from archaea to vertebrates and plants. This entry contains several Shwachman-Bodian-Diamond syndrome (SBDS) proteins from both mouse and humans. Shwachman-Diamond syndrome (OMIM 260400) is an autosomal recessive disorder with clinical features that include pancreatic exocrine insufficiency, haematological dysfunction and skeletal abnormalities. It is characterised by bone marrow failure and leukemia predisposition. Members of this entry play a role in RNA metabolism. In yeast, the SBDS orthologue SDO1 is involved in the biogenesis of the 60S ribosomal subunit and translational activation of ribosomes. Together with the EF-2-like GTPase RIA1 (Efl1), it triggers the GTP-dependent release of TIF6 from 60S pre-ribosomes in the cytoplasm, thereby activating ribosomes for translation competence by allowing 80S ribosome assembly and facilitating TIF6 recycling to the nucleus, where it is required for 60S rRNA processing and nuclear export. These data link defective late 60S subunit maturation to an inherited bone marrow failure syndrome associated with leukemia predisposition. The SBDS protein is composed of three domains. The N-terminal (FYSH) domain is the most frequent target for disease mutations and contains a novel mixed α/β-fold, the central domain (represented in this entry) consists of a three-helical bundle, and the C-terminal domain has a ferredoxin-like fold.
Proteins identified by this conserved site are involved in the biogenesis of the 60S ribosomal subunit and translational activation of ribosomes in eukaryotes. Together with the EF-2-like GTPase RIA1, they may trigger the GTP-dependent release of TIF6 from 60S pre-ribosomes in the cytoplasm, thereby activating ribosomes for translation competence by allowing 80S ribosome assembly and facilitating TIF6 recycling to the nucleus, where it is required for 60S rRNA processing and nuclear export.
Adaptor protein complex AP-2 has a role in clathrin-mediated endocytosis and intracellular transport. AP-2 links clathrin to membrane lipids and transmembrane cargoes at the vesicle budding site. It consists of two large subunits termed adaptins (alpha and beta subunits), a medium-sized one (mu), and a small one (sigma). It is thought to act in cargo selection by interaction with selection motifs encoded in the cytoplasmic tails of cargo molecules. The N-terminal domains of the adaptins, together with the mu and sigma subunits, constitute the 'head' of the adaptor, which interacts with plasma membrane lipids and cargoes. AP-2 plays a role in the recycling of synaptic vesicle membranes from the presynaptic surface. This family represents the alpha subunit of adaptor protein complex AP-2.
This entry represents the N-terminal domain of DNA mismatch repair proteins, such as MutL. Bacterial MutL proteins are homodimers, while their eukaryotic homologues form heterodimers consisting of the MutL homologue Mlh1 and either Pms1, Pms2 or Mlh3. MutL proteins and their homologues share sequence homology at their N termini over the first 300-400 residues. The C termini are less well conserved; they constitute the main dimerization domain and are required for interaction between MutL and UvrD helicase. The activity of the protein is modulated by the ATP-dependent dimerization of the N-terminal domain. The dimeric MutL protein has a key function in communicating mismatch recognition by MutS to downstream repair processes. Mismatch repair contributes to the overall fidelity of DNA replication by targeting mispaired bases that arise through replication errors, during homologous recombination, and as a result of DNA damage. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex.
Ribosomal RNA-processing protein 7, C-terminal domain
Type:
Domain
Description:
Ribosomal RNA-processing protein 7 (RRP7) is an essential protein in yeast that is involved in pre-rRNA processing and ribosome assembly. It is speculated to be required for correct assembly of rpS27 into the pre-ribosomal particle. This entry represents the C-terminal domain of RRP7.
The adaptor protein complexes mediate both the recruitment of clathrin to membranes and the recognition of sorting signals within the cytosolic tails of transmembrane cargo molecules. Adaptor protein complex 3 (AP-3) appears to be involved in the sorting of a subset of transmembrane proteins targeted to lysosomes and lysosome-related organelles. AP-3 is a heterotetramer composed of two large adaptins (delta-type subunit AP3D1 and beta-type subunit AP3B1 or AP3B2), a medium adaptin (mu-type subunit AP3M1 or AP3M2) and a small adaptin (sigma-type subunit AP3S1 or AP3S2). The subunits of non-clathrin- and clathrin-associated adaptor protein complex 3 play a role in protein sorting in the late-Golgi/trans-Golgi network (TGN) and/or in endosomes. This group represents the delta subunit of adaptor protein complex AP-3.
Prokaryotes and eukaryotes respond to heat shock and other forms of environmental stress by inducing synthesis of heat-shock proteins (hsp). The 90kDa heat shock protein, Hsp90, is one of the most abundant proteins in eukaryotic cells, comprising 1-2% of cellular proteins under non-stress conditions. Its contribution to various cellular processes including signal transduction, protein folding, protein degradation and morphological evolution has been extensively studied. The full functional activity of Hsp90 is gained in concert with other co-chaperones, playing an important role in the folding of newly synthesised proteins and stabilisation and refolding of denatured proteins after stress. Apart from its co-chaperones, Hsp90 binds to an array of client proteins, where the co-chaperone requirement varies and depends on the actual client. The sequences of hsp90s show a distinctive domain structure, with a highly conserved N-terminal domain separated from a conserved, acidic C-terminal domain by a highly acidic, flexible linker region. The signature pattern for the hsp90 family of proteins is located in a highly conserved region found in the N-terminal part of these proteins.
The Saccharomyces cerevisiae ER membrane protein complex (EMC) comprises six subunits. Four and three additional subunits have been identified in mammals and Drosophila, respectively. EMC is required for protein folding in the endoplasmic reticulum (ER). It also facilitates lipid transfer from ER to mitochondria. This family includes mammalian subunits EMC8 and EMC9, and the Drosophila EMC8/9 homologue. EMC8 is also known as neighbour of COX4 (NOC4).
Ribosomal protein L4/L1e, eukaryotic/archaeal, conserved site
Type:
Conserved_site
Description:
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. This family includes ribosomal L4/L1 from eukaryotes and archaebacteria. L4 from yeast has been shown to bind rRNA. These proteins have 246 (plant) to 427 (human) amino acids.
SMC5 is a core component of the SMC5-SMC6 complex, a complex involved in repair of DNA double-strand breaks by homologous recombination. In humans, the SMC5-SMC6 complex may promote sister chromatid homologous recombination by recruiting the SMC1-SMC3 cohesin complex to double-strand breaks. The complex is required for telomere maintenance via recombination in ALT (alternative lengthening of telomeres) cell lines and mediates sumoylation of shelterin complex (telosome) components, which is proposed to lead to shelterin complex disassembly in ALT-associated PML bodies (APBs). SMC5 is required for sister chromatid cohesion during prometaphase and mitotic progression; the function seems to be independent of SMC6.
Cytochrome c oxidase assembly protein is essential for the assembly of functional cytochrome oxidase protein. In eukaryotes it is an integral protein of the mitochondrial inner membrane. Cox11 is essential for the insertion of Cu(I) ions to form the CuB site. This is essential for the stability of other structures in subunit I, for example haems a and a3, and the magnesium/manganese centre. Cox11 is probably only required in sub-stoichiometric amounts relative to the structural units. The C-terminal region of the protein is known to form a dimer. Each monomer coordinates one Cu(I) ion via three conserved cysteine residues (111, 208 and 210 in Saccharomyces cerevisiae). Met 224 is also thought to play a role in copper transfer or stabilising the copper site.
The breast cancer type 2 susceptibility protein (BRCA2) is a breast tumour suppressor involved in double-strand break repair and/or homologous recombination. BRCA2 gene expression is regulated in a cell-cycle dependent manner, with peak expression of BRCA2 mRNA occurring in S phase, suggesting BRCA2 may participate in regulating cell proliferation. BRCA2, and related protein BRCA1, have transcriptional activation potential and the two proteins are associated with the activation of double-strand break repair and/or homologous recombination. The two proteins have been shown to coexist and colocalize in a biochemical complex. BRCA2 has a number of 39 amino acid repeats that are critical for binding to RAD51 (a key protein in DNA recombinational repair) and resistance to methyl methanesulphonate treatment. There are eight repeats in BRCA2 designated as BRC1 to BRC8. BRC1, BRC2, BRC3, BRC4, BRC7, and BRC8 have high sequence identity and bind to Rad51, whereas BRC5 and BRC6 are less well conserved and are unable to bind Rad51. It has been suggested that BRCA2 plays a role in positioning Rad51 at the site of DNA repair or in removing Rad51 from DNA once repair has been completed. Mutations in BRCA1 and BRCA2 have been linked to an elevated risk of young onset breast cancer and confer a high risk of the disease in a dominantly inherited fashion. BRCA2 mutations are typically microdeletions. Homologues exist in plants: the BRCA2A and BRCA2B proteins from Arabidopsis thaliana are required for repair of breaks in double-stranded DNA and homologous recombination, and in the prophase stage of meiosis are required for formation of RAD51 and DMC1 foci in males.
This entry represents the N-terminal domain of DDA1 (DET1- and DDB1-associated protein 1) ubiquitin ligase, which binds strongly with Det1 (De-etiolated 1) and DDB1 (Damaged DNA binding protein 1 associated 1). Together DDA1, DDB1 and Det1 form the DDD core complex, which recruits a specific UBE2E enzyme to form specific DDD-E2 complexes. DDA1 is a component of the DDD-E2 complexes, which may provide a platform for interaction with cul4a and WD repeat proteins. These proteins may be involved in ubiquitination and subsequent proteasomal degradation of target proteins.
This entry represents a domain found in trafficking protein particle complex subunit 11 (Trappc11), which is involved in endoplasmic reticulum to Golgi apparatus trafficking at a very early stage. The C terminus of this region contains TPR repeats. In zebrafish, Trappc11 is also known as protein foie gras. It has been shown to affect development; the mutants develop large, lipid-filled hepatocytes in the liver, resembling those in individuals with fatty liver disease.
Adapter-like complex 4 (AP-4) is a heterotetramer composed of two large adaptins (epsilon-type subunit AP4E1 and beta-type subunit AP4B1), a medium adaptin (mu-type subunit AP4M1) and a small adaptin (sigma-type AP4S1). It is a subunit of a novel type of clathrin- or non-clathrin-associated protein coat involved in targeting proteins from the trans-Golgi network (TGN) to the endosomal-lysosomal system. This group represents the epsilon subunit of adaptor protein complex AP-4.
Iron-sulfur cluster assembly scaffold protein IscU
Type:
Family
Description:
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.
The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen. This entry represents IscU from the ISC system, a homologue of the N-terminal region of NifU (NIF system), an Fe-S cluster assembly protein found mostly in nitrogen-fixing bacteria. IscU is a scaffold protein on which Fe-S clusters are assembled before transfer to apoproteins. This family includes largely proteobacterial and eukaryotic forms and excludes the true NifU proteins from Klebsiella sp. and Anabaena sp. as well as the archaeal homologues.
Extensins are plant cell-wall proteins; they can account for up to 20% of the dry weight of the cell wall. They are highly glycosylated, possibly reflecting their interactions with cell-wall carbohydrates. Amongst their functions is cell wall strengthening in response to mechanical stress (e.g., during attack by pests, plant-bending in the wind, etc.). This repeat occurs within extensin-like proteins.
This family represents Protein FAM136A and similar uncharacterised proteins from animals and plants. The sequences carry three sets of CxxxC motifs, which might suggest a type of zinc-finger formation. In humans, FAM136A has been related to Meniere's disease, a complex disorder of the inner ear.
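As an illustration only, the CxxxC motif described above (two cysteines separated by any three residues) can be located in a protein sequence with a regular expression. This is a minimal sketch: the function name `find_cxxxc` and the toy sequence are assumptions made for the example, and the sequence is not a real FAM136A sequence.

```python
import re

# Two cysteines separated by exactly three residues of any type.
# The lookahead allows overlapping motifs (e.g. CxxxCxxxC) to be reported.
CXXXC = re.compile(r"(?=(C.{3}C))")


def find_cxxxc(sequence):
    """Return (0-based start, matched text) for every CxxxC motif."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CXXXC.finditer(seq)]


# Hypothetical toy sequence, invented for this example.
toy = "MKCAAACGGSSCLLLCPPCQWECKK"

for start, motif in find_cxxxc(toy):
    print(start, motif)
```

Finding three such motifs in a sequence is only a pattern match; it does not by itself show that a zinc finger forms.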
Ribosomal protein S24/S35, mitochondrial, conserved domain
Type:
Domain
Description:
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. This entry represents a conserved region of approximately 125 residues of one of the proteins that make up the small subunit of the mitochondrial ribosome. In Saccharomyces cerevisiae (baker's yeast) it is mitochondrial ribosomal protein S24, whereas in humans it is S35.
Arp2/3 binds to pre-existing actin filaments and nucleates new daughter filaments, and thus becomes incorporated into the dynamic actin network at the leading edge of motile cells and other actin-based protrusive structures. In order to nucleate filaments, Arp2/3 must bind to a member of the N-WASp/SCAR protein family. Arp2 and Arp3 are thought to be brought together after activation, forming an actin-like nucleus for actin monomers to bind and create a new actin filament. In the absence of an activating protein, Arp2/3 shows very little nucleation activity. Recent research has focused on the binding and hydrolysis of ATP by Arp2 and Arp3, and crystal structures of the Arp2/3 complex have been solved. The human Arp2/3 complex consists of ARP2, ARP3, ARPC1B/p41-ARC, ARPC2/p34-ARC, ARPC3/p21-ARC, ARPC4/p20-ARC and ARPC5/p16-ARC. This family represents the ARPC5/p16-ARC subunit.
Transcription initiation is dictated by the presence and activity of specific nuclear factors that bind to DNA regulatory sequences and interact with the transcriptional machinery. The functions of some of these factors can be altered by phosphorylation, which affects both DNA binding and transcriptional activation. Phosphorylation is effected by specific protein kinases that have been activated by the stimulus of signal transduction pathways, resulting in the regulation of gene transcription by modulating the phosphorylation sites of transcription factors.
Cyclic AMP (cAMP) regulates the expression of many genes via a conserved gene promoter element CRE (cAMP response element), which has the sequence 5'-TGACGTCA-3'. The cAMP response element binding protein (CREB) is a nuclear factor that is regulated by protein kinase A phosphorylation. Transcription is stimulated on binding to the CRE of a phosphorylated CREB dimer, which is held together by leucine zippers. Dimerisation and transcriptional efficacy have been found to be stimulated by phosphorylation at several distinct sites, and it has thus been suggested that CREB may be regulated by multiple kinases. Sequence analysis of the gene has revealed a cluster of protein kinase A, protein kinase C, and casein kinase II consensus recognition sites near the N terminus of the protein sequence, and the proximity of these sites to one another indicates the possibility of interaction in a positive or negative fashion to regulate CREB bioactivity.
The 'leucine zipper' is a structure that is believed to mediate the function of several eukaryotic gene regulatory proteins. The zipper consists of a periodic repetition of leucine residues at every seventh position, and regions containing them appear to span 8 turns of α-helix. The leucine side chains that extend from one helix interact with those from a similar helix, hence facilitating dimerisation in the form of a coiled-coil. Leucine zippers are present in many gene regulatory proteins, including the CREB proteins, Jun/AP1 transcription factors, fos oncogene and fos-related proteins, C-myc, L-myc and N-myc oncogenes, and so on.
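The heptad periodicity described above (a leucine at every seventh position) lends itself to a simple scan. The following is a rough sketch rather than an established predictor: the function name `heptad_leucine_runs`, the default threshold of four leucines, and the toy sequence are all assumptions chosen for illustration.

```python
def heptad_leucine_runs(sequence, min_repeats=4):
    """Return (0-based start, count) for maximal runs of leucine
    spaced exactly seven residues apart.

    min_repeats is an assumed threshold, not a value from the text.
    """
    seq = sequence.upper()
    runs = []
    for start in range(len(seq)):
        if seq[start] != "L":
            continue
        # Skip positions that only continue a run begun 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count = 0
        pos = start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


# Hypothetical toy sequence: two leading residues, then four heptads
# that each begin with leucine.
toy = "MK" + "LAAAAAA" * 4

print(heptad_leucine_runs(toy))
```

A purely periodic check like this will also match leucine-rich regions that are not helical, so a hit is at best a starting point for closer structural analysis.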
Ribosomal RNA-processing protein 12-like, conserved domain
Type:
Domain
Description:
Ribosomal RNA-processing protein 12 (RRP12) is a member of the pre-ribosome-associated HEAT repeat-containing proteins. RRP12 is required for nuclear export of both pre-40S and pre-60S ribosomal subunits, and it is also required for the late maturation of the 18S and 5.8S rRNA of the pre-40S ribosomes and for maturation of the 25S and 5.8S rRNA of the pre-60S ribosomes. This entry represents a conserved domain found in RRP12 proteins, which consists of HEAT repeats.
UbiA prenyltransferase domain containing protein 1
Type:
Family
Description:
UbiA prenyltransferase domain-containing protein 1 is a prenyltransferase that mediates the formation of menaquinone-4 (MK-4), a vitamin K2 isoform present at high concentrations in the brain, kidney and pancreas. Also included in this family are 1,4-dihydroxy-2-naphthoate octaprenyltransferase MenA and 2-carboxy-1,4-naphthoquinone phytyltransferase from bacteria.
Peroxisome membrane anchor protein Pex14p, N-terminal
Type:
Domain
Description:
This conserved region defines a group of peroxisomal membrane anchor proteins which bind the PTS1 (peroxisomal targeting signal) receptor and are required for the import of PTS1-containing proteins into peroxisomes. Loss of functional Pex14p results in defects in both the PTS1 and PTS2-dependent import pathways. Deletion analysis of this conserved region implicates it in selective peroxisome degradation. In the majority of members this region is situated at the N terminus of the protein.
Sec-independent periplasmic protein translocase, conserved site
Type:
Conserved_site
Description:
Proteins encoded by the mttABC operon (formerly yigTUW) mediate a novel Sec-independent membrane targeting and translocation system in Escherichia coli that interacts with cofactor-containing redox proteins having a S/TRRXFLK "twin arginine" leader motif. This family contains the E. coli mttB gene (TATC) [
]. A functional Tat system or Delta pH-dependent pathway requires three integral membrane proteins: TatA/Tha4, TatB/Hcf106 and TatC/cpTatC. The TatC protein is essential for the function of both pathways. It might be involved in twin-arginine signal peptide recognition, protein translocation and proton translocation. Sequence analysis predicts that TatC contains six transmembrane helices (TMHs), and experimental data confirmed that the N and C termini of TatC or cpTatC are exposed to the cytoplasmic or stromal face of the membrane. The cytoplasmic N terminus and the first cytoplasmic loop region of the E. coli TatC protein are essential for protein export. At least two TatC molecules co-exist within each Tat translocon [
,
]. This entry represents a conserved site from the central section of these proteins.
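As a rough illustration of the S/TRRXFLK "twin arginine" leader motif mentioned in this entry, the sketch below scans a leader peptide with a regular expression. The function name and the toy peptide are hypothetical, and the strict pattern is an assumption: it matches only the consensus exactly as written, so leaders that deviate from the consensus would be missed.

```python
import re

# S/TRRXFLK: Ser or Thr, two Arg, any residue (X), then Phe-Leu-Lys.
TWIN_ARGININE = re.compile(r"[ST]RR.FLK")


def find_twin_arginine(leader: str):
    """Return (0-based start, matched text) for the first hit, or None."""
    match = TWIN_ARGININE.search(leader.upper())
    return (match.start(), match.group()) if match else None


if __name__ == "__main__":
    # Synthetic toy leader peptide, not taken from a real protein.
    print(find_twin_arginine("MNDQSRRDFLKAAGLT"))
```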
This family is the protein translocase SEC61 complex gamma subunit of the archaeal and eukaryotic type. It does not hit bacterial SecE proteins. Sec61 is required for protein translocation in the endoplasmic reticulum. The Sec61 complex (eukaryotes) or SecY complex (prokaryotes) is a conserved heterotrimeric integral membrane protein complex that forms a protein-conducting channel, allowing polypeptides to be transferred across (or integrated into) the endoplasmic reticulum (eukaryotes) or across the cytoplasmic membrane (prokaryotes) [
,
]. This complex is itself a part of a larger translocase heterotrimeric complex composed of alpha, beta and gamma subunits. The channel is a passive conduit for polypeptides. It therefore has to associate with other components that provide a driving force. The partner proteins in bacteria and eukaryotes differ. In bacteria, the translocase complex comprises 7 proteins [
], including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF). The SecA ATPase interacts dynamically with the SecYEG integral membrane components to drive the transmembrane movement of newly synthesized preproteins []. In yeast (and probably in all eukaryotes), the full translocase comprises another membrane protein subcomplex (the tetrameric Sec62/63p complex), and the lumenal protein BiP, a member of the Hsp70 family of ATPases. BiP promotes translocation by acting as a molecular ratchet, preventing the polypeptide chain from sliding back into the cytosol [
].
Tls1 in fission yeast is required for telomeric heterochromatin assembly as well as telomere length control [
,
]. The human homologue, the splicing factor C9orf78 (also known as hepatocellular carcinoma-associated antigen 59), plays a role in pre-mRNA splicing by promoting usage of the upstream 3'-splice site at alternative NAGNAG splice sites []. It may also modulate exon inclusion events and plays a role in spliceosomal remodeling by displacing WBP4 from SNRNP200 and may act to inhibit SNRNP200 helicase activity [,
]. C9orf78 is required for proper chromosome segregation [].
The P-II protein (gene glnB) is a bacterial protein important for the control of glutamine synthetase [
,
,
]. In nitrogen-limiting conditions, when the ratio of glutamine to 2-ketoglutarate decreases, P-II is uridylylated on a tyrosine residue to form P-II-UMP. P-II-UMP allows the deadenylation of glutamine synthetase (GS), thus activating the enzyme. Conversely, in nitrogen excess, P-II-UMP is deuridylylated and then promotes the adenylation of GS. P-II also indirectly controls the transcription of the GS gene (glnA) by preventing NR-II (ntrB) from phosphorylating NR-I (ntrC), which is the transcriptional activator of glnA. Once P-II is uridylylated, these events are reversed. P-II is an extremely well conserved protein of about 110 amino acid residues. The tyrosine that is uridylylated is located in the central part of the protein. In cyanobacteria, P-II seems to be phosphorylated on a serine residue rather than being uridylylated. In the red alga Porphyra purpurea, there is a glnB homologue encoded in the chloroplast genome. Other proteins highly similar to glnB include Bacillus subtilis protein nrgB [
] and Escherichia coli hypothetical protein ybaI []. This entry represents a conserved site in the C-terminal region of the P-II protein.
EMC3 is a subunit of the ER Membrane protein Complex (EMC), which is required for efficient folding of proteins in the endoplasmic reticulum (ER). Loss of the EMC leads to accumulation of misfolded membrane proteins [
].
Endoplasmic reticulum resident protein 29, C-terminal
Type:
Domain
Description:
ERp29 (also known as ERp28 and ERp31) is a ubiquitously expressed endoplasmic reticulum protein found in mammals [
]. This protein has an N-terminal thioredoxin-like domain, which is homologous to the domain of human protein disulphide isomerase (PDI). ERp29 may help mediate the chaperone function of PDI. The C-terminal Erp29 domain has a 5-helical bundle fold. ERp29 is thought to form part of the thyroglobulin folding complex []. The Drosophila homologue, Wind, is the product of windbeutel, an essential gene in the development of dorsal-ventral patterning. Wind is required for correct targeting of Pipe, a Golgi-resident type II transmembrane protein with homology to 2-O-sulfotransferase. The C-terminal domain of Wind is thought to provide a distinct site required for interaction with its substrate, Pipe [
].
Arp2/3 binds to pre-existing actin filaments and nucleates new daughter filaments, and thus becomes incorporated into the dynamic actin network at the leading edge of motile cells and other actin-based protrusive structures [
]. In order to nucleate filaments, Arp2/3 must bind to a member of the N-WASp/SCAR protein family []. Arp2 and Arp3 are thought to be brought together after activation, forming an actin-like nucleus for actin monomers to bind and create a new actin filament. In the absence of an activating protein, Arp2/3 shows very little nucleation activity. Recent research has focused on the binding and hydrolysis of ATP by Arp2 and Arp3 [], and crystal structures of the Arp2/3 complex have been solved []. The human Arp2/3 complex is composed of ARP2, ARP3, ARPC1B/p41-ARC, ARPC2/p34-ARC, ARPC3/p21-ARC, ARPC4/p20-ARC and ARPC5/p16-ARC. This family represents the ARPC2/p34-ARC subunit.
Uncharacterised protein family UPF0012, conserved site
Type:
Conserved_site
Description:
A group of uncharacterised proteins share this well-conserved region centred on a cysteine residue. These proteins are a subset of the carbon-nitrogen hydrolase family indicating that these as yet uncharacterised proteins may be related to members of this family [
].
Biotin synthase/Biotin biosynthesis bifunctional protein BioAB
Type:
Family
Description:
This family consists of biotin synthase B-like and bifunctional biotin synthase AB proteins. Biotin synthase works with flavodoxin, S-adenosylmethionine, and possibly cysteine to catalyze the last step of the biotin biosynthetic pathway. The reaction consists of the introduction of a sulphur atom into dethiobiotin, thus requiring activation of C-H bonds []. Biotin (vitamin H) is a prosthetic group in enzymes catalysing carboxylation and transcarboxylation reactions []. Biotin synthase from Escherichia coli is a homodimer of 76kDa, with each polypeptide chain carrying an oxygen-sensitive (4Fe-4S) cluster, probably ligated by three cysteines of a CXXXCXXC box conserved among all known BioB sequences and by a fourth, as yet unidentified, ligand. BioB displays a pyridoxal phosphate-dependent cysteine desulphurase activity, which allows mobilisation of the sulphur atom from free cysteine [
].
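The CXXXCXXC box described above can be located in a sequence with a short regular-expression search. The following is a minimal sketch; the function name and the toy sequence are hypothetical, and a match shows only that the spacing of cysteines is present, not that the residues ligate an iron-sulphur cluster.

```python
import re

# CXXXCXXC: Cys, three residues, Cys, two residues, Cys.
CYS_BOX = re.compile(r"C.{3}C.{2}C")


def cysteine_box_positions(protein: str):
    """Return 0-based positions of the three box cysteines, or None."""
    match = CYS_BOX.search(protein.upper())
    if match is None:
        return None
    start = match.start()
    # Cysteines sit at offsets 0, 4 and 7 within the matched box.
    return (start, start + 4, start + 7)


if __name__ == "__main__":
    # Synthetic toy sequence, not a real BioB sequence.
    print(cysteine_box_positions("AAGCAGYCSQCGA"))
```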
This entry represents a domain found in a group of proteins, including Csm3 from budding yeasts, Swi3 from fission yeasts and TIPIN from animals. They are involved in DNA replication and the maintenance of replication fork stability [
,
,
].
This entry represents ER membrane protein complex subunit 6 (Emc6). Emc6 is a component of the ER membrane protein complex (EMC), which is composed of Emc1, Emc2, Emc3, Emc4, Emc5 and Emc6 in budding yeast [
]. The EMC complex is required for efficient folding of proteins in the ER [,
].
This entry contains the photosystem II P680 chlorophyll A apoprotein, PsbB. Photosystem II light-harvesting proteins are the intrinsic transmembrane antenna proteins CP43 (PsbC) and CP47 (PsbB) found in the reaction centre of PSII. These polypeptides bind to chlorophyll a and beta-carotene and pass the excitation energy on to the reaction centre [
]. This family also includes the iron-stress induced chlorophyll-binding protein CP43 (IsiA), which evolved in cyanobacteria from a PSII protein to cope with light limitations and stress conditions [].
UV radiation resistance protein/autophagy-related protein 14
Type:
Family
Description:
This entry includes Atg14 (autophagy-related protein 14) from budding yeasts, Vps38 from fission yeasts and their homologues, Atg14L/Barkor (beclin-1-associated autophagy-related key regulator) and UVRAG (UV irradiation resistance-associated gene), from animals. Atg14 is a hydrophilic protein with a coiled-coil motif in the N-terminal region. Yeast cells with mutant Atg14 are defective not only in autophagy but also in sorting of carboxypeptidase Y (CPY), a vacuolar-soluble hydrolase, to the vacuole [
]. Barkor positively regulates autophagy through its interaction with Beclin-1, with decreased levels of autophagosome formation observed when Barkor expression is eliminated [
]. UVRAG is also a Beclin-1 binding protein that stimulates starvation-induced autophagy [
]. Autophagy mediates the cellular response to nutrient deprivation, protein aggregation, and pathogen invasion in humans, and malfunction of autophagy has been implicated in multiple human diseases including cancer. Class III phosphatidylinositol 3-kinase (PI3-kinase) regulates multiple membrane trafficking pathways. In yeast, two distinct PI3-kinase complexes are known: complex I (Vps34, Vps15, Vps30/Atg6, and Atg14) is involved in autophagy, and complex II (Vps34, Vps15, Vps30/Atg6, and Vps38) functions in the vacuolar protein sorting pathway [
]. In mammals, complex II is also involved in autophagy []. The mammalian counterparts of Vps34, Vps15, and Vps30/Atg6 are Vps34, p150, and Beclin 1, respectively, and UV irradiation resistance-associated gene (UVRAG) has been identified as identical to yeast Vps38 [].
Arp2/3 binds to pre-existing actin filaments and nucleates new daughter filaments, and thus becomes incorporated into the dynamic actin network at the leading edge of motile cells and other actin-based protrusive structures [
]. In order to nucleate filaments, Arp2/3 must bind to a member of the N-WASp/SCAR protein family []. Arp2 and Arp3 are thought to be brought together after activation, forming an actin-like nucleus for actin monomers to bind and create a new actin filament. In the absence of an activating protein, Arp2/3 shows very little nucleation activity. Recent research has focused on the binding and hydrolysis of ATP by Arp2 and Arp3 [], and crystal structures of the Arp2/3 complex have been solved []. The human Arp2/3 complex is composed of ARP2, ARP3, ARPC1B/p41-ARC, ARPC2/p34-ARC, ARPC3/p21-ARC, ARPC4/p20-ARC and ARPC5/p16-ARC. This family represents the ARPC4/p20-ARC subunit.
Cytochrome c oxidase assembly protein COX16 is required for the assembly of cytochrome c oxidase [
]. It is found in the inner membrane of the mitochondrion.
Major Histocompatibility Complex (MHC) glycoproteins are heterodimeric cell surface receptors that function to present antigen peptide fragments to T cells responsible for cell-mediated immune responses. MHC molecules can be subdivided into two groups on the basis of structure and function: class I molecules present intracellular antigen peptide fragments (~10 amino acids) on the surface of the host cells to cytotoxic T cells; class II molecules present exogenously derived antigenic peptides (~15 amino acids) to helper T cells. MHC class I and II molecules are assembled and loaded with their peptide ligands via different mechanisms. However, both present peptide fragments rather than entire proteins to T cells, and are required to mount an immune response. This superfamily represents MHC class I and II-like antigen-recognition domains from:
MHC class II, N-terminal domains of alpha and beta chains [
]
MHC class I, alpha-1 and alpha-2 domains [
]
MHC class I related proteins, such as gammadelta T-cell ligand [
], Ulbp3 [], Fc (IgG) receptor (alpha-1 and -2 domains) [], CD1 (alpha-1 and -2 domains) [], zinc-alpha-2-glycoprotein ZAG (fat depleting factor) []
Immunomodulatory protein m144, alpha-1 and alpha-2 domains [
]
Haemochromatosis protein Hfe, alpha-1 and alpha-2 domains [
]
NK cell ligand RAE-1 beta [
]
Endothelial protein C receptor (phospholipid-binding protein) [
]
Homologous-pairing protein 2 (Hop2) is required for proper homologous pairing and efficient cross-over and intragenic recombination during meiosis [
,
,
]. The mammalian HOP2 homologue, TBPIP, was first identified as a factor interacting with TBP-1, which binds to the human immunodeficiency virus, type 1 Tat protein [
]. Later, TBPIP was found to be an activator that specifically stimulates the homologous pairing catalyzed by DMC1 []. This entry represents the winged helix domain found in Hop2.
Trafficking protein particle complex subunit 2, also known as Sedlin, is a 140 amino-acid protein with a putative role in endoplasmic reticulum-to-Golgi transport [
]. Several missense mutations and deletion mutations in the SEDL gene, which result in protein truncation by frame shift, are responsible for spondyloepiphyseal dysplasia tarda, a progressive skeletal disorder (OMIM:313400) [].
Multiple myeloma tumor-associated protein 2-like, N-terminal
Type:
Domain
Description:
This entry represents a glycine-rich domain found in MMTA2 (multiple myeloma tumor-associated protein 2). The region may contain nuclear localisation signals, so it might act as a signal molecule in the nucleus [
,
].
Cell division cycle and apoptosis regulator protein 1 (CCAR1) associates with components of the Mediator and p160 coactivator complexes that play a role as intermediaries transducing regulatory signals from upstream transcriptional activator proteins to basal transcription machinery. CCAR1 also functions as a p53 coactivator and regulates expression of key proliferation-inducing genes [
]. Cell cycle and apoptosis regulator protein 2 (CCAR2, also known as DBC-1) regulates biological processes such as transcription, heterochromatin formation, metabolism, mRNA splicing, apoptosis, and cell proliferation [
]. It is a core component of the DBIRD complex, which affects local transcript elongation rates and alternative splicing of a large set of exons embedded in (A + T)-rich DNA regions []. It binds to SIRT1 and is a negative regulator of SIRT1 []. DBC-1 has been implicated in tumorigenesis []. This entry also includes protein SHORT ROOT IN SALT MEDIUM 1 (RSA1, also known as EMB1579) from Arabidopsis. It regulates the transcription of several genes involved in the detoxification of reactive oxygen species generated by salt stress and the SOS1 gene that encodes a plasma membrane Na(+)/H(+) antiporter essential for salt tolerance [
]. RSA1 is localised to the nucleus and the loss of function of RSA1 affects global transcription and mRNA splicing [].
Arp2/3 binds to pre-existing actin filaments and nucleates new daughter filaments, and thus becomes incorporated into the dynamic actin network at the leading edge of motile cells and other actin-based protrusive structures [
]. In order to nucleate filaments, Arp2/3 must bind to a member of the N-WASp/SCAR protein family []. Arp2 and Arp3 are thought to be brought together after activation, forming an actin-like nucleus for actin monomers to bind and create a new actin filament. In the absence of an activating protein, Arp2/3 shows very little nucleation activity. Recent research has focused on the binding and hydrolysis of ATP by Arp2 and Arp3 [], and crystal structures of the Arp2/3 complex have been solved []. The human Arp2/3 complex is composed of ARP2, ARP3, ARPC1B/p41-ARC, ARPC2/p34-ARC, ARPC3/p21-ARC, ARPC4/p20-ARC and ARPC5/p16-ARC. This family represents the ARPC1/p41-ARC subunit.
Malonyl CoA-acyl carrier protein transacylase, FabD-type
Type:
Family
Description:
Malonyl CoA-acyl carrier protein transacylases transfer the malonyl moiety from coenzyme A to acyl-carrier protein. This entry represents the FabD-type enzymes, which include the fatty acid biosynthesis protein FabD and the antibiotic biosynthesis proteins PksC, PksE, BaeE/C and ThaF [
,
,
,
].
Three transport protein particle (TRAPP) complexes exist in yeast (TRAPPI-TRAPPIII), which share a common core in addition to unique subunits. TRAPPI-TRAPPIII regulate endoplasmic reticulum (ER)-to-Golgi transport, intra-Golgi transport and autophagy, respectively. TRAPPC composition seems to be more complex in higher eukaryotes than in yeast, and its roles are less clear. Mammalian TRAPPC13 is involved in regulating autophagy and survival in response to small molecule compound-induced Golgi stress [
]. The overall architecture of TRAPPC is not disrupted upon TRAPPC13 depletion. This is also the case for yeast TRAPP II Trs65 subunit, which was previously reported to be specific to yeast. However, Trs65 has been shown to have homology to TRAPPC13 and they are now thought to be orthologues []. This entry consists of TRAPPC13 from higher eukaryotes and some related yeast proteins, but it does not include S. cerevisiae Trs65 (see ).
Domain 2 of the ribosomal protein S5 has a left-handed, 2-layer α/β fold with a core structure consisting of β(3)-α-β-α. Domains with this fold are found in numerous RNA/DNA-binding proteins, as well as in kinases from the GHMP kinase family. Proteins containing this α/β fold domain include: Translational machinery components (ribosomal proteins S5 and S9, and domain IV of elongation factors EF-G and eEF-2) [
]. Ribonuclease P protein (RNase P) [
]. Ribonuclease PH (domain 1) [
], as well as various exosome complex exonucleases (RRP41, RRP42, RRP43, RRP45, RRP46, MTR3, ECX1, ECX2) []. DNA modification proteins (DNA mismatch repair proteins MutL and PMS2, DNA gyrase B, DNA topoisomerase II, IV-B and VI-B) [
]. GHMP kinases that transfer a phosphoryl group from ATP to an acceptor (galactokinase (
), homoserine kinase (
), and mevalonate kinase (
)) [
,
]. Caenorhabditis elegans early switch protein Xol-1 (a divergent member of the GHMP kinase family that has lost the ATP-binding site) [
]. Hsp90 chaperone (middle domain), which is related to the DNA gyrase/MutL family [
]; this domain contains an extra C-terminal α/β subdomain. Imidazole glycerol phosphate dehydratase, which contains a duplication consisting of two structural repeats of this fold [
]. The catalytic domain of ATP-dependent protease Lon (La), which contains an extra C-terminal α/β subdomain [
]. Formaldehyde-activating enzyme FAE, which contains a modification of this fold consisting of an extra α/β unit after strand 2 [
].
The sequences found in this family are all derived from hypothetical plant proteins of unknown function. The region features a number of highly conserved cysteine residues.
Protein Networked (NET), actin-binding (NAB) domain
Type:
Domain
Description:
This entry represents the NAB domain found in the Networked proteins. The Networked (NET) proteins are a superfamily of plant-specific actin-binding proteins which localize simultaneously to the actin cytoskeleton and specific membrane compartments and are suggested to couple these membranes to the actin cytoskeleton in plant cells. The minimal actin binding region, referred to as the NET actin-binding (NAB) domain, represents a new actin binding motif unique to plants with no apparent primary sequence homology to previously identified actin binding domains. In Arabidopsis, the NAB domain always starts with three conserved tryptophan residues, WWW, a motif whose worldwide web connection gives added significance to the NET family name. The C-terminal half of the domain is very highly conserved, more so than the N-terminal half [
,
]. The predicted secondary structure of the domain includes three major alpha helices connected by beta turns, with the WWW motif predicted to form a beta sheet [
,
].
Ribosomal protein L23/L15e core domain superfamily
Type:
Homologous_superfamily
Description:
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. Both the L23 and L15e ribosomal proteins have a core domain consisting of a β-(α)-β-α-β(2) structure folded into three layers, α/β/α, where the β-sheets are antiparallel.
A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities []. One of these families consists of mammalian L15, insect L15, plant L15, yeast YL10 (L13) (Rp15r) and archaebacterial L15e. These ribosomal proteins have a structure corresponding to a 3-layer (α-β-α) sandwich.
Ribosomal protein S2, flavodoxin-like domain superfamily
Type:
Homologous_superfamily
Description:
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. Ribosomal S2 proteins have been shown to belong to a family that includes 40S ribosomal subunit 40kDa proteins, putative laminin-binding proteins, NAB-1 protein and 29.3kDa protein from Haloarcula marismortui [
,
]. The laminin-receptor proteins are thus predicted to be the eukaryotic homologues of the eubacterial S2 ribosomal proteins []. Ribosomal protein S2 (RPS2) is involved in formation of the translation initiation complex, where it might contact the messenger RNA and several components of the ribosome. It has been shown that in Escherichia coli RPS2 is essential for the binding of ribosomal protein S1 to the 30S ribosomal subunit. In humans, most likely in all vertebrates, and perhaps in all metazoans, the protein also functions as the 67kDa laminin receptor (LAMR1 or 67LR), which is formed from a 37kDa precursor, and is overexpressed in many tumors. 67LR is a cell surface receptor which interacts with a variety of ligands, including laminin-1. It is assumed that the ligand interactions are mediated via the conserved C terminus, which becomes extracellular as the protein undergoes conformational changes which are not well understood. Specifically, a conserved palindromic motif, LMWWML, may participate in the interactions. 67LR plays essential roles in the adhesion of cells to the basement membrane and subsequent signalling events, and has been linked to several diseases. Some evidence also suggests that the precursor of 67LR, 37LRP, is also present in the nucleus in animals, where it appears associated with histones [,
,
,
,
,
,
,
,
,
,
,
,
,
]. This entry represents a flavodoxin-like domain superfamily found in ribosomal protein S2.
Hsp70 chaperones are heat shock proteins that help to fold many proteins. Hsp70-assisted folding involves repeated cycles of substrate binding and release. Hsp70 activity is ATP dependent. Hsp70 proteins are made up of two regions: the amino terminus is the ATPase domain and the carboxyl terminus is the substrate binding region [
]. Hsp70 proteins have an average molecular weight of 70kDa [
,
,
]. In most species, there are many proteins that belong to the hsp70 family. Some of these are only expressed under stress conditions (strictly inducible), while some are present in cells under normal growth conditions and are not heat-inducible (constitutive or cognate) [,
]. Hsp70 proteins can be found in different cellular compartments (nuclear, cytosolic, mitochondrial, endoplasmic reticulum, for example). This entry represents three conserved sites of the heat shock 70 protein family.
A number of eukaryotic and archaeal ribosomal proteins can be grouped on the basis of sequence similarities. One of these families includes yeast S7 (YS6); archaeal S4e; and mammalian and plant cytoplasmic S4 []. Two highly similar isoforms of mammalian S4 exist, one coded by a gene on chromosome Y, and the other on chromosome X. These proteins have 233 to 264 amino acids. This entry represents the conserved site of the ribosomal protein S4e that is found at the N-terminal region.
GTP-binding protein TrmE/Aminomethyltransferase GcvT, domain 1
Type:
Homologous_superfamily
Description:
This entry represents an alpha/beta domain found in GTP-binding protein TrmE (N-terminal domain) [
] and domain 1 of glycine cleavage T-protein (also known as aminomethyltransferase) []. TrmE is a guanine nucleotide-binding protein conserved between bacteria and eukaryotes. It is involved in the modification of uridine bases at the first anticodon (wobble) position of tRNAs. The N-terminal portion of the protein is necessary for mediating dimer formation within the protein [
]. Glycine cleavage T-protein is part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes [
]. GCV catalyses the oxidative decarboxylation of glycine. The T-protein is an aminomethyl transferase. The N-terminal region (residues 14-35) of domain 1 plays a crucial role in H-protein interaction [
].
In eukaryotes this family is predicted to play a role in protein secretion and Golgi organisation [
]. In plants this family includes , which is involved in water permeability in the cuticles of fruit [
]. has been found to be expressed during early embryogenesis in mice [
]. This protein contains a conserved NRDE motif. This gene has been characterised in Drosophila melanogaster and named as transport and Golgi organisation 2, hence the name Tango2.
Putative pyruvate, phosphate dikinase regulatory protein
Type:
Family
Description:
This is a family of proteins which are putative bifunctional serine/threonine kinase/phosphorylases involved in the regulation of the pyruvate, phosphate dikinase (PPDK) by catalysing its phosphorylation/dephosphorylation [
,
]. In plants, the pyruvate, phosphate dikinase regulatory protein 1 (RP1) is a bifunctional serine/threonine kinase and phosphorylase involved in the dark/light-mediated regulation of PPDK by catalysing its phosphorylation/dephosphorylation. In the dark, RP1 phosphorylates the catalytic intermediate of PPDK (PPDK-HisP), inactivating it. Light exposure induces the phosphorolysis reaction that reactivates PPDK [
,
,
,
].
This family consists of several plant proteins of unknown function, including pathogen-associated molecular patterns-induced protein A70 from Arabidopsis thaliana, which is induced during Pseudomonas syringae infection by jasmonic acid and wounding [
].
RER1 family proteins are involved in the retrieval of some endoplasmic reticulum membrane proteins from the early Golgi compartment. The C terminus of yeast Rer1p interacts with a coatomer complex [].
The hsp70 chaperone machine performs many diverse roles in the cell, including folding of nascent proteins, translocation of polypeptides across organelle membranes, coordinating responses to stress, and targeting selected proteins for degradation. DnaJ is a member of the hsp40 family of molecular chaperones, which is also called the J-protein family, the members of which regulate the activity of hsp70s. DnaJ (hsp40) binds to DnaK (hsp70) and stimulates its ATPase activity, generating the ADP-bound state of DnaK, which interacts stably with the polypeptide substrate [
]. Besides stimulating the ATPase activity of DnaK through its J-domain, DnaJ also associates with unfolded polypeptide chains and prevents their aggregation []. DnaJ consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acid residues, a glycine and phenylalanine-rich domain ('G/F' domain), a central cysteine-rich domain (CR-type zinc finger) containing four repeats of a CXXCXGXG motif which can coordinate two zinc atoms, and a C-terminal domain (CTD) [
]. This entry represents the central cysteine-rich (CR) domain of DnaJ proteins. This central cysteine-rich domain (CR-type zinc finger) has an overall V-shaped extended β-hairpin topology and contains four repeats of the motif CXXCXGXG where X is any amino acid. The isolated cysteine-rich domain folds in a zinc-dependent fashion. Each set of two repeats binds one unit of zinc. Although this domain has been implicated in substrate binding, no evidence of specific interaction between the isolated DnaJ cysteine-rich domain and various hydrophobic peptides has been found [
].
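The CXXCXGXG repeat described above can be located with a simple pattern search. The following is a minimal sketch only: the function name and the toy sequence are hypothetical and are not taken from any real DnaJ protein.

```python
import re

# CXXCXGXG, where X is any amino acid (one-letter codes).
CR_REPEAT = re.compile(r"C..C.G.G")


def find_cr_repeats(sequence):
    """Return 0-based start positions of non-overlapping CXXCXGXG matches."""
    return [m.start() for m in CR_REPEAT.finditer(sequence.upper())]


# Hypothetical toy sequence: four repeats separated by alanine spacers.
repeats = ["CPTCHGSG", "CDKCNGKG", "CEHCQGRG", "CRTCGGTG"]
toy = "AA" + "AAA".join(repeats) + "AA"

if __name__ == "__main__":
    print(find_cr_repeats(toy))
```

A sequence with four such hits is consistent with the CR-type zinc finger described above, but a pattern match alone says nothing about zinc binding or folding.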
Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha (
), beta (
) and gamma (
) [
]. G proteins act as signal transducers, relaying a signal from a ligand-activated GPCR (G protein-coupled receptor) to an enzyme or ion channel effector. The activated GPCR promotes the exchange of GDP for GTP on the G protein alpha subunit, allowing the trimeric G protein to be released from the receptor and to dissociate into active (GTP-bound) alpha subunit and beta/gamma dimer, both of which activate distinct downstream effectors. There are several isoforms of each subunit, which together can make up hundreds of combinations of G proteins, each one linking a specific receptor to a certain effector. The heterotrimeric G protein alpha subunit is composed of two domains: a GTP-binding domain and a helical insertion domain. The GTP-binding domain is homologous to Ras-like small GTPases, and includes switch regions I and II, which change conformation during activation. The helical insertion domain is inserted into the GTP-binding domain before switch region I, and is unique to heterotrimeric G proteins. This helical insertion domain functions to sequester the guanine nucleotide at the interface with the GTP-binding domain and must be displaced to enable nucleotide dissociation []. This superfamily represents the G protein alpha subunit helical insertion domain.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. The proteins in this entry belong to the zinc-binding subfamily of ribosomal protein S14. They bind 1 zinc ion per subunit and bind to the 16S rRNA. S14 is required for the assembly of 30S particles and may also be responsible for determining the conformation of the 16S rRNA at the A site.
This entry represents the middle domain of minichromosome loss protein 1, which lies between a 7-bladed β-propeller at the N terminus and a Homeobox (HMG) domain at the C terminus. The full length proteins with all three domains are referred to as DNA polymerase alpha accessory factor Mcl1, but the exact function of this domain is not known.
Mitogen-activated protein (MAP) kinase, conserved site
Type:
Conserved_site
Description:
Protein phosphorylation, which plays a key role in most cellular activities, is a reversible process mediated by protein kinases and phosphoprotein phosphatases. Protein kinases catalyse the transfer of the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. Phosphoprotein phosphatases catalyse the reverse process.
Protein kinases fall into three broad classes, characterised with respect to substrate specificity []: serine/threonine-protein kinases, tyrosine-protein kinases, and dual specificity protein kinases (e.g. MEK, which phosphorylates both Thr and Tyr on target proteins). Protein kinase function is evolutionarily conserved from Escherichia coli to human [
]. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation []. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins. The catalytic subunits of protein kinases are highly conserved, and several structures have been solved [], leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases []. Eukaryotic serine-threonine mitogen-activated protein (MAP) kinases are key regulators of cellular signal transduction systems and are conserved from Saccharomyces cerevisiae (Baker's yeast) to human beings. MAPK pathways are signalling cascades differentially regulated by growth factors, mitogens, hormones and stress which mediate cell growth, differentiation and survival. MAPK activity is regulated through a (usually) three-tiered cascade composed of a MAPK, a MAPK kinase (MAPKK, MEK) and a MAPK kinase kinase (MAPKKK, MEKK). Substrates for the MAPKs include other kinases and transcription factors [
]. Mammals express at least four distinctly related groups of MAPKs, extracellularly-regulated kinases (ERKs), c-jun N-terminal kinases (JNKs), p38 proteins and ERK5. Plant MAPK pathways have attracted increasing interest, resulting in the isolation of a large number of different components of MAPK cascades. MAPKs play important roles in the signalling of most plant hormones and in developmental processes [
]. In the budding yeast S. cerevisiae, four separate but structurally related mitogen-activated protein kinase (MAPK) activation pathways are known, regulating mating, cell integrity and osmolarity []. Enzymes in this family are characterised by two domains separated by a deep channel where potential substrates might bind. The N-terminal domain creates a binding pocket for the adenine ring of ATP, and the C-terminal domain contains the catalytic base, magnesium binding sites and phosphorylation lip [
]. Almost all MAPKs possess a conserved TXY motif in which both the threonine and tyrosine residues are phosphorylated during activation of the enzyme by upstream dual-specificity MAP kinase kinases (MAPKKs).
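As an illustration of the TXY motif mentioned above, the sketch below scans a sequence for threonine-any-tyrosine tripeptides. The function name and the short test fragment are illustrative assumptions; because the motif is only three residues long, chance matches are expected in real sequences, so a hit is not by itself evidence of an activation loop.

```python
import re

# T-X-Y, where X is any residue. The lookahead lets overlapping hits be reported.
TXY = re.compile(r"(?=(T.Y))")


def find_txy(sequence):
    """Return (0-based position, tripeptide) for every TXY occurrence."""
    return [(m.start(), m.group(1)) for m in TXY.finditer(sequence.upper())]


if __name__ == "__main__":
    # Short illustrative fragment containing a TEY tripeptide.
    print(find_txy("GFLTEYVATR"))
```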
In bacteria, FtsZ [
,
,
,
] is an essential cell division protein involved in the initiation of this event. It assembles into a cytokinetic ring on the inner surface of the cytoplasmic membrane at the place where division will occur. The ring serves as a scaffold that is disassembled when septation is completed. FtsZ ring formation is initiated at a single site on one side of the bacterium and appears to grow bidirectionally. In Escherichia coli, MinC and MinD, encoded by the MinB locus, form a complex (MinCD) which appears to block the formation of FtsZ rings at the cell poles, at the ancient mid cell division sites, whilst MinE, encoded at the same locus, specifically prevents the action of MinCD at mid cell.
FtsZ is a GTP binding protein with a GTPase activity. It undergoes GTP-dependent polymerisation into filaments (or tubules) that seem to form a cytoskeleton involved in septum synthesis. The structure and the properties of FtsZ clearly provide it with the capacity for the cytoskeletal, perhaps motor role, necessary for "contraction" along the division plane. In addition, however, the FtsZ ring structure provides the framework for the recruitment or assembly of the ten or so membrane and cytoplasmic proteins, uniquely required for cell division in E. coli or Bacillus subtilis, some of which are required for biogenesis of the new hemispherical poles of the two daughter cells. FtsZ can polymerise into various structures, for example a single linear polymer of FtsZ monomers, called a protofilament. Protofilaments can associate laterally to form pairs (sometimes called thick filaments), bundles (ill-defined linear associations of multiple protofilaments) or thick filaments, sheets (parallel or anti-parallel two-dimensional associations of thick filaments) and tubes (anti-parallel associations of thick filaments in a circular fashion to form a tubular structure). In addition, small circles of FtsZ monomers (a short protofilament bent around to join itself, apparently head to tail) have been observed and termed mini-rings. FtsZ is a protein of about 400 residues which is well conserved across bacterial species and which is also present in the chloroplast of plants [
] as well as in archaebacteria []. FtsZ is a homologue of eukaryotic tubulin with which it shows structural similarity.
Cytochrome b561 and DOMON domain-containing protein
Type:
Family
Description:
This entry represents a group of DOMON domain (named after DOpamine beta-MOnooxygenase N-terminal domain) containing proteins from plants. They also contain a cytochrome b561 domain C-terminal to the DOMON domain. DOMON domain could bind catecholamines and thereby could regulate the cytochrome b561 domain function. Proteins in this family may act as a catecholamine-responsive trans-membrane electron transporter [
].
Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication [
]. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base []. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch []. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level []. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA. MutS is a modular protein with a complex structure [
], and is composed of: N-terminal mismatch-recognition domain, which is similar in structure to tRNA endonuclease. Connector domain, which is similar in structure to Holliday junction resolvase ruvC. Core domain, which is composed of two separate subdomains that join together to form a helical bundle; from within the core domain, two helices act as levers that extend towards (but do not touch) the DNA. Clamp domain, which is inserted between the two subdomains of the core domain at the top of the lever helices; the clamp domain has a β-sheet structure. ATPase domain (connected to the core domain), which has a classical Walker A motif. HTH (helix-turn-helix) domain, which is involved in dimer contacts. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair. Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. Human MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein [
]. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions []. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts []. This entry represents the core domain (domain 3) found in proteins of the MutS family. The core domain of MutS adopts a multi-helical structure comprised of two subdomains, which are interrupted by the clamp domain. Two of the helices in the core domain comprise the levers that extend towards the DNA. This domain is found associated with Pfam:PF00488, Pfam:PF05188, Pfam:PF01624 and Pfam:PF05190. The aligned region corresponds with domain III, which is central to the structure of Thermus aquaticus MutS as characterised in [
].
Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication [
]. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base []. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch []. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level []. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA. MutS is a modular protein with a complex structure [
], and is composed of: N-terminal mismatch-recognition domain, which is similar in structure to tRNA endonuclease. Connector domain, which is similar in structure to Holliday junction resolvase ruvC. Core domain, which is composed of two separate subdomains that join together to form a helical bundle; from within the core domain, two helices act as levers that extend towards (but do not touch) the DNA. Clamp domain, which is inserted between the two subdomains of the core domain at the top of the lever helices; the clamp domain has a β-sheet structure. ATPase domain (connected to the core domain), which has a classical Walker A motif. HTH (helix-turn-helix) domain, which is involved in dimer contacts. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair. Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. Human MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein [
]. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions []. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts [
]. This entry represents the C-terminal domain found in proteins in the MutS family of DNA mismatch repair proteins. The C-terminal region of MutS is comprised of the ATPase domain and the HTH (helix-turn-helix) domain, the latter being involved in dimer contacts. Yeast MSH3 [
], bacterial proteins involved in DNA mismatch repair, and the predicted protein product of the Rep-3 gene of mouse share extensive sequence similarity. Human MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein.
Bile acid:sodium symporter/arsenical resistance protein Acr3
Type:
Family
Description:
This family of proteins is found in both prokaryotes and eukaryotes. They are related to the human bile acid:sodium symporters (TC 2.A.28), which are transmembrane proteins functioning in the liver in the uptake of bile acids from portal blood plasma, a process mediated by the co-transport of Na+ [
]. This entry also includes members of the ACR3 family of arsenite (As(III)) permeases, which confer resistance to arsenic by extrusion from cells [
]. They exist in prokaryotes and eukaryotes (lower plants and fungi) [,
]. The ACR3 permeases have a ten-transmembrane-span topology []. Corynebacterium glutamicum has three Acr3 proteins, CgAcr3-1, CgAcr3-2, and CgAcr3-3. CgAcr3-1 is thought to be an antiporter that catalyses arsenite-proton exchange []. The Shewanella oneidensis Acr3 is not able to transport As(III) and confers resistance only to arsenate (As(V)) [
], whereas the Acr3 orthologue from Synechocystis mediates tolerance to As(III), As(V) and antimonite (Sb(III)) []. In budding yeast, overexpression of the Acr3 gene confers an arsenite- but not an arsenate-resistance phenotype [
]. Saccharomyces cerevisiae Acr3 is a plasma membrane metalloid/H+ antiporter that transports arsenite and antimonite [].
This entry represents the START domain found in polyketide cyclases/dehydrases such as TcmN [
] and in coenzyme Q-binding protein COQ10 []. COQ10 is required for the function of coenzyme Q in the respiratory chain and has a steroidogenic acute regulatory protein-related lipid transfer (START) domain, known to bind specific lipids in other START domain family members [,
].
Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication [
]. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base []. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch []. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level []. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA.
MutS is a modular protein with a complex structure [
], and is composed of: N-terminal mismatch-recognition domain, which is similar in structure to tRNA endonuclease. Connector domain, which is similar in structure to Holliday junction resolvase ruvC. Core domain, which is composed of two separate subdomains that join together to form a helical bundle; from within the core domain, two helices act as levers that extend towards (but do not touch) the DNA. Clamp domain, which is inserted between the two subdomains of the core domain at the top of the lever helices; the clamp domain has a β-sheet structure. ATPase domain (connected to the core domain), which has a classical Walker A motif. HTH (helix-turn-helix) domain, which is involved in dimer contacts. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair. Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. Human MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein [
]. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions []. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts []. This entry represents the clamp domain (domain 4) found in proteins of the MutS family. The clamp domain is inserted within the core domain at the top of the lever helices. It has a β-sheet structure [
].
Protein kinase C-like, phorbol ester/diacylglycerol-binding domain
Type:
Domain
Description:
Diacylglycerol (DAG) is an important second messenger. Phorbol esters (PE) are analogues of DAG and potent tumour promoters that cause a variety of physiological changes when administered to both cells and tissues. DAG activates a family of serine/threonine protein kinases, collectively known as protein kinase C (PKC) [
]. Phorbol esters can directly stimulate PKC. The N-terminal region of PKC, known as C1, has been shown [] to bind PE and DAG in a phospholipid and zinc-dependent fashion. The C1 region contains one or two copies (depending on the isozyme of PKC) of a cysteine-rich domain, which is about 50 amino-acid residues long, and which is essential for DAG/PE-binding. The DAG/PE-binding domain binds two zinc ions; the ligands of these metal ions are probably the six cysteines and two histidines that are conserved in this domain.
DNA recombination/repair protein RecA, conserved site
Type:
Conserved_site
Description:
The recA gene product is a multifunctional enzyme that plays a role in homologous recombination, DNA repair and induction of the SOS response [
]. In homologous recombination, the protein functions as a DNA-dependent ATPase, promoting synapsis, heteroduplex formation and strand exchange between homologous DNAs []. RecA also acts as a protease cofactor that promotes autodigestion of the lexA product and phage repressors. The proteolytic inactivation of the lexA repressor by an activated form of recA may cause a derepression of the 20 or so genes involved in the SOS response, which regulates DNA repair, induced mutagenesis, delayed cell division and prophage induction in response to DNA damage []. RecA is a protein of about 350 amino acid residues. Its sequence is very well conserved [
,
,
] among eubacterial species. It is also found in the chloroplast of plants []. RecA-like proteins are found in archaea and diverse eukaryotic organisms, like fission yeast, mouse or human. In the filament visualised by X-ray crystallography, β-strand 3, the loop C-terminal to β-strand 2, and α-helix D of the core domain form one surface that packs against α-helix A and β-strand 0 (the N-terminal domain) of an adjacent monomer during polymerisation []. The core ATP-binding site domain is well conserved, with 14 invariant residues. It contains the nucleotide binding loop between β-strand 1 and α-helix C. The Escherichia coli sequence GPESSGKT matches the consensus sequence of amino acids (G/A)XXXXGK(T/S) for the Walker A box (also referred to as the P-loop) found in a number of nucleoside triphosphate (NTP)-binding proteins. Another nucleotide binding motif, the Walker B box, is found at β-strand 4 in the RecA structure. The Walker B box is characterised by four hydrophobic amino acids followed by an acidic residue (usually aspartate). Nucleotide specificity and additional ATP-binding interactions are contributed by the amino acid residues at β-strand 2 and the loop C-terminal to that strand, all of which are greater than 90% conserved among bacterial RecA proteins. This signature pattern is specific for the bacterial and chloroplastic RecA protein, and covers the most conserved region within these proteins, namely a nonapeptide located in the middle of the sequence and which is part of the monomer-monomer interface in a recA filament.
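The Walker A consensus (G/A)XXXXGK(T/S) and the Walker B description quoted above translate directly into patterns. The sketch below is an illustration under stated assumptions: the set of residues counted as "hydrophobic" for the Walker B box is an assumption made here (the text does not list them), and the function names are hypothetical.

```python
import re

# Walker A (P-loop): (G/A)XXXXGK(T/S), X = any residue.
WALKER_A = re.compile(r"[GA].{4}GK[TS]")

# Walker B: four hydrophobic residues followed by an acidic residue (D or E).
# ASSUMPTION: this particular hydrophobic residue set is chosen for illustration.
HYDROPHOBIC = "AVILMFWC"
WALKER_B = re.compile("[" + HYDROPHOBIC + "]{4}[DE]")


def matches_walker_a(peptide):
    """True if the whole peptide fits the Walker A consensus."""
    return WALKER_A.fullmatch(peptide.upper()) is not None


def find_walker_b(sequence):
    """Return 0-based start positions of Walker B-like stretches."""
    return [m.start() for m in WALKER_B.finditer(sequence.upper())]


if __name__ == "__main__":
    # E. coli RecA P-loop sequence quoted in the text.
    print(matches_walker_a("GPESSGKT"))
    # Hypothetical toy sequence for the Walker B pattern.
    print(find_walker_b("GGVIVVDSGG"))
```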
The recA gene product is a multifunctional enzyme that plays a role in homologous recombination, DNA repair and induction of the SOS response [
]. In homologous recombination, the protein functions as a DNA-dependent ATPase, promoting synapsis, heteroduplex formation and strand exchange between homologous DNAs []. RecA also acts as a protease cofactor that promotes autodigestion of the lexA product and phage repressors. The proteolytic inactivation of the lexA repressor by an activated form of recA may cause a derepression of the 20 or so genes involved in the SOS response, which regulates DNA repair, induced mutagenesis, delayed cell division and prophage induction in response to DNA damage []. RecA is a protein of about 350 amino-acid residues. Its sequence is very well conserved [
,
,
] among eubacterial species. It is also found in the chloroplast of plants []. RecA-like proteins are found in archaea and diverse eukaryotic organisms, like fission yeast, mouse or human. In the filament visualised by X-ray crystallography, β-strand 3, the loop C-terminal to β-strand 2, and α-helix D of the core domain form one surface that packs against α-helix A and β-strand 0 (the N-terminal domain) of an adjacent monomer during polymerisation []. The core ATP-binding site domain is well conserved, with 14 invariant residues. It contains the nucleotide binding loop between β-strand 1 and α-helix C. The Escherichia coli sequence GPESSGKT matches the consensus sequence of amino acids (G/A)XXXXGK(T/S) for the Walker A box (also referred to as the P-loop) found in a number of nucleoside triphosphate (NTP)-binding proteins. Another nucleotide binding motif, the Walker B box, is found at β-strand 4 in the RecA structure. The Walker B box is characterised by four hydrophobic amino acids followed by an acidic residue (usually aspartate). Nucleotide specificity and additional ATP-binding interactions are contributed by the amino acid residues at β-strand 2 and the loop C-terminal to that strand, all of which are greater than 90% conserved among bacterial RecA proteins.
Uncharacterised protein family UPF0758, conserved site
Type:
Conserved_site
Description:
UPF0758 was previously known as the radC family. The name was assigned according to the radC102 mutant of E. coli which was later demonstrated to be an allele of the transcription-repair-coupling factor recG [
,
]. It has been described as a putative JAMM-family deubiquitinating enzyme, but its function remains to be determined [].
This protein family is found in bacteria, including Protein YciF from Escherichia coli. This protein is produced by bacteria in response to stress conditions. It adopts a dimeric configuration in which each monomer shows five α-helices. Its function is still unknown [
].
Phosphotransferase system, IIA-like nitrogen-regulatory protein PtsN
Type:
Family
Description:
The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) [
,
] is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates coupled with their translocation across the cell membrane, making the PTS a link between the uptake and metabolism of sugars. The general mechanism of the PTS is the following: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred, via a signal transduction pathway, to enzyme I (EI), which in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme II (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains IIA, IIB and IIC [
]. These can either be fused together in a single polypeptide chain or exist as two or three interactive chains, formerly called enzymes II (EII) and III (EIII). The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine which is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with the sugar uptake processed by the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site. An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose [
,
,
,
]. These sequences, which are about 160 residues in length, are closely related to the fructose-specific phosphotransferase (PTS) system IIA component. It is a regulatory protein found only in species with a phosphoenolpyruvate-protein phosphotransferase (enzyme I of PTS systems) and an HPr-like phosphocarrier protein, but not all species have a IIC-like permease. Members of this family are found in Proteobacteria, Chlamydia, and the spirochete Treponema pallidum [
,
].
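The order of phosphoryl transfer in the general PTS mechanism described above (PEP to EI, to HPr, to IIA, to IIB, and finally to the sugar) can be written out as a simple ordered relay. This is only a schematic sketch of the sequence of carriers; the names `PTS_RELAY` and `relay_steps` are hypothetical, and no kinetics or regulation are modelled.

```python
# Phosphoryl carriers in the order given in the text.
PTS_RELAY = ("PEP", "EI", "HPr", "IIA", "IIB", "sugar")


def relay_steps(chain=PTS_RELAY):
    """Return (donor, acceptor) pairs for each phosphoryl transfer step."""
    return list(zip(chain, chain[1:]))


if __name__ == "__main__":
    for donor, acceptor in relay_steps():
        print(donor, "->", acceptor)
```

The IIC domain is deliberately absent from the relay: according to the text it forms the translocation channel and substrate-binding site rather than carrying the phosphoryl group.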
tRNA threonylcarbamoyl adenosine modification protein TsaE
Type:
Family
Description:
Members of this family have a conserved nucleotide-binding motif GXXGXGKT and a nucleotide-binding fold. Member protein YjeE of Haemophilus influenzae (HI0065) was shown to have ATPase activity [,
]. The protein has a nucleotide-binding fold with a four-stranded parallel β-sheet flanked by antiparallel β-strands on each side. The topology of the β-sheet is unique among P-loop proteins and has features of different families of enzymes. ADP has been shown to bind to the P-loop in the presence of Mg2+. Recently YjeE has been renamed as TsaE [
]. It is responsible for tRNA threonylcarbamoyl adenosine modification together with proteins TsaB, TsaC, and TsaD, and probably forms a complex with them [,
].
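The nucleotide-binding motif GXXGXGKT named above can be located in a sequence with a simple pattern search. The following is a minimal sketch, assuming that X stands for any residue; the function name find_p_loop_motif and the toy peptide are hypothetical and are not taken from any database entry.

```python
import re

# Motif from the entry text: G-X-X-G-X-G-K-T.
# Assumption of this sketch: X matches any single residue.
# A lookahead is used so that overlapping occurrences are also reported.
P_LOOP_PATTERN = re.compile(r"(?=(G..G.GKT))")


def find_p_loop_motif(sequence):
    """Return (1-based start position, matched residues) for each occurrence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in P_LOOP_PATTERN.finditer(seq)]


# Made-up peptide for illustration only; not a real TsaE sequence.
toy_peptide = "MSLLEGDLGAGKTTFVRAL"
print(find_p_loop_motif(toy_peptide))  # [(6, 'GDLGAGKT')]
```

A match only shows that the sequence pattern is present; it does not by itself establish family membership or ATPase activity.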
Methyl-accepting chemotaxis protein (MCP) signalling domain
Type:
Domain
Description:
Methyl-accepting chemotaxis proteins (MCPs) are a family of bacterial receptors that mediate chemotaxis to diverse signals, responding to changes in the concentration of attractants and repellents in the environment by altering swimming behaviour [
]. Environmental diversity gives rise to diversity in bacterial signalling receptors, and consequently there are many genes encoding MCPs []. For example, there are four well-characterised MCPs found in Escherichia coli: Tar (taxis towards aspartate and maltose, away from nickel and cobalt), Tsr (taxis towards serine, away from leucine, indole and weak acids), Trg (taxis towards galactose and ribose) and Tap (taxis towards dipeptides). MCPs share similar topology and signalling mechanisms. MCPs either bind ligands directly or interact with ligand-binding proteins, transducing the signal to downstream signalling proteins in the cytoplasm. MCPs undergo two covalent modifications: deamidation and reversible methylation at a number of glutamate residues. Attractants increase the level of methylation, while repellents decrease it. The methyl groups are added by the methyltransferase CheR and are removed by the methylesterase CheB. Most MCPs are homodimers with the following organisation: an N-terminal signal sequence that acts as a transmembrane domain in the mature protein; a poorly-conserved periplasmic receptor (ligand-binding) domain; a second transmembrane domain; and a highly-conserved C-terminal cytoplasmic domain that interacts with downstream signalling components. The C-terminal domain contains the methylated glutamate residues. This entry represents the signalling domain found in several methyl-accepting chemotaxis proteins. This domain is thought to transduce the signal to CheA since it is highly conserved in very diverse MCPs.
Bacterial high-affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria they are membrane-anchored lipoproteins [
,
]. On the basis of sequence similarities, the vast majority of these solute-binding proteins can be grouped [
] into eight families or clusters, which generally correlate with the nature of the solute bound. This entry represents a conserved site found in the extracellular solute-binding protein family 3 members from Gram-positive bacteria, Gram-negative bacteria and archaea. Family 3 members include:
Histidine-binding protein (gene hisJ) of Escherichia coli and related bacteria. A homologous lipoprotein exists in Neisseria gonorrhoeae.
Lysine/arginine/ornithine-binding proteins (LAO) (gene argT) of Escherichia coli and related bacteria, which are involved in the same transport system as hisJ. Both solute-binding proteins interact with a common membrane-bound receptor hisP of the binding protein dependent transport system HisQMP.
Glutamine-binding proteins (gene glnH) of Escherichia coli and Bacillus stearothermophilus.
Glutamate-binding protein (gene gluB) of Corynebacterium glutamicum.
Arginine-binding proteins artI and artJ of Escherichia coli.
Nopaline-binding protein (gene nocT) from Agrobacterium tumefaciens.
Octopine-binding protein (gene occT) from Agrobacterium tumefaciens.
Major cell-binding factor (CBF1) (gene peb1A) from Campylobacter jejuni.
Bacteroides nodosus protein aabA.
Cyclohexadienyl/arogenate dehydratase of Pseudomonas aeruginosa, a periplasmic enzyme which forms an alternative pathway for phenylalanine biosynthesis.
Escherichia coli protein fliY.
Vibrio harveyi protein patH.
Escherichia coli hypothetical protein ydhW.
Bacillus subtilis hypothetical protein yckB.
Bacillus subtilis hypothetical protein yckK.
Phosphate ABC transporter, substrate-binding protein PstS
Type:
Family
Description:
PstS is the substrate-binding component of the ABC-type transporter complex pstSACB, involved in phosphate import. This protein is accompanied, generally in the same operon, by an ATP binding protein (pstB) and two permease proteins (pstC and pstA). The accumulation of this protein is enhanced under phosphate starvation [
,
]. Bacterial high-affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria they are membrane-anchored lipoproteins [
,
].
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and are therefore made up of four domains: two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature conserved motif (LSGGQ), specific to the ABC transporter; and the Q-motif, or Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [
,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It seems to be required only for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
]. These proteins are involved in the transmembrane transport of sulphate and thiosulphate. They are part of the ABC transporter complex cysAWTP and are responsible for energy coupling to the transport system. In Escherichia coli, the complex is composed of two ATP-binding proteins (cysA), two transmembrane proteins (cysT and cysW), and a solute-binding protein (cysP) [
].
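The conserved motifs of the ABC module described above can be searched for in a sequence. The following is a minimal sketch: the signature motif LSGGQ comes from the entry text, whereas the Walker A pattern G-X(4)-G-K-[ST] is a commonly quoted consensus that is assumed here and is not stated in the entry. The function name scan_abc_motifs and the toy sequence are hypothetical.

```python
import re

# Patterns used by this sketch.
# "signature" is the LSGGQ motif named in the entry text.
# "walker_a" is an ASSUMED consensus (G-X(4)-G-K-[ST]); the entry itself
# only describes Walker A as a phosphate-binding loop.
ABC_MOTIFS = {
    "walker_a": r"G.{4}GK[ST]",
    "signature": r"LSGGQ",
}


def scan_abc_motifs(sequence):
    """Return a dict mapping motif name to a list of 1-based start positions."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in ABC_MOTIFS.items():
        # The lookahead lets overlapping occurrences be reported as well.
        lookahead = re.compile("(?=(?:" + pattern + "))")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(seq)]
    return hits


# Made-up sequence for illustration only; not a real ABC protein.
toy_sequence = "MKRGPSGSGKSTLLAAAALSGGQQQRVA"
print(scan_abc_motifs(toy_sequence))  # {'walker_a': [4], 'signature': [19]}
```

Real ABC proteins often carry variants of these motifs, so exact-string matching of this kind is only a first-pass filter; profile-based methods are the usual choice for family assignment.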
Outer membrane protein OmpA-like, transmembrane domain
Type:
Domain
Description:
The OmpA-like transmembrane domain is present in a number of different outer membrane proteins of several Gram-negative bacteria. Many of the proteins that have this domain at the N terminus also have the conserved bacterial outer membrane protein domain
at the C terminus. The outer membrane protein A of Escherichia coli (OmpA) is one of the most studied proteins in this group [
]. It has a multifunctional role. OmpA is required for the action of colicins K and L and for the stabilisation of mating aggregates in conjugation. It also serves as a receptor for a number of T-even-like phages and can act as a porin with low permeability that allows slow penetration of small solutes []. OmpA consists of a regular, extended eight-stranded β-barrel and appears to be constructed like an inverse micelle with large water-filled cavities, but does not form a pore. The cavities seem to be highly conserved during evolution. The structure corroborates the concept that all outer membrane proteins consist of β-barrels [
]. The β-barrel membrane anchor appears to be the outer membrane equivalent of the single-chain α-helix anchor of the inner membrane.
Urea ABC transporter, substrate-binding protein UrtA-like
Type:
Family
Description:
This entry consists of ABC transporter substrate-binding proteins associated with urea transport and metabolism. This family includes UrtA from Cyanobacteria, which is encoded in an operon typically found adjacent to urease genes. Its disruption was shown to lead to the loss of high-affinity urea transport activity [
]. It also includes the periplasmic component (FmdD) of an active transport system for short-chain amides and urea (FmdDEF), found in Methylophilus methylotrophus []. These proteins tend to have the twin-arginine signal for Sec-independent transport across the plasma membrane.
Outer membrane protein assembly factor BamD is part of the outer membrane protein assembly Bam complex (composed of the outer membrane protein BamA, and four lipoproteins BamB, BamC, BamD and BamE), which is involved in assembly and insertion of β-barrel proteins into the outer membrane [,
,
,
,
]. In E. coli, the N-terminal region of BamD may interact with various proteins as a chaperone to assist in the folding and insertion of proteins into the outer membrane. The C-terminal region of BamD may serve as the link between BamA, BamC and BamE [].
The bacterial protein motA [
] is required for the rotation of the flagellar motor. This protein probably forms a transmembrane proton channel used to energise the flagellar rotary motor. The motA protein is an integral membrane protein that contains four transmembrane domains.