BAR domains are dimerization, lipid-binding and curvature-sensing modules found in many different proteins with diverse functions. This entry represents the BAR domain found in fungal proteins with similarity to Saccharomyces cerevisiae Reduced viability upon starvation protein 161 (Rvs161p) and Schizosaccharomyces pombe Hob3 (homologue of Bin3). S. cerevisiae Rvs161p plays a role in regulating cell polarity, actin cytoskeleton polarization, vesicle trafficking, endocytosis, bud formation, and the mating response. It forms a heterodimer with another BAR domain protein, Rvs167p. Rvs161p and Rvs167p share common functions but are not interchangeable: their BAR domains cannot be replaced with each other, and overexpression of one cannot suppress the mutant phenotypes of the other. S. pombe Hob3 is important in regulating filamentous actin localization and may be required for activating Cdc42 and recruiting it to cell division sites. BAR domains form dimers that bind to membranes, induce membrane bending and curvature, and may also be involved in protein-protein interactions.
DAZ (Deleted in Azoospermia) proteins are found almost exclusively in germ cells in distantly related animal species. Deletion or mutation of their encoding genes usually impairs oogenesis, spermatogenesis, or both. The family includes three members, Boule (or Boll), Dazl (or Dazla) and DAZ, all encoding RNA-binding proteins. Boule is the ancestral gene and is conserved from flies to humans, whereas Dazl arose in the early vertebrate lineage and DAZ arrived on the Y chromosome during primate evolution. DAZ family proteins have been proposed to function as adaptors for target mRNA transport and as activators of target mRNA translation. These proteins have a highly conserved RNA recognition motif (RRM) for binding target mRNAs and one (Boule and Dazl) or multiple (DAZ) copies of the DAZ domain, a 24-amino-acid motif rich in Asn, Tyr, and Gln residues. The function of the DAZ domain is not known, but it may be involved in protein-protein interactions. This entry represents the DAZ domain.
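The compositional bias described above (a 24-residue motif rich in Asn, Tyr and Gln) can be illustrated with a simple sliding-window count. This is only a sketch: the function names and the window scan are assumptions made for illustration, not the method used to define this entry, and a high N/Y/Q fraction alone does not identify a DAZ domain.

```python
from typing import List, Tuple

DAZ_WINDOW = 24        # motif length stated in the entry text
ENRICHED = set("NYQ")  # one-letter codes for Asn, Tyr, Gln


def nyq_fraction(segment: str) -> float:
    """Fraction of residues in `segment` that are Asn, Tyr or Gln."""
    segment = segment.upper()
    if not segment:
        return 0.0
    return sum(residue in ENRICHED for residue in segment) / len(segment)


def scan_windows(sequence: str, window: int = DAZ_WINDOW) -> List[Tuple[int, float]]:
    """Return (0-based start, N/Y/Q fraction) for every full-length window.

    Sequences shorter than `window` yield an empty list.
    """
    return [
        (start, nyq_fraction(sequence[start:start + window]))
        for start in range(len(sequence) - window + 1)
    ]
```

Windows with a high fraction are candidates for closer inspection only; any threshold would be an additional assumption.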
The transcription factor activator protein (AP)-1 consists of Jun (c-Jun, JunB, and JunD), Fos (c-Fos, FosB, Fra1, and Fra2), ATF (ATFa, ATF-2 and ATF-3) and JDP (JDP-1 and JDP-2) family members. They are basic leucine zipper transcription factors that play a central role in regulating gene transcription in various biological processes. This entry includes Jun family members. AP-1 proteins have an α-helical bZIP domain, which contains a basic DNA-binding region and regularly spaced leucine residues known as the leucine zipper motif. They have similar protein structures and can form either homodimers or heterodimers with other AP-1 proteins (predominantly with Jun proteins), which can then bind to TRE-like sequences (consensus sequence 5'-TGAG/CTCA-3'). Each of these proteins is expressed in different tissues and can be regulated in different ways, which means that every cell type has a complex mixture of AP-1 dimers with subtly different functions.
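The TRE consensus quoted above, 5'-TGAG/CTCA-3', is read here as TGA followed by G or C followed by TCA. A minimal sketch of scanning a DNA string for exact matches follows; the function name is hypothetical, and real AP-1 sites tolerate deviations from the consensus that an exact-match pattern will miss.

```python
import re
from typing import List, Tuple

# TGA, then G or C, then TCA. The lookahead allows overlapping hits to be reported.
TRE_PATTERN = re.compile(r"(?=(TGA[GC]TCA))")


def find_tre_sites(dna: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched site) for each exact TRE consensus match."""
    dna = dna.upper()
    return [(m.start(), m.group(1)) for m in TRE_PATTERN.finditer(dna)]
```

Because TGAGTCA and TGACTCA are reverse complements of each other, scanning one strand with this pattern also covers exact consensus sites on the opposite strand.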
This is the immunoglobulin domain on Izumo proteins from higher eukaryotes. Izumo is a typical type I membrane glycoprotein with one immunoglobulin-like domain and a putative N-linked glycosylation site (Asn 204). The full-length human IZUMO1 protein has a single immunoglobulin (Ig) domain. It is thought that Izumo proteins bind to putative Izumo receptors on the oocyte. Izumo is not detectable on the surface of fresh sperm and becomes exposed only after an exocytotic process, the acrosome reaction, has occurred. Studies have shown that knock-out mice (Izumo-/- males) are sterile despite normal mating behaviour and ejaculation, indicating the importance of the protein in fertilisation. The domain contains cysteine residues thought to form a disulphide bridge, and a conserved GCL sequence motif. Izumo expression has been found to be testis-specific.
This entry represents a structural domain superfamily found in serine peptidases belonging to MEROPS peptidase families S24 (LexA family, clan SF), S26A (signal peptidase I), S26B (signalase) and S26C (TraF peptidase). This domain has a complex fold made of several coiled β-sheets, which contains an SH3-like barrel structure.
The S26 family includes Escherichia coli signal peptidase (SPase), a membrane-bound endopeptidase with two N-terminal transmembrane segments and a C-terminal catalytic region. SPase functions to release proteins that have been translocated into the inner membrane from the cell interior, by cleaving off their signal peptides. In SPase proteins, this domain is disrupted by the insertion of an additional all-β subdomain. Note: This signature covers both the SH3-like barrel β-ribbon domain and the all-β subdomain inserted into it.
The S24 family includes:
the lambda repressor CI/C2 family and related bacterial prophage repressor proteins
LexA, the diverse family of bacterial transcription factors that repress genes in the cellular SOS response to DNA damage
MucA and the related UmuD proteins, which are lesion-bypass DNA polymerases induced in response to mitogenic DNA damage; UmuD is self-processed by its own serine protease activity during the SOS response
RulA, a component of the rulAB locus that confers resistance to UV
All of these proteins, with the possible exception of RulA, interact with RecA, which activates self-cleavage, either derepressing transcription in the case of CI and LexA or activating the lesion-bypass polymerase in the case of UmuD and MucA. UmuD'2 is the homodimeric component of DNA pol V and is produced from UmuD by RecA-facilitated self-cleavage, in which the first 24 N-terminal residues of UmuD are removed; UmuD'2 is a DNA lesion-bypass polymerase. MucA is a plasmid-encoded DNA polymerase (pol RI) which, like UmuD, is converted into the active lesion-bypass polymerase by a self-cleavage reaction involving RecA.
This group of proteins also contains proteins not recognised as peptidases, as well as those classified as non-peptidase homologues because they either have been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.
This entry represents the β-α-β-α-β(2) domains common both to bacterial chorismate mutase and to members of the YjgF/Yer057p/UK114 family. These proteins form trimers with three-fold symmetry and three closely packed β-sheets. The conserved domain is similar in structure to chorismate mutase, but there is no sequence similarity and no functional connection. This domain homotrimerizes, forming a distinct intersubunit cavity that may serve as a small-molecule binding site.
Chorismate mutase (CM) is an enzyme of the aromatic amino acid biosynthetic pathway that catalyses the reaction at the branch point of the pathway leading to the three aromatic amino acids, phenylalanine, tryptophan and tyrosine (chorismic acid is the last common intermediate, and CM leads to the L-phenylalanine/L-tyrosine branch). It is part of the shikimate pathway, which is present only in bacteria, fungi and plants. The structures of chorismate mutase enzymes from Bacillus subtilis and Thermus thermophilus have been solved and were shown to be catalytic homotrimers, with the active sites located at the subunit interfaces, where residues from two subunits contribute to each site.
The YjgF/YER057c/UK114 family is a large, highly conserved, and widely distributed family of proteins found in bacteria, archaea and eukaryotes. YjgF (renamed RidA) deaminates reactive enamine/imine intermediates of pyridoxal 5'-phosphate (PLP)-dependent enzyme reactions. The YjgF/YER057c/UK114 family of proteins is conserved in all domains of life, suggesting that reactive enamine/imine metabolites are of concern to all organisms. This family includes:
YjgF (RidA)
the yeast growth inhibitor YER057c (protein HMF1), which appears to play a role in the regulation of metabolic pathways and cell differentiation
the mammalian 14.5 kDa translational inhibitor protein UK114, also known as L-PSP (liver perchloric acid-soluble protein), with endoribonucleolytic activity that directly affects mRNA translation and can induce disaggregation of the reticulocyte polysomes into 80S ribosomes
RutC from E. coli, which is essential for growth on uracil as sole nitrogen source and is thought to reduce aminoacrylate peracid to aminoacrylate
YabJ from B. subtilis, which is required for adenine-mediated repression of purine biosynthetic genes
Structurally, these proteins are homotrimers with clefts between the monomeric subunits that are proposed to have some functional relevance.
Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans, and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.
Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
This group of serine peptidases belongs to MEROPS peptidase family S49 (protease IV family, clan S-). The predicted active site serine for members of this family occurs in a transmembrane domain. This group of sequences represents both long and short forms of the bacterial SppA and homologues found in the archaea and plants.
Signal peptides of secretory proteins seem to serve at least two important biological functions. First, they are required for protein targeting to and translocation across membranes, such as the eubacterial plasma membrane and the endoplasmic reticular membrane of eukaryotes. Second, in addition to their role as determinants for protein targeting and translocation, certain signal peptides have a signalling function.
During or shortly after pre-protein translocation, the signal peptide is removed by signal peptidases. The integral membrane protein SppA (protease IV) of Escherichia coli was shown experimentally to degrade signal peptides. The member of this family from Bacillus subtilis has only been shown to be required for efficient processing of pre-proteins under conditions of hyper-secretion.
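The relationship between the linear order of the catalytic triad and the clan, as stated in the entry above (HDS for clan PA, DHS for clan SB, SDH for clan SC), can be written as a small lookup. This is a sketch under stated assumptions: the function names are hypothetical, only the three orders named in the text are mapped, and the residue positions used in the usage note are illustrative inputs rather than values taken from this entry.

```python
from typing import Dict

# Triad order (N- to C-terminal) mapped to clan, exactly as listed in the text.
CLAN_BY_ORDER = {"HDS": "PA", "DHS": "SB", "SDH": "SC"}


def triad_order(positions: Dict[str, int]) -> str:
    """Return the triad residues (S, D, H) sorted by sequence position."""
    if set(positions) != {"S", "D", "H"}:
        raise ValueError("positions must have exactly the keys 'S', 'D' and 'H'")
    return "".join(sorted(positions, key=positions.get))


def clan_for_triad(positions: Dict[str, int]) -> str:
    """Look up the clan for a triad order; orders not named in the text are 'unassigned'."""
    return CLAN_BY_ORDER.get(triad_order(positions), "unassigned")
```

For example, clan_for_triad({"H": 57, "D": 102, "S": 195}) gives "PA" and clan_for_triad({"D": 32, "H": 64, "S": 221}) gives "SB". The order alone is a coarse indicator; clan assignment in practice rests on structural and other evidence, as the text notes.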
Tubby, an autosomal recessive mutation mapping to mouse chromosome 7, was found to be the result of a splicing defect in a novel gene of unknown function, the tub gene. The mouse tubby mutation causes maturity-onset obesity, insulin resistance and sensory deficits. In contrast to the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase in body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes, such as Alstrom and Bardet-Biedl. These phenotypes indicate a vital role for tubby proteins, but no biochemical function has yet been ascribed to any family member, although it has been suggested that the phenotypic features of tubby mice may be the result of cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins, the TULPs. TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that are indicative of their importance in nervous-system function and development.
Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor sequence but, in TUB, contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution. This domain is arranged as a 12-stranded, all anti-parallel, closed β-barrel that surrounds a central α-helix (at the extreme carboxyl terminus of the protein) that forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors.
Tubby, an autosomal recessive mutation mapping to mouse chromosome 7, was found to be the result of a splicing defect in a novel gene of unknown function, the tub gene. The mouse tubby mutation causes maturity-onset obesity, insulin resistance and sensory deficits. In contrast to the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase in body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes, such as Alstrom and Bardet-Biedl. These phenotypes indicate a vital role for tubby proteins, but no biochemical function has yet been ascribed to any family member, although it has been suggested that the phenotypic features of tubby mice may be the result of cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins, the TULPs. TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that are indicative of their importance in nervous-system function and development.
Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor sequence but, in TUB, contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution. This domain is arranged as a 12-stranded, all anti-parallel, closed β-barrel that surrounds a central α-helix (at the extreme carboxyl terminus of the protein) that forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors.
This superfamily represents the tubby C-terminal domain and the structurally related LURP1-like domain.
Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans, and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.
Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
This signature defines the active site of the serine peptidases belonging to the MEROPS peptidase family S16 (lon protease family, clan SF). These proteases depend on the hydrolysis of ATP for their activity and have a serine in their active site. They include:
Bacterial ATP-dependent proteases. The prototype of these bacterial enzymes is the Escherichia coli La protease (gene lon). La is capable of hydrolysing large proteins; it degrades short-lived regulatory proteins (such as rcsA and sulA) and abnormal proteins. It is a cytoplasmic protein of 87 kDa that associates as a homotetramer. Its proteolytic activity is stimulated by single-stranded DNA.
Eukaryotic mitochondrial matrix proteases. The prototype of these enzymes is the yeast PIM1 protease. It is a mitochondrial matrix protein of 120 kDa that associates as a homohexamer. It catalyses the initial step of mitochondrial protein degradation.
Haemophilus influenzae lon-B (HI1324), a protein which does not contain the ATP-binding domain but possesses a slightly divergent form of the catalytic domain.
The short-chain dehydrogenases/reductases family (SDR) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme-binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate-specific domains.
Insect ADH is very different from yeast and mammalian ADHs. The enzyme from Drosophila lebanonensis (Fruit fly) has been characterised by protein analysis and was found to have a 254-residue protein chain with an acetyl-blocked N-terminal Met. Comparisons with the enzyme from other species reveal that they have diverged considerably. The structural variation within Drosophila is about as large as that for mammalian zinc-containing alcohol dehydrogenase.
The crystal structure of the apo form of Drosophila ADH has been solved to 1.9 Å resolution. Three structural features characterise the active site architecture: (i) a deep cavity, covered by a flexible 33-residue loop and an 11-residue C-terminal tail of the neighbouring subunit, whose hydrophobic surface is likely to increase the specificity of the enzyme for secondary aliphatic alcohols; (ii) the Ser-Tyr-Lys residues of the catalytic triad, which are known to be involved in enzymatic catalysis; and (iii) three well-ordered water molecules within hydrogen-bonding distance of side chains of the catalytic triad, which may be significant for the proton release steps in catalysis.
A number of proteins within the SDR family share a strong phylogenetic relationship with insect ADH. Amongst these are Drosophila ADH-related protein (duplicate of Adh or Adh-dup); Drosophila fat body protein; and development-specific 25 kDa protein from Sarcophaga peregrina (Flesh fly).
The short-chain dehydrogenases/reductases family (SDR) is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme-binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate-specific domains.
Insect ADH is very different from yeast and mammalian ADHs. The enzyme from Drosophila lebanonensis (Fruit fly) has been characterised by protein analysis and was found to have a 254-residue protein chain with an acetyl-blocked N-terminal Met. Comparisons with the enzyme from other species reveal that they have diverged considerably. The structural variation within Drosophila is about as large as that for mammalian zinc-containing alcohol dehydrogenase.
The crystal structure of the apo form of D. lebanonensis ADH has been solved to 1.9 Å resolution. Three structural features characterise the active site architecture: (i) a deep cavity, covered by a flexible 33-residue loop and an 11-residue C-terminal tail of the neighbouring subunit, whose hydrophobic surface is likely to increase the specificity of the enzyme for secondary aliphatic alcohols; (ii) the Ser-Tyr-Lys residues of the catalytic triad, which are known to be involved in enzymatic catalysis; and (iii) three well-ordered water molecules within hydrogen-bonding distance of side chains of the catalytic triad, which may be significant for the proton release steps in catalysis.
A number of proteins within the SDR family share a strong phylogenetic relationship with insect ADH. Amongst these are Drosophila ADH-related protein (duplicate of Adh or Adh-dup); Drosophila fat body protein; and development-specific 25 kDa protein from Sarcophaga peregrina (Flesh fly).
Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans, and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.
Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
Limited proteolysis of most large protein precursors is carried out in vivo by the subtilisin-like pro-protein convertases. Many important biological processes such as peptide hormone synthesis, viral protein processing and receptor maturation involve proteolytic processing by these enzymes. The subtilisin-serine protease (SRSP) family hormone and pro-protein convertases (furin, PC1/3, PC2, PC4, PACE4, PC5/6, and PC7/7/LPC) act within the secretory pathway to cleave polypeptide precursors at specific basic sites, generating their biologically active forms. Serum proteins, pro-hormones, receptors, zymogens, viral surface glycoproteins and bacterial toxins, amongst others, are activated by this route. The SRSPs share the same domain structure, including a signal peptide, the pro-peptide, the catalytic domain, the P/middle or homo B domain, and the C terminus.
This group represents bacillopeptidase F (Bpr), an extracellular serine protease that is expressed at the beginning of the stationary phase. Members of this family belong to MEROPS peptidase family S8, subfamily S8A (subtilisin, clan SB). Bpr is required for the processing and activation of the extracellular metalloprotease (Mpr, glutamyl endopeptidase) in Bacillus subtilis.
Protein tyrosine (pTyr) phosphorylation is a common post-translational modification which can create novel recognition motifs for protein interactions and cellular localisation, affect protein stability, and regulate enzyme activity. Consequently, maintaining an appropriate level of protein tyrosine phosphorylation is essential for many cellular functions. Tyrosine-specific protein phosphatases (PTPases) catalyse the removal of a phosphate group attached to a tyrosine residue, using a cysteinyl-phosphate enzyme intermediate. These enzymes are key regulatory components in signal transduction pathways (such as the MAP kinase pathway) and cell cycle control, and are important in the control of cell growth, proliferation, differentiation and transformation. The PTP superfamily can be divided into four subfamilies:
(1) pTyr-specific phosphatases
(2) dual specificity phosphatases (dTyr and dSer/dThr)
(3) Cdc25 phosphatases (dTyr and/or dThr)
(4) LMW (low molecular weight) phosphatases
Based on their cellular localisation, PTPases are also classified as:
Receptor-like, which are transmembrane receptors that contain PTPase domains
Non-receptor (intracellular) PTPases
All PTPases carry the highly conserved active site motif C(X)5R (PTP signature motif), employ a common catalytic mechanism, and share a similar core structure made of a central parallel β-sheet with flanking α-helices containing a β-loop-α-loop that encompasses the PTP signature motif. Functional diversity between PTPases is endowed by regulatory domains and subunits.
This entry represents non-receptor PTPase types 6 and 11, also known as SHP-1 and SHP-2 respectively. SHP-1 is expressed predominantly in haematopoietic and epithelial cells, playing an important role in haematopoiesis and functioning as a terminator of signal transduction, predominantly by dephosphorylation of appropriate substrate proteins. SHP-2 is expressed in most cell types and is involved in signal transduction stimulated by epidermal growth factor, platelet-derived growth factor, and insulin, acting as a positive regulator of cell proliferation. The structure of human SHP-2 shows that its catalytic activity is regulated by its two SH2 domains. In the absence of protein, the N-terminal SH2 domain binds the phosphatase domain, inhibiting its activity, while the binding of a tyrosine-phosphorylated substrate to this domain causes a conformational change which activates the enzyme. The C-terminal SH2 domain does not play a direct role in activation, but contributes to substrate specificity and binding energy.
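The PTP signature motif C(X)5R described above (a cysteine, any five residues, then an arginine) translates directly into a regular expression. The sketch below is an assumption-laden illustration: the function name is hypothetical, and a bare C(X)5R match is far too permissive to identify a PTPase on its own, since the pattern occurs by chance in many unrelated proteins.

```python
import re
from typing import List, Tuple

# Cys, five residues of any kind, Arg. The lookahead reports overlapping matches too.
PTP_SIGNATURE = re.compile(r"(?=(C.{5}R))")


def find_ptp_signature(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position of the Cys, matched 7-residue segment) per match."""
    protein = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in PTP_SIGNATURE.finditer(protein)]
```

Applied to the illustrative fragment "VHCSAGIGRSG", the function returns [(3, "CSAGIGR")]. In practice such a pattern would be combined with the surrounding sequence context or a profile-based signature.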
Protein tyrosine (pTyr) phosphorylation is a common post-translational modification which can create novel recognition motifs for protein interactions and cellular localisation, affect protein stability, and regulate enzyme activity. Consequently, maintaining an appropriate level of protein tyrosine phosphorylation is essential for many cellular functions. Tyrosine-specific protein phosphatases (PTPases) catalyse the removal of a phosphate group attached to a tyrosine residue, using a cysteinyl-phosphate enzyme intermediate. These enzymes are key regulatory components in signal transduction pathways (such as the MAP kinase pathway) and cell cycle control, and are important in the control of cell growth, proliferation, differentiation and transformation. The PTP superfamily can be divided into four subfamilies:
(1) pTyr-specific phosphatases
(2) dual specificity phosphatases (dTyr and dSer/dThr)
(3) Cdc25 phosphatases (dTyr and/or dThr)
(4) LMW (low molecular weight) phosphatases
Based on their cellular localisation, PTPases are also classified as:
Receptor-like, which are transmembrane receptors that contain PTPase domains
Non-receptor (intracellular) PTPases
All PTPases carry the highly conserved active site motif C(X)5R (PTP signature motif), employ a common catalytic mechanism, and share a similar core structure made of a central parallel β-sheet with flanking α-helices containing a β-loop-α-loop that encompasses the PTP signature motif. Functional diversity between PTPases is endowed by regulatory domains and subunits.
This entry represents non-receptor PTPase types 1 and 2 (type 2 is also known as T-cell PTPase). These types appear to have different biological functions: in mice, type 1 knock-outs showed increased insulin sensitivity but a normal lifespan, while type 2 knock-outs died when only a few weeks old. Substrate-trapping experiments suggest that these types recognise different cellular targets, though it is not known if this is due to sequence differences or to other regulatory mechanisms. Regulation of function and activity can occur at the transcriptional, alternative splicing, proteolytic processing and covalent modification levels. For example, T-cell PTPase has two different isoforms generated by alternative splicing, one of which recognises nuclear substrates, while the other recognises cytoplasmic substrates. These proteins adopt an α-β-α fold in which the active site is located in a deep cleft on the surface of the protein.
This entry represents a stuctural domain superfamily found in serine peptidases belonging to MEROPS peptidase families: S24 (LexA family, clan SF); S26A (signal peptidase I), S26B (signalase) and S26C TraF peptidase. This domain has a complex fold made of several coiled β-sheets, which contains an SH3-like barrel structure.The S26 family includes Escherichia coli signal peptidase, SPase, which is a membrane-bound endopeptidase with two N-terminal transmembrane segments and a C-terminal catalytic region [
]. SPase functions to release proteins that have been translocated into the inner membrane from the cell interior, by cleaving off their signal peptides. In SPase proteins, this domain is disrupted by the insertion of an additional all-beta subdomain. Note: This signature covers both the SH3-like barrel β-ribbon domain and the all-β subdomain inserted into it.The S24 family includes:the lambda repressor CI/C2 family and related bacterial prophage repressor proteins [
]. LexA, the diverse family of bacterial transcription factors that repress genes in the cellular SOS response to DNA damage [,
]. MucA and the related UmuD proteins, which are lesion-bypass DNA polymerases, induced in response to mutagenic DNA damage [
]. UmuD is self-processed by its own serine protease activity during the SOS response. RulA, a component of the rulAB locus that confers resistance to UV. All of these proteins, with the possible exception of RulA, interact with RecA, which activates self-cleavage, either derepressing transcription in the case of CI and LexA [
] or activating the lesion-bypass polymerase in the case of UmuD and MucA. UmuD'2 is the homodimeric component of DNA pol V, which is produced from UmuD by RecA-facilitated self-cleavage. The first 24 N-terminal residues of UmuD are removed; UmuD'2 is a DNA lesion bypass polymerase [,
]. MucA [,
], like UmuD, is a plasmid-encoded DNA polymerase (pol RI) that is converted into the active lesion-bypass polymerase by a self-cleavage reaction involving RecA []. This group also contains proteins not recognised as peptidases, as well as those classified as non-peptidase homologues because they either have been found experimentally to lack peptidase activity or lack amino acid residues believed to be essential for catalytic activity.
Notch cell surface receptors are large, single-pass type-1 transmembrane proteins found in a diverse range of metazoan species, from human to Caenorhabditis species. The fruit fly, Drosophila melanogaster, possesses only one Notch protein, whereas in C. elegans, two receptors have been found; by contrast, four Notch paralogues (designated N1-4) have been identified in mammals, playing both unique and redundant roles. The hetero-oligomer Notch comprises a large extracellular domain (ECD), containing 10-36 tandem Epidermal Growth Factor (EGF)-like repeats, which are involved in ligand interactions; a negative regulatory region, including three cysteine-rich Lin12-Notch Repeats (LNR); a single transmembrane domain (TM); a small intracellular domain (ICD), which includes a RAM (RBPjk-association module) domain; six ankyrin repeats (ANK), which are involved in protein-protein interactions; and a PEST domain. Drosophila Notch also contains an OPA domain [
]. Notch signalling is an evolutionarily conserved pathway involved in a wide variety of developmental processes, including adult homeostasis and stem cell maintenance, cell proliferation and apoptosis [
]. Notch is activated by a range of ligands, the so-called DSL ligands (Delta/Serrate/LAG-2). Activation is also mediated by a sequence of proteolytic events: ligand binding leads to cleavage of Notch by ADAM proteases [
] at site 2 (S2) and presenilin-1/γ-secretase at sites 3 (S3) and 4 (S4) []. The last cleavage releases the intracellular part of the Notch protein (NICD) from the membrane and, upon release, the NICD translocates to the nucleus, where it associates with the CBF1/RBPjk/Su(H)/Lag1 (CSL) family of DNA-binding proteins. The subsequent recruitment of a co-activator, mastermind-like (MAML1) protein [], promotes transcriptional activation of Notch target genes: well-established Notch targets are the Hes and Hey gene families. Aberrant Notch function and signalling have been associated with a number of human disorders, including Alagille syndrome, spondylocostal dysostosis, aortic valve disease, CADASIL (Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy), and T-cell Acute Lymphoblastic Leukemia (T-ALL); they have also been implicated in various human carcinomas [
,
]. Notch3 displays a more restricted distribution than the other Notch subtypes, being expressed predominantly in vascular smooth muscle cells, the central nervous system, certain thymocyte subsets, and regulatory T cells [
].
The YqgF domain superfamily is described as RNase H-like and typified by the Escherichia coli protein YqgF [
]. YqgF domain-containing proteins are predicted to be ribonucleases or resolvases based on homology to RuvC Holliday junction resolvases (HJRs). Proteins containing this domain are found primarily in low-GC Gram-positive bacteria and as eukaryotic orthologues. The RuvC HJRs are conspicuously absent from the low-GC Gram-positive bacterial lineage, with the exception of Ureaplasma urealyticum (
, [
]). Furthermore, loss-of-function ruvC mutants of Escherichia coli show a residual HJR activity that cannot be ascribed to the prophage-encoded RusA resolvase []. This suggests that the YqgF family proteins could be alternative HJRs whose function partially overlaps with that of RuvC []. The functions of eukaryotic proteins having this domain are less well described. In Saccharomyces cerevisiae (Baker's yeast) Spt6p and its orthologues, the catalytic residues are substituted, indicating that they lack the enzymatic function of resolvases [
]. Spt6p has been implicated in transcription initiation [] and in maintaining normal chromatin structure during transcription elongation []. Horizontal gene transfer, lineage-specific gene loss and gene family expansion, and non-orthologous gene displacement seem to have been major forces in the evolution of HJRs and related nucleases. The diversity of HJRs and related nucleases in bacteria and archaea contrasts with their near absence in eukaryotes. The few detected eukaryotic representatives of the endonuclease fold and the RNase H fold have probably been acquired from bacteria via horizontal gene transfer. The identity of the principal HJR(s) involved in recombination in eukaryotes remains uncertain; this function could be performed by topoisomerase IB or by a novel, so far undetected, class of enzymes. Likely HJRs and related nucleases were identified in the genomes of numerous bacterial and eukaryotic DNA viruses. Gene flow between viral and cellular genomes has probably played a major role in the evolution of this class of enzymes. The YqgF domain is also found in Tex proteins, where it maintains the core structural elements and aligns especially well with RuvC nucleases, although Tex does not appear to possess nuclease activity [
]. Tex (toxin expression) is a highly conserved bacterial protein involved in expression of critical toxin genes [].
Bordetella pertussis is a Gram-negative, aerobic coccobacillus that causes
pertussis (whooping cough), especially in young children []. Once present in the lungs, the bacterium attaches to ciliated pulmonary epithelial cells via a collection of outer membrane proteins, all of which are virulence factors.
Pertactin, or P69 protein, is one of these virulence factors. Pertactin and
filamentous haemagglutinin have been identified as Bordetella adhesins []. Both proteins contain an Arg-Gly-Asp (RGD) motif that promotes binding to integrins, known to be important in cell mobility and development. The production of most Bordetella virulence factors (including pertactin) is
controlled by a two-component signal transduction system, comprising the BvgA regulator and the BvgS sensor [
]. Pertactin shares a high level of similarity with other Bordetella adhesins, such as BrkA. The protein is first produced as a 93 kDa precursor. Upon secretion into the extracellular
environment, a 30 kDa domain at the C terminus remains in the outer membrane, while the mature 60.4 kDa pertactin molecule is released [
]. The crystal structure of mature pertactin has been determined to 2.5 Å
resolution by means of X-ray diffraction. The fold is characterised by a 16-stranded parallel β-helix, with a V-shaped cross-section. Several between-strand amino-acid repeats form internal and external ladders. The helical structure is interrupted by several protruding loops that contain motifs associated with the activity of the protein. One such sequence - [GGXXP]5 - appears directly after the RGD motif, and may mediate interaction with epithelial cells. The C-terminal region of P.69 pertactin contains a [PQP]5 motif loop, which contains the major immunoprotective epitope []. The tcfA gene of B. pertussis encodes a unique virulence-associated
factor, Tcf (tracheal colonisation factor) []. The derived amino acid sequence of tcfA predicts a 68 kDa RGD-containing, proline-rich protein. Amino acid sequence analysis reveals that the C-terminal 30 kDa of this protein shares ~50% identity with the 30 kDa C terminus of the Bordetella pertactin precursor.
The Bordetella resistance to killing (brk) locus has been cloned and sequenced
and found to encode two divergently transcribed open reading frames (ORFs), designated BrkA and BrkB [
]. Both ORFs are necessary for serum resistance. BrkA shares ~29% identity with pertactin and contains two RGD motifs, in addition to a proteolytic processing site and an outer membrane targeting signal. BrkA, like pertactin, is involved in adherence and invasion.
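The sequence motifs named above (RGD, [GGXXP]5 and [PQP]5) can be located in a protein sequence with simple pattern matching. The following is a minimal sketch, assuming that X stands for any residue and that the subscript 5 means five consecutive copies; the function name `find_motifs` and the toy sequence are hypothetical and are not taken from any pertactin record.

```python
import re

# Patterns transcribed from the description; "X" is read as "any residue"
# and the trailing 5 as "five tandem copies" (both are assumptions).
PATTERNS = {
    "RGD": re.compile(r"RGD"),
    "[GGXXP]5": re.compile(r"(?:GG..P){5}"),
    "[PQP]5": re.compile(r"(?:PQP){5}"),
}


def find_motifs(seq):
    """Return {motif name: [(start, end), ...]} with 0-based, end-exclusive spans."""
    return {
        name: [m.span() for m in pattern.finditer(seq)]
        for name, pattern in PATTERNS.items()
    }


# Hypothetical toy sequence for illustration only, NOT real pertactin.
toy = "MKRGD" + "GGAVP" * 5 + "TTT" + "PQP" * 5 + "A"
hits = find_motifs(toy)
print(hits)
```

Short motifs such as RGD occur by chance in many proteins, so a match of this kind is only a candidate site, not evidence of integrin binding.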
The YqgF domain family is described as RNase H-like and typified by the Escherichia coli protein YqgF [
]. YqgF domain-containing proteins are predicted to be ribonucleases or resolvases based on homology to RuvC Holliday junction resolvases (HJRs). Proteins containing this domain are found primarily in low-GC Gram-positive bacteria and as eukaryotic orthologues. The RuvC HJRs are conspicuously absent from the low-GC Gram-positive bacterial lineage, with the exception of Ureaplasma urealyticum (
, [
]). Furthermore, loss-of-function ruvC mutants of Escherichia coli show a residual HJR activity that cannot be ascribed to the prophage-encoded RusA resolvase []. This suggests that the YqgF family proteins could be alternative HJRs whose function partially overlaps with that of RuvC []. The functions of eukaryotic proteins having this domain are less well described. In Saccharomyces cerevisiae (Baker's yeast) Spt6p and its orthologues, the catalytic residues are substituted, indicating that they lack the enzymatic function of resolvases [
]. Spt6p has been implicated in transcription initiation [] and in maintaining normal chromatin structure during transcription elongation []. Horizontal gene transfer, lineage-specific gene loss and gene family expansion, and non-orthologous gene displacement seem to have been major forces in the evolution of HJRs and related nucleases. The diversity of HJRs and related nucleases in bacteria and archaea contrasts with their near absence in eukaryotes. The few detected eukaryotic representatives of the endonuclease fold and the RNase H fold have probably been acquired from bacteria via horizontal gene transfer. The identity of the principal HJR(s) involved in recombination in eukaryotes remains uncertain; this function could be performed by topoisomerase IB or by a novel, so far undetected, class of enzymes. Likely HJRs and related nucleases were identified in the genomes of numerous bacterial and eukaryotic DNA viruses. Gene flow between viral and cellular genomes has probably played a major role in the evolution of this class of enzymes. The YqgF domain is also found in Tex proteins, where it maintains the core structural elements and aligns especially well with RuvC nucleases, although Tex does not appear to possess nuclease activity [
]. Tex (toxin expression) is a highly conserved bacterial protein involved in expression of critical toxin genes [].
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore comprise four domains: two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
] (for further information see http://www.tcdb.org/tcdb/index.php?tc=3.A.1). ABC transporters minimally contain two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). In certain bacterial transporters, these regions are found on different polypeptides. The integral inner-membrane protein translocates the substrate across the membrane and also takes part in substrate recognition [
,
]. This entry represents the transmembrane domain in cases where the TMD and ABC region are found in the same protein, and corresponds to ABC type 1 from the Transporter Classification Database (http://www.tcdb.org/tcdb/index.php?tc=3.A.1).
Nucleosides are hydrophilic molecules and require specialised transport proteins for permeation of cell membranes. There are two types of nucleoside transport processes: equilibrative bidirectional processes driven by chemical gradients, and inwardly directed concentrative processes driven by an electrochemical gradient [
]. The two types of nucleoside transporters are classified into two families: the solute carrier (SLC) 29 and SLC28 families, corresponding to equilibrative and concentrative nucleoside transporters, respectively []. The microbial proteins include broad-specificity transporters, such as the Escherichia coli NupC protein, which transports all nucleosides (both ribo- and deoxyribonucleosides) except hypoxanthine and guanine nucleosides [
]. The Bacillus subtilis NupC transporter has been shown to be involved in transport of the pyrimidine nucleoside uridine []. A recently characterised fungal protein, the first transporter of this type to be described in eukaryotes, exhibited transport activity for adenosine, uridine, inosine and guanosine but not cytidine, thymidine or the nucleobase hypoxanthine []. The characterised mammalian proteins can be divided into three subgroups: CNT1, CNT2 and CNT3 []. CNT1 preferentially transports pyrimidines and weakly transports adenosine. Several antiviral and anticancer nucleoside analogues, including AZT and dFdC, are also substrates for CNT1. CNT2 selectively transports purines, and the human form has also been shown to facilitate the uptake of some antiviral compounds, including ddI and ribavirin. CNT3 has a broader specificity, transporting both purines and pyrimidines. Several anticancer nucleoside analogues such as CdA, dFdC and FdU are also transported by CNT3. Substrate specificity appears to depend on a region containing transmembrane regions 7, 8 and 9. Mutation of just four residues in this region was sufficient to convert the activity of human CNT1 to that of CNT2. At least three other concentrative nucleoside transport activities have been described in mammalian cells, but the proteins responsible for these activities have not yet been identified. This entry represents a family of Concentrative Nucleoside Transporter (CNT) proteins found in bacteria and eukaryotes. Most of the bacterial and fungal homologues identified are H+ symporters, while the mammalian members (CNT1/2/3) are mostly Na+ symporters. However, mammalian CNT3 exhibits uniquely broad cation interactions with Na+, H+, and Li+ and can couple transport of uridine to uptake of H+ [
,
].
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, binding instead other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. In eukaryotes, initiation of DNA replication requires the assembly of pre-replication complexes (pre-RCs) on chromatin during the G1 phase. In the S phase, pre-RCs are activated by two protein kinases, Cdk2 and Cdc7, which results in the loading of replication factors and the unwinding of replication origins by the MCM helicase complex [
]. Cdc7 is a serine/threonine kinase that is conserved from yeast to human. It is regulated by its association with a regulatory subunit, the Dbf4 protein. This complex is often referred to as DDK (Dbf4-dependent kinase) []. Dbf4 contains an N-terminal BRCT domain and a C-terminal conserved region that could potentially coordinate one zinc atom, the DBF4-type zinc finger. This entry represents the zinc finger, which is important for the interaction with Cdc7 [
,
].
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore comprise four domains: two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature motif (LSGGQ), specific to ABC transporters; and the Q-motif or Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
]. This domain is found at the N terminus of FbpC, which is part of the ABC transporter complex involved in ferric cation import. The complex is composed of two ATP-binding proteins (FbpC), two transmembrane proteins (FbpB) and a solute-binding protein (FbpA) [
,
].
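As an illustration of the conserved motifs of the ABC module, the sketch below scans a protein sequence for a Walker A phosphate-binding loop and for the LSGGQ signature quoted in the description. The Walker A pattern G-x(4)-G-K-[ST] is the commonly cited consensus and is an assumption here, since the description does not give one; the function name `scan_abc_motifs` and the toy sequence are hypothetical.

```python
import re

# Assumption: commonly cited Walker A (P-loop) consensus, G-x(4)-G-K-[ST].
WALKER_A = re.compile(r"G.{4}GK[ST]")
# Signature motif exactly as quoted in the entry description.
SIGNATURE = re.compile(r"LSGGQ")


def scan_abc_motifs(seq):
    """Report 0-based start positions of Walker A and signature matches."""
    walker_a = [m.start() for m in WALKER_A.finditer(seq)]
    signature = [m.start() for m in SIGNATURE.finditer(seq)]
    return {
        "walker_a": walker_a,
        "signature": signature,
        # The description places the Q-motif between Walker A and the
        # signature, so Walker A is expected upstream of the signature.
        "walker_a_before_signature": bool(
            walker_a and signature and walker_a[0] < signature[0]
        ),
    }


# Hypothetical toy sequence for illustration only, NOT a real ABC protein.
toy = "MAAA" + "GPSGSGKST" + "L" * 10 + "Q" + "A" * 10 + "LSGGQ" + "KK"
result = scan_abc_motifs(toy)
print(result)
```

Real signature motifs often deviate from the exact LSGGQ string, so an exact-match scan like this one will miss divergent family members; profile-based methods are the usual choice for classification.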
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore comprise four domains: two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature motif (LSGGQ), specific to ABC transporters; and the Q-motif or Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
]. This domain is found at the N terminus of MalK, which is part of the ABC transporter complex involved in maltose/maltodextrin import. The complex is composed of two ATP-binding proteins (MalK), two transmembrane proteins (MalF and MalG) and a solute-binding protein (MalE) [
,
,
].
RNA-directed RNA polymerase (RdRp) (
) is an essential protein encoded in the genomes of all RNA containing viruses with no DNA stage [
,
]. It catalyses synthesis of the RNA strand complementary to a given RNA template, but the precise molecular mechanism remains unclear. The postulated RNA replication process is a two-step mechanism. First, the initiation step of RNA synthesis begins at or near the 3' end of the RNA template by means of a primer-independent (de novo) mechanism. De novo initiation consists of the addition of a nucleoside triphosphate (NTP) to the 3'-OH of the first initiating NTP. During the following so-called elongation phase, this nucleotidyl transfer reaction is repeated with subsequent NTPs to generate the complementary RNA product [
]. All the RNA-directed RNA polymerases, and many DNA-directed polymerases, employ a fold whose organisation has been likened to the shape of a right hand with three subdomains termed fingers, palm and thumb [
]. Only the catalytic palm subdomain, composed of a four-stranded antiparallel β-sheet with two α-helices, is well conserved among all of these enzymes. In RdRp, the palm subdomain comprises three well conserved motifs (A, B and C). Motif A (D-x(4,5)-D) and motif C (GDD) are spatially juxtaposed; the Asp residues of these motifs are implicated in the binding of Mg2+ and/or Mn2+. The Asn residue of motif B is involved in selection of ribonucleoside triphosphates over dNTPs and thus determines whether RNA is synthesised rather than DNA [
]. The domain organisation [
] and the 3D structure of the catalytic centre of a wide range of RdRps, even those with a low overall sequence homology, are conserved. The catalytic centre is formed by several motifs containing a number of conserved amino acid residues. There are 4 superfamilies of viruses that cover all RNA containing viruses with no DNA stage:
(1) Viruses containing positive-strand RNA or double-strand RNA, except retroviruses and Birnaviridae: viral RNA-directed RNA polymerases including all positive-strand RNA viruses with no DNA stage, double-strand RNA viruses, and the Cystoviridae, Reoviridae, Hypoviridae, Partitiviridae, Totiviridae families. (2) Mononegavirales (negative-strand RNA viruses with non-segmented genomes). (3) Negative-strand RNA viruses with segmented genomes, i.e. Orthomyxoviruses (including influenza A, B, and C viruses, Thogotoviruses, and the infectious salmon anemia virus), Arenaviruses, Bunyaviruses, Hantaviruses, Nairoviruses, Phleboviruses, Tenuiviruses and Tospoviruses. (4) Birnaviridae family of dsRNA viruses. The RNA-directed RNA polymerases in the first of the above superfamilies can be divided into the following three subgroups: (a) All positive-strand RNA eukaryotic viruses with no DNA stage. (b) All RNA-containing bacteriophages (there are two families of RNA-containing bacteriophages: Leviviridae, positive ssRNA phages, and Cystoviridae, dsRNA phages). (c) Reoviridae family of dsRNA viruses. This entry represents RNA-directed RNA polymerase (also known as the large structural protein) from Mononegavirales including Paramyxoviruses [
]. The large structural protein (or L protein) carries four enzymatic activities: RNA-directed RNA polymerase (), mRNA (guanine-N(7)-)-methyltransferase (
), mRNA guanylyltransferase (
), and poly(A) synthetase. The viral mRNA guanylyltransferase catalyses a different biochemical reaction from that of the cellular enzyme. The template is composed of the viral RNA tightly encapsidated by the nucleoprotein (N). The protein can function either as transcriptase or as replicase. The transcriptase sequentially synthesises subgenomic RNAs, ensuring their capping and their polyadenylation by a stuttering mechanism. The transcriptase also stutters on a specific sequence, resulting in cotranscriptional editing of the phosphoprotein (P) mRNA. The replicase mode is dependent on intracellular N protein concentration. In this mode, the polymerase replicates the whole viral genome without recognising the transcriptional signals. A 5' GpppApGpG sequence is required for mRNA cap methylation by the enzyme.
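The palm-subdomain motifs described above are written in PROSITE-like notation: motif A is D-x(4,5)-D and motif C is GDD. A minimal sketch of how such patterns translate into regular expressions follows; the function name `find_palm_motifs` and the toy sequence are hypothetical, and non-overlapping matching is a simplification.

```python
import re

# PROSITE-like D-x(4,5)-D: Asp, then 4 or 5 residues of any kind, then Asp.
MOTIF_A = re.compile(r"D.{4,5}D")
# Motif C: the Gly-Asp-Asp tripeptide.
MOTIF_C = re.compile(r"GDD")


def find_palm_motifs(seq):
    """Return candidate motif A and motif C spans (0-based, end-exclusive).

    Matches are non-overlapping, as produced by re.finditer.
    """
    return {
        "motif_A": [m.span() for m in MOTIF_A.finditer(seq)],
        "motif_C": [m.span() for m in MOTIF_C.finditer(seq)],
    }


# Hypothetical toy sequence for illustration only, NOT a real polymerase.
toy = "MK" + "DAAAAD" + "SSSSNSSS" + "GDD" + "K"
palm = find_palm_motifs(toy)
print(palm)
```

Both patterns are short and will match by chance in unrelated proteins, so hits are only candidates; in practice the relative order and spacing of motifs A, B and C, or a profile built from an alignment, would be needed to support an assignment.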
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore comprise four domains: two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature motif (LSGGQ), specific to ABC transporters; and the Q-motif or Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the alpha helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
]. Most bacterial importers employ a periplasmic substrate-binding protein (PBP) that delivers the ligand to the extracellular gate of the TM domains. These proteins bind their substrates selectively and with high affinity, which is thought to ensure the specificity of the transport reaction. Binding proteins in Gram-negative bacteria are present within the periplasm, whereas those in Gram-positive bacteria are tethered to the cell membrane via the acylation of a cysteine residue that is an integral component of a lipoprotein signal sequence. In planta expression of a high-affinity iron-uptake system involving the siderophore chrysobactin in Erwinia chrysanthemi 3937 contributes greatly to invasive growth of this pathogen on its natural host, African violets [
]. The cobalamin (vitamin B12) and the iron transport systems share many common attributes and probably evolved from the same origin [,
]. The periplasmic-binding domain is composed of two subdomains, each consisting of a central β-sheet and surrounding α-helices, linked by a rigid α-helix. The substrate binding site is located in a cleft between the two alpha/beta subdomains []. This entry represents bacterial BtuD, which is a component of the vitamin B12 ABC transport system. BtuD proteins are the ATP-binding cassettes [
].
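The conserved motifs described for the ABC module can be located in a sequence with simple pattern matching. The following is a minimal sketch, not a validated annotation method: only the LSGGQ signature is taken from the text above, while the Walker A pattern (G-x(4)-G-K-[ST]) and the Walker B pattern (four hydrophobic residues followed by D-E) are commonly quoted simplified consensus forms assumed here for illustration. The function name `find_abc_motifs` and the toy sequence are hypothetical.

```python
import re

# Simplified consensus patterns. Only LSGGQ comes from the entry text;
# the Walker A and Walker B patterns are assumptions for illustration.
MOTIFS = {
    "walker_a": re.compile(r"G.{4}GK[ST]"),       # P-loop: G-x(4)-G-K-[ST]
    "signature": re.compile(r"LSGGQ"),            # ABC signature motif
    "walker_b": re.compile(r"[AILMFVWY]{4}DE"),   # four hydrophobic residues, then D-E
}


def find_abc_motifs(sequence):
    """Return {motif_name: [0-based start positions]} for each pattern."""
    seq = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, built only to exercise the three patterns.
toy = "MKT" + "GPSGAGKS" + "AAAQAAA" + "LSGGQ" + "RRR" + "ILLLDE" + "PT"
print(find_abc_motifs(toy))
```

Real ABC cassettes deviate from these strict patterns, so a profile-based search would normally be preferred over fixed regular expressions.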
This family consists of several eukaryotic transcription elongation Spt5 proteins. These proteins contain two copies of a domain (Supt5;
) that is characteristic of proteins involved in chromatin regulation. An NGN domain separates the Supt5 domains. In yeast Spt5 protein, this domain possesses an RNP-like fold and it is thought to confer affinity for Spt4 protein. Supt5 domains are followed by four to five copies of a KOW domain (), present in many ribosomal proteins.
Three transcription-elongation factors Spt4, Spt5, and Spt6 are conserved among eukaryotes and are essential for transcription via modulation of chromatin structure. Spt4 and Spt5 are tightly associated in a complex, while the physical association of Spt6 is considerably weaker. It has been demonstrated that Spt4, Spt5, and Spt6 play roles in transcription elongation in both yeast and humans, including a role in activation by Tat. It is known that Spt4, Spt5, and Spt6 are general transcription-elongation factors, controlling transcription both positively and negatively in important regulatory and developmental roles [
]. This information was partially derived from InterPro (
).
This superfamily represents the N-terminal winged helix domain of the vps25 subunit (vacuolar protein sorting-associated protein 25) of the endosome-associated complex ESCRT-II (Endosomal Sorting Complexes Required for Transport protein II). ESCRT (ESCRT-I, -II, -III) complexes orchestrate efficient sorting of ubiquitinated transmembrane receptors to lysosomes via multivesicular bodies (MVBs) [
]. ESCRT-II recruits the transport machinery for protein sorting at MVB []. In addition, the human ESCRT-II has been shown to form a complex with RNA polymerase II elongation factor ELL in order to exert transcriptional control activity. ESCRT-II transiently associates with the endosomal membrane and thereby initiates the formation of ESCRT-III, a membrane-associated protein complex that functions immediately downstream of ESCRT-II during sorting of MVB cargo. ESCRT-II in turn functions downstream of ESCRT-I, a protein complex that binds to ubiquitinated endosomal cargo []. ESCRT-II is a trilobal complex composed of two copies of vps25, one copy of vps22 and the C-terminal region of vps36. The crystal structure of vps25 revealed two winged-helix domains, the N-terminal domain of vps25 interacting with vps22 and vps36 [
].
This entry represents the Vps25 subunit (vacuolar protein sorting-associated protein 25) of the endosome-associated complex ESCRT-II (Endosomal Sorting Complexes Required for Transport protein II). ESCRT (ESCRT-I, -II, -III) complexes orchestrate efficient sorting of ubiquitinated transmembrane receptors to lysosomes via multivesicular bodies (MVBs) [
]. ESCRT-II recruits the transport machinery for protein sorting at MVB []. In addition, the human ESCRT-II has been shown to form a complex with RNA polymerase II elongation factor ELL in order to exert transcriptional control activity. ESCRT-II transiently associates with the endosomal membrane and thereby initiates the formation of ESCRT-III, a membrane-associated protein complex that functions immediately downstream of ESCRT-II during sorting of MVB cargo. ESCRT-II in turn functions downstream of ESCRT-I, a protein complex that binds to ubiquitinated endosomal cargo []. ESCRT-II is a trilobal complex composed of two copies of vps25, one copy of vps22 and the C-terminal region of vps36. The crystal structure of vps25 revealed two winged-helix domains, the N-terminal domain of vps25 interacting with vps22 and vps36 [
].
This entry represents a β-propeller domain found in galactose oxidase and in Kelch repeat-containing proteins. The known functions of kelch-containing proteins are diverse: scruin is an actin cross-linking protein; galactose oxidase catalyses the oxidation of the hydroxyl group at the C6 position in D-galactose; neuraminidase hydrolyses sialic acid residues from glycoproteins; and kelch may have a cytoskeletal function, as it is localised to the actin-rich ring canals that connect the 15 nurse cells to the developing oocyte in Drosophila [
]. Nevertheless, based on the location of the kelch pattern in the catalytic unit in galactose oxidase, functionally important residues have been predicted in glyoxal oxidase []. Galactose oxidase (
) is a monomeric enzyme that contains a single copper ion and catalyses the stereospecific oxidation of primary alcohols to their corresponding aldehydes [
]. The protein contains an unusual covalent thioether bond between a tyrosine and a cysteine that forms during its maturation []. Galactose oxidase is a three-domain protein: the N-terminal domain forms a jelly-roll sandwich, the central domain forms a seven-bladed β-propeller (each blade a four-stranded β-sheet), and the C-terminal domain has an immunoglobulin-like fold.
This entry represents the paired amphipathic helix (PAH) repeat. Sin3 proteins have at least three PAH domains (PAH1, PAH2, and PAH3) [
,
]. They are components of a co-repressor complex that silences transcription, playing important roles in the transition between proliferation and differentiation. Sin3 proteins are recruited to the DNA by various DNA-binding transcription factors such as the Mad family of repressors, Mnt/Rox, PLZF, MeCP2, p53, REST/NRSF, MNFbeta, Sp1, TGIF and Ume6 []. Sin3 acts as a scaffold protein that in turn recruits histone-binding proteins RbAp46/RbAp48 and histone deacetylases HDAC1/HDAC2, which deacetylate the core histones resulting in a repressed state of the chromatin []. The PAH domains are protein-protein interaction domains through which Sin3 fulfils its role as a scaffold. The PAH2 domain of Sin3 can interact with a wide range of unrelated and structurally diverse transcription factors that bind using different interaction motifs. For example, the Sin3 PAH2 domain can interact with the unrelated Mad and HBP1 factors using alternative interaction motifs that involve binding in opposite helical orientations [].
This family belongs to the AAA+ (ATPase associated with diverse cellular activities) superfamily. Most proteins of this family (ClpA, ClpC, ClpD, ClpE, ClpX, HslU) are involved in proteolysis and associate with a separate proteolytic subunit (ClpP, HslV) to form the active protease. ClpB is not involved in proteolysis but rather acts in collaboration with the DnaK (Hsp70) chaperone system to disassemble and refold large protein aggregates. A group of ATP-binding proteins that includes the regulatory subunit of the ATP-dependent protease clpA; heat shock proteins clpB, 104 and 78; and chloroplast proteins CD4a (ClpC) and CD4b belong to this family [,
]. The proteins are thought to protect cells from stress by controlling the aggregation and denaturation of vital cellular structures. They vary in size, but share a domain which contains an ATP-binding site. These signatures, which span the ATP binding region, also identify the bacterial DNA polymerase III subunit tau (
), ATP-dependent protease La (
) and the mitochondrial lon protease homologue (
), both of which belong to MEROPS peptidase family S16.
This entry represents a DNA-binding domain with a helix-turn-helix (HTH) structure that is found in several bacterial and archaeal transcriptional regulators, such as TetR, the tetracycline resistance repressor. Numerous other transcriptional regulatory proteins also contain HTH-type DNA-binding domains, and can be grouped into subfamilies based on sequence similarity. The domain represented by this entry is found in a subfamily of proteins that includes the transcriptional regulators TetR, TetC, AcrR, BetI, Bm3R1, EnvR, QacR, MtrR, TcmR, Ttk, YbiH, and YhgD [
,
,
]. Many of these proteins function as repressors that control the level of susceptibility to hydrophobic antibiotics and detergents. They all have similar molecular weights, ranging from 21 to 25 kDa. The helix-turn-helix motif is located in the initial third of the protein. The 3D structure of the homodimeric TetR protein complexed with 7-chloro-tetracycline-magnesium has been determined to 2.1 Å resolution []. TetR folds into ten α-helices with connecting turns and loops. The three N-terminal α-helices of the repressor form the DNA-binding domain: this structural motif encompasses an HTH fold with an inverse orientation compared with that of other DNA-binding proteins.
Integration host factor (IHF) (,
) is a small heterodimeric protein that binds the minor groove of DNA in a sequence-specific manner and induces a large bend. This bending stabilises distinct DNA conformations that are required during several bacterial processes, such as recombination, transposition, replication and transcription [
]. The core structure of IHF consists of a partly opened 4-helical bundle that is capped with a β-sheet. Prokaryotic protein HU and the bacteriophage SPO1 transcription factor TF1 are closely related to IHF. These proteins are collectively referred to as type II DNA-binding proteins (DBPII), forming a group of basic, dimeric proteins found in all bacteria that are able to bind DNA to induce and stabilise DNA bending. HU plays a structural role in replication initiation, transcription regulation, site-specific recombination, and the compaction of the bacterial genome [
]. TF1 is essential for viral multiplication []. The DNA-binding domain of the TraM protein (
), an essential component of the DNA transfer machinery of the conjugative resistance plasmid R1, appears to have a similar structure to DBPII [
].
The BON domain is typically ~60 residues long and has an α/β fold. There is a conserved glycine residue and several hydrophobic regions, which suggests a binding function; indeed, it contains a phospholipid-binding site [
,
]. Most proteobacteria seem to possess one or two BON-containing proteins, typically of the OsmY-type proteins [,
,
]; outside of this group the distribution is more disparate. The OsmY protein is an Escherichia coli 20 kDa outer membrane or periplasmic protein that is expressed in response to a variety of stress conditions, in particular, helping to provide protection against osmotic shock. One hypothesis is that OsmY prevents shrinkage of the cytoplasmic compartment by contacting the phospholipid interfaces surrounding the periplasmic space. The domain architecture of two BON domains alone suggests that these domains contact the surfaces of phospholipids, with each domain contacting a membrane [
]. In the potassium binding protein Kbp, this domain is able to bind K+ [
]. This domain is also found in ArfA, a membrane protein required for supporting bacterial growth in acidic environments [
].
One of the virulence mechanisms of Escherichia coli is the production of toxins from dedicated machineries called secretion systems. Seven secretion systems have been described, which assemble from 3 to more than 20 subunits. These secretion systems derive from or have co-evolved with bacterial organelles such as ABC transporters (type I), type IV pili (type II), flagella (type III), or conjugative machines (type IV). The type VI secretion system (T6SS) is present in most pathogens that have contact with animals, plants, or humans. T6SS exports Hcp (Haemolysin-Coregulated Protein) and a class of proteins named Vgr (Val-Gly Repeats), whose exact function is still speculative. In addition to Vgr and Hcp proteins, T6SS is characterised by the presence of an AAA+ Clp-like ATPase and of two additional genes icmF and dotU, encoding homologues of T4SS stabilising proteins [
]. SciN is a lipoprotein tethered to the outer membrane and expressed in the periplasm of E. coli, and is essential for T6SS-dependent secretion of the Hcp-like SciD protein and for biofilm formation [
].
This 70 residue domain, known as SPOR domain, is composed of two 35 residue repeats that are found in bacterial proteins involved in sporulation and cell division, such as FtsN, CwlM and RlpA. The SPOR domains in the FtsN cell division proteins from Escherichia coli and Caulobacter crescentus have been shown to bind peptidoglycan. SPOR domains can localise to the division site by binding preferentially to septal peptidoglycan [
,
,
]. FtsN is an essential cell division protein with a simple bitopic topology: a short N-terminal cytoplasmic segment fused to a large carboxy periplasmic domain through a single transmembrane domain. The repeats lie at the periplasmic C terminus, which has an RNP-like fold [
]. FtsN localises to the septum ring complex. The CwlM protein is a cell wall hydrolase, where the C-terminal region, including the repeats, determines substrate specificity []. RlpA is a rare lipoprotein A protein that may be important for cell division. Its N-terminal cysteine may be attached to thioglyceride and N-fatty acyl residues [].
A cysteine-rich domain (CRD) is an essential component of a number of cell surface receptors, which are involved in multiple signal transduction pathways, particularly in modulating the activity of the Wnt proteins, which play a fundamental role in the early development of metazoans. CRD is also found in secreted frizzled related proteins (SFRPs), which lack the transmembrane segment found in the frizzled protein. The CRD domain is also present in the alpha-1 chain of mouse type XVIII collagen, in carboxypeptidase Z, several receptor tyrosine kinases, and the mosaic transmembrane serine protease corin. The CRD domain is well conserved in metazoans - 10 frizzled proteins have been identified in mammals, 4 in Drosophila and 3 in Caenorhabditis elegans. CRD domains have also been identified in multiple tandem copies in a Dictyostelium discoideum protein. Very little is known about the mechanism by which CRD domains interact with their ligands. The domain contains 10 conserved cysteines. This entry represents the CRD domain of lin-17, a protein involved in cell type specification during Caenorhabditis elegans vulval development [
,
].
Serine/threonine protein phosphatase-5 (PP5) is a member of the PPP gene family of protein phosphatases that is highly conserved among eukaryotes and widely expressed in mammalian tissues. PP5 has a C-terminal phosphatase domain and an extended N-terminal TPR (tetratricopeptide repeat) domain containing three TPR motifs [
,
,
,
,
]. This entry represents the C-terminal phosphatase domain. Proteins containing this domain also include yeast Ppt1, which is a serine/threonine phosphatase that regulates Hsp90 chaperone by affecting its ATPase and cochaperone binding activities []. The PPP (phosphoprotein phosphatase) family is one of two known protein phosphatase families specific for serine and threonine. The PPP family also includes: PP1, PP2A, PP2B (calcineurin), PP4, PP6, PP7, Bsu1, RdgC, PrpE, PrpA/PrpB, and ApA4 hydrolase. The PPP catalytic domain is defined by three conserved motifs (-GDXHG-, -GDXVDRG- and -GNHE-). The PPP enzyme family is ancient with members found in all eukaryotes, and in most bacterial and archaeal genomes. Dephosphorylation of phosphoserines and phosphothreonines on target proteins plays a central role in the regulation of many cellular processes [
,
]. PPPs belong to the metallophosphatase (MPP) superfamily.
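The three conserved motifs that define the PPP catalytic domain (-GDXHG-, -GDXVDRG- and -GNHE-) translate directly into regular expressions, with X read as any residue. The sketch below assumes that the motifs occur in the listed order along the sequence, which the entry does not state explicitly; the function name `locate_ppp_motifs` and the toy sequence are hypothetical.

```python
import re

# Motifs as given in the entry; "X" is treated as any single residue.
PPP_MOTIFS = [
    ("GDXHG", re.compile(r"GD.HG")),
    ("GDXVDRG", re.compile(r"GD.VDRG")),
    ("GNHE", re.compile(r"GNHE")),
]


def locate_ppp_motifs(sequence):
    """Return {motif: 0-based start} if all three motifs occur in order,
    otherwise None. The in-order requirement is an assumption."""
    seq = sequence.upper()
    hits = {}
    position = 0
    for name, pattern in PPP_MOTIFS:
        match = pattern.search(seq, position)  # search from the previous hit onward
        if match is None:
            return None
        hits[name] = match.start()
        position = match.end()
    return hits


# Hypothetical toy sequence, built only to exercise the patterns.
toy = "AA" + "GDIHG" + "QQQ" + "GDYVDRG" + "SS" + "GNHE" + "A"
print(locate_ppp_motifs(toy))
```

A sequence lacking any one of the motifs, or carrying them out of order, yields `None` under this scheme.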
Syntaxin 8 forms a complex with syntaxin 7 (Qa), Vti1b (Qb) and either VAMP7 or VAMP8 (R-SNARE) and is involved in the transport from early endosomes to the lysosome [
,
]. Syntaxin 8 is a member of the Qc subgroup of SNARE (soluble N-ethylmaleimide-sensitive factor attachment protein receptor) proteins, which consist of coiled-coil helices (called SNARE motifs) that mediate the interactions between SNARE proteins, and a transmembrane domain []. The SNARE complexes mediate membrane fusion, important for trafficking of newly synthesized proteins, recycling of pre-existing proteins and organelle formation [
]. SNARE proteins are classified into four groups, Qa-, Qb-, Qc- and R-SNAREs, depending on whether the residue in the hydrophilic centre layer of the four-helical bundle is a glutamine (Q) or arginine (R). Qa-, as well as Qb- and Qc-SNAREs, are localized to target organelle membranes, while R-SNARE is localized to vesicle membranes. They form unique complexes consisting of one member of each subgroup, that mediate fusion between a specific type of vesicles and their target organelle [].
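The rule that a fusion-competent SNARE complex contains exactly one member of each subgroup can be written as a short check. This is a minimal sketch: the subgroup labels and the syntaxin 8 example come from the text above, while the function name `is_complete_snare_complex` and the dictionary layout are hypothetical choices made for illustration.

```python
from collections import Counter

SUBGROUPS = ("Qa", "Qb", "Qc", "R")


def is_complete_snare_complex(members):
    """members maps protein name -> subgroup label.
    True only if each of Qa, Qb, Qc and R is represented exactly once."""
    counts = Counter(members.values())
    if set(counts) - set(SUBGROUPS):
        return False  # unknown subgroup label
    return all(counts.get(group, 0) == 1 for group in SUBGROUPS)


# The endosomal complex described in this entry.
endosomal = {
    "syntaxin 7": "Qa",
    "Vti1b": "Qb",
    "syntaxin 8": "Qc",
    "VAMP8": "R",
}
print(is_complete_snare_complex(endosomal))
```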
The Dok family adapters are phosphorylated by different protein tyrosine kinases. Dok proteins are involved in processes such as modulation of cell differentiation and proliferation, as well as in control of cell spreading and migration. The Dok protein contains an N-terminal pleckstrin homology (PH) domain followed by a central phosphotyrosine binding (PTB) domain, which has a PH-like fold, and a proline- and tyrosine-rich C-terminal tail. The PH domain binds to acidic phospholipids and localizes proteins to the plasma membrane, while the PTB domain mediates protein-protein interactions by binding to phosphotyrosine-containing motifs [
]. There are 7 mammalian Dok members (Dok-1 to 7). Dok-1 and Dok-2 act as negative regulators of the Ras-Erk pathway downstream of many immunoreceptor-mediated signaling systems, and it is believed that recruitment of p120 rasGAP by Dok-1 and Dok-2 is critical to their negative regulation [
]. Dok-3 is a negative regulator of the activation of JNK and mobilization of Ca2+ in B-cell receptor-mediated signaling, interacting with SHIP-1 and Grb2 []. This entry represents the PTB domain of Dok1/2/3.
The fungus-specific velvet family of regulatory proteins plays a key role in coordinating secondary metabolism and differentiation processes such as asexual or sexual sporulation and sclerotia or fruiting body formation. These velvet regulators are present in most parts of the fungal kingdom from chytrids to basidiomycetes. Velvet proteins interact with each other, alone ("homodimers"), in various combinations ("heterodimers"), and also with other proteins. The velvet proteins share a homologous region comprising about 150 amino acids, which lacks significant sequence similarity to any other known proteins. The velvet domain is involved in specific DNA binding as well as in the dimerization of the different velvet proteins, resulting in the formation of homo- and heterodimers [,
]. The velvet domain is an RHD-like domain related to NF-kappaB. It folds into a highly twisted β-sandwich composed of seven antiparallel β-strands. One side of the β-sandwich is involved in dimer formation, whereas the other one is flanked by several loops of which two fold into an α-helix. These α-helical fragments are located between β-strands 2 and 3 and at the C terminus [
,
].
Escherichia coli stringent starvation protein B (SspB) is thought to enhance the specificity of degradation of tmRNA-tagged proteins by the ClpXP protease. The tmRNA tag, also known as ssrA, is an 11-aa peptide added to the C terminus of proteins stalled during translation; it targets proteins for degradation by ClpXP and ClpAP. SspB is a cytoplasmic protein that specifically binds to residues 1-4 and 7 of the tag. Binding of SspB enhances degradation of tagged proteins by ClpX, and masks sequence elements important for ClpA interactions, inhibiting degradation by ClpA [
]. However, more recent work has cast doubt on the importance of SspB in wild-type cells []. SspB is encoded in an operon whose synthesis is stimulated by carbon, amino acid, and phosphate starvation. SspB may play a special role during nutrient stress, for example by ensuring rapid degradation of the products of stalled translation, without causing a global increase in degradation of all ClpXP substrates []. The structure of SspB revealed an SH3-like topology, sharing some similarity with the Sm-like fold [
].
Breast cancer anti-estrogen resistance protein 3 (BCAR3) is an SH2-containing signal transducer that regulates the proliferation in breast cancer cells [
]. BCAR3 binds to the adaptor molecule p130Cas (also known as BCAR1), which function as key signalling nodes with important regulatory roles in normal and pathological cells []. BCAR3 promotes cell motility by regulating actin cytoskeletal and adhesion remodeling in invasive breast cancer cells []. It also promotes interactions between p130Cas and the protein tyrosine kinase c-Src, leading to increased c-Src kinase activity and p130Cas phosphorylation []. This entry represents the SH2 domain found in SHEP1 (also known as SH2D3C), BCAR3 and NSP1 (also known as SH2D3A). SHEP1, BCAR3 and NSP1 are cytoplasmic proteins involved in cell adhesion/migration and antiestrogen resistance. All three proteins contain an SH2 domain and an exchange factor-like domain that binds both Ras GTPases and the scaffolding protein Cas [
]. In general, SH2 domains are involved in signal transduction. They typically bind pTyr-containing ligands via two surface pockets, a pTyr and hydrophobic binding pocket, allowing proteins with SH2 domains to localize to tyrosine phosphorylated sites [].
The L27 domain is a ~50-amino acid module, initially identified in the Lin-2 and Lin-7 proteins, that exists in a large family of animal scaffold proteins [
]. The L27 domain is a specific protein-protein interaction module capable of forming heteromeric complexes that can integrate multiple scaffold proteins into supramolecular assemblies required for establishment and maintenance of cell polarity. The L27 domain can be found as a single occurrence or as a duplication in association with other domains such as PDZ, SH3, the guanylate kinase domain or the serine/threonine protein kinase domain. The main features of the L27 domain are conserved negatively charged residues and a conserved aromatic amino acid []. Study of individual L27 domains revealed largely unfolded domains that require the formation of obligate heterodimers to achieve well-folded structures. Each L27 domain is composed of three helices. The two L27 domains heterodimerize by building a compact structure consisting of a four-helix bundle formed by the first two helices of each L27 domain and one coiled-coil formed by the third helix of each domain [,
].
This entry represents the RZ type zinc finger domain, found in NFX1-type zinc finger-containing protein 1 (ZNFX1) and E3 ubiquitin-protein ligase RNF213 from humans and other eukaryotic proteins. It contains two conserved histidine residues and four conserved cysteine residues in a CHC3H motif. It can bind divalent transition metals, with a preference for Zn2+. ZNFX1 is an RNA-binding protein that initiates the antiviral response and is required to restrict the replication of RNA viruses [
,
]. RNF213 is a giant E3 ubiquitin ligase that can catalyse ubiquitination of both proteins and lipids and is involved in various processes, such as lipid metabolism, angiogenesis and cell-autonomous immunity. This protein functions as a key immune sensor by catalysing ubiquitination of the lipid A moiety of bacterial lipopolysaccharide (LPS) via its RZ-type zinc finger: it restricts the proliferation of cytosolic bacteria, such as Salmonella, by generating the bacterial ubiquitin coat through the ubiquitination of LPS. It is a major susceptibility factor of Moyamoya disease, a cerebrovascular disorder that can result in stroke or death [
,
,
,
,
].
The RGS domain is an essential part of the Rho guanine nucleotide exchange factor 11, also known as PDZ-RhoGEF (PDZ:Postsynaptic density 95, Disk large, Zona occludens-1; RhoGEF: Rho guanine nucleotide exchange factor; alias PRG), a member of RhoGEFs subfamily of the RGS protein family. The RhoGEFs are peripheral membrane proteins that regulate essential cellular processes, including cell shape, cell migration, and cell cycle progression, as well as gene transcription by linking signals from heterotrimeric G-alpha12/13 protein-coupled receptors to Rho GTPase activation, leading to various cellular responses, such as actin reorganization and gene expression [
,
]. The RhoGEFs subfamily includes leukemia-associated RhoGEF protein (LARG), p115RhoGEF, PDZ-RhoGEF and its rat specific splice variant GTRAP48. The RGS domain of RhoGEFs has very little sequence similarity with the canonical RGS domain of the RGS proteins and is often referred to as RH (RGS Homology) domain. In contrast to p115RhoGEF and LARG, PDZ-RhoGEF cannot serve as a GTPase-activating protein (GAP), due to the mutation of sites in the RGS domain region that are crucial for GAP activity [
].
The Shroom family is a small group of related proteins that are defined by sequence similarity and in most cases by some link to the actin cytoskeleton. The Shroom (Shrm) protein family is found only in animals. Proteins of this family are predicted to be utilised in multiple morphogenic and developmental processes across animal phyla to regulate cell shape or intracellular architecture in an actin and myosin-dependent manner [
]. The Shrm family consists of: Shrm1 (formerly Apx), first found in Xenopus [
]. Human Shroom1 links a membrane bound protein to the actin cytoskeleton []. Shrm2 (formerly Apxl), a protein involved in the morphogenesis, maintenance, and/or function of vascular endothelial cells. Shrm3 (formerly Shroom), a protein necessary for neural tube closure in vertebrate development as deficiency in Shrm results in spina bifida. Shrm3 is also conserved in some invertebrates, as orthologues can be found in sea urchins. Shrm4, a regulator of cytoskeletal architecture that may play an important role in vertebrate development. It is implicated in X-linked mental retardation in humans.
Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands is affected by the cell types [
,
,
]. They can also form heteropolymers with each other within and between tight junction strands []. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with other PDZ domain proteins, such as scaffolding protein, ZO-1, -2 and -3 []. They also interact with non-tight junction proteins, such as cell adhesion proteins EpCam and tetraspanins and the signaling proteins, ephrin A and B and their receptors, EphA and EphB []. Claudin-11 was originally termed oligodendrocyte-specific protein (OSP).
It was reclassified as claudin-11 due to its sequence similarity to claudins and its ability to form TJ strands in transfected fibroblasts.
Claudin-11 expression is highly regulated during development and it has been postulated that it may play an important role in the growth and
differentiation of oligodendrocytes and other cells outside the CNS [].
The ARFGAP domain was first identified in the cell cycle control GTPase activating protein (GAP) GCS1 [
]. GCS1 is important for the inactivation of the ADP ribosylation factor ARF, a member of the Ras superfamily of GTP-binding proteins. The GTP-bound form of ARF is essential for the maintenance of normal Golgi morphology; it participates in recruitment of coat proteins, which are required for budding and fission of membranes. Before the fusion with an acceptor compartment the membrane must be uncoated. This step requires the hydrolysis of GTP associated with ARF, a process dependent on the ARFGAP domain of GCS1 []. The ARFGAP domain contains a characteristic zinc finger motif (Cys-x2-Cys-x(16,17)-Cys-x2-Cys) which displays some similarity to the C4-type GATA zinc finger. The ARFGAP domain displays no obvious similarity to other GAP proteins. However, a C4-type zinc finger is also found in the ARD1 GAP domain [
]. This entry represents a structure domain containing the zinc finger motif that can also be found in bacterial protein RecO. RecO is a DNA repair protein involved in damage avoidance-tolerance pathway(s) [
].
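The zinc finger motif quoted above, Cys-x2-Cys-x(16,17)-Cys-x2-Cys, maps directly onto a regular expression. The following is a minimal sketch for scanning a sequence for that spacing of cysteines; the function name `zinc_finger_spans` and the toy sequences are hypothetical, and a pattern hit alone does not establish that a region folds as an ARFGAP zinc finger.

```python
import re

# Cys-x2-Cys-x(16,17)-Cys-x2-Cys, with "x" read as any residue.
ARFGAP_ZF = re.compile(r"C.{2}C.{16,17}C.{2}C")


def zinc_finger_spans(sequence):
    """Return a list of (start, end) spans, 0-based and end-exclusive."""
    return [match.span() for match in ARFGAP_ZF.finditer(sequence.upper())]


# Hypothetical toy sequences: a 16-residue spacer matches, a 15-residue one does not.
toy_match = "MA" + "CAAC" + "G" * 16 + "CAAC" + "K"
toy_short = "MA" + "CAAC" + "G" * 15 + "CAAC" + "K"
print(zinc_finger_spans(toy_match))
print(zinc_finger_spans(toy_short))
```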
The malF gene product (MalF) is an inner membrane component of the maltose transport system in Escherichia coli. Some gene fusions between malF and lacZ produce hybrid proteins which are membrane-bound while other fusions produce hybrid proteins which are cytoplasmic [
,
]. The MalF protein belongs to the ABC transporter superfamily, the binding-protein-dependent transport system permease family and the MalFG subfamily. Hydrophobic proteins from the ABC transporter superfamily display a conserved region of at least 20 amino acids (EAA-X3-G-X9-I-X-LP), exposed in the cytosol, called the EAA region. This region is presumed to be a recognition site for the ABC ATPase helical domain [
]. MalF is a constituent of the MalFGK(2) maltose transport complex in Escherichia coli. It is targeted via the SRP pathway to the Sec/YidC insertion site. YidC supports the folding of MalF into a stable conformation before it is incorporated into the maltose transport complex [
]. The MalF structure is composed of eight transmembrane domains [
]. This entry corresponds to the N-terminal domain of the protein.
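The EAA consensus quoted above (EAA-X3-G-X9-I-X-LP) is written in a dash-separated notation that translates mechanically into a regular expression. The sketch below assumes that "X" followed by an optional count means that many arbitrary residues and that every other token is a literal run of one-letter codes; the helper name and the test sequence are hypothetical. Real EAA regions may deviate from the strict consensus, so an exact match is a conservative filter rather than a definitive test.

```python
import re


def consensus_to_regex(consensus):
    """Translate a dash-separated consensus such as 'EAA-X3-G-X9-I-X-LP'.

    'X' with an optional count becomes that many wildcard positions;
    any other token is matched literally.
    """
    parts = []
    for token in consensus.split("-"):
        wildcard = re.fullmatch(r"[Xx](\d*)", token)
        if wildcard:
            count = int(wildcard.group(1) or 1)
            parts.append(".{%d}" % count)
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


EAA_REGION = consensus_to_regex("EAA-X3-G-X9-I-X-LP")

# Synthetic sequence (hypothetical): 3 flanking residues, a 20-residue
# stretch that fits the consensus, then 3 more flanking residues.
toy = "MLV" + "EAA" + "RID" + "G" + "ASRWQTFFR" + "I" + "T" + "LP" + "LLK"
match = EAA_REGION.search(toy)
```

The translated pattern spans 3 + 3 + 1 + 9 + 1 + 1 + 2 = 20 positions, which agrees with the stated minimum length of the region.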
Adenylate cyclase catalyses the conversion of ATP to 3',5'-cyclic AMP (cAMP) and pyrophosphate. It plays an essential role in the regulation of cellular metabolism by catalysing the synthesis of a second messenger, cAMP. G protein-mediated signalling is implicated in yeast and fungal cAMP pathways. The cAMP-PKA pathway consists of an extracellular ligand-sensitive G protein-coupled receptor, a G protein signal transmitter, and the effector adenylate cyclase. The product of adenylate cyclase, cAMP, acts as an intracellular second messenger [
]. GTP-bound RAS2 is required to elicit magnesium-dependent adenylyl cyclase activity in Saccharomyces cerevisiae. In Schizosaccharomyces pombe, however, the cyclase is probably not regulated by RAS proteins, but is activated by git1. In S. pombe, Gpa2 Galpha binds an N-terminal domain of adenylate cyclase, comprising a moderately conserved sequence, which is within a region that is poorly related to other fungal adenylate cyclases. Adenylate cyclase is directly activated by a fungal G protein, which suggests a distinct activation mechanism from that of mammals []. This fungal domain interacts with the alpha subunit of heterotrimeric G proteins [
].
The WWC (WW-and-C2-domain-containing protein) family negatively regulates cell proliferation and organ growth by suppressing the transcriptional activity of YAP, a major effector of the Hippo pathway. WWC proteins activate large tumour suppressor 1 and 2 kinases (LATS1/2), which in turn phosphorylate the transcriptional co-activator YAP [
,
]. Their two amino-terminal WW domains mediate binding to target proteins, whereas the internal C2 domain is required for membrane association. WWC family members include WWC1 (also known as KIBRA), WWC2 and WWC3 []. C2 domains fold into an 8-stranded β-sandwich that can adopt 2 structural arrangements: type I and type II, distinguished by a circular permutation involving their N- and C-terminal beta strands. Many C2 domains are Ca2+-dependent membrane-targeting modules that bind a wide variety of substances including phospholipids, inositol polyphosphates, and intracellular proteins. Whereas the majority of known C2 domains have a role in Ca2+-dependent vesicle membrane association, these protein modules are also involved in Ca2+-insensitive membrane targeting as well as in intracellular protein-protein interactions. The C2 domain of KIBRA has been shown to have Ca2+-dependent binding specificity for monophosphorylated phosphatidylinositols [
].
Caldesmon (CDM) is an actin- and myosin-binding protein implicated in the
regulation of actomyosin interactions in smooth muscle and non-muscle cells, possibly acting as a bridge between myosin and actin filaments [
]. CDM is believed to be an elongated molecule, with an N-terminal myosin/calmodulin-binding domain and a C-terminal tropomyosin/actin/calmodulin-binding domain, separated by a 40nm-long central helix [
]. A high-molecular-weight form of CDM is predominantly expressed in smooth
muscles, while a low-molecular-weight form is widely distributed in non-muscle tissues and cells (the protein is not expressed in skeletal muscle or heart). A short CDM has been cloned from a chicken gizzard library [
]. The predicted protein contains 524 amino acids, with molecular mass ~60kDa.
The expressed protein binds to F-actin and is retained on calmodulin-Sepharose in the presence of Ca2+ [
]. Like the human non-muscle form [], this CDM is identical to the smooth muscle protein at its N- and C-termini,
but is missing 232 amino acids from the centre. Lack of this central segment, which is thought to be helical, renders the non-muscle protein
~35nm shorter than smooth muscle CDM [].
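The ~35nm figure can be checked with a short calculation. The sketch below assumes the textbook axial rise of an ideal α-helix (about 0.15 nm, i.e. 1.5 Å, per residue); that value is a general structural constant, not something stated in this entry, and the function name is illustrative.

```python
# Assumed constant: axial rise per residue of an ideal alpha-helix, in nm.
RISE_PER_RESIDUE_NM = 0.15


def helix_length_nm(n_residues, rise_nm=RISE_PER_RESIDUE_NM):
    """Approximate end-to-end length of a straight helical segment."""
    return n_residues * rise_nm


# The central segment absent from the non-muscle form is 232 residues long.
missing_segment_nm = helix_length_nm(232)
```

With these assumptions the missing segment comes to about 34.8 nm, consistent with the quoted ~35nm difference if the segment is a continuous helix.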
The L27 domain is a ~50-amino acid module, initially identified in the Lin-2
and Lin-7 proteins, that exists in a large family of animal scaffold proteins [
]. The L27 domain is a specific protein-protein interaction module capable of forming heteromeric complexes that can integrate multiple scaffold proteins into supramolecular assemblies required for establishment and maintenance of cell polarity. The L27 domain can be found as a single occurrence or as a duplication in association with other domains such as PDZ, SH3, the guanylate kinase domain or the serine/threonine protein kinase domain.
The main features of the L27 domain are conserved negatively charged residues
and a conserved aromatic amino acid []. Study of individual L27 domains revealed largely unfolded domains that require the formation of obligate heterodimers to achieve well-folded structures. Each L27 domain is composed of
three helices. The two L27 domains heterodimerize by building a compact structure consisting of a four-helix bundle formed by the first two helices of
each L27 domain and one coiled-coil formed by the third helix of each domain [,
].
The SidE family includes four large proteins, SidE, SdeA, SdeB, and SdeC, which are required for efficient intracellular bacterial replication and catalyse ubiquitination in an E1/E2-independent manner. These proteins contain four domains: a DUB domain, a phosphodiesterase (PDE) domain, a mono-ADP-ribosyltransferase (mART) domain, and a coiled-coil (CC) domain [
,
]. Ubiquitination is a post-translational modification that regulates many cellular processes. The conventional ubiquitination cascade culminates in a covalent linkage between the C terminus of ubiquitin (Ub) and a target protein [
,
]. SidE family proteins can catalyse the non-canonical ubiquitination of several different substrate proteins, including Rab small GTPases, Reticulon-4 (Rtn4), and Rag small GTPases, as well as SidE proteins themselves []. This specificity resides in the specific ubiquitin-binding surfaces of mART and the unique features of the PDE domain [,
]. The DUB activity of SdeA is important for regulating the dynamics of ubiquitin association with the bacterial phagosome, but is not necessary for its role in intracellular bacterial replication [
,
]. This entry represents the N-terminal deubiquitinating (DUB) domain of SidE from Legionella pneumophila subsp. pneumophila [
].
Bacterial type IV pili are surface filaments critical for diverse biological processes including surface and host cell adhesion, colonisation, biofilm formation, twitching motility, DNA uptake during natural transformation and virulence [
,
]. The proteins necessary to form the type IV pili inner-membrane complex are encoded in the pilMNOPQ operon, which encodes the cytoplasmic actin-like protein PilM, PilN, PilO, the periplasmic lipoprotein PilP and the outer-membrane secretin PilQ. The inner-membrane PilM/N/O/P complex is required for the optimal function of the outer-membrane secretin PilQ. This cluster is highly conserved across the type IV pilus-producing bacterial species, and all of these proteins have been shown to be essential for twitching motility [,
]. The PilP family are periplasmic proteins involved in the biogenesis of type IV pili [
]. The N-terminal domain of PilP interacts with the periplasmic regions of PilNO, while the C-terminal beta-domain of PilP interacts with the N0 domain of PilQ, connecting PilO to the secretin PilQ []. This entry also includes competence protein D (comD) from Haemophilus influenzae, which is encoded in the homologous operon comA/B/C/D/E.
This entry represents the CARD domain found in RAIDD (RIP-associated ICH-1 homologous protein with a death domain), also known as CRADD (Caspase and RIP adapter with death domain). RAIDD is an adaptor protein that, together with the p53-inducible protein PIDD and caspase-2, forms the PIDDosome complex, which is required for caspase-2 activation and plays a role in mediating stress-induced apoptosis. RAIDD contains an N-terminal CARD, which interacts with the caspase-2 CARD, and a C-terminal Death domain (DD), which interacts with the DD of PIDD [
,
]. In general, CARDs are death domains (DDs) found associated with caspases [
]. They are known to be important in the signaling pathways for apoptosis, inflammation, and host-defense mechanisms []. DDs are protein-protein interaction domains found in a variety of domain architectures. Their common feature is that they form homodimers by self-association or heterodimers by associating with other members of the DD superfamily including PYRIN and DED (Death Effector Domain). They serve as adaptors in signaling pathways and can recruit other proteins into signaling complexes [].
This entry is represented by Coronavirus non-structural protein 2A (32kDa); it is a family of uncharacterised viral proteins. Members have a phosphoesterase module (2H) [
] and are predicted to be involved in RNA modification. The viral group of 2H phosphoesterases contains proteins from two unrelated virus types: the type C rotaviruses (VP3 protein), which are double-stranded multipartite RNA viruses, and the coronaviruses (NS2 protein, this group), which are positive-strand RNA viruses. Given that these viruses have vertebrate hosts, it is likely that the 2H phosphoesterase domain was derived from the host by one of the virus groups, followed by rapid sequence divergence [
]. Subsequently, it may have been exchanged between the viral families. Although the direction of the exchange is not clear, it is possible that a double stranded replicative form of a subgenomic RNA transcript of the coronavirus NS2 was stabilised by a rotavirus and incorporated into its multiple double stranded RNA genome []. These proteins can be utilised as novel drug targets because of their predicted RNA modification role.
Apoptosis regulator, Bcl-2, BH4 motif, conserved site
Type:
Conserved_site
Description:
Active cell suicide (apoptosis) is induced by events such as growth factor withdrawal and toxins. It is controlled by regulators, which have either an inhibitory effect on programmed cell death (anti-apoptotic) or block the protective effect of inhibitors (pro-apoptotic) [
,
]. Many viruses have found a way of countering defensive apoptosis by encoding their own anti-apoptosis genes preventing their target-cells from dying too soon. All proteins belonging to the Bcl-2 family [
] contain at least one of the BH1, BH2, BH3, or BH4 motifs. All anti-apoptotic proteins contain BH1 and BH2 motifs; some of them contain an additional N-terminal BH4 motif (Bcl-2, Bcl-x(L), Bcl-w), which is never seen in pro-apoptotic proteins, except for Bcl-x(S). On the other hand, all pro-apoptotic proteins contain a BH3 motif (except for Bad) necessary for dimerisation with other proteins of the Bcl-2 family and crucial for their killing activity; some of them also contain BH1 and BH2 motifs (Bax, Bak). The BH3 motif is also present in some anti-apoptotic proteins, such as Bcl-2 or Bcl-x(L).
Apoptosis regulator, Bcl-2, BH2 motif, conserved site
Type:
Conserved_site
Description:
Active cell suicide (apoptosis) is induced by events such as growth factor withdrawal and toxins. It is controlled by regulators, which have either an inhibitory effect on programmed cell death (anti-apoptotic) or block the protective effect of inhibitors (pro-apoptotic) [
,
]. Many viruses have found a way of countering defensive apoptosis by encoding their own anti-apoptosis genes preventing their target-cells from dying too soon. All proteins belonging to the Bcl-2 family [
] contain at least one of the BH1, BH2, BH3, or BH4 motifs. All anti-apoptotic proteins contain BH1 and BH2 motifs; some of them contain an additional N-terminal BH4 motif (Bcl-2, Bcl-x(L), Bcl-w), which is never seen in pro-apoptotic proteins, except for Bcl-x(S). On the other hand, all pro-apoptotic proteins contain a BH3 motif (except for Bad) necessary for dimerisation with other proteins of the Bcl-2 family and crucial for their killing activity; some of them also contain BH1 and BH2 motifs (Bax, Bak). The BH3 motif is also present in some anti-apoptotic proteins, such as Bcl-2 or Bcl-x(L).
Apoptosis regulator, Bcl-2, BH3 motif, conserved site
Type:
Conserved_site
Description:
Active cell suicide (apoptosis) is induced by events such as growth factor withdrawal and toxins. It is controlled by regulators, which have either an inhibitory effect on programmed cell death (anti-apoptotic) or block the protective effect of inhibitors (pro-apoptotic) [
,
]. Many viruses have found a way of countering defensive apoptosis by encoding their own anti-apoptosis genes preventing their target-cells from dying too soon. All proteins belonging to the Bcl-2 family [
] contain at least one of the BH1, BH2, BH3, or BH4 motifs. All anti-apoptotic proteins contain BH1 and BH2 motifs; some of them contain an additional N-terminal BH4 motif (Bcl-2, Bcl-x(L), Bcl-w), which is never seen in pro-apoptotic proteins, except for Bcl-x(S). On the other hand, all pro-apoptotic proteins contain a BH3 motif (except for Bad) necessary for dimerisation with other proteins of the Bcl-2 family and crucial for their killing activity; some of them also contain BH1 and BH2 motifs (Bax, Bak). The BH3 motif is also present in some anti-apoptotic proteins, such as Bcl-2 or Bcl-x(L).
Apoptosis regulator, Bcl-2, BH1 motif, conserved site
Type:
Conserved_site
Description:
Active cell suicide (apoptosis) is induced by events such as growth factor withdrawal and toxins. It is controlled by regulators, which have either an inhibitory effect on programmed cell death (anti-apoptotic) or block the protective effect of inhibitors (pro-apoptotic) [
,
]. Many viruses have found a way of countering defensive apoptosis by encoding their own anti-apoptosis genes preventing their target-cells from dying too soon. All proteins belonging to the Bcl-2 family [
] contain at least one of the BH1, BH2, BH3, or BH4 motifs. All anti-apoptotic proteins contain BH1 and BH2 motifs; some of them contain an additional N-terminal BH4 motif (Bcl-2, Bcl-x(L), Bcl-w), which is never seen in pro-apoptotic proteins, except for Bcl-x(S). On the other hand, all pro-apoptotic proteins contain a BH3 motif (except for Bad) necessary for dimerisation with other proteins of the Bcl-2 family and crucial for their killing activity; some of them also contain BH1 and BH2 motifs (Bax, Bak). The BH3 motif is also present in some anti-apoptotic proteins, such as Bcl-2 or Bcl-x(L).
The P-loop guanosine triphosphatases (GTPases) control a
multitude of biological processes, ranging from cell division, cell cycling, and signal transduction, to ribosome assembly and protein synthesis. GTPases
exert their control by interchanging between an inactive GDP-bound state and an active GTP-bound state, thereby acting as molecular switches. The common
denominator of GTPases is the highly conserved guanine nucleotide-binding (G) domain that is responsible for binding and hydrolysis of guanine nucleotides. The p47 or immunity-related GTPases (IRG) are at least as old as the
vertebrates. The IRG proteins are an essential resistance system in the mouse for immunity against pathogens that enter the cell via a vacuole. Despite its
importance for the mouse, the IRG resistance system is absent from humans because it has been lost during the divergent evolution of the primates. The
IRG proteins appear to be accompanied phylogenetically by homologous proteins, named 'quasi IRG' (IRGQ) proteins, that probably lack nucleotide binding or
hydrolysis function, and that may form regulatory heterodimers with functional IRG proteins. The region of lowest similarity is in the G domain, and
conserved GTP-binding motifs are lacking [,
,
].
This family includes a group of membrane-shaping proteins that can promote local membrane curvatures, termed N-Ank proteins, which use their ankyrin repeat array and an N-terminal amphipathic helix to bind and shape membranes. The N-Ank superfamily can be divided into two subfamilies. The smaller one contains ankycorbin (also known as RAI14), UACA, and Ankyrin repeat domain-containing proteins 24 and 35 (Ankrd24/35); the larger one includes the uncharacterised proteins of the Ankrd18, Ankrd30 and Ankrd20 subfamilies, which are all specific to primates, or even to humans, as well as Ankrd7, Ankrd26, Ankrd36 and Ankrd62 [
]. This entry represents the smaller N-Ank subfamily. Ankycorbin is an actin binding protein that regulates F-actin dynamics [
,
]. It plays an important role in early morphogenesis of neurons, which depends on its ankyrin repeats and N-terminal amphipathic helix. UACA (also known as Nucling) mediates apoptosis, regulates NF-κB and STAT3, sequesters par-4 and acts as an autoantigen in patients with panuveitis [,
]. Ankrd24/35 are uncharacterised proteins. Ankrd35 is associated with chronic myeloid leukaemia [].
Sorting nexins (SNXs) are hydrophilic molecules that are localized in the cytoplasm and have the potential for membrane association either through their lipid-binding PX domains or through protein-protein interactions with membrane-associated protein complexes [
]. Indeed, several of the SNXs require several targeting motifs for their appropriate cellular localization. In almost every case studied, mammalian SNXs can be shown to have a role in protein sorting, with the most commonly used experimental model being plasma-membrane receptor endocytosis and sorting through the endosomal pathway. However, it is equally probable that SNXs sort vesicles that are not derived from the plasma membrane, and have a function in the accurate targeting of these vesicles and their cargo. The N-terminal domain appears to be specific to sorting nexins 1 and 2. SNX1 and SNX2 are members of the retromer complex involved in protein sorting within the endocytic pathway [
]. SNX1 is both membrane-associated and cytosolic, where it probably exists as a tetramer in large protein complexes and may hetero-oligomerize with SNX2.
Bacterial Fmu (Sun)/eukaryotic nucleolar NOL1/Nop2p, conserved site
Type:
Conserved_site
Description:
This pattern corresponds to a conserved central domain which contains some highly conserved regions, and which is found in archaeal, bacterial and eukaryotic proteins. In the archaea and bacteria, they are annotated as putative nucleolar protein, Sun (Fmu) family protein or tRNA/rRNA cytosine-C5-methylase. The majority have the S-adenosyl methionine (SAM) binding domain and are related to Escherichia coli Fmu (Sun) protein (16S rRNA m5C 967 methyltransferase) whose structure has been determined [
]. In the eukaryota, the majority are annotated as being 'hypothetical protein', nucleolar protein or the Nop2/Sun (Fmu) family. Unlike their bacterial homologues, few of the eukaryotic members in this family have the SAM binding signature. Despite this, Saccharomyces cerevisiae (Baker's yeast) Nop2p is a probable RNA m5C methyltransferase [
]. It is essential for processing and maturation of 27S pre-rRNA and large ribosomal subunit biogenesis []; it is localized to the nucleolus and is essential for viability []. Reduced Nop2p expression limits yeast growth and decreases levels of mature 60S ribosomal subunits while altering rRNA processing []. There is substantial identity between Nop2p and Homo sapiens (Human) p120 (NOL1), which is also called the proliferation-associated nucleolar antigen [,
].
Peroxisomal proteins catalyse metabolic reactions. The import of proteins from the cytosol into the peroxisomal matrix depends on more than a dozen peroxin (PEX) proteins, among which PEX5 and PEX7 serve as receptors that shuttle proteins bearing one of two peroxisome-targeting signals (PTSs) into the organelle. PEX5 is the PTS1 receptor, while PEX7 is the PTS2 receptor. In plants, PEX7 depends on PEX5 binding to deliver PTS2 cargo into the peroxisome, and PEX7 also facilitates PEX5 accumulation and import of PTS1 cargo into peroxisomes [
,
]. This entry includes PEX5 (also known as PTS1R) from animals, fungi and plants, as well as PEX5L from vertebrates. PEX5 binds to the C-terminal PTS1-type tripeptide peroxisomal targeting signal (SKL-type) and plays an essential role in peroxisomal protein import [,
,
]. Based on subcellular localization and binding properties mammalian PEX5 may function as a regulator in an early step of the PTS1 protein import process []. PEX5L acts as an accessory subunit of hyperpolarization-activated cyclic nucleotide-gated (HCN) channels, regulating their cell-surface expression and cyclic nucleotide dependence [,
]. Interestingly, although PEX5 and PEX5L have structurally similar binding at their TPR domains, they bind to different substrates in vivo [].
The HORMA domain (for HOP1, REV7 and MAD2) is a region of about 180-240 amino acids containing several conserved motifs. Whereas the MAD2 and the REV7 proteins are almost entirely made up of HORMA domains, HOP1 contains a HORMA domain in its N-terminal region and a Zn-finger domain, whose general arrangement of metal-chelating residues is similar to that of the PHD finger, in the C-terminal region. The HORMA domain is found in proteins showing a direct association with chromatin of all crown group eukaryotes. It has been suggested that the HORMA domain recognises chromatin states that result from DNA adducts, double-stranded breaks or non-attachment to the spindle and acts as an adaptor that recruits other proteins involved in repair [
]. Secondary structure prediction suggests that the HORMA domain is globular and could potentially form a complex β-sheet(s) with associated α-helices [
]. Some proteins known to contain a HORMA domain are listed below:
Eukaryotic HOP1, a conserved protein that is involved in meiotic-synaptonemal-complex assembly.
Eukaryotic mitotic-arrest-deficient 2 protein (MAD2), a key component of the mitotic-spindle-assembly checkpoint [
].
Eukaryotic REV7, a subunit of the DNA polymerase zeta that is involved in translesion, template-independent DNA synthesis.
The eukaryotic proteins in this entry include frataxin, the protein that is mutated in Friedreich's ataxia [
], and related sequences. Friedreich's ataxia is a progressive neurodegenerative disorder caused by loss of function mutations in the gene encoding frataxin (FRDA). Frataxin mRNA is predominantly expressed in tissues with a high metabolic rate (including liver, kidney, brown fat and heart). Mouse and yeast frataxin homologues contain a potential N-terminal mitochondrial targeting sequence, and human frataxin has been observed to co-localise with a mitochondrial protein. Furthermore, disruption of the yeast gene has been shown to result in mitochondrial dysfunction. Friedreich's ataxia is thus believed to be a mitochondrial disease caused by a mutation in the nuclear genome (specifically, expansion of an intronic GAA triplet repeat) [,
,
]. The bacterial proteins in this entry are iron-sulphur cluster (FeS) metabolism CyaY proteins homologous to eukaryotic frataxin. Partial Phylogenetic Profiling [
] suggests that CyaY most likely functions as part of the ISC system for FeS cluster biosynthesis, a role supported by experimental data in some species [,
]. This conserved site covers a conserved region in the central section of these proteins.
The BTB (for BR-C, ttk and bab) [
] or POZ (for Pox virus and Zinc finger) [,
] domain is a versatile protein-protein interaction motif involved in many cellular functions, including transcriptional regulation, cytoskeleton dynamics, ion channel assembly and gating, and targeting proteins for ubiquitination []. The BTB domain can occur alongside other domains: BTB-zinc finger (BTB-ZF), BTB-BACK-Kelch (BBK), voltage-gated potassium channel T1 (T1-Kv) [], MATH-BTB, BTB-NPH3 and BTB-BACK-PHR (BBP). Other proteins, such as Skp1 and ElonginC, consist almost exclusively of the core BTB fold. In all of these protein families, the BTB core fold is structurally conserved, consisting of a 2-layer α/β topology where a cluster of α-helices is flanked by short β-sheets []. POZ domains from several zinc finger proteins have been shown to mediate transcriptional repression and to interact with components of histone deacetylase co-repressor complexes including N-CoR and SMRT [,
,
]. The POZ or BTB domain is also known as BR-C/Ttk or ZiN. This entry includes the BTB/POZ domain, as well as the Skp1 N-terminal region, which has an α/β structure similar to that of the BTB/POZ domain fold [
,
].
In eukaryotes, glutathione S-transferases (GSTs) participate in the detoxification of reactive electrophilic compounds by catalysing their
conjugation to glutathione. The GST domain is also found in S-crystallins from squid, and proteins with no known GST activity, such as eukaryotic elongation factors 1-gamma and the HSP26 family of stress-related proteins, which include auxin-regulated proteins in plants and stringent starvation proteins in Escherichia coli. The major lens polypeptide of cephalopods is also a GST [,
,
,
]. Bacterial GSTs of known function often have a specific, growth-supporting role in biodegradative metabolism: epoxide ring opening and tetrachlorohydroquinone reductive dehalogenation are two examples of the reactions catalysed by these bacterial GSTs. Some regulatory proteins, like the stringent starvation proteins, also belong to the GST family [
,
]. GST seems to be absent from Archaea, in which gamma-glutamylcysteine substitutes for glutathione as the major thiol. Glutathione S-transferases form homodimers, but in eukaryotes can also form heterodimers of the A1 and A2 or YC1 and YC2 subunits. The homodimeric enzymes display a conserved structural fold. Each monomer is composed of a distinct N-terminal sub-domain, which adopts the thioredoxin fold, and a C-terminal all-helical sub-domain. This entry is the C-terminal domain.
The 14-3-3 proteins are a family of closely related acidic homodimeric proteins of about 30kDa which were first identified as being very abundant in mammalian brain tissues and located preferentially in neurons [
,
,
]. The 14-3-3 proteins seem to have multiple biological activities and play a key role in signal transduction pathways and the cell cycle. They interact with kinases such as PKC or Raf-1; they also seem to function as protein-kinase dependent activators of tyrosine and tryptophan hydroxylases, and in plants they are associated with a complex that binds to the G-box promoter elements. The 14-3-3 family of proteins are ubiquitously found in all eukaryotic species studied and have been sequenced in fungi (yeast BMH1 and BMH2, fission yeast rad24 and rad25), plants, Drosophila, and vertebrates. The sequences of the 14-3-3 proteins are extremely well conserved. As signature patterns we have selected two highly conserved regions: the first is a peptide of 11 residues located in the N-terminal section; the second, a 20 amino acid region located in the C-terminal section.
The signature patterns in this entry cover both the 11 and 20 residue conserved regions.
The deoR-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of about 50-60 amino acids present in transcription regulators of the deoR family, involved in sugar catabolism. This family of prokaryotic regulators is named after the Escherichia coli protein DeoR, a repressor of the deo operon, which encodes nucleotide and deoxyribonucleotide catabolic enzymes. DeoR also negatively regulates the expression of nupG and tsx, a nucleoside-specific transport protein and a channel-forming protein, respectively. DeoR-like transcription repressors occur in diverse bacteria as regulators of sugar and nucleoside metabolic systems. The effector molecules for deoR-like regulators are generally phosphorylated intermediates of the relevant metabolic pathway. The DNA-binding deoR-type HTH domain occurs usually in the N-terminal part. The C-terminal part can contain an effector-binding domain and/or an oligomerisation domain. DeoR occurs as an octamer, whilst glpR and agaR are tetramers. Several operators may be bound simultaneously, which could facilitate DNA looping [
,
]. It is worth noting that the DeoR in this entry is represented by the protein UniProt P0ACK5 from E. coli, not the DeoR, UniProt P39140, from Bacillus subtilis. Despite sharing the same name, these two proteins do not share protein sequence similarity [
].
Transcription regulator, HTH DeoR-type, conserved site
Type:
Conserved_site
Description:
The deoR-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of
about 50-60 amino acids present in transcription regulators of the deoR family, involved in sugar catabolism. This family of prokaryotic regulators is named after the Escherichia coli protein DeoR, a repressor of the deo operon, which encodes nucleotide and deoxyribonucleotide catabolic enzymes. DeoR also negatively regulates the expression of nupG and tsx, a nucleoside-specific transport protein and a channel-forming protein, respectively. DeoR-like transcription repressors occur in diverse bacteria as regulators of
sugar and nucleoside metabolic systems. The effector molecules for deoR-like regulators are generally phosphorylated intermediates of the relevant
metabolic pathway. The DNA-binding deoR-type HTH domain occurs usually in the N-terminal part. The C-terminal part can contain an effector-binding domain
and/or an oligomerisation domain. DeoR occurs as an octamer, whilst glpR and agaR are tetramers. Several operators may be bound simultaneously, which could
facilitate DNA looping [,
]. It is worth noting that the DeoR in this entry is represented by the protein UniProt P0ACK5 from E. coli, not the DeoR, UniProt P39140, from Bacillus subtilis. Despite sharing the same name, these two proteins do not share protein sequence similarity [
].
Most of the hereditary idiopathic epilepsies are due to mutations in ion
channels expressed in the brain. Recently two non-ion channel genes, LGI1 and VLGR1, have emerged as important causes of specific epilepsy syndromes. The
products of these two genes share a conserved repeated region of about 44 amino acid residues, the EAR domain (for epilepsy-associated repeat) [
]. The predicted secondary structure (four β-strands) and the number of
repeated copies (seven) suggest that the EAR domain belongs to the β-propeller fold. A common functional feature found in all characterised
domains of this class is a participation in protein-protein interactions. Since the EAR repeat is found in the ectodomain of VLGR1, it is most probably
involved in ligand recognition by the receptor [
]. Proteins known to contain EAR repeats are listed below:
Mammalian LGI1 to LGI4. LGI1 is mutated in autosomal dominant partial epilepsy with auditory features (ADPEAF). The F348C missense mutation is located in the third EAR repeat (7 copies).
Mammalian thrombospondin N-terminal domain and EAR repeats containing protein (TSPEAR) (7 copies).
Mammalian very large G protein-coupled receptor 1 (VLGR1) or monogenic audiogenic seizure-susceptible (MASS1) protein. In mouse, mutations in the MASS1 gene are associated with generalized epilepsy and seizures in response to loud noises (7 copies) [].
Somatomedin B (SMB), a serum factor of unknown function, is a small cysteine-rich peptide, derived proteolytically from the N terminus of the cell-substrate adhesion protein vitronectin [
]. Cys-rich somatomedin B-like domains are found in a number of proteins [], including ectonucleotide pyrophosphatase/phosphodiesterase family member proteins (previously known as plasma-cell membrane glycoprotein) [] and placental protein 11 (also known as Poly(U)-specific endoribonuclease), which appears to possess amidolytic activity. The SMB domain of vitronectin has been demonstrated to interact with both the urokinase receptor and the plasminogen activator inhibitor-1 (PAI-1), and the conserved cysteines of the NPP1 somatomedin B-like domain have been shown to mediate homodimerisation [
]. The SMB domain contains eight Cys residues, arranged into four disulphide bonds. It has been suggested that the active SMB domain may be permitted considerable disulphide bond heterogeneity or variability, provided that the Cys25-Cys31 disulphide bond is preserved. The three-dimensional structure of the SMB domain is extremely compact, and the disulphide bonds are packed in the centre of the domain, forming a covalently bonded core [
]. The structure of the SMB domain presents a new protein fold, with the only ordered secondary structure being a single-turn α-helix and a single-turn 3(10)-helix [].
This DNA-binding domain superfamily is found in the p53 and the RUNT families of transcription factors. The DNA-binding domain acts to clamp or encircle the DNA target in order to stabilise the protein-DNA complex. This domain has an immunoglobulin-like fold consisting of a β-sandwich of 9 strands in two sheets with a Greek key topology [
]. The p53 tumour suppressor functions primarily as a sequence-specific transcription factor, but also has transcription-independent roles in DNA repair and recombination [
]. In addition, p53, as well as its family members p63 and p73, are involved in the survival/death checkpoint in peripheral and central neurons, where full-length proteins appear to be involved in cell death, and truncated isoforms that retain the DNA-binding domain are involved in cell survival []. RUNT domain (RD) proteins (also known as RUNX, AML, CBF-alpha and PEBP2-alpha) are a family of transcription factors involved in various developmental pathways. The RD has two general functions: it directs the binding of the protein to a target sequence, PyGPyGGTPy (Py=pyrimidine), in order to regulate the expression of target genes; and it mediates protein-protein interactions with CBF-beta, which modulates the DNA-binding affinity of the RD [
].
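The RD target sequence PyGPyGGTPy quoted above can be read as a simple pattern in which each Py is a C or a T. The following is a minimal sketch, not part of the original entry, of how such a consensus might be searched for in a DNA string. The function names `consensus_to_regex` and `find_sites` are hypothetical, and the sketch scans only the strand it is given.

```python
import re

# Consensus quoted in the text; "Py" stands for a pyrimidine (C or T).
CONSENSUS = "PyGPyGGTPy"


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Expand every 'Py' into the character class [CT] and compile the result."""
    return re.compile(consensus.replace("Py", "[CT]"))


def find_sites(dna: str, consensus: str = CONSENSUS) -> list:
    """Return 0-based start positions of all matches on the given strand.

    A lookahead is used so that overlapping sites are not skipped.
    """
    pattern = consensus_to_regex(consensus)
    lookahead = re.compile("(?=" + pattern.pattern + ")")
    return [m.start() for m in lookahead.finditer(dna.upper())]
```

For example, `find_sites("AATGTGGTTAA")` reports a single site starting at position 2 (TGTGGTT). A real scan would normally also check the reverse complement, which this sketch leaves out.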
This entry represents the Bacteriophage T4, Gp32, single-stranded DNA-binding domain. The characteristics of the protein distribution suggest prophage matches in addition to the phage matches. Single-stranded DNA-binding protein (also known as Gp32 or SSB) is essential for bacteriophage T4 DNA replication, recombination and repair, acting to stimulate replisome processing and accuracy through its binding to ssDNA as the replication fork advances [
]. The crystal structure of Gp32 shows an ssDNA binding cleft comprised of regions from three structural subdomains, through which ssDNA can slide freely []. The structure of Gp32 is similar to other phage ssDNA-binding proteins such as Gp2.5 from bacteriophage T7, and gene V protein, both of which have a nucleic acid-binding OB-type fold. However, Gp32 contains a zinc-finger subdomain at residues 63-111 that is not found in the other two phage proteins. This protein stimulates the activities of viral DNA polymerase and DnaB-like SF4 replicative helicase, probably via its interaction with the helicase assembly factor [
], and together with DnaB-like SF4 replicative helicase and the helicase assembly factor, promotes pairing of two homologous DNA molecules containing complementary single-stranded regions and mediates homologous DNA strand exchange [].
Hepatitis C virus, Non-structural 5a protein, C-terminal
Type:
Domain
Description:
Although Hepatitis A virus, Hepatitis B virus, and Hepatitis C virus have similar names because they all cause liver inflammation, they are distinctly different viruses both genetically and clinically. The Hepatitis C virus (HCV) is a small (50-80 nm in diameter), enveloped, single-stranded, positive-sense RNA virus. It is a member of the family Flaviviridae. There are seven genotypes and a number of subtypes with diverse geographic distributions. The genome of HCV consists of a single open reading frame. At the 5' and 3' ends of the RNA are the UTR regions, which are not translated into proteins but are important for translation and replication of the viral RNA. The 5' UTR has a ribosome binding site (IRES - Internal ribosome entry site) that starts the translation of a unique polyprotein that is later cut by cellular and viral proteases into 10 active structural and non-structural smaller proteins [
]. This entry represents the C-terminal region of the Hepatitis C virus NS5a protein. The molecular function of the non-structural 5a protein is uncertain. The NS5a protein is phosphorylated when expressed in mammalian cells. It is thought to interact with the dsRNA-dependent (interferon-inducible) kinase PKR [
,
].
Winged helix DNA-binding proteins share a related winged helix-turn-helix DNA-binding motif, where the "wings", or loops, are small β-sheets. The winged helix motif consists of two wings (W1, W2), three α-helices (H1, H2, H3) and three β-sheets (S1, S2, S3) arranged in the order H1-S1-H2-H3-S2-W1-S3-W2 [
]. The DNA-recognition helix makes sequence-specific DNA contacts with the major groove of DNA, while the wings make different DNA contacts, often with the minor groove or the backbone of DNA. Several winged-helix proteins display an exposed patch of hydrophobic residues thought to mediate protein-protein interactions. Many different proteins with diverse biological functions contain a winged helix DNA-binding domain, including transcriptional repressors such as biotin repressor, LexA repressor and the arginine repressor [
]; transcription factors such as the hepatocyte nuclear factor-3 proteins involved in cell differentiation, heat-shock transcription factor, and the general transcription factors TFIIE and TFIIF [,
]; helicases such as RuvB that promotes branch migration at the Holliday junction, and CDC6 in the pre-replication complex [,
]; endonucleases such as FokI and TnsA []; histones; and Mu transposase, where the flexible wing of the enhancer-binding domain is essential for efficient transposition [].
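The element order H1-S1-H2-H3-S2-W1-S3-W2 given above can be checked against the stated element counts (three helices, three β-sheets, two wings). The following is a minimal sketch, using hypothetical helper names `parse_topology` and `count_elements`, that parses the order string and tallies each element type.

```python
from collections import Counter

# Order of elements as stated in the text.
TOPOLOGY = "H1-S1-H2-H3-S2-W1-S3-W2"

# Labels follow the wording of the entry: H = α-helix, S = β-sheet, W = wing.
KINDS = {"H": "helix", "S": "sheet", "W": "wing"}


def parse_topology(topology: str) -> list:
    """Split the order string into (kind, index) pairs, e.g. ('helix', 1)."""
    return [(KINDS[element[0]], int(element[1:])) for element in topology.split("-")]


def count_elements(topology: str) -> Counter:
    """Count how many elements of each kind appear in the order string."""
    return Counter(kind for kind, _ in parse_topology(topology))
```

Calling `count_elements(TOPOLOGY)` gives three helices, three sheets and two wings, in agreement with the description.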
Ribonuclease E inhibitor RraA/RraA-like superfamily
Type:
Homologous_superfamily
Description:
This entry represents the regulator of ribonuclease E activity A (RraA). These proteins contain a swivelling 3-layer beta/beta/alpha domain that appears to be mobile in most multi-domain proteins known to contain it. These proteins are structurally similar, and may have distant homology, to the phosphohistidine domain of pyruvate phosphate dikinase. The RraA fold is an ancient platform that has been adapted for a wide range of functions. RraA had been identified as a putative demethylmenaquinone methyltransferase and was annotated as MenG, but further analysis showed that RraA lacked the structural motifs usually required for methylases [
]. The Escherichia coli protein regulator RraA acts as a trans-acting modulator of RNA turnover, binding essential endonuclease RNase E and inhibiting RNA processing [
]. RNase E forms the core of a large RNA-catalysis machine termed the degradosome. RraA (and RraB) causes remodelling of degradosome composition, which is associated with alterations in RNA decay and global transcript abundance, and as such is a bacterial mechanism for the regulation of RNA cleavage. This fold is also found in 4-hydroxy-4-methyl-2-oxoglutarate aldolase, also known as RraA-like protein [
] and at the C terminus of the DlpA protein.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [
]. This entry represents IscX proteins (also known as hypothetical protein YfhJ) that are part of the ISC system. IscX is active as a monomer. The structure of YfhJ is an orthogonal α-bundle [
]. YfhJ is a small acidic protein that binds IscS, and contains a modified winged helix motif that is usually found in DNA-binding proteins []. YfhJ/IscX can bind Fe, and may function as an Fe donor in the assembly of FeS clusters.
This entry represents the paired amphipathic helix (PAH) repeat. Sin3 proteins have at least three PAH domains (PAH1, PAH2, and PAH3) [
,
]. They are components of a co-repressor complex that silences transcription, playing important roles in the transition between proliferation and differentiation. Sin3 proteins are recruited to the DNA by various DNA-binding transcription factors such as the Mad family of repressors, Mnt/Rox, PLZF, MeCP2, p53, REST/NRSF, MNFbeta, Sp1, TGIF and Ume6 [
]. Sin3 acts as a scaffold protein that in turn recruits histone-binding proteins RbAp46/RbAp48 and histone deacetylases HDAC1/HDAC2, which deacetylate the core histones, resulting in a repressed state of the chromatin []. The PAH domains are protein-protein interaction domains through which Sin3 fulfils its role as a scaffold. The PAH2 domain of Sin3 can interact with a wide range of unrelated and structurally diverse transcription factors that bind using different interaction motifs. For example, the Sin3 PAH2 domain can interact with the unrelated Mad and HBP1 factors using alternative interaction motifs that involve binding in opposite helical orientations []. Structurally, PAH2 is composed of four helices arranged in an open up-and-down bundle fold which binds α-helical peptides.
The UNC-45 protein contains an NH2-terminal domain with three tetratricopeptide repeat motifs, a unique central domain, and a COOH-terminal domain homologous to CRO1 and She4 [
]. This entry represents the central domain of the UNC-45 protein. The UNC-45 or small muscle protein 1 of Caenorhabditis elegans is expressed in two forms from different genomic positions in mammals: as a general tissue protein (UNC-45a) and as a specific form (UNC-45b) expressed only in striated and skeletal muscle. Myofibril formation requires both UNC-45 forms, consistent with the fact that the cytoskeleton is necessary for the development and maintenance of organised myofibrils [
]. Rng3 (Ring assembly protein 3), the homologue in Schizosaccharomyces pombe, is crucial for cell shape, normal actin cytoskeleton, and contractile ring assembly, and is essential for assembly of the myosin II-containing progenitors of the contractile ring. Widespread defects in the cytoskeleton are found in null mutants of all three fungal proteins []. Mammalian Unc45 is found to act as a specific chaperone during the folding of myosin and the assembly of striated muscle by forming a stable complex with the general chaperone Hsp90 [].
STAT2 is a member of the STAT protein family. In response to interferon, STAT2 forms a complex with STAT1 and IFN regulatory factor family protein p48 (ISGF3G), in which this protein acts as a transactivator, but lacks the ability to bind DNA directly [
]. Transcription adaptor P300/CBP (EP300/CREBBP) has been shown to interact specifically with STAT2, which is thought to be involved in the process of blocking IFN-alpha response by adenovirus []. This entry represents the SH2 domain of STAT2. STAT proteins have a dual function: signal transduction and activation of transcription. When cytokines are bound to cell surface receptors, the associated Janus kinases (JAKs) are activated, leading to tyrosine phosphorylation of the given STAT proteins []. Phosphorylated STATs form dimers, translocate to the nucleus, and bind specific response elements to activate transcription of target genes []. STAT proteins contain an N-terminal domain (NTD), a coiled-coil domain (CCD), a DNA-binding domain (DBD), an α-helical linker domain (LD), an SH2 domain, and a transactivation domain (TAD). The SH2 domain is necessary for receptor association and tyrosine phosphodimer formation. Seven mammalian STAT family members have been identified: STAT1, STAT2, STAT3, STAT4, STAT5 (STAT5A and STAT5B), and STAT6 [].
TssK is an essential baseplate component of the type VI secretion system, which connects the membrane complex, the baseplate and the tail components [
,
,
]. The structure of TssK was solved, revealing that the proteins organise into a tightly packed trimer [
,
]. Each TssK monomer comprises three domains: an N-terminal β-sandwich domain, a linker and a four-helix-bundle middle domain, and a C-terminal domain. The N-terminal domain of TssK is structurally homologous to the shoulder domain of phage receptor-binding proteins, and the C-terminal domain binds the T6SS membrane complex []. This family includes TssK proteins and TssK homologues ImpJ and vasE. The type VI secretion system (T6SS) is a supra-molecular bacterial complex that resembles phage tails. It is a toxin delivery system which fires toxins into target cells upon contraction of its TssBC sheath [
]. Thirteen essential core proteins are conserved in all T6SSs: the membrane associated complex TssJ-TssL-TssM, the baseplate proteins TssE, TssF, TssG, and TssK, the bacteriophage-related puncturing complex composed of the tube (Hcp), the tip/puncturing device VgrG, and the contractile sheath structure (TssB and TssC). Finally, the starfish-shaped dodecameric protein, TssA, limits contractile sheath polymerization at its distal part when TagA captures TssA [].
The type III secretion system of Gram-negative bacteria is used to transport virulence factors from the pathogen directly into the host cell [
] and is only triggered when the bacterium comes into close contact with the host. Effector proteins secreted by the type III system do not possess a secretion signal, and are considered unique because of this. Yersinia spp. secrete an effector protein called YopE through the type III needle []. This acts as a Rho GTPase-activating protein that disrupts the host cell actin cytoskeleton, and is regulated by a chaperone protein called SycE/YerA []. In the absence of the SycE chaperone, YopE is not transported through the needle and remains in the bacterial cytoplasm, suggesting a crucial role for this chaperone []. Both the YopE regulator and SycE/YerA proteins share similarity with the exoenzyme S (ExoS) gene product of Pseudomonas aeruginosa [
]. ExoS has both ADP-ribosylating and GTPase-activating activity, and is implicated as a virulence factor. As type III secretion in Pseudomonas is often associated with systemic and even fatal infections in susceptible patients [], the proteins involved are of interest as vaccine and drug targets.
The Phox Homology (PX) domain is a phosphoinositide (PI) binding module present in many proteins with diverse functions. Sorting nexins (SNXs) make up the largest group among PX domain containing proteins. They are involved in regulating membrane traffic and protein sorting in the endosomal system. The PX domain of SNXs binds phosphoinositides (PIs) and targets the protein to PI-enriched membranes [
,
]. SNXs differ from each other in PI-binding specificity and affinity, and the presence of other protein-protein interaction domains, which help determine subcellular localization and specific function in the endocytic pathway [,
,
]. SNX1 harbors a Bin/Amphiphysin/Rvs (BAR) domain (), which detects membrane curvature, C-terminal to the PX domain. Both domains have been shown to determine the specific membrane-targeting of SNX1 [
]. SNX1 is a component of the retromer complex, a membrane coat multimeric complex required for endosomal retrieval of lysosomal hydrolase receptors to the Golgi []. The retromer consists of a cargo-recognition subcomplex and a subcomplex formed by a dimer of sorting nexins (SNX1 and/or SNX2), which ensures efficient cargo sorting by facilitating proper membrane localization of the cargo-recognition subcomplex []. This entry represents the SNX1 PX domain.
YKT6 forms complexes with syntaxin-5 (Qa), GS28 (Qb) and either Bet1 (Qc) or GS15 (Qc). These complexes regulate the early secretory pathway of eukaryotic cells at the level of the transport from the ER-Golgi intermediate compartment (ERGIC) to the cis-Golgi and transport from the trans-Golgi network to the cis-Golgi, respectively [,
]. YKT6 is a member of the R-SNARE subfamily of SNARE (soluble N-ethylmaleimide-sensitive factor attachment protein receptor) proteins, which contain coiled-coil helices (called SNARE motifs) that mediate the interactions between SNARE proteins, and a transmembrane domain. The SNARE complex mediates membrane fusion, which is important for trafficking of newly synthesized proteins, recycling of pre-existing proteins and organelle formation. SNARE proteins are classified into four groups, Qa-, Qb-, Qc- and R-SNAREs, depending on whether the residue in the hydrophilic centre layer of the four-helical bundle is a glutamine (Q) or arginine (R). Qa-, as well as Qb- and Qc-SNAREs, are localized to target organelle membranes, while R-SNAREs are localized to vesicle membranes. They form unique complexes consisting of one member of each subgroup, which mediate fusion between a specific type of vesicle and its target organelle. Their SNARE motifs form twisted and parallel heterotetrameric helix bundles [].
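The two rules stated above (Q versus R by the centre-layer residue, and one member of each subgroup per complex) can be written down as a small check. This is a minimal sketch with hypothetical function names `classify_snare` and `is_canonical_complex`; note that the centre-layer residue alone only separates Q- from R-SNAREs, so the Qa/Qb/Qc labels are taken as given input rather than derived.

```python
def classify_snare(centre_residue: str) -> str:
    """Classify a SNARE by the one-letter code of its centre-layer residue.

    'Q' (glutamine) gives a Q-SNARE and 'R' (arginine) gives an R-SNARE.
    """
    residue = centre_residue.strip().upper()
    if residue == "Q":
        return "Q-SNARE"
    if residue == "R":
        return "R-SNARE"
    raise ValueError("centre-layer residue must be Q or R, got %r" % centre_residue)


def is_canonical_complex(subgroups: list) -> bool:
    """Return True if the complex has exactly one Qa, one Qb, one Qc and one R member."""
    return sorted(subgroups) == ["Qa", "Qb", "Qc", "R"]


# Subgroup labels for one complex described in the text:
# syntaxin-5 (Qa), GS28 (Qb), Bet1 (Qc) and YKT6 (R).
YKT6_COMPLEX = {"syntaxin-5": "Qa", "GS28": "Qb", "Bet1": "Qc", "YKT6": "R"}
```

With these definitions, `is_canonical_complex(list(YKT6_COMPLEX.values()))` evaluates to `True`, whereas a set containing two Qc members and no Qb member would not pass.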
This entry represents the DBB domain. The following proteins share a number of distinct parts, namely ankyrin repeats (see
), a coiled coil, and a stretch of approximately 140 amino acid residues upstream of the ankyrin repeats, which has been called the Dof/BCAP/BANK (DBB) domain [
,
,
]:
Drosophila Downstream-of-FGF receptor (Dof), a protein essential for the morphogenesis of both the mesoderm and the tracheae. It has been proposed to mediate the transmission of a signal from an activated receptor to other components of the cell, including the MAP kinase cascade.
Vertebrate BANK and BCAP proteins that function in B-cell signalling.
These proteins are involved in signalling; however, unlike Dof, BANK and BCAP are not implicated in FGF signalling but appear to have undergone rapid change during the course of evolution to acquire a novel function in vertebrates. The DBB domain in both Dof and BCAP is required to mediate self-association in yeast cells, indicating that this domain may have a more general role in mediating protein-protein interactions [
]. The DBB domain of BCAP is required for dimerization and leads to negative regulation of TLR signalling [].
This domain is found on the N terminus of the viral protein D10 (VD10) and the related MutT motif proteins [
]. The VD10 protein is probably essential for virus replication [] and is often found N-terminal to a NUDIX hydrolase domain. Previous studies indicated that the vaccinia virus D10 protein, which is conserved in all sequenced poxviruses, participates in the rapid turnover of host and viral mRNAs. D10 contains a motif present in the family of Nudix/MutT enzymes, a subset of which has been shown to enhance mRNA turnover in eukaryotic cells through cleavage of the 5' cap (m7GpppNm-). The D10 protein possesses an intrinsic activity that liberates m7GDP from capped RNA substrates. Furthermore, point mutations in the Nudix/MutT motif abolished decapping activity. D10 has a strong affinity for capped RNA substrates, and substrates of 24-309 nt in length were decapped efficiently. The poxviruses represent the only virus family shown to encode a Nudix hydrolase-decapping enzyme. The activities of the decapping and capping enzymes accelerate mRNA turnover and help to eliminate competing host mRNAs, allowing stage-specific synthesis of viral proteins [
].
This family of proteins is found exclusively in epsilon-proteobacteria. Proteins in this family are approximately 180 amino acids in length. The crystal structure of HobA from Helicobacter pylori has been reported at 1.7 Å resolution; HobA represents a modified Rossmann fold consisting of a five-stranded parallel β-sheet (beta1-5) flanked on one side by the alpha-2, alpha-3 and alpha-6 helices and on the other by alpha-4 and alpha-5. The alpha-1 helix extends away from, and has minimal interaction with, the globular part of the protein. Four monomers interact to form a tetrameric molecule. Four calcium atoms bind to the tetramer and these binding sites may have functional relevance. The closest structural homologue of HobA is a sugar isomerase (SIS) domain containing protein, the phosphoheptose isomerase from Pseudomonas aeruginosa. The SIS proteins share strong sequence homology with DiaA from Escherichia coli; yet, HobA and DiaA share no sequence homology [
]. HobA is a novel protein essential for initiation of H. pylori chromosome replication. It interacts specifically via DnaA with the oriC-DnaA complex. It is possible that HobA is essential for the correct formation and stabilisation of the orisome by facilitating the spatial positioning of DnaA at oriC [
].
Mitochondrial intermediate peptidase (MIP; MEROPS identifier M03.006;
) belongs to the widespread peptidase subfamily M3A [
]. MIP shows similarity to the thimet oligopeptidase (TOP). These proteins are enriched in cysteine residues, two of which are highly conserved, suggesting their importance to stability as well as in formation of metal binding sites, thus playing a role in MIP activity []. MIP is one of three peptidases responsible for the proteolytic processing of both nuclear and mitochondrial encoded precursor polypeptides targeted to the various subcompartments of the mitochondria [
,
]. It cleaves intermediate-size proteins initially processed by mitochondrial processing peptidase (MPP) to yield a processing intermediate with a typical N-terminal octapeptide that is sequentially cleaved by MIP to mature-size protein []. MIP cleaves precursor proteins of respiratory components, including subunits of the electron transport chain and tricarboxylic acid cycle enzymes, and components of the mitochondrial genetic machinery, including ribosomal proteins, translation factors, and proteins required for mitochondrial DNA metabolism. It has been suggested that the human MIP (HMIP polypeptide; gene symbol MIPEP) may be one of the loci predicted to influence the clinical manifestations of Friedreich's ataxia (FRDA), an autosomal recessive neurodegenerative disease caused by lack of human frataxin [].
Non-structural protein NSP3 (also known as nsp3) is a multi-domain protein, the largest product of ORF1a, which encodes the polyprotein 1a/1ab (pp1a/1ab). NSP3 comprises up to 16 different domains and regions, whose organisation differs between coronaviruses (CoV). However, eight domains and two transmembrane regions are conserved in all known CoVs: the ubiquitin-like domain 1 (Ubl1), the Glu-rich acidic domain (hypervariable region), a macrodomain (X domain), the ubiquitin-like domain 2 (Ubl2), the papain-like protease 2 (PL2pro; depending on the CoV there are one or two PLpro), the NSP3 ectodomain (3Ecto, also called "zinc-finger domain"), as well as the domains Y1 and CoV-Y of unknown function. NSP3 is released from pp1a/1ab by the papain-like protease domain(s), which is (are) part of NSP3 itself. NSP3 is an essential component of the replication/transcription complex (RTC) as it acts as a scaffold protein to interact with itself and to bind other viral NSPs or host proteins. The RTC associates with host ER membranes, producing convoluted membranes and double-membrane vesicles. It is also involved in post-translational modifications of host proteins to antagonise the host innate immune response [
,
]. This entry represents the C-terminal region of NSP3 found in coronaviruses.
CagE, TrbE, VirB component of type IV transporter system
Type:
Family
Description:
This family includes the Helicobacter pylori protein CagE (
), which together with other proteins from the cag pathogenicity island (PAI), encodes a type IV transporter secretion system. It is the ATPase component of the type IV secretion system Cag (Cag-T4SS) that may act as a molecular motor to provide the energy that is required for the export of proteins. It is essential for pathogenesis in Helicobacter pylori-induced gastritis and peptic ulceration [
], being required for CagA translocation and induction of IL-8 in host gastric epithelial cells []. Indeed, the expression of the cag PAI has been shown to be essential for stimulating human gastric epithelial cell apoptosis in vitro []. CagE plays a key role in Cag-T4SS pilus biogenesis, especially in the localization and stabilization of the pilus-associated components CagI, CagL and the surface protein CagH [], and it is also critical for assembly of the entire cytoplasmic portion of the Cag inner membrane complex (IMC) []. Similar type IV transport systems are also found in other bacteria. This family includes proteins from the trb and Vir conjugal transfer systems in
Agrobacterium tumefaciens and homologues of VirB proteins from other species.
The VASt (VAD1 Analog of StAR-related lipid transfer) domain is conserved across eukaryotes and is structurally related to Bet v1-like domains, including START lipid-binding domains. The ~190-amino acid VASt domain is predominantly associated with lipid binding
domains such as GRAM, C2 and PH domains. The VASt domain is likely to have a function in binding large hydrophobic ligands and may be specific for sterol [,
]. The predicted structure of the VASt domain is a two-layer sandwich α/β fold, also called "helix grip fold", containing three α-helices (α1 to 3), six β-sheets (β1 to 6) and two loops (ω1 and 2) numbered from N to C terminus []. Some proteins known to contain a VASt domain are listed below:
Plant vascular associated death1 (VAD1), a regulator of programmed cell death (PCD) harboring a GRAM putative lipid-binding domain.
Yeast SNF1 Interacting Protein 3 (SIP3), may be involved in sterol transfer between intracellular membranes.
Yeast Suicide Protein 1 (YSP1), a mitochondrial protein specifically required for the mitochondrial thread-grain transition, de-energization, and cell death. May be involved in sterol transfer between intracellular membranes.
Yeast Suicide Protein 2 (YSP2), a mitochondrial membrane protein involved in mitochondrial fragmentation. May be involved in sterol transfer between intracellular membranes.
Human GramD1a-c.