This entry represents the N-terminal region of aminoacyl-transferases found in both eukaryotic (arginine-tRNA-protein transferase) and prokaryotic (aspartate/glutamate leucyltransferase) enzymes. Arginine-tRNA-protein transferase catalyses the post-translational conjugation of arginine to the N terminus of a protein. In eukaryotes, this functions as part of the N-end rule pathway of protein degradation by conjugating a destabilising amino acid to the N-terminal aspartate or glutamate of a protein, targeting the protein for ubiquitin-dependent proteolysis. In Saccharomyces cerevisiae, Cys20, Cys23, Cys94 and/or Cys95 are thought to be important for activity. Of these, only Cys94 appears to be completely conserved in this family. Aspartate/glutamate leucyltransferase (also known as bacterial protein transferase or Bpt) functions in the N-end rule pathway of protein degradation, where it conjugates Leu from its aminoacyl-tRNA to the N termini of proteins containing an N-terminal aspartate or glutamate. This protein shows sequence similarity to the eukaryotic N-end rule pathway component arginyl-transferase Ate1.
SAM-dependent methyltransferase RsmI, conserved site
Type:
Conserved_site
Description:
Several uncharacterised proteins have been shown to share regions of similarity, including Escherichia coli hypothetical protein yraL, and HI1654, the corresponding Haemophilus influenzae protein; Bacillus subtilis hypothetical protein yabC; Helicobacter pylori hypothetical protein HP0552; Mycoplasma genitalium and Mycoplasma pneumoniae hypothetical protein MG056; Mycobacterium tuberculosis hypothetical protein MtCI237.19; and Synechocystis sp. (strain PCC 6803) hypothetical protein sll0818. In bacterial 16S rRNAs, methylated nucleosides are clustered within the decoding centre, and these nucleoside modifications are thought to modulate translational fidelity. The N(4),2'-O-dimethylcytidine (m(4)Cm) at position 1402 of the Escherichia coli 16S rRNA directly interacts with the P-site codon of the mRNA. RsmI is an AdoMet-dependent methyltransferase responsible for the 2'-O-methylation of C1402. It is conserved in almost all species of bacteria. Proteins of the RsmI family are about 30 to 34 kDa in size and contain a number of conserved regions. The signature pattern in this entry corresponds to a highly conserved region in the central section of the proteins.
Members of this protein family are the substrate-binding protein of a predicted carbohydrate transporter operon, together with permease subunits of ABC transporter homology families. This substrate-binding protein frequently co-occurs in genomes with a family of disaccharide phosphorylases, suggesting that the molecules transported include beta-D-galactopyranosyl-(1->3)-N-acetyl-D-glucosamine and related carbohydrates. Members of this family are distributed sporadically, strain by strain, often in species with a human host association, including Propionibacterium acnes, Clostridium perfringens (protein CPR_0540, which gives its name to the group) and Bacillus cereus.
Bacterial high-affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria the solute-binding proteins are membrane-anchored lipoproteins.
The yeast polymerase suppressor 1 (PSP1) protein partially suppresses mutations in DNA polymerases alpha and delta. The C-terminal half of PSP1 contains a domain which is also found in several hypothetical proteins from both eukaryotic and prokaryotic sources:
Crithidia fasciculata RBP45 and RBP33, subunits of the cycling sequence binding protein (CSBP) II. RBP45 and RBP33 proteins bind specifically to the cycling sequences present in several mRNAs that accumulate periodically during the cell cycle. RBP45 and RBP33 are phosphoproteins, which are phosphorylated differentially during progression through the cell cycle. Hypothetical proteins with high sequence similarity have been identified in other kinetoplastid organisms.
Bacillus subtilis yaaT protein, which plays a significant role in phosphorelay during initiation of sporulation. It is possible that the yaaT protein is also related to DNA replication. The sequence of the yaaT protein is widely conserved in prokaryotes (bacteria and archaea), but the functions of the protein are unknown.
The actual biological significance of the PSP1 C-terminal domain has not yet been clearly established.
Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. While clathrin mediates endocytic protein transport, and transport from ER to Golgi, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins. For example, the coatomer COP1 (coat protein complex 1) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and bud from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the C terminus (approximately 500 residues) of the eukaryotic coatomer alpha subunit. This domain is found along with the domain.
Protein phosphatases remove phosphate groups from various proteins that are the key components of a number of signalling pathways in eukaryotes and prokaryotes. Protein phosphatases that dephosphorylate Ser and Thr residues are classified into the phosphoprotein phosphatase (PPP) and the Mg2+- or Mn2+-dependent protein phosphatase (PPM) families. The core structure of PPMs is the 300-residue PPM-type phosphatase domain that catalyses the dephosphorylation of phosphoserine- and phosphothreonine-containing proteins. The PPM-type phosphatase domain is found as a module in diverse structural contexts and is modulated by targeting and regulatory subunits.
Some proteins known to contain a PPM-type phosphatase domain are listed below:
Bacillus subtilis stage II sporulation protein E (SpoIIE), which controls sporulation by dephosphorylating the anti-transcription factor SpoIIAA, reversing the actions of the SpoIIAB protein kinase in a process that is governed by the ADP/ATP ratio [levdikov].
Mycobacterium tuberculosis PP2C-family Ser/Thr phosphatase (PstP).
Eukaryotic PP2C, a negative regulator of protein kinase cascades that are activated as a result of stress.
Yeast adenylyl cyclase, which plays essential roles in the regulation of cellular metabolism by catalysing the synthesis of a second messenger, cAMP.
Mammalian mitochondrial pyruvate dehydrogenase phosphatase 1 (PDP1).
Plant kinase-associated protein phosphatase (KAPP), which regulates receptor-like kinase (RLK) signalling pathways.
Plant abscisic acid-insensitive 1 and 2 (ABI1 and ABI2), which play a key role in abscisic acid (ABA) signal transduction.
The PP2C-type phosphatase domain consists of 10 segments of β-strand and 5 segments of α-helix and comprises a pair of detached subdomains. The first is a small β-sandwich with strand β1 packed against strands β2 and β3; the second is a larger β-sandwich in which a four-stranded β-sheet packs against a three-stranded β-sheet with flanking α-helices.
Alphaviruses are enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Semliki Forest and Sindbis viruses. Alphaviruses have three structural proteins: the core nucleocapsid protein C, and the envelope proteins P62 and E1, which associate as a heterodimer. The viral membrane-anchored surface glycoproteins are responsible for receptor recognition and entry into target cells through membrane fusion. The proteolytic maturation of P62 into E2 and E3 causes a change in the viral surface. Together the E1, E2, and sometimes E3 glycoprotein "spikes" form an E1/E2 dimer or an E1/E2/E3 trimer, where E2 extends from the centre to the vertices, E1 fills the space between the vertices, and E3, if present, is at the distal end of the spike. Upon exposure of the virus to the acidity of the endosome, E1 dissociates from E2 to form an E1 homotrimer, which is necessary for the fusion step to drive the cellular and viral membranes together. This entry represents the alphaviral E2 glycoprotein. The E2 glycoprotein interacts with the nucleocapsid through its cytoplasmic domain, while its ectodomain is responsible for binding a cellular receptor. This is an all-beta protein belonging to the immunoglobulin superfamily, with three immunoglobulin domains labelled A, B and C in amino- to carboxy-terminal order. This superfamily represents the N-terminal domain, known as domain A, of the alphavirus E2 glycoprotein, which is found at the centre of the protein.
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents a domain presumed to be a zinc-binding domain. The following pattern describes the zinc finger:
C-X(1-6)-H-X-C-X3-C(H/C)-X(3-4)-(H/C)-X(1-10)-C
where X can be any amino acid, and numbers in brackets indicate the number of residues. The two (H/C) positions can be either His or Cys. This central cysteine-rich portion encodes the DNA-binding domain, which is highly conserved in eukaryotes. The NFX1 family of proteins may have additional roles mediated by protein-protein interactions: the reiterated RING finger motifs in this central domain strongly suggest that NFX1 is a probable E3 ubiquitin protein ligase. This domain is found in the human transcriptional repressor NF-X1, a repressor of HLA-DRA transcription; the Drosophila shuttle craft protein, which plays an essential role during the late stages of embryonic neurogenesis and has been shown to be a DNA- or RNA-binding protein; and the yeast FKBP12-associated protein 1 (FAP1).
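The zinc finger pattern quoted above can be turned into a regular expression for a quick scan of a protein sequence. The sketch below is a rough illustration only: it assumes that "C(H/C)" means a Cys followed by one His-or-Cys position, and the names `NFX1_TYPE_ZNF` and `find_znf_candidates` are hypothetical. A match is a candidate worth inspecting, not evidence of a folded zinc-binding domain.

```python
import re

# Assumed reading of the pattern quoted in the entry:
#   C-X(1-6)-H-X-C-X3-C(H/C)-X(3-4)-(H/C)-X(1-10)-C
# "X" is any residue; "(H/C)" is a position that may be His or Cys.
NFX1_TYPE_ZNF = re.compile(r"C.{1,6}H.C.{3}C[HC].{3,4}[HC].{1,10}C")


def find_znf_candidates(sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [
        (match.start(), match.group())
        for match in NFX1_TYPE_ZNF.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Synthetic sequence built to satisfy the pattern; not a real protein.
    toy = "MK" + "CAAHACAAACHAAACAAC" + "GG"
    for start, text in find_znf_candidates(toy):
        print(start, text)
```

Because the scan uses `finditer`, overlapping candidates are not reported; a lookahead-based search would be needed if tandem, overlapping fingers had to be listed individually.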
Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structures at the level of 30-55% similarity.
Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique for the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions: switch I and switch II, which surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop), which connects the B1 strand and the A1 helix, is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg2+ and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are similar in sequence to the Walker A and Walker B boxes that are found in other nucleotide-binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg2+ binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues directly interacting with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base.
The small GTPase superfamily can be divided into at least 8 different families, including:
Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
Ras small GTPases. GTP-binding proteins involved in signalling pathways.
Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII) which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
Roc small GTPases domain. Small GTPase domain always found associated with the COR domain.
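The NKXD consensus of the G4 loop described above lends itself to a simple motif scan. The following is a minimal sketch, assuming X stands for any single residue; the function name `find_nkxd` and the test sequences are made up for illustration. A hit only flags a candidate G4 loop, since four residues will also occur by chance in unrelated proteins.

```python
import re

# G4-loop consensus from the entry: Asn-Lys-X-Asp, with X taken as any residue.
G4_NKXD = re.compile(r"NK.D")


def find_nkxd(sequence):
    """Return the 0-based start positions of NKXD matches in a protein sequence."""
    return [match.start() for match in G4_NKXD.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up peptide; real use would read sequences from a FASTA file.
    print(find_nkxd("GGNKADGG"))
```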
Wnt proteins constitute a large family of secreted signalling molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Wg) (Drosophila). It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease.
Wnt signal transduction proceeds initially via binding of Wnt proteins to their cell surface receptors, the so-called frizzled proteins. This activates the signalling functions of β-catenin and regulates the expression of specific genes important in development. More recently, however, several non-canonical Wnt signalling pathways have been elucidated that act independently of β-catenin. In both cases, the transduction mechanism requires dishevelled protein (Dsh), a cytoplasmic phosphoprotein that acts directly downstream of frizzled. In addition to its role in Wnt signalling, Dsh is also involved in generating planar polarity in Drosophila and has been implicated in the Notch signal transduction cascade. Three human and mouse homologues of Dsh have been cloned (DVL-1 to 3); it is believed that these proteins, like their Drosophila counterpart, are involved in signal transduction. Human and murine orthologues share more than 95% sequence identity and are each 40-50% identical to Drosophila Dsh.
Sequence similarity amongst Dsh proteins is concentrated around three conserved domains: at the N terminus lies a DIX domain (mutations mapping to this region reduce or completely disrupt Wg signalling); a PDZ (or DHR) domain, often found in proteins involved in protein-protein interactions, lies within the central portion of the protein (point mutations within this module have been shown to have little effect on Wg-mediated signal transduction); and a DEP domain is located towards the C terminus and is conserved among a set of proteins that regulate various GTPases (whilst genetic and molecular assays have shown this module to be dispensable for Wg signalling, it is thought to be important in planar polarity signalling in flies). The requirement for these domains therefore varies between signalling pathways: the DIX domain is essential for β-catenin activation, the DEP domain is implicated in the activation of the JNK pathway, while the PDZ domain is required for both.
This entry represents a domain found in the C-terminal region of Dsh proteins.
The Btk-type zinc finger or Btk motif (BM) is a conserved zinc-binding motif containing conserved cysteines and a histidine that is present in certain eukaryotic signalling proteins. The motif is named after Bruton's tyrosine kinase (Btk), an enzyme which is essential for B cell maturation in humans and mice. Btk is a member of the Tec family of protein tyrosine kinases (PTK). These kinases contain a conserved Tec homology (TH) domain between the N-terminal pleckstrin homology (PH) domain and the Src homology 3 (SH3) domain. The N-terminal region of the TH domain is highly conserved and is known as the Btk motif, while the C-terminal region of the TH domain contains a proline-rich region (PRR). The Btk motif contains a conserved His and three Cys residues that form a zinc finger (although these differ from known zinc finger topologies), while PRRs are commonly involved in protein-protein interactions, including interactions with G proteins. The TH domain may be of functional importance in various signalling pathways in different species. A complete TH domain, containing both the Btk and PRR regions, has not been found outside the Tec family; however, the Btk motif on its own does occur in other proteins, usually C-terminal to a PH domain (note that although a Btk motif always occurs C-terminal to a PH domain, not all PH domains are followed by a Btk motif).
The crystal structures of Btk show that the Btk-type zinc finger has a globular core, formed by a long loop which is held together by a zinc ion, and that the Btk motif is packed against the PH domain. The zinc-binding residues are a histidine and three cysteines, which are fully conserved in the Btk motif.
Proteins known to contain a Btk-type zinc finger include:
Mammalian Bruton's tyrosine kinase (Btk), a protein tyrosine kinase involved in modulation of diverse cellular processes. Mutations affecting Btk are the cause of X-linked agammaglobulinemia (XLA) in humans and X-linked immunodeficiency in mice.
Mammalian Tec, Bmx, and Itk proteins, which are tyrosine protein kinases of the Tec subfamily.
Drosophila tyrosine-protein kinase Btk29A, which is required for the development of proper ring canals and of male genitalia, and for adult survival.
Mammalian Ras GTPase-activating proteins (RasGAP), which inactivate Ras by stimulating hydrolysis of its bound GTP to GDP.
The tripartite DENN (after differentially expressed in neoplastic versus normal cells) domain is found in several proteins that share common structural features and have been shown to be guanine nucleotide exchange factors (GEFs) for Rab GTPases, which are regulators of practically all membrane trafficking events in eukaryotes. The tripartite DENN domain is composed of three distinct modules which are always associated due to functional and/or structural constraints: upstream DENN or uDENN, the better conserved central or core or cDENN, and downstream or dDENN regions. The tripartite DENN domain is found associated with other domains, such as RUN, PLAT, PH, PPR, WD-40, GRAM or C1. The function of the DENN domain remains unclear to date, although it appears to be a good candidate for a GTP/GDP exchange activity.
Some proteins known to contain a tripartite DENN domain are listed below:
Rat Rab3 GDP/GTP exchange protein (Rab3GEP).
Human mitogen-activated protein kinase activating protein containing death domain (MADD). It is orthologous to Rab3GEP.
Caenorhabditis elegans regulator of presynaptic activity aex-3, the ortholog of Rab3GEP.
Mouse Rab6 interacting protein 1 (Rab6IP1).
Human SET domain-binding factor 1 (SBF1).
Human suppressor of tumorigenicity 5 (ST5).
Human C-MYC promoter-binding protein IRLB.
The DENN domain forms a heart-shaped structure, with the N-terminal residues forming one lobe and the C-terminal residues forming the second. The N-terminal lobe forms the uDENN domain and consists of a central antiparallel β-sheet layered between one helix and two helices. A long random-coil region links the two lobes. The C-terminal lobe is composed of the cDENN and dDENN domains. The cDENN domain is an alpha/beta three-layered sandwich domain with a central sheet of five strands. The dDENN domain is an all-alpha helical domain, whose core contains two alpha-hairpins which diverge rapidly in sequence.
Divergent types of the tripartite DENN domain have also been detected in other protein families:
Folliculin (FLCN), a tumor suppressor protein disrupted in various cancers and in Birt-Hogg-Dube syndrome, and Smith-Magenis syndrome chromosomal region candidate 8 protein (SMCR8), which has been implicated in autophagy.
FLCN-interacting proteins (FNIP1 and FNIP2), which interact with FLCN and function in conjunction with it to regulate cellular energy metabolism both in kidney tissue and in B-cells.
C9ORF72 protein; expansions of the hexanucleotide GGGGCC in the first intron of its gene have been implicated in amyotrophic lateral sclerosis (ALS) and fronto-temporal dementia (FTD).
This entry represents the FNIP1/FNIP2-type divergent tripartite DENN domain.
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader.
The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a highly conserved core region found in the Cas3 family of proteins. These proteins are found in association with CRISPR repeat elements in a broad range of bacteria and archaea. Cas3 is one of four protein families (Cas1 to Cas4) that are associated with CRISPR elements and always occur near a repeat cluster, usually in the order cas3-cas4-cas1-cas2. Cas3 proteins have motifs characteristic of helicases from superfamily 2 and contain a DEAD/DEAH box region and a conserved C-terminal domain. Some but not all Cas3 family members have an N-terminal HD domain region, although these sequences are not included within this group. These Cas proteins may be involved in DNA metabolism or gene expression.
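The layout of a CRISPR cluster described above, direct repeats separated by regularly sized spacers, can be illustrated with a small string-processing sketch. It assumes the repeat sequence is already known and occurs as exact copies, which real arrays do not guarantee; the function name `extract_spacers` and the toy sequences are hypothetical, and dedicated CRISPR-detection tools handle degenerate repeats that this sketch ignores.

```python
def extract_spacers(array_seq, repeat):
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    spacers = []
    pos = array_seq.find(repeat)
    while pos != -1:
        # Look for the next repeat copy after the current one.
        next_pos = array_seq.find(repeat, pos + len(repeat))
        if next_pos == -1:
            break
        spacers.append(array_seq[pos + len(repeat):next_pos])
        pos = next_pos
    return spacers


if __name__ == "__main__":
    # Toy array: three copies of a made-up repeat around two made-up spacers.
    repeat = "GGATCC"
    array_seq = repeat + "AAATTT" + repeat + "TTTAAA" + repeat
    spacers = extract_spacers(array_seq, repeat)
    print(spacers)
    # "Regularly sized" spacers: all lengths equal in this toy case.
    print(len({len(s) for s in spacers}) == 1)
```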
Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structures at the level of 30-55% similarity.
Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique for the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions: switch I and switch II, which surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop), which connects the B1 strand and the A1 helix, is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg(2+) and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are similar in sequence to the Walker A and Walker B boxes that are found in other nucleotide-binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg(2+) binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues directly interacting with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base.
The small GTPase superfamily can be divided into 8 different families:
Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
Ras small GTPases. GTP-binding proteins involved in signalling pathways.
Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII) which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
Roc small GTPases domain. Small GTPase domain always found associated with the COR domain.
This entry represents the Roc small GTPase domain.
Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus).Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [
], which has haem and flavin domains.Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity.Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [
,
,
]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes.
This entry represents the CYP2B family from group I, class E, cytochrome P450 proteins, as well as other CYP2 family proteins. The CYP2 family comprises 15 subfamilies (A-H, J-N, P and Q). The first five (A-E) are present in mammalian liver, but in differing amounts and with different inducibilities. These five subfamilies show varied substrate specificities, with some degree of overlap. The structure-function relationships of CYP2B4 reveal that the substrate specificity of an individual protein is determined by active site residues as well as by non-active site residues that modulate conformational changes important for substrate access []. In rodents, CYP2B proteins have been linked with toxic effects produced by reactive oxygen species (ROS) via a mechanism known as futile cycling. The likelihood of toxic activation mediated by CYP2B is minimal in humans, as the relevant orthologue is poorly expressed in human liver and is only associated with the toxicity of a very small number of carcinogens and cytotoxic agents.
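The nomenclature rule and identity thresholds described above (CYP, family number, subfamily letter, protein number; >40% identity within a family, >55% within a subfamily) can be illustrated with a minimal Python sketch. The names `CypName`, `parse_cyp` and `expected_grouping` are hypothetical helpers introduced here for illustration only, and the thresholds are treated as simple rules of thumb rather than a formal classification procedure.

```python
import re
from typing import NamedTuple


class CypName(NamedTuple):
    """Parsed parts of a name such as CYP3A4 (hypothetical helper type)."""
    family: int      # number after "CYP", e.g. 3
    subfamily: str   # letter(s) after the family number, e.g. "A"
    protein: int     # final number, e.g. 4


# CYP + family number + subfamily letter(s) + protein number
_CYP_RE = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name: str) -> CypName:
    """Split a full CYP name into family, subfamily and protein number.

    Family-only names such as "CYP52" are deliberately rejected, because
    they carry no subfamily letter or protein number.
    """
    match = _CYP_RE.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a full CYP protein name: {name!r}")
    return CypName(int(match.group(1)), match.group(2), int(match.group(3)))


def expected_grouping(identity_percent: float) -> str:
    """Apply the rule-of-thumb identity thresholds quoted in the text."""
    if identity_percent > 55:
        return "same subfamily"
    if identity_percent > 40:
        return "same family"
    return "different families"


if __name__ == "__main__":
    print(parse_cyp("CYP3A4"))
    print(expected_grouping(47.0))
```

Real family assignments are curated and do not always follow these cut-offs strictly, so the second function is best read as a first-pass guide.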
The genome of pestiviruses is a single plus-strand RNA and contains a single large open reading frame (ORF), a 5' untranslated region (5'UTR) and a 3' untranslated region (3'UTR). The ORF encodes a polyprotein, which is processed by viral and cellular proteases into mature proteins (the structural proteins lie in the N-terminal portion of the polyprotein, whilst the replicative (non-structural or NS) proteins constitute the remainder).
The pestivirus NS3 protein is a multifunctional protein possessing serine protease, RNA helicase and nucleoside triphosphatase (NTPase) activities, located in two functionally distinct domains. The N-terminal one-third of pestiviral NS3 primarily serves as a protease to process the viral polyprotein. The helicase and NTPase activities are localized to the C-terminal region of the NS3 protein []. The NS3 protease domain has approximately 180 amino acids, with a catalytic triad of His-Asp-Ser. Positive regulation (activation) is
observed by its interaction with NS4A. The pestivirus NS3 proteolytic domain constitutes MEROPS peptidase family S31 of clan PA. This entry represents the pestivirus NS3 protease domain.
A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [
]. Clans CF, CM, CN, CO, CP and PD contain only one family.
Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [].
Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues []. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds.
Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One subdomain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other.
Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel []. The active site consists of a His/Cys catalytic dyad.
Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein tyrosine (pTyr) phosphorylation is a common post-translational modification which can create novel recognition motifs for protein interactions and cellular localisation, affect protein stability, and regulate enzyme activity. Consequently, maintaining an appropriate level of protein tyrosine phosphorylation is essential for many cellular functions. Tyrosine-specific protein phosphatases (PTPase;
) catalyse the removal of a phosphate group attached to a tyrosine residue, using a cysteinyl-phosphate enzyme intermediate. These enzymes are key regulatory components in signal transduction pathways (such as the MAP kinase pathway) and cell cycle control, and are important in the control of cell growth, proliferation, differentiation and transformation [
,
]. The PTP superfamily can be divided into four subfamilies []:
(1) pTyr-specific phosphatases
(2) dual specificity phosphatases (dTyr and dSer/dThr)
(3) Cdc25 phosphatases (dTyr and/or dThr)
(4) LMW (low molecular weight) phosphatases
Based on their cellular localisation, PTPases are also classified as:
Receptor-like, which are transmembrane receptors that contain PTPase domains []
Non-receptor (intracellular) PTPases []
All PTPases carry the highly conserved active site motif C(X)5R (PTP signature motif), employ a common catalytic mechanism, and share a similar core structure made of a central parallel β-sheet with flanking α-helices containing a β-loop-α-loop that encompasses the PTP signature motif [
]. Functional diversity between PTPases is endowed by regulatory domains and subunits. This entry represents dual specificity protein-tyrosine phosphatases. Ser/Thr and Tyr dual specificity phosphatases are a group of enzymes with both Ser/Thr (
) and tyrosine specific protein phosphatase (
) activity able to remove both the serine/threonine or tyrosine-bound phosphate group from a wide range of phosphoproteins, including a number of enzymes which have been phosphorylated
under the action of a kinase. Dual specificity protein phosphatases (DSPs) regulate mitogenic signal transduction and control the cell cycle. The crystal structure of a human DSP, vaccinia H1-related phosphatase (or VHR), has been determined at 2.1 angstrom resolution []. A shallow active site pocket in VHR allows for the hydrolysis of phosphorylated serine, threonine, or tyrosine protein residues, whereas the deeper active site of protein tyrosine phosphatases (PTPs) restricts substrate specificity to phosphotyrosine only. Positively charged crevices near the active site may explain the enzyme's preference for substrates with two phosphorylated residues. The VHR structure defines a conserved structural scaffold for both DSPs and PTPs. A "recognition region", connecting helix alpha1 to strand beta1, may determine differences in substrate specificity between VHR, the PTPs, and other DSPs.
These proteins may also have inactive phosphatase domains, and depending on the domain composition this loss of catalytic activity has different effects on protein function. Inactive single domain phosphatases can still specifically bind substrates and protect them against dephosphorylation, while the inactive domains of tandem phosphatases can be further subdivided into two classes. Those in the first class bind phosphorylated tyrosine residues, may recruit multi-phosphorylated substrates for the adjacent active domains, and are more conserved; those in the second class have accumulated several variable amino acid substitutions and have completely lost tyrosine binding capability. The second class shows a release of evolutionary constraint for the sites around the catalytic centre, which emphasises a difference in function from the first class. There is a region of higher conservation common to both classes, suggesting a new regulatory centre [
].
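The PTP signature motif C(X)5R mentioned above (a cysteine, any five residues, then an arginine) lends itself to a simple sequence scan. The following Python sketch is illustrative only: the function name `find_ptp_signature` and the example peptide are hypothetical, and a bare pattern match of this kind will also report chance occurrences, so it is not a substitute for a curated signature or profile search.

```python
import re
from typing import List, Tuple

# C, then exactly five residues of any type, then R.
# The lookahead lets overlapping candidate motifs be reported as well.
_PTP_SIGNATURE = re.compile(r"(?=(C[A-Z]{5}R))")


def find_ptp_signature(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched segment) for each C(X)5R hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in _PTP_SIGNATURE.finditer(seq)]


if __name__ == "__main__":
    # Made-up peptide used purely as an example input.
    example = "AAHCSAGIGRSG"
    for position, segment in find_ptp_signature(example):
        print(position, segment)
```

Positions are reported 1-based to match the usual convention for residue numbering in protein sequences.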
Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus).
Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [], which has haem and flavin domains.
Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity.
Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [,
,
]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes.
This entry represents the CYP2A family from group I, class E, cytochrome P450 proteins, as well as other CYP2 family proteins. The CYP2 family comprises 15 subfamilies (A-H, J-N, P and Q). The first five (A-E) are present in mammalian liver, but in differing amounts and with different inducibilities. These five subfamilies show varied substrate specificities, with some degree of overlap. CYP2A proteins are responsible for metabolising a variety of drugs; in pigs this activity appears to be gender-dependent (influenced by sex hormones), with the highest activity in females [].
This entry includes members of the CYP52 family from class E, cytochrome P450 proteins, found in ascomycetes. These enzymes should be classed as sequence cluster group II []. Group II proteins are distributed widely amongst the kingdoms of life, but the CYP52 family has been first described only amongst Candida-related species of fungi, and as such may represent a novel development in Candida yeast. CYP52 proteins catalyse the conversion of fatty acids and alkanes to alpha,omega-dicarboxylic acids [].
Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus).
Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [], which has haem and flavin domains.
Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity.
Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [
,
,
]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (eg, CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes.
This entry includes members of the CYP52 family from class E, cytochrome P450 proteins, found in fungi. These enzymes should be classed as sequence cluster group II []. Group II proteins are distributed widely amongst the kingdoms of life, but the CYP52 family has been first described only amongst Candida-related species of fungi, and as such may represent a novel development in Candida yeast. CYP52 proteins catalyse the conversion of fatty acids and alkanes to alpha,omega-dicarboxylic acids [].
Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus).
Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [], which has haem and flavin domains.
Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity.
Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [
,
,
]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (eg, CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes.
The FAS1 (fasciclin-like) domain is an extracellular module of about 140 amino acid residues. It has been suggested that the FAS1 domain represents an ancient cell adhesion domain common to plants and animals [
]; related FAS1 domains are also found in bacteria [].
The crystal structure of FAS1 domains 3 and 4 of fasciclin I from Drosophila melanogaster (Fruit fly) has been determined, revealing a novel domain fold consisting of a seven-stranded beta wedge and at least five alpha helices; two well-ordered N-acetylglucosamine groups attached to a conserved asparagine are located in the interface region between the two FAS1 domains []. Fasciclin I is an insect neural cell adhesion molecule involved in axonal guidance that is attached to the membrane by a GPI-anchored protein.
FAS1 domains are present in many secreted and membrane-anchored proteins. These proteins are usually GPI anchored and consist of: (i) a single FAS1 domain, (ii) a tandem array of FAS1 domains, or (iii) FAS1 domain(s) interspersed with other domains. Proteins known to contain a FAS1 domain include:
Fasciclin I (4 FAS1 domains).
Human TGF-beta induced Ig-H3 (BIgH3) protein (4 FAS1 domains), where the FAS1 domains mediate cell adhesion through an interaction with alpha3/beta1 integrin; mutations in the FAS1 domains result in corneal dystrophy [].
Volvox major cell adhesion protein (2 FAS1 domains) [].
Arabidopsis fasciclin-like arabinogalactan proteins (2 FAS1 domains) [].
Mammalian stabilin protein, a family of fasciclin-like hyaluronan receptor homologues (7 FAS1 domains) [].
Human extracellular matrix protein periostin (4 FAS1 domains).
Bacterial immunogenic protein MPT70 (1 FAS1 domain) [].
The FAS1 domains of both human periostin (
) and BIgH3 (
) proteins were found to contain vitamin K-dependent gamma-carboxyglutamate residues [
]. Gamma-carboxyglutamate residues are more commonly associated with GLA domains (), where they occur through post-translational modification catalysed by the vitamin K-dependent enzyme gamma-glutamylcarboxylase.
The 2-oxo acid dehydrogenase multienzyme complexes [] from bacterial and eukaryotic sources catalyze the oxidative decarboxylation of 2-oxo acids to the corresponding acyl-CoA. These include:
Pyruvate dehydrogenase complex (PDC).
2-oxoglutarate dehydrogenase complex (OGDC).
Branched-chain 2-oxo acid dehydrogenase complex (BCOADC).
These three complexes share a common architecture: they are composed of multiple copies of three component enzymes - E1, E2 and E3. E1 is a thiamine pyrophosphate-dependent 2-oxo acid dehydrogenase, E2 a dihydrolipoamide acyltransferase, and E3 an FAD-containing dihydrolipoamide dehydrogenase.
E2 acyltransferases have an essential cofactor, lipoic acid, which is covalently bound via an amide linkage to a lysine group. The E2 components of OGDC and BCOADC bind a single lipoyl group, while those of PDC bind either one (in yeast and in Bacillus), two (in mammals), or three (in Azotobacter and in Escherichia coli) lipoyl groups [].
In addition to the E2 components of the three enzymatic complexes described above, a lipoic acid cofactor is also found in the following proteins:
H-protein of the glycine cleavage system (GCS) []. GCS is a multienzyme complex of four protein components, which catalyzes the degradation of glycine. H protein shuttles the methylamine group of glycine from the P protein to the T protein. H-protein from either prokaryotes or eukaryotes binds a single lipoic group.
Mammalian and yeast pyruvate dehydrogenase complexes differ from those of other sources, in that they contain, in small amounts, a protein of unknown function - designated protein X or component X. Its sequence is closely related to that of E2 subunits and seems to bind a lipoic group [].
Fast migrating protein (FMP) (gene acoC) from Ralstonia eutropha (Alcaligenes eutrophus) []. This protein is most probably a dihydrolipoamide acyltransferase involved in acetoin metabolism.
This signature contains the lipoyl-binding lysine residue. The domain surrounding this site is evolutionarily related to that around the biotin-binding lysine residue of biotin requiring enzymes.
The cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) superfamily proteins are found in a wide range of organisms, including prokaryotes [
] and non-vertebrate eukaryotes []. The nine subfamilies of the mammalian CAP superfamily include: the human glioma pathogenesis-related 1 (GLIPR1), Golgi associated pathogenesis related-1 (GAPR1) proteins, peptidase inhibitor 15 (PI15), peptidase inhibitor 16 (PI16), cysteine-rich secretory proteins (CRISPs), CRISP LCCL domain containing 1 (CRISPLD1), CRISP LCCL domain containing 2 (CRISPLD2), mannose receptor like and the R3H domain containing like proteins. Members are most often secreted and have an extracellular endocrine or paracrine function. They are involved in processes including the regulation of extracellular matrix and branching morphogenesis, potentially as either proteases or protease inhibitors; in ion channel regulation in fertility; as tumour suppressor or pro-oncogenic genes in tissues including the prostate; and in cell-cell adhesion during fertilisation. The overall protein structural conservation within the CAP superfamily results in fundamentally similar functions for the CAP domain in all members, yet the diversity outside of this core region dramatically alters the target specificity and, thus, the biological consequences []. The Ca2+-chelating function [] would fit with the various signalling processes (e.g. the CRISP proteins) that members of this family are involved in, and also with the sequence and structural evidence of a conserved pocket containing two histidines and a glutamate. It may also explain how helothermine blocks the Ca2+-transporting ryanodine receptors.
This entry represents a subgroup of the CAP domains found only in bacteria capable of endospore formation. Proteins containing this domain include YkwD of Bacillus subtilis. This domain is generally found at the C-terminal region of these proteins, while the N-terminal region sometimes contains a domain homologous to the spore coat assembly protein SafA (
).
G protein-coupled receptors are a large family of signalling molecules that respond to a wide variety of extracellular stimuli. The receptors relay the information encoded by the ligand through the activation of heterotrimeric G proteins and intracellular effector molecules. To ensure the appropriate regulation of the signalling cascade, it is vital to properly inactivate the receptor. This inactivation is achieved, in part, by the binding of a soluble protein, arrestin, which uncouples the receptor from the downstream G protein after the receptors are phosphorylated by G protein-coupled receptor kinases. In addition to the inactivation of G protein-coupled receptors, arrestins have also been implicated in the endocytosis of receptors and cross talk with other signalling pathways. Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin [
]. The protein binds calcium, and shows similarity in its C terminus to alpha-transducin and other purine nucleotide-binding proteins. In mammals, arrestin is associated with autoimmune uveitis.
Arrestins comprise a family of closely-related proteins that includes beta-arrestin-1 and -2, which regulate the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity to activate G(S) proteins; Cone photoreceptors C-arrestin (arrestin-X) [
], which could bind to phosphorylated red/green opsins; and Drosophila phosrestins I and II, which undergo light-induced phosphorylation, and probably play a role in photoreceptor transduction [].
The crystal structure of bovine retinal arrestin comprises two domains of antiparallel β-sheets connected through a hinge region and one short α-helix on the back of the amino-terminal fold []. The binding region for phosphorylated light-activated rhodopsin is located at the N-terminal domain, as indicated by the docking of the photoreceptor to the three-dimensional structure of arrestin. The N-terminal domain consists of an immunoglobulin-like β-sandwich structure.
The FAS1 (fasciclin-like) domain is an extracellular module of about 140 amino acid residues. It has been suggested that the FAS1 domain represents an ancient cell adhesion domain common to plants and animals [
]; related FAS1 domains are also found in bacteria [].
The crystal structure of FAS1 domains 3 and 4 of fasciclin I from Drosophila melanogaster (Fruit fly) has been determined, revealing a novel domain fold consisting of a seven-stranded beta wedge and at least five alpha helices; two well-ordered N-acetylglucosamine groups attached to a conserved asparagine are located in the interface region between the two FAS1 domains []. Fasciclin I is an insect neural cell adhesion molecule involved in axonal guidance that is attached to the membrane by a GPI-anchored protein.
FAS1 domains are present in many secreted and membrane-anchored proteins. These proteins are usually GPI anchored and consist of: (i) a single FAS1 domain, (ii) a tandem array of FAS1 domains, or (iii) FAS1 domain(s) interspersed with other domains. Proteins known to contain a FAS1 domain include:
Fasciclin I (4 FAS1 domains).
Human TGF-beta induced Ig-H3 (BIgH3) protein (4 FAS1 domains), where the FAS1 domains mediate cell adhesion through an interaction with alpha3/beta1 integrin; mutations in the FAS1 domains result in corneal dystrophy [].
Volvox major cell adhesion protein (2 FAS1 domains) [].
Arabidopsis fasciclin-like arabinogalactan proteins (2 FAS1 domains) [].
Mammalian stabilin protein, a family of fasciclin-like hyaluronan receptor homologues (7 FAS1 domains) [].
Human extracellular matrix protein periostin (4 FAS1 domains).
Bacterial immunogenic protein MPT70 (1 FAS1 domain) [].
The FAS1 domains of both human periostin (
) and BIgH3 (
) proteins were found to contain vitamin K-dependent gamma-carboxyglutamate residues [
]. Gamma-carboxyglutamate residues are more commonly associated with GLA domains (), where they occur through post-translational modification catalysed by the vitamin K-dependent enzyme gamma-glutamylcarboxylase.
G protein-coupled receptors are a large family of signalling molecules that respond to a wide variety of extracellular stimuli. The receptors relay the information encoded by the ligand through the activation of heterotrimeric G proteins and intracellular effector molecules. To ensure the appropriate regulation of the signalling cascade, it is vital to properly inactivate the receptor. This inactivation is achieved, in part, by the binding of a soluble protein, arrestin, which uncouples the receptor from the downstream G protein after the receptors are phosphorylated by G protein-coupled receptor kinases. In addition to the inactivation of G protein-coupled receptors, arrestins have also been implicated in the endocytosis of receptors and cross talk with other signalling pathways. Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin. The protein binds calcium, and shows similarity in its C terminus to alpha-transducin and other purine nucleotide-binding proteins. In mammals, arrestin is associated with autoimmune uveitis.
Arrestins comprise a family of closely-related proteins that includes beta-arrestin-1 and -2, which regulate the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity to activate G(S) proteins; cone photoreceptor C-arrestin (arrestin-X), which could bind to phosphorylated red/green opsins; and Drosophila phosrestins I and II, which undergo light-induced phosphorylation, and probably play a role in photoreceptor transduction.
The crystal structure of bovine retinal arrestin comprises two domains of antiparallel β-sheets connected through a hinge region and one short α-helix on the back of the amino-terminal fold. The binding region for phosphorylated light-activated rhodopsin is located at the N-terminal domain, as indicated by the docking of the photoreceptor to the three-dimensional structure of arrestin.
Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes. At least 20 Bcl-2 proteins have been reported in mammals, and several others have been identified in viruses. Bcl-2 family proteins fall roughly into three subtypes, which either promote cell survival (anti-apoptotic) or trigger cell death (pro-apoptotic). All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Bcl-2 subfamily proteins, which contain at least BH1 and BH2, promote cell survival by inhibiting the adapters needed for the activation of caspases.
Pro-apoptotic members potentially exert their effects by displacing the adapters from the pro-survival proteins; these proteins belong either to the Bax subfamily, which contains BH1-BH3, or to the BH3 subfamily, which mostly features only BH3. Thus, the balance between antagonistic family members is believed to play a role in determining cell fate. Members of the wider Bcl-2 family, which also includes Bcl-x, Bcl-w and Mcl-1, are described by their similarity to Bcl-2 protein, a member of the pro-survival Bcl-2 subfamily. Full-length Bcl-2 proteins feature all four BH domains, seven α-helices, and a C-terminal hydrophobic motif that targets the protein to the outer mitochondrial membrane, ER and nuclear envelope. BID is a member of the Bcl-2 superfamily of proteins that are key regulators of programmed cell death, hence this family is related to the Apoptosis regulator Bcl-2 protein BH domain. BID is a pro-apoptotic member of the Bcl-2 superfamily and as such possesses the ability to target intracellular membranes and contains the BH3 death domain. The activity of BID is regulated by a Caspase 8-mediated cleavage event, exposing the BH3 domain and significantly changing the surface charge and hydrophobicity, which causes a change of cellular localisation.
G protein-coupled receptors are a large family of signalling molecules that respond to a wide variety of extracellular stimuli. The receptors relay the information encoded by the ligand through the activation of heterotrimeric G proteins and intracellular effector molecules. To ensure the appropriate regulation of the signalling cascade, it is vital to properly inactivate the receptor. This inactivation is achieved, in part, by the binding of a soluble protein, arrestin, which uncouples the receptor from the downstream G protein after the receptors are phosphorylated by G protein-coupled receptor kinases. In addition to the inactivation of G protein-coupled receptors, arrestins have also been implicated in the endocytosis of receptors and cross talk with other signalling pathways. Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin. The protein binds calcium, and shows similarity in its C terminus to alpha-transducin and other purine nucleotide-binding proteins. In mammals, arrestin is associated with autoimmune uveitis.
Arrestins comprise a family of closely-related proteins that includes beta-arrestin-1 and -2, which regulate the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity to activate G(S) proteins; cone photoreceptor C-arrestin (arrestin-X), which could bind to phosphorylated red/green opsins; and Drosophila phosrestins I and II, which undergo light-induced phosphorylation, and probably play a role in photoreceptor transduction.
The crystal structure of bovine retinal arrestin comprises two domains of antiparallel β-sheets connected through a hinge region and one short α-helix on the back of the amino-terminal fold. The binding region for phosphorylated light-activated rhodopsin is located at the N-terminal domain, as indicated by the docking of the photoreceptor to the three-dimensional structure of arrestin.
This entry represents the metalloprotease inhibitor I38, as well as the outer membrane lipoprotein Omp19.
This family of proteins represents monomeric serralysin inhibitors of about 125 residues, which interact with specific metalloproteases synthesised by serralysin secretors; these secretors are plant, insect and animal pathogens. It is probable that the serralysin inhibitors protect the host from proteolysis during export of the protease. The members of this family belong to MEROPS proteinase inhibitor family I38, clan IK.
X-ray crystallography of a complex between the Serratia marcescens protease, SmaPI, and the inhibitor of Erwinia chrysanthemi, Inh, reveals that Inh is folded into an eight-stranded β-barrel with an N-terminal trunk of 10 residues. Residues 1-5 occupy part of the extended active site of the proteinase, thereby preventing access of the substrate. Residues 6-10 form a linker that connects the N-terminal proteinase-binding peptide to the body of the β-barrel. The backbone carbonyl of Ser-1 interacts with the catalytic zinc; the Ser-2 side chain occupies the S1'-binding site and also forms a hydrogen bond to the carboxyl end of the catalytic Glu, whereas Leu-3 occupies the S2' recognition site. Penetration of the trunk region further than 5 residues into the substrate binding cleft appears to be prevented by the β-barrel, which itself interacts with the proteinase near its Met turn. Peptide mimetics of the trunk at concentrations up to about 100 mM do not inhibit the protease, demonstrating that the barrel is essential for inhibitory activity.
Structurally and functionally these inhibitors are closely related to the lipocalins, fatty acid-binding proteins, avidins and the enigmatic triabin. Together these five protein families constitute the calycin superfamily. The proteins are characterised by their high specificity for small hydrophobic molecules and by their ability to form complexes with soluble macromolecules either through intramolecular disulphides or protein-protein interactions.
This entry represents the C-terminal domain of YidC/Oxa1/ALB proteins from some species and the full-length protein from other species. Members of this group of proteins are found in bacteria and eukaryotes.
YidC is a bacterial membrane protein which is required for the insertion and assembly of inner membrane proteins. The well-characterised YidC protein from Escherichia coli and its close homologues contain a large N-terminal periplasmic domain.
COX18 is a mitochondrial membrane insertase required for the translocation of the C terminus of cytochrome c oxidase subunit II (MT-CO2/COX2) across the mitochondrial inner membrane. It plays a role in MT-CO2/COX2 maturation following the COX20-mediated stabilization of newly synthesized MT-CO2/COX2 protein and before the action of the metallochaperones SCO1/2.
OXA1 is a mitochondrial inner membrane insertase that mediates the insertion of both mitochondrion-encoded precursors and nuclear-encoded proteins from the matrix into the inner membrane. It links mitoribosomes with the inner membrane.
Plant ALBINO3-like proteins are required for the insertion of some light harvesting chlorophyll-binding proteins (LHCP) into the chloroplast thylakoid membrane.
The CHROMO (CHRomatin Organization MOdifier) domain is a conserved region of around 60 amino acids, originally identified in Drosophila modifiers of variegation.
These are proteins that alter the structure of chromatin to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed. In one of these proteins, Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain a chromo domain appear to fall into 3 classes. The first class includes proteins having an N-terminal chromo domain followed by a region termed the chromo shadow domain, e.g. Drosophila and human heterochromatin protein Su(var)205 (HP1). The second class includes proteins with a single chromo domain, e.g. Drosophila protein Polycomb (Pc); mammalian modifier 3; human Mi-2 autoantigen and several yeast and Caenorhabditis elegans hypothetical proteins. In the third class paired tandem chromo domains are found, e.g. in mammalian DNA-binding/helicase proteins CHD-1 to CHD-4 and yeast protein CHD1.
This entry represents a subgroup of the Chromo domain.
This entry represents the TED (thiol ester-containing domain) found in alpha2-macroglobulin (alpha(2)-M) and related proteins. This domain has a short, highly conserved region of proteinase-binding alpha-macroglobulins containing the cysteine and the glutamine of a thiol-ester bond that is cleaved at the moment of proteinase binding, and mediates the covalent binding of the alpha-macroglobulin to the proteinase. The GCGEQ motif is highly conserved. Proteins containing this domain also include pregnancy zone protein (PZP). Alpha(2)-M and PZP are broadly specific proteinase inhibitors. Alpha(2)-M is a major carrier protein in serum. The structural thioester of alpha(2)-M is involved in the immobilization and entrapment of proteases. PZP is a trace protein in the plasma of non-pregnant females and males which is elevated in pregnancy. Alpha(2)-M and PZP bind to placental protein-14 and may modulate its activity in T-cell growth and cytokine production, contributing to fetal survival. It has been suggested that thioester bond cleavage promotes the binding of PZP and alpha(2)-M to the CD91 receptor, clearing them from circulation.
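As a rough illustration of how the conserved GCGEQ motif mentioned above could be located in a protein sequence, the following Python sketch scans a sequence for exact matches. The function name and the toy sequence are hypothetical and introduced only for this example; an exact string match is a crude stand-in for a proper domain signature and is shown purely to make the idea concrete.

```python
import re
from typing import List

# Conserved thioester motif named in the description above.
THIOESTER_MOTIF = "GCGEQ"


def find_thioester_motifs(sequence: str) -> List[int]:
    """Return the 1-based start position of every exact GCGEQ match.

    A lookahead is used so that overlapping matches, if any, are also found.
    """
    seq = sequence.upper()
    pattern = re.compile("(?=" + THIOESTER_MOTIF + ")")
    return [m.start() + 1 for m in pattern.finditer(seq)]


# Hypothetical toy sequence (not a real protein) containing one motif.
toy_sequence = "MKLPSGCGEQNMVLA"
print(find_thioester_motifs(toy_sequence))  # [6]
```

The positions are reported 1-based, following the usual convention for residue numbering in protein sequences.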
This entry represents the cavin family (caveolae-associated proteins; previously known as the PTRF/SDPR family), which includes proteins cavin-1 to 4. They are critical regulators of caveolae dynamics. Caveolae are invaginations of the plasma membrane involved in many cellular processes, including clathrin-independent endocytosis, cholesterol transport, and signal transduction. Caveolins are the principal components of caveolae membranes. Polymerase I and transcript release factor (PTRF; also known as cavin-1) is essential for formation of caveolae and proper localisation of caveolins. Cavin-2 (also known as serum deprivation-response protein) may play a role in targeting protein kinase C alpha (PKCalpha) to caveolae. Cavin-3 (protein kinase C delta-binding protein, also known as SRBC) seems to have an immune potentiation function, especially in glioma. Cavin-4 (muscle-related coiled-coil protein) is a muscle-restricted cavin.
The members of the cavin family contain putative leucine zipper-like domains normally involved in protein-protein interactions and PEST domains (proline, glutamic acid, serine and threonine-rich domains), which may play a role in targeting proteins towards proteolytic degradation.
The family of alphaviruses includes 26 known members. They infect a variety of hosts including mosquitoes, birds, rodents and other mammals with worldwide distribution. Alphaviruses also pose a potential threat to human health in many areas. For example, Venezuelan Equine Encephalitis Virus (VEEV) causes encephalitis in humans as well as livestock in Central and South America, and some variants of Sindbis Virus (SIN) and Semliki Forest Virus (SFV) have been found to cause fever and arthritis in humans.
Alphaviruses possess a single-stranded RNA genome of approximately 12 kb. The genomic RNA of alphaviruses is translated into two polyproteins that, respectively, encode structural proteins and nonstructural proteins. The nonstructural proteins may be translated as one or two polyproteins, nsp123 or nsp1234, depending on the virus. These polyproteins are cleaved to generate nsp1, nsp2, nsp3 and nsp4 by a protease activity that resides within nsp2. The nsp2 protein of alphaviruses has multiple enzymatic activities. Its N-terminal domain has been shown to possess ATPase and GTPase activity, RNA helicase activity and RNA 5'-triphosphatase activity. The C-terminal nsp2pro domain of nsp2 is responsible for the regulation of 26S subgenome RNA synthesis, switching between negative- and positive-strand RNA synthesis, targeting nsp2 for nuclear transport and proteolytic processing of the nonstructural polyprotein. The nsp2pro domain is a member of peptidase family C9 of clan CA.
The nsp2pro domain consists of two distinct subdomains. The nsp2pro N-terminal subdomain is largely α-helical and contains the catalytic dyad cysteine and histidine residues organised in a protein fold that differs significantly from any known cysteine protease or protein folds. The nsp2pro C-terminal subdomain displays structural similarity to S-adenosyl-L-methionine-dependent RNA methyltransferases and provides essential elements that contribute to substrate recognition and may also regulate the structure of the substrate binding cleft.
This domain covers the entire nsp2pro domain.
A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families. Clans CF, CM, CN, CO, CP and PD contain only one family.
Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases.
Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds.
Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other.
Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel. The active site consists of a His/Cys catalytic dyad.
Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.
SINA/Siah family proteins represent mammalian homologs of the Drosophila SINA (seven in absentia) protein. SINA is required for R7 photoreceptor cell differentiation within the sevenless pathway. Members of this family are E3 ubiquitin ligases that regulate ubiquitination and protein degradation. Siahs are known to recognise several target proteins, including Deleted in Colorectal Cancer (DCC), synaptophysin and Numb, and promote their degradation. SINA/Siah sequences are highly conserved from plants to mammals. Whereas the N terminus and RING domain of Siah bind E2 proteins, the C terminus can be considered as a substrate- and cofactor-interaction domain (substrate-binding domain, SBD) that interacts with a number of proteins, some of which are degraded. The SBD displays some sequence similarities with the C-terminal region of TRAF proteins. It contains a cysteine-rich region, the SIAH-type zinc finger, with eight totally conserved Cys and His residues that coordinate two zinc atoms. The crystal structure of the SIAH-type zinc finger has been solved. It folds in two subdomains, each one binding one zinc atom and consisting of two β-strands and an α-helix.
The chromo shadow domain is distantly related to the chromo domain. It is always found in association with a chromo domain.
The CHROMO (CHRomatin Organization MOdifier) domain is a conserved region of around 60 amino acids, originally identified in Drosophila modifiers of variegation. These are proteins that alter the structure of chromatin to the condensed morphology of heterochromatin, a cytologically visible condition where gene expression is repressed. In one of these proteins, Polycomb, the chromo domain has been shown to be important for chromatin targeting. Proteins that contain a chromo domain appear to fall into 3 classes. The first class includes proteins having an N-terminal chromo domain followed by a region termed the chromo shadow domain, e.g. Drosophila and human heterochromatin protein Su(var)205 (HP1); and mammalian modifier 1 and modifier 2. The second class includes proteins with a single chromo domain, e.g. Drosophila protein Polycomb (Pc); mammalian modifier 3; human Mi-2 autoantigen and several yeast and Caenorhabditis elegans hypothetical proteins. In the third class paired tandem chromo domains are found, e.g. in mammalian DNA-binding/helicase proteins CHD-1 to CHD-4 and yeast protein CHD1.
The ERM family consists of three closely-related proteins: ezrin, radixin and moesin. Ezrin was first identified as a constituent of microvilli, radixin as a barbed, end-capping actin-modulating protein from isolated junctional fractions, and moesin as a heparin-binding protein. ERM proteins crosslink actin filaments with plasma membranes. They co-localise with CD44 at actin filament plasma membrane interaction sites, associating with CD44 via their N-terminal domains and with actin filaments via their C-terminal domains. A tumour suppressor molecule responsible for neurofibromatosis type 2 (NF2) is highly similar to ERM proteins and has been designated merlin (moesin-ezrin-radixin-like protein). ERM molecules contain 3 domains: an N-terminal globular domain, an extended α-helical domain and a charged C-terminal domain. Ezrin, radixin and merlin also contain a polyproline linker region between the helical and C-terminal domains. The N-terminal domain is highly conserved and is also found in merlin, band 4.1 proteins and members of the band 4.1 superfamily, designated the FERM domain. This entry represents the α-helical domain, which is involved in intramolecular masking of protein-protein interaction sites that regulate the activity of these proteins.
The eubacterial secY protein plays an important role in protein export. It interacts with the signal sequences of secretory proteins as well as with two other components of the protein translocation system: secA and secE. SecY is an integral plasma membrane protein of 419 to 492 amino acid residues that apparently contains ten transmembrane segments. Such a structure probably confers to secY a 'translocator' function, providing a channel for periplasmic and outer-membrane precursor proteins.
Homologues of secY are found in archaebacteria. SecY is also encoded in the chloroplast genome of some algae, where it could be involved in a prokaryotic-like protein export system across the two membranes of the chloroplast endoplasmic reticulum (CER) which is present in chromophyte and cryptophyte algae.
In eukaryotes, the evolutionarily related protein sec61-alpha plays a role in protein translocation through the endoplasmic reticulum; it is part of a trimeric complex that also consists of sec61-beta and gamma.
This entry represents two conserved sites for secY. The first corresponds to the second transmembrane region, which is the most conserved section of these proteins. The second spans the C-terminal part of the fourth transmembrane region, a short intracellular loop, and the N-terminal part of the fifth transmembrane region.
The amino-terminal module of the poxvirus D6R/NIR proteins defines a novel conserved DNA-binding domain (the KilA-N domain) that is found in a wide range of proteins of large bacterial and eukaryotic DNA viruses. Putative proteins with homology to the KilA-N domain have also been identified in Maverick transposable elements of the parabasalid protozoan Trichomonas vaginalis. The KilA-N domain has been suggested to be homologous to the fungal DNA-binding APSES domain. In all proteins shown to contain the KilA-N domain, it occurs at the extreme amino terminus accompanied by a wide range of distinct carboxy-terminal domains. These carboxy-terminal modules may be enzymes, such as the nuclease domains, or might mediate additional, specific interactions with nucleic acids or proteins, like the RING or CCCH fingers in the poxviruses. The KilA-N domain is predicted to adopt an α-β fold with four conserved strands and at least two conserved helices.
Some proteins known to contain a KilA-N domain are listed below:
Bacteriophage P1 protein kilA.
Fowlpox virus (FPV) protein FPV236.
Trichomonas vaginalis G3 putative uncharacterised protein.
Vaccinia virus hypothetical 21.7 kDa HindIII-C protein.
Synonym(s): Protein-glutamine gamma-glutamyltransferase, Fibrinoligase, TGase.
Protein-glutamine gamma-glutamyltransferases (TGase) are calcium-dependent enzymes that catalyse the cross-linking of proteins by promoting the formation of isopeptide bonds between the γ-carboxamide group of a glutamine in one polypeptide chain and the ε-amino group of a lysine in a second polypeptide chain. TGases also catalyse the conjugation of polyamines to proteins.
Transglutaminases are widely distributed in various organs, tissues and body fluids. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilising the fibrin clot.
There are commonly three domains: N-terminal, middle and C-terminal. This entry represents the N-terminal domain found in transglutaminases.
Proteins containing this domain also include Protein 4.2 (also known as Epb42), which is one of the most abundant protein components of the erythrocyte membrane. The protein shares significant sequence homology with transglutaminases, but lacks the catalytic triad residues required for transglutaminase activity. The complete or nearly complete absence of protein 4.2 is associated with an atypical form of hereditary spherocytosis (HS).
Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. While clathrin mediates endocytic protein transport, and transport from ER to Golgi, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins. For example, the coatomer COPI (coat protein complex I) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Activated small guanine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents a β-sandwich structural motif found in the appendage domain of the gamma subunit of coatomer complexes. This subdomain has an immunoglobulin-like β-sandwich fold containing 7 strands in 2 β-sheets in a Greek key topology. The appendage domain of the gamma coatomer subunit has a similar overall fold to the appendage domain of clathrin adaptors, and can also share the same motif-based cargo recognition and accessory factor recruitment mechanisms.
Helicases have been classified in 5 superfamilies (SF1-SF5). All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B (Mg2+-binding aspartic acid) motifs. Superfamily 3 consists of helicases encoded mainly by small DNA viruses and some large nucleocytoplasmic DNA viruses. Small viruses are very dependent on the host-cell machinery to replicate. SF3 helicase in small viruses is associated with an origin-binding domain. By pairing a domain that recognises the ori with a helicase, the virus can bypass the host-cell-based regulation pathway and initiate its own replication. The protein binds to the viral ori leading to origin unwinding. Cellular replication proteins are then recruited to the ori and the viral DNA is replicated.
In SF3 helicases the Walker A and Walker B motifs are separated by spacers of rather uniform, and relatively short, length. In addition to the A and B motifs this family is characterised by a third motif (C) which resides between the B motif and the C terminus of the conserved region. This motif consists of an Asn residue preceded by a run of hydrophobic residues.
Several structures of SF3 helicases have been solved. They all possess the same core alpha/beta fold, consisting of a five-stranded parallel beta sheet flanked on both sides by several alpha helices. In contrast to SF1 and SF2 helicases, which have RecA-like core folds, the strand connectivity within the alpha/beta core domain is that of AAA+ proteins. The SF3 helicase proteins assemble into a hexameric ring.
Some proteins known to contain an SF3 helicase domain are listed below:
Polyomavirus large T antigen. It initiates DNA unwinding and replication via interactions with the viral origin of replication.
Papillomavirus E1 protein. An ATP-dependent DNA helicase required for initiation of viral DNA replication.
Parvovirus Rep/NS1 protein, which is also required for the initiation of viral replication.
Poxviridae and other large DNA viruses D5 protein.
Bacteriophage DNA primase/helicase protein.
Bacterial prophage DNA primase/helicase protein.
The entry represents the core alpha/beta fold of the SF3 helicase domain found predominantly in DNA viruses.
Helicases have been classified in 5 superfamilies (SF1-SF5). All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B (Mg2+-binding aspartic acid) motifs. Superfamily 3 consists of helicases encoded mainly by small DNA viruses and some large nucleocytoplasmic DNA viruses. Small viruses are very dependent on the host-cell machinery to replicate. SF3 helicase in small viruses is associated with an origin-binding domain. By pairing a domain that recognises the ori with a helicase, the virus can bypass the host-cell-based regulation pathway and initiate its own replication. The protein binds to the viral ori leading to origin unwinding. Cellular replication proteins are then recruited to the ori and the viral DNA is replicated.
In SF3 helicases the Walker A and Walker B motifs are separated by spacers of rather uniform, and relatively short, length. In addition to the A and B motifs this family is characterised by a third motif (C) which resides between the B motif and the C terminus of the conserved region. This motif consists of an Asn residue preceded by a run of hydrophobic residues.
Several structures of SF3 helicases have been solved. They all possess the same core alpha/beta fold, consisting of a five-stranded parallel beta sheet flanked on both sides by several alpha helices. In contrast to SF1 and SF2 helicases, which have RecA-like core folds, the strand connectivity within the alpha/beta core domain is that of AAA+ proteins. The SF3 helicase proteins assemble into a hexameric ring.
Some proteins known to contain an SF3 helicase domain are listed below:
Polyomavirus large T antigen. It initiates DNA unwinding and replication via interactions with the viral origin of replication.
Papillomavirus E1 protein. An ATP-dependent DNA helicase required for initiation of viral DNA replication.
Parvovirus Rep/NS1 protein, which is also required for the initiation of viral replication.
Poxviridae and other large DNA viruses D5 protein.
Bacteriophage DNA primase/helicase protein.
Bacterial prophage DNA primase/helicase protein.
The entry represents the core alpha/beta fold of the SF3 helicase domain from predominantly single-stranded RNA viruses.
Helicases have been classified in 5 superfamilies (SF1-SF5). All of the
proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B
(Mg2+-binding aspartic acid) motifs. Superfamily 3 consists of helicases encoded mainly by small DNA viruses and some large nucleocytoplasmic DNA
viruses [,
]. Small viruses are very dependent on the host-cell machinery to replicate. SF3 helicase in small viruses is associated with an origin-binding
domain. By pairing a domain that recognises the ori with a helicase, the virus can bypass the host-cell-based regulation pathway and initiate its own
replication. The protein binds to the viral ori leading to origin unwinding. Cellular replication proteins are then recruited to the ori and the viral DNA
is replicated. In SF3 helicases the Walker A and Walker B motifs are separated by spacers of
rather uniform, and relatively short, length. In addition to the A and B motifs this family is characterised by a third motif (C) which resides between
the B motif and the C terminus of the conserved region. This motif consists of an Asn residue preceded by a run of hydrophobic residues [
]. Several structures of SF3 helicases have been solved [
]. They all possess the same core alpha/beta fold, consisting of a five-stranded
parallel beta sheet flanked on both sides by several alpha helices. In contrast to SF1 and SF2 helicases, which have RecA-like core folds, the strand
connectivity within the alpha/beta core domain is that of AAA+ proteins []. The SF3 helicase proteins assemble into a hexameric ring.
Some proteins known to contain an SF3 helicase domain are listed below:
Polyomavirus large T antigen. It initiates DNA unwinding and replication via interactions with the viral origin of replication.
Papillomavirus E1 protein. An ATP-dependent DNA helicase required for initiation of viral DNA replication.
Parvovirus Rep/NS1 protein, which is also required for the initiation of viral replication.
Poxviridae and other large DNA viruses D5 protein.
Bacteriophage DNA primase/helicase protein.
Bacterial prophage DNA primase/helicase protein.
The entry represents the core alpha/beta fold of the SF3 helicase domain found predominantly in DNA viruses.
Bacterial lipoproteins represent a large group of specialized membrane proteins that perform a variety of functions including maintenance and stabilization of the cell envelope, protein targeting and transit to the outer membrane, membrane biogenesis, and cell adherence [
]. Pathogenic Gram-negative bacteria within the Neisseriaceae and Pasteurellaceae families rely on a specialized uptake system, characterized by an essential surface receptor complex that acquires iron from host transferrin (Tf) and transports the iron across the outer membrane. They have an iron uptake system composed of a surface-exposed lipoprotein, Tf-binding protein B (TbpB), and an integral outer-membrane protein, Tf-binding protein A (TbpA), that together function to extract iron from the host iron-binding glycoprotein (Tf). TbpB is a bilobed (N and C lobe) lipid-anchored protein with each lobe consisting of an eight-stranded beta barrel flanked by a 'handle' domain made up of four (N lobe) or eight (C lobe) beta strands [
]. TbpB extends from the outer membrane surface by virtue of an N-terminal peptide region that is anchored to the outer membrane by fatty acyl chains on the N-terminal cysteine, and is involved in the initial capture of iron-loaded Tf []. The 4-residue conserved LSAC motif found at the amino terminus of TbpB represents a prototypical lipobox, with the cysteine residue serving as the first amino acid in the mature protein which is subsequently modified by the addition of a diacyl glycerol. A second conserved motif of interest is located two amino acids downstream of the LSAC site. This region consists of four glycine residues in tandem. Deletion of the conserved polyglycine motif has significant negative effects on growth in certain conditions, while mutational analysis revealed that the LSAC motif constituting the lipobox of TbpB is necessary for lipidation and hence tethering of TbpB to the bacterial surface [
]. This entry represents a domain found in the N-terminal region of TbpB proteins, which comprises the N-lobe handle, consisting of a four-stranded antiparallel beta sheet held together by a short surface-exposed alpha helix. Tf-binding activity primarily resides in the TbpB N lobe [
].
This entry refers to the EVH1 domain found in WASP family proteins. The EVH1 (WH1, RanBP1-WASP) domain is found in multi-domain proteins implicated in a diverse range of signalling, nuclear transport and cytoskeletal events. This domain of around 115 amino acids is present in species ranging from yeast to mammals. Many EVH1-containing proteins associate with actin-based structures and play a role in cytoskeletal organisation. EVH1 domains recognise and bind the proline-rich motif FPPPP with low affinity; further interactions then form between flanking residues [
,
]. WASP family proteins contain an EVH1 (WH1) domain at their N terminus, which binds proline-rich sequences in the WASP-interacting protein. Proteins of the RanBP1 family contain a WH1 domain in their N-terminal region, which seems to bind a different sequence motif present in the C-terminal part of RanGTP protein [
,
]. The tertiary structure of the WH1 domain of the Mena protein revealed structural similarities with the pleckstrin homology (PH) domain. The overall fold consists of a compact parallel β-sandwich, closed along one edge by a long α-helix. A highly conserved cluster of three surface-exposed aromatic side-chains forms the recognition site for the molecule's target ligands [
]. The actin nucleation-promoting factor WAS (WASP; also called Bee1p) and its homologue N (neuronal)-WASP are signal transduction proteins that promote actin polymerization in response to upstream intracellular signals [
]. Wiskott-Aldrich Syndrome (WAS) is an X-linked recessive disease, characterized by eczema, immunodeficiency, and thrombocytopenia []. The majority of patients with WAS, or a milder version of the disorder, X-linked thrombocytopenia (XLT), have point mutations in the EVH1 domain of WASP []. WASP is an actin regulatory protein consisting of an N-terminal EVH1 domain, a basic region (B), a GTP binding domain (GBD), a proline-rich region, a WH2 domain, and a verprolin-cofilin-acidic motif (VCA) which activates the actin-related protein (Arp)2/3 actin nucleating complex []. The B, GBD, and proline-rich regions are involved in autoinhibitory interactions that repress or block the activity of the VCA. Yeast members lack the GTP binding domain. The EVH1 domains are part of the PH domain superfamily [].
The 14-3-3 proteins are a large family of approximately 30 kDa acidic proteins which exist primarily as homo- and heterodimers within all eukaryotic cells [
,
]. These are structurally similar phospho-binding proteins that regulate multiple signaling pathways []. There is a high degree of sequence identity and conservation between all the 14-3-3 isotypes, particularly in the regions which form the dimer interface or line the central ligand binding channel of the dimeric molecule. Each 14-3-3 protein sequence can be roughly divided into three sections: a divergent amino terminus, the conserved core region and a divergent carboxyl terminus. The conserved middle core region of the 14-3-3s encodes an amphipathic groove that forms the main functional domain, a cradle for interacting with client proteins. The monomer consists of nine helices organised in an antiparallel manner, forming an L-shaped structure. The interior of the L-structure is composed of four helices: H3 and H5, which contain many charged and polar amino acids, and H7 and H9, which contain hydrophobic amino acids. These four helices form the concave amphipathic groove that interacts with target peptides. The 14-3-3 proteins mainly bind proteins containing phosphothreonine or phosphoserine motifs; however, exceptions to this rule do exist. Extensive investigation of the 14-3-3 binding site of the mammalian serine/threonine kinase Raf-1 has produced a consensus sequence for 14-3-3-binding, RSxpSxP (in the single-letter amino-acid code, where x denotes any amino acid and p indicates that the next residue is phosphorylated). The 14-3-3 proteins appear to effect intracellular signalling in one of three ways: by direct regulation of the catalytic activity of the bound protein; by regulating interactions between the bound protein and other molecules in the cell through sequestration or modification; or by controlling the subcellular localisation of the bound ligand. Proteins appear to initially bind to a single dominant site and then subsequently to many, much weaker secondary interaction sites.
The 14-3-3 dimer is capable of changing the conformation of its bound ligand whilst itself undergoing minimal structural alteration.
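The RSxpSxP consensus described above can be illustrated with a simple sequence scan. The following is a minimal sketch in Python, assuming plain single-letter protein sequences in which phosphorylation is not recorded, so it can only report candidate sites of the form R-S-x-S-x-P. The function name, the pattern variable and the toy sequence are hypothetical and introduced purely for illustration.

```python
import re

# Consensus RSxpSxP from the text, with the phosphorylation mark dropped because
# a plain sequence string does not record it: R, S, any residue, S, any residue, P.
# [A-Z] stands in for "any residue"; the lookahead also reports overlapping candidates.
CANDIDATE_SITE = re.compile(r"(?=(RS[A-Z]S[A-Z]P))")


def find_candidate_sites(sequence):
    """Return (position, motif) pairs; position is the 1-based index of the
    serine that would need to be phosphorylated (the 4th motif residue)."""
    sequence = sequence.upper()
    return [(m.start() + 4, m.group(1)) for m in CANDIDATE_SITE.finditer(sequence)]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKRSTSAPLLRSASEPQ"
for position, motif in find_candidate_sites(toy_sequence):
    print(position, motif)
```

A match only marks a sequence-level candidate; whether the serine is actually phosphorylated, and whether a 14-3-3 protein binds there, cannot be inferred from the sequence alone.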
Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer [
]. While clathrin mediates endocytic protein transport, and transport from ER to Golgi, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins []. For example, the coatomer COPI (coat protein complex I) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes []. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents a β-sandwich structural motif found in the appendage domain superfamily of the gamma subunit of coatomer complexes. This subdomain has an immunoglobulin-like β-sandwich fold containing 7 strands in 2 β-sheets in a Greek key topology [
]. The appendage domain of the gamma coatomer subunit has a similar overall fold to the appendage domain of clathrin adaptors, and can also share the same motif-based cargo recognition and accessory factor recruitment mechanisms.
RNase P subunit Pop5/Rpp14/Rnp2-like domain superfamily
Type:
Homologous_superfamily
Description:
This superfamily contains ribonuclease P (Rnp) proteins from eukaryotes and archaea. Rnp is a ubiquitous ribozyme that catalyzes a Mg2+-dependent hydrolysis to remove the 5'-leader sequence of precursor tRNA (pre-tRNA) [
]. Archaeal and eukaryotic RNase P each contain a single RNA; archaeal RNase P has four or five proteins, while eukaryotic RNase P has 9 or 10 proteins. Eukaryotic and archaeal RNase P RNAs cooperatively function with protein subunits in catalysis []. Eukaryotic nuclear RNase P shares most of its protein components with another essential RNP enzyme, nucleolar RNase MRP [
]. RNase MRP (mitochondrial RNA processing) is an rRNA processing enzyme that cleaves a specific site within precursor rRNA to generate the mature 5'-end of 5.8S rRNA []. Despite its name, the vast majority of RNase MRP is localized in the nucleolus []. RNase MRP has been shown to cleave primers for mitochondrial DNA replication and CLB2 mRNA. In yeast, RNase MRP possesses one putatively catalytic RNA and at least 9 protein subunits (Pop1, Pop3-Pop8, Rpp1, Snm1 and Rmp1) []. Human RNase P is composed of a single protein, Pop1, and three subcomplexes: the Rpp20-Rpp25 heterodimer, the Pop5-Rpp14-(Rpp30)2-Rpp40 heteropentamer, and the Rpp21-Rpp29-Rpp38 heterotrimer [
]. In the hyperthermophilic archaeon Pyrococcus horikoshii OT3, RNase P is composed of the RNase P RNA (pRNA) and five proteins (PhoPop5, PhoRpp38, PhoRpp21, PhoRpp29, and PhoRpp30) [,
]. Proteins in this entry include Rnp2 (also known as Pop5) from archaea and Pop5/Rpp14 from humans. In eukaryotes Pop5 is a subunit of both the Rnp and MRP complexes. Although Pop5 and Rpp14 have similar protein structures, they share very limited sequence similarity. Moreover, the C-terminal fragments after the conserved beta sheets in Pop5 and Rpp14 exhibit distinct structural features that mediate interactions with Pop1 and Rpp40, respectively [
]. The structure of Rnp2 (ribonuclease P protein component 2) has a ferredoxin-like fold composed of an α-β sandwich with an antiparallel β-sheet and contains an extra C-terminal helix.
In bacteria or archaea, YjeF N-terminal domains occur either as single proteins or fused with other domains and are commonly associated with enzymes. YjeF N-terminal domains are often fused to a YjeF C-terminal domain. The fused protein is a bifunctional enzyme that catalyses the epimerisation of the S- and R-forms of NAD(P)HX and the dehydration of the S-form of NAD(P)HX at the expense of ADP, which is converted to AMP [
]. Structurally, YjeF N-terminal domains represent a novel version of the Rossmann fold, one of the most common protein folds in nature. The YjeF N-terminal domain comprises a three-layer α-β-α sandwich with a central β-sheet surrounded by helices. This domain contains a putative catalytic site [
]. The YjeF N-terminal domain is homologous to AIBP in mammals and YNL200C in budding yeasts. AIBP and YNL200C are NAD(P)H-hydrate epimerases that catalyse the epimerisation of the S- and R-forms of NAD(P)HX, at the expense of ATP, which is converted to ADP [
]. Some proteins known to contain a YjeF N-terminal domain are listed below: Escherichia coli hypothetical protein YjeF. Thermotoga maritima hypothetical protein Tm0922. Yeast uncharacterised protein YNL200C. Yeast enhancer of mRNA-decapping protein 3 (EDC3). Vertebrate enhancer of mRNA-decapping protein 3 (EDC3). Mammalian apolipoprotein A-I binding protein (AI-BP).
This entry represents the DEP (Dishevelled, Egl-10 and Pleckstrin) domain, a globular domain of about 80 residues that is found in over 50 proteins involved in G-protein signalling pathways. It was named after the three proteins it was initially found in:
Dishevelled (Dsh and Dvl), which play a key role in the transduction of the Wg/Wnt signal from the cell surface to the nucleus; it is a segment polarity protein required to establish coherent arrays of polarized cells and segments in embryos, and plays a role in wingless signalling.
Egl-10, which regulates G-protein signalling in the central nervous system.
Pleckstrin, the major substrate of protein kinase C in platelets; Pleckstrin contains two PH domains flanking the DEP domain.
Mammalian regulators of G-protein signalling also contain these domains, and regulate signal transduction by increasing the GTPase activity of G-protein alpha subunits, thereby driving them into their inactive GDP-bound form. It has been proposed that the DEP domain could play a selective role in targeting DEP domain-containing proteins to specific subcellular membranous sites, perhaps even to specific G protein-coupled signaling pathways [
,
]. Nuclear magnetic resonance spectroscopy has revealed that the DEP domain comprises a three-helix bundle, a β-hairpin 'arm' composed of two β-strands and two short β-strands in the C-terminal region [].
This entry represents the acyltransferase domain present in a wide range of acyltransferase enzymes, including, mainly, bacterial proteins which catalyse the transfer of acyl groups, other than amino-acyl, from one compound to another, such as Glucans biosynthesis protein C (OPGC) or protein OatA from Listeria monocytogenes serovar 1/2a and Staphylococcus aureus, an integral membrane protein which is responsible for O-acetylation at the C6-hydroxyl group of N-acetylmuramyl residues, forming the corresponding N,6-O-diacetylmuramic acid of the peptidoglycan, a modification that determines lysozyme resistance [
,
,
,
]. The pathogenic bacterium Staphylococcus aureus is able to cause persistent infections due to its ability to resist the immune defence system. Lysozyme, a cell wall-lytic enzyme, is one of the first defence compounds induced in serum and tissues after the onset of infection []. This entry also includes NolL proteins. NolL-dependent acetylation is specific for the fucosyl penta-N-acetylglucosamine species. In addition, the NolL protein caused elevated production of lipo-chitin oligosaccharides (LCOs). The NolL protein obtained from Rhizobium loti (Mesorhizobium loti) functions as an acetyl transferase []. This domain is also present in eukaryotic proteins, namely O-acyltransferase like protein (OACYL) from mouse and RHY1 (Regulator of hypoxia-inducible factor 1) [
] and NRF6 (Nose resistant to fluoxetine protein 6) [] from Caenorhabditis elegans.
Exchangeable apolipoproteins (apoA, apoC and apoE) have the same genomic structure and are members of a multi-gene family that probably evolved from a common ancestral gene. This entry includes the ApoA1, ApoA4, ApoA5 and ApoE proteins. ApoA1, ApoA4 and ApoA5 are part of the APOA1/C3/A4/A5 gene cluster on chromosome 11 [
]. Apolipoproteins function in lipid transport as structural components of lipoprotein particles, cofactors for enzymes and ligands for cell-surface receptors. In particular, apoA1 is the major protein component of high-density lipoproteins; apoA4 is thought to act primarily in intestinal lipid absorption; and apoE is a blood plasma protein that mediates the transport and uptake of cholesterol and lipid by way of its high affinity interaction with different cellular receptors, including the low-density lipoprotein (LDL) receptor. Recent findings with apoA1 and apoE suggest that the tertiary structures of these two members of the human exchangeable apolipoprotein gene family are related []. The three-dimensional structure of the LDL receptor-binding domain of apoE indicates that the protein forms an unusually elongated four-helix bundle that may be stabilised by a tightly packed hydrophobic core that includes leucine zipper-type interactions and by numerous salt bridges on the mostly charged surface. Basic amino acids important for LDL receptor binding are clustered into a surface patch on one long helix [].
Several biological processes regulate the activity of target proteins through changes in the redox state of thiol groups (S2 to SH2), where a hydrogen donor is linked to an intermediary disulphide protein. Such processes include the ferredoxin/thioredoxin system, the NADP/thioredoxin system, and the glutathione/glutaredoxin system [
]. Several of these disulphide proteins share a common structure, consisting of a three-layer α/β/α core. Proteins that contain domains with a thioredoxin-like fold include:
Arsenate reductase (ArsC) [
]
Calsequestrin (contains three tandem repeats of this fold) [
]
Circadian oscillation regulator KaiB [
]
Disulphide bond isomerase DsbC and DsbG (C-terminal domain) [
,
]
Disulphide bond facilitator DsbA (contains an α-helical insertion) [
]
Endoplasmic reticulum protein ERP29 (N-terminal domain) [
]
Glutathione S-transferase (GST) (N-terminal domain) [
]
Mitochondrial ribosomal protein L51/S25/CI-B8 domain (variable positions for Cys residues in active site) [
]
Phosducin [
]
Protein disulphide isomerase (PDI) (contains two tandem repeats of this fold) [
]
Glutathione peroxidase-like enzymes [
]
Selenoprotein W-related [
]
SH3-binding glutamic acid-rich protein like (SH3BGR) (lacks both conserved Cys residues) [
]
Spliceosomal protein U5-15Kd [
]
Thioltransferases, including thioredoxin [
], glutaredoxin [], hybrid peroxiredoxin hyPrx5 []
Thioredoxin-like 2Fe-2S ferredoxin [
]
In bacteria or archaea, YjeF N-terminal domains occur either as single proteins or fused with other domains and are commonly associated with enzymes. YjeF N-terminal domains are often fused to a YjeF C-terminal domain. The fused protein is a bifunctional enzyme that catalyses the epimerisation of the S- and R-forms of NAD(P)HX and the dehydration of the S-form of NAD(P)HX at the expense of ADP, which is converted to AMP [
]. Structurally, YjeF N-terminal domains represent a novel version of the Rossmann fold, one of the most common protein folds in nature. The YjeF N-terminal domain comprises a three-layer α-β-α sandwich with a central β-sheet surrounded by helices. This domain contains a putative catalytic site [
]. The YjeF N-terminal domain is homologous to AIBP in mammals and YNL200C in budding yeasts. AIBP and YNL200C are NAD(P)H-hydrate epimerases that catalyse the epimerisation of the S- and R-forms of NAD(P)HX, at the expense of ATP, which is converted to ADP [
]. Some proteins known to contain a YjeF N-terminal domain are listed below: Escherichia coli hypothetical protein YjeF. Thermotoga maritima hypothetical protein Tm0922. Yeast uncharacterised protein YNL200C. Yeast enhancer of mRNA-decapping protein 3 (EDC3). Vertebrate enhancer of mRNA-decapping protein 3 (EDC3). Mammalian apolipoprotein A-I binding protein (AI-BP).
This entry includes Hobbit protein from Drosophila melanogaster and its homologues such as Bridge-like lipid transfer protein family member 2 from human (BLTP2/KIAA0100) and FMP27 from yeast, referred to as the Hob proteins [
,
]. These proteins are localized to endoplasmic reticulum-plasma membrane (ER-PM) contact sites and are described as conserved lipid-binding proteins. They have a long hydrophobic groove and can mediate bulk transport of lipids between organelles. This entry also includes maize protein APT1 and Arabidopsis homologues SABRE and KIP []. The Hob family belongs to the repeating β-groove (RBG) superfamily together with VPS13, ATG2, SHIP164 and Csf1/BLTP1, which are all conserved lipid transfer proteins containing long hydrophobic grooves []. They all share the same structure, built from multiple repeating modules each consisting of five β-strands followed by a loop. FMP27 (Found in mitochondrial proteome protein 27) functions as a tube-forming lipid transport protein which binds to phosphatidylinositols and affects phosphatidylinositol-4,5-bisphosphate (PtdIns-4,5-P2) distribution [
,
]. APT1 (Aberrant pollen transmission 1) is required for pollen tube growth. It is a Golgi-localised protein and appears to regulate vesicular trafficking []. SABRE (Hypersensitive to Pi starvation 4) and KIP (Kinky pollen) are APT1 homologues and they are involved in the elongation of root cortex cells and pollen tubes respectively.
The Ras association domain family (RASSF) proteins are named for the presence of a Ras association (RA) domain at their N or C terminus that can potentially interact with the Ras GTPase family of proteins. These GTPases control a variety of cellular processes, such as membrane trafficking, apoptosis, and proliferation. RASSF proteins contain several other functional domains that modulate associations with other proteins. RASSF proteins with the RA domain at the C terminus (which are termed C-terminal or classical RASSF) usually also include a Salvador-RASSF-Hippo (SARAH) domain involved in several protein-protein interactions and in homo- and heterodimerisation of RASSF isoforms. N-terminal RASSF proteins (with the RA domain at the N terminus) do not usually contain a SARAH domain [
]. At least 10 RASSF family members have been characterised (with multiple splice variants), many of which have been shown to play a role in tumour suppression. RASSF proteins also act as scaffolding agents in microtubule stability, regulate mitotic cell division, control cell migration and cell adhesion, and modulate NF-κB activity and the duration of inflammation. Loss of RASSF expression through promoter methylation has been shown in numerous types of cancer, including leukemia, melanoma, breast and prostate cancer [
]. This entry refers to the N-terminal RASSF family of proteins.
This entry includes human endoplasmic reticulum chaperone BIP (also known as 70 kDa heat shock protein 5, HSPA5, glucose-regulated protein 78/GRP78, and immunoglobulin heavy chain-binding protein), yeast Kar2p (also known as Grp78p) [
,
], hsp-3 from C. elegans [] and related proteins. BIP/HSPA5 belongs to the heat shock protein 70 (HSP70) family of chaperones that assist in protein folding and assembly and can direct incompetent 'client' proteins towards degradation. HSPA5 and Kar2p are chaperones of the endoplasmic reticulum (ER). Typically, HSP70s have a nucleotide-binding domain (NBD) and a substrate-binding domain (SBD). The nucleotide sits in a deep cleft formed between the two lobes of the NBD. The two subdomains of each lobe change conformation between ATP-bound, ADP-bound, and nucleotide-free states. ATP binding opens up the substrate-binding site; substrate-binding increases the rate of ATP hydrolysis. HSP70 chaperone activity is regulated by various co-chaperones: J-domain proteins and nucleotide exchange factors (NEFs). Multiple ER DNAJ domain proteins have been identified and may exist in distinct complexes with HSPA5 in various locations in the ER, for example DNAJC3-p58IPK in the lumen []. HSPA5-NEFs include SIL1, whose disruption causes the Marinesco-Sjogren syndrome, and an atypical HSP70 family protein HYOU1/ORP150 [,
]. The ATPase activity of Kar2p is stimulated by the NEFs: Sil1p and Lhs1p [].
The mitochondrial protein translocase family, which is responsible for movement of nuclear encoded pre-proteins into mitochondria, is very complex with at least 19 components. These proteins include several chaperone proteins, four proteins of the outer membrane translocase (Tom) import receptor, five proteins of the Tom channel complex, five proteins of the inner membrane translocase (Tim) and three "motor" proteins. This family represents the Tom22 proteins [
]. The N-terminal region of Tom22 has been shown to have chaperone-like activity, and the C-terminal region faces the intermembrane space [].
This family consists of regulator of G-protein signalling 7-binding protein (RGS7BP) and regulator of G-protein signalling 9-binding protein (RGS9BP). They interact with regulator of G-protein signalling (RGS) proteins and Gbeta5 to form complexes that act to regulate G-proteins [
]. RGS7BP is expressed neuronally [
] and interacts with the R7 family of RGS proteins (RGS6, RGS7, RGS9 and RGS11). RGS9BP interacts with RGS9 and is involved in the regulation of G-protein signalling in phototransduction, participating in the recovery phase of visual transduction [
].
M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions) since virus-like particle (VLP) formation in many coronaviruses requires only the M and envelope (E) proteins for efficient virion assembly [
]. The M protein or E1 glycoprotein is implicated in virus assembly []. The E1 viral membrane protein is required for formation of the viral envelope and is transported via the Golgi complex []. This entry contains the Membrane (M) protein of deltacoronaviruses, including porcine deltacoronavirus and Bulbul coronavirus HKU11 [
].
The N-terminal region of this group of proteins is required for correct folding of the ER UDP-Glc: glucosyltransferase. These proteins selectively reglucosylate unfolded glycoproteins, thus providing quality control for protein transport out of the ER. Unfolded, denatured glycoproteins are substantially better substrates for glucosylation by this enzyme than are the corresponding native proteins. This protein and transient glucosylation may be involved in monitoring and/or assisting the folding and assembly of newly made glycoproteins, in order to identify glycoproteins that need assistance in folding from chaperones.
The Krueppel-associated box (KRAB) is a domain of around 75 amino acids that is found in the N-terminal part of about one third of eukaryotic Krueppel-type C2H2 zinc finger proteins (ZFPs) [
]. It is enriched in charged amino acids and can be divided into subregions A and B, which are predicted to fold into two amphipathic α-helices. The KRAB A and B boxes can be separated by variable spacer segments and many KRAB proteins contain only the A box []. The functions currently known for members of the KRAB-containing protein family include transcriptional repression of RNA polymerase I, II and III promoters, binding and splicing of RNA, and control of nucleolus function. The KRAB domain functions as a transcriptional repressor when tethered to the template DNA by a DNA-binding domain. A sequence of 45 amino acids in the KRAB A subdomain has been shown to be necessary and sufficient for transcriptional repression. The B box does not repress by itself but does potentiate the repression exerted by the KRAB A subdomain [
,
]. Gene silencing requires the binding of the KRAB domain to the RING-B box-coiled coil (RBCC) domain of the KAP-1/TIF1-beta corepressor. As KAP-1 binds to the heterochromatin proteins HP1, it has been proposed that the KRAB-ZFP-bound target gene could be silenced following recruitment to heterochromatin [,
]. KRAB-ZFPs probably constitute the single largest class of transcription factors within the human genome [
]. The KRAB domain is generally encoded by two exons. The regions coded by the two exons are known as KRAB-A and KRAB-B. Although the function of KRAB-ZFPs is largely unknown, they have been shown to play important roles in cell differentiation and organ development, and in regulating viral replication and transcription. A KRAB domain may consist of an A-box, or of an A-box plus either a B-box, a divergent B-box (b), or a C-box. Only the A-box is included in this model. The A-box is needed for repression; the B- and C-boxes are not. KRAB-ZFPs have one or two KRAB domains at their amino-terminal end, and multiple C2H2 zinc finger motifs at their C-termini. Some KRAB-ZFPs also contain a SCAN domain which mediates homo- and hetero-oligomerization. The KRAB domain is a protein-protein interaction module which represses transcription through recruiting corepressors. A key mechanism appears to be the following: KRAB-ZFPs tethered to DNA recruit, via their KRAB domain, the repressor KAP1 (KRAB-associated protein-1, also known as transcription intermediary factor 1 beta, KRAB-A interacting protein and tripartite motif protein 28). The KAP1/KRAB-ZFP complex in turn recruits the heterochromatin protein 1 (HP1) family, and other chromatin modulating proteins, leading to transcriptional repression through heterochromatin formation [
].
This entry represents the Toll/interleukin-1 receptor (TIR) domain, which is the conserved cytoplasmic domain of approximately 200 amino acids, found in Toll-like receptors (TLRs) and their adaptors. Proteins containing this domain can also be found in plants, where they mediate disease resistance [
], and in bacteria, where they have been associated with virulence. Interestingly, the TIR domains from proteins present in all three major domains of life have been shown to cleave nicotinamide adenine dinucleotide (NAD+). In plants, TIR domains require self-association interfaces and a putative catalytic glutamic acid that is conserved in both bacterial TIR NAD+-cleaving enzymes (NADases) and the mammalian SARM1 (sterile alpha and TIR motif containing 1) NADase for cell death induction and NAD+ cleavage activity [
,
]. It has been suggested that the primordial function of the TIR domain is the enzymatic cleavage of NAD+ and that the scaffolding function, which is best characterised in mammalian TIR domains involved in innate immunity, may be a more recent evolutionary adaptation []. Toll proteins or Toll-like receptors (TLRs) and the interleukin-1 receptor (IL-1R) superfamily are both involved in innate antibacterial and antifungal immunity in insects as well as in mammals. These receptors share a conserved cytoplasmic domain of approximately 200 amino acids, known as the Toll/IL-1R homologous region (TIR). The similarity between TLRs and IL-1Rs is not restricted to sequence homology since these proteins also share a similar signalling pathway. They both induce the activation of a Rel type transcription factor via an adaptor protein and a protein kinase []. Interestingly, MyD88, a cytoplasmic adaptor protein found in mammals, contains a TIR domain associated with a DEATH domain [,
,
]. Besides the mammalian and Drosophila proteins, a TIR domain is also found in a number of plant cytoplasmic proteins implicated in host defense []. Site-directed mutagenesis and deletion analysis have shown that the TIR domain is essential for Toll and IL-1R activities. Sequence analyses have revealed the presence of three highly conserved regions among the different members of the family: box 1 (FDAFISY), box 2 (GYKLC-RD-PG), and box 3 (a conserved W surrounded by basic residues). It has been proposed that boxes 1 and 2 are involved in the binding of proteins involved in signalling, whereas box 3 is primarily involved in directing localization of the receptor, perhaps through interactions with cytoskeletal elements [
]. Resolution of the crystal structures of the TIR domains of human Toll-like receptors 1 and 2 has shown that they contain a central five-stranded parallel β-sheet that is surrounded by a total of five helices on both sides, with connecting loop structures [
]. The loop regions appear to play an important role in mediating the specificity of protein-protein interactions [,
].
A key aspect of eukaryotic intracellular trafficking is the sorting of cell-surface proteins into multi-vesicular endosomes or bodies (MVBs), which eventually fuse with the lysosome, where they are degraded by lipases and peptidases. This is the primary mechanism for down-regulation of signaling via transmembrane receptors and removal of misfolded or defective membrane proteins. This process is also utilised by several viruses (e.g. HIV-1) to facilitate budding of their virions from the cell membrane. Studies in animals and fungi have shown that it depends on an intricate series of interactions, which is initiated via ubiquitination (typically one or more mono-ubiquitinations) of the cytoplasmic tails of membrane proteins by specific E3 ligases. Ubiquitinated membrane proteins are then captured into endosomes by the ESCRT system and prevented from being recycled back to the plasma membrane via the retrograde trafficking system. The ESCRT system also folds the endosomal membranes into invaginations that are concentrated in these ubiquitinated targets and catalyzes their abscission into intraluminal vesicles inside the endosome. This largely seals the fate of these membrane proteins as targets for lysosomal degradation. The ESCRT system comprises four major protein complexes, ESCRT-0 to ESCRT-III, which are successively involved in the above-described steps [
]. ESCRT-I contains three subunits that are conserved between yeast and animals, namely the inactive E2-ligase protein TSG101/VPS23, VPS28 and VPS37. Additionally, both yeast and metazoan ESCRT-I contain a fourth subunit termed MVB12 (multivesicular body sorting factor of 12 kD); however, the MVB12 subunits from the two lineages do not show significant sequence similarity. The metazoan MVB12 proteins contain two distinct conserved domains that occur independently in various proteins. The N-terminal region of MVB12 forms the MVB12-associated β-prism (MABP), which is also found in DENND4A/B/C from vertebrates, the membrane trafficking regulator Crag from Drosophila, bacterial proteins typified by the MAC/perforin (MACPF)-like protein plu1415 from Photorhabdus luminescens and uncharacterised proteins from choanoflagellates and stramenopiles. It has been suggested that the MABP domain has a membrane-associated function, perhaps even involving specific interactions with membrane components. It is plausible that the eukaryotic MABP domains are adaptors that help link other associated domains found in the same polypeptide to vesicular membranes []. The MABP domain has an internal repeat structure of three homologous segments. Consistent with this, the structure of the characterised representative, Photorhabdus plu1415, showed that this region precisely corresponds to a type-I β-prism domain with an internal three-fold symmetry. Each of the three sub-domains of the β-prism structure is a distinctive three-stranded β-sheet. The MABP domain shares a triradial symmetry with β-sheets parallel to the prism axis. The β-prism fold is associated with membrane interaction. The majority of the eukaryotic MABP domain versions contain a conserved cysteine in the first and third subdomain of the β-prism [
,
].
FMP27/BLTP2/Hobbit, GFWDK motif-containing RBG unit
Type:
Domain
Description:
This entry includes Hobbit protein from Drosophila melanogaster and its homologues such as Bridge-like lipid transfer protein family member 2 from human (BLTP2/KIAA0100) and FMP27 from yeast, referred to as the Hob proteins [
,
]. These proteins are localized to endoplasmic reticulum-plasma membrane (ER-PM) contact sites and are described as conserved lipid-binding proteins. They have a long hydrophobic groove and can mediate bulk transport of lipids between organelles. This entry also includes maize protein APT1 and Arabidopsis homologues SABRE and KIP []. The Hob family belongs to the repeating β-groove (RBG) superfamily together with VPS13, ATG2, SHIP164 and Csf1/BLTP1, which are all conserved lipid transfer proteins containing long hydrophobic grooves []. They all share the same structure of multiple repeating modules, each consisting of five β-sheets followed by a loop. FMP27 (Found in mitochondrial proteome protein 27) is a tube-forming lipid transport protein which binds to phosphatidylinositols and affects phosphatidylinositol-4,5-bisphosphate (PtdIns-4,5-P2) distribution [
,
]. APT1 (Aberrant pollen transmission 1) is required for pollen tube growth. It is a Golgi-localised protein and appears to regulate vesicular trafficking []. SABRE (Hypersensitive to Pi starvation 4) and KIP (Kinky pollen) are APT1 homologues and they are involved in the elongation of root cortex cells and pollen tubes respectively. This entry represents an RBG module containing the conserved GFWDK sequence motif.
SecA is a cytoplasmic protein of 800 to 960 amino acid residues. The eubacterial secA protein [
] plays an important role in protein export. It interacts with the secY and secE components of the protein translocation system. It has a central role in coupling the hydrolysis of ATP to the transfer of proteins across the membrane. SecA is a superfamily 2 (SF2) helicase that has been adapted to translocate proteins. It contains the characteristic DEAD/DEXH ATPase core structure with the seven SF2 motifs [
]. Several structural analyses on secA have been reported [
,
]. They show that secA contains two recA-like domains similar to SF1 and SF2 helicases. In helicases, the two recA-like domains move relative to one another during the ATPase cycle, generating domain movements that translocate the helicase along nucleic acids. In secA, it seems that a similar mechanism is used to generate domain movements that are coupled to polypeptide translocation. The N-terminal recA-like domain of secA contains an insert of about 150 residues that forms the preprotein crosslinking domain (PPXD), which has the ability to bind preproteins in solution and which is important for preprotein loading onto SecYEG-containing membranes []. Homologs of secA are also encoded in the chloroplast genome of some algae [
] as well as in the nuclear genome of plants []. They could be involved in the intraorganellar protein transport into thylakoids.
The tetratrico peptide repeat region (TPR) is a structural motif present in a wide range of proteins [
,
,
]. It mediates protein-protein interactions and the assembly of multiprotein complexes []. The TPR motif consists of 3-16 tandem repeats of 34 amino acid residues, although individual TPR motifs can be dispersed in the protein sequence. Sequence alignment of the TPR domains reveals a consensus sequence defined by a pattern of small and large amino acids. TPR motifs have been identified in various different organisms, ranging from bacteria to humans. Proteins containing TPRs are involved in a variety of biological processes, such as cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding. The X-ray structure of a domain containing three TPRs from protein phosphatase 5 revealed that TPR adopts a helix-turn-helix arrangement, with adjacent TPR motifs packing in a parallel fashion, resulting in a spiral of repeating anti-parallel α-helices [
]. The two helices are denoted helix A and helix B. The packing angle between helix A and helix B is ~24 degrees within a single TPR and generates a right-handed superhelical shape. Helix A interacts with helix B and with helix A' of the next TPR. Two protein surfaces are generated: the inner concave surface is contributed to mainly by residues on helices A, and the other surface presents residues from both helices A and B.
The tetratrico peptide repeat (TPR) is a structural motif present in a wide range of proteins [
,
,
]. It mediates protein-protein interactions and the assembly of multiprotein complexes []. The TPR motif consists of 3-16 tandem repeats of 34 amino acid residues, although individual TPR motifs can be dispersed in the protein sequence. Sequence alignment of the TPR domains reveals a consensus sequence defined by a pattern of small and large amino acids. TPR motifs have been identified in various different organisms, ranging from bacteria to humans. Proteins containing TPRs are involved in a variety of biological processes, such as cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding []. The X-ray structure of a domain containing three TPRs from protein phosphatase 5 revealed that TPR adopts a helix-turn-helix arrangement, with adjacent TPR motifs packing in a parallel fashion, resulting in a spiral of repeating anti-parallel α-helices [
]. The two helices are denoted helix A and helix B. The packing angle between helix A and helix B is ~24 degrees within a single TPR and generates a right-handed superhelical shape. Helix A interacts with helix B and with helix A' of the next TPR. Two protein surfaces are generated: the inner concave surface is contributed to mainly by residues on helices A, and the other surface presents residues from both helices A and B.
Bacteria synthesise a set of small, usually basic proteins of about 90 residues that bind DNA and are known as histone-like proteins [
,
]. Examples include the HU protein in Escherichia coli, which is a dimer of closely related alpha and beta chains; in other bacteria it can be a dimer of identical chains. HU-type proteins have been found in a variety of eubacteria, cyanobacteria and archaebacteria, and are also encoded in the chloroplast genome of some algae []. The integration host factor (IHF), a dimer of closely related chains which seem to function in genetic recombination as well as in translational and transcriptional control [], is found in enterobacteria. Viral proteins include the African swine fever virus protein Pret-047 (also known as A104R or LMW5-AR) []. The exact function of these proteins is not yet clear but they are capable of wrapping DNA and stabilising it against denaturation under extreme environmental conditions. The structure is known for one of these proteins [
]. The protein exists as a dimer and two "β-arms" function as the non-specific binding site for bacterial DNA.
The signature pattern for this entry is a twenty-residue sequence which includes three perfectly conserved positions. According to the tertiary structure of one of these proteins [
], this pattern spans exactly the first half of the flexible DNA-binding arm.
Proteins containing this domain include ARF1-directed GTPase-activating protein, the cycle control GTPase activating protein (GAP) GCS1 which is important for the regulation of the ADP ribosylation factor ARF, a member of the Ras superfamily of GTP-binding proteins [
]. The GTP-bound form of ARF is essential for the maintenance of normal Golgi morphology; it participates in recruitment of coat proteins which are required for budding and fission of membranes. Before the fusion with an acceptor compartment the membrane must be uncoated. This step requires the hydrolysis of GTP bound to ARF. These proteins contain a characteristic zinc finger motif (Cys-x2-Cys-x(16,17)-Cys-x2-Cys) which displays some similarity to the C4-type GATA zinc finger. The ARFGAP domain displays no obvious similarity to other GAP proteins. The 3D structure of the ARFGAP domain of the PYK2-associated protein beta has been solved [
]. It consists of a three-stranded β-sheet surrounded by five α-helices. The domain is organised around a central zinc atom which is coordinated by four cysteines. The ARFGAP domain is clearly unrelated to the other GAP protein structures, which are exclusively helical. Classical GAP proteins accelerate GTPase activity by supplying an arginine finger to the active site. The crystal structure of ARFGAP bound to ARF revealed that the ARFGAP domain does not supply an arginine to the active site, which suggests a more indirect role of the ARFGAP domain in GTP hydrolysis [].
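As an illustration of how a motif of this kind can be scanned for, the sketch below translates the zinc finger pattern into a regular expression. It assumes the four-cysteine reading Cys-x2-Cys-x(16,17)-Cys-x2-Cys, consistent with the four zinc-coordinating cysteines described above. The function name and the test sequence are hypothetical, and a match only flags a candidate site; it is not evidence of a functional ARFGAP domain.

```python
import re

# Assumed motif: C-x(2)-C-x(16,17)-C-x(2)-C, where x is any residue.
ZINC_FINGER = re.compile(r"C.{2}C.{16,17}C.{2}C")


def find_zinc_fingers(protein):
    """Return (0-based start, matched segment) for each candidate motif."""
    seq = protein.upper()
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(seq)]


# Hypothetical toy sequence built to contain exactly one candidate site.
toy = "MA" + "CAAC" + "G" * 16 + "CSSC" + "KL"
print(find_zinc_fingers(toy))
```

The scan reports non-overlapping matches from left to right, which is usually sufficient for a motif of this length.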
Synucleins are small, soluble proteins expressed primarily in neural tissue and in certain tumors [
,
]. The family includes three known proteins: alpha-synuclein, beta-synuclein, and gamma-synuclein. All synucleins have in common a highly conserved α-helical lipid-binding motif with similarity to the class-A2 lipid-binding domains of the exchangeable apolipoproteins []. Synuclein family members are not found outside vertebrates, although they have some conserved structural similarity with plant 'late-embryo-abundant' proteins. The alpha- and beta-synuclein proteins are found primarily in brain tissue, where they are seen mainly in presynaptic terminals [
,
]. The gamma-synuclein protein is found primarily in the peripheral nervous system and retina, but its expression in breast tumors [] is a marker for tumor progression []. Normal cellular functions have not been determined for any of the synuclein proteins, although some data suggest a role in the regulation of membrane stability and/or turnover. Mutations in alpha-synuclein are associated with rare familial cases of early-onset Parkinson's disease, and the protein accumulates abnormally in Parkinson's disease, Alzheimer's disease, and several other neurodegenerative illnesses [
]. Beta-synuclein, also known as PNP 14, is expressed in the brain, specifically in synapses around neurons, but not in glial cells. The protein, which has been designated a phosphoneuroprotein, has been found to be phosphorylated in vitro and in vivo []. It is believed that the physiological functions of beta-synuclein may be controlled by the phosphorylation reaction.
The ERM family consists of three closely-related proteins, ezrin, radixin and moesin [
]. Ezrin was first identified as a constituent of microvilli [], radixin as a barbed-end-capping, actin-modulating protein from isolated junctional fractions [], and moesin as a heparin-binding protein [], which is particularly important in immunity, acting on both T- and B-cell homeostasis and self-tolerance [,
]. Members of this family have been associated with axon-associated Schwann cell (SC) motility and the maintenance of the polarity of these cells []. A tumour suppressor molecule responsible for neurofibromatosis type 2 (NF2) is highly similar to ERM proteins and has been designated merlin (moesin-ezrin-radixin-like protein) []. ERM molecules contain 3 domains: an N-terminal globular domain, an extended α-helical domain and a charged C-terminal domain []. Ezrin, radixin and merlin also contain a polyproline region between the helical and C-terminal domains. The N-terminal domain is highly conserved, and is also found in merlin, band 4.1 proteins and members of the band 4.1 superfamily, designated the FERM domain [
]. ERM proteins crosslink actin filaments with plasma membranes. They co-localise with CD44 at actin filament plasma membrane interaction sites, associating with CD44 via their N-terminal domains and with actin filaments via their C-terminal domains []. The α-helical region is involved in intramolecular masking of protein-protein interaction sites, which regulates the activity of these proteins [].
A number of eukaryotic proteins, which probably are sequence specific DNA-binding proteins that act as transcription factors, share a conserved domain of 40 to 50 amino acid residues. It has been proposed [
] that this domain is formed of two amphipathic helices joined by a variable length linker region that could form a loop. This 'helix-loop-helix' (HLH) domain mediates protein dimerization and has been found in a number of proteins []. Most of these proteins have an extra basic region of about 15 amino acid residues that is adjacent to the HLH domain and specifically binds to DNA. They are referred to as basic helix-loop-helix proteins (bHLH), and are classified in two groups: class A (ubiquitous) and class B (tissue-specific). Members of the bHLH family bind variations on the core sequence 'CANNTG', also referred to as the E-box motif. The homo- or heterodimerization mediated by the HLH domain is independent of, but necessary for, DNA binding, as two basic regions are required for DNA binding activity. The HLH proteins lacking the basic domain (Emc, Id) function as negative regulators, since they form heterodimers but fail to bind DNA. The hairy-related proteins (hairy, E(spl), deadpan) also repress transcription although they can bind DNA. The proteins of this subfamily act together with co-repressor proteins, like groucho, through their C-terminal motif WRPW.
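The E-box core 'CANNTG' can be located in a DNA sequence with a simple pattern search. The following is a minimal sketch, assuming that N stands for any of the four bases (the IUPAC convention). The function name and the example sequence are hypothetical, and a hit indicates only a candidate binding site, not demonstrated bHLH binding.

```python
import re

# 'CANNTG' with N = any base. The lookahead also reports overlapping sites.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(dna):
    """Return (0-based start, hexamer) for every E-box core in dna."""
    seq = dna.upper()
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq)]


# Hypothetical sequence containing two E-box cores, CACGTG and CAGCTG.
example = "ttgCACGTGaaCAGCTGcc"
print(find_eboxes(example))
```

Because the reverse complement of CANNTG is again of the form CANNTG, scanning one strand is enough to find the positions of all sites on both strands.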
Nitrogenase [
] is the enzyme system responsible for biological nitrogen fixation. Nitrogenase is an oligomeric complex which consists of two components: component 1, which contains the active site for the reduction of nitrogen to ammonia, and component 2 (also called the iron protein). Component 2 is a homodimer of a protein (gene nifH) which binds a single 4Fe-4S iron-sulfur cluster []. In the nitrogen fixation process nifH is first reduced by a protein such as ferredoxin; the reduced protein then transfers electrons to component 1 with the concomitant consumption of ATP. A number of proteins are known to be evolutionarily related to nifH. These proteins are:
Chloroplast-encoded chlL (or frxC) protein [
]. ChlL is encoded on the chloroplast genome of some plant species; its exact function is not known, but it could act as an electron carrier in the conversion of protochlorophyllide to chlorophyllide.
Photosynthetic bacteria proteins bchL and bchX [
]. These proteins are also likely to play a role in chlorophyll synthesis.
There are a number of conserved regions in the sequence of these proteins: in the N-terminal section there is an ATP-binding site motif 'A' (P-loop) and in the central section there are two conserved cysteines which have been shown, in nifH, to be the ligands of the 4Fe-4S cluster. This entry represents two conserved sites which correspond to the regions around these cysteines.
Pentaxins (or pentraxins) [
,
] are a family of proteins which show, under electron microscopy, a discoid arrangement of five noncovalently bound subunits. Proteins known to belong to this family are:
C-reactive protein (CRP), a protein which, in mammals, is expressed during acute phase response to tissue injury or inflammation. CRP displays several functions associated with host defense: it promotes agglutination, bacterial capsular swelling, phagocytosis and complement fixation through its calcium-dependent binding to phosphorylcholine. CRPs have also been sequenced in an invertebrate, the Atlantic horseshoe crab, where they are a normal constituent of the hemolymph.
Serum Amyloid P-component (SAP), a precursor of amyloid component P which is found in basement membrane and is associated with amyloid deposits.
Hamster female protein (FP), a plasma protein whose concentration is altered by sex steroids and stimuli that elicit an acute phase response.
A number of proteins, whose function is not yet clear, contain a C-terminal pentaxin-like domain. These proteins are:
Human PTX3 (or TSG-14). PTX3 is a cytokine-induced protein.
Guinea pig apexin [
], a sperm acrosomal protein. Apexin seems to be the ortholog of human neuronal pentraxin II (gene NPTX2) [
].
Rat neuronal pentaxin I [
].
The sequences of the different members of this family are quite conserved. This entry represents a six-residue pattern which includes a cysteine known to be involved in a disulfide bridge in CRPs and SAP.
Regucalcin, also known as senescence marker protein-30 (SMP30), was discovered in 1978 as a Ca2+-binding protein that does not contain EF-hand motifs, suggesting a novel class of Ca2+-binding protein. It is primarily localised to the liver and kidney cortex of animals. Expression of its mRNA in the liver and renal cortex of rats is stimulated by an increase in cellular Ca2+ levels [
,
]. Regucalcin, as a regulatory protein of Ca2+, has a pivotal role in the control of many cell functions. The protein has a reversible effect on Ca2+-induced activation and inhibition of many enzymes in both the liver and renal cortex cells []. It has also been shown to inhibit various protein kinases (including Ca2+/calmodulin-dependent protein kinase [
], protein kinase C [
] and tyrosine kinase) and protein phosphatases, indicating a regulatory role in signal transduction within the cell. In addition, regucalcin regulates intracellular Ca2+ homeostasis by enhancing Ca2+-pumping activity in the plasma membrane through activation of the pump enzymes [
]. Moreover, it can inhibit RNA synthesis in the nuclei of normal and regenerating rat livers in vitro [
]. Hydropathy profiles indicate hydrophobic domains in both N- and C-terminal regions of the regucalcin molecule; the protein also exhibits hydrophilic characteristics []. Human and rodent regucalcins share 89% sequence identity, the high degree of conservation between species suggesting that the complete structure is required for physiological function.
Formins (formin homology proteins) play a crucial role in the reorganisation of the actin cytoskeleton and associate with the fast-growing end (barbed end) of actin filaments [,
]. This entry represents the formin homologues from animals (and some fungi), including protein cappuccino from Drosophila melanogaster and formins from human and mouse. Protein cappuccino acts as an actin nucleation factor and promotes assembly of actin filaments together with spir. It may play a role in intracellular vesicle transport along actin fibres, providing a novel link between actin cytoskeleton dynamics and intracellular transport []. Formins are characterised by the presence of three FH domains (FH1, FH2 and FH3), although members of the formin family do not necessarily contain all three domains [
]. The proline-rich FH1 domain mediates interactions with a variety of proteins, including the actin-binding protein profilin, SH3 (Src homology 3) domain proteins, and WW domain proteins. The FH2 domain is required for the self-association of formin proteins through the ability of FH2 domains to directly bind each other [], and may also act to inhibit actin polymerisation []. The FH3 domain is less well conserved and may be important for determining intracellular localisation of formin family proteins. In addition, some formins can contain a GTPase-binding domain (GBD) required for binding to Rho small GTPases, and a C-terminal conserved Dia-autoregulatory domain (DAD).
This domain is found in a diverse group of multifunctional proteins that regulate cellular signalling events downstream of G-protein coupled receptors (GPCRs). They belong to the R7 (Neuronal RGS) subfamily, which includes RGS6, RGS7, RGS9, and RGS11, all of which, in humans, are expressed predominantly in the nervous system [
,
]. They form an obligatory complex with G-beta-5, and play important roles in the regulation of crucial neuronal processes []. In addition, R7 proteins were found to bind many other proteins outside of the G protein signalling pathways, including the μ-opioid receptor, beta-arrestin, alpha-actinin-2, NMDAR, polycystin, spinophilin and guanylyl cyclase, among others. They have an RGS domain, which is an essential part of the R7 protein subfamily. These proteins are involved in many crucial cellular processes such as regulation of intracellular trafficking, glial differentiation, embryonic axis formation, skeletal and muscle development, and cell migration during early embryogenesis []. This entry also includes egl-10, an orthologue of human RGS7 from C. elegans. This domain is referred to as DEP helical extension (DHEX) because it is located next to the N-terminal Dishevelled/Egl-10/Pleckstrin homology (DEP) domain [
]. Both the DEP and DHEX domains are necessary, but not sufficient, to bind anchoring proteins such as RGS9 anchor protein. These constitute a site of major allosteric regulation of the complex via association with GPCRs and small SNARE-like membrane proteins. DHEX has no close structural homologues [].
Regulator of G-protein signalling 6/7/9/11, DHEX domain superfamily
Type:
Homologous_superfamily
Description:
This entry includes a diverse group of multifunctional proteins that regulate cellular signalling events downstream of G-protein coupled receptors (GPCRs). They belong to the R7 (Neuronal RGS) subfamily, which includes RGS6, RGS7, RGS9, and RGS11, all of which, in humans, are expressed predominantly in the nervous system [
,
]. They form an obligatory complex with G-beta-5, and play important roles in the regulation of crucial neuronal processes []. In addition, R7 proteins were found to bind many other proteins outside of the G protein signalling pathways, including the μ-opioid receptor, beta-arrestin, alpha-actinin-2, NMDAR, polycystin, spinophilin and guanylyl cyclase, among others. They have an RGS domain, which is an essential part of the R7 protein subfamily. These proteins are involved in many crucial cellular processes such as regulation of intracellular trafficking, glial differentiation, embryonic axis formation, skeletal and muscle development, and cell migration during early embryogenesis []. This entry also includes egl-10, an orthologue of human RGS7 from C. elegans. This is the DHEX (DEP helical extension) domain superfamily. The DHEX domain is located next to the N-terminal Dishevelled/Egl-10/Pleckstrin homology (DEP) domain. These constitute a site of major allosteric regulation of the complex via association with GPCRs and small SNARE-like membrane proteins. Both the DEP and DHEX domains are necessary, but not sufficient, to bind anchoring proteins such as RGS9 anchor protein. DHEX has no close structural homologues [
].
The Flagellar/Hr/Invasion Proteins Export Pore (FHIPEP) family [
,
] consists of a number of proteins that constitute the type III secretion (or signal peptide-independent) pathway apparatus [,
]. This mechanism translocates proteins lacking an N-terminal signal peptide across the cell membrane in one step, as it does not require an intermediate periplasmic process to cleave the signal peptide. It is a common pathway amongst Gram-negative bacteria for secreting toxic and flagellar proteins. The pathway apparatus comprises three components: two within the inner membrane and one within the outer []. An FHIPEP protein is located within the inner membrane, although it is unknown which component it constitutes. FHIPEP proteins all have about 700 amino-acid residues. Within the sequence, the N terminus is highly conserved and hydrophobic, suggesting that this terminus is embedded within the membrane, with 6-8 transmembrane (TM) domains, while the C terminus is less conserved and appears to be devoid of TM regions. It is possible that members of the FHIPEP family serve as pores for the export of specific proteins. Characterized proteins from the FHIPEP family include: the flagellar biosynthesis protein FlhA, which is required for formation of the rod structure of the flagellar apparatus [
]; and the secretion system apparatus protein SsaV from Salmonella typhimurium, which is required for the secretion of the SpvB toxin [
].
This domain is involved in binding sterols, and is found in proteins such as SCP2. It has a 3-layer alpha/beta/alpha fold, composed of alpha/beta(3)/(crossover)/beta/(alpha)/beta. The human sterol carrier protein 2 (SCP2), also known as nonspecific lipid transfer protein, is a basic protein that is believed to participate in the intracellular transport of cholesterol and various other lipids. The Unc-24 protein of Caenorhabditis elegans contains a domain similar to part of two ion channel regulators (the erythrocyte integral membrane protein stomatin and the C. elegans neuronal protein MEC-2) juxtaposed to a domain similar to nonspecific lipid transfer protein (nsLTP; also called sterol carrier protein 2).
The Sac domain is a region of homology between the N terminus of synaptojanin and the otherwise unrelated yeast protein Sac1p. The Sac domain is approximately 400 residues in length, and proteins containing this domain show approximately 35% identity with other Sac domains throughout this region. The Sac domain exhibits phosphatidylinositol polyphosphate phosphatase activity and can hydrolyse phosphate from any of the three positions of inositol that may be phosphorylated (3-, 4- and 5-). However, adjacent phosphates are resistant to hydrolysis. Sac domains cannot hydrolyse phosphate from phosphatidylinositol-4,5-bisphosphate (PtdIns(4,5)P2), PtdIns(3,4)P2 or PtdIns(3,4,5)P3, but can hydrolyse PtdIns(3,5)P2.
The Sac domain consists of seven highly conserved motifs which appear to define the catalytic and regulatory regions of the phosphatase. The sixth conserved region contains a highly conserved C-x(5)-R-[TS] motif, thought to be the catalytic motif of many metal-independent protein and inositide polyphosphate phosphatases. Interestingly, the Inp51p Sac domain, in which the cysteine, arginine and threonine/serine residues within the C-x(5)-R-[TS] motif are absent, does not exhibit any phosphatase activity.
Two classes of Sac domain proteins have been identified in mammals as well as lower eukaryotes. The first comprises proteins which, in addition to an N-terminal phosphatase Sac domain, have all the domains associated with type II phosphatidylinositol phosphate 5-phosphatases:
Mammalian synaptojanins, type II phosphatidylinositol phosphate 5-phosphatases.
Yeast INP51, a 108kDa membrane protein. It is involved in endocytosis and regulation of the actin cytoskeleton under conditions of normal vegetative growth. Although the Sac phosphatase domain of INP51 may be catalytically inactive, the domain may retain other functions.
Yeast INP52, a 133kDa membrane protein. It is involved in endocytosis and regulation of the actin cytoskeleton under conditions of normal vegetative growth.
Yeast INP53, a 124kDa membrane protein. It appears to have a role in intra-Golgi and Golgi-to-endosomal trafficking.
The other class of Sac-containing phosphatases consists of proteins with an N-terminal Sac phosphatase domain and no other recognizable domains:
Yeast Sac1p, a 67kDa membrane protein found in the endoplasmic reticulum (ER) and Golgi. It regulates the actin cytoskeleton and phospholipid metabolism.
Yeast FIG4, a 101kDa protein encoded by a pheromone regulated or induced gene. FIG4 might function to regulate effector molecules of the actin cytoskeleton during mating.
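The two rules stated for the Sac domain, the C-x(5)-R-[TS] motif and the resistance of adjacent phosphates to hydrolysis, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example sequence are hypothetical, the motif is simply translated into a regular expression, and the substrate rule encodes nothing beyond the four examples given above.

```python
import re

# C-x(5)-R-[TS]: Cys, any five residues, Arg, then Thr or Ser.
# The lookahead lets overlapping matches be reported as well.
CX5R_LOOKAHEAD = re.compile(r"(?=C.{5}R[TS])")


def find_cx5r(sequence):
    """Return 0-based start positions of C-x(5)-R-[TS] matches (hypothetical helper)."""
    return [m.start() for m in CX5R_LOOKAHEAD.finditer(sequence.upper())]


def sac_can_hydrolyse(phosphorylated_positions):
    """Apply the 'adjacent phosphates are resistant' rule to inositol positions 3, 4, 5.

    Returns True when no two phosphorylated positions are neighbours.
    This is a simplification of the text, not a biochemical model.
    """
    positions = set(phosphorylated_positions)
    if not positions or not positions <= {3, 4, 5}:
        raise ValueError("expected a non-empty subset of positions 3, 4 and 5")
    return not any(p + 1 in positions for p in positions)


# Made-up sequence containing one motif instance starting at index 2.
print(find_cx5r("MKCAAAAARTLL"))
print(sac_can_hydrolyse([3, 5]))  # PtdIns(3,5)P2
print(sac_can_hydrolyse([4, 5]))  # PtdIns(4,5)P2
```

A motif match on its own does not show that a Sac domain is catalytically active; it only flags candidates for closer inspection.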
The tetratricopeptide repeat (TPR) is a structural motif present in a wide range of proteins. It mediates protein-protein interactions and the assembly of multiprotein complexes. The TPR motif consists of 3-16 tandem repeats of 34 amino acid residues, although individual TPR motifs can be dispersed in the protein sequence. Sequence alignment of the TPR domains reveals a consensus sequence defined by a pattern of small and large amino acids. TPR motifs have been identified in various different organisms, ranging from bacteria to humans. Proteins containing TPRs are involved in a variety of biological processes, such as cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding.
The X-ray structure of a domain containing three TPRs from protein phosphatase 5 revealed that TPR adopts a helix-turn-helix arrangement, with adjacent TPR motifs packing in a parallel fashion, resulting in a spiral of repeating anti-parallel α-helices. The two helices are denoted helix A and helix B. The packing angle between helix A and helix B is ~24 degrees within a single TPR and generates a right-handed superhelical shape. Helix A interacts with helix B and with helix A' of the next TPR. Two protein surfaces are generated: the inner concave surface is contributed mainly by residues on helices A, and the other surface presents residues from both helices A and B. The domain represented in this superfamily consists of a multi-helical fold composed of two curved layers of α-helices arranged in a regular right-handed superhelix, where the repeats that make up this structure are arranged about a common axis. These superhelical structures present an extensive solvent-accessible surface that is well suited to binding large substrates such as proteins and nucleic acids. The TPR is likely to be an ancient repeat, since it is found in eukaryotes, bacteria and archaea, whereas the PPR repeat is found predominantly in higher plants. The superhelix formed from these repeats can bind ligands at a number of different regions, and has the ability to acquire multiple functional roles.
SH3 (src Homology-3) domains are small protein modules containing approximately 50 amino acid residues. They are found in a great variety of intracellular or membrane-associated proteins, for example in a variety of proteins with enzymatic activity and in adaptor proteins such as fodrin and yeast actin binding protein ABP-1.
The SH3 domain has a characteristic fold which consists of five or six β-strands arranged as two tightly packed anti-parallel β-sheets. The linker regions may contain short helices. The surface of the SH3 domain bears a flat, hydrophobic ligand-binding pocket which consists of three shallow grooves defined by conserved aromatic residues, in which the ligand adopts an extended left-handed helical arrangement. The ligand binds with low affinity but this may be enhanced by multiple interactions. The region bound by the SH3 domain is in all cases proline-rich and contains PXXP as a core-conserved binding motif. The function of the SH3 domain is not well understood, but it may mediate many diverse processes such as increasing the local concentration of proteins, altering their subcellular location and mediating the assembly of large multiprotein complexes.
The crystal structure of the SH3 domain of the cytoskeletal protein spectrin, and the solution structures of SH3 domains of phospholipase C (PLC-γ) and phosphatidylinositol 3-kinase p85 alpha-subunit, have been determined. In spite of relatively limited sequence similarity, their overall structures are similar. The domains belong to the α+β structural class, with five to eight β-strands forming two tightly packed, anti-parallel β-sheets arranged in a barrel-like structure, and intervening loops sometimes forming helices. Conserved aliphatic and aromatic residues form a hydrophobic core (A11, L23, A29, V34, W42, L52 and V59 in PLC-γ) and a hydrophobic pocket on the molecular surface (L12, F13, W53 and P55 in PLC-γ). The conserved core is believed to stabilise the fold, while the pocket is thought to serve as a binding site for target proteins. Conserved carboxylic amino acids located in the loops, on the periphery of the pocket (D14 and E22), may be involved in protein-protein interactions via proline-rich regions. The N and C termini are packed in close proximity, indicating that these domains are independent structural modules.
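As a small illustration of the PXXP core motif mentioned above, the following Python sketch scans a sequence for every occurrence of proline, any two residues, proline. The function name and the example peptide are hypothetical. A hit only marks a candidate proline-rich site and says nothing about whether an SH3 domain actually binds there.

```python
import re

# P-x-x-P, wrapped in a lookahead so that overlapping motifs are all reported.
PXXP_LOOKAHEAD = re.compile(r"(?=P..P)")


def find_pxxp(sequence):
    """Return (0-based start, four-residue window) for each PXXP occurrence."""
    seq = sequence.upper()
    return [(m.start(), seq[m.start():m.start() + 4])
            for m in PXXP_LOOKAHEAD.finditer(seq)]


# Made-up proline-rich peptide with two overlapping PXXP windows.
for start, window in find_pxxp("AAPPLPPRAA"):
    print(start, window)
```

The lookahead matters here because proline-rich regions often contain overlapping PXXP windows, which a plain non-overlapping search would under-report.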
The Ubiquitin Interacting Motif (UIM), or 'LALAL-motif', is a stretch of about 20 amino acid residues, which was first described in the 26S proteasome subunit PSD4/RPN-10 that is known to recognise ubiquitin. In addition, the UIM is found, often in tandem or triplet arrays, in a variety of proteins either involved in ubiquitination and ubiquitin metabolism, or known to interact with ubiquitin-like modifiers. Among the UIM proteins are two different subgroups of the UBP (ubiquitin carboxy-terminal hydrolase) family of deubiquitinating enzymes, one F-box protein, one family of HECT-containing ubiquitin-ligases (E3s) from plants, and several proteins containing ubiquitin-associated UBA and/or UBX domains. In most of these proteins, the UIM occurs in multiple copies and in association with other domains such as UBA, UBX, ENTH, EH, VHS, SH3, HECT, VWFA, EF-hand calcium-binding, WD-40, F-box, LIM, protein kinase, ankyrin, PX, phosphatidylinositol 3- and 4-kinase, C2, OTU, dnaJ, RING-finger or FYVE-finger. UIMs have been shown to bind ubiquitin and to serve as a specific targeting signal important for monoubiquitination. Thus, UIMs may have several functions in ubiquitin metabolism, each of which may require different numbers of UIMs. The UIM is unlikely to form an independent folding domain. Instead, based on the spacing of the conserved residues, the motif probably forms a short α-helix that can be embedded into different protein folds.
Some proteins known to contain a UIM are listed below:
Eukaryotic PSD4/RPN-10/S5, a multi-ubiquitin binding subunit of the 26S proteasome.
Vertebrate Machado-Joseph disease protein 1 (Ataxin-3), which acts as a histone-binding protein that regulates transcription; defects in Ataxin-3 cause the neurodegenerative disorder Machado-Joseph disease (MJD).
Vertebrate epsin and epsin2.
Vertebrate hepatocyte growth factor-regulated tyrosine kinase substrate (HRS).
Mammalian epidermal growth factor receptor substrate 15 (EPS15), which is involved in cell growth regulation.
Mammalian epidermal growth factor receptor substrate EPS15R.
Drosophila melanogaster (Fruit fly) liquid facets (lqf), an epsin.
Yeast VPS27 vacuolar sorting protein, which is required for membrane traffic to the vacuole.
This entry includes a domain found in nitrate (Nrt) and bicarbonate (Cmp) receptors. This domain is found in eubacterial periplasmic-binding proteins that serve as initial receptors in the ABC transport of bicarbonate, nitrate, taurine, or a wide range of aliphatic sulfonates. After binding its ligand with high affinity, the binding protein interacts with a cognate membrane transport complex composed of two integral membrane domains and two cytoplasmically located ATPase domains. This interaction triggers translocation of the ligand across the cytoplasmic membrane, energised by ATP hydrolysis. These binding proteins belong to the PBP2 superfamily of periplasmic-binding proteins that differ in size and ligand specificity, but have similar tertiary structures consisting of two globular subdomains connected by a flexible hinge. They have been shown to bind their ligand in the cleft between these domains in a manner resembling a Venus flytrap.
In cyanobacteria, nitrate transport takes place through the NRT system, a multicomponent ABC transporter. NRT consists of four proteins. NrtA is a periplasmic substrate-binding protein involved in the specific and high affinity binding of nitrate and nitrite. NrtB is a hydrophobic protein with structural similarities to integral membrane subunits of ABC transporters. It is thought to form a pore across the membrane to allow the translocation of nitrate and nitrite. Finally, NrtC and NrtD are proposed to form a heterodimer associated with the inner side of the cytoplasmic membrane, and to be responsible for energising the transport system via ATP hydrolysis. In NrtC the binding site for ATP is found at the N terminus.
The cmpA, cmpB, cmpC, and cmpD genes are strongly similar to the genes encoding the nitrate/nitrite transporter, nrtA, nrtB, nrtC, and nrtD, respectively. NrtB and CmpB are hydrophobic proteins with structural similarities to the integral membrane components of ABC transporters. CmpC and CmpD are ATP-binding cassette proteins strongly similar to NrtC and NrtD, respectively. CmpA is a cytoplasmic membrane protein, which is 46.5% identical to NrtA that functions as the membrane-anchored substrate (nitrate and nitrite)-binding protein. The similarity of CmpA to NrtA and its involvement in HCO(3)(-) uptake suggest that CmpA is the substrate-binding protein of the HCO(3)(-) transporter.
PRY is a domain of 50-60 amino acids associated with SPRY domains, lying adjacent to the N terminus of the SPRY domain. The SPRY domain is a protein-protein interaction module involved in many important signaling pathways. Distant homologues are domains in butyrophilin/marenostrin/pyrin, evolutionarily more ancient than the SPRY/B30.2 counterpart. PRY and SPRY domains are structurally very similar and consist of a beta sandwich fold. Ca2+-release from the sarcoplasmic or endoplasmic reticulum, the intracellular Ca2+ store, is mediated by the ryanodine receptor (RyR) and/or the inositol trisphosphate receptor (IP3R).
The proteins identified by the PRY domain clearly fall into three sets, which can be defined by their combination of signatures:
The first group contains an immunoglobulin domain N-terminal to the PRY and butyrophilin domains. Butyrophilins are glycoproteins that are expressed on the apical surfaces of secretory cells in lactating mammary tissue and which may function in the secretion of milk-fat droplets.
The second group contains a RING-finger domain N-terminal to the PRY domain. The RING-finger is a specialised type of Zn-finger of 40 to 60 residues that binds two atoms of zinc, and is probably involved in mediating protein-protein interactions. There are two different variants, the C3HC4-type and a C3H2C3-type, which is clearly related despite the different cysteine/histidine pattern. The latter type, sometimes referred to as the 'RING-H2 finger', is not found associated with this group of proteins. This set of proteins is described as TRIM (TRIpartite Motif) family members, which are involved in cellular compartmentalisation. The TRIM family sequences are defined by a RING finger domain, a B-box type 1 (B1) and a B-box type 2 (B2) followed by a coiled-coil (CC) region. Genes belonging to this family are implicated in a variety of processes such as development and cell growth and are involved in human disease. Many of these PRY domain-containing proteins have a number of C-terminal signatures: SPRY, RFP-like (also known as the B30.2 domain or PRYSPRY) and butyrophilin domains.
The third set of proteins has the C-terminal signatures but no N-terminal RING-finger or immunoglobulin domain signatures. These proteins have not been functionally described.
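The three-way grouping described above depends only on which signature, if any, lies N-terminal to the PRY domain, so it can be written as a short decision rule. The sketch below is an illustration only: the domain labels ("Ig", "RING", "PRY" and so on), the group names and the function name are hypothetical, and real annotations would come from a signature database rather than a hand-written list.

```python
def classify_pry_protein(domains):
    """Assign a PRY-containing protein to one of the three sets described in the text.

    `domains` is an N- to C-terminal ordered list of domain labels
    (labels are assumptions made for this sketch).
    """
    if "PRY" not in domains:
        raise ValueError("protein has no PRY domain")
    # Only domains N-terminal to the first PRY domain decide the group.
    n_terminal = domains[:domains.index("PRY")]
    if "Ig" in n_terminal:
        return "butyrophilin-like"   # immunoglobulin domain before PRY
    if "RING" in n_terminal:
        return "TRIM"                # RING-finger before PRY
    return "uncharacterised"         # C-terminal signatures only


print(classify_pry_protein(["Ig", "PRY", "SPRY"]))
print(classify_pry_protein(["RING", "B-box", "coiled-coil", "PRY", "SPRY"]))
print(classify_pry_protein(["PRY", "SPRY"]))
```

Checking for the immunoglobulin domain first is an arbitrary choice, since the text does not describe proteins carrying both an immunoglobulin and a RING-finger domain.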
The band-7 protein family comprises a diverse set of membrane-bound proteins characterised by the presence of a conserved domain, the band-7 domain, also known as SPFH or PHB domain. The exact function of the band-7 domain is not known, but examples from animal and bacterial stomatin-type proteins demonstrate binding to lipids and the ability to assemble into membrane-bound oligomers that form putative scaffolds. A variety of proteins belong to this family. These include the prohibitins, which are cytoplasmic anti-proliferative proteins, and stomatin, an erythrocyte membrane protein. Bacterial HflC protein also belongs to this family.
Note: Band 4.1 and Band 7 proteins refer to human erythrocyte membrane proteins separated by SDS polyacrylamide gels and stained with Coomassie blue.
This entry represents B-box-type zinc finger domains, which are around 40 residues in length. B-box zinc fingers can be divided into two groups, where types 1 and 2 B-box domains differ in their consensus sequence and in the spacing of the 7-8 zinc-binding residues. Several proteins contain both types 1 and 2 B-boxes, suggesting some level of cooperativity between these two domains. B-box domains are found in over 1500 proteins from a variety of organisms. They are found in TRIM (tripartite motif) proteins that consist of an N-terminal RING finger (originally called an A-box), followed by 1-2 B-box domains and a coiled-coil domain (also called RBCC for Ring, B-box, Coiled-Coil). TRIM proteins contain a type 2 B-box domain, and may also contain a type 1 B-box. In proteins that do not contain RING or coiled-coil domains, the B-box domain is primarily type 2. Many type 2 B-box proteins are involved in ubiquitination. Proteins containing a B-box zinc finger domain include transcription factors, ribonucleoproteins and proto-oncoproteins; for example, MID1, MID2, TRIM9, TNL, TRIM36, TRIM63, TRIFIC, NCL1 and CONSTANS-like proteins.
The microtubule-associated E3 ligase MID1 contains a type 1 B-box zinc finger domain. MID1 specifically binds Alpha-4, which in turn recruits the catalytic subunit of phosphatase 2A (PP2Ac). This complex is required for targeting of PP2Ac for proteasome-mediated degradation. The MID1 B-box coordinates two zinc ions and adopts a β/β/α cross-brace structure similar to that of ZZ, PHD, RING and FYVE zinc fingers.
Sorting nexins (SNXs) are a diverse group of cellular trafficking proteins that are unified by the presence of a phospholipid-binding motif, the PX domain. The ability of these proteins to bind specific phospholipids, as well as their propensity to form protein-protein complexes, points to a role for these proteins in membrane trafficking and protein sorting. SNX6 was found to interact with members of the transforming growth factor-beta family of receptor serine/threonine kinases. Strong heteromeric interactions were also seen among SNX1, -2, -4, and -6, suggesting the formation in vivo of oligomeric complexes. SNX6 forms a stable complex with SNX1 and may be a component of the retromer complex, a membrane coat multimeric complex required for endosomal retrieval of lysosomal hydrolase receptors to the Golgi, acting as a mammalian equivalent of yeast Vps17p. It also plays roles in enhancing the degradation of EGFR and in regulating the activity of Na,K-ATPase through its interaction with Translationally Controlled Tumor Protein (TCTP). SNX6 is localized in the cytoplasm, where it is thought to target proteins to the trans-Golgi network. In addition, SNX6 was found to be translocated from the cytoplasm to the nucleus by Pim-1, an oncogene product of serine/threonine kinase. This translocation is not affected by Pim-1-dependent phosphorylation, but the functional significance is unknown.
BAR domains form dimers that bind to membranes, induce membrane bending and curvature, and may also be involved in protein-protein interactions. This entry represents the BAR domain of SNX6.
Bacterial lipoproteins represent a large group of specialized membrane proteins that perform a variety of functions including maintenance and stabilization of the cell envelope, protein targeting and transit to the outer membrane, membrane biogenesis, and cell adherence. Pathogenic Gram-negative bacteria within the Neisseriaceae and Pasteurellaceae families rely on a specialized uptake system, characterized by an essential surface receptor complex that acquires iron from host transferrin (Tf) and transports the iron across the outer membrane. This iron uptake system is composed of a surface-exposed lipoprotein, Tf-binding protein B (TbpB), and an integral outer-membrane protein, Tf-binding protein A (TbpA), that together function to extract iron from the host iron-binding glycoprotein (Tf). TbpB is a bilobed (N and C lobe) lipid-anchored protein with each lobe consisting of an eight-stranded beta barrel flanked by a 'handle' domain made up of four (N lobe) or eight (C lobe) beta strands. TbpB extends from the outer membrane surface by virtue of an N-terminal peptide region that is anchored to the outer membrane by fatty acyl chains on the N-terminal cysteine and is involved in the initial capture of iron-loaded Tf. This domain superfamily is found in the handle domain of the C lobe (domain C) of TbpB proteins. It consists of a squashed six-stranded beta sheet flanked by two antiparallel beta strands and has no supporting alpha helix as in the N lobe.
Alphaviruses are enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Semliki Forest and Sindbis viruses. Alphaviruses consist of three structural proteins: the core nucleocapsid protein C, and the envelope proteins P62 and E1 that associate as a heterodimer. The viral membrane-anchored surface glycoproteins are responsible for receptor recognition and entry into target cells through membrane fusion. The proteolytic maturation of P62 into E2 and E3 causes a change in the viral surface. Together the E1, E2, and sometimes E3, glycoprotein "spikes" form an E1/E2 dimer or an E1/E2/E3 trimer, where E2 extends from the centre to the vertices, E1 fills the space between the vertices, and E3, if present, is at the distal end of the spike. Upon exposure of the virus to the acidity of the endosome, E1 dissociates from E2 to form an E1 homotrimer, which is necessary for the fusion step to drive the cellular and viral membranes together. The alphaviral glycoprotein E1 is a class II viral fusion protein, which is structurally different from the class I fusion proteins found in influenza virus and HIV. The structure of Semliki Forest virus E1 revealed a fold similar to that of flaviviral glycoprotein E, with three structural domains in the same primary sequence arrangement. This entry represents all three domains of the alphaviral E1 glycoprotein.
Cyanobacteria and red algae harvest light through water-soluble complexes, called phycobilisomes, which are attached to the outer face of the thylakoid membrane. These complexes are capable of transferring the absorbed energy to the photosynthetic reaction centre with greater than 95% efficiency. Phycobilisomes contain various photosynthetic light harvesting proteins known as biliproteins, and linker proteins which help assemble the structure. The two main structural elements of the complex are a core located near the photosynthetic reaction centre, and rods attached to this core. Allophycocyanin is the major component of the core, while the rods contain phycocyanins, phycoerythrins and linker proteins. The rod biliproteins harvest photons, with the excitation energy being passed through the rods into the allophycocyanin in the core. Other core biliproteins subsequently pass this energy to chlorophyll within the thylakoid membrane.
This entry represents the alpha and beta subunits found in biliproteins from cyanobacteria and red algae. Structural studies indicate that the basic structural unit of most biliproteins is a heterodimer composed of these alpha and beta subunits. The full protein is a ring-like trimer assembly of these heterodimers. Each subunit of the heterodimer has eight helices and binds chromophores through thioether bonds formed at particular cysteine residues. These chromophores, also known as bilins, are open-chain tetrapyrroles whose number and type vary with the particular biliprotein, e.g. R-phycoerythrin binds five phycoerythrobilins per heterodimer, while allophycocyanin binds two phycocyanobilins per heterodimer.
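Because the full protein is described as a trimer of alpha/beta heterodimers, the bilin count per ring follows by simple multiplication. The short Python sketch below works this out for the two examples given above. The dictionary and function names are hypothetical, and the calculation assumes only the simplified trimer-of-heterodimers picture from the text, ignoring any chromophores on other subunits.

```python
# Bilins bound per alpha/beta heterodimer, as stated in the text.
BILINS_PER_HETERODIMER = {
    "R-phycoerythrin": 5,   # phycoerythrobilins
    "allophycocyanin": 2,   # phycocyanobilins
}

# The ring-like assembly is a trimer of heterodimers.
HETERODIMERS_PER_TRIMER = 3


def bilins_per_trimer(biliprotein):
    """Return the number of bilins in one trimeric ring (simplified model)."""
    return HETERODIMERS_PER_TRIMER * BILINS_PER_HETERODIMER[biliprotein]


for name in BILINS_PER_HETERODIMER:
    print(name, bilins_per_trimer(name))
```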