Examples

  • Search this entire website. Enter identifiers, names or keywords for genes, pathways, authors, ontology terms, etc. (e.g. eve, embryo, zen, allele)
  • Use OR to search for either of two terms (e.g. fly OR drosophila) or quotation marks to search for phrases (e.g. "dna binding").
  • Boolean search syntax is supported, e.g. dros* for partial matches or fly AND NOT embryo to exclude a term.

Search results 11901 to 12000 out of 30763 for seed protein

Category restricted to ProteinDomain

Categories

Category: ProteinDomain
Type Details Score
Protein Domain
Name: 26S proteasome regulatory subunit, C-terminal
Type: Domain
Description: Intracellular proteins, including short-lived proteins such as cyclin, Mos, Myc, p53, NF-kappaB, and IkappaB, are degraded by the ubiquitin-proteasome system. The 26S proteasome is a self-compartmentalising protease responsible for the regulated degradation of intracellular proteins in eukaryotes. This giant intracellular protease is formed by several subunits arranged into two 19S polar caps, where protein recognition and ATP-dependent unfolding occur, flanking a 20S central barrel-shaped structure with an inner proteolytic chamber. This overall structure is highly conserved among eukaryotes and is essential for cell viability. Proteins targeted to the 26S proteasome are conjugated with a polyubiquitin chain by an enzymatic cascade before delivery to the 26S proteasome for degradation into oligopeptides. The 26S proteasome can be divided into two subcomplexes: the 19S regulatory particle (RP) and the 20S core particle (CP). The 19S component is divided into a "base" subunit containing six ATPases (Rpt proteins) and two non-ATPases (Rpn1, Rpn2), and a "lid" subunit composed of eight stoichiometric proteins (Rpn3, Rpn5, Rpn6, Rpn7, Rpn8, Rpn9, Rpn11, Rpn12). Additional non-essential and species-specific proteins may also be present. The 19S unit performs several essential functions including binding the specific protein substrates, unfolding them, cleaving the attached ubiquitin chains, opening the 20S subunit, and driving the unfolded polypeptide into the proteolytic chamber for degradation. The 26S proteasome and 19S regulator are of medical interest due to their involvement in burn rehabilitation. This eukaryotic domain is found at the C terminus of 26S proteasome regulatory subunits such as the non-ATPase Rpn3 subunit, which is essential for proteasomal function. It occurs together with the PCI/PINT domain.
Protein Domain
Name: Flavivirus glycoprotein, central and dimerisation domain superfamily
Type: Homologous_superfamily
Description: Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Yellow fever virus (YFV), West Nile virus (WNV), Tick-borne encephalitis virus, Japanese encephalitis virus (JE) and Dengue virus 2. Flaviviruses consist of three structural proteins: the core nucleocapsid protein C (IPR001122), and the envelope glycoproteins M (IPR000069) and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from class I fusion proteins from influenza virus and HIV. Glycoprotein E comprises three domains: domain I (dimerisation domain) is an 8-stranded beta barrel, domain II (central domain) is an elongated domain composed of twelve beta strands and two alpha helices, and domain III (immunoglobulin-like domain) is an IgC-like module with ten beta strands. Domains I and II are intertwined. This superfamily represents the intertwined central and dimerisation domains. The glycoprotein E dimers on the viral surface re-cluster irreversibly into fusion-competent trimers upon exposure to low pH, as found in the acidic environment of the endosome. The formation of trimers results in a conformational change in the hinge region of domain II, a key structural element that opens a ligand-binding hydrophobic pocket at the interface between domains I and II. The conformational change results in the exposure of a fusion peptide loop at the tip of domain II, which is required in the fusion step to drive the cellular and viral membranes together by inserting into the membrane.
Protein Domain
Name: Thyroglobulin type-1 superfamily
Type: Homologous_superfamily
Description: Thyroglobulin (Tg) is a large glycoprotein specific to the thyroid gland and is the precursor of the iodinated thyroid hormones thyroxine (T4) and triiodothyronine (T3). The N-terminal section of Tg contains 10 repeats of a domain of about 65 amino acids which is known as the Tg type-1 repeat. Such a domain has also been found as a single or repeated sequence in the HLA class II associated invariant chain; human pancreatic carcinoma marker proteins GA733-1 and GA733-2; nidogen (entactin), a sulphated glycoprotein which is widely distributed in basement membranes and that is tightly associated with laminin; insulin-like growth factor binding proteins (IGFBP); saxiphilin, a transferrin-like protein from Rana catesbeiana (Bull frog) that binds specifically to the neurotoxin saxitoxin; chum salmon egg cysteine proteinase inhibitor; and equistatin, a thiol-protease inhibitor from Actinia equina (sea anemone). The existence of Thyr-1 domains in such a wide variety of proteins raises questions about their activity and function, and their interactions with neighbouring domains. The Thyr-1 and related domains belong to MEROPS proteinase inhibitor family I31, clan IX. Equistatin from A. equina is composed of three Thyr-1 domains; as with other proteins that contain Thyr-1 domains (the thyropins), it binds reversibly and tightly to cysteine proteases (inhibitor family C1). In equistatin, inhibition of papain is a function of domain-1. Unusually, domain-2 inhibits cathepsin D, an aspartic protease (inhibitor family A1), and has no activity against papain. Domain-3 does not inhibit either papain or cathepsin D, and its function or its target peptidase has yet to be determined. The thyroglobulin type-1 domain has an alpha+beta fold.
Protein Domain
Name: Vitellinogen, beta-sheet N-terminal
Type: Homologous_superfamily
Description: Vitellinogen precursors provide the major egg yolk proteins that are a source of nutrients during early development of oviparous vertebrates and invertebrates. Vitellinogen precursors are multi-domain apolipoproteins that are cleaved into distinct yolk proteins. Different vitellinogen precursors exist, which are composed of variable combinations of yolk protein components; however, the cleavage sites are conserved. In vertebrates, a complete vitellinogen is composed of an N-terminal signal peptide for export, followed by four regions that can be cleaved into yolk proteins: lipovitellin-1, phosvitin, lipovitellin-2, and a von Willebrand factor type D domain (YGP40). Vitellinogens are post-translationally glycosylated and phosphorylated in the endoplasmic reticulum and Golgi complex of hepatocytes, before being secreted into the circulatory system to be taken up by oocytes. In the ovary, vitellinogens bind to specific Vtgr receptors on oocyte membranes to become internalised by endocytosis, where they are cleaved into yolk proteins by cathepsin D. YGP40 is released into the yolk plasma before or during compartmentation of the lipovitellin-phosvitin complex into the yolk granule. The different yolk proteins have distinct roles. Phosvitins are important in sequestering calcium, iron and other cations for the developing embryo. Phosvitins are among the most phosphorylated (10%) proteins in nature, the high concentration of phosphate groups providing efficient metal-binding sites in clusters. Lipovitellins are involved in lipid and metal storage, and contain a heterogeneous mixture of about 16% (w/w) noncovalently bound lipid, most being phospholipid. Lipovitellin-1 contains two chains, LV1N and LV1C. This entry represents a β-sheet shell region found at the N terminus of vitellinogen proteins, which generally corresponds to the N-terminal of the lipovitellin-1 peptide product.
Protein Domain
Name: Serine protease, chymotrypsin-like serine protease, C-terminal
Type: Homologous_superfamily
Description: The replicase polyprotein 1ab is a multifunctional protein: it contains the activities necessary for the transcription of negative stranded RNA, leader RNA, subgenomic mRNAs and progeny virion RNA as well as proteinases responsible for the cleavage of the polyprotein into functional products. Nsp1 is essential for viral subgenomic mRNA synthesis. Nsp2 is a cysteine proteinase that cleaves the Nsp2/Nsp3 site in the polyprotein; it also displays deubiquitinating and deISGylase activities. The deubiquitinating activity cleaves both ubiquitinated and ISGylated products and may therefore regulate ubiquitin and ISG15 dependent host innate immunity. The 3C-like serine proteinase chain (Nsp4) is responsible for the majority of cleavages as it cleaves the C terminus of the polyprotein. The helicase chain, which contains a zinc finger structure, displays RNA and DNA duplex-unwinding activities with 5' to 3' polarity. This superfamily represents the C-terminal domain of chymotrypsin-like serine proteinase Nsp4. Nsp4 contains two β-barrels, known as N- and C-terminal barrels, as well as a unique C-terminal domain. The C-terminal domain consists of two short pairs of β-strands and two α-helices. It interacts with the C-terminal barrel through an interface consisting of conserved hydrophobic residues: Leu-105 and Leu-112 from the C-terminal β-barrel and Val-158, Leu-163, Phe-167, Ile-182, Leu-196, and Ile-197 from the C-terminal domain. There is also an exposed patch of conserved solvent-exposed hydrophobic residues that may form part of the interface with Nsp5 in the Nsp4-5 intermediate. This hydrophobic patch may also mediate interactions with Nsp2, which associates with Nsp3-8 to induce cleavage of the Nsp4-5 site by Nsp4. The C-terminal domain can clearly adopt different orientations relative to the two β-barrels, which may facilitate substrate binding or autoproteolysis.
Protein Domain
Name: EMI domain
Type: Domain
Description: The EMI domain, first named after its presence in proteins of the EMILIN family, is a small cysteine-rich module of around 75 amino acids. The EMI domain is most often found at the N terminus of metazoan extracellular proteins that are forming or are compatible with multimer formation. It is found in association with other domains, such as C1q, laminin-type EGF-like, collagen-like, FN3, WAP, ZP or FAS1. It has been suggested that the EMI domain could be a protein-protein interaction module, as the EMI domain of EMILIN-1 was found to interact with the C1q domain of EMILIN-2. The EMI domain possesses six highly conserved cysteine residues, which likely form disulphide bonds. Other key features of the EMI domain are the C-C-x-G-[WYFH] pattern, a hydrophobic position just preceding the first cysteine (Cys1) of the domain and a cluster of hydrophobic residues between Cys3 and Cys4. The EMI domain could be made of two sub-domains, the fold of the second one sharing similarities with the C-terminal sub-module characteristic of EGF-like domains. Proteins known to contain an EMI domain include: vertebrate Emilins, extracellular matrix glycoproteins; vertebrate Multimerins, extracellular matrix glycoproteins; vertebrate Emu proteins, which could interact with several different extracellular matrix components and serve to connect and integrate the function of multiple partner molecules; vertebrate beta-IG-H3; vertebrate osteoblast-specific factor 2 (OSF-2); mammalian NEU1/NG3 proteins; Drosophila midline fasciclin; and Caenorhabditis elegans ced-1, a transmembrane receptor that mediates cell corpse engulfment. The Pfam alignment for this domain is truncated at the C terminus and does not include the final cysteine. This is to stop the family overlapping with other domains.
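The C-C-x-G-[WYFH] pattern quoted in the EMI description can be checked against a sequence with a short script. The following is a minimal sketch, assuming the pattern is written in PROSITE-style notation (elements separated by hyphens, x for any residue, square brackets for allowed residues); the function name `prosite_to_regex` and the toy sequence are hypothetical, and only a small subset of the notation is handled.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles only: single residues, x (any residue), [..] allowed sets,
    {..} forbidden sets, and repeat counts such as x(2) or x(2,4).
    """
    parts = []
    for element in pattern.strip(".").split("-"):
        # Split off an optional repeat count, e.g. "x(5)" -> ("x", "5").
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core.lower() == "x":
            core = "."                         # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"     # forbidden residues
        # Bracketed sets and single letters are already valid regex.
        if repeat:
            core += "{" + repeat + "}"
        parts.append(core)
    return "".join(parts)

# Pattern taken from the description above; the sequence is made up.
emi_regex = re.compile(prosite_to_regex("C-C-x-G-[WYFH]"))
toy_sequence = "MKLCCAGWNT"
match = emi_regex.search(toy_sequence)
print(emi_regex.pattern, match.group() if match else None)
```

A hit from such a pattern only flags a candidate site; it is not evidence that a sequence contains a folded EMI domain.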
Protein Domain
Name: CVC domain
Type: Domain
Description: This entry represents the CVC (Chx10/Vsx-1 and ceh-10) domain, which is found in visual system homeobox proteins. Homeobox-containing genes encode transcription factors and are characterised by the homeodomain (HD), a motif that directs specific DNA binding to regulate the expression of target genes. Homeobox genes are grouped into several subclasses according to the primary structure of their HD and its flanking sequences. Among the paired-like subclass, the paired-like CVC (PLC)-homeodomain proteins (HDPs) are characterised by a conserved CVC domain and can be grouped into the Vsx-1 and Vsx-2 families, containing orthologs from several species. The characteristic features of the Vsx-1 group are the existence of an RV domain and LG sequence and the lack of a Paired-Tail/OAR domain. Instead of an RV domain, the Vsx-2 group proteins have a conserved 14 amino acid motif called the Paired-Tail or OAR domain. These specific domain differences may correlate with the functional specificity of each group. PLC-HDPs appear to play a particular role in ocular development and execute their functions by binding to the conserved locus control region (LCR), located upstream of the transcription initiation site of the red opsin gene, and thus specify the development and differentiation of cone photoreceptors and a subset of retinal inner nuclear layer bipolar cells. The CVC domain consists of approximately 50-60 amino acid residues. Because the CVC domain is located immediately C-terminal to the homeodomain, the CVC domain is predicted to influence transcriptional regulation through DNA-binding, protein-protein interaction and/or protein degradation processes. It is possible that the CVC domain is necessary for protein folding or protein-protein interactions because this domain is hydrophobic.
Protein Domain
Name: Cell-cell fusogen EFF/AFF, domain 3
Type: Homologous_superfamily
Description: Cell fusion is fundamental for reproduction and organ formation. Fusion between most Caenorhabditis elegans epithelial cells is mediated by the EFF1 fusogen. AFF was first identified in EFF1 mutants. Cell fusion in all epidermal and vulval epithelia was blocked in EFF1 mutants. However, fusion between the anchor cell and the utse syncytium that establishes a continuous uterine-vulval tube proceeded normally. AFF1 was established as necessary for this and for the fusion of heterologous cells in C. elegans. The transmembrane forms of FF proteins, like most viral fusogens, possess an N-terminal signal sequence followed by a long extracellular portion, a predicted transmembrane domain, and a short intracellular tail. A striking conservation in the position and number of all 16 cysteines in the extracellular portion of FF proteins from different nematode species suggests that these proteins are folded in a similar 3D structure that is essential for their fusogenic activity. C. elegans AFF1 and EFF1 proteins are essential for developmental cell-to-cell fusion and can merge insect cells. Thus FFs comprise an ancient family of cellular fusogens that can promote fusion when expressed on a viral particle. Cell-cell fusogen EFF/AFF is a three-domain protein involved in cell fusion, a process that is essential for development. These proteins can be found either as a monomer or trimer. Crystal structures of the trimer show unambiguous structural homology to class II viral fusion proteins in their characteristic post-fusion hairpin conformation. The trimer subunits feature the three class II beta sandwich domains, termed I, II, and III, organised in the same way as in the viral proteins. This superfamily represents domain 3 of the cell-cell fusogen EFF/AFF protein.
Protein Domain
Name: Protein-tyrosine phosphatase-like
Type: Homologous_superfamily
Description: This superfamily represents a domain found in some protein-tyrosine phosphatases, including dual specificity phosphatases, myotubularin-like phosphatases, receptor-type tyrosine-protein phosphatases and non-receptor-type tyrosine-protein phosphatases. This domain also shares structural similarity with the lipid phosphatase domain in tensin and tensin-like proteins, such as cyclin G-associated kinase (GAK) and phosphoinositide phosphatase PTEN (phosphatase and tensin homologue). Protein tyrosine (pTyr) phosphorylation is a common post-translational modification which can create novel recognition motifs for protein interactions and cellular localisation, affect protein stability, and regulate enzyme activity. Consequently, maintaining an appropriate level of protein tyrosine phosphorylation is essential for many cellular functions. Tyrosine-specific protein phosphatases (PTPases) catalyse the removal of a phosphate group attached to a tyrosine residue, using a cysteinyl-phosphate enzyme intermediate. These enzymes are key regulatory components in signal transduction pathways (such as the MAP kinase pathway) and cell cycle control, and are important in the control of cell growth, proliferation, differentiation and transformation. The PTP superfamily can be divided into four subfamilies: (1) pTyr-specific phosphatases; (2) dual specificity phosphatases (dTyr and dSer/dThr); (3) Cdc25 phosphatases (dTyr and/or dThr); and (4) LMW (low molecular weight) phosphatases. Based on their cellular localisation, PTPases are also classified as receptor-like, which are transmembrane receptors that contain PTPase domains, or non-receptor (intracellular) PTPases. All PTPases carry the highly conserved active site motif C(X)5R (PTP signature motif), employ a common catalytic mechanism, and share a similar core structure made of a central parallel β-sheet with flanking α-helices containing a β-loop-α-loop that encompasses the PTP signature motif. Functional diversity between PTPases is endowed by regulatory domains and subunits.
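The C(X)5R signature motif mentioned above (a cysteine, any five residues, then an arginine) can be located in a sequence with a simple scan. This is a minimal sketch under stated assumptions: the regular expression is a literal reading of the motif as written in the description, the function name `find_ptp_signature` is hypothetical, and the sequence is a made-up fragment rather than a database record.

```python
import re

# Literal reading of C(X)5R: Cys, any five residues, Arg.
# The lookahead lets overlapping occurrences be reported as well.
PTP_SIGNATURE = re.compile(r"(?=(C.{5}R))")

def find_ptp_signature(sequence):
    """Return (1-based start position, matched residues) for each C(X)5R hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in PTP_SIGNATURE.finditer(seq)]

# Illustrative fragment only (not taken from the search results above).
toy_fragment = "AAVHCSAGIGRSGTAA"
print(find_ptp_signature(toy_fragment))
```

A seven-residue pattern this loose will also match many unrelated proteins by chance, so a hit is best treated as a prompt for a proper domain search rather than as a classification.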
Protein Domain
Name: ATP-dependent RNA helicase DEAD-box, conserved site
Type: Conserved_site
Description: A number of eukaryotic and prokaryotic proteins involved in ATP-dependent nucleic-acid unwinding have been characterised on the basis of their structural similarity. All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding proteins or by proteins belonging to the helicase 'superfamily'. One of these motifs, called the 'D-E-A-D-box', represents a special version of the B motif of ATP-binding proteins. Proteins currently known to belong to this family include eukaryotic initiation factor eIF-4A; yeast PRP5, PRP28 and MSS116 splicing proteins, and proteins DHH1, DRS1, MAK5 and ROK1; mouse Pl10; Caenorhabditis elegans helicase glh-1; Drosophila Rm62 (p62), Me31B and Vasa; and Escherichia coli putative RNA helicases dbpA, deaD, rhlB and rhlE.
Protein Domain
Name: Oxysterol-binding protein, conserved site
Type: Conserved_site
Description: A number of eukaryotic proteins that seem to be involved with sterol synthesis and/or its regulation have been found to be evolutionarily related. These include mammalian oxysterol-binding protein (OSBP), a protein of about 800 amino-acid residues that binds a variety of oxysterols (oxygenated derivatives of cholesterol); yeast Osh1, a protein of 859 residues that also plays a role in ergosterol synthesis; yeast proteins Hes1 and Kes1, highly related proteins of 434 residues that seem to play a role in ergosterol synthesis; the probable transporter efuK from the fungus Hormonema carpetanum, which is involved in the biosynthesis of enfumafungin; and OSBP-related proteins (ORP) from plants such as Arabidopsis thaliana. This entry represents a sequence region in these proteins that contains a conserved pentapeptide.
Protein Domain
Name: CRAL-TRIO lipid binding domain
Type: Domain
Description: The CRAL-TRIO domain is a protein structural domain that binds small lipophilic molecules. The domain is named after cellular retinaldehyde-binding protein (CRALBP) and TRIO guanine exchange factor. The CRAL-TRIO domain is found in GTPase-activating proteins (GAPs), guanine nucleotide exchange factors (GEFs) and a family of hydrophobic ligand binding proteins, including the yeast SEC14 protein and mammalian retinaldehyde- and alpha-tocopherol-binding proteins. The domain may either constitute all of the protein or only part of it. The structure of the domain in SEC14 proteins has been determined. The structure contains several alpha helices as well as a beta sheet composed of 6 strands. Strands 2, 3, 4 and 5 form a parallel beta sheet with strands 1 and 6 being anti-parallel. The structure also identified a hydrophobic binding pocket for lipid binding.
Protein Domain
Name: Tetratricopeptide repeat 2
Type: Repeat
Description: The tetratricopeptide repeat (TPR) is a structural motif present in a wide range of proteins. It mediates protein-protein interactions and the assembly of multiprotein complexes. The TPR motif consists of 3-16 tandem repeats of 34 amino acid residues, although individual TPR motifs can be dispersed in the protein sequence. Sequence alignment of the TPR domains reveals a consensus sequence defined by a pattern of small and large amino acids. TPR motifs have been identified in various different organisms, ranging from bacteria to humans. Proteins containing TPRs are involved in a variety of biological processes, such as cell cycle regulation, transcriptional control, mitochondrial and peroxisomal protein transport, neurogenesis and protein folding. This repeat includes outlying Tetratricopeptide-like repeats (TPR) that are not matched by .
Protein Domain
Name: CRA domain
Type: Domain
Description: This entry represents the CRA (or CT11-RanBPM) domain, which is a protein-protein interaction domain present in crown eukaryotes (plants, animals, fungi) and which is found in Ran-binding proteins such as Ran-binding protein 9 (RanBP9 or RanBPM) and RanBP10. RanBPM is a scaffolding protein important in regulating cellular function in both the immune system and the nervous system, and may act as an adapter protein to couple membrane receptors to intracellular signaling pathways. This domain is at the C terminus of the proteins and is the binding domain for the CRA motif, which comprises approximately 100 amino acids at the C terminus of RanBPM. It was found to be important for the interaction of RanBPM with Fragile X messenger ribonucleoprotein 1 (FMRP/FMR1), but its functional significance has yet to be determined.
Protein Domain
Name: CRAL-TRIO lipid binding domain superfamily
Type: Homologous_superfamily
Description: The CRAL-TRIO domain is a protein structural domain that binds small lipophilic molecules. The domain is named after cellular retinaldehyde-binding protein (CRALBP) and TRIO guanine exchange factor. The CRAL-TRIO domain is found in GTPase-activating proteins (GAPs), guanine nucleotide exchange factors (GEFs) and a family of hydrophobic ligand binding proteins, including the yeast SEC14 protein and mammalian retinaldehyde- and alpha-tocopherol-binding proteins. The domain may either constitute all of the protein or only part of it. The structure of the domain in SEC14 proteins has been determined. The structure contains several alpha helices as well as a beta sheet composed of 6 strands. Strands 2, 3, 4 and 5 form a parallel beta sheet with strands 1 and 6 being anti-parallel. The structure also identified a hydrophobic binding pocket for lipid binding.
Protein Domain
Name: Chaperone lipoprotein, PulS/OutS
Type: Family
Description: This family comprises lipoproteins from gammaproteobacterial species: the pullulanase secretion protein PulS of Klebsiella pneumoniae, the lipoprotein OutS of Erwinia chrysanthemi and the functionally uncharacterised type II secretion protein EtpO from Escherichia coli O157:H7. PulS and OutS have been shown to interact with and facilitate insertion of secretins into the outer membrane, suggesting a chaperone-like, or piloting, function for members of this family. These proteins consist of 4 alpha helices arranged into a slightly elongated, C-shaped helical bundle. Within PulS, the helices form a binding groove which interacts with a disordered region on another protein, PulD, to form a PulS-PulD complex. PulD then oligomerises into a dodecameric outer membrane complex in the formation of a type II secretion system.
Protein Domain
Name: Regulator of G-protein signalling 8, RGS domain
Type: Domain
Description: The RGS (Regulator of G-protein Signalling) domain is an essential part of the RGS8 protein. RGS8 is a member of the R4 subfamily of the RGS family, a diverse group of multifunctional proteins that regulate cellular signalling events downstream of G-protein coupled receptors (GPCRs). Signalling is initiated when GPCRs bind to their ligands, triggering the replacement of GDP bound to the G-alpha subunits of heterotrimeric G proteins with GTP. RGSs inhibit signal transduction by increasing the GTPase activity of G protein alpha subunits, thereby driving them into their inactive GDP-bound form. This activity defines them as GTPase activating proteins (GAPs). RGS8 is involved in the regulation of G-protein-gated potassium channels and is predominantly expressed in the brain. In the hematopoietic system, it is selectively expressed in natural killer (NK) cells.
Protein Domain
Name: Lipocalin, bacterial
Type: Family
Description: The lipocalins are a diverse, interesting, yet poorly understood family of proteins composed, in the main, of extracellular ligand-binding proteins displaying high specificity for small hydrophobic molecules. Functions of these proteins include transport of nutrients, control of cell regulation, pheromone transport, cryptic colouration, and the enzymatic synthesis of prostaglandins. The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel β-barrel fold well conserved within the family. Sequence similarity within the family is at a much lower level and would seem to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes. The anti-parallel β-barrel fold is also exploited by the fatty acid-binding proteins (which function similarly by binding small hydrophobic molecules), by avidin and the closely related metalloprotease inhibitors, and by triabin. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif. The lipocalin family can be subdivided into kernel and outlier sets. The kernel lipocalins form the largest self-consistent group. The outlier lipocalins form several smaller distinct subgroups: the OBPs, the von Ebner's gland proteins, alpha-1-acid glycoproteins, tick histamine binding proteins and the nitrophorins. Relatively recently, bacterial lipocalins have been described for the first time. These are lipoproteins anchored to the outer membrane of Gram-negative bacteria and some plants. Their promoters are activated at the transition between exponential and stationary growth phases. Bacterial lipocalin sequences are quite closely related to apolipoprotein D and may serve a starvation response function in bacteria. Overexpression, membrane fractionation, and metabolic labelling with tritiated palmitate showed bacterial lipocalins to be globomycin-sensitive outer membrane proteins. The bacterial lipocalins have been found in a small number of species, raising the possibility that they originated by horizontal transfer. Estimates of the G+C content in the first and third codon positions of these genes have been calculated. A biased %G+C in the 1st and 3rd codon positions would suggest horizontal transfer. None of the computed G+C contents of the bacterial lipocalin genes were outside of the expected limits (between the first and third quartiles). These data provide no support for a hypothesis in which bacterial lipocalins were recently acquired through horizontal transfer. Further evidence against horizontal transfer will come from finding more lipocalins in different species, thus making the gene transfer hypothesis more unlikely.
Protein Domain
Name: FY-rich, C-terminal
Type: Domain
Description: The "FY-rich" domain N-terminal (FYRN) and "FY-rich" domain C-terminal (FYRC) sequence motifs are two poorly characterised phenylalanine/tyrosine-rich regions of around 50 and 100 amino acids, respectively, that are found in a variety of chromatin-associated proteins. They are particularly common in histone H3K4 methyltransferases, most notably in a family of proteins that includes human mixed lineage leukemia (MLL) and the Drosophila melanogaster protein trithorax. Both of these enzymes play a key role in the epigenetic regulation of gene expression during development, and the gene coding for MLL is frequently rearranged in infant and secondary therapy-related acute leukemias. They are also found in transforming growth factor beta regulator 1 (TBRG1), a growth inhibitory protein induced in cells undergoing arrest in response to DNA damage and transforming growth factor (TGF)-beta1. As TBRG1 has been shown to bind to both the tumor suppressor p14ARF and MDM2, a key regulator of p53, it is also known as nuclear interactor of ARF and MDM2 (NIAM). In most proteins, the FYRN and FYRC regions are closely juxtaposed; however, in MLL and its homologues they are far distant. To be fully active, MLL must be proteolytically processed by taspase1, which cleaves the protein between the FYRN and FYRC regions. The N-terminal and C-terminal fragments remain associated after proteolysis, apparently as a result of an interaction between the FYRN and FYRC regions. How proteolytic processing regulates the activity of MLL is not known. Intriguingly, the FYRN and FYRC motifs of a second family of histone H3K4 methyltransferases, represented by MLL2 and MLL4 in humans and TRR in Drosophila melanogaster, are closely juxtaposed. FYRN and FYRC motifs are found in association with modules that create or recognise histone modifications in proteins from a wide range of eukaryotes, and it is likely that in these proteins they have a conserved role related to some aspect of chromatin biology. The FYRN and FYRC regions are not separate independently folded domains, but are components of a distinct protein module. The FYRN and FYRC motifs both form part of a single folded module (the FYR domain), which adopts an alpha+beta fold consisting of a six-stranded antiparallel β-sheet followed by four consecutive α-helices. The FYRN region corresponds to β-strands 1-4 and their connecting loops, whereas the FYRC motif maps to β-strand 5, β-strand 6 and helices alpha1 to alpha4. Most of the conserved tyrosine and phenylalanine residues, after which these motifs are named, are involved in interactions that stabilise the fold. Proteins such as MLL, in which the FYRN and FYRC regions are separated by hundreds of amino acids, are expected to contain FYR domains with a large insertion between two of the strands of the β-sheet (strands 4 and 5).
Protein Domain
Name: FY-rich, N-terminal
Type: Domain
Description: The "FY-rich" domain N-terminal (FYRN) and "FY-rich" domain C-terminal (FYRC) sequence motifs are two poorly characterised phenylalanine/tyrosine-rich regions of around 50 and 100 amino acids, respectively, that are found in a variety of chromatin-associated proteins. They are particularly common in histone H3K4 methyltransferases, most notably in a family of proteins that includes human mixed lineage leukemia (MLL) and the Drosophila melanogaster protein trithorax. Both of these enzymes play a key role in the epigenetic regulation of gene expression during development, and the gene coding for MLL is frequently rearranged in infant and secondary therapy-related acute leukemias. They are also found in transforming growth factor beta regulator 1 (TBRG1), a growth inhibitory protein induced in cells undergoing arrest in response to DNA damage and transforming growth factor (TGF)-beta1. As TBRG1 has been shown to bind to both the tumor suppressor p14ARF and MDM2, a key regulator of p53, it is also known as nuclear interactor of ARF and MDM2 (NIAM). In most proteins, the FYRN and FYRC regions are closely juxtaposed; however, in MLL and its homologues they are far distant. To be fully active, MLL must be proteolytically processed by taspase1, which cleaves the protein between the FYRN and FYRC regions. The N-terminal and C-terminal fragments remain associated after proteolysis, apparently as a result of an interaction between the FYRN and FYRC regions. How proteolytic processing regulates the activity of MLL is not known. Intriguingly, the FYRN and FYRC motifs of a second family of histone H3K4 methyltransferases, represented by MLL2 and MLL4 in humans and TRR in Drosophila melanogaster, are closely juxtaposed. FYRN and FYRC motifs are found in association with modules that create or recognise histone modifications in proteins from a wide range of eukaryotes, and it is likely that in these proteins they have a conserved role related to some aspect of chromatin biology. The FYRN and FYRC regions are not separate independently folded domains, but are components of a distinct protein module. The FYRN and FYRC motifs both form part of a single folded module (the FYR domain), which adopts an alpha+beta fold consisting of a six-stranded antiparallel β-sheet followed by four consecutive α-helices. The FYRN region corresponds to β-strands 1-4 and their connecting loops, whereas the FYRC motif maps to β-strand 5, β-strand 6 and helices alpha1 to alpha4. Most of the conserved tyrosine and phenylalanine residues, after which these motifs are named, are involved in interactions that stabilise the fold. Proteins such as MLL, in which the FYRN and FYRC regions are separated by hundreds of amino acids, are expected to contain FYR domains with a large insertion between two of the strands of the β-sheet (strands 4 and 5).
Protein Domain
Name: Insulin-like growth factor binding protein, N-terminal, Cys-rich conserved site
Type: Conserved_site
Description: The insulin-like growth factors (IGF-I and IGF-II) bind to specific binding proteins in extracellular fluids with high affinity. These IGF-binding proteins (IGFBPs) prolong the half-life of the IGFs and have been shown to either inhibit or stimulate the growth-promoting effects of the IGFs on cells in culture. They seem to alter the interaction of IGFs with their cell surface receptors. The IGFBP family comprises six proteins (IGFBP-1 to -6) that bind to IGFs with high affinity. The precursor forms of all six IGFBPs have secretory signal peptides. All IGFBPs share a common domain organisation and also a high degree of similarity in their primary protein structure. The highest conservation is found in the N- and C-terminal cysteine-rich regions. Twelve conserved cysteines (ten in IGFBP-6) are found in the N-terminal domain, and six are found in the C-terminal domain. Both the N- and C-terminal domains participate in binding to IGFs, although the specific roles of each of these domains in IGF binding have not been decisively established. In general, the strongest binding to IGFs is shown by amino-terminal fragments, which, however, bind to IGF with 10- to 1000-fold lower affinity than full-length IGFBPs. The central, weakly conserved part (L domain) contains most of the cleavage sites for specific proteases. The N-terminal domain is about 80 residues in length and has an L-like structure. It can be divided into two subdomains that are connected by a short stretch of amino acids. The two subdomains are perpendicular to each other, creating the "L" shape of the whole N-terminal domain. The core of the first subdomain presents a novel fold stabilised by a short two-stranded beta sheet and four disulphide bridges forming a ladder-like structure. The beta sheet and disulphide bridges are all in one plane, making the structure appear flat from one side, like the "palm" of a hand. The palm is extended with a "thumb" segment in various IGFBPs. The thumb segment consists of the very N-terminal residues and contains a consensus XhhyC motif, where h is a hydrophobic amino acid and y is positively charged. The second subdomain adopts a globular fold whose scaffold is secured by an inside packing of two disulphide bridges stabilised by a three-stranded beta sheet. The following growth-factor inducible proteins are structurally related to IGFBPs and could function as growth-factor binding proteins:
  • Mouse protein cyr61 and its probable chicken homologue, protein CEF-10.
  • Human connective tissue growth factor (CTGF) and its mouse homologue, protein FISP-12.
  • Vertebrate protein NOV.
This entry represents a conserved cysteine-rich region located in the N-terminal IGFBP domain.
Protein Domain
Name: Bacterial microcompartments protein, conserved site
Type: Conserved_site
Description: Bacterial microcompartments (BMCs) are large proteinaceous structures comprising a roughly icosahedral shell and a series of encapsulated enzymes. The shells of BMCs are made primarily of a family of proteins whose structural core is the BMC domain, and variations upon this core provide functional diversity. This domain is found in a variety of polyhedral organelle shell proteins (CcmK), including CsoS1A, CsoS1B and CsoS1C of Thiobacillus neapolitanus (Halothiobacillus neapolitanus) and their orthologues from other bacteria. Some autotrophic and non-autotrophic organisms form polyhedral organelles, known as carboxysomes or enterosomes. The best studied is the carboxysome of Halothiobacillus neapolitanus, which is composed of at least 9 proteins: six shell proteins, CsoS1A, CsoS1B, CsoS1C, CsoS2A, CsoS2B and CsoS3 (carbonic anhydrase), one protein of unknown function and the large and small subunits of RuBisCo (CbbL and CbbS). Carboxysomes appear to be approximately 120 nm in diameter, most often observed as regular hexagons, with a solid interior bounded by a unilamellar protein shell. The interior is filled with type I RuBisCo, which is composed of 8 large subunits and 8 small subunits; it accounts for 60% of the carboxysomal protein, which amounts to approximately 300 molecules of enzyme per carboxysome. Carboxysomes are required for autotrophic growth at low CO2 concentrations and are thought to function as part of a CO2-concentrating mechanism. Polyhedral organelles from non-autotrophic organisms, the enterosomes, are involved in coenzyme B12-dependent 1,2-propanediol utilisation (e.g., in Salmonella enterica) and ethanolamine utilisation (e.g., in Salmonella typhimurium). Genes needed for enterosome formation are located in the 1,2-propanediol utilisation (pdu) or ethanolamine utilisation (eut) operons, respectively. Although enterosomes of non-autotrophic organisms are apparently related to carboxysomes structurally, a functional relationship is uncertain. A role in CO2 concentration, similar to that of the carboxysome, is unlikely, since there is no known association between CO2 and coenzyme B12-dependent 1,2-propanediol or ethanolamine utilisation. It seems probable that enterosomes help protect the cells from reactive aldehyde species in the degradation pathways of 1,2-propanediol and ethanolamine. The BMC domain fold consists of three α-helices (designated A, B, and C) and four β-strands. Some instances of the BMC shell protein reveal a circular permutation in which a highly similar tertiary structure is built from secondary structure elements occurring in a different order. The secondary structure elements contributed by the C-terminal region of the typical BMC fold are instead contributed by the N-terminal region of the circularly permuted BMC domain. This entry represents a conserved region found at the N terminus of the bacterial microcompartment (BMC) domain from BMC proteins.
Protein Domain
Name: Fumarate reductase/succinate dehydrogenase, FAD-binding site
Type: Binding_site
Description: In bacteria, two distinct membrane-bound enzyme complexes are responsible for the interconversion of fumarate and succinate: fumarate reductase (Frd) is used in anaerobic growth, and succinate dehydrogenase (Sdh) is used in aerobic growth. Both complexes consist of two main components: a membrane-extrinsic component composed of a FAD-binding flavoprotein and an iron-sulphur protein; and a hydrophobic component composed of a membrane anchor protein and/or a cytochrome B. In eukaryotes, mitochondrial succinate dehydrogenase (ubiquinone) is an enzyme composed of two subunits: a FAD flavoprotein and an iron-sulphur protein. The flavoprotein subunit is a protein of about 60 to 70 kDa in which FAD is covalently bound to a histidine residue located in the N-terminal section of the protein. The sequence around that histidine is well conserved in Frd and Sdh from various bacterial and eukaryotic species.
Protein Domain
Name: G-protein gamma-like domain superfamily
Type: Homologous_superfamily
Description: This entry represents the G protein gamma subunit and the GGL (G protein gamma-like) domain superfamily, which are related in sequence and consist of an extended α-helical polypeptide. The G protein gamma subunit forms a stable dimer with the beta subunit, but it does not make any contact with the alpha subunit, which contacts the opposite face of the beta subunit. The GGL domain is found in several RGS (regulators of G protein signaling) proteins. GGL domains can interact with beta subunits to form novel dimers that prevent gamma subunit binding, and may prevent heterotrimer formation by inhibiting alpha subunit binding. The interaction between G protein beta-5 neuro-specific isoforms and RGS GGL domains may represent a general mode of binding between β-propeller proteins and their partners. The G protein structure consists of a long α-helix interrupted in the middle.
Protein Domain
Name: BRCA1-associated protein, RING finger, H2 subclass
Type: Domain
Description: This entry represents the RING finger, H2 subclass, found in BRAP2 and its homologues. This entry includes human BRCA1-associated protein (BRAP/BRAP2, also known as impedes mitogenic signal propagation (IMP), RING finger protein 52, or renal carcinoma antigen NY-REN-63) and its yeast homologue ETP1. BRAP2 is a cytoplasmic protein that interacts with the two functional nuclear localization signal (NLS) motifs of BRCA1, a nuclear protein linked to breast cancer. It also binds to the SV40 large T antigen NLS motif and the bipartite NLS motif found in mitosin. BRAP2 serves as a cytoplasmic retention protein and plays a role in the regulation of nuclear protein transport. These proteins contain an N-terminal RNA recognition motif (RRM), also known as RBD (RNA binding domain) or RNP (ribonucleoprotein domain), followed by a C3H2C3-type RING-H2 finger and a UBP-type zinc finger.
Protein Domain
Name: KH domain-containing BBP-like
Type: Family
Description: This entry represents a group of KH domain-containing proteins that share protein sequence similarity with the fungal BBP protein, such as the KH domain-containing RNA-binding protein QKI and KHDR1-3 from human. QKI is an RNA-binding protein that plays a central role in myelination and regulates pre-mRNA splicing, mRNA export and protein translation. BBP/SF1 family members include the branchpoint binding proteins (BBP) from yeast and their mammalian homologue, splicing factor 1 (SF1). They are KH domain-containing proteins required for pre-spliceosome formation, which is the first step of pre-mRNA splicing. In budding yeasts, BBP binds the conserved RNA sequence UACUAAC, called the branchpoint motif, found near the 3' end of yeast introns. Besides its function in pre-mRNA splicing, BBP is also involved in nuclear retention of pre-mRNA. Mammalian SF1, also known as ZFM1, may act as a transcription repressor.
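As a small illustration of the branchpoint motif mentioned in this description, the following Python sketch locates UACUAAC in an RNA string. The function name `find_branchpoints` and the example intron are hypothetical, made up for illustration; the sketch only performs exact string matching and says nothing about whether BBP would actually bind a given site.

```python
import re

# Branchpoint motif quoted in the description above.
BRANCHPOINT = "UACUAAC"


def find_branchpoints(rna: str) -> list:
    """Return 0-based start positions of every exact UACUAAC occurrence.

    Input is upper-cased and T is read as U, so DNA-style spellings
    of the same sequence are accepted (an assumption of this sketch).
    """
    seq = rna.upper().replace("T", "U")
    # A lookahead keeps the scan position-by-position, so overlapping
    # occurrences would also be reported.
    return [m.start() for m in re.finditer("(?=" + BRANCHPOINT + ")", seq)]


# Hypothetical intron fragment, invented for this example.
intron = "GUAUGUAAAGCUUACUAACAUUUUAG"
print(find_branchpoints(intron))
```

Exact matching is a deliberate simplification: the description only states the conserved yeast sequence, so no degenerate positions are modelled here.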
Protein Domain
Name: TraG, P-loop domain
Type: Domain
Description: This entry represents the P-loop domain found in the TraG conjugation protein (also known as type IV secretion (T4S) system-coupling protein VirD4) and related proteins such as VirB4. The T4S system is composed of the VirB1-11 and VirD4 proteins and is therefore known as the VirB/D T4S system; it is responsible for protein delivery into host cells, DNA transfer and toxin release, which are important for survival and pathogenesis. VirB4, VirB11, and VirD4 are cytoplasmic ATPases that provide energy for substrate transfer. VirD4 is also termed the coupling protein, as it recruits T4S system substrates to the VirB1-11 secretion channel. VirB4 is the most conserved member of the T4S system and is essential for its function. This domain is also found in CAG pathogenicity island protein 23 (CagE) from Helicobacter pylori, another example of a type IV secretion system, which translocates effector proteins into gastric epithelial cells.
Protein Domain
Name: SPHK1-interactor/A-kinase anchor 110kDa
Type: Family
Description: This family consists of several mammalian protein kinase A anchoring protein 3 (PRKA3) or A-kinase anchor protein 110kDa (AKAP 110) sequences. Agents that increase intracellular cAMP are potent stimulators of sperm motility. Anchoring inhibitor peptides, designed to disrupt the interaction of the cAMP-dependent protein kinase A (PKA) with A kinase-anchoring proteins (AKAPs), are potent inhibitors of sperm motility. PKA anchoring is a key biochemical mechanism controlling motility. AKAP110 shares compartments with both RI and RII isoforms of PKA and may function as a regulator of both motility- and head-associated functions such as capacitation and the acrosome reaction. SPHK1-interactor and AKAP domain-containing proteins mediate the subcellular compartmentation of cAMP-dependent protein kinase (PKA type II). They may act as a converging factor linking the cAMP and sphingosine signaling pathways, and play a regulatory role in the modulation of SPHK1.
Protein Domain
Name: MATH/TRAF domain
Type: Domain
Description: Although apparently functionally unrelated, intracellular TRAFs and extracellular meprins share a conserved region of about 180 residues, the meprin and TRAF homology (MATH) domain. Meprins are mammalian tissue-specific metalloendopeptidases of the astacin family, implicated in developmental, normal and pathological processes through the hydrolysis of a variety of proteins. Various growth factors, cytokines, and extracellular matrix proteins are substrates for meprins. They are composed of five structural domains: an N-terminal endopeptidase domain, a MAM domain, a MATH domain, an EGF-like domain and a C-terminal transmembrane region. Meprins A and B form membrane-bound homotetramers, whereas homooligomers of meprin A are secreted. A proteolytic site adjacent to the MATH domain, present only in meprin A, allows the release of the protein from the membrane. TRAF proteins were first isolated by their ability to interact with TNF receptors. They promote cell survival by the activation of downstream protein kinases and, finally, transcription factors of the NF-kB and AP-1 families. The TRAF proteins are composed of 3 structural domains: a RING finger in the N-terminal part of the protein, one to seven TRAF zinc fingers in the middle and the MATH domain in the C-terminal part. The MATH domain is necessary and sufficient for self-association and receptor interaction. From the structural analysis, two consensus sequences recognised by the TRAF domain have been defined: a major one, [PSAT]x[QE]E, and a minor one, PxQxxD. The structure of the TRAF2 protein reveals a trimeric self-association of the MATH domain. The domain forms a new, eight-stranded antiparallel β-sandwich structure. A coiled-coil region adjacent to the MATH domain is also important for the trimerisation. The oligomerisation is essential for establishing appropriate connections to form signalling complexes with TNF receptor-1. The ligand binding surface of TRAF proteins is located in β-strands 6 and 7.
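The two TRAF-binding consensus sequences quoted above can be written as regular expressions. The sketch below assumes the usual pattern notation, in which x stands for any residue and square brackets list the alternatives allowed at one position; the function name `traf_motifs` and the example peptide are hypothetical. A pattern hit is only a candidate site, not evidence of binding.

```python
import re

# Consensus sequences from the description: major [PSAT]x[QE]E, minor PxQxxD.
# "x" is read as "any residue" (an assumption of this sketch). Lookahead
# groups let overlapping candidate sites be reported as well.
MAJOR = re.compile(r"(?=([PSAT].[QE]E))")
MINOR = re.compile(r"(?=(P.Q..D))")


def traf_motifs(protein: str) -> list:
    """Return sorted (0-based start, matched residues, motif class) tuples."""
    seq = protein.upper()
    hits = [(m.start(), m.group(1), "major") for m in MAJOR.finditer(seq)]
    hits += [(m.start(), m.group(1), "minor") for m in MINOR.finditer(seq)]
    return sorted(hits)


# Hypothetical peptide, invented for this example.
example = "MKPVQETGGSPEQAADLL"
for start, residues, kind in traf_motifs(example):
    print(start, residues, kind)
```

For this made-up peptide the scan reports PVQE at position 2 as a major-motif candidate and PEQAAD at position 10 as a minor-motif candidate.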
Protein Domain
Name: Mini-chromosome maintenance, conserved site
Type: Conserved_site
Description: MCM proteins are DNA-dependent ATPases required for the initiation of eukaryotic DNA replication. In eukaryotes there is a family of six proteins, MCM2 to MCM7. They were first identified in yeast, where most of them have a direct role in the initiation of chromosomal DNA replication by interacting directly with autonomously replicating sequences (ARS). They were thus called minichromosome maintenance (MCM) proteins. This family is also present in the archaea in 1 to 4 copies. Methanocaldococcus jannaschii (Methanococcus jannaschii) has four members: MJ0363, MJ0961, MJ1489 and MJECL13. The "MCM motif" contains Walker-A and Walker-B type nucleotide binding motifs. The diagnostic sequence defining the MCMs is IDEFDKM. Only Mcm2 (aka Cdc19 or Nda1) has been subjected to mutational analysis in this region, and most mutations abolish its activity. The presence of a putative ATP-binding domain implies that these proteins may be involved in an ATP-consuming step in the initiation of DNA replication in eukaryotes. The MCM proteins bind together in a large complex. Within this complex, individual subunits associate with different affinities, and there is a tightly associated core of Mcm4 (Cdc21), Mcm6 (Mis5) and Mcm7. This core complex in human MCMs has been associated with helicase activity in vitro, leading to the suggestion that the MCM proteins are the eukaryotic replicative helicase. Schizosaccharomyces pombe (Fission yeast) MCMs, like those in metazoans, are found in the nucleus throughout the cell cycle. This is in contrast to Saccharomyces cerevisiae (Baker's yeast), in which MCM proteins move in and out of the nucleus during each cell cycle. The assembly of the MCM complex in S. pombe is required for MCM localisation, ensuring that only intact MCM complexes remain in the nucleus. The signature pattern used in this entry represents a perfectly conserved region that is a special version of the B motif found in ATP-binding proteins.
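To make the diagnostic sequence concrete, here is a minimal Python sketch that checks a set of protein sequences for the exact string IDEFDKM. It is not the signature pattern used by this entry, which is not given in the text; the function name `signature_positions` and both toy sequences are hypothetical.

```python
# Diagnostic sequence quoted in the description above.
MCM_SIGNATURE = "IDEFDKM"


def signature_positions(proteins: dict) -> dict:
    """Map each sequence name to the 1-based position of the first exact
    IDEFDKM occurrence, or None when the sequence lacks it."""
    result = {}
    for name, seq in proteins.items():
        idx = seq.upper().find(MCM_SIGNATURE)
        result[name] = idx + 1 if idx >= 0 else None
    return result


# Toy sequences invented for this example; they are not real MCM proteins.
toy = {
    "toy_with_signature": "AAGIDEFDKMSE",
    "toy_without_signature": "AAGIDEYDKMSE",
}
print(signature_positions(toy))
```

Positions are reported 1-based, following the usual convention for protein residue numbering.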
Protein Domain
Name: C-type lectin-like/link domain superfamily
Type: Homologous_superfamily
Description: Lectins occur in plants, animals, bacteria and viruses. Initially described for their carbohydrate-binding activity, they are now recognised as a more diverse group of proteins, some of which are involved in protein-protein, protein-lipid or protein-nucleic acid interactions. There are at least twelve structural families of lectins, of which the C-type (Ca2+-dependent) lectins are one. C-type lectins can be further divided into seven subgroups based on additional non-lectin domains and gene structure: (I) hyalectans, (II) asialoglycoprotein receptors, (III) collectins, (IV) selectins, (V) NK group transmembrane receptors, (VI) macrophage mannose receptors, and (VII) simple (single domain) lectins. The term 'C-type lectin domain' was introduced to distinguish a carbohydrate-recognition domain (CRD) that is present in all Ca2+-dependent lectins, but not in other types of animal lectins. However, there are proteins with modules similar in overall structure to CRDs that serve functions other than sugar binding. Therefore, the more general term 'C-type lectin-like domain' was introduced to refer to such domains, although both terms are sometimes used interchangeably. This superfamily represents a structural domain found in C-type lectins, as well as in other proteins, including:
  • The C-terminal domain of invasin and intimin.
  • The Link domain, which includes the Link module of tumor necrosis factor-inducible gene 6 protein (TSG-6) (a hyaladherin with important roles in inflammation and ovulation) and the hyaluronan binding domain of CD44 (which contains an extra N-terminal β-strand and C-terminal β-hairpin). The Link domain may have emerged as a result of a deletion of the long loop region from an ancestral canonical C-type lectin domain.
  • Endostatin and the endostatin domain of collagen alpha 1 (XV), these domains being decorated with many insertions in the common fold.
Protein Domain
Name: Ribonuclease P/MRP subunit Rpp29
Type: Family
Description: Ribonuclease P (Rnp) is a ubiquitous ribozyme that catalyzes a Mg2+-dependent hydrolysis to remove the 5'-leader sequence of precursor tRNA (pre-tRNA) in all three domains of life. In bacteria, the catalytic RNA (typically ~120 kDa) is aided by a small protein cofactor (~14 kDa). Archaeal and eukaryotic RNase P each contain a single RNA; archaeal RNase P has four or five proteins, while eukaryotic RNase P has 9 or 10 proteins. Eukaryotic and archaeal RNase P RNAs function cooperatively with the protein subunits in catalysis. Eukaryotic nuclear RNase P shares most of its protein components with another essential RNP enzyme, nucleolar RNase MRP. RNase MRP (mitochondrial RNA processing) is an rRNA processing enzyme that cleaves various RNAs, including ribosomal, messenger, and mitochondrial RNAs. It can cleave a specific site within precursor rRNA to generate the mature 5'-end of 5.8S rRNA. Despite its name, the vast majority of RNase MRP is localized in the nucleolus. RNase MRP has been shown to cleave primers for mitochondrial DNA replication and CLB2 mRNA. In yeast, RNase MRP possesses one putatively catalytic RNA and at least 9 protein subunits (Pop1, Pop3-Pop8, Rpp1, Snm1 and Rmp1). The RNA of the human RNase MRP complex consists of 267 nucleotides and supports the interaction with and among at least seven protein components (hPop1, hPop5, Rpp20, Rpp25, Rpp30, Rpp38, and Rpp40); three additional proteins, hPop4, Rpp21 and Rpp14, have been reported to be associated with at least a subset of RNase MRP complexes. This entry represents the p29 subunit (also known as Rpp29 or Pop4) of the related ribonucleoproteins ribonuclease (RNase) P and RNase MRP from eukaryotes. Rpp29 has a conserved C-terminal domain with an Sm-like fold. Rpp29 catalyses the endonucleolytic cleavage of RNA, removing 5'-extranucleotides from tRNA precursor. It interacts with the Rpp25 and Pop5 subunits.
Protein Domain
Name: von Willebrand factor, type A
Type: Domain
Description: The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved in the aetiology of bleeding disorders. In von Willebrand factor, the type A domain (vWF, also abbreviated VWA) is the prototype for a protein superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are all intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation, and signal transduction), involving interaction with a large array of ligands. A number of human diseases arise from mutations in VWA domains. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of α-helices and β-strands. The vWF domain fold is predicted to be a doubly-wound, open, twisted β-sheet flanked by α-helices. 3D structures have been determined for the I-domains of integrins alpha-M (CD11b; with bound magnesium) and alpha-L (CD11a; with bound manganese). The domain adopts a classic α/β Rossmann fold and contains an unusual metal ion coordination site at its surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for binding protein ligands. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are completely conserved, but the manner in which the metal ion is coordinated differs slightly.
Protein Domain
Name: von Willebrand factor domain BatA-type
Type: Domain
Description: Members of this subgroup are bacterial in origin. They are typified by the presence of a MIDAS motif. The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved in the aetiology of bleeding disorders. In von Willebrand factor, the type A domain (vWF, also abbreviated VWA) is the prototype for a protein superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are all intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation, and signal transduction), involving interaction with a large array of ligands. A number of human diseases arise from mutations in VWA domains. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of α-helices and β-strands. The vWF domain fold is predicted to be a doubly-wound, open, twisted β-sheet flanked by α-helices. 3D structures have been determined for the I-domains of integrins alpha-M (CD11b; with bound magnesium) and alpha-L (CD11a; with bound manganese). The domain adopts a classic α/β Rossmann fold and contains an unusual metal ion coordination site at its surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for binding protein ligands. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are completely conserved, but the manner in which the metal ion is coordinated differs slightly.
Protein Domain
Name: CRISPR-associated protein, Cas02710
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a family of Cas proteins encoded exclusively in the vicinity of CRISPR repeats and other Cas proteins in Methanothermobacter thermautotrophicus (Methanobacterium thermoformicicum), Thermus thermophilus (Deinococcus-Thermus), Chloroflexus aurantiacus (Chloroflexi), and Thermomicrobium roseum (Thermomicrobia).
Protein Domain
Name: von Willebrand factor A-like domain superfamily
Type: Homologous_superfamily
Description: The von Willebrand factor is a large multimeric glycoprotein found in blood plasma. Mutant forms are involved in the aetiology of bleeding disorders. In von Willebrand factor, the type A domain (vWF, also abbreviated VWA) is the prototype for a protein superfamily. The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins. Although the majority of VWA-containing proteins are extracellular, the most ancient ones, present in all eukaryotes, are all intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation, and signal transduction), involving interaction with a large array of ligands. A number of human diseases arise from mutations in VWA domains. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of α-helices and β-strands. The vWF domain fold is predicted to be a doubly-wound, open, twisted β-sheet flanked by α-helices. 3D structures have been determined for the I-domains of integrins alpha-M (CD11b; with bound magnesium) and alpha-L (CD11a; with bound manganese). The domain adopts a classic α/β Rossmann fold and contains an unusual metal ion coordination site at its surface. It has been suggested that this site represents a general metal ion-dependent adhesion site (MIDAS) for binding protein ligands. The residues constituting the MIDAS motif in the CD11b and CD11a I-domains are completely conserved, but the manner in which the metal ion is coordinated differs slightly.
Protein Domain
Name: CRISPR-associated protein, Cmr5
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a family of Cas proteins typified by TM1791.1 from Thermotoga maritima. This family of Cas proteins is found in both archaeal and bacterial species.
Protein Domain
Name: CRISPR system endoribonuclease Csx1
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in many bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a family of Cas proteins that includes CRISPR system endoribonuclease Csx1. This family was previously known as CRISPR-associated protein, MJ1666 family.
Protein Domain
Name: CRISPR-associated RAMP Csm3
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents the Csm3 (CRISPR/cas Subtype Mtube, protein 3) family of Cas proteins, encoded by genes found in the Mtube subtype CRISPR/cas locus. It is also known as Csm3 Type III-A.
Protein Domain
Name: Coronin
Type: Family
Description: Coronins are evolutionarily conserved WD-repeat-containing proteins mostly involved in actin cytoskeleton organisation. The WD40 motif is found in a multitude of eukaryotic proteins involved in a variety of cellular processes. Repeated WD40 motifs act as a site for protein-protein interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins. The final 40 amino acids are predicted to form a coiled-coil in a coronin homodimer. Coronin was first identified as an actin-binding protein in Dictyostelium discoideum. It was named Coronin because of its association with crown-shaped cell surface projections of growth-phase D. discoideum cells. Since then, several Coronin homologues and isoforms have been identified from yeast to human. Mammalian Coronin isoforms include Coronin 1A/B/C, Coronin 2A/B, Coronin 6 and Coronin 7. The yeast Coronin homologue is known as Crn1, while the Drosophila homologue is known as pod1. In budding yeast, Crn1 regulates the actin filament nucleation/branching activity of the actin-related protein 2/3 (Arp2/3) complex through interaction with the Arc35p subunit. Mammalian Coronin 1A is exclusively expressed in leukocytes and is involved in the regulation of leukocyte-specific signaling events. The crystal structure of Coronin 1A has been solved. Mammalian Coronin 1B can protect new (ATP-rich) filaments from the F-actin-severing protein Cofilin and dismantle old (ADP-rich) filaments by inducing Arp2/3 dissociation in lamellipodia. It is worth noting that Coronin 7 in this entry has not been shown to interact with actin. Unlike most of the Coronin isoforms, it binds to the outer side of Golgi complex membranes and acts as a mediator of cargo vesicle formation at the trans-Golgi network.
Protein Domain
Name: Paxillin/TGFB1I1, LIM domain 1
Type: Domain
Description: Paxillin is a cytoskeletal protein involved in actin-membrane attachment at sites of cell adhesion to the extracellular matrix (focal adhesion). Extensive tyrosine phosphorylation occurs during integrin-mediated cell adhesion, embryonic development, fibroblast transformation and following stimulation of cells by mitogens that operate through the 7TM family of G-protein-coupled receptors. Paxillin binds in vitro to the focal adhesion protein vinculin, as well as to the SH3 domain of c-Src, and, when tyrosine phosphorylated, to the SH2 domain of v-Crk. An N-terminal region has been identified that supports the binding of both vinculin and the focal adhesion tyrosine kinase, pp125Fak. Paxillin is a 68kDa protein containing multiple domains, including four tandem C-terminal LIM domains (each of which binds 2 zinc ions); an N-terminal proline-rich domain, which contains a consensus SH3 binding site; and three potential Crk-SH2 binding sites. The predicted structure of paxillin suggests that it is a unique cytoskeletal protein capable of interaction with a variety of intracellular signalling and structural molecules important in growth control and the regulation of cytoskeletal organisation. This entry includes the transforming growth factor beta-1-induced transcript 1 (TGFB1I1, also known as Hic-5). Hic-5 functions as a molecular adapter coordinating multiple protein-protein interactions at the focal adhesion complex and in the nucleus. Leupaxin is a transcriptional coactivator for androgen receptor (AR) and serum response factor (SRF). This entry represents the first LIM domain of Paxillin and similar animal proteins, which function as an adaptor or scaffold to support the assembly of multimeric protein complexes. This domain shows two characteristic zinc finger motifs. The two zinc fingers contain eight conserved residues, mostly cysteines and histidines, which coordinately bond to two zinc atoms.
Protein Domain
Name: von Willebrand factor, type D domain
Type: Domain
Description: Von Willebrand factor (VWF) is a large, multimeric blood glycoprotein synthesized in endothelial cells and megakaryocytes that is required for normal hemostasis. Mutant forms are involved in the most common inherited bleeding disorder (von Willebrand disease: VWD). VWF mediates the adhesion of platelets to sites of vascular damage by binding to specific platelet membrane glycoproteins and to constituents of exposed connective tissue. It is also essential for the transport of the blood clotting factor VIII. VWF is a large multidomain protein. The type D domain (VWFD) is not only required for blood clotting factor VIII binding but also for normal multimerization of VWF. The interaction between blood clotting factor VIII and VWF is necessary for normal survival of blood clotting factor VIII in blood circulation. The VWFD domain is a highly structured region, in which the first conserved Cys has been found to form a disulfide bridge with the second conserved one. The VWFD domain can occur in association with many different domains, such as vitellogenin, VWFC, VWFA, and ZP. Proteins with a VWFD domain are listed below: Mammalian von Willebrand factor (VWF), a multifunctional protein involved in maintaining hemostasis. It consists of 4 VWFD domains (D1-4), 3 VWFA domains, 3 VWFB domains, 2 VWFC domains, an X domain and a C-terminal cystine knot. There might be a third VWFC domain within the type B domain region. The structure of the VWF D3 domain has been revealed.
Mammalian zonadhesin, which binds in a species-specific manner to the zona pellucida of the egg. Mammalian bone morphogenetic protein-binding (BMP-binding) endothelial regulator protein. Mammalian alpha-tectorin, which is one of the major non-collagenous components of the tectorial membrane. Mammalian mucins, glycoproteins that are major constituents of the glycocalyx that covers mucosal epithelium. Mammalian vitellogenin, a major lipoprotein in many oviparous animals, which is a precursor of a lipid-binding product named lipovitellin. This entry represents the VWFD domain.
Protein Domain
Name: HAT (Half-A-TPR) repeat
Type: Repeat
Description: The HAT (Half A TPR) repeat has a repetitive pattern characterised by three aromatic residues with a conserved spacing. HAT repeats are structurally and sequentially similar to TPRs (tetratricopeptide repeats), though they lack the highly conserved alanine and glycine residues found in TPRs. The number of HAT repeats found in different proteins varies between 9 and 12. HAT-repeat-containing proteins appear to be components of macromolecular complexes that are required for RNA processing. The HAT motif has striking structural similarities to HEAT repeats, being of a similar length and consisting of two short helices connected by a loop domain. Some studies have suggested that the HAT repeats may be involved in protein-protein interactions. However, the HAT repeats of Arabidopsis HCF107 protein have been shown to bind RNA. Proteins containing this domain include: Crooked neck (Crn) from Drosophila. It associates with the RNA-binding protein HOW to control glial cell maturation. Clf1, Prp6 and Prp39 from S. cerevisiae. Clf1 is part of the NineTeen Complex (NTC) that stabilises U6 snRNA in catalytic forms of the spliceosome containing U2, U5, and U6 snRNAs. Prp6 and Prp39 are involved in pre-mRNA splicing. Cleavage stimulation factor subunit 3 (CSTF3 or CstF-77) and crooked neck-like protein 1 (CRNKL1) from mammals. CSTF3 is required for polyadenylation and 3'-end cleavage of mammalian pre-mRNAs. Protein high chlorophyll fluorescent 107 (HCF107) from Arabidopsis. HCF107 exhibits sequence-specific RNA binding and RNA remodeling activities, probably leading to the activation of translation of the target gene cluster psbB-psbT-psbH-petB-petD. It blocks 5'-3' and 3'-5' exoribonucleases (e.g. polynucleotide phosphorylase (PNPase), RNase R) in vitro.
It is necessary for intercistronic RNA processing of the psbH 5' untranslated region or the stabilization of 5' processed psbH RNAs and is also required for the synthesis of psbB.
Protein Domain
Name: Flaviviral glycoprotein E, central domain, subdomain 2
Type: Homologous_superfamily
Description: Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Yellow fever virus (YFV), West Nile virus (WNV), Tick-borne encephalitis virus, Japanese encephalitis virus (JE) and Dengue virus 2. Flaviviruses consist of three structural proteins: the core nucleocapsid protein C, and the envelope glycoproteins M and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from class I fusion proteins from influenza virus and HIV. Glycoprotein E is composed of three domains: domain I (dimerisation domain) is an 8-stranded beta barrel, domain II (central domain) is an elongated domain composed of twelve beta strands and two alpha helices, and domain III (immunoglobulin-like domain) is an IgC-like module with ten beta strands. Domains I and II are intertwined. Domain II can be divided into two structural components, both of which comprise α-β sandwich folds. This entry represents subdomain 2 of domain II, the central domain. The glycoprotein E dimers on the viral surface re-cluster irreversibly into fusion-competent trimers upon exposure to low pH, as found in the acidic environment of the endosome. The formation of trimers results in a conformational change in the hinge region of domain II, a key structural element that opens a ligand-binding hydrophobic pocket at the interface between domains I and II. The conformational change results in the exposure of a fusion peptide loop at the tip of domain II, which is required in the fusion step to drive the cellular and viral membranes together by inserting into the membrane.
Protein Domain
Name: Flaviviral glycoprotein E, central domain, subdomain 1
Type: Homologous_superfamily
Description: Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Yellow fever virus (YFV), West Nile virus (WNV), Tick-borne encephalitis virus, Japanese encephalitis virus (JE) and Dengue virus 2. Flaviviruses consist of three structural proteins: the core nucleocapsid protein C, and the envelope glycoproteins M and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from class I fusion proteins from influenza virus and HIV. Glycoprotein E is composed of three domains: domain I (dimerisation domain) is an 8-stranded beta barrel, domain II (central domain) is an elongated domain composed of twelve beta strands and two alpha helices, and domain III (immunoglobulin-like domain) is an IgC-like module with ten beta strands. Domains I and II are intertwined. Domain II can be divided into two structural components, both of which comprise α-β sandwich folds. This entry represents subdomain 1 of domain II, the central domain. The glycoprotein E dimers on the viral surface re-cluster irreversibly into fusion-competent trimers upon exposure to low pH, as found in the acidic environment of the endosome. The formation of trimers results in a conformational change in the hinge region of domain II, a key structural element that opens a ligand-binding hydrophobic pocket at the interface between domains I and II. The conformational change results in the exposure of a fusion peptide loop at the tip of domain II, which is required in the fusion step to drive the cellular and viral membranes together by inserting into the membrane.
Protein Domain
Name: C-type lectin, conserved site
Type: Conserved_site
Description: Lectins occur in plants, animals, bacteria and viruses. Initially described for their carbohydrate-binding activity, they are now recognised as a more diverse group of proteins, some of which are involved in protein-protein, protein-lipid or protein-nucleic acid interactions. There are at least twelve structural families of lectins: C-type lectins, which are Ca2+-dependent. S-type (galectins), a widespread family of glycan-binding proteins. I-type, which have an immunoglobulin-like fold and can recognise sialic acids, other sugars and glycosaminoglycans. P-type, which bind phosphomannosyl receptors. Pentraxins. (Trout) egg lectins. Calreticulin and calnexin, which act as molecular chaperones of the endoplasmic reticulum. ERGIC-53 and VIP-36. Discoidins. Eel agglutinins (fucolectins). Annexin lectins. Fibrinogen-type lectins, which include ficolins, tachylectins 5A and 5B, and Limax flavus (Spotted garden slug) agglutinin (these proteins have clear distinctions from one another, but they share a homologous fibrinogen-like domain used for carbohydrate binding). There are also unclassified orphan lectins, including amphoterin, Cel-II, complement factor H, thrombospondin, sialic acid-binding lectins, adherence lectin, and cytokines (such as tumour necrosis factor and several interleukins). C-type lectins can be further divided into seven subgroups based on additional non-lectin domains and gene structure: (I) hyalectans, (II) asialoglycoprotein receptors, (III) collectins, (IV) selectins, (V) NK group transmembrane receptors, (VI) macrophage mannose receptors, and (VII) simple (single domain) lectins. Therefore, lectins are a diverse group of proteins, both in terms of structure and activity. Carbohydrate binding ability may have evolved independently and sporadically in numerous unrelated families, where each evolved a structure that was conserved to fulfil some other activity and function.
In general, animal lectins act as recognition molecules within the immune system, their functions involving defence against pathogens, cell trafficking, immune regulation and the prevention of autoimmunity.
Protein Domain
Name: Type III secretion system, low calcium response, chaperone LcrH/SycD, subgroup
Type: Family
Description: The type III secretion system of Gram-negative bacteria is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. Effector proteins secreted by the type III system do not possess a secretion signal, and are considered unique because of this. Yersinia spp. secrete effector proteins called YopB and YopD that facilitate the spread of other translocated proteins through the type III needle and the host cell cytoplasm. Both are believed to act as pore translocases, forming apertures in the host cell membrane and allowing the bacterium easy access to its cytoplasm. YopD also acts as a negative regulator of the Yersinia low-calcium response, and in turn is controlled by a chaperone, SycD. This protein also regulates YopB secretion. SycD is located on the Yop pathogenicity island of Yersinia spp., and is speculated to prevent a premature interaction between YopB, YopD and the calcium-response LcrV protein. It has been speculated that a type III secretion mechanism also exists in Chlamydial species. With the sequencing of the Chlamydia trachomatis genome, several proteins similar to characterised type III proteins have emerged, including a SycD homologue. The Pseudomonas aeruginosa gene PcrH is also similar to the Yersinia chaperone, suggesting a comparable function. Proteins in this entry are found in type III secretion operons. LcrH, from Yersinia, is believed to have a regulatory function in the low-calcium response of the secretion system. The same protein is also known as SycD (SYC = Specific Yop Chaperone) for its chaperone role. In Pseudomonas, where the homologue is known as PcrH, the chaperone role has been demonstrated and the regulatory role appears to be absent. SycD/LcrH contains three central tetratricopeptide-like repeats that are predicted to fold into an all-α-helical array.
Protein Domain
Name: Type III secretion system, low calcium response, chaperone LcrH/SycD
Type: Family
Description: The type III secretion system of Gram-negative bacteria is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. Effector proteins secreted by the type III system do not possess a secretion signal, and are considered unique because of this. Yersinia spp. secrete effector proteins called YopB and YopD that facilitate the spread of other translocated proteins through the type III needle and the host cell cytoplasm. Both are believed to act as pore translocases, forming apertures in the host cell membrane and allowing the bacterium easy access to its cytoplasm. YopD also acts as a negative regulator of the Yersinia low-calcium response, and in turn is controlled by a chaperone, SycD. This protein also regulates YopB secretion. SycD is located on the Yop pathogenicity island of Yersinia spp., and is speculated to prevent a premature interaction between YopB, YopD and the calcium-response LcrV protein. It has been speculated that a type III secretion mechanism also exists in Chlamydial species. With the sequencing of the Chlamydia trachomatis genome, several proteins similar to characterised type III proteins have emerged, including a SycD homologue. The Pseudomonas aeruginosa gene PcrH is also similar to the Yersinia chaperone, suggesting a comparable function. Proteins in this entry are found in type III secretion operons. LcrH, from Yersinia, is believed to have a regulatory function in the low-calcium response of the secretion system. The same protein is also known as SycD (SYC = Specific Yop Chaperone) for its chaperone role. In Pseudomonas, where the homologue is known as PcrH, the chaperone role has been demonstrated and the regulatory role appears to be absent. SycD/LcrH contains three central tetratricopeptide-like repeats that are predicted to fold into an all-α-helical array.
Protein Domain
Name: Ras guanine-nucleotide exchange factor, conserved site
Type: Conserved_site
Description: Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyse GTP to GDP. The balance between the GTP-bound (active) and GDP-bound (inactive) states is regulated by the opposing actions of proteins that activate the GTPase activity and proteins that promote the loss of bound GDP and the uptake of fresh GTP. The latter proteins are known as guanine-nucleotide dissociation stimulators (GDSs), or as guanine-nucleotide releasing (or exchange) factors (GRFs). Proteins that act as GDSs can be classified into at least two families on the basis of sequence similarities: the CDC24 family and the CDC25 family. The sizes of the proteins of the CDC25 family range from 309 residues (LTE1) to 1596 residues (sos). The sequence similarity shared by all these proteins is limited to a region of about 250 amino acids generally located in their C-terminal section (currently the only exceptions are sos and ralGDS, where this domain makes up the central part of the protein). This domain has been shown, in CDC25 and SCD25, to be essential for the activity of these proteins. The crystal structure of the GEF region of human Sos1 complexed with Ras has been solved. The structure consists of two distinct alpha-helical structural domains: the N-terminal domain, which seems to have a purely structural role, and the C-terminal domain, which is sufficient for catalytic activity and contains all residues that interact with Ras. A main feature of the catalytic domain is the protrusion of a helical hairpin important for the nucleotide-exchange mechanism. The N-terminal domain is likely to be important for the stability and correct placement of the hairpin structure. The signature pattern for this entry spans the helical hairpin.
Protein Domain
Name: BRCA1-associated 2/ETP1, RRM
Type: Domain
Description: This entry represents the RNA-binding domain (also referred to as RNA recognition motif (RRM)) of BRAP2 and its homologues. This entry includes human BRCA1-associated protein (BRAP/BRAP2, also known as impedes mitogenic signal propagation (IMP), RING finger protein 52, or renal carcinoma antigen NY-REN-63) and its yeast homologue ETP1. BRAP2 is a cytoplasmic protein that interacts with the two functional nuclear localization signal (NLS) motifs of BRCA1, a nuclear protein linked to breast cancer. It also binds to the SV40 large T antigen NLS motif and the bipartite NLS motif found in mitosin. BRAP2 serves as a cytoplasmic retention protein and plays a role in the regulation of nuclear protein transport. These proteins contain an N-terminal RNA recognition motif (RRM), also known as RBD (RNA binding domain) or RNP (ribonucleoprotein domain), followed by a C3H2C3-type RING-H2 finger and a UBP-type zinc finger.
Protein Domain
Name: FeS cluster biogenesis
Type: Domain
Description: The proteins in this entry are variously annotated as iron-sulphur cluster insertion protein or Fe/S biogenesis protein. They appear to be involved in Fe-S cluster biogenesis. This family includes IscA, HesB, YadR and YfhF-like proteins. The hesB gene is expressed only under nitrogen fixation conditions. IscA, an 11kDa member of the hesB family of proteins, binds iron and [2Fe-2S] clusters, and participates in the biosynthesis of iron-sulphur proteins. IscA is able to bind at least 2 iron ions per dimer. Other members of this family include various hypothetical proteins that also contain the NifU-like domain, suggesting that they too are able to bind iron and are involved in Fe-S cluster biogenesis. Members of the HesB family are found in species as divergent as Homo sapiens (Human) and Haemophilus influenzae, suggesting that these proteins are involved in basic cellular functions.
Protein Domain
Name: Nucleic acid-binding, OB-fold
Type: Homologous_superfamily
Description: A five-stranded β-barrel was first noted as a common structure among four proteins binding single-stranded nucleic acids (staphylococcal nuclease and aspartyl-tRNA synthetase) or oligosaccharides (B subunits of enterotoxin and verotoxin-1), and has been termed the oligonucleotide/oligosaccharide binding motif, or OB fold: a five-stranded β-sheet coiled to form a closed β-barrel capped by an alpha helix located between the third and fourth strands. Two ribosomal proteins, S17 and S1, are members of this class, and have different variations of the OB fold theme. Comparisons with other OB fold nucleic acid binding proteins suggest somewhat different mechanisms of nucleic acid recognition in each case. There are many nucleic acid-binding proteins that contain domains with this OB-fold structure, including anticodon-binding tRNA synthetases, ssDNA-binding proteins (CDC13, telomere-end binding proteins), phage ssDNA-binding proteins (gp32, gp2.5, gpV), cold shock proteins, DNA ligases, RNA-capping enzymes, DNA replication initiators and RNA polymerase subunit RPB8.
Protein Domain
Name: Six-bladed beta-propeller, TolB-like
Type: Homologous_superfamily
Description: This superfamily represents a six-bladed β-propeller domain consisting of six 4-stranded β-sheet motifs. This domain can be found in TolB proteins (C-terminal), in soluble quinoprotein glucose dehydrogenase, in calcium-dependent phosphotriesterases, in the low density lipoprotein (LDL) receptor YWTD domain, in nidogen, and in the serine/threonine-protein kinase (PknD) NHL repeat domain. TolB is a periplasmic protein from Escherichia coli that is part of the Tol-dependent translocation system involving group A and E colicins that is used to penetrate and kill cells. TolB has two domains, an α-helical N-terminal domain that shares structural similarity with the C-terminal domain of transfer RNA ligases, and a β-propeller C-terminal domain that shares structural similarity with numerous members of the prolyl oligopeptidase family and, to a lesser extent, with class B metallo-beta-lactamases (although it does not necessarily occur at the C terminus in these proteins). The C-terminal domain of TolB may mediate protein-protein interactions with colicins.
Protein Domain
Name: Sortase family
Type: Family
Description: This family includes Staphylococcus aureus sortase, a transpeptidase that attaches surface proteins by the Thr of an LPXTG motif to the cell wall. It also includes a protein required for correct assembly of an LPXTG-containing fimbrial protein, and a set of homologous proteins from Streptococcus pneumoniae, in which LPXTG proteins are common. However, related proteins are found in Bacillus subtilis and Methanobacterium thermoautotrophicum, in which LPXTG-mediated cell wall attachment is not known. Sortase refers to a group of prokaryotic enzymes that catalyze the assembly of pilins into pili, and the anchoring of pili to the cell wall. They act as both proteases and transpeptidases. Sortase, a transpeptidase present in almost all Gram-positive bacteria, anchors a range of important surface proteins to the cell wall. The sortases are thought to be good targets for new antibiotics as they are important proteins for pathogenic bacteria.
Protein Domain
Name: Torsin 1/2
Type: Family
Description: This entry represents a number of torsins including types 1A, 1B and 2A. Some torsin-like proteins are also included. Torsion dystonia is an autosomal dominant movement disorder characterised by involuntary, repetitive muscle contractions and twisted postures. The most severe early-onset form of dystonia has been linked to mutations in the human DYT1 (TOR1A) gene encoding a protein termed torsinA. While causative genetic alterations have been identified, the function of torsin proteins and the molecular mechanism underlying dystonia remain unknown. Phylogenetic analysis of the torsin protein family indicates these proteins share distant sequence similarity with the large and diverse family of proteins containing the AAA ATPase central region. It has been suggested that torsins play a role in effectively managing protein folding and that a possible breakdown in a neuroprotective mechanism that is, in part, mediated by torsins may be responsible for the neuronal dysfunction associated with dystonia.
Protein Domain
Name: Carlavirus coat
Type: Domain
Description: This domain is found together with the viral coat protein domain in coat/capsid proteins of plant-infecting carlaviruses. It is required for genome encapsidation by forming ribonucleoprotein complexes along with TGB1 helicase and viral RNA. The N and C termini of this coat protein can be exposed on the surface of the virus particle. The central core sequence may be important in maintaining correct tertiary structure of the coat protein and/or play a role in the interaction with the viral RNA. Coat proteins are often used to distinguish between Carlavirus isolates. In the coat protein amino acid sequences of definitive and tentative species of carlaviruses, there is a region of seven amino acids (GLGVPTE) that is conserved. The complete coat protein (CP) sequences of 29 Indian Chrysanthemum virus B (CVB) isolates were highly heterogeneous, sharing nucleotide sequence identities of 74-98%.
Protein Domain
Name: Pappalysin-1/2
Type: Family
Description: Pappalysin-1 (also known as pregnancy-associated plasma protein-A (PAPP-A); MEROPS identifier M43.004) is a metalloendopeptidase that belongs to the MEROPS peptidase family M43B. It was first found in high concentrations in the blood of pregnant women. Later, it was identified as a proteinase responsible for cleavage of insulin-like growth factor binding protein (IGFBP)-4, an inhibitor of IGF action that mediates cell growth and survival signals. PAPP-A is expressed by different cell types, and thus can no longer be considered to be just "pregnancy-associated". It is involved in rapid yet strictly controlled growth and development, including wound healing, bone remodelling, folliculogenesis, placental development, and atherosclerosis. Pappalysin-2 is a zinc-binding metalloproteinase which specifically cleaves insulin-like growth factor-binding protein 5 (IGFBP-5). It shows limited proteolysis toward IGFBP-3. This entry includes Pappalysin-1/2 (PAPPA/PAPPA2). They are metalloproteinases that cleave insulin-like growth factor binding proteins. This entry also includes a group of uncharacterised bacterial proteins.
Protein Domain
Name: Bifunctional phosphatidylinositol trisphosphate phosphatase/dual specificity phosphatase PTEN
Type: Family
Description: This entry represents the magnesium-dependent bifunctional phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase/dual-specificity protein phosphatase PTEN, which possesses the following catalytic activities: Phosphatidylinositol 3,4,5-trisphosphate + H(2)O = phosphatidylinositol 4,5-bisphosphate + phosphate. A phosphoprotein + H(2)O = a protein + phosphate. Protein tyrosine phosphate + H(2)O = protein tyrosine + phosphate. This protein acts as a dual-specificity protein phosphatase, dephosphorylating tyrosine-, serine- and threonine-phosphorylated proteins. It also acts as a lipid phosphatase, removing the phosphate in the D3 position of the inositol ring from phosphatidylinositol 3,4,5-trisphosphate, phosphatidylinositol 3,4-diphosphate, phosphatidylinositol 3-phosphate and inositol 1,3,4,5-tetrakisphosphate. The lipid phosphatase activity is critical for its tumour suppressor function. PTEN antagonizes the PI3K-AKT/PKB signalling pathway by dephosphorylating phosphoinositides and thereby modulating cell cycle progression and cell survival. The unphosphorylated form of PTEN cooperates with AIP1 to suppress AKT1 activation. PTEN dephosphorylates tyrosine-phosphorylated focal adhesion kinase and inhibits cell migration, integrin-mediated cell spreading and focal adhesion formation.
Protein Domain
Name: Ferritin-like diiron domain
Type: Domain
Description: This entry represents a group of proteins containing the ferritin-like domain, a domain of about 145 residues made of a four-helix bundle surrounding a non-heme, non-sulphur, oxo-bridged diiron site. The diiron site is contained within a twisted, left-handed four-helix bundle constituted of two anti-parallel helix pairs connected through a left-handed crossover connection. Known ligand residues at non-heme, non-sulphur diiron sites in proteins include His, Asp, Glu, and Tyr. Proteins containing a ferritin-like diiron domain possess the ability to catalyze oxidation of Fe2+ to Fe3+ by O2, i.e. ferroxidase activity. The ferritin-like diiron domain occurs in stand-alone form in ferritin and bacterioferritin, or in association with the rubredoxin-like domain in rubrerythrin. Proteins known to contain a ferritin-like diiron domain are listed below: Ferritin (Ftn), a eukaryotic intracellular protein that stores iron in a soluble, nontoxic, readily available form. Bacterioferritin (Bfr), a prokaryotic protein which may perform functions in iron detoxification and storage. Rubrerythrin (Rr), a non-heme protein isolated from anaerobic sulphate-reducing bacteria. Nigerythrin (Nr), a prokaryotic protein of unknown function.
Protein Domain
Name: Bacterial Ig-like domain, group 2
Type: Domain
Description: The Ig-like fold is part of proteins with important roles in different physiological processes. This entry represents the bacterial Ig-like domain (Big2). This domain is mainly found in a variety of bacterial and phage surface proteins such as intimins, but has also been found in several eukaryotic proteins. Intimin (Eae protein) is a bacterial cell-adhesion molecule that mediates the intimate bacterial host-cell interaction. It contains three domains: two immunoglobulin-like domains and a C-type lectin-like module, implying that carbohydrate recognition may be important in intimin-mediated cell adhesion. The structure of this domain was also described in the tail tube protein of phage lambda (TTP). This protein assembles in hexameric rings that stack on top of each other. Nuclear pore membrane glycoprotein 210 from humans (POM210) also belongs to this group of proteins. This nucleoporin is essential for nuclear pore assembly and fusion, nuclear pore spacing, as well as structural integrity.
Protein Domain
Name: Development/cell death domain
Type: Domain
Description: The DCD (Development and Cell Death) domain is found in plant proteins involved in development and cell death. The DCD domain is an ~130 amino acid long stretch that contains several mostly invariable motifs. These include a FGLP and a LFL motif at the N terminus and a PAQV and a PLxE motif towards the C terminus of the domain. The DCD domain is present in proteins with different architectures. Some of these proteins contain additional recognizable motifs, like the KELCH repeats or the ParB domain. Biological studies indicate a role of these proteins in phytohormone response, embryo development and programmed cell death by pathogens or ozone. The predicted secondary structure of the DCD domain is mostly composed of beta strands and confined by an α-helix at the N terminus and at the C terminus. Proteins known to contain a DCD domain include carrot B2 protein, pea Gda-1 protein and soybean N-rich protein (NRP).
Protein Domain
Name: CobW/HypB/UreG, nucleotide-binding domain
Type: Domain
Description: This domain is found in HypB, a hydrogenase expression/formation protein, and in urease accessory protein UreG. Both these proteins contain a P-loop nucleotide binding motif. HypB has GTPase activity and is a guanine nucleotide binding protein. UreG is a GTPase in charge of nucleotide hydrolysis required for activation of the urease enzyme. Both GTPases are involved in nickel binding. HypB can store nickel and is required for nickel-dependent hydrogenase expression. UreG is required for functional incorporation of the urease nickel metallocentre. GTP hydrolysis may be required by these proteins for nickel incorporation into other nickel proteins. Other proteins containing this domain include P47K, a Pseudomonas chlororaphis protein needed for nitrile hydratase expression, CobW, which may be involved in cobalamin biosynthesis in Pseudomonas denitrificans, and YjiA. Both CobW and YjiA are members of the metal homeostasis-associated COG0523 family of GTPases.
Protein Domain
Name: Translation elongation factor, IF5A C-terminal
Type: Domain
Description: A five-stranded β-barrel was first noted as a common structure among four proteins binding single-stranded nucleic acids (staphylococcal nuclease and aspartyl-tRNA synthetase) or oligosaccharides (B subunits of enterotoxin and verotoxin-1), and has been termed the oligonucleotide/oligosaccharide binding motif, or OB fold, a five-stranded β-sheet coiled to form a closed β-barrel capped by an α-helix located between the third and fourth strands. Two ribosomal proteins, S17 and S1, are members of this class, and have different variations of the OB fold theme. Comparisons with other OB fold nucleic acid binding proteins suggest somewhat different mechanisms of nucleic acid recognition in each case. There are many nucleic acid-binding proteins that contain domains with this OB-fold structure, including anticodon-binding tRNA synthetases, ssDNA-binding proteins (CDC13, telomere-end binding proteins), phage ssDNA-binding proteins (gp32, gp2.5, gpV), cold shock proteins, DNA ligases, RNA-capping enzymes, DNA replication initiators and RNA polymerase subunit RBP8. This entry represents the RNA-binding domain of translation elongation factor IF5A.
Protein Domain
Name: IQ motif, EF-hand binding site
Type: Binding_site
Description: The IQ motif is an extremely basic unit of about 23 amino acids, whose conserved core usually fits the consensus A-x(3)-I-Q-x(2)-F-R-x(4)-K-K. The IQ motif, which can be present in one or more copies, serves as a binding site for different EF-hand proteins including the essential and regulatory myosin light chains, calmodulin (CaM), and CaM-like proteins. Many IQ motifs are protein kinase C (PKC) phosphorylation sites. Resolution of the 3D structure of scallop myosin has shown that the IQ motif forms a basic amphipathic helix. Some proteins known to contain an IQ motif are listed below: A number of conventional and unconventional myosins. Neuromodulin (GAP-43), which is associated with nerve growth and is a major component of the motile "growth cones" that form the tips of elongating axons. Neurogranin (NG/p17), which acts as a "third messenger" substrate of protein kinase C-mediated molecular cascades during synaptic development and remodeling. Sperm surface protein Sp17. Ras GTPase-activating-like protein IQGAP1, which contains 4 IQ motifs. This entry covers the entire IQ motif.
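The consensus quoted above is written in PROSITE-style notation, so it can be translated into a regular expression for a quick scan of a sequence. The following is a minimal Python sketch and is not part of the database: the helper names `prosite_to_regex` and `find_iq_cores` and the toy sequence are hypothetical. Because the description says the core only "usually" fits the consensus, a strict match like this will miss divergent IQ motifs.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a regex string.

    Only two element kinds are handled (an assumption that is enough for
    the IQ consensus): single residues such as "A", and gaps written as
    "x" or "x(n)".
    """
    parts = []
    for element in pattern.split("-"):
        gap = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if gap is None:
            parts.append(element)              # literal residue
        elif gap.group(1) is None:
            parts.append(".")                  # "x": any one residue
        else:
            parts.append(".{" + gap.group(1) + "}")  # "x(n)": any n residues
    return "".join(parts)


# Consensus core taken from the description above.
IQ_CONSENSUS = "A-x(3)-I-Q-x(2)-F-R-x(4)-K-K"
IQ_REGEX = re.compile(prosite_to_regex(IQ_CONSENSUS))


def find_iq_cores(sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each strict consensus hit."""
    return [(m.start(), m.group()) for m in IQ_REGEX.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "MSATKLIQSRFRGYLAKKEE"  # made-up sequence, for illustration only
    print(prosite_to_regex(IQ_CONSENSUS))
    print(find_iq_cores(toy))
```

Matching strictly keeps the sketch short; a more tolerant scan would normally rely on a profile or HMM rather than a fixed pattern.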
Protein Domain
Name: GspD/PilQ family
Type: Family
Description: The general (type II) secretion pathway (Gsp) within Gram-negative bacteria is a signal sequence-dependent process responsible for protein export. The process has two stages: exoproteins are first translocated across the inner membrane by the general signal-dependent export pathway (GEP), and then across the outer membrane by a species-specific accessory mechanism. A number of proteins are involved in the Gsp; one of these is known as protein D (GspD protein), the most probable location of which is the outer membrane. This suggests that protein D constitutes the apparatus of the accessory mechanism, and is thus involved in transporting exoproteins from the periplasm, across the outer membrane, to the extracellular environment. This entry represents GspD and close homologues, including the type IV pilus outer membrane secretin PilQ. It also includes virion export protein, which is thought to form a channel across the host outer membrane for the purposes of extruding the bacteriophage.
Protein Domain
Name: Nuclear Testis protein, N-terminal
Type: Domain
Description: This domain is found in the N-terminal region of Nuclear Testis (NUT) proteins. The NUT gene is found fused to BRD3 or BRD4 genes in some aggressive types of carcinoma, due to chromosomal translocations. The BRD family proteins contain two bromodomains that bind transcriptionally active chromatin through associations with acetylated histones H3 and H4. BRD proteins are crucial for the regulation of cell cycle progression. The function of NUT proteins is not clear. NUT proteins contain a Nuclear Export Sequence (NES) and a Nuclear Localisation Signal (NLS), both located towards the C-terminal end of the protein. NUT-GFP protein showed either cytoplasmic or nuclear localisation, suggesting that it is subject to nuclear/cytoplasmic shuttling. Consistent with this possibility, treatment with leptomycin B, an inhibitor of CRM1-dependent nuclear export, resulted in redistribution of NUT-GFP to the nucleus. Inspection of NUT revealed a C-terminal sequence similar to known nuclear export sequences. NUT protein carries some natively unstructured sequence.
Protein Domain
Name: HNS-dependent expression A
Type: Family
Description: HNS (histone-like nucleoid structuring)-dependent expression A (HdeA) protein is a stress response protein found in highly acid-resistant bacteria such as Shigella flexneri and Escherichia coli, but which is lacking in mildly acid-tolerant bacteria such as Salmonella. HdeA is one of the most abundant proteins found in the periplasmic space of E. coli, where it is one of a network of proteins that confer an acid resistance phenotype essential for the pathogenesis of enteric bacteria. HdeA is thought to act as a chaperone, functioning to prevent the aggregation of periplasmic proteins denatured under acidic conditions. The HNS protein, a chromatin-associated protein that influences the gene expression of several environmentally-induced target genes, represses the expression of HdeA. HdeB, which is encoded within the same operon, may form heterodimers with HdeA. HdeA is a single-domain α-helical protein with an overall fold that is similar to the fold of the N-terminal subdomain of the GluRS anticodon-binding domain. This entry represents acid stress chaperone HdeA.
Protein Domain
Name: Cerato-platanin
Type: Family
Description: This entry represents a group of fungal proteins involved in plant pathogenesis and elicitation of plant defense responses, including Cerato-platanin (CP) from the Ascomycete Ceratocystis fimbriata f. sp. platani, which causes the severe plant disease canker stain. This protein occurs in the cell wall of the fungus, is involved in the host-plant interaction, and induces both cell necrosis and phytoalexin synthesis, which is one of the first plant defense-related events. CP, like other fungal surface proteins, is able to self-assemble in vitro. CP is a 120 amino acid protein containing 40% hydrophobic residues and four cysteine residues that form two disulphide bonds. The N-terminal region of CP is very similar to cerato-ulmin, a phytotoxic protein of the hydrophobin family produced by Ophiostoma species, which also self-assembles. This protein family also includes SnodProt (SNP1), also a phytotoxic protein, and Heat-stable 19 kDa antigen (CSA), a Coccidioides-specific antigen (CS antigen) with serine proteinase activity.
Protein Domain
Name: p115RhoGEF, RGS domain
Type: Domain
Description: The RGS (Regulator of G-protein Signaling) domain is an essential part of the p115RhoGEF protein, a member of the RhoGEF (Rho guanine nucleotide exchange factor) subfamily of the RGS protein family. The RhoGEFs are peripheral membrane proteins that regulate essential cellular processes, including cell shape, cell migration, cell cycle progression, and gene transcription, by linking signals from heterotrimeric G-alpha12/13 protein-coupled receptors to Rho GTPase activation, leading to various cellular responses, such as actin reorganization and gene expression. The RhoGEF subfamily includes p115RhoGEF, LARG, PDZ-RhoGEF and its rat-specific splice variant GTRAP48. The RGS domain of RhoGEFs has very little sequence similarity with the canonical RGS domain of the RGS proteins and is often referred to as the RH (RGS Homology) domain. In addition to being a G-alpha13/12 effector, the p115RhoGEF protein also functions as a GTPase-activating protein (GAP) for G-alpha13. This entry represents the RGS domain of p115RhoGEF.
Protein Domain
Name: Flavivirus glycoprotein E, immunoglobulin-like domain
Type: Domain
Description: Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Yellow fever virus, West Nile virus, Tick-borne encephalitis virus, Japanese encephalitis virus, and Dengue virus 2. Flaviviruses have three structural proteins: the core nucleocapsid protein C, and the envelope glycoproteins M and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from class I fusion proteins from influenza-type viruses and retroviruses. Glycoprotein E is composed of three domains: domain I (dimerisation domain) is an 8-stranded beta barrel, domain II (central domain) is an elongated domain composed of twelve beta strands and two alpha helices, and domain III (immunoglobulin-like domain) is an IgC-like module with ten beta strands. This entry represents the Ig-like domain III, which contains a putative receptor-binding loop.
Protein Domain
Name: TAFH/NHR1
Type: Domain
Description: The TAF homology (TAFH) or Nervy homology region 1 (NHR1) domain is a domain of 95-100 amino acids present in eukaryotic proteins of the MTG/ETO family, of which the core ~75-80 residues also occur in TAF proteins. The transcription initiation TFIID complex is composed of TATA binding protein (TBP) and a number of TBP-associated factors (TAFs). The TAFH/NHR1 domain is named after fruit fly TATA-box-associated factor 110 (TAF110), human TAF105 and TAF130, and the fruit fly protein Nervy, which is a homologue of human MTG8/ETO. The human eight twenty-one (ETO, MTG8 or CBFA2T1) and related myeloid transforming gene products MTGR1 and MTG16, as well as the Nervy protein, contain the NHR1-4 domains. The NHR1/TAFH domain occurs in the N-terminal part of these proteins, while a MYND-type zinc finger forms the NHR4 domain. The TAFH/NHR1 domain can be involved in protein-protein interactions, e.g. in MTG8/ETO with HSP90 and Gfi-1.
Protein Domain
Name: TAFH/NHR1 domain superfamily
Type: Homologous_superfamily
Description: The TAF homology (TAFH) or Nervy homology region 1 (NHR1) domain is a domain of 95-100 amino acids present in eukaryotic proteins of the MTG/ETO family, of which the core ~75-80 residues also occur in TAF proteins. The transcription initiation TFIID complex is composed of TATA binding protein (TBP) and a number of TBP-associated factors (TAFs). The TAFH/NHR1 domain is named after fruit fly TATA-box-associated factor 110 (TAF110), human TAF105 and TAF130, and the fruit fly protein Nervy, which is a homologue of human MTG8/ETO. The human eight twenty-one (ETO, MTG8 or CBFA2T1) and related myeloid transforming gene products MTGR1 and MTG16, as well as the Nervy protein, contain the NHR1-4 domains. The NHR1/TAFH domain occurs in the N-terminal part of these proteins, while a MYND-type zinc finger forms the NHR4 domain. The TAFH/NHR1 domain can be involved in protein-protein interactions, e.g. in MTG8/ETO with HSP90 and Gfi-1. Structurally, the TAFH/NHR1 domain consists of five helices with a close folded leaf topology.
Protein Domain
Name: MPP3, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of MPP3, which is a scaffolding protein that colocalizes with MPP5 and CRB1 at the subapical region adjacent to adherens junctions and may function in photoreceptor polarity. It interacts with some nectins and regulates their trafficking and processing. Nectins are cell-cell adhesion proteins involved in the establishment of apical-basal polarity at cell adhesion sites. MPP3 belongs to the membrane-associated guanylate kinase (MAGUK) p55 subfamily. Members of this subfamily (also known as the MPP subfamily) include the Drosophila Stardust protein and its vertebrate homologues, MPP1-7. They contain the core of three domains characteristic of MAGUK proteins: PDZ, SH3, and guanylate kinase (GuK). In addition, they contain the Hook (Protein 4.1 Binding) motif between the SH3 and GuK domains. MPP2-7 have two additional L27 domains at their N terminus. The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain.
Protein Domain
Name: Membrane magnesium transporter
Type: Family
Description: This entry represents a family of membrane magnesium transporters (MMgT). The proteins MMgT1 (also known as ER membrane protein complex subunit 5) and MMgT2 are localised to the Golgi complex and post-Golgi vesicles, including the early endosomes, suggesting that they may provide regulated pathways for Mg2+ transport in the Golgi and post-Golgi organelles of epithelium-derived cells. MMgT1 is part of the endoplasmic reticulum membrane protein complex (EMC) that enables the energy-independent insertion of newly synthesized membrane proteins into endoplasmic reticulum membranes. It is also involved in the cotranslational insertion of multi-pass membrane proteins in which stop-transfer membrane-anchor sequences become ER membrane-spanning helices. MMgT1 is also required for the post-translational insertion of tail-anchored proteins in endoplasmic reticulum membranes, and it mediates the proper cotranslational insertion of N-terminal transmembrane domains in an N-exo topology, thus controlling the topology of multi-pass membrane proteins like the G protein-coupled receptors.
Protein Domain
Name: Regulator of G-protein signalling 4, RGS domain
Type: Domain
Description: RGS4 is a member of the R4 subfamily of the RGS family, a diverse group of multifunctional proteins that regulate cellular signalling events downstream of G-protein coupled receptors (GPCRs). Signalling is initiated when GPCRs bind to their ligands, triggering the replacement of GDP bound to the G-alpha subunits of heterotrimeric G proteins with GTP. RGSs inhibit signal transduction by increasing the GTPase activity of G protein alpha subunits, thereby driving them into their inactive GDP-bound form. This activity defines them as GTPase activating proteins (GAPs). RGS4 is expressed widely in the brain and has been implicated in the regulation of opioid, cholinergic, and serotonergic signalling. Dysfunctions in RGS4 proteins are involved in the etiology of Parkinson's disease, addiction, and schizophrenia. RGS4 is also up-regulated in the failing human heart. RGS4 interacts with many binding partners outside of GPCR pathways, including PIP, calcium/CaM, PA and 14-3-3. The RGS (Regulator of G-protein Signalling) domain is an essential part of the RGS4 protein.
Protein Domain
Name: Netrin domain
Type: Domain
Description: The netrin (NTR) module is a domain of about 130 residues found in the C-terminal parts of netrins, complement proteins C3, C4, and C5, secreted frizzled-related proteins, and type I procollagen C-proteinase enhancer proteins (PCOLCEs), as well as in the N-terminal parts of tissue inhibitors of metalloproteinases (TIMPs). The proteins harboring the NTR domain fulfill diverse biological roles ranging from axon guidance and regulation of Wnt signalling to the control of the activity of metalloproteinases. The NTR domain can be found associated with other domains such as CUB, WAP, Kazal, Kunitz, Ig-like, laminin N-terminal, laminin-type EGF or frizzled. The NTR domain is implicated in inhibition of zinc metalloproteinases of the metzincin family. The NTR module is a basic domain containing six conserved cysteines, which are likely to form internal disulphide bonds, and several conserved blocks of hydrophobic residues (including a YLLLG-like motif). The NTR module consists of a β-barrel with two terminal α-helices packed side by side against the face of the β-barrel.
Protein Domain
Name: PTEN, phosphatase domain
Type: Domain
Description: This entry represents the phosphatase domain found in PTEN, a magnesium-dependent bifunctional phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase/dual-specificity protein phosphatase that possesses the following catalytic activities: (1) phosphatidylinositol 3,4,5-trisphosphate + H(2)O = phosphatidylinositol 4,5-bisphosphate + phosphate; (2) a phosphoprotein + H(2)O = a protein + phosphate; (3) protein tyrosine phosphate + H(2)O = protein tyrosine + phosphate. This protein acts as a dual-specificity protein phosphatase, dephosphorylating tyrosine-, serine- and threonine-phosphorylated proteins. It also acts as a lipid phosphatase, removing the phosphate in the D3 position of the inositol ring from phosphatidylinositol 3,4,5-trisphosphate, phosphatidylinositol 3,4-diphosphate, phosphatidylinositol 3-phosphate and inositol 1,3,4,5-tetrakisphosphate. The lipid phosphatase activity is critical for its tumour suppressor function. PTEN antagonizes the PI3K-AKT/PKB signalling pathway by dephosphorylating phosphoinositides and thereby modulating cell cycle progression and cell survival. The unphosphorylated form of PTEN cooperates with AIP1 to suppress AKT1 activation. PTEN dephosphorylates tyrosine-phosphorylated focal adhesion kinase and inhibits cell migration, integrin-mediated cell spreading and focal adhesion formation.
Protein Domain
Name: Transcription factor STAT
Type: Family
Description: The STAT protein (Signal Transducers and Activators of Transcription) family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors; hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signalling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation. Signalling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at the specific receptor tyrosine residues, which then serve as docking sites for STATs and other signalling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs, on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerise, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers. The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids. The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggests that this family arose from a single primordial gene. STATs share 6 structurally and functionally conserved domains: an N-terminal domain (ND) that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain (CCD) that is implicated in protein-protein interactions; a DNA-binding domain (DBD) with an immunoglobulin-like fold similar to p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain that acts as a phosphorylation-dependent switch to control receptor recognition and DNA-binding; and a C-terminal transactivation domain. The crystal structure of the N terminus of Stat4 reveals a dimer. The interface of this dimer is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerisation promotes cooperativity of binding to tandem GAS elements and with the transcriptional coactivator CBP/p300.
Protein Domain
Name: Retinol binding protein/Purpurin
Type: Family
Description: Proteins in this family include plasma retinol binding protein (pRBP) and purpurin. They mediate retinol transport in blood plasma. Binding to RBP allows the hydrophobic vitamin to circulate in blood, but retinol dissociates from the protein prior to entering target cells. Retinol circulates in the plasma without significant loss because it is bound to pRBP. This protein is synthesized primarily in the liver, where it requires the binding of retinol to trigger its secretion. In mammals, pRBP binds to transthyretin in the plasma, preventing the loss of pRBP through glomerular filtration. It has been suggested that a cell surface receptor would recognise the retinol-pRBP-transthyretin complex and, in that way, retinol would be delivered into the cell. However, such a protein remains to be identified. In fish, no transthyretin homologue has been found, though fish pRBP is capable of binding mammalian transthyretin. In Cyprinus carpio, pRBP is N-glycosylated and it has been suggested that pRBP filtration through kidney glomeruli may be reduced by a glycosylation-dependent increase in the molecular size and negative charge of the protein, since kidney filtration of anionic proteins is less than half that of neutral proteins of the same size. The role of pRBP in retinol transport enables it to fulfill a number of physiological functions: 1) it facilitates the transfer of insoluble retinol between tissues, principally transport from storage sites in the liver to peripheral tissues; 2) it protects bound retinol from oxidation and tissues from the indiscriminate distribution of such a biologically active molecule; 3) its synthesis regulates retinol release from the liver and mediates the specificity of its uptake by target cells; and 4) it is believed to have an important role in the transfer of retinol from maternal circulation to the developing fetus in mammals. Purpurin is a constituent of adherons, high molecular weight glycoprotein complexes that are released into the growth medium of cultured cells. Adherons mediate the adhesive interactions of many cell types, including those of embryonic chick neural retina. Purpurin promotes cell-adheron adhesion by interacting with a cell surface heparan sulphate proteoglycan. It also prolongs the survival of cultured neural retina cells. It is located almost exclusively in the neural cells of the retina, and it is synthesised in photoreceptor cells before incorporation into the extracellular matrix. The role of purpurin as a trophic factor, mediating both cell adhesion and survival, seems clear, but it may also have a subsidiary role as a minor retinol transporter in the retina based on its retinol binding capacity. pRBP is also able to stimulate the adhesion of neural retina cells, although the serum protein is less active than purpurin. Structural notes: This subgroup shares a common β-barrel fold with the parent group.
Protein Domain
Name: Cytochrome P450 superfamily
Type: Homologous_superfamily
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B): a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E): NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. Cytochrome P450 has a multihelical structure.
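The CYP naming rule described in this entry (CYP, family number, subfamily letter, protein number) is regular enough to parse mechanically. The following is a minimal Python sketch under stated assumptions, not an official parser: `CypName`, `parse_cyp` and `relationship` are hypothetical names, names without a subfamily letter are simply rejected, and the identity thresholds are the general guidelines quoted in the description rather than hard rules.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    family: int      # number after "CYP"
    subfamily: str   # letter(s) after the family number
    protein: int     # final number


# CYP + family number + subfamily letter(s) + protein number, e.g. CYP3A4.
_CYP_PATTERN = re.compile(r"CYP(\d+)([A-Z]+)(\d+)")


def parse_cyp(name: str) -> Optional[CypName]:
    """Split a name such as 'CYP3A4' into its parts; None if it does not fit."""
    match = _CYP_PATTERN.fullmatch(name.strip().upper())
    if match is None:
        return None
    return CypName(int(match.group(1)), match.group(2), int(match.group(3)))


def relationship(name_a: str, name_b: str) -> str:
    """Describe how two CYP names relate under the naming guideline."""
    a, b = parse_cyp(name_a), parse_cyp(name_b)
    if a is None or b is None:
        raise ValueError("both names must follow the CYP<family><subfamily><protein> form")
    if a.family != b.family:
        return "different families"
    if a.subfamily != b.subfamily:
        return "same family (>40% identity expected)"
    return "same subfamily (>55% identity expected)"


if __name__ == "__main__":
    print(parse_cyp("CYP3A4"))
    print(relationship("CYP3A4", "CYP3A5"))
    print(relationship("CYP2D6", "CYP2C9"))
```

The function only reads the name; it does not measure sequence identity, so its output is an expectation derived from the nomenclature, not a computed result.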
Protein Domain
Name: STAT transcription factor, DNA-binding
Type: Domain
Description: The STAT protein (Signal Transducers and Activators of Transcription) family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors, hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus [ ]. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocated to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a and Stat5b, Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signalling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation. Signalling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at the specific receptor tyrosine residues, which then serve as docking sites for STATs and other signalling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs, on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerise, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers.The seven STAT proteins identified in mammals range in size from 750 and 850 amino acids. 
The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggest that this family arose from a single primordial gene. STATs share 6 structurally and functionally conserved domains including: an N-terminal domain (ND) that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain (CCD) that is implicated in protein-protein interactions; a DNA-binding domain (DBD) with an immunoglobulin-like fold similar to p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain ( ) that acts as a phosphorylation-dependent switch to control receptor recognition and DNA-binding; and a C-terminal transactivation domain [ , , ]. The crystal structure of the N terminus of Stat4 reveals a dimer. The interface of this dimer is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerisation promotes cooperativity of binding to tandem GAS elements and with the transcriptional coactivator CBP/p300. This entry represents the DNA-binding domain, which has an immunoglobulin-like structural fold.
Protein Domain
Name: MoaA/NifB/PqqE, iron-sulphur binding, conserved site
Type: Conserved_site
Description: A number of proteins involved in the biosynthesis of metallo cofactors have been shown [, ] to be evolutionarily related. These include: Bacterial and archaebacterial protein moaA, which is involved in the biosynthesis of the molybdenum cofactor (molybdopterin; MPT). Arabidopsis thaliana (Mouse-ear cress) cnx2, a protein involved in molybdopterin biosynthesis and which is highly similar to moaA. Bacillus subtilis narA, which seems to be the moaA ortholog in that bacterium. Bacterial protein nifB (or fixZ) which is involved in the biosynthesis of the nitrogenase iron-molybdenum cofactor. Bacterial protein pqqE which is involved in the biosynthesis of the cofactor pyrrolo-quinoline-quinone (PQQ). Pyrococcus furiosus cmo, a protein involved in the synthesis of a molybdopterin-based tungsten cofactor. Caenorhabditis elegans hypothetical protein F49E2.1. These proteins share, in their N-terminal region, a conserved domain that contains three cysteines. In moaA, these cysteines have been shown to be important for biological activity by binding a [4Fe-4S] cluster []. The three cysteines each coordinate one Fe, while S-adenosylmethionine is the fourth ligand to the cluster and binds to its unique Fe as an N/O chelate.
Protein Domain
Name: Ion channel regulatory protein, UNC-93
Type: Family
Description: The proteins in this family are represented by UNC-93 from Caenorhabditis elegans and also include protein unc-93 homologue A (UNC93A), protein unc-93 homologue B1 (UNC93B1), and UNC93-like protein MFSD11 (also called major facilitator superfamily domain-containing protein 11 or protein ET). UNC-93 colocalizes with SUP-10 and SUP-9 within muscle cells. UNC-93 acts as a regulatory subunit of a multi-subunit potassium channel complex that may function in coordinating muscle contraction in C. elegans [ ]. UNC93B1 controls intracellular trafficking and transport of a subset of Toll-like receptors (TLRs), including TLR3, TLR7 and TLR9, from the endoplasmic reticulum to endolysosomes where they can engage pathogen nucleotides and activate signalling cascades []. MFSD11 is ubiquitously expressed in the periphery and the central nervous system of mice, where it is expressed in excitatory and inhibitory mouse brain neurons []. This group of proteins also includes Notoamide biosynthesis cluster protein O' from Aspergillus versicolor, which is involved in the biosynthesis of notoamide, a fungal indole alkaloid that belongs to a family of natural products containing a characteristic bicyclo[2.2.2]diazaoctane core [].
Protein Domain
Name: START domain
Type: Domain
Description: START (StAR-related lipid-transfer) is a lipid-binding domain in StAR, HD-ZIP and signalling proteins [ ]. StAR (Steroidogenic Acute Regulatory protein) is a mitochondrial protein that is synthesised in response to luteinising hormone stimulation []. Expression of the protein in the absence of hormone stimulation is sufficient to induce steroid production, suggesting that this protein is required in the acute regulation of steroidogenesis. Representatives of the START domain family have been shown to bind different ligands such as sterols (StAR protein) and phosphatidylcholine (PC-TP). Ligand binding by the START domain can also regulate the activities of other domains that co-occur with the START domain in multidomain proteins such as Rho-gap, the homeodomain, and the thioesterase domain [, ]. The crystal structure of START domain of human MLN64 shows an alpha/beta fold built around a U-shaped incomplete β-barrel. Most importantly, the interior of the protein encompasses a 26 x 12 x 11 Angstroms hydrophobic tunnel that is apparently large enough to bind a single cholesterol molecule []. The START domain structure revealed an unexpected similarity to that of the birch pollen allergen Bet v 1 and to bacterial polyketide cyclases/aromatases [, ].
Protein Domain
Name: HNS-dependent expression B
Type: Family
Description: HNS (histone-like nucleoid structuring)-dependent expression A (HdeA) protein is a stress response protein found in highly acid resistant bacteria such as Shigella flexneri and Escherichia coli, but which is lacking in mildly acid tolerant bacteria such as Salmonella [ ]. HdeA is one of the most abundant proteins found in the periplasmic space of E. coli, where it is one of a network of proteins that confer an acid resistance phenotype essential for the pathogenesis of enteric bacteria []. HdeA is thought to act as a chaperone, functioning to prevent the aggregation of periplasmic proteins denatured under acidic conditions. The HNS protein, a chromatin-associated protein that influences the gene expression of several environmentally-induced target genes, represses the expression of HdeA. HdeB, which is encoded within the same operon, may form heterodimers with HdeA. HdeA is a single domain α-helical [] protein with an overall fold that is similar to the fold of the N-terminal subdomain of the GluRS anticodon-binding domain. This entry represents acid stress chaperone HdeB [ ].
Protein Domain
Name: HNS-dependent expression A/B
Type: Family
Description: HNS (histone-like nucleoid structuring)-dependent expression A (HdeA) protein is a stress response protein found in highly acid resistant bacteria such as Shigella flexneri and Escherichia coli, but which is lacking in mildly acid tolerant bacteria such as Salmonella [ ]. HdeA is one of the most abundant proteins found in the periplasmic space of E. coli, where it is one of a network of proteins that confer an acid resistance phenotype essential for the pathogenesis of enteric bacteria [ ]. HdeA is thought to act as a chaperone, functioning to prevent the aggregation of periplasmic proteins denatured under acidic conditions. The HNS protein, a chromatin-associated protein that influences the gene expression of several environmentally-induced target genes, represses the expression of HdeA. HdeB, which is encoded within the same operon, may form heterodimers with HdeA. HdeA is a single domain α-helical [] protein with an overall fold that is similar to the fold of the N-terminal subdomain of the GluRS anticodon-binding domain. This entry represents HdeA and HdeB which are also known as acid stress chaperones.
Protein Domain
Name: Alpha-2-macroglobulin RAP, domain 2
Type: Domain
Description: This entry is the domain 2 (D2) of receptor-associated protein, RAP, also known as alpha-2-macroglobulin receptor-associated protein. RAP is a three domain ER (endoplasmic reticulum)-resident protein that is a chaperone for the LRP (low-density lipoprotein receptor-related protein). RAP is an antagonist and a specialized chaperone that binds tightly to members of the low-density lipoprotein (LDL) receptor family and prevents them from associating with other ligands [ ]. D2, along with RAP domain 1 (D1), is essential for blocking low-density lipoprotein receptor-related protein (LRP) from binding certain ligands, such as alpha2-macroglobulin; D1 and D2 each bind LRP weakly but the tandem D1D2 binds much more tightly to the second and the fourth ligand-binding clusters present on LRP, suggesting avidity effects arising from amino acid residues contributed from each domain [ ]. Also, RAP has regions that interact weakly with heparin, one located in D2 and two located in D3. The double module of complement type repeats, CR56, of LRP binds many ligands including alpha2-macroglobulin, which promotes the catabolism of the Abeta-peptide implicated in Alzheimer's disease [].
Protein Domain
Name: HNS-dependent expression A/B superfamily
Type: Homologous_superfamily
Description: HNS (histone-like nucleoid structuring)-dependent expression A (HdeA) protein is a stress response protein found in highly acid resistant bacteria such as Shigella flexneri and Escherichia coli, but which is lacking in mildly acid tolerant bacteria such as Salmonella [ ]. HdeA is one of the most abundant proteins found in the periplasmic space of E. coli, where it is one of a network of proteins that confer an acid resistance phenotype essential for the pathogenesis of enteric bacteria []. HdeA is thought to act as a chaperone, functioning to prevent the aggregation of periplasmic proteins denatured under acidic conditions. The HNS protein, a chromatin-associated protein that influences the gene expression of several environmentally-induced target genes, represses the expression of HdeA. HdeB, which is encoded within the same operon, may form heterodimers with HdeA. HdeA is a single domain α-helical [] protein with an overall fold that is similar to the fold of the N-terminal subdomain of the GluRS anticodon-binding domain. This superfamily represents HdeA and HdeB which are also known as acid stress chaperones.
Protein Domain
Name: Synuclein
Type: Family
Description: Synucleins are small, soluble proteins expressed primarily in neural tissue and in certain tumours [ , ]. The family includes three known proteins: alpha-synuclein, beta-synuclein, and gamma-synuclein. All synucleins have in common a highly conserved α-helical lipid-binding motif with similarity to the class-A2 lipid-binding domains of the exchangeable apolipoproteins []. Synuclein family members are not found outside vertebrates, although they have some conserved structural similarity with plant 'late-embryo-abundant' proteins. The alpha- and beta-synuclein proteins are found primarily in brain tissue, where they are seen mainly in presynaptic terminals [ , ]. The gamma-synuclein protein is found primarily in the peripheral nervous system and retina, but its expression in breast tumors is a marker for tumor progression []. Normal cellular functions have not been determined for any of the synuclein proteins, although some data suggest a role in the regulation of membrane stability and/or turnover. Mutations in alpha-synuclein are associated with rare familial cases of early-onset Parkinson's disease, and the protein accumulates abnormally in Parkinson's disease, Alzheimer's disease, and several other neurodegenerative illnesses [].
Protein Domain
Name: Chromogranin A/B/C
Type: Family
Description: Granins (chromogranins or secretogranins) [ ] are a family of acidic proteins present in the secretory granules of a wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known but they seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of peptide hormones and neuropeptides. Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins, such as: Chromogranin A (CGA): CGA is a protein of about 420 residues; it is the precursor of the peptide pancreastatin which strongly inhibits glucose-induced insulin release from the pancreas. Secretogranin 1 (chromogranin B): A sulfated protein of about 600 residues. Secretogranin 2 (chromogranin C): A sulfated protein of about 650 residues. Chromogranins and secretogranins together share a C-terminal motif, whereas chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine residues involved in a disulphide bond.
Protein Domain
Name: Heterogeneous nuclear ribonucleoprotein Q, RNA recognition motif 1
Type: Domain
Description: This entry represents the RNA recognition motif 1 (RRM1) of heterogeneous nuclear ribonucleoprotein Q (hnRNP Q). hnRNP Q (also known as NSAP1 or GRY-RBP) is an RNA-binding protein found in the spliceosome complex and the apobec-1 editosome [ , ]. It acts as an interaction partner of a multifunctional protein required for viral replication, and is implicated in the regulation of specific mRNA transport []. hnRNP Q has also been identified as SYNCRIP (synaptotagmin-binding, cytoplasmic RNA-interacting protein), a dual functional protein participating in both viral RNA replication and translation [, ]. As a synaptotagmin-binding protein, hnRNP Q plays a putative role in organelle-based mRNA transport along the cytoskeleton []. Moreover, hnRNP Q has been found in protein complexes involved in translationally coupled mRNA turnover and mRNA splicing. It functions as a wild-type survival motor neuron (SMN)-binding protein that may participate in pre-mRNA splicing and modulate mRNA transport along microtubules []. hnRNP Q contains an acidic auxiliary N-terminal region, followed by two well-defined and one degenerated RNA recognition motifs (RRMs), and a C-terminal RGG motif; hnRNP Q binds RNA through its RRM domains [ ].
Protein Domain
Name: Heterogeneous nuclear ribonucleoprotein Q, RNA recognition motif 2
Type: Domain
Description: This entry represents the RNA recognition motif 2 (RRM2) of heterogeneous nuclear ribonucleoprotein Q (hnRNP Q). hnRNP Q (also known as NSAP1 or GRY-RBP) is an RNA-binding protein found in the spliceosome complex and the apobec-1 editosome [ , ]. It acts as an interaction partner of a multifunctional protein required for viral replication, and is implicated in the regulation of specific mRNA transport []. hnRNP Q has also been identified as SYNCRIP (synaptotagmin-binding, cytoplasmic RNA-interacting protein), a dual functional protein participating in both viral RNA replication and translation [, ]. As a synaptotagmin-binding protein, hnRNP Q plays a putative role in organelle-based mRNA transport along the cytoskeleton []. Moreover, hnRNP Q has been found in protein complexes involved in translationally coupled mRNA turnover and mRNA splicing. It functions as a wild-type survival motor neuron (SMN)-binding protein that may participate in pre-mRNA splicing and modulate mRNA transport along microtubules [ ]. hnRNP Q contains an acidic auxiliary N-terminal region, followed by two well-defined and one degenerated RNA recognition motifs (RRMs), and a C-terminal RGG motif; hnRNP Q binds RNA through its RRM domains [ ].
Protein Domain
Name: Small ubiquitin-related modifier 1, Ubl domain
Type: Domain
Description: This entry represents the ubiquitin-like (Ubl) domain found in small ubiquitin-related modifier 1 (SUMO1) and similar proteins from animals. SUMO1 is a binding partner of the RAD51/52 nucleoprotein filament proteins, which mediate DNA strand exchange [ , , ]. Its conjugation to cellular proteins has been implicated in multiple important cellular processes, such as nuclear transport, cell cycle control, oncogenesis, inflammation, and the response to virus infection [, , , ]. SUMO resembles ubiquitin (Ub) in structure, binding to other proteins and the mechanism of ligation. Ubiquitin (Ub) is a protein modifier in eukaryotes that is involved in various cellular processes including transcriptional regulation, cell cycle control, and DNA repair. Ubiquitination proceeds through a cascade of E1, E2 and E3 enzymes that results in a covalent bond between the C terminus of Ub and the ε-amino group of a substrate lysine. SUMOs, like Ub, are covalently conjugated to lysine residues in a wide variety of target proteins in eukaryotic cells and regulate numerous cellular processes, such as transcription, epigenetic gene control, genomic instability, and protein degradation [, , ].
Protein Domain
Name: CC chemokine, conserved site
Type: Conserved_site
Description: Many low-molecular weight factors secreted by cells including fibroblasts, macrophages and endothelial cells, in response to a variety of stimuli such as growth factors, interferons, viral transformation and bacterial products, are structurally related [, , ]. Most members of this family of proteins seem to have mitogenic, chemotactic or inflammatory activities. These small cytokines are also called intercrines or chemokines. They are cationic proteins of 70 to 100 amino acid residues that share four conserved cysteine residues involved in two disulphide bonds. These proteins can be sorted into two groups based on the spacing of the two amino-terminal cysteines. In the first group, the two cysteines are separated by a single residue (C-x-C), while in the second group, they are adjacent (C-C). The 'C-C' group is currently known to include monocyte chemotactic proteins 1 (MCP-1), 2 (MCP-2), 3 and 4; macrophage inflammatory protein 1 alpha (MIP-1-alpha), beta (MIP-1-beta) and gamma (MIP-1-gamma); macrophage inflammatory protein 3 alpha and beta, 4 and 5; LD78 beta; SIS-epsilon (p500); thymus and activation-regulated chemokine (TARC); Eotaxin; I-309; human proteins HCC-1/NCC-2 and HCC-3; and mouse protein C10.
Protein Domain
Name: HNS-dependent expression A superfamily
Type: Homologous_superfamily
Description: HNS (histone-like nucleoid structuring)-dependent expression A (HdeA) protein is a stress response protein found in highly acid resistant bacteria such as Shigella flexneri and Escherichia coli, but which is lacking in mildly acid tolerant bacteria such as Salmonella [ ]. HdeA is one of the most abundant proteins found in the periplasmic space of E. coli, where it is one of a network of proteins that confer an acid resistance phenotype essential for the pathogenesis of enteric bacteria []. HdeA is thought to act as a chaperone, functioning to prevent the aggregation of periplasmic proteins denatured under acidic conditions. The HNS protein, a chromatin-associated protein that influences the gene expression of several environmentally-induced target genes, represses the expression of HdeA. HdeB, which is encoded within the same operon, may form heterodimers with HdeA. HdeA is a single domain α-helical [] protein with an overall fold that is similar to the fold of the N-terminal subdomain of the GluRS anticodon-binding domain. This entry represents the acid stress chaperone HdeA superfamily.
Protein Domain
Name: Claudin-5
Type: Family
Description: Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands is affected by cell type [ , , ]. They can also form heteropolymers with each other within and between tight junction strands []. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with other PDZ domain proteins, such as the scaffolding proteins ZO-1, -2 and -3 []. They also interact with non-tight junction proteins, such as cell adhesion proteins EpCam and tetraspanins and the signaling proteins, ephrin A and B and their receptors, EphA and EphB []. Claudin-5 was originally termed lung-specific membrane protein, brain endothelial cell clone 1 protein (BEC1), and transmembrane protein deleted in velo-cardio-facial syndrome (TMVCF). It was reclassified as claudin-5 on the basis of cDNA sequence similarity with claudins-1 and -2, and antibody studies that showed it to be expressed at tight junctions []. Claudin-5 may play an important role in development, since the gene is frequently deleted in velo-cardio-facial/DiGeorge syndrome patients [].
Protein Domain
Name: CASK, SH3 domain
Type: Domain
Description: CASK is a scaffolding protein that is highly expressed in the mammalian nervous system and plays roles in synaptic protein targeting, neural development, and gene expression regulation. CASK interacts with many different binding partners including parkin, neurexin, syndecans, calcium channel proteins, caskin, among others, to perform specific functions in different subcellular locations [ ]. Disruption of the CASK gene in mice results in neonatal lethality [] while mutations in the human gene have been associated with X-linked mental retardation []. Drosophila CASK is associated with both pre- and postsynaptic membranes and is crucial in synaptic transmission and vesicle cycling []. CASK contains an N-terminal calmodulin-dependent kinase (CaMK)-like domain, two L27 domains, followed by the core of three domains characteristic of MAGUK (membrane-associated guanylate kinase) proteins: PDZ, SH3, and guanylate kinase (GuK) [ ]. In addition, it also contains the Hook (Protein 4.1 Binding) motif in between the SH3 and GuK domains. The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain [].
Protein Domain
Name: HR1 rho-binding domain
Type: Domain
Description: HR1 was first described as a three times repeated homology region of the N-terminal non-catalytic part of protein kinase PRK1 (PKN) [ ]. The first two of these repeats were later shown to bind the small G protein rho [, ] known to activate PKN in its GTP-bound form. Similar rho-binding domains also occur in a number of other protein kinases and in the rho-binding proteins rhophilin and rhotekin. Recently, the structure of the N-terminal HR1 repeat complexed with RhoA has been determined by X-ray crystallography. This domain contains two long alpha helices forming a left-handed antiparallel coiled-coil fold termed the antiparallel coiled-coil (ACC) finger domain. The two long helices encompass the basic region and the leucine repeat region, which are identified as the Rho-binding region [, , ]. This entry also includes Transducer of Cdc42-dependent actin assembly protein (TOCA) family proteins, which contain a central HR1 (also known as Rho effector motif class 1, REM-1) and are closely related to Cdc42-interacting protein 4 (CIP4), effectors of the Rho family small G protein Cdc42 [ ].
Protein Domain
Name: Arrestin C-terminal-like domain
Type: Domain
Description: Arrestins comprise a family of closely-related proteins. In addition to the inactivation of G protein-coupled receptors, arrestins have been implicated in the endocytosis of receptors and cross talk with other signalling pathways. S-Arrestin (retinal S-antigen) is a major protein of the retinal rod outer segments. It interacts with photo-activated phosphorylated rhodopsin, inhibiting or 'arresting' its ability to interact with transducin []. Other family members include beta-arrestin-1 and -2, which regulate the function of beta-adrenergic receptors by binding to their phosphorylated forms, impairing their capacity to activate G(S) proteins; cone photoreceptor C-arrestin (arrestin-X) [], which could bind to phosphorylated red/green opsins; and Drosophila phosrestins I and II, which undergo light-induced phosphorylation, and probably play a role in photoreceptor transduction [, , ]. The crystal structure of bovine retinal arrestin comprises two domains of antiparallel β-sheets connected through a hinge region and one short α-helix on the back of the amino-terminal fold []. This C-terminal domain consists of an immunoglobulin-like β-sandwich structure. This domain is found in arrestins and in other proteins including arrestin domain-containing proteins, protein ROD1 [ ] and ROG3 [] and thioredoxin-interacting protein [].
Protein Domain
Name: SH3PXD2, PX domain
Type: Domain
Description: This entry represents the PX domain of SH3PXD2A and SH3PXD2B. The PX domain is involved in targeting of proteins to phosphoinositide-enriched membranes, and may also be involved in protein-protein interaction [ ]. SH3PXD2A, also known as Fish or Tks5, is a scaffold protein with five SH3 domains and one PX domain. It is required for podosome formation, for degradation of the extracellular matrix, and for invasion of some cancer cells [ , ]. SH3PXD2A binding partners include ADAM-family metalloproteases [] and Nck adaptor proteins []. SH3 and PX domain-containing protein 2B (SH3PXD2B, also known as Fad49 or Tks4) is an adaptor protein required for functional podosome formation [ ]. It binds matrix metalloproteinases (ADAMs), NADPH oxidases (NOXs) and phosphoinositides []. It regulates epidermal growth factor-dependent cell migration [] and has been linked to cancer []. Mutations in the SH3PXD2B gene cause Frank-ter Haar syndrome, a rare disease characterised by abnormalities that affect bone, heart, and eye development []. SH3PXD2B contains an N-terminal Phox homology (PX) domain and four SH3 domains.
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom