Search results 11801 to 11900 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: CRISPR-associated helicase, CYANO-type
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a family of Cas proteins that are strictly associated with the Cyano subtype of CRISPR/Cas locus and found in several species of Cyanobacteria and several archaeal species. 
These proteins contain helicase motifs and appear to represent the Cas3 protein of the Cyano subtype of CRISPR/Cas system.
Protein Domain
Name: MHC class I-like antigen recognition-like
Type: Domain
Description: Class I MHC glycoproteins are expressed on the surface of all somatic nucleated cells, with the exception of neurons. MHC class I receptors present peptide antigens that are synthesised in the cytoplasm, which include self-peptides (presented for self-tolerance) as well as foreign peptides (such as viral proteins). These antigens are generated from degraded protein fragments that are transported to the endoplasmic reticulum by TAP proteins (transporter of antigenic peptides), where they can bind MHC I molecules, before being transported to the cell surface via the Golgi apparatus. MHC class I receptors display antigens for recognition by cytotoxic T cells, which have the ability to destroy virus-infected or malignant (surfeit of self-peptides) cells. MHC class I molecules are composed of two chains: an MHC alpha chain (heavy chain) and a beta-2 microglobulin chain (light chain), where only the alpha chain spans the membrane. The alpha chain has three extracellular domains (alpha-1 to alpha-3, with alpha-1 at the N terminus), a transmembrane region and a C-terminal cytoplasmic tail. The soluble extracellular beta-2 microglobulin chain associates primarily with the alpha-3 domain and is necessary for MHC stability. The alpha-1 and alpha-2 domains of the alpha chain are referred to as the recognition region, because the peptide antigen binds in a deep groove between these two domains. 
This entry represents MHC antigen-recognition-like domains from: MHC class I, alpha-1 and alpha-2 domains; MHC class I homologue gammadelta T-cell ligand; MHC class I related Ulbp3; MHC class I related Fc (IgG) receptor, alpha-1 and alpha-2 domains; MHC class I related CD1, alpha-1 and alpha-2 domains; MHC class I related zinc-alpha-2-glycoprotein ZAG (fat depleting factor); immunomodulatory protein m144, alpha-1 and alpha-2 domains; haemochromatosis protein Hfe, alpha-1 and alpha-2 domains; endothelial protein C receptor (phospholipid-binding protein); and NK cell ligand RAE-1. RAE-1 proteins (alpha, beta, delta, and gamma) are distant major histocompatibility complex (MHC) class I homologues that comprise isolated alpha-1 and alpha-2 domains and lack alpha-3 domains.
Protein Domain
Name: MHC class I-like antigen recognition-like superfamily
Type: Homologous_superfamily
Description: Class I MHC glycoproteins are expressed on the surface of all somatic nucleated cells, with the exception of neurons. MHC class I receptors present peptide antigens that are synthesised in the cytoplasm, which include self-peptides (presented for self-tolerance) as well as foreign peptides (such as viral proteins). These antigens are generated from degraded protein fragments that are transported to the endoplasmic reticulum by TAP proteins (transporter of antigenic peptides), where they can bind MHC I molecules, before being transported to the cell surface via the Golgi apparatus. MHC class I receptors display antigens for recognition by cytotoxic T cells, which have the ability to destroy virus-infected or malignant (surfeit of self-peptides) cells. MHC class I molecules are composed of two chains: an MHC alpha chain (heavy chain) and a beta-2 microglobulin chain (light chain), where only the alpha chain spans the membrane. The alpha chain has three extracellular domains (alpha-1 to alpha-3, with alpha-1 at the N terminus), a transmembrane region and a C-terminal cytoplasmic tail. The soluble extracellular beta-2 microglobulin chain associates primarily with the alpha-3 domain and is necessary for MHC stability. The alpha-1 and alpha-2 domains of the alpha chain are referred to as the recognition region, because the peptide antigen binds in a deep groove between these two domains. 
This entry represents MHC antigen-recognition-like domains from: MHC class I, alpha-1 and alpha-2 domains; MHC class I homologue gammadelta T-cell ligand; MHC class I related Ulbp3; MHC class I related Fc (IgG) receptor, alpha-1 and alpha-2 domains; MHC class I related CD1, alpha-1 and alpha-2 domains; MHC class I related zinc-alpha-2-glycoprotein ZAG (fat depleting factor); immunomodulatory protein m144, alpha-1 and alpha-2 domains; haemochromatosis protein Hfe, alpha-1 and alpha-2 domains; endothelial protein C receptor (phospholipid-binding protein); and NK cell ligand RAE-1. RAE-1 proteins (alpha, beta, delta, and gamma) are distant major histocompatibility complex (MHC) class I homologues that comprise isolated alpha-1 and alpha-2 domains and lack alpha-3 domains.
Protein Domain
Name: Matrix/matrix long, N-terminal
Type: Domain
Description: This entry represents the N-terminal domain of family members such as the Matrix (Mx) and Matrix protein long (ML) proteins. They are found in Thogoto virus (THOV), a tick-transmitted orthomyxovirus with a genome consisting of six single-stranded RNA segments that encode seven structural proteins. Matrix proteins of the family Orthomyxoviridae are major structural components of the viral capsid; they are located below the viral lipid membrane and provide protection for viral ribonucleoproteins (vRNPs). They serve as major participants in virus invasion and budding. Furthermore, they play specific roles throughout the viral life cycle, usually by interacting with other viral components or host cellular proteins. ML protein, an extended version of the viral M protein, is a viral IFN antagonist. ML is essential for virus growth and pathogenesis in an IFN-competent host. In the presence of ML, the activation and/or action of interferon regulatory factor-3 (IRF-3) is severely affected. This effect depends on direct interaction of ML with transcription factor IIB (TFIIB). ML suppresses IRF-7 in a similar manner to IRF-3. Studies have revealed that ML associates with IRF-7 and prevents IRF-7 dimerization and interaction with TRAF6. Structural analysis revealed that the N-terminal fragment of M protein (MN) undergoes conformational changes that result in specific, pH-dependent inter-molecular interactions. Comparison of the THOV MN and influenza A virus (IAV) MN regions showed low sequence identity. However, superimposition of the two structures under neutral conditions showed that both matrix proteins contain nine helices connected with the same topology. Since the matrix layer of IAV disassembles in the acidic endosome at the beginning of infection and repacks in the neutral cytoplasm, a change of pH might be a key regulator of the capsid assembly/disassembly transition during these processes. 
Hence, a pH-dependent conformational transition model was studied in THOV MN, where interactions such as hydrogen bonds and hydrophobic interactions are suggested to be involved in THOV matrix assembly.
Protein Domain
Name: Zinc finger, TRAF-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents TRAF-type zinc finger domains. Some of the proteins that have this domain are mammalian signal transducers associated with the cytoplasmic domain of the 75 kDa tumour necrosis factor receptor. A heterocomplex, homodimer or heterodimer of TRAF1 and TRAF2 binds to the N terminus of the inhibitor of apoptosis proteins 1 and 2 (IAPs) and recruits them to tumour necrosis factor receptor 2. 
Other proteins containing this domain include F45G2.6 protein from Caenorhabditis elegans and DG17 protein from Dictyostelium discoideum (Slime mold).
Protein Domain
Name: Disintegrin domain
Type: Domain
Description: Disintegrins are a family of small proteins from viper venoms that function as potent inhibitors of both platelet aggregation and integrin-dependent cell adhesion. Integrin receptors are involved in cell-cell and cell-extracellular matrix interactions, serving as the final common pathway leading to aggregation via formation of platelet-platelet bridges, which are essential in thrombosis and haemostasis. Disintegrins contain an RGD (Arg-Gly-Asp) or KGD (Lys-Gly-Asp) sequence motif that binds specifically to integrin IIb-IIIa receptors on the platelet surface, thereby blocking the binding of fibrinogen to the receptor-glycoprotein complex of activated platelets. Disintegrins act as receptor antagonists, inhibiting aggregation induced by ADP, thrombin, platelet-activating factor and collagen. The role of disintegrin in preventing blood coagulation renders it of medical interest, particularly with regard to its use as an anti-coagulant. Disintegrins from different snake species have been characterised: albolabrin, applagin, barbourin, batroxostatin, bitistatin, obtustatin, schistatin, echistatin, elegantin, eristicophin, flavoridin, halysin, kistrin, tergeminin, salmosin and triflavin. Disintegrin-like proteins are found in various species ranging from slime mold to humans. Several other proteins are known to contain a disintegrin domain. Some snake venom zinc metalloproteinases consist of an N-terminal catalytic domain fused to a disintegrin domain. Such is the case for trimerelysin I (HR1B), atrolysin-e (Ht-e) and trigramin. It has been suggested that these proteinases are able to cleave themselves from the disintegrin domains and that the latter may arise from such post-translational processing. The beta subunit of guinea pig sperm surface protein PH30, a protein involved in sperm-egg fusion, contains a disintegrin at its N-terminal extremity. Mammalian epididymal apical protein 1 (EAP I) also contains a disintegrin domain. 
EAP I is associated with the sperm membrane and may play a role in sperm maturation. Structurally, EAP I consists of an N-terminal domain, followed by a zinc metalloproteinase domain, a disintegrin domain, and a large C-terminal domain that contains a transmembrane region.
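As a rough illustration of the RGD/KGD motif mentioned in this entry, the Python sketch below scans a peptide sequence for either tripeptide. The function name and the example peptide are hypothetical and made up for illustration; a motif hit on its own does not establish that a protein is a disintegrin.

```python
import re

# Matches either integrin-binding tripeptide named in the description: RGD or KGD.
# The lookahead means overlapping hits would also be reported.
MOTIF = re.compile(r"(?=([RK]GD))")


def find_integrin_motifs(sequence):
    """Return (1-based position, motif) pairs for every RGD/KGD in `sequence`."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq)]


# Hypothetical peptide, not a real disintegrin sequence.
example = "ACRGDMPKGDCA"
print(find_integrin_motifs(example))  # [(3, 'RGD'), (8, 'KGD')]
```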
Protein Domain
Name: Intermediate filament protein, conserved site
Type: Conserved_site
Description: Intermediate filaments (IF) are proteins which are primordial components of the cytoskeleton and the nuclear envelope. They generally form filamentous structures 8 to 14 nm wide. IF proteins are members of a very large multigene family of proteins, which has been subdivided into six major subgroups: Type I: Acidic cytokeratins. Type II: Basic cytokeratins. Type III: Vimentin, desmin, glial fibrillary acidic protein (GFAP), peripherin, and plasticin. Type IV: Neurofilaments L, H and M, alpha-internexin and nestin. Type V: Nuclear lamins A, B1, B2 and C. Type VI: 'Orphan' IF proteins, which are more distant in terms of their amino acid sequences. All IF proteins are structurally similar in that they consist of: a central rod domain comprising some 300 to 350 residues which is arranged in coiled-coil α-helices, with at least two short characteristic interruptions; an N-terminal non-helical domain (head) of variable length; and a C-terminal domain (tail) which is also non-helical, and which shows extreme length variation between different IF proteins. While IF proteins are evolutionarily and structurally related, they have limited sequence homology except in several regions of the rod domain. The IF rod domain is approximately 310 residues long in all cytoplasmic IF proteins and close to 350 residues in the nuclear ones. The IF rod domain exhibits an interrupted α-helical conformation and reveals a pronounced seven-residue periodicity in the distribution of apolar residues. The heptad periodicity within the rod domain is interrupted in several places, which generates four consecutive α-helical segments: 1A and 1B, which together form the so-called coil 1, and 2A and 2B, which form coil 2. 
The four α-helical segments are interconnected by relatively short, variable linkers L1, L12 and L2. IF proteins have a very strong tendency to dimerize via the formation of an α-helical coiled coil (CC) by their rod domains. This entry represents a conserved region situated at the very end of the 2B segment, which is critically involved in specific dimer-dimer interactions within the mature filament.
Protein Domain
Name: Disintegrin, conserved site
Type: Conserved_site
Description: Disintegrins are a family of small proteins from viper venoms that function as potent inhibitors of both platelet aggregation and integrin-dependent cell adhesion. Integrin receptors are involved in cell-cell and cell-extracellular matrix interactions, serving as the final common pathway leading to aggregation via formation of platelet-platelet bridges, which are essential in thrombosis and haemostasis. Disintegrins contain an RGD (Arg-Gly-Asp) or KGD (Lys-Gly-Asp) sequence motif that binds specifically to integrin IIb-IIIa receptors on the platelet surface, thereby blocking the binding of fibrinogen to the receptor-glycoprotein complex of activated platelets. Disintegrins act as receptor antagonists, inhibiting aggregation induced by ADP, thrombin, platelet-activating factor and collagen. The role of disintegrin in preventing blood coagulation renders it of medical interest, particularly with regard to its use as an anti-coagulant. Disintegrins from different snake species have been characterised: albolabrin, applagin, barbourin, batroxostatin, bitistatin, obtustatin, schistatin, echistatin, elegantin, eristicophin, flavoridin, halysin, kistrin, tergeminin, salmosin and triflavin. Disintegrin-like proteins are found in various species ranging from slime mold to humans. Several other proteins are known to contain a disintegrin domain. Some snake venom zinc metalloproteinases consist of an N-terminal catalytic domain fused to a disintegrin domain. Such is the case for trimerelysin I (HR1B), atrolysin-e (Ht-e) and trigramin. It has been suggested that these proteinases are able to cleave themselves from the disintegrin domains and that the latter may arise from such post-translational processing. The beta subunit of guinea pig sperm surface protein PH30, a protein involved in sperm-egg fusion, contains a disintegrin at its N-terminal extremity. Mammalian epididymal apical protein 1 (EAP I) also contains a disintegrin domain. 
EAP I is associated with the sperm membrane and may play a role in sperm maturation. Structurally, EAP I consists of an N-terminal domain, followed by a zinc metalloproteinase domain, a disintegrin domain, and a large C-terminal domain that contains a transmembrane region.
Protein Domain
Name: Cytochrome P450, B-class
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). 
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B): a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E): NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents class B cytochrome P450 proteins, which are part of 3-component systems in bacteria, mitochondria and certain fungal enzymes.
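The CYP naming convention and the identity thresholds quoted in this entry can be sketched in Python as follows. This is a minimal illustration, not an official parser: the names `parse_cyp`, `CypName` and `grouping_from_identity` are hypothetical, and allowing more than one subfamily letter is an assumption that goes beyond the description, which mentions a single letter.

```python
import re
from typing import NamedTuple, Optional

# CYP + family number + subfamily letter(s) + protein number, e.g. CYP3A4.
# Assumption: one or more subfamily letters are accepted.
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


class CypName(NamedTuple):
    family: int
    subfamily: str
    protein: int


def parse_cyp(name: str) -> Optional[CypName]:
    """Split a CYP name into family, subfamily and protein; None if it does not fit."""
    m = CYP_NAME.match(name.strip().upper())
    if m is None:
        return None
    return CypName(int(m.group(1)), m.group(2), int(m.group(3)))


def grouping_from_identity(identity_percent: float) -> str:
    """Apply the rule-of-thumb thresholds (>40% family, >55% subfamily)."""
    if identity_percent > 55:
        return "same subfamily"
    if identity_percent > 40:
        return "same family"
    return "different family"


print(parse_cyp("CYP3A4"))           # CypName(family=3, subfamily='A', protein=4)
print(grouping_from_identity(47.5))  # same family
```

The thresholds are guidelines ("should share"), so a result from `grouping_from_identity` is a hint rather than a classification.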
Protein Domain
Name: Membrane attack complex component/perforin (MACPF) domain
Type: Domain
Description: The membrane attack complex/perforin (MACPF) domain is conserved in bacteria, fungi, mammals and plants. It was originally identified and named as being common to five complement components (C6, C7, C8-alpha, C8-beta, and C9) and perforin. These molecules perform critical functions in innate and adaptive immunity. The MAC family proteins and perforin are known to participate in lytic pore formation. In response to pathogen infection, a sequential and highly specific interaction between the constituent elements occurs to form transmembrane channels which are known as the membrane-attack complex (MAC). Only a few other MACPF proteins have been characterised and several are thought to form pores for invasion or protection. Examples are proteins from malarial parasites, the cytolytic toxins from sea anemones, and proteins that provide plant immunity. Functionally uncharacterised MACPF proteins are also evident in pathogenic bacteria such as Chlamydia spp and Photorhabdus luminescens (Xenorhabdus luminescens). The MACPF domain is commonly found to be associated with other N- and C-terminal domains, such as TSP1, LDLRA, EGF-like, Sushi/CCP/SCR, FIMAC or C2. They probably control or target MACPF function. The MACPF domain oligomerizes, undergoes conformational change, and is required for lytic activity. The MACPF domain consists of a central kinked four-stranded antiparallel beta sheet surrounded by alpha helices and beta strands, forming two structural segments. Overall, the MACPF domain has a thin L-shaped appearance. MACPF domains exhibit limited sequence similarity but contain a signature [YW]-G-[TS]-H-[FY]-x(6)-G-G motif. Some proteins known to contain a MACPF domain are listed below. 
Vertebrate complement proteins C6 to C9: complement factors C6 to C9 assemble to form a scaffold, the membrane attack complex (MAC), that permits C9 polymerisation into pores that lyse Gram-negative pathogens. Vertebrate perforin: it is delivered by natural killer cells and cytotoxic T lymphocytes and forms oligomeric pores (12 to 18 monomers) in the plasma membrane of either virus-infected or transformed cells. Arabidopsis thaliana (Mouse-ear cress) constitutively activated cell death 1 (CAD1) protein: it is likely to act as a mediator that recognises plant signals for pathogen infection. Arabidopsis thaliana (Mouse-ear cress) necrotic spotted lesions 1 (NSL1) protein. Venomous sea anemone Phyllodiscus semoni (Night anemone) toxins PsTX-60A and PsTX-60B. Venomous sea anemone Actineria villosa (Okinawan sea anemone) toxin AvTX-60A. Plasmodium sporozoite microneme protein essential for cell traversal 2 (SPECT2): it is essential for the membrane-wounding activity of the sporozoite and is involved in its traversal of the sinusoidal cell layer prior to hepatocyte infection. P. luminescens Plu-MACPF: although nonlytic, it was shown to bind to cell membranes. Chlamydial putative uncharacterised protein CT153.
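The signature motif quoted in this entry, [YW]-G-[TS]-H-[FY]-x(6)-G-G, is written in PROSITE-style pattern notation. The sketch below shows one way to translate such a pattern into a Python regular expression. It assumes only the elements that appear in this motif (literal residues, bracketed residue classes, `x` wildcards and `(n)` repeat counts); the function name and the test peptide are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Handles only literal residues, [..] classes, x wildcards and (n) repeats.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.replace("x", ".")               # x = any residue
        element = re.sub(r"\((\d+)\)", r"{\1}", element)  # (6) -> {6}
        parts.append(element)
    return "".join(parts)


macpf_regex = prosite_to_regex("[YW]-G-[TS]-H-[FY]-x(6)-G-G")
print(macpf_regex)  # [YW]G[TS]H[FY].{6}GG

# Hypothetical peptide built to contain the motif; not a real MACPF sequence.
hit = re.search(macpf_regex, "AAYGTHFAAAAAAGGKK")
print(hit.start() + 1 if hit else None)  # 3 (1-based start of the match)
```

Because MACPF domains show limited sequence similarity, a regex hit like this is at best a first filter and not evidence of a MACPF fold.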
Protein Domain
Name: Membrane attack complex component/perforin domain, conserved site
Type: Conserved_site
Description: The membrane attack complex/perforin (MACPF) domain is conserved in bacteria, fungi, mammals and plants. It was originally identified and named as being common to five complement components (C6, C7, C8-alpha, C8-beta, and C9) and perforin. These molecules perform critical functions in innate and adaptive immunity. The MAC family proteins and perforin are known to participate in lytic pore formation. In response to pathogen infection, a sequential and highly specific interaction between the constituent elements occurs to form transmembrane channels which are known as the membrane-attack complex (MAC). Only a few other MACPF proteins have been characterised and several are thought to form pores for invasion or protection. Examples are proteins from malarial parasites, the cytolytic toxins from sea anemones, and proteins that provide plant immunity. Functionally uncharacterised MACPF proteins are also evident in pathogenic bacteria such as Chlamydia spp and Photorhabdus luminescens (Xenorhabdus luminescens). The MACPF domain is commonly found to be associated with other N- and C-terminal domains, such as TSP1, LDLRA, EGF-like, Sushi/CCP/SCR, FIMAC or C2. They probably control or target MACPF function. The MACPF domain oligomerizes, undergoes conformational change, and is required for lytic activity. The MACPF domain consists of a central kinked four-stranded antiparallel beta sheet surrounded by alpha helices and beta strands, forming two structural segments. Overall, the MACPF domain has a thin L-shaped appearance. MACPF domains exhibit limited sequence similarity but contain a signature [YW]-G-[TS]-H-[FY]-x(6)-G-G motif. Some proteins known to contain a MACPF domain are listed below. 
Vertebrate complement proteins C6 to C9: complement factors C6 to C9 assemble to form a scaffold, the membrane attack complex (MAC), that permits C9 polymerisation into pores that lyse Gram-negative pathogens. Vertebrate perforin: it is delivered by natural killer cells and cytotoxic T lymphocytes and forms oligomeric pores (12 to 18 monomers) in the plasma membrane of either virus-infected or transformed cells. Arabidopsis thaliana (Mouse-ear cress) constitutively activated cell death 1 (CAD1) protein: it is likely to act as a mediator that recognises plant signals for pathogen infection. Arabidopsis thaliana (Mouse-ear cress) necrotic spotted lesions 1 (NSL1) protein. Venomous sea anemone Phyllodiscus semoni (Night anemone) toxins PsTX-60A and PsTX-60B. Venomous sea anemone Actineria villosa (Okinawan sea anemone) toxin AvTX-60A. Plasmodium sporozoite microneme protein essential for cell traversal 2 (SPECT2): it is essential for the membrane-wounding activity of the sporozoite and is involved in its traversal of the sinusoidal cell layer prior to hepatocyte infection. P. luminescens Plu-MACPF: although nonlytic, it was shown to bind to cell membranes. Chlamydial putative uncharacterised protein CT153.
Protein Domain
Name: Tify domain
Type: Domain
Description: The tify domain is a 36-amino acid domain only found among Embryophyta (land plants). It has been named after the most conserved amino acid pattern (TIF[F/Y]XG) it contains, but was previously known as the Zim domain. As the use of uppercase characters (TIFY) might imply that the domain is fully conserved across proteins, lowercase lettering has been chosen in an attempt to highlight the reality of its natural variability. Based on the domain architecture, tify domain-containing proteins can be classified into two groups. Group I is formed by proteins possessing a CCT (CONSTANS, CO-like, and TOC1) domain and a GATA-type zinc finger in addition to the tify domain. Group II contains proteins characterised by the tify domain but lacking a GATA-type zinc finger. Tify domain-containing proteins might be involved in developmental processes, and some of them have features that are characteristic of transcription factors: a nuclear localisation and the presence of a putative DNA-binding domain. Some proteins known to contain a tify domain include: Arabidopsis thaliana GATA transcription factors (Zinc-finger protein expressed in Inflorescence Meristem, ZIM), putative transcription factors involved in inflorescence and flower development; A. thaliana ZIM-like proteins (ZML); and A. thaliana Protein TIFY 1-11.
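Because the entry stresses that the TIF[F/Y]XG pattern is not fully conserved, a strict pattern match can miss real variants. The Python sketch below instead counts mismatches against the consensus for every six-residue window and reports the closest one. The function names and the example sequences are hypothetical, and the scoring (one point per mismatching position) is an assumption chosen for simplicity, not a published method.

```python
# Allowed residues at each position of TIF[F/Y]XG; None stands for X (any residue).
CONSENSUS = ["T", "I", "F", "FY", None, "G"]


def mismatches(window):
    """Count positions in a six-residue window that deviate from the consensus."""
    return sum(
        1
        for residue, allowed in zip(window, CONSENSUS)
        if allowed is not None and residue not in allowed
    )


def best_tify_like_window(sequence):
    """Return (1-based start, window, mismatch count) for the closest window."""
    seq = sequence.upper()
    size = len(CONSENSUS)
    scored = [
        (mismatches(seq[i:i + size]), i + 1, seq[i:i + size])
        for i in range(len(seq) - size + 1)
    ]
    if not scored:
        return None  # sequence shorter than the pattern
    count, start, window = min(scored)
    return start, window, count


# Hypothetical peptides: an exact match and a one-residue variant.
print(best_tify_like_window("MSTIFYGGK"))  # (3, 'TIFYGG', 0)
print(best_tify_like_window("MSTLFYGGK"))  # (3, 'TLFYGG', 1)
```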
Protein Domain
Name: DNA repair Rad51/transcription factor NusA, alpha-helical
Type: Homologous_superfamily
Description: This superfamily represents an α-helical bundle domain, which has a SAM domain-like fold. This compact domain consists of a 4-5 helical bundle of two orthogonally packed alpha-hairpins, and contains one classic and one pseudo HhH (helix-hairpin-helix) motif. This domain is found at the N terminus of the DNA repair protein Rad51, at the C terminus of the transcription elongation protein NusA, and at the C terminus of the hypothetical protein AF1548. Human Rad51 protein is a homologue of Escherichia coli RecA protein, and functions in DNA repair and recombination. In higher eukaryotes, Rad51 protein is essential for cell viability. The N-terminal region of Rad51 is highly conserved among eukaryotic Rad51 proteins but is absent from RecA, suggesting a Rad51-specific function for this region. The N-terminal domain is involved in interactions with DNA and proteins; DNA binding may be regulated via phosphorylation within the N-terminal domain. NusA (N utilisation substance A) from E. coli is an essential transcription factor that associates with the RNA polymerase (RNAP) core enzyme, where it modulates transcriptional pausing, termination and anti-termination. The C-terminal region of NusA consists of two repeat units, and is responsible for the interaction of NusA with the C-terminal region of RNAP and for its interaction with protein N from phage lambda during anti-termination.
Protein Domain
Name: Ubiquitin-like domain
Type: Domain
Description: Ubiquitin is a protein of 76 amino acid residues, found in all eukaryotic cells, whose sequence is extremely well conserved from protozoa to vertebrates. Ubiquitin acts through its post-translational attachment (ubiquitinylation) to other proteins, where these modifications alter the function, location or trafficking of the protein, or target it for destruction by the 26S proteasome. Ubiquitin is a globular protein, the last four C-terminal residues (Leu-Arg-Gly-Gly) extending from the compact structure to form a 'tail', important for its function. The latter is mediated by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and the epsilon amino group of lysine residues in the target proteins. Ubiquitin is expressed as three different precursors: a polymeric head-to-tail concatemer of identical units (polyubiquitin), and two N-terminal ubiquitin moieties, UbL40 and UbS27, that are fused to the ribosomal polypeptides L40 and S27, respectively. Specific endopeptidases cleave these precursor molecules to release ubiquitin moieties that are identical in sequence and contribute to the ubiquitin pool. Some organisms express additional ubiquitin fusion proteins. Furthermore, there are several ubiquitin-like proteins derived from ubiquitin. This entry represents a domain characteristic of ubiquitin (Ub) and ubiquitin-like (Ubl) proteins such as SUMO and Nedd8.
Protein Domain
Name: Formin, FH2 domain
Type: Domain
Description: Formin homology (FH) proteins play a crucial role in the reorganisation of the actin cytoskeleton, which mediates various functions of the cell cortex including motility, adhesion, and cytokinesis. Formins are multidomain proteins that interact with diverse signalling molecules and cytoskeletal proteins, although some formins have been assigned functions within the nucleus. Formins are characterised by the presence of three FH domains (FH1, FH2 and FH3), although members of the formin family do not necessarily contain all three domains. The proline-rich FH1 domain mediates interactions with a variety of proteins, including the actin-binding protein profilin, SH3 (Src homology 3) domain proteins, and WW domain proteins. The FH2 domain is required for the self-association of formin proteins through the ability of FH2 domains to directly bind each other, and may also act to inhibit actin polymerisation. The FH3 domain is less well conserved and may be important for determining intracellular localisation of formin family proteins. In addition, some formins can contain a GTPase-binding domain (GBD) required for binding to Rho small GTPases, and a C-terminal conserved Dia-autoregulatory domain (DAD). This entry represents the FH2 domain, which was shown by X-ray crystallography to have an elongated, crescent shape containing three helical subdomains.
Protein Domain
Name: VWFC domain
Type: Domain
Description: The vWF domain is found in various plasma proteins: complement factors B, C2, CR3 and CR4; the integrins (I-domains); collagen types VI, VII, XII and XIV; and other extracellular proteins. Although the majority of VWA-containing proteins are extracellular, the most ancient ones present in all eukaryotes are all intracellular proteins involved in functions such as transcription, DNA repair, ribosomal and membrane transport and the proteasome. A common feature appears to be involvement in multiprotein complexes. Proteins that incorporate vWF domains participate in numerous biological events (e.g. cell adhesion, migration, homing, pattern formation, and signal transduction), involving interaction with a large array of ligands. A number of human diseases arise from mutations in VWA domains. Secondary structure prediction from 75 aligned vWF sequences has revealed a largely alternating sequence of α-helices and β-strands. The domain is named after the von Willebrand factor (VWF) type C repeat, which is found in multidomain, multifunctional proteins involved in maintaining homeostasis. For the von Willebrand factor, the duplicated VWFC domain is thought to participate in oligomerisation, but not in the initial dimerisation step. The presence of this region in a number of other complex-forming proteins points to the possible involvement of the VWFC domain in complex formation.
Protein Domain
Name: Tetraspanin, animals
Type: Family
Description: Tetraspanins are a distinct family of cell surface proteins, containing four conserved transmembrane domains: a small outer loop (EC1), a larger outer loop (EC2), a small inner loop (IL) and short cytoplasmic tails. They contain characteristic structural features, including 4-6 conserved extracellular cysteine residues, and polar residues within transmembrane domains. A fundamental role of tetraspanins appears to be organising other proteins into a network of multimolecular membrane microdomains, sometimes called the 'tetraspanin web'. Within this web there are primary complexes in which tetraspanins show robust, specific, and direct lateral associations with other proteins. The strong tendency of tetraspanins to associate with each other probably contributes to the assembly of a network of secondary interactions in which non-tetraspanin proteins are associated with each other via palmitoylated tetraspanins acting as linker proteins. In addition, the association of lipids, such as gangliosides and cholesterol, probably contributes to the assembly of even larger tetraspanin complexes, which have some lipid raft-like properties (e.g. resistance to solubilization in non-ionic detergents). Within the tetraspanin web, tetraspanin proteins can associate not only with integrins and other transmembrane proteins, but also with signalling enzymes such as protein kinase C and phosphatidylinositol-4 kinase. Thus, the tetraspanin web provides a mechanistic framework by which membrane protein signalling can be expanded into a lateral dimension.
Protein Domain
Name: E3 SUMO-protein ligase CBX4
Type: Family
Description: Polycomb group (PcG) proteins were first identified in Drosophila, but mammalian homologues have also been identified. They interact with each other to form multimeric, chromatin-associated protein complexes. The human orthologues of the Drosophila PcG proteins are CBX2, CBX4, CBX6, CBX7 and CBX8, which exhibit distinct nuclear localisations and contribute differently to transcriptional repression. They have been described as regulators of homeotic gene expression. It has been demonstrated that these proteins maintain the repressed state of these genes, which are involved in development, signalling or cancer. PcG complexes can be recruited to DNA by interactions with specific DNA-binding proteins. Sumoylation is a reversible conjugation process similar to ubiquitination, in which proteins involved in a wide range of processes, such as DNA repair, chromosome segregation and gene expression, are modified. It involves an activating enzyme (E1), a conjugating enzyme (E2), and a ligase (E3). As in the ubiquitination process, the E3 ligase determines substrate specificity. Polycomb protein Pc2, also known as CBX4, is a SUMO (small ubiquitin-related modifier) E3 ligase. DNA damage activates CBX4, which sumoylates the heterogeneous nuclear ribonucleoprotein K (hnRNP K), a p53 cofactor required for transcriptional regulation of p53 target genes. This entry includes the E3 SUMO-protein ligase CBX4 (also known as Pc2).
Protein Domain
Name: Formin, FH2 domain superfamily
Type: Homologous_superfamily
Description: Formin homology (FH) proteins play a crucial role in the reorganisation of the actin cytoskeleton, which mediates various functions of the cell cortex including motility, adhesion, and cytokinesis. Formins are multidomain proteins that interact with diverse signalling molecules and cytoskeletal proteins, although some formins have been assigned functions within the nucleus. Formins are characterised by the presence of three FH domains (FH1, FH2 and FH3), although members of the formin family do not necessarily contain all three domains. The proline-rich FH1 domain mediates interactions with a variety of proteins, including the actin-binding protein profilin, SH3 (Src homology 3) domain proteins, and WW domain proteins. The FH2 domain is required for the self-association of formin proteins through the ability of FH2 domains to directly bind each other, and may also act to inhibit actin polymerisation. The FH3 domain is less well conserved and may be important for determining intracellular localisation of formin family proteins. In addition, some formins can contain a GTPase-binding domain (GBD) required for binding to Rho small GTPases, and a C-terminal conserved Dia-autoregulatory domain (DAD). This superfamily represents the FH2 domain, which was shown by X-ray crystallography to have an elongated, crescent shape containing three helical subdomains.
Protein Domain
Name: Liprin-beta, SAM domain repeat 3
Type: Domain
Description: This entry represents the SAM (sterile alpha motif) domain repeat 3 of liprin-beta proteins. Liprin-beta proteins contain three copies (repeats) of the SAM domain, which is a protein-protein interaction domain. They may form heterodimers with liprin-alpha proteins through their SAM domains. Liprin-beta-1 has been shown to interact with the metastasis-associated protein S100A4 (Mts1), and this interaction results in the inhibition of liprin-beta-1 phosphorylation by protein kinase C and protein kinase CK2 in vitro.
Protein Domain
Name: Peptidase C24, Calicivirus polyprotein Orf1
Type: Domain
Description: The two signatures that define this group of calicivirus polyproteins identify a cysteine peptidase signature that belongs to MEROPS peptidase family C24 (clan PA(C)). Caliciviruses are positive-stranded ssRNA viruses that cause gastroenteritis. The calicivirus genome contains two open reading frames, ORF1 and ORF2. ORF2 encodes a structural protein, while ORF1 encodes a non-structural polypeptide, which has RNA helicase, cysteine protease and RNA polymerase activity. The regions of the polyprotein in which these activities lie are similar to proteins produced by the picornaviruses. Two different families of caliciviruses can be distinguished on the basis of sequence similarity, namely those classified as small round structured viruses (SRSVs) and those classed as non-SRSVs. Calicivirus proteases from the non-SRSV group, which are members of the PA protease clan, constitute family C24 of the cysteine proteases (proteases from SRSVs belong to the C37 family). As mentioned above, the protease activity resides within a polyprotein. The enzyme cleaves the polyprotein at sites N-terminal to itself, liberating the polyprotein helicase. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases.
From sequence similarities, cysteine peptidases can be clustered into over 80 different families. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold.
The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: Cytochrome P450, E-class, CYP3A
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes).
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents the CYP2J family from group I, class E, cytochrome P450 proteins, as well as other CYP2 family proteins. The CYP2 family comprises 15 subfamilies (A-H, J-N, P and Q). The first five (A-E) are present in mammalian liver, but in differing amounts and with different inducibilities. These five subfamilies show varied substrate specificities, with some degree of overlap. CYP3A family enzymes are of major importance in the mammalian (especially human) detoxification of xenobiotics. CYP3A4, CYP3A5 and CYP3A7 catalyse the metabolism of a wide variety of substrates, including over 50% of therapeutic drugs. CYP3A enzymes are predominantly (not exclusively) expressed in the liver and intestine. Both genetic and environmental factors such as diet (especially grapefruit and St John's wort) can affect CYP3A activity, which can alter the efficacy and clearance of drugs.
In addition, CYP3A may play a role in breast and prostate carcinogenesis through its role in controlling the level of sex hormones.
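The CYP naming rule given in the description above (CYP, then a family number, a subfamily letter, and a protein number) can be illustrated with a short parser. This is a minimal sketch, not part of the database or of any official nomenclature tool: the names `CypName`, `parse_cyp` and `same_subfamily` are hypothetical, and the pattern assumes that subfamilies are written as one or more capital letters. Names without a subfamily or protein number (such as CYP55) are deliberately rejected.

```python
import re
from typing import NamedTuple


class CypName(NamedTuple):
    """Components of a cytochrome P450 name such as CYP3A4 (hypothetical helper type)."""
    family: int      # number after "CYP", e.g. 3
    subfamily: str   # letter(s) after the family number, e.g. "A"
    protein: int     # trailing number, e.g. 4


# Assumed pattern: "CYP" + family number + subfamily letter(s) + protein number.
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name: str) -> CypName:
    """Split a CYP name into family, subfamily and protein number."""
    match = _CYP_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a full CYP protein name: {name!r}")
    family, subfamily, protein = match.groups()
    return CypName(int(family), subfamily, int(protein))


def same_subfamily(a: str, b: str) -> bool:
    """True when two names share both family number and subfamily letter(s)."""
    pa, pb = parse_cyp(a), parse_cyp(b)
    return (pa.family, pa.subfamily) == (pb.family, pb.subfamily)


if __name__ == "__main__":
    print(parse_cyp("CYP3A4"))
    print(same_subfamily("CYP3A4", "CYP3A7"))
    print(same_subfamily("CYP3A4", "CYP2J2"))
```

Under the rule of thumb quoted in the description, two names for which `same_subfamily` returns true would be expected to share more than 55% sequence identity; the sketch only compares names and does not look at sequences.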
Protein Domain
Name: Cytochrome P450, E-class, group I, CYP2J-like
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes).
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents the CYP2J family from group I, class E, cytochrome P450 proteins, as well as other CYP2 family proteins. The CYP2 family comprises 15 subfamilies (A-H, J-N, P and Q). The first five (A-E) are present in mammalian liver, but in differing amounts and with different inducibilities. These five subfamilies show varied substrate specificities, with some degree of overlap. Several CYP2J isoforms have been reported, including rabbit CYP2J1; human CYP2J2, the only member of the human CYP2J subfamily, which is known for its role as an arachidonic acid epoxygenase and also accepts antihistamine drugs as substrates; rat CYP2J3 and CYP2J4; mouse CYP2J5, CYP2J6, CYP2J7, CYP2J8 and CYP2J9; and rat CYP2J10. Both rat CYP2J3 and human CYP2J2 act as vitamin D 25-hydroxylases, but with distinct preferences: rat for vitamin D3 and human for vitamin D2.
Rat CYP2J4 is expressed in the intestine, where it is active towards all-trans and 9-cis-retinal, producing the corresponding retinoic acids. Mouse CYP2J5 is abundant in the kidney, where it is active in the metabolism of arachidonic acid to epoxyeicosatrienoic acids, and may be influenced by sex hormones.
Protein Domain
Name: Globin, bacterial-like, conserved site
Type: Conserved_site
Description: Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well studied family that is widely distributed in many organisms. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors. Several functionally different haemoglobins can coexist in the same species. The major types of globins include: Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors. Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle. Neuroglobin: a myoglobin-like haemoprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia. Neuroglobin belongs to a branch of the globin family that diverged early in evolution. Cytoglobin: an oxygen sensor expressed in multiple tissues.
Related to neuroglobin. Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers. Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation. Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants. Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin. Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression. Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors. Truncated 2/2 globin: lacks the first helix, giving it a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features. This entry represents a group of haemoglobin-like proteins found in eubacteria, cyanobacteria, protozoa, algae and plants, but not in animals or yeast. These proteins have a truncated 2-over-2 rather than the canonical 3-over-3 α-helical sandwich fold. This entry includes: HbN (or GlbN): a truncated haemoglobin-like protein that binds oxygen cooperatively with a very high affinity and a slow dissociation rate, which may exclude it from oxygen transport.
It appears to be involved in bacterial nitric oxide detoxification and in nitrosative stress. Cyanoglobin (or GlbN): a truncated haemoprotein found in cyanobacteria that has high oxygen affinity, and which appears to serve as part of a terminal oxidase, rather than as a respiratory pigment. HbO (or GlbO): a truncated haemoglobin-like protein with a lower oxygen affinity than HbN. HbO associates with the bacterial cell membrane, where it significantly increases oxygen uptake over membranes lacking this protein. HbO appears to interact with a terminal oxidase, and could participate in an oxygen/electron-transfer process that facilitates oxygen transfer during aerobic metabolism. Glb3: a nuclear-encoded truncated haemoglobin from plants that appears more closely related to HbO than HbN. Glb3 from Arabidopsis thaliana (Mouse-ear cress) exhibits an unusual concentration-independent binding of oxygen and carbon dioxide. The proteins in this entry contain a conserved histidine which could be involved in haem binding. This signature represents the conserved region that ends with this histidine residue.
Protein Domain
Name: Peptidase S1C
Type: Family
Description: This group of serine peptidases and non-peptidase homologues belongs to the MEROPS peptidase family S1, subfamily S1C (protease Do subfamily, clan PA(S)). A type example is the protease Do from Escherichia coli. Other members of this group include the E. coli htrA gene product (HtrA or DegP protein), which is essential for bacterial survival at temperatures above 42 degrees and for digesting misfolded protein in the periplasm. Mature DegP from E. coli has 448 residues, of which His105, Asp135, and Ser210 form the catalytic triad. The protein has an N-terminal sequence typical of a leader peptide. Structural analysis indicates that bacterial HtrA is a serine protease belonging to the family of cage-forming proteases and that only unfolded polypeptides can be threaded in extended conformation into the cage to access the proteolytic sites. Disulphide bonds of partially unfolded substrates impede protein breakdown and represent a conformational constraint for entering the inner cavity. This preference for unfolded polypeptides might also be a reason for the ATP-independent mode of action and for the increased proteolytic activity at higher temperatures. The HtrA family shares a modular architecture composed of an N-terminal segment believed to have regulatory functions, a conserved trypsin-like protease domain, and one or two PDZ domains which mediate specific protein-protein interactions and bind preferentially to the C-terminal three to four residues of the target protein. HtrA belongs to the trypsin clan SA. SA proteases have a two-domain structure with each domain forming a six-stranded barrel. The active site cleft is located at the interface of the two perpendicularly arranged barrel domains. The active site is constructed by several loops located at the C-terminal side of both barrel domains. The functional unit of HtrA appears to be a trimer, which is stabilised exclusively by residues of the protease domains.
The basic trimer has a funnel-like shape with the protease domains located at its top and the PDZ domains protruding to the outside. Once substrates have been bound, they have to be delivered into the interior of the funnel and the proteolytic sites. In contrast to other protease-chaperone systems, ATP does not drive binding and release of substrates. The degQ and degS genes of E. coli encode proteins of 455 and 355 residues that are homologues of the DegP protease. Purified DegQ protein has the properties of a serine endopeptidase, and is processed by the removal of a 27-residue N-terminal signal sequence. Deletion studies suggest that DegQ, like DegP, functions as a periplasmic protease in vivo. An example of a non-peptidase homologue in this entry is the anti-sigma-I factor RsgI9 from Clostridium thermocellum, which has the catalytic serine replaced with threonine. This entry also includes the membrane transporter protein MamO and the magnetosome formation protease MamE. MamO promotes magnetite nucleation/formation and activates the MamE protease. MamE is required for correct localization of proteins to the magnetosome, while the protease activity is required for maturation of small magnetite crystals into larger, functional ones.
Protein Domain
Name: U1-C, C2H2-type zinc finger
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. C2H2-type (classical) zinc fingers (Znf) were the first class to be characterised. They contain a short β hairpin and an α helix (β/β/α structure), where a single zinc atom is held in place by Cys(2)His(2) (C2H2) residues in a tetrahedral array. C2H2 Znf's can be divided into three groups based on the number and pattern of fingers: triple-C2H2 (binds single ligand), multiple-adjacent-C2H2 (binds multiple ligands), and separated paired-C2H2.
C2H2 Znf's are the most common DNA-binding motifs found in eukaryotic transcription factors, and have also been identified in prokaryotes. Transcription factors usually contain several Znf's (each with a conserved β/β/α structure) capable of making multiple contacts along the DNA, where the C2H2 Znf motifs recognise DNA sequences by binding to the major groove of DNA via a short α-helix in the Znf, the Znf spanning 3-4 bases of the DNA. C2H2 Znf's can also bind to RNA and protein targets. This entry represents a C2H2-type zinc finger motif found in several U1 small nuclear ribonucleoprotein C (U1-C) proteins. Some proteins contain multiple copies of this motif. The U1 small nuclear ribonucleoprotein (U1 snRNP) binds to the pre-mRNA 5' splice site at early stages of spliceosome assembly. Recruitment of U1 to a class of weak 5' splice sites is promoted by binding of the protein TIA-1 to uridine-rich sequences immediately downstream from the 5' splice site. Binding of TIA-1 in the vicinity of a 5' splice site helps to stabilise U1 snRNP recruitment, at least in part, via a direct interaction with U1-C, thus providing one molecular mechanism for the function of this splicing regulator.
Protein Domain
Name: Cytochrome P450, conserved site
Type: Conserved_site
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes).
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents a conserved site based around a highly conserved cysteine residue involved in binding haem iron in the fifth coordination site, which is found in the C-terminal regions of P450 proteins.
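The CYP naming convention described in this entry (CYP, family number, subfamily letter, protein number) can be illustrated with a small parser. This is only a sketch: the function names `parse_cyp` and `expected_min_identity` are hypothetical, the identity thresholds are the rule-of-thumb values quoted in the description (>40% within a family, >55% within a subfamily), and family-only names such as CYP55 are deliberately rejected.

```python
import re

# "CYP", then family number, subfamily letter(s), protein number (e.g. CYP3A4).
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name):
    """Split a cytochrome P450 name into (family, subfamily, protein)."""
    m = CYP_NAME.match(name.strip().upper())
    if m is None:
        raise ValueError(f"not a full CYP-style name: {name!r}")
    family, subfamily, protein = m.groups()
    return int(family), subfamily, int(protein)


def expected_min_identity(name_a, name_b):
    """Rule-of-thumb lower bound (%) on identity implied by two names.

    Returns 55 for the same subfamily, 40 for the same family only,
    and None when the names belong to different families.
    """
    fam_a, sub_a, _ = parse_cyp(name_a)
    fam_b, sub_b, _ = parse_cyp(name_b)
    if fam_a != fam_b:
        return None
    return 55 if sub_a == sub_b else 40


print(parse_cyp("CYP3A4"))                         # family 3, subfamily A, protein 4
print(expected_min_identity("CYP3A4", "CYP3A5"))   # same subfamily
print(expected_min_identity("CYP2D6", "CYP2C9"))   # same family only
```

The thresholds are guidelines from the text, not hard rules, so the returned value should be read as an expectation rather than a guarantee.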
Protein Domain
Name: Zinc finger, RING-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents RING-type zinc finger domains. The RING-finger is a specialised type of Zn-finger of 40 to 60 residues that binds two atoms of zinc, and is probably involved in mediating protein-protein interactions [ , , ]. There are two different variants, the C3HC4-type and a C3H2C3-type, which are clearly related despite the different cysteine/histidine pattern. The latter type is sometimes referred to as 'RING-H2 finger'. 
The RING domain is a protein interaction domain that has been implicated in a range of diverse biological processes. E3 ubiquitin-protein ligase activity is intrinsic to the RING domain of c-Cbl and is likely to be a general function of this domain. E3 ubiquitin-protein ligases determine the substrate specificity for ubiquitylation and have been classified into HECT and RING-finger families. More recently, however, U-box proteins, which contain a domain (the U box) of about 70 amino acids that is conserved from yeast to humans, have been identified as a new type of E3. Various RING fingers also exhibit binding to E2 ubiquitin-conjugating enzymes (Ubc's). Several 3D-structures for RING-fingers are known. The 3D structure of the zinc ligation system is unique to the RING domain and is referred to as the 'cross-brace' motif. The spacing of the cysteines in such a domain is C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C. Metal ligand pairs one and three co-ordinate to bind one zinc ion, whilst pairs two and four bind the second. Note that in the older literature, some RING-fingers are denoted as LIM-domains. The LIM-domain Zn-finger is a fundamentally different family, albeit with similar Cys-spacing.
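The spacing rule and the 'cross-brace' pairing described above can be sketched in code. The example below is a minimal illustration, not a validated RING detector: the regular expression is a direct transcription of the spacing quoted in the description, the function name `ring_zinc_sites` is hypothetical, and the test sequence is a synthetic toy string rather than a real protein.

```python
import re

# C-x(2)-C-x(9 to 39)-C-x(1 to 3)-H-x(2 to 3)-C-x(2)-C-x(4 to 48)-C-x(2)-C,
# with each of the eight metal ligands captured in its own group.
RING_C3HC4 = re.compile(
    r"(C).{2}(C).{9,39}(C).{1,3}(H).{2,3}(C).{2}(C).{4,48}(C).{2}(C)"
)


def ring_zinc_sites(sequence):
    """Return 0-based ligand positions grouped by zinc ion, or None if no match.

    Ligand pairs one and three bind the first zinc ion and pairs two and
    four bind the second (the cross-brace arrangement).
    """
    m = RING_C3HC4.search(sequence)
    if m is None:
        return None
    ligands = [m.start(i) for i in range(1, 9)]
    pairs = [tuple(ligands[i:i + 2]) for i in range(0, 8, 2)]
    return {"zinc_1": pairs[0] + pairs[2], "zinc_2": pairs[1] + pairs[3]}


# Synthetic toy sequence using the minimum spacing at every position.
toy = ("C" + "A" * 2 + "C" + "A" * 9 + "C" + "A" + "H" + "A" * 2
       + "C" + "A" * 2 + "C" + "A" * 4 + "C" + "A" * 2 + "C")
print(ring_zinc_sites(toy))
```

A spacing match alone does not establish that a sequence folds as a RING finger; curated signatures use profile methods rather than a bare regular expression.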
Protein Domain
Name: STAT transcription factor, DNA-binding, N-terminal
Type: Homologous_superfamily
Description: The STAT protein (Signal Transducers and Activators of Transcription) family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors, hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signalling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation. Signalling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at the specific receptor tyrosine residues, which then serve as docking sites for STATs and other signalling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs, on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerise, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers. The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids.
The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggests that this family arose from a single primordial gene. STATs share 6 structurally and functionally conserved domains including: an N-terminal domain (ND) that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain (CCD) that is implicated in protein-protein interactions; a DNA-binding domain (DBD) with an immunoglobulin-like fold similar to p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain that acts as a phosphorylation-dependent switch to control receptor recognition and DNA-binding; and a C-terminal transactivation domain. The crystal structure of the N terminus of Stat4 reveals a dimer. The interface of this dimer is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerisation promotes cooperativity of binding to tandem GAS elements and with the transcriptional coactivator CBP/p300. This superfamily represents the N-terminal part of the p53-like DNA-binding domain of STAT proteins. Both the DNA-binding domain and the linker domain help determine DNA-specificity.
Protein Domain
Name: ASXL, HARE-HTH domain
Type: Domain
Description: This domain, known as the HARE-HTH domain, adopts the winged helix-turn-helix fold and is predicted to bind DNA. It can be found at the N terminus of the ASXL protein. It can also be found in several other eukaryotic chromatin proteins (such as HB1 in plants), diverse restriction endonucleases and DNA glycosylases, the RNA polymerase delta subunit of Gram-positive bacteria and certain bacterial proteins that combine features of the RNA polymerase alpha-subunit and sigma factors. The genetic interaction of the HARE-HTH containing ASXL with the methyl cytosine hydroxylating Tet2 protein is suggestive of a role for the domain in discriminating sequences with DNA modifications such as hmC. Bacterial versions include fusions to diverse restriction endonucleases, and a DNA glycosylase where it may play a similar role in detecting modified DNA. Certain bacterial versions of the HARE-HTH domain show fusions to the helix-hairpin-helix domain of the RNA polymerase alpha subunit and the HTH domains found in regions 3 and 4 of the sigma factors. These versions are predicted to function as a novel inhibitor of the binding of RNA polymerase to transcription start sites, similar to the Bacillus delta protein. This domain consists of four α-helices (helices I-II-III-IV) and an antiparallel β-sheet composed of three short β-strands at the top of a "twisted tripod" formed by helices II, III, and IV. The Asx-like (Asxl) proteins include Asxl1-3. They are putative Polycomb group (PcG) proteins, which act by forming multiprotein complexes that are required to maintain the transcriptionally repressive state of homeotic genes throughout development.
Asxl1 is involved in transcriptional regulation mediated by ligand-bound retinoic acid receptors (RARs) and peroxisome proliferator-activated receptor gamma (PPARG). The delta protein is a dispensable subunit of Bacillus subtilis RNA polymerase (RNAP) that has major effects on the biochemical properties of the purified enzyme. In the presence of delta, RNAP displays an increased specificity of transcription, a decreased affinity for nucleic acids, and an increased efficiency of RNA synthesis because of enhanced recycling. The delta protein contains two distinct regions, an N-terminal domain and a glutamate and aspartate residue-rich C-terminal region. It participates in both the initiation and recycling phases of transcription.
Protein Domain
Name: Beta-lactoglobulin
Type: Family
Description: Beta-lactoglobulin (Blg) is the major protein component of milk from a wide range of species but not human. Glycodelin or PP14 protein is the human equivalent of Blg and is secreted into the endometrium. Blg binds a wide variety of hydrophobic ligands but its function remains unknown. The crystal structure of Blg has been solved, confirming membership of the lipocalin protein family. This entry also includes human glycodelin. Four glycoforms, namely glycodelin-S, -A, -F and -C, have been identified in reproductive tissues; they differ in glycosylation and biological activity. Glycodelin-A has contraceptive and immunosuppressive activities. Glycodelin-C stimulates binding of spermatozoa to the zona pellucida. Glycodelin-F inhibits spermatozoa-zona pellucida binding and significantly suppresses progesterone-induced acrosome reaction of spermatozoa. Glycodelin-S in seminal plasma maintains the uncapacitated state of human spermatozoa. The lipocalins are a diverse, interesting, yet poorly understood family of proteins composed, in the main, of extracellular ligand-binding proteins displaying high specificity for small hydrophobic molecules. Functions of these proteins include transport of nutrients, control of cell regulation, pheromone transport, cryptic colouration and the enzymatic synthesis of prostaglandins. The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel β-barrel fold well conserved within the family. Sequence similarity within the family is at a much lower level and would seem to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes.
The anti-parallel β-barrel fold is also exploited by the fatty acid-binding proteins (which function similarly by binding small hydrophobic molecules), by avidin and the closely related metalloprotease inhibitors, and by triabin. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif. The lipocalin family can be subdivided into kernel and outlier sets. The kernel lipocalins form the largest self-consistent group, comprising the subfamily of beta-lactoglobulins. The outlier lipocalins form several smaller distinct subgroups: the OBPs, the von Ebner's gland proteins, alpha-1-acid glycoproteins, tick histamine binding proteins and the nitrophorins.
Protein Domain
Name: Apoptosis regulator, Bcl-W
Type: Family
Description: Apoptosis, or programmed cell death (PCD), is a common and evolutionarily conserved property of all metazoans. In many biological processes, apoptosis is required to eliminate supernumerary or dangerous (such as pre-cancerous) cells and to promote normal development. Dysregulation of apoptosis can, therefore, contribute to the development of many major diseases including cancer, autoimmunity and neurodegenerative disorders. In most cases, proteins of the caspase family execute the genetic programme that leads to cell death. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes. At least 20 Bcl-2 proteins have been reported in mammals, and several others have been identified in viruses. Bcl-2 family proteins fall roughly into three subtypes, which either promote cell survival (anti-apoptotic) or trigger cell death (pro-apoptotic). All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Bcl-2 subfamily proteins, which contain at least BH1 and BH2, promote cell survival by inhibiting the adapters needed for the activation of caspases. Pro-apoptotic members potentially exert their effects by displacing the adapters from the pro-survival proteins; these proteins belong either to the Bax subfamily, which contain BH1-BH3, or to the BH3 subfamily, which mostly only feature BH3. Thus, the balance between antagonistic family members is believed to play a role in determining cell fate. Members of the wider Bcl-2 family, which also includes Bcl-x, Bcl-w and Mcl-1, are described by their similarity to Bcl-2 protein, a member of the pro-survival Bcl-2 subfamily. Full-length Bcl-2 proteins feature all four BH domains, seven α-helices, and a C-terminal hydrophobic motif that targets the protein to the outer mitochondrial membrane, ER and nuclear envelope.
In healthy cells, Bcl-w resides in the cytoplasm, and becomes membrane-associated in response to cytotoxic insults. Its 3D structure comprises a bundle of five amphipathic α-helices surrounding two central hydrophobic helical regions. A hydrophobic groove, formed by residues from BH1-BH3, is capable of binding the BH3 α-helix from a pro-apoptotic BH3-only family member [ ].
Protein Domain
Name: Mammalian uncoordinated homology 13, domain 2
Type: Domain
Description: Mammalian uncoordinated homology 13 (Munc13) proteins constitute a family of three highly homologous molecules (Munc13-1, Munc13-2 and Munc13-3) with homology to Caenorhabditis elegans Unc-13. Munc13 proteins contain a phorbol ester-binding C1 domain and two C2 domains, which are Ca2+/phospholipid binding domains. Sequence analyses have uncovered two regions called Munc13 homology domains 1 (MHD1) and 2 (MHD2) that are arranged between two flanking C2 domains. MHD1 and MHD2 domains are present in a wide variety of proteins from Arabidopsis thaliana, C. elegans, Drosophila melanogaster (Fruit fly), Mus musculus (Mouse), Rattus norvegicus (Rat) and Homo sapiens (Human), some of which may function in a Munc13-like manner to regulate membrane trafficking. The MHD1 and MHD2 domains are predicted to be α-helical. Some proteins known to contain MHD1 and MHD2 domains are listed below:
  • Mammalian Munc13-1. It is specifically targeted to presynaptic active zones and has a central priming function in synaptic vesicle exocytosis from glutamatergic synapses.
  • Mammalian Munc13-2. It plays a role in vesicle maturation during exocytosis as a target of the diacylglycerol second messenger pathway.
  • Mammalian Munc13-3. It probably plays a role in vesicle maturation during exocytosis as a target of the diacylglycerol second messenger pathway.
  • Mammalian Munc13-4. It is predominantly expressed in lung where it is localized to goblet cells of the bronchial epithelium and to alveolar type II cells, both of which are cell types with secretory function.
  • C. elegans Unc-13. It may form part of a signal transduction pathway, transducing the signal from diacylglycerol to effector functions.
  • Mammalian BAI1-associated protein 3 (BAP3), which exhibits the typical Munc13-like domain structure with two C2 domains flanking the MHD1 and MHD2 domains, but which lacks the long N terminus with the C1 domain.
  • Animal calcium-dependent activator proteins for secretion (CAPSs), regulators of large dense-core vesicle secretion. They contain only a MHD1 domain and are otherwise unrelated to Munc13 proteins.
  • A. thaliana hypothetical proteins with MHD1 and MHD2 domains but without C1 and C2 domains.
  • Saccharomyces cerevisiae uncharacterised protein YOR296W, where MHD1 and MHD2 enclose a central C2 domain. YOR296W is presumably involved in bud formation.
  • Schizosaccharomyces pombe hypothetical protein C11E3.02c in chromosome I, where MHD1 and MHD2 enclose a central C2 domain.
This entry represents the Munc13 homology domain 2.
Protein Domain
Name: Munc13 homology 1
Type: Domain
Description: Munc13 proteins constitute a family of three highly homologous molecules (Munc13-1, Munc13-2 and Munc13-3) with homology to Caenorhabditis elegans Unc-13. Munc13 proteins contain a phorbol ester-binding C1 domain and two C2 domains, which are Ca2+/phospholipid binding domains. Sequence analyses have uncovered two regions called Munc13 homology domains 1 (MHD1) and 2 (MHD2) that are arranged between two flanking C2 domains. MHD1 and MHD2 domains are present in a wide variety of proteins from Arabidopsis thaliana, C. elegans, Drosophila melanogaster (Fruit fly), Mus musculus (Mouse), Rattus norvegicus (Rat) and Homo sapiens (Human), some of which may function in a Munc13-like manner to regulate membrane trafficking. The MHD1 and MHD2 domains are predicted to be α-helical. Some proteins known to contain MHD1 and MHD2 domains are listed below:
  • Mammalian Munc13-1. It is specifically targeted to presynaptic active zones and has a central priming function in synaptic vesicle exocytosis from glutamatergic synapses.
  • Mammalian Munc13-2. It plays a role in vesicle maturation during exocytosis as a target of the diacylglycerol second messenger pathway.
  • Mammalian Munc13-3. It probably plays a role in vesicle maturation during exocytosis as a target of the diacylglycerol second messenger pathway.
  • Mammalian Munc13-4. It is predominantly expressed in lung where it is localized to goblet cells of the bronchial epithelium and to alveolar type II cells, both of which are cell types with secretory function.
  • C. elegans Unc-13. It may form part of a signal transduction pathway, transducing the signal from diacylglycerol to effector functions.
  • Mammalian BAI1-associated protein 3 (BAP3), which exhibits the typical Munc13-like domain structure with two C2 domains flanking the MHD1 and MHD2 domains, but which lacks the long N terminus with the C1 domain.
  • Animal calcium-dependent activator proteins for secretion (CAPSs), regulators of large dense-core vesicle secretion. They contain only a MHD1 domain and are otherwise unrelated to Munc13 proteins.
  • A. thaliana hypothetical proteins with MHD1 and MHD2 domains but without C1 and C2 domains.
  • Saccharomyces cerevisiae uncharacterised protein YOR296W, where MHD1 and MHD2 enclose a central C2 domain. YOR296W is presumably involved in bud formation.
  • Schizosaccharomyces pombe hypothetical protein C11E3.02c in chromosome I, where MHD1 and MHD2 enclose a central C2 domain.
This entry represents the Munc13 homology domain 1.
Protein Domain
Name: Ribonuclease P subunit, Rpr2/Snm1/Rpp21
Type: Family
Description: This entry contains ribonuclease P (Rnp) proteins from eukaryotes and archaea. Rnp is a ubiquitous ribozyme that catalyzes a Mg2+-dependent hydrolysis to remove the 5'-leader sequence of precursor tRNA (pre-tRNA). Archaeal and eukaryotic RNase P each contain a single RNA; archaeal RNase P has four or five proteins, while eukaryotic RNase P has 9 or 10 proteins. Eukaryotic and archaeal RNase P RNAs cooperatively function with protein subunits in catalysis. Human RNase P is composed of a single protein, Pop1, and three subcomplexes: the Rpp20-Rpp25 heterodimer, the Pop5-Rpp14-(Rpp30)2-Rpp40 heteropentamer, and the Rpp21-Rpp29-Rpp38 heterotrimer. Although Pop5 and Rpp14 have similar protein structures, they share very limited sequence similarity. Moreover, the C-terminal fragments after the conserved beta sheets in Pop5 and Rpp14 exhibit distinct structural features that mediate interactions with Pop1 and Rpp40, respectively. In the hyperthermophilic archaeon Pyrococcus horikoshii OT3, RNase P is composed of the RNase P RNA (pRNA) and five proteins (PhoPop5, PhoRpp38, PhoRpp21, PhoRpp29, and PhoRpp30). This entry includes Rpp21 from animals, Snm1/Rpr2 from yeasts and RNP4 from archaea. Snm1 is a subunit of RNase MRP (mitochondrial RNA processing), a ribonucleoprotein endoribonuclease that has roles in both mitochondrial DNA replication and nuclear 5.8S rRNA processing. Snm1 is an RNA binding protein that binds the MRP RNA specifically. This subunit possibly binds the precursor tRNA.
Protein Domain
Name: IBR domain
Type: Domain
Description: The IBR (In Between Ring fingers) domain is often found to occur between pairs of RING fingers. This domain has also been called the C6HC domain and DRIL (for double RING finger linked) domain. Proteins that contain two RING fingers and an IBR domain (these proteins are also termed RBR family proteins) are thought to exist in all eukaryotic organisms. RBR family members play roles in protein quality control and can indirectly regulate transcription. Evidence suggests that RBR proteins are often parts of cullin-containing ubiquitin ligase complexes. The ubiquitin ligase Parkin is an RBR family protein whose mutations are involved in forms of familial Parkinson's disease. The IBR domain is a cysteine-rich (C6HC) zinc finger domain that is present in Triad1, and which is conserved in other proteins encoded by various eukaryotes. The C6HC consensus pattern is: C-x(4)-C-x(14-30)-C-x(1-4)-C-x(4)-C-x(2)-C-x(4)-H-x(4)-C. The C6HC zinc finger motif is the fourth family member of the zinc-binding RING, LIM, and LAP/PHD fingers. Strikingly, in most of the proteins the C6HC domain is flanked by two RING finger structures. The novel C6HC motif has been called DRIL (double RING finger linked). The strong conservation of the larger tripartite TRIAD (two RING fingers and DRIL) structure indicates that the three subdomains are functionally linked and identifies a novel class of proteins.
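The C6HC consensus pattern quoted in this entry uses a compact notation in which a capital letter is a fixed residue and x(n) or x(n-m) is a run of arbitrary residues. As a rough sketch, that notation can be translated mechanically into a regular expression. The helper name `consensus_to_regex` is hypothetical, it handles only the two token types that appear in this pattern, and the toy sequence is synthetic.

```python
import re

# A token is either one fixed residue (capital letter) or a gap: x(n) or x(n-m).
TOKEN = re.compile(r"(?P<res>[A-Z])|x\((?P<lo>\d+)(?:-(?P<hi>\d+))?\)")


def consensus_to_regex(pattern):
    """Translate a pattern such as 'C-x(4)-C-x(14-30)-C' into regex syntax."""
    parts = []
    for tok in TOKEN.finditer(pattern):
        if tok.group("res"):
            parts.append(tok.group("res"))                        # fixed residue
        elif tok.group("hi"):
            parts.append(".{%s,%s}" % (tok.group("lo"), tok.group("hi")))  # ranged gap
        else:
            parts.append(".{%s}" % tok.group("lo"))               # fixed-length gap
    return "".join(parts)


IBR_C6HC = "C-x(4)-C-x(14-30)-C-x(1-4)-C-x(4)-C-x(2)-C-x(4)-H-x(4)-C"
ibr_regex = consensus_to_regex(IBR_C6HC)
print(ibr_regex)

# Synthetic toy sequence built with the minimum gap length at every position.
toy_ibr = ("C" + "A" * 4 + "C" + "A" * 14 + "C" + "A" + "C" + "A" * 4
           + "C" + "A" * 2 + "C" + "A" * 4 + "H" + "A" * 4 + "C")
print(re.fullmatch(ibr_regex, toy_ibr) is not None)
```

A regular-expression hit only shows that the ligand spacing is compatible with the consensus; it is not evidence that a sequence contains a functional IBR domain.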
Protein Domain
Name: Protein-glutamine gamma-glutamyltransferase, animal
Type: Family
Description: This entry includes a group of transglutaminases from animals. Transglutaminases catalyse the post-translational modification of proteins at glutamine residues, with formation of isopeptide bonds. Members of the transglutaminase family usually have three domains: N-terminal, middle and C-terminal. The middle domain is usually well conserved, but family members can display major differences in their N- and C-terminal domains, although their overall structure is conserved. The best known transglutaminase is blood coagulation factor XIII, a plasma tetrameric protein composed of two catalytic A subunits and two non-catalytic B subunits. Factor XIII is responsible for cross-linking fibrin chains, thus stabilising the fibrin clot. Protein-glutamine gamma-glutamyltransferases are calcium-dependent enzymes that catalyse the cross-linking of proteins by promoting the formation of isopeptide bonds between the γ-carboxyl group of a glutamine in one polypeptide chain and the ε-amino group of a lysine in a second polypeptide chain. TGases also catalyse the conjugation of polyamines to proteins. This entry also includes Protein 4.2 (also known as Epb42), which is one of the most abundant protein components of the erythrocyte membrane. The protein shares significant sequence homology with transglutaminases, but lacks the catalytic triad residues required for transglutaminase activity. The complete or nearly complete absence of protein 4.2 is associated with an atypical form of hereditary spherocytosis (HS).
Protein Domain
Name: P-type trefoil domain superfamily
Type: Homologous_superfamily
Description: A cysteine-rich domain of approximately forty-five amino-acid residues has been found in some extracellular eukaryotic proteins. It is known as either the 'P', 'trefoil' or 'TFF' domain, and contains six cysteines linked by three disulphide bonds with connectivity 1-5, 2-4, 3-6. This leads to a characteristic three-leafed structure ('trefoil'). The P-type domain is clearly composed of three looplike regions. The central core of the domain consists of a short two-stranded antiparallel β-sheet, which is capped by an irregular loop and forms a central hairpin (loop 3). The β-sheet is preceded by a short α-helix, with the majority of the remainder of the domain contained in two loops, which lie on either side of the central hairpin. This domain has been found in a variety of extracellular eukaryotic proteins, including:
  • Protein pS2 (TFF1), a protein secreted by the stomach mucosa
  • Spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that inhibits gastrointestinal motility and gastric acid secretion
  • Intestinal trefoil factor (ITF) (TFF3)
  • Xenopus laevis stomach proteins xP1 and xP4
  • Xenopus integumentary mucins A.1 (FIM-A.1 or preprospasmolysin) and C.1 (FIM-C.1), proteins which may be involved in defence against microbial infections by protecting the epithelia from the external environment
  • Xenopus skin protein xp2 (or APEG)
  • Zona pellucida sperm-binding protein B (ZP-B)
  • Intestinal sucrase-isomaltase, a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyses sucrose, maltose and isomaltose
  • Lysosomal alpha-glucosidase
Protein Domain
Name: ApaG domain
Type: Domain
Description: The apaG domain is a domain of ~125 amino acids present in bacterial apaG proteins and in eukaryotic F-box proteins. The domain is named after the bacterial apaG protein, of which it forms the core. The domain also occurs in the C-terminal part of eukaryotic proteins with an N-terminal F-box domain. The Salmonella typhimurium apaG domain protein corD is involved in Co(2+) resistance and Mg(2+) efflux. Tertiary structures from different apaG proteins show a fold of several β-sheets. The apaG domain may be involved in protein-protein interactions which could be implicated in substrate-specificity.
Protein Domain
Name: Sas10/Utp3/C1D
Type: Family
Description: This entry represents Something about silencing protein 10 (SAS10, also known as UTP3) and U3 small nucleolar ribonucleoprotein protein LCP5, which are components of the U3 ribonucleoprotein complex. It also includes Nuclear nucleic acid-binding protein C1D from Mus musculus (Mouse), which plays a role in the recruitment of the RNA exosome complex to pre-rRNA to mediate the 3'-5' end processing of the 5.8S rRNA; Protein THALLO from Arabidopsis thaliana, essential during embryogenesis; and the human protein Neuroguidin, an initiation factor 4E (eIF4E)-binding protein.
Protein Domain
Name: ApaG domain superfamily
Type: Homologous_superfamily
Description: The apaG domain is a domain of ~125 amino acids present in bacterial apaG proteins and in eukaryotic F-box proteins. The domain is named after the bacterial apaG protein, of which it forms the core. The domain also occurs in the C-terminal part of eukaryotic proteins with an N-terminal F-box domain. The Salmonella typhimurium apaG domain protein corD is involved in Co(2+) resistance and Mg(2+) efflux. Tertiary structures from different apaG proteins show a fold of several β-sheets. The apaG domain may be involved in protein-protein interactions which could be implicated in substrate-specificity.
Protein Domain
Name: TRAP transporter solute receptor, DctP family
Type: Family
Description: Substrate-binding proteins (SBPs) are extracytoplasmic proteins involved in substrate recognition for several different bacterial transporters. This entry includes a subset of the DctP family of the substrate-binding proteins. They are part of the DctP-TRAP (tripartite ATP-independent periplasmic) transporter. Proteins in this family include DctP from R. capsulatus, SiaP from Haemophilus influenzae and DctB from Bacillus subtilis. The tripartite ATP-independent periplasmic (TRAP) transporters are SBP-dependent secondary transporters found in prokaryotes. They consist of an SBP of the DctP or TAXI families and two integral membrane proteins that form the DctQ and DctM protein families.
Protein Domain
Name: Type I secretion membrane fusion protein, HlyD family
Type: Family
Description: Type I secretion is an ABC transporter that exports proteins, without cleavage of any signal sequence, from the cytosol to extracellular medium across both inner and outer membranes. The secretion signal is found in the C terminus of the transported protein. This entry represents the adaptor protein between the ATP-binding cassette (ABC) protein of the inner membrane and the outer membrane protein, and is called the membrane fusion protein. This entry selects a group of sequences closely related to HlyD [ , ]; it is defined narrowly and excludes, for example, colicin V secretion protein CvaA and multidrug efflux proteins.
Protein Domain
Name: TRAP transporter solute receptor, TAXI family
Type: Family
Description: Substrate-binding proteins (SBPs) are extracytoplasmic proteins involved in substrate recognition for several different bacterial transporters. This entry represents the TAXI family of the substrate-binding proteins. They are part of the TAXI-TRAP (tripartite ATP-independent periplasmic) transporter. Proteins in this family have been proposed to function as glutamine/glutamate-specific uptake systems with a role in glutamate transport [ , ]. The tripartite ATP-independent periplasmic (TRAP) transporters are substrate-binding protein (SBP)-dependent secondary transporters found in prokaryotes. They consist of a substrate-binding protein (SBP) of the DctP or TAXI families and two integral membrane proteins that form the DctQ and DctM protein families [ ].
Protein Domain
Name: Auxin response factor
Type: Family
Description: The plant hormone auxin (indole-3-acetic acid) can regulate the gene expression of several families, including Aux/IAA, GH3 and SAUR families. Two related families of proteins, Aux/IAA proteins and the auxin response factors (ARF), are key regulators of auxin-modulated gene expression [ ]. There are multiple ARF proteins, some of which activate, while others repress transcription. ARF proteins bind to auxin-responsive cis-acting promoter elements (AuxREs) using an N-terminal DNA-binding domain. It is thought that Aux/IAA proteins activate transcription by modifying ARF activity through the C-terminal protein-protein interaction domains found in both Aux/IAA and ARF proteins.
Protein Domain
Name: 26S proteasome regulatory complex, non-ATPase subcomplex, Rpn2/Psmd1 subunit
Type: Family
Description: Intracellular proteins, including short-lived proteins such as cyclin, Mos, Myc, p53, NF-kappaB, and IkappaB, are degraded by the ubiquitin-proteasome system. The 26S proteasome is a self-compartmentalising protease responsible for the regulated degradation of intracellular proteins in eukaryotes [, ]. This giant intracellular protease is formed by several subunits arranged into two 19S polar caps, where protein recognition and ATP-dependent unfolding occur, flanking a 20S central barrel-shaped structure with an inner proteolytic chamber. This overall structure is highly conserved among eukaryotes and is essential for cell viability. Proteins targeted to the 26S proteasome are conjugated with a polyubiquitin chain by an enzymatic cascade before delivery to the 26S proteasome for degradation into oligopeptides. The 26S proteasome can be divided into two subcomplexes: the 19S regulatory particle (RP) and the 20S core particle (CP) [ ]. The 19S component is divided into a "base" subunit containing six ATPases (Rpt proteins) and two non-ATPases (Rpn1, Rpn2), and a "lid" subunit composed of eight stoichiometric proteins (Rpn3, Rpn5, Rpn6, Rpn7, Rpn8, Rpn9, Rpn11, Rpn12) [ ]. Additional non-essential and species-specific proteins may also be present. The 19S unit performs several essential functions including binding the specific protein substrates, unfolding them, cleaving the attached ubiquitin chains, opening the 20S subunit, and driving the unfolded polypeptide into the proteolytic chamber for degradation. The 26S proteasome and 19S regulator are of medical interest due to their involvement in burn rehabilitation []. This group represents a 26S proteasome regulatory complex, non-ATPase subcomplex, Rpn2/Psmd1 subunit.
Protein Domain
Name: Lipocalin, ApoD type
Type: Family
Description: This entry represents ApoD-type lipocalins, including retinol-binding protein 4 as well as other retinol-binding proteins. Apolipoprotein D (ApoD) is mainly associated with high-density lipoproteins (HDL) and appears to be able to transport a variety of ligands in a number of different contexts [ ]. Insect Lazarillo is a homologue of ApoD []. The lipocalins are a diverse, interesting, yet poorly understood family of proteins composed, in the main, of extracellular ligand-binding proteins displaying high specificity for small hydrophobic molecules []. Functions of these proteins include transport of nutrients, control of cell regulation, pheromone transport, cryptic colouration, and the enzymatic synthesis of prostaglandins. For example, retinol-binding protein 4 transfers retinol from the stores in the liver to peripheral tissues []. The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel β-barrel fold well conserved within the family. Sequence similarity within the family is at a much lower level and would seem to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site [ , ]. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes []. The anti-parallel β-barrel fold is also exploited by the fatty acid-binding proteins, which function similarly by binding small hydrophobic molecules. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif.
Protein Domain
Name: Tetraspanin, conserved site
Type: Conserved_site
Description: Tetraspanins are a distinct family of cell surface proteins, containing four conserved transmembrane domains: a small outer loop (EC1), a larger outer loop (EC2), a small inner loop (IL) and short cytoplasmic tails. They contain characteristic structural features, including 4-6 conserved extracellular cysteine residues, and polar residues within transmembrane domains. A fundamental role of tetraspanins appears to be organising other proteins into a network of multimolecular membrane microdomains, sometimes called the 'tetraspanin web'. Within this web there are primary complexes in which tetraspanins show robust, specific, and direct lateral associations with other proteins. The strong tendency of tetraspanins to associate with each other probably contributes to the assembly of a network of secondary interactions in which non-tetraspanin proteins are associated with each other via palmitoylated tetraspanins acting as linker proteins. In addition, the association of lipids, such as gangliosides and cholesterol, probably contributes to the assembly of even larger tetraspanin complexes, which have some lipid raft-like properties (e.g. resistance to solubilization in non-ionic detergents). Within the tetraspanin web, tetraspanin proteins can associate not only with integrins and other transmembrane proteins, but also with signalling enzymes such as protein kinase C and phosphatidylinositol-4 kinase. Thus, the tetraspanin web provides a mechanistic framework by which membrane protein signalling can be expanded into a lateral dimension [ ]. This entry represents a short conserved site found in many tetraspanins. It is located in a short cytoplasmic loop between the second and third transmembrane domains, and contains two conserved cysteines.
Protein Domain
Name: Clathrin, heavy chain/VPS, 7-fold repeat
Type: Repeat
Description: Clathrin is a triskelion-shaped cytoplasmic protein that polymerises into a polyhedral lattice on intracellular membranes to form protein-coated membrane vesicles. Lattice formation induces the sorting of membrane proteins during endocytosis and organelle biogenesis by interacting with membrane-associated adaptor molecules. Clathrin functions as a trimer, and these trimers, or triskelions, are composed of three legs joined by a central vertex. Each leg consists of one heavy chain and one light chain. The clathrin heavy chain contains a 145-residue repeat that is present in seven copies [ , ]. The clathrin heavy-chain repeat (CHCR) is also found in nonclathrin proteins such as Pep3, Pep5, Vam6, Vps41, and Vps8 from Saccharomyces cerevisiae and their orthologs from other eukaryotes [, , , ]. These proteins, like clathrins, are involved in vacuolar maintenance and protein sorting. The CHCR repeats in these proteins could mediate protein-protein interactions, or possibly represent clathrin-binding domains, or perform clathrin-like functions. CHCR repeats in the clathrin heavy chain, Saccharomyces cerevisiae Vamp2 and human Vamp6 have been implicated in homooligomerization, suggesting that this may be the primary function of this repeat. The CHCR repeat folds into an elongated right-handed superhelix coil of short α-helices [ ]. Individual 'helix-turn-helix-loop' or helix hairpin units comprise the canonical repeat and stack along the superhelix axis to form a single extended domain. The canonical hairpin repeat of the clathrin superhelix resembles a tetratrico peptide repeat (TPR), but is shorter and lacks the characteristic spacing of the hydrophobic residues in TPRs.
Protein Domain
Name: High mobility group box domain
Type: Domain
Description: High mobility group (HMG) box domains are involved in binding DNA, and may be involved in protein-protein interactions as well. The structure of the HMG-box domain consists of three helices in an irregular array. HMG-box domains are found in one or more copies in HMG-box proteins, which form a large, diverse family involved in the regulation of DNA-dependent processes such as transcription, replication, and strand repair, all of which require the bending and unwinding of chromatin. Many of these proteins are regulators of gene expression. HMG-box proteins are found in a variety of eukaryotic organisms, and can be broadly divided into two groups, based on sequence-dependent and sequence-independent DNA recognition; the former usually contain one HMG-box motif, while the latter can contain multiple HMG-box motifs. HMG-box domains can be found in single or multiple copies in the following protein classes: HMG1 and HMG2 non-histone components of chromatin; SRY (sex determining region Y protein) involved in differential gonadogenesis; the SOX family of transcription factors [ ]; sequence-specific LEF1 (lymphoid enhancer binding factor 1) and TCF-1 (T-cell factor 1) involved in regulation of organogenesis and thymocyte differentiation []; structure-specific recognition protein SSRP involved in transcription and replication; MTF1 mitochondrial transcription factor; nucleolar transcription factors UBF 1/2 (upstream binding factor) involved in transcription by RNA polymerase I; Abf2 yeast ARS-binding factor []; yeast transcription factors Ixr1, Rox1, Nhp6b and Spp41; mating type proteins (MAT) involved in the sexual reproduction of fungi []; and the YABBY plant-specific transcription factors.
Protein Domain
Name: Antifreeze protein, type I
Type: Family
Description: Marine teleosts from polar oceans can be protected from freezing in icy sea-water by serum antifreeze proteins (AFPs) or glycoproteins (AFGPs) []: these function by binding to, and preventing the growth of, ice crystals within the fish and depressing the non-equilibrium freezing point to below that of the melting point. Despite functional similarity, antifreeze proteins are structurally diverse and include glycosylated and at least 3 non-glycosylated forms: the AFGP of nototheniids and cods are polymers of a tripeptide repeat, Ala-Ala-Thr, with a disaccharide attached to the threonine residue; type I AFPs are found in flounder and sculpin; type II AFPs of sea-raven, smelt and herring are Cys-rich proteins; and type III AFPs, found in eel pouts, are rich in β-structure. Non-homologous antifreeze proteins have also been identified in insects and plants [ ]. Type I AFPs are Ala-rich, amphiphilic, α-helical proteins [ ]. The ice-binding sites of all AFPs are relatively flat and hydrophobic and have an uninterrupted section of alanines running the length of the approximately 16.5 Å helix repeat. Based on the energy-minimised structure [ ], a model has been proposed to describe the binding of the protein to ice crystals, whereby the protein binds to an ice nucleation structure, in a zipper-like fashion, via hydrogen bonding of the hydroxyl groups of threonine side chains (with an 11-residue period) to oxygen atoms in the ice lattice. The growth of ice crystals is thus stopped, or retarded, and the freezing point depressed. The high lysine content of these peptides may serve to promote the solubility of these proteins.
Protein Domain
Name: P-type trefoil domain
Type: Domain
Description: A cysteine-rich domain of approximately forty-five amino-acid residues has been found in some extracellular eukaryotic proteins [ , , , ]. It is known as either the 'P', 'trefoil' or 'TFF' domain, and contains six cysteines linked by three disulphide bonds with connectivity 1-5, 2-4, 3-6. This leads to a characteristic three-leafed structure ('trefoil'). The P-type domain is clearly composed of three looplike regions. The central core of the domain consists of a short two-stranded antiparallel β-sheet, which is capped by an irregular loop and forms a central hairpin (loop 3). The β-sheet is preceded by a short α-helix, with the majority of the remainder of the domain contained in two loops, which lie on either side of the central hairpin. This domain has been found in a variety of extracellular eukaryotic proteins [ , , ], including:
  • Protein pS2 (TFF1), a protein secreted by the stomach mucosa
  • Spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that inhibits gastrointestinal motility and gastric acid secretion
  • Intestinal trefoil factor (ITF) (TFF3)
  • Xenopus laevis stomach proteins xP1 and xP4
  • Xenopus integumentary mucins A.1 (FIM-A.1 or preprospasmolysin) and C.1 (FIM-C.1), proteins which may be involved in defence against microbial infections by protecting the epithelia from the external environment
  • Xenopus skin protein xp2 (or APEG)
  • Zona pellucida sperm-binding protein B (ZP-B)
  • Intestinal sucrase-isomaltase, a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyses sucrose, maltose and isomaltose
  • Lysosomal alpha-glucosidase
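The disulphide connectivity quoted in this description (1-5, 2-4, 3-6) can be illustrated with a minimal Python sketch that numbers the cysteines of a sequence in order and pairs them accordingly. The function name and the toy sequence are hypothetical and serve only as an illustration; the sketch assumes the input holds exactly the six domain cysteines and is not a domain-detection method.

```python
from typing import List, Tuple

# Connectivity stated in the description: Cys1-Cys5, Cys2-Cys4, Cys3-Cys6.
TREFOIL_CONNECTIVITY = [(1, 5), (2, 4), (3, 6)]


def trefoil_disulphide_pairs(sequence: str) -> List[Tuple[int, int]]:
    """Return the 1-based residue positions of the three bonded cysteine pairs.

    Assumes `sequence` is a single trefoil domain with exactly six cysteines.
    """
    cys_positions = [
        pos for pos, residue in enumerate(sequence.upper(), start=1)
        if residue == "C"
    ]
    if len(cys_positions) != 6:
        raise ValueError(
            f"expected exactly 6 cysteines, found {len(cys_positions)}"
        )
    # Translate cysteine ordinals (1..6) into residue positions.
    return [
        (cys_positions[first - 1], cys_positions[second - 1])
        for first, second in TREFOIL_CONNECTIVITY
    ]


# Toy sequence (not a real protein): cysteines at positions 1, 4, 7, 10, 13, 16.
toy_domain = "CAACAACAACAACAAC"
print(trefoil_disulphide_pairs(toy_domain))  # [(1, 13), (4, 10), (7, 16)]
```

The pairing table is kept as data so that a different connectivity pattern could be substituted without changing the function.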
Protein Domain
Name: P-type trefoil, chordata
Type: Family
Description: A cysteine-rich domain of approximately forty-five amino-acid residues has been found in some extracellular eukaryotic proteins [ , , , ]. It is known as either the 'P', 'trefoil' or 'TFF' domain, and contains six cysteines linked by three disulphide bonds with connectivity 1-5, 2-4, 3-6. This leads to a characteristic three-leafed structure ('trefoil'). The P-type domain is clearly composed of three looplike regions. The central core of the domain consists of a short two-stranded antiparallel β-sheet, which is capped by an irregular loop and forms a central hairpin (loop 3). The β-sheet is preceded by a short α-helix, with the majority of the remainder of the domain contained in two loops, which lie on either side of the central hairpin. This domain has been found in a variety of extracellular eukaryotic proteins [ , , ], including:
  • Protein pS2 (TFF1), a protein secreted by the stomach mucosa
  • Spasmolytic polypeptide (SP) (TFF2), a protein of about 115 residues that inhibits gastrointestinal motility and gastric acid secretion
  • Intestinal trefoil factor (ITF) (TFF3)
  • Xenopus laevis stomach proteins xP1 and xP4
  • Xenopus integumentary mucins A.1 (FIM-A.1 or preprospasmolysin) and C.1 (FIM-C.1), proteins which may be involved in defence against microbial infections by protecting the epithelia from the external environment
  • Xenopus skin protein xp2 (or APEG)
  • Zona pellucida sperm-binding protein B (ZP-B)
  • Intestinal sucrase-isomaltase, a vertebrate membrane-bound, multifunctional enzyme complex which hydrolyses sucrose, maltose and isomaltose
  • Lysosomal alpha-glucosidase
Protein Domain
Name: Tetraspanin, EC2 domain superfamily
Type: Homologous_superfamily
Description: Tetraspanins are a distinct family of cell surface proteins, containing four conserved transmembrane domains: a small outer loop (EC1), a larger outer loop (EC2), a small inner loop (IL) and short cytoplasmic tails. They contain characteristic structural features, including 4-6 conserved extracellular cysteine residues, and polar residues within transmembrane domains. A fundamental role of tetraspanins appears to be organising other proteins into a network of multimolecular membrane microdomains, sometimes called the 'tetraspanin web'. Within this web there are primary complexes in which tetraspanins show robust, specific, and direct lateral associations with other proteins. The strong tendency of tetraspanins to associate with each other probably contributes to the assembly of a network of secondary interactions in which non-tetraspanin proteins are associated with each other via palmitoylated tetraspanins acting as linker proteins. In addition, the association of lipids, such as gangliosides and cholesterol, probably contributes to the assembly of even larger tetraspanin complexes, which have some lipid raft-like properties (e.g. resistance to solubilization in non-ionic detergents). Within the tetraspanin web, tetraspanin proteins can associate not only with integrins and other transmembrane proteins, but also with signalling enzymes such as protein kinase C and phosphatidylinositol-4 kinase. Thus, the tetraspanin web provides a mechanistic framework by which membrane protein signalling can be expanded into a lateral dimension [ ]. This superfamily represents the EC2 domain from tetraspanins, consisting of 5 helices in an irregular disulphide-linked array which plays a role in homodimerization.
Protein Domain
Name: DNA binding HTH domain, Psq-type
Type: Domain
Description: The psq-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of about 50 amino acids present in eukaryotic proteins of the Pipsqueak family. This family is named after the Drosophila pipsqueak protein, containing a DNA-binding domain that consists of four tandem repeats of the psq motif []. Proteins of the Pipsqueak family occur in vertebrates, insects, nematodes, and fungi. Three subgroups of the family have been described: BTB, E93 and CENP-B. Pipsqueak and the other proteins of the BTB group (Broad-Complex, Tramtrack, Bric a brac) contain a BTB protein-protein interaction domain in the N-terminal part, and the psq-type HTH domain(s) occur in the C-terminal part. Many BTB proteins are transcriptional regulators and the psq-type HTH domain binds DNA. The Drosophila cell death regulating protein E93 and human orthologs form the second subgroup and can contain the psq-type HTH at varying positions. The human centromere protein B (CENP-B) and the other members of the CENP-B group contain a psq-type DNA-binding domain in the N-terminal part and often a dimerisation domain in the C-terminal part. The CENP-B group includes fungal transposases that, however, lack the N-terminal extremity of the psq-type HTH domain []. The structure of human CENP-B shows that the N-terminal part of the DNA binding domain is composed of three α-helices. The second and third helices, connected via a turn, comprise the helix-turn-helix motif. Helix 3 is termed the recognition helix as it binds the DNA major groove, as in other HTHs [ ].
Protein Domain
Name: RP1/RP1L1/DCX
Type: Family
Description: This entry includes oxygen-regulated protein 1 (RP1), retinitis pigmentosa 1-like 1 (RP1L1) and neuronal migration protein doublecortin (DCX). Oxygen-regulated protein 1 (also known as retinitis pigmentosa 1, RP1) is a photoreceptor-specific, microtubule-associated ciliary protein containing the doublecortin (DCX) domain. Together with RP1L1, it plays essential and synergistic roles in affecting photosensitivity and outer segment morphogenesis of rod photoreceptors [ ]. Mutations in the RP1 gene account for 5-10% of cases of autosomal dominant retinitis pigmentosa, a disease characterised by late-onset night blindness, loss of peripheral vision, and diminished or absent electroretinogram (ERG) responses [ ]. Mutations in the RP1L1 gene cause occult macular dystrophy (OCMD), an inherited macular dystrophy characterised by progressive loss of macular function but normal ophthalmoscopic appearance [ ]. Neuronal migration protein doublecortin (DCX; also known as Lissencephalin-X) seems to be required for the initial steps of neuronal dispersion and cortex lamination during cerebral cortex development [ ]. This protein may act by competing with the putative neuronal protein kinase DCAMKL1 in binding to a target protein, and thereby participate in a signalling pathway that is crucial for neuronal interaction before and during migration, possibly as part of a calcium ion-dependent signal transduction pathway. It may also be involved with LIS-1 in an overlapping, but distinct, signalling pathway that promotes neuronal migration. Defects in neuronal migration protein doublecortin are the cause of lissencephaly X-linked type 1 (LISX1), a classic lissencephaly characterised by mental retardation and seizures that are more severe in male patients [, ].
Protein Domain
Name: Adenylate cyclase-associated CAP
Type: Family
Description: Cyclase-associated proteins (CAPs) are highly conserved actin-binding proteins present in a wide range of organisms including yeast, fly, plants, and mammals. CAPs are multifunctional proteins that contain several structural domains. CAP is involved in species-specific signalling pathways [ , , , ]. In Drosophila, CAP functions in Hedgehog-mediated eye development and in establishing oocyte polarity. In Dictyostelium (slime mold), CAP is involved in microfilament reorganisation near the plasma membrane in a PIP2-regulated manner and is required to perpetuate the cAMP relay signal to organise fruitbody formation. In plants, CAP is involved in plant signalling pathways required for co-ordinated organ expansion. In yeast, CAP is involved in adenylate cyclase activation, as well as in vesicle trafficking and endocytosis. In both yeast and mammals, CAPs appear to be involved in recycling G-actin monomers from ADF/cofilins for subsequent rounds of filament assembly [, ]. In mammals, there are two different CAPs (CAP1 and CAP2) that share 64% amino acid identity. All CAPs appear to contain a C-terminal actin-binding domain that regulates actin remodelling in response to cellular signals and is required for normal cellular morphology, cell division, growth and locomotion in eukaryotes. CAP directly regulates actin filament dynamics and has been implicated in a number of complex developmental and morphological processes, including mRNA localisation and the establishment of cell polarity. Actin exists both as globular (G) (monomeric) actin subunits and assembled into filamentous (F) actin. In cells, actin cycles between these two forms. Proteins that bind F-actin often regulate F-actin assembly and its interaction with other proteins, while proteins that interact with G-actin often control the availability of unpolymerised actin. CAPs bind G-actin. In addition to actin-binding, CAPs can have additional roles, and may act as bifunctional proteins. 
In Saccharomyces cerevisiae (Baker's yeast), CAP is a component of the adenylyl cyclase complex (Cyr1p) that serves as an effector of Ras during normal cell signalling. S. cerevisiae CAP functions to expose adenylate cyclase binding sites to Ras, thereby enabling adenylate cyclase to be activated by Ras regulatory signals. In Schizosaccharomyces pombe (Fission yeast), CAP is also required for adenylate cyclase activity, but not through the Ras pathway. In both organisms, the N-terminal domain is responsible for adenylate cyclase activation, but the S. cerevisiae and S. pombe N-termini cannot complement one another. Yeast CAPs are unique among the CAP family of proteins, because they are the only ones to directly interact with and activate adenylate cyclase [ ]. S. cerevisiae CAP has four major domains. In addition to the N-terminal adenylate cyclase-interacting domain, and the C-terminal actin-binding domain, it possesses two other domains: a proline-rich domain that interacts with Src homology 3 (SH3) domains of specific proteins, and a domain that is responsible for CAP oligomerisation to form multimeric complexes (although oligomerisation appears to involve the N- and C-terminal domains as well). The proline-rich domain interacts with profilin, a protein that catalyses nucleotide exchange on G-actin monomers and promotes addition to barbed ends of filamentous F-actin []. Since CAP can bind profilin via a proline-rich domain, and G-actin via a C-terminal domain, it has been suggested that a ternary G-actin/CAP/profilin complex could be formed. This entry represents CAP proteins from various organisms.
Protein Domain
Name: Cytochrome P450, E-class, CYP24A, mitochondrial
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium [ ], which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes). 
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes [ , , ]. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents the CYP24A (Vitamin D 24-hydroxylase) family from class E, cytochrome P450 proteins. These enzymes are mitochondrial in origin. CYP24A1 has a role in maintaining calcium homeostasis, catalysing the NADPH-dependent 24-hydroxylation of 25-hydroxyvitamin D(3) in the presence of adrenodoxin and NADPH-adrenodoxin reductase. Human CYP24A1 catalyses both C-23 and C-24 oxidation, while the rat enzyme only catalyses C-24 oxidation [ ].
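The naming convention given in this description (CYP, then a family number, a subfamily letter, and a protein number) can be sketched as a small Python parser. The class and function names are hypothetical, and accepting more than one subfamily letter is an assumption that goes beyond the single letter mentioned above; the sketch only splits a name into its parts and does not check that such a protein exists.

```python
import re
from typing import NamedTuple


class CypName(NamedTuple):
    family: int      # number after "CYP"
    subfamily: str   # letter(s) following the family number
    protein: int     # final number identifying the individual protein


# CYP + family digits + subfamily letter(s) + protein digits.
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name: str) -> CypName:
    """Split a name such as 'CYP3A4' into family, subfamily and protein."""
    match = _CYP_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a CYP-style name: {name!r}")
    family, subfamily, protein = match.groups()
    return CypName(int(family), subfamily, int(protein))


print(parse_cyp("CYP3A4"))   # CypName(family=3, subfamily='A', protein=4)
print(parse_cyp("CYP24A1"))  # CypName(family=24, subfamily='A', protein=1)
```

Under this reading, CYP24A1 is protein 1 of subfamily A in family 24, which matches the family named in this entry.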
Protein Domain
Name: TSC-22 / Dip / Bun
Type: Family
Description: Several eukaryotic proteins are evolutionarily related and are thought to be involved in transcriptional regulation. These proteins are highly similar in a region of about 50 residues that includes a conserved leucine-zipper domain most probably involved in homo- or hetero-dimerisation. Proteins containing this signature include:
  • Vertebrate protein TSC-22 [ ], a transcriptional regulator which seems to act on the C-type natriuretic peptide (CNP) promoter.
  • Mammalian protein DIP (DSIP-immunoreactive peptide) [ ], a protein whose function is not yet known.
  • Drosophila protein bunched [ ] (gene bun) (also known as shortsighted), a probable transcription factor required for peripheral nervous system morphogenesis, eye development and oogenesis.
  • Caenorhabditis elegans hypothetical protein T18D3.7.
Protein Domain
Name: Auxin response factor domain
Type: Domain
Description: This pattern represents a conserved region of auxin-responsive transcription factors. The plant hormone auxin (indole-3-acetic acid) can regulate the gene expression of several families, including Aux/IAA, GH3 and SAUR families. Two related families of proteins, Aux/IAA proteins and the auxin response factors (ARF), are key regulators of auxin-modulated gene expression [ ]. There are multiple ARF proteins, some of which activate, while others repress transcription. ARF proteins bind to auxin-responsive cis-acting promoter elements (AuxREs) using an N-terminal DNA-binding domain. It is thought that Aux/IAA proteins activate transcription by modifying ARF activity through the C-terminal protein-protein interaction domains found in both Aux/IAA and ARF proteins.
Protein Domain
Name: SEA domain
Type: Domain
Description: The SEA domain has been named after the first three proteins in which it was identified (Sperm protein, Enterokinase and Agrin). The SEA domain has around 120 residues; it is an extracellular domain found in a number of cell surface and secreted proteins in which it could be present in one or two copies [ ]. Many SEA domains possess autoproteolysis activity. The SEA domain is closely associated with regions receiving extensive O-glycosylation and is present adjacent to the transmembrane segment in quite a number of type I transmembrane proteins on the cell surface, such as mucin-1 (MUC1) and Notch receptors, and in type II single-pass transmembrane proteins such as enterokinase and matriptases. It is also present in interphotoreceptor matrix proteoglycans (IMPG1 and IMPG2) []. It has been proposed that carbohydrates are required to stabilise SEA domains and protect them against proteolytic degradation and that the extent of substitution may control proteolytic processing [, ]. The SEA domain contains a conserved region of about 80 residues and a segment of about 40 residues that separates the conserved region from the subsequent C-terminal domains with an alternating conformation of β-sheets and α-helices. Structural analysis of the MUC1 SEA domain revealed that it adopts a ferredoxin-like fold in which the cleavage site is located in the middle of the β-hairpin of the second and third β-strands. The MUC1 SEA domain undergoes autoproteolysis at the glycine-serine peptide bond and the Ser responsible for this activity is located in the consensus motif GSXXX (X: a hydrophobic residue) [ , , ]. Some proteins known to contain a SEA domain include:
  • Vertebrate agrin, a heparan sulfate proteoglycan of the basal lamina of the neuromuscular junction. It is responsible for the clustering of acetylcholine receptors (AChRs) and other proteins at the neuromuscular junction.
  • Mammalian enterokinase. It catalyses the conversion of trypsinogen to trypsin which in turn activates other proenzymes, including chymotrypsinogen, procarboxypeptidases, and proelastases.
  • 63kDa sea urchin sperm protein (SP63). It might mediate sperm-egg or sperm-matrix interactions.
  • Animal perlecan, a heparan sulfate containing proteoglycan found in all basement membranes. It interacts with other basement membrane components such as laminin and collagen type IV and serves as an attachment substrate for cells.
  • Some vertebrate epithelial mucins. They form a family of secreted and cell surface glycoproteins expressed by epithelial tissues and implicated in epithelial cell protection, adhesion modulation and signaling.
  • Mammalian cell surface antigen 114/A10, an integral transmembrane protein that is highly expressed in hematopoietic progenitor cells and IL-3-dependent cell lines.
Protein Domain
Name: SEA domain superfamily
Type: Homologous_superfamily
Description: The SEA domain has been named after the first three proteins in which it was identified (Sperm protein, Enterokinase and Agrin). The SEA domain has around 120 residues and is an extracellular domain found in a number of cell surface and secreted proteins, in which it can be present in one or two copies. Many SEA domains possess autoproteolysis activity. The SEA domain is closely associated with regions receiving extensive O-glycosylation and is present adjacent to the transmembrane segment in a number of type I transmembrane proteins on the cell surface, such as mucin-1 (MUC1) and Notch receptors, and in type II single-pass transmembrane proteins such as enterokinase and matriptases. It is also present in interphotoreceptor matrix proteoglycans (IMPG1 and IMPG2). It has been proposed that carbohydrates are required to stabilise SEA domains and protect them against proteolytic degradation, and that the extent of substitution may control proteolytic processing.
The SEA domain contains a conserved region of about 80 residues and a segment of about 40 residues that separates the conserved region from the subsequent C-terminal domains, with an alternating conformation of β-sheets and α-helices. Structural analysis of the MUC1 SEA domain revealed that it adopts a ferredoxin-like fold in which the cleavage site is located in the middle of the β-hairpin of the second and third β-strands. The MUC1 SEA domain undergoes autoproteolysis at the glycine-serine peptide bond, and the Ser responsible for this activity is located in the consensus motif GSXXX (where X is a hydrophobic residue).
Some proteins known to contain a SEA domain include:
  • Vertebrate agrin, a heparan sulfate proteoglycan of the basal lamina of the neuromuscular junction. It is responsible for the clustering of acetylcholine receptors (AChRs) and other proteins at the neuromuscular junction.
  • Mammalian enterokinase. It catalyses the conversion of trypsinogen to trypsin, which in turn activates other proenzymes, including chymotrypsinogen, procarboxypeptidases, and proelastases.
  • 63 kDa sea urchin sperm protein (SP63). It might mediate sperm-egg or sperm-matrix interactions.
  • Animal perlecan, a heparan sulfate-containing proteoglycan found in all basement membranes. It interacts with other basement membrane components such as laminin and collagen type IV and serves as an attachment substrate for cells.
  • Some vertebrate epithelial mucins. They form a family of secreted and cell surface glycoproteins expressed by epithelial tissues and implicated in epithelial cell protection, adhesion modulation and signaling.
  • Mammalian cell surface antigen 114/A10, an integral transmembrane protein that is highly expressed in hematopoietic progenitor cells and IL-3-dependent cell lines.
Protein Domain
Name: Amyloidogenic glycoprotein
Type: Family
Description: Amyloid-beta precursor protein (APP, or A4) is associated with Alzheimer's disease (AD), because one of its breakdown products, amyloid-beta (A-beta), aggregates to form amyloid or senile plaques. Mutations in APP or in proteins that process APP have been linked with early-onset, familial AD. Individuals with Down's syndrome carry an extra copy of chromosome 21, which contains the APP gene, and almost invariably develop amyloid plaques and Alzheimer's symptoms.
APP is important for neurogenesis and neuronal regeneration, either through the intact protein or through its many breakdown products. APP consists of a large N-terminal extracellular region containing heparin-binding and copper-binding sites, a Kunitz domain and an E2 domain, followed by a short hydrophobic transmembrane domain and a short C-terminal intracellular domain. The N-terminal region is similar in structure to cysteine-rich growth factors and appears to function as a cell surface receptor, contributing to neurite growth, neuronal adhesion, axonogenesis and cell mobility. APP acts as a kinesin I membrane receptor to mediate the axonal transport of beta-secretase and presenilin 1. The N-terminal domain can regulate neurite outgrowth through its binding to heparin and collagen I and IV, which are components of the extracellular matrix. APP is also coupled to apoptosis-inducing pathways, and is involved in copper homeostasis/oxidative stress through copper ion reduction, where copper-metallated APP induces neuronal death. The C-terminal intracellular domain appears to be involved in transcription regulation through protein-protein interactions. APP can promote transcription activation through binding to APBB1/Tip60, and may bind to the adaptor protein FE65 to transactivate a wide variety of different promoters.
APP can be processed by different sets of enzymes:
  • In the non-amyloidogenic (non-plaque-forming) pathway, APP is cleaved by alpha-secretase to yield a soluble N-terminal sAPP-alpha (neuroprotective) and a membrane-bound CTF-alpha. CTF-alpha is broken down by presenilin-containing gamma-secretase to yield soluble p3 and membrane-bound AICD (nuclear signalling).
  • In the amyloidogenic (plaque-forming) pathway, APP is broken down by beta-secretase to yield soluble sAPP-beta and membrane-bound CTF-beta. CTF-beta is broken down by gamma-secretase to yield soluble amyloid-beta and membrane-bound AICD.
Amyloid-beta is required for neuronal function, but can aggregate to form amyloid plaques that seem to disrupt brain cells by clogging points of cell-cell contact.
This entry represents the family of amyloidogenic glycoproteins, such as amyloid-beta precursor protein (APP, or A4) and amyloid beta precursor like protein 1/2 (APLP1/2). APP is an integral, glycosylated membrane protein.
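The two APP processing routes described in this entry can be sketched as a small lookup table. This is a minimal illustration only: the names `PATHWAYS` and `final_products` are hypothetical, and the sketch tracks fragment names as given in the description, not sequences, cleavage positions or kinetics.

```python
# Illustrative only: each step is (enzyme, substrate, products),
# using the fragment names from the description above.
PATHWAYS = {
    "non-amyloidogenic": [
        ("alpha-secretase", "APP", ("sAPP-alpha", "CTF-alpha")),
        ("gamma-secretase", "CTF-alpha", ("p3", "AICD")),
    ],
    "amyloidogenic": [
        ("beta-secretase", "APP", ("sAPP-beta", "CTF-beta")),
        ("gamma-secretase", "CTF-beta", ("amyloid-beta", "AICD")),
    ],
}


def final_products(pathway):
    """Apply each cleavage step in order and return the remaining fragments."""
    fragments = ["APP"]
    for enzyme, substrate, products in PATHWAYS[pathway]:
        if substrate not in fragments:
            raise ValueError(f"{enzyme} has no {substrate} to cleave")
        fragments.remove(substrate)   # the substrate is consumed
        fragments.extend(products)    # and replaced by its two products
    return fragments


if __name__ == "__main__":
    for name in PATHWAYS:
        print(name, "->", ", ".join(final_products(name)))
```

Both routes end with AICD; they differ in whether the soluble gamma-secretase product is p3 or amyloid-beta.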
Protein Domain
Name: NADPH-dependent 7-cyano-7-deazaguanine reductase, QueF type 2
Type: Family
Description: This group represents QueF-like proteins, closely related to QueF/YkvM but containing an additional N-terminal domain. Based on sequence similarity, they are predicted to function as NADPH-dependent nitrile oxidoreductases and to catalyse the NADPH-dependent reduction of 7-cyano-7-deazaguanine to 7-aminomethyl-7-deazaguanine, a late step in the biosynthesis of queuosine, a 7-deazaguanine modified nucleoside found in tRNA(GUN) of bacteria and eukaryotes. Queuosine (Q) is an example of a highly modified nucleoside located in the anticodon wobble position 34 of tRNAs specific for Tyr, His, Asp, and Asn. With few exceptions (such as yeast and mycoplasma), it is widely distributed in most prokaryotic and eukaryotic phyla. Q is based on a very unusual 7-deazaguanosine core, which is further modified by addition of a cyclopentendiol ring.
This group of proteins belongs to the T fold structural superfamily and is related to GTP cyclohydrolase FolE. QueF-like proteins form two groups: type I proteins, exemplified by Bacillus subtilis YkvM, and type II proteins, exemplified by Escherichia coli YqcD. The type I proteins are comparable in size with bacterial and mammalian FolE, whereas the type II proteins are larger and are predicted to comprise two domains, similar to plant FolE.
In members of this entry, the N-terminal domain has often been annotated as a membrane-spanning domain, but transmembrane prediction programs run on YqcD do not detect any transmembrane segments. Instead, the QueF motif can be easily detected in this domain, whereas the flanking and invariant cysteine and glutamate residues (Cys-190 and Glu-230 in E. coli YqcD) are only present in the C-terminal domain. The splitting of active-site residues between the two domains of YqcD is very similar to that seen in two-domain FolE, in which neither domain contains the full set of active-site residues nor is active when expressed separately. Further, the pattern of active-site splitting is the same in both proteins, with a similarly located conserved central sequence motif split from two flanking sequences, which are 40 residues apart. The splitting of the YqcD active site suggests that a gene duplication occurred, with each domain retaining some of the residues of the putative active site. As in two-domain FolE, such a duplication event and redistribution of active-site residues could allow the YqcD proteins to evolve a simpler quaternary structure than the QueF proteins.
Protein Domain
Name: TrmO-like, N-terminal domain
Type: Domain
Description: TrmO (also known as TsaA or YaeB) is an RNA methyltransferase responsible for N6-methylation of N6-threonylcarbamoyladenosine in tRNAs. It has a unique single-sheeted β-barrel structure and represents a new category of AdoMet-dependent methyltransferases.
The following uncharacterised proteins have been shown to be evolutionarily related and to contain a TrmO-like β-barrel structural domain:
  • Agrobacterium tumefaciens Ti plasmid protein virR.
  • Pseudomonas aeruginosa protein rcsF.
  • Archaeoglobus fulgidus hypothetical protein AF0241.
  • Archaeoglobus fulgidus hypothetical protein AF0433.
  • Methanococcus jannaschii hypothetical protein MJ1583.
  • Methanobacterium thermoautotrophicum hypothetical protein MTH1797.
  • Mammalian Nef-associated protein 1 (NAP1).
The crystal structure of Haemophilus influenzae TrmO has been solved. In addition to the N-terminal β-barrel methyltransferase domain, it has a C-terminal domain containing the conserved sequence motif DPRxxY.
Protein Domain
Name: DDE superfamily endonuclease domain
Type: Domain
Description: Proteins containing this domain are probably endonucleases of the DDE superfamily. This domain contains three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction. Proteins containing this domain include tigger transposable element derived proteins from mammals and protein Pdc2 from budding yeasts. Interestingly, proteins containing this domain also include centromere protein B (CENP-B), which appears to have lost the metal-binding residues in this domain and is unlikely to have endonuclease activity. CENP-B is an inner kinetochore protein that binds to a specific centromere sequence.
Protein Domain
Name: TIPIN/Csm3/Swi3
Type: Family
Description: Proteins in this family contain a domain found in yeast chromosome segregation in meiosis protein 3. Proteins include:
  • Chromosome segregation in meiosis protein 3, which is required for chromosome segregation during meiosis and DNA damage repair, forming a fork protection complex with TOF1.
  • TIMELESS-interacting protein (also known as TIPIN), which is a nuclear protein that associates with the replicative helicase, and is required for efficient cell cycle arrest in response to DNA damage. It forms a checkpoint complex with TIMELESS.
  • Protein TIPIN homolog, which is the orthologue of TIPIN from Caenorhabditis elegans and Drosophila melanogaster.
  • Swi1-interacting protein swi3, from Schizosaccharomyces pombe, which forms a fork protection complex with swi1.
Protein Domain
Name: Aveugle-like, SAM domain
Type: Domain
Description: The SAM (sterile alpha motif) domain of aveugle-like proteins is a protein-protein interaction domain. In Drosophila, the aveugle (AVE) protein (also known as HYP (Hyphen)) is involved in normal photoreceptor differentiation, and is required for epidermal growth factor receptor (EGFR) signaling between the ras and raf genes during eye development and wing vein formation. The SAM domain of the HYP(AVE) protein interacts with the SAM domain of CNK, the multidomain scaffold protein connector enhancer of kinase suppressor of Ras. The CNK/HYP(AVE) complex interacts with the KSR (kinase suppressor of Ras) protein. This interaction leads to stimulation of Ras-dependent Raf activation. Proteins containing this domain also include the vertebrate AVE homologues Samd10 and Samd12, whose function is unknown.
Protein Domain
Name: Transthyretin, conserved site
Type: Conserved_site
Description: Transthyretin (prealbumin) is a thyroid hormone-binding protein that seems to transport thyroxine (T4) from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and forms an internal channel that binds thyroxine. In humans, transthyretin is mainly synthesized in the brain choroid plexus; variants of the protein are associated with distinct forms of amyloidosis. The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterised proteins also belong to this family: Escherichia coli hypothetical protein YedX. Bacillus subtilis hypothetical protein YunM. Caenorhabditis elegans hypothetical protein R09H10.3. Caenorhabditis elegans hypothetical protein ZK697.8. The signature pattern in this entry is located at the C terminus.
Protein Domain
Name: DPS-like protein, ferritin-like diiron-binding domain
Type: Domain
Description: Dps-like proteins are members of the broad superfamily of ferritin-like diiron-carboxylate proteins, which are non-haem iron storage proteins. The Dps-like protein from Sulfolobus solfataricus is structurally related to a class of DNA-binding protein from starved cells (Dps). It has been experimentally characterised and shown to self-assemble into a hollow dodecameric protein cage having tetrahedral symmetry, which uses H2O2 to oxidize Fe(II) to Fe(III), storing the oxide as a mineral core on the interior surface of the protein cage. Members of the Dps-like family may therefore be involved in the mitigation of oxidative damage.
This entry represents the ferritin-like diiron-binding domain of Dps-like proteins. Many of the conserved residues of a diiron centre are present in this domain.
Protein Domain
Name: 26S proteasome regulatory complex, non-ATPase subcomplex, Rpn1 subunit
Type: Family
Description: Intracellular proteins, including short-lived proteins such as cyclin, Mos, Myc, p53, NF-kappaB, and IkappaB, are degraded by the ubiquitin-proteasome system. The 26S proteasome is a self-compartmentalising protease responsible for the regulated degradation of intracellular proteins in eukaryotes. This giant intracellular protease is formed by several subunits arranged into two 19S polar caps, where protein recognition and ATP-dependent unfolding occur, flanking a 20S central barrel-shaped structure with an inner proteolytic chamber. This overall structure is highly conserved among eukaryotes and is essential for cell viability. Proteins targeted to the 26S proteasome are conjugated with a polyubiquitin chain by an enzymatic cascade before delivery to the 26S proteasome for degradation into oligopeptides.
The 26S proteasome can be divided into two subcomplexes: the 19S regulatory particle (RP) and the 20S core particle (CP). The 19S component is divided into a "base" subunit containing six ATPases (Rpt proteins) and two non-ATPases (Rpn1, Rpn2), and a "lid" subunit composed of eight stoichiometric proteins (Rpn3, Rpn5, Rpn6, Rpn7, Rpn8, Rpn9, Rpn11, Rpn12). Additional non-essential and species-specific proteins may also be present. The 19S unit performs several essential functions, including binding the specific protein substrates, unfolding them, cleaving the attached ubiquitin chains, opening the 20S subunit, and driving the unfolded polypeptide into the proteolytic chamber for degradation. The 26S proteasome and 19S regulator are of medical interest due to their involvement in burn rehabilitation.
This group represents a 26S proteasome regulatory complex, non-ATPase subcomplex, Rpn1 (regulatory-particle non-ATPase subunit 1). This subunit is essential for embryogenesis in Arabidopsis thaliana.
Protein Domain
Name: Flaviviral glycoprotein E, dimerisation domain
Type: Homologous_superfamily
Description: Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts, and include Yellow fever virus (YFV), West Nile virus (WNV), Tick-borne encephalitis virus, Japanese encephalitis virus (JE) and Dengue virus 2. Flaviviruses consist of three structural proteins: the core nucleocapsid protein C (IPR001122), and the envelope glycoproteins M (IPR000069) and E. Glycoprotein E is a class II viral fusion protein that mediates both receptor binding and fusion. Class II viral fusion proteins are found in flaviviruses and alphaviruses, and are structurally distinct from class I fusion proteins from influenza virus and HIV. Glycoprotein E comprises three domains: domain I (dimerisation domain) is an 8-stranded beta barrel, domain II (central domain) is an elongated domain composed of twelve beta strands and two alpha helices, and domain III (immunoglobulin-like domain) is an IgC-like module with ten beta strands. Domains I and II are intertwined. This superfamily represents domain I.
The glycoprotein E dimers on the viral surface re-cluster irreversibly into fusion-competent trimers upon exposure to low pH, as found in the acidic environment of the endosome. The formation of trimers results in a conformational change in the hinge region of domain II, a key structural element that opens a ligand-binding hydrophobic pocket at the interface between domains I and II. The conformational change results in the exposure of a fusion peptide loop at the tip of domain II, which is required in the fusion step to drive the cellular and viral membranes together by inserting into the membrane.
Protein Domain
Name: Thyroglobulin type-1
Type: Domain
Description: Thyroglobulin (Tg) is a large glycoprotein specific to the thyroid gland and is the precursor of the iodinated thyroid hormones thyroxine (T4) and triiodothyronine (T3). The N-terminal section of Tg contains 10 repeats of a domain of about 65 amino acids, which is known as the Tg type-1 repeat. Such a domain has also been found as a single or repeated sequence in the HLA class II associated invariant chain; human pancreatic carcinoma marker proteins GA733-1 and GA733-2; nidogen (entactin), a sulphated glycoprotein which is widely distributed in basement membranes and that is tightly associated with laminin; insulin-like growth factor binding proteins (IGFBP); saxiphilin, a transferrin-like protein from Rana catesbeiana (Bull frog) that binds specifically to the neurotoxin saxitoxin; chum salmon egg cysteine proteinase inhibitor; and equistatin, a thiol-protease inhibitor from Actinia equina (sea anemone). The existence of Thyr-1 domains in such a wide variety of proteins raises questions about their activity and function, and their interactions with neighbouring domains. The Thyr-1 and related domains belong to MEROPS proteinase inhibitor family I31, clan IX.
Equistatin from A. equina is composed of three Thyr-1 domains; as with other proteins that contain Thyr-1 domains (the thyropins), it binds reversibly and tightly to cysteine proteases (peptidase family C1). In equistatin, inhibition of papain is a function of domain-1. Unusually, domain-2 inhibits cathepsin D, an aspartic protease (peptidase family A1), and has no activity against papain. Domain-3 does not inhibit either papain or cathepsin D, and its function or its target peptidase has yet to be determined.
Protein Domain
Name: HMG-I/HMG-Y, DNA-binding, conserved site
Type: Conserved_site
Description: High mobility group (HMG) proteins are a family of relatively low molecular weight non-histone components in chromatin. HMG-I and HMG-Y (HMGA) are proteins of about 100 amino acid residues which are produced by the alternative splicing of a single gene. HMG-I/Y proteins bind preferentially to the minor groove of AT-rich regions in double-stranded DNA in a non-sequence-specific manner. It is suggested that these proteins could function in nucleosome phasing and in the 3' end processing of mRNA transcripts. They are also involved in the transcription regulation of genes containing, or in close proximity to, AT-rich regions. DNA-binding of these, and several related, proteins is effected by an 11-residue domain known as an AT-hook. Within known HMG-I/Y proteins are found three highly conserved regions, closely related to the consensus sequence TPKRPRGRPKK. A synthetic oligopeptide with this sequence specifically binds to substrate DNA in a manner reminiscent of intact HMG-I/Y proteins. Structure predictions suggest that the peptide has a secondary structure similar to the anti-tumour and anti-viral drugs netropsin and distamycin, and to the dye Hoechst 33258. These ligands, which also preferentially bind to AT-rich DNA, effectively compete with both the synthetic peptide and the HMG-I/Y proteins for DNA binding. The peptide also contains novel structural features such as a predicted Asx bend, or 'hook', at its N terminus, and laterally-projecting cationic Arg/Lys 'bristles', which may play a role in the binding of HMG-I/Y proteins. The predicted peptide structure, the AT-hook, is a previously undescribed DNA-binding motif.
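As a rough illustration of how regions "closely related to" the 11-residue consensus TPKRPRGRPKK might be located in a protein sequence, the sketch below slides a window along the sequence and counts identical positions. The function name and the toy input sequence are hypothetical, and simple identity counting is an assumption made for illustration; it is not the signature pattern used by this entry.

```python
AT_HOOK_CONSENSUS = "TPKRPRGRPKK"  # consensus quoted in the description above


def best_consensus_window(protein, consensus=AT_HOOK_CONSENSUS):
    """Return (start, identities) for the window most similar to the consensus.

    Returns (-1, -1) if the sequence is shorter than the consensus.
    Ties are resolved in favour of the first (leftmost) window.
    """
    width = len(consensus)
    best = (-1, -1)
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        # Count positions where window and consensus carry the same residue.
        identities = sum(a == b for a, b in zip(window, consensus))
        if identities > best[1]:
            best = (start, identities)
    return best


if __name__ == "__main__":
    toy_sequence = "MSESTPKRPRGRPKKGS"  # made-up sequence, not a real protein
    print(best_consensus_window(toy_sequence))
```

A real search would normally use a substitution matrix or a profile rather than exact identities, since the conserved regions are described as related to the consensus, not identical to it.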
Protein Domain
Name: High mobility group box domain superfamily
Type: Homologous_superfamily
Description: High mobility group (HMG) box domains are involved in binding DNA, and may be involved in protein-protein interactions as well. The structure of the HMG-box domain consists of three helices in an irregular array. HMG-box domains are found in one or more copies in HMG-box proteins, which form a large, diverse family involved in the regulation of DNA-dependent processes such as transcription, replication, and strand repair, all of which require the bending and unwinding of chromatin. Many of these proteins are regulators of gene expression. HMG-box proteins are found in a variety of eukaryotic organisms, and can be broadly divided into two groups, based on sequence-dependent and sequence-independent DNA recognition; the former usually contain one HMG-box motif, while the latter can contain multiple HMG-box motifs.
HMG-box domains can be found in single or multiple copies in the following protein classes: HMG1 and HMG2 non-histone components of chromatin; SRY (sex determining region Y protein) involved in differential gonadogenesis; the SOX family of transcription factors; sequence-specific LEF1 (lymphoid enhancer binding factor 1) and TCF-1 (T-cell factor 1) involved in regulation of organogenesis and thymocyte differentiation; structure-specific recognition protein SSRP involved in transcription and replication; MTF1 mitochondrial transcription factor; nucleolar transcription factors UBF 1/2 (upstream binding factor) involved in transcription by RNA polymerase I; Abf2 yeast ARS-binding factor; yeast transcription factors Ixr1, Rox1, Nhp6b and Spp41; mating type proteins (MAT) involved in the sexual reproduction of fungi; and the YABBY plant-specific transcription factors.
Protein Domain
Name: TAP C-terminal (TAP-C) domain
Type: Domain
Description: The vertebrate Tap protein is a member of the NXF family of shuttling transport receptors for nuclear export of mRNA. Tap has a modular structure, and its most C-terminal domain is important for binding to FG repeat-containing nuclear pore proteins (FG-nucleoporins) and is sufficient to mediate nuclear shuttling. The structure of the C-terminal domain is composed of four helices. The structure is related to the UBA domain.
The NXF family of mRNA nuclear export factors includes vertebrate NXF1 (also called tip-associated protein or mRNA export factor TAP), NXF2 (also called cancer/testis antigen CT39 or TAP-like protein TAPL-2), Caenorhabditis elegans NXF1 (ceNXF1), Saccharomyces cerevisiae mRNA nuclear export factor Mex67p and similar proteins. NXF proteins can stimulate nuclear export of mRNAs and facilitate the export of unspliced viral mRNA containing the constitutive transport element. An NXF protein is a multi-domain protein with a nuclear localization sequence (NLS), a non-canonical mRNA-binding domain, and four leucine-rich repeats (LRR) at the N-terminal region. Its C-terminal part contains an NTF2-like domain and a ubiquitin-associated (UBA)-like domain, joined by a flexible Pro-rich linker. Caenorhabditis elegans NXF1 is essential for the nuclear export of poly(A)+ mRNA. In budding yeast, Mex67p binds mRNAs through its adaptor Yra1/REF. It also interacts directly with Nab2, an essential shuttling mRNA-binding protein required for export. Moreover, Mex67p associates with both nuclear pore protein (nucleoporin) FG repeats and Hpr1, a component of the TREX/THO complex linking transcription and export.
Protein Domain
Name: TAZ domain superfamily
Type: Homologous_superfamily
Description: TAZ (Transcription Adaptor putative Zinc finger) domains are zinc-containing domains found in the homologous transcriptional co-activators CREB-binding protein (CBP) and P300. CBP and P300 are histone acetyltransferases that catalyse the reversible acetylation of all four histones in nucleosomes, acting to regulate transcription via chromatin remodelling. These large nuclear proteins interact with numerous transcription factors and viral oncoproteins, including p53 tumour suppressor protein, E1A oncoprotein, MyoD, and GATA-1, and are involved in cell growth, differentiation and apoptosis. Both CBP and P300 have two copies of the TAZ domain, one in the N-terminal region, the other in the C-terminal region. The TAZ1 domain of CBP and P300 forms a complex with CITED2 (CBP/P300-interacting transactivator with ED-rich tail), inhibiting the activity of the hypoxia inducible factor (HIF-1alpha) and thereby attenuating the cellular response to low tissue oxygen concentration. Adaptation to hypoxia is mediated by transactivation of hypoxia-responsive genes by hypoxia-inducible factor-1 (HIF-1) in complex with the CBP and P300 transcriptional coactivators.
Proteins containing this domain also include a group of land-plant-specific proteins, known as the BTB/POZ and TAZ domain-containing (BT) proteins. The reports of their interaction with CUL3 are contradictory. They are multifunctional scaffold proteins essential for male and female gametophyte development. The TAZ domain adopts an all-alpha fold with zinc-binding sites in the loops connecting the helices. The TAZ1 domain in P300 and the TAZ2 (CH3) domain in CBP have each been shown to have four amphipathic helices, organised by three zinc-binding clusters with HCCC-type coordination.
Protein Domain
Name: CBP/p300-type histone acetyltransferase domain
Type: Domain
Description: Histone acetyltransferase (HAT) enzymes play important roles in the regulation of chromatin assembly, RNA transcription, DNA repair and other DNA-templated reactions through the lysine side-chain acetylation of histones and other transcription factors. HATs fall into at least four different families based on sequence conservation within the HAT domain. These include the Gcn5/PCAF, CBP/p300, Rtt109 and MYST families. The different HAT families contain a structurally conserved central region associated with acetyl-Coenzyme A (Ac-CoA) cofactor binding, but distinct catalytic mechanisms and structurally divergent flanking regions that mediate different chromatin regulatory functions. Protein acetylation extends beyond histones to other nuclear proteins and even cytoplasmic proteins to regulate diverse biological processes, including the regulation of cell cycle, vesicular trafficking, cytoskeleton reorganisation and metabolism.
CREB-binding protein (CBP)/p300 proteins are involved in various physiological events including proliferation, differentiation and apoptosis. CBP/p300 proteins contain several well-defined protein-interaction domains as well as a centrally located 380-residue HAT domain.
The overall fold of the CBP/p300-type HAT domain consists of a central β-sheet comprising seven β-strands surrounded by nine α-helices and several loops.
Some proteins known to contain a CBP/p300-type HAT domain are listed below:
  • Animal CBP (also known as KAT3A), a coactivator for the cAMP-responsive transcription factor CREB.
  • Animal p300 (also known as KAT3B), which binds to the adenoviral oncoprotein E1A.
  • Arabidopsis thaliana HACs, involved in the ethylene signaling pathway (HAC1, HAC2, HAC4, HAC5 and HAC12). All the HAC members share the CBP/p300-type HAT domain; however, the HAT domain of HAC1 but not that of HAC2 possesses acetyltransferase activity, while the situation has not been tested in the other HAC members.
Protein Domain
Name: HTH CenpB-type DNA-binding domain
Type: Domain
Description: The CENPB-type HTH domain is a DNA-binding, helix-turn-helix (HTH) domain of about 70-75 amino acids, present in eukaryotic centromere proteins and transposases. The domain is named after the mammalian major centromere autoantigen B or centromere protein B (CENP-B), which is a fundamental centromere component of chromosomes. The N terminus of CENP-B contains two DNA-binding HTH domains, which bind to adjacent major grooves of DNA. The N terminus of CENP-B is formed by a psq-type HTH domain, and C-terminal to this domain lies the CENPB-type HTH domain. These two HTH domains together bind specifically to a 17-base-pair sequence, the CENP-B box, which occurs in alpha-satellite DNA in human centromeres.
The structure of the CENPB-type HTH domain is composed of three α-helices. The second and third helices, connected via a turn, comprise the helix-turn-helix motif. Helix 3 is termed the recognition helix as it binds the DNA major groove, as in other HTHs. In CENP-B this domain recognises site 3 of the CENP-B box, while the preceding psq-type HTH binds site 1 of the CENP-B box, and a connecting linker loop binds in the minor groove of DNA and recognises site 2.
Some proteins known to contain a CENPB-type HTH domain:
  • Mammalian centromere protein B (CENP-B), associated with the centromere and specifically binding DNA at the CENP-B box.
  • Mammalian jerky protein, involved in epileptic seizures in mice.
  • Mammalian Pogo transposases and tigger transposable elements.
  • Fission yeast ARS-binding protein 1 (abp1) and CENP-B homologue proteins (CBHP-1 and 2), which are centromere proteins.
  • Candida albicans protein PDC2 (Pyruvate DeCarboxylase 2).
  • Fungal transposases.
Protein Domain
Name: PTPN3/4, FERM domain C-lobe
Type: Domain
Description: Tyrosine-protein phosphatase non-receptor type-4 (PTPN4, also known as PTPMEG) is a cytoplasmic protein-tyrosine phosphatase (PTP) thought to play a role in cerebellar function. PTPMEG-knockout mice have impaired memory formation and cerebellar long-term depression. Tyrosine-protein phosphatase non-receptor type-3 (PTPN3/PTPH1) is a membrane-associated PTP implicated in regulating tyrosine phosphorylation of growth factor receptors, p97 VCP (valosin-containing protein, or Cdc48 in Saccharomyces cerevisiae), and HBV (Hepatitis B Virus) gene expression; it is mutated in a subset of colon cancers. PTPN3 and PTPN4 contain an N-terminal FERM domain, a middle PDZ domain, and a C-terminal phosphatase domain. Tyrosine-protein phosphatase 1/PTP1 from nematodes is also included in this entry. The FERM domain has a tripartite cloverleaf structure composed of: (1) FERM_N (A-lobe or F1); (2) FERM_M (B-lobe or F2); and (3) FERM_C (C-lobe or F3). The C-lobe/F3 within the FERM domain is part of the PH domain family. Like most other ERM members, they have a phosphoinositide-binding site in their FERM domain. The FERM domain is found in cytoskeletal-associated proteins such as ezrin, moesin, radixin, 4.1R, and merlin. These proteins provide a link between the membrane and cytoskeleton and are involved in signal transduction pathways. The FERM domain is also found in protein tyrosine phosphatases (PTPs) and the tyrosine kinases FAK and JAK, in addition to other proteins involved in signaling. This domain is structurally similar to the PH and PTB domains and consequently is capable of binding to both peptides and phospholipids at different sites.
Protein Domain
Name: CTF transcription factor/nuclear factor 1, DNA-binding domain
Type: Domain
Description: Nuclear factor I (NF-I) or CCAAT box-binding transcription factor (CTF) proteins (also known as TGGCA-binding proteins) are a family of vertebrate nuclear proteins which recognise and bind, as dimers, the palindromic DNA sequence 5'-TGGCANNNTGCCA-3'. This family was first described for its role in stimulating the initiation of adenovirus DNA replication. In vertebrates there are four members, NFIA, NFIB, NFIC, and NFIX, and an orthologue from Caenorhabditis elegans has been described, called Nuclear factor I family protein (NFI-I). The CTF/NF-I proteins are individually capable of activating transcription and DNA replication, and thus function in regulating cell proliferation and differentiation. They are involved in normal development and have been associated with developmental abnormalities and cancer in humans. In a given species, there are a large number of different CTF/NF-I proteins, generated both by alternative splicing and by the occurrence of four different genes. CTF/NF-I proteins contain 400 to 600 amino acids. The N-terminal 200 amino-acid sequence, almost perfectly conserved in all species and genes sequenced, mediates site-specific DNA recognition, protein dimerisation and adenovirus DNA replication. The C-terminal 100 amino acids contain the transcriptional activation domain. This activation domain is the target of gene expression regulatory pathways elicited by growth factors, and it interacts with basal transcription factors and with histone H3. This entry represents the 200 amino-acid DNA-binding domain found at the N terminus of CTF/NF-I proteins. The CTF/NF-I DNA-binding domain contains four conserved Cys residues, which are required for its DNA-binding activity.
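The palindromic consensus quoted in this entry, 5'-TGGCANNNTGCCA-3', can be illustrated with a short pattern search. This is only a minimal sketch: the function name `find_nfi_sites` is hypothetical, N is assumed to mean any of A, C, G or T, and only exact matches to the consensus on the given strand are reported (real binding sites may tolerate mismatches).

```python
import re
from typing import List, Tuple

# Consensus from the entry text: TGGCA, three arbitrary bases, TGCCA.
# The lookahead lets overlapping candidate sites be reported as well.
NFI_SITE = re.compile(r"(?=(TGGCA[ACGT]{3}TGCCA))")


def find_nfi_sites(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched site) for each exact consensus match."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in NFI_SITE.finditer(seq)]


if __name__ == "__main__":
    print(find_nfi_sites("aaTGGCAcgtTGCCAtt"))
```

Because the consensus is palindromic, scanning one strand is enough for exact matches in this sketch.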
Protein Domain
Name: Homeobox engrailed, C-terminal
Type: Domain
Description: Homeodomain proteins are transcription factors that share a related DNA-binding homeodomain. The homeodomain was initially identified in Drosophila melanogaster (Fruit fly) homeotic and segmentation proteins, but is well conserved throughout metazoans. The homeodomain binds DNA through a helix-turn-helix (HTH) structure, consisting of approximately 20 residues. The HTH motif is composed of two α-helices that make intimate contacts with the DNA; the second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions. These interactions occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA. The first helix helps to stabilise the structure and is joined to the second through a short turn. Most proteins which contain a homeobox domain can be classified, on the basis of their sequence characteristics, into three subfamilies: engrailed, antennapedia and paired. A number of different proteins contain homeodomains, including Drosophila engrailed, yeast mating type proteins, hepatocyte nuclear factor 1a and Hox proteins. Hox genes encode homeodomain-containing transcriptional regulators that operate differential genetic programs along the anterior-posterior axis of animal bodies. The homeodomain motif is very similar in sequence identity and structure to domains in other DNA-binding proteins, including recombinases, GARP response regulators, human telomeric protein, AraC type transcriptional activator and tetracycline repressor. This entry represents a conserved region of some 20 amino-acid residues located at the C terminus of the 'homeobox' domain, which forms a kind of signature pattern for this subfamily of proteins.
Protein Domain
Name: Homeobox engrailed-type, conserved site
Type: Conserved_site
Description: Homeodomain proteins are transcription factors that share a related DNA-binding homeodomain. The homeodomain was initially identified in Drosophila melanogaster (Fruit fly) homeotic and segmentation proteins, but is well conserved throughout metazoans. The homeodomain binds DNA through a helix-turn-helix (HTH) structure, consisting of approximately 20 residues. The HTH motif is composed of two α-helices that make intimate contacts with the DNA; the second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions. These interactions occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA. The first helix helps to stabilise the structure and is joined to the second through a short turn. Most proteins which contain a homeobox domain can be classified, on the basis of their sequence characteristics, into three subfamilies: engrailed, antennapedia and paired. A number of different proteins contain homeodomains, including Drosophila engrailed, yeast mating type proteins, hepatocyte nuclear factor 1a and Hox proteins. Hox genes encode homeodomain-containing transcriptional regulators that operate differential genetic programs along the anterior-posterior axis of animal bodies. The homeodomain motif is very similar in sequence identity and structure to domains in other DNA-binding proteins, including recombinases, GARP response regulators, human telomeric protein, AraC type transcriptional activator and tetracycline repressor. This entry identifies a conserved region of some 20 amino-acid residues, specific to engrailed proteins, located at the C terminus of the 'homeobox' domain; the specific function of these residues is unclear.
Protein Domain
Name: GTP1/OBG domain
Type: Domain
Description: The N-terminal domain of GTPase Obg has the OBG fold, which is formed by three glycine-rich regions inserted into a small 8-stranded β-sandwich. These regions form six left-handed collagen-like helices packed and H-bonded together. Several proteins have recently been shown to contain the 5 structural motifs characteristic of GTP-binding proteins. These include murine DRG protein; GTP1 protein from Schizosaccharomyces pombe; OBG protein from Bacillus subtilis; and several others. Although the proteins contain GTP-binding motifs and are similar to each other, they do not share sequence similarity with other GTP-binding proteins, and have thus been classed as a novel group, the GTP1/OBG family. As yet, the functions of these proteins are uncertain, but they have been shown to be important in development and normal cell metabolism.
Protein Domain
Name: Transthyretin, thyroxine binding site
Type: Binding_site
Description: Transthyretin (prealbumin) is a thyroid hormone-binding protein that seems to transport thyroxine (T4) from the bloodstream to the brain. It is a protein of about 130 amino acids that assembles as a homotetramer and forms an internal channel that binds thyroxine. In humans, transthyretin is mainly synthesized in the brain choroid plexus; variants of the protein are associated with distinct forms of amyloidosis. The sequence of transthyretin is highly conserved in vertebrates. A number of uncharacterised proteins also belong to this family: Escherichia coli hypothetical protein YedX; Bacillus subtilis hypothetical protein YunM; Caenorhabditis elegans hypothetical protein R09H10.3; Caenorhabditis elegans hypothetical protein ZK697.8. The signature pattern in this entry is located at the N-terminal extremity and starts with a lysine known to be involved in thyroxine binding.
Protein Domain
Name: Apolipoprotein CIII superfamily
Type: Homologous_superfamily
Description: Apolipoprotein C-III is a 79-residue glycoprotein synthesised in the intestine and liver as part of the very low density lipoprotein (VLDL) and the high density lipoprotein (HDL) particles. Owing to its positive correlation with plasma triglyceride (Tg) levels, Apo-CIII is suggested to play a role in Tg metabolism and is therefore of interest regarding atherosclerosis. However, unlike other apolipoproteins such as Apo-AI, Apo E or CII, for which many naturally occurring mutations are known, the structure-function relationships of apo C-III remain a subject of debate. One possibility is that apo C-III inhibits lipoprotein lipase (LPL) activity, as shown by in vitro experiments. Another suggestion is that elevated levels of Apo-CIII displace other apolipoproteins at the lipoprotein surface, modifying their clearance from plasma.
Protein Domain
Name: GTP1/OBG domain superfamily
Type: Homologous_superfamily
Description: The N-terminal domain of GTPase Obg has the OBG fold, which is formed by three glycine-rich regions inserted into a small 8-stranded β-sandwich. These regions form six left-handed collagen-like helices packed and H-bonded together. Several proteins have recently been shown to contain the 5 structural motifs characteristic of GTP-binding proteins. These include murine DRG protein; GTP1 protein from Schizosaccharomyces pombe; OBG protein from Bacillus subtilis; and several others. Although the proteins contain GTP-binding motifs and are similar to each other, they do not share sequence similarity with other GTP-binding proteins, and have thus been classed as a novel group, the GTP1/OBG family. As yet, the functions of these proteins are uncertain, but they have been shown to be important in development and normal cell metabolism.
Protein Domain
Name: Cortactin-binding protein-2, N-terminal
Type: Domain
Description: This entry represents an N-terminal domain found in cortactin-binding protein 2 and in filamin A interacting protein 1 (Filip1). In addition to being a positional candidate for autism, cortactin-binding protein 2 is expressed at highest levels in the brain in humans. Towards the C-terminal end of this protein are a series of proline-rich regions which are likely to be the points of interaction with the SH3 domain of cortactin. The human protein has six associated ankyrin repeat domains towards the C terminus of the protein which act as protein-protein interaction domains. Filip1 controls the start of neocortical cell migration from the ventricular zone by acting through a filamin-A/F-actin axis. It may be able to induce the degradation of Filamin A.
Protein Domain
Name: ATP-dependent Clp protease ATP-binding subunit ClpA
Type: Family
Description: Proteins in this entry are related to ClpA from Escherichia coli. ClpA is an ATP-dependent chaperone and part of the ClpAP protease that participates in regulatory protein degradation and the dissolution and degradation of protein aggregates. ClpA functions as the regulatory component of the ATP-dependent protease complex ClpAP. ClpA recognises sequences in specific proteins, which it then unfolds in an ATP-dependent manner and transports into the degradation chamber of the associated ClpP protease. A small adaptor-like protein, ClpS, modulates the activity of ClpA and is an important regulatory factor for this protein. It protects ClpA from autodegradation and appears to redirect its activity away from soluble proteins and toward aggregated proteins.
Protein Domain
Name: Apolipoprotein CIII
Type: Family
Description: Apolipoprotein C-III is a 79-residue glycoprotein synthesised in the intestine and liver as part of the very low density lipoprotein (VLDL) and the high density lipoprotein (HDL) particles. Owing to its positive correlation with plasma triglyceride (Tg) levels, Apo-CIII is suggested to play a role in Tg metabolism and is therefore of interest regarding atherosclerosis. However, unlike other apolipoproteins such as Apo-AI, Apo E or CII, for which many naturally occurring mutations are known, the structure-function relationships of apo C-III remain a subject of debate. One possibility is that apo C-III inhibits lipoprotein lipase (LPL) activity, as shown by in vitro experiments. Another suggestion is that elevated levels of Apo-CIII displace other apolipoproteins at the lipoprotein surface, modifying their clearance from plasma.
Protein Domain
Name: Adenylate cyclase-associated CAP, N-terminal
Type: Domain
Description: Cyclase-associated proteins (CAPs) are highly conserved actin-binding proteins present in a wide range of organisms including yeast, fly, plants, and mammals. CAPs are multifunctional proteins that contain several structural domains. CAP is involved in species-specific signalling pathways. In Drosophila, CAP functions in Hedgehog-mediated eye development and in establishing oocyte polarity. In Dictyostelium (slime mold), CAP is involved in microfilament reorganisation near the plasma membrane in a PIP2-regulated manner and is required to perpetuate the cAMP relay signal to organise fruitbody formation. In plants, CAP is involved in plant signalling pathways required for co-ordinated organ expansion. In yeast, CAP is involved in adenylate cyclase activation, as well as in vesicle trafficking and endocytosis. In both yeast and mammals, CAPs appear to be involved in recycling G-actin monomers from ADF/cofilins for subsequent rounds of filament assembly. In mammals, there are two different CAPs (CAP1 and CAP2) that share 64% amino acid identity. All CAPs appear to contain a C-terminal actin-binding domain that regulates actin remodelling in response to cellular signals and is required for normal cellular morphology, cell division, growth and locomotion in eukaryotes. CAP directly regulates actin filament dynamics and has been implicated in a number of complex developmental and morphological processes, including mRNA localisation and the establishment of cell polarity. Actin exists both as globular (G) (monomeric) actin subunits and assembled into filamentous (F) actin. In cells, actin cycles between these two forms. Proteins that bind F-actin often regulate F-actin assembly and its interaction with other proteins, while proteins that interact with G-actin often control the availability of unpolymerised actin. CAPs bind G-actin. In addition to actin-binding, CAPs can have additional roles, and may act as bifunctional proteins.
In Saccharomyces cerevisiae (Baker's yeast), CAP is a component of the adenylyl cyclase complex (Cyr1p) that serves as an effector of Ras during normal cell signalling. S. cerevisiae CAP functions to expose adenylate cyclase binding sites to Ras, thereby enabling adenylate cyclase to be activated by Ras regulatory signals. In Schizosaccharomyces pombe (Fission yeast), CAP is also required for adenylate cyclase activity, but not through the Ras pathway. In both organisms, the N-terminal domain is responsible for adenylate cyclase activation, but the S. cerevisiae and S. pombe N-termini cannot complement one another. Yeast CAPs are unique among the CAP family of proteins, because they are the only ones to directly interact with and activate adenylate cyclase. S. cerevisiae CAP has four major domains. In addition to the N-terminal adenylate cyclase-interacting domain, and the C-terminal actin-binding domain, it possesses two other domains: a proline-rich domain that interacts with Src homology 3 (SH3) domains of specific proteins, and a domain that is responsible for CAP oligomerisation to form multimeric complexes (although oligomerisation appears to involve the N- and C-terminal domains as well). The proline-rich domain interacts with profilin, a protein that catalyses nucleotide exchange on G-actin monomers and promotes addition to barbed ends of filamentous F-actin. Since CAP can bind profilin via a proline-rich domain, and G-actin via a C-terminal domain, it has been suggested that a ternary G-actin/CAP/profilin complex could be formed. This entry represents the N-terminal domain of CAP proteins. This domain has an all-alpha structure consisting of six helices in a bundle with a left-handed twist and an up-and-down topology.
Protein Domain
Name: Adenylate cyclase-associated CAP, C-terminal
Type: Domain
Description: Cyclase-associated proteins (CAPs) are highly conserved actin-binding proteins present in a wide range of organisms including yeast, fly, plants, and mammals. CAPs are multifunctional proteins that contain several structural domains. CAP is involved in species-specific signalling pathways. In Drosophila, CAP functions in Hedgehog-mediated eye development and in establishing oocyte polarity. In Dictyostelium (slime mold), CAP is involved in microfilament reorganisation near the plasma membrane in a PIP2-regulated manner and is required to perpetuate the cAMP relay signal to organise fruitbody formation. In plants, CAP is involved in plant signalling pathways required for co-ordinated organ expansion. In yeast, CAP is involved in adenylate cyclase activation, as well as in vesicle trafficking and endocytosis. In both yeast and mammals, CAPs appear to be involved in recycling G-actin monomers from ADF/cofilins for subsequent rounds of filament assembly. In mammals, there are two different CAPs (CAP1 and CAP2) that share 64% amino acid identity. All CAPs appear to contain a C-terminal actin-binding domain that regulates actin remodelling in response to cellular signals and is required for normal cellular morphology, cell division, growth and locomotion in eukaryotes. CAP directly regulates actin filament dynamics and has been implicated in a number of complex developmental and morphological processes, including mRNA localisation and the establishment of cell polarity. Actin exists both as globular (G) (monomeric) actin subunits and assembled into filamentous (F) actin. In cells, actin cycles between these two forms. Proteins that bind F-actin often regulate F-actin assembly and its interaction with other proteins, while proteins that interact with G-actin often control the availability of unpolymerised actin. CAPs bind G-actin. In addition to actin-binding, CAPs can have additional roles, and may act as bifunctional proteins.
In Saccharomyces cerevisiae (Baker's yeast), CAP is a component of the adenylyl cyclase complex (Cyr1p) that serves as an effector of Ras during normal cell signalling. S. cerevisiae CAP functions to expose adenylate cyclase binding sites to Ras, thereby enabling adenylate cyclase to be activated by Ras regulatory signals. In Schizosaccharomyces pombe (Fission yeast), CAP is also required for adenylate cyclase activity, but not through the Ras pathway. In both organisms, the N-terminal domain is responsible for adenylate cyclase activation, but the S. cerevisiae and S. pombe N-termini cannot complement one another. Yeast CAPs are unique among the CAP family of proteins, because they are the only ones to directly interact with and activate adenylate cyclase. S. cerevisiae CAP has four major domains. In addition to the N-terminal adenylate cyclase-interacting domain, and the C-terminal actin-binding domain, it possesses two other domains: a proline-rich domain that interacts with Src homology 3 (SH3) domains of specific proteins, and a domain that is responsible for CAP oligomerisation to form multimeric complexes (although oligomerisation appears to involve the N- and C-terminal domains as well). The proline-rich domain interacts with profilin, a protein that catalyses nucleotide exchange on G-actin monomers and promotes addition to barbed ends of filamentous F-actin. Since CAP can bind profilin via a proline-rich domain, and G-actin via a C-terminal domain, it has been suggested that a ternary G-actin/CAP/profilin complex could be formed. This entry represents the C-terminal domain of CAP proteins, which is responsible for G-actin-binding. This domain has a superhelical structure, where the superhelix turns are made of two β-strands each.
Protein Domain
Name: Cytochrome P450, E-class, group I
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes).
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents class E cytochrome P450 proteins that fall into sequence cluster group I. Group I is richest in members, consisting of cytochrome P450 families CYP1, CYP2, CYP17, CYP21 and CYP71. The members of the first four families are of vertebrate origin, while those from CYP71 are derived from plants. CYP1 and CYP2 enzymes mainly metabolise exogenous substrates, whereas CYP17 and CYP21 are involved in metabolism of endogenous physiologically-active compounds. In the fungus Gibberella, P450 (FUS8) is a component in the biosynthetic pathway for the mycotoxin fusarin C.
FUS8 oxidizes carbon C-20 of the intermediate 20-hydroxy-fusarin to form the penultimate intermediate carboxy-fusarin C. This entry also includes cytochromes P450 (noroxomaritidine synthases and p-coumarate 3-hydroxylase) that catalyse an intramolecular para-para' C-C phenol coupling of 4'-O-methylnorbelladine in alkaloid biosynthesis, during the biosynthesis of phenylpropanoids and Amaryllidaceae alkaloids, including haemanthamine- and crinamine-type alkaloids, which are promising anticancer agents.
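The CYP naming rule stated in the entry above (CYP, then a family number, a subfamily letter, and a protein number) can be sketched as a small parser. This is an illustrative sketch only: the names `CypName` and `parse_cyp` are hypothetical, and accepting more than one subfamily letter is an assumption that goes beyond the single-letter rule given in the text.

```python
import re
from typing import NamedTuple, Optional


class CypName(NamedTuple):
    family: int      # number after "CYP", e.g. 3 in CYP3A4
    subfamily: str   # letter(s) after the family number, e.g. "A"
    protein: int     # trailing number, e.g. 4


# Assumption: one or more subfamily letters are accepted.
_CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")


def parse_cyp(name: str) -> Optional[CypName]:
    """Split a full CYP protein name into family, subfamily and protein.

    Returns None for names that are not full protein names,
    such as a bare family name like "CYP71".
    """
    match = _CYP_PATTERN.match(name.strip().upper())
    if match is None:
        return None
    return CypName(int(match.group(1)), match.group(2), int(match.group(3)))


if __name__ == "__main__":
    print(parse_cyp("CYP3A4"))
```

Family-level names such as CYP71 deliberately return None here, since the rule in the text describes individual proteins rather than families.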
Protein Domain
Name: Zinc finger, RING-CH-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The RING finger is a well characterised zinc finger which coordinates two zinc atoms in a cross-braced manner. According to the pattern of cysteines and histidines, three different subfamilies of RING finger can be defined. The classical RING finger (RING-HC) has a histidine at the fourth coordinating position and a cysteine at the fifth. In the RING-H2 variant, both the fourth and fifth positions are occupied by histidines.
The RING-CH, which is very similar to the classical RING finger, differs from both of these variants in that it has a Cys residue in the fourth position and a His in the fifth. Another difference between the RING-CH and the common RING variants is a somewhat longer peptide segment between the fourth and fifth zinc-coordinating residues. The RING-CH zinc finger thus has the same arrangement of cysteine and histidine (C4HC3) as the PHD zinc finger, but it contains features (spacing between the cysteines and the histidine) characteristic of the genuine RING finger (C3HC4). The RING-CH-type is an E3 ligase mainly found in proteins associated with membranes. The solution structure of the RING-CH-type zinc finger of the herpesvirus Mir1 protein has shown that it is an outlying relative of the cellular RING finger domain family, with its polypeptide backbone much more closely resembling that of RING domains than PHD domains. The only real difference between the classic and variant RING domains, other than the alteration of zinc ligands, is the loss of the small β-sheet found in RING domains and the replacement of one strand of this sheet with a single turn of helix. Some proteins that contain a RING-CH-type zinc finger are listed below: yeast Doa10/SSM4, an E3 ligase essential for endoplasmic reticulum-associated degradation (ERAD), a ubiquitin-proteasome system responsible for the degradation of membrane and lumenal proteins of the endoplasmic reticulum; mammalian membrane-associated RING-CH 1 to 9 (MARCH1 to 9) proteins; and human herpesvirus 8 (HHV-8) (Kaposi's sarcoma-associated herpesvirus) modulator of immune recognition 1, an E3 ubiquitin-protein ligase which promotes ubiquitination and subsequent degradation of host MHC-I and CD1D molecules, presumably to prevent lysis of infected cells by cytotoxic T-lymphocytes.
Protein Domain
Name: Cytochrome P450, E-class, group II
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes).
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This entry represents class E cytochrome P450 proteins that fall into sequence cluster group II. Group II enzymes are distributed widely in life, i.e. in bacteria (family CYP102), cyanobacteria (CYP110), fungi (CYP52, CYP53 and CYP56), insects (CYP4 and CYP6) and mammals (CYP3, CYP4 and CYP5). Many group II enzymes catalyse hydroxylation of linear chains, such as alkanes (CYP52), alcohols and fatty acids (CYP4, CYP5, CYP102); Aspergillus niger CYP53 carries out para-hydroxylation of benzoate; yeast CYP56 is possibly involved in oxidation of tyrosine residues; insect CYP6 metabolises a wide range of toxic compounds; and members of the CYP3 family are omnivorous.
Protein Domain
Name: Adenylate cyclase-associated CAP, C-terminal superfamily
Type: Homologous_superfamily
Description: Cyclase-associated proteins (CAPs) are highly conserved actin-binding proteins present in a wide range of organisms including yeast, fly, plants, and mammals. CAPs are multifunctional proteins that contain several structural domains. CAP is involved in species-specific signalling pathways. In Drosophila, CAP functions in Hedgehog-mediated eye development and in establishing oocyte polarity. In Dictyostelium (slime mold), CAP is involved in microfilament reorganisation near the plasma membrane in a PIP2-regulated manner and is required to perpetuate the cAMP relay signal to organise fruitbody formation. In plants, CAP is involved in plant signalling pathways required for co-ordinated organ expansion. In yeast, CAP is involved in adenylate cyclase activation, as well as in vesicle trafficking and endocytosis. In both yeast and mammals, CAPs appear to be involved in recycling G-actin monomers from ADF/cofilins for subsequent rounds of filament assembly. In mammals, there are two different CAPs (CAP1 and CAP2) that share 64% amino acid identity. All CAPs appear to contain a C-terminal actin-binding domain that regulates actin remodelling in response to cellular signals and is required for normal cellular morphology, cell division, growth and locomotion in eukaryotes. CAP directly regulates actin filament dynamics and has been implicated in a number of complex developmental and morphological processes, including mRNA localisation and the establishment of cell polarity. Actin exists both as globular (G) (monomeric) actin subunits and assembled into filamentous (F) actin. In cells, actin cycles between these two forms. Proteins that bind F-actin often regulate F-actin assembly and its interaction with other proteins, while proteins that interact with G-actin often control the availability of unpolymerised actin. CAPs bind G-actin. In addition to actin-binding, CAPs can have additional roles, and may act as bifunctional proteins.
In Saccharomyces cerevisiae (Baker's yeast), CAP is a component of the adenylyl cyclase complex (Cyr1p) that serves as an effector of Ras during normal cell signalling. S. cerevisiae CAP functions to expose adenylate cyclase binding sites to Ras, thereby enabling adenylate cyclase to be activated by Ras regulatory signals. In Schizosaccharomyces pombe (Fission yeast), CAP is also required for adenylate cyclase activity, but not through the Ras pathway. In both organisms, the N-terminal domain is responsible for adenylate cyclase activation, but the S. cerevisiae and S. pombe N-termini cannot complement one another. Yeast CAPs are unique among the CAP family of proteins, because they are the only ones to directly interact with and activate adenylate cyclase. S. cerevisiae CAP has four major domains. In addition to the N-terminal adenylate cyclase-interacting domain, and the C-terminal actin-binding domain, it possesses two other domains: a proline-rich domain that interacts with Src homology 3 (SH3) domains of specific proteins, and a domain that is responsible for CAP oligomerisation to form multimeric complexes (although oligomerisation appears to involve the N- and C-terminal domains as well). The proline-rich domain interacts with profilin, a protein that catalyses nucleotide exchange on G-actin monomers and promotes addition to barbed ends of filamentous F-actin. Since CAP can bind profilin via a proline-rich domain, and G-actin via a C-terminal domain, it has been suggested that a ternary G-actin/CAP/profilin complex could be formed. This entry represents the C-terminal domain of CAP proteins, which is responsible for G-actin-binding. This domain has a superhelical structure, where the superhelix turns are made of two β-strands each.
Protein Domain
Name: Adenylate cyclase-associated CAP, N-terminal domain superfamily
Type: Homologous_superfamily
Description: Cyclase-associated proteins (CAPs) are highly conserved actin-binding proteins present in a wide range of organisms including yeast, fly, plants, and mammals. CAPs are multifunctional proteins that contain several structural domains. CAP is involved in species-specific signalling pathways. In Drosophila, CAP functions in Hedgehog-mediated eye development and in establishing oocyte polarity. In Dictyostelium (slime mould), CAP is involved in microfilament reorganisation near the plasma membrane in a PIP2-regulated manner and is required to perpetuate the cAMP relay signal to organise fruiting body formation. In plants, CAP is involved in signalling pathways required for co-ordinated organ expansion. In yeast, CAP is involved in adenylate cyclase activation, as well as in vesicle trafficking and endocytosis. In both yeast and mammals, CAPs appear to be involved in recycling G-actin monomers from ADF/cofilins for subsequent rounds of filament assembly. In mammals, there are two different CAPs (CAP1 and CAP2) that share 64% amino acid identity. All CAPs appear to contain a C-terminal actin-binding domain that regulates actin remodelling in response to cellular signals and is required for normal cellular morphology, cell division, growth and locomotion in eukaryotes. CAP directly regulates actin filament dynamics and has been implicated in a number of complex developmental and morphological processes, including mRNA localisation and the establishment of cell polarity. Actin exists both as globular (G) actin monomers and assembled into filamentous (F) actin. In cells, actin cycles between these two forms. Proteins that bind F-actin often regulate F-actin assembly and its interaction with other proteins, while proteins that interact with G-actin often control the availability of unpolymerised actin. CAPs bind G-actin. In addition to actin binding, CAPs can have other roles and may act as bifunctional proteins.
In Saccharomyces cerevisiae (Baker's yeast), CAP is a component of the adenylyl cyclase complex (Cyr1p) that serves as an effector of Ras during normal cell signalling. S. cerevisiae CAP functions to expose adenylate cyclase binding sites to Ras, thereby enabling adenylate cyclase to be activated by Ras regulatory signals. In Schizosaccharomyces pombe (Fission yeast), CAP is also required for adenylate cyclase activity, but not through the Ras pathway. In both organisms, the N-terminal domain is responsible for adenylate cyclase activation, but the S. cerevisiae and S. pombe N-termini cannot complement one another. Yeast CAPs are unique among the CAP family of proteins, because they are the only ones to directly interact with and activate adenylate cyclase. S. cerevisiae CAP has four major domains. In addition to the N-terminal adenylate cyclase-interacting domain and the C-terminal actin-binding domain, it possesses two other domains: a proline-rich domain that interacts with Src homology 3 (SH3) domains of specific proteins, and a domain that is responsible for CAP oligomerisation to form multimeric complexes (although oligomerisation appears to involve the N- and C-terminal domains as well). The proline-rich domain interacts with profilin, a protein that catalyses nucleotide exchange on G-actin monomers and promotes their addition to the barbed ends of filamentous F-actin. Since CAP can bind profilin via its proline-rich domain and G-actin via its C-terminal domain, it has been suggested that a ternary G-actin/CAP/profilin complex could be formed.
This entry represents the N-terminal domain of CAP proteins. This domain has an all-alpha structure consisting of six helices in a bundle with a left-handed twist and an up-and-down topology.
Protein Domain
Name: Cold-shock protein, DNA-binding
Type: Domain
Description: When Escherichia coli is exposed to a temperature drop from 37 to 10 degrees centigrade, a 4-5 hour lag phase occurs, after which growth resumes at a reduced rate. During the lag phase, the expression of around 13 proteins, which contain specific DNA-binding regions, is increased 2-10 fold. These so-called 'cold shock' proteins (CSPs) are thought to help the cell survive at temperatures below the optimum growth temperature, possibly by condensation of the chromosome and organisation of the prokaryotic nucleoid; heat shock proteins, by contrast, help the cell survive at temperatures above the optimum. A conserved domain of about 70 amino acids has been found in prokaryotic and eukaryotic DNA-binding proteins. This domain is known as the 'cold-shock domain' (CSD), part of which is highly similar to the RNP-1 RNA-binding motif.
CSPs include the major cold-shock proteins CspA and CspB in bacteria and the eukaryotic gene regulatory factor Y-box protein. CSP expression is up-regulated by an abrupt drop in growth temperature. CSPs are also expressed under normal conditions at a lower level. The function of cold-shock proteins is not fully understood. They preferentially bind polypyrimidine regions of single-stranded RNA and DNA. CSPs are thought to bind mRNA and regulate ribosomal translation, mRNA degradation, and the rate of transcription termination. The human Y-box protein, which contains a CSD, regulates transcription and translation of genes that contain the Y-box sequence in their promoters. The specific ssDNA-binding properties of the CSD are required for the binding of Y-box protein to the promoter's Y-box sequence, thereby regulating transcription.
Protein Domain
Name: Protein-tyrosine phosphatase-like, PTPLA
Type: Family
Description: Protein tyrosine (pTyr) phosphorylation is a common post-translational modification which can create novel recognition motifs for protein interactions and cellular localisation, affect protein stability, and regulate enzyme activity. Consequently, maintaining an appropriate level of protein tyrosine phosphorylation is essential for many cellular functions. Tyrosine-specific protein phosphatases (PTPases) catalyse the removal of a phosphate group attached to a tyrosine residue, using a cysteinyl-phosphate enzyme intermediate. These enzymes are key regulatory components in signal transduction pathways (such as the MAP kinase pathway) and cell cycle control, and are important in the control of cell growth, proliferation, differentiation and transformation. The PTP superfamily can be divided into four subfamilies:
(1) pTyr-specific phosphatases
(2) dual specificity phosphatases (dTyr and dSer/dThr)
(3) Cdc25 phosphatases (dTyr and/or dThr)
(4) LMW (low molecular weight) phosphatases
Based on their cellular localisation, PTPases are also classified as receptor-like PTPases, which are transmembrane receptors that contain PTPase domains, and non-receptor (intracellular) PTPases.
All PTPases carry the highly conserved active site motif C(X)5R (PTP signature motif), employ a common catalytic mechanism, and share a similar core structure made of a central parallel β-sheet with flanking α-helices containing a β-loop-α-loop that encompasses the PTP signature motif. Functional diversity between PTPases is endowed by regulatory domains and subunits. This family includes the mammalian protein tyrosine phosphatase-like protein, PTPLA. PTPLA differs significantly from other protein tyrosine phosphatases in having a proline instead of the catalytic arginine at the active site. It is thought that PTPLA proteins have a role in the development, differentiation, and maintenance of a number of tissue types.
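The C(X)5R signature motif mentioned in the PTPLA description above (a cysteine, any five residues, then an arginine) can be located in a protein sequence with a simple pattern scan. The following is a minimal Python sketch, not part of the database entry; the function name `find_signature`, the `last` parameter, and the toy sequences are assumptions made for illustration. Setting `last="P"` looks for the proline-containing variant that the description attributes to PTPLA.

```python
import re
from typing import List, Tuple


def find_signature(sequence: str, last: str = "R") -> List[Tuple[int, str]]:
    """Return (0-based start, matched 7-mer) for every C(X)5<last> occurrence.

    `last` is the residue expected at the final position: "R" for the
    canonical PTP signature motif, "P" for the PTPLA-like variant.
    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile(rf"(?=(C.{{5}}{re.escape(last)}))")
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq)]


# Toy sequences for illustration only (hypothetical, not real database entries).
canonical_hits = find_signature("MKVHCSAGVGRSGT")
variant_hits = find_signature("AACAAAAAPAA", last="P")
print(canonical_hits)
print(variant_hits)
```

A pattern match of this kind only flags candidate sites; it says nothing about whether the surrounding fold or catalytic activity is conserved.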
Protein Domain
Name: SUF system FeS cluster assembly, SufBD
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.
The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets. This entry represents SufB and SufD proteins, which are homologous, and form part of the SufBCD complex in the SUF system. SufB accepts sulphur transferred from SufE, whereas SufD may play a role in iron acquisition.
The Legume Information System (LIS) is a research project of the USDA-ARS: Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom