Search results 101 to 200 out of 38750 for *

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: Zinc finger, PHD-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the PHD (plant homeodomain) zinc finger domain, which is a C4HC3 zinc-finger-like motif found in nuclear proteins thought to be involved in chromatin-mediated transcriptional regulation.
The PHD finger motif is reminiscent of, but distinct from, the C3HC4-type RING finger. The function of this domain is not yet known, but by analogy with the LIM domain it could be involved in protein-protein interaction and be important for the assembly or activity of multicomponent complexes involved in transcriptional activation or repression. Alternatively, the interactions could be intra-molecular and be important in maintaining the structural integrity of the protein. Like the RING finger and the LIM domain, the PHD finger is thought to bind two zinc ions.
Protein Domain
Name: Zinc finger, PHD-type, conserved site
Type: Conserved_site
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the PHD (plant homeodomain) zinc finger domain, which is a C4HC3 zinc-finger-like motif found in nuclear proteins thought to be involved in chromatin-mediated transcriptional regulation.
The PHD finger motif is reminiscent of, but distinct from, the C3HC4-type RING finger. The function of this domain is not yet known, but by analogy with the LIM domain it could be involved in protein-protein interaction and be important for the assembly or activity of multicomponent complexes involved in transcriptional activation or repression. Alternatively, the interactions could be intra-molecular and be important in maintaining the structural integrity of the protein. Like the RING finger and the LIM domain, the PHD finger is thought to bind two zinc ions. The signature of this entry starts at the first cysteine of the zinc finger region and ends at the last one. The spacing between cysteines in the PHD finger is closely related to that in the RING finger. Discrimination between these two domains with either a pattern or a profile is therefore difficult, and some rare domains are recognised by both the RING and PHD patterns and profiles.
Protein Domain
Name: Zinc finger, FYVE/PHD-type
Type: Homologous_superfamily
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The FYVE zinc finger domain is conserved from yeast to man, and is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1, and EEA1.
It functions in the membrane recruitment of cytosolic proteins by binding to phosphatidylinositol 3-phosphate (PI3P), which is found mainly on endosomes. The plant homeodomain (PHD) zinc finger domain has a C4HC3-type motif, and is widely distributed in eukaryotes, being found in many chromatin regulatory factors. Both the FYVE and the PHD zinc finger motifs display strikingly similar dimetal (zinc)-bound alpha+beta folds.
Protein Domain
Name: Transcription factor BREVIS RADIX, N-terminal domain
Type: Domain
Description: This entry represents the N-terminal α-helical domain of BREVIS RADIX (BRX), which was characterised as a transcription factor in plants that regulates the extent of cell proliferation and elongation in the growth zone of the root. BRX maintains expression of a rate-limiting brassinosteroid biosynthesis enzyme to keep brassinosteroid biosynthesis above a critical threshold. BRX has a ubiquitous, although quantitatively variable, role in modulating the growth rate in both the root and the shoot. This entry features a short α-helical domain, N-terminal to the repeated α-helices of the BRX domain.
Protein Domain
Name: Brevis radix (BRX) domain
Type: Domain
Description: This is a short domain, approximately 35 residues in length, that is found near the C terminus in a number of plant proteins, being repeated in some members. It is found in Brevis radix-like proteins, which may act as regulators of cell proliferation and elongation in the root. It is also found in proteins annotated as involved in disease resistance and in the regulation of chromosome condensation, which also contain other domains with varied functions, such as TIR and FYVE, respectively.
Protein Domain
Name: Regulator of chromosome condensation 1/beta-lactamase-inhibitor protein II
Type: Homologous_superfamily
Description: The beta-lactamase-inhibitor protein II (BLIP-II) is a secreted protein produced by the soil bacterium Streptomyces exfoliatus SMF19. BLIP-II acts as a potent inhibitor of beta-lactamases such as TEM-1, which is the most widespread resistance enzyme to penicillin antibiotics. BLIP-II binds competitively to TEM-1, but no direct contacts are made with TEM-1 active site residues. BLIP-II shows no sequence similarity with BLIP, even though both bind to and inhibit TEM-1. However, BLIP-II does share significant sequence identity with the regulator of chromosome condensation (RCC1) family of proteins. These two families are clearly related, both having a seven-bladed β-propeller structure, although they differ in the number of strands per blade: BLIP-II has three antiparallel β-strands per blade, while RCC1 has four-stranded blades. RCC1 is a eukaryotic nuclear protein that acts as a guanine nucleotide exchange factor for Ran, a member of the Ras GTPase family. RCC1 mediates a Ran-GTP gradient necessary for the regulation of spindle formation and nuclear assembly during mitosis, as well as for the transport of macromolecules across the nuclear membrane during interphase.
Protein Domain
Name: Regulator of chromosome condensation, RCC1
Type: Repeat
Description: The regulator of chromosome condensation (RCC1) is a eukaryotic protein which binds to chromatin and interacts with Ran, a nuclear GTP-binding protein, to promote the loss of bound GDP and the uptake of fresh GTP, thus acting as a guanine-nucleotide dissociation stimulator (GDS). The interaction of RCC1 with Ran probably plays an important role in the regulation of gene expression. RCC1, known as PRP20 or SRM1 in yeast, pim1 in fission yeast and BJ1 in Drosophila, is a protein that contains seven tandem repeats of a domain of about 50 to 60 amino acids. As shown in the following schematic representation, the repeats make up the major part of the length of the protein. Outside the repeat region, there is just a small N-terminal domain of about 40 to 50 residues and, in the Drosophila protein only, a C-terminal domain of about 130 residues.

+----+-------+-------+-------+-------+-------+-------+-------+-------------+
|N-t.|Rpt. 1 |Rpt. 2 |Rpt. 3 |Rpt. 4 |Rpt. 5 |Rpt. 6 |Rpt. 7 | C-terminal  |
+----+-------+-------+-------+-------+-------+-------+-------+-------------+

The RCC1-type of repeat is also found in the X-linked retinitis pigmentosa GTPase regulator. The RCC repeats form a β-propeller structure.
Protein Domain
Name: PH-like domain superfamily
Type: Homologous_superfamily
Description: Pleckstrin homology (PH) domains are small modular domains that occur in a large variety of signalling proteins, where they serve as simple targeting domains that bind lipids. PH domains have a partly opened β-barrel topology that is capped by an α-helix. The structure of PH domains is similar to the phosphotyrosine-binding domain (PTB) found in IRS-1 (insulin receptor substrate 1), Shc adaptor and Numb; to the Ran-binding domain, found in Nup nuclear pore complex and Ranbp1; to the Enabled/VASP homology domain 1 (EVH1 domain), found in Enabled, VASP (vasodilator-stimulated phosphoprotein), Homer and WASP actin regulatory protein; and to the third domain of FERM, found in moesin, radixin, ezrin, merlin and talin. This superfamily represents the PH domain and structurally related domains.
Protein Domain
Name: Zinc finger, FYVE-related
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1, and EEA1. The FYVE finger has been shown to bind two Zn2+ ions. The FYVE finger has eight potential zinc-coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where + represents a charged residue and X any residue.
Protein Domain
Name: FYVE zinc finger
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The FYVE zinc finger is named after four proteins that it has been found in: Fab1, YOTB/ZK632.12, Vac1, and EEA1. The FYVE finger has been shown to bind two zinc ions. The FYVE finger has eight potential zinc-coordinating cysteine positions. Many members of this family also include two histidines in a motif R+HHC+XCG, where + represents a charged residue and X any residue.
FYVE-type domains are divided into two known classes: FYVE domains that specifically bind to phosphatidylinositol 3-phosphate in lipid bilayers and FYVE-related domains of undetermined function. Those that bind to phosphatidylinositol 3-phosphate are often found in proteins targeted to lipid membranes that are involved in regulating membrane traffic. Most FYVE domains target proteins to endosomes by binding specifically to phosphatidylinositol 3-phosphate at the membrane surface. By contrast, the CARP2 FYVE-like domain is not optimized to bind to phosphoinositides or insert into lipid bilayers. FYVE domains are distinguished from other zinc fingers by three signature sequences: an N-terminal WxxD motif, a basic R(R/K)HHCR patch, and a C-terminal RVC motif.
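The three signature sequences named above can be written as simple regular expressions. The following is a minimal sketch only, and is not how the database assigns this entry; the function names, the reading of "x" as any single residue, and the toy sequence are assumptions made for illustration.

```python
import re

# Signature motifs as named in the description; "x" is assumed to mean
# any single residue.
FYVE_SIGNATURES = {
    "WxxD": re.compile(r"W..D"),
    "R(R/K)HHCR": re.compile(r"R[RK]HHCR"),
    "RVC": re.compile(r"RVC"),
}

def find_fyve_signatures(sequence):
    """Return {motif name: [0-based start positions]} for each signature."""
    seq = sequence.upper()
    return {
        name: [m.start() for m in pattern.finditer(seq)]
        for name, pattern in FYVE_SIGNATURES.items()
    }

def has_signatures_in_order(sequence):
    """True if WxxD, the basic patch and RVC occur in N- to C-terminal order."""
    hits = find_fyve_signatures(sequence)
    last = -1
    for name in ("WxxD", "R(R/K)HHCR", "RVC"):
        later = [pos for pos in hits[name] if pos > last]
        if not later:
            return False
        last = later[0]  # earliest hit after the previous motif
    return True

# Hypothetical toy sequence, not a real protein.
toy = "AAWEEDGGGRRHHCRKCGAAAARVCAA"
print(find_fyve_signatures(toy))
print(has_signatures_in_order(toy))
```

A pattern match of this kind only flags candidates; short motifs such as RVC occur by chance in many unrelated sequences.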
Protein Domain
Name: Peptidase S24/S26A/S26B/S26C
Type: Domain
Description: This entry represents a structural domain superfamily found in serine peptidases belonging to MEROPS peptidase families S24 (LexA family, clan SF), S26A (signal peptidase I), S26B (signalase) and S26C (TraF peptidase). This domain has a complex fold made of several coiled β-sheets, which contains an SH3-like barrel structure. The S26 family includes Escherichia coli signal peptidase, SPase, which is a membrane-bound endopeptidase with two N-terminal transmembrane segments and a C-terminal catalytic region. SPase functions to release proteins that have been translocated into the inner membrane from the cell interior, by cleaving off their signal peptides. In SPase proteins, this domain is disrupted by the insertion of an additional all-β subdomain. Note: This signature covers both the SH3-like barrel β-ribbon domain and the all-β subdomain inserted into it.
The S24 family includes:
  • The lambda repressor CI/C2 family and related bacterial prophage repressor proteins.
  • LexA, the diverse family of bacterial transcription factors that repress genes in the cellular SOS response to DNA damage.
  • MucA and the related UmuD proteins, which are lesion-bypass DNA polymerases, induced in response to mitogenic DNA damage. UmuD is self-processed by its own serine protease activity during the SOS response.
  • RulA, a component of the rulAB locus that confers resistance to UV.
All of these proteins, with the possible exception of RulA, interact with RecA, which activates self-cleavage, either derepressing transcription in the case of CI and LexA or activating the lesion-bypass polymerase in the case of UmuD and MucA. UmuD'2 is the homodimeric component of DNA pol V, which is produced from UmuD by RecA-facilitated self-cleavage. The first 24 N-terminal residues of UmuD are removed; UmuD'2 is a DNA lesion bypass polymerase.
MucA, like UmuD, is a plasmid-encoded DNA polymerase (pol RI) which is converted into the active lesion-bypass polymerase by a self-cleavage reaction involving RecA. This group of proteins also contains proteins not recognised as peptidases as well as those classified as non-peptidase homologues, as they either have been found experimentally to be without peptidase activity or lack amino acid residues that are believed to be essential for catalytic activity.
Protein Domain
Name: Peptidase S26B
Type: Family
Description: Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activities, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). This group of serine peptidases belongs to MEROPS peptidase family S26 (signal peptidase I family, clan SF), subfamily S26B. Eukaryotic microsomal signal peptidase is involved in the removal of signal peptides from secretory proteins as they pass into the endoplasmic reticulum lumen. The peptidase is more complex than its mitochondrial and bacterial counterparts, containing a number of subunits, ranging from two in the chicken oviduct peptidase to five in the dog pancreas protein.
They share sequence similarity with the bacterial leader peptidases (family S26A), although activity here is mediated by a serine/histidine dyad rather than a serine/lysine dyad. Archaeal signal peptidases also belong to this group.
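The link between the linear order of the catalytic residues and the clan, as described above for clans PA, SB and SC, can be illustrated with a small lookup. This is a sketch under stated assumptions: the function names are hypothetical, the table holds only the three orders given in the text, and the residue positions in the usage lines are supplied for illustration rather than taken from this entry.

```python
# Linear order of the catalytic triad -> clan, as stated in the description.
CLAN_BY_TRIAD_ORDER = {
    "HDS": "PA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase clan)",
}

def triad_order(positions):
    """Return the residue letters sorted by position in the sequence.

    `positions` maps a one-letter residue code (H, D or S) to its
    1-based position in the protein sequence.
    """
    return "".join(sorted(positions, key=positions.get))

def clan_for_triad(positions):
    """Look up the clan for a triad; None if the order is not in the table."""
    return CLAN_BY_TRIAD_ORDER.get(triad_order(positions))

# Illustrative positions only (assumed for this example).
print(clan_for_triad({"H": 57, "D": 102, "S": 195}))
print(clan_for_triad({"D": 32, "H": 64, "S": 221}))
print(clan_for_triad({"S": 146, "D": 338, "H": 397}))
```

The lookup does not apply to family S26B itself, which the description says relies on a serine/histidine dyad rather than a triad.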
Protein Domain
Name: Late embryogenesis abundant protein, LEA_1 subgroup
Type: Family
Description: LEA (late embryogenesis abundant) proteins were first identified in land plants. Plant LEA proteins have been found to accumulate to high levels during the last stage of seed formation (when a natural desiccation of the seed tissues takes place) and during periods of water deficit in vegetative organs. LEA homologues have since been found in various species. They have been classified into several subgroups in Pfam and according to Bray and Dure. This entry represents Pfam LEA_1, or D-113 from Dure, or group 4 from Bray. Proteins in this entry include LEA6, LEA18 and LEA46 from Arabidopsis. They may play roles in the adaptive process to water deficit in higher plants.
Protein Domain
Name: DNA replication ATP-dependent helicase/nuclease Dna2/JHS1, DEXXQ-box helicase domain
Type: Domain
Description: Dna2 and its plant homologue JHS1 are DNA replication factors with single-stranded DNA-dependent ATPase, ATP-dependent nuclease (5'-flap endonuclease) and helicase activities. They are required for Okazaki fragment processing and are involved in DNA repair pathways. The helicase activity is weak and its function remains unclear. This entry represents the DEXXQ-box helicase domain of Dna2 and its homologues. It contains the ATP-binding region.
Protein Domain
Name: DNA replication factor Dna2, N-terminal
Type: Domain
Description: This entry represents the N-terminal domain of the DNA replication factor Dna2. Dna2 and its plant homologue JHS1 are DNA replication factors with single-stranded DNA-dependent ATPase, ATP-dependent nuclease (5'-flap endonuclease) and helicase activities. They are required for Okazaki fragment processing and are involved in DNA repair pathways. The helicase activity is weak and its function remains unclear.
Protein Domain
Name: Dna2/Cas4, domain of unknown function DUF83
Type: Domain
Description: This entry represents an uncharacterised domain found in several proteins, including DNA replication helicase Dna2, clustered regularly interspaced short palindromic repeats (CRISPR)-associated exonuclease Cas4 and putative RecB family exonuclease proteins. Dna2 is a DNA replication factor with single-stranded DNA-dependent ATPase, ATP-dependent nuclease (5'-flap endonuclease) and helicase activities. Cas4 has been shown to be a 5' to 3' single-stranded DNA exonuclease in Sulfolobus solfataricus.
Protein Domain
Name: Ribosomal protein L18
Type: Family
Description: This family includes the large subunit ribosomal protein L18 from bacteria, mitochondria and plastids. Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure.
While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.
Protein Domain
Name: Development/cell death domain
Type: Domain
Description: The DCD (Development and Cell Death) domain is found in plant proteins involved in development and cell death. The DCD domain is an ~130 amino acid long stretch that contains several mostly invariable motifs. These include a FGLP and a LFL motif at the N terminus and a PAQV and a PLxE motif towards the C terminus of the domain. The DCD domain is present in proteins with different architectures. Some of these proteins contain additional recognizable motifs, like the KELCH repeats or the ParB domain. Biological studies indicate a role of these proteins in phytohormone response, embryo development and programmed cell death by pathogens or ozone. The predicted secondary structure of the DCD domain is mostly composed of β-strands and confined by an α-helix at the N- and at the C terminus.
Proteins known to contain a DCD domain are listed below:
  • Carrot B2 protein.
  • Pea Gda-1 protein.
  • Soybean N-rich protein (NRP).
Protein Domain
Name: Leucine-rich repeat
Type: Repeat
Description: Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur simultaneously and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel β-sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with an α-helix, thus supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats.
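The two consensus patterns mentioned above, LxxLxLxxN/CxL and the shorter LxxLxL, translate directly into regular expressions. The sketch below is illustrative only: the function name and toy sequence are hypothetical, "x" is taken to mean any residue, and the patterns are applied literally, whereas real LRRs often carry other hydrophobic residues at the leucine positions.

```python
import re

# Literal readings of the consensus patterns quoted in the description.
# A lookahead is used so that overlapping matches are also reported.
LRR_11_RESIDUE = re.compile(r"(?=(L..L.L..[NC].L))")
LRR_SHORT = re.compile(r"(?=(L..L.L))")

def find_lrr_segments(sequence, pattern=LRR_11_RESIDUE):
    """Return 0-based start positions of every (possibly overlapping) match."""
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical toy sequence, not a real protein.
toy = "MSLKELDLSNNKLTAA"
print(find_lrr_segments(toy))             # 11-residue consensus
print(find_lrr_segments(toy, LRR_SHORT))  # shorter LxxLxL pattern
```

Because the shorter pattern is a prefix of the longer one, every hit for the 11-residue consensus is also a hit for LxxLxL, but not the other way round.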
Protein Domain
Name: Palmitoyltransferase, DHHC domain
Type: Domain
Description: This entry refers to the DHHC domain, found in DHHC proteins, which are palmitoyltransferases. Palmitoylation or, more specifically, S-acylation plays important roles in the regulation of protein localization, stability, and activity. It is a post-translational protein modification that involves the attachment of palmitic acid to Cys residues through a thioester linkage. Protein acyltransferases (PATs), also known as palmitoyltransferases, catalyse this reaction by transferring the palmitoyl group from palmitoyl-CoA to the thiol group of Cys residues. They are characterised by the presence of a 50-residue-long domain called the DHHC domain, which in most but not all cases is also cysteine-rich and gets its name from a highly conserved DHHC signature tetrapeptide (Asp-His-His-Cys). The Cys residue within the DHHC domain forms a stable acyl intermediate and transfers the acyl chain to the Cys residues of a target protein.
Some of the proteins containing a DHHC domain are listed below:
  • Drosophila DNZ1 protein.
  • Mouse Abl-philin 2 (Aph2) protein. Interacts with c-Abl. May play a role in apoptosis.
  • Mammalian ZDHHC9, an integral membrane protein.
  • Yeast ankyrin repeat-containing protein AKR1.
  • Yeast Erf2 protein. This protein localizes to the endoplasmic reticulum and seems to be important for Ras function.
  • Arabidopsis thaliana tip growth defective 1.
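The DHHC signature tetrapeptide and the cysteine-rich character of the surrounding domain can be checked with a few lines of code. This is a rough sketch, not the method behind this entry: the function name, the choice to centre a 50-residue window on the signature cysteine, and the toy sequence are all assumptions.

```python
import re

def find_dhhc_sites(sequence, window=50):
    """Locate DHHC tetrapeptides and count cysteines around each one.

    Returns a list of dicts with the 1-based position of the signature
    Cys (the last residue of D-H-H-C) and the number of Cys residues in
    a window of up to `window` residues centred on it (an assumption;
    the description only says the domain is about 50 residues long).
    """
    seq = sequence.upper()
    half = window // 2
    sites = []
    for match in re.finditer("DHHC", seq):
        cys_index = match.start() + 3          # 0-based index of the Cys
        start = max(0, cys_index - half)
        end = min(len(seq), cys_index + half)
        sites.append({
            "cys_position": cys_index + 1,     # 1-based, as in annotations
            "cys_in_window": seq[start:end].count("C"),
        })
    return sites

# Hypothetical toy sequence, not a real protein.
print(find_dhhc_sites("MKCAACPDHHCVWCNNC"))
```

A bare tetrapeptide match is expected to give false positives in long sequences, so the cysteine count is only a coarse plausibility check.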
Protein Domain
Name: Pectinesterase inhibitor domain
Type: Domain
Description: This domain inhibits pectin methylesterases (PMEs) and invertases through formation of a non-covalent 1:1 complex. It has been implicated in the regulation of fruit development, carbohydrate metabolism and cell wall extension. It may also be involved in inhibiting microbial pathogen PMEs. It has been observed that it is often expressed as a large inactive preprotein. It is also found at the N-termini of PMEs, suggesting that both PMEs and their inhibitor are expressed as a single polyprotein and subsequently processed. It has two disulphide bridges and is mainly α-helical.
Protein Domain
Name: DNA-binding domain superfamily
Type: Homologous_superfamily
Description: This superfamily represents a DNA-binding domain with a 2-layer beta(3)-alpha fold that is found in several DNA-binding proteins, including:
  • DNA-binding domain of Tn916 integrase.
  • N-terminal DNA-binding domain of lambda integrase.
  • GCC-box DNA-binding domains of certain transcription factors.
  • Methyl-CpG DNA-binding domain found in methyl-CpG-binding protein 2 (MECP2) and in the methylation-dependent transcriptional repressor MBD1/PCM1.
Protein Domain
Name: AP2/ERF domain
Type: Domain
Description: Ethylene is an endogenous plant hormone that influences many aspects of plant growth and development. Some defence-related genes that are induced by ethylene contain a cis-regulatory element known as the Ethylene-Responsive Element (ERE). Sequence analysis of various ERE regions has identified a short motif rich in G/C nucleotides, the GCC-box, that is essential for the response to ethylene. This short motif is recognised by a family of transcription factors, the ERE binding factors (ERF).
ERF proteins contain a domain of around 60 amino acids which is also found in the APETALA2 (AP2) protein. This AP2/ERF domain has been shown in various proteins to be necessary and sufficient to bind the GCC-box.
The structure of the AP2/ERF domain in complex with the target DNA has been solved. The structure resembles that of bacteriophage integrases and the methyl-CpG-binding domain (MBD): a three-stranded β-sheet and an α-helix almost parallel to the β-sheet. It contacts DNA via Arg and Trp residues located in the β-sheet.
Some proteins known to contain an AP2/ERF domain include:
  • Arabidopsis thaliana ERF1 to 6.
  • Tobacco ethylene-responsive element-binding proteins (EREBPs), homologues of ERF proteins.
  • Arabidopsis thaliana AP2 protein. It regulates meristem identity, floral organ specification and seed coat development.
  • Arabidopsis thaliana C-repeat/dehydration-responsive element (DRE) binding factor 1 (CBF1 or DREB1) and DREB2. They bind a GCC-box-like element found in the dehydration-responsive element. Binding to this element mediates cold-inducible transcription.
  • Arabidopsis thaliana and maize abscisic acid (ABA)-insensitive 4 (ABI4) proteins. They bind to a GCC-box-like element found in ABA-responsive genes.
  • Octadecanoid-derivative responsive Catharanthus AP2-domain (ORCA2) protein. It binds a GCC-box-like element in the jasmonate-responsive element of the Str promoter.
  • Tomato Pto-interacting proteins 4 to 6 (Pti4 to Pti6). Pti5 and 6 bind a GCC-box-like element in regulatory regions of various pathogenesis-related (PR) genes.
  • Trichodesmium erythraeum, Tetrahymena thermophila, Enterobacteria phage RB49 and bacteriophage Felix 01 HNH endonucleases. HNH endonucleases are homing endonucleases that move extensively via lateral gene transfer.
This entry represents the AP2/ERF domain.
Protein Domain
Name: Haem peroxidase
Type: Domain
Description: Peroxidases are haem-containing enzymes that use hydrogen peroxide as the electron acceptor to catalyse a number of oxidative reactions.
Most haem peroxidases follow the reaction scheme:
  Fe3+ + H2O2 --> [Fe4+=O]R' (Compound I) + H2O
  [Fe4+=O]R' + substrate --> [Fe4+=O]R (Compound II) + oxidised substrate
  [Fe4+=O]R + substrate --> Fe3+ + H2O + oxidised substrate
In this mechanism, the enzyme reacts with one equivalent of H2O2 to give [Fe4+=O]R' (compound I). This is a two-electron oxidation/reduction reaction in which H2O2 is reduced to water and the enzyme is oxidised. One oxidising equivalent resides on iron, giving the oxyferryl intermediate, while in many peroxidases the porphyrin (R) is oxidised to the porphyrin pi-cation radical (R'). Compound I then oxidises an organic substrate to give a substrate radical.
Haem peroxidases include two superfamilies: one found in bacteria, fungi and plants, and a second found in animals. The first can be viewed as consisting of three major classes.
Class I, the intracellular peroxidases, includes: yeast cytochrome c peroxidase (CCP), a soluble protein found in the mitochondrial electron transport chain, where it probably protects against toxic peroxides; ascorbate peroxidase (AP), the main enzyme responsible for hydrogen peroxide removal in chloroplasts and cytosol of higher plants; and bacterial catalase-peroxidases, exhibiting both peroxidase and catalase activities. It is thought that catalase-peroxidase provides protection to cells under oxidative stress.
Class II consists of secretory fungal peroxidases: ligninases, or lignin peroxidases (LiPs), and manganese-dependent peroxidases (MnPs). These are monomeric glycoproteins involved in the degradation of lignin. In MnP, Mn2+ serves as the reducing substrate. Class II proteins contain four conserved disulphide bridges and two conserved calcium-binding sites.
Class III consists of the secretory plant peroxidases, which have multiple tissue-specific functions, e.g. removal of hydrogen peroxide from chloroplasts and cytosol; oxidation of toxic compounds; biosynthesis of the cell wall; defence responses towards wounding; indole-3-acetic acid (IAA) catabolism; and ethylene biosynthesis. Class III proteins are also monomeric glycoproteins, containing four conserved disulphide bridges and two calcium ions, although the placement of the disulphides differs from class II enzymes.
The crystal structures of a number of these proteins show that they share the same architecture: two all-alpha domains between which the haem group is embedded.
This entry represents the first type of haem peroxidase, found in bacteria, fungi and plants.
Protein Domain
Name: Haem peroxidase superfamily
Type: Homologous_superfamily
Description: Peroxidases are haem-containing enzymes that use hydrogen peroxide as the electron acceptor to catalyse a number of oxidative reactions.
Most haem peroxidases follow the reaction scheme:
  Fe3+ + H2O2 --> [Fe4+=O]R' (Compound I) + H2O
  [Fe4+=O]R' + substrate --> [Fe4+=O]R (Compound II) + oxidised substrate
  [Fe4+=O]R + substrate --> Fe3+ + H2O + oxidised substrate
In this mechanism, the enzyme reacts with one equivalent of H2O2 to give [Fe4+=O]R' (compound I). This is a two-electron oxidation/reduction reaction in which H2O2 is reduced to water and the enzyme is oxidised. One oxidising equivalent resides on iron, giving the oxyferryl intermediate, while in many peroxidases the porphyrin (R) is oxidised to the porphyrin pi-cation radical (R'). Compound I then oxidises an organic substrate to give a substrate radical.
Haem peroxidases include two superfamilies: one found in bacteria, fungi and plants, and a second found in animals. The animal peroxidases comprise a group of homologous proteins that differ markedly from the plant/fungal/bacterial peroxidases in primary, secondary and tertiary structure, but which share with them a common function. Animal peroxidases probably arose independently of the plant/fungal/bacterial peroxidase superfamily and most likely belong to a different gene family. The crystal structures of a number of these proteins show that the active sites of animal peroxidases and plant/fungal/bacterial peroxidases are remarkably similar.
Protein Domain
Name: Peroxidase, active site
Type: Active_site
Description: Peroxidases are haem-containing enzymes that use hydrogen peroxide as the electron acceptor to catalyse a number of oxidative reactions.
Most haem peroxidases follow the reaction scheme:
  Fe3+ + H2O2 --> [Fe4+=O]R' (Compound I) + H2O
  [Fe4+=O]R' + substrate --> [Fe4+=O]R (Compound II) + oxidised substrate
  [Fe4+=O]R + substrate --> Fe3+ + H2O + oxidised substrate
In this mechanism, the enzyme reacts with one equivalent of H2O2 to give [Fe4+=O]R' (compound I). This is a two-electron oxidation/reduction reaction in which H2O2 is reduced to water and the enzyme is oxidised. One oxidising equivalent resides on iron, giving the oxyferryl intermediate, while in many peroxidases the porphyrin (R) is oxidised to the porphyrin pi-cation radical (R'). Compound I then oxidises an organic substrate to give a substrate radical.
Haem peroxidases include two superfamilies: one found in bacteria, fungi and plants, and a second found in animals. The first can be viewed as consisting of three major classes.
Class I, the intracellular peroxidases, includes: yeast cytochrome c peroxidase (CCP), a soluble protein found in the mitochondrial electron transport chain, where it probably protects against toxic peroxides; ascorbate peroxidase (AP), the main enzyme responsible for hydrogen peroxide removal in chloroplasts and cytosol of higher plants; and bacterial catalase-peroxidases, exhibiting both peroxidase and catalase activities. It is thought that catalase-peroxidase provides protection to cells under oxidative stress.
Class II consists of secretory fungal peroxidases: ligninases, or lignin peroxidases (LiPs), and manganese-dependent peroxidases (MnPs). These are monomeric glycoproteins involved in the degradation of lignin. In MnP, Mn2+ serves as the reducing substrate. Class II proteins contain four conserved disulphide bridges and two conserved calcium-binding sites.
Class III consists of the secretory plant peroxidases, which have multiple tissue-specific functions, e.g. removal of hydrogen peroxide from chloroplasts and cytosol; oxidation of toxic compounds; biosynthesis of the cell wall; defence responses towards wounding; indole-3-acetic acid (IAA) catabolism; and ethylene biosynthesis. Class III proteins are also monomeric glycoproteins, containing four conserved disulphide bridges and two calcium ions, although the placement of the disulphides differs from class II enzymes.
The crystal structures of a number of these proteins show that they share the same architecture: two all-alpha domains between which the haem group is embedded.
This entry represents an active site found in a number of peroxidases.
Protein Domain
Name: Plant peroxidase
Type: Family
Description: Peroxidases are haem-containing enzymes that use hydrogen peroxide as the electron acceptor to catalyse a number of oxidative reactions.
Most haem peroxidases follow the reaction scheme:
  Fe3+ + H2O2 --> [Fe4+=O]R' (Compound I) + H2O
  [Fe4+=O]R' + substrate --> [Fe4+=O]R (Compound II) + oxidised substrate
  [Fe4+=O]R + substrate --> Fe3+ + H2O + oxidised substrate
In this mechanism, the enzyme reacts with one equivalent of H2O2 to give [Fe4+=O]R' (compound I). This is a two-electron oxidation/reduction reaction in which H2O2 is reduced to water and the enzyme is oxidised. One oxidising equivalent resides on iron, giving the oxyferryl intermediate, while in many peroxidases the porphyrin (R) is oxidised to the porphyrin pi-cation radical (R'). Compound I then oxidises an organic substrate to give a substrate radical.
Peroxidases are found in bacteria, fungi, plants and animals and can be viewed as members of a superfamily consisting of three major classes. Class III comprises the secretory plant peroxidases, which have multiple tissue-specific functions, e.g. removal of hydrogen peroxide from chloroplasts and cytosol; oxidation of toxic compounds; biosynthesis of the cell wall; defence responses towards wounding; indole-3-acetic acid (IAA) catabolism; and ethylene biosynthesis. The wide spectrum of peroxidase activity, coupled with the participation in various physiological processes, is in keeping with its relative lack of specificity for substrates and the occurrence of a variety of isozymes. Plant peroxidases are monomeric glycoproteins containing four conserved disulphide bridges and two calcium ions. The 3D structure of peanut peroxidase has been shown to possess the same helical fold as class I and II peroxidases.
Protein Domain
Name: Peroxidases heam-ligand binding site
Type: Binding_site
Description: Peroxidases are haem-containing enzymes that use hydrogen peroxide as the electron acceptor to catalyse a number of oxidative reactions.
Most haem peroxidases follow the reaction scheme:
  Fe3+ + H2O2 --> [Fe4+=O]R' (Compound I) + H2O
  [Fe4+=O]R' + substrate --> [Fe4+=O]R (Compound II) + oxidised substrate
  [Fe4+=O]R + substrate --> Fe3+ + H2O + oxidised substrate
In this mechanism, the enzyme reacts with one equivalent of H2O2 to give [Fe4+=O]R' (compound I). This is a two-electron oxidation/reduction reaction in which H2O2 is reduced to water and the enzyme is oxidised. One oxidising equivalent resides on iron, giving the oxyferryl intermediate, while in many peroxidases the porphyrin (R) is oxidised to the porphyrin pi-cation radical (R'). Compound I then oxidises an organic substrate to give a substrate radical.
Haem peroxidases include two superfamilies: one found in bacteria, fungi and plants, and a second found in animals. The first can be viewed as consisting of three major classes.
Class I, the intracellular peroxidases, includes: yeast cytochrome c peroxidase (CCP), a soluble protein found in the mitochondrial electron transport chain, where it probably protects against toxic peroxides; ascorbate peroxidase (AP), the main enzyme responsible for hydrogen peroxide removal in chloroplasts and cytosol of higher plants; and bacterial catalase-peroxidases, exhibiting both peroxidase and catalase activities. It is thought that catalase-peroxidase provides protection to cells under oxidative stress.
Class II consists of secretory fungal peroxidases: ligninases, or lignin peroxidases (LiPs), and manganese-dependent peroxidases (MnPs). These are monomeric glycoproteins involved in the degradation of lignin. In MnP, Mn2+ serves as the reducing substrate. Class II proteins contain four conserved disulphide bridges and two conserved calcium-binding sites.
Class III consists of the secretory plant peroxidases, which have multiple tissue-specific functions, e.g. removal of hydrogen peroxide from chloroplasts and cytosol; oxidation of toxic compounds; biosynthesis of the cell wall; defence responses towards wounding; indole-3-acetic acid (IAA) catabolism; and ethylene biosynthesis. Class III proteins are also monomeric glycoproteins, containing four conserved disulphide bridges and two calcium ions, although the placement of the disulphides differs from class II enzymes.
The crystal structures of a number of these proteins show that they share the same architecture: two all-alpha domains between which the haem group is embedded.
This entry represents the binding site for haem in a number of peroxidases.
Protein Domain
Name: SET domain
Type: Domain
Description: The SET domain is a 130 to 140 amino acid, evolutionarily well-conserved sequence motif that was initially characterised in the Drosophila proteins Su(var)3-9, Enhancer-of-zeste and Trithorax. In addition to these chromosomal proteins, which modulate gene activities and/or chromatin structure, the SET domain is found in proteins of diverse functions in organisms ranging from yeast to mammals, and also in some bacteria and viruses.
The SET domains of mammalian SUV39H1 and 2 and fission yeast clr4 have been shown to be necessary for the methylation of lysine-9 in the histone H3 N terminus. However, this histone methyltransferase (HMTase) activity is probably restricted to a subset of SET domain proteins, as it requires the combination of the SET domain with the adjacent cysteine-rich regions, one located N-terminally (pre-SET) and the other C-terminally (post-SET) to the SET domain. The pre-SET and post-SET regions therefore seem to play a crucial role in substrate recognition and enzymatic activity.
The structures of the SET domain and the two adjacent regions, pre-SET and post-SET, have been solved. The SET structure is all beta, but consists only of sets of a few short strands composing no more than a couple of small sheets. Consequently the SET structure is mostly defined by turns and loops. An unusual feature is that the SET core is made up of two discontinuous segments of the primary sequence forming an approximate L shape. Two of the most conserved motifs in the SET domain are (1) a stretch at the C terminus containing a strictly conserved tyrosine residue and (2) a preceding loop through which the C-terminal segment passes, forming a knot-like structure that is not quite a true knot. These two regions have been shown to be essential for SAM binding and catalysis, particularly the invariant tyrosine, where in all likelihood catalysis takes place.
Protein Domain
Name: Endonuclease/exonuclease/phosphatase
Type: Domain
Description: This domain is found in a large number of proteins, including magnesium-dependent endonucleases and phosphatases involved in intracellular signalling. Proteins containing this domain include AP endonuclease proteins, DNase I proteins, synaptojanin (an inositol-1,4,5-trisphosphate phosphatase) and sphingomyelinase.
Protein Domain
Name: Leucine-rich repeat-containing N-terminal, plant-type
Type: Domain
Description: Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions.
Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response.
Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur simultaneously and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel beta sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
The 3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with an α-helix, thus supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions. Molecular modelling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats.
This domain is often found at the N terminus of tandem leucine-rich repeats, mainly in plant proteins.
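The two consensus patterns quoted in this description, the eleven-residue LxxLxLxxN/CxL and the shorter LxxLxL, translate directly into regular expressions in which x is any residue and N/C is either asparagine or cysteine. The following is a rough sketch under that reading. The names `LRR_LONG`, `LRR_SHORT` and `find_motifs` and the example sequence are hypothetical, and a literal pattern match is far less sensitive than the profile-based methods normally used to detect LRRs, since real repeats often substitute other hydrophobic residues for leucine.

```python
import re

# Lookahead groups let overlapping candidates be reported as well.
LRR_LONG = re.compile(r"(?=(L..L.L..[NC].L))")   # LxxLxLxxN/CxL
LRR_SHORT = re.compile(r"(?=(L..L.L))")          # LxxLxL


def find_motifs(sequence: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


# Made-up sequence used only for illustration (not a real protein).
example = "MGLKSLDLSNNQLSG"
print(find_motifs(example, LRR_LONG))
print(find_motifs(example, LRR_SHORT))
```

Every hit for the long pattern is also a hit for the short one, which is consistent with the short pattern being the more permissive of the two.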
Protein Domain
Name: Transketolase-like, pyrimidine-binding domain
Type: Domain
Description: Transketolase (TK) catalyses the reversible transfer of a two-carbon ketol unit from xylulose 5-phosphate to an aldose acceptor, such as ribose 5-phosphate, to form sedoheptulose 7-phosphate and glyceraldehyde 3-phosphate. This enzyme, together with transaldolase, provides a link between the glycolytic and pentose-phosphate pathways. TK requires thiamine pyrophosphate as a cofactor. In most sources where TK has been purified, it is a homodimer of approximately 70 kDa subunits. TK sequences from a variety of eukaryotic and prokaryotic sources show that the enzyme has been evolutionarily conserved. In the peroxisomes of the methylotrophic yeast Pichia angusta (Yeast) (Hansenula polymorpha), there is a highly related enzyme, dihydroxy-acetone synthase (DHAS) (also known as formaldehyde transketolase), which exhibits a very unusual specificity by including formaldehyde amongst its substrates.
1-deoxyxylulose-5-phosphate synthase (DXP synthase) is an enzyme so far found in bacteria (gene dxs) and plants (gene CLA1) which catalyses the thiamine pyrophosphate-dependent acyloin condensation reaction between carbon atoms 2 and 3 of pyruvate and glyceraldehyde 3-phosphate to yield 1-deoxy-D-xylulose-5-phosphate (DXP), a precursor in the biosynthetic pathway to isoprenoids, thiamine (vitamin B1), and pyridoxol (vitamin B6). DXP synthase is evolutionarily related to TK.
The N-terminal section contains a histidine residue which appears to function in proton transfer during catalysis. This entry represents the central section, in which there are conserved acidic residues that are part of the active cleft and may participate in substrate binding.
This group of proteins includes transketolase enzymes and the 2-oxoisovalerate dehydrogenase beta subunit. Both these enzymes utilise thiamine pyrophosphate as a cofactor, suggesting there may be common aspects in their mechanism of catalysis.
Protein Domain
Name: Transketolase C-terminal/Pyruvate-ferredoxin oxidoreductase domain II
Type: Homologous_superfamily
Description: Transketolase C-terminal-like domains can be found in a number of different enzymes, including the C-terminal domain of the pyruvate dehydrogenase E1 component, the C-terminal domain of branched-chain alpha-keto acid dehydrogenases, and domain II of pyruvate-ferredoxin oxidoreductase (PFOR). Structural studies reveal that this domain comprises three layers (alpha/beta/alpha). The mixed beta sheet consists of five strands in the order 13245, where strand 1 is antiparallel to the others. This domain has been proposed as a regulatory molecule binding site in transketolase.
Protein Domain
Name: Hydrophobic seed protein domain
Type: Domain
Description: This domain has a four-helix bundle structure. It contains four disulfide bonds, of which three function to keep the C- and N-terminal parts of the molecule in place.
Protein Domain
Name: Bifunctional inhibitor/plant lipid transfer protein/seed storage helical domain
Type: Domain
Description: This entry represents a structural domain consisting of four helices with a folded-leaf topology, forming a right-handed superhelix. This domain occurs in several proteins, including:
  • Plant lipid-transfer proteins, such as the non-specific lipid-transfer proteins ns-LTP1 and ns-LTP2.
  • Proteinase/alpha-amylase inhibitors, such as trypsin/alpha-amylase inhibitor RBI from Eleusine coracana (Indian finger millet) and Hageman factor/amylase inhibitor from Zea mays (Maize).
  • Seed storage proteins, such as napin from Brassica napus (Rape) and 2S albumin from Ricinus communis (Castor bean).
Protein Domain
Name: 14-3-3 protein
Type: Family
Description: The 14-3-3 proteins are a large family of approximately 30 kDa acidic proteins which exist primarily as homo- and heterodimers within all eukaryotic cells. These are structurally similar phospho-binding proteins that regulate multiple signaling pathways. There is a high degree of sequence identity and conservation between all the 14-3-3 isotypes, particularly in the regions which form the dimer interface or line the central ligand binding channel of the dimeric molecule. Each 14-3-3 protein sequence can be roughly divided into three sections: a divergent amino terminus, the conserved core region and a divergent carboxyl terminus. The conserved middle core region of the 14-3-3s encodes an amphipathic groove that forms the main functional domain, a cradle for interacting with client proteins. The monomer consists of nine helices organised in an antiparallel manner, forming an L-shaped structure. The interior of the L-structure is composed of four helices: H3 and H5, which contain many charged and polar amino acids, and H7 and H9, which contain hydrophobic amino acids. These four helices form the concave amphipathic groove that interacts with target peptides.
The 14-3-3 proteins mainly bind proteins containing phosphothreonine or phosphoserine motifs; however, exceptions to this rule do exist. Extensive investigation of the 14-3-3 binding site of the mammalian serine/threonine kinase Raf-1 has produced a consensus sequence for 14-3-3 binding, RSxpSxP (in the single-letter amino-acid code, where x denotes any amino acid and p indicates that the next residue is phosphorylated). The 14-3-3 proteins appear to effect intracellular signalling in one of three ways: by direct regulation of the catalytic activity of the bound protein; by regulating interactions between the bound protein and other molecules in the cell by sequestration or modification; or by controlling the subcellular localisation of the bound ligand.
Proteins appear to initially bind to a single dominant site and then subsequently to many, much weaker secondary interaction sites. The 14-3-3 dimer is capable of changing the conformation of its bound ligand whilst itself undergoing minimal structural alteration.
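The consensus RSxpSxP described above can be used to shortlist candidate 14-3-3 binding sites in a protein sequence. A plain sequence carries no phosphorylation information, so the sketch below only looks for the unmodified pattern R-S-x-S-x-P and reports the position of the serine that would need to be phosphorylated. The names `MODE_1` and `candidate_sites` and the example sequence are hypothetical, and a match is a candidate only: as the description notes, exceptions to the consensus exist.

```python
import re

# R-S-x-S-x-P; the second serine (offset 3) is the one marked "pS" in RSxpSxP.
MODE_1 = re.compile(r"(?=(RS.S.P))")


def candidate_sites(sequence: str) -> list[tuple[int, int, str]]:
    """Return (motif start, candidate phosphoserine position, segment).

    Positions are 1-based. Phosphorylation itself cannot be inferred
    from the sequence and has to be established experimentally.
    """
    seq = sequence.upper()
    return [
        (m.start() + 1, m.start() + 4, m.group(1))
        for m in MODE_1.finditer(seq)
    ]


# Made-up sequence used only for illustration (not a real protein).
print(candidate_sites("MAQRSTSTPNV"))
```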
Protein Domain
Name: 14-3-3 protein, conserved site
Type: Conserved_site
Description: The 14-3-3 proteins are a family of closely related acidic homodimeric proteins of about 30 kDa which were first identified as being very abundant in mammalian brain tissues and located preferentially in neurons. The 14-3-3 proteins seem to have multiple biological activities and play a key role in signal transduction pathways and the cell cycle. They interact with kinases such as PKC or Raf-1; they also seem to function as protein kinase-dependent activators of tyrosine and tryptophan hydroxylases; and in plants they are associated with a complex that binds to the G-box promoter elements. The 14-3-3 family of proteins is found in all eukaryotic species studied and has been sequenced in fungi (yeast BMH1 and BMH2, fission yeast rad24 and rad25), plants, Drosophila, and vertebrates. The sequences of the 14-3-3 proteins are extremely well conserved. As signature patterns we have selected two highly conserved regions: the first is a peptide of 11 residues located in the N-terminal section; the second is a 20 amino acid region located in the C-terminal section. The signature patterns in this entry cover both the 11- and 20-residue conserved regions.
Protein Domain
Name: 14-3-3 domain
Type: Domain
Description: The 14-3-3 proteins are a large family of approximately 30 kDa acidic proteins which exist primarily as homo- and heterodimers within all eukaryotic cells. These are structurally similar phospho-binding proteins that regulate multiple signaling pathways. There is a high degree of sequence identity and conservation between all the 14-3-3 isotypes, particularly in the regions which form the dimer interface or line the central ligand binding channel of the dimeric molecule. Each 14-3-3 protein sequence can be roughly divided into three sections: a divergent amino terminus, the conserved core region and a divergent carboxyl terminus. The conserved middle core region of the 14-3-3s encodes an amphipathic groove that forms the main functional domain, a cradle for interacting with client proteins. The monomer consists of nine helices organised in an antiparallel manner, forming an L-shaped structure. The interior of the L-structure is composed of four helices: H3 and H5, which contain many charged and polar amino acids, and H7 and H9, which contain hydrophobic amino acids. These four helices form the concave amphipathic groove that interacts with target peptides.
The 14-3-3 proteins mainly bind proteins containing phosphothreonine or phosphoserine motifs; however, exceptions to this rule do exist. Extensive investigation of the 14-3-3 binding site of the mammalian serine/threonine kinase Raf-1 has produced a consensus sequence for 14-3-3 binding, RSxpSxP (in the single-letter amino-acid code, where x denotes any amino acid and p indicates that the next residue is phosphorylated). The 14-3-3 proteins appear to effect intracellular signalling in one of three ways: by direct regulation of the catalytic activity of the bound protein; by regulating interactions between the bound protein and other molecules in the cell by sequestration or modification; or by controlling the subcellular localisation of the bound ligand.
Proteins appear to initially bind to a single dominant site and then subsequently to many, much weaker secondary interaction sites. The 14-3-3 dimer is capable of changing the conformation of its bound ligand whilst itself undergoing minimal structural alteration.
This entry represents the structural domain found in 14-3-3 proteins.
Protein Domain
Name: Short-chain dehydrogenase/reductase SDR
Type: Family
Description: The short-chain dehydrogenases/reductases (SDR) family is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate-specific domains.
Protein Domain
Name: NAD(P)-binding domain
Type: Domain
Description: This entry represents NAD- and NADP-binding domains with a core Rossmann-type fold, which consists of three layers (α/β/α), where the six β-strands are parallel in the order 321456. Many different enzymes contain an NAD/NADP-binding domain, including:
  • C-terminal domain of alcohol dehydrogenases
  • Tyrosine-dependent oxidoreductases (also known as short-chain dehydrogenases)
  • N-terminal domain of glyceraldehyde-3-phosphate dehydrogenase
  • NAD-binding domain of formate/glycerate dehydrogenases
  • N-terminal domain of sirohaem synthase
  • N-terminal domain of lactate dehydrogenase
  • N-terminal domain of 6-phosphogluconate dehydrogenase (the β-sheet is extended to 8 strands)
  • C-terminal domain of amino acid dehydrogenases (an extra N-terminal helix displaces the C-terminal helix)
  • NAD-binding domain of certain potassium channels
  • C-terminal domain of the transcriptional repressor Rex
  • Ornithine cyclodeaminase
  • CoA-binding N-terminal domain of the alpha chain of succinyl-CoA synthetase
Protein Domain
Name: Short-chain dehydrogenase/reductase, conserved site
Type: Conserved_site
Description: The short-chain dehydrogenases/reductases (SDR) family is a very large family of enzymes, most of which are known to be NAD- or NADP-dependent oxidoreductases. As the first member of this family to be characterised was Drosophila alcohol dehydrogenase, this family used to be called 'insect-type' or 'short-chain' alcohol dehydrogenases. Most members of this family are proteins of about 250 to 300 amino acid residues. Most dehydrogenases possess at least two domains, the first binding the coenzyme, often NAD, and the second binding the substrate. This latter domain determines the substrate specificity and contains amino acids involved in catalysis. Little sequence similarity has been found in the coenzyme binding domain although there is a large degree of structural similarity, and it has therefore been suggested that the structure of dehydrogenases has arisen through gene fusion of a common ancestral coenzyme nucleotide sequence with various substrate-specific domains.
This entry contains a signature pattern for this family of proteins which covers one of the best conserved regions. It includes two perfectly conserved residues, a tyrosine and a lysine. The tyrosine residue participates in the catalytic mechanism.
Protein Domain
Name: HnRNP-L/PTB
Type: Family
Description: Included in this family of heterogeneous ribonucleoproteins are PTB (polypyrimidine tract binding protein) and hnRNP-L. These proteins contain four RNA recognition motifs.
Protein Domain
Name: Ribosomal protein L3, archaeal
Type: Family
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase centre of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, includes bacterial, red algal, cyanelle, mammalian, yeast and Arabidopsis thaliana L3 proteins; archaeal Haloarcula marismortui HmaL3 (HL1); and yeast mitochondrial YmL9. This entry represents archaeal L3 proteins.
Protein Domain
Name: Ribosomal protein L3, conserved site
Type: Conserved_site
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase centre of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, includes bacterial, red algal, cyanelle, mammalian, yeast and Arabidopsis thaliana L3 proteins; archaeal Haloarcula marismortui HmaL3 (HL1); and yeast mitochondrial YmL9. This entry represents a short conserved region located in the central section of ribosomal L3 proteins.
Protein Domain
Name: Ribosomal protein L3
Type: Family
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein L3 is one of the proteins from the large ribosomal subunit. In Escherichia coli, L3 is known to bind to the 23S rRNA and may participate in the formation of the peptidyltransferase centre of the ribosome. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, includes bacterial, red algal, cyanelle, mammalian, yeast and Arabidopsis thaliana L3 proteins; archaeal Haloarcula marismortui HmaL3 (HL1); and yeast mitochondrial YmL9.
Protein Domain
Name: 30s ribosomal protein S13, C-terminal
Type: Homologous_superfamily
Description: This superfamily represents the C-terminal domain in proteins of the ribosomal protein family S13. Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues that contains three helices and a β-hairpin in the core of the protein, forming a helix-two turns-helix (H2TH) motif, and a non-globular C-terminal extension. This family of ribosomal proteins is present in prokaryotes, eukaryotes and archaea.
Protein Domain
Name: Ribosomal protein S13-like, H2TH
Type: Homologous_superfamily
Description: Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. S13 contains three helices and a β-hairpin in the core of the protein, which form a helix-two turns-helix (H2TH) motif, and a non-globular C-terminal extension. This H2TH motif can be found in other proteins as well. In the DNA repair protein MutM (formamidopyrimidine DNA glycosylase; Fpg), the middle domain contains the H2TH motif. MutM is a trifunctional DNA base excision repair enzyme that removes a wide range of oxidatively damaged bases (N-glycosylase activity) and cleaves both the 3'- and 5'-phosphodiester bonds of the resulting apurinic/apyrimidinic site (AP lyase activity). Other repair enzymes, such as E. coli Endonuclease VIII, which excises oxidized pyrimidines from DNA, also contain a DNA-binding H2TH motif within the middle domain. The H2TH domains of these repair proteins are only peripherally involved in binding DNA; their primary function may be simply to position the N-terminal lobe and C-terminal zinc finger domain of the glycosylases for interactions with DNA. The middle domain of the topoisomerase IV-B subunit contains an H2TH motif that is structurally related to that of the DNA repair proteins. Although the H2TH domain appears to be retained in all archaeal and plant type IIB topoisomerases identified to date, it has no known function and has not been observed in other topoisomerase families.
Protein Domain
Name: Ribosomal protein S13, conserved site
Type: Conserved_site
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues that contains three helices and a β-hairpin in the core of the protein, forming a helix-two turns-helix (H2TH) motif, and a non-globular C-terminal extension. This family of ribosomal proteins is present in prokaryotes, eukaryotes and archaea.
Protein Domain
Name: Ribosomal protein S13
Type: Family
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein S13 is one of the proteins from the small ribosomal subunit. In Escherichia coli, S13 is known to be involved in binding fMet-tRNA and, hence, in the initiation of translation. It is a basic protein of 115 to 177 amino-acid residues that contains three helices and a β-hairpin in the core of the protein, forming a helix-two turns-helix (H2TH) motif, and a non-globular C-terminal extension. This family of ribosomal proteins is present in prokaryotes, eukaryotes and archaea. This entry also includes the 40S ribosomal protein S18, which is located at the top of the head of the 40S subunit, where it contacts several helices of the 18S rRNA.
Protein Domain
Name: Sugar phosphate transporter domain
Type: Domain
Description: This domain is found in a number of sugar phosphate transporters, including those with a specificity for triose phosphate.
Protein Domain
Name: Uncharacterised conserved protein UCP009193
Type: Family
Description: There is currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function. The designation as "holocarboxylase synthetase" appears to be faulty. It originally comes from the annotation for the Triticum aestivum (Wheat) member, which notes similarity to human holocarboxylase synthetase. However, such similarity does not appear to exist.
Protein Domain
Name: Peptidase M14, carboxypeptidase A
Type: Domain
Description: Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation which is usually zinc, but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role. Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases. This group of sequences contains a diverse range of gene families, which include metallopeptidases belonging to MEROPS peptidase family M14 (carboxypeptidase A, clan MC), subfamilies M14A and M14B. The carboxypeptidase A family can be divided into four subfamilies: M14A (carboxypeptidase A or digestive), M14B (carboxypeptidase H or regulatory), M14C (gamma-D-glutamyl-L-diamino acid peptidase I) and M14D (AGTPBP-1/Nna1-like proteins).
Members of subfamily M14B have longer C-termini than those of subfamily M14A, and carboxypeptidase M (a member of the H family) is bound to the membrane by a glycosylphosphatidylinositol anchor, unlike the majority of the M14 family, which are soluble. ATP/GTP binding protein (AGTPBP-1/Nna1)-like proteins are active metallopeptidases that act on cytosolic proteins such as alpha-tubulin, to remove a C-terminal tyrosine. Mutations in AGTPBP-1/Nna1 cause Purkinje cell degeneration (pcd). AGTPBP-1/Nna1-like proteins from the different phyla are highly diverse, but they all contain a unique N-terminal conserved domain right before the CP domain. It has been suggested that this N-terminal domain might act as a folding domain. The zinc ligands have been determined as two histidines and a glutamate, and the catalytic residue has been identified as a C-terminal glutamate, but these do not form the characteristic metalloprotease HEXXH motif. Members of the carboxypeptidase A family are synthesised as inactive molecules with propeptides that must be cleaved to activate the enzyme. Structural studies of carboxypeptidases A and B reveal the propeptide to exist as a globular domain, followed by an extended α-helix; this shields the catalytic site, without specifically binding to it, while the substrate-binding site is blocked by making specific contacts.
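The HEXXH motif and its stricter 'abXHEbbHbc' form described in this entry can be expressed as regular expressions. The following is a minimal sketch only, not a method used by this database: the residue classes for "uncharged" and "hydrophobic" are assumptions, the names `HEXXH`, `STRICT` and `scan` are hypothetical, position 'a' is left unconstrained because the text says only that it is "most often" valine or threonine, and the remark about proline is not encoded.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: D, E, K and R count as charged; every other residue is "uncharged".
UNCHARGED = "".join(aa for aa in AMINO_ACIDS if aa not in "DEKR")
# Assumption: this residue set stands in for "hydrophobic".
HYDROPHOBIC = "AVILMFWY"

# Loose form: His, Glu, any two residues, His.
HEXXH = re.compile(f"HE[{AMINO_ACIDS}]{{2}}H")

# Stricter form 'abXHEbbHbc' with 'a' and 'X' as any residue,
# 'b' uncharged and 'c' hydrophobic.
STRICT = re.compile(
    f"[{AMINO_ACIDS}][{UNCHARGED}][{AMINO_ACIDS}]HE"
    f"[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]"
)


def scan(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    fragment = "GGVVAHELTHAVTD"  # made-up test fragment, not taken from this page
    print(scan(HEXXH, fragment))
    print(scan(STRICT, fragment))
```

Because the stricter pattern adds context on both sides of the HEXXH core, it reports a longer match that starts three residues earlier than the loose one. A fragment whose 'b' position holds a charged residue still matches the loose pattern but not the strict one.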
Protein Domain
Name: Carboxypeptidase-like, regulatory domain superfamily
Type: Homologous_superfamily
Description: This domain superfamily identifies a number of eukaryotic carboxypeptidases, including carboxypeptidase D, E (H), N, X, X2 and Z. These are metallopeptidases belonging to MEROPS peptidase family M14 (clan MC), subfamily M14B. Carboxypeptidase D (CPD) is a new B-type metallocarboxypeptidase that is membrane bound and has an acidic pH optimum. A hydrophobic region at the N terminus represents the signal peptide, and one near the C terminus probably represents the transmembrane anchor. A regulatory domain within the protein has been identified as a β-sandwich, comprising 7 strands in 2 sheets in a Greek-key topology. Some family members have an additional 1-2 strands to the common fold. The bacterial and archaeal sequences having this signature are variously annotated; examples are: hypothetical/conserved/membrane/cell surface protein; N-acetylglucosamine deacetylase; side tail fibre protein homologue from lambdoid prophage Rac; hypothetical tonB-linked outer membrane receptor; OmpA-related protein; putative outer membrane protein, probably involved in nutrient binding; and TonB-dependent receptor. This entry also includes the teneurin family members, which may function as cellular signal transducers.
Protein Domain
Name: GIGANTEA
Type: Family
Description: GIGANTEA is involved in the regulation of circadian rhythm and in the control of photoperiodic flowering. Photoperiodic control of flowering is a vital developmental process in plants because it directly relates to successful reproduction. In Arabidopsis, a long-day (LD) plant, flowering is promoted by long days and is delayed by short days. Mutations in the GIGANTEA (GI) gene delay flowering under long days but the effects are minimal under short days. It is believed that GI plays an important role in regulating the expression of flowering-time genes during the promotion of flowering by photoperiod, and that it participates in a feedback loop of the plant circadian system. In rice, a short-day (SD) plant, GI acts as a suppressor of flowering under short-day and long-day conditions. The GI gene encodes a putative membrane protein containing 6 putative transmembrane (TM) domains. The protein interacts with SPINDLY, a negative regulator of gibberellin signalling in Arabidopsis, via the latter's N-terminal array of 10 tetratricopeptide repeats.
Protein Domain
Name: Clathrin adaptor complex, small chain
Type: Conserved_site
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain, being responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis. While clathrin mediates endocytic protein transport from ER to Golgi, coatomers (COPI, COPII) primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins.
Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the small sigma subunit of various adaptins from different AP clathrin adaptor complexes (including AP1, AP2, AP3 and AP4), and the zeta subunit of various coatomer (COP) adaptors. The small sigma subunit of AP proteins has been characterised in several species. The sigma subunit plays a role in protein sorting in the late-Golgi/trans-Golgi network (TGN) and/or endosomes. The zeta subunit of coatomers (zeta-COP) is required for coatomer binding to Golgi membranes and for coat-vesicle assembly.
Protein Domain
Name: AP complex, mu/sigma subunit
Type: Domain
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain, being responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis. While clathrin mediates endocytic protein transport from ER to Golgi, coatomers (COPI, COPII) primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins.
Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the small sigma and mu subunits of various adaptins from different AP clathrin adaptor complexes (including AP1, AP2, AP3 and AP4), and the zeta and delta subunits of various coatomer (COP) adaptors. The small sigma subunit of AP proteins has been characterised in several species. The sigma subunit plays a role in protein sorting in the late-Golgi/trans-Golgi network (TGN) and/or endosomes. The zeta subunit of coatomers (zeta-COP) is required for coatomer binding to Golgi membranes and for coat-vesicle assembly.
Protein Domain
Name: Longin-like domain superfamily
Type: Homologous_superfamily
Description: VAMPs (and their homologues, synaptobrevins) define a group of SNARE proteins that contain a C-terminal coiled-coil/SNARE motif, in combination with variable N-terminal domains that are used to classify VAMPs: those containing longin N-terminal domains (~150 aa) are referred to as longins, while those with shorter N-termini are referred to as brevins. Longins are the only type of VAMP protein found in all eukaryotes, suggesting that their longin domain is essential. The longin domain is thought to exert a regulatory function. Longin domains have been shown to share the same structural fold, a profilin-like globular domain consisting of a five-stranded antiparallel β-sheet that is sandwiched by an α-helix on one side, and two α-helices on the other (β(2)-α-β(3)-α(2)). Other families have been shown to contain domains that structurally resemble the VAMP longin domain. An example is the eukaryotic conserved protein, SEDL, which is a component of the transport protein particle (TRAPP), critically involved in endoplasmic reticulum-to-Golgi vesicle transport; mutations in the SEDL gene are associated with an X-linked skeletal disorder, spondyloepiphyseal dysplasia tarda. Another example is the assembly domain of clathrin coat proteins, such as Mu2 adaptin (AP50) and Sigma2 adaptin (AP17), which structurally resemble the longin domain. AP50 and AP17 are two of the proteins that make up the core of AP2, a complex that functions in clathrin-mediated endocytosis.
Protein Domain
Name: Adaptor protein complex, sigma subunit
Type: Family
Description: The adaptor protein complexes mediate both the recruitment of clathrin to membranes and the recognition of sorting signals within the cytosolic tails of transmembrane cargo molecules. Adaptor protein complex 1 (AP-1) is a heterotetramer composed of two large adaptins (gamma-type subunit AP1G1 and beta-type subunit AP1B1), a medium adaptin (mu-type subunit AP1M1 or AP1M2) and a small adaptin (sigma-type subunit AP1S1 or AP1S2 or AP1S3). Subunits of clathrin-associated adaptor protein complex 1 play a role in protein sorting in the late-Golgi/trans-Golgi network (TGN) and/or in endosomes. This group represents an adaptor protein complex, sigma subunit.
Protein Domain
Name: Zinc/iron permease
Type: Family
Description: These ZIP zinc transporter proteins define a family of metal ion transporters that are found in plants, protozoa, fungi, invertebrates, and vertebrates, making it now possible to address questions of metal ion accumulation and homeostasis in diverse organisms.
Protein Domain
Name: VQ
Type: Domain
Description: This short motif is found in a variety of plant proteins. These proteins vary greatly in length and are mostly composed of low complexity regions. They all conserve a short motif, FXhVQChTG, where X is any amino acid and h is a hydrophobic amino acid. The function of this motif is uncertain; however, one protein in this family has been found to bind the SigA sigma factor. It would seem plausible that this motif is needed for this activity and that this whole family might be involved in modulating plastid sigma factors.
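The FXhVQChTG motif described in this entry maps directly onto a regular expression. The following is a minimal sketch only, not the signature this database uses: the residue set chosen for "hydrophobic" is an assumption (the entry does not define it), and the names `VQ_PATTERN` and `find_vq_motifs` are hypothetical.

```python
import re

ANY_RESIDUE = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: the entry does not say which residues count as hydrophobic;
# this set is one common choice.
HYDROPHOBIC = "AVILMFWY"

# F - X (any residue) - h (hydrophobic) - V - Q - C - h (hydrophobic) - T - G
VQ_PATTERN = re.compile(f"F[{ANY_RESIDUE}][{HYDROPHOBIC}]VQC[{HYDROPHOBIC}]TG")


def find_vq_motifs(sequence):
    """Return (0-based start, matched text) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in VQ_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up test sequences, not taken from any real protein.
    print(find_vq_motifs("MKFRLVQCVTGAA"))
    print(find_vq_motifs("MKFRDVQCVTGAA"))
```

The second sequence differs from the first only at the first 'h' position, where aspartate replaces leucine, so it is not reported under the assumed hydrophobic set.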
Protein Domain
Name: Protein of unknown function DUF3511
Type: Family
Description: This presumed domain is functionally uncharacterised. This domain is found in eukaryotes. This domain is about 50 amino acids in length. This domain has two completely conserved residues (Y and K) that may be functionally important.
Protein Domain
Name: F-box domain
Type: Domain
Description: First identified in cyclin-F as a protein-protein interaction motif, the F-box is a conserved domain that is present in numerous proteins with a bipartite structure. Through the F-box, these proteins are linked to the Skp1 protein and the core of SCF (Skp1-cullin-F-box protein ligase) complexes. SCF complexes constitute a new class of E3 ligases. They function in combination with the E2 enzyme Cdc34 to ubiquitinate G1 cyclins, Cdk inhibitors and many other proteins, to mark them for degradation. The binding of the specific substrates by SCF complexes is mediated by divergent protein-protein interaction motifs present in F-box proteins, such as WD40 repeats, leucine-rich repeats or ANK repeats.
Protein Domain
Name: Tubby, C-terminal
Type: Domain
Description: Tubby, an autosomal recessive mutation mapping to mouse chromosome 7, was found to be the result of a splicing defect in a novel gene of unknown function. This mutation maps to the tub gene. The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. By contrast with the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually, and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase of body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes, such as Alstrom and Bardet-Biedl. These phenotypes indicate a vital role for tubby proteins, but no biochemical function has yet been ascribed to any family member, although it has been suggested that the phenotypic features of tubby mice may be the result of cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins, the TULPs. TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that are indicative of their importance in nervous-system function and development. Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor sequence, but, in TUB, contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution.
This domain is arranged as a 12-stranded, all anti-parallel, closed β-barrel that surrounds a central alpha helix (at the extreme carboxyl terminus of the protein) that forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors.
Protein Domain
Name: Tubby-like, C-terminal
Type: Homologous_superfamily
Description: Tubby, an autosomal recessive mutation mapping to mouse chromosome 7, was found to be the result of a splicing defect in a novel gene of unknown function. This mutation maps to the tub gene. The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. By contrast with the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually, and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase of body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes, such as Alstrom and Bardet-Biedl. These phenotypes indicate a vital role for tubby proteins, but no biochemical function has yet been ascribed to any family member, although it has been suggested that the phenotypic features of tubby mice may be the result of cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins, the TULPs. TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that are indicative of their importance in nervous-system function and development. Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor sequence, but, in TUB, contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution.
This domain is arranged as a 12-stranded, all anti-parallel, closed β-barrel that surrounds a central alpha helix (at the extreme carboxyl terminus of the protein) that forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors. This superfamily represents the tubby C-terminal domain and the structurally related LURP1-like domain.
Protein Domain
Name: Tubby, C-terminal, conserved site
Type: Conserved_site
Description: Tubby, an autosomal recessive mutation mapping to mouse chromosome 7, was found to be the result of a splicing defect in a novel gene of unknown function. This mutation maps to the tub gene. The mouse tubby mutation is the cause of maturity-onset obesity, insulin resistance and sensory deficits. By contrast with the rapid juvenile-onset weight gain seen in diabetes (db) and obese (ob) mice, obesity in tubby mice develops gradually, and strongly resembles the late-onset obesity observed in the human population. Excessive deposition of adipose tissue culminates in a two-fold increase of body weight. Tubby mice also suffer retinal degeneration and neurosensory hearing loss. The tripartite character of the tubby phenotype is highly similar to human obesity syndromes, such as Alstrom and Bardet-Biedl. These phenotypes indicate a vital role for tubby proteins, but no biochemical function has yet been ascribed to any family member, although it has been suggested that the phenotypic features of tubby mice may be the result of cellular apoptosis triggered by expression of the mutated tub gene. TUB is the founding member of the tubby-like proteins, the TULPs. TULPs are found in multicellular organisms from both the plant and animal kingdoms. Ablation of members of this protein family causes disease phenotypes that are indicative of their importance in nervous-system function and development. Mammalian TUB is a hydrophilic protein of ~500 residues. The N-terminal portion of the protein is conserved neither in length nor sequence, but, in TUB, contains the nuclear localisation signal and may have transcriptional-activation activity. The C-terminal 250 residues are highly conserved. The C-terminal extremity contains a cysteine residue that might play an important role in the normal functioning of these proteins. The crystal structure of the C-terminal core domain from mouse tubby has been determined to 1.9 Å resolution.
This domain is arranged as a 12-stranded, all anti-parallel, closed β-barrel that surrounds a central alpha helix (at the extreme carboxyl terminus of the protein) that forms most of the hydrophobic core. Structural analyses suggest that TULPs constitute a unique family of bipartite transcription factors. This entry represents conserved sites found in the C-terminal domain. The site closest to the C terminus contains a penultimate cysteine residue that could be critical to the normal functioning of these proteins.
Protein Domain
Name: Alpha crystallin/Hsp20 domain
Type: Domain
Description: Prokaryotic and eukaryotic organisms respond to heat shock or other environmental stress by inducing the synthesis of proteins collectively known as heat-shock proteins (hsp). Amongst them is a family of proteins with an average molecular weight of 20 kDa, known as the hsp20 proteins. These seem to act as chaperones that can protect other proteins against heat-induced denaturation and aggregation. Hsp20 proteins seem to form large heterooligomeric aggregates. These low-molecular-weight proteins are evolutionarily related to alpha-crystallin. Alpha-crystallin is an abundant constituent of the eye lens of most vertebrate species. Its main function appears to be to maintain the correct refractive index and transparency of the lens. It is also found in other tissues where it seems to act as a chaperone. Other related proteins include certain surface antigens. This entry represents a conserved C-terminal domain of about 100 residues characteristic of this group of proteins.
Protein Domain
Name: HSP20-like chaperone
Type: Homologous_superfamily
Description: Hsp20 is a mammalian small heat-shock protein family that occurs most abundantly in skeletal muscle and heart. Compared with other family members, it has a tendency to form dimers via a disulphide linkage formed by an N-terminal cysteine, low heat stability, and poor chaperoning ability. Structurally, this and related proteins contain a β-sandwich fold consisting of 8 strands in 2 β-sheets in a Greek-key topology.
Protein Domain
Name: Protein ROH1-like
Type: Family
Description: ROH1 is an interactor of the exocyst subunit Exo70A1, and has been shown to be required for seed coat mucilage deposition [ ].
Protein Domain
Name: ABA/WDS induced protein
Type: Family
Description: This is a family of plant proteins induced by water deficit stress (WDS) [ ], or abscisic acid (ABA) stress and ripening []. The Ip3 cDNA clone is expressed at high levels in the roots, and is induced by ABA under WDS.
Protein Domain
Name: Elongator complex protein 4
Type: Family
Description: Elongator is a six-subunit protein complex highly conserved in eukaryotes. The human six-subunit Elongator complex, known as holo-Elongator, has histone acetyltransferase activity directed against histones H3 and H4. It consists of two subcomplexes, a core subcomplex (ELP1-3) and an accessory subcomplex (ELP4-6). The Elongator complex has been associated with many cellular activities, including transcriptional elongation, but its main function is tRNA modification. It is required for the formation of 5-methoxy-carbonylmethyl (mcm5) and 5-carbamoylmethyl (ncm5) groups on uridine nucleosides present at the wobble position of many tRNAs. This entry represents the ELP4 subunit. The mammalian ELP4 gene is implicated in rolandic epilepsy.
Protein Domain
Name: Reverse transcriptase domain
Type: Domain
Description: The use of an RNA template to produce DNA, for integration into the host genome and exploitation of a host cell, is a strategy employed in the replication of retroid elements, such as the retroviruses and bacterial retrons. The enzyme catalysing polymerisation is an RNA-directed DNA polymerase, or reverse transcriptase (RT). Reverse transcriptase occurs in a variety of mobile elements, including retrotransposons, retroviruses, group II introns, bacterial msDNAs, hepadnaviruses, and caulimoviruses. Retroviral reverse transcriptase is synthesised as part of the POL polyprotein, which contains an aspartyl protease, a reverse transcriptase, RNase H and integrase. The POL polyprotein undergoes specific enzymatic cleavage to yield the mature proteins. The discovery of retroelements in the prokaryotes raises intriguing questions concerning their roles in bacteria, the origin and evolution of reverse transcriptases, and whether the bacterial reverse transcriptases are older than eukaryotic reverse transcriptases. Several crystal structures of the reverse transcriptase (RT) domain have been determined.
Protein Domain
Name: Aminoacyl-tRNA synthetase, class Ic
Type: Family
Description: The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, valine, and some lysine synthetases (non-eukaryotic group) belong to class I. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine, threonine, and some lysine synthetases (non-archaeal group) belong to class II. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c. The class Ia aminoacyl-tRNA synthetases consist of the isoleucyl, methionyl, valyl, leucyl, cysteinyl, and arginyl-tRNA synthetases; the class Ib include the glutamyl and glutaminyl-tRNA synthetases; and the class Ic are the tyrosyl and tryptophanyl-tRNA synthetases.
Protein Domain
Name: Tryptophan-tRNA ligase
Type: Family
Description: This entry represents tryptophan-tRNA ligase (TrpRS; also known as tryptophanyl-tRNA synthetase). The enzyme is widely distributed, being found in archaea, bacteria and eukaryotes. TrpRS is a homodimer which attaches Trp to the appropriate tRNA. TrpRS is a class I tRNA synthetase, so it aminoacylates the 2'-OH of the nucleotide at the 3' end of the tRNA. The core domain is based on the Rossmann fold and is responsible for the ATP-dependent formation of the enzyme-bound aminoacyl-adenylate. It contains the class I characteristic 'HIGH' and 'KMSKS' motifs, which are involved in ATP binding. The class Ia aminoacyl-tRNA synthetases consist of the isoleucyl, methionyl, valyl, leucyl, cysteinyl, and arginyl-tRNA synthetases; the class Ib include the glutamyl and glutaminyl-tRNA synthetases; and the class Ic are the tyrosyl and tryptophanyl-tRNA synthetases. The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred.
The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, valine, and some lysine synthetases (non-eukaryotic group) belong to class I synthetases. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine, threonine, and some lysine synthetases (non-archaeal group), belong to class-II synthetases. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c [].
Protein Domain
Name: Domain of unknown function DUF659
Type: Domain
Description: These are transposase-like proteins with no known function.
Protein Domain
Name: HAT, C-terminal dimerisation domain
Type: Domain
Description: This dimerisation domain is found at the C terminus of the transposases of elements belonging to the Activator superfamily (hAT element superfamily). The isolated dimerisation domain forms extremely stable dimers in vitro.
Protein Domain
Name: Zinc finger, BED-type
Type: Domain
Description: The BED finger, which was named after the Drosophila proteins BEAF and DREF, is found in one or more copies in cellular regulatory factors and transposases from plants, animals and fungi. The BED finger is a domain of about 50 to 60 amino acid residues that contains a characteristic motif with two highly conserved aromatic positions, as well as a shared pattern of cysteines and histidines that is predicted to form a zinc finger. As diverse BED fingers are able to bind DNA, it has been suggested that DNA-binding is the general function of this domain. Some proteins known to contain a BED domain are listed below:
  • Animal, fungal and plant AC1 and Hobo-like transposases.
  • Caenorhabditis elegans protein dpy-20, a predicted cuticular-gene transcriptional regulator.
  • Drosophila BEAF (boundary element-associated factor), which is thought to be involved in chromatin insulation.
  • Drosophila DREF, a transcriptional regulator for S-phase genes.
  • Tobacco 3AF1 and tomato E4/E8-BP1, which are light- and ethylene-regulated DNA binding proteins that contain two BED fingers.
Protein Domain
Name: Ribonuclease H-like superfamily
Type: Homologous_superfamily
Description: The catalytic domains of several polynucleotidyl transferases share a similar structure, consisting of a 3-layer α/β/α fold that contains mixed β-sheets, suggesting that they share a similar mechanism of catalysis. Polynucleotidyl transferases containing this domain include ribonuclease H class I (RNase HI) and class II (RNase HII), HIV RNase (reverse transcriptase domain), retroviral integrase (catalytic domain), Mu transposase (core domain), transposase inhibitor Tn5 (containing additional all-α subdomains), DnaQ-like 3'-5' exonucleases (exonuclease domains), RuvC resolvase, and mitochondrial resolvase ydc2 (catalytic domain). This superfamily also includes the YqgF domain, described as RNase H-like and typified by the Escherichia coli protein YqgF. YqgF domain-containing proteins are predicted to be ribonucleases or resolvases based on homology to RuvC Holliday junction resolvases.
Protein Domain
Name: Rossmann-like alpha/beta/alpha sandwich fold
Type: Homologous_superfamily
Description: This superfamily represents domains related by a common ancestor that have a Rossmann-like, 3-layer, alpha/beta/alpha sandwich fold. Protein families in which the domain is found include:
  • Nucleotidylyl transferases, such as cytidylyltransferases and adenylyltransferases.
  • Class I aminoacyl-tRNA synthetases (catalytic domain), such as tyrosyl-tRNA synthetase and glutaminyl-tRNA synthetase.
  • Pantothenate synthetases.
  • ATP sulphurylase (central domain).
  • N-type ATP pyrophosphatases, such as beta-lactam synthetase and GMP synthase.
  • PP-loop ATPases, such as the cell cycle protein MesJ (N-terminal domain).
  • Phosphoadenylyl sulphate (PAPS) reductase.
  • Electron transfer flavoprotein (ETFP) subunits, such as the N-terminal domains of the alpha and beta subunits.
  • Universal stress protein A (UspA).
  • Cryptochrome and DNA photolyase.
Protein Domain
Name: UDP-glucuronosyl/UDP-glucosyltransferase
Type: Family
Description: UDP glycosyltransferases (UGT) are a superfamily of enzymes that catalyse the addition of the glycosyl group from a UDP-sugar to a small hydrophobic molecule. This family currently consists of:
  • Mammalian UDP-glucuronosyl transferases (UDPGT). A large family of membrane-bound microsomal enzymes which catalyse the transfer of glucuronic acid to a wide variety of exogenous and endogenous lipophilic substrates. These enzymes are of major importance in the detoxification and subsequent elimination of xenobiotics such as drugs and carcinogens. These enzymes are also involved in cancer progression and drug resistance.
  • A large number of putative UDPGT from Caenorhabditis elegans.
  • Mammalian 2-hydroxyacylsphingosine 1-beta-galactosyltransferase (also known as UDP-galactose-ceramide galactosyltransferase). This enzyme catalyses the transfer of galactose to ceramide, a key enzymatic step in the biosynthesis of galactocerebrosides, which are abundant sphingolipids of the myelin membrane of the central nervous system and peripheral nervous system.
  • Fungal sterol 3-beta-glucosyltransferase, which is involved in the degradation of peroxisomes, mitochondria and nuclei.
  • Fungal enfumafungin synthase efuA. This protein plays a role in the biosynthesis of enfumafungin, a glycosylated fernene-type triterpenoid with potent antifungal activity.
  • Plant anthocyanidin 3-O-glucosyltransferase, also known as flavonol O(3)-glucosyltransferase, an enzyme that catalyses the transfer of glucose from UDP-glucose to a flavanol. This reaction is essential and one of the last steps in anthocyanin pigment biosynthesis.
  • Gallate 1-beta-glucosyltransferase, a glucosyltransferase that catalyses the formation of 1-O-galloyl-beta-D-glucose, the first committed step of hydrolyzable tannin (HT) biosynthesis.
  • (R)-mandelonitrile beta-glucosyltransferase from almond, which is involved in the stereo-selective biosynthesis of the cyanogenic glycoside (R)-prunasin, a precursor of (R)-amygdalin, which at high concentrations is associated with bitterness in almond kernels.
  • Baculovirus ecdysteroid UDP-glucosyltransferase (egt). This enzyme catalyses the transfer of glucose from UDP-glucose to ecdysteroids, which are insect molting hormones. The expression of egt in the insect host interferes with normal insect development by blocking the molting process.
  • Prokaryotic zeaxanthin glucosyltransferase (gene crtX), an enzyme involved in carotenoid biosynthesis that catalyses the glycosylation reaction which converts zeaxanthin to zeaxanthin-beta-diglucoside.
  • Enterobactin C-glucosyltransferase iroB, which catalyses the successive monoglucosylation, diglucosylation and triglucosylation of enterobactin, decreasing the membrane affinity of enterobactin and increasing the iron acquisition rate.
  • Streptomyces macrolide glycosyltransferases. These enzymes specifically inactivate macrolide antibiotics via 2'-O-glycosylation using UDP-glucose.
Protein Domain
Name: Glycosyltransferase family 28, N-terminal domain
Type: Domain
Description: The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These enzymes catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates (EC 2.4.1.-) and related proteins into distinct sequence-based families has been described. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'. Glycosyltransferase family 28 comprises enzymes with a number of known activities: 1,2-diacylglycerol 3-beta-galactosyltransferase; 1,2-diacylglycerol 3-beta-glucosyltransferase; and beta-N-acetylglucosamine transferase.
Protein Domain
Name: Sphingomyelin synthase-like domain
Type: Domain
Description: This domain is found in sphingomyelin synthase (also known as phosphatidylcholine:ceramide cholinephosphotransferase), and other proteins. Sphingomyelin synthase is a bidirectional lipid cholinephosphotransferase capable of converting phosphatidylcholine (PC) and ceramide to sphingomyelin (SM) and diacylglycerol (DAG) and vice versa. This domain is closely related to the C-terminal region of phosphatidic acid phosphatase type 2 (PAP2).
Protein Domain
Name: Cellulose synthase, RING-type zinc finger
Type: Domain
Description: This RING-type zinc finger domain is frequently found in the catalytic subunit of cellulose synthase. This enzyme removes the glucose from UDP-glucose and adds it to the growing cellulose chain, thereby releasing UDP. The domain structure is treble-clef-like (PDB:1weo).
Protein Domain
Name: Prolamin-like domain
Type: Domain
Description: Proteins with this domain are found to be expressed in the plant embryo sac and are regulated by the Myb98 transcription factor. Computational analysis has revealed that they are homologous to the plant prolamin superfamily (Protease inhibitor-seed storage-LTP family). In contrast to the typical prolamin members that have eight conserved Cys residues forming four pairs of disulphide bonds, proteins with this domain only contain six conserved Cys residues that may form three pairs of disulphide bonds. They may have potential functions in lipid transfer or protection during plant embryo sac development and reproduction. This domain includes both the previous DUF784 and DUF1278 domains.
Protein Domain
Name: Cupredoxin
Type: Homologous_superfamily
Description: Copper is one of the most prevalent transition metals in living organisms and its biological function is intimately related to its redox properties. Since free copper is toxic, even at very low concentrations, its homeostasis in living organisms is tightly controlled by subtle molecular mechanisms. In eukaryotes, before being transported inside the cell via the high-affinity copper transporters of the CTR family, the copper (II) ion is reduced to copper (I). In blue copper proteins such as cupredoxin, the copper (I) ion form is stabilised by a constrained His2Cys coordination environment. This entry represents cupredoxin proteins, as well as structural homologues to cupredoxin. Structurally, the cupredoxin-like fold consists of a β-sandwich with 7 strands in 2 β-sheets, which is arranged in a Greek-key β-barrel. Some of these proteins have lost the ability to bind copper. Proteins with a cupredoxin-type fold are found in the following family groups:
  • Mono-domain cupredoxins, such as amicyanin, plastocyanin, pseudoazurin, plantacyanin, azurin, auracyanin, rusticyanin, stellacyanin, and mavicyanin.
  • Multi-domain cupredoxins, such as nitrite reductase (2 domains of this fold), multicopper oxidase CueO, spore coat protein A, ascorbate oxidase (3 domains of this fold), laccase (3 domains of this fold), ceruloplasmin (6 domains of this fold), and coagulation factor V.
  • Red copper protein nitrocyanin and the C-terminal of nitrous oxide reductase.
  • Quinol oxidase and the periplasmic domain of cytochrome c oxidase subunit II.
  • Ephrin-a5 and ephrin-b2 ectodomain, which are related to cupredoxins but lack the metal-binding site.
  • The N-terminal domain of protein arginine deiminase Pad4, which is related to cupredoxin but lacks the metal-binding site.
Protein Domain
Name: Domain of unknown function DUF223
Type: Domain
Description: The function of this domain has not been characterised.
Protein Domain
Name: RIN4, pathogenic type III effector avirulence factor Avr cleavage site
Type: Domain
Description: This domain is conserved in small families of otherwise unrelated proteins in both monocots and dicots, suggesting that it has a conserved, plant-specific function. It is found in the plant RIN4 (RPM1-interacting protein 4) where it appears to contribute to the binding of the protein to RCS (AvrRpt2 auto-cleavage site) and AvrB, the virulence factors from the infecting bacterium. The cleavage site for the AvrRpt2 avirulence protein appears to be the sequence motifs VPQFGDW and LPKFGEW, both of which are highly conserved within the domain.
Protein Domain
Name: ABC1 atypical kinase-like domain
Type: Domain
Description: This entry represents a domain found in Escherichia coli UbiB, known in Providencia stuartii as Aarf, which is required for ubiquinone (CoQ) biosynthesis. Some proteins with this domain are described as aarF domain-containing protein kinases (ADCKs). This domain is also found in yeast ABC1 proteins required for function of the mitochondrial bc1 complex, in which CoQ functions as an essential cofactor. The function of these proteins is not clear. Along with ABC1, UbiB is part of a large family of proteins that contain motifs found in eukaryotic-type protein kinases, but it is not known whether they have kinase activity or how this would relate to their requirement for the monooxygenase step in CoQ synthesis. A role in regulation of this step by phosphorylation has been speculated.
Protein Domain
Name: NUDIX hydrolase, conserved site
Type: Conserved_site
Description: MutT is a small bacterial protein (~12-15 kDa) involved in the GO system responsible for removing an oxidatively damaged form of guanine (8-hydroxy-guanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite dA and dC residues of template DNA with near equal efficiency, leading to A.T to G.C transversions. MutT specifically degrades 8-oxo-dGTP to the monophosphate, with the concomitant release of pyrophosphate. A short conserved N-terminal region of mutT (designated the MutT domain) is also found in a variety of other prokaryotic, viral and eukaryotic proteins. The generic name 'NUDIX hydrolases' (NUcleoside DIphosphate linked to some other moiety X) has been coined for this domain family. The family can be divided into a number of subgroups, of which MutT anti-mutagenic activity represents only one type; most of the rest hydrolyse diverse nucleoside diphosphate derivatives (including ADP-ribose, GDP-mannose, TDP-glucose, NADH, UDP-sugars, dNTP and NTP). This signature covers the core region of the NUDIX domain and contains four conserved glutamate residues. The region spanned by this signature could be part of the active centre of a family of pyrophosphate-releasing NTPases.
Protein Domain
Name: Nudix hydrolase 6-like
Type: Family
Description: This entry represents several nudix hydrolases, including nudix hydrolase 2, 5, 6, 7, 8 and 10. Nudix hydrolases are ubiquitous proteins that hydrolyse a wide range of organic pyrophosphates, including nucleoside di- and triphosphates, dinucleoside and diphosphoinositol polyphosphates, nucleotide sugars and RNA caps, with varying degrees of substrate specificity. Nudix hydrolase 7 is an acyl-CoA diphosphatase involved in regulating peroxisomal coenzyme A homeostasis.
Protein Domain
Name: NUDIX hydrolase-like domain superfamily
Type: Homologous_superfamily
Description: MutT is a small bacterial protein (~12-15 kDa) involved in the GO system responsible for removing an oxidatively damaged form of guanine (8-hydroxy-guanine or 7,8-dihydro-8-oxoguanine) from DNA and the nucleotide pool. 8-oxo-dGTP is inserted opposite dA and dC residues of template DNA with near equal efficiency, leading to A-T to G-C transversions. MutT specifically degrades 8-oxo-dGTP to the monophosphate, with the concomitant release of pyrophosphate. A short conserved N-terminal region of mutT (designated the MutT domain) is also found in a variety of other prokaryotic, viral and eukaryotic proteins. The generic name 'NUDIX hydrolases' (NUcleoside DIphosphate linked to some other moiety X) has been coined for this domain superfamily. The superfamily can be divided into a number of subgroups, of which MutT anti-mutagenic activity represents only one type; most of the rest hydrolyse diverse nucleoside diphosphate derivatives (including ADP-ribose, GDP-mannose, TDP-glucose, NADH, UDP-sugars, dNTP and NTP).
Protein Domain
Name: NUDIX hydrolase domain
Type: Domain
Description: The Nudix superfamily is widespread among eukaryotes, bacteria, archaea and viruses and consists mainly of pyrophosphohydrolases that act upon substrates of general structure NUcleoside DIphosphate linked to another moiety, X (NDP-X) to yield NMP plus P-X. Such substrates include (d)NTPs (both canonical and oxidised derivatives), nucleotide sugars and alcohols, dinucleoside polyphosphates (NpnN), dinucleotide coenzymes and capped RNAs. However, phosphohydrolase activity, including activity towards NDPs themselves, and activity on non-nucleotide substrates such as diphosphoinositol polyphosphates (DIPs), 5-phosphoribosyl 1-pyrophosphate (PRPP), thiamine pyrophosphate (TPP) and dihydroneopterin triphosphate (DHNTP) have also been described. Some superfamily members, such as Escherichia coli mutT, have the ability to degrade potentially mutagenic, oxidised nucleotides while others control the levels of metabolic intermediates and signalling compounds. In prokaryotes and simple eukaryotes, the number of Nudix genes varies from 0 to over 30, reflecting the metabolic complexity and adaptability of the organism. Nudix hydrolases are typically small proteins, larger ones having additional domains with interactive or other catalytic functions. The Nudix domain is formed by two β-sheets packed between α-helices. It can accommodate sequences of different lengths in the connecting loops and in the antiparallel β-sheet. Catalysis depends on the conserved 23-amino acid Nudix motif (Nudix box), G-x(5)-E-x(5)-[UA]-x-R-E-x(2)-E-E-x-G-U, where U is an aliphatic, hydrophobic residue. This sequence is located in a loop-helix-loop structural motif and the Glu residues in the core of the motif, R-E-x(2)-E-E, play an important role in binding essential divalent cations. The substrate specificity is determined by other residues outside the Nudix box. For example, CoA pyrophosphatases share the NuCoA motif L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G that is located N-terminal of the Nudix box and is involved in CoA recognition.
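The motif notation in the description above can be turned into a sequence scan. The following is a minimal sketch, not part of the database entry: the helper names `motif_to_regex` and `find_motif` are hypothetical, the residue set used for "U" (isoleucine, leucine, valine) is an assumption because the entry only says "aliphatic, hydrophobic", and the toy sequence is synthetic rather than a real protein.

```python
import re

# Assumption (not from the entry): "U", described only as an aliphatic,
# hydrophobic residue, is approximated as isoleucine, leucine or valine.
ALIPHATIC = "ILV"

# Motifs copied from the description above.
NUDIX_BOX = "G-x(5)-E-x(5)-[UA]-x-R-E-x(2)-E-E-x-G-U"
NUCOA = "L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G"

# One motif element: a residue, "x", or a bracketed set, with an optional
# repeat count in parentheses, e.g. "E", "x(5)" or "[UA]".
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)\))?")


def motif_to_regex(motif, aliphatic=ALIPHATIC):
    """Translate a dash-separated motif string into a regular expression."""
    parts = []
    for element in motif.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported motif element: {element!r}")
        symbol, count = match.group(1), match.group(2)
        if symbol == "x":
            regex = "."  # any residue
        elif symbol.startswith("["):
            # Expand "U" inside a bracketed set, e.g. [UA] -> [ILVA].
            regex = "[" + symbol[1:-1].replace("U", aliphatic) + "]"
        elif symbol == "U":
            regex = "[" + aliphatic + "]"
        else:
            regex = symbol  # literal residue
        parts.append(regex + ("{" + count + "}" if count else ""))
    return "".join(parts)


def find_motif(sequence, motif):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Synthetic sequence built to contain one Nudix box; not a real protein.
    toy = "MK" + "GAAAAAEAAAAALARELAEEAGL" + "KK"
    print(motif_to_regex(NUDIX_BOX))
    print(find_motif(toy, NUDIX_BOX))
```

A hit with this pattern only indicates that the 23-residue box is present. As the description notes, substrate specificity depends on residues outside the Nudix box, so a match says nothing about which substrate a protein hydrolyses.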
Protein Domain
Name: Aspartic peptidase domain superfamily
Type: Homologous_superfamily
Description: This domain superfamily is found in aspartic peptidases, including pepsin A and other peptidase A1 family members, the yeast DNA-damage inducible protein 1 (Ddi1) (MEROPS peptidase subfamily A28A), animal retroviral-like aspartic protease (SASPase; MEROPS peptidase subfamily A28B) and Caulobacter PerP peptidase (MEROPS peptidase family A32). The tertiary structure shows a retropepsin-like fold and the peptidase is active as a homodimer. Aspartic peptidases, also known as aspartyl proteases (EC 3.4.23.-), are widely distributed proteolytic enzymes known to exist in vertebrates, fungi, plants, protozoa, bacteria, archaea, retroviruses and some plant viruses. All known aspartic peptidases are endopeptidases. A water molecule, activated by two aspartic acid residues, acts as the nucleophile in catalysis. Aspartic peptidases can be grouped into five clans, each of which shows a unique structural fold. Peptidases in clan AA are either bilobed (family A1 or the pepsin family) or are a homodimer (all other families in the clan, including retropepsin from HIV-1/AIDS). Each lobe consists of a single domain with a closed β-barrel and each lobe contributes one Asp to form the active site. Most peptidases in the clan are inhibited by the naturally occurring small-molecule inhibitor pepstatin. Clan AC contains the single family A8: the signal peptidase 2 family. Members of the family are found in all bacteria. Signal peptidase 2 processes the premurein precursor, removing the signal peptide. The peptidase has four transmembrane domains and the active site is on the periplasmic side of the cell membrane. Cleavage occurs on the amino side of a cysteine where the thiol group has been substituted by a diacylglyceryl group. Site-directed mutagenesis has identified two essential aspartic acid residues which occur in the motifs GNXXDRX and FNXAD (where X is a hydrophobic residue). No tertiary structures have been solved for any member of the family, but because of the intramembrane location, the structure is assumed not to be pepsin-like. Clan AD contains two families of transmembrane endopeptidases: A22 and A24. These are also known as "GXGD peptidases" because of a common GXGD motif which includes one of the pair of catalytic aspartic acid residues. Structures are known for members of both families and show a unique, common fold with up to nine transmembrane regions. The active site aspartic acids are located within a large cavity in the membrane into which water can gain access. Clan AE contains two families, A25 and A31. Tertiary structures have been solved for members of both families and show a common fold consisting of an α-β-α sandwich, in which the β-sheet is five-stranded. Clan AF contains the single family A26. Members of the clan are membrane proteins with a unique fold. Homologues are known only from bacteria. The structure of omptin (also known as OmpT) shows a cylindrical barrel containing ten β-strands inserted in the membrane with the active site residues on the outer surface. There are two families of aspartic peptidases for which neither structure nor active site residues are known and these are not assigned to clans. Family A5 includes thermopsin, an endopeptidase found only in thermophilic archaea. Family A36 contains sporulation factor SpoIIGA, which is known to process and activate sigma factor E, one of the transcription factors that controls sporulation in bacteria.
Protein Domain
Name: Chromo-like domain superfamily
Type: Homologous_superfamily
Description: This entry represents a chromo (CHRomatin Organization MOdifier) structural domain, which consists of an SH3-like β-barrel capped by a C-terminal helix. Chromo domains are conserved modules of around 60 amino acids that are implicated in the recognition of lysine-methylated histone tails and nucleic acids. Chromo domains were originally identified in Drosophila modifiers of variegation, proteins that alter the structure of chromatin to the condensed morphology of heterochromatin. Domains with a chromo domain structural fold include: the chromo domain, which lacks the first strand of the SH3-like β-barrel; the shadow chromo domain, in which the first strand of the SH3-like β-barrel is altered by insertions; the chromo barrel domain, which is a typical SH3-like β-barrel fold (similar sequence motif to the canonical chromo domain); and histone-like DNA-binding proteins from Archaea. Chromo domains can be found in various nuclear proteins, including heterochromatin protein 1 (HP1) (N-terminal chromo domain and C-terminal chromo shadow domain), where the chromo domain recognises histone tails with specifically methylated lysines; polycomb protein Pc, which is essential for maintaining the silencing state of homeotic genes during development (chromo domain important for chromatin targeting); histone methyltransferase clr4, which regulates silencing and switching at the mating-type loci and affects chromatin structure at centromeres; and the ATP-dependent helicase CHD1, which regulates ATP-dependent nucleosome assembly and mobilisation through conserved double chromo domains and a SWI2/SNF2 helicase/ATPase domain. Chromo barrel domains are found in various histone acetyltransferases, such as MYST1 from Mus musculus (Mouse) and MOF from Drosophila melanogaster (Fruit fly). This domain can also be found in the human mortality factor 4-like protein, MRG15.
Protein Domain
Name: Aspartic peptidase, active site
Type: Active_site
Description: Aspartic peptidases, also known as aspartyl proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. Aspartate proteases of eukaryotes are monomeric enzymes which consist of two domains. Each domain contains an active site centred on a catalytic aspartyl residue. The two domains most probably evolved from the duplication of an ancestral gene encoding a primordial domain. Currently known eukaryotic aspartyl proteases are: vertebrate gastric pepsins A and C (also known as gastricsin); vertebrate chymosin (rennin), involved in digestion and used for making cheese; vertebrate lysosomal cathepsins D (EC 3.4.23.5) and E (EC 3.4.23.34); mammalian renin (EC 3.4.23.15), whose function is to generate angiotensin I from angiotensinogen in the plasma; fungal proteases such as aspergillopepsin A (EC 3.4.23.18), candidapepsin (EC 3.4.23.24), mucoropepsin (EC 3.4.23.23) (mucor rennin), endothiapepsin (EC 3.4.23.22), polyporopepsin (EC 3.4.23.29), and rhizopuspepsin (EC 3.4.23.21); yeast saccharopepsin (EC 3.4.23.25) (proteinase A) (gene PEP4), which is implicated in posttranslational regulation of vacuolar hydrolases; yeast barrierpepsin (EC 3.4.23.35) (gene BAR1), a protease that cleaves alpha-factor and thus acts as an antagonist of the mating pheromone; and fission yeast sxa1, which is involved in degrading or processing the mating pheromones. This signature contains the active site residues which are conserved in eukaryotic and viral aspartyl proteases. Aspartic peptidases, also known as aspartyl proteases (EC 3.4.23.-), are widely distributed proteolytic enzymes known to exist in vertebrates, fungi, plants, protozoa, bacteria, archaea, retroviruses and some plant viruses. All known aspartic peptidases are endopeptidases. A water molecule, activated by two aspartic acid residues, acts as the nucleophile in catalysis. Aspartic peptidases can be grouped into five clans, each of which shows a unique structural fold. Peptidases in clan AA are either bilobed (family A1 or the pepsin family) or are a homodimer (all other families in the clan, including retropepsin from HIV-1/AIDS). Each lobe consists of a single domain with a closed β-barrel and each lobe contributes one Asp to form the active site. Most peptidases in the clan are inhibited by the naturally occurring small-molecule inhibitor pepstatin. Clan AC contains the single family A8: the signal peptidase 2 family. Members of the family are found in all bacteria. Signal peptidase 2 processes the premurein precursor, removing the signal peptide. The peptidase has four transmembrane domains and the active site is on the periplasmic side of the cell membrane. Cleavage occurs on the amino side of a cysteine where the thiol group has been substituted by a diacylglyceryl group. Site-directed mutagenesis has identified two essential aspartic acid residues which occur in the motifs GNXXDRX and FNXAD (where X is a hydrophobic residue). No tertiary structures have been solved for any member of the family, but because of the intramembrane location, the structure is assumed not to be pepsin-like. Clan AD contains two families of transmembrane endopeptidases: A22 and A24. These are also known as "GXGD peptidases" because of a common GXGD motif which includes one of the pair of catalytic aspartic acid residues. Structures are known for members of both families and show a unique, common fold with up to nine transmembrane regions. The active site aspartic acids are located within a large cavity in the membrane into which water can gain access. Clan AE contains two families, A25 and A31. Tertiary structures have been solved for members of both families and show a common fold consisting of an α-β-α sandwich, in which the β-sheet is five-stranded. Clan AF contains the single family A26. Members of the clan are membrane proteins with a unique fold. Homologues are known only from bacteria. The structure of omptin (also known as OmpT) shows a cylindrical barrel containing ten β-strands inserted in the membrane with the active site residues on the outer surface. There are two families of aspartic peptidases for which neither structure nor active site residues are known and these are not assigned to clans. Family A5 includes thermopsin, an endopeptidase found only in thermophilic archaea. Family A36 contains sporulation factor SpoIIGA, which is known to process and activate sigma factor E, one of the transcription factors that controls sporulation in bacteria.
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom