Search results 16201 to 16300 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: Heat shock chaperonin-binding
Type: Domain
Description: This entry describes a heat shock chaperonin-binding motif found in the stress-inducible phosphoprotein STI1. Both the N- and C-termini of STI1 are capable of binding heat shock proteins, and the domain is found both singly and duplicated in other proteins.
Protein Domain
Name: Phage shock protein, PspA
Type: Family
Description: Members of this family are the phage shock protein PspA, from the phage shock operon. PspA appears to maintain the proton motive force under stress conditions that include overexpression of certain phage secretins, heat shock, ethanol and protein export defects.
Protein Domain
Name: Transcriptional repressor poly-beta-hydroxybutyrate-responsive
Type: Family
Description: Members of this family are transcriptional regulatory proteins found in the vicinity of poly-beta-hydroxybutyrate (PHB) operons in several species of Bacillus. This protein appears to have repressor activity modulated by PHB itself. This protein belongs to the larger PadR family.
Protein Domain
Name: Exosortase E/protease, VPEID-CTERM system
Type: Family
Description: Members of this protein family are fusion proteins of exosortase (N-terminal) and a CAAX prenyl protease domain (C-terminal). Members are restricted to the alpha Proteobacteria. The variant C-terminal protein sequence VPEID-CTERM occurs only in these species, often adjacent.
Protein Domain
Name: Rtr1/RPAP2 domain superfamily
Type: Homologous_superfamily
Description: This superfamily includes a domain found in the human RPAP2 (RNAP II associated polypeptide) protein and in the yeast Rtr1 protein. It has been suggested that proteins in this group are regulators of core RNA polymerase II function.
Protein Domain
Name: ATP-GRASP peptide maturase, grasp-with-spasm system
Type: Family
Description: Members of this protein family are ATP-GRASP proteins that occur in a peptide maturation cassette with a SPASM domain protein. SPASM usually occurs as a C-terminal extension to radical SAM enzymes that act as peptide maturases, although it can occur independently.
Protein Domain
Name: CSLREA domain
Type: Domain
Description: This entry represents an N-terminal region, with a motif CSLREA, shared by tandem genes in Acinetobacter that both have the GlyGly-CTERM putative protein-sorting domain. Many proteins with this domain are putative outer membrane proteins (OMPs) with predicted beta strand-forming repeats.
Protein Domain
Name: Type III effector HopJ superfamily
Type: Homologous_superfamily
Description: Pathovars of Pseudomonas syringae interact with their plant hosts via the action of Hrp outer protein (Hop) effector proteins, injected into plant cells by the type III secretion system. The proteins are called HopJ after the original member HopPmaJ.
Protein Domain
Name: Cthe_2751-like superfamily
Type: Homologous_superfamily
Description: This entry includes a group of uncharacterised proteins, including Cthe_2751. The structure of the Cthe_2751 protein from Clostridium thermocellum has been solved and shows an all α-helical protein with a central hydrophobic core which provides thermal stability.
Protein Domain
Name: Bacteriophage T4, Y12G
Type: Family
Description: Proteins in this family are bacteriophage Y12G proteins. Gene Y12G encodes a 17.1 kDa protein in the Gp30-rIII intergenic region, which in T4 is a 75 amino acid basic peptide with a C terminus rich in charged amino acids.
Protein Domain
Name: Putative amidoligase enzyme
Type: Family
Description: Proteins in this family are likely to act as amidoligase enzymes. They are found in conserved gene neighbourhoods encoding a glutamine amidotransferase-like thiol peptidase (in proteobacteria) or an Aig2 family cyclotransferase protein (in firmicutes).
Protein Domain
Name: PA1123-like superfamily
Type: Homologous_superfamily
Description: This domain superfamily contains three alpha helices and six beta strands. It is found in a protein that is produced from gene PA1123 of Pseudomonas. The protein PA1123 appears to be present in the biofilm layer and may be a lipoprotein.
Protein Domain
Name: CCDC85 family
Type: Family
Description: This entry includes human CCDC85A/B/C and the C. elegans Picc-1 protein. Picc-1 serves as a linker protein which helps to recruit the Rho GTPase-activating protein, pac-1, to adherens junctions. Human CCDC85B suppresses beta-catenin activity in a p53-dependent manner.
Protein Domain
Name: Nmi/IFP 35 domain, N-terminal
Type: Domain
Description: This entry represents the N terminus of interferon-induced 35 kDa protein (IFP 35) (approximately 80 residues long), which contains a leucine zipper motif in an alpha helical configuration. This group of proteins also includes N-myc-interactor (Nmi), a homologous interferon-induced protein.
Protein Domain
Name: YehS-like
Type: Family
Description: This entry represents a family of bacterial proteins that includes the uncharacterised protein YehS from Escherichia coli, which appears to be associated with growth in the presence of n-butanol or n-hexane. This protein is predicted to have an all-α structure.
Protein Domain
Name: RNA-directed RNA polymerase L, C-terminal
Type: Domain
Description: This entry represents a common C-terminal region shared by paramyxovirus-like RNA-dependent RNA polymerases. These are often called L protein (large polymerase protein). Capping of mRNA requires RNA triphosphatase and guanylyl transferase activities, demonstrated for the rinderpest virus L protein.
Protein Domain
Name: FR47-like
Type: Domain
Description: Proteins in this entry have a conserved region similar to the C-terminal region of the Drosophila melanogaster (Fruit fly) hypothetical protein FR47. This protein has been found to consist of two N-acyltransferase-like domains swapped with the C-terminal strands.
Protein Domain
Name: PA1123-like domain
Type: Domain
Description: This domain contains three alpha helices and six beta strands. It is found in a protein that is produced from gene PA1123 of Pseudomonas. The protein PA1123 appears to be present in the biofilm layer and may be a lipoprotein.
Protein Domain
Name: LYRM2, LYR domain
Type: Domain
Description: LYRM2 is an uncharacterised LYR motif-containing protein that belongs to the Complex1_LYR-like superfamily, which consists of proteins of diverse functions that are found exclusively in eukaryotes; these proteins contain the conserved tripeptide 'LYR' close to the N terminus.
Protein Domain
Name: Family of unknown function DUF5670
Type: Family
Description: This family of proteins is found in bacteria and archaea. Proteins in this family are approximately 50 amino acids in length. There is a single completely conserved residue W that may be functionally important. These proteins contain two transmembrane helices.
Protein Domain
Name: Family of unknown function DUF5662
Type: Family
Description: This family of proteins is found in bacteria, archaea, eukaryotes and viruses. Proteins in this family are typically between 175 and 193 amino acids in length. Many proteins in this family are annotated as catalase, but this could not be verified.
Protein Domain
Name: Family of unknown function DUF5677
Type: Family
Description: This family of proteins is found in bacteria, archaea, eukaryotes and viruses. Proteins in this family are typically between 250 and 347 amino acids in length. These proteins contain a conserved RXXXE motif and an invariant histidine that may be functionally important.
Protein Domain
Name: Domain of unknown function DUF5523
Type: Domain
Description: This entry represents a domain of unknown function found in eukaryotes. Many (but not all) proteins matched by this entry, such as the human protein Coiled-coil and C2 domain-containing protein 2A, also contain additional domains in the C-terminal region.
Protein Domain
Name: Oligopeptide transporter
Type: Family
Description: This entry represents a subfamily of OPT proteins that are involved in oligopeptide transport. Their transport activity is proton dependent. These proteins may constitute a major route for the absorption of the end products of protein digestion.
Protein Domain
Name: Phage tail tube protein, lambda-like
Type: Family
Description: This family represents the phage tail tube protein from a set of Siphoviridae from Gammaproteobacteria. Tail tube proteins polymerise with the assistance of the tail-tip complex, a tape measure protein and two chaperones. Infectivity of host is delivered through the tube.
Protein Domain
Name: Bacteriophage HP1, Orf23
Type: Family
Description: This entry is represented by Bacteriophage HP1, Orf23. The characteristics of the protein distribution suggest prophage matches in addition to the phage matches. This family of proteins is a component of a contractile injection system related to phage sheath proteins.
Protein Domain
Name: YebC-like
Type: Homologous_superfamily
Description: This domain is found in uncharacterised proteins, including E. coli protein YebC, a probable transcriptional regulator, and Aq1575 from Aquifex aeolicus. This protein reveals a monomer consisting of three domains arranged along a pseudo threefold symmetry axis.
Protein Domain
Name: Plant EC metallothionein-like protein, family 15
Type: Family
Description: Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, nickel, etc. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds. An empirical classification into three classes has been proposed by Fowler and coworkers and Kojima. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units. This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Consequently, all class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically-related and thus alignable sequences. This system subdivides the MT superfamily into families, subfamilies, subgroups, and isolated isoforms and alleles. The metallothionein superfamily comprises all polypeptides that resemble equine renal metallothionein in several respects, e.g. low molecular weight; high metal content; amino acid composition with high Cys and low aromatic residue content; unique sequence with characteristic distribution of cysteines; and spectroscopic manifestations indicative of metal thiolate clusters. An MT family subsumes MTs that share particular sequence-specific features and are thought to be evolutionarily related. The inclusion of an MT within a family presupposes that its amino acid sequence is alignable with that of all members.
Fifteen MT families have been characterised, each family being identified by its number and its taxonomic range: e.g., Family 1: vertebrate MTs.
Family 15 consists of planta MTs. Its members are recognised by the sequence pattern [YFH]-x(5,25)-C-[SKD]-C-[GA]-[SDPAT]-x(0,1)-C-x-[CYF], which yields all plant sequences, but also MTCU_HELPO and the non-MT ITB3_HUMAN. The taxonomic range of the members extends to planta. Planta MTs are 45-84 residue proteins, containing 17 conserved cysteines that bind 5 zinc ions. Generally, there are two Cys-rich regions (domain 1 and domain 3) separated by a Cys-poor region (domain 2), and only domain 2 contains unusual residues. It is believed that the proteins may have a role in Zn2+ homeostasis during embryogenesis. Family 15 includes the following subfamilies: p1, p2, p2v, p3, pec, p21.
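The Family 15 sequence pattern above is written in PROSITE-style notation. As a rough illustration of how such a pattern can be applied to a sequence, the sketch below translates it into a Python regular expression. The helper name `prosite_to_regex` and the toy sequence are hypothetical and are not part of the database entry; only a subset of the notation (`x`, `[...]`, `{...}` and repeat counts) is handled.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex (hypothetical helper)."""
    pieces = []
    for element in pattern.split("-"):
        # Split an element such as "x(5,25)" into its core ("x") and repeat bounds.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                      # x = any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # {ABC} = any residue except A, B or C
        # "[ABC]" and single letters are already valid regex syntax.
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return "".join(pieces)

# Pattern quoted in the Family 15 description.
FAMILY_15 = "[YFH]-x(5,25)-C-[SKD]-C-[GA]-[SDPAT]-x(0,1)-C-x-[CYF]"
regex = prosite_to_regex(FAMILY_15)

# Made-up toy sequence for illustration only; it is not a real metallothionein.
toy_sequence = "YAAAAACSCGSCAC"
hit = re.search(regex, toy_sequence)
print(regex)
print("match" if hit else "no match")
```

A regex match only says that a sequence is compatible with the pattern; as the description notes, the pattern also picks up MTCU_HELPO and the non-MT ITB3_HUMAN.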
Protein Domain
Name: Transcription regulator LuxR, C-terminal
Type: Domain
Description: This domain is a DNA-binding, helix-turn-helix (HTH) domain of about 65 amino acids, present in transcription regulators of the LuxR/FixJ family of response regulators. The domain is named after Vibrio fischeri luxR, a transcriptional activator for quorum-sensing control of luminescence. LuxR-type HTH domain proteins occur in a variety of organisms. The DNA-binding HTH domain is usually located in the C-terminal region; the N-terminal region often contains an autoinducer-binding domain or a response regulatory domain. Most luxR-type regulators act as transcription activators, but some can be repressors or have a dual role for different sites. LuxR-type HTH regulators control a wide variety of activities in various biological processes.
The luxR-type, DNA-binding HTH domain forms a four-helical bundle structure. The HTH motif comprises the second and third helices, known as the scaffold and recognition helix, respectively. The HTH binds DNA in the major groove, where the N-terminal part of the recognition helix makes most of the DNA contacts. The fourth helix is involved in dimerisation of gerE and traR. Signalling events by one of the four activation mechanisms described below lead to multimerisation of the regulator. The regulators bind DNA as multimers.
LuxR-type HTH proteins can be activated by one of four different mechanisms:
1) Regulators which belong to a two-component sensory transduction system where the protein is activated by its phosphorylation, generally on an aspartate residue, by a transmembrane kinase. Some proteins that belong to this category are:
  • Rhizobiaceae fixJ (global regulator inducing expression of nitrogen-fixation genes in microaerobiosis)
  • Escherichia coli and Salmonella typhimurium uhpA (activates hexose phosphate transport gene uhpT)
  • E. coli narL and narP (activate nitrate reductase operon)
  • Enterobacteria rcsB (regulation of exopolysaccharide biosynthesis in enteric and plant pathogenesis)
  • Bordetella pertussis bvgA (virulence factor)
  • Bacillus subtilis comA (involved in expression of late-expressing competence genes)
2) Regulators which are activated, or in very rare cases repressed, when bound to N-acyl homoserine lactones, which are used as quorum sensing molecules in a variety of Gram-negative bacteria:
  • V. fischeri luxR (activates bioluminescence operon)
  • Agrobacterium tumefaciens traR (regulation of Ti plasmid transfer)
  • Erwinia carotovora carR (control of carbapenem antibiotics biosynthesis)
  • E. carotovora expR (virulence factor for soft rot disease; activates plant tissue macerating enzyme genes)
  • Pseudomonas aeruginosa lasR (activates elastase gene lasB)
  • Erwinia chrysanthemi echR and Erwinia stewartii esaR
  • Pseudomonas chlororaphis phzR (positive regulator of phenazine antibiotic production)
  • Pseudomonas aeruginosa rhlR (activates rhlAB operon and lasB gene)
3) Autonomous effector domain regulators, without a regulatory domain, represented by gerE.
4) Multiple ligand-binding regulators, exemplified by malT.
Protein Domain
Name: 3-isopropylmalate dehydratase, small subunit
Type: Family
Description: 3-isopropylmalate dehydratase (or isopropylmalate isomerase) catalyses the stereo-specific isomerisation of 2-isopropylmalate and 3-isopropylmalate, via the formation of 2-isopropylmaleate. This enzyme performs the second step in the biosynthesis of leucine, and is present in most prokaryotes and many fungal species. The prokaryotic enzyme is a heterodimer composed of a large (LeuC) and small (LeuD) subunit, while the fungal form is a monomeric enzyme. Both forms of the enzyme are related and are part of the larger aconitase family. Aconitases are mostly monomeric proteins which share four domains in common and contain a single, labile [4Fe-4S] cluster. Three structural domains (1, 2 and 3) are tightly packed around the iron-sulphur cluster, while a fourth domain (4) forms a deep active-site cleft. The prokaryotic enzyme is encoded by two adjacent genes, leuC and leuD, corresponding to aconitase domains 1-3 and 4 respectively. LeuD does not bind an iron-sulphur cluster. It is thought that some prokaryotic isopropylmalate dehydratases can also function as homoaconitase, converting cis-homoaconitate to homoisocitric acid in lysine biosynthesis. Homoaconitase has been identified in higher fungi (mitochondria), several archaea and one thermophilic species of bacteria, Thermus thermophilus. It is also found in the higher plant Arabidopsis thaliana, where it is targeted to the chloroplast.
This entry represents a region of the small subunit. The structure of the Pyrococcus horikoshii small subunit has recently been determined. As expected, the structure of this polypeptide is similar to that of aconitase domain 4, though one alpha helix is replaced by a short loop with relatively high temperature factor values. This loop region is thought to be important for substrate recognition. Unlike other aconitase family proteins, this subunit formed a tetramer through disulphide linkages, though this is not expected to interfere with its interaction with the large subunit. These disulphide linkages would be expected to confer thermostability on the enzyme, reflecting the thermophilic lifestyle of the organism.
Homoaconitase, aconitase, and 3-isopropylmalate dehydratase have similar overall structures. All are dehydratases and bind a [4Fe-4S] cluster. 3-isopropylmalate dehydratase is split into large (leuC) and small (leuD) chains in eubacteria. Several pairs of archaeal proteins resemble the leuC and leuD pair in length and sequence but even more closely resemble the respective domains of homoaconitase, and their identity is uncertain. The archaeal leuD-like proteins are not included in this group.
Protein Domain
Name: Mediator complex subunit Med13, C-terminal
Type: Domain
Description: This entry represents the C-terminal domain of Med13. This domain is also identified as an RNaseH domain of the medPIWI PIWI/Argonaute module. medPIWI is the core domain found in the Med13 protein. The medPIWI module in Med13 is predicted to bind double-stranded nucleic acids, triggering the experimentally-observed conformational switch in the CDK8 subcomplex which regulates the Mediator complex. Med13 is a component of the SRB8-11 complex. The SRB8-11 complex is a regulatory module of the Mediator complex, which may be involved in the transcriptional repression of a subset of genes regulated by Mediator. It acts by inhibiting the association of the Mediator complex with RNA polymerase II to form the holoenzyme complex.
The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11.
The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation. The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP.
Protein Domain
Name: Vasopressin receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs.
The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
Vasopressin and oxytocin are members of the neurohypophyseal hormone family found in all mammalian species. They are present in high levels in the posterior pituitary. Vasopressin has an essential role in the control of the water content of the body, acting in the kidney to increase water and sodium absorption. In higher concentrations, vasopressin stimulates contraction of vascular smooth muscle, stimulates glycogen breakdown in the liver, induces platelet activation, and evokes release of corticotrophin from the anterior pituitary. Vasopressin and its analogues are used clinically to treat diabetes insipidus. Oxytocin stimulates contraction of uterine smooth muscle, and stimulates milk secretion in response to suckling by inducing contraction of myoepithelial cells in the mammary gland. Clinically, it is used to induce labour and promote lactation.
Protein Domain
Name: Annexin A6
Type: Family
Description: The annexins (or lipocortins) are a family of proteins that bind to phospholipids in a calcium-dependent manner. The 12 annexins common to vertebrates are classified in the annexin A family and named as annexins A1-A13 (or ANXA1-ANXA13), leaving A12 unassigned in the official nomenclature. Annexins outside vertebrates are classified into families B (in invertebrates), C (in fungi and some groups of unicellular eukaryotes), D (in plants), and E (in protists). Annexins are absent from yeasts and prokaryotes.
Most eukaryotic species have 1-20 annexin (ANX) genes. All annexins share a core domain made up of four similar repeats, each approximately 70 amino acids long. Each annexin repeat (sometimes referred to as an endonexin fold) is folded into five α-helices, which in turn are wound into a right-handed super-helix; the repeats usually contain a characteristic 'type 2' motif for binding calcium ions with the sequence 'GxGT-[38 residues]-D/E'. Animal and fungal annexins also have variable amino-terminal domains. The core domains of most vertebrate annexins have been analysed by X-ray crystallography, revealing conservation of their secondary and tertiary structures despite only 45-55% amino-acid identity among individual members. The four repeats pack into a structure that resembles a flattened disc, with a slightly convex surface on which the Ca2+-binding loops are located and a concave surface at which the amino and carboxyl termini come into close apposition. Annexins are traditionally thought of as calcium-dependent phospholipid-binding proteins, but recent work suggests a more complex set of functions. The family has been linked with inhibition of phospholipase activity, exocytosis and endocytosis, signal transduction, organisation of the extracellular matrix, resistance to reactive oxygen species and DNA replication.
This entry represents type VI annexin, which is found in various secretory cells, e.g. B- and T-cells (where it is found in greater concentrations in mature cells), and the lactation ducts of non-lactating human breasts. The observation that the protein is absent in lactating breasts suggests that it inhibits secretion. The type VI class may also play a part in the regulation of some calcium channels, and its presence may cause arrest of cell growth, before the DNA-replication stage, in cells growing at low serum concentrations. This annexin class is unusual in containing eight repeats of the conserved domain rather than the usual four. It is thus believed that the protein has arisen from a gene duplication event.
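The 'type 2' calcium-binding motif quoted in this description, 'GxGT-[38 residues]-D/E', can be scanned for with a regular expression. The sketch below reads the motif literally (exactly 38 intervening residues), which is an assumption; the function name `find_type2_sites` and the toy sequence are hypothetical and do not come from the database.

```python
import re

# 'GxGT-[38 residues]-D/E' read literally:
# G, any residue, G, T, exactly 38 residues of any kind, then D or E.
TYPE2_MOTIF = re.compile(r"G.GT.{38}[DE]")

def find_type2_sites(sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping literal matches."""
    return [m.start() for m in TYPE2_MOTIF.finditer(sequence)]

# Made-up toy sequence for illustration only; it is not a real annexin repeat.
toy_sequence = "MK" + "GAGT" + "A" * 38 + "E" + "LL"
print(find_type2_sites(toy_sequence))
```

A match of this kind is only a text-level hint; it says nothing about whether a site actually binds calcium.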
Protein Domain
Name: Cholecystokinin receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs.
The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
Cholecystokinins (CCKs) and gastrins are naturally-occurring peptides that share a common C-terminal sequence, GWMDF; full biological activity resides in this region. In the periphery, the principal physiological actions of CCK include gall bladder contraction, pancreatic enzyme secretion and regulation of secretion/absorption in the gastrointestinal tract. In the CNS, CCK induces analgesia, satiety and a decrease in exploratory behaviour. In mesolimbic and mesocortical neurons, CCK coexists with dopamine. It is found throughout the digestive tract, with high concentrations in the duodenum and jejunum. It is also found in peripheral nerves to other smooth muscles and to secretory glands, and is one of the most abundant peptides in the brain. The principal physiological role of gastrin is to stimulate acid secretion in the stomach; it also has trophic effects on gastric mucosa. It is found predominantly in the stomach and intestine, but also in vagal nerves.
Protein Domain
Name: Adenosine receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs.
The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
In addition to their role in energy metabolism, purines (especially adenosine and adenine nucleotides) produce a wide range of pharmacological effects mediated by activation of cell surface receptors. Distinct receptors exist for adenosine. In the periphery, the main effects of adenosine include vasodilation, bronchoconstriction, immunosuppression, inhibition of platelet aggregation, cardiac depression, stimulation of nociceptive afferents, inhibition of neurotransmitter release and inhibition of the release of other factors, e.g. hormones. In the CNS, adenosine exerts a pre- and post-synaptic depressant action, reducing motor activity, depressing respiration, inducing sleep and relieving anxiety. The physiological role of adenosine is thought to be to adjust energy demands in line with oxygen supply. Many of the clinical actions of methylxanthines are thought to be mediated through antagonism of adenosine receptors. Four subtypes of receptor have been identified, designated A1, A2A, A2B and A3.
Protein Domain
Name: Melanocortin/ACTH receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
Adrenocorticotrophin (ACTH), melanocyte-stimulating hormones (MSH) and beta-endorphin are peptide products of pituitary pro-opiomelanocortin. ACTH regulates synthesis and release of glucocorticoids and aldosterone in the adrenal cortex; it also has a trophic action on these cells. ACTH and beta-endorphin are synthesised and released in response to corticotrophin-releasing factor at times of stress (heat, cold, infections, etc.); their release leads to increased metabolism and analgesia. MSH has a trophic action on melanocytes, and regulates pigment production in fish and amphibia. The ACTH receptor is found in high levels in the adrenal cortex; binding sites are present in lower levels in the CNS. The MSH receptor is expressed in high levels in melanocytes, melanomas and their derived cell lines. Receptors are found in low levels in the CNS. MSH regulates temperature control in the septal region of the brain and releases prolactin from the pituitary.
Protein Domain
Name: Retinal pigment epithelium GPCR
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
Retinal pigment epithelium (RPE) hosts a putative GPCR. The RPE-retinal GPCR (RGR) covalently binds all-trans- and 11-cis-retinal after reduction by sodium borohydride. All-trans-retinal is bound preferentially over the 11-cis isomer. The human sequence is 86% identical to that of bovine RGR, and a lysine residue, analogous to the retinaldehyde attachment site of rhodopsin, is conserved in TM domain 7. The human gene, whose structure is distinct from that of the visual pigment genes, spans 14.8 kb and is split into 7 exons. This suggests that the rgr gene represents the earliest independent branch of the vertebrate opsin gene family. Since the RGR gene product preferentially binds all-trans-retinal, it is thought that one of its functions may be to catalyse isomerisation of the chromophore by a retinochrome-like mechanism.
Protein Domain
Name: G protein-coupled receptor 37 orphan
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
Several 7TM receptors have been cloned but their endogenous ligands are unknown; these have been termed orphan receptors. GPR37 was isolated from a set of human brain frontal lobe expressed sequence tags. The GPR37 genomic sequence was subsequently mapped to chromosome 7. A putative orthologue, 83% identical to the human form in terms of predicted amino acid sequence, has since been identified in the mouse genome and mapped to chromosome 6. Northern blot analyses revealed a highly expressed 3.8 kb mRNA and a less abundant 8 kb mRNA in both human and mouse brain. The 3.8 kb mRNA was also less abundantly expressed in human liver and placenta, and a further 3 kb mRNA was found in mouse testis.
Protein Domain
Name: P2Y13 purinoceptor
Type: Family
Description: There are three distinct families of extracellular receptors for purine and pyrimidine nucleotides, known as P1, P2X and P2Y purinoceptors. These receptors induce a wide variety of biological effects and are involved in many different cellular functions. P2X receptors are ligand-gated ion channels, whereas P1 and P2Y receptors are rhodopsin-like G protein-coupled receptors. The families also differ in their method of activation: P1 receptors are preferentially activated by adenosine and P2X receptors by ATP, whereas the P2Y receptors, in addition to being activated by ATP, are activated by different adenine and/or uridine nucleoside di- and triphosphates (ADP, UDP, UTP and UDP-glucose).
The P2Y purinoceptors currently consist of eleven subtypes: P2Y1, P2Y2, P2Y3, P2Y4, P2Y6, P2Y8, P2Y10, P2Y11, P2Y12, P2Y13 and P2Y14. P2Y3 has, as yet, only been found in birds, whilst the rest have been cloned in humans. The gaps in P2Y receptor numbering are due to the reclassification of some receptors that were initially assigned to the P2Y family. These include P2Y5 (now known as lysophosphatidic acid receptor 6), P2Y7 (now leukotriene B4 receptor) and P2Y9 (lysophosphatidic acid receptor 4). P2Y purinoceptor subtypes have different pharmacological selectivities, which overlap in some cases, for various adenosine and uridine nucleotides. They are widely expressed and are involved in platelet aggregation, vasodilation and neuromodulation, and a range of other processes, such as ion flux, differentiation, and synaptic communication. They exert their varied biological functions based on different G-protein coupling. Each receptor subtype can couple to multiple G proteins, either Gi, Gq/11 or Gs, triggering the activation of diverse intracellular signalling cascades (stimulation of phospholipase C through Gq/11, stimulation of adenylyl cyclase via Gs, or inhibition of adenylyl cyclase via Gi).
This entry represents the P2Y13 receptor (previously known as SP174 and GPR86), which is primarily coupled to Gi/o proteins. ADP is the natural agonist of the P2Y13 receptor; upon activation and coupling, the receptor inhibits adenylate cyclase. The P2Y13 receptor is expressed at highest levels in the brain and a number of immune tissues, particularly the spleen, and is also found in the placenta, liver, bone marrow and lung. It is thought to play a role in hematopoiesis and the immune system.
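The numbering gaps mentioned in this description can be checked with a small Python sketch. The subtype numbers and reclassified receptors are taken from the text of this entry; the names `P2Y_SUBTYPES`, `RECLASSIFIED` and `numbering_gaps` are hypothetical and chosen only for illustration.

```python
# Subtype numbers and reclassified receptors as listed in the description.
# All names here are illustrative; they do not come from any library.
P2Y_SUBTYPES = [1, 2, 3, 4, 6, 8, 10, 11, 12, 13, 14]
RECLASSIFIED = {
    5: "lysophosphatidic acid receptor 6",
    7: "leukotriene B4 receptor",
    9: "lysophosphatidic acid receptor 4",
}

def numbering_gaps(subtypes):
    """Return the numbers missing between the lowest and highest subtype."""
    present = set(subtypes)
    return [n for n in range(min(subtypes), max(subtypes) + 1) if n not in present]

gaps = numbering_gaps(P2Y_SUBTYPES)
for n in gaps:
    # Report what each missing number was reclassified as, if the text says so.
    print(f"P2Y{n}: {RECLASSIFIED.get(n, 'not listed as reclassified')}")
```

Under these assumptions, every gap in the numbering corresponds to one of the reclassified receptors named in the description.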
Protein Domain
Name: Vasopressin V1B receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
Vasopressin and oxytocin are members of the neurohypophyseal hormone family found in all mammalian species. They are present in high levels in the posterior pituitary. Vasopressin has an essential role in the control of the water content of the body, acting in the kidney to increase water and sodium absorption. In higher concentrations, vasopressin stimulates contraction of vascular smooth muscle, stimulates glycogen breakdown in the liver, induces platelet activation, and evokes release of corticotrophin from the anterior pituitary. Vasopressin and its analogues are used clinically to treat diabetes insipidus. In the periphery, the V1A receptor is found in high levels in vascular smooth muscle, myometrium and the bladder, where it mediates contraction. V1B receptors can be distinguished from V1A receptors by the low affinity of certain antagonists at the former. The receptors stimulate phosphoinositide metabolism and are found in the anterior pituitary.
Protein Domain
Name: Phosphotransferase system, glucitol/sorbitol-specific IIA component superfamily
Type: Homologous_superfamily
Description: The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates coupled with their translocation across the cell membrane, which makes the PTS a link between the uptake and metabolism of sugars.
The general mechanism of the PTS is the following: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred, via a signal transduction pathway, to enzyme I (EI), which in turn transfers it to a phosphoryl carrier, the histidine protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease, a membrane-bound complex known as enzyme II (EII), which transports the sugar into the cell. EII consists of at least three structurally distinct domains: IIA, IIB and IIC. These can either be fused together in a single polypeptide chain or exist as two or three interactive chains, formerly called enzymes II (EII) and III (EIII). The first domain (IIA or EIIA) carries the first permease-specific phosphorylation site, a histidine which is phosphorylated by phospho-HPr. The second domain (IIB or EIIB) is phosphorylated by phospho-IIA on a cysteinyl or histidyl residue, depending on the sugar transported. Finally, the phosphoryl group is transferred from the IIB domain to the sugar substrate concomitantly with the sugar uptake processed by the IIC domain. This third domain (IIC or EIIC) forms the translocation channel and the specific substrate-binding site. An additional transmembrane domain IID, homologous to IIC, can be found in some PTSs, e.g. for mannose.
The Man family is unique in several respects among PTS permease families:
  • It is the only PTS family in which members possess a IID protein.
  • It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteinyl residue.
  • Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars.
This entry consists only of glucitol-specific transporters, which occur in both Gram-negative and Gram-positive bacteria. The system in Escherichia coli consists of a IIA protein and a IIBC protein. This superfamily represents specifically the IIA component. The structure of this component is a closed barrel fold with a mixed sheet; it has two overside connections and consists of two intertwined structural repeats.
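The phosphoryl relay described in this entry can be traced with a minimal Python sketch. The order of carriers follows the description (PEP, EI, HPr, IIA, IIB, sugar); the names `PTS_RELAY`, `relay_path` and `transfer_steps` are hypothetical and serve only as an illustration.

```python
# Order of phosphoryl carriers in the PTS, as given in the description.
# PTS_RELAY, relay_path and transfer_steps are illustrative names only.
PTS_RELAY = ["PEP", "EI", "HPr", "IIA", "IIB", "sugar"]

def relay_path(start, end):
    """Return the carriers a phosphoryl group passes through from start to end."""
    i, j = PTS_RELAY.index(start), PTS_RELAY.index(end)
    if i > j:
        raise ValueError("the relay only runs from PEP towards the sugar")
    return PTS_RELAY[i:j + 1]

def transfer_steps(path):
    """Pair each phosphoryl donor with its acceptor."""
    return list(zip(path, path[1:]))

# Steps from phospho-HPr down to the sugar substrate.
for donor, acceptor in transfer_steps(relay_path("HPr", "sugar")):
    print(f"{donor} -> {acceptor}")
```

The sketch models only the order of transfers; it says nothing about the residues phosphorylated or about the IIC translocation step.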
Protein Domain
Name: Mediator complex, subunit Med18
Type: Family
Description: The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11. The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation. The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. 
Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP.
Med18 is one subunit of the Mediator complex and a component of the head module that is involved in stimulating basal RNA polymerase II (PolII) transcription. Med18 consists of an eight-stranded β-barrel with a central pore and three flanking helices. It complexes with Med8 and Med20 proteins by forming a heterodimer of two-fold symmetry with Med20 and binding the C-terminal α-helix region of Med8 across the top of its barrel. This complex creates a multipartite TBP-binding site that can be modulated by transcriptional activators.
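The module membership listed in this description can be written as a small lookup table. The following Python sketch uses only the MED-style names from the text (the yeast aliases such as SRB5 are dropped for brevity); `MEDIATOR_MODULES` and `module_of` are hypothetical names used for illustration.

```python
# Module membership as listed in the description, using MED-style names only.
# MEDIATOR_MODULES and module_of are illustrative names, not a real API.
MEDIATOR_MODULES = {
    "head": ["MED6", "MED8", "MED11", "MED17", "MED18", "MED19", "MED20", "MED22"],
    "middle": ["MED1", "MED4", "MED5", "MED7", "MED9", "MED10", "MED21", "MED31"],
    "tail": ["MED2", "MED3", "MED14", "MED15", "MED16"],
    "CDK8": ["MED12", "MED13", "CCNC", "CDK8"],
}

def module_of(subunit):
    """Return the module a subunit is listed under, or None if the text assigns none."""
    for module, members in MEDIATOR_MODULES.items():
        if subunit in members:
            return module
    return None

print(module_of("MED18"))
```

Subunits that the description names without assigning to a module (for example MED23) return `None` in this sketch.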
Protein Domain
Name: 7TM GPCR, serpentine receptor class bc (Srbc)
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs.
The nematode Caenorhabditis elegans has only 14 types of chemosensory neuron, yet is able to sense and respond to several hundred different chemicals because each neuron detects several stimuli. Chemoperception is one of the central senses of soil nematodes like C. elegans, which are otherwise 'blind' and 'deaf'. Chemoreception in C. elegans is mediated by members of the seven-transmembrane G-protein-coupled receptor class (7TM GPCRs). More than 1300 potential chemoreceptor genes have been identified in C. elegans, which are generally prefixed sr for serpentine receptor. The receptor superfamilies include Sra (Sra, Srb, Srab, Sre), Str (Srh, Str, Sri, Srd, Srj, Srm, Srn) and Srg (Srx, Srt, Srg, Sru, Srv, Srxa), as well as the families Srw, Srz, Srbc, Srsx and Srr. Many of these proteins have homologues in Caenorhabditis briggsae.
This entry represents serpentine receptor class b (Srb) from the Sra superfamily. Srb receptors contain 6-8 hydrophobic, putative transmembrane regions and can be distinguished from other 7TM GPCR receptors by their own characteristic TM signatures. Srbc is a solo family amongst the superfamilies of chemoreceptors.
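The grouping of serpentine receptor families given in this description can be expressed as a simple lookup. This Python sketch copies the groupings from the text; `SUPERFAMILIES`, `SOLO_FAMILIES` and `classify` are hypothetical names used only for illustration.

```python
# Superfamily groupings as listed in the description.
# SUPERFAMILIES, SOLO_FAMILIES and classify are illustrative names only.
SUPERFAMILIES = {
    "Sra": ["Sra", "Srb", "Srab", "Sre"],
    "Str": ["Srh", "Str", "Sri", "Srd", "Srj", "Srm", "Srn"],
    "Srg": ["Srx", "Srt", "Srg", "Sru", "Srv", "Srxa"],
}
SOLO_FAMILIES = ["Srw", "Srz", "Srbc", "Srsx", "Srr"]

def classify(family):
    """Return the superfamily of a family, or 'solo' if it stands alone."""
    for superfamily, members in SUPERFAMILIES.items():
        if family in members:
            return superfamily
    if family in SOLO_FAMILIES:
        return "solo"
    raise KeyError(f"{family} is not named in the description")

for name in ("Srbc", "Srab", "Srj"):
    print(name, "->", classify(name))
```

Families not named in the description raise a `KeyError` rather than being guessed at.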
Protein Domain
Name: 7TM GPCR, serpentine receptor class r (Str)
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs.
The nematode Caenorhabditis elegans has only 14 types of chemosensory neuron, yet is able to sense and respond to several hundred different chemicals because each neuron detects several stimuli. Chemoperception is one of the central senses of soil nematodes like C. elegans, which are otherwise 'blind' and 'deaf'. Chemoreception in C. elegans is mediated by members of the seven-transmembrane G-protein-coupled receptor class (7TM GPCRs). More than 1300 potential chemoreceptor genes have been identified in C. elegans, which are generally prefixed sr for serpentine receptor. The receptor superfamilies include Sra (Sra, Srb, Srab, Sre), Str (Srh, Str, Sri, Srd, Srj, Srm, Srn) and Srg (Srx, Srt, Srg, Sru, Srv, Srxa), as well as the families Srw, Srz, Srbc, Srsx and Srr. Many of these proteins have homologues in Caenorhabditis briggsae.
This entry represents serpentine receptor class r (Str) from the Str superfamily. Almost a quarter (22.5%) of str and srj family genes and pseudogenes in C. elegans appear to have been newly formed by gene duplications since the species split.
Protein Domain
Name: 7TM GPCR, serpentine receptor class ab (Srab)
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs.
The nematode Caenorhabditis elegans has only 14 types of chemosensory neuron, yet is able to sense and respond to several hundred different chemicals because each neuron detects several stimuli. Chemoperception is one of the central senses of soil nematodes like C. elegans, which are otherwise 'blind' and 'deaf'. Chemoreception in C. elegans is mediated by members of the seven-transmembrane G-protein-coupled receptor class (7TM GPCRs). More than 1300 potential chemoreceptor genes have been identified in C. elegans, which are generally prefixed sr for serpentine receptor. The receptor superfamilies include Sra (Sra, Srb, Srab, Sre), Str (Srh, Str, Sri, Srd, Srj, Srm, Srn) and Srg (Srx, Srt, Srg, Sru, Srv, Srxa), as well as the families Srw, Srz, Srbc, Srsx and Srr. Many of these proteins have homologues in Caenorhabditis briggsae.
Srab is part of the Sra superfamily of chemoreceptors. The expression pattern of the srab genes is biologically intriguing. Of the six promoters successfully expressed in transgenic organisms, one was exclusively expressed in the tail phasmid neurons, two were exclusively expressed in a head amphid neuron, and two were expressed both in the head and tail neurons as well as a limited number of other cells.
Protein Domain
Name: Sodium bicarbonate cotransporter
Type: Family
Description: Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes. They share ~25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains.
Na+/HCO3- co-transport proteins are involved in cellular HCO3- absorption and secretion, and also in intracellular pH regulation. They mediate the coupled movement of Na+ and HCO3- across plasma membranes in most of the cell types so far investigated. A single HCO3- is transported together with one to three Na+; this transport mode is therefore often electrogenic. In the kidney, an electrogenic Na+/HCO3- co-transporter is the principal HCO3- transporter of the renal proximal tubule, and is responsible for reabsorption of more than 85% of the filtered load of HCO3-. Until recently, the molecular nature of these Na+/HCO3- co-transporters had remained undiscovered, as initial attempts to clone them based on presumed homology to Cl-/HCO3- (anion) exchangers had proved unsuccessful. Instead, an expression cloning strategy was successfully utilised to identify the Na+/HCO3- co-transporter from salamander kidney, an organ previously found to possess electrogenic Na+/HCO3- co-transport activity. At least 3 mammalian Na+/HCO3- co-transporters have since been cloned, with similar primary sequence lengths and putative membrane topologies. One of these has been found to be a kidney-specific isoform, which is near-identical (except for a varying N-terminal region) to a more widely-distributed co-transporter cloned from pancreatic tissue.
Protein Domain
Name: Vacuolating cytotoxin
Type: Family
Description: Helicobacter pylori is a micro-aerophilic bacterium with the extraordinary ability to establish infections in human stomachs that can last for years or decades, despite immune and inflammatory responses and normal turnover of the gastric epithelium and overlying mucin layer in which it resides. Most H. pylori strains secrete a toxin (VacA) that induces multiple structural and functional alterations in eukaryotic cells. The most prominent effect of VacA is its capacity to induce the formation of large cytoplasmic vacuoles in eukaryotic cells. In addition, VacA interferes with the process of antigen presentation, increases permeability of polarised epithelial cell monolayers, and forms anion-selective membrane channels. Formation of channels in endosomal membranes of cells may be an important feature of the mechanism by which VacA induces cell vacuolation. H. pylori vacA encodes a ~139 kDa protoxin, which undergoes cleavage of a 33-residue N-terminal signal sequence and C-terminal proteolytic processing to yield a mature secreted toxin. Purified VacA degrades during prolonged storage into two fragments (of ~34 and 58 kDa), which are derived from the N- and the C-terminus of the toxin respectively. The mass of the experimentally intact toxin (~88.2 kDa) corresponds closely to the sum of the masses of the two proteolytic fragments.
Secondary structure predictions suggest that a 35 kDa portion of the VacA C-terminal domain is rich in amphipathic β-sheets, and this region exhibits low-level similarity to members of the family of autotransporter proteins. In addition, at the C terminus of VacA, there is a phenylalanine-containing motif that is commonly found in autotransporter proteins, as well as in numerous Gram-negative bacterial outer membrane proteins. An intact N-terminal portion of VacA is not required for proteolytic processing of the protoxin. However, the N-terminal 32 amino acids of the mature VacA are predicted to form the only contiguous hydrophobic region in the protein that is long enough to span the membrane. What is more, isogenic H. pylori mutant strains in which the C-terminal VacA domain is disrupted fail to express or secrete any detectable VacA, which is probably attributable to the degradation of export-incompetent toxin precursors within the periplasm. It is speculated that the VacA protoxin may undergo proteolytic cleavage at multiple sites downstream from amino acid 854 of the protoxin, which would yield a 33 kDa cell-associated domain, as well as a fragment of ~15 kDa.
Protein Domain
Name: X opioid receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices.
The term opioid refers to a class of substance that produces its effects via the major classes of opioid receptor, termed mu, delta and kappa. The receptors are found in the CNS and certain smooth muscles: mu-opioid receptors are believed to mediate analgesia, hypothermia, respiratory depression, miosis, bradycardia, nausea, euphoria and physical dependence, beta-endorphin being the most potent endogenous ligand; delta-receptors mediate analgesia; and kappa-opioid receptors are believed to mediate analgesia, sedation, miosis and diuresis, dynorphin being the most potent endogenous ligand. The X-receptor is closely related to opioid receptors, on grounds of both sequence and function, although it is not a typical opioid receptor. X-receptors are found in many regions of the brain and spinal cord, particularly limbic and hypothalamic structures. They are believed to represent a new class of opioid receptors, with a potential role in modulating various brain functions, including instinctive behaviours and emotions.
Protein Domain
Name: CACNB2, SH3 domain
Type: Domain
Description: Ca2+ ions are unique in that they not only carry charge but they are also the most widely used of diffusible second messengers. Voltage-dependent Ca2+ channels (VDCC) are a family of molecules that allow cells to couple electrical activity to intracellular Ca2+ signalling. The opening and closing of these channels by depolarizing stimuli, such as action potentials, allows Ca2+ ions to enter neurons down a steep electrochemical gradient, producing transient intracellular Ca2+ signals. Many of the processes that occur in neurons, including transmitter release, gene transcription and metabolism are controlled by Ca2+ influx occurring simultaneously at different cellular locales. The pore is formed by the alpha-1 subunit which incorporates the conduction pore, the voltage sensor and gating apparatus, and the known sites of channel regulation by second messengers, drugs, and toxins [ ]. The activity of this pore is modulated by four tightly-coupled subunits: an intracellular beta subunit; a transmembrane gamma subunit; and a disulphide-linked complex of alpha-2 and delta subunits, which are proteolytically cleaved from the same gene product. Properties of the protein including gating voltage-dependence, G protein modulation and kinase susceptibility can be influenced by these subunits. Voltage-gated calcium channels are classified as T, L, N, P, Q and R, and are distinguished by their sensitivity to pharmacological blocks, single-channel conductance kinetics, and voltage-dependence. On the basis of their voltage activation properties, the voltage-gated calcium classes can be further divided into two broad groups: the low (T-type) and high (L, N, P, Q and R-type) threshold-activated channels. The beta subunit is a soluble and intracellular protein that interacts with the transmembrane alpha1 subunit. It facilitates the trafficking and proper localization of the alpha1 subunit to the cellular plasma membrane.
Vertebrates contain four different beta subunits from distinct genes (beta1-4); each exists as multiple splice variants [ ]. All are expressed in the brain while other tissues show more specific expression patterns. The beta subunits show similarity to MAGUK (membrane-associated guanylate kinase) proteins in that they contain SH3 and inactive guanylate kinase (GuK) domains; however, they do not appear to contain a PDZ domain []. This entry represents the SH3 domain of the beta2 subunit, which is expressed in the heart [ ] and is present in specific neuronal cells including cerebellar Purkinje cells, hippocampal pyramidal neurons [], and photoreceptors []. Knockout of the beta2 gene in mice results in embryonic lethality, demonstrating its importance in development [, ].
Protein Domain
Name: Peptidase S1B, exfoliative toxin
Type: Family
Description: This group of serine peptidases belongs to MEROPS peptidase family S1, subfamily S1B (clan PA(S)). The type example is glutamyl endopeptidase I of Staphylococcus aureus, a well-characterised and specialised human pathogen expressing a variety of virulence factors to enable successful infection of the host. Symptoms usually manifest in cases of food poisoning, pyrogenic fever and toxic shock syndrome, and can prove lethal in immunocompromised victims. Of all the exotoxins secreted from the bacterial cell, superantigens and hemolysins are amongst the most studied [ ]. Of these, the former are well characterised, and several S. aureus super-antigenic enterotoxins exist that trigger excessive and aberrant T-cell activation in the host. Homologues of these S. aureus proteins have been found in Streptococcus pyogenes, and cause similar effects [ ]. In conventional Major Histocompatibility Complex (MHC)-II-restricted antigen processing, a peptide epitope is presented to a specific T-cell receptor (TCR) by the antigen presenting cell, and up to 0.0001% of the host T-cell repertoire is activated. By contrast, a bacterial superantigen (SAg) can bypass this process, binding non-specifically to constant regions on both the MHC-II and TCR. This results in up to 25% of the total host T-cell population being activated, with a massive release of inflammatory cytokines as a consequence. A recent study into the origins and functions of the S. aureus and S. pyogenes superantigens has identified distinct protein domains that are responsible for the slightly different actions of each protein subgroup []. Based on these criteria, the staphylococcal and streptococcal entero-/exotoxins discovered so far can be placed into several groups/subfamilies. The EXFOL group contains those superantigens that cause exfoliative skin diseases in the human host, and shows some similarity to staphylococcal serine proteases [ ].
Although these proteins show no significant sequence similarity to the more "conventional" SAgs, they do function in the same way, binding both TCR and MHC-II molecules []. The EXFOL group of exotoxins also possess potent serine protease activity, and contain a functional domain found in other S. aureus serine proteases. To date, two distinct members of the subfamily have been characterised, both from S. aureus, and designated Exfoliative toxin A and B (Eta and Etb). The tertiary structure of Eta has been resolved to 1.7 Å using X-ray crystallography. This reveals that Eta contains unique "ETA-surface loops", no cysteine bridges, and a specific N-terminal helix that is crucial for substrate hydrolysis.
Protein Domain
Name: Voltage-dependent calcium channel, gamma-6 subunit
Type: Family
Description: Ca2+ ions are unique in that they not only carry charge but they are also the most widely used of diffusible second messengers. Voltage-dependent Ca2+ channels (VDCC) are a family of molecules that allow cells to couple electrical activity to intracellular Ca2+ signalling. The opening and closing of these channels by depolarizing stimuli, such as action potentials, allows Ca2+ ions to enter neurons down a steep electrochemical gradient, producing transient intracellular Ca2+ signals. Many of the processes that occur in neurons, including transmitter release, gene transcription and metabolism are controlled by Ca2+ influx occurring simultaneously at different cellular locales. The pore is formed by the alpha-1 subunit which incorporates the conduction pore, the voltage sensor and gating apparatus, and the known sites of channel regulation by second messengers, drugs, and toxins [ ]. The activity of this pore is modulated by four tightly-coupled subunits: an intracellular beta subunit; a transmembrane gamma subunit; and a disulphide-linked complex of alpha-2 and delta subunits, which are proteolytically cleaved from the same gene product. Properties of the protein including gating voltage-dependence, G protein modulation and kinase susceptibility can be influenced by these subunits. Voltage-gated calcium channels are classified as T, L, N, P, Q and R, and are distinguished by their sensitivity to pharmacological blocks, single-channel conductance kinetics, and voltage-dependence. On the basis of their voltage activation properties, the voltage-gated calcium classes can be further divided into two broad groups: the low (T-type) and high (L, N, P, Q and R-type) threshold-activated channels. The voltage-dependent calcium channel gamma (VDCCG) subunit family consists of at least 8 members, which share a number of common structural features [ ]. Each member is predicted to possess 4 transmembrane domains, with intracellular N- and C-termini.
The first extracellular loop contains a highly conserved N-glycosylation site and a pair of conserved cysteine residues. The C-terminal 7 residues of VDCCG-2, -3, -4 and -8 are also conserved and contain a consensus site for phosphorylation by cAMP- and cGMP-dependent protein kinases, and a target site for binding by PDZ domain proteins []. The VDCCG-6 subunit was identified by high throughput genomic sequence database searching, pursuing sequences similar to VDCCG-1 to -5 [ ]. Mouse, human and rat isoforms have been cloned. VDCCG-6 is expressed in a range of tissues including brain, kidney, lung, skeletal muscle, prostate and testis [ ].
Protein Domain
Name: Voltage-dependent calcium channel, gamma-7 subunit
Type: Family
Description: Ca2+ ions are unique in that they not only carry charge but they are also the most widely used of diffusible second messengers. Voltage-dependent Ca2+ channels (VDCC) are a family of molecules that allow cells to couple electrical activity to intracellular Ca2+ signalling. The opening and closing of these channels by depolarizing stimuli, such as action potentials, allows Ca2+ ions to enter neurons down a steep electrochemical gradient, producing transient intracellular Ca2+ signals. Many of the processes that occur in neurons, including transmitter release, gene transcription and metabolism are controlled by Ca2+ influx occurring simultaneously at different cellular locales. The pore is formed by the alpha-1 subunit which incorporates the conduction pore, the voltage sensor and gating apparatus, and the known sites of channel regulation by second messengers, drugs, and toxins []. The activity of this pore is modulated by four tightly-coupled subunits: an intracellular beta subunit; a transmembrane gamma subunit; and a disulphide-linked complex of alpha-2 and delta subunits, which are proteolytically cleaved from the same gene product. Properties of the protein including gating voltage-dependence, G protein modulation and kinase susceptibility can be influenced by these subunits. Voltage-gated calcium channels are classified as T, L, N, P, Q and R, and are distinguished by their sensitivity to pharmacological blocks, single-channel conductance kinetics, and voltage-dependence. On the basis of their voltage activation properties, the voltage-gated calcium classes can be further divided into two broad groups: the low (T-type) and high (L, N, P, Q and R-type) threshold-activated channels. The voltage-dependent calcium channel gamma (VDCCG) subunit family consists of at least 8 members, which share a number of common structural features [ ]. Each member is predicted to possess 4 transmembrane domains, with intracellular N- and C-termini.
The first extracellular loop contains a highly conserved N-glycosylation site and a pair of conserved cysteine residues. The C-terminal 7 residues of VDCCG-2, -3, -4 and -8 are also conserved and contain a consensus site for phosphorylation by cAMP- and cGMP-dependent protein kinases, and a target site for binding by PDZ domain proteins []. The VDCCG-7 subunit was identified by high throughput genomic sequence database searching, pursuing sequences similar to VDCCG-1 to -5. Mouse and human isoforms have been cloned. VDCCG-7 is expressed in a range of tissues including brain, kidney, liver, small intestine and testis [ ].
Protein Domain
Name: NSP15, NendoU domain, coronavirus
Type: Domain
Description: Nidovirus endoribonucleases (NendoUs) are uridylate-specific endoribonucleases, which release a cleavage product containing a 2',3'-cyclic phosphate at the 3' terminal end. They are conserved across this order and are a genetic marker of nidoviruses [ , , , ]. A feature of these viruses' evolutionary relationship is the organisation and processing of the genome, which is translated into two large precursor polyproteins (pp1a and pp1ab) from the replicase gene; these are proteolytically processed by virus proteases into 13 to 16 nonstructural proteins (NSPs) []. Proteins containing the NendoU domain include NSP15 from coronaviruses and NSP11 from arteriviruses, both of which participate in the viral replication process and in the evasion of the host immune system. Although they are similar and conserved, they only share 27% identical residues and show structural differences [ ]. NSP11 has an N-terminal domain and a C-terminal NendoU catalytic domain. NSP11 functions as a dimer and Mg2+ is dispensable for its activity. In Porcine reproductive and respiratory syndrome virus (PRRSV), NSP11 induces STAT2 degradation to inhibit interferon signaling. Mutagenesis studies revealed that the amino acid residue K59 located at the N-terminal domain of NSP11 is indispensable for inducing STAT2 reduction []. This domain is not conserved in those nidovirus branches that replicate in invertebrate hosts (Mesoniviridae, Roniviridae), suggesting specific roles in vertebrate hosts. The NendoU domain packs into two β-sheets which constitute the catalytic-site cleft located at one side of the domain. A group of small α-helices packed at the other side of the domain face the concave surface of the β-sheets. The active site, located in the shallow groove between the two β-sheets, carries the catalytic triad made of two histidines and a lysine [ , ]. This entry represents the C-terminal NendoU domain of NSP15.
NSP15 is encoded by ORF1a/1ab and proteolytically released from the pp1a/1ab polyprotein. This domain exhibits endoribonuclease activity, designated EndoU, is highly conserved in all known CoVs, and is part of the replicase-transcriptase complex that plays important roles in virus replication and transcription [ , , ]. NSP15 is a uridylate-specific endoribonuclease that cleaves the 5'-polyuridines from negative-sense viral RNA, termed PUN RNA, either upstream or downstream of uridylates, at GUU or GU, to produce molecules with 2',3'-cyclic phosphate ends [, , ]. PUN RNA is a CoV MDA5-dependent pathogen-associated molecular pattern (PAMP) [].
Protein Domain
Name: Histone acetyltransferase GCN5
Type: Family
Description: This entry includes histone acetyltransferases GCN5, KAT2A and KAT2B (all of which are included in ). GCN5 acetylates histones H2B, H3 and H4, providing a specific tag for epigenetic transcription activation. GCN5 is a component of the transcription regulatory histone acetylation (HAT) complexes SAGA [ ], SLIK [], SALSA [] and ADA []. Mammals have two paralogues: KAT2A (also known as GCN5) and KAT2B. KAT2A acetylates core histones to provide a specific tag for epigenetic transcription activation, but not nucleosome core particles. It also acetylates proteins such as CEBPB []. KAT2A is a component of the ATAC complex, which has acetyltransferase activity on histones H3 and H4 []. KAT2B (also known as P300/calcium-binding protein (CBP)-associated factor or PCAF) can acetylate the core histones H3 and H4 as well as nucleosome core particles and non-histone proteins such as ACLY []. The transcription regulatory histone acetylation complex Spt-Ada-Gcn5 acetyltransferase (SAGA) is involved in RNA polymerase II-dependent transcriptional regulation of approximately 10% of yeast genes. SAGA preferentially acetylates histones H3 and H2B and deubiquitinates histone H2B [ ]. SAGA is known as PCAF in vertebrates and PCAF acetylates nucleosomal histone H3 []. The SAGA complex consists of at least TRA1, CHD1, SPT7, TAF5, ADA3, SGF73, SPT20/ADA5, SPT8, TAF12, TAF6, HFI1/ADA1, UBP8, GCN5, ADA2, SPT3, SGF29, TAF10, TAF9, SGF11 and SUS1, and some of these components are present as two copies. The complex is built up from distinct modules, each of which has a separate function and crosslinks with either other proteins or other modules in the complex []. SLIK (SAGA-like) is a multi-subunit histone acetyltransferase complex that preferentially acetylates histones H3 and H2B and deubiquitinates histone H2B. It is an embellishment of the SAGA complex.
The yeast SLIK complex consists of at least TRA1, CHD1, SPT7, TAF5, ADA3, SPT20, RTG2, TAF12, TAF6, HFI1, UBP8 (a deubiquitinase), GCN5, ADA2, SPT3, SGF29, TAF10 and TAF9 [ , ]. The yeast SALSA complex is an altered form of the SAGA complex and consists of at least TRA1, SPT7 (C-terminal truncated form), TAF5, ADA3, SPT20, TAF12, TAF6, HFI1, GCN5, ADA2 and SPT3 [ ]. The ADA complex is a transcription regulatory histone acetylation (HAT) complex. ADA preferentially acetylates nucleosomal histones H3 (at 'Lys-14' and 'Lys-18') and H2B. The complex consists of at least ADA2, ADA3, AHC1, and GCN5. AHC1 is required for the overall structural integrity of the ADA complex [ ].
Protein Domain
Name: Sodium/hydrogen exchanger 1-like
Type: Family
Description: Sodium proton exchangers (NHEs) constitute a large family of integral membrane protein transporters that are responsible for the counter-transport of protons and sodium ions across lipid bilayers [ , ]. These proteins are found in organisms across all domains of life. In archaea, bacteria, yeast and plants, these exchangers provide increased salt tolerance by removing sodium in exchange for extracellular protons. In mammals they participate in the regulation of cell pH, volume, and intracellular sodium concentration, as well as in the reabsorption of NaCl across renal, intestinal, and other epithelia [, , , ]. Human NHE is also involved in heart disease, cell growth and in cell differentiation []. The removal of intracellular protons in exchange for extracellular sodium effectively eliminates excess acid from actively metabolising cells. In mammalian cells, NHE activity is found in both the plasma membrane and inner mitochondrial membrane. To date, nine mammalian isoforms have been identified (designated NHE1-NHE9) [, ]. These exchangers are highly-regulated (glyco)phosphoproteins, which, based on their primary structure, appear to contain 10-12 membrane-spanning regions (M) at the N terminus and a large cytoplasmic region at the C terminus. The transmembrane regions M3-M12 share identity with other members of the family. The M6 and M7 regions are highly conserved. Thus, this is thought to be the region that is involved in the transport of sodium and hydrogen ions. The cytoplasmic region has little similarity throughout the family. There is some evidence that the exchangers may exist in the cell membrane as homodimers, but little is currently known about the mechanism of their antiport []. Sodium/hydrogen exchanger 1 (NHE-1) is found in virtually all tissues and cells in mammals and is involved in numerous physiological processes, including regulation of intracellular pH, cellular volume, cytoskeletal organisation, heart disease and cancer [ , , ].
In epithelial cells, NHE-1 is largely restricted to the basolateral membrane; this specific subcellular localisation is thought to be important to the functioning of these epithelia. This protein comprises two domains: an N-terminal membrane domain that functions to transport ions, and a C-terminal cytoplasmic regulatory domain that regulates the activity and mediates cytoskeletal interactions. NHE-1 plays a role in the survival, migration and invasion of several cancers [ , ]. It was shown to be activated at physiological levels of NO [].
Protein Domain
Name: Glycoside hydrolase family 22 domain
Type: Domain
Description: O-Glycosyl hydrolases ( ) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families [ , ]. This classification is available on the CAZy (CArbohydrate-Active EnZymes) website. Glycoside hydrolase family 22 comprises enzymes with two known activities: lysozyme type C ( ) and alpha-lactalbumins. Asp and/or the carbonyl oxygen of the C-2 acetamido group of the substrate acts as the catalytic nucleophile/base. Alpha-lactalbumin [ , ] is a milk protein that acts as the regulatory subunit of lactose synthetase, acting to promote the conversion of galactosyltransferase to lactose synthase, which is essential for milk production. In the mammary gland, alpha-lactalbumin changes the substrate specificity of galactosyltransferase from N-acetylglucosamine to glucose. Lysozymes ( ) act as bacteriolytic enzymes by hydrolyzing the beta(1->4) bonds between N-acetylglucosamine and N-acetylmuramic acid in the peptidoglycan of prokaryotic cell walls. Lysozyme has also been recruited for a digestive role in certain ruminants and colobine monkeys [ ]. There are at least five different classes of lysozymes []: C (chicken type), G (goose type), phage-type (T4), fungi (Chalaropsis), and bacterial (Bacillus subtilis). There are few similarities in the sequences of the different types of lysozymes. Lysozyme type C and alpha-lactalbumin are similar both in terms of primary sequence and structure, and probably evolved from a common ancestral protein [ ]. Around 35 to 40% of the residues are conserved in both proteins as well as the positions of the four disulphide bonds. There is, however, no similarity in function.
Another significant difference between the two proteins is that all lactalbumins have the ability to bind calcium [], while this property is restricted to only a few lysozymes []. The binding site was deduced using high resolution X-ray structure analysis and was shown to consist of three aspartic acid residues. It was first suggested that calcium bound to lactalbumin stabilised the structure, but recently it has been claimed that calcium controls the release of lactalbumin from the Golgi membrane and that the pattern of ion binding may also affect the catalytic properties of the lactose synthetase complex. This domain includes three cysteines which are involved in two of the disulphide bonds found in these proteins.
Protein Domain
Name: Voltage-dependent calcium channel, gamma-4 subunit
Type: Family
Description: Ca2+ ions are unique in that they not only carry charge but they are also the most widely used of diffusible second messengers. Voltage-dependent Ca2+ channels (VDCC) are a family of molecules that allow cells to couple electrical activity to intracellular Ca2+ signalling. The opening and closing of these channels by depolarizing stimuli, such as action potentials, allows Ca2+ ions to enter neurons down a steep electrochemical gradient, producing transient intracellular Ca2+ signals. Many of the processes that occur in neurons, including transmitter release, gene transcription and metabolism are controlled by Ca2+ influx occurring simultaneously at different cellular locales. The pore is formed by the alpha-1 subunit which incorporates the conduction pore, the voltage sensor and gating apparatus, and the known sites of channel regulation by second messengers, drugs, and toxins [ ]. The activity of this pore is modulated by four tightly-coupled subunits: an intracellular beta subunit; a transmembrane gamma subunit; and a disulphide-linked complex of alpha-2 and delta subunits, which are proteolytically cleaved from the same gene product. Properties of the protein including gating voltage-dependence, G protein modulation and kinase susceptibility can be influenced by these subunits. Voltage-gated calcium channels are classified as T, L, N, P, Q and R, and are distinguished by their sensitivity to pharmacological blocks, single-channel conductance kinetics, and voltage-dependence. On the basis of their voltage activation properties, the voltage-gated calcium classes can be further divided into two broad groups: the low (T-type) and high (L, N, P, Q and R-type) threshold-activated channels. The voltage-dependent calcium channel gamma (VDCCG) subunit family consists of at least 8 members, which share a number of common structural features [ ]. Each member is predicted to possess 4 transmembrane domains, with intracellular N- and C-termini.
The first extracellular loop contains a highly conserved N-glycosylation site and a pair of conserved cysteine residues. The C-terminal 7 residues of VDCCG-2, -3, -4 and -8 are also conserved and contain a consensus site for phosphorylation by cAMP- and cGMP-dependent protein kinases, and a target site for binding by PDZ domain proteins []. The VDCCG-4 subunit is predominantly expressed in neuronal tissue, although there is some evidence for expression in lung and prostate [, ]. The modulatory properties of the subunit have been investigated using heterologous expression systems. Coexpression of the VDCCG-4 subunit with P/Q-type channels shifts the steady-state inactivation curve of these channels to more hyperpolarised potentials [, ].
Protein Domain
Name: Oil body-associated protein-like
Type: Family
Description: This entry includes a group of oil body-associated proteins (OBAPs) from plants and some uncharacterised proteins from fungi and bacteria. The plant OBAPs are predominantly expressed during embryo development and may be involved in the stability of oil bodies [ ].
Protein Domain
Name: Translocation and assembly module TamB
Type: Family
Description: TamB is an integral inner membrane protein that forms a complex, the translocation and assembly module or TAM [ ], with the outer membrane protein TamA. TAM is responsible for the efficient secretion of the adhesin protein Ag43 in E. coli K-12 [].
Protein Domain
Name: ORMDL family
Type: Family
Description: ORMDL family members include ORMDL1/2/3 from humans and their homologues, such as protein Orm1 and Orm2 from budding yeasts. ORMDLs may be involved in protein folding in the endoplasmic reticulum [ ]. In budding yeast, Orm1 and Orm2 proteins mediate sphingolipid homeostasis [].
Protein Domain
Name: Thylakoid soluble phosphoprotein TSP9
Type: Family
Description: The plant-specific protein TSP9 is phosphorylated and released from the photosynthetic membrane in response to changing light conditions. Its characteristics resemble those of transcription/translation regulatory factors. The structure of the protein is predicted to consist of a random coil [ ].
Protein Domain
Name: Regulator of ribonuclease activity B domain
Type: Domain
Description: This entry represents a domain found in regulator of ribonuclease activity B (RraB) protein. RraB regulates mRNA abundance by binding to RNaseE and inhibiting its endonucleolytic activity [ , ]. A subset of these proteins are predicted to function as immunity proteins [].
Protein Domain
Name: CPPED1, metallophosphatase domain
Type: Domain
Description: CPPED1 (calcineurin-like phosphoesterase domain-containing 1), also known as CSTP1 (complete S-transactivated protein 1), is a protein with a metallophosphatase domain. CPPED1 is involved in glucose uptake in adipocytes [ ]. It is transactivated by the complete S protein of hepatitis B virus [].
Protein Domain
Name: Arterivirus GP3 envelope glycoprotein
Type: Family
Description: This family consists of envelope proteins from Arterivirus, including glycoprotein 3 (GP3) from Porcine reproductive and respiratory syndrome virus (PRRSV) [ ] and Lactate dehydrogenase-elevating virus (LDV) structural glycoprotein []. Arteriviruses have a positive-sense ssRNA genome and do not have a DNA stage.
Protein Domain
Name: Vip1-like, RNA recognition motif, plant
Type: Domain
Description: This entry represents the Vip1-like, uncharacterized proteins found in plants. Although their biological roles remain unclear, these proteins show high sequence similarity to the fission yeast Vip1. Similar to the Vip1 protein, members in this family contain an N-terminal RNA recognition motif (RRM).
Protein Domain
Name: RraB-like superfamily
Type: Homologous_superfamily
Description: This entry represents a domain found in regulator of ribonuclease activity B (RraB) protein. RraB regulates mRNA abundance by binding to RNaseE and inhibiting its endonucleolytic activity [ , ]. A subset of these proteins are predicted to function as immunity proteins [].
Protein Domain
Name: ChaB
Type: Family
Description: Proteins in this family contain a conserved 60-residue region. The protein is known as ChaB in Escherichia coli and is found next to ChaA, which is a cation transporter protein [ , ]. ChaB may regulate ChaA function in some way.
Protein Domain
Name: FAM20
Type: Family
Description: FAM20 is a family of secreted proteins with a potential role in regulating differentiation and function of hematopoietic and other tissues. Proteins in this family include FAM20A [ ], FAM20B or glycosaminoglycan xylosylkinase [], and FAM20C or dentin matrix protein 4 [, ].
Protein Domain
Name: MamQ/LemA
Type: Family
Description: This family includes the magnetosome protein MamQ as well as a functionally uncharacterised protein called LemA which is predicted to be a transmembrane protein with an extracellular N terminus [ ]. MamQ has been shown to be essential for magnetosome formation [].
Protein Domain
Name: PRD domain protein, EF0829/AHA3910
Type: Family
Description: Members of this family of relatively uncommon proteins are found in both Gram-positive (e.g. Enterococcus faecalis) and Gram-negative (e.g. Aeromonas hydrophila) bacteria, as part of a cluster of conserved proteins. The proteins contain a PRD domain (see ). Their function is unknown.
Protein Domain
Name: GRASP55/65
Type: Family
Description: GRASP55 (Golgi reassembly stacking protein of 55 kDa) and GRASP65 (a 65 kDa protein) are highly homologous. GRASP55 is a component of the Golgi stacking machinery. GRASP65 is an N-ethylmaleimide-sensitive membrane protein required for the stacking of Golgi cisternae in a cell-free system [ ].
Protein Domain
Name: CCDC144C-like, coiled-coil domain
Type: Domain
Description: This entry represents the coiled-coil domain found in the human protein CCDC144C ( ), the ankyrin repeat domain-containing protein 26-like 1 and related proteins. The function of CCDC144C remains unknown. The ankyrin repeat which features in CCDC144C is a common amino acid motif.
Protein Domain
Name: Abortive infection protein, AbiV family
Type: Family
Description: Bacterial abortive infection (Abi) systems contain phage resistance proteins that limit viral replication. Abi mechanisms differ greatly. This family includes AbiV from Lactococcus lactis. AbiV interacts directly with the protein SaV in phage p2 and blocks translation of phage proteins [ ].
Protein Domain
Name: Thylakoid soluble phosphoprotein TSP9 superfamily
Type: Homologous_superfamily
Description: The plant-specific protein TSP9 is phosphorylated and released from the photosynthetic membrane in response to changing light conditions. Its characteristics resemble those of transcription/translation regulatory factors. The structure of the protein is predicted to consist of a random coil [ ].
Protein Domain
Name: NEDD8 ultimate buster 1
Type: Family
Description: NUB1 is an adaptor protein that negatively regulates the levels of the ubiquitin-like protein Nedd8, as well as of neddylated proteins, through proteasomal degradation [ , ]. It has been shown to be regulated by Mdm2 (an E3 ubiquitin ligase) through ubiquitination on its lysine 159 [].
Protein Domain
Name: Selenoprotein F/M
Type: Family
Description: Selenoprotein F (Sep15) and selenoprotein M (SelM) are eukaryotic selenoproteins that have a thioredoxin-like domain and a surface accessible active site redox motif [ ]. This suggests that they function as thiol-disulphide isomerases involved in disulphide bond formation in the endoplasmic reticulum [].
Protein Domain
Name: Capsid assembly scaffolding protein-like
Type: Family
Description: This family includes the capsid assembly protein Gp9 (scaffolding protein) of bacteriophage T7, similar viral proteins and prophages from Proteobacteria. Gp9 facilitates assembly by binding to Gp10 hexamers but not the pentamers and locking them into a morphogenically correct conformation [ , ].
Protein Domain
Name: BTBD8, BACK domain
Type: Domain
Description: BTBD8 is a BTB-domain-containing Kelch-like protein that may play a role in developmental processes. It may also act as a protein-protein adaptor in a transcription complex and thus may be involved in brain development [ ]. This entry represents the BACK domain of BTBD8.
Protein Domain
Name: Siphovirus Gp157
Type: Family
Description: This family contains both viral and bacterial proteins which are related to the Gp157 protein of the Streptococcus thermophilus SFi bacteriophage. It is thought that bacteria possessing the gene coding for this protein have an increased resistance to the bacteriophage [ ].
Protein Domain
Name: MHCK/EF2 kinase
Type: Domain
Description: Proteins containing this domain consist of a novel group of eukaryotic protein kinase catalytic domains, which have no detectable similarity to conventional kinases. Proteins include myosin heavy chain kinases [ , ], Elongation Factor-2 kinase and a bifunctional ion channel [].
Protein Domain
Name: P48 protein, Baculovirus
Type: Family
Description: This family comprises the Baculovirus P48 proteins. They contain two possible membrane-spanning domains and a cysteine-rich domain that are conserved in all of the proteins. The Bombyx mori (Silk moth) nuclear polyhedrosis virus protein, , has been described as a putative DNA helicase.
Protein Domain
Name: Retro-transcribing virus envelope glycoprotein
Type: Domain
Description: This entry represents a group of mammalian proteins of retroviral origin. They are encoded by retrovirus K envelope glycoprotein genes that entered the ancestral human genome millions of years ago [ ]. These endogenous envelope proteins have lost their original fusogenic properties [].
Protein Domain
Name: FAM21/CAPZIP domain
Type: Domain
Description: This domain is found on WASH complex subunits FAM21 [ ] and CAP-ZIP proteins []. Proteins containing this domain are eukaryotic proteins that are typically between 305 and 1321 amino acids in length. The exact function of this domain is not known.
Protein Domain
Name: FF domain
Type: Domain
Description: The FF domain may be involved in protein-protein interaction [ ]. It often occurs as multiple copies and often accompanies WW domains. PRP40 from yeast encodes a novel, essential splicing component that associates with the yeast U1 small nuclear ribonucleoprotein particle [ ].
Protein Domain
Name: Apoptosis-antagonizing transcription factor, C-terminal
Type: Domain
Description: This C-terminal domain is found in apoptosis-antagonizing transcription factor (AATF) proteins [ ]. This is the domain of the AATF proteins that interacts with BLOS2 or Ceap, which functions as an adaptor in processes such as protein and vesicle processing and transport, and perhaps transcription.
Protein Domain
Name: Hemimethylated DNA-binding domain
Type: Domain
Description: Heat shock protein HspQ, also known as YccV, is an Escherichia coli hemimethylated DNA binding protein which has been shown to regulate dnaA gene expression [ ]. This entry represents a YccV-like hemimethylated DNA binding domain that can also be found in longer eukaryotic proteins.
Protein Domain
Name: Domain of unknown function DUF4005
Type: Domain
Description: This domain is found towards the C terminus of a number of plant IQ domain-containing proteins. These proteins may be involved in cooperative interactions with calmodulins or calmodulin-like proteins, and may associate with nucleic acids and regulate gene expression at the transcriptional or post-transcriptional level.
Protein Domain
Name: Sieve element occlusion, N-terminal
Type: Domain
Description: This entry represents the N terminus of the sieve element occlusion (SEO) proteins (also known as forisomes), which are phloem proteins accumulated during sieve element differentiation [ , ]. This domain mediates homologous dimerisation of the forisome protein MtSEO-F, probably via hydrophobic interplay [].
Protein Domain
Name: Hepatitis delta virus delta antigen
Type: Family
Description: The Hepatitis delta virus (HDV) encodes a single protein, the hepatitis delta antigen (HDAg). The central region of this protein has been shown to bind RNA [ ]. Several interactions are also mediated by a coiled-coil region at the N terminus of the protein [ ].
Protein Domain
Name: Archaeosortase B
Type: Family
Description: Members of this protein family are found so far in Methanohalophilus mahii DSM 5219 and Methanohalobium evestigatum Z-7303, along with five and nine proteins, respectively, with the VPXXXP-CTERM protein sorting signal. In these species, this boutique system represents a second exosortase/archaeosortase-type system [ ].
Protein Domain
Name: Domain of unknown function DUF38/FTH, Caenorhabditis species
Type: Domain
Description: This domain with no known function is presumed to be a protein-protein interaction module specific to proteins from several Caenorhabditis species. It is named FTH after FOG-2 homology domain [ ]. The domain is found associated with, and C-terminal to, the cyclin-like F-box.
Protein Domain
Name: Apical junction component
Type: Family
Description: This entry represents a family of proteins known variously as apical junction molecule, or apical junction component. In Caenorhabditis elegans, the coiled-coil protein Ajm-1 (apical junction molecule) controls epithelial junction integrity. Its localization to apical junctions is regulated by proteins LET-413 and DLG-1 [ ].
Protein Domain
Name: Phospholipid biosynthesis protein, PlsX-like
Type: Family
Description: The proteins in this group are phospholipid biosynthesis proteins of unknown function. Escherichia coli PlsX protein has been shown to be required for phospholipid biosynthesis, but its exact function is not yet known [ , ]. It has been suggested to be an enzyme.
Protein Domain
Name: DinB-like domain
Type: Domain
Description: DinB from Bacillus subtilis and related Gram-positive bacteria is a DNA-damage-induced gene and the corresponding protein contains a four helix bundle. This fold is shared by diverse proteins [ ]. This entry represents a DinB-like structural domain found in uncharacterised proteins and putative metal-dependent hydrolases.
Protein Domain
Name: Flagellar basal-body rod FlgF
Type: Family
Description: FlgF is a flagellar basal-body protein that, along with FlgBCG, composes the rod of the bacterial flagellum [ ]. This entry contains proteins only from the proteobacteria, and not from the epsilon subdivision (where the architecture of the related FlgE protein differs substantially from other lineages).
Protein Domain
Name: FGF binding 1
Type: Family
Description: This family consists of several mammalian FGF binding protein 1 proteins. Fibroblast growth factors (FGFs) play important roles during foetal and embryonic development [ ]. Fibroblast growth factor-binding protein (FGF-BP) 1 is a secreted protein that can bind fibroblast growth factors (FGFs) 1 and 2 [].
Protein Domain
Name: SlyX
Type: Family
Description: The SlyX protein has no known function. It is short, less than 80 amino acids, and its gene is found close to the slyD gene. The SlyX protein has a conserved PPH(Y/W) motif at its C terminus. The protein may be a coiled-coil structure.
Protein Domain
Name: Domain of unknown function DUF4822
Type: Domain
Description: This is a lipocalin-like domain found in functionally uncharacterised bacterial proteins, often as a repeat of two domains. Proteins with this domain are found in a wide range of bacteria and are often annotated as S-layer proteins, but the origin of this annotation is not clear.
Protein Domain
Name: Capsid Gp10A/Gp10B
Type: Domain
Description: This entry describes a domain found in capsid proteins, such as Gp10A and Gp10B from bacteriophage T7 and T3. Gp10A is a major capsid protein, while Gp10B is a minor capsid protein. They incorporate into the capsid in about a 90/10 ratio respectively [ ].
Protein Domain
Name: Trimerisation motif
Type: Domain
Description: This domain is predominantly found in the structural protein coronin, and is duplicated in some sequences. It appears to have the function of stabilising the topology of short coiled-coils in proteins [ ]. Coronins are evolutionarily conserved proteins, mainly involved in actin cytoskeleton organisation [ ].
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom