Examples

  • Search this entire website. Enter identifiers, names or keywords for genes, pathways, authors, ontology terms, etc. (e.g. eve, embryo, zen, allele)
  • Use OR to search for either of two terms (e.g. fly OR drosophila) or quotation marks to search for phrases (e.g. "dna binding").
  • Wildcards and Boolean operators are supported: use dros* for partial matches, or fly AND NOT embryo to exclude a term.
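The query forms listed above can be illustrated with a small sketch. This is not the site's actual search engine; the helper names `term_to_regex` and `has` and the sample text `doc` are hypothetical, and the matching rules (whole words, case-insensitive, trailing `*` as a wildcard) are assumptions made for illustration.

```python
import re

def term_to_regex(term: str) -> re.Pattern:
    """Compile one search term; a trailing * matches any word ending (assumed rule)."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    # Whole-word, case-insensitive matching is an assumption of this sketch.
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def has(text: str, term: str) -> bool:
    """True if the term (or phrase) occurs in the text."""
    return term_to_regex(term).search(text) is not None

# Hypothetical record text, used only to exercise the four query forms.
doc = "Drosophila embryo development and DNA binding proteins"

either = has(doc, "fly") or has(doc, "drosophila")       # fly OR drosophila
phrase = has(doc, "dna binding")                         # "dna binding"
partial = has(doc, "dros*")                              # dros*
excluded = has(doc, "fly") and not has(doc, "embryo")    # fly AND NOT embryo
```

In this sketch a phrase is treated as one literal term, so the quoted form only matches when the words are adjacent and in order.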

Search results 14601 to 14700 out of 30763 for seed protein

Category restricted to ProteinDomain (x)
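As a quick check of the result range above, the page size and page position can be derived from the three numbers shown. The variable names are illustrative, and the assumption is that every page except possibly the last holds the same number of results.

```python
import math

first, last, total = 14601, 14700, 30763   # values from the results line above

page_size = last - first + 1               # results shown per page
page = (first - 1) // page_size + 1        # 1-based index of this page
pages = math.ceil(total / page_size)       # total number of pages
last_page_count = total - (pages - 1) * page_size  # results on the final page
```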

Categories

Category: ProteinDomain
Type Details Score
Protein Domain
Name: WWE domain, subgroup
Type: Domain
Description: The WWE domain is named after three of its conserved residues and is predicted to mediate specific protein-protein interactions in ubiquitin and ADP ribose conjugation systems. This domain is found as a tandem repeat at the N terminus of Deltex, a cytosolic effector of Notch signalling thought to bind the N terminus of the Notch receptor [ ]. It is also found as an interaction module in protein ubiquitination and ADP ribosylation proteins [].
Protein Domain
Name: Archaeal Nre, N-terminal
Type: Domain
Description: This conserved region is found in the N-terminal region of archaeal Nre proteins. While most archaeal organisms encode only a single Nre protein, some encode two, NreA and NreB. NreA is an archaeal PCNA interacting protein that works together with the UvrABC proteins in repairing DNA damage resulting from exposure to DNA damaging agent MMC. NreA contains a putative PIP motif at its C terminus that is important for its function [ ].
Protein Domain
Name: Archaeal Nre, C-terminal
Type: Domain
Description: This conserved region is found in the C-terminal region of archaeal Nre proteins. While most archaeal organisms encode only a single Nre protein, some encode two, NreA and NreB. NreA is an archaeal PCNA interacting protein that works together with the UvrABC proteins in repairing DNA damage resulting from exposure to DNA damaging agent MMC. NreA contains a putative PIP motif at its C terminus that is important for its function [ ].
Protein Domain
Name: Cationic amino acid transport permease
Type: Family
Description: Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell. A number of such proteins have been found to be evolutionarily related [ , , ]. These proteins seem to contain up to 12 transmembrane segments. The best conserved region in this family is located in the second transmembrane segment. Proteins in this family are permeases involved in the transport of arginine, lysine and ornithine, the cationic amino acids.
Protein Domain
Name: Glutamate:g-aminobutyrate antiporter
Type: Family
Description: Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell. A number of such proteins have been found to be evolutionarily related [ , , ]. These proteins seem to contain up to 12 transmembrane segments. The best conserved region in this family is located in the second transmembrane segment. An example of the glutamate:g-aminobutyrate antiporter proteins is the amino acid transporter involved in extreme acid resistance from Shigella flexneri.
Protein Domain
Name: Herpesvirus UL3
Type: Family
Description: Herpes simplex viruses are large DNA viruses whose genomes encode approximately 80 genes. The UL3 gene of Human herpesvirus 2 (HHV-2) is predicted to encode a 233 amino acid protein with a molecular mass of 26 kDa. Homologues of the UL3 protein are encoded only among alphaherpesviruses. The function of the UL3 protein of Herpes simplex viruses remains unknown, but it is a phosphoprotein known to localize to the nucleus [ ].
Protein Domain
Name: Nop, C-terminal domain
Type: Homologous_superfamily
Description: The Nop domain is present in various pre-RNA processing ribonucleoproteins (RNP): eukaryotic Prp31, part of a tri-snRNP complex involved in pre-mRNA splicing; eukaryotic nucleolar proteins 56 and 58 (Nop56 and Nop58), components of box C/D small nucleolar ribonucleoprotein (snoRNP) particles; and archaeal Nop5, a homologue of Nop56/Nop58. The Nop domain is an RNP binding module, exhibiting RNA and protein binding surfaces. It is oval-shaped and exclusively α-helical [ , ]. This entry represents the C-terminal part of the Nop domain.
Protein Domain
Name: EGF-type aspartate/asparagine hydroxylation site
Type: PTM
Description: Post-translational hydroxylation of aspartic acid or asparagine [ ] to form erythro-beta-hydroxyaspartic acid or erythro-beta-hydroxyasparagine has been identified in a number of proteins with domains homologous to epidermal growth factor (EGF). Examples of such proteins are the blood coagulation protein factors VII, IX and X, proteins C, S, and Z, the LDL receptor, thrombomodulin, etc. Based on sequence comparisons of the EGF-homology region that contains hydroxylated Asp or Asn, a consensus sequence has been identified that seems to be required by the hydroxylase(s).
Protein Domain
Name: MMM1 domain
Type: Domain
Description: This entry represents a domain found in mitochondrial distribution and morphology proteins Mdm12 and Mdm34, and in maintenance of mitochondrial morphology protein Mmm1. These proteins are components of the ERMES/MDM complex, which serves as a molecular tether to connect the endoplasmic reticulum and mitochondria [ ]. MMM1 is an integral ER protein conserved from plants to humans. It is N-glycosylated, and forms a complex with Mdm10, Mdm12 and Mdm34 to tether the mitochondria to the endoplasmic reticulum [ ].
Protein Domain
Name: Vesicle transport protein, Use1
Type: Family
Description: This entry represents a family of proteins, approximately 300 residues in length, involved in vesicle transport. They have a single C-terminal transmembrane domain and a SNARE [soluble NSF (N-ethylmaleimide-sensitive fusion protein) attachment protein receptor] domain of approximately 60 residues. The SNARE domains are essential for membrane fusion and are conserved from yeasts to humans. Use1 is one of the three protein subunits that make up the SNARE complex and it is specifically required for Golgi-endoplasmic reticulum retrograde transport [].
Protein Domain
Name: Molybdenum cofactor biosynthesis, conserved site
Type: Conserved_site
Description: Eukaryotic and prokaryotic molybdoenzymes require a molybdopterin cofactor (MoCF) for their activity. The biosynthesis of this cofactor involves a complex multistep enzymatic pathway. One of the eukaryotic proteins involved in this pathway is the Drosophila protein cinnamon [], which is highly similar to gephyrin, a rat microtubule-associated protein which was thought to anchor the glycine receptor to subsynaptic microtubules. Cinnamon and gephyrin are evolutionarily related, in their N-terminal half, to the Escherichia coli MoCF biosynthesis proteins mog/chlG and moaB/chlA2.
Protein Domain
Name: Plant organelle RNA recognition domain
Type: Domain
Description: The plant organelle RNA recognition (PORR) domain, previously known as DUF860, is a component of group II intron ribonucleoprotein particles in maize chloroplasts. It is required for the splicing of the introns with which it associates, and promotes splicing in the context of a heterodimer with the RNase III-domain protein RNC1. Proteins containing this domain are predicted to localise to mitochondria or chloroplasts [ ]. It seems likely that most PORR proteins function in organellar RNA metabolism [].
Protein Domain
Name: Polycomb protein, VEFS-Box
Type: Domain
Description: The VEFS-Box is found in the C-terminal region of the VRN2, EMF2, FIS2, and Su(z)12 polycomb proteins. This domain is characterised by an acidic cluster and a tryptophan/methionine-rich sequence, the acidic-W/M domain [ ]. In some proteins the VEFS-Box is associated with a zinc-finger domain located roughly 100 residues towards the N terminus. These proteins are part of the polycomb cluster of proteins which control HOX gene transcription as it functions in heterochromatin-mediated repression [].
Protein Domain
Name: Sizzled, cysteine-rich domain
Type: Domain
Description: The cysteine-rich domain (CRD) is an essential part of the sizzled protein, which regulates bone morphogenetic protein (Bmp) signaling by stabilizing chordin, and plays a critical role in the patterning of vertebrate and invertebrate embryos. Sizzled also functions in the ventral region as a Wnt inhibitor and modulates canonical Wnt signaling. Sizzled proteins belong to the secreted frizzled-related protein family (SFRP), and have been identified in the genomes of birds, fishes and frogs, but not mammals [ ].
Protein Domain
Name: Plasmodium falciparum UIS3, C-terminal domain superfamily
Type: Homologous_superfamily
Description: UIS3 is a membrane protein essential for sporozoite development in infected hepatocytes. This superfamily corresponds to the 130-229 region of the Plasmodium falciparum UIS3 protein, which is compact and has an all α-helical structure. PfUIS3(130-229) interacts with lipids, phospholipid liposomes, the human liver fatty acid-binding protein and with the lipid phosphatidylethanolamine. The interaction with liver fatty acid-binding protein provides the parasite with a method to import essential fatty acids/lipids during rapid growth phases of sporozoites [ ].
Protein Domain
Name: Replication terminator Tus, domain 1
Type: Homologous_superfamily
Description: Bacterial DNA replication terminus site-binding proteins (also known as Tus or Ter-binding proteins) are required for the termination of DNA replication and function by binding to DNA replication terminator sequences, thus preventing the passage of replication forks [ ]. The termination efficiency is affected by the affinity of a particular protein for the terminator sequence. Tus protein folds into two domains divided by a central basic cleft [ ]. This superfamily describes the N-terminal 3-layer sandwich domain.
Protein Domain
Name: Mononegavirales mRNA-capping domain V
Type: Domain
Description: L protein (large polymerase protein) is a polymerase protein. The V domain of L RNA-polymerase carries a new motif, GxxTx(n)HR, that is essential for mRNA cap formation. Nonsegmented negative-sense (NNS) RNA viruses, Mononegavirales, cap their mRNA by an unconventional mechanism. Specifically, 5'-monophosphate mRNA is transferred to GDP derived from GTP through a reaction that involves a covalent intermediate between the large polymerase protein L and mRNA. The V region is essential for this process [ ].
Protein Domain
Name: Lysoplasmalogenase-like
Type: Family
Description: This entry represents a family of proteins that includes Lysoplasmalogenase from humans (also known as Transmembrane protein 86B, TMEM86B) and similar proteins found in eukaryotes and bacteria. TMEM86B catalyses the degradation of lysoplasmalogen, which is formed by the hydrolysis of membrane glycerophospholipids plasmalogens. It may modulate cell membrane properties [ ]. Uncharacterized membrane protein YhhN from E. coli also belongs to this group. Putative conserved active site residues have been proposed for this family [ ].
Protein Domain
Name: Mucin-like glycoprotein
Type: Family
Description: This family of trypanosomal proteins resembles vertebrate mucins. The protein consists of three regions. The N and C termini are conserved between all members of the family, whereas the central region is not well conserved and contains a large number of threonine residues which can be glycosylated [ ]. Indirect evidence suggested that these genes might encode the core protein of parasite mucins, glycoproteins that were proposed to be involved in the interaction with, and invasion of, mammalian host cells.
Protein Domain
Name: Phage capsid
Type: Family
Description: The proteins in this family are found in bacteria and viruses (Caudovirales) and represent bacteriophage capsid proteins. The major capsid protein of Enterobacteria phage HK97 assembles to form an icosahedral capsid [ , ]. Included within the capsid protein is the delta domain which acts as the scaffold upon which the capsid is built []. Bacterial and archaeal encapsulin-like systems and HK97-type viruses share a common ancestor and it is likely that encapsulins have evolved from HK97-type phages [ ].
Protein Domain
Name: SPAN-X family
Type: Family
Description: This entry represents SPAN-X (Sperm Protein Associated with the Nucleus on the X chromosome) family proteins, including N1, N2, N3 and N5. These human sperm proteins are associated with the nucleus and mapped to the X chromosome (SPAN-X) (approximately 100 residues long). SPAN-X proteins are cancer-testis antigens (CTAs), and thus represent potential targets for cancer immunotherapy because they are widely distributed in tumours but not in normal tissues, except testes. They are highly insoluble, acidic, and polymorphic [ ].
Protein Domain
Name: Bacteriophage Mu, Gp16
Type: Family
Description: This family consists of several bacterial and phage proteins, including GemA (also known as gp16) protein from Bacteriophage Mu. GemA is an early protein responsible for decreasing host DNA gyrase activity, which promotes DNA relaxation of the bacterial host genome. It modulates the expression of various host genes, probably those controlled by supercoiling of their promoters. Host genes affected include DNA replication and cell division determinants [ ]. This family also includes the Mu-like prophage FluMu protein gp16 from Haemophilus influenzae.
Protein Domain
Name: Quinoprotein dehydrogenase-associated
Type: Family
Description: Proteins in this entry are found in known methanotrophs and a range of PQQ-biosynthesising species. Interpretation of evidence by homology and by direct experimental work suggests two different roles. By homology, these proteins appear to be the periplasmic substrate-binding protein of an ABC transport family. However, mutational studies and direct characterisation for some sequences related to these proteins suggest they may act as a maturation chaperone or additional subunit of a methanol dehydrogenase-like enzyme [ ].
Protein Domain
Name: Tra1, HEAT repeat central region
Type: Repeat
Description: This entry includes transcription-associated protein 1 (Tra1) from yeast and TRRAP from mammals. Budding yeast Tra1 is a subunit of SAGA and NuA4 histone acetyltransferase complexes [ , ]. Human TRRAP is an adapter protein found in various multiprotein chromatin complexes with histone acetyltransferase activity (HAT) []. This entry represents part of the Tra1 protein composed of α-solenoid repeats that forms the central region [ ]. This is named as central due to its position relative to the ring region.
Protein Domain
Name: Uncharacterised domain UPF0029, Impact, C-terminal
Type: Domain
Description: Members of this entry are a set of functionally uncharacterised hypothetical bacterial proteins. They adopt a ferredoxin-like fold, with a β-α-β-β-α-β arrangement [ ]. This entry contains the protein Impact, which is a translational regulator that ensures constant high levels of translation under amino acid starvation. It acts by interacting with Gcn1/Gcn1L1, thereby preventing activation of Gcn2 protein kinases (EIF2AK1 to 4) and subsequent down-regulation of protein synthesis. It is evolutionarily conserved from eukaryotes to archaea [ ].
Protein Domain
Name: SPATA6 family
Type: Family
Description: This entry includes Spermatogenesis-associated protein 6 (SPATA6) and SPATA6-like proteins. SPATA6 has similarity with the motor domain of kinesin related proteins and the Caenorhabditis elegans neural calcium sensor protein (NCS-2) [ ]. It plays a role in linking the developing flagellum to the head during late spermiogenesis by being involved in the formation of the segmented columns and the capitulum, two major structures of the sperm connecting piece []. The function of SPATA6L is not clear.
Protein Domain
Name: Vta1/callose synthase, N-terminal
Type: Domain
Description: This domain can be found in the N terminus of the Vta1 protein, which is a class E vacuolar protein sorting (VPS) protein required for the formation of the multivesicular body (MVB) [ ]. Proteins containing this domain also include Vta1 homologues, such as SBP1 from humans and LIP5 from Arabidopsis thaliana []. This domain can also be found in plant callose synthase, which is involved in callose synthesis at the forming cell plate during cytokinesis [].
Protein Domain
Name: Hepatitis E virus Orf2, capsid
Type: Family
Description: The Hepatitis E virus (HEV) genome is a single-stranded, positive-sense RNA molecule of approximately 7.5 kb [ ]. Three open reading frames (ORF) were identified within the HEV genome: ORF1 encodes nonstructural proteins, ORF2 encodes the putative structural protein(s), and ORF3 encodes a protein of unknown function. ORF2 contains a consensus signal peptide sequence at its amino terminus and a capsid-like region with a high content of basic amino acids similar to that seen with other virus capsid proteins [].
Protein Domain
Name: Cohesin subunit Scc3/SA
Type: Family
Description: Proteins in this family are subunits of the cohesin complex - a protein complex required for sister chromatid cohesion in eukaryotes, including subunits SA-1, SA-2, SA-3 from chordates, Scc3 from S. cerevisiae and Psc3 from S. pombe [ ]. They contain a STAG domain and a stromalin conservative domain (SCD). The SCD domain is also found in human STAG3-like proteins and meiotic recombination protein Rec11 from S. pombe []. The structure of Scc3 has been revealed [].
Protein Domain
Name: Methanogenesis multiheme c-type cytochrome
Type: Family
Description: Members of this protein family are multiheme cytochrome c proteins of Methanosarcina acetivorans C2A and several other archaeal methanogens. All members have N-terminal signal peptides and are presumed to act in electron transfer reactions associated with methanogenesis. Putative heme-binding motifs include five (or six) CXXCH motifs, a CXXXCH motif, and a CXXXXCH motif. These proteins show multiple regions of local homology, in the same order, with multiheme cytochrome c proteins such as octaheme tetrathionate reductase from Shewanella.
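The three putative heme-binding motifs named in this entry (CXXCH, CXXXCH and CXXXXCH) differ only in the number of residues between the two cysteines, so one pattern can cover all of them. The following is a minimal sketch, not a curated signature: the function name `find_heme_motifs` and the sequence `toy` are invented for illustration, and X is assumed to mean any residue.

```python
import re

# C, then 2 to 4 arbitrary residues, then C and H.
# The lookahead lets overlapping candidate sites be reported; at each start
# position only one match (the longest gap that fits) is returned.
HEME_MOTIF = re.compile(r"(?=(C[A-Z]{2,4}CH))")

def find_heme_motifs(seq):
    """Return (1-based start, matched motif) for every candidate site."""
    return [(m.start() + 1, m.group(1)) for m in HEME_MOTIF.finditer(seq.upper())]

# Invented toy sequence containing one motif of each spacing.
toy = "MKACAACHGGSCDEFCHLLCQRSTCHP"
hits = find_heme_motifs(toy)
```

A hit from such a scan is only a candidate: the pattern says nothing about whether heme is actually bound at that site.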
Protein Domain
Name: Caskin-1, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of Caskin-1 (CASK-interacting protein 1). Caskin-1 and Caskin-2 are multidomain proteins containing six N-terminal ankyrin repeats, a single SH3 domain, and two sterile alpha motif (SAM) domains followed by a long proline-rich sequence and a short conserved C-terminal domain. Caskin-1 may link the scaffolding protein CASK to downstream intracellular effectors [ ]. Caskin-1 polymerises, via the tandem SAM domains, to form long, 8 nm wide fibres, upon which other proteins can assemble [].
Protein Domain
Name: Peptidoglycan-binding protein, CsiV
Type: Family
Description: CsiV, a small periplasmic protein (cell-shape integrity in Vibrio), is essential for growth of Vibrio cholerae in the presence of DAA, non-canonical amino-acids, the typical components of peptidoglycan side-chains in Vibrio cholerae. CsiV interacts with LpoA, the lipoprotein activator of penicillin-binding protein 1A that is necessary for mediating the assembly of peptidoglycan. CsiV acts through LpoA to promote peptidoglycan biogenesis in V. cholerae and other vibrio species as well as in the other genera where this protein is found [ ].
Protein Domain
Name: VPRBP/DCAF1 family
Type: Family
Description: This entry represents the VPRBP/DCAF1 family, whose members include protein VPRBP from mammals [ ], protein mahjong from fruit flies [] and DCAF1 from worms and plants []. They are the substrate recognition component of the E3 ubiquitin-protein ligase complex. Human VPRBP is part of the CUL4A-RBX1-DDB1-DCAF1/VPRBP complex that mediates ubiquitination and proteasome-dependent degradation of proteins [ ]. VPRBP also has kinase activity and is capable of phosphorylating histone H2A on threonine 120 (H2AT120p) in a nucleosomal context [].
Protein Domain
Name: SiaC family regulatory phosphoprotein
Type: Domain
Description: This entry represents the SiaC family regulatory phosphoprotein which undergoes a regulatory phosphorylation at Thr-68 of founder protein PA0170 from Pseudomonas aeruginosa, but in more distant homologs, T can be S. Also, it is part of motif NTSS, so may contain more than one phosphorylation site. Phosphorylation causes regulatory change to protein-protein interaction, in a pathway that seems broadly distributed, involves a diguanylate cyclase, and in the case of Pseudomonas aeruginosa affects aggregation and biofilm formation responses [ ].
Protein Domain
Name: Amino acid antiporter
Type: Family
Description: Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell. A number of such proteins have been found to be evolutionarily related [ , , ]. These proteins seem to contain up to 12 transmembrane segments. The best conserved region in this family is located in the second transmembrane segment. Proteins in this group catalyse an electroneutral exchange between arginine and ornithine to allow high-efficiency energy conversions in the arginine deiminase pathway.
Protein Domain
Name: SAPAP family
Type: Family
Description: This entry represents the SAPAP family, whose members include mars from fruit flies and disks large-associated proteins from vertebrates. Mars binds to microtubules and protein phosphatase 1. It is involved in cell signalling, mitotic spindle organisation and regulation of mitotic cell cycle [ , ]. It is essential for the early development of Drosophila embryos []. Disks large-associated proteins are synaptic proteins that may play several roles in the molecular organisation of synapses and neuronal cell signalling [ , ].
Protein Domain
Name: Sperm-associated antigen 11 A/B
Type: Family
Description: This family consists of several variants of the human and chimpanzee (Pan troglodytes) sperm antigen proteins (HE2 and EP2 respectively). The EP2 gene codes for a family of androgen-dependent, epididymis-specific secretory proteins, known as sperm-associated antigen 11. The EP2 gene uses alternative promoters and differential splicing to produce a family of variant messages. The translated putative protein variants differ significantly from each other. Some of these putative proteins have similarity to beta-defensins, a family of antimicrobial peptides [ ].
Protein Domain
Name: Vasodilator-stimulated phosphoprotein/ENA/VASP-like
Type: Family
Description: Ena/VASP proteins are actin-associated proteins involved in a range of processes dependent on cytoskeleton remodeling and cell polarity, such as axon guidance and lamellipodial and filopodial dynamics in migrating cells [ , ]. Ena/VASP proteins possess a modular domain organization, including a conserved tetramerization domain (TD) that mediates the formation of Ena/VASP tetramers [ ]. In vertebrates there are three Ena-VASP family members: Mena (Mammalian enabled), VASP, and EVL (Ena-VASP like). This entry includes VASP and Ena/VASP-like (EVL) proteins.
Protein Domain
Name: Peptidoglycan recognition protein, PGRP-S
Type: Family
Description: PGRP-S (peptidoglycan recognition protein short, also known as peptidoglycan recognition protein 1) is an innate immunity pattern recognition protein conserved from insects to mammals. It recognises bacterial cell wall peptidoglycan and activates two antimicrobial defense systems, the prophenoloxidase cascade and antimicrobial peptides, through the Toll receptor [ ]. Peptidoglycan recognition proteins (PGRPs) are part of the host's innate immune system. In mammals, only four types of PGRPs, long PGRPs, intermediate PGRPs and short PGRPs have so far been identified [ ].
Protein Domain
Name: Prepilin type IV endopeptidase, peptidase domain
Type: Domain
Description: This group of aspartic endopeptidases belong to MEROPS peptidase family A24 (type IV prepilin peptidase family). The family is divided into two subfamilies: subfamily A24A includes the type IV prepilin peptidase from bacteria and subfamily A24B includes the preflagellin peptidase from archaea. Peptidases in the family are also known as "GXGD membrane proteases" because of the common motif that includes one of the two active site residues [ ]. Bacteria produce a number of protein precursors that undergo post-translational methylation and proteolysis prior to secretion as active proteins. Type IV prepilin leader peptidases are enzymes that mediate this type of post-translational modification. Type IV pilin is a protein found on the surface of Pseudomonas aeruginosa, Neisseria gonorrhoeae and other Gram-negative pathogens. Pilin subunits attach the infecting organism to the surface of host epithelial cells. They are synthesised as prepilin subunits, which differ from mature pilin by virtue of containing a 6-8 residue leader peptide consisting of charged amino acids. Mature type IV pilins also contain a methylated N-terminal phenylalanine residue. The bifunctional enzyme prepilin peptidase (PilD) from Pseudomonas aeruginosa is a key determinant in both type-IV pilus biogenesis and extracellular protein secretion, in its roles as a leader peptidase and methyl transferase (MTase). It is responsible for endopeptidic cleavage of the unique leader peptides that characterise type-IV pilin precursors, as well as proteins with homologous leader sequences that are essential components of the general secretion pathway found in a variety of Gram-negative pathogens. Following removal of the leader peptides, the same enzyme is responsible for the second post-translational modification that characterises the type-IV pilins and their homologues, namely N-methylation of the newly exposed N-terminal amino acid residue [ ].
In type IV prepilin peptidase, the two active-site Asp residues occur in the motifs Xaa-Xaa-Asp-Xaa-Xbb-Xcc-Xcc-Xcc-Xaa-Pro and Xaa-Gly-Xcc-Gly-Asp-Xaa-Lys-Xaa-Xaa-Xaa (where Xaa is hydrophobic, Xbb is charged and Xcc is any amino acid). Some archaea possess a flagellum that contains a flagellin protein. Flagellin is synthesized as a precursor with a positively charged leader peptide. This leader peptide is removed by prepilin peptidase before flagellin is incorporated into the filament [ ]. The tertiary structure of the preflagellin peptidase from Methanococcus maripaludis has been solved, and shows a bundle of six helices. The active site residues are far apart on transmembrane helices 1 and 4, which implies that a conformational change is required to activate the peptidase [ ]. This entry represents the peptidase domain from the prepilin type IV endopeptidases [ ]. It can be found on its own, or in the case of the bifunctional enzymes, next to a methylation domain. Aspartic peptidases, also known as aspartyl proteases ([intenz:3.4.23.-]), are widely distributed proteolytic enzymes [, , ] known to exist in vertebrates, fungi, plants, protozoa, bacteria, archaea, retroviruses and some plant viruses. All known aspartic peptidases are endopeptidases. A water molecule, activated by two aspartic acid residues, acts as the nucleophile in catalysis. Aspartic peptidases can be grouped into five clans, each of which shows a unique structural fold [ ]. Peptidases in clan AA are either bilobed (family A1 or the pepsin family) or are a homodimer (all other families in the clan, including retropepsin from HIV-1/AIDS) [ ]. Each lobe consists of a single domain with a closed β-barrel and each lobe contributes one Asp to form the active site. Most peptidases in the clan are inhibited by the naturally occurring small-molecule inhibitor pepstatin []. Clan AC contains the single family A8: the signal peptidase 2 family. Members of the family are found in all bacteria.
Signal peptidase 2 processes the premurein precursor, removing the signal peptide. The peptidase has four transmembrane domains and the active site is on the periplasmic side of the cell membrane. Cleavage occurs on the amino side of a cysteine where the thiol group has been substituted by a diacylglyceryl group. Site-directed mutagenesis has identified two essential aspartic acid residues which occur in the motifs GNXXDRX and FNXAD (where X is a hydrophobic residue) [ ]. No tertiary structures have been solved for any member of the family, but because of the intramembrane location, the structure is assumed not to be pepsin-like. Clan AD contains two families of transmembrane endopeptidases: A22 and A24. These are also known as "GXGD peptidases" because of a common GXGD motif which includes one of the pair of catalytic aspartic acid residues. Structures are known for members of both families and show a unique, common fold with up to nine transmembrane regions [ ]. The active site aspartic acids are located within a large cavity in the membrane into which water can gain access []. Clan AE contains two families, A25 and A31. Tertiary structures have been solved for members of both families and show a common fold consisting of an α-β-α sandwich, in which the β sheet is five stranded [ , ]. Clan AF contains the single family A26. Members of the clan are membrane proteins with a unique fold. Homologues are known only from bacteria. The structure of omptin (also known as OmpT) shows a cylindrical barrel containing ten β strands inserted in the membrane with the active site residues on the outer surface [ ]. There are two families of aspartic peptidases for which neither structure nor active site residues are known and these are not assigned to clans. Family A5 includes thermopsin, an endopeptidase found only in thermophilic archaea.
Family A36 contains sporulation factor SpoIIGA, which is known to process and activate sigma factor E, one of the transcription factors that controls sporulation in bacteria [ ].
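The two active-site motifs given in this entry for type IV prepilin peptidase can be turned into regular expressions once the residue classes are fixed. The sketch below is illustrative only: the entry does not say which residues count as hydrophobic or charged, so the `HYDROPHOBIC` and `CHARGED` sets are assumptions, and the helper `motif_to_regex` and the sequence `toy` are hypothetical.

```python
import re

HYDROPHOBIC = "AVLIMFWY"   # assumption: the entry does not define this set
CHARGED = "DEKR"           # assumption: the entry does not define this set

# Map each motif token to a one-residue regex fragment.
CLASSES = {
    "Xaa": f"[{HYDROPHOBIC}]",  # hydrophobic
    "Xbb": f"[{CHARGED}]",      # charged
    "Xcc": "[A-Z]",             # any amino acid
    "Asp": "D", "Gly": "G", "Lys": "K", "Pro": "P",
}

def motif_to_regex(motif):
    """Translate a dash-separated motif into a compiled regex."""
    return re.compile("".join(CLASSES[token] for token in motif.split("-")))

MOTIF_1 = motif_to_regex("Xaa-Xaa-Asp-Xaa-Xbb-Xcc-Xcc-Xcc-Xaa-Pro")
MOTIF_2 = motif_to_regex("Xaa-Gly-Xcc-Gly-Asp-Xaa-Lys-Xaa-Xaa-Xaa")

# Invented toy sequence with one instance of each motif.
toy = "MS" + "LIDAKGSTLP" + "QQ" + "AGSGDLKVIL"
first_site = MOTIF_1.search(toy)
second_site = MOTIF_2.search(toy)
```

Changing the two residue sets changes which sequences match, so any real use would need class definitions taken from the cited literature rather than from this sketch.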
Protein Domain
Name: Leghaemoglobin, iron-binding site
Type: Binding_site
Description: Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well-studied family that is widely distributed in many organisms [ ]. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors []. Several functionally different haemoglobins can coexist in the same species. The major types of globins include: Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates [ ]. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors []. Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle [ ]. Neuroglobin: a myoglobin-like haemoprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia [ ]. Neuroglobin belongs to a branch of the globin family that diverged early in evolution. Cytoglobin: an oxygen sensor expressed in multiple tissues.
Related to neuroglobin [ ]. Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers [ ]. Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation. Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants [ ]. Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin []. Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression [ , ]. Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors [ ]. Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features [ ]. This entry is found in leghaemoglobins from leguminous and non-leguminous plants, and also non-symbiotic haemoglobins from other plants. Leghaemoglobins were first identified in the root nodules of leguminous plants, where they are crucial for supplying sufficient oxygen to root nodule bacteria for nitrogen fixation to occur [ ].
Although leghaemoglobin and myoglobin share a common fold, and both regulate the facilitated diffusion of oxygen, leghaemoglobins regulate oxygen affinity through a mechanism different from that of myoglobin, using a novel combination of haem pocket amino acids that lower the oxygen affinity [, ]. In non-leguminous plants, leghaemoglobins play a role in the respiratory metabolism of root cells. The structure of leghaemoglobins is similar to that of haemoglobins and myoglobins, although there is little sequence conservation. The proteins are largely α-helical, eight helices providing the scaffold for a well-defined haem-binding pocket. By contrast with the tetrameric mammalian globin assembly, the plant form is monomeric. Non-symbiotic haemoglobins (NsHb) play important roles in a variety of cellular processes. A class I NsHb from cotton plants can be induced in plant roots as a defence mechanism against pathogen invasion, possibly by modulating nitric oxide (NO) levels [ ]. Several NsHbs appear to play a role in NO scavenging in plants, indicating that the primordial function of haemoglobins may well be to protect against nitrosative stress and to modulate NO signalling functions []. The signature pattern of this entry exclusively identifies plant haemoglobin sequences. It is centred on a histidine that acts as the haem iron distal ligand.
Protein Domain
Name: Peptidase C54
Type: Family
Description: This is a group of cysteine peptidases which constitute MEROPS peptidase family C54 (Aut2 peptidase family, clan CA). A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [ ]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them.
One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases []. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [ ]. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [ ]. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases.
Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: Peptidase C11, Clostripain Clostridium species
Type: Family
Description: Clostripain is a cysteine protease characterised from Clostridium histolyticum, and also known from Clostridium perfringens. It is a heterodimer processed from a single precursor polypeptide, using a specific Arg-|-Xaa cleavage. The older term alpha-clostripain refers to the most active, most reduced form, rather than to the product of one of several different genes. This group of cysteine peptidases belongs to the MEROPS peptidase family C11 (clostripain family, clan CD). A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [ ]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold.
There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [ ]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues []. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides.
The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [ ]. The active site consists of a His/Cys catalytic dyad.
Protein Domain
Name: Peptidase C11, clostripain
Type: Family
Description: This group of cysteine peptidases belongs to the MEROPS peptidase family C11 (clostripain family, clan CD). A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [ ]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them.
One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [ ]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [ ]. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [ ]. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases.
Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: 2-aminoethylphosphonate ABC transport system, permease protein, putative
Type: Family
Description: ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [ ]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop: Walker A, and a magnesium binding site: Walker B.
Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region which contains a histidine loop, postulated to polarise the attaching water molecule for hydrolysis, the signature conserved motif (LSGGQ) specific to the ABC transporter, and the Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [, , ]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [ , , , , , ]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [ ]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [, ]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [, , ]. The enzyme phosphonatase catalyses the degradation of 2-aminoethylphosphonate (AEP) in bacteria.
This allows them to metabolise a range of organophosphonate compounds, including 2-aminoethylphosphonate, as a sole source of carbon, energy and phosphorus for growth [ ]. The C-P bond in phosphonoacetaldehyde (Pald) is hydrolysed and a bi-covalent Lys53ethylenamine/Asp12 aspartylphosphate intermediate is formed []. This step can also be catalysed by C-P lyase [], with some bacteria having the genes for both pathways and some only for one of them. The 2-aminoethylphosphonate ABC transport system functions in the transport of 2-aminoethylphosphonate across the membrane for utilisation in the bacterial cell []. This entry represents a putative permease protein in the 2-aminoethylphosphonate ABC transport system [ ].
Protein Domain
Name: PotA, ATP-binding domain
Type: Domain
Description: ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [ ]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop: Walker A, and a magnesium binding site: Walker B.
Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region which contains a histidine loop, postulated to polarise the attaching water molecule for hydrolysis, the signature conserved motif (LSGGQ) specific to the ABC transporter, and the Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [, , ]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [ , , , , , ]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [ ]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [, ]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [, , ]. PotA is a bacterial protein that imports putrescine and spermidine [ , , ]. Spermidine is a polyamine involved in cellular metabolism that can be used to stimulate the enzyme T7 RNA polymerase.
Putrescine attacks S-adenosyl methionine and converts it to spermidine. PotA has two domains, with the N-terminal domain containing the ATPase activity and the residues required for homodimerization with PotA and heterodimerization with PotB. This entry represents the N-terminal domain of PotA.
Protein Domain
Name: Globin, lamprey/hagfish type
Type: Family
Description: Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well-studied family that is widely distributed in many organisms [ ]. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors []. Several functionally different haemoglobins can coexist in the same species. The major types of globins include: Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates [ ]. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors []. Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle [ ]. Neuroglobin: a myoglobin-like haemoprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia [ ]. Neuroglobin belongs to a branch of the globin family that diverged early in evolution. Cytoglobin: an oxygen sensor expressed in multiple tissues.
Related to neuroglobin [ ]. Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers [ ]. Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation. Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants [ ]. Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin []. Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression [ , ]. Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors [ ]. Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features [ ]. Lampreys have haemoglobins with self-association and ligand-binding properties that are very different from those characteristic of the tetrameric Hbs of higher vertebrates [ ]. Monomeric, ligated lamprey Hb self-associates to dimers and tetramers on deoxygenation; dissociation to monomers on oxygenation accounts for the cooperative binding of O(2) and its pH dependence [].
Adult erythrocytes of Mordacia mordax, a southern hemisphere lamprey, contain three monomeric haemoglobins that are closely related to hagfish haemoglobins []. The 3D structures of a great number of vertebrate Hbs in various states are known. The protein is largely α-helical, eight conserved helices (A to H) providing the scaffold for a well-defined haem-binding pocket. The imidazole ring of the "proximal" His residue provides the fifth haem iron ligand; the other axial haem iron position remains essentially free for O(2) coordination. The crystal structure of deoxygenated lamprey haemoglobin V has been determined by molecular replacement to 2.7 Å resolution, in a crystal form with twelve protomers in the asymmetric unit [ ]. The subunits are arranged as identical dimers, their interface comprising non-polar interactions and a cluster of four glutamate residues contributed by the E helices and the AB corner; the Bohr effect seems to result from proton uptake by the interfacial glutamate residues []. By contrast with human and mollusc Hbs, where modulation of function results primarily from proximal effects, regulation of oxygen affinity in lamprey Hb V seems to depend on changes at the distal (ligand-binding) side of the haem group [].
Protein Domain
Name: Peptidase C23
Type: Domain
Description: A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families []. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp).
There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [ ]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [ ]. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [ ]. The active site consists of a His/Cys catalytic dyad. This group of cysteine peptidases belongs to the MEROPS peptidase family C23 (clan CA). The type example is Carlavirus (apple stem pitting virus) endopeptidase, which is thought to play a role in the post-translational cleavage of the high molecular weight primary translation products of the virus.
Protein Domain
Name: Peptidase C42, beet yellows virus-type papain-like endopeptidase C42
Type: Domain
Description: This group of cysteine peptidases corresponds to MEROPS peptidase family C42. The type example is beet yellows virus-type papain-like endopeptidase (beet yellows virus) [ ]. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [ ]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them.
One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [].Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [ ]. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds.Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other.Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [ ]. The active site consists of a His/Cys catalytic dyad.Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. 
Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: Melanin-concentrating hormone receptor 1
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Melanin-concentrating hormone (MCH) is a cyclic peptide originally identified in teleost fish. 
In fish, MCH is released from the pituitary and causes lightening of skin pigment cells through pigment aggregation. In mammals, MCH is predominantly expressed in the hypothalamus, and functions as a neurotransmitter in the control of a range of functions. A major role of MCH is thought to be in the regulation of feeding: injection of MCH into rat brains stimulates feeding; expression of MCH is upregulated in the hypothalamus of obese and fasting mice; and mice lacking MCH are lean and eat less. MCH and alpha melanocyte-stimulating hormone (alpha-MSH) have antagonistic effects on a number of physiological functions. Alpha-MSH darkens pigmentation in fish and reduces feeding in mammals, whereas MCH increases feeding. MCH receptor 1 (MCHR1, previously known as SLC1) is a class I GPCR. Expression of the receptor has been found at highest levels in the brain, with moderate levels in the eye and skeletal muscle, and lower levels in the tongue and pituitary. In the brain, the receptor is expressed extensively in the hippocampus, olfactory regions and medial nucleus accumbens, a distribution that corresponds to connections between MCH-containing neurons and areas of the brain involved in taste and olfaction. The receptor is also found in parts of the hypothalamus, such as the ventromedial nucleus, that are known to regulate feeding and metabolism. The MCH receptor is expressed at moderate levels in the substantia nigra, ventral tegmental area and amygdala, suggesting that MCH may modulate the dopaminergic system. Expression has also been found in the locus coeruleus, indicating a possible role in the control of noradrenaline responses, including vigilance, attention, memory and sleep. 
Binding of MCH to the receptor results in inhibition of forskolin-stimulated cyclic AMP accumulation in a pertussis toxin sensitive manner, release of intracellular calcium in a partially pertussis toxin sensitive manner and activation of MAP kinase in a partially protein kinase C dependent manner. This indicates that the MCH receptor is capable of coupling to G proteins of the Gi, Go and Gq classes.
Protein Domain
Name: Potassium channel, inwardly rectifying, Kir1.2
Type: Family
Description: Potassium channels are the most diverse group of the ion channel family. They are important in shaping the action potential, and in neuronal excitability and plasticity. The potassium channel family is composed of several functionally distinct isoforms, which can be broadly separated into 2 groups: the practically non-inactivating 'delayed' group and the rapidly inactivating 'transient' group. These are all highly similar proteins, with only small amino acid changes causing the diversity of the voltage-dependent gating mechanism, channel conductance and toxin binding properties. Each type of K+ channel is activated by different signals and conditions depending on its type of regulation: some open in response to depolarisation of the plasma membrane; others in response to hyperpolarisation or an increase in intracellular calcium concentration; some can be regulated by binding of a transmitter, together with intracellular kinases; while others are regulated by GTP-binding proteins or other second messengers. In eukaryotic cells, K+ channels are involved in neural signalling and generation of the cardiac rhythm, act as effectors in signal transduction pathways involving G protein-coupled receptors (GPCRs) and may have a role in target cell lysis by cytotoxic T-lymphocytes. In prokaryotic cells, they play a role in the maintenance of ionic homeostasis. All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two copies of a highly conserved pore loop domain (P-domain). The P-domain contains the sequence (T/SxxTxGxG), which has been termed the K+ selectivity sequence. In families that contain one P-domain, four subunits assemble to form a selective pathway for K+ across the membrane. However, it remains unclear how the 2 P-domain subunits assemble to form a selective pore. 
The functional diversity of these families can arise through homo- or hetero-associations of alpha subunits or association with auxiliary cytoplasmic beta subunits. K+ channel subunits containing one pore domain can be assigned into one of two superfamilies: those that possess six transmembrane (TM) domains and those that possess only two TM domains. The six TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated (Kv) channels; the KCNQ channels (originally known as KvLQT channels); the EAG-like K+ channels; and three types of calcium (Ca)-activated K+ channels (BK, IK and SK). The 2TM domain family comprises inward-rectifying K+ channels. In addition, there are K+ channel alpha-subunits that possess two P-domains. These are usually highly regulated K+-selective leak channels. Inwardly-rectifying potassium channels (Kir) are the principal class of two-TM domain potassium channels. They are characterised by the property of inward-rectification, which is described as the ability to allow large inward currents and smaller outward currents. Inwardly rectifying potassium channels (Kir) are responsible for regulating diverse processes including: cellular excitability, vascular tone, heart rate, renal salt flow, and insulin release. To date, around twenty members of this superfamily have been cloned, which can be grouped into six families by sequence similarity, and these are designated Kir1.x-6.x. Cloned Kir channel cDNAs encode proteins of approximately 370-500 residues; both N- and C-termini are thought to be cytoplasmic, and the N terminus lacks a signal sequence. Kir channel alpha subunits possess only 2TM domains linked with a P-domain. Thus, Kir channels share similarity with the fifth and sixth domains, and P-domain of the other families. 
It is thought that four Kir subunits assemble to form a tetrameric channel complex, which may be hetero- or homomeric. The Kir1.2 channel (also known as Kir4.1, ROMK2, BIR10, KAB-2 and BIRK1) is found principally in the brain, where it is widely distributed. It has also been found on satellite cells of the rat cochlear ganglia, and at low levels within the kidney. Kir1.2 channel activity depends on phosphorylation state and involves two serine residues within the C terminus that are likely targets of protein kinase A (PKA). Kir1.2 has recently been shown to interact with a novel PDZ domain-containing protein, CIPP.
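The K+ selectivity sequence quoted above, T/SxxTxGxG, can be turned into a simple pattern search. The sketch below is illustrative only: it assumes that "T/S" means either threonine or serine and that "x" stands for any single residue, and both the function name and the toy sequence are hypothetical rather than taken from this database.

```python
import re

# Assumption: "T/SxxTxGxG" is read as [T or S], any, any, T, any, G, any, G.
# A lookahead is used so that overlapping occurrences are also reported.
SELECTIVITY_MOTIF = re.compile(r"(?=([TS]..T.G.G))")


def find_selectivity_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each motif hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in SELECTIVITY_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence invented for illustration; it is not a real channel protein.
    toy = "MKLTTQTTIGYGAAA"
    for start, hit in find_selectivity_motifs(toy):
        print(start, hit)
```

A hit from such a scan only flags a candidate P-domain; it says nothing about whether the protein actually forms a K+-selective pore.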
Protein Domain
Name: Peptidase C8, hypovirus
Type: Family
Description: This group of cysteine peptidases belongs to MEROPS peptidase family C8 (clan CA). The peptidases are encoded by the double-stranded viral RNAs belonging to the genus Hypovirus. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families. Clans CF, CM, CN, CO, CP and PD each contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. 
One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. 
Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: AP-5 complex subunit zeta-1
Type: Family
Description: AP-5 complex subunit zeta-1 is a component of the fifth Adaptor-Protein complex (AP-5). Adaptor protein (AP) complexes facilitate the trafficking of cargo from one membrane compartment of the cell to another by recruiting other proteins to particular types of vesicles. AP-5 is involved in trafficking proteins from endosomes towards other membranous compartments. There are genetic links between AP-5 and hereditary spastic paraplegia, a group of human genetic disorders characterised by progressive spasticity in the lower limbs.
Protein Domain
Name: ABC/ECF transporter, transmembrane component
Type: Family
Description: ECF (energy-coupling factor) transporters are a subgroup of ABC (ATP-binding cassette) transporters involved in the uptake of vitamins and micronutrients in prokaryotes. ECF transporters are protein complexes consisting of a conserved module (two peripheral ATPases and the integral membrane protein EcfT) and a non-conserved integral membrane protein responsible for substrate specificity (S-component). This entry represents the transmembrane component from a number of ECF transporters, including cobalt-specific transporter CbiQ and nickel-specific transporter NikQ. It also includes uncharacterised eukaryotic proteins.
Protein Domain
Name: Sugar transporter, conserved site
Type: Conserved_site
Description: The sugar transporters belong to a superfamily of membrane proteins responsible for the binding and transport of various carbohydrates, organic alcohols, and acids in a wide range of prokaryotic and eukaryotic organisms. These integral membrane proteins are predicted to comprise twelve membrane spanning domains. It is likely that the transporters have evolved from an ancient protein present in living organisms before the divergence into prokaryotes and eukaryotes. In mammals, these proteins are expressed in a number of organs.
Protein Domain
Name: Upf1-like, C-terminal helicase domain
Type: Domain
Description: This entry represents the C-terminal helicase domain of Upf1-like family helicases. The Upf1-like helicase family includes UPF1, HELZ, Mov10L1, Aquarius, IGHMBP2 (SMUBP2), and similar proteins. They are DEAD-like helicases belonging to superfamily (SF)1, a diverse family of proteins involved in ATP-dependent RNA or DNA unwinding. Similar to SF2 helicases, SF1 helicases do not form toroidal structures like SF3-6 helicases. Their helicase core consists of two similar protein domains that resemble the fold of the recombination protein RecA.
Protein Domain
Name: Autotransporter, YhjY, predicted
Type: Family
Description: This group represents a predicted autotransporter, YhjY type. Secretion of protein products occurs by a number of different pathways in bacteria. One of these pathways, known as the type IV pathway, was first described for the IgA1 protease. The protein component that mediates secretion through the outer membrane is contained within the secreted protein itself, hence the proteins secreted in this way are called autotransporters. yhjY is part of the RpoS regulon, which is involved in DNA repair.
Protein Domain
Name: Long-chain acyl-[acyl-carrier-protein] reductase
Type: Family
Description: This entry represents long-chain acyl-[acyl-carrier-protein] reductase from Synechococcus elongatus (ARR) and similar proteins from Cyanobacteria. ARR reduces a long-chain (mainly C16 or C18) fatty acyl ACP ester to its corresponding fatty aldehyde, releasing the acyl carrier protein (ACP). NADPH is the reductant for this reaction. Its structure shows that the protein is composed of three domains, with the mid-domain designated as the dinucleotide recognition loop. This enzyme may be distantly related to the short-chain dehydrogenase or reductase (SDR) family.
Protein Domain
Name: CoA enzyme activase
Type: Domain
Description: This domain is found in a set of closely related proteins including the (R)-2-hydroxyglutaryl-CoA dehydratase activase of Acidaminococcus fermentans, in longer proteins from Methanocaldococcus jannaschii (Methanococcus jannaschii) and Methanobacterium thermoautotrophicum that share an additional N-terminal domain, in a protein described as a subunit of the benzoyl-CoA reductase of Rhodopseudomonas palustris, and in two repeats of an uncharacterised protein of Aquifex aeolicus. This domain may be involved in generating or regenerating the active sites of enzymes related to (R)-2-hydroxyglutaryl-CoA dehydratase and benzoyl-CoA reductase.
Protein Domain
Name: BTG-like domain superfamily
Type: Homologous_superfamily
Description: This entry represents a conserved domain found in the N-terminal region of BTG family members (also known as anti-proliferative proteins). In mammals, the BTG family comprises six proteins: BTG1, BTG2/PC3/Tis21, BTG3/ANA, BTG4/PC3B, Tob1/Tob and Tob2. They regulate cell cycle progression in a variety of cell types. These proteins range from 158 to 363 amino acid residues, are highly similar, and include 3 conserved cysteine residues. BTG2 seems to have a signal sequence, while the other proteins may lack such a domain.
Protein Domain
Name: Beta-1,3-glucan recognition protein, C-terminal
Type: Domain
Description: Beta 1,3-glucan recognition proteins (GRP, also called Gram-negative bacteria binding proteins or GNBPs) have specific affinity for beta 1,3-glucan, a component on the surface of fungi and bacteria. Beta-GRP (beta-1,3-glucan recognition protein) is one of several pattern recognition receptors (PRRs), also referred to as biosensor proteins, that complexes with pathogen-associated beta-1,3-glucans and then transduces signals necessary for activation of an appropriate innate immune response. They are present in insects and lack all catalytic residues.
Protein Domain
Name: Olfactory marker superfamily
Type: Homologous_superfamily
Description: Olfactory marker protein (OMP) is a highly expressed, cytoplasmic protein found in mature olfactory sensory receptor neurons of all vertebrates. OMP is a modulator of the olfactory signal transduction cascade. The crystal structure of OMP reveals a beta sandwich consisting of eight strands in two sheets with a jelly-roll topology. Three highly conserved regions have been identified as possible protein-protein interaction sites in OMP, indicating a possible role for OMP in modulating such interactions, thereby acting as a molecular switch.
Protein Domain
Name: TraG, N-terminal, Proteobacteria
Type: Domain
Description: This domain is found in the N-terminal region of the TraG protein from Escherichia coli. This is a membrane-spanning protein, with three predicted transmembrane segments and two periplasmic regions. The TraG protein is known to be essential for DNA transfer in the process of conjugation, with the N-terminal portion being required for F pilus assembly. The protein is thought to interact with the periplasmic domain of TraN to stabilise mating-cell interactions.
Protein Domain
Name: Phosphate-starvation-induced PsiE-like
Type: Family
Description: This entry represents the phosphate-starvation-inducible protein (PsiE) and similar proteins. This entry is sometimes found in proteins that contain other domains such as the protoglobin domain. Phosphate-starvation-inducible E (PsiE) expression is under direct positive and negative control by PhoB and cAMP-CRP, respectively. PsiE is an integral membrane protein with four transmembrane helices. The second alpha helix contains a conserved glutamic acid residue and the third helix contains a conserved arginine residue. The function of PsiE remains to be determined.
Protein Domain
Name: AttH domain
Type: Domain
Description: This domain is found in bacterial proteins that contain the AttH-like fold, characterised by two flattened, orthogonally packed β-barrels of lipocalin-like topology. Proteins containing this domain include the protein also known as NE1406, an all-beta protein with an AttH-like fold. Proteins containing this domain also include kievitone hydratase from the fungus Fusarium solani. Kievitone hydratase converts kievitone to the less toxic kievitone hydrate, and thereby protects this pathogenic fungus against this antimicrobial phytoalexin produced by Phaseolus vulgaris (French bean).
Protein Domain
Name: FMP27, sixth RBG unit
Type: Domain
Description: Fungal proteins FMP27 (also known as Hob1) and Hob2 (YPR117W) are tube-forming lipid transport proteins which bind to phosphatidylinositols and affect phosphatidylinositol-4,5-bisphosphate (PtdIns-4,5-P2) distribution. They belong to the repeating β-groove (RBG) superfamily together with the VPS13, ATG2, SHIP164 and Csf1/BLTP1 proteins, which are all conserved lipid transfer proteins containing long hydrophobic grooves. They all share the same structure consisting of multiple repeating modules comprising five β-sheets followed by a loop. This entry represents the sixth RBG unit within FMP27.
Protein Domain
Name: Histone H1-like Hc1
Type: Family
Description: This entry represents a family that includes Histone H1-like protein HC1 from Chlamydia pneumoniae and similar proteins from bacteria and some archaeal species. The gene coding for HC1 is expressed only during the late stages of the chlamydial life cycle concomitant with the reorganisation of chlamydial reticulate bodies into elementary bodies, suggesting that the HC1 protein plays a role in the condensation of chlamydial chromatin during intracellular differentiation, in a role analogous to that of eukaryotic histone proteins.
Protein Domain
Name: Micronemal adhesive repeat, sialic-acid binding
Type: Repeat
Description: This entry represents a novel carbohydrate-binding domain found on micronemal proteins. Micronemal proteins (MICs) are released onto the parasite surface just before invasion of host cells and play important roles in host cell recognition, attachment and penetration. Toxoplasma gondii can infect and replicate within all nucleated cells. This domain interacts with sialylated oligosaccharides; the protein in T. gondii is a monomer but several MAR domains are carried on the protein. Each MAR domain contains one central sialic acid-binding pocket.
Protein Domain
Name: SUMO-activating enzyme subunit Uba2
Type: Family
Description: The SUMO-activating E1 enzyme facilitates conjugation of ubiquitin and ubiquitin-like proteins through adenylation, thioester transfer within E1, and thioester transfer from E1 to E2 conjugating proteins. This entry represents a subunit of the E1 enzyme, including Uba2 from yeasts and its homologue, SAE2 from animals. E1 enzymes mediate ATP-dependent activation of SUMO proteins followed by formation of a thioester bond between a SUMO protein and a conserved active site cysteine residue on UBA2/SAE2.
Protein Domain
Name: ATPase, type I secretion system
Type: Family
Description: Type I protein secretion is a system in some Gram-negative bacteria to export proteins (often proteases) across both inner and outer membranes to the extracellular medium. This is one of three proteins of the type I secretion apparatus. Targeted proteins are not cleaved at the N terminus, but rather carry signals located toward the extreme C terminus to direct type I secretion. This model is related to models and , and to bacteriocin ABC transporters that cleave their substrates during export.
Protein Domain
Name: Type III secretion system effector delivery regulator TyeA-related
Type: Domain
Description: This entry represents a sequence region found in both small proteins of about 90 amino acids where it covers the whole sequence, and longer proteins of about 360 residues where it occurs in the C-terminal region. Some of the longer proteins (HrpJ) have N-terminal regions that contain . These proteins belong to bacterial type III secretion systems, and include TyeA from the well-studied Yersinia systems. TyeA appears to be involved in calcium-responsive regulation of the delivery of type III effectors.
Protein Domain
Name: Phosphotyrosine interaction domain containing 1
Type: Family
Description: PID1 (PTB-containing, cubilin and LRP1-interacting protein; also known as NYGGF4) is a phosphotyrosine-binding (PTB) domain-containing protein. It is an inhibitor of insulin-mediated signaling in adipocytes and muscle cells. It binds through its PTB domain to the second NPXY motif in the cytoplasmic tail of the low density lipoprotein receptor-related protein 1 (LRP1). Besides being involved in obesity-associated insulin resistance, PID1 has been shown to inhibit growth of medulloblastoma, glioblastoma and atypical teratoid rhabdoid tumour cell lines.
Protein Domain
Name: Polysaccharide chain length determinant N-terminal domain
Type: Domain
Description: A number of related proteins are involved in the synthesis of lipopolysaccharide, O-antigen polysaccharide, capsule polysaccharide and exopolysaccharides. Chain length determinant protein (or Wzz protein) is involved in lipopolysaccharide (LPS) biosynthesis, conferring a modal distribution of chain length on the O-antigen component of LPS. It gives rise to a reduced number of short chain molecules and increases in numbers of longer molecules, with a modal value of 20. The MPA/MPA2 proteins function in CPS and EPS polymerisation and export.
Protein Domain
Name: MyBP-C, tri-helix bundle domain
Type: Domain
Description: This domain can be found in the myosin-binding motif (m-domain) region present in myosin-binding protein C (MyBP-C). MyBP-C is a sarcomeric assembly protein necessary for the regulation of sarcomere structure and function. The MyBP-C family of proteins consists mainly of modules with immunoglobulin (Ig) or fibronectin folds. This domain exhibits a three-helix bundle fold, and a known actin-binding motif, LK(R/K)XK, is positioned in the third helix (alpha3), similar to that found in villin and related proteins.
Protein Domain
Name: Elongin-C
Type: Family
Description: Elongin-C is a highly conserved protein found in a variety of multiprotein complexes in human, rat, fly, worm, and yeast cells. Budding yeast elongin-C homologue, Elc1, forms a complex with Cul3 that is required for Pol II polyubiquitylation and degradation. Elc1 also plays a role in global genomic repair. In humans, elongin-C works as an adapter protein in the proteasomal degradation of target proteins via different E3 ubiquitin ligase complexes, including the von Hippel-Lindau ubiquitination complex CBC(VHL).
Protein Domain
Name: UPF0166
Type: Family
Description: UPF0166 protein family includes TM_0021 from Thermotoga maritima and other proteins predominantly found in bacteria but also in some archaea. TM_0021 is a putative PII-like signaling protein thought to be involved in the regulation of the nitrogen status in this organism. The protein adopts a trimeric assembly. Each monomer folds into an α+β sandwich with a four-stranded antiparallel β-sheet packed against two antiparallel α-helices resembling a ferredoxin-like fold, with a short β-hairpin inserted before the first α-helix.
Protein Domain
Name: Bacteriophage PRD1, P2, C-terminal
Type: Homologous_superfamily
Description: The adsorption protein P2 (synonym: receptor-binding protein P2) from the bacteriophage PRD1 is a multi-β-sheet protein whose complicated topology forms an elongated seahorse-shaped molecule with a distinct head, containing a pseudo-β-propeller structure with approximate 6-fold symmetry, and a tail (β-sandwich). It is required for the attachment of the phage to the host conjugative DNA transfer complex. This is a poorly understood large transmembrane complex of unknown architecture, with at least 11 different proteins. This entry represents the pseudo-β-propeller region.
Protein Domain
Name: Bacteriophage PRD1, P2, N-terminal
Type: Homologous_superfamily
Description: The adsorption protein P2 (synonym: receptor-binding protein P2) from the bacteriophage PRD1 is a multi-β-sheet protein whose complicated topology forms an elongated seahorse-shaped molecule with a distinct head, containing a pseudo-β-propeller structure with approximate 6-fold symmetry, and a tail (β-sandwich). It is required for the attachment of the phage to the host conjugative DNA transfer complex. This is a poorly understood large transmembrane complex of unknown architecture, with at least 11 different proteins. This entry represents the β-sandwich region.
Protein Domain
Name: STXBP6, SNARE domain
Type: Domain
Description: Syntaxin binding protein 6 (STXBP6, also called Amisyn) contains, besides the N-terminal PH-like domain, a C-terminal R-SNARE-like domain that allows it to assemble into SNARE complexes, which in turn renders the complexes inactive and inhibits exocytosis. SNARE complexes mediate membrane fusion, important for trafficking of newly synthesized proteins, recycling of pre-existing proteins and organelle formation. SNARE proteins are classified into four groups, Qa-, Qb-, Qc- and R-SNAREs, with STXBP6 being an R-SNARE. This entry represents the SNARE domain of STXBP6.
Protein Domain
Name: Costars domain
Type: Domain
Description: This domain is found both alone (in the costars family of proteins) and at the C terminus of actin-binding Rho-activating protein (ABRA). It binds to actin and, in muscle, regulates the actin cytoskeleton and cell motility. It has a winged helix-like fold consisting of three α-helices and four antiparallel β-strands. Unlike typical winged helix proteins, it does not bind DNA, but contains a hydrophobic groove that may be responsible for interaction with other proteins.
Protein Domain
Name: CPH domain
Type: Domain
Description: The CPH domain is found in the Cullin-7, PARC and HERC2 proteins, which are all components of known or predicted E3-ubiquitin ligases. The CPH domain is a protein-protein interaction module that binds the tetramerisation domain of the tumour suppressor protein p53. Structurally, it forms a β-barrel fold similar to the SH3, Tudor and KOW domains. Unlike the SH3 and Tudor domains, which bind to small peptides, the CPH domain appears to bind to an extended surface on p53.
Protein Domain
Name: YlxR-like superfamily
Type: Homologous_superfamily
Description: This entry represents the YlxR protein of unknown function from Streptococcus pneumoniae. YlxR is expressed from the nusA/infB operon in bacteria and belongs to a small protein family (COG2740) that shares a conserved sequence motif, GRGA(Y/W). The YlxR structure resembles a two-layer α/β sandwich with the overall shape of a cylinder. Structural analysis revealed that the YlxR structure represents a new protein fold that belongs to the α-β plait superfamily. YlxR is thought to be an RNA-binding protein.
Protein Domain
Name: Humanin family
Type: Family
Description: Humanin (HN) and humanin-like proteins are found exclusively in humans. Humanin is a short anti-apoptotic peptide that inhibits the activation of Bax (Bcl-2-associated X protein), which is involved in apoptosis. Humanin suppresses neuronal cell death caused by Alzheimer's disease (AD)-specific insults. Humanin also interacts with the insulin-like growth factor-binding protein-3 (IGFBP-3), which is essential in the regulation of cell survival. This entry represents the humanin family, and also matches some uncharacterised non-human proteins, mainly from chimpanzee.
Protein Domain
Name: SETD3, SET domain
Type: Domain
Description: This entry represents the SET domain found in SETD3 and related proteins. SETD3 is a protein-histidine N-methyltransferase that specifically mediates methylation of actin at 'His-73'. It was initially reported to have histone methyltransferase activity and to methylate 'Lys-4' and 'Lys-36' of histone H3 (H3K4me and H3K36me). However, this conclusion was based on mass spectrometry data in which the mass shifts were inconsistent with a bona fide methylation event. In vitro, the protein-lysine methyltransferase activity is weak compared to the protein-histidine methyltransferase activity.
Protein Domain
Name: Baculovirus occlusion-derived virus envelope, E25
Type: Family
Description: This family consists of several nucleopolyhedrovirus occlusion-derived virus envelope E25 proteins. The N terminus of this protein is extremely hydrophobic; studies suggest that this defined hydrophobic domain is sufficient to direct the protein to induced membrane microvesicles within a baculovirus-infected cell nucleus and to the viral envelope. In addition, movement of the protein into the nuclear envelope may initiate through cytoplasmic membranes, such as the endoplasmic reticulum, and transport into the nucleus may be mediated through the outer and inner nuclear membrane.
Protein Domain
Name: GPCR, family 2-like, transmembrane domain
Type: Domain
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The secretin-like GPCRs include receptors for secretin, calcitonin, parathyroid hormone/parathyroid hormone-related peptides and vasoactive intestinal peptide, all of which activate adenylyl cyclase and the phosphatidyl-inositol-calcium pathway. These receptors contain seven transmembrane regions, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins (however, there is no significant sequence identity between these families; the secretin-like receptors thus bear their own unique '7TM' signature). Their N terminus is probably located on the extracellular side of the membrane and potentially glycosylated.
This N-terminal region contains a long conserved region that allows the binding of large peptidic ligands such as glucagon, secretin, VIP and PACAP; this region contains five conserved cysteine residues that could be involved in disulphide bonds. The C-terminal region of these receptors is probably cytoplasmic. Every receptor gene in this family is encoded on multiple exons, and several of these genes are alternatively spliced to yield functionally distinct products. This entry represents the transmembrane domain of family 2 GPCR receptor proteins and Frizzled proteins.
Protein Domain
Name: Proteinase inhibitor I25, cystatin, conserved site
Type: Conserved_site
Description: Cystatins are cysteine proteinase inhibitors belonging to MEROPS inhibitor family I25, clan IH. They mainly inhibit peptidases belonging to peptidase families C1 (papain family) and C13 (legumain family). The cystatin family includes:
  • The Type 1 cystatins, which are intracellular cystatins that are present in the cytosol of many cell types, but can also appear in body fluids at significant concentrations. They are single-chain polypeptides of about 100 residues, which have neither disulphide bonds nor carbohydrate side chains.
  • The Type 2 cystatins, which are mainly extracellular secreted polypeptides synthesised with a 19-28 residue signal peptide. They are broadly distributed and found in most body fluids.
  • The Type 3 cystatins, which are multidomain proteins. The mammalian representatives of this group are the kininogens. There are three different kininogens in mammals: H- (high molecular mass) and L- (low molecular mass) kininogen, which are found in a number of species, and T-kininogen, which is found only in rat.
  • Unclassified cystatins. These are cystatin-like proteins found in a range of organisms: plant phytocystatins, fetuin in mammals, insect cystatins and a puff adder venom cystatin which inhibits metalloproteases of the MEROPS peptidase family M12 (astacin/adamalysin). In addition, a number of the cystatin-like proteins have been shown to be devoid of inhibitory activity.
All true cystatins inhibit cysteine peptidases of the papain family (MEROPS peptidase family C1), and some also inhibit legumain family enzymes (MEROPS peptidase family C13). These peptidases play key roles in physiological processes, such as intracellular protein degradation (cathepsins B, H and L), are pivotal in the remodelling of bone (cathepsin K), and may be important in the control of antigen presentation (cathepsin S, mammalian legumain).
Moreover, the activities of such peptidases are increased in pathophysiological conditions, such as cancer metastasis and inflammation. Additionally, such peptidases are essential for several pathogenic parasites and bacteria. Thus in animals cystatins not only have capacity to regulate normal body processes and perhaps cause disease when down-regulated, but in other organisms may also participate in defence against biotic and abiotic stress. This entry represents a conserved region found in cystatins which includes five conserved residues proposed to be important for binding to cysteine proteases.
Protein Domain
Name: Mediator complex, subunit Med20
Type: Family
Description: The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11. The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation. The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. 
Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP. Proteins in this entry are Med20 subunits of the Mediator complex; Med20 is found in the non-essential part of the head and is related to the TATA-binding protein (TBP). TBP is a highly conserved RNA polymerase II general transcription factor that binds to the core promoter and initiates assembly of the pre-initiation complex. Human TRF has been shown to associate with an RNA polymerase II-SRB complex.
Protein Domain
Name: U-box domain
Type: Domain
Description: This entry represents the U-box domain. The molecular mechanism underlying the transfer of ubiquitin (Ub) to a substrate consists of three key enzymatic steps. First, ubiquitin itself is adenylated at its C-terminal glycine residue by an activating enzyme (E1). Second, the adenylated Ub forms a covalent linkage to a conjugating enzyme (E2). Finally, a ligating enzyme (E3) recruits both the Ub-charged E2 species and the target protein. There are three classes of E3 enzymes (HECT, RING and U-box), which are distinguished on the basis of their E2-recruiting domains. The U-box and RING classes of E3 ligases act as scaffolding molecules that recruit and colocalize both a Ub-charged E2 and the substrate concomitantly. The recruitment of substrate in these proteins involves protein interaction modules such as WD-40 repeat, TPR and armadillo repeat domains. In addition to a common organisation, the architecture of U-box and RING domains is similar. Both contain a central α-helix flanked by two surface-exposed loops arranged in a cross-brace formation. The structure of RING domains is built around two zinc binding sites that are critical to its stability. In contrast, U-boxes do not bind zinc but have instead evolved networks of hydrogen bonds and salt bridges in the corresponding locations in the structure. Other similarities between these two domains include an antiparallel β-sheet type arrangement involving the first surface-exposed loop and the central α-helix. The β-sheet is stabilised by highly conserved hydrophobic residues responsible for the core packing and stability of the molecule. Most U-box and RING domain structures also contain an elongated C-terminal helix. The physical basis and physiological rationale for evolving distinct U-box and RING E3 ligases are not yet known. The U-box is a domain of ~70 amino acids that is present in proteins from yeast to human.
It consists of the β-β-α-β-α fold typical of U-box and RING domains (see PDB:2QIZ). The central α-helix is flanked by two prominent surface-exposed loop regions. The characteristic network of hydrogen bonds within each loop stabilises the overall structure. U-box proteins appear to catalyse their own ubiquitination as well as that of heterologous substrates.
Protein Domain
Name: Peptidase M24A, methionine aminopeptidase, subfamily 2, binding site
Type: Binding_site
Description: Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation, which is usually zinc but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role. Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases. This group of metallopeptidases belongs to MEROPS peptidase family M24 (clan MG), subfamily M24A. Methionine aminopeptidase (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAPs studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist. While evolutionarily related, they share only a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP, to be involved in cobalt binding.
The first subfamily consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second is made up of archaeal MAP and eukaryotic MAP-2 and includes proteins that do not seem to be MAPs but are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein. This entry represents a cobalt binding site.
Protein Domain
Name: Actin/actin-like conserved site
Type: Conserved_site
Description: Actin is a ubiquitous protein involved in the formation of filaments that are major components of the cytoskeleton. These filaments interact with myosin to produce a sliding effect, which is the basis of muscular contraction and many aspects of cell motility, including cytokinesis. Each actin protomer binds one molecule of ATP and has one high affinity site for either calcium or magnesium ions, as well as several low affinity sites. Actin exists as a monomer in low salt concentrations, but filaments form rapidly as salt concentration rises, with the consequent hydrolysis of ATP. Actin from many sources forms a tight complex with deoxyribonuclease (DNase I), although the significance of this is still unknown. The formation of this complex results in the inhibition of DNase I activity, and actin loses its ability to polymerise. It has been shown that an ATPase domain of actin shares similarity with ATPase domains of hexokinase and hsp70 proteins. In vertebrates there are three groups of actin isoforms: alpha, beta and gamma. The alpha actins are found in muscle tissues and are a major constituent of the contractile apparatus. The beta and gamma actins co-exist in most cell types as components of the cytoskeleton and as mediators of internal cell motility. In plants there are many isoforms, which are probably involved in a variety of functions such as cytoplasmic streaming, cell shape determination, tip growth, graviperception, cell wall deposition, etc. Recently, some divergent actin-like proteins have been identified in several species.
These proteins include centractin (actin-RPV) from mammals, fungi (yeast ACT5, Neurospora crassa ro-4) and Pneumocystis carinii, which seems to be a component of a multi-subunit centrosomal complex involved in microtubule-based vesicle motility (this subfamily is known as ARP1); the ARP2 subfamily, which includes chicken ACTL, Saccharomyces cerevisiae ACT2, Drosophila melanogaster 14D and Caenorhabditis elegans actC; the ARP3 subfamily, which includes actin 2 from mammals, Drosophila 66B, yeast ACT4 and Schizosaccharomyces pombe act2; and the ARP4 subfamily, which includes yeast ACT3 and Drosophila 13E. This entry contains a signature that picks up both actins and the actin-like proteins and corresponds to positions 106 to 118 in actins.
Protein Domain
Name: XPG/Rad2 endonuclease
Type: Family
Description: Xeroderma pigmentosum (XP) is a human autosomal recessive disease, characterised by a high incidence of sunlight-induced skin cancer. The skin cells of people with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. XP-G is one of the rarest and most phenotypically heterogeneous forms of XP, showing anything from slight to extreme dysfunction in DNA excision repair. XP-G can be corrected by a 133 kDa nuclear protein, XPGC. XPGC is an acidic protein that confers normal UV resistance in expressing cells. It is a magnesium-dependent, single-strand DNA endonuclease that makes structure-specific endonucleolytic incisions in a DNA substrate containing a duplex region and single-stranded arms. XPGC cleaves one strand of the duplex at the border with the single-stranded region. XPG (ERCC-5) belongs to a family of proteins that includes RAD2 from Saccharomyces cerevisiae (Baker's yeast) and rad13 from Schizosaccharomyces pombe (Fission yeast), which are single-stranded DNA endonucleases; mouse and human FEN-1, a structure-specific endonuclease; RAD2 from fission yeast and RAD27 from budding yeast; fission yeast exo1, a 5'-3' double-stranded DNA exonuclease that may act in a pathway that corrects mismatched base pairs; yeast DHS1; and yeast DIN7. Sequence alignment of this family of proteins reveals that similarities are largely confined to two regions. The first is located at the N-terminal extremity (N-region) and corresponds to the first 95 to 105 amino acids. The second region is internal (I-region) and found towards the C terminus; it spans about 140 residues and contains a highly conserved core of 27 amino acids that includes a conserved pentapeptide (E-A-[DE]-A-[QS]).
It is possible that the conserved acidic residues are involved in the catalytic mechanism of DNA excision repair in XPG. The amino acids linking the N- and I-regions are not conserved. Proteins in this family also include yeast Mkt1, which is a post-transcriptional regulator. It contains two domains, XPG-N and XPG-I, which are conserved among a family of nucleases. However, it contains only two of the seven Asp residues involved in Mg2+ binding, suggesting that it has no nuclease activity.
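The conserved pentapeptide quoted above, E-A-[DE]-A-[QS], is written in PROSITE-style notation, where square brackets list the residues allowed at one position. As a minimal sketch (not part of the database entry), the pattern can be translated into a regular expression and used to scan a protein sequence. The function name and the example sequence below are hypothetical and were made up purely for illustration.

```python
import re

# E-A-[DE]-A-[QS] translated to a regular expression:
# fixed residues stay as letters, bracketed alternatives become character classes.
PENTAPEPTIDE = re.compile(r"EA[DE]A[QS]")


def find_pentapeptide(sequence):
    """Return (1-based start position, matched residues) for each match."""
    return [(m.start() + 1, m.group())
            for m in PENTAPEPTIDE.finditer(sequence.upper())]


# Hypothetical sequence invented for this example; it is not a real protein.
example = "MKTLEADAQGGSEAEASLL"
print(find_pentapeptide(example))  # [(5, 'EADAQ'), (13, 'EAEAS')]
```

A match only shows that the five-residue pattern occurs; it does not by itself indicate that a protein belongs to this family, since short patterns can arise by chance.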
Protein Domain
Name: Photosystem II PsbJ
Type: Family
Description: Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection. This entry represents the low molecular weight transmembrane protein PsbJ found in PSII. PsbJ is one of the most hydrophobic proteins in the thylakoid membrane, and is located in a gene cluster with PsbE, PsbF and PsbL (PsbEFJL). Both PsbJ and PsbL are essential for proper assembly of the OEC. Mutations in PsbJ cause the light-harvesting antenna to remain detached from the PSII dimers.
In addition, both PsbJ and PsbL are involved in the unidirectional flow of electrons, where PsbJ regulates the forward electron flow from D2 (Qa) to the plastoquinone pool, and PsbL prevents the reduction of PSII by back electron flow from plastoquinol, protecting PSII from photo-inactivation.
Protein Domain
Name: Arterivirus papain-like cysteine protease beta (PCPbeta) domain superfamily
Type: Homologous_superfamily
Description: Arteriviruses are enveloped, positive-stranded RNA viruses and include pathogens of major economic concern to the swine- and horse-breeding industries: equine arteritis virus (EAV), porcine reproductive and respiratory syndrome virus (PRRSV), mouse lactate dehydrogenase-elevating virus (LDV) and simian hemorrhagic fever virus. The arterivirus replicase gene is composed of two open reading frames (ORFs). ORF1a is translated directly from the genomic RNA, whereas ORF1b can be expressed only by ribosomal frameshifting, yielding a 1ab fusion protein. Both replicase gene products are multidomain precursor proteins which are proteolytically processed into functional nonstructural proteins (nsps) by a complex proteolytic cascade that is directed by four (PRRSV/LDV) or three (EAV) proteinase domains encoded in ORF1a. The arterivirus replicase processing scheme involves the rapid autoproteolytic release of two or three N-terminal nsps (nsp1 (or nsp1alpha/1beta) and nsp2) and the subsequent processing of the remaining polyproteins by the "main protease" residing in nsp4, together resulting in a set of 13 or 14 individual nsps. The arterivirus nsp1 region contains a tandem of papain-like cysteine autoprotease domains (PCPalpha and PCPbeta), but in EAV PCPalpha has lost its enzymatic activity, resulting in the 'merge' of nsp1alpha and nsp1beta into a single nsp1 subunit. Thus, instead of three self-cleaving N-terminal subunits, EAV has two: nsp1 and nsp2. The PCPalpha and PCPbeta domains mediate the nsp1alpha|1beta and nsp1beta|2 cleavages, respectively. The catalytic dyad of PCPalpha and PCPbeta domains is composed of Cys and His residues. In EAV, a Lys residue is found in place of the catalytic Cys residue, which explains the proteolytic deficiency of the EAV PCPalpha domain. The PCPalpha and PCPbeta domains form peptidase families C31 and C32, respectively.
The PCPalpha and PCPbeta domains have a typical papain fold, which consists of a compact global region containing sequentially connected left (L) and right (R) parts in a so-called standard orientation. The L subdomain of PCPalpha consists of four α-helices, while the R subdomain is formed by three antiparallel β-strands. The L subdomain of PCPbeta consists of three α-helices, while the R subdomain is formed by four antiparallel β-strands. The Cys and His residues face each other at the L-R interface and form the catalytic centre of the PCPalpha and PCPbeta domains. This entry represents the PCPbeta domain (peptidase C32) superfamily.
Protein Domain
Name: Arterivirus papain-like cysteine protease alpha (PCPalpha) domain superfamily
Type: Homologous_superfamily
Description: Arteriviruses are enveloped, positive-stranded RNA viruses and include pathogens of major economic concern to the swine- and horse-breeding industries: equine arteritis virus (EAV), porcine reproductive and respiratory syndrome virus (PRRSV), mouse lactate dehydrogenase-elevating virus (LDV) and simian hemorrhagic fever virus. The arterivirus replicase gene is composed of two open reading frames (ORFs). ORF1a is translated directly from the genomic RNA, whereas ORF1b can be expressed only by ribosomal frameshifting, yielding a 1ab fusion protein. Both replicase gene products are multidomain precursor proteins which are proteolytically processed into functional nonstructural proteins (nsps) by a complex proteolytic cascade that is directed by four (PRRSV/LDV) or three (EAV) proteinase domains encoded in ORF1a. The arterivirus replicase processing scheme involves the rapid autoproteolytic release of two or three N-terminal nsps (nsp1 (or nsp1alpha/1beta) and nsp2) and the subsequent processing of the remaining polyproteins by the "main protease" residing in nsp4, together resulting in a set of 13 or 14 individual nsps. The arterivirus nsp1 region contains a tandem of papain-like cysteine autoprotease domains (PCPalpha and PCPbeta), but in EAV PCPalpha has lost its enzymatic activity, resulting in the 'merge' of nsp1alpha and nsp1beta into a single nsp1 subunit. Thus, instead of three self-cleaving N-terminal subunits, EAV has two: nsp1 and nsp2. The PCPalpha and PCPbeta domains mediate the nsp1alpha|1beta and nsp1beta|2 cleavages, respectively. The catalytic dyad of PCPalpha and PCPbeta domains is composed of Cys and His residues. In EAV, a Lys residue is found in place of the catalytic Cys residue, which explains the proteolytic deficiency of the EAV PCPalpha domain. The PCPalpha and PCPbeta domains form MEROPS peptidase families C31 and C32, respectively.
The PCPalpha and PCPbeta domains have a typical papain fold, which consists of a compact global region containing sequentially connected left (L) and right (R) parts in a so-called standard orientation. The L subdomain of PCPalpha consists of four α-helices, while the R subdomain is formed by three antiparallel β-strands. The L subdomain of PCPbeta consists of three α-helices, while the R subdomain is formed by four antiparallel β-strands. The Cys and His residues face each other at the L-R interface and form the catalytic centre of the PCPalpha and PCPbeta domains. This entry represents the PCPalpha domain (peptidase C31) superfamily.
Protein Domain
Name: Oxytocin receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Vasopressin and oxytocin are members of the neurohypophyseal hormone family found in all mammalian species. They are present in high levels in the posterior pituitary.
Oxytocin stimulates contraction of uterine smooth muscle, and stimulates milk secretion in response to suckling by inducing contraction of myoepithelial cells in the mammary gland. Clinically, it is used to induce labour and promote lactation. Oxytocin receptors are found in uterine smooth muscle, myoepithelial cells in the mammary gland, and in the pituitary. Activation of phosphoinositide metabolism is effected via a pertussis-toxin-insensitive G-protein, probably of the Gq/G11 class.
Protein Domain
Name: Chemerin-like receptor 2
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Chemerin-like receptor 2 (CML2, also known as GPR1) is a receptor for the chemoattractant adipokine chemerin/RARRES2 that may have a role in the regulation of inflammation and energy homeostasis.
This protein also acts as a receptor for TAFA1, mediating its effects on neuronal stem-cell proliferation and differentiation via activation of the ROCK/ERK and ROCK/STAT3 signaling pathways. In humans, GPR1 is expressed in the hippocampus. By contrast, the rat GPR1 gene is not expressed in the hippocampus, demonstrating a functional variation for this receptor between these species.
Protein Domain
Name: Chemerin-like receptor 1
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), the metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligands they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Chemerin-like receptor 1 (CML1, also known as ChemR23 and DEZ) is a GPCR for the chemoattractant adipokine chemerin and for the omega-3 fatty acid-derived molecule resolvin E1, and is mainly found in chordates.
Interaction with chemerin induces activation of the MAPK and PI3K signaling pathways, leading to downstream functional effects such as a decrease in immune responses, stimulation of adipogenesis, and angiogenesis. Resolvin E1 decreases pro-inflammatory cytokine expression and enhances macrophage phagocytic activity by regulation of the NFkB pathway. This protein is prominently expressed in dendritic cells and macrophages.
Protein Domain
Name: Platelet-activating factor receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), the metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligands they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins.
Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Platelet-activating factor (PAF), a unique phospholipid mediator, possesses potent proinflammatory, smooth-muscle contractile and hypotensive activities, and appears to be crucial in the pathogenesis of bronchial asthma and in the lethality of endotoxin and anaphylactic shock. However, little is known of the molecular properties of the PAF receptor and related signal transduction systems. The gene for the human PAF receptor (PAFR) has been isolated, and encodes a protein that is highly similar to the guinea pig PAF receptor. Analysis of somatic cell hybrids suggests that PAFR is encoded by a single gene on human chromosome 1.
Protein Domain
Name: Frizzled-6, transmembrane domain
Type: Domain
Description: This entry represents the transmembrane domain of Frizzled-6, which is predicted to contain seven transmembrane α-helices. Frizzled-6 deficiency results in a severe midbrain morphogenesis defect in mice. In humans, mutations in Frizzled-6 have been described to cause nail dysplasia. Frizzleds are seven transmembrane-spanning proteins that constitute an unconventional class of G protein-coupled receptors. They have important regulatory roles during embryonic development. Frizzleds expose their large N terminus on the extracellular side. The N-terminal, extracellular cysteine-rich domain (CRD) has been implicated as the Wnt binding domain and its structure has been solved. The cysteine-rich domain of Frizzled (Fz) is shared with several receptor tyrosine kinases that have roles in development, including the muscle-specific receptor tyrosine kinase (MuSK), the neuronal specific kinase (NSK2), and ROR1 and ROR2. The cytoplasmic side of many Fz proteins has been shown to interact with the PDZ domains of PSD-95 family members and is thought to have a role in the assembly of signalling complexes. The conserved cytoplasmic motif of Fz, Lys-Thr-X-X-X-Trp, is required for activation of the beta-catenin pathway, and for membrane localisation and phosphorylation of Dsh. In Drosophila melanogaster, the frizzled locus is involved in planar cell polarity, which is the coordination of the cytoskeleton of epidermal cells to produce a parallel array of cuticular hairs and bristles. In the wild-type wing, all hairs point towards the distal tip, whereas in Fz mutants, the orientation of individual hairs with respect both to their neighbours and to the organism as a whole is altered.
In the developing wing, Fz function is required for cells to respond to the extracellular polarity signal as well as for the proximal-distal transmission of an intracellular polarity signal. In Caenorhabditis elegans, protein mom-5 is the equivalent of frizzled. Three main signalling pathways are activated by agonist-activated Frizzled proteins: the Fz/beta-catenin pathway, the Fz/Ca2+ pathway and the Fz/PCP (planar cell polarity) pathway. The Wnt/beta-catenin pathway is the best studied signalling pathway involving Fz receptors. In the Wnt/beta-catenin pathway, the first downstream cytoplasmic components activated by Fz signalling include Dishevelled (Dsh) and/or its regulatory kinases.
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom