Search our database by keyword

Examples

  • Search this entire website. Enter identifiers, names or keywords for genes, pathways, authors, ontology terms, etc. (e.g. eve, embryo, zen, allele).
  • Use OR to search for either of two terms (e.g. fly OR drosophila) or quotation marks to search for phrases (e.g. "dna binding").
  • Boolean search syntax is supported: e.g. dros* for partial matches, or fly AND NOT embryo to exclude a term.
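
The query forms listed above can also be assembled programmatically. The following is a minimal Python sketch that only builds query strings in the syntax described in the examples. The helper names (any_of, phrase, prefix, excluding) are hypothetical and are not part of this website or of any search API.

```python
# Hypothetical helpers that build keyword-search strings using the
# syntax described in the examples above (OR, quoted phrases, trailing
# wildcard, AND NOT). They only format text; they do not run a search.

def any_of(*terms: str) -> str:
    """Match either of several terms, e.g. fly OR drosophila."""
    return " OR ".join(terms)

def phrase(text: str) -> str:
    """Match an exact phrase by wrapping it in quotation marks."""
    return f'"{text}"'

def prefix(stem: str) -> str:
    """Match partial words with a trailing wildcard, e.g. dros*."""
    return f"{stem}*"

def excluding(term: str, unwanted: str) -> str:
    """Match one term while excluding another, e.g. fly AND NOT embryo."""
    return f"{term} AND NOT {unwanted}"

if __name__ == "__main__":
    for query in (any_of("fly", "drosophila"), phrase("dna binding"),
                  prefix("dros"), excluding("fly", "embryo")):
        print(query)
```

Each helper returns a plain string that can be pasted into the search box.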

Search results 11301 to 11400 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: G-protein alpha subunit, group Q
Type: Family
Description: Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor and to dissociate into an active alpha subunit (GTP-bound) and a beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C, and ion channels. These effectors in turn regulate the intracellular concentrations of secondary messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal. The duration of the G protein signal is controlled by the lifetime of the GTP-bound alpha subunit, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals. These fall into 4 main groups on the basis of both sequence similarity and function: alpha-S, alpha-Q, alpha-I and alpha-12. The specific combination of subunits in a heterotrimeric G protein affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions. This family consists of the G protein alpha-Q subunit, which includes both G alpha-Q and G alpha-11. G alpha-Q proteins are widely expressed, toxin-insensitive proteins that couple various receptors to the effector enzyme phospholipase C (PLC), causing the release of calcium from internal stores via calcium channel activation. For example, in Drosophila, G alpha-Q proteins are involved in phototransduction, which is initiated by the activation of rhodopsin by light and proceeds through a photoreceptor-specific G alpha-Q protein that activates PLC, which in turn affects a calcium channel, causing depolarisation of the photoreceptor cell.
Protein Domain
Name: M matrix/glycoprotein, SARS-CoV-like
Type: Family
Description: The membrane (M) protein is the most abundant structural protein and defines the shape of the viral envelope. It is also regarded as the central organiser of coronavirus assembly, interacting with all other major coronaviral structural proteins. M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions) since virus-like particle (VLP) formation in many CoVs requires only the M and envelope (E) proteins for efficient virion assembly. Interaction of spike (S) with M is necessary for retention of S in the ER-Golgi intermediate compartment (ERGIC)/Golgi complex and its incorporation into new virions, but dispensable for the assembly process. Binding of M to nucleocapsid (N) proteins stabilises the nucleocapsid (N protein-RNA complex), as well as the internal core of virions, and, ultimately, promotes completion of viral assembly. Together, M and E protein make up the viral envelope and their interaction is sufficient for the production and release of virus-like particles (VLPs). This entry contains the membrane (M) protein of Severe acute respiratory syndrome coronavirus (SARS-CoV), SARS-CoV-2 (also known as 2019 novel CoV (2019-nCoV) or COVID-19 virus), and related proteins from betacoronaviruses in the sarbecovirus subgenus (B lineage). M protein from SARS-CoV-2 has high sequence identity to its SARS-CoV homologue and is predicted to have a small glycosylated amino-terminal ectodomain, a triple-membrane-spanning domain, and a carboxyl-terminal endodomain. The C-terminal portion of coronaviral M proteins binds to the N protein within the cell membrane of the ER or Golgi complex, stabilising the nucleocapsid and the core of the virion.
Protein Domain
Name: Membrane insertase YidC/ALB3/OXA1/COX18
Type: Family
Description: This entry includes membrane insertase YidC from bacteria, ALBINO3-like proteins from plants, and mitochondrial membrane insertase OXA1 and cytochrome c oxidase assembly protein COX18 from eukaryotes. They are a group of evolutionarily conserved proteins that function in membrane protein integration and protein complex stabilization. They share a conserved region composed of five transmembrane regions. YidC is required for the insertion of integral membrane proteins into the membrane. It may also be involved in protein secretion processes. YidC from Gram-negative bacteria contains an extra transmembrane segment (TM1) at the N-terminus and a large periplasmic domain, located between TM1 and TM2, that adopts a β-super sandwich fold that is found in sugar-binding proteins such as galactose mutarotase. The well-characterised YidC protein from Escherichia coli and its close homologues contain a large N-terminal periplasmic domain. This protein interacts with the SecYEG protein-conducting channel and the accessory proteins SecDF-YajC to form the bacterial holo-translocon (HTL). COX18 is a mitochondrial membrane insertase required for the translocation of the C-terminus of cytochrome c oxidase subunit II (MT-CO2/COX2) across the mitochondrial inner membrane. It plays a role in MT-CO2/COX2 maturation following the COX20-mediated stabilization of newly synthesized MT-CO2/COX2 protein and before the action of the metallochaperones SCO1/2. OXA1 is a mitochondrial inner membrane insertase that mediates the insertion of both mitochondrion-encoded precursors and nuclear-encoded proteins from the matrix into the inner membrane. It links mitoribosomes with the inner membrane. Plant ALBINO3-like proteins are required for the insertion of some light harvesting chlorophyll-binding proteins (LHCP) into the chloroplast thylakoid membrane.
Protein Domain
Name: Small GTPase superfamily, SAR1-type
Type: Family
Description: Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structure at the level of 30-55% similarity. Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique for the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions, switch I and switch II, that surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop) that connects the B1 strand and the A1 helix is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg2+ and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are sequentially similar to Walker A and Walker B boxes that are found in other nucleotide binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg2+ binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues directly interacting with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base. The small GTPase superfamily can be divided into at least 8 different families, including:
  • Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
  • Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
  • Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
  • Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
  • Ras small GTPases. GTP-binding proteins involved in signalling pathways.
  • Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII) which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
  • Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
  • Roc small GTPase domain. Small GTPase domain always found associated with the COR domain.
The SAR1 protein, first identified in budding yeast, is a 21 kDa GTP-binding protein involved in vesicular transport between the endoplasmic reticulum and the Golgi. It takes part in the formation of secretory vesicles by binding to an ER type II membrane protein, Sec12p. It is evolutionarily conserved and seems to be present in all eukaryotes. SAR1 is generally included in the RAS 'superfamily' of small GTP-binding proteins, but it is only slightly related to other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at the C terminus and is therefore not subject to prenylation. SAR1 is slightly related to ARFs.
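
The G4-loop consensus NKXD mentioned in the description above can be illustrated with a simple pattern scan. This is a minimal Python sketch, assuming X stands for any single residue; the function name and the toy sequence are invented for illustration and are not taken from any real protein or from this database. A four-residue pattern will often match by chance, so hits are candidates only.

```python
import re

# NKXD consensus of the G4 loop, with X assumed to mean "any residue".
G4_MOTIF = re.compile(r"NK.D")

def find_g4_candidates(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each NKXD hit.

    Two NKXD matches cannot overlap (the required N, K and D positions
    conflict), so a plain left-to-right scan finds every occurrence.
    """
    return [(m.start() + 1, m.group())
            for m in G4_MOTIF.finditer(sequence.upper())]

if __name__ == "__main__":
    toy_sequence = "MGAANKLDLLG"  # made-up sequence, not a real GTPase
    print(find_g4_candidates(toy_sequence))
```

A hit says only that the four-residue pattern is present; confirming a genuine G4 loop would need the structural context described above.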
Protein Domain
Name: G-protein alpha subunit, group I
Type: Family
Description: Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor and to dissociate into an active alpha subunit (GTP-bound) and a beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C, and ion channels. These effectors in turn regulate the intracellular concentrations of secondary messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal. The duration of the G protein signal is controlled by the lifetime of the GTP-bound alpha subunit, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals. These fall into 4 main groups on the basis of both sequence similarity and function: alpha-S, alpha-Q, alpha-I and alpha-12. The specific combination of subunits in a heterotrimeric G protein affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions. This family consists of the G protein alpha subunit group I (inhibitory), which includes G alpha-I, G alpha-O, G alpha-T and G alpha-Z. G alpha-I proteins were originally identified by their receptor-mediated inhibition of the cAMP-generating enzyme adenylyl cyclase. G alpha-O proteins are extremely abundant and are predominantly expressed in the brain, where they are known to couple receptors such as the M2 acetylcholine receptor to neuronal potassium and calcium channels, and have been implicated in membrane trafficking. The G alpha-T protein, transducin, is responsible for transducing the light response in vertebrate retinas. The response is initiated by the activation of rhodopsin or cone opsin by light and is relayed through transducin, which activates cGMP-phosphodiesterase, causing the cGMP-gated sodium channel to close and generating a nerve impulse through hyperpolarisation of cell membranes. G alpha-Z proteins are involved in calcium mobilisation.
Protein Domain
Name: Preprotein translocase SecG subunit
Type: Family
Description: Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF). The chaperone protein SecB is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both the proton motive force and ATP-driven secretion. The latter is mediated by SecA. SecG has two transmembrane domains, both of which contribute to the recognition of preprotein signal sequences by the translocation complex. The protein also undergoes membrane topology inversion when coupled to the SecA cycle.
Protein Domain
Name: Thioredoxin, conserved site
Type: Conserved_site
Description: Thioredoxins are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of two cysteine thiol groups to a disulphide, accompanied by the transfer of two electrons and two protons. The net result is the covalent interconversion of a disulphide and a dithiol. In the NADPH-dependent protein disulphide reduction, thioredoxin reductase (TR) catalyses the reduction of oxidised thioredoxin (trx) by NADPH using FAD and its redox-active disulphide; reduced thioredoxin then directly reduces the disulphide in the substrate protein. Thioredoxin is present in prokaryotes and eukaryotes and the sequence around the redox-active disulphide bond is well conserved. All thioredoxins contain a cis-proline located in a loop preceding β-strand 4, which makes contact with the active site cysteines, and is important for stability and function. Thioredoxin belongs to a structural family that includes glutaredoxin, glutathione peroxidase, bacterial protein disulphide isomerase DsbA, and the N-terminal domain of glutathione transferase. Thioredoxins have a beta-alpha unit preceding the motif common to all these proteins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; most of them are protein disulphide isomerases (PDI). PDI is an endoplasmic reticulum multi-functional enzyme that catalyses the formation and rearrangement of disulphide bonds during protein folding. All PDIs contain two or three (ERp72) copies of the thioredoxin domain, each of which contributes to disulphide isomerase activity, but which are functionally non-equivalent. Moreover, PDI exhibits chaperone-like activity towards proteins that contain no disulphide bonds, i.e. it behaves independently of its disulphide isomerase activity. The various forms of PDI which are currently known are:
  • PDI major isozyme; a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase, as a component of oligosaccharyl transferase, as thyroxine deiodinase, as glutathione-insulin transhydrogenase and as a thyroid hormone-binding protein.
  • ERp60 (ER-60; 58 kDa microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and later to be a protease.
  • ERp72.
  • ERp5.
Bacterial proteins that act as thiol:disulphide interchange proteins, allowing disulphide bond formation in some periplasmic proteins, also contain a thioredoxin domain. These proteins include:
  • Escherichia coli DsbA (or PrfA) and its orthologues in Vibrio cholerae (TcpG) and Haemophilus influenzae (Por).
  • E. coli DsbC (or XpRA) and its orthologues in Erwinia chrysanthemi and H. influenzae.
  • E. coli DsbD (or DipZ) and its H. influenzae orthologue.
  • E. coli DsbE (or CcmG) and orthologues in H. influenzae.
  • Rhodobacter capsulatus (Rhodopseudomonas capsulata) (HelX), Rhizobiaceae (CycY and TlpA).
This entry represents a conserved site found in the thioredoxin domain. This site contains two cysteines that form the redox-active disulphide bond.
Protein Domain
Name: Thioredoxin domain
Type: Domain
Description: This entry represents the thioredoxin domain. Thioredoxins are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of two cysteine thiol groups to a disulphide, accompanied by the transfer of two electrons and two protons. The net result is the covalent interconversion of a disulphide and a dithiol. In the NADPH-dependent protein disulphide reduction, thioredoxin reductase (TR) catalyses the reduction of oxidised thioredoxin (trx) by NADPH using FAD and its redox-active disulphide; reduced thioredoxin then directly reduces the disulphide in the substrate protein. Thioredoxin is present in prokaryotes and eukaryotes and the sequence around the redox-active disulphide bond is well conserved. All thioredoxins contain a cis-proline located in a loop preceding β-strand 4, which makes contact with the active site cysteines, and is important for stability and function. Thioredoxin belongs to a structural family that includes glutaredoxin, glutathione peroxidase, bacterial protein disulphide isomerase DsbA, and the N-terminal domain of glutathione transferase. Thioredoxins have a beta-alpha unit preceding the motif common to all these proteins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; most of them are protein disulphide isomerases (PDI). PDI is an endoplasmic reticulum multi-functional enzyme that catalyses the formation and rearrangement of disulphide bonds during protein folding. All PDIs contain two or three (ERp72) copies of the thioredoxin domain, each of which contributes to disulphide isomerase activity, but which are functionally non-equivalent. Moreover, PDI exhibits chaperone-like activity towards proteins that contain no disulphide bonds, i.e. it behaves independently of its disulphide isomerase activity. The various forms of PDI which are currently known are:
  • PDI major isozyme; a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase, as a component of oligosaccharyl transferase, as thyroxine deiodinase, as glutathione-insulin transhydrogenase and as a thyroid hormone-binding protein.
  • ERp60 (ER-60; 58 kDa microsomal protein). ERp60 was originally thought to be a phosphoinositide-specific phospholipase C isozyme and later to be a protease.
  • ERp72.
  • ERp5.
Bacterial proteins that act as thiol:disulphide interchange proteins, allowing disulphide bond formation in some periplasmic proteins, also contain a thioredoxin domain. These proteins include:
  • Escherichia coli DsbA (or PrfA) and its orthologues in Vibrio cholerae (TcpG) and Haemophilus influenzae (Por).
  • E. coli DsbC (or XpRA) and its orthologues in Erwinia chrysanthemi and H. influenzae.
  • E. coli DsbD (or DipZ) and its H. influenzae orthologue.
  • E. coli DsbE (or CcmG) and orthologues in H. influenzae.
  • Rhodobacter capsulatus (Rhodopseudomonas capsulata) (HelX), Rhizobiaceae (CycY and TlpA).
Protein Domain
Name: S27a-like superfamily
Type: Homologous_superfamily
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. This entry represents the S27a ribosomal domain from both archaea and eukaryotes. In eukaryotes, the 40S ribosomal protein S27a is synthesized as a C-terminal extension of ubiquitin, and this fusion protein is known as UBS27. The S27a domain comprises the C-terminal half of the protein. The synthesis of ribosomal proteins as extensions of ubiquitin promotes their incorporation into nascent ribosomes by a transient metabolic stabilisation and is required for efficient ribosome biogenesis. The ribosomal extension protein S27a contains a basic region that is proposed to form a zinc finger; its fusion gene is proposed as a mechanism to maintain a fixed ratio between ubiquitin, necessary for degrading proteins, and ribosomes, a source of proteins.
Protein Domain
Name: Leucine-rich repeat-containing adjacent domain
Type: Domain
Description: Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur simultaneously and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel beta sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins. The 3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with an α-helix, thus supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions. Molecular modelling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats. LRR-adjacent domains are small, all-beta-strand domains, structurally described for the protein Internalin (InlA) and the related proteins InlB, InlC and InlH from the pathogenic bacterium Listeria monocytogenes. Their function appears to be mainly structural: they are fused to the C-terminal end of leucine-rich repeats (LRR), significantly stabilising the LRR, and forming a common rigid entity with the LRR. They are themselves not involved in protein-protein interactions but help to present the adjacent LRR domain for this purpose. These domains belong to the family of Ig-like domains in that they consist of two sandwiched beta sheets that follow the classical connectivity of Ig-domains. The beta strands in one of the sheets are, however, much smaller than in most standard Ig-like domains, making it somewhat of an outlier.
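
The two LRR patterns named in the description above (the eleven-residue LxxLxLxxN/CxL and the shorter LxxLxL) can be written as regular expressions. This is a minimal Python sketch, assuming x stands for any single residue and N/C for either asparagine or cysteine; the function name and the toy sequence are invented for illustration and do not come from any real LRR protein.

```python
import re

# Patterns transcribed from the description, with "x" read as any residue.
LRR_LONG = r"L..L.L..[NC].L"   # LxxLxLxxN/CxL, eleven residues
LRR_SHORT = r"L..L.L"          # LxxLxL, six residues

def find_lrr_segments(sequence: str, pattern: str = LRR_LONG) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for every hit.

    A zero-width lookahead is used so that overlapping occurrences,
    which are possible in leucine-rich sequence, are all reported.
    """
    scanner = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1)) for m in scanner.finditer(sequence.upper())]

if __name__ == "__main__":
    toy_sequence = "GGLAALALAANALGG"  # made-up sequence for demonstration
    print(find_lrr_segments(toy_sequence))
    print(find_lrr_segments(toy_sequence, LRR_SHORT))
```

Matches mark only the conserved segment; the variable remainder of each 20- to 30-residue repeat is not modelled by these patterns.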
Protein Domain
Name: Small GTPase superfamily, ARF/SAR type
Type: Family
Description: Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structure at the level of 30-55% similarity.
Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique to the whole superfamily. The domain is built of five α-helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions, switch I and switch II, that surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop), which connects the B1 strand and the A1 helix, is responsible for binding the phosphate groups. The G3 loop provides residues for Mg2+ and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are similar in sequence to the Walker A and Walker B boxes found in other nucleotide-binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg2+ binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues that interact directly with the nucleotide. Part of the G5 loop, located between B6 and A5, acts as a recognition site for the guanine base.
The small GTPase superfamily can be divided into at least 8 different families, including:
  • Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
  • Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
  • Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
  • Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
  • Ras small GTPases. GTP-binding proteins involved in signalling pathways.
  • Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII), which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
  • Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
  • Roc small GTPase domain. Small GTPase domain always found associated with the COR domain.
This entry represents a branch of the small GTPase superfamily that includes the ADP ribosylation factor Arf, Arl (Arf-like), Arp (Arf-related proteins) and the remotely related Sar (Secretion-associated and Ras-related) proteins. Arf proteins are major regulators of vesicle biogenesis in intracellular traffic. They cycle between inactive GDP-bound and active GTP-bound forms that bind selectively to effectors. The classical structural GDP/GTP switch is characterised by conformational changes at the so-called switch 1 and switch 2 regions, which bind tightly to the gamma-phosphate of GTP but poorly or not at all to the GDP nucleotide. Structural studies of Arf1 and Arf6 have revealed that although these proteins feature the switch 1 and 2 conformational changes, they depart from other small GTP-binding proteins in that they use an additional, unique switch to propagate structural information from one side of the protein to the other. The GDP/GTP structural cycles of human Arf1 and Arf6 feature a unique conformational change that affects the beta2-beta3 strands connecting switch 1 and switch 2 (the interswitch) and also the amphipathic helical N terminus. In GDP-bound Arf1 and Arf6, the interswitch is retracted and forms a pocket to which the N-terminal helix binds, the latter serving as a molecular hasp to maintain the inactive conformation. In the GTP-bound form of these proteins, the interswitch undergoes a two-residue register shift that pulls switch 1 and switch 2 up, restoring an active conformation that can bind GTP. In this conformation, the interswitch projects out of the protein and extrudes the N-terminal hasp by occluding its binding pocket.
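The NKXD consensus of the G4 loop mentioned in this description can be located in a sequence with a simple pattern scan. The following is only a minimal sketch: it assumes that X stands for any single residue, the function name `find_g4_motifs` is made up for illustration, and the test sequence is a hypothetical toy string rather than a real protein. A bare match is not evidence that a protein is a small GTPase.

```python
import re

# N-K-x-D consensus of the G4 loop; "x" is assumed to be any single residue.
G4_MOTIF = re.compile(r"NK.D")


def find_g4_motifs(sequence):
    """Return (0-based start, matched text) for each non-overlapping NKxD match."""
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any database entry.
    toy = "MGAAANKCDLLSTNKQDE"
    for start, hit in find_g4_motifs(toy):
        print(start, hit)
```

A short four-residue pattern like this will also occur by chance in unrelated proteins, so hits are best treated as candidates to check against the domain annotation itself.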
Protein Domain
Name: Tetracyclin repressor-like, C-terminal domain
Type: Domain
Description: The antibiotic tetracycline has a broad spectrum of activity, acting to inhibit bacterial protein synthesis by binding to the 30S ribosomal subunit, which prevents the association of the aminoacyl-tRNA with the ribosomal acceptor (A) site. Tetracycline binding is reversible, so diluting out the antibiotic can reverse its effects. Tetracycline resistance genes are often located on mobile elements, such as plasmids, transposons and/or conjugative transposons, which can sometimes be transferred between bacterial species. In certain cases, tetracycline can enhance the transfer of these elements, thereby promoting resistance within a bacterial colony. There are three types of tetracycline resistance: tetracycline efflux, ribosomal protection, and tetracycline modification:
  • Tetracycline efflux proteins belong to the major facilitator superfamily. Efflux proteins are membrane-associated proteins that recognise and export tetracycline from the cell. They are found in both Gram-positive and Gram-negative bacteria. There are at least 22 different tetracycline efflux proteins, grouped according to sequence similarity: Group 1 are Tet(A), Tet(B), Tet(C), Tet(D), Tet(E), Tet(G), Tet(H), Tet(J), Tet(Z) and Tet(30); Group 2 are Tet(K) and Tet(L); Group 3 are Otr(B) and Tcr(3); Group 4 is TetA(P); Group 5 is Tet(V). In addition, there are the efflux proteins Tet(31), Tet(33), Tet(V), Tet(Y), Tet(34), and Tet(35).
  • Ribosomal protection proteins are cytoplasmic proteins that display homology with the elongation factors EF-Tu and EF-G. Protection proteins bind the ribosome, causing an alteration in ribosomal conformation that prevents tetracycline from binding. There are at least ten ribosomal protection proteins: Tet(M), Tet(O), Tet(S), Tet(W), Tet(32), Tet(36), Tet(Q), Tet(T), Otr(A), and TetB(P). Both Tet(M) and Tet(O) have ribosome-dependent GTPase activity, the hydrolysis of GTP providing the energy for the ribosomal conformational changes.
  • Tetracycline modification proteins include the enzymes Tet(37) and Tet(X), both of which inactivate tetracycline. In addition, there are the tetracycline resistance proteins Tet(U) and Otr(C).
The expression of several of these tet genes is controlled by a family of tetracycline transcriptional regulators known as TetR. TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity. The TetR proteins identified in over 115 genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or trigger the cellular response itself.
This entry represents the C-terminal domain found in a number of different TetR transcription regulator proteins. TetR regulates the expression of the membrane-associated tetracycline resistance protein, TetA, which exports the tetracycline antibiotic out of the cell before it can attach to the ribosomes and inhibit protein synthesis. TetR blocks transcription from the genes encoding both TetA and TetR in the absence of antibiotic. The C-terminal domain is multi-helical and is interlocked in the homodimer with the helix-turn-helix (HTH) DNA-binding domain. Other members of the TetR family of transcriptional regulators carry this C-terminal domain. These include:
  • QacR from Staphylococcus aureus, a multidrug binding protein that represses transcription of the qacA multidrug transporter gene
  • EthR, a repressor from Mycobacterium tuberculosis implicated in ethionamide drug resistance
  • CprB, a gamma-butyrolactone autoregulator/receptor from Streptomyces coelicolor that acts as a DNA-binding protein
  • YcdC, a hypothetical transcriptional regulator from Escherichia coli
  • YsiA, YfiR, and YxaF, hypothetical transcriptional regulators from Bacillus subtilis
  • YbiH, a hypothetical transcriptional regulator from Salmonella typhimurium
Protein Domain
Name: MPN domain
Type: Domain
Description: The MPN (Mpr1, Pad1 N-terminal) domain (also known as the Mov34, JAB, PAD-1, or JAMM (JAB1/MPN/Mov34 metalloenzyme) domain) is a widespread 120 amino acid protein module found in archaea, bacteria and eukaryotes. In eukaryotes the MPN domain is found in subunits of several multiprotein complexes including the proteasome, whereas in eubacteria and archaea, the MPN domain is usually found in single domain proteins.
Within the MPN domain super-family, two main subclasses have been characterized: the MPN+ and MPN- domain-containing proteins. MPN+ domain-containing proteins are classified as metalloenzymes responsible for isopeptidase activity. These proteins contain a conserved glutamate (E) and a JAMM (Jab1/MPN/Mov34 metalloenzyme) motif, typically consisting of a canonical sequence (H-x-H-x[]-S-x[]-D) and coordinating a zinc ion. The E and JAMM motif specify a catalytic centre essential for selective hydrolysis of the linkages between ubiquitin/ubiquitin-like proteins and target proteins, or between ubiquitin monomers within a polymeric chain. MPN- domains are recognizable by the absence of the essential Zn(2+)-coordinating residues that are required for catalytic function. In protein complexes, an MPN+ domain can associate with MPN- domains for purposes that are not well understood.
The MPN domain is composed of a five-stranded mixed β-sheet with strand order 21345 sandwiched between two α-helices. There is a second, smaller three-stranded parallel β-sheet formed by residues at the N and C termini and in the loop between the first two strands of the main sheet. MPN domains often form part of larger proteins. The N and C termini of the structure are close in space, allowing it to be easily inserted into a multidomain protein. The histidine and aspartate residues of the JAMM motif coordinate a zinc ion.
Some proteins known to contain an MPN domain are listed below:
  • Eukaryotic COP9 signalosome complex subunit 5 (CSN5).
  • Eukaryotic COP9 signalosome complex subunit 6 (CSN6).
  • Eukaryotic 26S proteasome regulatory subunit RPN11.
  • Eukaryotic 26S proteasome non-ATPase subunit 7 (PSMD7), also known as Mov34.
  • Eukaryotic AMSH (associated molecule with the SH3 domain of STAM) proteins, zinc metalloproteases that specifically cleave ubiquitin chains.
  • Eukaryotic pre-mRNA-splicing factor 8 (PRP8), a component of the U5 small nuclear RNA-protein (snRNP) complex.
  • Eukaryotic translation initiation factor 3 (eIF3) subunits f and h (also known as subunits p47 and p40, respectively).
  • Animal BRCC36 family proteins, Lys-63-specific deubiquitinases.
  • Animal BRCA1-A complex subunit Abraxas.
  • Animal BRISC complex subunit Abro1.
  • Bacterial UPF0758 protein.
  • Bacteriophage lambda tail assembly protein 'K' (vtak).
Protein Domain
Name: Envelope small membrane protein, MERS-CoV-like
Type: Family
Description: This entry represents the Envelope (E) small membrane protein of Middle East respiratory syndrome (MERS) coronavirus (CoV), as well as E proteins from related coronaviruses.
E protein is the smallest of the major structural proteins. It is conserved among Coronavirus strains. It is an integral membrane protein involved in several aspects of the virus' life cycle, such as assembly, budding, envelope formation, and pathogenesis. During the replication cycle, E is abundantly expressed inside the infected cell, but only a small portion is incorporated into the virus envelope. The majority of the protein participates in viral assembly and budding. It can act as a viroporin by oligomerizing after insertion in host membranes to create a hydrophilic pore that allows ion transport. Additionally, the E protein is thought to prevent M protein aggregation and induce membrane curvature.
SARS-CoV E protein forms a Ca2+ permeable channel in the endoplasmic reticulum-Golgi apparatus intermediate compartment (ERGIC)/Golgi membranes. The E protein ion channel activity alters Ca2+ homeostasis within cells, boosting the activation of the NLRP3 inflammasome, which leads to the overproduction of IL-1beta. SARS-CoV overstimulates the NF-kappaB inflammatory pathway and interacts with the cellular protein syntenin, triggering p38 MAPK activation. These signalling cascades result in exacerbated inflammation and immunopathology.
CoV E proteins have a short hydrophilic N terminus, followed by a large hydrophobic transmembrane (TM) domain, and end with a long, hydrophilic C terminus, which comprises the majority of the protein. The hydrophobic region of the TM domain contains at least one predicted amphipathic α-helix that pentamerizes to form an ion-conductive pore in membranes. CoV E proteins have been proposed to have at least two roles. One is related to their TM channel domain. This would be active in the secretory pathway, altering lumenal environments and rearranging secretory organelles, leading to efficient trafficking of virions. The other would be related to their extramembrane domains, particularly the C-terminal domain. This is involved in protein-protein interactions and targeting, among other roles. In the CoV E protein structure a longer α-helix encompasses the TM domain, which is connected to another shorter C-terminal α-helix by a flexible linker domain, forming an L-shape [Li]. The CoV E pentamer is a right-handed α-helical bundle where the C-terminal tails coil around each other.
Protein Domain
Name: Tetracycline transcriptional regulator, TetR
Type: Family
Description: The antibiotic tetracycline has a broad spectrum of activity, acting to inhibit bacterial protein synthesis by binding to the 30S ribosomal subunit, which prevents the association of the aminoacyl-tRNA with the ribosomal acceptor (A) site. Tetracycline binding is reversible, so diluting out the antibiotic can reverse its effects. Tetracycline resistance genes are often located on mobile elements, such as plasmids, transposons and/or conjugative transposons, which can sometimes be transferred between bacterial species. In certain cases, tetracycline can enhance the transfer of these elements, thereby promoting resistance within a bacterial colony. There are three types of tetracycline resistance: tetracycline efflux, ribosomal protection, and tetracycline modification:
  • Tetracycline efflux proteins belong to the major facilitator superfamily. Efflux proteins are membrane-associated proteins that recognise and export tetracycline from the cell. They are found in both Gram-positive and Gram-negative bacteria. There are at least 22 different tetracycline efflux proteins, grouped according to sequence similarity: Group 1 are Tet(A), Tet(B), Tet(C), Tet(D), Tet(E), Tet(G), Tet(H), Tet(J), Tet(Z) and Tet(30); Group 2 are Tet(K) and Tet(L); Group 3 are Otr(B) and Tcr(3); Group 4 is TetA(P); Group 5 is Tet(V). In addition, there are the efflux proteins Tet(31), Tet(33), Tet(V), Tet(Y), Tet(34), and Tet(35).
  • Ribosomal protection proteins are cytoplasmic proteins that display homology with the elongation factors EF-Tu and EF-G. Protection proteins bind the ribosome, causing an alteration in ribosomal conformation that prevents tetracycline from binding. There are at least ten ribosomal protection proteins: Tet(M), Tet(O), Tet(S), Tet(W), Tet(32), Tet(36), Tet(Q), Tet(T), Otr(A), and TetB(P). Both Tet(M) and Tet(O) have ribosome-dependent GTPase activity, the hydrolysis of GTP providing the energy for the ribosomal conformational changes.
  • Tetracycline modification proteins include the enzymes Tet(37) and Tet(X), both of which inactivate tetracycline. In addition, there are the tetracycline resistance proteins Tet(U) and Otr(C).
The expression of several of these tet genes is controlled by a family of tetracycline transcriptional regulators known as TetR. TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity. The TetR proteins identified in over 115 genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or trigger the cellular response itself.
This entry represents the tetracycline transcriptional repressor TetR, which binds to the Tet(A) gene to repress its expression in the absence of tetracycline. Tet(A) is a membrane-associated efflux protein that exports tetracycline from the cell before it can attach to ribosomes and inhibit polypeptide chain growth. TetR occurs as a homodimer and uses two helix-turn-helix (HTH) motifs to bind tandem DNA operators, thereby blocking the expression of the associated genes, TetA and TetR. The structure of the class D TetR repressor protein involves 10 α-helices, with connecting turns and loops. The three N-terminal helices constitute the DNA-binding HTH domain, which has an inverse orientation compared with HTH motifs in other DNA-binding proteins. The core of the protein, formed by helices 5-10, is responsible for dimerisation and contains, for each monomer, a binding pocket that accommodates tetracycline in the presence of a divalent cation.
Protein Domain
Name: ADP/ATP carrier protein, eukaryotic type
Type: Family
Description: A variety of substrate carrier proteins that are involved in energy transfer are found in the inner mitochondrial membrane. Such proteins include: ADP,ATP carrier protein (ADP/ATP translocase); 2-oxoglutarate/malate carrier protein; phosphate carrier protein; tricarboxylate transport protein (or citrate transport protein); Graves disease carrier protein; yeast mitochondrial proteins MRS3 and MRS4; yeast mitochondrial FAD carrier protein; and many others.
Sequence analysis of selected members of the carrier protein family has suggested the presence of six transmembrane (TM) domains, with varying degrees of sequence conservation and hydrophilicity. The TM regions, and adjacent hydrophilic loops, are more highly conserved than other regions of the proteins. All members of the family appear to consist of a tripartite structure, each of the repeated segments being ~100 residues in length. Each repeat contains two TM domains, the first being more hydrophobic, with conserved glycyl and prolyl residues. Five of the six TM domains are followed by the conserved sequence (D/E)-Hy(K/R), where - denotes any residue and Hy is a hydrophobic position.
Mitochondrial ADP/ATP translocase, an abundant component of the inner membrane, carries ATP from the matrix into the inter-membrane space and transports ADP back. The protein is an integral membrane protein that functions as a homodimer.
Mutations of the human ADP/ATP translocase 1 (also known as SLC25A4) gene cause mitochondrial diseases, such as PEOA2 and MTDPS12B. This family contains proteins found in eukaryotes.
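The conserved sequence (D/E)-Hy(K/R) described in this entry can be written as a regular expression for a quick scan. This is a sketch under stated assumptions: the description does not say which residues count as hydrophobic, so the set A, I, L, M, F, V, W, Y used below is an assumption, the name `find_carrier_motifs` is hypothetical, and the example sequence is a made-up toy string.

```python
import re

# (D/E) - any residue - hydrophobic - (K/R).
# The hydrophobic class [AILMFVWY] is an assumed definition, not taken from the entry.
CARRIER_MOTIF = re.compile(r"[DE].[AILMFVWY][KR]")


def find_carrier_motifs(sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in CARRIER_MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence used only to exercise the pattern.
    toy = "PTDAVKRLEGLRQ"
    for start, hit in find_carrier_motifs(toy):
        print(start, hit)
```

Changing the assumed hydrophobic set changes which positions are reported, so the character class should be adjusted to whatever definition the analysis at hand requires.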
Protein Domain
Name: GoLoco motif
Type: Conserved_site
Description: In heterotrimeric G-protein signalling, cell surface receptors (GPCRs) are coupled to membrane-associated heterotrimers comprising a GTP-hydrolysing subunit G-alpha and a G-beta/G-gamma dimer. The inactive form contains the alpha subunit bound to GDP and in complex with the beta and gamma subunits. When a ligand binds the receptor, GDP is displaced from G-alpha and GTP is bound. The GTP/G-alpha complex dissociates from the trimer and associates with an effector until the intrinsic GTPase activity of G-alpha returns the protein to the GDP-bound form. Reassociation of GDP-bound G-alpha with the G-beta/G-gamma dimer terminates the signal. Several mechanisms regulate the signal output at different stages of the G-protein cascade. Two classes of intracellular proteins act as inhibitors of G protein activation: GTPase activating proteins (GAPs), which enhance GTP hydrolysis, and guanine dissociation inhibitors (GDIs), which inhibit GDP dissociation.
The GoLoco or G-protein regulatory (GPR) motif found in various G-protein regulators acts as a GDI on G-alpha(i).
The crystal structure of the GoLoco motif in complex with G-alpha(i) has been solved. It consists of three small alpha helices. The highly conserved Asp-Gln-Arg triad within the GoLoco motif participates directly in GDP binding by extending the arginine side chain into the nucleotide binding pocket, highly reminiscent of the catalytic arginine finger employed in GTPase-activating proteins. This addition of an arginine in the binding pocket affects the interaction of GDP with G-alpha and therefore is certainly important for the GoLoco GDI activity.
Some proteins known to contain a GoLoco motif are listed below:
  • Mammalian regulators of G-protein signalling 12 and 14 (RGS12 and RGS14), multifaceted signal transduction regulators.
  • Loco, the Drosophila RGS12 homologue.
  • Mammalian Purkinje-cell protein-2 (Pcp2). It may function as a cell-type specific modulator for G protein-mediated cell signalling. It is uniquely expressed in cerebellar Purkinje cells and in retinal bipolar neurons.
  • Eukaryotic Rap1GAP. A GTPase activator for the nuclear ras-related regulatory protein RAP-1A.
  • Drosophila protein Rapsynoid (also known as Partner of Inscuteable, Pins) and its mammalian homologues AGS3 and LGN. They form a G-protein regulator family that also contains TPR repeats.
Protein Domain
Name: GOLD domain
Type: Domain
Description: The GOLD (for Golgi dynamics) domain is a protein module found in several eukaryotic Golgi and lipid-traffic proteins. It is typically between 90 and 150 amino acids long. Most of the size difference observed in the GOLD-domain superfamily is traceable to a single large low-complexity insert that is seen in some versions of the domain. With the exception of the p24 proteins, which have a simple architecture with the GOLD domain as their only globular domain, all other GOLD-domain proteins contain additional conserved globular domains. In these proteins, the GOLD domain co-occurs with lipid-, sterol- or fatty acid-binding domains such as PH, CRAL-TRIO, FYVE, oxysterol-binding and acyl CoA-binding domains, suggesting that these proteins may interact with membranes. The GOLD domain can also be found associated with a RUN domain, which may have a role in the interaction of various proteins with cytoskeletal filaments. The GOLD domain is predicted to mediate diverse protein-protein interactions.
A secondary structure prediction for the GOLD domain reveals that it is likely to adopt a compact all-β-fold structure with six to seven strands. Most of the sequence conservation is centred on the hydrophobic cores that support these predicted strands. The predicted secondary-structure elements and the size of the conserved core of the domain suggest that it may form a beta-sandwich fold with the strands arranged in two beta sheets stacked on each other.
Some proteins known to contain a GOLD domain are listed below:
  • Eukaryotic proteins of the p24 family.
  • Animal Sec14-like proteins. They are involved in secretion.
  • Human Golgi resident protein GCP60. It interacts with the Golgi integral membrane protein Giantin.
  • Yeast oxysterol-binding protein homologue 3 (OSH3).
Protein Domain
Name: Solute-binding family 1, conserved site
Type: Conserved_site
Description: Bacterial high affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria they are membrane-anchored lipoproteins. On the basis of sequence similarities, the vast majority of these solute-binding proteins can be grouped into eight families or clusters, which generally correlate with the nature of the solute bound.
This entry represents a conserved site found in the extracellular solute-binding protein family 1 members from Gram-positive bacteria, Gram-negative bacteria and archaea. Family 1 members include:
  • Maltose/maltodextrin-binding proteins of Enterobacteriaceae (gene malE) and Streptococcus pneumoniae malX
  • Multiple oligosaccharide binding protein of Streptococcus mutans (gene msmE)
  • Escherichia coli glycerol-3-phosphate-binding protein
  • Serratia marcescens iron-binding protein (gene sfuA) and the homologous proteins (gene fbp) from Haemophilus influenzae and Neisseria
  • E. coli thiamine-binding protein (gene tbpA)
Protein Domain
Name: TUG ubiquitin-like domain
Type: Domain
Description: This entry represents the N-terminal ubiquitin-like domain of the TUG protein. TUG has an N-terminal ubiquitin-like domain (UBL1), which in similar proteins appears to participate in protein-protein interactions. The region has an area of negative electrostatic potential and increased backbone motility, which has led to suggestions of a potential protein-protein interaction site. This domain is also found at the N terminus of yeast Ubx4.
Proteins containing this domain include Ubx4 from budding yeast and TUG (also known as tether containing UBX domain for GLUT4) from mice. Ubx4 is involved in Cdc48-dependent protein degradation through the ubiquitin/proteasome pathway. TUG is a GLUT4-regulating protein that functions to retain membrane vesicles containing GLUT4 intracellularly. In response to insulin stimulation, TUG releases the GLUT4-containing vesicles to the cellular exocytic machinery, which allows their translocation to the plasma membrane.
Protein Domain
Name: NIF system FeS cluster assembly, NifU-like
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: the ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.
The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.
This entry represents a distinct group of NifU-like proteins, always found with a NifS-like protein and restricted to species that lack a SUF system. Typically, NIF systems service a smaller number of FeS-containing proteins than do ISC or SUF. These proteins are often encoded near the mnmA gene, involved in the carboxymethylaminomethyl modification of U34 in some tRNAs. While other NifU proteins are associated with nitrogen fixation, this family is not.
Protein Domain
Name: Envelope small membrane protein, alphacoronavirus
Type: Family
Description: This family is specific for E proteins from alphacoronaviruses.
E protein is the smallest of the major structural proteins. It is conserved among Coronavirus strains. It is an integral membrane protein involved in several aspects of the virus' life cycle, such as assembly, budding, envelope formation, and pathogenesis. During the replication cycle, E is abundantly expressed inside the infected cell, but only a small portion is incorporated into the virus envelope. The majority of the protein participates in viral assembly and budding. It can act as a viroporin by oligomerizing after insertion in host membranes to create a hydrophilic pore that allows ion transport. Additionally, the E protein is thought to prevent M protein aggregation and induce membrane curvature.
SARS-CoV E protein forms a Ca2+ permeable channel in the endoplasmic reticulum-Golgi apparatus intermediate compartment (ERGIC)/Golgi membranes. The E protein ion channel activity alters Ca2+ homeostasis within cells, boosting the activation of the NLRP3 inflammasome, which leads to the overproduction of IL-1beta. SARS-CoV overstimulates the NF-kappaB inflammatory pathway and interacts with the cellular protein syntenin, triggering p38 MAPK activation. These signalling cascades result in exacerbated inflammation and immunopathology.
CoV E proteins have a short hydrophilic N terminus, followed by a large hydrophobic transmembrane (TM) domain, and end with a long, hydrophilic C terminus, which comprises the majority of the protein. The hydrophobic region of the TM domain contains at least one predicted amphipathic α-helix that pentamerizes to form an ion-conductive pore in membranes. CoV E proteins have been proposed to have at least two roles. One is related to their TM channel domain. This would be active in the secretory pathway, altering lumenal environments and rearranging secretory organelles, leading to efficient trafficking of virions. The other would be related to their extramembrane domains, particularly the C-terminal domain. This is involved in protein-protein interactions and targeting, among other roles. In the CoV E protein structure a longer α-helix encompasses the TM domain, which is connected to another shorter C-terminal α-helix by a flexible linker domain, forming an L-shape [Li]. The CoV E pentamer is a right-handed α-helical bundle where the C-terminal tails coil around each other.
Protein Domain
Name: Envelope small membrane protein, betacoronavirus
Type: Family
Description: This family is specific for E proteins from betacoronaviruses.
E protein is the smallest of the major structural proteins. It is conserved among Coronavirus strains. It is an integral membrane protein involved in several aspects of the virus' life cycle, such as assembly, budding, envelope formation, and pathogenesis. During the replication cycle, E is abundantly expressed inside the infected cell, but only a small portion is incorporated into the virus envelope. The majority of the protein participates in viral assembly and budding. It can act as a viroporin by oligomerizing after insertion in host membranes to create a hydrophilic pore that allows ion transport. Additionally, the E protein is thought to prevent M protein aggregation and induce membrane curvature.
SARS-CoV E protein forms a Ca2+ permeable channel in the endoplasmic reticulum-Golgi apparatus intermediate compartment (ERGIC)/Golgi membranes. The E protein ion channel activity alters Ca2+ homeostasis within cells, boosting the activation of the NLRP3 inflammasome, which leads to the overproduction of IL-1beta. SARS-CoV overstimulates the NF-kappaB inflammatory pathway and interacts with the cellular protein syntenin, triggering p38 MAPK activation. These signalling cascades result in exacerbated inflammation and immunopathology.
CoV E proteins have a short hydrophilic N terminus, followed by a large hydrophobic transmembrane (TM) domain, and end with a long, hydrophilic C terminus, which comprises the majority of the protein. The hydrophobic region of the TM domain contains at least one predicted amphipathic α-helix that pentamerizes to form an ion-conductive pore in membranes. CoV E proteins have been proposed to have at least two roles. One is related to their TM channel domain. This would be active in the secretory pathway, altering lumenal environments and rearranging secretory organelles, leading to efficient trafficking of virions. The other would be related to their extramembrane domains, particularly the C-terminal domain. This is involved in protein-protein interactions and targeting, among other roles. In the CoV E protein structure a longer α-helix encompasses the TM domain, which is connected to another shorter C-terminal α-helix by a flexible linker domain, forming an L-shape [Li]. The CoV E pentamer is a right-handed α-helical bundle where the C-terminal tails coil around each other.
Protein Domain
Name: GOLD domain superfamily
Type: Homologous_superfamily
Description: The GOLD (for Golgi dynamics) domain is a protein module found in several eukaryotic Golgi and lipid-traffic proteins. It is typically between 90 and 150 amino acids long. Most of the size difference observed in the GOLD-domain superfamily is traceable to a single large low-complexity insert that is seen in some versions of the domain. With the exception of the p24 proteins, which have a simple architecture with the GOLD domain as their only globular domain, all other GOLD-domain proteins contain additional conserved globular domains. In these proteins, the GOLD domain co-occurs with lipid-, sterol- or fatty acid-binding domains such as PH, CRAL-TRIO, FYVE, oxysterol-binding and acyl CoA-binding domains, suggesting that these proteins may interact with membranes. The GOLD domain can also be found associated with a RUN domain, which may have a role in the interaction of various proteins with cytoskeletal filaments. The GOLD domain is predicted to mediate diverse protein-protein interactions.
A secondary structure prediction for the GOLD domain reveals that it is likely to adopt a compact all-β-fold structure with six to seven strands. Most of the sequence conservation is centred on the hydrophobic cores that support these predicted strands. The predicted secondary-structure elements and the size of the conserved core of the domain suggest that it may form a beta-sandwich fold with the strands arranged in two beta sheets stacked on each other.
Some proteins known to contain a GOLD domain are listed below:
  • Eukaryotic proteins of the p24 family.
  • Animal Sec14-like proteins. They are involved in secretion.
  • Human Golgi resident protein GCP60. It interacts with the Golgi integral membrane protein Giantin.
  • Yeast oxysterol-binding protein homologue 3 (OSH3).
Protein Domain
Name: Small GTPase superfamily, ARF type
Type: Family
Description: Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structure at the level of 30-55% similarity. Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique for the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions, switch I and switch II, that surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop) that connects the B1 strand and the A1 helix is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg2+ and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are similar in sequence to the Walker A and Walker B boxes that are found in other nucleotide-binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg2+ binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues directly interacting with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base.
The small GTPase superfamily can be divided into at least 8 different families, including:
  • Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
  • Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
  • Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
  • Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
  • Ras small GTPases. GTP-binding proteins involved in signalling pathways.
  • Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII), which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
  • Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
  • Roc small GTPase domain. Small GTPase domain always found associated with the COR domain.
This entry represents a branch of the small GTPase superfamily that includes the ADP ribosylation factor Arf, Arl (Arf-like), and Arp (Arf-related proteins). Arf proteins are major regulators of vesicle biogenesis in intracellular traffic. They cycle between inactive GDP-bound and active GTP-bound forms that bind selectively to effectors. The classical structural GDP/GTP switch is characterised by conformational changes at the so-called switch 1 and switch 2 regions, which bind tightly to the gamma-phosphate of GTP but poorly or not at all to the GDP nucleotide. Structural studies of Arf1 and Arf6 have revealed that although these proteins feature the switch 1 and 2 conformational changes, they depart from other small GTP-binding proteins in that they use an additional, unique switch to propagate structural information from one side of the protein to the other. The GDP/GTP structural cycles of human Arf1 and Arf6 feature a unique conformational change that affects the beta2-beta3 strands connecting switch 1 and switch 2 (interswitch) and also the amphipathic helical N terminus. In GDP-bound Arf1 and Arf6, the interswitch is retracted and forms a pocket to which the N-terminal helix binds, the latter serving as a molecular hasp to maintain the inactive conformation. 
In the GTP-bound form of these proteins, the interswitch undergoes a two-residue register shift that pulls switch 1 and switch 2 up, restoring an active conformation that can bind GTP. In this conformation, the interswitch projects out of the protein and extrudes the N-terminal hasp by occluding its binding pocket.
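The NKXD consensus of the G4 loop mentioned in this description can be searched for with a simple pattern. The following is a minimal sketch, assuming X stands for any residue in one-letter code; the function name and the toy sequence are hypothetical and are not taken from any database entry. A hit is only a candidate site, since a four-residue motif will also occur by chance outside G4 loops.

```python
import re

# NKXD: Asn, Lys, any residue, Asp (X assumed to mean "any amino acid").
G4_MOTIF = re.compile(r"NK[A-Z]D")


def find_g4_candidates(sequence):
    """Return (1-based start position, matched tetrapeptide) for each NKXD hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(0)) for m in G4_MOTIF.finditer(seq)]


# Hypothetical toy sequence, made up for illustration only.
toy_sequence = "MGAAANKQDLPNKADTT"
hits = find_g4_candidates(toy_sequence)
print(hits)
```

Because an NKXD match cannot overlap another NKXD match, a plain non-overlapping scan is sufficient here.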
Protein Domain
Name: Plant G-protein, alpha subunit
Type: Family
Description: Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor, and to dissociate into active alpha subunit (GTP-bound) and beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C, and ion channels. These effectors in turn regulate the intracellular concentrations of secondary messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal. The length of the G protein signal is controlled by the duration of the GTP-bound alpha subunit, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals. These fall into 4 main groups on the basis of both sequence similarity and function: alpha-S, alpha-Q, alpha-I and alpha-12. The specific combination of subunits in heterotrimeric G proteins affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions. This family represents the plant class of G protein alpha subunits, which have been isolated from a variety of plant species. Plant G proteins are involved in signal transduction from hormone receptors, including the plant hormones gibberellin and abscisic acid that regulate gene expression, secretion and cell death in aleurone. Plant alpha subunits are highly conserved between species, but share relatively low sequence similarity with mammalian G proteins. However, the GTP-binding and hydrolysis regions are well conserved.
Protein Domain
Name: G-protein alpha subunit, group 12/13
Type: Family
Description: Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor, and to dissociate into active alpha subunit (GTP-bound) and beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C, and ion channels. These effectors in turn regulate the intracellular concentrations of secondary messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal. The length of the G protein signal is controlled by the duration of the GTP-bound alpha subunit, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals. These fall into 4 main groups on the basis of both sequence similarity and function: alpha-S, alpha-Q, alpha-I and alpha-12. The specific combination of subunits in heterotrimeric G proteins affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions. This family consists of the alpha-12 group of G proteins, which includes both the alpha-12 and alpha-13 subunits. G alpha-12 and G alpha-13 are ubiquitously expressed and can induce many cellular responses, including phospholipase C-epsilon activation, phospholipase D activation, cytoskeletal change, oncogenic response, apoptosis, MAP kinase activation and Na+/H+ exchange activation. G alpha-12 and G alpha-13 can activate several effectors, including small GTPases such as Rho.
Protein Domain
Name: P/V phosphoprotein, paramyxoviral
Type: Family
Description: Paramyxoviral P genes are able to generate more than one product, using alternative reading frames and RNA editing. The P gene encodes the structural phosphoprotein P. In addition, it encodes several non-structural proteins present in the infected cell but not in the virus particle. This family includes phosphoprotein P and the non-structural phosphoprotein V from different paramyxoviruses. Phosphoprotein P is a modular protein organised into two moieties that are both functionally and structurally distinct: a well-conserved C-terminal moiety that contains all the regions required for transcription, and a poorly conserved, intrinsically unstructured N-terminal moiety that provides several additional functions required for replication. The N-terminal moiety is responsible for binding to newly synthesized free N(0) (nucleoprotein that has not yet bound RNA), in order to prevent the binding of N(0) to cellular RNA. The C-terminal moiety consists of an oligomerisation domain, an N-RNA (nucleoprotein-RNA)-binding domain and an L polymerase-binding domain. Phosphoprotein P is essential for the activity of the RNA polymerase complex which it forms with the L subunit. Although all the catalytic activities of the polymerase are associated with the L subunit, its function requires specific interactions with phosphoprotein P. The P and V phosphoproteins are amino co-terminal, but diverge at their C-termini. This difference is generated by an RNA-editing mechanism in which one or two non-templated G residues are inserted into P-gene-derived mRNA. In Measles virus and Sendai virus, one G residue is inserted and the edited transcript encodes the V protein. In Mumps virus, Simian virus 5 and Newcastle disease virus, two G residues are inserted, and the edited transcript codes for the P protein. Being phosphoproteins, both P and V are rich in serine and threonine residues over their whole lengths. In addition, the V proteins are rich in cysteine residues at the C-termini.
Protein Domain
Name: RGS, subdomain 1/3
Type: Homologous_superfamily
Description: RGS (Regulator of G Protein Signalling) proteins are multi-functional, GTPase-accelerating proteins that promote GTP hydrolysis by the alpha subunit of heterotrimeric G proteins. Upon activation by GPCRs, heterotrimeric G proteins exchange GDP for GTP, are released from the receptor, and dissociate into free, active GTP-bound alpha subunit and beta-gamma dimer, both of which activate downstream effectors. Usually, the response is terminated upon GTP hydrolysis by the alpha subunit, which can then bind the beta-gamma dimer and the receptor. However, in some cases, RGS proteins can have a positive effect on signal potentiation. All RGS proteins contain an 'RGS-box' (or RGS domain), which is required for activity. Some small RGS proteins such as RGS1 and RGS4 consist of little more than an RGS domain, while others also contain additional domains that confer further functionality. RGS domains can be found in conjunction with a variety of domains, including: DEP for membrane targeting, PDZ for binding to GPCRs, PTB for phosphotyrosine-binding, RBD for Ras-binding, GoLoco for guanine nucleotide inhibitor activity, PX for phosphatidylinositol-binding, PXA that is associated with PX, PH for stimulating guanine nucleotide exchange, and GGL (G protein gamma subunit-like) for binding G protein beta subunits. Those RGS proteins that contain GGL domains can interact with G protein beta subunits to form novel dimers that prevent G protein gamma subunit binding and G protein alpha subunit association, thereby preventing heterotrimer formation. The RGS box in RGS4 corresponds to an array of α-helices that fold into two domains. Both are required for GAP (GTPase activating protein) activity. This superfamily represents subdomains 1 and 3 of the RGS box.
Protein Domain
Name: RGS, subdomain 2
Type: Homologous_superfamily
Description: RGS (Regulator of G Protein Signalling) proteins are multi-functional, GTPase-accelerating proteins that promote GTP hydrolysis by the alpha subunit of heterotrimeric G proteins. Upon activation by GPCRs, heterotrimeric G proteins exchange GDP for GTP, are released from the receptor, and dissociate into free, active GTP-bound alpha subunit and beta-gamma dimer, both of which activate downstream effectors. Usually, the response is terminated upon GTP hydrolysis by the alpha subunit, which can then bind the beta-gamma dimer and the receptor. However, in some cases, RGS proteins can have a positive effect on signal potentiation. All RGS proteins contain an 'RGS-box' (or RGS domain), which is required for activity. Some small RGS proteins such as RGS1 and RGS4 consist of little more than an RGS domain, while others also contain additional domains that confer further functionality. RGS domains can be found in conjunction with a variety of domains, including: DEP for membrane targeting, PDZ for binding to GPCRs, PTB for phosphotyrosine-binding, RBD for Ras-binding, GoLoco for guanine nucleotide inhibitor activity, PX for phosphatidylinositol-binding, PXA that is associated with PX, PH for stimulating guanine nucleotide exchange, and GGL (G protein gamma subunit-like) for binding G protein beta subunits. Those RGS proteins that contain GGL domains can interact with G protein beta subunits to form novel dimers that prevent G protein gamma subunit binding and G protein alpha subunit association, thereby preventing heterotrimer formation. The RGS box in RGS4 corresponds to an array of α-helices that fold into two domains. Both are required for GAP (GTPase activating protein) activity. This superfamily represents subdomain 2 of the RGS box.
Protein Domain
Name: SecA conserved site
Type: Conserved_site
Description: Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises 7 proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF). The chaperone protein SecB is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion. SecA is a cytoplasmic protein of 800 to 960 amino acid residues. Homologues of secA are also encoded in the chloroplast genome of some algae as well as in the nuclear genome of plants. It could be involved in the intraorganellar protein transport into thylakoids. The signature pattern of this entry is located in the C-terminal region of the RecA-like domain.
Protein Domain
Name: Fungal G-protein, alpha subunit
Type: Family
Description: Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor, and to dissociate into active alpha subunit (GTP-bound) and beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C, and ion channels. These effectors in turn regulate the intracellular concentrations of secondary messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal. The length of the G protein signal is controlled by the duration of the GTP-bound alpha subunit, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals. These fall into 4 main groups on the basis of both sequence similarity and function: alpha-S, alpha-Q, alpha-I and alpha-12. The specific combination of subunits in heterotrimeric G proteins affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions. This family consists of the fungal class of G-protein alpha subunits. In Saccharomyces cerevisiae, two GTP-binding alpha subunits of the heterotrimeric G protein have been identified, Gpa1 and Gpa2. Gpa1 interacts with yeast pheromone receptors (Ste2 or Ste3) that initiate the signalling response leading to mating between haploid a and alpha cells. The exchange of GDP for GTP on Gpa1 alters its interaction with the G protein beta subunit Ste4, leading to dissociation of the G protein beta-gamma dimer Ste4-Ste18. The dissociated subunits activate the downstream pheromone signalling MAP kinase cascade, which induces the changes necessary to produce mating-competent cells. Gpa2 interacts with Gpr1, a GPCR that has a signalling role in response to nutrients. The sequence identity between different fungal alpha subunits is relatively low and is equivalent to the level of similarity observed between mammalian alpha subtypes. The GTP-binding and hydrolysis regions, however, show remarkable conservation.
Protein Domain
Name: Tetracycline resistance leader peptide, TetL
Type: Family
Description: The antibiotic tetracycline has a broad spectrum of activity, acting to inhibit bacterial protein synthesis by binding to the 30S ribosomal subunit, which prevents the association of the aminoacyl-tRNA to the ribosomal acceptor A site. Tetracycline binding is reversible, therefore diluting out the antibiotic can reverse its effects. Tetracycline resistance genes are often located on mobile elements, such as plasmids, transposons and/or conjugative transposons, which can sometimes be transferred between bacterial species. In certain cases, tetracycline can enhance the transfer of these elements, thereby promoting resistance amongst a bacterial colony. There are three types of tetracycline resistance: tetracycline efflux, ribosomal protection, and tetracycline modification:
  • Tetracycline efflux proteins belong to the major facilitator superfamily. Efflux proteins are membrane-associated proteins that recognise and export tetracycline from the cell. They are found in both Gram-positive and Gram-negative bacteria. There are at least 22 different tetracycline efflux proteins, grouped according to sequence similarity: Group 1 are Tet(A), Tet(B), Tet(C), Tet(D), Tet(E), Tet(G), Tet(H), Tet(J), Tet(Z) and Tet(30); Group 2 are Tet(K) and Tet(L); Group 3 are Otr(B) and Tcr(3); Group 4 is TetA(P); Group 5 is Tet(V). In addition, there are the efflux proteins Tet(31), Tet(33), Tet(V), Tet(Y), Tet(34), and Tet(35).
  • Ribosomal protection proteins are cytoplasmic proteins that display homology with the elongation factors EF-Tu and EF-G. Protection proteins bind the ribosome, causing an alteration in ribosomal conformation that prevents tetracycline from binding. There are at least ten ribosomal protection proteins: Tet(M), Tet(O), Tet(S), Tet(W), Tet(32), Tet(36), Tet(Q), Tet(T), Otr(A), and TetB(P). Both Tet(M) and Tet(O) have ribosome-dependent GTPase activity, the hydrolysis of GTP providing the energy for the ribosomal conformational changes.
  • Tetracycline modification proteins include the enzymes Tet(37) and Tet(X), both of which inactivate tetracycline.
In addition, there are the tetracycline resistance proteins Tet(U) and Otr(C). The expression of several of these tet genes is controlled by a family of tetracycline transcriptional regulators known as TetR. TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity. The TetR proteins identified in over 115 genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or TetR can trigger the cellular response. This entry represents the tetracycline resistance leader peptide, which can be found in Tet(L) efflux proteins. Tet(L) is a transmembrane protein that can function as a metal-tetracycline/H+ antiporter. Its sequence is preceded by a leader region that contains a 20-amino-acid open reading frame and an appropriately spaced ribosome binding site. Expression of the gene is induced by addition of tetracycline, which is thought to act by binding to ribosomes that translate the tet(L) leader peptide coding sequence. The presence of three inverted repeats, which can form two different conformations of mRNA, suggests that the tetracycline resistance (TcR) region is regulated by a translational attenuation mechanism. A Rho-independent transcriptional terminator structure is present immediately after the translational stop codon of the Tet protein.
Protein Domain
Name: Band-7 stomatin-like
Type: Family
Description: The band-7 protein family comprises a diverse set of membrane-bound proteins characterised by the presence of a conserved domain, the band-7 domain, also known as SPFH or PHB domain. The exact function of the band-7 domain is not known, but examples from animal and bacterial stomatin-type proteins demonstrate binding to lipids and the ability to assemble into membrane-bound oligomers that form putative scaffolds. A variety of proteins belong to the band-7 family. These include the stomatins, prohibitins, flotillins and the HflK/C bacterial proteins. Eukaryotic band 7 proteins tend to be oligomeric and are involved in membrane-associated processes. Stomatins are involved in ion channel function, prohibitins are involved in modulating the activity of a membrane-bound FtsH protease and the assembly of mitochondrial respiratory complexes, and flotillins are involved in signal transduction and vesicle trafficking. Stomatin, also known as human erythrocyte membrane protein band 7.2b, was first identified in the band 7 region of human erythrocyte membrane proteins. It is an oligomeric, monotopic membrane protein associated with cholesterol-rich membranes/lipid rafts. Human stomatin is ubiquitously expressed in all tissues, with high levels in hematopoietic cells and relatively low levels in brain. It is associated with the plasma membrane and cytoplasmic vesicles of fibroblasts, epithelial and endothelial cells. Stomatin is believed to be involved in regulating monovalent cation transport through lipid membranes. Absence of the protein in hereditary stomatocytosis is believed to be the reason for the leakage of Na+ and K+ ions into and from erythrocytes. Stomatin is also expressed in mechanosensory neurons, where it may interact directly with transduction components, including cation channels. Stomatin proteins have been identified in various organisms, including Caenorhabditis elegans. There are nine stomatin-like proteins in C. elegans, MEC-2 being the one best characterised. In mammals, other stomatin family members are stomatin-like proteins SLP1, SLP2 and SLP3, and NPHS2 (podocin), which display selective expression patterns. Stomatin family members are oligomeric, mostly localise to membrane domains, and in many cases have been shown to modulate ion channel activity. The stomatins and prohibitins, and to a lesser extent flotillins, are highly conserved protein families and are found in a variety of organisms ranging from prokaryotes to higher eukaryotes, whereas HflK and HflC homologues are only present in bacteria. This entry represents the stomatins and stomatin-like proteins, including podocin, from a wide range of eukaryotes, bacteria, archaea and viruses. It excludes the HflK and HflC proteins, prohibitins and flotillins.
Protein Domain
Name: Envelope small membrane protein, SARS-CoV-2-like
Type: Family
Description: This entry represents the Envelope (E) small membrane protein of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), also known as 2019 novel coronavirus (2019-nCoV) or COVID-19 virus. E protein is the smallest of the major structural proteins. It is conserved among Coronavirus strains. It is an integral membrane protein involved in several aspects of the virus' life cycle, such as assembly, budding, envelope formation, and pathogenesis. During the replication cycle, E is abundantly expressed inside the infected cell, but only a small portion is incorporated into the virus envelope. The majority of the protein participates in viral assembly and budding. It can act as a viroporin by oligomerizing after insertion in host membranes to create a hydrophilic pore that allows ion transport. Additionally, the E protein is thought to prevent M protein aggregation and induce membrane curvature. SARS-CoV E protein forms a Ca2+ permeable channel in the endoplasmic reticulum-Golgi intermediate compartment (ERGIC)/Golgi membranes. The E protein ion channel activity alters Ca2+ homeostasis within cells, boosting the activation of the NLRP3 inflammasome, which leads to the overproduction of IL-1beta. SARS-CoV overstimulates the NF-kappaB inflammatory pathway and interacts with the cellular protein syntenin, triggering p38 MAPK activation. These signalling cascades result in exacerbated inflammation and immunopathology. CoV E proteins have a short hydrophilic N terminus, followed by a large hydrophobic transmembrane (TM) domain, and end with a long, hydrophilic C terminus, which comprises the majority of the protein. The hydrophobic region of the TM domain contains at least one predicted amphipathic α-helix that pentamerizes to form an ion-conductive pore in membranes. CoV E proteins have been proposed to have at least two roles. One is related to their TM channel domain. This would be active in the secretory pathway, altering lumenal environments and rearranging secretory organelles, leading to efficient trafficking of virions. The other would be related to their extramembrane domains, particularly the C-terminal domain. This is involved in protein-protein interactions and targeting, among other roles. In the CoV E protein structure, a longer α-helix encompasses the TM domain and is connected to a shorter C-terminal α-helix by a flexible linker domain, forming an L-shape [Li]. The CoV E pentamer is a right-handed α-helical bundle in which the C-terminal tails coil around each other.
Protein Domain
Name: Thaumatin, conserved site
Type: Conserved_site
Description: Thaumatin is an intensely sweet-tasting protein, 100,000 times sweeter than sucrose on a molar basis, found in berries from Thaumatococcus daniellii, a tropical flowering plant known as Katemfe. It is induced by attack by viroids, which are single-stranded unencapsulated RNA molecules that do not code for protein. Thaumatin consists of about 200 residues and contains 8 disulphide bonds. Like other PR proteins, thaumatin is predicted to have a mainly beta structure, with a high content of β-turns and little helix. Several stress-induced proteins of plants have been found to be related to thaumatins:
  • A maize alpha-amylase/trypsin inhibitor
  • Two tobacco pathogenesis-related proteins: PR-R major and minor forms, which are induced after infection with viruses
  • Salt-induced protein NP24 from tomato
  • Osmotin, a salt-induced protein from tobacco
  • Osmotin-like proteins OSML13, OSML15 and OSML81 from potato
  • P21, a leaf protein from soybean
  • PWIR2, a leaf protein from wheat
  • Zeamatin, a maize antifungal protein
This protein is also referred to as pathogenesis-related group 5 (PR5), as many thaumatin-like proteins accumulate in plants in response to infection by a pathogen and possess antifungal activity. The proteins are involved in systemic acquired resistance and stress responses in plants, although their precise role is unknown. This entry represents a conserved site that includes three cysteine residues known to be involved in disulphide bonds.
Protein Domain
Name: Alpha-1-acid glycoprotein
Type: Family
Description: The name 'lipocalin' has been proposed for the larger protein family, but cytosolic fatty-acid binding proteins are also included. The sequences of most members of the family, the core or kernel lipocalins, are characterised by three short conserved stretches of residues, while others, the outlier lipocalin group, share only one or two of these. Proteins known to belong to this family include alpha-1-microglobulin (protein HC); alpha-1-acid glycoprotein (orosomucoid); aphrodisin; apolipoprotein D; beta-lactoglobulin; complement component C8 gamma chain; crustacyanin; epididymal-retinoic acid binding protein (E-RABP); insectacyanin; odorant-binding protein (OBP); human pregnancy-associated endometrial alpha-2 globulin; probasin (PB), a rat prostatic protein; prostaglandin D synthase; purpurin; Von Ebner's gland protein (VEGP); and Lacerta vivipara (Common lizard) epididymal secretory protein IV (LESP IV). Alpha-1-acid glycoprotein (orosomucoid) (A1AG) is a major serum glycoprotein of unknown physiological function. It has been implicated in cellular inflammation by increasing tissue factor expression and tumour necrosis factor alpha secretion of monocytes. The protein is associated with five asparaginyl-linked complex oligosaccharide chains, the proportions and identities of which are altered in several physiological and pathological conditions. A1AG is also clinically important as a non-specific drug binder in serum. It has recently been shown to bind to thalidomide, which may be involved in its inhibition of tumour necrosis factor alpha production.
Protein Domain
Name: G-protein, beta subunit
Type: Domain
Description: Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor, and to dissociate into active alpha subunit (GTP-bound) and beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C, and ion channels. These effectors in turn regulate the intracellular concentrations of secondary messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal. The length of the G protein signal is controlled by the duration of the GTP-bound alpha subunit, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals.
These fall into 4 main groups on the basis of both sequence similarity and function: alpha-S, alpha-Q, alpha-I and alpha-12. The specific combination of subunits in heterotrimeric G proteins affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions. This entry consists of the G protein beta subunit, which assumes a barrel-shaped β-propeller structure containing WD-40 repeats preceded by an N-terminal alpha helix. The beta subunit forms a stable dimer with the gamma subunit. The alpha subunit only contacts the beta subunit in the dimer, lying on the opposite face from the gamma subunit. RGS proteins that contain GGL (G protein gamma-like) domains can interact with beta subunits to form novel dimers that prevent gamma subunit binding, and may prevent heterotrimer formation by inhibiting alpha subunit binding.
Protein Domain
Name: B transposition protein, C-terminal domain superfamily
Type: Homologous_superfamily
Description: Bacteriophage Mu can integrate into the host bacterial genome and replicate via transposition. Mu requires the activity of four proteins for DNA transposition. Two of these proteins are the phage-encoded A and B transposition proteins, while the other two are host-specified accessory factors HU and IHF. These four proteins can form nucleoprotein complexes (transpososomes), which enable strand transfer. The stable protein-DNA intermediate is subsequently disassembled prior to DNA replication by host proteins. The Mu B transposition protein is an ATP-dependent, DNA-binding protein required for target capture and immunity, as well as for activating transpososome function. The C-terminal domain of the B transposition protein is believed to be involved in both DNA-binding and protein-protein contacts with the Mu A transposition protein. The structure of the C-terminal domain consists of four helices in an irregular array.
Protein Domain
Name: B transposition protein, C-terminal
Type: Domain
Description: Bacteriophage Mu can integrate into the host bacterial genome and replicate via transposition. Mu requires the activity of four proteins for DNA transposition. Two of these proteins are the phage-encoded A and B transposition proteins, while the other two are host-specified accessory factors HU and IHF. These four proteins can form nucleoprotein complexes (transpososomes), which enable strand transfer. The stable protein-DNA intermediate is subsequently disassembled prior to DNA replication by host proteins. The Mu B transposition protein is an ATP-dependent, DNA-binding protein required for target capture and immunity, as well as for activating transpososome function. The C-terminal domain of the B transposition protein is believed to be involved in both DNA-binding and protein-protein contacts with the Mu A transposition protein. The structure of the C-terminal domain consists of four helices in an irregular array.
Protein Domain
Name: Mitochondrial outer membrane translocase complex, Tom20 domain superfamily
Type: Homologous_superfamily
Description: The mitochondrial protein translocase (MPT) family, which brings nuclear-encoded preproteins into mitochondria, is very complex, with 19 currently identified protein constituents. These proteins include several chaperone proteins, four proteins of the outer membrane translocase (Tom) import receptor, five proteins of the Tom channel complex, five proteins of the inner membrane translocase (Tim) and three "motor" proteins. The Tom complex performs multiple functions in a coordinated manner. Tom20, Tom37, Tom70, and Tom72 function as redundant receptor proteins, while Tom40 constitutes a protein-conducting channel with the aid of Tom22. Tom20 functions as a general import receptor for mitochondrial proteins and passes them on to the Tom40 channel. This superfamily represents the structural domain of Tom20 that binds to mitochondrial precursor proteins. It consists of five α-helices. The first four form a well-defined structure with an internal hydrophobic core.
Protein Domain
Name: SAP domain
Type: Domain
Description: The SAP motif is a 35-residue motif, which has been named after SAF-A/B, Acinus and PIAS, three proteins known to contain it. The SAP motif is found in a variety of nuclear proteins involved in transcription, DNA repair, RNA processing or apoptotic chromatin degradation. As the SAP motif of SAF-A has been shown to be essential for specific DNA binding activity, it has been proposed that it could be a DNA-binding motif. A multiple alignment of the SAP motif reveals a bipartite distribution of strongly conserved hydrophobic, polar and bulky amino acids separated by a region that contains a glycine. Secondary structure predictions suggest that the SAP motif could form two alpha helices separated by a turn. Some proteins known to contain a SAP motif are listed below:
  • Vertebrate scaffold attachment factors A and B (SAF-A/B). These two proteins are heterogeneous nuclear ribonucleoproteins (hnRNPs) that bind to AT-rich chromosomal regions. It has been proposed that they couple RNA metabolism to nuclear organisation. The SAF-A protein is cleaved by caspase-3 during apoptosis.
  • Mammalian Acinus, a protein which induces apoptotic chromatin condensation after cleavage by caspase-3. Acinus also contains an RNA-recognition motif.
  • Eukaryotic proteins of the PIAS (protein inhibitor of activated STAT) family. These proteins interact with phosphorylated STAT dimers and inhibit STAT-mediated gene activation. Deletion of the first 50 amino acid residues containing the SAP domain allows the interaction of PIAS1 with STAT1 monomer.
  • Plant poly(ADP-ribose) polymerase (PARP). PARP is a nuclear protein that catalyzes the poly(ADP-ribosyl)ation of proteins. It is involved in responses to mild and severe oxidative stresses, by mediating DNA repair and programmed cell death processes, respectively. PARP is tightly bound to chromatin or nuclear matrix.
  • Arabidopsis thaliana Arp, an apurinic endonuclease-redox protein.
  • Yeast THO1 protein. It could be involved in the regulation of transcriptional elongation by RNA polymerase II.
  • Animal Ku70. Together with Ku86, it forms a DNA ends binding complex that is involved in repairing DNA double-strand breaks.
  • Yeast RAD18, a protein involved in DNA repair.
  • Neurospora crassa UVS-2, the homologue of RAD18.
Protein Domain
Name: LSM domain superfamily
Type: Homologous_superfamily
Description: This domain superfamily is found as the core structure in Lsm (like-Sm) proteins and bacterial Lsm-related Hfq proteins, and as the middle domain of the mechanosensitive channel protein MscS. In each case, the domain adopts a core structure consisting of an open β-barrel with an SH3-like topology. Lsm proteins have diverse functions, and are thought to be important modulators of RNA biogenesis and function. The Sm proteins form part of specific small nuclear ribonucleoproteins (snRNPs) that are involved in the processing of pre-mRNAs to mature mRNAs, and are a major component of the eukaryotic spliceosome. These snRNPs consist of seven Sm proteins (B/B', D1, D2, D3, E, F and G), plus a small nuclear RNA (snRNA) (either U1, U2, U5 or U4/6). Other snRNPs, such as U7 snRNP, can contain different Lsm proteins. Lsm proteins are also found in archaebacteria, which do not have any splicing apparatus, suggesting a more general role for Lsm proteins. The pleiotropic translational regulator Hfq (host factor Q) is a bacterial Lsm-like protein, which modulates the structure of numerous RNA molecules by binding preferentially to A/U-rich sequences in RNA. Hfq forms an Lsm-like fold; however, unlike the heptameric Sm proteins, Hfq forms a homo-hexameric ring. The middle domain of the mechanosensitive channel of small conductance protein (MscS or YggB) structurally resembles an Lsm protein. MscS is a mechanosensitive channel present in the membrane of bacteria, archaea and eukarya that responds both to stretching of the cell membrane and to membrane depolarisation. MscS folds as a homo-heptamer with a cylindrical shape, and can be divided into transmembrane and extramembrane regions: an N-terminal periplasmic region, a transmembrane region, and a C-terminal cytoplasmic region.
The C-terminal cytoplasmic region can be further divided into middle and C-terminal domains, which together create a framework that connects to the cytoplasm through distinct openings. The middle domain exhibits an Lsm-like structure, consisting of five β-strands that pack together with those of other subunits to form a barrel-like sheet extending around the entire protein.
Protein Domain
Name: WH2 domain
Type: Domain
Description: The WH2 (WASP-Homology 2, or Wiskott-Aldrich homology 2) domain is an actin-binding motif of ~18 amino acids. This domain was first recognised as an essential element for the regulation of the cytoskeleton by the mammalian actin nucleation-promoting factor WAS (also known as Wiskott-Aldrich syndrome protein, WASP). WH2 proteins occur in eukaryotes from yeast to mammals, in insect viruses, and in some bacteria. The WH2 domain is found as a modular part of larger proteins; it can be associated with the WH1 or EVH1 domain and with the CRIB domain, and the WH2 domain can occur as a tandem repeat. The WH2 domain binds actin monomers and can facilitate the assembly of actin monomers into newly forming actin filaments. Some proteins known to contain a WH2 domain:
  • Mammalian actin nucleation-promoting factor WAS (WASP), a possible regulator of lymphocyte and platelet function. Defects in WASP are the cause of Wiskott-Aldrich syndrome (WAS), an X-linked recessive disease characterised by immune dysregulation and microthrombocytopenia. WASP proteins bind the actin nucleating protein complex Arp2/3.
  • Mammalian N-WASP/WASL and WASF/SCAR/WAVE1-3, and yeast LAS17, which are also proteins from the WASP family that participate in the transduction of signals from the cell surface to the actin cytoskeleton.
  • WAS protein family homologue 1 (WASH1), which acts as a nucleation-promoting factor at the surface of endosomes, where it recruits and activates the Arp2/3 complex to induce actin polymerisation.
  • Baker's yeast Verprolin, a protein involved in cytoskeletal organisation and cellular growth.
  • Human WASP interacting protein (WASPIP/WIP), a WASP-, profilin- and actin-binding protein which induces actin polymerisation and redistribution.
  • Nuclear polyhedrosis virus (NPV) P61/78/83 capsid protein, which may be important for the persistence and survival of the virus.
  • Fruit fly Spir(e) protein, an actin nucleation factor involved in the development of oocytes and embryos. Spir is conserved among metazoans.
  • Mammalian metastasis suppressor 1 or Missing in Metastasis (MIM) protein, an actin-binding protein that may be related to cancer progression or tumour metastasis.
Protein Domain
Name: SAP domain superfamily
Type: Homologous_superfamily
Description: The SAP motif is a 35-residue motif, which has been named after SAF-A/B, Acinus and PIAS, three proteins known to contain it. The SAP motif is found in a variety of nuclear proteins involved in transcription, DNA repair, RNA processing or apoptotic chromatin degradation. As the SAP motif of SAF-A has been shown to be essential for specific DNA binding activity, it has been proposed that it could be a DNA-binding motif. A multiple alignment of the SAP motif reveals a bipartite distribution of strongly conserved hydrophobic, polar and bulky amino acids separated by a region that contains a glycine. Secondary structure predictions suggest that the SAP motif could form two alpha helices separated by a turn. Some proteins known to contain a SAP motif are listed below:
  • Vertebrate scaffold attachment factors A and B (SAF-A/B). These two proteins are heterogeneous nuclear ribonucleoproteins (hnRNPs) that bind to AT-rich chromosomal regions. It has been proposed that they couple RNA metabolism to nuclear organisation. The SAF-A protein is cleaved by caspase-3 during apoptosis.
  • Mammalian Acinus, a protein which induces apoptotic chromatin condensation after cleavage by caspase-3. Acinus also contains an RNA-recognition motif.
  • Eukaryotic proteins of the PIAS (protein inhibitor of activated STAT) family. These proteins interact with phosphorylated STAT dimers and inhibit STAT-mediated gene activation. Deletion of the first 50 amino acid residues containing the SAP domain allows the interaction of PIAS1 with STAT1 monomer.
  • Plant poly(ADP-ribose) polymerase (PARP). PARP is a nuclear protein that catalyzes the poly(ADP-ribosyl)ation of proteins. It is involved in responses to mild and severe oxidative stresses, by mediating DNA repair and programmed cell death processes, respectively. PARP is tightly bound to chromatin or nuclear matrix.
  • Arabidopsis thaliana Arp, an apurinic endonuclease-redox protein.
  • Yeast THO1 protein. It could be involved in the regulation of transcriptional elongation by RNA polymerase II.
  • Animal Ku70. Together with Ku86, it forms a DNA ends binding complex that is involved in repairing DNA double-strand breaks.
  • Yeast RAD18, a protein involved in DNA repair.
  • Neurospora crassa UVS-2, the homologue of RAD18.
Protein Domain
Name: Rap/Ran-GAP domain
Type: Domain
Description: Structural domains in this superfamily share the structure of two GTPase-activating proteins, one for Rap and one for Ran, that have been shown to be homologous. Both Rap and Ran are Ras-like guanine-nucleotide-binding proteins (GNBPs) involved in a variety of signal-transduction processes, and their activity is regulated by GEFs and GAPs. Rap small G proteins have been implicated in various cellular processes such as exocytosis, cAMP signalling, cell adhesion and cell proliferation. Rap proteins act as molecular switches, with an active GTP-bound form and an inactive GDP-bound form. The inactive GDP-bound form is promoted by GTPase-activating proteins (GAPs). GAP proteins specific for Rap contain a conserved region of around 200 amino-acid residues, the RapGAP domain. This domain can accelerate the GTP hydrolysis activity of Rap by five orders of magnitude. Ran, also known as GTP-binding nuclear protein, is on the other hand essential for the translocation of RNA and proteins through the nuclear pore complex and has been implicated in the control of DNA synthesis and cell cycle progression. Proteins known to contain a Rap-GAP domain include:
  • RAP1 GTPase activating protein (RAP1GAP).
  • Mammalian tuberin protein, the product of a familial tuberous sclerosis gene which, when deleted, causes benign tumours. It also has GAP activity for Rab5.
  • Drosophila Gigas protein, a homologue of tuberin involved in regulation of the cell cycle.
  • Mammalian tuberin-like protein TULIP.
  • GTPase-activating protein Spa-1. It functions as a negative regulator for the activation of Rap1, thereby having a negative effect on cell adhesion.
Protein Domain
Name: Rap/Ran-GAP superfamily
Type: Homologous_superfamily
Description: Structural domains in this superfamily share the structure of two GTPase-activating proteins, one for Rap and one for Ran, that have been shown to be homologous. Both Rap and Ran are Ras-like guanine-nucleotide-binding proteins (GNBPs) involved in a variety of signal-transduction processes, and their activity is regulated by GEFs and GAPs. Rap small G proteins have been implicated in various cellular processes such as exocytosis, cAMP signalling, cell adhesion and cell proliferation. Rap proteins act as molecular switches, with an active GTP-bound form and an inactive GDP-bound form. The inactive GDP-bound form is promoted by GTPase-activating proteins (GAPs). GAP proteins specific for Rap contain a conserved region of around 200 amino-acid residues, the RapGAP domain. This domain can accelerate the GTP hydrolysis activity of Rap by five orders of magnitude. Ran, also known as GTP-binding nuclear protein, is on the other hand essential for the translocation of RNA and proteins through the nuclear pore complex and has been implicated in the control of DNA synthesis and cell cycle progression. Proteins known to contain a Rap-GAP domain include:
  • RAP1 GTPase activating protein (RAP1GAP).
  • Mammalian tuberin protein, the product of a familial tuberous sclerosis gene which, when deleted, causes benign tumours. It also has GAP activity for Rab5.
  • Drosophila Gigas protein, a homologue of tuberin involved in regulation of the cell cycle.
  • Mammalian tuberin-like protein TULIP.
  • GTPase-activating protein Spa-1. It functions as a negative regulator for the activation of Rap1, thereby having a negative effect on cell adhesion.
Protein Domain
Name: Zona pellucida domain
Type: Domain
Description: The zona pellucida (ZP) domain is a protein polymerisation module of ~260 amino acids, which is found at the C terminus of many secreted eukaryotic glycoproteins that play fundamental roles in development, hearing, immunity, and cancer. Proteins containing a ZP domain include:
  • Sperm receptor proteins ZP2 and ZP3. Along with protein ZP1, proteins ZP2 and ZP3 are responsible for sperm adhesion to the zona pellucida. ZP3 first binds to specific sperm proteins, thus mediating sperm contacts with the oocyte. ZP2 acts as a second sperm receptor reinforcing the interactions. ZP1 cross-links the polymers formed by ZP2 and ZP3.
  • Zona pellucida sperm-binding protein B (ZP-B) (also known as ZP-X in rabbit and ZP-3 alpha in pig).
  • Glycoprotein GP2, the major component of pancreatic secretory granule membranes.
  • TGF-beta receptor type III (also known as betaglycan). This protein is a proteoglycan that binds to TGF-beta and could be involved in capturing and retaining TGF-beta for presentation to the signalling receptors.
  • Uromodulin (also known as Tamm-Horsfall urinary glycoprotein). The function of this protein, which is the most abundant in human urine, is not yet clear.
  • Chicken beta-tectorin, a major glycoprotein of avian tectorial membrane.
Most ZP domain proteins are synthesized as precursors with carboxy-terminal transmembrane domains or glycosyl phosphatidylinositol (GPI) anchors. The ZP domain contains eight strictly conserved cysteines, which form disulphide bridges. The disulphide bonds within the ZP domains are divided into two groups, suggesting that the ZP domain consists of two subdomains. In addition to the conserved cysteines, only a few aromatic or hydrophobic amino acids are absolutely invariant, probably as a result of structural rather than functional constraints.
Protein Domain
Name: Clathrin adaptor, alpha/beta/gamma-adaptin, appendage, Ig-like subdomain
Type: Domain
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. AP (adaptor protein) complexes are found in coated vesicles and clathrin-coated pits. AP complexes connect cargo proteins and lipids to clathrin at vesicle budding sites, as well as binding accessory proteins that regulate coat assembly and disassembly (such as AP180, epsins and auxilin). There are different AP complexes in mammals. AP1 is responsible for the transport of lysosomal hydrolases between the TGN and endosomes. AP2 associates with the plasma membrane and is responsible for endocytosis. AP3 is responsible for protein trafficking to lysosomes and other related organelles. AP4 is less well characterised. AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). For example, in AP1 these subunits are gamma-1-adaptin, beta-1-adaptin, mu-1 and sigma-1, while in AP2 they are alpha-adaptin, beta-2-adaptin, mu-2 and sigma-2. Each subunit has a specific function.
Adaptins recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal ear (appendage) domains. Mu recognises tyrosine-based sorting signals within the cytoplasmic domains of transmembrane cargo proteins. One function of clathrin and AP2 complex-mediated endocytosis is to regulate the number of GABA(A) receptors available at the cell surface. GGAs (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) are a family of monomeric clathrin adaptor proteins that are conserved from yeasts to humans. GGAs regulate the clathrin-mediated transport of proteins (such as mannose 6-phosphate receptors) from the TGN to endosomes and lysosomes through interactions with TGN-sorting receptors, sometimes in conjunction with AP-1. GGAs bind cargo, membranes, clathrin and accessory factors. GGA1, GGA2 and GGA3 all contain a domain homologous to the ear domain of gamma-adaptin. GGAs are composed of a single polypeptide with four domains: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The VHS domain is responsible for endocytosis and signal transduction, recognising transmembrane cargo through the ACLL sequence in the cytoplasmic domains of sorting receptors. The GAT domain (also found in Tom1 proteins) interacts with ARF (ADP-ribosylation factor) to regulate membrane trafficking, and with ubiquitin for receptor sorting. The hinge region contains a clathrin box for recognition and binding to clathrin, similar to that found in AP adaptins.
The GAE domain is similar to the AP gamma-adaptin ear domain, and is responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis. This entry represents a β-sandwich structural motif found in the appendage (ear) domain of alpha-, beta- and gamma-adaptin from AP clathrin adaptor complexes, and the GAE (gamma-adaptin ear) domain of GGA adaptor proteins. These domains have an immunoglobulin-like β-sandwich fold containing 7 or 8 strands in 2 β-sheets in a Greek key topology. Although these domains share a similar fold, there is little sequence identity between the alpha/beta-adaptins and gamma-adaptin/GAE.
Protein Domain
Name: Low-density lipoprotein (LDL) receptor class A, conserved site
Type: Conserved_site
Description: The low-density lipoprotein (LDL) receptor binds LDL, the major cholesterol-carrying lipoprotein of plasma. Seven successive cysteine-rich repeats of about 40 amino acids are present in the N-terminal region of this multidomain membrane protein. Similar domains have been found in other extracellular and membrane proteins, which are listed below:
  • Vertebrate very low density lipoprotein (VLDL) receptor, which binds and transports VLDL. Its extracellular domain is composed of 8 LDLRA domains, 3 EGF-like domains and 6 LDL-receptor class B domains (LDLRB).
  • Vertebrate low-density lipoprotein receptor-related protein 1 (LRP1), which may act as a receptor for the endocytosis of extracellular ligands. LRP1 contains 31 LDLRA domains and 22 EGF-like domains.
  • Vertebrate low-density lipoprotein receptor-related protein 2 (LRP2) (also known as gp330 or megalin). LRP2 contains 36 LDLRA domains and 17 EGF-like domains.
  • An LRP homologue from Caenorhabditis elegans, which contains 35 LDLRA domains and 17 EGF-like domains.
  • Drosophila putative vitellogenin receptor, with 13 copies of LDLRA domains and 17 EGF-like repeats.
  • Complement factor I, which is responsible for cleaving the alpha-chains of C4b and C3b. It consists of a FIMAC domain (Factor I/MAC proteins C6/C7), a scavenger receptor-like domain, 2 copies of LDLRA and a C-terminal serine protease domain.
  • Complement components C6, C7, C8 and C9. Each contains one LDLRA domain.
  • Perlecan, a large multidomain basement membrane heparan sulphate proteoglycan composed of 4 LDLRA domains, 3 LamB domains, 12 laminin EGF-like domains, 14-21 IG-like domains, 3 LamG domains, and 4 EGF-like domains. A similar but shorter proteoglycan (UNC52), which has 3 repeats of LDLRA, is found in Caenorhabditis elegans.
  • Invertebrate giant extracellular hemoglobin linker chains, which allow heme-containing chains to construct giant hemoglobin (1 LDLRA domain).
  • G-protein coupled receptor Grl101 of the snail Lymnaea stagnalis, which might directly transduce signals carried by large extracellular proteins.
  • Vertebrate enterokinase (EC 3.4.21.9), a type II membrane protein of the intestinal brush border, which activates trypsinogen. It consists of at least a catalytic light chain and a multidomain heavy chain which has 2 LDLRA domains, a MAM domain, an SRCR domain and a CUB domain.
  • Human autosomal dominant polycystic kidney disease protein 1 (PKD1), which is involved in adhesive protein-protein and protein-carbohydrate interactions. The potential calcium-binding site of its single LDLRA domain is missing.
  • Vertebrate integral membrane protein DGCR2/IDD, a potential adhesion receptor with 1 LDLRA domain, a C-type lectin and a VWFC domain.
  • Drosophila serine protease nudel (EC 3.4.21.-), which is involved in the induction of dorsoventral polarity of the embryo. It has 11 LDLRA domains, 3 of which lack the first disulphide bond (C1-C3).
  • Avian subgroup A Rous sarcoma virus receptor (1 copy of LDLRA).
  • Bovine Sco-spondin, which is secreted by the subcommissural organ in embryos and is involved in the modulation of neuronal aggregation. It contains at least 2 EGF-like domains and 3 LDLRA domains.
The LDL-receptor class A domain contains 6 disulphide-bound cysteines and a highly conserved cluster of negatively charged amino acids, of which many are clustered on one face of the module. A schematic representation of this domain is shown here:

 +---------------------+        +--------------------------------+
 |                     |        |                                |
-CxxxxxxxxxxxxCxxxxxxxxCxxxxxxxxCxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxC-
              |        *******************************************
              |                            |
              +----------------------------+

'C': conserved cysteine involved in a disulphide bond.
'x': any residue.
'*': position of the pattern.

In LDL-receptors the class A domains form the binding site for LDL and calcium. The acidic residues between the fourth and sixth cysteines are important for high-affinity binding of positively charged sequences in LDLR's ligands. The repeat has been shown to consist of a β-hairpin structure followed by a series of β-turns. The binding of calcium seems to induce no significant conformational change.
Protein Domain
Name: M matrix/glycoprotein, coronavirus
Type: Family
Description: This family consists of various coronavirus matrix proteins which are transmembrane glycoproteins. The membrane (M) protein is the most abundant structural protein and defines the shape of the viral envelope. It is also regarded as the central organiser of coronavirus assembly, interacting with all other major coronaviral structural proteins. M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions) since virus-like particle (VLP) formation in many CoVs requires only the M and envelope (E) proteins for efficient virion assembly. Interaction of spike (S) with M is necessary for retention of S in the ER-Golgi intermediate compartment (ERGIC)/Golgi complex and its incorporation into new virions, but dispensable for the assembly process. Binding of M to nucleocapsid (N) proteins stabilises the nucleocapsid (N protein-RNA complex), as well as the internal core of virions, and, ultimately, promotes completion of viral assembly. Together, M and E protein make up the viral envelope and their interaction is sufficient for the production and release of virus-like particles (VLPs).
Protein Domain
Name: M matrix/glycoprotein, bat coronavirus HKU9-like
Type: Family
Description: The membrane (M) protein is the most abundant structural protein and defines the shape of the viral envelope. It is also regarded as the central organiser of coronavirus assembly, interacting with all other major coronaviral structural proteins. M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions) since virus-like particle (VLP) formation in many CoVs requires only the M and envelope (E) proteins for efficient virion assembly. Interaction of spike (S) with M is necessary for retention of S in the ER-Golgi intermediate compartment (ERGIC)/Golgi complex and its incorporation into new virions, but dispensable for the assembly process. Binding of M to nucleocapsid (N) proteins stabilises the nucleocapsid (N protein-RNA complex), as well as the internal core of virions, and, ultimately, promotes completion of viral assembly. Together, M and E protein make up the viral envelope and their interaction is sufficient for the production and release of virus-like particles (VLPs). This group contains the Membrane (M) protein of Rousettus bat coronavirus HKU9, and similar proteins from betacoronaviruses in the nobecovirus subgenus (D lineage).
Protein Domain
Name: M matrix/glycoprotein, HCoV-like
Type: Family
Description: The membrane (M) protein is the most abundant structural protein and defines the shape of the viral envelope. It is also regarded as the central organiser of coronavirus assembly, interacting with all other major coronaviral structural proteins. M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions) since virus-like particle (VLP) formation in many CoVs requires only the M and envelope (E) proteins for efficient virion assembly. Interaction of spike (S) with M is necessary for retention of S in the ER-Golgi intermediate compartment (ERGIC)/Golgi complex and its incorporation into new virions, but dispensable for the assembly process. Binding of M to nucleocapsid (N) proteins stabilises the nucleocapsid (N protein-RNA complex), as well as the internal core of virions, and, ultimately, promotes completion of viral assembly. Together, M and E protein make up the viral envelope and their interaction is sufficient for the production and release of virus-like particles (VLPs). This entry contains the Membrane (M) protein of the human coronaviruses (HCoVs) HCoV-OC43 and HCoV-HKU1, and similar proteins from betacoronaviruses in the embecovirus subgenus (A lineage).
Protein Domain
Name: Leucine-rich repeat-containing N-terminal, plant-type
Type: Domain
Description: Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur simultaneously and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel beta sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with an α-helix, thus supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats. This domain is often found at the N terminus of tandem leucine-rich repeats, mainly in plant proteins.
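The two consensus patterns quoted above can be written as regular expressions. The following is a minimal sketch, assuming 'x' stands for any single residue and 'N/C' for either asparagine or cysteine. The function name and the toy sequence are hypothetical, and a plain regex scan is only a crude first-pass filter, not the method used to build this entry.

```python
import re

# Eleven-residue conserved segment LxxLxLxxN/CxL ('x' = any residue).
# A lookahead is used so that overlapping candidates are all reported.
LRR_CORE = re.compile(r"(?=(L..L.L..[NC].L))")

# Shorter pattern LxxLxL mentioned in the description.
LRR_MIN = re.compile(r"(?=(L..L.L))")


def find_motifs(seq, pattern):
    """Return (start, matched_segment) for every occurrence, overlaps included."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    toy = "AALTSLDLSNNKLGG"  # hypothetical sequence, for illustration only
    print(find_motifs(toy, LRR_CORE))
    print(find_motifs(toy, LRR_MIN))
```

Because the shorter pattern is a prefix of the longer one, every hit for the eleven-residue segment is also a hit for LxxLxL, so the short pattern will usually return more candidates.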
Protein Domain
Name: Zinc finger, CCCH-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents C-x8-C-x5-C-x3-H (CCCH) type Zinc finger (Znf) domains. Proteins containing CCCH Znf domains include Znf proteins from eukaryotes involved in cell cycle or growth phase-related regulation, e.g. human TIS11B (butyrate response factor 1), a probable regulatory protein involved in regulating the response to growth factors, and the mouse TTP growth factor-inducible nuclear protein, which has the same function. 
The mouse TTP protein is induced by growth factors. Another protein containing this domain is the human splicing factor U2AF 35kDa subunit, which plays a critical role in both constitutive and enhancer-dependent splicing by mediating essential protein-protein interactions and protein-RNA interactions required for 3' splice site selection. It has been shown that different CCCH-type Znf proteins interact with the 3'-untranslated region of various mRNAs. This type of Znf is very often present in two copies.
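The spacing notation C-x8-C-x5-C-x3-H used in this description can be turned into a regular expression mechanically. The sketch below is one way to do that, assuming that tokens are separated by hyphens, that 'xN' means exactly N residues of any kind, and that a single capital letter is a literal residue. The function names and the toy sequence are hypothetical, and real CCCH fingers may show spacing variation that this exact-length pattern would miss.

```python
import re


def spacing_pattern_to_regex(pattern):
    """Convert e.g. 'C-x8-C-x5-C-x3-H' into the regex 'C.{8}C.{5}C.{3}H'."""
    parts = []
    for token in pattern.split("-"):
        if re.fullmatch(r"x\d+", token):
            # 'xN' -> exactly N residues of any kind
            parts.append(".{%d}" % int(token[1:]))
        elif re.fullmatch(r"[A-Z]", token):
            # single capital letter -> literal residue
            parts.append(token)
        else:
            raise ValueError(f"unsupported token: {token!r}")
    return "".join(parts)


CCCH = re.compile(spacing_pattern_to_regex("C-x8-C-x5-C-x3-H"))


def find_ccch(seq):
    """Return start positions of non-overlapping matches in a protein sequence."""
    return [m.start() for m in CCCH.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence with the exact spacing, for illustration only.
    toy = "GG" + "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H"
    print(find_ccch(toy))
```

Rejecting unknown tokens with an error, instead of passing them through, keeps a mistyped pattern from silently becoming a regex that matches something else.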
Protein Domain
Name: Leucine-rich repeat, cysteine-containing subtype
Type: Repeat
Description: Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur simultaneously and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel beta sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with an α-helix, thus supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats. This is a cysteine-containing leucine-rich repeat that is widespread amongst eukaryotic proteins but does not appear to be found in archaea, bacteria or viruses.
Protein Domain
Name: Zinc finger, CCCH-type superfamily
Type: Homologous_superfamily
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents C-x8-C-x5-C-x3-H (CCCH) type Zinc finger (Znf) domains superfamily. Proteins containing CCCH Znf domains include Znf proteins from eukaryotes involved in cell cycle or growth phase-related regulation, e.g. human TIS11B (butyrate response factor 1), a probable regulatory protein involved in regulating the response to growth factors, and the mouse TTP growth factor-inducible nuclear protein, which has the same function. 
The mouse TTP protein is induced by growth factors. Another protein containing this domain is the human splicing factor U2AF 35kDa subunit, which plays a critical role in both constitutive and enhancer-dependent splicing by mediating essential protein-protein interactions and protein-RNA interactions required for 3' splice site selection. It has been shown that different CCCH-type Znf proteins interact with the 3'-untranslated region of various mRNAs. This type of Znf is very often present in two copies.
Protein Domain
Name: Envelope small membrane protein, coronavirus
Type: Family
Description: E protein is the smallest of the major structural proteins. It is conserved among Coronavirus strains. It is an integral membrane protein involved in several aspects of the virus' life cycle, such as assembly, budding, envelope formation, and pathogenesis. During the replication cycle, E is abundantly expressed inside the infected cell, but only a small portion is incorporated into the virus envelope. The majority of the protein participates in viral assembly and budding. It can act as a viroporin by oligomerizing after insertion in host membranes to create a hydrophilic pore that allows ion transport. Additionally, the E protein is thought to prevent M protein aggregation and induce membrane curvature. SARS-CoV E protein forms a Ca2+ permeable channel in the endoplasmic reticulum-Golgi apparatus intermediate compartment (ERGIC)/Golgi membranes. The E protein ion channel activity alters Ca2+ homeostasis within cells, boosting the activation of the NLRP3 inflammasome, which leads to the overproduction of IL-1beta. SARS-CoV overstimulates the NF-kappaB inflammatory pathway and interacts with the cellular protein syntenin, triggering p38 MAPK activation. These signalling cascades result in exacerbated inflammation and immunopathology. CoV E proteins have a short hydrophilic N terminus, followed by a large hydrophobic transmembrane (TM) domain, and end with a long, hydrophilic C terminus, which comprises the majority of the protein. The hydrophobic region of the TM domain contains at least one predicted amphipathic α-helix that pentamerizes to form an ion-conductive pore in membranes. CoV E proteins have been proposed to have at least two roles. One is related to their TM channel domain. This would be active in the secretory pathway, altering lumenal environments and rearranging secretory organelles and leading to efficient trafficking of virions.
The other would be related to their extramembrane domains, particularly the C-terminal domain. This is involved in protein-protein interactions and targeting, among other roles [ , , , ]. In the CoV E protein structure a longer α-helix encompasses the TM domain, which is connected to another shorter C-terminal α-helix by a flexible linker domain, forming an L-shape [Li]. The CoV E pentamer is a right handed α-helical bundle where the C-terminal tails coil around each other [].
Protein Domain
Name: Bcl2-like
Type: Family
Description: B cell CLL/lymphoma-2 (Bcl-2) and related proteins comprise the Bcl-2 family. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes. Though originally characterised with respect to their roles in controlling outer mitochondrial membrane integrity and apoptosis, the members of the Bcl-2 family are involved in numerous cellular pathways. Bcl-2 and its relatives are functionally classified as either antiapoptotic or proapoptotic. All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Antiapoptotic Bcl-2 proteins contain four Bcl-2 homology domains (BH1-4). The major antiapoptotic proteins are Bcl-2-related gene A1 (A1), Bcl-2, Bcl-2-related gene, long isoform (Bcl-xL), Bcl-w, and myeloid cell leukemia 1 (MCL-1). They preserve outer mitochondrial membrane (OMM) integrity by directly inhibiting the proapoptotic Bcl-2 proteins. The proapoptotic Bcl-2 members are divided into the effector proteins and the BH3-only proteins. The effector proteins Bcl-2 antagonist killer 1 (BAK) and Bcl-2-associated x protein (BAX) were originally described to contain only BH1-3; however, structure-based alignments revealed a conserved BH4 motif. Upon activation, BAK and BAX homo-oligomerise into proteolipid pores within the OMM to promote MOMP (mitochondrial outer membrane permeabilisation). The BH3-only proteins function in distinct cellular stress scenarios and are subdivided based on their ability to interact with the antiapoptotic or both the antiapoptotic and the effector proteins. This entry represents the Bcl2 family and related proteins, including E1B 19K protein (also known as E1B protein, small T-antigen), which is a putative adenovirus Bcl-2 homologue that inhibits E1A-induced apoptosis and hence prolongs the viability of the host cell.
Protein Domain
Name: Zona pellucida domain, conserved site
Type: Conserved_site
Description: The zona pellucida (ZP) domain is a protein polymerisation module of ~260 amino acids, which is found at the C terminus of many secreted eukaryotic glycoproteins that play fundamental roles in development, hearing, immunity, and cancer. Proteins containing a ZP domain include:
  • Sperm receptor proteins ZP2 and ZP3. Along with protein ZP1, proteins ZP2 and ZP3 are responsible for sperm adhesion to the zona pellucida. ZP3 first binds to specific sperm proteins, thus mediating sperm contacts with the oocyte. ZP2 acts as a second sperm receptor reinforcing the interactions. ZP1 cross-links the polymers formed by ZP2 and ZP3.
  • Zona pellucida sperm-binding protein B (ZP-B) (also known as ZP-X in rabbit and ZP-3 alpha in pig).
  • Glycoprotein GP2, the major component of pancreatic secretory granule membranes.
  • TGF-beta receptor type III (also known as betaglycan). This protein is a proteoglycan that binds to TGF-beta and could be involved in capturing and retaining TGF-beta for presentation to the signalling receptors.
  • Uromodulin (also known as Tamm-Horsfall urinary glycoprotein). The function of this protein, which is the most abundant in human urine, is not yet clear.
  • Chicken beta-tectorin, a major glycoprotein of avian tectorial membrane.
Most ZP domain proteins are synthesized as precursors with carboxy-terminal transmembrane domains or glycosyl phosphatidylinositol (GPI) anchors. The ZP domain contains eight strictly conserved cysteines, which form disulphide bridges. The disulphide bonds within the ZP domains are divided into two groups, suggesting that the ZP domain consists of two subdomains. In addition to the conserved cysteines, only a few aromatic or hydrophobic amino acids are absolutely invariant, probably as a result of structural rather than functional constraints. This entry represents a conserved site found within the zona pellucida domain, and includes 2 conserved Cys residues.
Protein Domain
Name: Zinc finger, DNL-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The DNL-type zinc finger is found in Tim15, a zinc finger protein essential for protein import into mitochondria. Mitochondrial functions rely on the correct transport of resident proteins synthesized in the cytosol to mitochondria. 
Protein import into mitochondria is mediated by membrane protein complexes, protein translocators, in the outer and inner mitochondrial membranes, in cooperation with their assistant proteins in the cytosol, intermembrane space and matrix. Proteins destined for the mitochondrial matrix cross the outer membrane with the aid of the outer membrane translocator, the TOM40 complex, and then the inner membrane with the aid of the inner membrane translocator, the TIM23 complex, and mitochondrial motor and chaperone (MMC) proteins including mitochondrial heat-shock protein 70 (mtHsp70), and translocase in the inner mitochondrial membrane (Tim)15. Tim15 is also known as zinc finger motif (Zim)17 or mtHsp70 escort protein (Hep)1. Tim15 contains a zinc-finger motif (CXXC and CXXC) of ~100 residues, which has been named DNL after a short C-terminal motif of D(N/H)L. The DNL-type zinc finger is an L-shaped molecule. The two CXXC motifs are located at the end of the L, and are sandwiched by two-stranded antiparallel β-sheets. Two short α-helices constitute another leg of the L. The outer (convex) face of the L has a large acidic groove, which is lined with five acidic residues, whereas the inner (concave) face of the L has two positively charged residues, next to the CXXC motifs. This entry represents the DNL-type zinc finger.
Protein Domain
Name: Clathrin adaptor, beta-adaptin, appendage, Ig-like subdomain
Type: Homologous_superfamily
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport [ ]. Clathrin coats contain both clathrin (acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors [, ].AP (adaptor protein) complexes are found in coated vesicles and clathrin-coated pits. AP complexes connect cargo proteins and lipids to clathrin at vesicle budding sites, as well as binding accessory proteins that regulate coat assembly and disassembly (such as AP180, epsins and auxilin). There are different AP complexes in mammals. AP1 is responsible for the transport of lysosomal hydrolases between the TGN and endosomes [ ]. AP2 associates with the plasma membrane and is responsible for endocytosis []. AP3 is responsible for protein trafficking to lysosomes and other related organelles []. AP4 is less well characterised. AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). For example, in AP1 these subunits are gamma-1-adaptin, beta-1-adaptin, mu-1 and sigma-1, while in AP2 they are alpha-adaptin, beta-2-adaptin, mu-2 and sigma-2. Each subunit has a specific function. 
Adaptins recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal ear (appendage) domains. Mu recognises tyrosine-based sorting signals within the cytoplasmic domains of transmembrane cargo proteins. One function of clathrin and AP2 complex-mediated endocytosis is to regulate the number of GABA(A) receptors available at the cell surface. This entry represents a β-sandwich structural motif found in the appendage (ear) domain of gamma1-adaptin from AP1 clathrin adaptor complex, and the homologous C-terminal GAE (gamma-adaptin ear) domain of GGA adaptor proteins. These domains have an immunoglobulin-like β-sandwich fold containing 8 strands in 2 β-sheets in a Greek key topology. This is a similar fold to that found in alpha- and beta-adaptins, but there is little sequence identity between them. The GAE domain is involved in the recruitment of accessory proteins, such as gamma-synergin, Rabaptin-5, Eps15 and cyclin G-associated kinase, which modulate the functions of GAE domain-containing proteins in membrane trafficking events. The binding site in GAE for accessory proteins is located in a shallow hydrophobic trough surrounded by charged (mainly basic) residues.
Protein Domain
Name: Intein C-terminal splicing region
Type: Domain
Description: Inteins (for INternal proTEINs) are protein insertion sequences that are embedded in host protein sequences. They are post-translationally excised from the host protein by a self-catalytic protein splicing process, in which the intein sequence is precisely excised, and the flanking host protein sequences (N- and C-exteins) are religated to create a functional protein. Intein and protein splicing may be viewed as the protein equivalent of intron and RNA splicing, respectively. Inteins were initially discovered as translated intervening sequences that were present in the host gene but absent in homologous genes. Inteins occur in organisms spanning all three kingdoms of life (eubacteria, archaea and eukaryotes). Although many inteins are in host proteins involved in nucleic acid metabolism, several inteins are located in metabolic enzymes, such as phosphoenolpyruvate synthase, anaerobic ribonucleoside triphosphate reductase, UDP-glucose dehydrogenase, ClpP protease/chaperone, vacuolar ATPase proton pump (VMA) and glutamine-fructose 6-phosphate transaminase. It should be noted that protein splicing can also occur in trans as in Synechocystis sp. PCC 6803, where the replicative DNA polymerase catalytic subunit (DnaE) is generated from two separate precursor fragments. Most inteins are bifunctional proteins mediating both protein splicing and DNA cleavage. The domain involved in splicing is formed by the two terminal splicing regions, which are separated by a small linker in mini-inteins or a homing endonuclease of 200-250 amino acids in larger inteins. The N-terminal splicing region spans about 100 N-terminal amino acids and contains the conserved intein blocks A and B, which are similar to the motifs found in the C-terminal autoprocessing domain of the hedgehog protein. The C-terminal splicing region is composed of the two conserved blocks F and G, located in about 50 C-terminal amino acids.
Although no single residue is invariant, the Ser and Cys in block A, the His in block B, and the His, Asn and Ser/Cys/Thr in block G are the most conserved residues in the splicing motifs. Protein splicing requires neither cofactors nor auxiliary enzymes and involves a series of four intramolecular reactions in which several of these most conserved residues are implicated. Resolution of the crystal structure of the Mxe GyrA mini-intein revealed a flattened 'horseshoe shaped' protein composed primarily of β-strands forming two homologous subdomains that are related by a pseudo-twofold axis of symmetry. Despite a low level of sequence conservation, the two subdomains are nearly superimposable, suggesting that they could have arisen by tandem duplication of a primordial gene. However, the duplicated sequences do not correspond directly to the two subdomains, as the two subdomains have exchanged homologous loop regions. This entry represents the C-terminal splicing region that covers the intein blocks F and G. It extends to the first extein residue following the intein.
Protein Domain
Name: Tetracyclin repressor-like MT0489/Rv0472c, C-terminal domain
Type: Domain
Description: This is the C-terminal domain found in putative transcriptional regulators from the TetR family of proteins, predominantly found in Actinobacteria. This entry includes the uncharacterized HTH-type transcriptional regulators MT0489/Rv0472c from Mycobacterium tuberculosis. The antibiotic tetracycline has a broad spectrum of activity, acting to inhibit bacterial protein synthesis by binding to the 30S ribosomal subunit, which prevents the association of the aminoacyl-tRNA with the ribosomal acceptor (A) site. Tetracycline binding is reversible, therefore diluting out the antibiotic can reverse its effects. Tetracycline resistance genes are often located on mobile elements, such as plasmids, transposons and/or conjugative transposons, which can sometimes be transferred between bacterial species. In certain cases, tetracycline can enhance the transfer of these elements, thereby promoting resistance within a bacterial colony. There are three types of tetracycline resistance: tetracycline efflux, ribosomal protection, and tetracycline modification. Tetracycline efflux proteins belong to the major facilitator superfamily. Efflux proteins are membrane-associated proteins that recognise and export tetracycline from the cell. They are found in both Gram-positive and Gram-negative bacteria. There are at least 22 different tetracycline efflux proteins, grouped according to sequence similarity: Group 1 are Tet(A), Tet(B), Tet(C), Tet(D), Tet(E), Tet(G), Tet(H), Tet(J), Tet(Z) and Tet(30); Group 2 are Tet(K) and Tet(L); Group 3 are Otr(B) and Tcr(3); Group 4 is TetA(P); Group 5 is Tet(V). In addition, there are the efflux proteins Tet(31), Tet(33), Tet(V), Tet(Y), Tet(34), and Tet(35). Ribosomal protection proteins are cytoplasmic proteins that display homology with the elongation factors EF-Tu and EF-G. Protection proteins bind the ribosome, causing an alteration in ribosomal conformation that prevents tetracycline from binding.
There are at least ten ribosomal protection proteins: Tet(M), Tet(O), Tet(S), Tet(W), Tet(32), Tet(36), Tet(Q), Tet(T), Otr(A), and TetB(P). Both Tet(M) and Tet(O) have ribosome-dependent GTPase activity, the hydrolysis of GTP providing the energy for the ribosomal conformational changes. Tetracycline modification proteins include the enzymes Tet(37) and Tet(X), both of which inactivate tetracycline. In addition, there are the tetracycline resistance proteins Tet(U) and Otr(C). The expression of several of these tet genes is controlled by a family of tetracycline transcriptional regulators known as TetR. TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity. The TetR proteins identified in over 115 genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or TetR can trigger the cellular response.
Protein Domain
Name: Lipocalin/cytosolic fatty-acid binding domain
Type: Domain
Description: This entry represents the lipocalin/cytosolic fatty-acid binding domain of a group of proteins that belong to the calycin superfamily. Proteins which transport small hydrophobic molecules such as steroids, bilins, retinoids, and lipids share limited regions of sequence homology and a common tertiary structure architecture. This is an eight-stranded antiparallel β-barrel with a repeated +1 topology enclosing an internal ligand binding site. The name 'lipocalin' has been proposed for this protein family, but cytosolic fatty-acid binding proteins are also included. The sequences of most members of the family, the core or kernel lipocalins, are characterised by three short conserved stretches of residues, while others, the outlier lipocalin group, share only one or two of these. Proteins known to belong to this family include alpha-1-microglobulin (protein HC); alpha-1-acid glycoprotein (orosomucoid); aphrodisin; apolipoprotein D; beta-lactoglobulin; complement component C8 gamma chain; crustacyanin; epididymal-retinoic acid binding protein (E-RABP); insectacyanin; odorant-binding protein (OBP); human pregnancy-associated endometrial alpha-2 globulin; probasin (PB), a rat prostatic protein; prostaglandin D synthase; purpurin; Von Ebner's gland protein (VEGP); and lizard epididymal secretory protein IV (LESP IV).
Protein Domain
Name: M matrix/glycoprotein, MERS-like-CoV
Type: Family
Description: This entry contains the membrane (M) protein of Middle East respiratory syndrome (MERS)-related CoV, bat-CoV HKU5, and similar proteins from betacoronaviruses in the merbecovirus subgenus (C lineage). The M protein is the most abundant structural protein and defines the shape of the viral envelope. It is also regarded as the central organiser of coronavirus assembly, interacting with all other major coronaviral structural proteins. M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions), since virus-like particle (VLP) formation in many CoVs requires only the M and envelope (E) proteins for efficient virion assembly. Interaction of spike (S) with M is necessary for retention of S in the ER-Golgi intermediate compartment (ERGIC)/Golgi complex and its incorporation into new virions, but dispensable for the assembly process. Binding of M to nucleocapsid (N) proteins stabilises the nucleocapsid (N protein-RNA complex), as well as the internal core of virions, and, ultimately, promotes completion of viral assembly. Together, the M and E proteins make up the viral envelope, and their interaction is sufficient for the production and release of VLPs.
Protein Domain
Name: Lipid transport, open beta-sheet
Type: Domain
Description: This entry represents a conserved open β-sheet domain found in several lipid transport proteins, including vitellogenin and apolipoprotein B-100. Vitellinogen precursors provide the major egg yolk proteins that are a source of nutrients during early development of oviparous vertebrates and invertebrates. Vitellinogen precursors are multi-domain apolipoproteins that are cleaved into distinct yolk proteins. Different vitellinogen precursors exist, which are composed of variable combinations of yolk protein components; however, the cleavage sites are conserved. In vertebrates, a complete vitellinogen is composed of an N-terminal signal peptide for export, followed by four regions that can be cleaved into yolk proteins: heavy chain lipovitellin (lipovitellin-1), phosvitin, light chain lipovitellin (lipovitellin-2), and a von Willebrand factor type D domain (YGP40). In vitellinogen, this domain is often found as part of the lipovitellin-1 peptide product. Apolipoprotein B can exist in two forms: B-100 and B-48. Apolipoprotein B-100 is present on several lipoproteins, including very low-density lipoproteins (VLDL), intermediate-density lipoproteins (IDL) and low-density lipoproteins (LDL), and can assemble VLDL particles in the liver. Apolipoprotein B-100 has been linked to the development of atherosclerosis.
Protein Domain
Name: Mitochondrial inner membrane translocase complex, subunit Tim23
Type: Family
Description: The mitochondrial protein translocase (MPT) family, which brings nuclear-encoded preproteins into mitochondria, is very complex, with 19 currently identified protein constituents. These proteins include several chaperone proteins, four proteins of the outer membrane translocase (Tom) import receptor, five proteins of the Tom channel complex, five proteins of the inner membrane translocase (Tim) and three "motor" proteins. The inner membrane translocase is formed of a complex with a number of proteins, including the Tim17, Tim23 and Tim44 subunits. Tim17 and Tim23 are thought to form the translocation channel of the inner membrane. The TIM23 complex forms a dynamic multisubunit machinery that recognises preproteins and can transport them into the inner membrane or into the matrix. It is composed of three subunits, Tim50, Tim23, and Tim17, that expose domains to the intermembrane space. This entry includes the mitochondrial import inner membrane translocase subunit Tim23, consisting of an N-terminal intermembrane space domain that interacts with Tim50, and a C-terminal membrane-embedded domain that, together with Tim17, forms a protein-conducting channel across the inner membrane. Tim50 helps keep the Tim23 channel in a closed state in the absence of presequence proteins, regulating the gating of the TIM channel.
Protein Domain
Name: NAD--protein ADP-ribosyltransferase Alt-like
Type: Family
Description: This entry represents a family of NAD-protein ADP-ribosyltransferases found in Myoviridae (phages with contractile tails). It includes the ALT protein from bacteriophage T4. Protein ADP-ribosylation is an important posttranslational modification catalyzed by a group of enzymes known as ADP-ribosyltransferases (ADP-RTs). ADP-RTs transfer single or multiple ADP-ribose moieties from NAD to a specific amino acid residue within a target protein, resulting in mono-ADP-ribosylation or poly-ADP-ribosylation (PARylation). ADP-ribosylation changes the electrostatic potential of a target protein by introducing two phosphate groups and may affect protein-DNA as well as protein-protein interactions. Protein ADP-ribosylation plays versatile roles in multiple biological processes. Bacteriophage T4 codes for three ADP-ribosyltransferases: Alt, ModA, and ModB. The ADP-ribosylating activity of each is directed to a specific set of host proteins. Among the three phage-encoded T4 mono-ADP-RTs, the Alt protein has the broadest range of target proteins, which include one of the two alpha subunits of host RNA polymerase. The T4 Alt protein initially acts as a structural component of the phage head. At the time of infection, it enters the host cell with phage DNA and immediately displays enzymatic activity.
Protein Domain
Name: Carbohydrate kinase, predicted, conserved site
Type: Conserved_site
Description: Proteins in this entry are related to hydroxyethylthiazole kinase and PfkB carbohydrate kinase, implying they are also carbohydrate kinases. Several uncharacterised proteins have been shown to share regions of similarity, including yeast chromosome XI hypothetical protein YKL151c; Caenorhabditis elegans hypothetical protein R107.2; Escherichia coli hypothetical protein yjeF; Bacillus subtilis hypothetical protein yxkO; Helicobacter pylori hypothetical protein HP1363; Mycobacterium tuberculosis hypothetical protein MtCY77.05c; Mycobacterium leprae hypothetical protein B229_C2_201; Synechocystis sp. (strain PCC 6803) hypothetical protein sll1433; and Methanocaldococcus jannaschii (Methanococcus jannaschii) hypothetical protein MJ1586. These are proteins of about 30 to 40 kDa whose central region is well conserved.
Protein Domain
Name: Coatomer beta subunit, C-terminal
Type: Domain
Description: Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. While clathrin mediates endocytic protein transport, and transport from ER to Golgi, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins. For example, the coatomer COPI (coat protein complex I) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Activated small guanine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the C-terminal domain of the beta subunit from coatomer proteins (Beta-coat proteins). The C-terminal domain probably adapts the function of the N-terminal domain.
Coatomer protein complex I (COPI)-coated vesicles are involved in transport between the endoplasmic reticulum and the Golgi but also participate in transport from early to late endosomes within the endocytic pathway.
Protein Domain
Name: Ajuba-like, LIM domain 3
Type: Domain
Description: This entry represents the third LIM domain of Ajuba-like proteins. This domain shares two characteristic zinc finger motifs that contain eight conserved residues each, mostly cysteines and histidines, which coordinate two zinc atoms. The Ajuba-like LIM domain-containing protein family includes three highly homologous proteins, Ajuba, LIMD1, and WTIP. These are adapter or scaffold proteins that participate in the assembly of numerous protein complexes and are involved in several cellular processes such as cell fate determination, cytoskeletal organization, repression of gene transcription, mitosis, cell-cell adhesion, cell differentiation, proliferation and migration. They bind to Ago1/2, RCK, Dcp2, and eIF4E in vivo and are required for miRNA-mediated gene silencing. These three proteins bind to the mRNA 5' m(7)GTP cap-protein complex. Members of this family negatively regulate the Hippo signalling pathway. WTIP, the Wt1-interacting protein, was originally identified as an interaction partner of the Wilms tumour protein 1 (WT1). WTIP is involved in kidney and neural crest development. It interacts with the receptor tyrosine kinase Ror2 and inhibits canonical Wnt signaling. LIMD1 was reported to inhibit cell growth and metastases; it has been described as a tumour suppressor that blocks lung tumour cell lines in vitro and in vivo. The inhibition may be mediated through an interaction with barrier-to-autointegration factor (BAF), a component of SWI/SNF chromatin-remodeling protein; or through the interaction with retinoblastoma protein (pRB), resulting in inhibition of E2F-mediated transcription, and expression of the majority of genes with E2F1-responsive elements.
Recently, LIMD1 was shown to interact with the p62/sequestosome protein and influence IL-1 and RANKL signalling by facilitating the assembly of a p62/TRAF6/a-PKC multi-protein complex. Members of the family contain three tandem C-terminal LIM domains and a proline-rich N-terminal region.
Protein Domain
Name: Ajuba-like, LIM domain 2
Type: Domain
Description: This entry represents the second LIM domain of Ajuba-like proteins. This domain shares two characteristic zinc finger motifs that contain eight conserved residues each, mostly cysteines and histidines, which coordinate two zinc atoms. The Ajuba-like LIM domain-containing protein family includes three highly homologous proteins, Ajuba, LIMD1, and WTIP. These are adapter or scaffold proteins that participate in the assembly of numerous protein complexes and are involved in several cellular processes such as cell fate determination, cytoskeletal organization, repression of gene transcription, mitosis, cell-cell adhesion, cell differentiation, proliferation and migration. They bind to Ago1/2, RCK, Dcp2, and eIF4E in vivo and are required for miRNA-mediated gene silencing. These three proteins bind to the mRNA 5' m(7)GTP cap-protein complex. Members of this family negatively regulate the Hippo signalling pathway. WTIP, the Wt1-interacting protein, was originally identified as an interaction partner of the Wilms tumour protein 1 (WT1). WTIP is involved in kidney and neural crest development. It interacts with the receptor tyrosine kinase Ror2 and inhibits canonical Wnt signaling. LIMD1 was reported to inhibit cell growth and metastases; it has been described as a tumour suppressor that blocks lung tumour cell lines in vitro and in vivo. The inhibition may be mediated through an interaction with barrier-to-autointegration factor (BAF), a component of SWI/SNF chromatin-remodeling protein; or through the interaction with retinoblastoma protein (pRB), resulting in inhibition of E2F-mediated transcription, and expression of the majority of genes with E2F1-responsive elements.
Recently, LIMD1 was shown to interact with the p62/sequestosome protein and influence IL-1 and RANKL signalling by facilitating the assembly of a p62/TRAF6/a-PKC multi-protein complex. Members of the family contain three tandem C-terminal LIM domains and a proline-rich N-terminal region.
Protein Domain
Name: Ajuba-like, LIM domain 1
Type: Domain
Description: This entry represents the first LIM domain of Ajuba-like proteins. This domain shares two characteristic zinc finger motifs that contain eight conserved residues each, mostly cysteines and histidines, which coordinate two zinc atoms. The Ajuba-like LIM domain-containing protein family includes three highly homologous proteins, Ajuba, LIMD1, and WTIP. These are adapter or scaffold proteins that participate in the assembly of numerous protein complexes and are involved in several cellular processes such as cell fate determination, cytoskeletal organization, repression of gene transcription, mitosis, cell-cell adhesion, cell differentiation, proliferation and migration. They bind to Ago1/2, RCK, Dcp2, and eIF4E in vivo and are required for miRNA-mediated gene silencing. These three proteins bind to the mRNA 5' m(7)GTP cap-protein complex. Members of this family negatively regulate the Hippo signalling pathway. WTIP, the Wt1-interacting protein, was originally identified as an interaction partner of the Wilms tumour protein 1 (WT1). WTIP is involved in kidney and neural crest development. It interacts with the receptor tyrosine kinase Ror2 and inhibits canonical Wnt signaling. LIMD1 was reported to inhibit cell growth and metastases; it has been described as a tumour suppressor that blocks lung tumour cell lines in vitro and in vivo. The inhibition may be mediated through an interaction with barrier-to-autointegration factor (BAF), a component of SWI/SNF chromatin-remodeling protein; or through the interaction with retinoblastoma protein (pRB), resulting in inhibition of E2F-mediated transcription, and expression of the majority of genes with E2F1-responsive elements.
Recently, LIMD1 was shown to interact with the p62/sequestosome protein and influence IL-1 and RANKL signalling by facilitating the assembly of a p62/TRAF6/a-PKC multi-protein complex. Members of the family contain three tandem C-terminal LIM domains and a proline-rich N-terminal region.
Protein Domain
Name: SMAP1-like, ArfGAP domain
Type: Domain
Description: This entry represents the ArfGAP domain found in stromal membrane-associated protein 1 (SMAP1) and related proteins. SMAP1 binds to clathrin heavy chain molecules, is involved in the trafficking of clathrin-coated vesicles, and preferentially exhibits GAP activity toward Arf6. SMAP1 is involved in Arf6-dependent vesicle trafficking, but not Arf6-mediated actin cytoskeleton reorganization, and regulates clathrin-dependent endocytosis of the transferrin receptors and E-cadherin. This entry also includes the uncharacterised protein C824.09c from Schizosaccharomyces pombe, a predicted GTPase activating protein. Proteins containing this domain include ARF1-directed GTPase-activating protein and the cycle control GTPase activating protein (GAP) GCS1, which is important for the regulation of the ADP ribosylation factor ARF, a member of the Ras superfamily of GTP-binding proteins. The GTP-bound form of ARF is essential for the maintenance of normal Golgi morphology; it participates in the recruitment of coat proteins, which are required for budding and fission of membranes. Before fusion with an acceptor compartment, the membrane must be uncoated. This step requires the hydrolysis of the GTP associated with ARF. These proteins contain a characteristic zinc finger motif (Cys-x2-Cys-x(16,17)-Cys-x2-Cys) which displays some similarity to the C4-type GATA zinc finger. The ARFGAP domain displays no obvious similarity to other GAP proteins. The 3D structure of the ARFGAP domain of the PYK2-associated protein beta has been solved. It consists of a three-stranded β-sheet surrounded by 5 alpha helices. The domain is organised around a central zinc atom which is coordinated by 4 cysteines. The ARFGAP domain is clearly unrelated to the structures of other GAP proteins, which are exclusively helical. Classical GAP proteins accelerate GTPase activity by supplying an arginine finger to the active site.
The crystal structure of ARFGAP bound to ARF revealed that the ARFGAP domain does not supply an arginine to the active site, which suggests a more indirect role of the ARFGAP domain in GTP hydrolysis.
Protein Domain
Name: Stomatin/HflK family
Type: Family
Description: The band-7 protein family comprises a diverse set of membrane-bound proteins characterised by the presence of a conserved domain, the band-7 domain, also known as SPFH or PHB domain. The exact function of the band-7 domain is not known, but examples from animal and bacterial stomatin-type proteins demonstrate binding to lipids and the ability to assemble into membrane-bound oligomers that form putative scaffolds. A variety of proteins belong to the band-7 family. These include the stomatins, prohibitins, flotillins and the HflK/C bacterial proteins. Eukaryotic band 7 proteins tend to be oligomeric and are involved in membrane-associated processes. Stomatins are involved in ion channel function, prohibitins are involved in modulating the activity of a membrane-bound FtsH protease and the assembly of mitochondrial respiratory complexes, and flotillins are involved in signal transduction and vesicle trafficking. Stomatin, also known as human erythrocyte membrane protein band 7.2b, was first identified in the band 7 region of human erythrocyte membrane proteins. It is an oligomeric, monotopic membrane protein associated with cholesterol-rich membranes/lipid rafts. Human stomatin is expressed in all tissues, at high levels in hematopoietic cells and at relatively low levels in brain. It is associated with the plasma membrane and cytoplasmic vesicles of fibroblasts, epithelial and endothelial cells. Stomatin is believed to be involved in regulating monovalent cation transport through lipid membranes. Absence of the protein in hereditary stomatocytosis is believed to be the reason for the leakage of Na+ and K+ ions into and from erythrocytes. Stomatin is also expressed in mechanosensory neurons, where it may interact directly with transduction components, including cation channels. Stomatin proteins have been identified in various organisms, including Caenorhabditis elegans. There are nine stomatin-like proteins in C. elegans, MEC-2 being the best characterised. In mammals, other stomatin family members are stomatin-like proteins SLP1, SLP2 and SLP3, and NPHS2 (podocin), which display selective expression patterns. Stomatin family members are oligomeric, mostly localise to membrane domains, and in many cases have been shown to modulate ion channel activity. The stomatins and prohibitins, and to a lesser extent flotillins, are highly conserved protein families and are found in a variety of organisms ranging from prokaryotes to higher eukaryotes, whereas HflK and HflC homologues are only present in bacteria. This entry matches Stomatin, HflK and similar proteins.
Protein Domain
Name: Ribosomal S11, conserved site
Type: Conserved_site
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein S11 plays an essential role in selecting the correct tRNA in protein biosynthesis. It is located on the large lobe of the small ribosomal subunit. On the basis of sequence similarities, S11 belongs to a family of bacterial, archaeal and eukaryotic ribosomal proteins. This entry represents a small conserved site found in S11 ribosomal proteins.
Protein Domain
Name: MRP-L47 superfamily, mitochondrial
Type: Homologous_superfamily
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. This entry represents the N-terminal region (approximately 8 residues) of the eukaryotic mitochondrial 39-S ribosomal protein L47 (MRP-L47). Mitochondrial ribosomal proteins (MRPs) are the counterparts of the cytoplasmic ribosomal proteins, in that they fulfil similar functions in protein biosynthesis. However, they are distinct in number, features and primary structure.
Protein Domain
Name: G-protein, gamma subunit
Type: Family
Description: Guanine nucleotide binding proteins (G proteins) are membrane-associated, heterotrimeric proteins composed of three subunits: alpha, beta and gamma. G proteins and their receptors (GPCRs) form one of the most prevalent signalling systems in mammalian cells, regulating systems as diverse as sensory perception, cell growth and hormonal regulation. At the cell surface, the binding of ligands such as hormones and neurotransmitters to a GPCR activates the receptor by causing a conformational change, which in turn activates the bound G protein on the intracellular side of the membrane. The activated receptor promotes the exchange of bound GDP for GTP on the G protein alpha subunit. GTP binding changes the conformation of switch regions within the alpha subunit, which allows the bound trimeric G protein (inactive) to be released from the receptor, and to dissociate into active alpha subunit (GTP-bound) and beta/gamma dimer. The alpha subunit and the beta/gamma dimer go on to activate distinct downstream effectors, such as adenylyl cyclase, phosphodiesterases, phospholipase C, and ion channels. These effectors in turn regulate the intracellular concentrations of secondary messengers, such as cAMP, diacylglycerol, sodium or calcium cations, which ultimately lead to a physiological response, usually via the downstream regulation of gene transcription. The cycle is completed by the hydrolysis of alpha subunit-bound GTP to GDP, resulting in the re-association of the alpha and beta/gamma subunits and their binding to the receptor, which terminates the signal. The length of the G protein signal is controlled by the duration of the GTP-bound alpha subunit, which can be regulated by RGS (regulator of G protein signalling) proteins or by covalent modifications. G protein alpha subunits are 350-400 amino acids in length and have molecular weights in the range 40-45 kDa. Seventeen distinct types of alpha subunit have been identified in mammals.
These fall into 4 main groups on the basis of both sequence similarity and function: alpha-S, alpha-Q, alpha-I and alpha-12. The specific combination of subunits in a heterotrimeric G protein affects not only which receptor it can bind to, but also which downstream target is affected, providing the means to target specific physiological processes in response to specific external stimuli. G proteins carry lipid modifications on one or more of their subunits to target them to the plasma membrane and to contribute to protein interactions. This entry represents the G protein gamma subunit.
Protein Domain
Name: Macroglobulin domain
Type: Domain
Description: The proteinase-binding alpha-macroglobulins (A2M) are large glycoproteins found in the plasma of vertebrates, in the hemolymph of some invertebrates and in reptilian and avian egg white. A2M-like proteins are able to inhibit all four classes of proteinases by a 'trapping' mechanism. They have a peptide stretch, called the 'bait region', which contains specific cleavage sites for different proteinases. When a proteinase cleaves the bait region, a conformational change is induced in the protein, thus trapping the proteinase. The entrapped enzyme remains active against low molecular weight substrates, whilst its activity toward larger substrates is greatly reduced, due to steric hindrance. Following cleavage in the bait region, a thiol ester bond, formed between the side chains of a cysteine and a glutamine, is cleaved and mediates the covalent binding of the A2M-like protein to the proteinase. This entry represents the MG2 (macroglobulin) domain of alpha-2-macroglobulin and related thioester-containing proteins (TEPs). A2M-like proteins have been identified in pathogenic invasive bacteria and species that colonize higher eukaryotes. This domain is found in eukaryotic and bacterial proteins. In human A2Ms, this domain is termed macroglobulin-like (MG) domain 2 and in Salmonella enterica ser A2Ms, this is domain 4.
Protein Domain
Name: CLU central domain
Type: Domain
Description: Mutations in the mitochondrial CLU proteins have been shown to result in clustered mitochondria. CLU proteins include Saccharomyces cerevisiae clustered mitochondria protein (Clu1p, alias translation initiation factor 31/TIF31p), Dictyostelium discoideum clustered mitochondria protein homologue (CluA), Caenorhabditis elegans clustered mitochondria protein homologue (CLUH/Protein KIAA0664), Drosophila clueless (alias clustered mitochondria protein homologue), Arabidopsis clustered mitochondria protein (CLU, alias friendly mitochondria protein/FMT), and human clustered mitochondria protein homologue (CLUH). Dictyostelium CluA is involved in mitochondrial dynamics and is necessary for both mitochondrial fission and fusion. Drosophila clueless is essential for cytoplasmic localization and function of cellular mitochondria. The Drosophila clu gene interacts genetically with parkin (park, the Drosophila ortholog of a human gene responsible for many familial cases of Parkinson's disease). Arabidopsis CLU/FMT is required for correct mitochondrial distribution and morphology. The specific role CLU proteins play in mitochondrial processes is not yet known. In an early study, S. cerevisiae Clu1/TIF31p was reported as sometimes being associated with the eIF3 translation initiation factor. The authors noted, however, that its tentative assignment as a subunit of eIF3 was uncertain, and to date there has been no direct evidence for a role of this protein in translation. This entry represents a central domain in CLU proteins.
Protein Domain
Name: Quinoprotein alcohol dehydrogenase-like superfamily
Type: Homologous_superfamily
Description: Quinoprotein alcohol dehydrogenases are a family of proteins found in methylotrophic or autotrophic bacteria. These quinoproteins use pyrroloquinoline quinone as their prosthetic group. There are three types of alcohol dehydrogenases: type I includes methanol dehydrogenase and ethanol dehydrogenase, type II includes soluble quinohaemoprotein with a C-terminal region containing haem C, and type III includes quinoprotein alcohol dehydrogenase with a C-terminal cytochrome C domain. These quinoproteins contain an 8-bladed β-propeller motif, which is present in the N-terminal domain of quinoprotein alcohol dehydrogenase, ethanol dehydrogenase, and the heavy chain (alpha subunit) of methanol dehydrogenase. This entry represents quinoprotein alcohol dehydrogenases as well as some other proteins that share a similar structure, including WD repeat-containing protein and echinoderm microtubule-associated protein-like proteins.
Protein Domain
Name: Clusterin-like
Type: Family
Description: Clusterin (Clu), also known as apolipoprotein J, is a vertebrate glycoprotein. Clusterin expression is complex, appearing as different forms in different cell compartments. One set of proteins is directed for secretion, and other clusterin species are expressed in the cytoplasm and nucleus. The secretory form of the clusterin protein (sCLU) is targeted to the ER by an initial leader peptide. This ~60kDa pre-sCLU protein is proteolytically cleaved into alpha- and beta-subunits and further glycosylated to form mature disulfide-linked heterodimeric secretory CLU (sCLU). sCLU is an 80kDa protein and acts as a molecular chaperone, scavenging denatured proteins outside cells. sCLU possesses nonspecific binding activity to hydrophobic domains of various non-native proteins, binds to some bacteria and bacterial proteins, and interacts with different immune molecules. A specific nuclear form of CLU (nCLU) acts as a pro-death signal, inhibiting cell growth and survival. The nCLU protein has two coiled-coil domains, one at its N terminus that is unable to bind Ku70, and a C-terminal coiled-coil domain that is uniquely able to associate with Ku70 and is minimally required for cell death. The sCLU protein is cytoprotective and anti-apoptotic, whereas the nCLU protein is pro-apoptotic. This family also includes clusterin-like protein 1 (CLUL1), which is expressed specifically in cone photoreceptor cells and is likely to be necessary for normal cone function.
Protein Domain
Name: Leucine-rich repeat
Type: Repeat
Description: Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur simultaneously and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel beta sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with an α-helix, thus supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats.
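The two consensus patterns named in this description, LxxLxLxxN/CxL and the shorter LxxLxL, can be written as regular expressions. The following is a minimal sketch, not a method used by this database: it assumes strict leucine matching (real repeats often carry other hydrophobic residues at the L positions), and the function name and the toy sequence are hypothetical.

```python
import re

# Conserved 11-residue segment LxxLxLxxN/CxL; "x" is any residue, "N/C" is N or C.
# Assumption: only leucine is accepted at the L positions.
LRR_LONG = re.compile(r"L..L.L..[NC].L")
# Shorter core pattern LxxLxL mentioned in the description.
LRR_CORE = re.compile(r"L..L.L")


def find_motifs(pattern, sequence):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # A lookahead with a capture group lets finditer report overlapping hits.
    lookahead = re.compile(f"(?=({pattern.pattern}))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence)]


# Hypothetical toy sequence, not taken from any real protein.
toy = "AALKELDLSGNKLTAA"

print(find_motifs(LRR_LONG, toy))  # [(2, 'LKELDLSGNKL')]
print(find_motifs(LRR_CORE, toy))  # [(2, 'LKELDL')]
```

A plain pattern scan like this will report many chance hits in real sequences, so it is best read as an illustration of the consensus notation rather than as a repeat detector.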
Protein Domain
Name: Leucine-rich repeat, typical subtype
Type: Repeat
Description: Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response. Sequence analyses of LRR proteins suggested the existence of several different subfamilies of LRRs. The significance of this classification is that repeats from different subfamilies never occur simultaneously and have most probably evolved independently. It is, however, now clear that all major classes of LRR have curved horseshoe structures with a parallel beta sheet on the concave side and mostly helical elements on the convex side. At least six families of LRR proteins, characterised by different lengths and consensus sequences of the repeats, have been identified. Eleven-residue segments of the LRRs (LxxLxLxxN/CxL), corresponding to the β-strand and adjacent loop regions, are conserved in LRR proteins, whereas the remaining parts of the repeats (herein termed variable) may be very different. Despite the differences, each of the variable parts contains two half-turns at both ends and a "linear" segment (as the chain follows a linear path overall), usually formed by a helix, in the middle. The concave face and the adjacent loops are the most common protein interaction surfaces on LRR proteins.
3D structures of some LRR protein-ligand complexes show that the concave surface of the LRR domain is ideal for interaction with an α-helix, thus supporting earlier conclusions that the elongated and curved LRR structure provides an outstanding framework for achieving diverse protein-protein interactions. Molecular modeling suggests that the conserved pattern LxxLxL, which is shorter than the previously proposed LxxLxLxxN/CxL, is sufficient to impart the characteristic horseshoe curvature to proteins with 20- to 30-residue repeats. This entry represents the most populated subfamily of leucine-rich repeats.
Protein Domain
Name: Clp, N-terminal domain superfamily
Type: Homologous_superfamily
Description: ClpA is an ATP-dependent chaperone and part of the ClpAP protease that participates in regulatory protein degradation and the dissolution and degradation of protein aggregates. ClpA recognises sequences in specific proteins, which it then unfolds in an ATP-dependent manner and transports into the degradation chamber of the associated ClpP protein. A small adaptor-like protein, ClpS, modulates the activity of ClpA and is an important regulatory factor for this protein. It protects ClpA from autodegradation and appears to redirect its activity away from soluble proteins and toward aggregated proteins. Molecular chaperones recognize unfolded or misfolded proteins by binding to hydrophobic surface patches not normally exposed in the native proteins. Members of the Clp/Hsp100 family of chaperones are present in eubacteria and within organelles of all eukaryotes, promoting disaggregation and disassembly of protein complexes and participating in energy-dependent protein degradation. The ClpA, ClpB, and ClpC subfamilies of the Clp/Hsp100 ATPases contain a conserved N-terminal domain of ~150 amino acids, which in turn consists of two repeats of ~75 residues. Although the Clp repeat (R) domain contains two approximate sequence repeats, it behaves as a single cooperatively folded unit. The Clp R domain is thought to provide a means for regulating the specificity of, and enlarging the substrate pool available to, Clp/Hsp100 chaperone or protease complexes. These roles can be assisted through the binding of an adaptor protein. Adaptor proteins bind to the Clp R domain, modulate the target specificity of the Clp/Hsp100 complex to a particular substrate of interest, and may also regulate the activity of the complex. The Clp R domain is monomeric and partially alpha helical. It is a single folding unit with pseudo 2-fold symmetry. The Clp R domain structure consists of two four-helix bundles connected by a flexible loop.
This entry represents the Clp repeat (R) domain.
Protein Domain
Name: CUB domain
Type: Domain
Description: The CUB domain (for complement C1r/C1s, Uegf, Bmp1) is a structural motif of approximately 110 residues found almost exclusively in extracellular and plasma membrane-associated proteins, many of which are developmentally regulated. These proteins are involved in a diverse range of functions, including complement activation, developmental patterning, tissue repair, axon guidance and angiogenesis, cell signalling, fertilisation, haemostasis, inflammation, neurotransmission, receptor-mediated endocytosis, and tumour suppression. Many CUB-containing proteins are peptidases belonging to MEROPS peptidase families M12A (astacin) and S1A (chymotrypsin). Proteins containing a CUB domain include:
  • Mammalian complement subcomponents C1s/C1r, which form the calcium-dependent complex C1, the first component of the classical pathway of the complement system.
  • Cricetidae sp. (Hamster) serine protease Casp, which degrades type I and IV collagen and fibronectin in the presence of calcium.
  • Mammalian complement-activating component of Ra-reactive factor (RARF), a protease that cleaves the C4 component of complement.
  • Vertebrate enteropeptidase, a type II membrane protein of the intestinal brush border, which activates trypsinogen.
  • Vertebrate bone morphogenetic protein 1 (BMP-1), a protein which induces cartilage and bone formation and expresses metalloendopeptidase activity.
  • Sea urchin blastula proteins BP10 and SpAN.
  • Caenorhabditis elegans hypothetical proteins F42A10.8 and R151.5.
  • Neuropilin (A5 antigen), a calcium-independent cell adhesion molecule that functions during the formation of certain neuronal circuits.
  • Fibropellins I and III from Strongylocentrotus purpuratus (Purple sea urchin).
  • Mammalian hyaluronate-binding protein TSG-6 (or PS4), a serum and growth factor induced protein.
  • Mammalian spermadhesins.
  • Xenopus laevis embryonic protein UVS.2, which is expressed during dorsoanterior development.
Several of the above proteins consist of a catalytic domain together with several CUB domains interspersed by calcium-binding EGF domains. Some CUB domains appear to be involved in oligomerisation and/or recognition of substrates and binding partners. For example, in the complement proteases, the CUB domains mediate dimerisation and binding to collagen-like regions of target proteins (e.g. C1q for C1r/C1s). The structure of CUB domains consists of a β-sandwich with a jelly-roll fold. Almost all CUB domains contain four conserved cysteines that probably form two disulphide bridges (C1-C2, C3-C4). The CUB1 domains of C1s and Map19 have calcium-binding sites.
Protein Domain
Name: Bacterial microcompartment (BMC) circularly permuted domain
Type: Domain
Description: Bacterial microcompartments (BMCs) are large proteinaceous structures composed of a roughly icosahedral shell and a series of encapsulated enzymes. They are found across bacteria where they play functionally diverse roles including CO(2) fixation and the catabolism of a range of organic compounds. They function as organelles by sequestering particular metabolic processes within the cell. A shell or capsid, which is composed of a few thousand protein subunits, surrounds a series of sequentially acting enzymes and controls the diffusion of substrates and products (including toxic or volatile intermediates) into and out of the lumen. Although functionally distinct BMCs vary in their encapsulated enzymes, all are defined by homologous shell proteins. The shells of BMCs are made primarily of a family of proteins whose structural core is the BMC domain, and variations upon this core provide functional diversity. There are three classes of constituent proteins that form a shell with icosahedral symmetry: hexamer-forming proteins containing a single BMC domain (BMC-H); trimer/pseudohexamer-forming proteins consisting of a fusion of two BMC domains (BMC-T); and pentamer-forming proteins containing a bacterial microcompartment vertex (or BMV) domain (BMC-P). The BMC-H and BMC-T proteins form the facets, and the BMC-P proteins form the vertices of the icosahedron. These three protein types form cyclic homooligomers with pores at the centre of symmetry that enable metabolite transport across the shell. The BMC domain fold consists of three α-helices (designated A, B, and C) and four β-strands (designated β1, β2, β3, and β4). Some instances of the BMC shell protein reveal a circular permutation in which a highly similar tertiary structure is built from secondary structure elements occurring in a different order.
The secondary structure elements contributed by the C-terminal region of the typical BMC fold are instead contributed by the N-terminal region of the BMC circularly permuted domain. This entry represents the circularly permuted BMC domain found in ethanolamine utilization protein EutS/EutL and related proteins.
Protein Domain
Name: Clathrin adaptor, mu subunit
Type: Family
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport. Clathrin coats contain both clathrin (acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. AP (adaptor protein) complexes are found in coated vesicles and clathrin-coated pits. AP complexes connect cargo proteins and lipids to clathrin at vesicle budding sites, as well as binding accessory proteins that regulate coat assembly and disassembly (such as AP180, epsins and auxilin). There are different AP complexes in mammals. AP1 is responsible for the transport of lysosomal hydrolases between the TGN and endosomes. AP2 associates with the plasma membrane and is responsible for endocytosis. AP3 is responsible for protein trafficking to lysosomes and other related organelles. AP4 is less well characterised. AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). For example, in AP1 these subunits are gamma-1-adaptin, beta-1-adaptin, mu-1 and sigma-1, while in AP2 they are alpha-adaptin, beta-2-adaptin, mu-2 and sigma-2. Each subunit has a specific function.
Adaptins recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal ear (appendage) domains. Mu recognises tyrosine-based sorting signals within the cytoplasmic domains of transmembrane cargo proteins. One function of clathrin and AP2 complex-mediated endocytosis is to regulate the number of GABA(A) receptors available at the cell surface. This entry represents the mu subunit of various clathrin adaptors (AP1, AP2 and AP3). The mu subunit regulates the coupling of clathrin lattices with particular membrane proteins by self-phosphorylation via a mechanism that is still unclear. The mu subunit possesses a highly conserved N-terminal domain of around 230 amino acids, which may be the region of interaction with other AP proteins; a linker region of between 10 and 42 amino acids; and a less well-conserved C-terminal domain of around 190 amino acids, which may be the site of specific interaction with the protein being transported in the vesicle.
Protein Domain
Name: Ribosomal Proteins L2, RNA binding domain
Type: Domain
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein L2 is one of the proteins from the large ribosomal subunit. The best conserved region is located in the C-terminal section of these proteins. In Escherichia coli, L2 is known to bind to the 23S rRNA and to have peptidyltransferase activity. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups:
  • Eubacterial L2.
  • Algal and plant chloroplast L2.
  • Cyanelle L2.
  • Archaebacterial L2.
  • Plant L2.
  • Slime mold L2.
  • Marchantia polymorpha mitochondrial L2.
  • Paramecium tetraurelia mitochondrial L2.
  • Fission yeast K5, K37 and KD4.
  • Yeast YL6.
  • Vertebrate L8.
Protein Domain
Name: Zinc finger, MYND-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents MYND-type zinc finger domains. The MYND domain (myeloid, Nervy, and DEAF-1) is present in a large group of proteins that includes RP-8 (PDCD2), Nervy, and predicted proteins from Drosophila, mammals, Caenorhabditis elegans, yeast, and plants. The MYND domain consists of a cluster of cysteine and histidine residues, arranged with an invariant spacing to form a potential zinc-binding motif.
Mutating conserved cysteine residues in the DEAF-1 MYND domain does not abolish DNA binding, which suggests that the MYND domain might be involved in protein-protein interactions. Indeed, the MYND domain of ETO/MTG8 interacts directly with the N-CoR and SMRT co-repressors. Aberrant recruitment of co-repressor complexes and inappropriate transcriptional repression is believed to be a general mechanism of leukemogenesis caused by the t(8;21) translocations that fuse ETO with the acute myelogenous leukemia 1 (AML1) protein. ETO has been shown to be a co-repressor recruited by the promyelocytic leukemia zinc finger (PLZF) protein. A divergent MYND domain present in the adenovirus E1A binding protein BS69 was also shown to interact with N-CoR and mediate transcriptional repression. The current evidence suggests that the MYND motif in mammalian proteins constitutes a protein-protein interaction domain that functions as a co-repressor-recruiting interface.
Protein Domain
Name: SUF system FeS cluster assembly, SufR regulator, cyanobacteria
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity.
SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets. In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen. This entry represents members of the SufR cyanobacterial protein family of transcriptional regulators that control the SUF system. In all cases, the sufR gene is encoded near SUF system genes but in the opposite direction. This DNA-binding protein belongs to the DeoR family of helix-turn-helix proteins. All members also have a probable metal-binding motif C-X(12)-C-X(13)-C-X(14)-C near the C terminus.
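The probable metal-binding motif C-X(12)-C-X(13)-C-X(14)-C is written in a PROSITE-like notation, where X(n) stands for n residues of any kind. As a rough illustration, it can be translated into a regular expression and searched for in a sequence. The converter below is a hypothetical helper that handles only the two element types appearing in this motif, and the test sequence is synthetic, not a real SufR protein.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a simple 'C-X(12)-C' style motif into a regular expression.

    Assumption: elements are either a literal residue letter or X(n);
    other PROSITE syntax (ranges, residue classes) is not handled.
    """
    parts = []
    for element in motif.split("-"):
        gap = re.fullmatch(r"X\((\d+)\)", element)
        if gap:
            parts.append(".{%s}" % gap.group(1))  # n residues of any kind
        else:
            parts.append(re.escape(element))      # literal residue
    return "".join(parts)


SUFR_MOTIF = "C-X(12)-C-X(13)-C-X(14)-C"
SUFR_REGEX = re.compile(motif_to_regex(SUFR_MOTIF))

# Synthetic sequence built to contain the motif once, starting at index 2.
toy = "MA" + "C" + "A" * 12 + "C" + "G" * 13 + "C" + "S" * 14 + "C" + "KK"

match = SUFR_REGEX.search(toy)
print(motif_to_regex(SUFR_MOTIF))  # C.{12}C.{13}C.{14}C
print(match.span())                # (2, 45)
```

The description places the motif near the C terminus, so a stricter check could additionally require the match to fall in the last part of the sequence; that cutoff would be a further assumption.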
Protein Domain
Name: Tetracycline resistance leader peptide, TetM
Type: Family
Description: The antibiotic tetracycline has a broad spectrum of activity, acting to inhibit bacterial protein synthesis by binding to the 30S ribosomal subunit, which prevents the association of the aminoacyl-tRNA to the ribosomal acceptor A site. Tetracycline binding is reversible, therefore diluting out the antibiotic can reverse its effects. Tetracycline resistance genes are often located on mobile elements, such as plasmids, transposons and/or conjugative transposons, which can sometimes be transferred between bacterial species. In certain cases, tetracycline can enhance the transfer of these elements, thereby promoting resistance amongst a bacterial colony. There are three types of tetracycline resistance: tetracycline efflux, ribosomal protection, and tetracycline modification. Tetracycline efflux proteins belong to the major facilitator superfamily. Efflux proteins are membrane-associated proteins that recognise and export tetracycline from the cell. They are found in both Gram-positive and Gram-negative bacteria. There are at least 22 different tetracycline efflux proteins, grouped according to sequence similarity: Group 1 are Tet(A), Tet(B), Tet(C), Tet(D), Tet(E), Tet(G), Tet(H), Tet(J), Tet(Z) and Tet(30); Group 2 are Tet(K) and Tet(L); Group 3 are Otr(B) and Tcr(3); Group 4 is TetA(P); Group 5 is Tet(V). In addition, there are the efflux proteins Tet(31), Tet(33), Tet(V), Tet(Y), Tet(34), and Tet(35). Ribosomal protection proteins are cytoplasmic proteins that display homology with the elongation factors EF-Tu and EF-G. Protection proteins bind the ribosome, causing an alteration in ribosomal conformation that prevents tetracycline from binding. There are at least ten ribosomal protection proteins: Tet(M), Tet(O), Tet(S), Tet(W), Tet(32), Tet(36), Tet(Q), Tet(T), Otr(A), and TetB(P). Both Tet(M) and Tet(O) have ribosome-dependent GTPase activity, the hydrolysis of GTP providing the energy for the ribosomal conformational changes.
Tetracycline modification proteins include the enzymes Tet(37) and Tet(X), both of which inactivate tetracycline. In addition, there are the tetracycline resistance proteins Tet(U) and Otr(C). The expression of several of these tet genes is controlled by a family of tetracycline transcriptional regulators known as TetR. TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity. The TetR proteins identified in over 115 genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or TetR can trigger the cellular response. This entry represents the tetracycline resistance leader peptide, which can be found in Tet(M) ribosomal protection proteins. A short open reading frame corresponding to a 28 amino acid peptide, which contains a number of inverted repeat sequences, was found immediately upstream of tet(M). Transcriptional analyses have found that expression of tet(M) resulted from an extension of a small transcript representing the upstream leader region into the resistance determinant. Therefore, this leader sequence is responsible for transcriptional attenuation and thus regulation of the transcription of tet(M).
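The three resistance mechanisms and their member proteins listed in this description can be captured as a small lookup table. This is only a sketch of one way to organise the names given above: the dictionary and function names are hypothetical, the efflux sequence-similarity groups are flattened into one set, and Tet(U) and Otr(C) are left unassigned because the description does not state their mechanism.

```python
# Mechanism -> proteins, transcribed from the description above.
RESISTANCE_MECHANISMS = {
    "efflux": {
        "Tet(A)", "Tet(B)", "Tet(C)", "Tet(D)", "Tet(E)", "Tet(G)", "Tet(H)",
        "Tet(J)", "Tet(Z)", "Tet(30)",                 # Group 1
        "Tet(K)", "Tet(L)",                            # Group 2
        "Otr(B)", "Tcr(3)",                            # Group 3
        "TetA(P)",                                     # Group 4
        "Tet(V)",                                      # Group 5
        "Tet(31)", "Tet(33)", "Tet(Y)", "Tet(34)", "Tet(35)",
    },
    "ribosomal protection": {
        "Tet(M)", "Tet(O)", "Tet(S)", "Tet(W)", "Tet(32)", "Tet(36)",
        "Tet(Q)", "Tet(T)", "Otr(A)", "TetB(P)",
    },
    "modification": {"Tet(37)", "Tet(X)"},
}


def mechanism_of(protein):
    """Return the resistance mechanism for a protein name, or None if unlisted."""
    for mechanism, members in RESISTANCE_MECHANISMS.items():
        if protein in members:
            return mechanism
    return None


print(mechanism_of("Tet(M)"))  # ribosomal protection
print(mechanism_of("Tet(K)"))  # efflux
print(mechanism_of("Tet(U)"))  # None
```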
Protein Domain
Name: Blue (type 1) copper domain
Type: Domain
Description: Blue (type 1) copper proteins constitute a diverse class of proteins, including small blue proteins and multicopper oxidases. They bind copper and are characterised by an intense electronic absorption band near 600 nm. The most well known members of this class of proteins are the small blue proteins, which include azurins and plastocyanins. It is a group of monomeric proteins which contain one copper ion per molecule. The plant chloroplastic plastocyanins exchange electrons with cytochrome c6, and the distantly related bacterial azurins exchange electrons with cytochrome c551. This group also includes amicyanin from bacteria such as Methylobacterium extorquens or Paracoccus versutus (Thiobacillus versutus) that can grow on methylamine; auracyanins A and B from Chloroflexus aurantiacus; blue copper protein from Alcaligenes faecalis; cupredoxin (CPC) from Cucumis sativus (Cucumber) peelings; cusacyanin (basic blue protein; plantacyanin, CBP) from cucumber; halocyanin from Natronomonas pharaonis (Natronobacterium pharaonis), a membrane associated copper-binding protein; pseudoazurin from Pseudomonas; rusticyanin from Thiobacillus ferrooxidans; stellacyanin from Rhus vernicifera (Japanese lacquer tree); umecyanin from the roots of Armoracia rusticana (Horseradish); and allergen Ra3 from ragweed. This pollen protein is evolutionarily related to the above proteins, but seems to have lost the ability to bind copper. The small blue proteins are single-domain proteins. The domain consists of a β-sheet sandwich, composed of eight strands in two sheets, and has predominantly antiparallel β-strand topology. This entry represents the blue copper domain.
Protein Domain
Name: SecE superfamily
Type: Homologous_superfamily
Description: Secretion across the inner membrane in some Gram-negative bacteria occurs via the preprotein translocase pathway. Proteins are produced in the cytoplasm as precursors, and require a chaperone subunit to direct them to the translocase component. From there, the mature proteins are either targeted to the outer membrane, or remain as periplasmic proteins. The translocase protein subunits are encoded on the bacterial chromosome. The translocase itself comprises seven proteins, including a chaperone protein (SecB), an ATPase (SecA), an integral membrane complex (SecY, SecE and SecG), and two additional membrane proteins that promote the release of the mature peptide into the periplasm (SecD and SecF). The chaperone protein SecB is a highly acidic homotetrameric protein that exists as a "dimer of dimers" in the bacterial cytoplasm. SecB maintains preproteins in an unfolded state after translation, and targets these to the peripheral membrane protein ATPase SecA for secretion. SecE, part of the main SecYEG translocase complex, is ~106 residues in length, and spans the inner membrane of the Gram-negative bacterial envelope. Together with SecY and SecG, SecE forms a multimeric channel through which preproteins are translocated, using both the proton motive force and ATP-driven secretion. The latter is mediated by SecA. The structure of the Escherichia coli SecYEG assembly revealed a sandwich of two membranes interacting through the extensive cytoplasmic domains. Each membrane is composed of dimers of SecYEG. The monomeric complex contains 15 transmembrane helices.
Protein Domain
Name: Gigaxonin, BTB/POZ domain
Type: Domain
Description: The KLHL (Kelch-like) proteins generally have a BTB/POZ domain, a BACK domain, and five to six Kelch motifs. They constitute a subgroup at the intersection between the BTB/POZ domain and Kelch domain superfamilies. The BTB/POZ domain facilitates protein binding, while the Kelch repeats form β-propellers. The Kelch superfamily of proteins can be subdivided into five groups: (1) N-propeller, C-dimer proteins, (2) N-propeller proteins, (3) propeller proteins, (4) N-dimer, C-propeller proteins, and (5) C-propeller proteins. KLHL family members belong to the N-dimer, C-propeller subclass of Kelch repeat proteins. In addition to BTB/POZ and Kelch domains, the KLHL family members contain a BACK domain, first described as a 130-residue region of conservation observed amongst BTB-Kelch proteins. Many of the Kelch-like proteins have been identified as adaptors for the recruitment of substrates to Cul3-based E3 ubiquitin ligases. Gigaxonin (also known as KLHL16) belongs to the KLHL family. It regulates microtubule-associated protein 1B (MAP1B), which is involved in maintaining the integrity of cytoskeletal structures and promoting neuronal stability. Mutations in the gigaxonin gene cause giant axonal neuropathy (GAN), a devastating sensory and motor neuropathy. Gigaxonin binds to the ubiquitin-activating enzyme E1 through its N-terminal BTB domain, while the C-terminal Kelch repeat domain interacts directly with the light chain (LC) of MAP1B. It may serve as a ubiquitin substrate adaptor protein that controls MAP1B-LC degradation. This entry represents the BTB domain.
Protein Domain
Name: Small GTPase, Ras-type
Type: Family
Description: Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structure at the level of 30-55% similarity. Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique to the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions, switch I and switch II, that surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop), which connects the B1 strand and the A1 helix, is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg(2+) and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are similar in sequence to the Walker A and Walker B boxes that are found in other nucleotide-binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg(2+) binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues that interact directly with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base.
The small GTPase superfamily can be divided into at least 8 different families, including:
  • Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
  • Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
  • Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
  • Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
  • Ras small GTPases. GTP-binding proteins involved in signalling pathways.
  • Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII) which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
  • Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
  • Roc small GTPases domain. Small GTPase domain always found associated with the COR domain.
Ras proteins are small GTPases that regulate cell growth, proliferation and differentiation. The different Ras isoforms (H-ras, N-ras and K-ras) generate distinct signal outputs, despite interacting with a common set of activators and effectors. Ras is activated by guanine nucleotide exchange factors (GEFs) that release GDP and allow GTP binding. Many RasGEFs have been identified. These are sequestered in the cytosol until activation by growth factors triggers recruitment to the plasma membrane or Golgi, where the GEF colocalizes with Ras. Active GTP-bound Ras interacts with several effector proteins: among the best characterised are the Raf kinases, phosphatidylinositol 3-kinase (PI3K), RalGEFs and NORE/MST1. Ras proteins are synthesized as cytosolic precursors that undergo post-translational processing to be able to associate with cell membranes. First, protein farnesyl transferase, a cytosolic enzyme, attaches a farnesyl group to the cysteine residue of the CAAX motif. Second, the farnesylated CAAX sequence targets Ras to the cytosolic surface of the ER, where an endopeptidase removes the AAX tripeptide. Third, the α-carboxyl group on the now carboxy-terminal farnesylcysteine is methylated by isoprenylcysteine carboxyl methyltransferase. Finally, after methylation, Ras proteins take one of two routes to the cell surface; the route is dictated by a second targeting signal that is located immediately amino-terminal to the farnesylated cysteine. N-ras and H-ras are expressed stably on the plasma membrane, on Golgi in transfected cells, and at least transiently on the ER. Ras has also been visualized on endosomes.
Protein Domain
Name: Ran GTPase
Type: Family
Description: Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structure at the level of 30-55% similarity. Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique to the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions, switch I and switch II, that surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop), which connects the B1 strand and the A1 helix, is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg(2+) and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are similar in sequence to the Walker A and Walker B boxes that are found in other nucleotide-binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg(2+) binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues that interact directly with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base.
The small GTPase superfamily can be divided into 8 different families:
  • Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
  • Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
  • Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
  • Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
  • Ras small GTPases. GTP-binding proteins involved in signalling pathways.
  • Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII) which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
  • Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
  • Roc small GTPases domain. Small GTPase domain always found associated with the COR domain.
Ran (or TC4) is an evolutionarily conserved member of the Ras superfamily of small GTPases that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Ran has been implicated in a large number of processes, including nucleocytoplasmic transport, RNA synthesis, processing and export, and cell cycle checkpoint control. Ran plays a crucial role in both import and export pathways and determines the directionality of nuclear transport. Import receptors (importins) bind their cargos in the cytoplasm, where the concentration of RanGTP is low (due to the action of RanGAP), and release their cargos in the nucleus, where the concentration of RanGTP is high (due to the action of RanGEF). Export receptors (exportins) respond to RanGTP in the opposite manner. Furthermore, it has been shown that nuclear transport factor 2 (NTF2) stimulates efficient nuclear import of a cargo protein. NTF2 binds specifically to RanGDP and to the FXFG repeat-containing nucleoporins. Ran is generally included in the RAS 'superfamily' of small GTP-binding proteins, but it is only slightly related to the other RAS proteins. It also differs from RAS proteins in that it lacks cysteine residues at its C terminus and is therefore not subject to prenylation. Instead, Ran has an acidic C terminus. It is, however, similar to RAS family members in requiring a specific guanine nucleotide exchange factor (GEF) and a specific GTPase activating protein (GAP) as stimulators of overall GTPase activity.
Ran consists of a core domain that is structurally similar to the GTP-binding domains of other small GTPases but, in addition, Ran has a C-terminal extension consisting of an unstructured linker and a 16-residue α-helix that is located opposite the "Switch I" region in the RanGDP structure. Three regions of Ran change conformation depending on the nucleotide bound: the Switch I and II regions, which interact with the bound nucleotide, and the C-terminal extension. In RanGDP, the C-terminal extension contacts the core of the protein, while in RanGTP the extension extends away from the core, most likely due to a steric clash between the switch I region and the linker part of the C-terminal extension. This suggests that the C-terminal extension in RanGDP is crucial for shielding residues in the core domain and preventing the switch regions from adopting a GTP-like form. This prevents binding of transport factors to RanGDP that would otherwise lead to uncoordinated interaction between importin beta-like proteins and cellular factors.
Protein Domain
Name: Fragile X messenger ribonucleoprotein 1, C-terminal core
Type: Domain
Description: This entry represents the core C-terminal region of the fragile X messenger ribonucleoprotein 1 (FMR1) and its paralogues fragile X-related proteins 1/2 (FXR1/2). These proteins have different regions at their very C terminus. The glutamine-arginine rich region facilitates protein interactions. Proteins containing this domain contain two blocks of RGG repeats that bind to G-quartet sequences in a wide variety of mRNAs. Fragile X messenger ribonucleoprotein 1 (FMR1), fragile X-related 1 protein (FXR1) and fragile X-related 2 protein (FXR2) are members of a small family of RNA-binding proteins that are involved in the regulation of alternative mRNA splicing, mRNA stability, mRNA dendritic transport and postsynaptic local protein synthesis of a subset of mRNAs, playing a crucial role in neuronal development and synaptic plasticity. The proteins contain two KH domains and an RGG box, which are characteristic motifs in RNA-binding proteins, as well as nuclear localization and export signals.
Protein Domain
Name: K Homology domain, type 1
Type: Domain
Description: The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. It is a domain of around 70 amino acids that is present in a wide variety of quite diverse nucleic acid-binding proteins. It has been shown to bind RNA. Like many other RNA-binding motifs, KH motifs are found in one or multiple copies (14 copies in chicken vigilin) and, at least for hnRNP K (three copies) and FMR-1 (two copies), each motif is necessary for in vitro RNA binding activity, suggesting that they may function cooperatively or, in the case of single KH motif proteins (for example, Mer1p), independently. According to structural analysis, KH domains can be separated into two groups. The first group, type-1, has a β-α-α-β-β-α structure, whereas in type-2 the last two β-strands are located in the N-terminal part of the domain (α-β-β-α-α-β). Sequence similarity between these two folds is limited to a short region (VIGXXGXXI) in the RNA-binding motif. This motif is located between helices 1 and 2 in type-1 and between helices 2 and 3 in type-2. Proteins known to contain a type-1 KH domain include bacterial polyribonucleotide nucleotidyltransferase; vertebrate Fragile X messenger ribonucleoprotein 1 (FMR1); eukaryotic heterogeneous nuclear ribonucleoprotein K (hnRNP K), one of at least 20 major proteins that are part of hnRNP particles in mammalian cells; mammalian poly(rC) binding proteins; Artemia salina glycine-rich protein GRP33; yeast PAB1-binding protein 2 (PBP2); vertebrate vigilin; and human high-density lipoprotein binding protein (HDL-binding protein).
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom