Adhesion G-protein coupled receptor V1 (ADGRV1), also known as G protein-coupled receptor 98 (GPR98), is a receptor that has an essential role in the development of hearing and vision. It couples to G-alpha(i) proteins (GNAI1/2/3), G-alpha(q) proteins (GNAQ) and G-alpha(s) proteins (GNAS), inhibiting adenylate cyclase (AC) activity and cAMP production. Defects in GPR98 are the cause of Usher syndrome type 2C (USH2C). USH is a genetically heterogeneous condition characterised by the association of retinitis pigmentosa with sensorineural deafness. Defects in GPR98 may also be a cause of familial febrile convulsions type 4 (FEB4), also known as familial febrile seizures 4.
The adenovirus early E2A DNA-binding protein (Ad DBP) is a multifunctional protein required, amongst other things, for DNA replication and transcription control. It binds to single- and double-stranded DNA, as well as to RNA, in a sequence-independent manner. This signature represents the zinc-binding domain of the viral DNA-binding protein, which is active in DNA replication. The protein contains two zinc atoms in different, novel coordinations; they appear to be required for the stability of the protein fold rather than being involved in direct contacts with the DNA. Two copies of this domain are found at the C terminus of many members of the family.
Members of this family are the PhnL protein of C-P lyase systems for utilization of phosphonates. These systems resemble phosphonatase-based systems in having a three-component ABC transporter, comprising a permease, a phosphonate-binding protein and an ATP-binding cassette (ABC) protein. They differ, however, in typically having ten or more additional genes, many of which are believed to form a membrane-associated C-P lyase complex. This protein (PhnL) and the adjacent-encoded PhnK resemble transporter ATP-binding proteins but are suggested, based on mutagenesis studies, to be part of this C-P lyase complex rather than part of a transporter per se.
This entry represents a region of about 79 amino acids found tandemly repeated up to fourteen times within the proteins that contain it. The repeats lack cysteines and are highly conserved, even at the DNA level, within and between proteins. Proteins containing these repeats include the Rib and alpha surface antigens of group B Streptococcus, Esp of Enterococcus faecalis (Streptococcus faecalis), and related proteins of Lactobacillus. Most members of this protein family also have the cell wall anchor motif, LPXTG, shared by many staphylococcal and streptococcal surface antigens. These repeats are thought to define protective epitopes and may play a role in generating phenotypic and genotypic variation.
This entry represents the lipid-binding protein YceI from Escherichia coli and the polyisoprenoid-binding protein TTHA0802 from Thermus thermophilus. Both proteins share a common domain with an 8-stranded β-barrel fold, which resembles the lipocalin fold, although no sequence homology exists with lipocalins. TTHA0802 binds the polyisoprenoid chain within the pore of the barrel via hydrophobic interactions. Sequence homologues of this core structure are present in a wide range of bacteria and archaea. The crystal structures of YceI and TTHA0802 suggest that this family of proteins plays an important role in isoprenoid quinone metabolism, transport and/or storage.
CO dehydrogenase flavoprotein, C-terminal domain superfamily
Type:
Homologous_superfamily
Description:
Proteins containing this domain form structural complexes with other known families. The carbon monoxide (CO) dehydrogenase of Oligotropha carboxidovorans is a heterotrimeric complex composed of an apoflavoprotein, a molybdoprotein, and an iron-sulphur protein. It can be dissociated with sodium dodecylsulphate. CO dehydrogenase catalyzes the oxidation of CO according to the following equation:

CO + H2O = CO2 + 2 e- + 2 H+

Subunit S represents the iron-sulphur protein of CO dehydrogenase and is clearly divided into a C- and an N-terminal domain, each binding a [2Fe-2S] cluster. The structure of the C-terminal domain has a β(3,4)-α(3) fold arranged in an α+β sandwich.
Adenoviruses are responsible for diseases such as pneumonia, cystitis, conjunctivitis and diarrhoea, all of which can be fatal to patients who are immunocompromised. Viral infection commences with recognition of host cell receptors by means of specialised proteins on viral surfaces. Specific attachment of adenovirus is achieved through interactions between host-cell receptors and the adenovirus fibre protein and is mediated by the globular carboxy-terminal domain of the adenovirus fibre protein, termed the carboxy-terminal knob domain. Fibre protein forms spikes that protrude from each vertex of the icosahedral capsid. Fibre proteins are shed during virus entry, when the virus is still at the cell surface.
This entry includes Protein GlcG from Escherichia coli, Corrinoid adenosyltransferase PduO from Salmonella typhimurium and many uncharacterised proteins from bacteria and fungi. GlcG controls the expression of the genes of the glycolate pathway. The structure of GlcG is composed of an α-β(2)-α(3)-β(2)-α fold, similar to the Roadblock/LC7 domain. PduO converts cob(I)alamin to adenosylcobalamin (adenosylcob(III)alamin), the cofactor for propanediol dehydratase. Extracellular haem-binding protein from Streptomyces reticuli (HbpS) is also included in this entry. This protein interacts with the SenS/SenR two-component signal transduction system. Iron binds to surface-exposed lysine residues of an octameric assembly of the protein.
Rhodanese, a sulphurtransferase involved in cyanide detoxification, shares an evolutionary relationship with a large family of proteins, including: the Cdc25 phosphatase catalytic domain; non-catalytic domains of eukaryotic dual-specificity MAPK-phosphatases; non-catalytic domains of yeast PTP-type MAPK-phosphatases; non-catalytic domains of yeast Ubp4, Ubp5 and Ubp7; non-catalytic domains of mammalian Ubp-Y; Drosophila heat shock protein HSP-67BB; several bacterial cold-shock and phage shock proteins; plant senescence-associated proteins; and catalytic and non-catalytic domains of rhodanese.
Rhodanese has an internal duplication. This domain is found as a single copy in other proteins, including phosphatases and ubiquitin C-terminal hydrolases. The structure of this domain is composed of three layers (alpha/beta/alpha) with a parallel β-sheet of five strands.
A number of plant and fungal proteins that bind N-acetylglucosamine (e.g. solanaceous lectins of tomato and potato, plant endochitinases, the wound-induced proteins hevein, win1 and win2, and the Kluyveromyces lactis killer toxin alpha subunit) contain this domain. The domain may occur in one or more copies and is thought to be involved in recognition or binding of chitin subunits. In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal sequence and is therefore at the N terminus of the mature protein; in the killer toxin alpha subunit it is located in the central section of the protein.
Sulphate-binding protein (gene sbp or sbpA) and thiosulphate-binding protein (gene cysP) are two structurally related periplasmic bacterial proteins which specifically bind sulphate and thiosulphate and are involved in the transport systems for these nutrients. There are two conserved regions in these proteins, one located in the N-terminal region and the other in the central part. This entry represents the second conserved region, which includes two adjacent amino acids (Ser-Gly) that, in sbp, are known to be essential for sulphate binding.
This family consists of several invasion plasmid antigen IpaD proteins found in Shigella, as well as homologues such as SipD from Salmonella and BipD from Burkholderia. Gram-negative bacteria use type III secretion systems (T3SSs) as protein transport devices for injecting virulence effector proteins into eukaryotic cells during infection. T3SSs consist of a needle complex (NC) and a tip complex (TC). In Shigella the TC is composed of two proteins, each essential to host cell sensing: IpaD and IpaB. IpaD is hydrophilic and required for tip recruitment of the hydrophobic proteins IpaB and IpaC, which later form the pore in host cell membranes.
MsrPQ is an enzymatic system that repairs proteins containing methionine sulfoxide in the bacterial cell envelope. MsrP, a molybdo-enzyme, and MsrQ, a haem-binding membrane protein, are widely conserved throughout Gram-negative bacteria. MsrPQ uses electrons from the respiratory chain, which represents a novel mechanism to import reducing equivalents into the bacterial cell envelope. MsrPQ is essential for the maintenance of envelope integrity under bleach stress, rescuing periplasmic proteins from methionine oxidation, including the primary periplasmic chaperone SurA. MsrP has been shown to reduce protein-bound methionine sulfoxide and to repair a wide range of periplasmic proteins in vitro.
The PLAC (protease and lacunin) domain is a six-cysteine region of about 40 residues that is present at or near the C terminus of various enzymes and matrix proteins, including: mammalian PACE4 (paired basic amino acid cleaving enzyme 4), mammalian PCSK5 (proprotein convertase subtilisin/kexin type 5), mammalian metalloproteinases ADAMTS-2, -3, -10, -12, -14, -16, -17, and -19, and Manduca sexta matrix protein lacunin. The PLAC domain is often associated with other domains, such as the thrombospondin type I repeat (TSP1), the Kunitz proteinase inhibitor domain, the Ig-like domain, the WAP domain, the subtilase domain, or the ADAM-type metalloprotease domain.
The F-actin binding domain forms a compact bundle of four antiparallel α-helices, which are arranged in a left-handed topology. Binding of F-actin to the F-actin binding domain may result in cytoplasmic retention and subcellular distribution of the protein, as well as possible inhibition of protein function. Proteins containing this domain include the tyrosine-protein kinase Abl1, a non-receptor tyrosine-protein kinase that plays a role in many key processes linked to cell growth and survival, such as cytoskeleton remodeling in response to extracellular stimuli, cell motility and adhesion, receptor endocytosis, autophagy, DNA damage response and apoptosis. Abl1 is linked to different forms of leukemia in humans.
APP-BP1 (beta-Amyloid precursor protein binding protein 1), also known as Ula1 in yeasts, is a regulatory subunit of the E1 enzyme. Human APP-BP1 is necessary for cell cycle progression through the S-M checkpoint. Drosophila APP-BP1 interacts antagonistically with APPL (APP-like protein) during Drosophila development.
Ubiquitin-like proteins (Ublps) are conjugated to their targets by the sequential action of E1, E2 and often E3 enzymes. Each Ublp has a dedicated E1, or activating enzyme, that initiates its conjugation cascade. In humans the NEDD8-activating E1 enzyme is a heterodimer composed of APP-BP1 and UBA3 subunits. The Ublp NEDD8 regulates cell division, signalling and embryogenesis.
RIM proteins are scaffolding proteins at the active zone which bind to several other presynaptic proteins. The long isoforms of RIM proteins, which contain N-terminal Rab3 and Munc13 interacting domains, as well as a central PDZ domain and two C-terminal C2 domains, are encoded by two genes, Rim1 and Rim2. They have multiple isoforms (alpha, beta, gamma) that differ in their structural composition and mediate overlapping and distinct functions. These isoforms are involved in determining Ca2+ channel density and vesicle docking at the presynaptic active zone. This entry includes Rim and related proteins. In Caenorhabditis elegans, Rim acts after vesicle docking, likely by regulating priming.
JASMONATE ZIM-domain (JAZ) proteins act as repressors of jasmonate (JA) signaling. The ZIM domain recruits the co-repressors NINJA and TOPLESS to JAZ-bound transcription factors, and contains a highly conserved TIF[F/Y]XG motif that defines the larger family of TIFY proteins to which JAZs belong. This family consists of TIFY/JAZ proteins, and also includes JAZ13 (At3g22275), a non-TIFY JAZ protein that is a repressor of JA signaling. Some members of the TIFY/JAZ family, such as TIFY8, are not involved in JA signaling. TIFY8 is found in protein complexes involved in regulation of dephosphorylation, deubiquitination and O-linked N-acetylglucosamine modification, suggesting an important role in nuclear signal transduction.
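As an illustration of how the TIF[F/Y]XG motif described above might be located in a protein sequence, the following minimal sketch translates the motif into a regular expression, with X (any residue) written as `.`. The function name and the two toy sequences are hypothetical and chosen only for illustration; they are not real JAZ proteins, and a real classification would rely on the full domain signature rather than this short pattern alone.

```python
import re

# Regex form of the TIF[F/Y]XG motif: [FY] is the F-or-Y position,
# and '.' stands for X (any residue).
TIFY_MOTIF = re.compile(r"TIF[FY].G")


def find_tify_motifs(sequence):
    """Return (1-based start, matched text) for each TIF[F/Y]XG occurrence."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group()) for m in TIFY_MOTIF.finditer(seq)]


# Hypothetical toy sequences (assumptions for illustration, not database entries).
toy_with_motif = "MSSAQLTIFYGGKVNVFD"
toy_without_motif = "MSSAQLTLFYGGKVNVFD"

if __name__ == "__main__":
    print(find_tify_motifs(toy_with_motif))
    print(find_tify_motifs(toy_without_motif))
```

A sequence that returns an empty list here merely lacks the short pattern; as noted above for JAZ13, a protein can still be a JAZ repressor without carrying the TIFY motif.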
This entry represents the BAR domain found in a group of fungal proteins including Sip3 and Lam1 from budding yeasts. They contain an N-terminal BAR domain followed by a Pleckstrin Homology (PH) domain. Sip3 interacts with SNF1 protein kinase and activates transcription when anchored to DNA. It may function in the SNF1 pathway. BAR domains form dimers that bind to membranes, induce membrane bending and curvature, and may also be involved in protein-protein interactions.
BAR domains are dimerization, lipid binding and curvature sensing modules found in many different proteins with diverse functions including organelle biogenesis, membrane trafficking or remodeling, and cell division and migration.
This entry defines the ligand-binding and dimerisation domain of the bacterial regulatory protein AraC and other HTH-type transcriptional regulators.
The crystal structure of the arabinose-binding and dimerisation domain of the Escherichia coli gene regulatory protein AraC was determined in the presence and absence of L-arabinose. The arabinose-bound molecule shows that the protein adopts an unusual fold, binding sugar within a beta barrel and completely burying the arabinose with the amino-terminal arm of the protein. Dimer contacts in the presence of arabinose are mediated by an antiparallel coiled-coil. In the uncomplexed protein, the amino-terminal arm is disordered, uncovering the sugar-binding pocket and allowing it to serve as an oligomerization interface.
The plant PUB proteins, also known as U-box domain-containing proteins, are far more numerous in Arabidopsis, which has 62, than in most animals, which typically have 6. The majority of AtPUB proteins in this family are ARM domain-containing PUB proteins, which contain a C-terminally located tandem ARM (armadillo) repeat protein-interaction region in addition to the U-box domain. They have been implicated in the regulation of cell death and defense. They also play important roles in other plant-specific pathways, such as controlling both self-incompatibility and pseudo-self-incompatibility, as well as acting in abiotic stress. A subgroup of ARM domain-containing PUB proteins harbors a plant-specific U-box N-terminal domain.
This domain, consisting of the distinct N-terminal PRY subdomain followed by the SPRY subdomain, is found at the C terminus of TRIM17, also known as RING finger protein 16 (RNF16) or testis RING finger protein (terf). TRIM17 is composed of a RING/B-box/coiled-coil core, and is therefore also known as an RBCC protein; it is expressed almost exclusively in the testis. It exhibits E3 ligase activity, causing protein degradation of ZW10 interacting protein (ZWINT), a known component of the kinetochore complex required for the mitotic spindle checkpoint, and negatively regulates proliferation of breast cancer cells. TRIM17 undergoes ubiquitination in COS7 fibroblast-like cells; this ubiquitination is inhibited by TRIM44, which stabilizes TRIM17.
Arenaviridae are single-stranded RNA viruses. The arenaviridae S RNAs that have been characterised include conserved terminal sequences, an ambisense arrangement of the coding regions for the precursor glycoprotein (GPC) and nucleocapsid (N) proteins, and an intergenic region capable of forming a base-paired "hairpin" structure. The resulting mature proteins are the glycoproteins G1 and G2 and the N protein.
The C-terminal domain of the nucleocapsid protein (which encapsidates the viral ssRNA) in arenaviridae has an RNaseH-like fold. This domain contains 3'-5' exoribonuclease activity involved in suppressing interferon induction. It forms a typical alpha/beta/alpha sandwich architecture and contains a CCHE zinc-binding site near the 3'-5' exonuclease active site.
Colicins are bacterial toxins produced by Escherichia coli strains and are active against E. coli or related strains. These bacterial antibiotic toxins play an important role in the E. coli colonization of environmental niches. Colicin K uses the Tsx nucleoside-specific receptor for binding at the cell surface, the OmpA protein for translocation through the outer membrane, and the TolABQR proteins for transit through the periplasm. The N-terminal domain interacts with components of its import machinery, including the TolB and TolQ proteins. Members of this family are colicin K immunity proteins. These proteins are able to protect a cell against colicin K.
Human SIRT1, an NAD+-dependent deacetylase, plays a role in cell death/survival, senescence, and endocrine signalling. Its substrates are well characterised, but evidence for the identity of its direct regulators has been wanting. Recently, the nuclear protein AROS (also termed 40S ribosomal protein S19-binding protein 1) has been implicated in the direct regulation of SIRT1 function; this protein has been dubbed 'active regulator of SIRT1'. The protein has been shown to enhance SIRT1-mediated deacetylation of p53, in vitro and in vivo, and to inhibit p53-mediated transcriptional activity. It is the first direct SIRT1 regulator to have been identified that modulates p53-mediated growth regulation.
ERdj5, also known as JPDI and macrothioredoxin, is a protein containing an N-terminal DnaJ domain and four redox active TRX domains. This entry represents the three TRX domains located at the C-terminal half of the protein. ERdj5 is a ubiquitous protein localized in the endoplasmic reticulum (ER) and is abundant in secretory cells. Its transcription is induced during ER stress. It interacts with BiP through its DnaJ domain in an ATP-dependent manner. BiP, an ER-resident member of the Hsp70 chaperone family, functions in ER-associated degradation and protein translocation. Also included in this entry is the single complete TRX domain of an uncharacterized protein from Tetraodon nigroviridis.
The 43kDa postsynaptic protein is a peripheral membrane protein thought to be involved in the anchoring or stabilisation of the nicotinic acetylcholine receptor at synaptic sites. It may link the receptor to the postsynaptic cytoskeleton through direct association with actin or spectrin. The 43kDa protein is highly conserved across species. Two highly conserved regions, one encompassing the N terminus and the other near the C terminus, may be important for interaction of the protein with other components of the postsynaptic membrane. This entry represents the N-terminal sequence of the mature protein (after removal of the initiator methionine); the resulting N-terminal glycine is thought to be myristoylated.
This small motif is found at the N terminus of Kank proteins and has been called the KN (for Kank N-terminal) motif. These proteins are found in eukaryotes and are typically between 413 and 1202 amino acids in length. The motif region has two conserved sequence motifs: TPYG and LDLDF. Kank1 was obtained by positional cloning of a tumor suppressor gene in renal cell carcinoma, while the other members were found by homology search. The family is involved in the regulation of actin polymerisation and cell motility through signaling pathways containing PI3K/Akt and/or unidentified modulators/effectors.
This entry includes SAM domain-containing proteins such as Neurabin-1/Neurabin-2 and SAMD14 (Sterile alpha motif domain-containing protein 14). Most of them are multidomain proteins that, in addition to the SAM domain, contain other protein-binding domains such as PDZ and actin-binding domains. Members of this family participate in signal transduction. Neurabin-1 is involved in the regulation of Ca(2+) signaling intensity in alpha-adrenergic receptors. It forms a functional pair of opposing regulators with neurabin-2. Neurabins are expressed almost exclusively in neuronal cells. They are known to interact with protein phosphatase 1 and inhibit its activity. They can also bind actin filaments. The function of SAMD14 is not clear.
Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands depends on the cell type. They can also form heteropolymers with each other within and between tight junction strands. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with PDZ domain proteins, such as the scaffolding proteins ZO-1, -2 and -3. They also interact with non-tight junction proteins, such as the cell adhesion proteins EpCam and tetraspanins, and the signaling proteins ephrin A and B and their receptors, EphA and EphB.
The N-terminal region of a number of fungal transcriptional regulatory proteins contains a Cys-rich motif that is involved in zinc-dependent binding of DNA. The region forms a binuclear Zn cluster, in which two Zn atoms are bound by six Cys residues. A wide range of proteins are known to contain this domain. These include the proteins involved in arginine, proline, pyrimidine, quinate, maltose and galactose metabolism, amide and GABA catabolism, and leucine biosynthesis, amongst others.
This entry represents a subset of these domains found specifically in maltose fermentation regulatory protein. This protein regulates the transcription of the structural MAL1S (maltase) and AGT1 (maltose permease) genes.
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP-binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs.
ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain.
The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature conserved motif (LSGGQ), specific to the ABC transporter; and the Q-motif (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site.
The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the alpha helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis.
The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette. More than 50 subfamilies have been described based on a phylogenetic and functional classification.
On the basis of sequence similarities a family of related ATP-binding proteins has been characterised. The proteins belonging to this family also contain one or two copies of the 'A' consensus sequence or the 'P-loop'.
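The conserved motifs described above lend themselves to a simple sequence scan. The sketch below is a minimal illustration, not a substitute for a profile-based signature. It searches for the LSGGQ signature named in the text and for a Walker A (P-loop) pattern. The Walker A regular expression is an assumption taken from the commonly used PROSITE-style consensus [AG]-x(4)-G-K-[ST], which is not given in this entry, and the function name and toy sequence are hypothetical.

```python
import re

# Assumption: Walker A / P-loop written as the PROSITE-style consensus
# [AG]-x(4)-G-K-[ST]; this pattern is not stated in the entry text.
WALKER_A = re.compile(r"[AG].{4}GK[ST]")

# ABC signature motif as named in the entry text.
ABC_SIGNATURE = re.compile(r"LSGGQ")


def scan_abc_motifs(sequence):
    """Return 1-based start positions of Walker A and LSGGQ matches."""
    seq = sequence.upper()
    return {
        "walker_a": [m.start() + 1 for m in WALKER_A.finditer(seq)],
        "signature": [m.start() + 1 for m in ABC_SIGNATURE.finditer(seq)],
    }


# Hypothetical toy sequence, invented for illustration only.
toy_sequence = "MKLLGPSGSGKSTLLRAAAALSGGQKQRV"

if __name__ == "__main__":
    print(scan_abc_motifs(toy_sequence))
```

Short patterns like these match many unrelated proteins by chance, and real ABC signatures vary around LSGGQ, so a hit is only a hint that the sequence deserves a closer look with a full domain model.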
ABC transport system, 1-aminoethylphosphonate-binding protein, putative
Type:
Family
Description:
This ABC transporter extracellular solute-binding protein is found in a number of genomes in operon-like contexts strongly suggesting a substrate specificity for 2-aminoethylphosphonate (2-AEP), though some proteins in this group are described as ferric ion transporters.ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs.ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per systems) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop: Walker A, and a magnesium binding site: Walker B. Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region which contains a histidine loop, postulated to polarise the attacking water molecule for hydrolysis, the signature conserved motif (LSGGQ) specific to the ABC transporter, and the Q-motif (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the alpha helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
].
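The conserved motifs described above lend themselves to a simple sequence scan. The following is a minimal sketch only: the signature pattern LSGGQ is taken from the description, whereas the Walker A pattern G-x(4)-G-K-[ST] is a commonly quoted consensus that is assumed here and is not stated in this entry. The function name `find_abc_motifs` and the toy sequence are hypothetical.

```python
import re

# LSGGQ is quoted in the description above. The Walker A consensus
# G-x(4)-G-K-[ST] is an assumption (a commonly quoted form), not part of this entry.
MOTIFS = {
    "walker_a": re.compile(r"G.{4}GK[ST]"),
    "signature": re.compile(r"LSGGQ"),
}


def find_abc_motifs(sequence):
    """Return {motif name: [0-based start positions]} for a protein sequence."""
    seq = sequence.upper()
    return {name: [m.start() for m in pattern.finditer(seq)]
            for name, pattern in MOTIFS.items()}


# Hypothetical toy sequence for illustration, not a real protein.
toy = "MKLLGPSGSGKSTLLRAIAGLEAAALSGGQKQRVAIA"
print(find_abc_motifs(toy))
```

A hit for both patterns is only suggestive; real ABC modules tolerate variation in the signature motif, so a profile-based search would be more sensitive than exact matching.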
Signal transduction histidine kinase, PEP-CTERM system, putative
Type:
Family
Description:
Two-component signal transduction systems enable bacteria to sense, respond, and adapt to a wide range of environments, stressors, and growth conditions [
]. Some bacteria contain as many as 200 two-component systems that need tight regulation to prevent unwanted cross-talk []. These pathways have been adapted to respond to a wide variety of stimuli, including nutrients, cellular redox state, changes in osmolarity, quorum signals, antibiotics, and more []. Two-component systems are composed of a sensor histidine kinase (HK) and its cognate response regulator (RR) []. The HK catalyses its own auto-phosphorylation followed by the transfer of the phosphoryl group to the receiver domain on RR; phosphorylation of the RR usually activates an attached output domain, which can then effect changes in cellular physiology, often by regulating gene expression. Some HK are bifunctional, catalysing both the phosphorylation and dephosphorylation of their cognate RR. The input stimuli can regulate either the kinase or phosphatase activity of the bifunctional HK. A variant of the two-component system is the phospho-relay system. Here a hybrid HK auto-phosphorylates and then transfers the phosphoryl group to an internal receiver domain, rather than to a separate RR protein. The phosphoryl group is then shuttled to histidine phosphotransferase (HPT) and subsequently to a terminal RR, which can evoke the desired response [
,
]. Signal transducing histidine kinases are the key elements in two-component signal transduction systems, which control complex processes such as the initiation of development in microorganisms [
,
]. Examples of histidine kinases are EnvZ, which plays a central role in osmoregulation [], and CheA, which plays a central role in the chemotaxis system []. Histidine kinases usually have an N-terminal ligand-binding domain and a C-terminal kinase domain, but other domains may also be present. The kinase domain is responsible for the autophosphorylation of the histidine with ATP, the phosphotransfer from the kinase to an aspartate of the response regulator, and (with bifunctional enzymes) the phosphotransfer from aspartyl phosphate back to ADP or to water []. The kinase core has a unique fold, distinct from that of the Ser/Thr/Tyr kinase superfamily. HKs can be roughly divided into two classes: orthodox and hybrid kinases [
,
]. Most orthodox HKs, typified by the Escherichia coli EnvZ protein, function as periplasmic membrane receptors and have a signal peptide and transmembrane segment(s) that separate the protein into a periplasmic N-terminal sensing domain and a highly conserved cytoplasmic C-terminal kinase core. Members of this family, however, have an integral membrane sensor domain. Not all orthodox kinases are membrane bound, e.g., the nitrogen regulatory kinase NtrB (GlnL) is a soluble cytoplasmic HK []. Hybrid kinases contain multiple phosphodonor and phosphoacceptor sites and use multi-step phospho-relay schemes instead of promoting a single phosphoryl transfer. In addition to the sensor domain and kinase core, they contain a CheY-like receiver domain and a His-containing phosphotransfer (HPt) domain.
Proteins in this entry are putative periplasmic sensor signal transduction histidine kinases. They all contain a GAF domain that is present in phytochromes and cGMP-specific phosphodiesterases, and which has been experimentally proven to be involved in protein:protein interactions. They also contain a C-terminal histidine kinase domain, which is composed of a dimerisation sub-domain and an ATP/ADP-binding phosphotransfer, or catalytic, sub-domain. The proteins in this entry are found strictly within a subset of Gram-negative bacterial species with the proposed PEP-CTERM/exosortase system, analogous to the LPXTG/sortase system [
] common in Gram-positive bacteria, where members of and
also occur.
This group of metallopeptidases belong to the MEROPS peptidase family M13 (neprilysin family, clan MA(E)). The M13 family includes neprilysin (neutral endopeptidase, NEP, enkephalinase, CD10, CALLA,
), endothelin-converting enzyme I (ECE-1,
), erythrocyte surface antigen KELL (ECE-3), phosphate-regulating gene on the X chromosome (PHEX), soluble secreted endopeptidase (SEP), and damage-induced neuronal endopeptidase (DINE)/X-converting enzyme (XCE). These proteins consist of a short N-terminal cytoplasmic domain, a single transmembrane helix, and a larger C-terminal extracellular domain containing the active site. The cytoplasmic domain contains a conformationally-restrained octapeptide, which is thought to act as a stop transfer sequence that prevents proteolysis and secretion [
,
]. Proteins in this family fulfill a broad range of physiological roles due to the greater variation in the S2' subsite allowing substrate specificity [,
]. The protein fold of the peptidase domain for members of this family resembles that of thermolysin, the type example for clan MA, and the predicted active site residues for members of this family and thermolysin occur in the motif HEXXH []. M13 peptidases are well-studied proteases found in a wide range of organisms including mammals and bacteria. In mammals they participate in processes such as cardiovascular development, blood-pressure regulation, nervous control of respiration, and regulation of the function of neuropeptides in the central nervous system. In bacteria they may be used for digestion of milk [
,
]. The family includes eukaryotic and prokaryotic oligopeptidases, as well as some of the proteins responsible for the molecular basis of the blood group antigens, e.g. Kell []. Neprilysin (NEP) is expressed in a variety of tissues including kidney and brain, and is involved in many physiological and pathological processes, including blood pressure and inflammatory response. It is a plasma membrane-bound mammalian enzyme that is able to digest biologically-active peptides, including enkephalins [], substance P, cholecystokinin, neurotensin and somatostatin. It is an important enzyme in the regulation of amyloid-beta (Abeta) protein that forms amyloid plaques that are associated with Alzheimer's disease (AD). The zinc ligands of neprilysin are known and are analogous to those in thermolysin, a related peptidase [,
]. Neprilysins, like thermolysin, are inhibited by phosphoramidon, which appears to selectively inhibit this family in mammals. The enzymes are all oligopeptidases, digesting oligo- and polypeptides, but not proteins []. ECE-1 catalyzes the final rate-limiting step in the biosynthesis of endothelins via post-translational conversion of the biologically inactive big endothelins. Like NEP, it also hydrolyses bradykinin, substance P, neurotensin and Abeta. Endothelin-1 overproduction has been implicated in various diseases, including stroke, asthma, hypertension, and cardiac and renal failure. Kell is a homologue of NEP and constitutes a major antigen on human erythrocytes; it preferentially cleaves big endothelin-3 to produce bioactive endothelin-3, but is also known to cleave substance P and neurokinin A. PHEX forms a complex interaction with fibroblast growth factor 23 (FGF23) and matrix extracellular phosphoglycoprotein, causing bone mineralization. A loss-of-function mutation in PHEX disrupts this interaction leading to hypophosphatemic rickets; X-linked hypophosphatemic (XLH) rickets is the most common form of metabolic rickets. ECEL1 is a brain metalloprotease that plays a critical role in the nervous regulation of the respiratory system, while DINE (damage induced neuronal endopeptidase) is abundantly expressed in the hypothalamus and its expression responds to nerve injury as well. Thus, the majority of these M13 proteases are prime therapeutic targets for selective inhibition [
,
,
,
,
,
,
,
,
,
,
,
].
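As a rough illustration of the HEXXH motif mentioned above, the sketch below reports every occurrence of the pattern in a protein sequence, including overlapping ones. The function name `find_hexxh` and the toy sequences are hypothetical, and a motif hit alone does not establish membership of family M13 or clan MA.

```python
import re

# HEXXH: His, Glu, any two residues, His. A lookahead is used so that
# overlapping occurrences (for example HEAAHEAAH) are all reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence):
    """Return (0-based start, matched 5-mer) for each HEXXH occurrence."""
    return [(m.start(), m.group(1)) for m in HEXXH.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration, not a real peptidase.
print(find_hexxh("AAVIGHEITHGFDD"))
```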
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the zinc finger domain found in the large T-antigen (T-Ag) as the D1 domain. The T-Ag is found in a group of polyomaviruses consisting of the homonymous murine virus (Py) as well as other representative members such as the Simian virus 40 (SV40) and the human BK polyomavirus (BKPyV) and JC polyomavirus (JCPyV) [
]. Their large T antigen (T-Ag) protein binds to and activates DNA replication from the origin of DNA replication (ori). Insofar as is known, the T-Ag binds to the origin first as a monomer to its pentanucleotide recognition element. The monomers are then thought to assemble into hexamers and double hexamers, which constitute the form that is active in initiation of DNA replication. When bound to the ori, T-Ag double hexamers encircle DNA []. T-Ag is a multi-domain protein that contains an N-terminal J domain, a central origin-binding domain (OBD), and a C-terminal superfamily 3 helicase domain []. The helicase domain actually contains three distinct structural domains: D1 (domain 1), D2 and D3. D1 is the Zn domain at the N terminus and contains five α-helices (alpha1-alpha5). The Zn atom coordinated by a Zn motif is important in holding alpha3 (α-helix 3) and alpha4 together, which in turn provide an anchor for alpha1 and alpha2. The beginning of alpha5 packs with alpha1 and alpha2 of D1, but its C terminus extends to alpha6 of D3. The D2 domain contains three conserved helicase motifs related to SF3 helicases, namely the modified version of Walker A and B motifs and motif C. D2 folds into a core β-sheet consisting of five parallel β-strands sandwiched by α-helices. The third domain, D3, is all α-helical. Its seven α-helices originate from both the N-terminal region (alpha6-alpha8) and the C terminus (alpha13-alpha16), with D2 inserted between []. The Zn motif of T-Ag was proposed to form a canonical zinc-finger structure for DNA binding. However, the Zn domain (D1) has a globular fold stabilised by the coordination of a Zn atom through the Zn motif, and no classical zinc-finger structure specialised for DNA binding is present. The Zn motif is not directly involved in binding DNA but is instead important for stabilising the Zn-domain structure [
].
Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well studied family that is widely distributed in many organisms [
]. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors []. Several functionally different haemoglobins can coexist in the same species. The major types of globins include: Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates [
]. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors []. Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle [
]. Neuroglobin: a myoglobin-like haemoprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia [
]. Neuroglobin belongs to a branch of the globin family that diverged early in evolution. Cytoglobin: an oxygen sensor expressed in multiple tissues. Related to neuroglobin [
]. Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers [
]. Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation. Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants [
]. Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin [
]. Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression [
,
]. Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors [
]. Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features []. This entry represents trematode type myoglobin. Trematodes are parasitic worms inhabiting various body organs of vertebrates where oxygen might be scarce or intermittent, as in the bile ducts or stomach of ruminants or the swim bladder of fishes. Trematodes possess a cytoplasmic oxygen-binding haemoprotein of ~17 kDa [
] distributed throughout the body. The absorption spectrum shows the presence of haem (Fe-protoporphyrin IX) as a prosthetic group in the globin []. Unlike human Hb, the trematode oxygen-binding haemoproteins are monomeric and are not involved in the transport of oxygen within a circulatory system. High oxygen affinity molecules would not release the oxygen rapidly. The trematodes have a tyrosine in position B10; two H-bonds from this to the oxygen molecule are thought to be responsible for the very high oxygen affinity. The trematode haemoglobins display a combination of high association rates and very low dissociation rates, resulting in some of the highest oxygen affinities ever observed [].
RNA-directed RNA polymerase (RdRp) (
) is an essential protein encoded in the genomes of all RNA containing viruses with no DNA stage [
,
]. It catalyses synthesis of the RNA strand complementary to a given RNA template, but the precise molecular mechanism remains unclear. The postulated RNA replication process is a two-step mechanism. First, the initiation step of RNA synthesis begins at or near the 3' end of the RNA template by means of a primer-independent (de novo) mechanism. The de novo initiation consists in the addition of a nucleotide tri-phosphate (NTP) to the 3'-OH of the first initiating NTP. During the following so-called elongation phase, this nucleotidyl transfer reaction is repeated with subsequent NTPs to generate the complementary RNA product [
]. All the RNA-directed RNA polymerases, and many DNA-directed polymerases, employ a fold whose organisation has been likened to the shape of a right hand with three subdomains termed fingers, palm and thumb [
]. Only the catalytic palm subdomain, composed of a four-stranded antiparallel β-sheet with two α-helices, is well conserved among all of these enzymes. In RdRp, the palm subdomain comprises three well conserved motifs (A, B and C). Motif A (D-x(4,5)-D) and motif C (GDD) are spatially juxtaposed; the Asp residues of these motifs are implicated in the binding of Mg2+ and/or Mn2+. The Asn residue of motif B is involved in selection of ribonucleoside triphosphates over dNTPs and thus determines whether RNA is synthesised rather than DNA []. The domain organisation [
] and the 3D structure of the catalytic centre of a wide range of RdRps, even those with a low overall sequence homology, are conserved. The catalytic centre is formed by several motifs containing a number of conserved amino acid residues. There are 4 superfamilies of viruses that cover all RNA containing viruses with no DNA stage:
(1) Viruses containing positive-strand RNA or double-strand RNA, except retroviruses and Birnaviridae: viral RNA-directed RNA polymerases including all positive-strand RNA viruses with no DNA stage, double-strand RNA viruses, and the Cystoviridae, Reoviridae, Hypoviridae, Partitiviridae, Totiviridae families; (2) Mononegavirales (negative-strand RNA viruses with non-segmented genomes); (3) Negative-strand RNA viruses with segmented genomes, i.e. Orthomyxoviruses (including influenza A, B, and C viruses, Thogotoviruses, and the infectious salmon anemia virus), Arenaviruses, Bunyaviruses, Hantaviruses, Nairoviruses, Phleboviruses, Tenuiviruses and Tospoviruses; (4) Birnaviridae family of dsRNA viruses. The RNA-directed RNA polymerases in the first of the above superfamilies can be divided into the following three subgroups: (a) all positive-strand RNA eukaryotic viruses with no DNA stage; (b) all RNA-containing bacteriophages, of which there are two families: Leviviridae (positive ssRNA phages) and Cystoviridae (dsRNA phages); (c) the Reoviridae family of dsRNA viruses. This entry represents RNA-directed RNA polymerase (also known as the large structural protein) from various Filoviruses [
]. The large structural protein (or L protein) carries three enzymatic activities: RNA-directed RNA polymerase (), mRNA (guanine-N(7)-)-methyltransferase (
), and mRNA guanylyltransferase (
). The viral mRNA guanylyl transferase displays a different biochemical reaction than the cellular enzyme. The template is composed of the viral RNA tightly encapsidated by the nucleoprotein (N). The protein can function either as transcriptase or as replicase. The transcriptase synthesises subsequently subgenomic RNAs, assuring their capping and polyadenylation by a stuttering mechanism. The replicase mode is dependent on intracellular N protein concentration. In this mode, the polymerase replicates the whole viral genome without recognizing the transcriptional signals.
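The palm-subdomain motifs quoted above, motif A (D-x(4,5)-D) and motif C (GDD), are written in a PROSITE-like notation. The sketch below converts such a pattern to a regular expression and scans a sequence with it. It is an illustrative helper only: it handles just literal residues and the x, x(n) and x(n,m) wildcards, the function names are hypothetical, and such short patterns will match many unrelated proteins, so a hit is not evidence of polymerase activity.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Supports only literal residues and x, x(n), x(n,m), separated by '-'.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            parts.append(re.escape(element))      # literal residue(s)
        elif m.group(1) is None:
            parts.append(".")                     # bare x: any one residue
        elif m.group(2) is None:
            parts.append(".{%s}" % m.group(1))    # x(n)
        else:
            parts.append(".{%s,%s}" % (m.group(1), m.group(2)))  # x(n,m)
    return "".join(parts)


def scan(sequence, pattern):
    """Return (0-based start, matched text) for every start position that matches."""
    regex = re.compile("(?=(%s))" % prosite_to_regex(pattern))
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration, not a real polymerase.
toy = "KKDAAAADLGDDMM"
print(scan(toy, "D-x(4,5)-D"))  # motif A as written in the text
print(scan(toy, "GDD"))         # motif C as written in the text
```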
RNA-directed RNA polymerase (RdRp) (
) is an essential protein encoded in the genomes of all RNA containing viruses with no DNA stage [
,
]. It catalyses synthesis of the RNA strand complementary to a given RNA template, but the precise molecular mechanism remains unclear. The postulated RNA replication process is a two-step mechanism. First, the initiation step of RNA synthesis begins at or near the 3' end of the RNA template by means of a primer-independent (de novo) mechanism. The de novo initiation consists in the addition of a nucleotide tri-phosphate (NTP) to the 3'-OH of the first initiating NTP. During the following so-called elongation phase, this nucleotidyl transfer reaction is repeated with subsequent NTPs to generate the complementary RNA product [
]. All the RNA-directed RNA polymerases, and many DNA-directed polymerases, employ a fold whose organisation has been likened to the shape of a right hand with three subdomains termed fingers, palm and thumb [
]. Only the catalytic palm subdomain, composed of a four-stranded antiparallel β-sheet with two α-helices, is well conserved among all of these enzymes. In RdRp, the palm subdomain comprises three well conserved motifs (A, B and C). Motif A (D-x(4,5)-D) and motif C (GDD) are spatially juxtaposed; the Asp residues of these motifs are implicated in the binding of Mg2+ and/or Mn2+. The Asn residue of motif B is involved in selection of ribonucleoside triphosphates over dNTPs and thus determines whether RNA is synthesised rather than DNA []. The domain organisation [
] and the 3D structure of the catalytic centre of a wide range of RdRps, even those with a low overall sequence homology, are conserved. The catalytic centre is formed by several motifs containing a number of conserved amino acid residues. There are 4 superfamilies of viruses that cover all RNA containing viruses with no DNA stage:
(1) Viruses containing positive-strand RNA or double-strand RNA, except retroviruses and Birnaviridae: viral RNA-directed RNA polymerases including all positive-strand RNA viruses with no DNA stage, double-strand RNA viruses, and the Cystoviridae, Reoviridae, Hypoviridae, Partitiviridae, Totiviridae families; (2) Mononegavirales (negative-strand RNA viruses with non-segmented genomes); (3) Negative-strand RNA viruses with segmented genomes, i.e. Orthomyxoviruses (including influenza A, B, and C viruses, Thogotoviruses, and the infectious salmon anemia virus), Arenaviruses, Bunyaviruses, Hantaviruses, Nairoviruses, Phleboviruses, Tenuiviruses and Tospoviruses; (4) Birnaviridae family of dsRNA viruses. The RNA-directed RNA polymerases in the first of the above superfamilies can be divided into the following three subgroups: (a) all positive-strand RNA eukaryotic viruses with no DNA stage; (b) all RNA-containing bacteriophages, of which there are two families: Leviviridae (positive ssRNA phages) and Cystoviridae (dsRNA phages); (c) the Reoviridae family of dsRNA viruses. This entry represents RNA-directed RNA polymerase (also known as the large structural protein) from various Rhabdoviruses, such as Vesicular stomatitis Indiana virus [
]. The large structural protein (or L protein) carries four enzymatic activities: RNA-directed RNA polymerase (), mRNA (guanine-N(7)-)-methyltransferase (
), mRNA guanylyltransferase (
), and poly(A) synthetase. The viral mRNA guanylyl transferase displays a different biochemical reaction than the cellular enzyme. The template is composed of the viral RNA tightly encapsidated by the nucleoprotein (N). The protein can function either as transcriptase or as replicase. The transcriptase synthesises subsequently five subgenomic RNAs, assuring their capping and polyadenylation by a stuttering mechanism. The replicase mode is dependent on intracellular N protein concentration. In this mode, the polymerase replicates the whole viral genome without recognizing the transcriptional signals.
This entry represents the C-terminal domain of bacterial DnaA proteins [
,
,
] that play an important role in initiating and regulating chromosomal replication. DnaA is an ATP- and DNA-binding protein. It binds specifically to 9 bp nucleotide repeats known as dnaA boxes which are found in the chromosome origin of replication (oriC). DnaA is a protein of about 50 kDa that contains two conserved regions: the first is located in the N-terminal half and corresponds to the ATP-binding domain, the second is located in the C-terminal half and could be involved in DNA-binding. The protein may also bind the RNA polymerase beta subunit, the dnaB and dnaZ proteins, and the groE gene products (chaperonins) [
].
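Because dnaA boxes are short 9 bp repeats that can lie on either strand, locating them amounts to a both-strand pattern search. The following is a minimal sketch under stated assumptions: the default consensus TTATNCACA is the form often quoted for Escherichia coli and is not given in this entry, and the function names and toy DNA string are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_boxes(dna, consensus="TTATNCACA"):
    """Return sorted (0-based forward-strand start, strand) hits of a 9 bp box.

    The default consensus is an assumption (commonly quoted for E. coli);
    N stands for any base.
    """
    dna = dna.upper()
    pattern = re.compile("(?=(%s))" % consensus.replace("N", "[ACGT]"))
    hits = [(m.start(), "+") for m in pattern.finditer(dna)]
    # Scan the reverse complement, then map each hit back to forward coordinates.
    for m in pattern.finditer(reverse_complement(dna)):
        hits.append((len(dna) - m.start() - len(consensus), "-"))
    return sorted(hits)


# Hypothetical toy sequence with one box on each strand, not a real oriC.
toy = "GGTTATCCACAGGTGTGGATAACC"
print(find_boxes(toy))
```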
The effector domain is found in bacterial regulatory proteins, such as transcription factors. The effector domain consists of a duplication of a beta/alpha/beta(2) motif, where the antiparallel beta sheets form a barrel structure. Several proteins contain this domain, such as the multidrug-binding domain of the transcription factor BmrR, which transcriptionally regulates multidrug transporters as well as acting as a multidrug-binding protein [
], the C-terminal domain of the Rob transcription factor, which belongs to the AraC/XylS protein family that regulates genes involved in resistance to antibiotics, organic solvents and heavy metals [], and the gyrase inhibitory protein GyrI (SbmC, TeeB), which is induced by DNA damaging agents to suppress cell proliferation by inhibiting bacterial gyrase activity [].
The forkhead-associated (FHA) domain [
] is a phosphopeptide recognition domain found in many regulatory proteins. It displays specificity for phosphothreonine-containing epitopes but will also recognise phosphotyrosine with relatively high affinity. It spans approximately 80-100 amino acid residues folded into an 11-stranded β-sandwich, which sometimes contains small helical insertions between the loops connecting the strands []. To date, genes encoding FHA-containing proteins have been identified in eubacterial and eukaryotic but not archaeal genomes. The domain is present in a diverse range of proteins, such as kinases, phosphatases, kinesins, transcription factors, RNA-binding proteins and metabolic enzymes, which partake in many different cellular processes: DNA repair, signal transduction, vesicular transport and protein degradation are just a few examples.
Metallo beta lactamases exhibit low sequence identity between enzymes but they are structurally similar. They have a characteristic α-β/β-α sandwich fold in which the active site is at the interface between domains. Apart from the beta-lactamases and metallo-beta-lactamases, a number of other proteins contain this domain and share the same fold type [
,
]. These proteins include thiolesterases, members of the glyoxalase II family, that catalyse the hydrolysis of S-D-lactoyl-glutathione to form glutathione and D-lactic acid and a competence protein that is essential for natural transformation in Neisseria gonorrhoeae and could be a transporter involved in DNA uptake. Except for the competence protein these proteins bind two zinc ions per molecule as cofactor.
RanBPM is a scaffolding protein and is important in regulating cellular function in both the immune system and the nervous system. The RanBPM protein contains multiple conserved domains that provide potential protein-protein interaction sites [
]. This entry represents a domain at the C terminus of RanBPM containing the CT11-RanBPM (CRA) motif. The CRA motif was found to be important for the interaction of RanBPM with fragile X messenger ribonucleoprotein 1 (FMRP), but its functional significance has yet to be determined []. The region comprising this domain contains the CTLH and CRA domains annotated by SMART; however, these may be a single domain, which is referred to as a C-terminal to LisH motif [].
Most proteins with this domain are known or deduced to function as the phosphoenolpyruvate-protein phosphotransferases (or enzyme I) of PTS sugar transport systems. Enzyme I transfers the phosphoryl group from phosphoenolpyruvate (PEP) to the phosphoryl carrier protein (HPr). This domain is also found in enzyme I-Ntr, which transfers the phosphoryl group from PEP to the phosphoryl carrier protein NPr, and is involved in regulating nitrogen metabolism [
], indicating that not all phosphotransferase system components are associated directly with sugar transport. In Xanthomonas campestris, FruB is a multiphosphoryl transfer protein involved in fructose transport that consists of three domains: a fructose-specific enzyme-IIA-like N-terminal domain, followed by an HPr-like domain and an enzyme-I-like C-terminal domain, represented by this entry [].
The cysteine-rich domain (CRD) is an essential part of the secreted frizzled-related protein 3 (SFRP3, alias FRZB), which regulates chondrocyte maturation and long bone development [
] and is an antagonist of Wnt8 signalling []. It is also involved in the control of planar cell polarity []. In general, SFRPs antagonize the activation of Wnt signaling by binding to the CRD domains of frizzled (Fz) proteins, thereby preventing Wnt proteins from binding to these receptors. SFRPs are also known to have functions unrelated to Wnt, as enhancers of procollagen cleavage by the TLD proteinases [
]. SFRPs and Fz proteins both contain CRD domains, but SFRPs lack the seven-pass transmembrane domain which is an integral part of Fzs [,
].
This family includes calsyntenin 1/2/3 (also known as alcadein-alpha/beta/gamma). They are single-pass type I membrane proteins that may modulate calcium-mediated postsynaptic signals [
]. CLSTN (also known as Alc) forms a tripartite complex with APP (amyloid beta-protein precursor) and X11L (a neuron-specific adaptor protein) in brain; this complex stabilizes both APP and CLSTN proteins metabolically. Deficiencies in the X11L-mediated interaction between CLSTN and APP and/or CTFbeta enhance the production of amyloid beta-protein and have been linked to the development or progression of Alzheimer's disease [
]. Moreover, CLSTN1 strongly associates with kinesin-1 light chains (KLC1) and acts as a cargo that regulates kinesin-1 function. It may affect the transport of APP-containing vesicles by kinesin-1 [].
This entry represents the central, highly conserved Tudor domain of SMN and similar animal proteins. This domain is required for U snRNP assembly and Sm protein binding and has been shown to bind arginine-glycine-rich motifs in a methylarginine-dependent manner [
,
]. Survival motor neuron protein (SMN) is part of a multimeric SMN complex that includes spliceosomal Sm core proteins and plays a catalytic role in the assembly of small nuclear ribonucleoproteins (snRNPs), the building blocks of the spliceosome. Mutations in human SMN lead to motor neuron degeneration and spinal muscular atrophy [
].
Glutelins are major storage proteins that aggregate in protein bodies in the endosperm of Zea mays (Maize) [,
]. They comprise the second largest protein fraction in Maize endosperm [] (zeins being the largest), and show sequence similarities to other cereal storage proteins, such as gliadins, glutenins, hordeins, etc. Glutelins have a well-defined structure, including an N-terminal region containing varying numbers of repeats of the sequence PPPHVL []; a Gln-rich region that can be separated into 2 domains; and a Cys-rich C-terminal domain that shows some regions of internal similarity. Glutelins have been shown [] to exhibit structural similarity to other cereal storage proteins, including a beta-reverse turn region which forms a loose helix-like domain.
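As a rough illustration of the "varying numbers of repeats" of PPPHVL in the N-terminal region, the sketch below counts the longest uninterrupted run of that hexapeptide in a sequence string. The function name and the toy sequence are hypothetical and chosen only for this example; real glutelin repeats may be imperfect, which exact matching would miss.

```python
import re

# Hexapeptide repeat named in the glutelin description.
REPEAT = "PPPHVL"

def count_tandem_repeats(sequence, unit=REPEAT):
    """Return the length, in repeat units, of the longest uninterrupted run of `unit`."""
    # Each match is one maximal run of back-to-back copies of the unit.
    runs = re.findall("(?:" + re.escape(unit) + ")+", sequence.upper())
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical toy sequence, not a real glutelin.
toy_sequence = "MKV" + "PPPHVL" * 3 + "QQQPC"
print(count_tandem_repeats(toy_sequence))
```

Only exact, contiguous copies are counted here; a degenerate pattern would be needed to tolerate substitutions within a repeat.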
Bacterial membrane-bound nickel-dependent hydrogenases require a number of accessory proteins which are involved in their maturation [
]. One of these proteins is generally known as HypA. HypA is a metallochaperone that binds nickel to bring it safely to its target. The nickel coordinates with four nitrogens within the protein. Four conserved cysteines towards the C terminus bind one zinc moiety, probably to stabilise the protein fold []. In Helicobacter pylori, HypA is involved in the maturation of both hydrogenase and urease []. Escherichia coli has two proteins that belong to this family, HypA and HybF []. A homologue, MJ0214, has also been found in a number of archaeal species, including the genome of Methanocaldococcus jannaschii (Methanococcus jannaschii).
Energy coupling factor transporter S component ThiW
Type:
Family
Description:
Levels of thiamine pyrophosphate (TPP) or thiamine regulate transcription or translation of a number of thiamine biosynthesis, salvage, or transport genes in a wide range of prokaryotes. The mechanism involves direct binding, with no protein involved, to a structural element called THI found in the untranslated upstream region of thiamine metabolism gene operons. This element is called a riboswitch and is also seen for other metabolites such as FMN and glycine. This protein family consists of proteins identified in operons controlled by the THI riboswitch and designated ThiW. The hydrophobic nature of this protein and the reconstructed metabolic background suggest that this protein acts in transport of a thiazole precursor of thiamine [].
Cytokine-inducible SH2-containing protein (CIS) is a member of the suppressor of cytokine signaling (SOCS) family proteins. CIS is involved in the negative regulation of cytokines that signal through the JAK-STAT5 pathway such as erythropoietin, prolactin and interleukin 3 (IL3) receptor. It inhibits STAT5 trans-activation by suppressing its tyrosine phosphorylation [
]. Suppressor of cytokine signalling (SOCS) was first recognized as a group of cytokine-inducible SH2 (CIS) domain proteins comprising eight family members in human (CIS and SOCS1-SOCS7). In addition to the SH2 domain, SOCS proteins have a variable N-terminal domain and a conserved SOCS box in the C-terminal domain. SOCS proteins bind to a substrate via their SH2 domain [
,
]. This entry represents the SH2 domain of CIS.
This domain is found in various uncharacterised proteins and putative lipoproteins from mycoplasmas. Although its function is not clear, it has been suggested that this is a serine protease domain; sequence analysis revealed that the amino acid sequence is similar to that of serine proteases. It is related to pathogenicity as this domain is present in proteins encoded by animal and human pathogenic Mycoplasmas [
,
,
]. It has been proposed that the Ig binding protein M may function together with this domain playing a role in evading host Ig-mediated defense against the MIB-MIP (mycoplasma Ig binding protein (MIB) and mycoplasma Ig protease (MIP)) system, which is a novel mechanism that protects mycoplasma against host immune response [].
This entry is represented by Bacteriophage KVP40, Orf299. The characteristics of the protein distribution suggest prophage matches in addition to the phage matches. This is a family of uncharacterised, mainly bacterial, proteins. While the functions of these proteins are unknown, an analysis has suggested that they may form a novel family within the RNase H-like superfamily [
]. These proteins appear to contain all the core secondary structural elements of the RNase H-like fold and share several conserved, possible active site, residues. It was suggested, therefore, that they function as nucleases. From the taxonomic distribution of these proteins it was further inferred that they may play a role in DNA repair under stressful conditions.
The Caskin1 protein interacts with the CASK protein via this region [
]. CASK and Caskin1 are synaptic scaffolding proteins. The binding motif on human Caskin1 is EEIWVLRK. A similar motif is found on protein MINT1 and protein TIAM1, both shown to be able to bind to CASK through the motif. However, MINT1 and TIAM1 are not included in this entry []. This region is predicted to be natively unstructured. This entry also matches Caskin2, a Caskin1 homologue that lacks the CASK interaction domain (CID) []. Caskin1 and Caskin2 consist of multiple ankyrin repeats, two SAM domains and one SH3 domain. In Caskin1 the CID is located between the SH3 and SAM domains [].
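The CASK-binding motif quoted for human Caskin1, EEIWVLRK, can be located in a sequence with a plain substring scan. The following is a minimal sketch under that assumption; the function name and the toy input are hypothetical, and because the MINT1 and TIAM1 motifs are described only as "similar", an exact match like this would not necessarily find them.

```python
from typing import List

# Binding motif reported for human Caskin1 in the text.
CASK_BINDING_MOTIF = "EEIWVLRK"

def find_motif(sequence: str, motif: str = CASK_BINDING_MOTIF) -> List[int]:
    """Return the 1-based start position of every exact occurrence of `motif`."""
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based residue number
        start = seq.find(motif, start + 1)
    return positions

# Hypothetical toy sequence, not a real Caskin1 fragment.
print(find_motif("AAEEIWVLRKGG"))
```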
This entry represents the BTB (Broad-Complex, Tramtrack and Bric a brac)/POZ (poxvirus and zinc finger) domain, also known as tetramerization (T1) domain, found in KCTD15 proteins from vertebrates. It is a versatile protein-protein interaction motif that facilitates homodimerization or heterodimerization. The BTB domains from KCTD family members can adopt a wide range of oligomerization geometries, including homodimerization, tetramerization, and pentamerization. The KCTD1 BTB domains, closely related to KCTD15, form pentamers [
,
]. BTB/POZ domain-containing protein KCTD15 (potassium channel tetramerization domain-containing protein 15) plays a role in the regulation of neural crest (NC) formation and other steps in embryonic development. It inhibits AP2 transcriptional activity by interacting with its activation domain [
,
,
].
The baculovirus Bro proteins are encoded by a multigene family. The typical Bro proteins that have been experimentally investigated are BroA, BroC and BroD from Bombyx mori nuclear polyhedrosis virus (BmNV). They contain distinct amino- and carboxy-terminal domains (Bro-N and Bro-C, respectively) that are present independently of each other and in distinct contexts in a variety of other viral proteins. The Bro-N domain occurs in a stand-alone form or combined with other domains in proteins from temperate phages that infect Gram-positive bacteria and Myxococcus xanthus, and proteins encoded in the genomes of proteobacteria and Gram-positive bacteria. The Bro-N domain appears to define a distinct superfamily of widespread viral DNA-binding domains [
].
The Ah (dioxin) receptor nuclear translocator protein (Arnt) belongs to the basic helix-loop-helix (bHLH) family of transcription factors and is required for Ah receptor function. The Ah receptor binds, and mediates the carcinogenic effects of, a variety of environmental pollutants, including 2,3,7,8-tetrachlorodibenzo-p-dioxin and polycyclic aromatic hydrocarbons. Arnt is a structural component of the XRE-binding form of the Ah receptor. These proteins are class VII members of the basic helix-loop-helix (bHLH) family. The bHLH domain may be responsible for interacting with both the XRE and the ligand-binding subunit [
,
]. The activated Ah receptor and Arnt protein bind DNA as a heterodimer. Both proteins contain PAS homology regions, which in Drosophila PER and SIM proteins function as dimerisation domains [
].
Thaumatin [
] is an intensely sweet-tasting protein, 100 000 times sweeter than sucrose on a molar basis [] found in berries from Thaumatococcus daniellii, a tropical flowering plant known as Katemfe. When attacked by viroids, single-stranded unencapsulated RNA molecules that do not code for protein, thaumatin is induced. Thaumatin consists of about 200 residues and contains 8 disulphide bonds. Like other PR proteins, thaumatin is predicted to have a mainly beta structure, with a high content of β-turns and little helix [
]. Osmotin belongs to the PR-5 protein family whose members are homologous to thaumatin. Osmotin and other PR-5 proteins were shown to have antifungal activity in vitro against a broad range of fungi, including several plant pathogens [
].
DLG2 (also known as PSD-93) is a scaffolding protein that clusters at synapses and plays an important role in synaptic development and plasticity. The DLG2 delta isoform binds inwardly rectifying potassium Kir2 channels, which determine resting membrane potential in neurons. It regulates the spatial and temporal distribution of Kir2 channels within neuronal membranes [
]. DLG2 is a member of the MAGUK (membrane-associated guanylate kinase) protein family, which is characterized by the presence of a core of three domains: PDZ, SH3, and guanylate kinase (GuK). The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain []. DLG2 contains three PDZ domains [].
The KIND (kinase non-catalytic C-lobe domain) is a putative protein interaction domain, which has been identified as being similar to the C-terminal protein kinase catalytic fold (C lobe).
The presence of the KIND domain at the N terminus of signalling proteins and the absence of the active site residues in the catalytic and activation loops suggest that it folds independently and is likely to be non-catalytic. The occurrence of the domain only in metazoa implies that it has evolved from the catalytic protein kinase domain into an interaction domain possibly by keeping the substrate-binding features [
,
]. In SPIRE1 (protein spire homologue 1) this domain interacts with FMN2 (formin-2) [,
].
DLG5 is a multifunctional scaffold protein that is located at sites of cell-cell contact and is involved in the maintenance of cell shape and polarity [
,
]. Mutations in the DLG5 gene are associated with Crohn's disease (CD) and inflammatory bowel disease (IBD) []. DLG5 is a member of the MAGUK (membrane-associated guanylate kinase) protein family, which is characterised by the presence of a core of three domains: PDZ, SH3, and guanylate kinase (GuK) []. The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain []. DLG5 contains 4 PDZ domains as well as an N-terminal domain of unknown function.
WFIKKN proteins are secreted multidomain proteins containing a WAP (whey acidic protein)-, a follistatin/Kazal-, an immunoglobulin-, two Kunitz-type protease inhibitor-domains, and an NTR (netrin) domain. They have been reported to inhibit the activity of trypsin, and to bind GDF11 (growth and differentiation factor 11) and myostatin (GDF8) [
]. Due to the presence of multiple domains that frequently serve as protease inhibitors, WFIKKN proteins are presumed to control the action of diverse serine proteases as well as metalloproteinases []. In humans, WFIKKN1 expression is observed in pancreas, liver, thymus, kidney, testis and lung and WFIKKN2 (also known as WFIKKNRP) expression is observed in ovary, testis, pancreas, brain and lung [
]. This entry represents the third immunoglobulin-like domain of WFIKKN1/2.
Thiamine-binding lipoprotein (Cypl) and related proteins are specific to Mycoplasma species. The gene for Cypl, also known as p37, is part of an operon encoding two additional proteins that are highly similar to components of the periplasmic binding-protein-dependent transport systems of Gram-negative bacteria. It has been suggested that p37 is part of a homologous, high-affinity transport system in Mycoplasma hyorhinis, a Gram-positive bacterium [
]. Its structure indicates that it is a thiamine pyrophosphate-binding protein []. The monomer is a mixed alpha/beta fold split into two globular domains (domains I and II) connected by two linker regions and the C-terminal helix. This entry represents domain I, which consists of a six-stranded β-sheet flanked by seven helices.
Extracytoplasmic thiamine-binding lipoprotein, domain II
Type:
Homologous_superfamily
Description:
Thiamine-binding lipoprotein (Cypl) and related proteins are specific to Mycoplasma species. The gene for Cypl, also known as p37, is part of an operon encoding two additional proteins that are highly similar to components of the periplasmic binding-protein-dependent transport systems of Gram-negative bacteria. It has been suggested that p37 is part of a homologous, high-affinity transport system in Mycoplasma hyorhinis, a Gram-positive bacterium [
]. Its structure indicates that it is a thiamine pyrophosphate-binding protein []. The monomer is a mixed alpha/beta fold split into two globular domains (domains I and II) connected by two linker regions and the C-terminal helix. This entry represents domain II, which consists of five beta strands and five alpha helices.
Sortilin, also known in mammals as neurotensin receptor-3, is the archetypical member of the vacuolar protein sorting 10 protein domain (Vps10-D) receptor family that binds neurotrophic factors and neuropeptides [
]. This entry represents a domain that constitutes the entire luminal part of Sortilin and is activated in the trans-Golgi network by enzymatic propeptide cleavage [,
]. The structure of the domain has been determined as a ten-bladed propeller, with up to 9 BNR or β-hairpin turns in it []. The mature receptor binds various ligands, including its own propeptide (Sort-pro), neurotensin, the pro-forms of nerve growth factor-beta (NGF) and brain-derived neurotrophic factor (BDNF), lipoprotein lipase (LpL), apolipoprotein AV and the receptor-associated protein (RAP) [,
].
This domain, also known as the recognition domain, is found at the N terminus of beta-1,3-glucan-binding proteins. It has an immunoglobulin-like β-sandwich fold composed of two antiparallel β-sheets containing three and five β-strands [
]. Beta 1,3-glucan recognition proteins (GRP, also called Gram-negative bacteria binding proteins or GNBPs) have specific affinity for beta 1,3-glucan, a component on the surface of fungi and bacteria. Beta-GRP (beta-1,3-glucan recognition protein) is one of several pattern recognition receptors (PRRs), also referred to as biosensor proteins, that complexes with pathogen-associated beta-1,3-glucans and then transduces signals necessary for activation of an appropriate innate immune response. They are present in insects and lack all catalytic residues [
,
,
,
,
].
Leucine-rich repeats (LRR) consist of 2-45 motifs of 20-30 amino acids in length that generally fold into an arc or horseshoe shape [
]. LRRs occur in proteins ranging from viruses to eukaryotes, and appear to provide a structural framework for the formation of protein-protein interactions []. Proteins containing LRRs include tyrosine kinase receptors, cell-adhesion molecules, virulence factors, and extracellular matrix-binding glycoproteins, and are involved in a variety of biological processes, including signal transduction, cell adhesion, DNA repair, recombination, transcription, RNA processing, disease resistance, apoptosis, and the immune response. LRRs are often flanked by cysteine-rich domains: an N-terminal LRR domain and a C-terminal LRR domain. This entry represents the C-terminal LRR domain.
This entry represents the fungal Sds23 protein, also known as Moc1 or Psp1. The exact function of this protein is not known but it is thought to be required for proper DNA replication and mitosis [
,
]. It has also been shown to induce sexual development [,
] and is involved in the response to nutrient deprivation stress []. Phosphorylation seems to be important in regulating the activity of Sds23: in actively growing cells the protein is predominantly dephosphorylated, while in stationary phase the predominant form is phosphorylated. The Cdc2-Cdc13 complex has been shown to phosphorylate the protein in vitro [
]. S. cerevisiae has homologues (Sds23 and Sds24) of the S. pombe Sds23 protein [
].
The IA3 polypeptide of Saccharomyces cerevisiae (also known as Pai3) is an 8 kDa inhibitor of the vacuolar aspartic proteinase (proteinase A or saccharopepsin, MEROPS peptidase family A1). It belongs to MEROPS inhibitor family I34, clan JA. No other aspartic proteinase has been found to be inhibited by IA3, and at least 15 aspartic proteinases related to YprA cleave IA3 as a substrate. Ligand-free IA3 has little intrinsic secondary structure; however, upon contact with proteinase A, residues 2-32 of the inhibitor become ordered and adopt a near perfect α-helical conformation occupying the active site cleft of the enzyme. This potent, specific interaction is directed primarily by hydrophobic interactions made by three key features in the inhibitory sequence [
].
This entry represents the glycosidases CRH1, CRH2 and their homologues. These enzymes function as GPI-lipid anchored cell wall proteins (GPI-CWP) that are localised to sites of polarized growth, particularly chitin-rich areas [
]. These proteins are found at the incipient bud site, as a ring at the bud neck as the bud grows, and in the septum at the time of cytokinesis. The GPI-anchor is attached to the protein in the endoplasmic reticulum and serves to target the protein to the cell surface. There, the glucosamine-inositol phospholipid moiety is cleaved off and the GPI-modified mannoprotein is covalently attached via its lipidless GPI glycan remnant to the 1,6-beta-glucan of the outer cell wall layer.
The protein MutL is essential in mismatch repair as it coordinates multiple protein-protein interactions that signal strand removal upon mismatch recognition by MutS. MutL is composed of two structurally conserved domains connected by a variable flexible linker: an N-terminal ATPase domain and a C-terminal dimerisation domain [
]. The latter harbours the endonuclease activity of the protein. Structural studies have shown that this C-terminal region is organized into a dimerization and a regulatory subdomain connected by a helical lever spanning the conserved endonuclease motif. This superfamily represents the dimerization subdomain of the C-terminal domain of the MutL protein, which presents a Zn2+ binding site and is important for the mismatch repair function of MutL [
].
DDB1- and CUL4-associated factor 15, WD40 repeat-containing domain
Type:
Domain
Description:
DCAFs, Ddb1- and Cul4-associated factors, are substrate receptors for the Cul4-Ddb1 Ubiquitin Ligase. There are 18 different factors, the majority of which are WD40-repeat-proteins [
]. DDB1- and CUL4-associated factor 15 (DCAF15) is the substrate-recognition component of the DCX(DCAF15) complex, a cullin-4-RING E3 ubiquitin-protein ligase complex that mediates ubiquitination and degradation of target proteins [], and acts as a regulator of natural killer (NK) cell effector functions []. Aryl sulfonamide anticancer agents appear to promote binding of DCAF15 to the RNA-recognition motif (RRM) of RBM39, which suggests that derivatives of the aryl-sulfonamides may be used to target other RRM-containing proteins [,
]. This entry represents the WD40 repeat-containing N-terminal domain found in DCAF15 and similar animal proteins.
This entry represents a domain of about 143 amino acids that may occur singly or in up to 23 tandem repeats in very large proteins in the genus Vibrio, and in related species such as Legionella pneumophila, Photobacterium profundum, Rhodopseudomonas palustris, Shewanella pealeana, and Aeromonas hydrophila. Proteins with these domains represent a subset of a broader set of proteins with a particular signal for type 1 secretion, consisting of several glycine-rich repeats followed by a C-terminal domain. Proteins with this domain tend to share several properties with the RtxA (Repeats in Toxin) protein of Vibrio cholerae, including a large size often containing tandemly repeated domains and a C-terminal signal for type 1 secretion.
A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [
]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [
]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [
]. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [
]. The active site consists of a His/Cys catalytic dyad. This group of cysteine peptidases belongs to the MEROPS peptidase family C2 (calpain family, clan CA). A type example is calpain, which is an intracellular protease involved in many important cellular functions that are regulated by calcium [
,
]. The protein is a complex of 2 polypeptide chains (light and heavy), with eleven known active peptidases in humans and two non-peptidase homologues known as calpamodulin and androglobin []. These include a highly calcium-sensitive (i.e., micro-molar range) form known as mu-calpain, mu-CANP or calpain I; a form sensitive to calcium in the milli-molar range, known as m-calpain, m-CANP or calpain II; and a third form, known as p94, which is found in skeletal muscle only []. All forms have identical light but different heavy chains. Both mu- and m-calpain are heterodimers containing an identical 28 kDa subunit and an 80 kDa subunit that shares 55-65% sequence homology between the two proteases [
,
]. The crystallographic structure of m-calpain reveals six "domains" in the 80 kDa subunit [
,
]:
A 19-amino acid NH2-terminal sequence;
Active site domain IIa;
Active site domain IIb. Domain 2 shows low levels of sequence similarity to papain; although the catalytic His has not been located by biochemical means, it is likely that calpain and papain are related [];
Domain III;
An 18-amino acid extended sequence linking domain III to domain IV;
Domain IV, which resembles the penta EF-hand family of polypeptides, binds calcium and regulates activity [
]. Ca2+-binding causes a rearrangement of the protein backbone, the net effect of which is that a Trp side chain, which acts as a wedge between catalytic domains IIa and IIb in the apo state, moves away from the active site cleft allowing for the proper formation of the catalytic triad [
]. Calpain-like mRNAs have been identified in other organisms including bacteria, but the molecules encoded by these mRNAs have not been isolated, so little is known about their properties. How calpain activity is regulated in these organisms is still unclear. In metazoans, the activity of calpain is controlled by a single proteinase inhibitor, calpastatin. The calpastatin gene can produce eight or more calpastatin polypeptides ranging from 17 to 85 kDa by use of different promoters and alternative splicing events. The physiological significance of these different calpastatins is unclear, although all bind to three different places on the calpain molecule; binding to at least two of the sites is Ca2+ dependent. The calpains ostensibly participate in a variety of cellular processes including remodelling of cytoskeletal/membrane attachments, different signal transduction pathways, and apoptosis. Deregulated calpain activity following loss of Ca2+ homeostasis results in tissue damage in response to events such as myocardial infarcts, stroke, and brain trauma [
].
The stromal antigens or stromalins are a group of highly conserved nuclear proteins, originally identified from a murine bone marrow stromal cell line. Stromalins are represented in several organisms from yeast to humans and are characterised by the stromalin conservative domain (SCD), an 86 amino acid motif found in all proteins of the family described to date [
]. Some proteins known to contain a stromalin conservative domain are listed below:
Vertebrate cohesin subunits SA-1, -2 and -3 (Stag1, 2 and 3), components of cohesin complex, a complex required for the cohesion of sister chromatids after DNA replication.
Human STAG3-like protein.
Yeast cohesin subunit SCC3, a component of cohesin complex, a complex required for the cohesion of sister chromatids after DNA replication.
Fission yeast meiotic recombination protein Rec11.
This family is named after YEATS (Yaf9, ENL, AF9, Taf14, and Sas5), an evolutionarily conserved module present in 4 proteins (ENL, AF9, GAS41, and YEATS2) in humans and 3 proteins (Sas5, Taf14, and Yaf9) in yeast. These proteins are found in major chromatin-remodeling and histone acetyl-transferase (HAT) complexes and are implicated in regulation of chromatin structure, histone acetylation and deposition, gene transcription and DNA damage response. The YEATS domain, which is found in a number of chromatin-associated proteins, has recently been shown to have the capacity to bind histone lysine acetylation [
]. The ability of the YEATS domains of human AF9 and yeast Taf14 to recognise the histone mark H3K9ac has shown that these proteins are members of the family of acetyllysine readers [].
This domain is found at the C terminus of YchF. The crystal structure of YchF from Haemophilus influenzae has been determined [
]. This protein consists of three domains: an N-terminal domain which has a mononucleotide binding fold typical for the P-loop NTPases, a central domain which forms an α-helical coiled coil, and this C-terminal domain which is composed of a six-stranded half-barrel curved around an α-helix. The central domain and this domain are topologically similar to RNA-binding proteins, while the N-terminal region contains the features typical of GTP-dependent molecular switches. The purified protein was capable of binding both double-stranded nucleic acid and GTP. It was suggested, therefore, that this protein might be part of a nucleoprotein complex and could function as a GTP-dependent translation factor.
This domain contains sequences that are similar to the N-terminal region of Red protein. This and related proteins contain a RED repeat which consists of a number of RE and RD sequence elements [
]. The region in question has several conserved NLS sequences and a putative trimeric coiled-coil region [], suggesting that these proteins are expressed in the nucleus []. Protein RED (also known as IK) is found in the nucleus and is a component of the spliceosome []. It is also associated with the spindle pole where it co-localizes with and interacts with the spindle assembly checkpoint protein MAD1 during metaphase and anaphase. Depletion of RED shortens the mitotic cycle and MAD1 is incorrectly localized [].
The HECT (Homologous to the E6-AP Carboxyl Terminus) domain is a motif of around 350 amino acids that has been identified in proteins that all belong to a particular E3 ubiquitin-protein ligase family [
]. HECT domain-containing proteins accept ubiquitin from an E2 ubiquitin-conjugating enzyme in the form of a thioester and then transfer it to lysine side chains of target proteins, and transfer additional ubiquitin molecules to the end of growing ubiquitin chains. The site of ubiquitin thioester formation is a conserved cysteine residue located in the last 32-36 aa of the HECT domain []. The amino-terminal part of the HECT domain has been implicated in E2 binding [,
]. Once linked to ubiquitin, the target proteins are degraded in the 26S proteasome.
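Since the thioester-forming cysteine is described as lying within the last 32-36 residues of the HECT domain, a simple positional filter can shortlist candidate residues in a domain sequence. This is only an illustrative sketch: the function name, the default window of 36 and the toy sequence are assumptions, and a real analysis would rely on an alignment rather than position alone.

```python
def candidate_catalytic_cysteines(domain_seq, window=36):
    """Return 1-based positions of Cys residues within the last `window` residues."""
    seq = domain_seq.upper()
    offset = max(len(seq) - window, 0)  # 0-based index where the C-terminal window starts
    return [offset + i + 1 for i, residue in enumerate(seq[offset:]) if residue == "C"]

# Hypothetical toy domain: 50 Ala, one Cys, then 10 Ala (61 residues in total).
toy_domain = "A" * 50 + "C" + "A" * 10
print(candidate_catalytic_cysteines(toy_domain))
```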
This entry includes SURF1 and Shy1 proteins. The surfeit locus 1 gene (SURF1 or surf-1) encodes a conserved protein of about 300 amino-acid residues that seems to be involved in the biogenesis of cytochrome c oxidase []. Vertebrate SURF1 is evolutionarily related to yeast protein Shy1, which is a mitochondrial inner membrane protein required for assembly of cytochrome c oxidase [
]. There seem to be two transmembrane regions in these proteins, one in the N-terminal region and the other in the C-terminal region.
Defects in SURF1 are a cause of Leigh syndrome (LS). LS is a severe neurological disorder characterised by bilaterally symmetrical necrotic lesions in subcortical brain regions that is commonly associated with systemic cytochrome c oxidase (COX) deficiency [
,
,
].
Malectin is a membrane-anchored protein of the endoplasmic reticulum that recognises and binds Glc2-N-glycan. It carries a signal peptide from residues 1-26, a C-terminal transmembrane helix from residues 255-274, and a highly conserved central part of approximately 190 residues followed by an acidic, glutamate-rich region. Carbohydrate-binding is mediated by the four aromatic residues, Y67, Y89, Y116, and F117, and the aspartate D186. NMR-based ligand-screening studies have shown binding of the protein to maltose and related oligosaccharides, on the basis of which the protein has been designated "malectin", and its endogenous ligand is found to be Glc2-high-mannose N-glycan [
]. This entry represents a malectin domain, and can also be found in probable receptor-like serine/threonine-protein kinases from plants [
] and in proteins described as glycoside hydrolases.
Members of this family are the PhnF protein associated with C-P lyase systems for utilization of phosphonates. These systems resemble phosphonatase-based systems in having a three-component ABC transporter, where
is the permease,
is the phosphonates binding protein, and
is the ATP-binding cassette (ABC) protein. They differ, however, in having, typically, ten or more additional genes, many of which are believed to form a membrane-associated C-P lyase complex. This protein (PhnF) is a predicted helix-turn-helix transcriptional regulatory protein of the broader GntR family [
], and is encoded in a gene cluster associated with genes for the import and degradation of phosphonates and/or related compounds (e.g. phosphonites) with a direct C-P bond. Its presence is apparently not required for phosphate utilisation to occur [].
This domain comprises the whole of a protein in Methanocaldococcus jannaschii and Methanobacterium thermoautotrophicum, all but the N-terminal 60 residues from a protein of Mycobacterium tuberculosis, and all but the C-terminal 180 residues from a protein in Haemophilus influenzae and Escherichia coli, among proteins from published complete genomes. This domain contains a conserved ATP-binding pocket [
]. The domain can be found in Escherichia coli YcaO, which is known to be involved in beta-methylthiolation of ribosomal protein S12 [
]. YcaO-like domains have been shown to be catalytically active, and to use ATP to activate amide backbones during peptide cyclodehydrations []. The domain is likely to be involved in post-translational modification (PTM) of amino acid residues. In Methanopyrus kandleri and Methanocaldococcus jannaschii, YcaO is involved in thioamidation [].
The glucokinase regulatory protein (GCKR) [
] is a vertebrate protein that inhibits glucokinase by forming a complex with the enzyme, which plays a role
in the control of blood glucose homeostasis. GCKR is a protein of about 70 kDa which is evolutionarily related to bacterial N-acetylmuramic acid 6-phosphate
etherases (MurNAc-6-P etherase, murQ), which are about half the size of GCKR and form active homodimers. MurNAc-6-P etherases catalyse the cleavage of the
D-lactyl ether substituent of the bacterial cell wall sugar MurNAc and play a role in recycling of the cell wall [
]. The 3D structure of the Haemophilus influenzae HI0754/murQ protein shows structural relationships
with other sugar isomerase (SIS) domain proteins. In contrast to mono-SIS bacterial MurNAc-6-P etherase, mammalian GCKR is composed
of two SIS domains.
The cysteine-rich domain (CRD) is an essential extracellular portion of the frizzled 2 (Fz2) receptor, and is required for binding Wnt proteins, which play fundamental roles in many aspects of early development, such as cell and tissue polarity, neural synapse formation, and the regulation of proliferation. Fz proteins serve as Wnt receptors for multiple signal transduction pathways, including both beta-catenin-dependent and -independent cellular signaling, as well as the planar cell polarity pathway and the Ca(2+)-modulating signaling pathway. CRD-containing Fzs have been found in diverse species from amoebas to mammals. Ten different frizzled proteins are found in vertebrates [
]. Fz2 is involved in the Wnt/beta-catenin signaling pathway and in the activation of protein kinase C and calcium/calmodulin-dependent protein kinase (CaM kinase) [].
The cysteine-rich domain (CRD) is an essential part of the tyrosine kinase-like orphan receptor (Ror) proteins, a conserved family of tyrosine kinases that function in various processes, including neuronal and skeletal development, cell polarity, and cell movement. Ror proteins are receptors of Wnt proteins, which are key players in a number of fundamental cellular processes in embryogenesis and postnatal development. In different cellular contexts, Ror proteins can either activate or repress transcription of Wnt target genes, and can modulate Wnt signaling by sequestering Wnt ligands. In addition, a number of Wnt-independent functions have been proposed for both Ror1 and Ror2 [
]. Proteins containing this domain also include CAM-1 from C. elegans. CAM-1 is a ROR receptor tyrosine kinase that inhibits the Wnt pathway [].
The cysteine-rich domain (CRD) is an essential part of the secreted frizzled related protein 2 (SFRP2), which has a role in development of the neural system, eyes, muscles and limbs. As a Wnt antagonist, SFRP2 regulates Nkx2.2 expression in the ventral spinal cord and anteroposterior axis elongation [
]. In general, SFRPs antagonize the activation of Wnt signaling by binding to the CRD domains of frizzled (Fz) proteins, thereby preventing Wnt proteins from binding to these receptors. SFRPs are also known to have functions unrelated to Wnt, as enhancers of procollagen cleavage by the TLD proteinases [
]. SFRPs and Fz proteins both contain CRD domains, but SFRPs lack the seven-pass transmembrane domain which is an integral part of Fzs [,
].
Phosphotransferase system, enzyme II sorbitol-specific factor
Bacterial PTS transporters transport and concomitantly phosphorylate their sugar substrates, and typically consist of multiple subunits or protein domains. The Man family is unique in several respects among PTS permease families. It is the only PTS family in which members possess a IID protein. It is the only PTS family in which the IIB constituent is phosphorylated on a histidyl rather than a cysteyl residue. Its permease members exhibit broad specificity for a range of sugars, rather than being specific for just one or a few sugars. The Gut family consists only of glucitol-specific transporters, but these occur in both Gram-negative and Gram-positive bacteria. The Escherichia coli transporter consists of a IIA protein, a IIC protein and a IIBC protein. This family is specific for the IIC component.
This superfamily represents the YEATS (Yaf9, ENL, AF9, Taf14, and Sas5) domain, an evolutionarily conserved module present in 4 proteins (ENL, AF9, GAS41, and YEATS2) in humans and 3 proteins (Sas5, Taf14, and Yaf9) in yeast. These proteins are found in major chromatin-remodeling and histone acetyl-transferase (HAT) complexes and are implicated in regulation of chromatin structure, histone acetylation and deposition, gene transcription and DNA damage response. The YEATS domain, which is found in a number of chromatin-associated proteins, has recently been shown to have the capacity to bind histone lysine acetylation [
]. The ability of the YEATS domains of human AF9 and yeast Taf14 to recognise the histone mark H3K9ac has shown that these proteins are members of the family of acetyllysine readers [].
Proteins can accumulate much damage as they age, such as the formation of isoaspartyl residues. Protein-L-isoaspartyl O-methyltransferase (PIMT) (
) is a nearly ubiquitous enzyme that catalyses the methyl esterification of L-isoaspartyl residues in peptides and proteins that result from spontaneous decomposition of normal L-aspartyl and L-asparaginyl residues, thereby playing a role in the repair or degradation of damaged proteins. The crystal structure of PIMT from the bacterium Thermotoga maritima has been determined and reveals an alpha/beta protein with three functional subdomains. The C-terminal subdomain is unique to the PIMT of T. maritima, with no similar sequences being found in PIMTs from other species [
]. These sequences form an α/β subdomain in three layers, β/α/β, with a buried helix.
The mannose 6-phosphate (Man-6-P) receptor homology (MRH) domain is present in recycling receptors (mannose 6-phosphate receptors, MPRs), resident endoplasmic reticulum (ER) proteins (glucosidase 2 beta subunit, Endoplasmic reticulum lectin 1 XTP3-B, OS-9), and in a Golgi glycosyltransferase (GlcNAc-phosphotransferase gamma-subunit), which are characterised by the presence of one or more MRH domains. Many MRH domains act as lectins and bind specific phosphorylated (MPRs) or non-phosphorylated (glucosidase 2 beta subunit, XTP3-B and OS-9) high mannose-type N-glycans. The MPRs are the only proteins known to bind Man-6-P residues via their MRH domains. The MRH domain can function in protein-carbohydrate and protein-protein interactions [
,
,
,
,
,
]. This domain has a β-barrel structure formed by nine β-strands organised into two orthogonally oriented antiparallel β-sheets [,
].
This is a ~120-amino acid protein-protein interaction module that binds DMAP1 (DNA methyltransferase-associated protein 1), a transcriptional co-repressor. It is found at the N terminus of DNMT1 (DNA methyltransferase 1) [
] and animal disco-interacting protein 2 (DIP-2), a protein that maintains morphology of mature neurons [,
]. This domain is also found in N-acetylglucosamine-1-phosphotransferase subunits alpha/beta (GNPTA) and gamma (GNPTG) from animals, which are members of a complex that catalyses the initial step in the formation of the mannose 6-phosphate targeting signal on newly synthesized lysosomal acid hydrolases. The DMAP1-binding domain mediates the selective binding of GlcNAc-1-phosphotransferase to acid hydrolases [,
]. The DMAP1-binding domain is predicted to adopt a long helix-turn-helix structure that is rich in leucine residues [
].
This family represents the bacterial flagellum-specific ATP synthase, FliI, which is needed for flagellar assembly. FliI is part of the flagellar type III protein export apparatus, acting as an ATPase to drive protein export for flagellar biosynthesis [
,
,
]. When FliI is not engaged in flagellar protein export, FliH, another flagellar type III protein export apparatus protein, functions as a negative regulator to prevent FliI from hydrolysing ATP []. It has been suggested that the N terminus of FliI interacts with FliH, while the C-terminal domain of FliI possesses the ATPase catalytic function [,
]. The structure of an N-terminally truncated variant of FliI lacking the first 18 residues has been determined []. This entry represents one of three segments of the FliI family tree.
Herpesviruses are enveloped by a lipid bilayer that contains at least a dozen glycoproteins. The virion surface glycoproteins mediate recognition of susceptible cells and promote fusion of the viral envelope with the cell membrane, leading to virus entry. No single glycoprotein associated with the virion membrane has been identified as the fusogen []. Glycoprotein L (gL) forms a non-covalently linked heterodimer with glycoprotein H (gH). This heterodimer is essential for virus-cell and cell-cell fusion since the association of gH and gL is necessary for correct localisation of gH to the virion or cell surface. gH anchors the heterodimer to the plasma membrane through its transmembrane domain. gL lacks a transmembrane domain and is secreted from cells when expressed in the absence of gH [
].
Mammalian pentatricopeptide repeat domain (PPR) proteins are involved in regulation of mitochondrial RNA metabolism and translation and are required for mitochondrial function. The supernumerary mitochondrial ribosomal protein of the small subunit 27 (MRPS27) is not required for mitochondrial RNA processing or the stability of the small ribosomal subunit. However, MRPS27 is required for mitochondrial protein synthesis [
]. Pentatricopeptide repeat-containing protein 2 (PTCD2) is a related mitochondrial PPR protein, although, despite their similarity, MRPS27 and PTCD2 may not have the same function [
,
]. PTCD2 has a role in mitochondrial RNA metabolism and downstream effects on mitochondrial translation [,
]. PTCD2 has been identified as a target for autoantibodies in neurodegenerative disease and is used as a biomarker for Alzheimer disease [].
F-box proteins have a bipartite structure: they contain a carboxy-terminal
domain that interacts with substrates and a 42-48 amino-acid F-box domain which binds to the protein Skp1. A subset of F-box proteins is characterized by a ~180-residue carboxy-terminal region, which has been called
the F-box-associated (FBA) domain [,
]. An FBA domain has also been identified in the catfish, tilapia, and zebrafish nonspecific cytotoxic cell receptor
proteins (NCCRP-1), which do not contain the F-box domain. NCCRP-1 may function as an antigen recognition molecule and, as such, may participate in
innate immunity in teleosts [,
]. The FBA domain is likely to be a glycoprotein-binding module [
,
]. This entry represents the FBA domain, which is an ellipsoid composed of a ten-stranded antiparallel β-sandwich
with two α-helices [].