Ribonuclease P (Rnp) is a ubiquitous ribozyme that catalyzes a Mg2+-dependent hydrolysis to remove the 5'-leader sequence of precursor tRNA (pre-tRNA) in all three domains of life. In bacteria, the catalytic RNA (typically ~120 kDa) is aided by a small protein cofactor (~14 kDa). Archaeal and eukaryotic RNase P each contain a single RNA; archaeal RNase P has four or five proteins, while eukaryotic RNase P has 9 or 10. Eukaryotic and archaeal RNase P RNAs function cooperatively with their protein subunits in catalysis. This entry includes the p29 subunit (also known as Rpp29 or Pop4) of the ribonuclease P complex. Its eukaryotic homologues are also subunits of the RNase MRP complex. The structure of the RNase P subunit Rpp29 from Methanobacterium thermoautotrophicum (Mth) has been determined. Mth Rpp29 is a member of the oligonucleotide/oligosaccharide binding fold family. It contains a structured β-barrel core and unstructured N- and C-terminal extensions bearing several highly conserved amino acid residues that could be involved in RNA contacts in the protein-RNA complex. Rpp29 catalyses the endonucleolytic cleavage of RNA, removing 5'-extranucleotides from the tRNA precursor. It interacts with the Rpp25 and Pop5 subunits.
UHMK1, also termed kinase interacting with stathmin (KIS), Kist or P-CIP2, is a serine/threonine protein kinase functionally related to RNA metabolism and neurite outgrowth. It contains an N-terminal kinase domain and a C-terminal RNA recognition motif (RRM) with high homology to the corresponding motif of the mammalian U2 small nuclear ribonucleoprotein auxiliary factor U2AF 65 kDa subunit (U2AF65 or U2AF2). UHMK1 targets two key regulators of cell proliferation and migration: the cyclin-dependent kinase (CDK) inhibitor p27Kip1 and the microtubule-destabilizing protein stathmin. It plays a critical role during vascular wound repair by preventing excessive vascular smooth muscle cell (VSMC) migration into the vascular lesion. Moreover, UHMK1 may control cell migration and neurite outgrowth by interacting with and phosphorylating the splicing factor SF1, thereby probably contributing to the control of protein expression. Furthermore, UHMK1 may be functionally related to microtubule dynamics and axon development. It localizes to RNA granules, interacts with three proteins found in RNA granules (KIF3A, NonO and eEF1A), and enhances local translation. UHMK1 is highly expressed in regions of the brain implicated in schizophrenia and may play a role in susceptibility to schizophrenia.
Ephrins are a family of proteins that are ligands of class V (EPH-related) receptor protein-tyrosine kinases. Initially identified as regulators of axon pathfinding and neuronal cell migration, the Eph receptors and their ephrin ligands are now known to have roles in many other cell-cell interactions, including those of vascular endothelial cells and specialised epithelia. Ephrins are membrane-attached proteins of 205 to 340 residues. Attachment appears to be crucial for their normal function. Type-A ephrins are linked to the membrane via a GPI linkage, while type-B ephrins are type-I membrane proteins. The globular ephrin receptor-binding domain (ephrin RBD) is a beta barrel composed of eight strands arranged in two sheets around a hydrophobic core. Interspersed between the beta strands are two alpha helices and one 3(10) helix. The sheets are composed of mixed parallel and antiparallel beta strands arranged in a Greek key topology. Like other cell-surface proteins, ephrins contain disulfide bonds to enhance stability. Two buried disulfide bonds are present: one holds together beta strands C and F, and the other anchors two small helices, E and I, at the top of the barrel.
Pleiotrophin/Midkine heparin-binding growth factor, conserved site
Type:
Conserved_site
Description:
Several extracellular heparin-binding proteins involved in regulation of growth and differentiation belong to a new family of growth factors. These growth factors are highly related proteins of about 140 amino acids that contain 10 conserved cysteines probably involved in disulphide bonds, and include pleiotrophin (also known as heparin-binding growth-associated molecule HB-GAM, heparin-binding growth factor 8 HBGF-8, heparin-binding neurotrophic factor HBNF and osteoblast-specific protein OSF-1); midkine (MK); retinoic acid-induced heparin-binding protein (RIHB); and pleiotrophic factors alpha-1 and -2 and beta-1 and -2 from Xenopus laevis, the homologues of midkine and pleiotrophin respectively. Pleiotrophin is a heparin-binding protein that has neurotrophic activity and mitogenic activity towards fibroblasts. It is highly expressed in brain and uterus tissues, but is also found in gut, muscle and skin. It is thought to possess an important brain-specific function. Midkine is a regulator of differentiation whose expression is regulated by retinoic acid and, like pleiotrophin, is a heparin-binding growth/differentiation factor that acts on fibroblasts and nerve cells. This entry represents two conserved sites found within pleiotrophin and midkine heparin-binding growth factors; these conserved sites include the majority of the conserved cysteines.
Ribosomal RNA small subunit methyltransferase E, PUA-like domain
Type:
Domain
Description:
Methyltransferases (MTases) are responsible for the transfer of methyl groups between two molecules. The transfer of the methyl group from the ubiquitous S-adenosyl-L-methionine (AdoMet) to either nitrogen, oxygen or carbon atoms is frequently employed in diverse organisms. The reaction is catalysed by MTases and modifies DNA, RNA, proteins or small molecules, such as catechol, for regulatory purposes. Proteins in this entry belong to the RsmE family of MTases; this is supported by crystal structures, which show close structural homology to other known methyltransferases. This group of proteins includes Ribosomal RNA small subunit methyltransferase E (RsmE) from Escherichia coli, which specifically methylates the uridine at position 1498 of 16S rRNA in the fully assembled 30S ribosomal subunit. This enzyme has two distinct but structurally related domains: the N-terminal PUA domain and the conserved MTase domain at the C-terminal end. This protein adopts a dimeric configuration that is functionally critical for substrate binding and catalysis. This entry represents the N-terminal PUA (pseudouridine synthases and archaeosine-specific transglycosylases)-like RNA recognition and binding domain found in RsmE. This domain is mainly responsible for recognition of one substrate molecule (the ribosomal RNA fragment and ribosomal protein complex).
Involucrin is a highly reactive, soluble transglutaminase substrate protein present in keratinocytes of the epidermis and other stratified squamous epithelia. Involucrin first appears in the cell cytosol, but ultimately becomes cross-linked to membrane proteins by transglutaminase, thus helping to form an insoluble envelope beneath the plasma membrane and functioning as a glutamyl donor during assembly of the cornified envelope.
Structurally, involucrin consists of a conserved region of about 75 amino acid residues followed by two extremely variable-length segments that contain glutamine-rich tandem repeats. The glutamine residues in the tandem repeats are the substrate for the transglutaminase in the cross-linking reaction. The total size of the protein varies from 285 residues (in dog) to 835 residues (in orangutan). This entry represents the three N-terminal beta strands of involucrin. Apigenin is a plant-derived flavonoid that has significant promise as a skin cancer chemopreventive agent. Apigenin has been found to suppress normal human keratinocyte differentiation, and this is associated with reduced cell proliferation without apoptosis. The downstream part of the protein is represented by a separate entry.
The rrf2-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH) domain of about 130 residues present in transcription regulators of the rrf2 family. This family of bacterial regulators is named after Desulfovibrio vulgaris rrf2, a regulator of the hmc operon, which encodes iron-sulfur-containing proteins as well as other proteins involved in electron transport. Other rrf2-type HTH proteins are regulators of genes involved in nitrite or iron metabolism, or nitric oxide detoxification.
The N-terminal part of the domain shows similarity to the iclR-type, gntR-type and marR-type HTH domains, in which the DNA-binding HTH motif is followed by a β-hairpin called the wing. The C-terminal part of the rrf2-type HTH domain in most cases contains 3 conserved cysteine residues that may bind a [2Fe-2S] cluster, as in iscR and nsrR. The nsrR regulator, which contains a nitrogen-oxide-sensing Fe-S cluster that is required for DNA binding, is implicated in denitrification and/or NO detoxification in diverse pathogenic and environmental bacteria.
This entry represents a conserved site located in the central part of these proteins that covers the more strongly conserved region, 'the wing', which starts directly C-terminal to the 'helix-turn-helix' motif of these proteins.
Rubella virus (RV), the sole member of the genus Rubivirus within the family Togaviridae, is a small, enveloped, positive-strand RNA virus. The nucleocapsid consists of 40S genomic RNA and a single species of capsid protein, and is enveloped within a host-derived lipid bilayer containing two viral glycoproteins, E1 (58 kDa) and E2 (42-46 kDa). In virus-infected cells, RV matures by budding either at the plasma membrane or at internal membranes, depending on the cell type, and enters adjacent uninfected cells by a membrane fusion process in the endosome, directed by E1-E2 heterodimers. Heterodimer formation is crucial for E1 transport out of the endoplasmic reticulum to the Golgi and plasma membrane. In RV E1, a cysteine at position 82 is crucial for E1-E2 heterodimer formation and cell surface expression of the two proteins. E1 has been shown to be a type 1 membrane protein, rich in cysteine residues with extensive intramolecular disulphide bonds. This superfamily represents domain 1 found in rubella membrane glycoprotein E1. Structurally, it consists of 8 beta sheets which are not contributed by a contiguous polypeptide chain.
TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity. The TetR proteins identified in multiple genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or can itself trigger the cellular response. TetR regulates the expression of the membrane-associated tetracycline resistance protein, TetA, which exports the tetracycline antibiotic out of the cell before it can attach to the ribosomes and inhibit protein synthesis. TetR blocks transcription from the genes encoding both TetA and TetR in the absence of antibiotic. The C-terminal domain is multi-helical and is interlocked in the homodimer with the helix-turn-helix (HTH) DNA-binding domain. This entry represents the C-terminal domain present in TetR transcriptional regulators mostly found in proteobacteria.
Phosphotyrosyl phosphatase activator (PTPA, also known as protein phosphatase 2A activator) proteins stimulate the phosphotyrosyl phosphatase (PTPase) activity of the dimeric form of protein phosphatase 2A (PP2A). PTPase activity in PP2A (in vitro) is relatively low when compared to the better recognised phosphoserine/threonine protein phosphatase activity. PTPA also reactivates the serine/threonine phosphatase activity of an inactive form of PP2A. The specific biological role of PTPA is unknown. PTPA has been suggested to play a role in the insertion of metals into the PP2A catalytic subunit (PP2Ac) active site, to act as a chaperone and, more recently, to have peptidyl prolyl cis/trans isomerase activity that specifically targets human PP2Ac. Together, PTPA and PP2A constitute an ATPase, and it has been suggested that PTPA alters the relative specificity of PP2A from phosphoserine/phosphothreonine substrates to phosphotyrosine substrates in an ATP-hydrolysis-dependent manner. Basal expression of PTPA depends on the activity of a ubiquitous transcription factor, Yin Yang 1 (YY1). The tumour suppressor protein p53 can inhibit PTPA expression through an unknown mechanism that negatively controls YY1. PTPA has a multihelical fold.
This superfamily represents fibrinogen-binding domain 1. In proteins such as fibrinogen-binding adhesin SdrG and clumping factor A, there are two fibrinogen-binding domains with similar core β-sandwich topologies, but with different modulations in their structure. This entry represents the first domain, while the second domain is represented by a separate entry.
Gram-positive pathogens, such as Staphylococci, Streptococci and Enterococci, contain multiple cell wall-anchored proteins. Some of these proteins act as adhesins and mediate bacterial attachment to host tissues through lock-and-key interactions with host ligands, such as fibrinogen, a glycoprotein found in blood plasma that plays a key role in haemostasis and coagulation. For pathogenic bacteria that do not invade host cells, extracellular matrix proteins are preferred targets for bacterial adhesion; adhesins mediating these interactions have been termed MSCRAMMs (microbial surface components recognizing adhesive matrix molecules). A common binding domain organisation found within MSCRAMMs suggests a common ancestry. Both fibrinogen-binding adhesin SdrG and clumping factor A are MSCRAMMs. Fibrinogen-binding adhesin SdrG is a cell wall-anchored adhesin found in the pathogen Staphylococcus epidermidis that binds to the B-beta chain of human fibrinogen, while clumping factor A performs a similar function in Staphylococcus aureus by binding the gamma chain of fibrinogen.
The SGT1-specific (SGS) domain is a module of ~90 amino acids, which was initially identified in eukaryotic Sgt1 proteins. It was later also found in calcyclin-binding proteins. The SGS domain has been shown to bind to proteins of the S100 family, which are thought to function as sensors of calcium ion concentration in the cell. In budding yeasts, Sgt1 is required for SCF (Skp1p/Cdc53p-Cullin-F-box)-mediated ubiquitination, cyclic AMP pathway activity and kinetochore function. Its Schizosaccharomyces pombe homologue, Git7, is required for glucose and cyclic AMP signaling, cell wall integrity, and septation. Its two homologues in Arabidopsis, SGT1a and SGT1b, can complement two yeast temperature-sensitive sgt1 mutant alleles, suggesting that the fundamental cellular function(s) of yeast SGT1 in SCF-mediated protein ubiquitylation are conserved in Arabidopsis. Moreover, SGT1a and SGT1b can act as cochaperones with HSP90 and HSC70 and function in regulating multiple resistance (R) genes and environmental responses. The SGS domain of SGT1 is a key determinant of the HSC70-SGT1 association. Calcyclin (S100A6) is a member of the S100A family of calcium-binding proteins and appears to play a role in cell proliferation.
The protein encoded by the breast cancer susceptibility gene BRCA1 contains at its C terminus two copies of a conserved domain that was named BRCT, for BRCA1 C terminus. This domain of about 95 amino acids is found in a large variety of proteins involved in DNA repair, recombination and cell cycle control. The BRCT domain is not limited to the C terminus of protein sequences and can be found in multiple copies or in a single copy, as in RAP1 and TdT. BRCT domains are often found as tandem-repeat pairs. Some data indicate that the BRCT domain functions as a protein-protein interaction module. The structure of the first of the two C-terminal BRCT domains of the human DNA repair protein XRCC1 has been determined by X-ray crystallography. Structures of the BRCA1 BRCT domains revealed a basis for a widely utilised head-to-tail BRCT-BRCT oligomerization mode. This conserved tandem BRCT architecture facilitates formation of the canonical BRCT phospho-peptide interaction cleft at a groove between the BRCT domains. Mutations in the BRCA1 BRCT domains disrupt peptide binding by directly occluding this peptide-binding groove, or by disrupting key conserved BRCT core folding determinants.
EF2 (or EFG) participates in the elongation phase of protein synthesis by promoting the GTP-dependent translocation of the peptidyl tRNA of the nascent protein chain from the A-site (acceptor site) to the P-site (peptidyl tRNA site) of the ribosome. EF2 also has a role after the termination phase of translation, where, together with the ribosomal recycling factor, it facilitates the release of tRNA and mRNA from the ribosome, and the splitting of the ribosome into two subunits. EF2 is folded into five domains, with domains I and II forming the N-terminal block, domains IV and V forming the C-terminal block, and domain III providing the covalently linked flexible connection between the two. Domains III and V have the same fold (although they are not completely superimposable and domain III lacks some of the superfamily characteristics), consisting of an alpha/beta sandwich with an antiparallel β-sheet in a (beta/alpha/beta)x2 topology. This double split beta/alpha/beta fold is also seen in a number of ribonucleotide binding proteins. It is the most common motif occurring in the translation system and is referred to as the ribonucleoprotein (RNP) or RNA recognition (RRM) motif. This entry represents domain III of EF2 proteins.
This entry represents the methyl-CpG binding domains (MBD) of SETDB1/2 and similar animal proteins. This group of proteins also includes the SETDB1 homologues Histone-lysine N-methyltransferase eggless from Drosophila melanogaster and Histone-lysine N-methyltransferase met-2 from Caenorhabditis elegans. SETDB1 is a member of the histone-lysine N-methyltransferase Suvar3-9 subfamily. Members of this subfamily trimethylate 'Lys-9' of histone H3. H3 'Lys-9' trimethylation represents a specific tag for epigenetic transcriptional repression by recruiting HP1 (CBX1, CBX3 and/or CBX5) proteins to methylated histones. This enzyme mainly functions in euchromatin regions, thereby playing a central role in the silencing of euchromatic genes. H3 'Lys-9' trimethylation is coordinated with DNA methylation. SETDB1 probably forms a complex with MBD1 and ATF7IP that represses transcription and couples DNA methylation and histone 'Lys-9' trimethylation. SETDB2 is a histone methyltransferase involved in left-right axis specification in early development and mitosis. It specifically trimethylates 'Lys-9' of histone H3 (H3K9me3), a specific tag for epigenetic transcriptional repression that recruits HP1 (CBX1, CBX3 and/or CBX5) proteins to methylated histones. SETDB2 contributes to H3K9me3 in both the interspersed repetitive elements and centromere-associated repeats. This protein plays a role in chromosome condensation and segregation during mitosis.
Members of this protein family are BshA, a glycosyltransferase required for bacillithiol biosynthesis. BshA catalyzes the formation of the first intermediate, GlcNAc-Mal, using L-malate and the glucosamine donor substrate UDP-N-acetylglucosamine (UDP-GlcNAc) as substrates. Bacillithiol is a low-molecular-weight thiol, an analogue of glutathione and mycothiol, and is found largely in the Firmicutes. This family is most closely related to the GT1 family of glycosyltransferases. Glycosyltransferases catalyze the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. The acceptor molecule can be a lipid, a protein, a heterocyclic compound, or another carbohydrate residue. Glycosyltransferases may transfer UDP-, ADP-, GDP- or CMP-linked sugars. The diverse enzymatic activities reflect a wide range of biological functions. The protein structure for members of the GT1 family has a GTB topology, one of the two protein topologies observed for nucleotide-sugar-dependent glycosyltransferases. GTB proteins have distinct N- and C-terminal domains, each containing a typical Rossmann fold. The two domains have high structural homology despite minimal sequence homology. The large cleft that separates the two domains includes the catalytic centre and permits a high degree of flexibility.
This is the OB-fold domain found at the C terminus of the wolframin protein. It localizes to the ER lumen and lacks the conserved polar residues typical of OB-fold domain-mediated nucleic acid binding. It has been predicted to mediate protein-protein interactions. Wolframin, a multi-pass membrane protein found in the endoplasmic reticulum, is encoded by the Wolfram syndrome 1 gene (WFS1). The detailed molecular function of the protein is not known, but it is believed to participate, at least in part, in the regulation of cellular calcium homeostasis by modulating the filling state of the endoplasmic reticulum calcium store. Defects in WFS1 cause Wolfram syndrome (WFS), also referred to as DIDMOAD; this syndrome is characterised by diabetes insipidus, childhood-onset diabetes mellitus, gradual loss of vision owing to optic atrophy, and deafness. It is a rare autosomal recessive disorder, and may give rise to other complications affecting the bladder and nervous system. Wolframin homologues have been identified in a range of species, from mammals and amphibia to insects. Notwithstanding regions of high similarity, vertebrate and invertebrate wolframins exhibit characteristic lineage-specific differences. This entry represents the Wolframin family, and includes homologues from invertebrates.
This entry represents the effector-associated domain 2 (EAD2). This domain is one of the most prevalent EADs, predominantly found in actinobacteria and to a lesser extent in cyanobacteria and Chloroflexi. EAD2 occurs fused to the signalling peptidases and nucleotide-utilizing domains fused to the VMAP component of those ternary systems, and it has been suggested that it recruits the signalling components of the system. It is predicted to be an all α-helical domain. Effector-associated domains (EADs) are predicted to function as adaptor domains mediating protein-protein interactions. The EADs show a characteristic architectural pattern. One copy is always fused, typically at the N or C terminus, to a core component of a biological conflict system; examples include VMAP (vWA-MoxR associated protein), iSTAND (inactive STAND NTPase system), or GAP1 (GTPase-associated protein 1). Further copies of the same EAD are fused to either effector or signal-transducing domains, or additional EADs. EAD pairs are frequently observed together on the genome in conserved gene neighborhoods, but can also be severed from such neighborhoods and located in distant regions, indicating that EAD-EAD protein domain coupling approximates the advantages of collinear transcription. EADs are all small domains with no enzymatic features.
This entry represents the effector-associated domain 1 (EAD1). This domain is one of the most prevalent EADs, widely distributed across bacteria. EAD1 is primarily linked to effector domains that also occur directly fused to the vWA component in MoxR-vWA ternary systems, or to GAP1 in the case of the GAP1-N1 GTPase systems, which suggests that it primarily recruits other effectors to the systems. It is predicted to be an all α-helical domain. Effector-associated domains (EADs) are predicted to function as adaptor domains mediating protein-protein interactions. The EADs show a characteristic architectural pattern. One copy is always fused, typically at the N or C terminus, to a core component of a biological conflict system; examples include VMAP (vWA-MoxR associated protein), iSTAND (inactive STAND NTPase system), or GAP1 (GTPase-associated protein 1). Further copies of the same EAD are fused to either effector or signal-transducing domains, or additional EADs. EAD pairs are frequently observed together on the genome in conserved gene neighborhoods, but can also be severed from such neighborhoods and located in distant regions, indicating that EAD-EAD protein domain coupling approximates the advantages of collinear transcription. EADs are all small domains with no enzymatic features.
Bovine immunodeficiency virus (BIV), like the human immunodeficiency virus, is a lentivirus. It shows a great deal of genomic diversity, mostly in the viral envelope gene. This property of the BIV group of viruses may play an important role in the pathobiology of the virus, particularly the conserved (C) 2, hypervariable (V) 1, V2 and C3 regions. The surface protein (SU) attaches the virus to the host cell by binding to its receptor. This interaction triggers the refolding of the transmembrane protein (TM) and is thought to activate its fusogenic potential by unmasking its fusion peptide. Fusion occurs at the host cell plasma membrane. The transmembrane protein (TM) acts as a class I viral fusion protein. Under the current model, the protein has at least 3 conformational states: pre-fusion native state, pre-hairpin intermediate state, and post-fusion hairpin state. During viral and target cell membrane fusion, the coiled coil regions (heptad repeats) assume a trimer-of-hairpins structure, positioning the fusion peptide in close proximity to the C-terminal region of the ectodomain. The formation of this structure appears to drive apposition and subsequent fusion of viral and target cell membranes. Membrane fusion leads to delivery of the nucleocapsid into the cytoplasm.
Semaphorins are a large and diverse family of proteins, widely expressed across divergent animal phyla, divided into eight classes (1-7 and V) which are structurally and functionally conserved. These proteins are critical in neural development and adult brain plasticity, being involved in axon guidance, neural connectivity, angiogenesis, immunoregulation, and also in cancer. This entry represents the C-terminal domain of the transmembrane Semaphorin 4F (Sema4F, also known as Semaphorin-W), which plays a role in visual system development and in oligodendrocyte precursor migration and differentiation. Like other Semaphorin 4 members, it includes an Ig-like domain adjacent to the PSI domain (plexin-semaphorin-integrin domain, conserved across all semaphorins), a transmembrane region and a short intracellular tail. The latter is rich in proline residues, similar to other transmembrane semaphorins, suggesting that it may interact with cytoskeletal or signalling proteins. It also contains a cyclic nucleotide-dependent protein kinase phosphorylation site, involved in signal transduction into the cell. Sema4F is found in glutamatergic synapses, preferentially in postsynaptic dendrites, attached to the SAP90/PSD-95 scaffolding protein. Sema4F interacts with SAP90/PSD-95 PDZ domains through the critical last three C-terminal amino acids.
The Schlafen (SLFN) family includes several mouse and human member genes that have been implicated in important functions, such as the control of cell proliferation, induction of immune responses, and the regulation of viral replication. Mouse and human SLFN proteins are regulated by interferons (IFNs). All SLFNs contain a unique Slfn box. The SLFN family comprises three groups, based on the size of the encoded proteins: group 1 (Slfn1, Slfn2 and Slfn Like 1), group 2 (Slfn3, Slfn4 and Slfn12) and group 3 (Slfn5, Slfn8-11, Slfn13 and Slfn14). Compared to group 1 proteins, group 2 and 3 proteins contain an extra SWADL domain C-terminal to the AAA domain. Group 3 proteins also possess a large extension C-terminal to their SWADL domain. This C-terminal extension is homologous to superfamily I of DNA/RNA helicases. It has been proposed that the divergent AAA ATPase domain may function as an RNA-binding domain. This entry also includes orthopoxvirus sequences. Analyses indicate that a member of the Schlafen family was horizontally transferred from murine rodents to orthopoxviruses, where it is hypothesised to play a role in allowing the virus to survive host immune defense mechanisms.
This entry represents a methyltransferase domain found in proteins which are involved in C1 metabolism in methanogenic archaea and methylotrophic bacteria. It is closely related to, yet is distinct from, uroporphyrinogen decarboxylase.
In methanogens, this domain is found in single-domain proteins catalysing the transfer of the methyl group of methylcobalamin to coenzyme M. In Methanosarcina barkeri these proteins have been shown to be involved in methanogenesis from C1 compounds such as methylamine and dimethylsulphide.
CH3-cob(III)alamin + coenzyme M --> cob(I)alamin + CH3-coenzyme M
In methylotrophic bacteria, this domain is found at the N terminus of the bifunctional corrinoid-binding CmuA protein, which is involved in chloromethane utilisation. The pathway of chloromethane utilisation allows the microorganisms that possess it to grow with chloromethane as the sole carbon and energy source. It is initiated by a corrinoid-dependent methyltransferase system involving methyltransferase I (CmuA) and methyltransferase II (CmuB), which transfer the methyl group of chloromethane onto tetrahydrofolate. The methyl group of chloromethane is first transferred by the protein CmuA to its corrinoid moiety, from where it is transferred to tetrahydrofolate by CmuB, thereby yielding methyltetrahydrofolate.
CH3Cl + cob(I)alamin --> CH3-cob(III)alamin + Cl-
The LytTR domain is a DNA-binding, potential winged helix-turn-helix (wHTH) domain of about 100 amino acids, present in bacterial transcriptional regulators of the algR/agrA/lytR family. It is named after Bacillus subtilis LytT and Staphylococcus aureus lytR response regulators, involved in the regulation of cell autolysis. The LytTR domain is found in several bacterial cytoplasmic proteins that regulate the production of important virulence factors, like extracellular polysaccharides, toxins and bacteriocins. These response regulators of the microbial two-component signal transduction systems contain N-terminal cheY-like domains, and the LytTR domain in the C-terminal part is expected to bind to specific DNA sequences in the upstream regions of target genes. Other domains found N-terminally of LytTR are an ATP-binding domain of the ABC-type transporter family and a PAS domain. Besides these cytoplasmic proteins, LytTR also occurs in membrane-bound bacterial proteins. The N-terminal parts of these proteins contain three to eight predicted transmembrane segments or the transmembrane MHYT domain. According to secondary structure predictions, the LytTR domain consists of four β-strands and three α-helices. The structure is predicted to form a novel type of DNA-binding domain, showing similarity to the 'winged HTH' or 'winged helix' proteins.
In the human adult, FKBP-12, also known as FAP68 or FK506-binding protein, (protein product of the gene glomulin) is expressed at high levels in skeletal muscle, heart, brain and kidney, and at low levels in smaller arteries and veins. The high expression of glomulin in murine vasculature suggests an important role in blood vessel development and/or maintenance, which is supported by the vascular phenotype seen in GVM patients with mutations in this gene [
]. It is a cytoplasmic protein, specifically bound by the non-phosphorylated form of the hepatocyte growth factor (HGF) receptor, and is released upon HGF stimulation and receptor phosphorylation, suggesting a potential role for FKBP12 in linking HGF signaling to the regulation of protein synthesis []. FKBP12 is found to regulate Ca2+ release from the sarcoplasmic reticulum (SR) by its action on the IP3 receptors (IP3Rs), which crucially regulate diverse cell signalling processes from reproduction to apoptosis, possibly via the kinase mammalian target of rapamycin (mTOR), which potentiates Ca2+ release from the IP3R in smooth muscle [
]. This entry also includes the aberrant root formation protein 4 (At5g11030) from
Arabidopsis. It is required for the initiation of lateral roots independent from auxin signaling [
].
In Escherichia coli, XapR is a positive regulator of the expression of the xapA gene, encoding xanthosine phosphorylase, and the xapB gene, encoding a polypeptide similar to the nucleotide transport protein NupG. As an operon, the expression of both xapA and xapB is fully dependent on the presence of both XapR and the inducer xanthosine. Expression of xapR is constitutive but not auto-regulated, unlike many other LysR family proteins [
]. This substrate-binding domain shows significant homology to the type 2 periplasmic binding proteins (PBP2). The PBP2 are responsible for the uptake of a variety of substrates such as phosphate, sulfate, polysaccharides, lysine/arginine/ornithine, and histidine. The PBP2 bind their ligand in the cleft between these domains in a manner resembling a Venus flytrap. After binding their specific ligand with high affinity, they can interact with a cognate membrane transport complex composed of two integral membrane domains and two cytoplasmically located ATPase domains. This interaction triggers the ligand translocation across the cytoplasmic membrane energized by ATP hydrolysis. Besides transport proteins, the PBP2 superfamily includes the substrate-binding domains from ionotropic glutamate receptors, LysR-like transcriptional regulators, and unorthodox sensor proteins involved in signal transduction [
,
,
].
This family of proteins encompasses the Escherichia coli WcaJ protein involved in colanic acid biosynthesis [
,
], the Methylobacillus EpsB protein involved in methanolan biosynthesis [], as well as the GumD protein involved in the biosynthesis of xanthan []. All of these are closely related to the well-characterised WbaP (formerly RfbP) protein [], which is the first enzyme in O-antigen biosynthesis in Salmonella typhimurium. The enzyme transfers galactose from UDP-galactose (NOTE: not glucose) to a polyprenyl carrier (utilising the highly conserved C-terminal sugar transferase domain), a reaction that takes place at the cytoplasmic face of the inner membrane. The N-terminal hydrophobic domain is then believed to facilitate the "flippase" function of transferring the liposaccharide unit from the cytoplasmic face to the periplasmic face of the inner membrane. Most of these genes are found within large operons dedicated to the production of complex exopolysaccharides such as the enterobacterial O-antigen. Colanic acid biosynthesis utilises a glucose-undecaprenyl carrier [
], knockout of EpsB abolishes incorporation of UDP-glucose into the lipid phase [] and the C-terminal portion of GumD has been shown to be responsible for the glucosyl-1-transferase activity [].
This group of cysteine peptidases belongs to the peptidase family C5 (adenain family, clan CE). Several adenovirus proteins are synthesised as precursors, requiring
processing by a protease before the virion is assembled [,
]. Until recently, the adenovirus endopeptidase was classified as a serine protease,
having been reported to be inhibited by serine protease inhibitors [,
]. However, it has since been shown to be inhibited by cysteine protease
inhibitors, and the catalytic residues are believed to be His-54 and Cys-104 [
,
]. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [
]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid,
N-ethylmaleimide or
p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [
]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues []. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel [
]. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
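The circular-permutation idea mentioned for clans CA and CE can be checked on the residue orders given above. The following is a minimal sketch, not part of any database tooling: the function name `is_rotation` and the tuples are illustrative assumptions, and the clan CA order is written with the stabilising Gln placed before the catalytic Cys, as the text describes, and with Asn standing in for "Asn (or Asp)".

```python
def is_rotation(a, b):
    """Return True if sequence b is a cyclic rotation of sequence a."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a appears as a window of a + a
    return any(doubled[i:i + len(b)] == b for i in range(len(a)))


# Residue orders along the sequence, taken from the clan descriptions above.
# Clan CA: Gln precedes the catalytic Cys, followed by His and Asn (or Asp).
CLAN_CA_ORDER = ("Gln", "Cys", "His", "Asn")
# Clan CE: order stated in the text as His/Asn/Gln/Cys.
CLAN_CE_ORDER = ("His", "Asn", "Gln", "Cys")

ca_ce_related_by_rotation = is_rotation(CLAN_CA_ORDER, CLAN_CE_ORDER)
```

Under these assumptions the clan CE order is a cyclic rotation of the clan CA order, which is what a circular permutation of the fold would be expected to produce. The check says nothing about structural similarity itself.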
This group of cysteine peptidases belong to MEROPS peptidase family C53 (clan C-). The active site residues occur in the order E, H, C in the sequence which is unlike that in any other family. They are unique to pestiviruses. The N-terminal cysteine peptidase (Npro) encoded by the bovine viral diarrhoea virus genome is responsible for the self-cleavage that releases the N terminus of the core protein. This unique protease is dispensable for viral replication, and its coding region can be replaced by a ubiquitin gene directly fused in frame to the core [
,
,
,
]. This superfamily represents a domain consisting of 5 anti-parallel beta strands. The domain also contains a TRASH motif connecting the B1-B2 linker to the B3 strand, and this motif means the domain is able to interact with immune-relevant factors IRF3, IRF7 and Hax1.
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain []. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop: Walker A, and a magnesium binding site: Walker B.
Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature conserved motif (LSGGQ) specific to the ABC transporter; and the Q-motif, or Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
]. Bacteria have elaborate pathways for the production of toxins and secondary metabolites. Many such compounds, including syringomycin and pyoverdine, are synthesized on non-ribosomal templates consisting of a multienzyme complex. In some cases the proteins of the complex and the transporter protein are encoded in the same operon. In other cases these compounds cross the biological membrane by specific transporters. Syringomycin is an amphipathic, cyclic lipodepsipeptide that, when inserted into the host, causes formation of channels permeable to a variety of cations. Pyoverdine, on the other hand, is a cyclic octa-peptidyl dihydroxyquinoline, which is efficient in sequestering iron for uptake. This entry describes a family of cyclic peptide transporters in bacteria. It includes SyrD [
] and PvdE [] from Pseudomonas, and YojI from Escherichia coli [].
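The conserved ABC-module motifs described above lend themselves to a simple sequence scan. The sketch below is illustrative only: the signature LSGGQ is taken from the text, whereas the Walker A pattern G-x(4)-G-K-[ST] is the commonly cited P-loop consensus and is an assumption here, since the text does not spell it out. The function name `find_motifs` and the toy sequence are hypothetical and do not correspond to any real protein.

```python
import re

# Motif patterns written as regular expressions over one-letter amino acid codes.
# "signature" comes from the text (LSGGQ); "walker_a" is the widely used
# P-loop consensus G-x(4)-G-K-[ST], assumed here for illustration.
MOTIFS = {
    "walker_a": re.compile(r"G.{4}GK[ST]"),
    "signature": re.compile(r"LSGGQ"),
}


def find_motifs(sequence):
    """Return {motif_name: [0-based start positions]} for one protein sequence."""
    seq = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy sequence, made up for illustration (not a real protein).
toy_sequence = "MKTAGPSGSGKSTLLRALSGGQKQRV"
hits = find_motifs(toy_sequence)
```

A scan like this only flags candidate motifs. Real ABC signature motifs vary between subfamilies, so exact matching to LSGGQ will miss many genuine family members, and profile-based methods are the usual choice for classification.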
This domain is found in cysteine peptidases belonging to MEROPS peptidase family C25 (gingipain). Gingipains are cysteine proteinases acting as key virulence factors of the bacterium Porphyromonas gingivalis (Bacteroides gingivalis), a Gram-negative anaerobic bacterial species strongly associated with adult periodontitis [].
This group of cysteine peptidases belong to MEROPS peptidase family C50 (separase family, clan CD). The active site residues for members of this family and family C14 occur in the same order in the sequence: H, C. The separases are caspase-like proteases, which play a central role in chromosome segregation. In yeast they cleave the rad21 subunit of the cohesin complex at the onset of anaphase. During most of the cell cycle, separase is inactivated by the securin/cut2 protein, which probably covers its active site.
This group of peptidases belongs to MEROPS peptidase family C69 (dipeptidase, clan PB). They are mainly dipeptidases [
] and include dipeptidase A from Lactobacillus helveticus (MEROPS identifier C69.001). Comparative sequence and structural analysis, particularly to penicillin V acylase (MEROPS peptidase family C59), revealed a cysteine as the catalytic nucleophile as well as other conserved residues important for catalysis [
]. In general, the C69 family is variable in sequence and exhibits great diversity in substrate specificity, including enzymes such as choloylglycine hydrolases, acid ceramidases, isopenicillin N acyltransferases, and a subgroup of eukaryotic proteins with unclear function.
Saccharomyces cerevisiae strains containing the erg8-1 mutation are temperature sensitive for growth
due to a defect in phosphomevalonate kinase, an enzyme of isoprene and ergosterol biosynthesis.
Subcloning and DNA sequencing have defined the functional ERG8 regulon as an 850bp upstream region and an adjacent 1,272bp open reading frame. The deduced ERG8 protein contains 424 residues and shows
no similarity to known proteins, except within a putative ATP-binding domain present in many kinases [
]. Enzymes that share the N-terminal Gly/Ser-rich putative ATP-binding region include galactokinase, homoserine kinase, mevalonate kinase and phosphomevalonate kinase. Homoserine kinase
is a homodimeric enzyme involved in threonine biosynthesis. Sequence comparison of the yeast enzyme with the corresponding proteins from bacterial sources reveals the presence of several highly conserved
regions, the pattern of occurrence of which suggests that the ancestral sequences might have been composed from separate (functional) domains. A block of similar residues, found towards the C terminus,
is also present in many other proteins involved in threonine (or serine) metabolism; this motif may therefore represent the binding site for the hydroxy-amino acids. Limited similarity was detected
between a motif conserved among the homoserine kinases and consensus sequences found in other mono- or dinucleotide-binding proteins [
].
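The ORF length and protein length quoted above for ERG8 can be cross-checked with simple codon arithmetic. This is a minimal sketch; the helper name `orf_codons` is hypothetical, and whether the quoted 1,272 bp includes the stop codon is not stated in the text, so the comparison assumes it does not.

```python
def orf_codons(orf_length_bp):
    """Number of complete codons in an open reading frame of the given length."""
    if orf_length_bp <= 0:
        raise ValueError("ORF length must be positive")
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_length_bp // 3


# Values quoted in the text for ERG8.
ERG8_ORF_BP = 1272
ERG8_RESIDUES = 424

erg8_codons = orf_codons(ERG8_ORF_BP)
# Assumption: the quoted ORF length excludes the stop codon, so each codon
# corresponds to one residue.
lengths_consistent = erg8_codons == ERG8_RESIDUES
```

If the quoted length did include the stop codon, the encoded protein would be one residue shorter than the codon count.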
The α-helical ferredoxin domain contains two Fe4-S4 clusters, typical of bacterial ferredoxin. Iron-sulphur proteins play an important role in electron transfer processes and in various enzymatic reactions. In eukaryotes, the mitochondria are the major site of Fe-S cluster biosynthesis in the cell, used for the assembly of mitochondrial and non-mitochondrial Fe-S proteins. The α-helical ferredoxin domain is present in several proteins involved in redox reactions, including the C-terminal of the respiratory proteins succinate dehydrogenase (SQR) in bacteria/mitochondria, and fumarate reductase (QFR) in bacteria. SQR is analogous to the mitochondrial respiratory complex II, and is involved in the electron transport pathway from succinate as a donor to the acceptor ubiquinone. SQR helps prevent the formation of reactive oxygen species and is used during aerobic respiration, whereas QFR does not and, consequently, is used to catalyse the final step of anaerobic respiration using the acceptor fumarate [
]. The α-helical ferredoxin domain is also present in the N-terminal region of the cytosolic protein dihydropyrimidine dehydrogenase (DPD), which catalyses the NADPH-dependent, rate-limiting step in pyrimidine degradation, converting pyrimidines to 5,6-dihydro compounds [
]. DPD catalysis involves electron transfer from NADPH to the substrate via the Fe4-S4 centre and FAD. In mammals, this pathway produces the neurotransmitter beta-alanine.
Analysis of the Brix (biogenesis of ribosomes in Xenopus) protein led to the identification of a region of 150-180 residues in length, called the Brix
domain, which is found in six protein families: one archaean family (I) including hypothetical proteins (one per genome); and five eukaryote families, each named according to a representative member and including close homologues of this prototype: (II) Peter Pan (D. melanogaster) and SSF1/2 (S. cerevisiae); (III) RPF1 (S. cerevisiae); (IV) IMP4 (S. cerevisiae); (V) Brix (X. laevis) and BRX1 (S. cerevisiae); and (VI) RPF2 (S. cerevisiae). Typically, a protein sequence belonging to the Brix domain superfamily contains a highly charged N-terminal segment (about 50 residues) followed by a single copy of the Brix domain and another highly charged C-terminal region (about 100 residues). The archaean sequences have two unique characteristics: (1) the charged regions are totally absent at the N terminus and are reduced to about 10 residues at the C terminus; and (2) the C-terminal part of the Brix domain itself is minimal. Two eukaryote groups have large insertions within the C-terminal region: about 70 residues in group III and about 120 in group II. Biological data for some proteins in this family suggest a role in ribosome biogenesis and rRNA binding [
,
,
,
].
Transcription initiation factor TFIID is a multimeric protein complex that plays a central role in mediating promoter responses to various activators and repressors. The complex includes TATA binding protein (TBP) and various TBP-associated factors (TAFs). TFIID is an RNA polymerase II-specific TATA-binding protein-associated factor (TAF) that is essential for viability. This group represents a transcription initiation factor TFIID subunit 1 (TAF1, also known as cell cycle gene 1 protein) [
,
,
]. TAF1 is the largest subunit and the core scaffold of the complex. It contains Ser/Thr kinase domains that can autophosphorylate or transphosphorylate other transcription factors, including TP53, GTF2A1 and GTF2F1 [,
], and has acetyltransferase activity towards histones H3 and H4 []. It is essential for progression of the G1 phase of the cell cycle []. This entry represents the histone acetyltransferase domain (HAT, formerly known as DUF3591) that is found centrally in the protein TAF1 from eukaryotes. This region is highly conserved from yeast to human. The crystal structure, determined by X-ray diffraction, shows a compact architecture consisting of a winged helix (WH) domain that folds on top of a triple barrel, and a C-terminal α-helical region. The WH domain has intrinsic DNA-binding activity [
,
,
,
].
The proteins listed below share a common architecture with a protein kinase homology domain (see
) followed by an ~135-residue globular kinase-extension nuclease (KEN) domain made of eight helices [
]: Mammalian 2-5A-dependent RNase or RNase L (EC 3.1.26.-), an interferon-induced enzyme implicated in both the molecular mechanisms of interferon action and the fundamental control of RNA stability. 2-5A-dependent RNase is a unique enzyme in that it requires 2-5A, unusual oligoadenylates with 2',5'-phosphodiester linkages. RNase L is catalytically active only after binding to an unusual activator molecule containing a 5'-phosphorylated 2', 5'-linked oligoadenylate (2-5A), in the N-terminal half. RNase L consists of three domains, namely the N-terminal ankyrin repeat domain (see
), the protein kinase homology domain, and the C-terminal KEN domain [
,
,
]. Eukaryotic Ire1/Ern1, an ancient transmembrane sensor of endoplasmic reticulum (ER) stress with dual protein kinase and ribonuclease activities. In response to ER stress, Ire1/Ern1 catalyzes the splicing of target mRNAs in a spliceosome-independent manner. Ire1/Ern1 is a type 1 transmembrane receptor consisting of an N-terminal ER luminal domain, a transmembrane segment and a cytoplasmic region. The cytoplasmic region encompasses a protein kinase domain followed by a C-terminal KEN domain [
,
]. The dimerisation of the kinase domain activates the ribonuclease function of the KEN domain [
].
BRCA2 participates in homologous recombination-mediated repair of double-strand DNA breaks [
,
]. It stimulates the displacement of Replication protein A (RPA), the most abundant eukaryotic ssDNA binding protein []. Mutations that map throughout the BRCA2 protein are associated with breast cancer susceptibility []. BRCA2 is a large nuclear protein and its most conserved region is the C-terminal BRCA2DBD. BRCA2DBD binds ssDNA in vitro, and is composed of five structural domains, three of which are OB folds (OB1, OB2, and OB3). BRCA2DBD OB2 and OB3 are arranged in tandem, and their mode of binding can be considered qualitatively similar to two OB folds of RPA1, DBD-A and DBD-B (the major DBDs of RPA) []. This entry represents OB1, which consists of a highly curved five-stranded β-sheet that closes on itself to form a β-barrel. OB1 has a shallow groove formed by one face of the curved sheet and is demarcated by two loops, one between beta 1 and beta 2 and another between beta 4 and beta 5, which allows for weak single strand DNA binding. The domain also binds the 70-amino acid DSS1 (deleted in split-hand/split foot syndrome) protein, which was originally identified as one of three genes that map to a 1.5-Mb locus deleted in an inherited developmental malformation syndrome [
].
This domain can be found in proteins that have been implicated in telomere maintenance in Saccharomyces cerevisiae [
] and in meiotic chromosome segregation in Schizosaccharomyces pombe []. It can also be found in Mte1 (Mph1-associated telomere maintenance protein 1), human zinc finger protein ZGRF1 (C4ORF21) and fission yeast Dbl2 []. Mte1 is a D-loop-binding protein that interacts with Mph1 and stimulates its helicase and fork regression activities while inhibiting the ability of Mph1 to dissociate recombination intermediates. Mph1 and Mte1 interdependently colocalise at DNA damage-induced foci and dysfunctional telomeres. Mte1 is thought to play a role in the regulation of crossover recombination, the response to replication stress, and telomere maintenance []. The fission yeast protein Dbl2 is needed for cellular resistance to the topoisomerase I poison camptothecin, forms DNA damage-induced foci, and is needed for the optimal recruitment of Fml1 to DNA damage, while the human ZGRF1 protein has been linked to DNA cross-link repair, and mutations in it have been found in a variety of human tumors []. ZGRF1 is a 5'-to-3' helicase that interacts with RAD51 and stimulates homologous recombination and thus promotes the repair of replication-blocking DNA lesions []. However, there is no evidence that this domain is involved in DNA damage resistance or nuclear focus formation [].
The caspase recruitment domain (CARD domain) is a homotypic protein interaction module composed of a bundle of six α-helices. CARD is related in sequence and structure to the death domain (DD) and the death effector domain (DED), which work in similar pathways and show similar interaction properties [
]. The CARD domain typically associates with other CARD-containing proteins, forming either dimers or trimers. CARD domains can be found in isolation, or in combination with other domains. Domains associated with CARD include: NACHT (in Nal1 and Bir1), NB-ARC (in Apaf-1), pyrin/dapin domains (in Nal1), leucine-rich repeats (in Nal1), WD repeats (in Apaf-1), Src homology domains, PDZ, RING, kinase and DD domains [
]. CARD-containing proteins are involved in apoptosis through their regulation of caspases that contain CARDs in their N-terminal pro-domains, including human caspases 1, 2, 9, 11 and 12 [
,
]. CARD-containing proteins are also involved in inflammation through their regulation of NF-kappaB []. The mechanisms by which CARDs activate caspases and NF-kappaB involve the assembly of multi-protein complexes, which can facilitate dimerisation or serve as scaffolds on which proteases and kinases are assembled and activated.
The PFU (for PLAA family ubiquitin binding) domain is a ubiquitin-binding domain with no homology to several known ubiquitin-binding domains (e.g., UIM, NZF, UBA, UEV, UBP, or CUE domains). The PFU domain appears to be unique to the PLAA family of proteins. A single member of this family of proteins exists in every eukaryotic species examined. Each of these homologues possesses an identical domain structure: an N-terminal domain containing seven WD40 repeats, a central PFU domain, and a C-terminal PUL domain, which directly binds to Cdc48, a member of the AAA-ATPase family of molecular chaperones [
]. In addition to ubiquitin, the PFU domain of DOA1 has been shown to bind to the SH3 domain []. Secondary structure predictions of the PFU domain suggest the presence of an extensive length of β-sheet, N-terminal to an α-helical region [
]. Some proteins known to contain a PFU domain include:
Saccharomyces cerevisiae DOA1 (UFD3, ZZZ4), involved in the ubiquitin conjugation pathway. DOA1 participates in the regulation of the ubiquitin conjugation pathway involving CDC48 by hindering multiubiquitination of substrates at the CDC48 chaperone.
Schizosaccharomyces pombe ubiquitin homeostasis protein Lub1, which acts as a negative regulator of vacuole-dependent ubiquitin degradation.
Mammalian phospholipase A-2-activating protein (PLA2P, PLAA), the homologue of DOA1. PLA2P plays an important role in the regulation of specific inflammatory disease processes.
The biotin operon of Escherichia coli contains 5 structural genes involved in the synthesis of biotin. Transcription of the operon is regulated via one of these proteins, the biotin ligase BirA. BirA is an asymmetric protein with 3 specific domains: an N-terminal DNA-binding domain, a central catalytic domain and a C-terminal domain of unknown function. The ligase reaction intermediate, biotinyl-5'-AMP, is the co-repressor that triggers DNA binding by BirA.
The α-helical N-terminal domain of the BirA protein has the helix-turn-helix structure of DNA-binding proteins with a central DNA recognition helix. BirA undergoes several conformational changes related to repressor function, and the N-terminal DNA-binding domain is connected to the rest of the molecule through a hinge that allows relocation of the domains during the reaction []. Biotin binding causes a large structural change thought to facilitate ATP binding. Two repressor molecules form the operator-repressor complex, with dimer formation occurring simultaneously with DNA binding. DNA binding may also cause a conformational change which allows this co-operative interaction. In the dimer structure, the β-sheets in the central domain of each monomer are arranged side-by-side to form a single, seamless β-sheet. The apparent orthologs among the eukaryotes are larger proteins that contain a domain with high sequence homology to BirA.
This entry represents the catalytic domain of Serine/threonine-protein kinase TOR (target of rapamycin), which was first identified by mutations in yeast that confer resistance to the growth inhibitory properties of rapamycin [
]. TOR proteins are structurally and functionally conserved in all eukaryotes examined. However, yeasts contain two Tor proteins (Tor1 and Tor2), while higher eukaryotes such as humans possess a single TOR protein []. They are central regulators of cellular metabolism, growth and survival in response to environmental signals [,
,
]. The catalytic domain is similar to that of phosphoinositide 3-kinase (PI3K) [,
]. In budding yeast, the Tor2 protein exists in two distinct multi-component complexes, TORC1 and TORC2. TORC1 regulates cell growth by regulating many growth-related processes and is rapamycin sensitive, while TORC2 regulates the cell cytoskeleton and is rapamycin insensitive. Budding yeast TORC1 consists of either Tor1 or Tor2 in complex with Kog1, Lst8 and Tco89, while TORC2 is composed of Avo1, Avo2, Tsc11, Lst8, Bit61, Slm1, Slm2 and Tor2 [
,
]. In both yeast and mammals, FKBP12-rapamycin binds to Tor (Tor1, Tor2, or mTOR) in TORC1, but not to Tor (Tor2 or mTOR) in TORC2. It has been suggested that the architecture of TORC2 or its unique composition might be responsible for the observed rapamycin resistance [].
Histones mediate DNA organisation and play a dominant role in regulating eukaryotic transcription. The histone-fold consists of a core of three helices, where the long middle helix is flanked at each end by shorter ones. The histone fold is a structural element that facilitates heterodimerisation [
,
,
]. Proteins displaying this structure include the nucleosome core histones, which form octamers composed of two copies of each of the four histones, H2A, H2B, H3 and H4; archaeal histone, which possesses only the core domain part of eukaryotic histone; and the TATA-box binding protein (TBP)-associated factors (TAF), where the histone fold is a common motif for mediating TAF-TAF interactions. TAF proteins include TAF(II)18 and TAF(II)28, which form a heterodimer, TAF(II)42 and TAF(II)62, which form a heterotetramer similar to (H3-H4)2, and the negative cofactor 2 (NC2) alpha and beta chains, which form a heterodimer. The TAF proteins are a component of transcription factor IID (TFIID), along with the TBP protein. TFIID forms part of the pre-initiation complex on core promoter elements required for RNA polymerase II-dependent transcription. The TAF subunits of TFIID mediate transcriptional activation of subsets of eukaryotic genes. The NC2 complex mediates the inhibition of TATA-dependent transcription through interactions with TBP.
Soluble N-ethylmaleimide-sensitive factor attachment protein receptor (SNARE) proteins are a family of membrane-associated proteins characterised by an α-helical coiled-coil domain called the SNARE motif [
]. These proteins are classified as v-SNAREs and t-SNAREs based on their localisation on vesicle or target membrane; another classification scheme defines R-SNAREs and Q-SNAREs, based on the conserved arginine or glutamine residue in the centre of the SNARE motif. SNAREs are localised to distinct membrane compartments of the secretory and endocytic trafficking pathways, and contribute to the specificity of intracellular membrane fusion processes. The t-SNARE domain consists of a 4-helical bundle with a coiled-coil twist. The SNARE motif contributes to the fusion of two membranes. SNARE motifs fall into four classes: homologues of syntaxin 1a (t-SNARE), VAMP-2 (v-SNARE), and the N- and C-terminal SNARE motifs of SNAP-25. It is thought that one member from each class interacts to form a SNARE complex. The SNARE motif represented in this entry is found in the N-terminal domains of certain syntaxin family members: syntaxin 1a, which is required for neurotransmitter release [
], syntaxin 6, which is found in endosomal transport vesicles [], yeast Sso1p [], and Vam3p, a yeast syntaxin essential for vacuolar fusion []. The SNARE motifs in these proteins share structural similarity, despite having a low level of sequence similarity.
This entry represents Spo11, a meiotic recombination protein found in eukaryotes, and subunit A of topoisomerase VI, a type IIB topoisomerase found predominantly in archaea [
,
,
,
]. These two types of proteins share structural homology. DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. They can be divided into two classes: type I enzymes (topoisomerases I, III and V) break single-strand DNA, and type II enzymes (topoisomerases II, IV and VI) break double-strand DNA [
]. Topoisomerase VI is a type IIB enzyme that assembles as a heterotetramer, consisting of two A subunits required for DNA cleavage and two B subunits required for ATP hydrolysis. The B subunit is structurally similar to the ATPase domain of type IIA topoisomerases, but the A subunit is distinct, and instead shares homology with the Spo11 protein. Spo11 is a meiosis-specific protein that is responsible for the initiation of recombination through the formation of DNA double-strand breaks by a type II DNA topoisomerase-like activity. Spo11 acts in conjunction with several other proteins, including Rec102 in yeast, to bring about meiotic recombination [
].
The PC-Esterase family [
] is comprised of Cas1p, the Homo sapiens C7orf58, Arabidopsis thaliana PMR5 and a group of plant freezing resistance/cold acclimatization proteins typified by Arabidopsis thaliana ESKIMO1 (also known as XOAT1) [,
], animal FAM55D proteins, and animal FAM113 proteins. The PC-Esterase family has features that are both similar to and different from the canonical GDSL/SGNH superfamily []. The members of this family are predicted to have acyl esterase activity and to modify cell-surface biopolymers such as glycans and glycoproteins [,
]. The Cas1p protein additionally has a Cas1_AcylT domain, with the opposing acyltransferase activity []. The C7orf58 family has an ATP-grasp domain fused to the PC-Esterase domain and is the first identified secreted tubulin-tyrosine ligase-like enzyme in eukaryotes []. The plant family that includes PMR5, XOAT1 and TBL3 has an N-terminal cysteine-rich potential sugar-binding domain followed by the PC-Esterase domain []. In Arabidopsis, XOAT1 catalyzes the 2-O-acetylation of xylan, followed by nonenzymatic acetyl migration to the O-3 position, resulting in products that are monoacetylated at both O-2 and O-3 positions [
,
]. Its role is essential for the production of functional xylem vessels [,
]. It functions as a negative regulator of cold acclimation, and mutations in the ESK1 gene provide strong freezing tolerance [].
Many bacterial transcription regulation proteins bind DNA through a
'helix-turn-helix' (HTH) motif. One major subfamily of these proteins [,
] is related to the arabinose operon regulatory protein AraC [
,
]. Except for celD [
], all of these proteins seem to be positive transcriptional factors. Although the sequences belonging to this family differ somewhat in length, in nearly every case the HTH motif is situated towards the C terminus, in the third quarter of most of the sequences. The minimal DNA-binding domain spans roughly 100 residues and comprises two HTH subdomains: the classical HTH domain and another HTH subdomain with similarity to the classical HTH domain but with an insertion of one residue in the turn region. The N-terminal and central regions of these proteins are presumed to interact with effector molecules and may be involved in dimerisation [
]. The known structure of MarA shows that the AraC domain is α-helical and that both HTH subdomains bind the major groove of the DNA. The two HTH subdomains are separated by only 27 angstroms, which causes the cognate DNA to bend. This entry represents the conserved region covering the first HTH domain and the putative second HTH domain within the AraC binding region.
This entry represents a group of Gram-negative bacterial proteins that form part of the type VI pathogenicity secretion system (T6SS), including TssF [
]. TssF is an essential baseplate component of the type VI secretion system. TssF is a homologue of phage tail proteins and is required for proper assembly of the Hcp tube (the T6SS inner tube) in bacteria []. The cryoEM structure of a TssK-TssF-TssG complex revealed that TssF comprises three domains: an α-helical N-terminal domain, a central domain made up of three β-barrels, and a C-terminal four-stranded mixed β-sheet packed against α-helices via a hydrophobic core [
]. TssF and TssG assembled into a heterotrimer with 2:1 stoichiometry [,
]. The type VI secretion system (T6SS) is a supra-molecular bacterial complex that resembles phage tails. It is a toxin delivery system that fires toxins into target cells upon contraction of its TssBC sheath [
]. Thirteen essential core proteins are conserved in all T6SSs: the membrane associated complex TssJ-TssL-TssM, the baseplate proteins TssE, TssF, TssG, and TssK, the bacteriophage-related puncturing complex composed of the tube (Hcp), the tip/puncturing device VgrG, and the contractile sheath structure (TssB and TssC). Finally, the starfish-shaped dodecameric protein, TssA, limits contractile sheath polymerization at its distal part when TagA captures TssA [].
Phosphotransferase system, HPr histidine phosphorylation site
Type:
PTM
Description:
The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) is a major carbohydrate transport system in bacteria. The PTS catalyses the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred to Enzyme I (EI) of the PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease complex (enzymes EII/EIII). HPr is a small cytoplasmic protein, which in some bacteria is found as a domain in a larger protein that includes an EIII(Fru) (IIA) domain and in some cases also an EI domain. HPr is a component of the PTS, a major carbohydrate transport system in bacteria [
,
]. There is a conserved histidine in the N terminus of HPr which serves as an acceptor for the phosphoryl group of EI. In the central part of HPr there is a conserved serine which, in Gram-positive bacteria only, is phosphorylated by an ATP-dependent protein kinase, a process which probably plays a regulatory role in sugar transport.
The sequences around both phosphorylation sites are well conserved and can be used as signature patterns for HPr proteins or domains. This signature identifies the histidine phosphorylation site.
Phosphotransferase system, HPr serine phosphorylation site
Type:
PTM
Description:
The phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS) is a major carbohydrate transport system in bacteria. The PTS catalyzes the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred to Enzyme I (EI) of the PTS, which in turn transfers it to a phosphoryl carrier protein (HPr). Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease complex (enzymes EII/EIII). HPr is a small cytoplasmic protein, which in some bacteria is found as a domain in a larger protein that includes an EIII(Fru) (IIA) domain and in some cases also an EI domain. HPr is a component of the PTS, a major carbohydrate transport system in bacteria [
,
]. There is a conserved histidine in the N terminus of HPr, which serves as an acceptor for the phosphoryl group of EI. In the central part of HPr there is a conserved serine which, in Gram-positive bacteria only, is phosphorylated by an ATP-dependent protein kinase, a process which probably plays a regulatory role in sugar transport. The sequences around both phosphorylation sites are well conserved and can be used as signature patterns for HPr proteins or domains. This signature identifies the serine phosphorylation site.
Several extracellular heparin-binding proteins involved in regulation of growth and differentiation belong to a new family of growth factors. These growth factors are highly related proteins of about 140 amino acids that contain 10 conserved cysteines probably involved in disulphide bonds, and include pleiotrophin [
] (also known as heparin-binding growth-associated molecule HB-GAM, heparin-binding growth factor 8 HBGF-8, heparin-binding neurotrophic factor HBNF and osteoblast specific protein OSF-1); midkine (MK) []; retinoic acid-induced heparin-binding protein (RIHB) []; and pleiotrophic factors alpha-1 and -2 and beta-1 and -2 from Xenopus laevis, the homologues of midkine and pleiotrophin respectively. Pleiotrophin is a heparin-binding protein that has neurotrophic activity and has mitogenic activity towards fibroblasts. It is highly expressed in brain and uterus tissues, but is also found in gut, muscle and skin. It is thought to possess an important brain-specific function. Midkine is a regulator of differentiation whose expression is regulated by retinoic acid, and, like pleiotrophin, is a heparin-binding growth/differentiation factor that acts on fibroblasts and nerve cells. Pleiotrophin is structurally divided into two domains, both consisting of three antiparallel β-strands, but the C-terminal domain has a long flexible hairpin loop where a heparin-binding consensus sequence is located [
]. This superfamily represents the C-terminal domain of pleiotrophin and midkine.
This SCP-like extracellular protein domain is found in cysteine-rich secretory proteins (CRISPs). Involvement of CRISPs in the response to pathogens, fertilization, and sperm maturation has been proposed [
,
,
]. One member, Tex31 from the venom duct of Conus textile, has been shown to possess proteolytic activity sensitive to serine protease inhibitors []. SCP has also been proposed to be a Ca2+-chelating serine protease. The Ca2+-chelating function would fit with various signaling processes that members of this family, such as the CRISPs, are involved in, and is supported by sequence and structural evidence of a conserved pocket containing two histidines and a glutamate. It also may explain how helothermine, a toxic peptide secreted by the beaded lizard, blocks Ca2+-transporting ryanodine receptors []. One member, DE or CRISP-1, has been shown to mediate gamete fusion by binding to the egg surface; a sequence motif in the SCP domain plays a role in that binding []. The SCP domain is also known as the CAP domain [
]. The wider family of SCP containing proteins includes plant pathogenesis-related protein 1 (PR-1), CRISPs, mammalian cysteine-rich secretory proteins, which combine SCP with a C-terminal cysteine rich domain, and allergen 5 from vespid venom. It has been proposed that SCP domains may function as endopeptidases.
Muscarinic acetylcholine receptors are members of the rhodopsin-like G protein-coupled receptor family. They play several important roles: they mediate many of the effects of acetylcholine in the central and peripheral nervous system and modulate a variety of physiological functions, such as airway, eye and intestinal smooth muscle contraction, heart rate and glandular secretions. The receptors have a widespread tissue distribution and are a major drug target in human disease. They may be effective therapeutic targets in Alzheimer's disease, schizophrenia, Parkinson's disease and chronic obstructive pulmonary disease [
,
]. There are five muscarinic acetylcholine receptor subtypes, designated M1-5 [
,
,
,
,
]. The family can be further divided into two broad groups based on their primary coupling to G-proteins. M2 and M4 receptors couple to the pertussis-toxin sensitive Gi proteins, whereas M1, M3 and M5 receptors couple to Gq proteins [,
], which activate phospholipase C. The different subtypes can also couple to a wide range of diverse signalling pathways, some of which are G protein-independent [,
,
]. All subtypes seem to serve as autoreceptors [
], and knockout mice reveal the important neuromodulatory role played by this receptor family [,
,
]. This entry represents the muscarinic acetylcholine receptor family.
Type III secretion system virulence factor YopR, core domain
Type:
Homologous_superfamily
Description:
This superfamily entry represents a type III secretion system regulator, the secreted virulence factor YopR (Yersinia outer protein R), characterised in Yersinia pestis. Yersinia employs a type III secretion system (T3SS) to secrete and translocate virulence factors into the cytoplasm of mammalian host cells. YopR is encoded by the YscH (Yersinia secretion H) gene. This Yop protein is unusual in that it is released to the extracellular environment rather than injected directly into the target cell, as most Yop proteins are [
,
]. The YopR core domain, predominantly found in the Gammaproteobacteria virulence factor YopR, is composed of five α-helices, four of which are arranged in an antiparallel bundle. Little is known about this domain, though it may contribute to the virulence of the protein YopR [
]. YopR controls the selective access of early substrates (YscF, YscI and YscP) to the type III secretion machines of yersiniae and other Gammaproteobacteria. YopR is a mobile regulatory component thought to function as a checkpoint, probing the completion of discrete intermediary stages in the assembly of the type III injection pathway. The location of secreted YopR (in the medium) directly controls the secretion of YscF, the polymerised needle protein, thereby affecting the assembly of type III machines [
].
A number of C2H2-zinc finger proteins contain a highly conserved N-terminal motif termed the SCAN (named after SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) domain. The SCAN domain has been shown to be able to mediate homo- and hetero-oligomerisation [
]. These proteins can either activate or repress transcription, although isolated recombinant SCAN domains do not significantly modulate transcription [,
]. In addition to these zinc finger transcription factors, an isolated SCAN domain without adjacent zinc finger motifs has been identified in some proteins [,
]. It has been noted that the SCAN domain resembles a domain-swapped version of the C-terminal domain of the HIV capsid protein []. The SCAN domain is enriched in hydrophobic and negatively charged residues with an L-X(6)-L motif at its core. This core is flanked by A, E, L, M, H and C residues that are frequently found in α-helices [
]. Predictions of the secondary structure of the domain suggest the presence of at least three α-helices that are separated from one another by short looped regions bounded by proline residues []. It has been shown to be a selective oligomerization domain that mediates homotypic and heterotypic interactions between SCAN box containing proteins [,
].
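The L-X(6)-L core motif described above can be searched for in a protein sequence with a simple pattern match. The following is a minimal sketch only: the function name `find_lx6l` and the example peptide are hypothetical, and a match to this short pattern is not by itself evidence of a SCAN domain.

```python
import re

# L, any six residues, L. The lookahead lets overlapping matches be reported.
LX6L = re.compile(r"(?=(L.{6}L))")


def find_lx6l(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 8-mer) for each L-X(6)-L occurrence."""
    return [(m.start(), m.group(1)) for m in LX6L.finditer(seq.upper())]


# Made-up peptide for illustration only (not a real SCAN sequence).
example = "MAELQRRHFRLFCYQEAL"
print(find_lx6l(example))
```

Because the pattern is short, it will also occur by chance in unrelated proteins, so hits would normally be interpreted together with the surrounding helical context.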
Temperature downshift produces a number of changes in cellular physiology including decreased membrane fluidity, reduced mRNA transcription and translation due to the stabilisation of secondary structures, inefficient folding of some proteins, and reduced enzyme activity [
]. In response to this, bacteria produce a set of proteins, known as the cold-shock proteins (CSPs), to counteract these harmful effects. This entry represents a family of small CSPs consisting of CspA, one of the first cold-shock proteins identified, and its homologues. Note that while some members of this family are induced during cold-shock, some are either constitutively expressed or induced by other stresses such as nutrient starvation. While information about the physiological functions of the CSPs is limited, their structural properties have been well studied [
,
,
,
]. These proteins have a five-stranded β-barrel structural fold and are part of the wider oligonucleotide/oligosaccharide-binding (OB) family. They preferentially bind pyrimidine-rich regions of single-stranded RNA and DNA with high affinity, but not double-stranded RNA or DNA. Thus it is postulated that CSPs may act as RNA chaperones by destabilising RNA secondary structures. Experimental evidence suggests that they bind mRNA and regulate ribosomal translation, the rate of mRNA degradation and the termination of transcription; these functions are important during normal growth as well as cold shock.
Paramyxovirinae P phosphoprotein C-terminal domain
Type:
Domain
Description:
The subfamily Paramyxovirinae of the family Paramyxoviridae now contains as main genera the rubulaviruses, avulaviruses, respiroviruses, henipaviruses and morbilliviruses. Protein P is structurally the best characterised component of the replicative complex of N, P and L proteins and consists of two functionally distinct moieties, an N-terminal PNT and a C-terminal PCT [
]. The P protein is an essential part of the viral RNA polymerase complex formed from the P and L proteins []. P protein plays a crucial role in the enzyme by positioning L onto the N/RNA template through an interaction with the C-terminal domain of N. Without P, L is not functional. The C-terminal part of P (PCT) is only functional as an oligomer and forms with L the polymerase complex. PNT is poorly conserved and unstructured in solution, while PCT contains the oligomerisation domain (PMD) that folds as a homotetrameric coiled coil containing the L binding region and a C-terminal partially folded domain, PX (residues 474 to 568), identified as the nucleocapsid binding site. Interestingly, PX is also expressed as an independent polypeptide in infected cells. PX has a C-subdomain (residues 516 to 568) that consists of three α-helices arranged in an antiparallel triple-helical bundle linked to an unfolded flexible N-subdomain (residues 474 to 515).
This entry represents Spo11, a meiotic recombination protein found in eukaryotes, and subunit A of topoisomerase VI, a type IIB topoisomerase found predominantly in archaea [
,
,
,
]. These two types of proteins share structural homology. DNA topoisomerases regulate the number of topological links between two DNA strands (i.e. change the number of superhelical turns) by catalysing transient single- or double-strand breaks, crossing the strands through one another, then resealing the breaks. They can be divided into two classes: type I enzymes (
, topoisomerases I, III and V) break single-strand DNA, and type II enzymes (
, topoisomerases II, IV and VI) break double-strand DNA [
]. Topoisomerase VI is a type IIB enzyme that assembles as a heterotetramer, consisting of two A subunits required for DNA cleavage and two B subunits required for ATP hydrolysis. The B subunit is structurally similar to the ATPase domain of type IIA topoisomerases, but the A subunit is distinct, and instead shares homology with the Spo11 protein. Spo11 is a meiosis-specific protein that is responsible for the initiation of recombination through the formation of DNA double-strand breaks by a type II DNA topoisomerase-like activity. Spo11 acts in conjunction with several other proteins, including Rec102 in yeast, to bring about meiotic recombination [
].
DNA glycosylases excise damaged or unconventional bases in DNA and initiate the base excision repair (BER) pathway to maintain genomic integrity. Of these, uracil-DNA glycosylase (UDG) excises uracil from DNA by cleavage of the N-glycosidic bond between uracil and deoxyribose sugar. Uracil residues in DNA arise as a result of deamination of cytosine or incorporation of dUMP by DNA polymerase. Generally, the latter event is kept to a minimum by the presence of dUTPase which helps maintain a low level of dUTP. UDGs are inhibited by free uracil and some of its derivatives. Another category of UDG inhibitors is represented by Bacillus subtilis phage PBS1 and PBS2 encoded inhibitor protein Ugi and phage T5 induced proteins. PBS2 Ugi is a heat stable, acidic, low molecular weight protein of 9.5kDa, which interacts with UDG in a 1:1 molar stoichiometry to form a complex that does not dissociate under physiological conditions.
This entry represents the structural domain of the uracil-DNA glycosylase inhibitor protein from Bacillus phage PBS2 [
,
,
]. The structure of the Escherichia coli uracil-DNA glycosylase in complex with the uracil-DNA glycosylase inhibitor (Ugi) protein from Bacillus phage PBS2 has been determined. Ugi folds into an α-β-α sandwich structure formed by a five-stranded antiparallel β-sheet and two helices [].
Helicobacter pylori (Campylobacter pylori) clinical isolates can be classified into two types according to their degree of pathogenicity. Type I strains are associated with a severe disease pathology, express functional VacA (vacuolating cytotoxin A) and contain an insertion of 40 kb of foreign DNA: the cag (cytotoxin-associated gene) pathogenicity island (cagPAI). Type II strains lack the 40 kb insert, cagPAI. The cagPAI may be divided into two regions, cag I and cag II, which contain approximately 16 and 15 genes, respectively.
The cagPAI encodes a type IV secretion system (T4SS), which delivers CagA into the cytosol of gastric epithelial cells through a rigid needle structure covered by Cag7 or CagY, a VirB10-homologous protein, and CagT, a VirB7-homologous protein, at the base [
]. The CagA protein is the virulence factor that induces morphological changes in host cells, which may be associated with the development of peptic ulcer and gastric carcinoma [
]. CagZ is a 23kDa protein consisting of a single compact L-shaped domain composed of seven α-helices that run antiparallel to each other. About 70% of the residues are in α-helical conformation and no β-sheet is present. CagZ is essential for the translocation of the pathogenic protein CagA into host cells [
].
The archease superfamily of proteins is represented in all three domains of life. Archease genes are generally located adjacent to genes encoding proteins involved in DNA or RNA processing and therefore have been predicted to be modulators or chaperones involved in DNA or RNA metabolism. Many of the roles of archeases remain to be established experimentally. The function of one of the archeases from the hyperthermophile Pyrococcus abyssi has been determined. The gene encoding the archease (PAB1946) is located in a bicistronic operon immediately upstream from a second open reading frame (PAB1947), which encodes a tRNA m5C methyltransferase. The methyltransferase catalyses m5C formation at several cytosines within tRNAs, with a preference for C49. The archease protects the tRNA (cytosine-5-)-methyltransferase PAB1947 against aggregation and increases its specificity. The archease exists in monomeric and oligomeric states, with only the oligomeric forms able to bind the methyltransferase [
]. The function of this family of archeases as chaperones is supported by structural analysis of
from Methanobacterium thermoautotrophicum, which shows homology to heat shock protein 33, which is a chaperone protein that inhibits the aggregation of partially denatured proteins [
]. This entry represents the archaeal form of archeases.
In eukaryotes, many essential secreted proteins and peptide hormones are excised from larger precursors by members of a class of calcium-dependent endoproteinases, the prohormone-proprotein convertases (PCs). The P (known as such because it is essential for proteolytic activity), or Homo B, domain of ~150 residues is a distinctive characteristic of members of the proprotein convertase family. The P/Homo B domain appears to be necessary both to fold and maintain the subtilisin-like active catalytic module and to regulate its specialized features of calcium and more acidic pH dependence [,
,
,
]. The core of the P/Homo B domain consists of a jelly roll-like fold with eight β-strands. Nonbonded interactions between the catalytic and P/Homo B domains provide additional structural stabilization of the catalytic domains, which alone appear to be thermodynamically unstable [,
]. Most of the PC family members contain the cognate integrin-binding RGD sequence within the middle of the P domain [
], which may be required for intracellular compartmentalization and maintenance of enzyme stability within the ER. The integrity of the RGD sequence of proprotein convertase PC1 is critical for its zymogen and C-terminal processing and for its cellular trafficking [
,
]. The carboxy-terminal tail provides uniqueness to each PC family member being the least conserved region of all convertases [].
NADPH oxidase is a multisubunit complex that assembles during phagocytosis to generate reactive oxygen species. NCF2 (p67-phox) and NCF1 (p47-phox) are required for activation of the phagocyte NADPH oxidase. The activators p67-phox and p47-phox form a ternary complex together with the adaptor protein p40-phox, which participates in activation of the phagocyte oxidase by regulating membrane recruitment of p67-phox and p47-phox [
,
]. A PB1 domain is present in p67-phox. p40-phox associates with p67-phox via binding of the p40-phox PC motif to the p67-phox PB1 domain [
]. The PB1 domain is a modular domain mediating specific protein-protein interactions which play a role in many critical cell processes. A canonical PB1-PB1 interaction, which involves heterodimerization of two PB1 domains, is required for the formation of macromolecular signaling complexes ensuring specificity and fidelity during cellular signaling. The interaction between two PB1 domains depends on the type of PB1. There are three types of PB1 domains: type I, which contains an OPCA motif (an acidic amino acid cluster); type II, which contains a basic cluster; and type I/II, which contains both an OPCA motif and a basic cluster. Interactions of PB1 domains with other protein domains have been described as noncanonical PB1-interactions [
]. The p67-phox proteins contain a type II PB1 domain.
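The three PB1 types described above can be expressed as a small classification rule. This is an illustrative sketch only: the names `PB1Type`, `classify_pb1` and `can_pair_canonically` are hypothetical, and the pairing rule (an OPCA motif on one domain contacting a basic cluster on the other) is an assumption added for illustration, since the text states only that the interaction depends on the PB1 type.

```python
from enum import Enum


class PB1Type(Enum):
    TYPE_I = "type I"        # OPCA motif only
    TYPE_II = "type II"      # basic cluster only
    TYPE_I_II = "type I/II"  # both OPCA motif and basic cluster


def classify_pb1(has_opca: bool, has_basic_cluster: bool) -> PB1Type:
    """Assign a PB1 type from the presence of the two sequence features."""
    if has_opca and has_basic_cluster:
        return PB1Type.TYPE_I_II
    if has_opca:
        return PB1Type.TYPE_I
    if has_basic_cluster:
        return PB1Type.TYPE_II
    raise ValueError("neither an OPCA motif nor a basic cluster is present")


def can_pair_canonically(a: PB1Type, b: PB1Type) -> bool:
    """Assumed rule: one partner supplies an OPCA motif, the other a basic cluster."""
    has_opca = {PB1Type.TYPE_I, PB1Type.TYPE_I_II}
    has_basic = {PB1Type.TYPE_II, PB1Type.TYPE_I_II}
    return (a in has_opca and b in has_basic) or (b in has_opca and a in has_basic)


# p67-phox carries a type II PB1 domain according to the text above.
p67 = classify_pb1(has_opca=False, has_basic_cluster=True)
print(p67.value, can_pair_canonically(PB1Type.TYPE_I, p67))
```

Under this assumed rule, two type II domains would not pair canonically, whereas a type I/II domain could pair with any of the three types.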
This entry represents the RNA recognition motif (RRM) domain of two structurally related heterogeneous nuclear ribonucleoproteins, CIRBP (also known as CIRP) and RBM3, both of which belong to a highly conserved cold-shock protein family, characterised by an N-terminal RNA-binding domain and a C-terminal arginine-glycine-rich (RGG) domain [
]. CIRBP and RBM3 are key factors during early development. They can be induced after exposure to a moderate cold-shock and other cellular stresses such as UV radiation and hypoxia. Despite the similarities between the CIRBP and RBM3 proteins, their biological functions are distinct []. CIRBP is involved in diverse cellular physiological processes, such as cell growth, senescence, and apoptosis. CIRBP has the capacity to bind RNAs, and to modulate them at the post-transcriptional level. For instance, upon UV irradiation, CIRBP binds to the 3'-UTR of two stress-responsive transcripts, replication protein A (RPA) and thioredoxin (TRX), thereby stabilizing the bound mRNAs and promoting their translation [
,
]. RBM3 has anti-apoptotic, proliferation-enhancing and proto-oncogenic functions [
]. It can bind to and alter the translation of mRNA []. It associates with the spliceosome and is involved in splicing. It can modulate the translational process and enhances global protein translation. It is also involved in the regulation of miRNA expression [].
This entry represents the RNA recognition motif (RRM) of UHMK1. UHMK1, also termed kinase interacting with stathmin (KIS) or P-CIP2, is a serine/threonine protein kinase functionally related to RNA metabolism and neurite outgrowth. It contains an N-terminal kinase domain and a C-terminal RNA recognition motif (RRM), with high homology to the corresponding motif of the mammalian U2 small nuclear ribonucleoprotein auxiliary factor U2AF 65kDa subunit (U2AF65 or U2AF2) [
]. UHMK1 targets two key regulators of cell proliferation and migration, the cyclin-dependent kinase (CDK) inhibitor p27Kip1 and the microtubule-destabilizing protein stathmin [
]. It plays a critical role during vascular wound repair by preventing excessive vascular smooth muscle cell (VSMC) migration into the vascular lesion []. Moreover, UHMK1 may control cell migration and neurite outgrowth by interacting with and phosphorylating the splicing factor SF1, thereby probably contributing to the control of protein expression []. Furthermore, UHMK1 may be functionally related to microtubule dynamics and axon development. It localizes to RNA granules, interacts with three proteins found in RNA granules (KIF3A, NonO, and eEF1A), and further enhances the local translation []. UHMK1 is highly expressed in regions of the brain implicated in schizophrenia and may play a role in susceptibility to schizophrenia [,
].
This entry represents the RNA recognition motif 3 (RRM3) of TIA-1 and nucleolysin TIAR (TIA-1-related protein), proteins that share high sequence similarity. TIA-1 is the 40kDa isoform of T-cell-restricted intracellular antigen-1 (TIA-1, also known as Cytotoxic granule associated RNA binding protein TIA1), a cytotoxic granule-associated RNA-binding protein mainly found in the granules of cytotoxic lymphocytes [
,
]. TIA-1 regulates alternative pre-mRNA splicing by promoting the use of suboptimal 5' splice sites followed by uridine-rich intronic enhancer sequences. It can be phosphorylated by a serine/threonine kinase that is activated during Fas-mediated apoptosis, and functions as the granule component responsible for inducing apoptosis in cytolytic lymphocyte (CTL) targets []. TIAR is a cytotoxic granule-associated RNA-binding protein that shows high sequence similarity with TIA-1 [
]. TIAR plays a critical role in transcriptional and posttranscriptional regulation of gene expression []. It binds to target mRNA and DNA via its RNA recognition motif (RRM) domains and is involved in both splicing regulation and translational repression via the formation of stress granules []. TIAR is composed of three N-terminal highly homologous RNA recognition motifs (RRMs) and a glutamine-rich C-terminal auxiliary domain containing a lysosome-targeting motif. Its RRMs are involved in binding to U-rich RNA, but with unequal contributions. RRM2 of TIAR is the major RNA- and DNA-binding domain [].
This entry represents the RNA recognition motif 3 (RRM3) of heterogeneous nuclear ribonucleoprotein H3 (hnRNP H3). hnRNP H3 (also termed hnRNP 2H9) is a nuclear RNA binding protein that belongs to the hnRNP H protein family that also includes hnRNP H, hnRNP H2, and hnRNP F. This family is involved in mRNA processing and exhibits extensive sequence homology. Little is known about the functions of hnRNP H3 except for its role in the splicing arrest induced by heat shock [
,
]. The typical hnRNP H proteins contain three RNA recognition motifs (RRMs), except for hnRNP H3, in which the RRM1 is absent. RRM1 and RRM2 are responsible for the binding to the RNA at DGGGD motifs, and they play an important role in efficiently silencing the exon. Members in this family can regulate the alternative splicing of the fibroblast growth factor receptor 2 (FGFR2) transcripts, and function as silencers of FGFR2 exon IIIc through an interaction with the exonic GGG motifs. The lack of RRM1 could account for the reduced silencing activity within hnRNP H3. In addition, like other hnRNP H protein family members, hnRNP H3 has an extensive glycine-rich region near the C terminus, which may allow it to homo- or heterodimerize [
].
This entry represents the RNA recognition motif 2 (RRM2) of heterogeneous nuclear ribonucleoprotein H3 (hnRNP H3). hnRNP H3 (also termed hnRNP 2H9) is a nuclear RNA binding protein that belongs to the hnRNP H protein family that also includes hnRNP H, hnRNP H2, and hnRNP F. This family is involved in mRNA processing and exhibits extensive sequence homology. Little is known about the functions of hnRNP H3 except for its role in the splicing arrest induced by heat shock [
,
]. The typical hnRNP H proteins contain three RNA recognition motifs (RRMs), except for hnRNP H3, in which the RRM1 is absent. RRM1 and RRM2 are responsible for the binding to the RNA at DGGGD motifs, and they play an important role in efficiently silencing the exon. Members in this family can regulate the alternative splicing of the fibroblast growth factor receptor 2 (FGFR2) transcripts, and function as silencers of FGFR2 exon IIIc through an interaction with the exonic GGG motifs. The lack of RRM1 could account for the reduced silencing activity within hnRNP H3. In addition, like other hnRNP H protein family members, hnRNP H3 has an extensive glycine-rich region near the C terminus, which may allow it to homo- or heterodimerize [].
This entry represents the RNA recognition motif 2 (RRM2) of polypyrimidine tract-binding protein 1 (PTBP1) and related proteins. Proteins containing this domain include PTBP1/2/3 (or PTB1/2/3) and heterogeneous nuclear ribonucleoprotein L-like (HNRNPLL) proteins. PTB can shuttle between nucleus and cytoplasm. It is a splicing repressor factor implicated in the control of alternative exon selection during mRNA processing of many different transcribed genes, including its own pre-mRNA. It is also involved in internal ribosome entry site (IRES)-mediated translation initiation. It may also be involved in the 3'-end processing, localization, and stability of several mRNAs. Two PTB homologues, PTBP2 and PTBP3, are generally expressed in mammalian tissues. PTBP2 is expressed at high levels in adult brain, muscle and testis, while PTBP3 is expressed preferentially in haematopoietic cells [
]. PTB and PTBP2 bind to the same RNA sequences and have similar effects on alternative splicing events. However, differential expression of PTB and PTBP2 can lead to the generation of alternatively spliced mRNAs []. During neuronal differentiation, the microRNA miR-124 downregulates PTBP1 expression, which in turn leads to upregulation of PTBP2. Later in development, the expression of PTBP2 decreases and this leads to a second wave of alternative splicing changes characteristic of adult brain and essential for brain development []. PTBP3 may be involved in nonsense-mediated mRNA decay [].
Transcription factors are protein molecules that bind to specific DNA sequences in the genome, resulting in the induction or inhibition of gene transcription [
]. The ets oncogene is such a factor, possessing a region of 85-90 amino acids known as the ETS (erythroblast transformation specific) domain [
,
,
]. This domain is rich in positively-charged and aromatic residues, and binds to purine-rich segments of DNA. The ETS domain has been identified in other transcription factors such as PU.1, human erg, human elf-1, human elk-1, GA binding protein, and a number of others [,
,
]. It is generally localized at the C terminus of the protein, with the exception of ELF-1, ELK-1, ELK-3, ELK-4 and ERF, where it is found at the N terminus.
NMR analysis of the structure of the Ets domain revealed that it contains three α-helices (α1-α3) and a four-stranded β-sheet (β1-β4) arranged in the order α1-β1-β2-α2-α3-β3-β4, forming a winged helix-turn-helix (wHTH) topology [
]. The third α-helix is responsible for contact with the major groove of the DNA. Different members of the Ets family proteins display distinct DNA binding specificities. The Ets domains and the flanking amino acid sequences of the proteins influence the binding affinity, and the alteration of a single amino acid in the Ets domain can change its DNA binding specificity.
Glycoprotein hormones [
,
] (or gonadotropins) are a family of proteins that includes the mammalian hormones follitropin (FSH), lutropin (LH), thyrotropin (TSH), the placental chorionic gonadotropins hCG and eCG [
] and chorionic gonadotropin (CG), as well as at least two forms of fish gonadotropins. These hormones are central to the complex endocrine system that regulates normal growth, sexual development, and reproductive function []. The hormones LH, FSH and TSH are secreted by the anterior pituitary gland, while hCG and eCG are secreted by the placenta []. All these hormones consist of two glycosylated chains (alpha and beta). The alpha subunit is common to each protein dimer (well conserved within species, but differing between them [
]), whereas the beta subunit is unique and confers biological specificity [
]. The alpha chains are highly conserved proteins of about 100 amino acid residues which contain ten conserved cysteines all involved in disulphide bonds [
], as shown in the following schematic representation.

                +---------------------------+
    +----------+|             +-------------|--+
    |          ||             |             |  |
xxxxCxCxxxxxxCxCCxxxxxxxxxxxxxCCxxxxxxxxxxCxCxxCx
      |      |                 |          |
      +------|-----------------+          |
             |                            |
             +----------------------------+

'C': conserved cysteine involved in a disulphide bond.

Intracellular levels of free alpha subunits are greater than those of the mature glycoprotein, implying that hormone assembly is limited by the appearance of the specific beta subunits, and hence that synthesis of alpha and beta is independently regulated [
].
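The schematic of the alpha chain can also be handled programmatically. The sketch below locates the conserved cysteines in the pattern string and checks a list of disulphide pairs against them. The pattern is copied from the schematic; the pair list (by cysteine order: 1-4, 2-7, 3-8, 5-9, 6-10) is one reading of the bracket lines and should be treated as an assumption, and the names `PATTERN`, `ASSUMED_PAIRS` and `check_pairs` are hypothetical.

```python
# Consensus pattern from the schematic: 'C' marks a conserved cysteine.
PATTERN = "xxxxCxCxxxxxxCxCCxxxxxxxxxxxxxCCxxxxxxxxxxCxCxxCx"

# Disulphide pairs by cysteine order, as read from the schematic (assumption).
ASSUMED_PAIRS = [(1, 4), (2, 7), (3, 8), (5, 9), (6, 10)]


def cysteine_columns(pattern: str) -> list[int]:
    """0-based columns of every conserved cysteine in the pattern."""
    return [i for i, ch in enumerate(pattern) if ch == "C"]


def check_pairs(pairs: list[tuple[int, int]], n_cys: int) -> bool:
    """True if every cysteine 1..n_cys is used in exactly one pair."""
    used = sorted(c for pair in pairs for c in pair)
    return used == list(range(1, n_cys + 1))


cols = cysteine_columns(PATTERN)
print(len(cols), "conserved cysteines")
print("pairing is complete:", check_pairs(ASSUMED_PAIRS, len(cols)))
```

The check confirms only that the ten cysteines are each paired once, consistent with the statement that all of them are involved in disulphide bonds; it does not validate which cysteines are actually bonded.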
In Gram-negative bacteria, growth on methanol is dependent on the soluble, periplasmic quinoprotein methanol dehydrogenase, which oxidises methanol to formaldehyde. The electrons generated by this reaction are transferred from the reduced enzyme to the unusual cytochrome cL, which is subsequently oxidised itself by cytochrome c2 (also known as cytochrome cH), which then transfers the electrons to a membrane-bound cytochrome oxidase [
]. This entry represents cytochrome cL (also known as cytochrome C551i in some species), whose amino acid sequence is distinct from that of other c-type cytochromes and does not fit into any established amino acid sequence class. Despite its lack of homology to other proteins, many of its properties, e.g. the low-spin haem prosthetic group, are similar to those of class I cytochrome c proteins. Other properties, such as its large size and acidic nature, are unique to cytochrome cL. The core of this protein has a structure typical of class I cytochrome c proteins, consisting of compact α-helices enclosing the haem c prosthetic group with one edge of the haem exposed [
]. Unusually, there is a tightly bound calcium close to the haem group which is thought to help stabilise redox potential and may be involved in the transfer of electrons from methanol dehydrogenase to the haem group.
The transforming growth factor beta family of cytokines are potent and multifunctional signalling molecules. Prior to ligand receptor binding there exist extracellular regulators that target these cytokines and facilitate the formation of morphogen gradients that control developmental processes. Some of these proteins that are known to sequester latent TGF-beta contain a conserved domain, the TGF-beta binding (TB) domain. The domain is characterised by 8 conserved cysteine residues, which include an unusual cysteine triplet [
,
]. The TB fold is globular with six β-strands and two α-helices [,
,
]. The pairing of the eight cysteines is 1-3, 2-6, 4-7, and 5-8, creating a fairly rigid structure. In follistatin and in the first repeat of fibrillin and LTBPs the last disulphide bridge is absent.
Proteins containing a TB domain include:
Vertebrate fibrillin-1, 2 and 3. Fibrillins form tissue-specific and temporally regulated microfibril networks. They are implicated in the regulation of TGF-beta signalling.
Vertebrate latent TGF-beta binding proteins (LTBPs) 1, 2, 3 and 4. LTBPs regulate TGF-beta signalling by forming a latent complex with the cleaved TGF-beta proproteins.
Vertebrate follistatin. It is an extracellular antagonist of various TGF-beta proteins. The TB domain of follistatin mimics a type I receptor of TGF-beta and binds TGF-beta, which leads to the formation of a receptor-ligand-antagonist complex [
].
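The 1-3, 2-6, 4-7, 5-8 cysteine pairing given above can be mapped onto residue positions in a sequence. This is a minimal sketch: the function name `tb_disulphides` and the toy sequence are hypothetical, and the code assumes that exactly eight cysteines are present, so it does not cover domains in which the last bridge is absent.

```python
# Disulphide pairing of the TB domain by cysteine order, as stated in the text.
TB_PAIRS = [(1, 3), (2, 6), (4, 7), (5, 8)]


def tb_disulphides(seq: str) -> list[tuple[int, int]]:
    """Map the TB pairing onto 1-based residue positions of the cysteines."""
    cys = [i + 1 for i, aa in enumerate(seq.upper()) if aa == "C"]
    if len(cys) != 8:
        raise ValueError(f"expected 8 cysteines, found {len(cys)}")
    return [(cys[a - 1], cys[b - 1]) for a, b in TB_PAIRS]


# Toy sequence with eight cysteines including a CCC triplet (not a real TB domain).
toy = "GCAGCAACCCGACGGCAC"
print(tb_disulphides(toy))
```

In the toy sequence the cysteine triplet supplies the third, fourth and fifth cysteines, so each residue of the triplet bonds to a different partner.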
This entry represents the SH2 domain of EAT-2 (EWS-Fli1-activated transcript-2) and ERT (EAT-2-related transducer). The X-linked lymphoproliferative syndrome (XLP) gene encodes SAP (also called SH2D1A/DSHP), a protein that consists of a 5 residue N terminus, a single SH2 domain, and a short 25 residue C-terminal tail [
]. XLP is characterized by an extreme sensitivity to Epstein-Barr virus. Both T and natural killer (NK) cell dysfunctions have been seen in XLP patients. SAP binds the cytoplasmic tail of Signaling lymphocytic activation molecule (SLAM), 2B4, Ly-9, and CD84 []. SAP is believed to function as a signaling inhibitor, by blocking or regulating binding of other signaling proteins. SAP and the SAP-like protein EAT-2 recognize the sequence motif TIpYXX[VI], which is found in the cytoplasmic domains of a restricted number of T, B, and NK cell surface receptors and are proposed to be natural inhibitors or regulators of the physiological role of a small family of receptors on the surface of these cells [,
]. EAT-2 (also known as SH2 domain-containing protein 1B or SH2D1B) and ERT (also known as SH2 domain-containing protein 1C or SH2D1C), similar to SAP, are composed of a single SH2 domain and a short C terminus. They share approximately 50% amino acid identity with SAP [
].
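The TIpYXX[VI] recognition motif mentioned above can be located in a receptor cytoplasmic tail with a simple pattern search. This is a sketch under a stated assumption: a plain amino acid sequence cannot encode phosphorylation, so the phosphotyrosine (pY) is matched as an ordinary Y and any hit is only a candidate site. The function name `find_sap_motifs` and the example peptide are hypothetical.

```python
import re

# T-I-Y-X-X-[V/I]; the tyrosine would need to be phosphorylated in vivo.
SAP_MOTIF = re.compile(r"TIY..[VI]")


def find_sap_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched hexamer) for each candidate motif."""
    return [(m.start(), m.group(0)) for m in SAP_MOTIF.finditer(seq.upper())]


# Made-up peptide for illustration only.
print(find_sap_motifs("AATIYAQVGG"))
```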
Threonine peptidases are characterised by a threonine nucleophile at the N terminus of the mature enzyme. The threonine peptidases belong to clan PB or are unassigned, clan T-. The type example for this clan is the archaean proteasome beta component of Thermoplasma acidophilum. This entry represents Proteasome subunit beta from Streptomyces coelicolor and similar proteins found predominantly in Actinobacteria [
]. PSB is a component of the proteasome core, a large protease complex with broad specificity involved in protein degradation. In Mycobacterium tuberculosis, the proteasome is able to cleave oligopeptides not only after hydrophobic but also after basic, acidic and small neutral residues []. In complex with the ATPase Mpa, it degrades protein targets conjugated to a prokaryotic ubiquitin-like protein (Pup) []. It is essential for persistence of the pathogen during the chronic phase of infection in the host [,
] and may also function to recycle amino acids under nutrient starvation, thereby enabling the cell to maintain basal metabolic activities []. It has been related to the resistance of this bacterium to antimicrobial antifolates via regulating both folate reduction and cytokinin production []. Members of this protein family are threonine peptidases belonging to MEROPS peptidase family T1 (clan PB(T)), subfamily T1A.
Rubella virus (RUBV) is the etiological agent of a disease known as rubella or German measles. RUBV, a positive sense, single-strand RNA virus, is the sole member of the Rubivirus genus in the Togaviridae family of animal viruses. The RUBV genome contains two open reading frames (ORFs) encoding five proteins after post-translational proteolytic processing; two proteins (P150 and P90) involved in viral replication are translated from the nonstructural protein ORF (NSP-ORF) and the three structural proteins that form the virus particle are translated from the structural protein ORF (ST-ORF). A cysteine protease domain situated at the C terminus of P150 and immediately upstream from the cleavage site within the NSP-ORF precursor (P200) is responsible for self-cleavage of P200 into P150 and P90. This protease domain, termed the NS protease domain, is a papain-like protease with a catalytic dyad of Cys and His. The NS protease domain contains an EF-hand Ca2+ binding motif, which plays a structural role in stabilising the protease at physiological temperatures, as well as a structural Zn2+ binding site coordinated by four cysteines, which is necessary for protease activity. The RUBV NS protease domain forms the MEROPS peptidase family C27 of clan CA [
,
,
,
].
This group of cysteine peptidases corresponds to MEROPS peptidase family C30 (clan PA(C)). These peptidases are related to serine endopeptidases of family S1 and are restricted to coronaviruses, where they are involved in viral polyprotein processing during replication [
,
,
]. This coronavirus (CoV) domain, peptidase C30, is also known as the 3C-like proteinase (3CL-pro) or CoV main protease (M-pro) domain, and it is highly conserved among coronaviruses. CoV M-pro is a dimer where each subunit is composed of three domains I, II and III. Domains I and II consist of six-stranded antiparallel β-barrels [
] and together resemble the architecture of chymotrypsin and of picornavirus 3C proteinases. The substrate-binding site is located in a cleft between these two domains. The catalytic site is situated at the centre of the cleft. A long loop connects domain II to the C-terminal domain (domain III). This latter domain, a globular cluster of five helices, has been implicated in the proteolytic activity of M-pro. In the active site of M-pro, Cys and His form a catalytic dyad. In contrast to serine proteinases and other cysteine proteinases, which have a catalytic triad, there is no third catalytic residue present [,
,
,
]. Many drugs have been developed to inhibit CoV M-pro [,
].
NF-kappaB is a pleiotropic transcription factor present in almost all cell types. It is the endpoint of a series of signal transduction events that are initiated by a vast array of stimuli related to many biological processes such as inflammation, immunity, differentiation, cell growth, tumorigenesis and apoptosis. NF-kappaB is a homo- or heterodimeric complex formed by the Rel-like domain-containing proteins RelA/p65, RelB, NFKB1/p50, c-Rel and NFKB2/p52 [
]. Each individual NF-kappaB subunit, and perhaps each dimer, carries out unique functions in regulating transcription. Dimer-specific functions can be conferred by selective protein-protein interactions with other transcription factors, coregulatory proteins, and chromatin proteins []. RelB is a component of the NF-kappa-B RelB-p50 complex and the RelB-p52 complex [
]. It is best known for its roles in lymphoid development and noncanonical signalling [], but it seems to be linked to many distinct processes. Among others, RelB may have a role in suppressing inflammation [] and has been shown to negatively regulate osteoblast differentiation and bone formation []. This entry represents the N-terminal sub-domain of the Rel homology domain (RHD) of RelB, a potent transactivator in a heterodimer with NF-kappa B1 (p50) or B2 (p52). It is involved in the regulation of genes that play roles in inflammatory processes and the immune response [
,
].
Proteins in this entry contain glycosyl transferase family 2 domains which are responsible, generally, for the transfer of nucleotide-diphosphate sugars to substrates such as polysaccharides and lipids. The Acidithiobacillus ferrooxidans ATCC 23270 protein (AFE_0974) is encoded in the same locus as the genes for squalene-hopene cyclase (SHC,
) and other proteins associated with the biosynthesis of hopanoid natural products. Similarly, in Ralstonia eutropha (strain JMP134) (Alcaligenes eutrophus) this protein (Reut_B4902) is encoded adjacent to the genes for HpnAB, IspH and HpnH (
), although SHC itself is encoded elsewhere in the genome. Notably, this protein (here named HpnI) and three others form a conserved set (HpnIJKL) which occurs in a subset of all genomes containing the SHC enzyme. This relationship was discerned using the method of partial phylogenetic profiling [
]. This group includes Zymomonas mobilis, the organism where the initial hopanoid biosynthesis locus was described consisting of the genes HpnA-E and SHC (HpnF) []. Continuing past SHC are found genes for a phosphorylase enzyme (ZMO0873, i.e. HpnG, ) and another radical SAM enzyme (ZMO0874), HpnH. Although discontinuous in Z. mobilis, we continue the gene symbol sequence with HpnIJKL. Hopanoids are known to feature polar glycosyl head groups in many organisms.
The transforming growth factor beta family of cytokines are potent and multifunctional signalling molecules. Prior to ligand-receptor binding, extracellular regulators target these cytokines and facilitate the formation of morphogen gradients that control developmental processes. Some of the proteins known to sequester latent TGF-beta contain a conserved domain, the TGF-beta binding (TB) domain. The domain is characterised by 8 conserved cysteine residues, which include an unusual cysteine triplet [
,
]. The TB fold is globular with six β-strands and two α-helices [,
,
]. The pairing of the eight cysteines is 1-3, 2-6, 4-7, and 5-8, creating a fairly rigid structure. In follistatin and in the first repeat of fibrillin and LTBPs the last disulphide bridge is absent.
Proteins containing a TB domain include:
Vertebrate fibrillin-1, 2 and 3. Fibrillins form tissue-specific and temporally regulated microfibril networks. They are implicated in the regulation of TGF-beta signalling.
Vertebrate latent TGF-beta binding proteins (LTBPs) 1, 2, 3 and 4. LTBPs regulate TGF-beta signalling by forming a latent complex with the cleaved TGF-beta proproteins.
Vertebrate follistatin. It is an extracellular antagonist of various TGF-beta proteins. The TB domain of follistatin mimics a type I receptor of TGF-beta and binds TGF-beta, which leads to the formation of a receptor-ligand-antagonist complex [
].
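The cysteine pairing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tb_disulphides`, the constant `TB_PAIRING` and the toy sequence are hypothetical, and the input is assumed to contain exactly eight cysteines, which would not hold for repeats in which the last bridge is absent.

```python
# Pairing of the eight conserved TB-domain cysteines as given in the text:
# 1-3, 2-6, 4-7 and 5-8 (1-based ordinal of each cysteine in the domain).
TB_PAIRING = [(1, 3), (2, 6), (4, 7), (5, 8)]


def tb_disulphides(sequence):
    """Return 0-based sequence positions of the paired cysteines.

    Assumption (hypothetical, simplified input): `sequence` covers one TB
    domain and contains exactly eight cysteines.
    """
    cys = [i for i, aa in enumerate(sequence.upper()) if aa == "C"]
    if len(cys) != 8:
        raise ValueError(f"expected 8 cysteines, found {len(cys)}")
    # Translate cysteine ordinals (1..8) into positions in the sequence.
    return [(cys[a - 1], cys[b - 1]) for a, b in TB_PAIRING]


# Toy sequence, not a real TB domain; cysteines sit at positions
# 1, 4, 7, 8, 9, 12, 15 and 18 (7-9 imitate the cysteine triplet).
toy = "ACGGCAACCCGGCAACGGC"
pairs = tb_disulphides(toy)
```

For the toy sequence, `pairs` lists four position pairs, one per disulphide bridge.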
Helicobacter pylori (Campylobacter pylori) clinical isolates can be classified into two types according to their degree of pathogenicity. Type I strains are associated with a severe disease pathology, express functional VacA (vacuolating cytotoxin A) and contain an insertion of 40 kb of foreign DNA: the cag (cytotoxin-associated gene) pathogenicity island (cagPAI). Type II strains lack the 40 kb insert, cagPAI. The cagPAI may be divided into two regions, cag I and cag II, which contain approximately 16 and 15 genes, respectively.
The cagPAI encodes a type IV secretion system (T4SS), which delivers CagA into the cytosol of gastric epithelial cells through a rigid needle structure covered by Cag7 or CagY, a VirB10-homologous protein, and CagT, a VirB7-homologous protein, at the base []. The CagA protein is the virulence factor that induces morphological changes in host cells, which may be associated with the development of peptic ulcer and gastric carcinoma [
]. CagZ is a 23 kDa protein consisting of a single compact L-shaped domain composed of seven α-helices that run antiparallel to each other. About 70% of the residues are in α-helical conformation and no β-sheet is present. CagZ is essential for the translocation of the pathogenic protein CagA into host cells [
].
Peptidase S24 LexA-like proteins are involved in the SOS response leading to the repair of single-stranded DNA within the bacterial cell [
]. Proteins containing this domain include LexA, MucA and UmuD. LexA () is a diverse family of bacterial transcription factors that repress genes in the cellular SOS response to DNA damage [
,
]. The LexA-like proteins contain two domains: an N-terminal DNA-binding domain and a C-terminal domain (CTD) that provides LexA dimerization as well as cleavage activity [
]. They undergo autolysis, cleaving at an Ala-Gly or a Cys-Gly bond, separating the DNA-binding domain from the rest of the protein []. The LexA, UmuD and MucA proteins interact with RecA, which activates self-cleavage, either derepressing transcription in the case of LexA [
] or activating the lesion-bypass polymerase in the case of UmuD and MucA. UmuD'2 is the homodimeric component of DNA pol V, which is produced from UmuD by RecA-facilitated self-cleavage. The first 24 N-terminal residues of UmuD are removed; UmuD'2 is a DNA lesion bypass polymerase [,
]. MucA [,
], like UmuD, is a plasmid-encoded DNA polymerase (pol RI) which is converted into the active lesion-bypass polymerase by a self-cleavage reaction involving RecA [].
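As a rough illustration of the Ala-Gly / Cys-Gly cleavage bond mentioned above, the following Python sketch lists every position in a sequence where such a dipeptide occurs. The function name and toy sequence are hypothetical, and the scan is purely textual: a dipeptide match is only a candidate, since actual self-cleavage depends on the structural context of the protein.

```python
import re

# Ala-Gly or Cys-Gly dipeptide; the lookahead reports overlapping hits too.
CLEAVAGE_DIPEPTIDE = re.compile(r"(?=([AC]G))")


def candidate_cleavage_bonds(sequence):
    """Return the number of residues N-terminal to each candidate bond.

    A value of 3 means the bond lies between residue 3 and residue 4
    (1-based numbering). Hypothetical helper, not a predictor.
    """
    return [m.start() + 1 for m in CLEAVAGE_DIPEPTIDE.finditer(sequence.upper())]


# Toy sequence for illustration only, not a real LexA-like protein.
toy = "MKAGTTCGA"
bonds = candidate_cleavage_bonds(toy)
```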
A number of C2H2-zinc finger proteins contain a highly conserved N-terminal motif termed the SCAN (named after SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA) domain. The SCAN domain has been shown to be able to mediate homo- and hetero-oligomerisation [
]. These proteins can either activate or repress transcription, although isolated recombinant SCAN domains do not significantly modulate transcription [,
]. In addition to these zinc finger transcription factors, an isolated SCAN domain without adjacent zinc finger motifs has been identified in some proteins [,
]. It has been noted that the SCAN domain resembles a domain-swapped version of the C-terminal domain of the HIV capsid protein []. The SCAN domain is enriched in hydrophobic and negatively charged residues with an L-X(6)-L motif at its core. This core is flanked by A, E, L, M, H and C residues that are frequently found in α-helices [
]. Predictions of the secondary structure of the domain suggest the presence of at least three α-helices that are separated from one another by short looped regions bounded by proline residues []. It has been shown to be a selective oligomerization domain that mediates homotypic and heterotypic interactions between SCAN box containing proteins [,
].
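The L-X(6)-L core motif mentioned above (a leucine, any six residues, then a leucine) can be located with a regular expression. The sketch below is a minimal illustration; the function name and toy sequences are hypothetical, and a match to this short pattern alone is not evidence of a SCAN domain.

```python
import re

# L-X(6)-L: leucine, any six residues, leucine.
# The pattern sits inside a lookahead so that overlapping matches are found.
LX6L = re.compile(r"(?=(L.{6}L))")


def find_lx6l(sequence):
    """Return (0-based start, matched 8-residue segment) for every L-X(6)-L."""
    return [(m.start(), m.group(1)) for m in LX6L.finditer(sequence.upper())]


# Toy sequences for illustration only.
single = find_lx6l("AALAAAAAALAA")
overlapping = find_lx6l("LAAAAAALAAAAAAL")
```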
This entry represents the C-terminal domain of C1q. C1q is a subunit of the C1 enzyme complex that activates the serum complement system. C1q comprises 6 A, 6 B and 6 C chains. These share the same topology, each possessing a small, globular N-terminal domain, a collagen-like Gly/Pro-rich central region, and a conserved C-terminal region, the C1q domain [
]. The C1q protein is produced in collagen-producing cells and shows sequence and structural similarity to collagens VIII and X [,
]. This domain is also found in multimerin and EMILIN proteins. The C-terminal globular domain of the C1q subcomponents and collagen types VIII and X is important both for the correct folding and alignment of the triple helix and for protein-protein recognition events [
]. For collagen type X it has been suggested that the domain is important for initiation and maintenance of the correct assembly of the protein []. The globular head is a trimer of C1q domains. Each individual C1q domain adopts a 10-stranded jelly-roll fold arranged in two antiparallel 5-stranded β-sheets []. There are two well conserved regions within the C1q domain: an aromatic motif is located within the first half of the domain, and the other conserved region is located near the C-terminal extremity.
This entry represents the N-terminal SH3 domain found in the CRK family members [
]. CRK adaptor proteins consist of SH2 and SH3 domains, which bind tyrosine-phosphorylated peptides and proline-rich motifs, respectively. They function downstream of protein tyrosine kinases in many signaling pathways started by various extracellular signals, including growth and differentiation factors. Cellular CRK (c-CRK) contains a single SH2 domain, followed by N-terminal and C-terminal SH3 domains. It is involved in the regulation of many cellular processes including cell growth, motility, adhesion, and apoptosis. CRK has been implicated in the malignancy of various human cancers [
,
]. The N-terminal SH3 domain of CRK binds a number of target proteins including DOCK180, C3G, SOS, and cABL [
]. The CRK family includes two alternatively spliced protein forms, CRKI and CRKII, that are expressed by the CRK gene, and the CRK-like (CRKL) protein, which is expressed by a distinct gene (CRKL). CRKI lacks the regulatory phosphorylation site and C-terminal SH3 domain present in CRKII and CRKL [,
]. The N-terminal SH3 domain of CRKII has been shown to recognize proline-rich motifs (PRMs) found in cABL kinase; this interaction is involved in the regulation of cell spreading, microbial pathogenesis, and cancer metastasis [].
Gram-negative bacteria produce a number of proteins that are secreted into the growth medium by the type I secretion system, a mechanism that does not require a cleaved N-terminal signal sequence. These proteins, while having different functions, share two properties: they bind calcium and they contain multiple tandem repeats of a nonapeptide [
]. The nonapeptide is found in a group of bacterial exported proteins that includes haemolysin, cyclolysin, leukotoxin and metallopeptidases belonging to MEROPS peptidase family M10 (clan MA(M)), subfamily 10B (serralysin). It has been suggested that the internally repeated domain of haemolysin may be involved in Ca-mediated binding to erythrocytes. It has been shown that such a domain is involved in the binding of calcium ions in a parallel beta roll structure [
]. The Bordetella pertussis adenylate cyclase toxin tertiary structure has been solved. The C-terminal RTX repeats are Asp-rich and bind calcium ions as the protein moves from the low-calcium intracellular environment to a higher extracellular concentration of calcium ions. The C-terminal assembly containing the RTX repeats has been shown to form beta rolls, which prevent backsliding of the protein in the type I secretion system conduits and accelerate secretion of the large toxin [
].
This entry represents the caspase activation and recruitment domain (CARD) found in a group of proteins, including ASC (apoptosis-associated speck-like protein containing a CARD), caspase recruitment domain-containing protein 8 (CARD8) and NALP1 (CARD7, NLRP1). CARD domain containing proteins are known to be important in the signalling pathways for apoptosis, inflammation and host-defense mechanisms [
]. ASC is an adaptor molecule that mediates inflammatory and apoptotic signals. It has been found to recruit caspase-1 to several members of the nucleotide-binding domain and leucine-rich repeat (LRR)-containing proteins (NLRs), which function as cytoplasmic sensors for invading bacteria and viruses in cells [
]. The inflammasome is a cytosolic complex that functions to specifically recruit and activate a downstream protease called caspase-1 (CASP1). NLRP1 is a sensor component of the NLRP1 inflammasome, which plays a crucial role in innate immunity and inflammation. It has been linked to a number of human pathologies, such as vitiligo, rheumatoid arthritis, and Crohn disease [
]. CARD8 (also known as CARDINAL) regulates caspase-1 activation and apoptosis [
] and participates in NFkappaB activation []. It is part of the inflammasome, which is responsible for the activation of caspases 1 and 5, leading to the processing and secretion of the pro-inflammatory cytokines IL-1beta and IL-18 [].
The actin-depolymerising factor homology (ADF-H) domain is an ~150-amino acid motif that is present in three phylogenetically distinct classes of eukaryotic actin-binding proteins [,
,
]:
ADF/cofilins, which include ADF, cofilin, destrin, actophorin, coactosin, depactin and glia maturation factors (GMFs) beta and gamma. ADF/cofilins are small actin-binding proteins composed of a single ADF-H domain. They bind both actin monomers and filaments and promote rapid filament turnover in cells by depolymerising/fragmenting actin filaments. ADF/cofilins bind ADP-actin with higher affinity than ATP-actin and inhibit the spontaneous nucleotide exchange on actin monomers.
Twinfilins, which are actin monomer-binding proteins that are composed of two ADF-H domains.
Abp1/Drebrins, which are relatively large proteins composed of an N-terminal ADF-H domain followed by a variable region and a C-terminal SH3 domain. Abp1/Drebrins interact only with actin filaments and do not promote filament depolymerisation or fragmentation.
Although these proteins are biochemically distinct and play different roles in actin dynamics, they all appear to use the ADF-H domain for their interactions with actin.
The ADF-H domain consists of a six-stranded mixed β-sheet in which the four central strands (β2-β5) are antiparallel and the two edge strands (β1 and β6) run parallel with the neighbouring strands. The sheet is surrounded by two α-helices on each side [
,
,
].
The HD domain, named after the conserved doublet of predicted catalytic residues, is found in a wide range of bacterial, archaeal and eukaryotic proteins. It defines a superfamily of phosphohydrolases that can catalyze both metal-dependent and -independent phosphomonoesterase and phosphodiesterase reactions for a broad range of substrates [
,
]. The HD-domain proteins appear to be involved in nucleic acid and nucleotide metabolism, signal transduction and possibly other functions. They are diverse in terms of both domain architecture and phylogenetic distribution; each of the completely sequenced genomes encodes more than one version of this domain. The HD domain is composed of a bundle of alpha helices with a 5-helix core. Although all HD domains share key design features, a striking diversity of catalytic centres has been identified, containing no metal, or mono-, bi- or trinuclear metal binding sites [
,
]. The HD-related output domain (HDOD) is a protein domain of unknown function. Proteins containing the HDOD are widespread in diverse bacteria; it can be present as a stand-alone domain or associated with other domains, such as response regulatory (RR), GGDEF, and EAL, suggesting a role in regulation and signaling [
,
]. Proteins containing this domain include CdgJ from Vibrio cholerae serotype O1. CdgJ is a phosphodiesterase (PDE) that catalyzes the hydrolysis of cyclic diguanylate (c-di-GMP) [].
Members of the Pumilio family of proteins (Puf) regulate translation and mRNA stability in a wide variety of eukaryotic organisms including mammals, flies, worms, slime mold, and yeast [
]. Pumilio family members are characterised by the presence of eight tandem copies of an imperfectly repeated 36-amino-acid sequence motif, the Pumilio repeat, flanked by short N- and C-terminal conserved regions. The eight repeats and the N- and C-terminal regions form the Pumilio homology domain (PUM-HD). The PUM-HD domain is a sequence-specific RNA-binding domain. Several Puf members have been shown to bind specific RNA sequences mainly found in the 3' UTR of mRNA and repress their translation []. Frequently, Puf proteins function asymmetrically to create protein gradients, thus causing asymmetric cell division and regulating cell fate specification []. The crystal structure of the Pumilio repeats has been solved [
]. The PUM repeats and the N- and C-terminal regions pack together to form a right-handed superhelix that approximates a half doughnut, structurally similar to the Armadillo (ARM) repeat proteins beta-catenin and karyopherin alpha. The RNA binds the concave surface of the molecule, where each of the protein's eight repeats makes contacts with a different RNA base via three amino acid side chains at conserved positions [
]. This entry represents the PUM-HD domain.
D-galactoside/L-rhamnose binding SUEL lectin domain superfamily
Type: Homologous_superfamily
Description:
The D-galactoside binding lectin purified from sea urchin (Anthocidaris crassispina) eggs exists as a disulphide-linked homodimer of two subunits; the dimeric form is essential for hemagglutination activity [
]. The sea urchin egg lectin (SUEL) forms a new class of lectins. Although SUEL was first isolated as a D-galactoside binding lectin, it was later shown that it binds L-rhamnose preferentially [,
]. L-rhamnose and D-galactose share the same hydroxyl group orientation at C2 and C4 of the pyranose ring structure. A cysteine-rich domain homologous to the SUEL protein has been identified in the following proteins [
,
,
]:
Plant beta-galactosidases () (lactases).
Mammalian latrophilin, the calcium independent receptor of alpha-latrotoxin (CIRL). The galactose-binding lectin domain is not required for alpha-latrotoxin binding [].
Human lectomedin-1.
Rhamnose-binding lectin (SAL) from catfish (Silurus asotus, Namazu) eggs. This protein is composed of three tandem repeat domains homologous to the SUEL lectin domain. All cysteine positions of each domain are completely conserved [
].
The hypothetical B0457.1, F32A7.3A and F32A7.3B proteins from Caenorhabditis elegans.
The human KIAA0821 protein.
Structurally, the rhamnose-binding lectin domain (also known as the N-terminal lectin domain, Lec) is composed of five β-strands, a single, long α-helix, and two small helical elements. The overall fold is that of a β-sandwich with two antiparallel sheets [
].
This entry includes major allergen I polypeptide chain 1 (Fel d 1 chain 1) from cat and related proteins, such as SCGB1B from mice. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in Homo sapiens (Human). The nomenclature system uses the first three letters of the genus, followed by the first letter of the species name, followed by a number (additional letters can be added to the name as required to discriminate between similar designations). Fel d 1 is allergen 1 from Felis silvestris catus (Cat), which is an important agent in human allergic reactions [
]. The protein is expressed in saliva and sebaceous glands. The complete primary structure of Fel d 1 has been determined []. The allergen is a tetrameric glycoprotein consisting of two disulphide-linked heterodimers of chains 1 and 2, which have been shown to be encoded by different genes. Fel d 1 chains 1 and 2 share structural similarity with uteroglobin, a secretoglobin superfamily member; chain 2 is a glycoprotein with N-linked oligosaccharides. SCGB1B is an androgen-binding protein that plays a role in mate selection in mice [
].
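The naming rule described above (first three letters of the genus, first letter of the species name, then a number) is simple enough to sketch in Python. The function name `allergen_name` is hypothetical, and the optional extra letters used to disambiguate similar designations are not handled. Note that the designation "Fel d" follows the older species name Felis domesticus rather than Felis silvestris catus.

```python
def allergen_name(genus, species, number):
    """Build an allergen designation such as 'Fel d 1'.

    Illustrative helper only: three letters of the genus, one letter of
    the species name, and the allergen number, separated by spaces.
    """
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"


cat_allergen = allergen_name("Felis", "domesticus", 1)
birch_allergen = allergen_name("Betula", "verrucosa", 1)
```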
This entry includes major allergen I polypeptide chain 2 (Fel d 1 chain 2) from cat and related proteins, such as SCGB2B from mice. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans. The nomenclature system uses the first three letters of the genus, followed by the first letter of the species name, followed by a number (additional letters can be added to the name as required to discriminate between similar designations). Fel d 1 is allergen 1 from Felis silvestris catus (Cat), which is an important agent in human allergic reactions [
]. The protein is expressed in saliva and sebaceous glands. The complete primary structure of Fel d 1 has been determined []. The allergen is a tetrameric glycoprotein consisting of two disulphide-linked heterodimers of chains 1 and 2, which have been shown to be encoded by different genes. Fel d 1 chains 1 and 2 share structural similarity with uteroglobin, a secretoglobin superfamily member; chain 2 is a glycoprotein with N-linked oligosaccharides. SCGB2B is an androgen-binding protein that plays a role in mate selection in mice [
].
This entry represents the C-lobe of FERM domain found in protein-tyrosine phosphatases non-receptor type-14 (PTPN14, also known as Pez) and type-21 (PTPN21). A number of mutations in Pez/PTPN14 have been shown to be associated with breast and colorectal cancer [
]. PTPN14 and PTPN21 contain an N-terminal FERM domain and a C-terminal PTP catalytic domain. The FERM domain has a cloverleaf tripartite structure composed of: (1) FERM_N (A-lobe or F1); (2) FERM_M (B-lobe or F2); and (3) FERM_C (C-lobe or F3). The C-lobe/F3 within the FERM domain is part of the PH domain family. Like most other ERM members, they have a phosphoinositide-binding site in their FERM domain. The FERM C domain is the third structural domain within the FERM domain. The FERM domain is found in cytoskeletal-associated proteins such as ezrin, moesin, radixin, 4.1R, and merlin. These proteins provide a link between the membrane and cytoskeleton and are involved in signal transduction pathways. The FERM domain is also found in protein tyrosine phosphatases (PTPs), the tyrosine kinases FAK and JAK, and other proteins involved in signaling. This domain is structurally similar to the PH and PTB domains and consequently is capable of binding to both peptides and phospholipids at different sites [
,
].
Endocytosis and intracellular transport involve several mechanistic steps:
(1) for the internalisation of cargo molecules, the membrane needs to bend to form a vesicular structure, which requires membrane curvature and a rearrangement of the cytoskeleton; (2) following its formation, the vesicle has to be pinched off the membrane; (3) the cargo has to be subsequently transported through the cell and the vesicle must fuse with the correct cellular compartment.
Members of the Amphiphysin protein family are key regulators in the early steps of endocytosis. They are involved in the formation of clathrin-coated vesicles by promoting the assembly of a protein complex at the plasma membrane, and they directly assist in the induction of the high curvature of the membrane at the neck of the vesicle. Amphiphysins contain a characteristic domain, known as the BAR (Bin-Amphiphysin-Rvs) domain, which is required for their in vivo function and their ability to tubulate membranes [
]. The crystal structure of these proteins suggests that the domain forms a crescent-shaped dimer of a three-helix coiled coil with a characteristic set of conserved hydrophobic, aromatic and hydrophilic amino acids. Proteins containing this domain have been shown to homodimerise, heterodimerise or, in a few cases, interact with small GTPases. This entry identifies several fungal BAR domain-containing proteins, such as Gvp36, that are not detected by
[
].
TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity [
]. The TetR proteins identified in multiple genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or trigger the cellular response itself []. TetR regulates the expression of the membrane-associated tetracycline resistance protein, TetA, which exports the tetracycline antibiotic out of the cell before it can attach to the ribosomes and inhibit protein synthesis []. TetR blocks transcription from the genes encoding both TetA and TetR in the absence of antibiotic. The C-terminal domain is multi-helical and is interlocked in the homodimer with the helix-turn-helix (HTH) DNA-binding domain []. This entry represents the C-terminal domain present in putative TetR family transcriptional regulators found in bacteria. It is a member of a Pfam clan for the TetR superfamily.
This is the N-terminal domain found in components of the gamma-tubulin complex proteins (GCPs). Proteins containing this domain include spindle pole body (SPB) components such as Spc97 and Spc98, which function as the microtubule-organizing centre in yeast [
]. Proteins containing this domain also include human GCP4 (Gamma-tubulin complex component 4), whose structure has been determined []. Functional studies have shown that the N-terminal domain defines the functional identity of GCPs, suggesting that all GCPs are incorporated into the helix of gamma-tubulin ring complexes (gTuRCs) via lateral interactions between their N-terminal domains. Thereby, they define the direct neighbours and position the GCPs within the helical wall of gTuRC [
]. Sequence alignment of human GCPs based on the GCP4 structure helped delineate conserved regions in the N- and C-terminal domains []. In addition to the conserved sequences, the N-terminal domains carry specific insertions of various sizes depending on the GCP, i.e. internal insertions or N-terminal extensions. These insertions may equally contribute to the function of individual GCPs, as they have been implicated in specific interactions with regulatory or structural proteins. For instance, GCP6 carries a large internal insertion phosphorylated by Plk4 and containing a domain of interaction with keratins, whereas the N-terminal extension of GCP3 interacts with the recruitment protein MOZART1 [].