This superfamily consists of the morpho-protein BolA from Escherichia coli and its various homologues. In E. coli, over-expression of this protein causes round morphology and may be involved in switching the cell between elongation and septation systems during cell division [
]. The expression of BolA is growth rate regulated and is induced during the transition into the stationary phase []. BolA is also induced by stress during early stages of growth [] and may have a general role in stress response. It has also been suggested that BolA can induce the transcription of penicillin binding proteins 6 and 5 [,
]. IbaG is a BolA homologue involved in acid resistance []. The structure of the BolA protein has an α-β(2)-α-β fold divided into two layers (α/β).
Sec1-like molecules have been implicated in a variety of eukaryotic vesicle transport processes including neurotransmitter release by exocytosis [
]. They regulate vesicle transport by binding to a t-SNARE from the syntaxin family. This binding is thought to prevent formation of the SNARE complex, a protein complex required for membrane fusion. Whereas Sec1 molecules are essential for neurotransmitter release and other secretory events, their interaction with syntaxin molecules seems to represent a negative regulatory step in secretion []. Mutations in vacuolar protein sorting-associated protein 33B (VPS33B) account for most cases of arthrogryposis, renal dysfunction and cholestasis syndrome (ARC) [
]. It is proposed that VPS33 may play a role in vesicle-mediated protein trafficking to lysosomal compartments and in membrane docking/fusion reactions of late endosomes/lysosomes [
].
This entry represents the RNA recognition motif (RRM) of Nab3p (also known as Hmd1p), an acidic nuclear polyadenylated RNA-binding protein that is essential for cell viability. Nab3p is predominantly localized within the nucleoplasm and essential for growth in budding yeast [
]. It plays an important role in the maintenance of CLN3 mRNA levels []. It is part of the Nrd1 complex (Nrd1p-Nab3p-Sen1p) that directs the termination and processing of short RNA polymerase II transcripts and regulates cellular response to nutrient availability []. Nab3p contains an N-terminal aspartic/glutamic acid-rich region, a central RNA recognition motif (RRM), and a C-terminal region rich in glutamine and proline residues [
]. This group of proteins also includes Uncharacterized RNA-binding protein C3H8.09c from Schizosaccharomyces pombe, the orthologous protein in fission yeast.
The MASTL kinases carry only a catalytic domain, which contains a long insertion relative to MAST kinases. MASTL, also called greatwall kinase (Gwl), is involved in the regulation of mitotic entry, which is controlled by the coordinated activities of protein kinases and opposing protein phosphatases (PPs) [
]. The cyclin B/CDK1 complex induces entry into M-phase while PP2A-B55 shows anti-mitotic activity. MASTL/Gwl is activated downstream of cyclin B/CDK1 and indirectly inhibits PP2A-B55 by phosphorylating the small protein alpha-endosulfine (Ensa) or the cAMP-regulated phosphoprotein 19 (Arpp19), resulting in M-phase progression []. Gwl kinase may also play roles in mRNA stabilization and DNA checkpoint recovery [,
]. The human MASTL gene has also been named FLJ14813; a missense mutation in FLJ14813 is associated with autosomal dominant thrombocytopenia [].
Two types of dihydroxyacetone kinase (glycerone kinase) are described. In yeast and a few bacteria, e.g. Citrobacter freundii, the enzyme is a single chain that uses ATP as phosphoryl donor and is designated
. By contrast, Escherichia coli and many other bacterial species have a multisubunit form with a phosphoprotein donor related to PTS transport proteins. This family represents a protein, unique to the Firmicutes (low GC Gram-positives), that appears to be a divergent second copy of the K subunit of that complex; its gene is always found in operons with the other three proteins of the complex.
This entry also includes DhaKLM operon coactivator DhaQ from Streptococcus lactis. It forms a heterotetramer with DhaS and functions as a transcriptional regulator [
].
This entry represents the N-terminal domain of TORC proteins. TORC (transducer of regulated CREB activity) is a protein family of coactivators that enhances the activity of CRE-dependent transcription via a phosphorylation-independent interaction with the bZIP DNA binding/dimerisation domain of CREB (cAMP Response Element-Binding) [
]. The proteins display a highly conserved predicted N-terminal coiled-coil domain and an invariant sequence matching a protein kinase A (PKA) phosphorylation consensus sequence (RKXS) []. The coiled-coil structure interacts with the bZIP domain of CREB []. This interaction may occur via ionic bonds because it is disrupted under high-salt conditions []. In addition to CREB-binding, the N-terminal domain plays a role in the tetramer formation of TORCs [], but the physiological function of the multimeric complex has not been clarified yet.
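As a rough illustration of the RKXS consensus mentioned above, the following minimal sketch scans a protein sequence for the motif with a regular expression. The function name `find_pka_sites` and the toy sequence are hypothetical and used only for illustration; the sequence is not a real TORC protein, and a sequence match alone does not show that a site is phosphorylated.

```python
import re

# Consensus as quoted in the text: R, K, any residue (X), then S.
# The lookahead lets overlapping matches be reported as well.
PKA_CONSENSUS = re.compile(r"(?=(RK[A-Z]S))")


def find_pka_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched tetrapeptide) for each RKXS match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in PKA_CONSENSUS.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any database entry.
    toy_sequence = "MAASRKFSEKLLRKASPP"
    for position, motif in find_pka_sites(toy_sequence):
        print(position, motif)
```

Positions are reported 1-based, following the usual convention for residue numbering.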
Arsenic is a toxic metalloid whose trivalent and pentavalent ions inhibit a variety of biochemical processes. Operons that encode arsenic resistance have been found in multicopy plasmids from both Gram-positive and Gram-negative bacteria []. The resistance mechanism is encoded from a single operon, which houses an anion pump. The pump has two polypeptide components: a catalytic subunit (the ArsA protein), which functions as an oxyanion-stimulated ATPase; and an arsenite export component (the ArsB protein), which is associated with the inner membrane []. The ArsA and ArsB proteins are thought to form a membrane complex that functions as an anion-translocating ATPase. The ArsB protein is distinguished by its overall hydrophobic character, in keeping with its role as a membrane-associated channel. Sequence analysis reveals the presence of 13 putative transmembrane (TM) regions.
This entry represents the N-terminal substrate-binding domain of the Lon protease. This ATP-dependent enzyme, a serine peptidase belonging to the MEROPS peptidase family S16, is conserved in archaeal, bacterial and eukaryotic organisms and catalyses rapid turnover of short-lived regulatory proteins and many damaged or denatured proteins. In eukaryotes, the majority of the proteins are located in the mitochondrial matrix [
,
]. In yeast, Pim1 is located in the mitochondrial matrix and is required for mitochondrial function. It is constitutively expressed, but its expression increases after thermal stress, suggesting that Pim1 may play a role in the heat shock response []. The structure of this domain has been determined and it represents a general protein and polypeptide interaction domain [
,
,
,
].
The ZU5 domain is a domain of 90-110 residues present in zona occludens 1 (ZO-1) protein, in unc5-like netrin receptors and in ankyrins. The ZU5 domain is named after the mouse tight junction protein ZO-1 and the C. elegans uncoordinated protein 5 (unc-5) and related Unc5-like netrin receptors. ZU5 domains are found in eukaryotic proteins that in most cases contain a C-terminal death domain. Other domains which can be found N-terminal to a ZU5 domain are ankyrin repeats; Ig-like and TSP1 repeats; PDZ, SH3 and guanylate kinase; or leucine-rich repeats (LRR) [
,
,
,
]. The ZU5 domain can function as a spectrin-binding domain in ankyrins [], and participates in induction of apoptosis and binding of melanoma-associated antigen D1 (MAGE-D1/NRAGE) in UNC5H1-3 [].
This entry represents the C2H2 AKAP95-type zinc finger. A-kinase (or PKA)-anchoring protein AKAP95 is a member of the AKAP family that binds to regulatory subunits of protein kinase A (PKA) and localizes PKA to various distinct locations and structures within cells. AKAP95 is a zinc-finger protein implicated in mitotic chromosome condensation by acting as a targeting molecule for the condensin complex. In addition to anchoring the regulatory subunit (RII) of cyclic adenosine monophosphate (cAMP)-dependent protein kinase (PKA), AKAP95 binds chromatin at mitosis. It harbours two zinc fingers, designated ZF1 and ZF2. ZF1 mediates the binding of AKAP95 to DNA. ZF2 is dispensable for chromatin binding; however, it may substitute for ZF1 when the latter is rendered non-functional by mutation [
,
,
].
MsrPQ is an enzymatic system that repairs proteins containing methionine sulfoxide in the bacterial cell envelope. MsrP, a molybdo-enzyme, and MsrQ, a haem-binding membrane protein, are widely conserved throughout Gram-negative bacteria. MsrPQ uses electrons from the respiratory chain, which represents a novel mechanism to import reducing equivalents into the bacterial cell envelope. MsrPQ is essential for the maintenance of envelope integrity under bleach stress, rescuing periplasmic proteins from methionine oxidation, including the primary periplasmic chaperone SurA [
]. This family represents MsrQ. This family also includes magnetosome protein MamZ, which, despite its sequence similarity to MsrQ, has been shown not to genetically interact with MsrP. Rather, it has been proposed to be involved in balancing the redox state of iron within the magnetosome compartment [
].
This family consists of the acyl carrier protein, also called the delta subunit, of malonate decarboxylase. This subunit has the same covalently bound prosthetic group, derived from and similar to coenzyme A, as does citrate lyase, although this protein and the acyl carrier protein of citrate lyase do not show significant sequence similarity. Both malonyl and acetyl groups are transferred to the prosthetic group for catalysis. This entry represents the malonate decarboxylase delta subunit. Malonate decarboxylase of Klebsiella pneumoniae consists of four different subunits and catalyses the conversion of malonate plus H+ to acetate and CO2. The catalysis proceeds via acetyl and malonyl thioester residues with the phosphoribosyl-dephospho-CoA prosthetic group of the acyl carrier protein (ACP) subunit. MdcC is the (apo) ACP subunit [
].
Peroxidasin is a secreted heme peroxidase which is involved in hydrogen peroxide metabolism and peroxidative reactions in the cardiovascular system [
,
]. It is a multidomain peroxidase that contains a signal peptide, several leucine-rich regions (e.g., five in Drosophila or one in Homo sapiens), followed by four immunoglobulin-like domains, a linking region, the peroxidase domain, and a carboxy-terminal von Willebrand factor type C protein-protein interaction domain [,
,
]. This entry represents the peroxidase domain of peroxidasin and related proteins. The domain co-occurs with extracellular matrix domains and may play a role in the formation of the extracellular matrix [
]. This entry also includes the human peroxidasin-like protein (PXDNL), a probable oxidoreductase that lacks peroxidase activity. It inhibits the peroxidase activity of PXDN by interacting with it [
].
This entry represents a histidine-rich calcium-binding repeat which appears in proteins called histidine-rich calcium-binding proteins (HRC). HRC is a high-capacity, low-affinity Ca2+-binding protein residing in the lumen of the sarcoplasmic reticulum. HRC binds directly to triadin. This binding interaction occurs between the histidine-rich region of HRC and multiple clusters of charged amino acids, named KEKE motifs, in the lumenal domain of triadin. This repeat is found in the acidic region of the protein, which can be long and variable. There is also a cysteine-rich region further towards the C terminus [
]. HRC may regulate sarcoplasmic reticular calcium transport, play a critical role in maintaining calcium homeostasis, and function in the heart. HRC is a candidate regulator of sarcoplasmic reticular calcium uptake.
Proteins containing this domain include mammalian Nit1 and Nit2, the Nit1-like domain of the invertebrate NitFhit, yeast Nit3 and various uncharacterized bacterial and archaeal Nit-like proteins. In general, they are amidases involved in various metabolic processes. Nit1 is a deaminated glutathione amidase [
], while Nit2 is an omega-amidase []. They are candidate tumour suppressor proteins []. In NitFhit, the Nit1-like domain is encoded as a fusion protein with the non-homologous tumour suppressor, fragile histidine triad (Fhit) []. Mammalian Nit1 and Fhit may affect distinct signal pathways, and both may participate in DNA damage-induced apoptosis []. Nit1 is a negative regulator in T cells []. Overexpression of Nit2 in HeLa cells leads to a suppression of cell growth through cell cycle arrest in G2 [].
This entry represents a group of adenylate cyclases (ACs) from animals, including adenylate cyclase type 1-8 from humans. Cyclic AMP (cAMP) is a ubiquitous signalling molecule which mediates many cellular processes by activating cAMP-dependent kinases and also inducing protein-protein interactions. This molecule is produced by the adenylate cyclase (AC) enzyme, using ATP as its substrate. Mammalian adenylate cyclase has nine closely related membrane-bound isoforms (AC1-9) showing significant sequence homology and sharing the same overall structure: two hydrophobic transmembrane domains, and two cytoplasmic domains that are responsible for the catalytic activity. These isoforms differ in both their tissue specificity and their regulation. Regulatory factors known to influence one or more of these isoforms include G proteins, protein kinases, calcium and calmodulin [
,
].
Proteins containing this domain can be found in bacteria, archaea and eukaryotes. Proteins containing this domain are distantly similar to (
). Studies searching for novel ADP-ribosylation systems in bacterial genomes have identified an operon that encodes a conserved protein containing a distinct type of macrodomain associated with this domain. This led to the identification of a toxin-antitoxin (TA) system, with DarT acting as the toxin. DarT is an enzyme that modifies thymidines on single-stranded DNA in a sequence-specific manner by a nucleotide-type modification called ADP-ribosylation. This modification in turn can be removed by DarG, the antitoxin macrodomain protein. In addition, substitution of the single completely conserved glutamate residue was shown to attenuate function, rendering DarT non-toxic [
].
The multi-domain protein Connector enhancer of kinase suppressor of ras (Connector enhancer of KSR) (CNK) functions as a scaffold in several signal cascades and acts on proliferation, differentiation and apoptosis. CNK connects upstream activators and downstream targets of Ras- and Rho-dependent signalling pathways and may allow cross-talk between these pathways. In invertebrates, CNK is expressed as one isoform, whereas in mammals there are CNK1, CNK2A, and its splice variant CNK2B. CNK proteins consist of one sterile alpha motif (SAM) domain (see
,
), one conserved region in CNK (CRIC) domain, one PSD-95/Dlg-A/ZO-1 (PDZ) domain (see
,
) and one pleckstrin homology (PH) domain (see
,
). The CRIC domain is enriched in leucine residues and functions as a protein-protein interaction domain [
,
].
This entry represents a group of plant proteins, including multiple organellar RNA editing factor 1-9 (MORF1-9) and ORRM1 from Arabidopsis, and DAG protein from Antirrhinum majus. DAG shares protein sequence similarity with MORFs and the N terminus of ORRM1. However, ORRM1 also contains a C-terminal RNA recognition motif (RRM), which is absent in DAG and MORFs. They are involved in C-to-U RNA editing (deamination of cytidine to uridine), an essential step of RNA maturation in chloroplasts and mitochondria of land plants from bryophytes to angiosperms [
,
]. They interact selectively with diverse PPR (PLS-class of pentatricopeptide repeat) proteins, and these interactions are important for their RNA editing function. MORFs are required for RNA editing in mitochondria and plastids [].
Transcription initiation factor TFIID is a multimeric protein complex that plays a central role in mediating promoter responses to various activators and repressors. The complex includes TATA binding protein (TBP) and various TBP-associated factors (TAFs). TFIID is an RNA polymerase II-specific TATA-binding protein-associated factor (TAF) that is essential for viability. This group represents transcription initiation factor TFIID subunit 1 (TAF1, also known as cell cycle gene 1 protein) [
,
,
]. TAF1 is the largest subunit and the core scaffold of the complex. It contains Ser/Thr kinase domains that can autophosphorylate or transphosphorylate other transcription factors, including TP53, GTF2A1 and GTF2F1 [,
], and has acetyltransferase activity towards histones H3 and H4 []. It is essential for progression of the G1 phase of the cell cycle [].
RING1 is a transcriptional repressor associated with the Polycomb group (PcG) protein complex involved in stable repression of gene activity. It is a core component of polycomb repressive complex 1 (PRC1), which functions as an E3 ubiquitin ligase, transferring the mono-ubiquitin mark to the C-terminal tail of histone H2A at K118/K119 [
]. PRC1 is also capable of chromatin compaction, a function not requiring histone tails, and this activity appears important in gene silencing []. RING1 interacts with multiple PcG proteins and displays tumorigenic activity [
]. It also shows zinc-dependent DNA binding activity. Moreover, RING1 inhibits transactivation of the DNA-binding protein recombination signal binding protein-Jkappa (RBP-J) by Notch through interaction with the LIM domains of KyoT2 []. RING1 contains a C3HC4-type RING-HC finger [].
The deltex family of proteins is involved in the regulation of Notch signaling, and therefore may play roles in cell-to-cell communications that regulate mechanisms determining cell fate [
]. They have a central RING-type zinc finger domain and contain a C-terminal domain that is also found in other domain architectures. Deltex-1 (DTX1) contains a RING finger and two WWE domains, indicating that it may be an E3 ubiquitin ligase []. Human deltex 3-like, which contains an additional N-terminal domain (presumably with ubiquitin ligase activity) is also described as E3 ubiquitin-protein ligase DTX3L, B-lymphoma- and BAL-associated protein (BBAP), or rhysin-2. DTX3L mediates monoubiquitination of K91 of histone H4 in response to DNA damage [,
]. This entry represents the C-terminal domain of the Deltex proteins.
TssO proteins have one predicted TM region near their N-termini. In Flavobacterium johnsoniae, three specific components (TssN, TssO and TssP) are absent in the other sub-types of T6SS [
]. The type VI secretion system (T6SS) is a supra-molecular bacterial complex that resembles phage tails. It is a toxin delivery system that fires toxins into target cells upon contraction of its TssBC sheath [
]. Thirteen essential core proteins are conserved in all T6SSs: the membrane associated complex TssJ-TssL-TssM, the baseplate proteins TssE, TssF, TssG, and TssK, the bacteriophage-related puncturing complex composed of the tube (Hcp), the tip/puncturing device VgrG, and the contractile sheath structure (TssB and TssC). Finally, the starfish-shaped dodecameric protein, TssA, limits contractile sheath polymerization at its distal part when TagA captures TssA [].
This entry represents the N-terminal cupin domain found in the HTH-type transcription activators RhaS and RhaR and similar sequences predominantly found in Proteobacteria. RhaS and RhaR respond to the availability of L-rhamnose and activate transcription of the operons in the Escherichia coli L-rhamnose catabolic regulon. The E. coli RhaR protein activates expression of the rhaSR operon in the presence of its effector, L-rhamnose. The resulting RhaS protein (plus L-rhamnose) activates expression of the L-rhamnose catabolic operon rhaBAD as well as the transport operon rhaT. These proteins bind DNA as dimers, via their HTH motifs in the C-terminal domain (
). Proteins in this family belong to the cupin superfamily with a conserved "jelly roll-like" β-barrel fold capable of homodimerisation [
,
,
,
,
].
FAS-associated factor 1 (FAF1) is a multidomain protein involved in various biological processes, including regulation of apoptosis and NFkappaB activity, and ubiquitination and proteasomal degradation [
]. It has been linked to cancer, asbestos-induced mesotheliomas, and Parkinson's disease []. FAF1 contains multiple ubiquitin-related domains, including ubiquitin-associated (UBA), ubiquitin-like 1 and 2 (UBL1, UBL2), and ubiquitin regulatory X (UBX) domains. FAF1 interacts with polyubiquitinated proteins via its N-terminal UBA domain and with valosin containing protein (VCP) via its C-terminal UBX domain [,
,
]. This entry represents the UBX domain as found in FAF1 and homologues. The UBX domain has a β-grasp fold similar to that of ubiquitin; however, UBX lacks the C-terminal double glycine motif and is thus unlikely to be conjugated to other proteins.
Rusticyanin is a blue copper protein, described in an obligate acidophilic chemolithoautotroph, Acidithiobacillus ferrooxidans, as an electron transfer protein. It can constitute up to 5 percent of protein in cells grown on Fe(II) and is thought to be part of an electron chain for Fe(II) oxidation, with two c-type cytochromes, an aa3-type cytochrome oxidase, and O2 as terminal electron acceptor [
,
]. It is rather closely related to sulfocyanin ().
The NMR structure indicates the fold to be a compact β-barrel or β-sandwich, which contains a hydrophobic core rich in aromatic residues []. Its sequence is highly diverged from related blue copper proteins, but it has a similar active site, containing conserved His and Cys residues responsible for binding copper.
UBASH3A (ubiquitin-associated and SH3 domain-containing protein A, also known as STS-2 or TULA) belongs to the TULA family. It is only found in lymphoid cells and exhibits weak phosphatase activity [
,
]. UBASH3A facilitates T cell-induced apoptosis through interaction with the apoptosis-inducing factor AIF []. It is involved in regulating the level of phosphorylation of the zeta-associated protein (ZAP)-70 tyrosine kinase []. This entry represents the SH3 domain of UBASH3A. The TULA family includes two members termed p70/STS-1/TULA-2 and UBASH3A/STS-2/TULA/Cbl-Interacting Protein 4 (CLIP4). TULA proteins contain an N-terminal UBA domain, a central SH3 domain, and a C-terminal histidine phosphatase domain. They bind c-Cbl (a multidomain adaptor and an E3 ubiquitin ligase) through the SH3 domain [
] and ubiquitin via the UBA domain [].
This entry represents the SH3 domain of STAM1 (signal transducing adapter molecule 1). STAM1 is part of the endosomal sorting complex required for transport (ESCRT-0) and is involved in sorting ubiquitinated cargo proteins from the endosome [
]. It may also be involved in the regulation of IL2 and GM-CSF mediated signaling, and has been implicated in neural cell survival [,
]. STAMs were discovered as proteins that are highly phosphorylated following cytokine and growth factor stimulation [
]. They function in cytokine signaling and surface receptor degradation, and also regulate Golgi morphology. They associate with many proteins including Jak2 and Jak3 tyrosine kinases [], Hrs, AMSH, and UBPY. STAM adaptor proteins contain VHS (Vps27, Hrs, STAM homology), ubiquitin interacting (UIM), and SH3 domains [].
EFS (also known as SIN) is a member of the CAS family. It is involved in T lymphocyte development and immune system maturation. It has also been linked to several disorders, such as cancers and Chediak-Higashi syndrome [
]. CAS (Crk-associated substrate) family members are adaptor proteins that contain a highly conserved N-terminal SH3 domain, an adjacent unstructured domain (substrate domain) containing multiple tyrosine phosphorylation sites that enable binding by SH2-domain containing proteins, a serine-rich four-helix bundle, and a FAT-like C-terminal domain. Most of these domains mediate protein-protein interactions. Through these interactions, they assemble larger signaling complexes that are essential for cell proliferation, survival, migration, and other processes [
]. The CAS family consists of four members: BCAR1, HEF1, EFS, and CASS4 [].
TssN proteins have multiple predicted transmembrane (TM) regions. In Flavobacterium johnsoniae, three specific components (TssN, TssO and TssP) are absent in the other sub-types of T6SS [
]. The type VI secretion system (T6SS) is a supra-molecular bacterial complex that resembles phage tails. It is a toxin delivery system that fires toxins into target cells upon contraction of its TssBC sheath [
]. Thirteen essential core proteins are conserved in all T6SSs: the membrane associated complex TssJ-TssL-TssM, the baseplate proteins TssE, TssF, TssG, and TssK, the bacteriophage-related puncturing complex composed of the tube (Hcp), the tip/puncturing device VgrG, and the contractile sheath structure (TssB and TssC). Finally, the starfish-shaped dodecameric protein, TssA, limits contractile sheath polymerization at its distal part when TagA captures TssA [].
The Afa/Dr family of adhesins binds to the Dr blood group antigen component of decay-accelerating factor. These proteins contain both fimbriated and afimbriated adherence structures and mediate adherence of uropathogenic Escherichia coli to the urinary tract, leading to urinary tract infections (UTIs) and pregnancy complications [
,
]. They also confer the mannose-resistant hemagglutination (MRHA) phenotype, which can be inhibited by chloramphenicol. The N-terminal portion of the mature protein is thought to be responsible for chloramphenicol sensitivity []. These proteins remain attached to host cells or collagen once bound even in the presence of elevated shear stress or of chloramphenicol, a competitive inhibitor of binding []. This entry represents the signal peptide region necessary for protein secretion to the cell surface.
Fanconi Anaemia (FA) is a cancer predisposition disorder characterised by chromosome fragility and hypersensitivity to genotoxic agents that suggest defects in the molecular mechanisms of DNA damage signalling and repair. In response to DNA damage, the FA core complex monoubiquitinates the FANCD2 protein. This ubiquitination targets FANCD2 to nuclear foci where it interacts with a variety of DNA repair proteins. The FA group E protein (FANCE) has an important role in DNA repair, functioning as the FANCD2-binding protein in the FA core complex [
]. This entry represents the C-terminal domain of FANCE, which consists predominantly of helices and does not contain any β-strands. This domain folds in a continuous right-handed solenoidal pattern from its N terminus to its C terminus.
ARMET, also known as mesencephalic astrocyte-derived neurotrophic factor (MANF) or arginine-rich protein, is a small protein of approximately 170 residues which contains four disulphide bridges that are highly conserved from nematodes to humans. It is a soluble protein resident in the endoplasmic reticulum and induced by ER stress. It appears to be involved in handling misfolded proteins in the ER, and thus in quality control during ER stress [
]. ARMET from Rattus norvegicus (Rat) selectively promotes the survival of dopaminergic neurons of the ventral mid-brain. It modulates GABAergic transmission to the dopaminergic neurons of the substantia nigra, and enhances spontaneous, as well as evoked, GABAergic inhibitory postsynaptic currents in dopaminergic neurons []. This entry also includes the related neurotrophic factor CDNF (cerebral dopamine neurotrophic factor) [
].
This entry consists of the N-terminal domain of eukaryotic Aar2 and Aar2-like proteins. Aar2 is a U5 small nuclear ribonucleoprotein (snRNP) particle assembly factor and part of Prp8, which forms a large complex containing U5 snRNA, Snu114, and seven Sm proteins (B, D1, D2, D3, E, F and G). Upon import of the complex into the nucleus, Aar2 phosphorylation leads to its release from Prp8 and replacement by Brr2p, thus playing an important role in Brr2p regulation and possibly safeguarding against non-specific RNA binding to Prp8 [
,
,
,
,
]. Aar2p binds directly to the RNaseH-like domain in the C-terminal region of Prp8p []. In yeast, Aar2 protein is involved in splicing pre-mRNA of the a1 cistron and other genes important for cell growth [].
Members of the hok/gef family of Gram-negative bacterial proteins are toxic to cells when over-expressed, killing the cells from within by interfering with a vital function in the cell membrane [
]. Some family members (flm) increase the stability of unstable RNA [], some (pnd) induce the degradation of stable RNA at higher than optimum growth temperatures [], and others affect the release of cellular magnesium by membrane alterations []. The proteins are short (50-70 residues), consisting of an N-terminal hydrophobic (possibly membrane spanning) domain, and a C-terminal periplasmic region, which contains the toxic domain. The C-terminal region contains a conserved cysteine residue that mediates homo-dimerisation in the gef protein, although dimerisation is not necessary for the toxic effect [
]. This entry represents a conserved region found within these proteins.
Prenylated cyclic peptide, anacyclamide/piricyclamide family
Type:
Family
Description:
Members of this protein family occur primarily in Cyanobacteria. They average about 50 residues in length and are the ribosomally translated precursors of peptide natural products whose modifications include cleavage, cyclization, and prenylation. Sequences are well-conserved in the N-terminal region. They are nearly invariant over the last eight residues, but hypervariable just before that stretch []. Cyanobactins are small, cyclic peptides found in cyanobacteria. They are ribosomally synthesised and post-translationally modified. Anacyclamides are a type of cyanobactin produced in strains of the cyanobacterium Anabaena [
]. Anacyclamide synthesis protein AcyE is a 49-amino-acid protein with N-terminal homology to the peptide precursor proteins in the other cyanobactin pathways. The core peptide of AcyE is cleaved during post-translational processing of the precursor peptide [
].
The viral attachment protein domain forms part of the fibre proteins in adenoviruses [
], and the sigma 1 protein in reoviruses []. Both proteins are trimers that contain fibrous tails and globular heads (reovirus), or knobs (adenovirus), which are structurally very similar. Both domain cores consist of eight anti-parallel β-sheets, forming either a β-sandwich (fibre knob), or a β-barrel (sigma 1 head), with a long loop containing an α-helix. The remaining loops tend to be longer in the fibre knob. Functionally, both the fibre knob and the sigma 1 head are involved in binding selectively to cell surface receptors. The reoviruses bind JAM (junction adhesion molecule), while the adenoviruses use the CAR (Coxsackievirus and adenovirus) receptor, with the exception of subgroup B adenoviruses [].
CagE, TrbE, VirB component of type IV transporter system, central domain
Type:
Domain
Description:
This domain is found in (amongst others): the Helicobacter pylori protein CagE (see examples), which, together with other proteins encoded by the cag pathogenicity island (PAI), forms a type IV transporter secretion system. The precise role of CagE is not known, but studies in animal models have shown that it is essential for pathogenesis in Helicobacter pylori-induced gastritis and peptic ulceration []. Indeed, the expression of the cag PAI has been shown to be essential for stimulating human gastric epithelial cell apoptosis in vitro []. Similar type IV transport systems are also found in other bacteria. This domain is also found in proteins from the trb and Vir conjugal transfer systems in Agrobacterium tumefaciens and homologues of VirB proteins from other species.
NlpE is a bacterial outer membrane lipoprotein that is necessary for signalling by the Cpx pathway [
]. This pathway responds to cell envelope disturbances and increases the expression of periplasmic protein folding and degradation factors. While the molecular function of the NlpE protein is unknown, it may be involved in detecting bacterial adhesion to abiotic surfaces. In Escherichia coli and Salmonella typhi, NlpE is also known to confer copper tolerance in copper-sensitive strains of Escherichia coli, and may be involved in copper efflux and delivery of copper to copper-dependent enzymes []. NlpE consists of two β-barrel domains. The N-terminal domain resembles the bacterial lipocalin Blc, and the C-terminal domain has an oligonucleotide/oligosaccharide-binding (OB) fold []. This entry represents the N-terminal domain of the NlpE protein.
The TROVE (Telomerase, Ro and Vault) domain is a module of ~300-500 residues that is found in TEP1 and Ro60, protein components of three ribonucleoprotein particles (RNPs). It is also found in bacterial ribonucleoproteins, suggesting an ancient origin of these ribonucleoproteins. It can be found associated with other domains, such as the VWFA domain, the TEP1 N-terminal domain, the NACHT-NTPase domain, and WD-40 repeats. This domain may be involved in binding the RNA components of the three RNPs, which are telomerase RNA, Y RNA and vault RNA []. The TROVE domain contains a few absolutely conserved residues. As none of these conserved residues are the polar type of amino acids found in active sites, it seems unlikely that this region has an enzymatic function [].
This entry represents the Gyroviral VP2 protein and TT viral ORF2. Torque teno virus (TTV) is a non-enveloped, single-stranded DNA virus that was initially isolated from a Japanese patient with hepatitis of unknown aetiology, and which has since been found to infect both healthy and diseased individuals []. Numerous prevalence studies have raised questions about its role in unexplained hepatitis. ORF2 is a 150-residue protein of unknown function. Gyroviruses are small, circular, single-stranded viruses, such as the Chicken anaemia virus. The VP2 protein contains a set of conserved cysteine and histidine residues, suggesting a zinc-binding domain. VP2 may act as a scaffold protein in virion assembly and may also play a role in intracellular signalling during viral replication.
Peroxisome biogenesis factor 2, HC subclass-RING finger domain
Type:
Domain
Description:
This entry represents the RING finger domain, HC subclass, found exclusively in eukaryotic proteins such as the Peroxisome biogenesis factor 2 (PEX2) from Homo sapiens and Arabidopsis thaliana. PEX2 is an integral peroxisomal membrane protein with two transmembrane regions and a C3HC4-type RING-HC finger within its cytoplasmically exposed C terminus. It may be involved in the biogenesis of peroxisomes, as well as in peroxisomal matrix protein import [,,,]. Mutations in the PEX2 gene are the primary defect in a subset of patients with Zellweger syndrome and related peroxisome biogenesis disorders []. Moreover, PEX2 functions as an E3 ubiquitin ligase that mediates the UBC4-dependent polyubiquitination of PEX5, a key player in peroxisomal matrix protein import, to control PEX5 receptor recycling or degradation [].
This entry represents the Major outer membrane lipoprotein (Lpp, also known as murein lipoprotein), which controls the distance between the inner and outer membranes [,]. This abundant protein contains lipids at its N terminus which anchor it to the outer membrane. It is the only protein known to bind covalently to the peptidoglycan network (PGN), but it also binds it through non-covalent interactions. About one-third of the Lpp is bound to the cell wall and the rest is free in the periplasm. Lpp has a structural function, mediating the interactions between the outer membrane and PGN to ensure the correct distance between them, and contributes to the structural and functional integrity of the cell membrane []. The role of the free periplasmic Lpp remains unknown.
Serologically defined colon cancer antigen 8 (SDCCAG8), also known as CCCAP (centrosomal colon cancer autoantigen protein), is a family of proteins found in eukaryotes. Mutations of SDCCAG8 are associated with nephronophthisis [
] and Bardet-Biedl syndrome [], as well as schizophrenia. SDCCAG8 is associated with the centrosome [] and interacts with pericentriolar material 1 (PCM1), a centriolar satellite protein crucial for targeting proteins to the centrosome. It regulates centrosomal accumulation of pericentriolar material and neuronal polarisation and migration in the developing mouse cortex []. SDCCAG8 is localized at both centrioles and interacts directly with OFD1 (oral-facial-digital syndrome 1), which is linked to an autosomal recessive cystic kidney disease, NPHP-RC (nephronophthisis-related ciliopathies). Depletion of SDCCAG8 causes kidney cysts and a body axis defect in zebrafish [
].
Jasmonates are a family of plant hormones that regulate plant growth, development and responses to stress. COI1 is an F-box protein that functions as the substrate-recruiting module of the Skp1-Cul1-F-box protein (SCF) ubiquitin E3 ligase complex. The role of COI1-mediated JAZ degradation in jasmonate (JA) signalling is analogous to auxin signalling through the receptor F-box protein transport inhibitor response 1 (TIR1), which promotes hormone-dependent turnover of the AUX/IAA transcriptional repressors. The crystal structure of COI1 reveals a TIR1-like overall architecture, with an N-terminal tri-helical F-box motif bound to ASK1 and a C-terminal horseshoe-shaped solenoid domain formed by 18 tandem leucine-rich repeats. This entry represents the N-terminal F-box domain, which is also found in other auxin signalling F-box proteins such as AFB1, AFB2 and AFB3 [
].
This entry represents the CARD domain found in BinCARD, also known as CARD19. BinCARD interacts with the apoptosis inducer CARD protein Bcl10 through its CARD. It inhibits Bcl10-mediated activation of NF-kappa B and suppresses Bcl10 phosphorylation []. In general, CARDs are death domains (DDs) found associated with caspases []. They are known to be important in the signalling pathways for apoptosis, inflammation, and host-defence mechanisms []. DDs are protein-protein interaction domains found in a variety of domain architectures. Their common feature is that they form homodimers by self-association or heterodimers by associating with other members of the DD superfamily, including PYRIN and DED (Death Effector Domain). They serve as adaptors in signalling pathways and can recruit other proteins into signalling complexes [].
The P0 protein is a major structural glycoprotein of the peripheral nervous system (PNS) myelin, which probably plays an important role for the formation and the maintenance of myelin structure; it is well conserved across species [
,
]. P0 is a protein of about 28 kDa that consists of three domains: an N-terminal extracellular domain of about 125 residues, which seems to form a V-type immunoglobulin domain; a transmembrane region; and a cytoplasmic C-terminal domain of about 70 residues, which contains many positively charged residues and is a substrate of some serine protein kinases. This entry represents a signature pattern located in the C-terminal part of the cytoplasmic domain that contains many positive charges and serine residues that could be phosphorylated.
This superfamily represents a domain found in proteases belonging to the MEROPS peptidase family S1 (clan PA). This domain has a two β-barrel structure. The PA clan contains both cysteine and serine proteases that can be found in plants, animals, fungi, eubacteria, archaea and viruses [
]. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families []. Clans CF, CM, CN, CO, CP and PD contain only one family.
Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [].
Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues []. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds.
Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other.
Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel []. The active site consists of a His/Cys catalytic dyad.
Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
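The relationship between the active-site residue orders of clans CA and CE can be checked with a small sketch. It assumes the orders quoted above (Gln preceding Cys/His/Asn in clan CA, and His/Asn/Gln/Cys in clan CE) and treats each as a plain list of residue names; the function name `is_circular_permutation` and the list constants are illustrative, not part of any peptidase database API.

```python
# Residue orders as described in the text (Asn chosen over the Asp alternative).
CA_ORDER = ["Gln", "Cys", "His", "Asn"]   # clan CA: Gln precedes the Cys/His/Asn triad
CE_ORDER = ["His", "Asn", "Gln", "Cys"]   # clan CE: His/Asn/Gln/Cys


def is_circular_permutation(a, b):
    """Return True if list b is a rotation (circular permutation) of list a."""
    if len(a) != len(b):
        return False
    doubled = list(a) + list(a)  # every rotation of a appears as a window of a+a
    n = len(b)
    return any(doubled[i:i + n] == list(b) for i in range(len(a)))


if __name__ == "__main__":
    print(is_circular_permutation(CA_ORDER, CE_ORDER))
```

Under these assumptions the clan CE order is a rotation of the clan CA order, which is consistent with the circular-permutation hypothesis described above; the check says nothing about the three-dimensional structures themselves.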
These peptidases have gamma-glutamyl hydrolase activity; that is they catalyse the cleavage of the gamma-glutamyl bond in poly-gamma-glutamyl substrates. They are structurally related to
, but contain extensions in four loops and at the C terminus [
]. They belong to MEROPS peptidase family C26 (gamma-glutamyl hydrolase family), clan PC. The majority of the sequences are classified as unassigned peptidases.
Endopeptidases from the cyanobacteria Anabaena variabilis and Nostoc punctiforme contain two major domains: a bacterial SH3-like domain (SH3b) and a ubiquitous cell-wall-associated NlpC/P60 (or CHAP) cysteine peptidase domain. This entry represents the NlpC/P60 domain, which is a primitive, papain-like peptidase in the CA clan of cysteine peptidases with a Cys126/His176/His188 catalytic triad and a conserved catalytic core [
]. The function of this domain is not clear.
The type I glycoprotein S of Coronavirus, trimers of which constitute the typical viral spikes, is assembled into virions through noncovalent interactions with the M protein. The spike glycoprotein is translated as a large polypeptide that is subsequently cleaved to S1 and S2 [
]. The cleavage of S can occur at two distinct sites: S2 or S2' []. The spike is present in two very different forms: pre-fusion (the form on mature virions) and post-fusion (the form after membrane fusion has been completed). The spike is cleaved sequentially by host proteases at two sites: first at the S1/S2 boundary (i.e. S1/S2 site) and second within S2 (i.e. S2' site). After the cleavages, S1 dissociates from S2, allowing S2 to transition to the post-fusion structure []. Both chimeric S proteins appeared to cause cell fusion when expressed individually, suggesting that they were biologically fully active [
]. The spike is a type I membrane glycoprotein that possesses a conserved transmembrane anchor and an unusual cysteine-rich (cys) domain that bridges the putative junction of the anchor and the cytoplasmic tail []. SARS-CoV S is largely uncleaved after biosynthesis. It can be later processed by endosomal cathepsin L, trypsin, thermolysin, and elastase, which are shown to induce syncytia formation and virus entry. Other proteases that are of potential biological relevance in potentiating SARS-CoV S include TMPRSS2, TMPRSS11a, and HAT, which are localized on the cell surface and are highly expressed in the human airway [
]. The furin-like S2' cleavage site at KR/SF with P1 and P2 basic residues and a P2' hydrophobic Phe downstream of the IFP is identical between the SARS-CoV-2 and SARS-CoV. One or more furin-like enzymes would cleave the S2' site at KR/SF [,
]. Deletion of the SARS-CoV-2 furin cleavage site suggests that it may not be required for viral entry but may affect replication kinetics, and altered sites have still been seen to be proteolytically cleaved. Several substitutions within the S2' cleavage domain of SARS-CoV-2 have been reported, including P812L/S/T, S813I/G, F817L, I818S/V, but further experimental studies of their consequences and of the replication properties of the altered viruses are required to understand the role of furin cleavage in SARS-CoV-2 infection and virulence []. The S2 subunit normally contains multiple key components, including one or more fusion peptides (FP), a second proteolytic site (S2') and two conserved heptad repeats (HRs), driving membrane penetration and virus-cell fusion. The HRs can trimerize into a coiled-coil structure built of three HR1-HR2 helical hairpins presenting as a canonical six-helix bundle and drag the virus envelope and the host cell bilayer into close proximity, preparing for fusion to occur [
]. The fusion core is composed of HR1 and HR2 and at least three membranotropic regions that are denoted as the fusion peptide (FP), internal fusion peptide (IFP), and pretransmembrane domain (PTM). The HR regions are further flanked by the three membranotropic components. Both FP and IFP are located upstream of HR1, while PTM is distally downstream of HR2 and directly precedes the transmembrane domain of SARS-CoV S. All three of these components are able to partition into the phospholipid bilayer to disturb membrane integrity []. During the pandemic, many conservative amino acid changes in the FP segment of SARS-CoV-2 have been reported (i.e., L821I, L822F, K825R, V826L, T827I, L828P, A829T, D830G/A, A831V/S/T, G832C/S, F833S, I834T), although their impact is not known, as the active conformation and mode of insertion of the SARS-CoV-2 fusion peptide have not been experimentally characterised. Differences in HR1 sequences between SARS-CoV and SARS-CoV-2 suggest that SARS-CoV-2 HR2 makes stronger interactions with HR1. However, the substitutions observed in the solvent-accessible surface of the HR1 domain (e.g., D936Y, S943P, S939F) of SARS-CoV-2 do not seem to be involved in stabilizing interactions with HR2. Substitutions in HR2 (e.g., K1073N, V1176F) or the TM or cytoplasmic tail domains have also been observed, but further experimental work is required to determine the effects of these changes []. This entry represents the heptad repeat 2 (HR2) from coronavirus Spike glycoprotein, S2 subunit. It adopts a mixed conformation: the central part folds into a nine-turn α-helix, while the residues on either side of the helix adopt an extended conformation. Packing of the helical parts of HR2 on the HR1 trimer grooves and formation of a six-helical bundle plays an important role in the formation of a stable post-fusion structure [,].
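The substitution labels quoted above (for example P812L/S/T or D830G/A) follow a compact convention: the reference residue, the position, then one or more alternative residues separated by slashes. The following is a minimal sketch of how such labels could be parsed and expanded; the helper names `parse_substitution` and `expand_substitution` are hypothetical and the regular expression only covers the single-letter forms that appear in this text.

```python
import re

# One reference letter, a position, then one or more slash-separated alternatives.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def parse_substitution(label):
    """Split a label such as 'P812L/S/T' into (reference, position, alternatives)."""
    match = SUBSTITUTION.match(label)
    if match is None:
        raise ValueError(f"unrecognised substitution label: {label!r}")
    ref, pos, alts = match.groups()
    return ref, int(pos), alts.split("/")


def expand_substitution(label):
    """Expand a multi-alternative label into one label per single substitution."""
    ref, pos, alts = parse_substitution(label)
    return [f"{ref}{pos}{alt}" for alt in alts]


if __name__ == "__main__":
    for label in ["P812L/S/T", "S813I/G", "F817L", "I818S/V"]:
        print(label, "->", expand_substitution(label))
```

Expanding the labels this way makes it straightforward to group reported changes by position, but it does not say anything about their functional effect, which the text notes remains to be determined experimentally.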
This entry represents the catalytic domain of cysteine peptidases that constitute MEROPS peptidase family C54 (Aut2 peptidase family, clan CA), which includes cysteine protease ATG4 from human and its homologues from fungi and plants. ATG4 plays a key role in cytoplasm to vacuole transport (Cvt) and autophagy by mediating both proteolytic activation and delipidation of ATG8 [,,].
RNA-directed RNA polymerase (RdRp) is an essential protein encoded in the genomes of all RNA-containing viruses with no DNA stage [
,
]. It catalyses synthesis of the RNA strand complementary to a given RNA template, but the precise molecular mechanism remains unclear. The postulated RNA replication process is a two-step mechanism. First, the initiation step of RNA synthesis begins at or near the 3' end of the RNA template by means of a primer-independent (de novo) mechanism. The de novo initiation consists of the addition of a nucleoside triphosphate (NTP) to the 3'-OH of the first initiating NTP. During the following so-called elongation phase, this nucleotidyl transfer reaction is repeated with subsequent NTPs to generate the complementary RNA product [
]. All the RNA-directed RNA polymerases, and many DNA-directed polymerases, employ a fold whose organisation has been likened to the shape of a right hand with three subdomains termed fingers, palm and thumb [
]. Only the catalytic palm subdomain, composed of a four-stranded antiparallel β-sheet with two α-helices, is well conserved among all of these enzymes. In RdRp, the palm subdomain comprises three well-conserved motifs (A, B and C). Motif A (D-x(4,5)-D) and motif C (GDD) are spatially juxtaposed; the Asp residues of these motifs are implicated in the binding of Mg2+ and/or Mn2+. The Asn residue of motif B is involved in selection of ribonucleoside triphosphates over dNTPs and thus determines whether RNA is synthesised rather than DNA []. The domain organisation [] and the 3D structure of the catalytic centre of a wide range of RdRps, even those with a low overall sequence homology, are conserved. The catalytic centre is formed by several motifs containing a number of conserved amino acid residues.
There are 4 superfamilies of viruses that cover all RNA-containing viruses with no DNA stage:
Viruses containing positive-strand RNA or double-strand RNA, except retroviruses and Birnaviridae: viral RNA-directed RNA polymerases including all positive-strand RNA viruses with no DNA stage, double-strand RNA viruses, and the Cystoviridae, Reoviridae, Hypoviridae, Partitiviridae, Totiviridae families.
Mononegavirales (negative-strand RNA viruses with non-segmented genomes).
Negative-strand RNA viruses with segmented genomes, i.e. Orthomyxoviruses (including influenza A, B, and C viruses, Thogotoviruses, and the infectious salmon anemia virus), Arenaviruses, Bunyaviruses, Hantaviruses, Nairoviruses, Phleboviruses, Tenuiviruses and Tospoviruses.
Birnaviridae family of dsRNA viruses.
The RNA-directed RNA polymerases in the first of the above superfamilies can be divided into the following three subgroups:
All positive-strand RNA eukaryotic viruses with no DNA stage.
All RNA-containing bacteriophages (there are two families of RNA-containing bacteriophages: Leviviridae, positive ssRNA phages, and Cystoviridae, dsRNA phages).
Reoviridae family of dsRNA viruses.
The nucleotide sequence for the RNA of Potato leafroll virus (PLrV) has been determined [
,
]. The sequence contains six large open reading frames (ORFs). The 5' coding region encodes two polypeptides of 28K and 70K, which overlap in different reading frames; it is suggested that the third ORF in the 5' block is translated by frameshift readthrough near the end of the 70K protein, yielding a 118K polypeptide []. The C-terminal part of the 118K protein contains a consensus sequence for RNA-dependent RNA-polymerases []. The genomic RNA sequence of Southern bean mosaic virus (SBMV) has been determined [
]. The genome contains four ORFs. The largest ORF encodes the two largest proteins translated in cell-free extracts from full-length virion RNA []. Segments of the predicted amino acid sequence of this ORF resemble those of known viral RNA-polymerases, ATP-binding proteins and viral genome-linked proteins []. The genome sequence of Pea enation mosaic virus (PEMV) RNA 1 shows strong organisational relationships and sequence similarities to the Beet western yellows virus (BWYV) and PLrV [
]. Sequence analysis reveals five predominant ORFs. The third ORF is characterised by a number of RNA-polymerase motifs and a helicase-like motif typical of RNA-dependent RNA-polymerases []. It overlaps (out of frame) the ORF 2 product and is proposed to be expressed by a frameshift fusion of ORF 2 and ORF 3 []. The PLrV sequence shows some similarities to the putative polymerase of SBMV [
], and more extensive similarities to the corresponding BWYV polypeptide [].
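The motif notation used above for the RdRp palm subdomain, motif A (D-x(4,5)-D) and motif C (GDD), can be read as a simple sequence pattern. The following is a minimal sketch, assuming a plain regular-expression scan is an acceptable first pass; the function name and the toy sequence are hypothetical, and a pattern hit alone does not establish that a protein is a polymerase.

```python
import re

# Motif A as written in the text: Asp, then 4 or 5 residues of any kind, then Asp.
# The lookahead makes overlapping candidates visible as separate hits.
MOTIF_A = re.compile(r"(?=D.{4,5}D)")
# Motif C as written in the text: Gly-Asp-Asp.
MOTIF_C = re.compile(r"GDD")


def find_palm_motifs(protein):
    """Return 0-based start positions of motif A and motif C candidates."""
    seq = protein.upper()
    return {
        "A": [m.start() for m in MOTIF_A.finditer(seq)],
        "C": [m.start() for m in MOTIF_C.finditer(seq)],
    }


# Made-up sequence for illustration only (not a real polymerase).
toy_sequence = "MKTDAAAAVDLLSGRYGDDLV"
hits = find_palm_motifs(toy_sequence)
print(hits)
```

Because both patterns are short, hits are expected by chance in unrelated proteins; in practice the spatial arrangement and order of motifs A, B and C would also need to be checked.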
The type I glycoprotein S of Coronavirus, trimers of which constitute the typical viral spikes, is assembled into virions through noncovalent interactions with the M protein. The spike glycoprotein is translated as a large polypeptide that is subsequently cleaved to S1 and S2 [
]. The cleavage of S can occur at two distinct sites: S1/S2 or S2' []. The spike is present in two very different forms: pre-fusion (the form on mature virions) and post-fusion (the form after membrane fusion has been completed). The spike is cleaved sequentially by host proteases at two sites: first at the S1/S2 boundary (i.e. S1/S2 site) and second within S2 (i.e. S2' site). After the cleavages, S1 dissociates from S2, allowing S2 to transition to the post-fusion structure []. Both chimeric S proteins appeared to cause cell fusion when expressed individually, suggesting that they were biologically fully active [
]. The spike is a type I membrane glycoprotein that possesses a conserved transmembrane anchor and an unusual cysteine-rich (cys) domain that bridges the putative junction of the anchor and the cytoplasmic tail []. SARS-CoV S is largely uncleaved after biosynthesis. It can be later processed by endosomal cathepsin L, trypsin, thermolysin, and elastase, which are shown to induce syncytia formation and virus entry. Other proteases that are of potential biological relevance in potentiating SARS-CoV S include TMPRSS2, TMPRSS11a, and HAT, which are localized on the cell surface and are highly expressed in the human airway [
]. The furin-like S2' cleavage site at KR/SF, with P1 and P2 basic residues and a P2' hydrophobic Phe downstream of the IFP, is identical between SARS-CoV-2 and SARS-CoV. One or more furin-like enzymes would cleave the S2' site at KR/SF [,
]. Deletion of the SARS-CoV-2 furin cleavage site suggests that it may not be required for viral entry but may affect replication kinetics, and altered sites have still been seen to be proteolytically cleaved. Several substitutions within the S2' cleavage domain of SARS-CoV-2 have been reported, including P812L/S/T, S813I/G, F817L, I818S/V, but further experimental study of their consequences and of the replication properties of the altered viruses is required to understand the role of furin cleavage in SARS-CoV-2 infection and virulence []. The S2 subunit normally contains multiple key components, including one or more fusion peptides (FP), a second proteolytic site (S2') and two conserved heptad repeats (HRs), driving membrane penetration and virus-cell fusion. The HRs can trimerize into a coiled-coil structure built of three HR1-HR2 helical hairpins presenting as a canonical six-helix bundle and drag the virus envelope and the host cell bilayer into close proximity, preparing for fusion to occur [
]. The fusion core is composed of HR1 and HR2 and at least three membranotropic regions that are denoted as the fusion peptide (FP), internal fusion peptide (IFP), and pretransmembrane domain (PTM). The HR regions are further flanked by the three membranotropic components. Both FP and IFP are located upstream of HR1, while PTM is distally downstream of HR2 and directly precedes the transmembrane domain of SARS-CoV S. All three of these components are able to partition into the phospholipid bilayer to disturb membrane integrity []. During the pandemic, many conservative amino acid changes in the FP segment of SARS-CoV-2 have been reported (i.e., L821I, L822F, K825R, V826L, T827I, L828P, A829T, D830G/A, A831V/S/T, G832C/S, F833S, I834T), although their impact is not known as the active conformation and mode of insertion of the SARS-CoV-2 fusion peptide have not been experimentally characterised. Differences in HR1 sequences between SARS-CoV and SARS-CoV-2 suggest that SARS-CoV-2 HR2 makes stronger interactions with HR1. However, the substitutions observed in the solvent accessible surface of the HR1 domain (e.g., D936Y, S943P, S939F) of SARS-CoV-2 do not seem to be involved in stabilizing interactions with HR2. Substitutions in HR2 (e.g., K1073N, V1176F) or the TM or cytoplasmic tail domains have also been observed, but further experimental work is required to determine the effects of these changes []. This entry represents the cysteine-rich intravirion region found at the C terminus of coronavirus spike proteins (S) [
]. These cysteine residues are targets for palmitoylation, which is necessary for efficient incorporation of S into virions and for S-mediated membrane fusion.
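The substitution shorthand used in the spike description above (for example P812L/S/T) packs a reference residue, a position and one or more alternative residues into a single label. The following is a minimal sketch of how such labels could be expanded into individual substitutions; the function name and the regular expression are illustrative assumptions, not part of any annotation standard.

```python
import re

# One-letter reference residue, a position, then one or more
# alternative residues separated by "/" (e.g. "P812L/S/T").
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def expand_substitution(label):
    """Split a label such as 'P812L/S/T' into (reference, position, alternative) tuples."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised substitution label: {label!r}")
    reference, position, alternatives = match.groups()
    return [(reference, int(position), alt) for alt in alternatives.split("/")]


# Labels copied from the text above.
for label in ["P812L/S/T", "S813I/G", "F817L", "I818S/V"]:
    print(label, expand_substitution(label))
```

Keeping the position as an integer makes it straightforward to group substitutions by region (for instance the FP segment or HR1) once the region boundaries are known.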
This entry represents the katanin p80 WD40 repeat-containing subunit B1. The microtubule-severing protein katanin consists of a heterodimer of 60 and 80 kDa subunits. p60 has microtubule-stimulated ATPase and microtubule-severing activities, while p80 is a novel protein containing WD40 repeats, which are frequently involved in protein-protein interactions. Katanin p80 WD40-containing subunit B1 may act to target p60 to sites of action such as the centrosome []. Microtubule severing may promote rapid reorganisation of cellular microtubule arrays and the release of microtubules from the centrosome following nucleation. There is also evidence that katanin localises at the leading edge of migratory cells and trims microtubules at their dynamic plus ends []. The C-terminal domain of p80 katanin binds microtubules in vitro, while the N-terminal WD40 domain acts as a negative regulator of microtubule disassembly activity [
].
S-layers are paracrystalline mono-layered assemblies of (glyco)proteins which coat the surface of bacteria [
,
]. Several S-layer proteins and some other cell wall proteins contain one or more copies of a domain of about 50-60 residues, which has been called SLH (for S-layer homology). Although it was originally proposed that SLH domains bind to peptidoglycan, it is now evident that pyruvylated secondary cell wall polymers (SCWPs), which are either teichoic acids, teichuronic acids, lipoteichoic acids or lipoglycans, serve as the anchoring structures for SLH motifs in the Gram-positive cell wall [,
]. However, the study of S-layer protein SbpA of Bacillus sphaericus revealed that SLH motifs are not sufficient for specific binding to SCWPs. Thus, the molecular basis of SLH affinity and specificity of interaction with cell wall polymers is not completely elucidated [].
This protein family includes 3'-5' ssDNA/RNA exonuclease TatD and many uncharacterised deoxyribonucleases and metal-dependent hydrolases. The family is related to a large superfamily of metalloenzymes [
]. TatD has been shown to be a 3'-5' exonuclease that processes single-stranded DNA in DNA repair [,
]. In E. coli, TatD, which adopts a TIM-barrel fold, is encoded by an operon that encodes Tat proteins, including TatA, TatB, and TatC, for protein transport via the Tat (Twin-Arginine Translocation) pathway. However, TatD is not involved in protein export in the Tat pathway [
]. Deoxyribonuclease TATDN1 from Danio rerio (Zebrafish) catalyses (in vitro) the decatenation of kinetoplast DNA producing linear DNA molecules. It is involved in chromosomal segregation and cell cycle progression during eye development [
]. TATDN1 has been related to several types of cancer [,
,
].
SKP1 (together with SKP2) was identified as an essential component of the cyclin A-CDK2 S phase kinase complex []. It was found to bind several F-box containing proteins (e.g., Cdc4, Skp2, cyclin F) and to be involved in the ubiquitin protein degradation pathway. A yeast homologue of SKP1 (P52286) was identified in the centromere bound kinetochore complex [] and is also involved in the ubiquitin pathway []. In Dictyostelium discoideum (Slime mold) FP21 was shown to be glycosylated in the cytosol and has homology to SKP1 []. This entry represents a POZ domain with a core structure consisting of beta(2)/alpha(2)/beta(2)/alpha(2) in two layers, alpha/beta. This domain is found at the N terminus of SKP1 proteins [
] as well as in subunit D of the centromere DNA-binding protein complex Cbf3 [].
Proteins containing this domain are highly divergent in their overall sequence; however, they share a common region of roughly 200 amino acids known as the SEC7 domain [[cite27373159], ]. The 3D structure of the domain displays several α-helices []. It was found to be associated with other domains involved in guanine nucleotide exchange (e.g., CDC25, Dbl) in mammalian guanine-nucleotide-exchange factors []. SEC7 domain-containing proteins are guanine nucleotide exchange factors (GEFs) specific for the ADP-ribosylation factors (ARFs), Ras-like GTPases which are important for vesicular protein trafficking. These proteins can be divided into five families, based on domain organisation and conservation of primary amino acid sequence: GBF/BIG, cytohesins, EFA6, BRAGs, and F-box [
]. They are found in all eukaryotes, and are involved in membrane remodeling processes throughout the cell [].
Peroxin 3 (Pex3p), also known as Peroxisomal biogenesis factor 3, has been identified and characterised as a peroxisomal membrane protein in yeasts and mammals [
]. Two putative peroxisomal membrane-bound Pex3p homologues have also been found in Arabidopsis thaliana []. They possess a membrane peroxisomal targeting signal. Pex3p is an integral membrane protein of peroxisomes, exposing its N- and C-terminal parts to the cytosol []. Peroxin is involved in peroxisome biosynthesis and integrity; it assembles membrane vesicles before the matrix proteins are translocated. In humans, defects in PEX3 are the cause of peroxisome biogenesis disorders [
], which include Zellweger syndrome (ZWS), neonatal adrenoleukodystrophy (NALD), infantile Refsum disease (IRD), and classical rhizomelic chondrodysplasia punctata (RCDP). These are peroxisomal disorders that are the result of proteins failing to be imported into the peroxisome.
DOCK family members are evolutionarily conserved guanine nucleotide exchange factors (GEFs) for Rho-family GTPases [
]. DOCK proteins are required during several cellular processes, such as cell motility and phagocytosis. The N-terminal SH3 domain of the DOCK proteins functions as an inhibitor of GEF, which can be relieved upon its binding to the ELMO1-3 adaptor proteins, after their binding to active RhoG at the plasma membrane [,
]. DOCK family proteins are categorised into four subfamilies based on their sequence homology: DOCK-A subfamily (DOCK1/180, 2, 5), DOCK-B subfamily (DOCK3, 4), DOCK-C subfamily (DOCK6, 7, 8), DOCK-D subfamily (DOCK9, 10, 11) []. All DOCKs contain two homology domains: the DHR-1 (Dock homology region-1), also called CZH1 (CED-5, Dock180, and MBC-zizimin homology 1), and DHR-2 (also called CZH2 or Docker).
The Nir/rdgB (N-terminal domain-interacting receptor/Drosophila retinal degeneration B proteins) family has been identified in a variety of eukaryotic organisms, ranging from worms to mammals. Members of this family are implicated in regulation of lipid trafficking, metabolism, and signaling. The Nir/rdgB proteins contain a 180-amino-acid-long conserved region in the central part of the protein. This domain contains four conserved residues, DDHD, which may form a metal-binding site. This domain is named DDHD after these four residues. This pattern of conservation of metal-binding residues is often seen in phosphoesterase domains [
]. The DDHD domain is found in the central part of Nir/rdgB proteins, as well as the C-terminal part of the phosphatidic acid-preferring phospholipase A1. The function of the DDHD domain is not currently known but it may be implicated in phospholipid metabolism, membrane turnover, or intracellular trafficking [].
Adaptor protein (AP) complexes function in protein transport via transport vesicles in different membrane traffic pathways. AP complexes are involved in the formation of clathrin-coated vesicles (CCVs) by recruiting the scaffold protein, clathrin. AP complexes are also involved in the cargo selection by recognising the sorting signals in proteins. Several AP complexes have been identified: AP-2 mediates endocytosis from the plasma membrane, while AP-1, AP-3 and AP-4 play a role in the endosomal/lysosomal sorting pathways [
,
]. AP complexes are heterotetramers which consist of two large subunits, one medium subunit and one small subunit. One of the large subunits (alpha, gamma, delta and epsilon) mediates binding to the target membrane. The other large subunit (beta) recruits clathrin through the clathrin-binding sequence []. This entry represents the AP complexes beta subunit.
The K homology domain is a common RNA-binding motif present in one or multiple copies in both prokaryotic and eukaryotic regulatory proteins. The KH motifs may act cooperatively to bind RNA in the case of multiple motifs, or independently in the case of single KH motif proteins. Prokaryotic (pKH) and eukaryotic (eKH) KH domains share a KH-motif, but have different topologies. The pKH domain has been found in a number of proteins, including the N-terminal domain of the S3 ribosomal protein [
], the C-terminal domain of Era GTPase [] and the two C-terminal domains of the NusA transcription factor []. The structure of the pKH domain consists of a two-layer α/β fold in the arrangement α/β(2)/α/β. This entry represents K homology domains, as well as related domains that share the same 2-layer α/β structure.
This family (also known as the ABI (abortive infection) family) contains putative intramembrane proteases (IMPs) and has homologues in all three domains of life, including Rce1 from S. cerevisiae [
]. Rce1 is a type II CAAX prenyl protease that processes all farnesylated and geranylgeranylated CAAX proteins. It is an integral membrane endoprotease localized to the endoplasmic reticulum that mediates the cleavage of the carboxyl-terminal three amino acids from CaaX proteins. It is involved in processing the Ras family of small GTPases, the gamma-subunit of heterotrimeric GTPases, nuclear lamins, and protein kinases and phosphatases []. Three residues of S. cerevisiae Rce1 (E156, H194 and H248) are critical for catalysis []. The structure of Rce1 from the archaeon Methanococcus (MmRce1) suggests that this group of proteins represents a novel IMP family, the glutamate IMPs [].
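The cleavage described above, removal of the carboxyl-terminal three amino acids from a CaaX protein, can be illustrated on a sequence string. This is a minimal sketch under a simplifying assumption: a substrate is recognised only by a cysteine at the fourth position from the C terminus, and prenylation state and the identity of the "aaX" residues are ignored. The function name and the toy sequences are hypothetical.

```python
def process_caax(protein):
    """Return the sequence after removal of the C-terminal 'aaX' tripeptide.

    Returns None when the sequence does not end in a C-a-a-X box, judged
    here only by a cysteine four residues from the C terminus (a
    simplification; real substrates must also be prenylated).
    """
    seq = protein.upper()
    if len(seq) < 4 or seq[-4] != "C":
        return None
    # Drop the last three residues so the cysteine becomes the new C terminus.
    return seq[:-3]


# Made-up sequences for illustration only.
print(process_caax("MTEYKCVLS"))  # ends in C-V-L-S, so the last three residues are removed
print(process_caax("MTEYKLVLS"))  # no cysteine at position -4, so not treated as a substrate
```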
Mop domains are present in a variety of bacterial and archaeal proteins and specifically bind molybdate (or tungstate). The simplest mop-containing proteins are the so-called molbindins, consisting entirely of either one or two mop domains. The physiological role of these proteins is unclear, although they have been implicated in molybdate storage and homeostasis. Other mop-containing proteins are ModC, a component of the high affinity ABC transporter, and ModE, the molybdate-dependent transcriptional regulator [
,
,
]. The mop domain consists of 68 amino acid residues arranged in six β-strands linked by short loops. It contains a β-barrel comprised of five antiparallel β-strands in a Greek key arrangement that is capped by amphipathic two-turn α-helices. The mop domain structure corresponds to the canonical oligonucleotide/oligosaccharide binding (OB) fold [
,
,
].
This entry includes CD36 and CD36-like proteins, including SCARB1/2 from vertebrates and SNMP1/2 from flies. CD36 is a transmembrane, highly glycosylated, 88 kDa glycoprotein expressed by monocytes, macrophages, platelets, microvascular endothelial cells and adipose tissues. Platelet glycoprotein IV (GP IV) (GPIIIb) (CD36 antigen) is also called GPIV, OKM5-antigen or PASIV. CD36 recognises oxidized low density lipoprotein, long chain fatty acids, anionic phospholipids, collagen types I, IV and V, thrombospondin (TSP) and Plasmodium falciparum infected erythrocytes. The recognition of apoptotic neutrophils is in co-operation with TSP and avb3. Other ligands may still be unknown. CD36 is a scavenger receptor for oxidized LDL and shed photoreceptor outer segments, functions in the recognition and phagocytosis of apoptotic cells, and is the cell adhesion molecule in platelet adhesion and aggregation, platelet-monocyte and platelet-tumor cell interaction [
].
NlpE is a bacterial outer membrane lipoprotein that is necessary for signalling by the Cpx pathway [
]. This pathway responds to cell envelope disturbances and increases the expression of periplasmic protein folding and degradation factors. While the molecular function of the NlpE protein is unknown, it may be involved in detecting bacterial adhesion to abiotic surfaces. In Escherichia coli and Salmonella typhi, NlpE is also known to confer copper tolerance in copper-sensitive strains of Escherichia coli, and may be involved in copper efflux and delivery of copper to copper-dependent enzymes [
]. NlpE consists of two β-barrel domains. The N-terminal domain resembles the bacterial lipocalin Blc, and the C-terminal domain has an oligonucleotide/oligosaccharide-binding (OB) fold [
]. This superfamily represents a domain found at the C terminus of the NlpE protein.
Bypass of forespore C (BofC) is a monomer made up of two domains, an N-terminal and a C-terminal domain. The N-terminal domain of BofC is composed of a four-stranded β-sheet covered by an α-helix. The β-sheet has a beta2-beta1-beta4-beta3 topology, where strands beta1 and beta2 and strands beta3 and beta4 are connected by β-turns, whereas strands beta2 and beta3 are joined by an α-helix that runs across one face of the β-sheet. This domain is similar to the third immunoglobulin G-binding domain of protein G from Streptococcus, the latter belonging to a large and diverse group of cell surface-associated proteins that bind to immunoglobulins. It has been hypothesised that this domain may be a mediator of protein-protein interactions involved in proteolytic events at the cell surface [
].
Cox proteins are expressed by Enterobacteria phages. The Cox protein is a 79-residue basic protein with a predicted strong helix-turn-helix DNA-binding motif. It inhibits integrative recombination and activates site-specific excision of the HP1 genome from the Haemophilus influenzae chromosome. Cox appears to function as a tetramer. Cox binding sites consist of two direct repeats of the consensus motif 5'-GGTMAWWWWA, one Cox tetramer binding to each motif. Cox binding interferes with the interaction of HP1 integrase with one of its binding sites, IBS5. This competition is central to directional control. Both Cox binding sites are needed for full inhibition of integration and for activating excision, because Cox plays a positive role in assembling the nucleoprotein complexes that produce excisive recombination, by inducing the formation of a critical conformation in those complexes [
].
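The Cox consensus motif 5'-GGTMAWWWWA given above uses IUPAC ambiguity codes (M is A or C, W is A or T). The following is a minimal sketch of how such a consensus could be turned into a regular expression and used to locate candidate sites on one strand; the function names and the toy DNA sequence are hypothetical, and only the codes needed for this motif are included in the lookup table.

```python
import re

# IUPAC nucleotide codes needed for this motif (a subset of the full table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "[AC]",  # amino: A or C
    "W": "[AT]",  # weak: A or T
}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus string into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_sites(dna, motif="GGTMAWWWWA"):
    """Return 0-based start positions of motif matches on the given strand."""
    # The lookahead allows overlapping matches to be reported.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]


# Made-up sequence carrying two direct repeats of the motif.
toy_dna = "CCGGTCATTTAAGCGGTAAATATACC"
print(iupac_to_regex("GGTMAWWWWA"))
print(find_sites(toy_dna))
```

The sketch scans a single strand only; searching the reverse complement would be needed to find sites in the opposite orientation.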
The Sp100 protein is a constituent of nuclear domains, also known as nuclear dots (NDs). An ND-targeting region that coincides with a homodimerisation domain was mapped in Sp100. Sequences similar to the Sp100 homodimerization/ND-targeting region occur in several other proteins and constitute a novel protein motif, termed HSR domain (for homogeneously-staining region) [
]. This domain can also be found in Vertebrate AutoImmune REgulator (AIRE) protein, a transcription regulator predominantly expressed in thymus medullary epithelial cells which also localises to nuclear dots []. The HSR domain, which has also been named ASS (AIRE, Sp-100 and Sp140) domain [
], can be found alone or in association with other domains, such as SAND, PHD finger and Bromo. The HSR domain is predicted to be predominantly α-helical [
,
,
].
Fanconi anemia (FA) is a genomic instability syndrome caused by mutations in at least 13 distinct genes whose products function in a common DNA repair signaling pathway, the FA pathway. The FA pathway cooperates with other DNA repair proteins for resolving DNA interstrand cross-links during replication [
]. Fanconi anemia group F protein (FANCF) is a component of the FA core complex [
,
]. FANCF regulates its own expression by methylation at both mRNA and protein levels. Methylation-induced inactivation of FANCF has an important role in the occurrence of ovarian cancers by disrupting the FA-BRCA pathway []. This entry represents the C-terminal region of the FANCF protein found in metazoa. The C-terminal domain has a helical repeat structure and is necessary for the proper assembly of the FA core complex [
].
This entry represents Torsin-1A-interacting proteins 1 and 2 (TOIP 1/2) also known as LAP1 proteins (Lamina-associated polypeptide 1), which are type 2 integral membrane proteins with a single membrane-spanning region of the inner nuclear membrane [
,
,
]. These proteins interact with and activate Torsin A, an AAA+ ATPase localized to the endoplasmic reticulum (ER), through a perinuclear domain and form a heterohexameric (LAP1-Torsin)3 ring that targets Torsin to the nuclear envelope. LAP1 has an atypical AAA+ fold and provides an arginine finger to the Torsin A active site to promote its ATPase activity [,
]. A single mutation in Torsin A causes early onset primary dystonia, a painful and severely disabling neuromuscular disease [,
]. This is the conserved C-terminal domain of LAP1, the AAA Activator domain [,
].
Cyanobactin biosynthesis protein, PatB/AcyB/McaB family (type: family).
Members of this protein family are small (~80 amino acids) and occur in biosynthesis clusters for cyanobactins, a type of ribosomal natural product, thiazole/oxazole-modified microcin (TOMM). The function of this protein family is unknown, and the recognised cyanobactin precursors (e.g., microcyclamides and patellamides) are encoded by a different protein. In this protein family, however, a core region of about 62 amino acids is followed by a hypervariable region of 5 to 23 amino acids, with hallmarks of possible cyclodehydratase modification sites. The hallmarks include Cys residues flanked by Gly, and variable length Ser-rich tripeptide repeats. Further, members of this family were shown to be dispensable for patellamide biosynthesis, and two may occur in a cluster. Therefore, this family may represent a precursor of another type of ribosomal natural product [
].
The PP-loop motif appears to be a modified version of the P-loop of nucleotide binding domain that is involved in phosphate binding [
]. It is named the PP-motif, since it appears to be a part of a previously uncharacterised ATP pyrophosphatase domain. ATP sulfurylases, Escherichia coli NtrL, and Bacillus subtilis OutB consist of this domain alone. In other proteins, the pyrophosphatase domain is associated with amidotransferase domains (type I or type II), a putative citrulline-aspartate ligase domain or a nitrilase/amidase domain. PP-loop ATPases are part of the HUP domain class (after HIGH-signature proteins, UspA, and PP-ATPase), along with the nucleotide-binding domains of class I aminoacyl-tRNA synthetases, UspA protein (USPA domains), photolyases, and electron transport flavoproteins (ETFP). The HUP domain is a distinct class of alpha/beta domain [
]. This entry represents MJ1016-type predicted PP-loop ATPases.
The PP-loop motif appears to be a modified version of the P-loop of nucleotide binding domain that is involved in phosphate binding [
]. It is named the PP-motif, since it appears to be a part of a previously uncharacterised ATP pyrophosphatase domain. ATP sulfurylases, Escherichia coli NtrL, and Bacillus subtilis OutB consist of this domain alone. In other proteins, the pyrophosphatase domain is associated with amidotransferase domains (type I or type II), a putative citrulline-aspartate ligase domain or a nitrilase/amidase domain. PP-loop ATPases are part of the HUP domain class (after HIGH-signature proteins, UspA, and PP-ATPase), along with the nucleotide-binding domains of class I aminoacyl-tRNA synthetases, UspA protein (USPA domains), photolyases, and electron transport flavoproteins (ETFP). The HUP domain is a distinct class of alpha/beta domain [
]. This entry represents MJ1599-type predicted PP-loop ATPases.
The PP-loop motif appears to be a modified version of the P-loop of nucleotide binding domain that is involved in phosphate binding [
]. It is named the PP-motif, since it appears to be a part of a previously uncharacterised ATP pyrophosphatase domain. ATP sulfurylases, Escherichia coli NtrL, and Bacillus subtilis OutB consist of this domain alone. In other proteins, the pyrophosphatase domain is associated with amidotransferase domains (type I or type II), a putative citrulline-aspartate ligase domain or a nitrilase/amidase domain. PP-loop ATPases are part of the HUP domain class (after HIGH-signature proteins, UspA, and PP-ATPase), along with the nucleotide-binding domains of class I aminoacyl-tRNA synthetases, UspA protein (USPA domains), photolyases, and electron transport flavoproteins (ETFP). The HUP domain is a distinct class of alpha/beta domain [
]. This group represents MJ1638-type predicted PP-loop ATPases.
This family represents a group of proteins from vertebrates that belong to the G-protein coupled receptor 1 family, including G-protein coupled estrogen receptor 1 (GPER1, also known as G-protein coupled receptor 30), the atypical chemokine receptor 3 and orphan receptors such as GP182 and GP146. GPER1 has been described as an atypical GPCR whose physiological and/or pathological roles remain to be determined. It was first reported to bind 17-beta-estradiol (E2) with high affinity, leading to rapid and transient activation of numerous intracellular signalling pathways; however, evidence confirming this is still lacking [,
]. In vitro studies suggest that E2 could bind GPER1 to trigger various intracellular signalling pathways, and in vivo studies do not exclude GPER1 as an ER in mediating estrogenic responses, but further investigation is needed to confirm this [].
HR1 was first described as a three times repeated homology region of the N-terminal non-catalytic part of protein kinase PRK1 (PKN) [
]. The first two of these repeats were later shown to bind the small G protein rho [,
] known to activate PKN in its GTP-bound form. Similar rho-binding domains also occur in a number of other protein kinases and in the rho-binding proteins rhophilin and rhotekin. Recently, the structure of the N-terminal HR1 repeat complexed with RhoA has been determined by X-ray crystallography. This domain contains two long alpha helices forming a left-handed antiparallel coiled-coil fold termed the antiparallel coiled-coil (ACC) finger domain. The two long helices encompass the basic region and the leucine repeat region, which are identified as the Rho-binding region [
,
,
].
This entry represents P40 nucleoproteins from several Borna disease virus (BDV) strains. BDV is an RNA virus that is a member of the order Mononegavirales, which includes such members as Measles virus and Ebola virus sp. BDV causes an infection of the central nervous system in a wide range of vertebrates, which can progress to an often fatal immune-mediated disease. Viral nucleoproteins are central to transcription, replication, and packaging of the RNA genome. P40 nucleoprotein from BDV is multi-helical in structure and can be divided into two subdomains, each of which has an α-bundle topology [
]. The nucleoprotein assembles into a planar homotetramer, with the RNA genome either wrapping around the outside of the tetramer or possibly fitting within the charged central channel of the tetramer [].
This entry represents the RNA recognition motif 1 (RRM1) of PHIP1. A. thaliana PHIP1 and its homologues represent a novel class of plant-specific RNA-binding proteins that may play a unique role in the polarized mRNA transport to the vicinity of the cell plate [
]. PHIP1 is a peripheral membrane protein and is localized at the cell plate during cytokinesis in plants. In addition to phragmoplastin, PHIP1 interacts with two Arabidopsis small GTP-binding proteins, Rop1 and Ran2. However, PHIP1 interacts only with the GTP-bound form of Rop1 and not with the GDP-bound form. It also binds specifically to Ran2 mRNA []. This group of proteins consists of multiple functional domains, including a lysine-rich domain (KRD domain) that contains three nuclear localization motifs (KKKR/NK), two RNA recognition motifs (RRMs), and three CCHC-type zinc fingers.
This entry represents the RNA recognition motif 2 (RRM2) of PHIP1. A. thaliana PHIP1 and its homologues represent a novel class of plant-specific RNA-binding proteins that may play a unique role in the polarized mRNA transport to the vicinity of the cell plate [
]. PHIP1 is a peripheral membrane protein and is localized at the cell plate during cytokinesis in plants. In addition to phragmoplastin, PHIP1 interacts with two Arabidopsis small GTP-binding proteins, Rop1 and Ran2. However, PHIP1 interacts only with the GTP-bound form of Rop1 and not with the GDP-bound form. It also binds specifically to Ran2 mRNA []. This group of proteins consists of multiple functional domains, including a lysine-rich domain (KRD domain) that contains three nuclear localization motifs (KKKR/NK), two RNA recognition motifs (RRMs), and three CCHC-type zinc fingers.
This entry represents the RRM of PDIP3. Polymerase delta-interacting protein 3 (PDIP3), also termed SKAR or 46 kDa DNA polymerase delta interaction protein (PDIP46), belongs to the Aly/REF family of RNA binding proteins that have been implicated in coupling transcription with pre-mRNA splicing and nucleo-cytoplasmic mRNA transport [
,
]. PDIP3 is widely expressed and localizes to the nucleus. It may be a critical player in the function of S6K1 in cell and organism growth control by binding the activated, hyperphosphorylated form of S6K1 but not S6K2 []. Furthermore, PDIP3 functions as a protein partner of the p50 subunit of DNA polymerase delta []. In addition, PDIP3 may have particular importance in pancreatic beta cell size determination and insulin secretion []. PDIP3 contains a well conserved RNA recognition motif (RRM).
This entry represents the RNA recognition motif (RRM) in Pin4, a novel phosphothreonine (pThr)-containing protein that specifically interacts with the pThr-binding site of the Rad53 FHA1 domain. Pin4 is involved in normal G2/M cell cycle progression in the absence of DNA damage and functions as a novel target of checkpoint-dependent cell cycle arrest pathways [
,
]. It contains an N-terminal RRM, a nuclear localization signal, a coiled coil, and a total of 15 SQ/TQ motifs. In S. pombe, Cip1 (Csx1-interacting protein 1) and Cip2 (Csx1-interacting protein 2) are novel cytoplasmic RRM-containing proteins that counteract Csx1 function during oxidative stress. They are not essential for viability in S. pombe [
]. Both Cip1 and Cip2 contain one RRM. Like Pin4, Cip2 also possesses an R3H motif that may function in sequence-specific binding to single-stranded nucleic acids.
At least five distinct pathways exist for the catabolism of propionate by way of propionyl-CoA. Most members of this family are bacterial proteins known or predicted to act as 2-methylcitrate dehydratase, an enzyme that catalyses the third step of the methylcitrate cycle of propionate catabolism [
]. A related clade of archaeal proteins that may or may not be functionally equivalent is excluded from this family. The PrpD enzyme of Escherichia coli is responsible for the minor aconitase activity (AcnC) not accounted for by AcnA and AcnB []. Some proteins in this entry are annotated as MmgE, a Bacillus subtilis protein encoded within an operon that is expressed during sporulation and is subject to catabolite repression []. An enzyme from Bacillus subtilis (citrate/2-methylcitrate synthase) acts as both a citrate synthase and a methylcitrate synthase [,
].
This entry represents P40 nucleoproteins from several Borna disease virus (BDV) strains. BDV is an RNA virus that is a member of the order Mononegavirales, which includes such members as Measles virus and Ebola virus sp. BDV causes an infection of the central nervous system in a wide range of vertebrates, which can progress to an often fatal immune-mediated disease. Viral nucleoproteins are central to transcription, replication, and packaging of the RNA genome. P40 nucleoprotein from BDV is multi-helical in structure and can be divided into two subdomains, each of which has an α-bundle topology [
]. The nucleoprotein assembles into a planar homotetramer, with the RNA genome either wrapping around the outside of the tetramer or possibly fitting within the charged central channel of the tetramer [].
The Spot 14 family includes thyroid hormone-inducible hepatic protein (Spot 14), Mid1-interacting protein and related sequences. Spot 14 is mainly expressed in tissues that synthesise triglycerides, and the mRNA coding for it has been shown to be increased in rat liver by insulin, dietary carbohydrates, glucose in hepatocyte culture medium, and thyroid hormone. In contrast, dietary fats and polyunsaturated fatty acids have been shown to decrease the amount of Spot 14 mRNA, while an elevated level of cAMP acts as a dominant negative factor. In addition, liver-specific factors or chromatin organisation of the gene have been shown to contribute to the regulation of its expression [
]. Spot 14 protein is thought to be required for induction of hepatic lipogenesis []. Mid1-interacting protein is involved in stabilisation of microtubules [].
This entry includes a group of proteins involved in chromatin remodelling, including Vps72 (vacuolar protein sorting-associated protein 72) from budding yeasts. Vps72 is an Htz1-binding component of the SWR1 complex, which is required for the incorporation of the histone variant H2AZ into chromatin [
]. It is also required for vacuolar protein sorting in budding yeasts []. The Vps72 homologue from animals, YL-1, is a deposition-and-exchange histone chaperone that is specific for H2AZ1 and deposits it into nucleosomes. It is a component of the SRCAP and Tip60 complexes, and mediates the ATP-dependent exchange of histone H2AZ1/H2B dimers for nucleosomal H2A/H2B, leading to transcriptional regulation of selected genes by chromatin remodelling [
,
]. This is the N-terminal region from Vps72/YL1, which covers the aspartate- and glutamate-rich N-terminal domain and the central DNA-binding domain [,
].
Xeroderma pigmentosum (XP) [
] is a human autosomal recessive disease, characterised by a high incidence of sunlight-induced skin cancer. Skin cells of individuals with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. XP-A is the most severe form of the disease and is due to defects in a 30 kDa nuclear protein called XPA (or XPAC) [
]. The sequence of the XPA protein is conserved from higher eukaryotes [
] to yeast (gene RAD14) [
]. XPA is a hydrophilic protein of 247 to 296 amino-acid residues which has a C4-type zinc finger motif in its central section. This entry represents the uncharacterised C-terminal domain of the XPA protein.
Intermediate filaments (IF) [
,
,
] are proteins which are primordial components of the cytoskeleton and the nuclear envelope. They generally form filamentous structures 8 to 14 nm wide. Type II keratins are the basic or neutral counterparts to the acidic type I keratins. Each type II keratin forms a heterodimer with a specific acidic keratin, and the heterodimers are organised into tetramers and then into chains []. Type II keratins consist of head-, rod- and tail-like structures, the rod being constructed from three linked coils: 1A, 1B and 2. The head and tail structures of the type II keratins are highly variable low-complexity regions.
This protein family also includes Intermediate filament protein ON3 from Carassius auratus (Goldfish), one of the non-neuronal predominant intermediate filament proteins of the visual pathway [
].
The Yersinia adhesin A (YadA) is a trimeric autotransporter adhesin of enteric yersiniae. It consists of three major domains: a head mediating adherence to host cells, a stalk involved in serum resistance, and an anchor that forms a membrane pore and is responsible for the autotransport function []. This entry represents the C-terminal membrane anchor domain of YadA. Proteins containing this domain also include UspA2 from Moraxella, Eib immunoglobulin-binding proteins from E. coli, and DsrA proteins of Haemophilus ducreyi. These proteins are homologous at their C termini and have predicted signal sequences, but they diverge elsewhere. The C-terminal 9 amino acids, consisting of alternating hydrophobic amino acids ending in F or W, comprise a targeting motif for the outer membrane of the Gram-negative cell envelope. This region is important for oligomerisation [
].
Bypass of forespore C (BofC) is a monomer made up of two domains, an N-terminal and a C-terminal domain. The N-terminal domain of BofC is composed of a four-stranded β-sheet covered by an α-helix. The β-sheet has a beta2-beta1-beta4-beta3 topology, where strands beta1 and beta2 and strands beta3 and beta4 are connected by β-turns, whereas strands beta2 and beta3 are joined by an α-helix that runs across one face of the β-sheet. This domain is similar to the third immunoglobulin G-binding domain of protein G from Streptococcus, the latter belonging to a large and diverse group of cell surface-associated proteins that bind to immunoglobulins. It has been hypothesised that this domain may be a mediator of protein-protein interactions involved in proteolytic events at the cell surface [
].
The four families of large eukaryotic DNA viruses, Poxviridae, Asfarviridae, Iridoviridae, and Phycodnaviridae, referred to collectively as nucleocytoplasmic large DNA viruses or NCLDV, have all been shown to have a lipid membrane, in spite of the major differences in virion structure. The paralogous genes L1R and F9L encode membrane proteins that have a conserved domain architecture, with a single, C-terminal transmembrane helix, and an N-terminal, multiple-disulphide-bonded domain. The conservation of the myristoylated, disulphide-bonded protein L1R/F9L in most of the NCLDV correlates with the conservation of the thiol-disulphide oxidoreductase E10R which, in vaccinia virus, is required for the formation of disulphide bonds in L1R and F9L [
]. This entry also includes inner membrane protein pE248R from African swine fever virus. This protein is essential for viral fusion with the host endosomal membrane and for core release [
].
Vinculin is an actin filament (F-actin) binding protein involved in cell-matrix adhesion and cell-cell adhesion [
]. In addition to actin, vinculin interacts with other structural proteins such as talin and alpha-actinins []. Vinculin is a large protein of 116 kDa (about 1000 residues). Structurally, the protein consists of an acidic N-terminal domain of about 90 kDa separated from a basic C-terminal domain of about 25 kDa by a proline-rich region of about 50 residues. The central part of the N-terminal domain consists of a variable number (3 in vertebrates, 2 in Caenorhabditis elegans) of repeats of a 110-amino-acid domain. Vinculin exists in two conformations in the cell: an open, active form and a closed, auto-inhibited state in which the head domain forms extensive interactions with the tail [
].
Herpesviruses have been implicated as a cause of tumours in a number of species. The common tumour condition of the fowl is caused by Meleagrid herpesvirus 1 (MeHV-1). Although infection is ubiquitous, only a proportion of fowl develop tumours, and the precise details of the process of tumour induction remain to be elucidated. Secretory glycoprotein GP57-65 precursor (glycoprotein A, GA) is thought to play an immunoevasive role in the pathogenesis of Marek's disease. It is a candidate for causing the early-stage immunosuppression that occurs after MDHV infection. The protein is predominantly secreted, but a small amount of mature GP57-65 is anchored in the plasma membrane, or held by other interactions. GA is similar to Herpesvirus glycoprotein C, and belongs to the immunoglobulin gene superfamily [
,
].
This domain, consisting of the distinct N-terminal PRY subdomain followed by the SPRY subdomain, is found at the C terminus of the Ret finger protein-like (RFPL) protein family, which includes RFPL1, RFPL2, RFPL3 and RFPL4. In humans, RFPL transcripts can be detected at the onset of neurogenesis in differentiating embryonic stem cells, and in the developing neocortex [
]. The human RFPL1, RFPL2 and RFPL3 genes have a role in neocortex development. RFPL1 is a primate-specific target gene of Pax6, a key transcription factor for pancreas, eye and neocortex development; human RFPL1 decreases cell number through its RFPL-defining motif (RDM) and SPRY domains []. The RFPL4 (also known as RFPL4A) gene encodes a putative E3 ubiquitin-protein ligase that is expressed in adult germ cells and interacts with oocyte proteins of the ubiquitin-proteasome degradation pathway [].
Xeroderma pigmentosum (XP) [
] is a human autosomal recessive disease, characterised by a high incidence of sunlight-induced skin cancer. Skin cells of individuals with this condition are hypersensitive to ultraviolet light, due to defects in the incision step of DNA excision repair. There are a minimum of seven genetic complementation groups involved in this pathway: XP-A to XP-G. XP-A is the most severe form of the disease and is due to defects in a 30 kDa nuclear protein called XPA (or XPAC) [
]. The sequence of the XPA protein is conserved from higher eukaryotes [
] to yeast (gene RAD14) [
]. XPA is a hydrophilic protein of 247 to 296 amino-acid residues which has a C4-type zinc finger motif in its central section.
This superfamily represents a domain of the XPA protein.
This entry represents the first SH3 domain found in the F-BAR and double SH3 domains proteins, including FCHSD1 and FCHSD2. These proteins have a common domain structure consisting of an N-terminal F-BAR (FES-CIP4 Homology and Bin/Amphiphysin/Rvs), two SH3, and C-terminal proline-rich domains [
]. FCHSD1 and FCHSD2 are expressed in cochlear sensory hair cells. FCHSD2 interacts with WASP and N-WASP, and stimulates F-actin assembly in vitro. FCHSD1 does not bind to WASP, but binds via its F-BAR domain to the SH3 domain of Sorting Nexin 9, a BAR protein that has been shown to promote WASP-Arp2/3-dependent F-actin polymerization []. FCHSD2 has been implicated in receptor endocytosis and transport []. The insect protein nervous wreck, which acts as a regulator of synaptic growth signaling [], also contains this domain.
This is the N-terminal transactivation domain 2 (TAD2) found in p53 proteins. p53 contains two TADs, termed TAD1 (residues 1-39) and TAD2 (residues 40-61). Both have been shown to activate gene transcription independently, and both are intrinsically disordered protein domains that adopt a helical conformation for at least part of their length when bound. This inherent flexibility allows the TADs to adapt to and bind a broad range of proteins. This entry describes TAD2, which can independently interact with the Taz2 domain of the histone acetyltransferase p300 [
]. It has also been shown to bind to the OB-fold domain of replication protein 70 A (RPA) [] as well as the pleckstrin homology (PH) domain of the p62 and Tfb1 subunits of human and yeast TFIIH [].
Protein slowmo (Slmo) is a mitochondrial protein from Drosophila involved in development of the nervous system [
] and germline proliferation []. Slmo contains a conserved PRELI/MSF1 domain, found in proteins from a wide variety of eukaryotic organisms []. Slmo is a homologue of the yeast UPS proteins Ups1, Ups2, and Ups3, which control phospholipid metabolism in the mitochondrial intermembrane space. Humans possess four principal homologues of the Ups family, namely PRELID1 (also known as PRELI), PRELID2, SLMO1 and SLMO2 (also termed PRELID3a and PRELID3b) []. The Ups family is evolutionarily conserved from yeast to man and, together with its mitochondrial chaperones (TRIAP1/Mdm35), represents a unique heterodimeric lipid transfer system. In yeast mitochondria, Ups1 and Ups2 form tight complexes with Mdm35 and together they regulate the subsequent biosynthesis of cardiolipin and phosphatidylethanolamine [
].