ThermoDBP is a single-stranded DNA-binding (SSB) protein of the Thermoproteales. SSB proteins are essential for the genome maintenance of all known cellular organisms. Many SSBs contain an OB fold domain, albeit with low sequence conservation, and OB fold-containing SSB proteins have been detected in all three domains of life. However, the Thermoproteales SSB protein, ThermoDBP, lacks the OB fold and binds specifically to ssDNA with low sequence specificity. Its three-dimensional structure resembles that of the Hut operon positive regulatory protein HutP [
].
TTHB210 is an uncharacterized protein found in Thermus thermophilus, and is controlled by the sigma(E)/anti-sigma(E) regulatory system. It is one of the five gene products regulated by the extracytoplasmic function (ECF) sigma factor sigma(E) whose physiological functions have not been determined. Its crystallographic structure reveals a novel homodecamer, although it is a dimer in solution [
].
This entry contains cuticle proteins with a CX(5)C motif, although some members have a CX(7)C motif. In Anopheles gambiae, mRNA for this protein is most abundant immediately following ecdysis in larvae, pupae and adults, and is localised primarily in epidermis that secretes hard cuticle, sclerites, setae, head capsules, appendages and spermatheca. EM immunolocalisation studies have shown that the protein is present in the endocuticle of legs and antennae. CPCFC is found throughout the Hexapoda and in several classes of Crustacea [
].
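As a rough illustration of the motif notation used above, CX(5)C denotes two cysteines separated by exactly five residues of any kind, and CX(7)C is the seven-residue variant. The Python sketch below scans a sequence for either spacing. The function name and the toy sequence are hypothetical and are not taken from any real cuticle protein.

```python
import re

# C, then exactly 5 or exactly 7 residues of any kind, then C.
# The lookahead lets overlapping occurrences be reported; if both spacings
# match at the same start position, only the CX(5)C form is returned.
CXNC = re.compile(r"(?=(C.{5}C|C.{7}C))")

def find_cxnc_motifs(seq):
    """Return (0-based start, matched text) for each CX(5)C or CX(7)C hit."""
    return [(m.start(), m.group(1)) for m in CXNC.finditer(seq.upper())]

# Hypothetical toy sequence, for illustration only.
toy = "MKCAAAAACGGCDDDDDDDCK"
hits = find_cxnc_motifs(toy)
```

A regular expression is used here only because the motif is a simple fixed-spacing pattern; it says nothing about whether a hit is a genuine member of this entry.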
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. This superfamily entry represents the ribosomal protein S14 found in the small (30S) subunit of the bacterial, algal and plant chloroplast ribosome. This entry also includes homologues such as cyanelle S14, archaebacteria Methanococcus vannielii S14, as well as yeast mitochondrial MRP2, yeast YS29A/B, and mammalian S29. In E. coli, S14 is known to be required for the assembly of 30S particles and may also be responsible for determining the conformation of 16S rRNA at the A site. S29 has zinc-finger-like motifs [
].
Plexin domain-containing protein 1 (TEM7) and plexin domain-containing protein 2 (TEM7R) were identified as tumour endothelial markers (TEMs) that displayed elevated expression during tumour angiogenesis [
]. TEM7 encodes a type I transmembrane protein with a large extracellular domain, a hydrophobic transmembrane domain, and a short cytoplasmic tail. Its cell surface expression is essential during endothelial cell capillary morphogenesis [].
This is the C-terminal domain of protein NDNF (Neuron-derived Neurotrophic Factor). NDNF is a glycosylated and disulfide-bonded secreted protein expressed in brain and spinal cord that promotes migration and neurite growth of hippocampal neurons. It also promotes endothelial cell survival and vessel formation, and plays an important role in the process of revascularization. This domain contains one of the two fibronectin type III domains (FnIII) present in NDNF and one of the two potential consensus sites for N-linked glycosylation [
,
].
TraY is part of the relaxosome, a complex of DNA-processing proteins required for the initiation of conjugative DNA transfer. It facilitates a site- and strand-specific cut in the origin of transfer by TraI, at the nic site [
].
This entry includes FYN-binding protein 1 and 2 (FYB1, FYB2) and PML-RARA-regulated adapter molecule 1 (PRAM1). Most of the proteins in this entry contain an altered SH3 domain fold. FYN-binding protein 1 (FYB1), also known as ADAP or SLAP, is an adapter protein in beta 1 integrin signalling and T lymphocyte migration [
]. It has been found to co-localise with F-actin in membrane ruffles, adhesion plaques/podosomes and phagocytic cups [,
]. In activated T cells, Fyb/SLAP associates with Ena/VASP family proteins and may link T cell signalling to the actin cytoskeleton remodelling []. FYB1 also interacts with mammalian actin binding protein 1 (mAbp1) that affects F-actin dynamics []. FYN-binding protein 2 (FYB2, also known as ARAP) is an adaptor protein required for TCR signaling and integrin-mediated adhesion [
]. PRAM1 is an intracellular adaptor molecule that is upregulated during the induced granulocytic differentiation of promyelocytic leukemic cells and during normal human myelopoiesis [
].
This entry includes brain-enriched guanylate kinase-associated protein (BEGAIN) and tight junction-associated protein 1 (Tjap1). Brain-enriched guanylate kinase-associated protein (BEGAIN) interacts with postsynaptic density protein PSD-95/SAP90 and may be involved in the organization of the components of synaptic junctions [
]. Tight junction-associated protein 1 (also known as Pilt) is a family of eukaryotic tight junction proteins [
]. Pilt is a component of TJs (tight junctions) rather than AJs (adherens junctions). TJs function as a barrier preventing solutes and water from passing freely through the paracellular pathway. TJs consist of transmembrane proteins (claudin, occludin, and JAM) plus many peripheral membrane proteins and cell polarity molecules. Pilt is a novel peripheral membrane protein which is incorporated into TJs after TJ strands are formed, hence the name Pilt for 'protein incorporated later into TJs'. Pilt binds to the guanylate-kinase region of hDlg/SAP97 (disk large homologue/synapse-associated protein 97) [].
Tmem119 is an osteoblast induction factor, also known as Obif, which encodes a single transmembrane protein. It promotes the differentiation of myoblasts into osteoblasts [
,
], and plays an essential role in bone formation by regulating osteoblastogenesis [].
This entry represents the B3 protein from Orthopoxvirus. Proteins in this entry include poxin-Schlafen from Cowpox virus. It is a nuclease responsible for viral evasion of host cGAS-STING innate immunity [
].
In eukaryotes, the translation of mRNA is initiated by the binding of the eIF4F complex, which is composed of eIF4E, eIF4A and eIF4G proteins. eIF4E-binding proteins (4E-BPs) are involved in translational regulation through their interaction with eIF4E. Two 4E-BPs are found in S. cerevisiae, Caf20 and Eap1 [
]. This entry represents Caf20 (also known as p20), which competes with eIF4G for binding to eIF4E and interferes with the formation of the eIF4F complex, hence inhibiting translation [
,
]. Caf20 is needed for the induction of pseudohyphal growth in response to nitrogen limitation [].
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. This entry represents one of two closely related subfamilies that belong to the larger family of CRISPR-associated protein TM1801. Members are the Csd2 proteins of the Dvulg subtype of the CRISPR/cas system. CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats. A related entry is
, the Csh2 protein of the Hmari CRISPR subtype.
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes []. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. This entry represents one of two closely related subfamilies that belong to the larger family of CRISPR-associated protein TM1801. Members are the Csh2 protein of the Hmari subtype of the CRISPR/cas system. CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats. A related model is
, the Csd2 protein of the Dvulg CRISPR subtype.
This entry includes protein BRANCHLESS TRICHOME (BLT, At1g64690) and At5g41620 from Arabidopsis. BLT is a regulator of trichome branching. It interacts with STICHEL, another key regulator of trichome branching [
]. The function of At5g41620 is not clear.
This entry includes human Holliday junction recognition protein (HJURP) and its homologue, Scm3 from budding yeasts. HJURP is a histone chaperone that plays a central role in the incorporation and maintenance of histone H3-like variant CENP-A at centromeres [
]. Scm3 is a non-histone component of centromeric chromatin that binds to CenH3-H4 histones, which are required for kinetochore assembly. Scm3 is required for the localisation of Cse4 (the CENP-A homologue) and for its centromeric association [
,
]. The histone H3 variant Cse4 replaces conventional histone H3 in centromeric chromatin and helps direct the assembly of the kinetochore. In addition, Scm3 is required for G2/M progression [
] and is required to maintain kinetochore function throughout the cell cycle. Scm3 contains a nuclear export signal (NES). The N-terminal region of Scm3 is well conserved and functions as the CenH3-interacting domain, while the C-terminal region is variable in size and sometimes consists of DNA binding motifs [].
These sequences represent a group of paralogous families in Plasmodium species alternately annotated as reticulocyte binding protein, 235kDa family protein and rhoptry protein. Rhoptry protein is localised on the cell surface, is extremely large (although apparently lacking in repeat structure), and is important for the invasion of red blood cells (RBCs) by the parasite [
,
]. These proteins are found in Plasmodium.
These sequences represent an uncharacterised family of proteins from a number of phage of Gram-positive bacteria, including Streptococcus phage Sfi11 and Lactobacillus phage JNU_P5. This protein contains a P-loop motif, G/A-X-X-G-X-G-K-T near its amino end. The characteristics of the protein distribution suggest prophage matches in addition to the phage matches. The function of these proteins is unknown.
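The P-loop motif quoted above, G/A-X-X-G-X-G-K-T, can be read as a simple pattern in which X stands for any residue. The Python sketch below is only an illustration under that reading: the function name, the `window` parameter (a stand-in for "near its amino end") and the toy sequence are assumptions, not part of the entry.

```python
import re

# G/A-X-X-G-X-G-K-T written as a regular expression; "." stands for X (any residue).
P_LOOP = re.compile(r"[GA]..G.GKT")

def find_p_loop(seq, window=None):
    """Return 0-based start positions of the motif.

    If `window` is given, only the first `window` residues are searched,
    as a crude way to restrict the scan to the amino-terminal region.
    """
    region = seq.upper() if window is None else seq.upper()[:window]
    return [m.start() for m in P_LOOP.finditer(region)]

# Hypothetical toy sequence, for illustration only.
toy = "MSNLGPSGSGKTTLAR"
positions = find_p_loop(toy, window=50)
```

A pattern match of this kind only flags candidate sites and does not by itself establish nucleotide binding.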
This entry represents a family of phage and plasmid replication proteins. In bacteriophage IKe and related phage, the full-length protein is designated gene II protein (G2P). A much shorter protein of unknown function, translated from a conserved in-frame alternative initiator, is designated gene X protein (G10P). Members of this family also include plasmid replication proteins.
This entry includes the EF-hand calcium-binding domain-containing proteins SPT21 (spermatogenesis-associated protein 21). SPT21 is involved in the differentiation of haploid spermatids [
]. Many calcium-binding proteins belong to the same evolutionary family and share a type of calcium-binding domain known as the EF-hand. This type of domain consists of a twelve-residue loop flanked on both sides by a twelve-residue α-helical domain. In an EF-hand loop the calcium ion is coordinated in a pentagonal bipyramidal configuration. The six residues involved in the binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y, -X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding Ca (bidentate ligand). Ca2+ binding induces a conformational change in the EF-hand motif, leading to the activation or inactivation of target proteins. EF-hands tend to occur in pairs or higher copy numbers [
,
,
,
,
].
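The positional scheme described above for the EF-hand loop (positions 1, 3, 5, 7, 9 and 12, labelled X, Y, Z, -Y, -X and -Z) can be written down as a small lookup table. The Python sketch below assumes a loop supplied as a 12-letter string; the function names and the example loop are illustrative only.

```python
# Loop positions (1-based) that coordinate calcium, with their conventional labels.
EF_HAND_LIGANDS = {1: "X", 3: "Y", 5: "Z", 7: "-Y", 9: "-X", 12: "-Z"}

def ligand_residues(loop):
    """Map each coordinating label to the residue at that position of a 12-residue loop."""
    if len(loop) != 12:
        raise ValueError("an EF-hand loop is expected to have 12 residues")
    return {label: loop[pos - 1] for pos, label in EF_HAND_LIGANDS.items()}

def has_acidic_position_12(loop):
    """Check for the invariant Glu or Asp at position 12 described in the text."""
    return ligand_residues(loop)["-Z"] in ("E", "D")

# Illustrative 12-residue loop; not claimed to come from a member of this entry.
toy_loop = "DKDGDGTITTKE"
ligands = ligand_residues(toy_loop)
```

The check on position 12 reflects only the Glu/Asp rule stated above and is not a full EF-hand predictor.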
Mitosin or centromere-associated protein-F (Cenp-F) is a coiled-coil protein that dimerizes and localizes to diverse subcellular locations, including microtubule plus-ends, mitochondria, nuclear pores, and kinetochores [
]. It localizes to kinetochores during mitosis and is then rapidly degraded after mitosis. It is required for kinetochore-microtubule interactions and spindle checkpoint function []. Cenp-F contains two microtubule-binding domains, and physically associates with dynein motor regulators []. Cenp-F has also been shown to couple mitochondria to dynamic microtubule tips [].
This is a family of proteins found in SARS and SARS-like coronaviruses. It includes Protein 9b from SARS coronavirus 2 (SARS-CoV-2), Human SARS coronavirus (SARS-CoV) and Bat coronaviruses. Protein 9b is one of 8 accessory proteins in SARS-CoV [
]. The gene (ORF 9b, also known as ORF13) that encodes this protein is included within the nucleocapsid (N) gene (alternative ORF) []. Data suggest that protein 9b is a structural component of SARS-CoV virions and functions as an unusual lipid binding protein [,
]. SARS-CoV ORF-9b has been shown to localise to the outer mitochondrial membrane and to target mitochondrial antiviral signalling proteins (MAVS), suppressing innate immunity [
,
,
]. Antibodies against SARS-CoV ORF-9b have been found in patients, demonstrating that it is produced during infection [,
]. Protein 9b from SARS-CoV comprises 98 amino acids. Its structure has a novel fold which forms a dimeric tent-like beta structure with an amphipathic surface, and a central hydrophobic cavity that binds lipid molecules [
]. This cavity is likely to be involved in membrane attachment []. Protein 9b is a group-specific protein of SARS coronavirus (CoV). The sequence of ORF-9b is well conserved in different SARS isolates; however, there is little homology between Protein 9b from SARS-CoV and the I protein (Protein 9b homologue) present in other coronaviruses [
,
,
].
Frequency (FRQ) is a key circadian clock component that is involved in the generation of biological rhythms in Neurospora crassa [
]. It plays a role in rhythm stability, period length, and temperature compensation. FRQ behaves as a negative element in the circadian transcriptional loop []. The protein has been shown to interact with itself via a coiled-coil [].
Leukemia-associated protein 7 (DLEU7 or LEU7) is a gene of unknown function deleted in lymphocytic leukemia [
]. It is a vertebrate-specific gene that might have a role in development [].
The transcriptional corepressor CtBP is a dehydrogenase with sequence and structural similarity to the D2-hydroxyacid dehydrogenase family. CtBP was initially identified as a protein that bound the PXDLS sequence at the adenovirus E1A C terminus, causing the loss of CR-1-mediated transactivation. CtBP binds NAD(H) within a deep cleft, undergoes a conformational change upon NAD binding, and has NAD-dependent dehydrogenase activity [
,
].
MCM proteins are DNA-dependent ATPases required for the initiation of eukaryotic DNA replication [
,
,
]. In eukaryotes there is a family of eight proteins, MCM2 to MCM9. They were first identified in yeast, where most of them have a direct role in the initiation of chromosomal DNA replication by interacting directly with autonomously replicating sequences (ARS). They were thus called minichromosome maintenance proteins, MCM proteins [
]. These proteins are evolutionarily related and belong to the AAA+ superfamily. They contain the Mcm family domain, which includes motifs that are required for ATP hydrolysis (such as the Walker A and B, and R-finger motifs). Mcm2-7 forms a hexameric complex [] in which individual subunits associate with different affinities, and there is a tightly associated core of Mcm4 (Cdc21), Mcm6 (Mis5) and Mcm7 []. The Mcm2-7 complex is the replicative helicase involved in replication initiation and elongation [], whereas Mcm8 and Mcm9 form a separate complex, conserved among many eukaryotes except yeast and C. elegans. The Mcm8/9 complex plays a role during replication elongation or recombination, being involved in the repair of double-stranded DNA breaks and DNA interstrand cross-links by homologous recombination. Drosophila is the only organism that has MCM8 without MCM9, involved in meiotic recombination [,
]. This family is also present in the archaebacteria in 1 to 4 copies. Methanocaldococcus jannaschii (Methanococcus jannaschii) has four members, MJ0363, MJ0961, MJ1489 and MJECL13. Schizosaccharomyces pombe (Fission yeast) MCMs, like those in metazoans, are found in the nucleus throughout the cell cycle. This is in contrast to Saccharomyces cerevisiae (Baker's yeast), in which MCM proteins move in and out of the nucleus during each cell cycle. The assembly of the MCM complex in S. pombe is required for MCM localisation, ensuring that only intact MCM complexes remain in the nucleus [
].
These sequences represent the CopA copper resistance protein family. CopA is related to laccase (benzenediol:oxygen oxidoreductase) and L-ascorbate oxidase, both copper-containing enzymes. Most members have a typical TAT (twin-arginine translocation) signal sequence with an Arg-Arg pair. Twin-arginine translocation is observed for a large number of periplasmic proteins that cross the inner membrane with metal-containing cofactors already bound. The combination of copper-binding sites and TAT translocation motif suggests a mechanism of resistance by packaging and export.
Small vasohibin-binding protein (SVBP) acts as a secretory chaperone for VASH1 and contributes to its anti-angiogenic activity [
,
]. SVBP also interacts with VASH2 and affects its angiogenesis activity [].
Tail virion protein 7P is a viral transmembrane protein that interacts with the packaging signal of the viral genome, leading to the initiation of the concomitant virion assembly-budding process in the host inner membrane.
This family of proteins is found in eukaryotes. Proteins in this family are typically between 152 and 182 amino acids in length. Their function is not known.
This family of proteins is found in eukaryotes. Proteins in this family are typically between 117 and 297 amino acids in length. The function is not known but there are many highly conserved proline residues.
Soti is a post-meiotically transcribed gene that is required in late spermiogenesis for normal spermatid individualisation. It is also expressed in primary spermatocytes and round spermatids [
].
Swm2 (Synthetic With MUD-2-delta protein 2) binds Tgs1, an enzyme responsible for 2,2,7-trimethylguanosine (TMG) capping of small nuclear (sn) RNAs implicated in pre-mRNA splicing [,
,
]. Deletion of SWM2 impairs pre-mRNA splicing and ribosome biogenesis, giving a phenotype similar to that of the tgs1 mutant. Swm2 is required for trimethylation of spliceosomal snRNAs and the U3 snoRNA by Tgs1 [].
TDRP is a family of proteins found in chordates. It is predominantly expressed in the testis, distributed in both cytoplasm and the nuclei of spermatogenic cells. It may act as a nuclear factor with an important role in spermatogenesis [
].
The function of C6orf120 protein is not clear. C6orf120 recombinant protein has been shown to induce apoptosis of Jurkat cells and primary CD4(+) T cells [
].
This entry represents a lineage-specific bacterial ribosomal small subunit protein bTHX (previously THX), originally shown to exist in the genus Thermus [
]. The protein is conserved for the first 26 amino acids, past which some members continue with additional sequence, often repetitive or low-complexity. This entry also includes 30S ribosomal proteins from plant mitochondria and plastids, which have additional N-terminal transit peptides [].
Spike protein (also known as S protein) is a virus glycoprotein that mediates receptor binding, membrane fusion, and virus entry, and determines host range [
]. This entry represents the S protein from Torovirinae. This entry also includes a putative glycoprotein from Ictalurid herpesvirus 1.
This family of uncharacterised proteins is found in prophage regions of a number of bacterial genomes, including Haemophilus influenzae, Xylella fastidiosa, Salmonella typhi, and Enterococcus faecalis. This family includes sequences mainly from proteobacteria and firmicutes, such as Abc1, a protein that counteracts or regulates the endogenous CBASS (cyclic oligonucleotide-based antiphage signalling system) antiviral defense system. It is a phosphodiesterase that enables metal-independent hydrolysis of the host cyclic di- and trinucleotide CBASS signals such as 3'3'-cGAMP, 3'3'cUA, and 3'3'3'-cAAA [
].
This family consists of Abc1 protein which counteracts or regulates the endogenous CBASS (cyclic oligonucleotide-based antiphage signalling system) antiviral defense system. It is a phosphodiesterase that enables metal-independent hydrolysis of the host cyclic di- and trinucleotide CBASS signals such as 3'3'-cGAMP, 3'3'cUA, and 3'3'3'-cAAA [
].
Spo11 is a meiosis-specific protein that is responsible for the initiation of recombination during the early stages of meiosis through the formation of DNA double-strand breaks (DSBs) by a type II DNA topoisomerase-like activity [
,
]. These DSBs initiate homologous recombination, which is required for chromosomal segregation and generation of genetic diversity during meiosis. Spo11 acts in conjunction with several other proteins, including Rec102 in yeast, to bring about meiotic recombination []. Mouse and human homologues of Spo11 have been cloned and characterised. The proteins are 82% identical and share ~25% identity with other family members. Mouse Spo11 has been localised to chromosome 2H4, and human SPO11 to chromosome 20q13.2-q13.3, a region amplified in some breast and ovarian tumours []. Similarity between SPO11 and archaebacterial TOP6A proteins points to evolutionary specialisation of a DNA-cleavage function for meiotic recombination [
]. Note that the yeast SPO11 protein shares far less similarity to other SPO11 proteins than the human and mouse homologues do to each other.
Uup is an ATP-binding cassette (ABC) protein of the F type (ABCF). It has ATPase activity and possesses two nucleotide binding domains which comprise Walker A/B motifs connected by a linker and a coiled-coil C-terminal domain. It binds double-stranded DNA with no sequence specificity [
]. The Uup ATPase activity functions cooperatively; thus, the inactivation of one site suppresses all ATP hydrolytic activity []. It is implicated in precise excision of transposons. It binds to branched DNAs and is involved in resolution of branched DNA intermediates that result from template switching in post-replication gaps []. It may also be involved in 50S ribosome subunit assembly due to its genetic interaction with the translational GTPase BipA [].
This entry represents tape measure proteins (TMPs) found in Siphoviridae [
]. TMP is important for assembly of phage tails and involved in tail length determination [,
]. It serves as a ruler that controls the length of tail and is ejected from the tail prior to the DNA during DNA injection [,
]. Mutated forms of TMP cause tail fibres to be shortened []. This protein is also found in bacteria, suggesting prophage matches also occur.
Eighty-one archaeal-like genes, ranging in size from 4 to 20 kb, are clustered in 15 regions of the Thermotoga maritima genome []. Conservation of gene order between T. maritima and Archaea in many of these regions suggests that lateral gene transfer may have occurred between thermophilic Eubacteria and Archaea []. One of the T. maritima sequences (hypothetical protein TM1410) shares similarity with Methanocaldococcus jannaschii (Methanococcus jannaschii) hypothetical protein MJ1477 and with hypothetical protein DR0705 from Deinococcus radiodurans. The sequences are characterised by relatively variable N- and C-terminal domains, and a more conserved central domain. They share no similarity with any other known, functionally or structurally characterised proteins.
This domain is found in CorA present in Methylomicrobium album. CorA is a copper repressible surface associated copper(I)-binding protein. CorA can bind one copper ion per protein molecule. The overall fold of CorA is similar to M. capsulatus protein MopE, including the unique copper(I)-binding site and most of the secondary structure elements [
].
Interleukin-1 alpha and interleukin-1 beta (IL-1 alpha and IL-1 beta) are cytokines that participate in the regulation of immune responses, inflammatory reactions, and hematopoiesis []. Two types of IL-1 receptor, each with three extracellular immunoglobulin (Ig)-like domains, limited sequence similarity (28%) and different pharmacological characteristics have been cloned from mouse and human cell lines: these have been termed type I and type II receptors []. The receptors both exist in transmembrane (TM) and soluble forms: the soluble IL-1 receptor is thought to be post-translationally derived from cleavage of the extracellular portion of the membrane receptors. Both IL-1 receptors appear to be well conserved in evolution, and map to the same chromosomal location []. The receptors can both bind all three forms of IL-1 (IL-1 alpha, IL-1 beta and IL-1RA). The crystal structures of IL1A and IL1B [] have been solved, showing them to share the same 12-stranded β-sheet structure as both the heparin binding growth factors and the Kunitz-type soybean trypsin inhibitors []. The β-sheets are arranged in 3 similar lobes around a central axis, 6 strands forming an anti-parallel β-barrel. Several regions, especially the loop between strands 4 and 5, have been implicated in receptor binding. The Vaccinia virus genes B15R and B18R each encode proteins with N-terminal hydrophobic sequences, possible sites for attachment of N-linked carbohydrate and a short C-terminal hydrophobic domain []. These properties are consistent with the mature proteins being either virion, cell surface or secretory glycoproteins. Protein sequence comparisons reveal that the gene products are related to each other (20% identity) and to the Ig superfamily. The highest degree of similarity is to the human and murine interleukin-1 receptors, although both proteins are related to a wide range of Ig superfamily members, including the interleukin-6 receptor. A novel method for virus immune evasion has been proposed in which the product of one or both of these proteins may bind interleukin-1 and/or interleukin-6, preventing these cytokines reaching their natural receptors []. A similar gene product from Cowpox virus (CPV) has also been shown to specifically bind murine IL-1 beta [].
SSTK-IP, SSTK-interacting protein or TSSK6-activating co-chaperone, is a family of proteins found in eukaryotes. SSTK-IP directly binds to HSP70, is found associated with HSP70 and HSP90 in cells, and facilitates HSP90-dependent enzymatic activation of SSTK. SSTK is a small serine/threonine kinase expressed post-meiotically and essential for male fertility along with two other serine threonine kinases. SSTK is one of the smallest protein kinases, consisting only of N- and C-lobes of a kinase catalytic domain, and forms stable associations with heat shock protein (HSP) 70 and 90. SSTK-IP, its interacting protein, represents a germ cell-specific co-chaperone critical to the HSP90-mediated activation of SSTK [
].
This domain is found in periplasmic binding proteins, mainly belonging to the leucine-binding and Leu/Ile/Val-binding proteins. The periplasmic leucine-binding protein is the primary receptor for the leucine transport system in Escherichia coli [
].
The function of this family, TMEM187, is not known; however, it is predicted to be a multi-pass membrane protein. This protein family is also named ITBA1. Proteins in this family are found in eukaryotes and are typically between 239 and 267 amino acids in length.
This protein family has an important function in acting against the prion protein, scrapie [
,
]. This family of proteins is found in eukaryotes. Proteins in this family are approximately 98 amino acids in length.
The function of this family of transmembrane proteins has not, as yet, been determined. However, it is thought to be a therapeutic target for ovine lentivirus infection [
]. This family of proteins is found in eukaryotes and members are typically between 138 and 320 amino acids in length.
The function of this family of transmembrane proteins, TMEM89, has not, as yet, been determined. This family of proteins is found in eukaryotes. Proteins are approximately 159 amino acids in length.
Proteins in this entry include the immunity protein YezG from Bacillus subtilis and the Type VII secretion system protein EsaG from Staphylococcus aureus. YezG is the antitoxin component of an LXG toxin-antitoxin (TA) module that promotes kin selection, mediates competition in biofilms, and drives spatial segregation of different strains, probably helping to avoid warfare between strains in biofilms [
]. EsaG is part of a toxin-antitoxin system that counteracts the toxic effect of EssD via direct interaction [].
GTR1 was first identified in Saccharomyces cerevisiae (Baker's yeast) as a suppressor of a mutation in RCC1. RCC1 catalyzes guanine nucleotide exchange on Ran, a well-characterised nuclear Ras-like small G protein that plays an essential role in the import and export of proteins and RNAs across the nuclear membrane through the nuclear pore complex. RCC1 is located inside the nucleus, bound to chromatin. The concentration of GTP within the cell is ~30 times higher than the concentration of GDP, thus resulting in the preferential production of the GTP form of Ran by RCC1 within the nucleus. Gtr1p is located within both the cytoplasm and the nucleus and has been reported to play a role in cell growth. Biochemical analysis revealed that Gtr1 is in fact a G protein of the Ras family. The RagA/B proteins are the human homologues of Gtr1; RagA and Gtr1p belong to the sixth subfamily of the Ras-like small GTPase superfamily [
].
The pathogenic dimorphic fungal organism Blastomyces dermatitidis exists as a budding yeast at 37 degrees C and as a mycelium at 25 degrees C. Bys1 is expressed specifically in the high-temperature, unicellular yeast morphology and codes for a protein of 18.6 kDa that contains multiple putative phosphorylation sites, a hydrophobic N terminus, and two 34-amino-acid domains with similarly spaced nine-amino-acid degenerative repeating motifs []. The molecular function of this protein is not known.
This entry represents the minor spike protein (also known as H protein), which is a minor spike component of the viral shell. It is involved in the ejection of the phage DNA in the host and is injected with the DNA in the periplasmic space of the host. It is involved in the determination of the phage host-range [
,
]. Bacteriophage PhiX174 is one of the simplest viruses, having a single-stranded, closed circular DNA of 5386 nucleotide bases and four capsid proteins, J, F, G and H. A single molecule of H protein is found on each of the 12 spikes on the microvirus shell of the bacteriophage. H is involved in the ejection of the phage DNA, and at least one copy is injected into the host's periplasmic space along with the ssDNA viral genome []. Part of H is thought to lie outside the shell, where it recognises lipopolysaccharide from virus-sensitive bacterial strains []. Part of H may lie within the capsid, since mutations in H can influence the DNA ejection mechanism by affecting the DNA-protein interactions []. H may span the capsid through the hydrophilic channels formed by G proteins [].
Pycsar (pyrimidine cyclase system for antiphage resistance) provides immunity against bacteriophage and consists of a pyrimidine cyclase (PycC), which synthesizes cyclic nucleotides in response to infection, and an effector protein. Cyclic nucleotides serve as specific second messenger signals, which activate the adjacent effector, leading to bacterial cell death and abortive phage infection [
]. This family represents the effector protein from the Pycsar system.
This entry represents Uncharacterized protein KIAA2013 and similar proteins from animals. K2013 might be related to cannabinoid receptors [
] and neurodegenerative disorders such as ALS []. The function of this protein is still unknown.
This entry represents a group of nuclear proteins that are conserved from nematodes to humans. They can form a helix-loop-helix motif and are related to, yet distinct from, the HMG-I/Y-like subfamily of HMG proteins. This entry includes two human nuclear proteins, NUPR1 (also known as p8) and NUPR2. Similar to several HMGs, NUPR1 binds DNA; however, it has a low affinity and poor sequence specificity for DNA binding [
]. NUPR1 is a multifunctional protein that interacts with several partners to target different signalling pathways, such as the NUPR1/RELB/IER3 survival pathway [] and the PI3K/AKT signalling pathway []. It also binds to and inhibits MSL1, a protein with 53BP1-dependent DNA-repair activity [].
Macroautophagy is a bulk degradation process induced by starvation in eukaryotic cells [
]. In yeast, 15 autophagy (Apg) proteins coordinate the formation of autophagosomes. Apg14p and Vps30p/Apg6p, components of an autophagy-specific phosphatidylinositol 3-kinase complex, together with Apg9p and Apg16p, are required for localisation of Apg5p and Aut7p to a pre-autophagosome structure that functions in autophagosome formation [,
]. Mutations in the Saccharomyces cerevisiae APG14 gene have been shown to cause defects in autophagy [].
TRAF3-interacting protein 1 (TRAF3IP1) recruits TRAF3 (tumour necrosis factor receptor-associated factor 3) and DISC1 (Disrupted-In-Schizophrenia 1) to the microtubules and is conserved from worms to humans [
]. The N-terminal region is the microtubule binding domain and is well-conserved; the C-terminal 100 residues, also well-conserved, constitute the coiled-coil region which binds to TRAF3. The central region of the protein is rich in lysine and glutamic acid and carries KKE motifs which may also be necessary for tubulin-binding, but this region is the least well-conserved [
]. In humans, it plays an inhibitory role on IL13 signaling by binding to IL13RA1. It is involved in suppression of IL13-induced STAT6 phosphorylation, transcriptional activity and DNA-binding [,
].
Sec22a is a multi-pass membrane protein that may be involved in vesicle transport between the ER and the Golgi complex [
]. Sec22c is an endoplasmic reticulum (ER)-localised transmembrane protein involved in regulation of the vesicle transport between the ER and the Golgi []. This entry includes Sec22a and Sec22c. They belong to the synaptobrevin family.
Pet117 is a family of eukaryotic proteins found from fungi and plants to humans. It is likely to be involved in the assembly of cytochrome c oxidase, and is found in the mitochondrion [
].
Yvfg is a hypothetical protein of 71 residues expressed in some bacteria. The monomer consists of two parallel α-helices, and the protein crystallises as a homodimer.
Slingshot (SSH) is a cofilin-specific phosphatase. Dephosphorylation reactivates cofilin, which in turn depolymerizes actin and is thus required for actin filament reorganization [
]. Slingshot is a member of the dual-specificity protein phosphatase family []. The N-terminal SSH region may be involved in P-cofilin binding (the model C terminus plus the DEK_C-like domain, which are characterized as the "B" domain in some of the literature), and may be required for the F-actin mediated activation of slingshot [
,
].
Macrophage stimulating protein (MSP; also known as hepatocyte growth factor-like protein) and its close relative hepatocyte growth factor are non-peptidase homologues belonging to MEROPS peptidase family S1 (chymotrypsin family, clan PA(S)), subfamily S1A.
MSP is a plasma protein that is secreted from the liver into the circulation as a single-chain, biologically inactive pro-protein. Proteolytic cleavage at a single site yields the active molecule composed of disulphide-linked alpha and beta chains. The MSP receptor is a transmembrane tyrosine kinase found in murine resident peritoneal macrophages but not exudate macrophages or blood monocytes. MSP induces phosphorylation of the receptor cytoplasmic domain, association of phosphatidylinositol (PI)-3 kinase with the receptor, and phosphorylation of receptor-bound PI-3 kinase. MSP may function in tissue injury or wound healing [
].
Arb1 is a component of the argonaute siRNA chaperone (ARC) complex which is required for histone H3K9 methylation, heterochromatin assembly and siRNA generation [
].
The YycFG two-component system is the only signal transduction system in Bacillus subtilis known to be essential for cell viability. This system is highly conserved in low-G+C Gram-positive bacteria, regulating important processes such as cell wall homeostasis, cell membrane integrity, and cell division. Four other genes, yycHIJK, are organised within the same operon with yycF and yycG in B. subtilis. This entry represents a domain found in the YycI proteins. It shares the same structural fold with domains two and three of YycH [
]. Both YycH and YycI are always found as a pair on the chromosome, downstream of the essential histidine kinase YycG. Additionally, both proteins share a function in regulating the YycG kinase, with which they appear to form a ternary complex. Lastly, the two proteins always contain an N-terminal transmembrane helix and are localized to the periplasmic space, as shown by PhoA fusion studies.
YycI and YycH proteins interact to control the activity of the YycG kinase. Both YycI and YycH proteins are localized outside the cytoplasm and attached to the membrane by an N-terminal transmembrane sequence. Bacterial two-hybrid data showed that YycH, YycI, and the kinase YycG form a ternary complex. The data suggest that YycH and YycI control the activity of YycG in the periplasm and that this control is crucial in regulating important cellular processes
[,
].
Rabbit hemorrhagic disease virus (RHDV), which causes a highly contagious disease of wild and domestic rabbits, belongs to the family Caliciviridae [
]. The capsid protein self-assembles to form an icosahedral capsid with T=3 symmetry. It is about 38 nm in diameter and consists of 180 capsid proteins. The capsid encapsulates the genomic RNA and VP2 proteins and attaches the virion to target cells by binding histo-blood group antigens present on gastroduodenal epithelial cells. The Shell domain (S domain) contains elements essential for the formation of the icosahedron. The Protruding domain (P domain) is divided into sub-domains P1 and P2. A hypervariable region in P2 is thought to play an important role in receptor binding and immune reactivity. This entry represents a calicivirus coat protein.
This domain occurs in the capsid proteins of picornaviruses and caliciviruses. They are non-enveloped plus-strand ssRNA animal viruses. The Picornaviridae family includes rhinovirus (common cold), poliovirus, hepatitis A virus, foot-and-mouth disease virus and encephalomyocarditis virus. The common structure of this domain consists of an 8-stranded beta sandwich [
,
].
PDZK1-interacting protein 1 (PDZK1IP1, MAP17) is a small non-glycosylated two-pass membrane protein that is overexpressed in many tumours of different origins, including carcinomas [
,
,
,
]. This entry also includes small integral membrane protein 24 (SMIM24) from mammals and proximal tubules-expressed gene protein (pteg) from Xenopus. The function of SMIM24 is not clear. Pteg is essential for pronephric mesoderm specification and tubulogenesis [].
This is a family of putative phage translocases involved in the injectosome mechanism. This entry includes gp20, a component of the phage injection machinery, from Enterobacteria phage P22 [
]. Phage P22 of Salmonella typhimurium ejects four proteins, gp7, gp16, gp20 and gp26, from the phage virion into the bacterial cell after adsorption. These four proteins may play a role in DNA ejection [
,
,
].
Chloroplastic protein TIC110 is a component of the Tic complex [
]. It has been proposed to be central to protein translocation across the inner envelope membrane [,
], though this is still controversial [].
Members of this family of proteins are components of the heterotetrameric Sec62/63 complex composed of SEC62, SEC63, SEC66 and SEC72. The Sec62/63 complex associates with the Sec61 complex to form the Sec complex. Sec66 is involved in SRP-independent post-translational translocation across the endoplasmic reticulum and functions together with the Sec61 complex and KAR2 in a channel-forming translocon complex. Furthermore, Sec66 is also required for growth at elevated temperatures [
,
,
,
].
Autophagy is an intracellular degradation system that responds to nutrient starvation. Atg31 has been shown to be required for autophagosome formation in Saccharomyces cerevisiae [
]. It functions with Atg17 and Atg29 at the preautophagosomal structure (PAS) in order to form normal autophagosomes under starvation conditions [].
This family of proteins is found in eukaryotes. Proteins in this family are typically between 241 and 349 amino acids in length. The function of this family is unknown.
SPATA9, spermatogenesis-associated protein 9, or testis development protein NYD-SP16, is a family of eukaryotic proteins associated with sperm production. It is highly expressed in human testis and contains one transmembrane domain. Its localisation indicates it is likely to play an important role in testicular development and spermatogenesis and may be an important factor in male infertility [
].
This entry includes the DNA-binding protein VF530 from Vibrio fischeri. VF530 contains a unique four-helix motif that shows some similarity to the C-terminal double-stranded DNA (dsDNA) binding domain of RecA, as well as other nucleic acid binding domains [
].
In mice, Tmem178 is a negative regulator of osteoclast differentiation in basal and inflammatory conditions by regulating TNFSF11-induced Ca(2+) fluxes [
].
Proteins in this entry, typified by YhbH from Bacillus subtilis, are found in the genomes of nearly every endospore-forming bacterium, and in no other genomes. The gene in B. subtilis was shown to be a member of the sigma-E regulon, with mutation leading to a sporulation defect [
].