This group represents a Wee1-like protein kinase (
), which is a key regulator for G(2)-to-M progression. This enzyme phosphorylates Tyr15 on Cdc2 (cell division control protein 2) when Cdc2 is complexed to cyclin B1, which results in blocking the progression of cells into M phase [
]. As such, it negatively regulates the entry of a cell into mitosis (G2 to M transition) by protecting the nucleus from cytoplasmically activated cyclin B1-complexed Cdc2 before the onset of mitosis [,
]. The activity of this enzyme increases during S and G2 phases and decreases at M phase when it is hyperphosphorylated; it is then ubiquitinated and degraded at the onset of G2/M phase.
This entry represents the N-terminal domain of the protein MGARP (also known as HUMMR) [
]. HUMMR interacts with Miro, a mitochondrial membrane protein. It biases mitochondrial movement in the anterograde direction in response to hypoxia [].
The actin-filament associated protein (AFAP) family of adaptor proteins includes AFAP1, AFAP1L1, and AFAP1L2/XB130 [
]. They contain the pleckstrin homology (PH) domain that plays a role in recruiting proteins to different membranes.
This family of proteins is found in bacteria. Proteins in this family are approximately 60 amino acids in length. There are 5 conserved cysteines which occur in a CPXCG motif and a DCXXCCXP motif.
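The two motifs named above can be read as simple patterns in which X stands for any residue. The following is a minimal sketch of how such motifs might be located in a sequence, assuming that X maps to a single arbitrary amino acid; the function name `find_motifs` and the example sequence are hypothetical and are not taken from any real member of this family.

```python
import re

# In the motif notation, X stands for any amino acid, so it maps to "." in a regex.
MOTIFS = {
    "CPXCG": re.compile(r"CP.CG"),
    "DCXXCCXP": re.compile(r"DC..CC.P"),
}


def find_motifs(sequence):
    """Return {motif_name: [0-based start positions]} for each motif."""
    seq = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical sequence invented for illustration only; not a real protein.
example = "MKTLCPACGKEVSADCRKCCHPLEK"
hits = find_motifs(example)
print(hits)
```

Together the two motifs account for the five conserved cysteines: two in CPXCG and three in DCXXCCXP.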
This family of proteins binds to DNA and to TBP (TATA box binding protein), TATA-binding protein (TBP)-related protein 2 (TRF2) and several polycomb factors. It is likely to function as a transcription regulator [
,
].
NYD-SP12, also known as SPATA16, is a germ-cell specific participant in the Golgi apparatus, and its expression is confined to spermatogenic epithelium, not being found in interstitial cells []. Computer analysis of the protein sequence showed that NYD-SP12 contains a cluster of phosphorylation sites for protein kinase C as well as for cyclic nucleotide-dependent protein kinases [,
]. It has been postulated that, since the mutation of some Golgi apparatus proteins is responsible for male infertility, NYD-SP12 might play a role in modification and sorting of acrosomal enzymes [].
This entry represents plant sialyltransferase-like proteins, including SIA1 (also known as Protein MALE GAMETOPHYTE DEFECTIVE 2 MGP2) from Arabidopsis and STLP5 from Oryza sativa subsp. japonica. SIA1 is localised to the Golgi apparatus and is required for normal pollen grain germination and pollen tube growth [
,
].
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. L41 associates with the ribonucleoprotein particles of the 60S subunit late in the ribosomal maturation process. L41 is encoded by the smallest known open reading frame and in yeast is composed of only 24 amino acids, 17 of which are arginine or lysine.
This entry represents a group of DNA binding proteins, known as SBP (SQUAMOSA-PROMOTER BINDING PROTEIN) family. They are putative transcription factors characterised by a highly conserved SBP-box of 76 amino acids involved in DNA binding and nuclear localisation [
]. They are involved in the control of early flower development [].
Microtubule-associated protein 6 (MAP6) is a calmodulin binding protein that is involved in microtubule stabilisation [
]. Neurons contain abundant subsets of highly stable microtubules that resist de-polymerising conditions such as exposure to the cold. Stable microtubules are thought to be essential for neuronal development, maintenance, and function. MAP6 is a major factor responsible for the intriguing stability properties of neuronal microtubules and is important for synaptic plasticity [
]. It regulates axonal growth during neuron polarization, which is controlled by a palmitoylation cycle []. MAP6 interacts with TMEM106B, and this interaction is crucial for controlling dendritic trafficking of lysosomes, presumably by acting as a molecular brake for retrograde transport []. TMEM106B is a major risk factor for frontotemporal lobar degeneration with TDP-43 pathology.
This entry represents a group of plant F-box proteins, including AUF1 (At1g78100) and AUF2 (At1g22220) from Arabidopsis [
]. AUF1/2 may mediate the crosstalk between auxin and cytokinin, which modifies auxin distribution and ultimately root elongation [].
In animals, Targeting Protein for Xklp2 (TPX2) is critical for mitotic spindle assembly. It has a highly conserved TPX2 central domain, an Aurora-A binding domain and a TPX2_C domain that contains a conserved pentapeptide KLEEK motif. However, in plants, besides the TPX2 homologue (with three domains), several proteins were found with only the TPX2_C domain. This group of plant proteins containing only the TPX2_C domain has been named WDL (or WVD2-like) proteins, as the first identified member is known as WVD2. These WDL proteins bind to microtubules; however, they have lost the function of targeting Xklp2 and were found to play roles in plant development and responses to environmental cues [
,
]. This entry includes WAVE-DAMPENED2 (WVD2) and WDL1-4. WVD2 and its closest paralogue, WDL1, modulate both rotational polarity and anisotropic cell expansion during organ growth and also promote clockwise root and etiolated hypocotyl coiling and clockwise leaf curling, but left-handed petiole twisting [
,
]. WDL3 stabilises microtubules, playing an important role in the regulation of hypocotyl cell elongation. WDL3 levels are regulated by a ubiquitin-26S proteasome-dependent pathway in response to light, and WDL3 functions as a negative regulator of hypocotyl cell elongation in the light [].
Competence is the ability of a cell to take up exogenous DNA from its environment, resulting in transformation. It is widespread among bacteria and is probably an important mechanism for the horizontal transfer of genes. DNA usually becomes available by the death and lysis of other cells. Competent bacteria use components of extracellular filaments called type 4 pili to create pores in their membranes and pull DNA through the pores into the cytoplasm. This process, including the development of competence and the expression of the uptake machinery, is regulated in response to cell-cell signalling and/or nutritional conditions []. The development of genetic competence in Bacillus subtilis is a highly regulated adaptive response to stationary-phase stress. For competence to develop, the transcriptional regulator ComK must be activated. ComK is required for the expression of genes encoding proteins that function in DNA uptake. In log-phase cultures, ComK is inactive in a complex with MecA and ClpC. The comS gene is induced in response to high culture cell density and nutritional stress, and its product functions to release active ComK from the complex. ComK then stimulates the transcription initiation of its own gene as well as that of the late competence operons [
]. Proteins in this family are found in bacterial species which possess systems for natural transformation (competence) (e.g. Bacillus subtilis, Haemophilus influenzae), and also in species without these systems (e.g. Escherichia coli). Competence protein F has been shown to be important for the uptake of exogenous DNA in naturally competent bacteria, though the precise role of this protein is not yet known [
,
]. GntX is a periplasmic gluconate binding protein thought to be part of a high-affinity gluconate transport system [].
This entry includes human transmembrane protein 41A/B (TMEM41A/B) and related proteins. TMEM41B is involved in early stages of autophagosome biogenesis at the ER membrane probably via mobilization of neutral lipids from lipid droplets [
]. TMEM41B and its homologues from fish are involved in motor neuron development []. It is also a critical host factor required for infection by human coronaviruses SARS-CoV-2, HCoV-OC43, HCoV-NL63, and HCoV-229E, as well as all flaviviruses tested such as Zika virus and Yellow fever virus [,
]. The function of TMEM41A is not clear.
This entry represents a group of putative HupE/UreJ proteins mostly from bacteria. These proteins contain many conserved histidines that may be involved in nickel binding.
Shedding of transmembrane protein 35 (TMEM35) produces a soluble peptide that may modulate neurite outgrowth in the adrenal zona glomerulosa after sodium depletion [
]. TMEM35 could be required for normal activity of the hypothalamic-pituitary-adrenal axis and limbic circuitry []. TMEM35A, also known as NACHO, acts as an acetylcholine receptor chaperone that mediates the proper assembly and functional expression of the nicotinic acetylcholine receptors (nAChRs) throughout the brain [
].
AHH containing protein is a predicted nuclease of the HNH/ENDO VII superfamily of the treble clef fold. The name is derived from the conserved motif, AHH. It is found in bacterial polymorphic toxin systems [
] and functions as a toxin module. Like WHH and LHH, the AHH nuclease contains four conserved histidines, of which the first is predicted to bind a metal ion and the other three are involved in activation of a water molecule for hydrolysis.
Primosomal protein N', also known as ATP-dependent helicase PriA, is a component of the primosome, which is involved in replication, repair, and recombination. PriA serves as a sensor/stabiliser for an arrested replication fork and eventually promotes restart of DNA replication through assembly of a primosome. It also serves as a checkpoint protein that prevents the replicase from advancing in a strand displacement reaction on forks that do not contain a functional replicative helicase [
,
].
This domain corresponds to the mature part of the Ecp2 effector protein from the tomato pathogen Cladosporium fulvum. Effectors are low molecular weight proteins that are secreted by bacteria, oomycetes and fungi to manipulate their hosts and adapt to their environment. Ecp2 is a 165 amino acid secreted protein that was originally identified as a virulence factor in C. fulvum, since its disruption reduces virulence of the fungus on tomato plants. It has been recently determined that Ecp2 is a member of a novel multigene superfamily, widely distributed and highly diversified within the fungal kingdom, which has been designated Hce2, for Homologs of C. fulvum Ecp2 effector. Although Ecp2 is present in most organisms as a small secreted protein, the mature part of this protein can be found fused to other protein domains, including the fungal Glycoside hydrolase family 18 (
) and other, unknown, protein domains. The intrinsic function of Ecp2 remains unknown, but it is postulated to be a necrosis-inducing factor in plants that promotes pathogenicity on the host [].
The cell envelope of Gram-negative bacteria consists of an inner (IM) and an outer membrane (OM) separated by an aqueous compartment, the periplasm, which contains the peptidoglycan layer. The OM is an asymmetric bilayer, with phospholipids in the inner leaflet and lipopolysaccharides (LPS) facing outward [
,
]. The OM is an effective permeability barrier that protects the cells from toxic compounds, such as antibiotics and detergents, thus allowing bacteria to inhabit several different and often hostile environments. LPS is responsible for the permeability properties of the OM. LPS consists of the lipid A moiety (a glucosamine-based phospholipid) linked to the short core oligosaccharide and the distal O-antigen polysaccharide chain. The core oligosaccharide can be further divided into an inner core, composed of 3-deoxy-D-manno-octulosonate (KDO) and heptose, and an outer core, which has a somewhat variable structure. LPS is essential in most Gram-negative bacteria, with the notable exception of Neisseria meningitidis. The biogenesis of the OM implies that the individual components are transported from the site of synthesis to their final destination outside the IM by crossing both hydrophilic and hydrophobic compartments. The machinery and the energy source that drive this process are not yet fully understood. The lipid A-core moiety and the O-antigen repeat units are synthesized at the cytoplasmic face of the IM and are separately exported via two independent transport systems, namely, the O-antigen transporter Wzx (RfbX) [
,
] and the ATP binding cassette (ABC) transporter MsbA that flips the lipid A-core moiety from the inner leaflet to the outer leaflet of the IM [,
,
]. O-antigen repeat units are then polymerised in the periplasm by the Wzy polymerase and ligated to the lipid A-core moiety by the WaaL ligase [see, ,
]. The LPS transport machinery is composed of LptA, LptB, LptC, LptD and LptE. This is supported by the fact that depletion of any one of these proteins blocks the LPS assembly pathway and results in very similar OM biogenesis defects. Moreover, the location of at least one of these five proteins in every cellular compartment suggests a model for how the LPS assembly pathway is organised and ordered in space [
]. LptD forms a complex with LptE, which is involved in the assembly of LPS in the outer leaflet of the outer membrane. It determines N-hexane tolerance and is involved in outer membrane permeability as well as being essential for envelope biogenesis [
,
,
].
WHH domain containing protein is a predicted nuclease of the HNH/ENDO VII superfamily. The name is derived from the conserved motif WHH. It is found in bacterial polymorphic toxin systems [
] and functions as a toxin module. It is the shortest version of the HNH nuclease families. Like AHH and LHH, the WHH nuclease contains four conserved histidines, of which the first is predicted to bind a metal ion and the other three are involved in activation of a water molecule for hydrolysis [].
HS1-associating protein X-1 (HAX-1) promotes cell survival [
] and is involved in the clathrin-mediated endocytosis pathway []. It has been implicated in severe congenital neutropenia (SCN), neurological disorders and cancer []. The voltage-dependent Kv3.3 potassium channel has been shown to bind to HAX-1 to induce Arp2/3 (actin-related protein 2/3 complex) dependent actin filament nucleation at the plasma membrane []. Mutations in the HAX-1 gene cause neutropenia, severe congenital 3, autosomal recessive (SCN3), which is a disorder of hematopoiesis characterised by maturation arrest of granulopoiesis at the level of promyelocytes with peripheral blood absolute neutrophil counts below 0.5 x 10(9)/l and early onset of severe bacterial infections [
].
This family of transmembrane proteins is found in eukaryotes. Proteins in this family are typically between 197 and 222 amino acids in length. The function of this family is unknown.
In animals, Targeting Protein for Xklp2 (TPX2) is critical for mitotic spindle assembly. It has a highly conserved TPX2 central domain, an Aurora-A binding domain and a TPX2_C domain that contains a conserved pentapeptide KLEEK motif. However, in plants, besides the TPX2 homologue (with three domains), several proteins were found with only the TPX2_C domain. This group of plant proteins containing only the TPX2_C domain has been named WDL (or WVD2-like) proteins, as the first identified member is known as WVD2. These WDL proteins bind to microtubules; however, they have lost the function of targeting Xklp2 and were found to play roles in plant development and responses to environmental cues [
,
]. This entry includes WDL5/6 from Arabidopsis.
This entry represents the N-acetylglucosamine-binding protein A (GbpA) of various pathogenic bacteria, including Vibrio cholerae, which is the etiologic agent of cholera in humans. GbpA binds to chitin and specifically to the chitin monomer N-acetylglucosamine (GlcNAc), a sugar residue that is also found on the surface of epithelial cells. Intestinal colonization of V. cholerae occurs in a stepwise fashion, initiating with attachment to the small intestinal epithelium [
]. This attachment is followed by expression of the toxin co-regulated pilus, microcolony formation, and cholera toxin production. GlcNAc binding protein A (GbpA) is a secreted attachment factor, which functions in attachment to environmental chitin sources as well as to intestinal substrates. GbpA is expressed and secreted at low cell densities and is suppressed at high cell densities, in particular in cells that produce HapR, the central regulator of the cell density-dependent quorum sensing system of V. cholerae. HapR activates the expression of genes encoding the secreted proteases HapA and PrtV, which degrade GbpA, suggesting that the fluctuation of GbpA levels is a consequence of the levels of bacterial proteases produced in response to quorum sensing signals. This provides a mechanism for GbpA-mediated attachment to, and detachment from, surfaces in response to environmental cues [
].
This family represents a group of DUF3700 domain containing proteins from plants, including stem-specific protein TSJT1 from Nicotiana tabacum. Their function is not clear.
This entry represents a family of uncharacterised proteins, including YtoQ from Bacillus subtilis. This family shows some sequence similarity to a family of nucleoside 2-deoxyribosyltransferases (COG3613 as iterated through CDD), but sufficiently remote that PSI-BLAST starting from YtoQ and exploring outwards does not discover the relationship.
Nepoviruses are plant viruses that, together with comoviruses and picornaviruses, are classified in the picornavirus superfamily of plus strand single-stranded RNA viruses. Their genome consists of two single-stranded RNAs, both required for infection [
]. This family aligns several nepovirus coat protein sequences. In several cases, this is found at the C terminus of the RNA2-encoded viral polyprotein. The coat protein consists of three trapezoid-shaped β-barrel domains, and forms a pseudo T = 3 icosahedral capsid structure [
].
The matrix (M) protein of Rabies virus (RV) plays a key role in both assembly and budding of progeny virions. A PPPY motif (PY motif or late-budding domain) is conserved in the M proteins. These PY motifs are important for virus budding and for mediating interactions with specific cellular proteins containing WW domains.
Reticulons are a family of ER-localised proteins found in a wide range of eukaryotes and have been shown to localise to the ER in many species. This entry represents a group of plant reticulon-like proteins, including RTNLB17/18/21 from Arabidopsis. RTNLB17/18/21 are reticulon proteins with an additional N-terminal domain. However, there is no function predicted or identified for the N-terminal domain of RTNLB17/18/21 [
].
This entry contains Epstein-Barr virus EBNA-LP protein. It is a protein involved in latency whose function is not fully understood. The protein contains three domains, each of which contains conserved serine residues within conserved regions (CR1 to CR3). These regions are essential for the EBNA2 cooperativity function. The domains have a bipartite nuclear localisation signal and nuclear localisation of EBNA-LP is essential for EBNA2 cooperativity function. [
].
Adenoviruses are medium-sized, non-enveloped viruses containing double-stranded DNA. They can cause a variety of diseases including pneumonia, cystitis, conjunctivitis and diarrhoea, all of which can be fatal to patients who are immunocompromised [
]. These viruses have many mechanisms to evade the host immune response, including several proteins which are expressed as part of the early transcription unit 3 (E3) []. One of the regions of E3, known as the E3B region, encodes three proteins known as 10.4K, 14.5K and 14.7K. Two of these proteins, 10.4K and 14.5K, form the RID complex (receptor internalisation and degradation) which protects the infected cell from host-induced lysis by clearing the TNF and Fas receptors from the cell surface []. Other receptors, such as the epidermal growth factor receptor, are also known to be cleared by RID []. This entry represents the E3B region 10.4K protein, also known as the RID alpha subunit.
This entry represents a group of bacterial and eukaryotic DnaJ proteins, including the P58IPK homologue from Arabidopsis and DNJC7 from S. pombe. Plant P58IPK is a double-stranded RNA-dependent protein kinase PKR inhibitor required by viruses for virulence [
].
This non-structural protein is one of two found in pneumoviruses. The protein is about 140 amino acids in length. The NS1 protein appears to be important for efficient replication but not essential []. The NS1 protein has been shown by yeast two-hybrid to interact with the viral P protein []. This protein is also known as the 1C protein. It has also been shown that NS1 can potently inhibit transcription and RNA replication [
].
EspA, together with EspB, EspD and Tir, is exported by a type III secretion system. These proteins are essential for attaching and effacing lesion formation. EspA is a structural protein and a major component of a large, transiently expressed, filamentous surface organelle which forms a direct link between the bacterium and the host cell [
,
].
Proteins EXECUTER 1/2 (EX1/2) are located in chloroplasts and are essential for singlet oxygen signalling, a stress signal that triggers programmed cell death (PCD) [
,
]. This singlet oxygen-induced stress-related signalling that triggers a PCD pathway is unique to photosynthetic eukaryotes and operates under mild stress conditions, invoking photoinhibition of photosystem II (PSII) without causing photooxidative damage to the plant [].
SurE is one of several proteins that are expressed when bacterial cells are subjected to environmental stresses and in the stationary growth phase of bacteria [
,
]. In E. coli, surE is next to pcm, which encodes an L-isoaspartyl protein repair methyltransferase that is also required for stationary phase survival. SurE was originally predicted to be an acid phosphatase; however, subsequent work showed this is not the case []. SurE has been shown to possess phosphatase activity and appears to be specific to nucleoside monophosphates []. In E. coli, Thermotoga maritima, and Pyrobaculum aerophilum, SurE acts strictly on nucleoside 5'- and 3'-monophosphates.
Salvador (sav, also known as Shar-pei) is a scaffold protein that plays an important role in the Hippo/SWH (Sav/Wts/Hpo) signalling pathway, which is involved in organ size control and tumour suppression by restricting proliferation and promoting apoptosis [
,
]. In Drosophila it mediates cell proliferation arrest during imaginal disc growth []. It also promotes both cell cycle exit and apoptosis [].
Cold-regulated proteins 27 and 28 are regulated by low temperature and light. They are involved in central circadian clock regulation and in flowering promotion, by binding to the chromatin of clock-associated evening genes TOC1, PRR5, ELF4 and cold-responsive genes in order to repress their transcription [
,
]. They are also involved in freezing tolerance regulation.
This domain can be found in the N terminus of the bacterial SirB1 protein. It can also be found in the mammalian F-box only protein 21 (FBXO21). SirB is required for maximal expression of sirC, which encodes an SPI1-encoded transcription factor [
]. FBXO21 is a substrate-recognition component of the SCF (SKP1-CUL1-F-box protein)-type E3 ubiquitin ligase complex [
].
This entry represents protein disulphide-isomerase A4 (also known as endoplasmic reticulum protein 72kDa or ERp-72) (
), which acts as a molecular ER chaperone. ER chaperones are critical not only for quality control of proteins processed in the ER, but also for regulation of ER signalling in response to ER stress [
]. ERp-72 catalyses the rearrangement of -S-S- bonds in proteins. It is part of a large chaperone multiprotein complex comprising CABP1, DNAJB11, HSP90B1, HSPA5, HYOU, PDIA2, PDIA4, PPIB, SDF2L1, UGT1A1 and very small amounts of ERP29, but not, or at very low levels, CALR nor CANX. ERp-72 may functionally associate with NADPH oxidase 1 (NOX1) [].
Proteins in this group have homology with the RepC protein of Agrobacterium Ri and Ti plasmids [
]. They may be involved in plasmid replication and stabilisation functions.
This entry represents a predicted immunity protein with an alpha+beta fold and several conserved charged and hydrophobic residues. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Tox-URI2 family [
]. The protein is also found in heterogeneous polyimmunity loci.
This entry represents a predicted immunity protein with a mostly α-helical fold and a conserved DxG motif. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Tox-SHH family of HNH/Endonuclease VII fold nucleases [
].
This entry represents a predicted immunity protein with an alpha+beta fold. Proteins containing this domain are present in heterogeneous polyimmunity loci of polymorphic toxin systems [
].
This entry represents a predicted immunity protein with an alpha+beta fold and a conserved HxxRN motif. These proteins are present in heterogeneous polyimmunity loci in polymorphic toxin systems [
].
This entry represents a predicted immunity protein with two transmembrane helices, and a WxW motif and a conserved arginine between the two helices. These proteins are present in heterogeneous polyimmunity loci in polymorphic toxin systems [].
This entry represents a predicted immunity protein with an alpha+beta fold and a conserved C-terminal tryptophan residue. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Tox-ColE3 family [
].
This entry represents a predicted immunity protein with an alpha+beta fold. These proteins are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Tox-URI1, Tox-URI2 or Tox-ParBL1 families [
]. The gene for this toxin is also found in heterogeneous polyimmunity loci that show variations in structure even between closely related strains.
This entry represents a predicted immunity domain with an alpha+beta fold with conserved tryptophan, proline, aspartate, serine and arginine residues. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Tox-AHH family of HNH/Endonuclease VII fold nucleases [
]. The gene for this toxin is also found in heterogeneous polyimmunity loci.
This entry represents a predicted immunity protein with an all α-helical fold and a conserved proline residue. These proteins are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Tox-REase-1 or Tox-REase-6 families [
].
This entry represents a predicted immunity protein with an all α-helical fold and a conserved HRG motif. These proteins are present in heterogeneous polyimmunity loci in polymorphic toxin systems [
].
This entry represents a predicted immunity protein with an alpha+beta fold and a conserved KxGDxxK motif. These proteins are present in heterogeneous polyimmunity loci in polymorphic toxin systems [
].
This entry represents a predicted immunity protein with an alpha+beta fold and conserved phenylalanine and tryptophan residues and a GGD motif. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Ntox19 family [
].
This entry represents a predicted immunity protein with an alpha+beta fold and conserved GR and GxK motifs. These proteins are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Tox-URI2 family of nucleases [
].
This entry represents a predicted immunity protein with an alpha+beta fold and a conserved tryptophan residue. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a protease domain such as Tox-PL1 and Ntox40. In some instances, it is also fused to a papain-like toxin, ADP-ribosyl glycohydrolase and an S8-like peptidase [
]. Based on these associations the domain is likely to be a protease inhibitor.
This entry represents a predicted immunity protein with an alpha+beta fold and a conserved histidine residue. These proteins are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, usually containing a domain of the Ntox12, Ntox37 or Ntox7 families [
].
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [,
].This family of proteins has been identified as part of the mitochondrial large ribosomal subunit in Saccharomyces cerevisiae [
].
This entry represents proteins conserved in Firmicutes and Proteobacteria. Some members are annotated as being glucose-6-phosphate 1-dehydrogenase but there is no evidence supporting this.
This entry includes a group of homeobox proteins, including HXA9/HXB9/HXC9 from humans. They are sequence-specific transcription factors which are involved in regulating developmental gene expression in cell lineage differentiation [
,
,
].
RNA-binding protein NOB1 has a nuclear location [
], contains a PIN domain, and binds a single zinc ion. Budding yeast Nob1 is involved in proteasomal and 40S ribosomal subunit biogenesis []. It is also required for maturation of the small subunit ribosomal RNA, catalyzing cleavage at site D after export of the preribosomal subunit into the cytoplasm. Nob1 is also found in archaea, where it is manganese-dependent and also processes RNA substrates []. This entry also includes VapC1 and VapC3. VapC1 is the toxic component, a ribonuclease, of a toxin-antitoxin (TA) module [].
This entry includes a group of NAD(P)-binding domain-containing proteins from plants and Cyanobacteria, including TIC62 from Arabidopsis. TIC62 is part of the Tic complex (a redox regulon) involved in translocation of nuclear-encoded preproteins across the inner envelope of chloroplasts [
]. TIC62 binds to ferredoxin-NADP(H) oxidoreductase (FNR) and acts as a redox sensor of the complex [].
This family of proteins is found in eukaryotes and includes MRN complex-interacting protein (MRNIP), which plays a role in preventing the accumulation of damaged DNA in cells. It associates with the MRE11-RAD50-NBS1 (MRN) damage-sensing complex and is rapidly recruited to sites of DNA damage. Phosphorylation of a serine promotes nuclear localization of MRNIP [
].
TANGLED1 (TAN1) is a microtubule-binding protein that localizes to the division site and mitotic microtubules and plays a critical role in division plane orientation in plants [
]. It acts as a component of a cortical guidance cue that remains behind when the preprophase band (PPB) is disassembled and directs the expanding phragmoplast to the former PPB site during cytokinesis [,
,
,
]. It has been shown to mediate microtubule zippering or end-on microtubule interactions, depending on their contact angle [].
Arabinogalactan proteins (AGPs) are proteoglycans of the plant extracellular matrix that have been implicated in diverse developmental roles such as differentiation, cell-cell recognition, embryogenesis and programmed cell death. This entry includes AGP23 (At3g57690) and AGP40 (At3g20865) from Arabidopsis thaliana. AGP40 is only expressed in pollen, while AGP23 is only expressed in flowers [
].
This entry represents a group of kinesin-like proteins, including KIF15 from animals and KIN-12A-G from plants. The KIF15 homologue in Xenopus, Xklp2, is required for centrosome separation during mitosis [
]. KIN-12E has been shown to regulate metaphase spindle flux and help control spindle size in Arabidopsis []. KIN12 proteins are plus-end-directed kinesin-like motor enzymes that play a critical role in the organization of phragmoplast microtubules during cytokinesis [,
].
This entry represents the spermatogenesis-associated protein 7 (SPATA7, also known as HSD3). It was first identified in human spermatocytes and was later found to be expressed in multiple layers of the mature mouse retina [
]. Mutations in SPATA7 cause Leber congenital amaurosis 3 (LCA3), which is a severe dystrophy of the retina, typically becoming evident in the first years of life [,
]. Mutations in SPATA7 also cause autosomal recessive retinitis pigmentosa (ARRP), which is a retinal dystrophy belonging to the group of pigmentary retinopathies [].
Podocalyxin-like protein 2 (PODXL2), also known as Endoglycan, is a ligand for the vascular selectins [
]. It belongs to the CD34 family, whose members contain an N-terminal acidic region appended to the sialomucin domain [].
Fibrinogen silencer-binding protein (FSBP) is a transcription factor that interacts with the neuronal adaptor protein X11alpha. The X11alpha/FSBP complex has been shown to signal to the nucleus to repress glycogen synthase kinase-3beta promoter activity [
].
Z-DNA-binding protein 1 (ZBP1), also known as DAI (DNA-dependent activator of IFN regulatory factors), is a cytosolic DNA sensor that activates the innate immune system [
].
This family consists of Mapk-regulated corepressor-interacting protein 1 (MCRIP1) and 2 (MCRIP2). MCRIP1 binds to and inhibits the transcriptional co-repressor CtBP. MCRIP1 is an ERK substrate; when phosphorylated by ERK, MCRIP1 releases CtBP to induce chromatin modifications [
].
This entry represents the N-terminal region of the MMS22L (methyl methanesulfonate-sensitivity protein 22-like) protein. MMS22L is a component of the MMS22L-TONSL complex, which stimulates the recombination-dependent repair of stalled or collapsed replication forks [
]. The MMS22L-TONSL complex is required to maintain genome integrity during DNA replication by promoting homologous recombination-mediated repair of replication fork-associated double-strand breaks [,
]. It may act by mediating the assembly of RAD51 filaments on ssDNA [].
This group represents dystrophin-related protein 2 (DRP2) [
]. Its expression is largely confined to the vertebrate central nervous system, and it may have a role in the organisation of central cholinergic synapses [].
This family represents the C terminus of protein lines [
]. In Drosophila, this protein is involved in embryonic segmentation and may function as a transcriptional regulator [,
].
This entry represents a group of eukaryotic proteins that aid the formation of new cilia (ciliogenesis). Mutations in the human protein cause Joubert syndrome type 14 (JBTS14), an autosomal recessive disorder characterised by severe mental retardation, hypotonia, breathing abnormalities in infancy, and dysmorphic facial features [
]. Loss of the mammalian TMEM237 results in defective ciliogenesis and deregulation of Wnt signaling []. Proteins in this family are typically between 203 and 512 amino acids in length. There are two completely conserved G residues that may be functionally important.
This entry contains a subset of the AS2/LOB (ASYMMETRIC LEAVES 2/LATERAL ORGAN BOUNDARIES) gene family whose members are highly homologous and plant-specific. The N terminus contains a conserved approximately 100-amino acid domain (the LOB domain) that is present in A. thaliana proteins and in proteins from a variety of other plant species. Genes encoding LOB domain (LBD) proteins are expressed in a variety of temporal- and tissue-specific patterns, suggesting that they may function in diverse processes [
,
]. The LOB domain contains conserved blocks of amino acids that identify the LBD gene family. In particular, a conserved C-x(2)-C-x(6)-C-x(3)-C motif, which is a defining feature of the LOB domain, is present in all LBD proteins. It is possible that this motif forms a new zinc finger [].
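The C-x(2)-C-x(6)-C-x(3)-C notation reads as four cysteines separated by exactly 2, 6 and 3 residues of any type. As a minimal sketch of how such a pattern could be scanned for in a protein sequence, the Python snippet below translates it into a regular expression. The function name `find_lob_cys_motif` and the test sequence are hypothetical and purely illustrative; they are not taken from the entry or from any real LBD protein.

```python
import re

# C-x(2)-C-x(6)-C-x(3)-C as a regular expression: x(n) is written as .{n}.
# The pattern sits inside a lookahead so that overlapping occurrences are
# also reported. (Illustrative translation, not an official signature.)
LOB_CYS_MOTIF = re.compile(r"(?=(C.{2}C.{6}C.{3}C))")


def find_lob_cys_motif(sequence):
    """Return a list of (0-based start, matched residues) for each hit."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in LOB_CYS_MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Synthetic sequence built to contain the motif once (not a real protein).
    hits = find_lob_cys_motif("MACAACAAAAAACAAACGG")
    for start, residues in hits:
        print(start, residues)
```

A match only shows that the cysteine spacing is present; it does not by itself establish that a sequence contains a LOB domain.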
Proteins in this family contain the ATP-grasp fold and are predicted to be involved in the biosynthesis of cell surface polysaccharides such as the O-antigen in proteobacteria, the capsule in firmicutes, and the polyglutamate chain of teichuronopeptide [
].
The extracellular polysaccharide colanic acid (CA) is produced by species of the family Enterobacteriaceae. In Escherichia coli (strain K12) the CA cluster comprises 19 genes. The wzx gene encodes a protein with multiple transmembrane segments that may function in export of the CA repeat unit from the cytoplasm into the periplasm in a process analogous to O-unit export. The CA gene clusters may be involved in the export of polysaccharide from the cell [
].
This entry includes a group of ER-resident saposin-like proteins, including CNPY1/2 from mammals and protein seele from Drosophila melanogaster. They contain a saposin B-type domain, which is an endoplasmic reticulum (ER) retention domain. Seele is involved in embryonic dorsal-ventral patterning [
]. Zebrafish CNPY1 acts as a positive feedback regulator of FGF that contributes to the development of left-right body asymmetry by controlling stem cell clustering during Kupffer's vesicle organogenesis []. CNPY2 interacts with MYLIP, an E3 ubiquitin ligase that marks its targets for lysosomal degradation. CNPY2 protects targets of MYLIP from lysosomal degradation [].
Microtubule-severing enzyme Katanin is composed of a catalytic p60 subunit (A subunit, KATNA1) and a regulatory p80 subunit (B subunit, KATNB1). KATNBL1 functions as a regulator of Katanin A subunit microtubule-severing activity during mitosis [
].
This domain is found in protein TOPAZ1, which may play an important role in germ cell development [
]. Homology suggests that it is a putative aspartate racemase. Protein TOPAZ1 was previously called testis and ovary-specific PAZ domain-containing protein 1 [
], but the name was changed to Protein TOPAZ1 as it does not contain a PAZ domain.
Cytokine-like protein 1 (CYTL1) was identified as a secretory protein expressed in CD34+ haemopoietic cells [
]. CYTL1 seems to regulate chondrogenesis and is required for the maintenance of cartilage homeostasis [,
]. It may also work as a regulatory factor in embryo implantation []. The CYTL1 family is found in vertebrates. Its members have two conserved sequence motifs: PPTCYSR and DDC.