This entry includes PHF20 and PHF20L1 proteins. PHD finger protein 20 (PHF20) binds dimethyl lysine residues via its Tudor domain. It is a component of the MLL1, MOF histone acetyltransferase and NSL protein complexes. PHF20L1 binds to monomethylated lysine 142 on DNA (cytosine-5) methyltransferase 1 (DNMT1). It has been shown to antagonize DNMT1 proteasomal degradation.
Migration and invasion-inhibitory protein (MIIP or IIp45) binds to insulin-like growth factor binding protein 2 (IGFBP-2) and inhibits the invasion of glioma cells. MIIP attenuates mitotic transition, thereby inhibiting glioma progression. It not only modulates IGFBP-2, but also inhibits cell migration by binding to and inhibiting HDAC6, a class 2 histone deacetylase that deacetylates alpha-tubulin.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) or the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. The sequence of the acidic ribosomal protein S6 from Haloarcula marismortui has been determined. The protein consists of 116 amino acid residues, and has a molecular mass of 12,251 Da. Sequence comparison with ribosomal proteins of other organisms has revealed that H. marismortui protein S6 is similar to mammalian protein L7a, yeast L4, yeast NHP2, Bacillus subtilis hypothetical protein ylxQ and Methanocaldococcus jannaschii (Methanococcus jannaschii).
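The length and mass quoted for H. marismortui S6 can be cross-checked with a back-of-the-envelope estimate. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from this entry; the function names are hypothetical and exist only for illustration.

```python
# Rough consistency check between a protein's length and its reported mass.
# ASSUMPTION: ~110 Da average mass per amino acid residue (rule of thumb,
# not a value stated in the entry).
AVERAGE_RESIDUE_MASS_DA = 110.0


def estimate_mass_da(n_residues: int) -> float:
    """Return an approximate molecular mass in daltons."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA


def relative_difference(reported_da: float, n_residues: int) -> float:
    """Return |reported - estimate| / estimate for a given chain length."""
    estimate = estimate_mass_da(n_residues)
    return abs(reported_da - estimate) / estimate


if __name__ == "__main__":
    # 116 residues and 12,251 Da are the figures given for S6.
    print(f"estimated mass: {estimate_mass_da(116) / 1000:.2f} kDa")
    print(f"relative difference: {relative_difference(12251, 116):.1%}")
```

Under that assumption, a 116-residue chain comes out in the low tens of kilodaltons, which fits a mass of about 12.25 kDa and not one a thousand times larger.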
Protein NPAT is required for progression through the G1 and S phases of the cell cycle, and for S phase entry. It regulates histone gene expression as well as nonhistone targets that influence competency for cell cycle progression. This entry represents the C-terminal domain of NPAT.
This entry includes flagellar M-ring protein (FliF) and Yop proteins translocation lipoprotein J (YscJ). They share a low level of similarity, presumably due to evolution of the type III secretion system from the flagellar biosynthetic pathway. FliF is the major protein of the M-ring in the bacterial flagellar basal body. The basal body consists of four rings (L, P, S and M) surrounding the flagellar rod, which is believed to transmit motor rotation to the filament. The M ring is integral to the inner membrane of the cell, and may be connected to the rod via the S (supramembrane) ring, which lies just distal to it. The L and P rings reside in the outer membrane and periplasmic space, respectively. FliF lacks a signal peptide and is predicted to have considerable α-helical structure, including an N-terminal sequence that is likely to be membrane-spanning. Overall, however, FliF has a relatively hydrophilic sequence, with a high charge density, especially towards its C terminus. The type III secretion system is of great interest as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. One of the outer membrane protein subunit families, termed "K" here for nomenclature purposes, aids in the structural assembly of the invasion complex. It is also described as a lipoprotein. These lipoproteins include PrgK and SsaJ from Salmonella, MxiJ from Shigella, YscJ from Yersinia, and from the plant enteropathogens NolT (Rhizobium) and HrcJ (Erwinia).
This family corresponds to the FliF protein. FliF is the major protein of the M-ring in the bacterial flagellar basal body. The basal body consists of four rings (L, P, S and M) surrounding the flagellar rod, which is believed to transmit motor rotation to the filament. The M ring is integral to the inner membrane of the cell, and may be connected to the rod via the S (supramembrane) ring, which lies just distal to it. The L and P rings reside in the outer membrane and periplasmic space, respectively. FliF lacks a signal peptide and is predicted to have considerable α-helical structure, including an N-terminal sequence that is likely to be membrane-spanning. Overall, however, FliF has a relatively hydrophilic sequence, with a high charge density, especially towards its C terminus.
This entry represents the transcription antitermination protein, RfaH. This protein is most closely related to the transcriptional termination/antitermination protein NusG and contains the KOW motif. This protein appears to be limited to the proteobacteria. In Escherichia coli, it enhances transcription elongation of distal genes in a specialised subset of operons that encode extracytoplasmic components. RfaH is recruited into a multi-component RNA polymerase complex by the ops element, which is a short conserved DNA sequence located downstream of the main promoter of these operons. Once bound, RfaH suppresses pausing and inhibits Rho-dependent and intrinsic termination at a subset of sites. Termination signals are bypassed, which allows complete synthesis of long RNA chains. RfaH enhances expression of several operons involved in synthesis of lipopolysaccharides, exopolysaccharides, hemolysin, and sex factor. It also negatively controls expression and surface presentation of AG43 and possibly another AG43-independent factor that mediates cell-cell interactions and biofilm formation.
Mitosin or centromere-associated protein-F (Cenp-F) is a coiled-coil protein that dimerizes and localizes to diverse subcellular locations, including microtubule plus-ends, mitochondria, nuclear pores, and kinetochores. It localizes to kinetochores during mitosis and is then rapidly degraded after mitosis. It is required for kinetochore-microtubule interactions and spindle checkpoint function. Cenp-F contains two microtubule-binding domains, and physically associates with dynein motor regulators. Cenp-F has also been shown to couple mitochondria to dynamic microtubule tips. This entry represents the N-terminal region of Cenp-F.
Batten's disease, the juvenile variant of neuronal ceroid lipofuscinosis (NCL), is a recessively inherited disorder affecting children of 5-10 years of age. The disease is characterised by progressive loss of vision, seizures and psychomotor disturbances. Biochemically, the disease is characterised by lysosomal accumulation of hydrophobic material, mainly ATP synthase subunit C, largely in the brain but also in other tissues. The disease is fatal within a decade. Mutations in the CLN3 gene are believed to cause Batten's disease. The CLN3 gene, with a predicted 438-residue product, maps to chromosome 16p12.1. The gene contains at least 15 exons spanning 15 kb and is highly conserved in mammals. A 1.02 kb deletion in the CLN3 gene, occurring in either one or both alleles, is found in 85% of Batten disease chromosomes, causing a frameshift that generates a predicted translated product of 181 amino acid residues. Twenty-two other mutations, including deletions, insertions and point mutations, have been reported. It has been suggested that such mutations result in severely truncated CLN3 proteins, or affect their structure/conformation. CLN3 proteins, which are believed to associate in complexes, are heavily glycosylated lysosomal membrane proteins, containing complex Asn-linked oligosaccharides. Extensive glycosylation is important for the stability of these lysosomal proteins in the highly hydrolytic lysosomal lumen. Lysosomal sequestration of active lysosomal enzymes, transport of degraded molecules from the lysosomes, and fusion and fission between lysosomes and other organelles. The CLN3 protein is a 43 kDa, highly hydrophobic, multi-transmembrane (TM), phosphorylated protein. Hydrophobicity analysis predicts 6-9 TM segments, suggesting that CLN3 is a TM protein that may function as a chaperone or signal transducer. The majority of putative phosphorylation sites are found in the N-terminal domain, encompassing 150 residues. Phosphorylation is believed to be important for membrane compartment interaction, in the formation of functional complexes, and in regulation and interactions with other proteins. CLN3 contains several motifs that may undergo lipid post-translational modifications (PTMs). PTMs contribute to targeting and anchoring of modified proteins to distinct biological membranes. There are three general classes of lipid modification: N-terminal myristoylation, C-terminal prenylation, and palmitoylation of cysteine residues. Such modifications are believed to be a common form of PTM occurring in 0.5% of all cellular proteins, including brain tissue. The C terminus of CLN3 contains various lipid modification sites: C435, target for prenylation; G419, target for myristoylation; and C414, target for palmitoylation. Prenylation results in protein hydrophobicity, influences interaction with upstream regulatory proteins and downstream effectors, facilitates protein-protein interaction (multisubunit assembly) and promotes anchoring to membrane lipids. The prenylation motif, Cys-A-A-X, is highly conserved within CLN3 protein sequences of different species. Species with known CLN3 protein homologues include: Homo sapiens, Canis familiaris, Mus musculus, Saccharomyces cerevisiae and Drosophila melanogaster.
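The position of the prenylated cysteine follows from the layout of a C-terminal Cys-A-A-X box: the cysteine is the fourth residue from the end, so in a 438-residue product it sits at position 435. The sketch below illustrates that arithmetic on a made-up sequence. The function name, the toy sequence and the choice of A, V, L and I as the "aliphatic" set are all assumptions for illustration, not details from this entry, and the strict check is optional because real motifs may be looser.

```python
from typing import Optional

# ASSUMPTION: residues treated as "aliphatic" for the optional strict check.
ALIPHATIC = set("AVLI")


def caax_cysteine_position(seq: str, strict: bool = False) -> Optional[int]:
    """Return the 1-based position of the Cys in a C-terminal CaaX box.

    Returns None when the fourth-from-last residue is not cysteine, or,
    in strict mode, when the two residues after it are not aliphatic.
    """
    if len(seq) < 4:
        return None
    box = seq[-4:].upper()
    if box[0] != "C":
        return None
    if strict and not (box[1] in ALIPHATIC and box[2] in ALIPHATIC):
        return None
    # The cysteine is followed by exactly three residues.
    return len(seq) - 3


if __name__ == "__main__":
    # Hypothetical 438-residue toy sequence; this is NOT the CLN3 sequence.
    toy = "M" * 434 + "CVLS"
    print(caax_cysteine_position(toy, strict=True))
```

For any 438-residue sequence ending in such a box, the function returns 435, matching the prenylation site numbering quoted above.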
The hypothetical protein YqaI is expressed in bacteria, particularly Bacillus subtilis. It forms a homo-dimer, with each monomer containing an alpha helix and four beta strands.
This is a family of metal-binding periplasmic proteins. Tp34 has been classified together with other proteins in a group of uncharacterised proteins probably involved in high affinity Fe2+ transport. However, the structural and functional aspects of this group of proteins remain undetermined. Tp34 may bind zinc and the iron-sequestering lactoferrin and may have a role in metal ion homeostasis.
This family of proteins is found in chordates. Proteins in this family are typically between 73 and 221 amino acids in length. There is a conserved AYV sequence motif.
This entry represents the C-terminal domain found in members of the transmembrane protein 132 family. The family consists of TMEM132A, TMEM132B, TMEM132C, TMEM132D and TMEM132E. TMEM132A may play a role in embryonic and postnatal development of the brain. It increases resistance to cell death induced by serum starvation in cultured cells. It regulates cAMP-induced GFAP gene expression via STAT3 phosphorylation. TMEM132D is a single-pass transmembrane protein that is highly expressed in the cortical regions of the human and mouse brain. Its function is still unknown. It may act as a cell-surface marker for oligodendrocyte differentiation. Additionally, as it may be most strongly expressed in neurons and it colocalises with actin filaments, TMEM132D may be implicated in neuronal sprouting and connectivity in brain regions important for anxiety-related behaviour.
This entry represents the N-terminal domain found in members of the transmembrane protein 132 family. The family consists of TMEM132A, TMEM132B, TMEM132C, TMEM132D and TMEM132E. TMEM132A may play a role in embryonic and postnatal development of the brain. It increases resistance to cell death induced by serum starvation in cultured cells. It regulates cAMP-induced GFAP gene expression via STAT3 phosphorylation. TMEM132D is a single-pass transmembrane protein that is highly expressed in the cortical regions of the human and mouse brain. Its function is still unknown. It may act as a cell-surface marker for oligodendrocyte differentiation. Additionally, as it may be most strongly expressed in neurons and it colocalises with actin filaments, TMEM132D may be implicated in neuronal sprouting and connectivity in brain regions important for anxiety-related behaviour.
Monopolar spindle protein 2 (Mps2) is a fungal transmembrane protein localised to the spindle pole body (SPB). It is required for the insertion of the nascent SPB into the nuclear envelope and for the proper execution of SPB duplication. The interaction between Mps2 and Spc24 may contribute to the localisation of Spc24 and other kinetochore components to the inner plaque of the SPB.
AP-1 complex-associated regulatory protein (AP1AR), also known as gadkin or gamma-BAR, is a negative regulator of the actin-related protein (ARP)2/3 complex, a major regulator of actin dynamics. Through inhibition of ARP2/3, gadkin negatively regulates cell spreading and motility. Gadkin is also a modulator of adaptor protein (AP)-1-mediated endosomal membrane trafficking.
This entry represents the N-terminal domain of the centrosomal proteins Cep63 and Deup1. Cep63 regulates mother-centriole-dependent centriole duplication, whereas Deup1 governs deuterosome assembly for large-scale de novo centriole biogenesis.
This entry represents coiled-coil domain-containing protein 89 (CCDC89) and tumor suppressor candidate gene 1 protein (TUSC1). Hairy-related transcription factor (HRT), also known as Bc8, is a transcriptional repressor and has been identified as an effector of the Notch signalling pathway. HRT proteins contain a conserved domain, denoted either as the Orange domain or as helix III/IV, located C-terminal to the bHLH domain, which is characteristic of all HES/E(spl)-related proteins, also referred to as bHLH-Orange (bHLH-O) proteins. Coiled-coil domain-containing protein 89 (CCDC89), also known as Bc8 orange-interacting protein (BOIP), binds to the orange domain of HRT1/Hey1 and is predominantly expressed at a specific stage in spermatogenesis. BOIP may play a role in the maturation process from haploid round spermatids to spermatozoa, possibly as a tissue-specific regulator of HRT1/Hey1 activity. Tumor suppressor candidate 1 (TUSC1) has been identified as a potential tumor suppressor in human cancers and could be a novel therapeutic target for glioblastoma.
These are putative membrane proteins from alpha and gamma proteobacteria, each making up their own clade. The two clades have less than 25% identity between them.
These sequences represent a family of integral membrane proteins, most of which are about 650 residues in size and predicted to span the membrane seven times. Nearly half of the members of this family are found in association with a member of the lactococcin 972 family of bacteriocins. Others may be associated with uncharacterised proteins that may also act as bacteriocins. Although this protein is suggested to be an immunity protein, and the bacteriocin is suggested to be exported by a Sec-dependent process, the role of this protein is unclear.
There is currently no experimental data for members of this group or their homologues, nor do they exhibit features indicative of any function. However, members contain multiple membrane-spanning regions and are found in association with bacteriocins.
This entry represents polar-tube protein 2 from Microsporidia. Microsporidia are unicellular eukaryotes and obligate intracellular parasites that produce resistant spores and can infect humans. To initiate entry into a new host cell, a unique motile process is formed by a sudden extrusion of the polar tube protein from the spore. The protein contains a series of conserved cysteine residues.
This entry represents a number of known and suspected phage virion morphogenesis proteins. The S protein from phage P2 is thought to act in tail completion and stable head joining.
This DNA-binding protein binds to the autonomously replicating sequence (ARS) binding element. It may play a role in regulating the cell cycle response to stress signals.
Asp4 and Asp5 are putative accessory components of the SecY2 channel of the SecA2-SecY2 mediated export system, but they are not present in all SecA2-SecY2 systems. This family of Asp5 proteins is found in Streptococcus spp.
This domain consists of the adjacent Saf-Nte and Saf-pilin chains of the pilus-forming complex. Pilus assembly in Gram-negative bacteria involves a donor-strand exchange mechanism between the C- and the N-termini of this domain. The C-terminal subunit forms an incomplete Ig-fold which is then complemented by the 10-18 residue N terminus of another, incoming, pilus subunit which is not involved in the Ig-fold. The N-terminal sequences contain a motif of alternating hydrophobic residues that occupy the P2 to P5 binding pockets in the groove of the first pilus subunit.
This entry represents the N-terminal domain of the ribosomal protein L5. Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) or the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein L5, ~180 amino acids in length, is one of the proteins from the large ribosomal subunit. In Escherichia coli, L5 is known to be involved in binding 5S RNA to the large ribosomal subunit. It belongs to a family of ribosomal proteins which, on the basis of sequence similarities, groups: eubacterial L5; algal chloroplast L5; cyanelle L5; archaebacterial L5; mammalian L11; Tetrahymena thermophila L21; Dictyostelium discoideum (slime mold) L5; Saccharomyces cerevisiae (baker's yeast) L16 (39A); and plant mitochondrial L5.
Catabolite control protein A (CcpA) is a LacI family global transcriptional regulator found in Gram-positive bacteria. CcpA is involved in repressing carbohydrate utilization genes [e.g. alpha-amylase (amyE), acetyl-coenzyme A synthase (acsA)] and in activating genes involved in transporting excess carbon from the cell [e.g. acetate kinase (ackA), alpha-acetolactate synthase (alsS)]. Additionally, disruption of CcpA in Bacillus megaterium, Staphylococcus xylosus, Lactobacillus casei and Lactobacillus pentosus also decreases growth rate, which suggests CcpA is involved in the regulation of other metabolic pathways. Its structure has been solved.
UCMA is a secreted cartilage-specific protein expressed predominantly in resting chondrocytes. It is secreted into the extracellular matrix as an uncleaved precursor and shows the same restricted distribution pattern in cartilage as UCMA mRNA. This protein is proteolytically processed and contains tyrosine sulfates. It seems to be involved in the negative control of osteogenic differentiation of osteochondrogenic precursor cells in peripheral zones of foetal cartilage.
Melanoma-derived growth regulatory protein or Melanoma inhibitory activity (MIA) is a 12 kDa protein that is secreted from both chondrocytes and malignant melanoma cells. It has effects on cell growth and adhesion and it may play a role in melanoma metastasis and cartilage development. MIA elicits growth inhibition on melanoma cells in vitro. It is possible that secretion of MIA in vivo leads to decreased adhesiveness of melanocytic cells and thereby promotes melanoma progression and invasion. The crystal structure revealed an Src homology 3 (SH3)-like domain with N- and C-terminal extensions of about 20 aa each. It is the first structure of a secreted protein that contains an SH3 subdomain. The MIA SH3 subdomain shares sequence similarity with canonical SH3 domains, suggesting that they are evolutionarily related. It has a protein interaction site but, unlike conventional SH3 domains, MIA does not recognise polyproline helices.
This entry includes a group of zinc finger transcription regulators, including GLI1-3 and GLIS1-3 from humans. GLI1-3 are central effectors of the Hedgehog (Hh) pathway in vertebrates. GLI1 is an obligatory activator, whereas GLI2 and GLI3 carry activator and repressor functions. The full-length GLI3 form, after phosphorylation and nuclear translocation, acts as an activator (GLI3A) while its C-terminally truncated form acts as a repressor (GLI3R). GLI3 participates in the patterning and growth of many organs, including the central nervous system (CNS) and limbs. GLIS1-3 play key roles in the regulation of a number of physiological processes and have been implicated in several pathologies. GLIS1-3 share a highly homologous zinc finger domain (ZFD) containing five Cys2-His2-type zinc finger (ZF) motifs with members of the Gli and Zic family.
HGAL (human germinal centre-associated lymphoma, also known as germinal centre-expressed transcript 2 or germinal centre-associated signalling and motility protein) is part of a family of mammalian sequences typically between 104 and 179 amino acids in length. Members were discovered in a search for proteins precipitating diffuse large B-cell lymphomas. HGAL and its murine homologue, M17 protein, are specifically expressed in germinal centre (GC) B-lymphocytes. HGAL interacts with the cytoskeleton and aids the activity of interleukin-6 on cell migration. It also modulates the RhoA signalling pathway and regulates B-cell receptor (BCR) signalling. It seems that membrane-bound and cytoplasmic HGAL regulate distinct biological processes in B-cells.
This entry represents a family of Tfx DNA-binding proteins, which is restricted to euryarchaeota. Tfx has a 2-layer α/β topology. Homology among the members is strongest in the helix-turn-helix-containing N-terminal region. Tfx from Methanothermobacter thermautotrophicus is associated with the operon for molybdenum formyl-methanofuran dehydrogenase and binds a DNA sequence near its promoter.
This entry represents a family of Tfx DNA-binding proteins, which is restricted to the archaea. Tfx has a 2-layer alpha/beta topology. Homology among the members is strongest in the helix-turn-helix-containing N-terminal region. Tfx from Methanothermobacter thermautotrophicus (Methanobacterium thermoformicicum) is associated with the operon for molybdenum formyl-methanofuran dehydrogenase and binds a DNA sequence near its promoter.
This family of proteins is found in eukaryotes. Proteins in this family are typically between 187 and 319 amino acids in length. There are two completely conserved residues (G and P) that may be functionally important.
This entry represents the short, highly conserved C-terminal domain (CTD) of certain bromodomain proteins, notably Brd4. The Brd4 CTD interacts with the cyclin T1 and Cdk9 subunits of the positive transcription elongation factor b (pTEFb) complex. Brd4 displaces negative regulators, the HEXIM1 and 7SK snRNA complex, from pTEFb, thereby transforming it into an active form that can phosphorylate RNA pol II.
The lysosome associated protein transmembrane (LAPTM) family is comprised of three members: LAPTM5, LAPTM4a and LAPTM4b; they are lysosome-associated transmembrane proteins, found in mammals, insects and nematodes.
This entry includes a group of GPCR-associated sorting proteins, including GASP1, GASP2 and GASP3 (also known as BHLHb9/p60TRP). GASP3/BHLHb9 modulates the phosphorylation and proteolytic processing of amyloid precursor protein (App), N-cadherin (Cdh), presenilin (Psen) and tau protein, and might be a new therapeutic target for the treatment of Alzheimer's disease. It also regulates Ngf-dependent neuronal survival and differentiation, promoting neurosynaptogenesis. GASP3 is expressed in the central nervous system, heart and skeletal muscle, suggesting a possible involvement in cardiovascular functions. GASP1 regulates the degradation of several GPCRs that traffic to lysosomes after agonist-induced endocytosis. It also interacts with Beclin 2, which is essential for GPCR degradation. GASP2 is thought to play a role in the regulation of a variety of GPCRs.
Fibrous sheath CABYR (Calcium Binding Tyrosine-(Y)-Phosphorylation Regulated) binding protein (FSCB) is a testis- and sperm-specific protein that is expressed post-meiotically and localised in mouse sperm flagella. FSCB homologues are present in other mammals, including rat and human. Conserved motifs in FSCB include PXXP, proline-rich and extensin-like regions. FSCB is regulated by protein kinase A (PKA)-mediated tyrosine phosphorylation. Its capacity to be phosphorylated and to bind calcium enhances spermatozoa flagellar movement, capacitation and activation. FSCB phosphorylation activates spermatozoa motility and inhibits SUMOylation of two crucial proteins, ROPN1 and ROPN1L, that are associated with PKA/A kinase activity and spermatozoa motility.
Thx forms part of the 30S ribosomal subunit. It fits into a cavity between multiple RNA elements in the top of the 30S subunit head and stabilises the organisation of these elements.
Outer capsid protein VP7 is a reoviral protein that interacts with VP4 to form the outer icosahedral capsid. Outer capsids are involved directly in viral host interactions.
SafA (also known as B1500) is a connector protein acting in bacterial two-component signal-transduction systems (TCSs). It links signal transduction between the two-component systems EvgS/EvgA and PhoQ/PhoP by directly interacting with PhoQ, thereby activating the PhoQ/PhoP system in response to acid stress conditions.
This family of sequences describes an accessory protein required for the assembly of formate dehydrogenase of certain proteobacteria, although it is not present in the final complex. The exact nature of the function of FdhE in the assembly of the complex is unknown, but considering the presence of selenocysteine, molybdopterin, iron-sulphur clusters and cytochrome b556, it is likely to be involved in the insertion of cofactors.
Some antibiotics bind to the large ribosomal subunit at the peptidyl-transferase centre (PTC) or adjacent to it, in the ribosomal exit tunnel. Gram-positive bacteria have antibiotic resistance (ARE) ATP-binding cassette (ABC) proteins of the F type (ABCF) that confer resistance to these antibiotics. These proteins are also found in some Gram-negative bacteria. VmlR is a ribosomal protection protein that confers resistance to lincomycin (Lnc), the streptogramin A (SA) antibiotic virginiamycin M (VgM) and the pleuromutilin antibiotic tiamulin. It is an ABCF protein that binds to the E site of antibiotic-stalled ribosomes, inducing conformational changes in the P site that allow the dissociation of the antibiotic from its PTC binding site.
A predicted immunity protein of the NTF2 fold. Proteins containing this domain are present in bacterial polymorphic toxin systems as an immediate gene neighbour of the toxin gene, which usually contains toxin domains of the Tox-JAB-2 family.
Cysteine-rich C-terminal protein 1 (also known as NICE-1) is a family of proteins found in mammals. Proteins in this family are typically between 51 and 105 amino acids in length.
LLCFC1 (also known as SOF1) and its homologues can be found in eukaryotes. They have two conserved sequence motifs: LLLL and CFNLAS. LLCFC1 is a sperm protein required for sperm-oocyte fusion in mice.
TEX29, testis-expressed sequence 29 protein, is a family of proteins found in eukaryotes. Proteins in this family are typically between 39 and 150 amino acids in length.
Testis-expressed sequence 38 is a family of proteins found in eukaryotes. Proteins in this family are typically between 152 and 232 amino acids in length.
In humans, retinal degeneration protein 3 (RD3) is preferentially expressed in the retina. Mutations in RD3 cause Leber congenital amaurosis type 12, which is a severe dystrophy of the retina, typically becoming evident in the first years of life.
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [
]. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA [
]. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [], acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins [
]. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen []. This entry represents the IscA component of the ISC system for iron-sulphur cluster assembly. IscA is believed to act as a scaffold upon which 2Fe-2S clusters are assembled and subsequently transferred to ferredoxin [
,
,
]. This clade is limited to the proteobacteria.
This entry represents a group of bacterial proteins, including UbiV from Escherichia coli. UbiV is involved in the O2-independent ubiquinone (coenzyme Q) biosynthesis pathway alongside UbiU (YhbU) and UbiT (YhbT) [
].
This entry represents a group of bacterial proteins, including UbiU from Escherichia coli. UbiU is involved in the O2-independent ubiquinone (coenzyme Q) biosynthesis pathway alongside UbiV (YhbV) and UbiT (YhbT) [
].
The small outer capsid protein stabilises T4 capsids after the majority of the capsid assembly stages, and helps to protect T4 against high temperatures and pH extremes by binding the capsid surface at interfaces between Gp23 hexameric capsomers [
].
Kinetoplastid membrane protein 11 is a major cell surface glycoprotein of the parasite Leishmania donovani. It stimulates T-cell proliferation and may play a role in the immunology of the disease leishmaniasis.
Tsg was identified in Drosophila melanogaster as being required to specify the dorsal-most structures in the embryo, for example the amnioserosa. Biochemical experiments have revealed three key properties of Tsg: it can synergistically inhibit Dpp/BMP action in both D. melanogaster and vertebrates by forming a tripartite complex between itself, SOG/chordin and a BMP ligand; Tsg seems to enhance the Tld/BMP-1-mediated cleavage rate of SOG/chordin and may change the preference of site utilisation; Tsg can promote the dissociation of chordin cysteine-rich-containing fragments from the ligand to inhibit BMP signalling [
,
].
This family represents the tail assembly protein GT, from lambda-like viruses and their prophages. In bacteriophage lambda, the overlapping open reading frames G and T are expressed by a programmed translational frameshift to produce the tail assembly proteins G and GT [
]. Tail assembly protein GT shares its N-terminal residues with tail assembly protein G, followed by residues of unique sequence []. An analogous frameshift is widely conserved among other dsDNA tailed phages in their corresponding 'G' and 'GT' tail genes even in the absence of detectable sequence homology []. The lambda tail assembly protein G and frameshift product GT are produced in a molar ratio of approximately 30:1 []. The correct molar ratio of these two related proteins, normally determined by the efficiency of the frameshift, is crucial for efficient assembly of functional tails []. Although tail assembly proteins G and GT are both required for assembly of functional tails, neither is present in mature tails [].
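As a rough illustration of how the G:GT molar ratio relates to frameshift efficiency, the sketch below converts a ratio into the implied fraction of translation events that frameshift. It assumes that each translation event yields either G or GT and that the two products are equally stable, so that the ratio reflects frameshifting alone; the function name frameshift_fraction is hypothetical and not taken from any cited source.

```python
def frameshift_fraction(g_molar: float, gt_molar: float) -> float:
    """Return the fraction of translation events that produce GT.

    Assumption: every event yields exactly one of G (no frameshift)
    or GT (frameshift), and neither product is preferentially degraded.
    """
    if g_molar < 0 or gt_molar < 0 or (g_molar + gt_molar) == 0:
        raise ValueError("molar amounts must be non-negative and not both zero")
    return gt_molar / (g_molar + gt_molar)


# A G:GT ratio of about 30:1 implies roughly one frameshift in 31 events.
fraction = frameshift_fraction(30, 1)
print(f"{fraction:.1%}")  # prints "3.2%"
```

Under these assumptions, the reported ratio corresponds to a frameshift efficiency of a few percent.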
This pilin glycosylation ligase domain is found in Gram-negative bacteria [
]. PglL O-oligosaccharyltransferases differ from WaaL O-antigen ligases in their substrate specificity. PglL O-oligosaccharyltransferases (O-OTases) transfer oligosaccharides to serine or threonine in a protein. A further indication that the genes identified by this domain are PglL rather than WaaL homologues is that they are not located within lipopolysaccharide biosynthetic loci [
]. The specific pilin glycosylation ligases are a subset of the more general bacterial protein O-oligosaccharyltransferases [].
CAAX box protein 1 (CAAX1, FAM127A) is found in primates. CAAX refers to the highly characteristic C-terminal residues: a cysteine and two aliphatic residues followed by any residue, which together form a C-terminal tetrapeptide recognition motif called the Ca1a2X box [
]. This motif on substrates is recognised by prenyltransferases that then attach an isoprenoid lipid (a process termed prenylation), one of the many post-translational modifications that occur in cells []. The function of the prenylated protein is not known.
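The motif described above (a cysteine, two aliphatic residues, then any residue, at the C terminus) can be sketched as a simple pattern match. This is a minimal illustration only: the set of residues treated as aliphatic (A, V, I, L) is an assumption, as definitions vary, and the function name has_caax_box and the example sequences are hypothetical.

```python
import re

# Assumption: "aliphatic" is taken to mean Ala, Val, Ile or Leu.
# Other definitions include further residues; adjust as needed.
ALIPHATIC = "AVIL"

# Cys, two aliphatic residues, then any single residue, anchored at the C terminus.
CAAX_RE = re.compile(rf"C[{ALIPHATIC}]{{2}}[A-Z]$")


def has_caax_box(sequence: str) -> bool:
    """Return True if a one-letter protein sequence ends in a CAAX-like tetrapeptide."""
    return CAAX_RE.search(sequence.strip().upper()) is not None


# Hypothetical example sequences, not real proteins.
print(has_caax_box("MKTAGSCVLS"))  # True: ends in C-V-L-S
print(has_caax_box("MKTAGSCSLS"))  # False: Ser is not in the assumed aliphatic set
```

A match of this kind only flags a candidate motif; it does not show that the protein is actually prenylated.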
Members of this family are extremely potent competitive inhibitors of cAMP-dependent protein kinase activity. These proteins interact with the catalytic subunit of the enzyme after the cAMP-induced dissociation of its regulatory chains.
This is a family of 121-amino acid secretory proteins consisting of learning associated protein 18 (LAPS18) and related sequences. LAPS18 functions in the regulation of neuronal cell adhesion and/or movement and synapse attachment [
]. It has been shown to bind to the ApC/EBP (Aplysia CCAAT/enhancer binding protein) promoter and activate the transcription of ApC/EBP mRNA []. In mouse hippocampal neurons, it regulates dendritic and spine growth and synaptic transmission [].
This entry represents the N terminus of actin-like protein 7A. This domain interacts and forms a heterodimer with TES, a putative human tumor suppressor. This heterodimer then interacts with ENAH to form a heterotrimer [
].
This entry includes a group of nuclear dot-associated proteins, including Sp110/Sp140/Sp140L and autoimmune regulator from humans. They are constituents of nuclear domains, also known as nuclear dots (NDs). Sequences similar to the Sp100 homodimerization/ND-targeting region occur in several other proteins and constitute a novel protein motif, termed HSR domain (for homogeneously-staining region) [
]. Sp110 is a leukocyte-specific component of the nuclear body [
]. It may function as a nuclear hormone receptor transcriptional coactivator that may play a role in inducing differentiation of myeloid cells []. It is also involved in resisting intracellular pathogens and functions as an important drug target for preventing intracellular pathogen diseases, such as tuberculosis, hepatic veno-occlusive disease, and intracellular cancers [,
]. Sp110 gene polymorphisms may be associated with susceptibility to tuberculosis in the Chinese population []. The function of nuclear protein Sp140 is not known, though it contains several chromatin-related modules such as plant homeodomain (PHD), bromodomain (BRD) and SAND domain, which suggests a role in chromatin-mediated regulation of gene expression [
]. It also harbours a nuclear localisation signal and a dimerisation domain (HSR or CARD domain). The PHD finger of Sp140 presents an atypical fold which does not bind to histone H3 tails but binds to peptidylprolyl isomerase Pin1. Pin1 catalyses the isomerisation of a phospho-Threonine-Proline bond in Sp140-PHD and thus may modulate Sp140 function [
]. Human Sp140 is an interferon-inducible, leukocyte-specific nuclear protein that may be involved in the pathogenesis of acute promyelocytic leukemia and viral infection [
]. It localises to LYSP100-associated nuclear dots and is also a component of promyelocytic leukemia nuclear bodies (PML-NBs) [,
]. The Sp140 locus has been identified as a chronic lymphocytic leukemia (CLL) risk locus []. This family also includes protein Sp140-like (SP140L) [
].
RAX is a transcription factor that plays a critical role in the eye and forebrain development of vertebrate species. It is involved in the establishment of the retina [
]. Mutations in the RAX gene cause microphthalmia, isolated 3, which is a disorder of eye formation, ranging from small size of a single eye to complete bilateral absence of ocular tissues []. Homeobox protein ceh-8 is the RAX orthologue from C. elegans required for cell specification of the RIA interneurons []. Retina and anterior neural fold homeobox protein 2 (RAX2) is known to bind to the Ret-1 and Bat-1 element within the rhodopsin promoter and may be involved in modulating the expression of photoreceptor specific genes [
]. This entry includes a group of homeobox proteins, including RAX and RAX2.
Sushi repeat-containing protein SRPX2 is a multifunctional protein that is involved in seizure disorders, angiogenesis and cellular adhesion [
,
]. It acts as a ligand for the urokinase plasminogen activator surface receptor []. The overexpression of SRPX2 enhances cellular migration and adhesion in gastric cancer cells []. Mutations in the SRPX2 gene cause rolandic epilepsy with speech dyspraxia and mental retardation X-linked (RESDX) []. Sushi repeat-containing protein SRPX is deleted in patients with X-linked retinitis pigmentosa [
]. This entry includes SRPX and SRPX2.
RcpB is part of the tad operon of proteins []. The Tad (tight adherence) macromolecular transport system, present in many bacterial and archaeal species, represents an ancient and major new subtype of type II secretion. The three Rcp proteins (RcpA, RcpB, and RcpC) and TadD, a putative lipoprotein, are localised to the bacterial outer membrane [].
This entry represents the N-terminal domain of EP400, a component of the NuA4 histone acetyltransferase complex that was first identified through its ability to bind the adenovirus E1A protein [
,
]. The exact function of this domain is not known. This domain consists largely of low-complexity residues.
CzcE is involved in heavy-metal resistance [
]. It binds copper [], which induces a conformational change. It is thought that CzcE acts both as a Cu(I) and Cu(II) sensor and mediator of the multiple cross-responses of C. metallidurans CH34 against heavy metals by linking copper and cobalt-zinc-cadmium resistance [
].
Glutamate receptor-interacting proteins (GRIPs) are multi-PDZ domain scaffolding proteins required for dendrite development. There are two members of the GRIP family: GRIP1 and GRIP2/ABP (AMPA receptor binding protein) [
]. GRIPs regulate AMPA receptor trafficking and neuronal plasticity [,
]. They also bind various receptor and signalling proteins, such as EphB receptors, ephrinB ligands, the proteoglycan NG2, the Fraser syndrome protein Fras1, and matrix metalloproteinase 5 []. This entry includes glutamate receptor-interacting protein 1/2 (GRIP1/2).
The yfhO gene is transcribed in Difco sporulation medium and the transcription is affected by the YvrGHb two-component system [
]. Some members of this family have been annotated as putative ABC transporter permease proteins.
Asp4 and Asp5 are putative accessory components of the SecY2 channel of the SecA2-SecY2 mediated export system, but they are not present in all SecA2-SecY2 systems [
]. This family of Asp4 proteins is found in Streptococcus spp.
WIP proteins form a plant specific subfamily of C2H2 zinc finger (ZF) proteins [
]. Arabidopsis TRANSPARENT TESTA 1 protein (At1g34790, also known as WIP1 or TT1) may act as a transcriptional regulator involved in the differentiation of young endothelium [,
]. WIP2, also known as NTT, is a transcription factor that promotes replum development in Arabidopsis fruits [
].
Calcium-binding proteins (CaBPs) are a family of Ca(2+)-binding proteins with strong sequence similarity to calmodulin (CaM), which is an important component of Ca(2+)-mediated cellular signal transduction in the central nervous system [
]. Like calmodulin, CaBP family members regulate effectors such as voltage-gated Ca(2+) channels in a Ca(2+)-dependent manner [,
]. CaBPs are localised in the brain and sensory organs. Mutations in calcium-binding protein 2 (CaBP2) cause hearing loss, revealing a role for CaBP2 in the mammalian auditory system []. This family contains calcium-binding proteins 1, 2, 4 and 5.
CCDC106, coiled-coil domain-containing protein 106, is a family of eukaryotic proteins. Yeast two-hybrid screening has identified CCDC106 as a p53-interacting partner [
]. CCDC106 is a negative regulator of p53 and may be involved in tumourigenesis in some cancers by promoting the degradation of p53 protein and inhibiting its transactivity [].
This protein consists of a unit of helices and β-sheets that crystallises into an asymmetrical dodecameric barrel structure of two six-membered rings stacked one on top of the other. It is expressed in bacteria but is of viral origin, as it is found in phage BcepMu and is probably a pathogenesis factor [
]. This entry also includes a putative cytoplasmic protein, STM4215, from Salmonella.
Members of this family are involved in mitochondrial biogenesis and G2/M phase cell cycle progression. They form a component of the mitochondrial ribosome large subunit (39S) which comprises a 16S rRNA and about 50 distinct proteins [
].
Bardet-Biedl syndrome 5 protein (BBS5) is part of the BBSome complex that may function as a coat complex required for sorting of specific membrane proteins to the primary cilia [
]. Mutations in the BBS5 gene cause Bardet-Biedl syndrome 5, which is a syndrome characterised by usually severe pigmentary retinopathy, early-onset obesity, polydactyly, hypogenitalism, renal malformation and mental retardation [,
].
This is a family of small proteins (34-48 aa) with a single TM region. Members can exhibit antibacterial activity against Gram-positive bacteria but not against Gram-negative bacteria [
]. Proteins in this entry include BsrG, BsrH and BsrE from Bacillus subtilis. They are the toxic components of the type I toxin-antitoxin (TA) system [,
].
HrpK1 is a putative Type III secretion system pore-forming bacterial protein. It allows transfer of pathogenic material from bacterial cytoplasm into the plant host cytoplasm.
The function of coiled-coil domain-containing protein 48 (CCDC48), also known as EF-hand and coiled-coil domain-containing protein 1 (EFCC1), is not known. Proteins in this family are typically between 161 and 575 amino acids in length.