Stomatal closure-related actin-binding (SCAB) proteins bind, bundle and stabilise actin filaments and regulate stomatal movement [
]. Homologues are known only from plants.
WD repeat-containing protein 11 (WDR11) is part of the WDR11 complex (consisting of WDR11, C17orf75 and FAM91A) that facilitates the tethering of AP-1-derived vesicles [
]. Mutations of the WDR11 gene have been linked to congenital hypogonadotropic hypogonadism (CHH) and Kallmann syndrome (KS), human developmental genetic disorders defined by delayed puberty and infertility []. Homologues are known from animals and plants.
This family includes the VP2 and VP3 internal coat proteins from polyomaviruses, which are small dsDNA tumour viruses. Their capsids contain 360 copies of the VP1 protein arranged in 72 pentamers. This capsid encloses the internal proteins VP2 and VP3, as well as the viral DNA. A single copy of VP2 or VP3 associates with each VP1 pentamer. A crystal structure shows that the C-terminal region of the VP2/VP3 protein interacts with the VP1 pentamer [
].
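The copy numbers above are mutually consistent: 72 pentamers of five VP1 subunits give 360 VP1 copies, and one VP2 or VP3 per pentamer implies at most 72 VP2/VP3 copies in total. A minimal Python sketch of that arithmetic (the function and constant names are illustrative only):

```python
# Capsid arithmetic as stated in the text: 72 VP1 pentamers,
# one VP2 or VP3 copy associated with each pentamer.
PENTAMERS = 72
VP1_PER_PENTAMER = 5
INTERNAL_PER_PENTAMER = 1  # a single VP2 *or* VP3 per pentamer


def capsid_counts(pentamers: int = PENTAMERS) -> dict:
    """Return the copy numbers implied by the pentamer arrangement."""
    return {
        "VP1": pentamers * VP1_PER_PENTAMER,
        "VP2_or_VP3": pentamers * INTERNAL_PER_PENTAMER,
    }


print(capsid_counts())  # {'VP1': 360, 'VP2_or_VP3': 72}
```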
Proteins in this family contain an MD-2-related lipid-recognition (ML) domain, which is implicated in lipid recognition, particularly in the recognition of pathogen-related products. It has an immunoglobulin-like β-sandwich fold similar to that of E-set Ig domains. This domain is present in proteins from plants, animals and fungi, including the following proteins: NPC intracellular cholesterol transporter 2 (NPC2), which is known to bind cholesterol. Niemann-Pick disease type C2 is a fatal hereditary disease characterised by accumulation of low-density lipoprotein-derived cholesterol in lysosomes [
]. Phosphatidylglycerol/phosphatidylinositol transfer protein (Npc2) from yeasts, which catalyses the intermembrane transfer of phosphatidylglycerol and phosphatidylinositol [
]. House-dust mite allergen proteins such as Der f 2 from Dermatophagoides farinae and Der p 2 from Dermatophagoides pteronyssinus [
].
This family of proteins is regulated by the ferric uptake regulator protein Fur [
]. This family does not regulate the lutABC operon encoding iron-sulfur-containing enzymes necessary for growth on lactate [].
This is a family of proteins from single-stranded DNA bacteriophages. The G protein is a major spike protein involved in attachment to the bacterial host cell. The virion is composed of sixty copies of each of the F, G and J proteins, and 12 copies of the H protein. There are twelve spikes, each formed by five G proteins (each a tight β-barrel) and one H protein [
,
].
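As a consistency check on the stoichiometry stated above, twelve spikes of five G proteins and one H protein each account for exactly the 60 G and 12 H copies in the virion. A small illustrative Python sketch (all names are hypothetical):

```python
# Copy numbers per virion as stated in the text.
STATED_COPIES = {"F": 60, "G": 60, "J": 60, "H": 12}

SPIKES = 12       # spikes per virion
G_PER_SPIKE = 5   # G proteins per spike
H_PER_SPIKE = 1   # H protein per spike


def spike_copies(spikes: int = SPIKES) -> dict:
    """Copies of G and H contributed by the spikes."""
    return {"G": spikes * G_PER_SPIKE, "H": spikes * H_PER_SPIKE}


def spikes_account_for_all_g_and_h() -> bool:
    """True if every G and H copy in the virion sits in a spike."""
    copies = spike_copies()
    return all(copies[p] == STATED_COPIES[p] for p in ("G", "H"))


print(spike_copies())                    # {'G': 60, 'H': 12}
print(spikes_account_for_all_g_and_h())  # True
```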
Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior. There have been four secretion systems described in animal enteropathogens, such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia []. The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell [] and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself [], type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.
The Salmonella/Shigella invasion protein E gene (InvE) encodes one such type III secretion protein subunit. InvE is encoded within the SPI-1 pathogenicity island, is localised to the outer membrane, and is involved in surface presentation.
This entry represents the EF-hand-like domain found in protein KASH5. KASH5 is a component of the LINC (LInker of Nucleoskeleton and Cytoskeleton) complex, involved in the connection between the nuclear lamina and the cytoskeleton [
]. It interacts (via its last 22 amino acids) with SUN1; this interaction mediates telomere localisation []. Proteins containing this domain also include lrmp from zebrafish, a maternally expressed membrane and cytoskeletal linker protein that is essential for attachment of the centrosome to the male pronucleus [
].
TonB-dependent transporters (TBDT) are bacterial outer membrane (OM) proteins that bind and transport ferric chelates called siderophores. While iron complexes constitute the majority of substrates for TBDTs, others, like vitamin B12, are also transported by this mechanism [
]. These transporters show high affinity and specificity for siderophores and require energy derived from the proton motive force across the inner membrane to transport them. This energy is provided through interaction with an inner membrane protein complex consisting of TonB, ExbB, and ExbD []. The source of this energy is the ion electrochemical gradient of the cytoplasmic membrane, harvested by heteromultimeric complexes of ExbB and ExbD proteins, and transduced to the high-affinity OM siderophore transporters by the protein TonB []. TonB is composed of three domains. The N-terminal transmembrane helix anchors the protein to the inner membrane and makes contact with ExbB and ExbD to form an energy-transducing complex. The C-terminal globular domain directly contacts the transporters in the OM. These two domains are separated by a flexible, unstructured proline-rich domain that resides within the periplasm [
]. Escherichia coli has only one TonB protein, which is shared by different TBDTs involved in the acquisition of various substrates, but most bacteria have more than one tonB gene [].
Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior []. There have been four secretion systems described in animal enteropathogens, such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia []. The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis []. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure.
Exotoxins secreted by the type III system do not possess a secretion signal, and are considered unique for this reason []. Yersinia spp. secrete a serine/threonine kinase, YpkA [, ], that is autophosphorylated and phosphorylates host cell components, although the exact targets are unknown at present. It has also been suggested that the YpkA protein interferes with signal transduction in the target cell [].
This domain is found in SSO6904 from Sulfolobus solfataricus. SSO6904 is a calcium-binding protein thought to have a weak affinity for other cations such as Mg2+ and Zn2+. The structure of SSO6904 is similar to that of saposin-fold proteins. Saposin proteins are membrane-interacting glycoproteins required for the hydrolysis of certain sphingolipids by specific lysosomal hydrolases [
].
This entry represents ribosomal protein L19 from eukaryotes, as well as L19e from archaea [
]. L19/L19e is part of the large ribosomal subunit, whose structure has been determined in a number of eukaryotic and archaeal species [].
This entry represents proteins with an N-terminal NTF2 (nuclear transport factor 2) domain and a C-terminal RRM (RNA recognition motif) domain. It includes mammalian ATP- and magnesium-dependent helicase Ras GTPase-activating protein-binding proteins 1 and 2 [
] and UBP3-associated protein Bre5 from yeast, which is a co-factor required for de-ubiquitination []. This entry also includes AtMBD6 from Arabidopsis. AtMBD6 maintains gene silencing in Arabidopsis by interacting with RNA binding proteins [].
This entry represents the archaeal ribosomal protein S19e. It may be involved in maturation of the 30S ribosomal subunit. Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
].
F-box only protein 4 (FBXO4) is a substrate-specific adaptor of SCF (SKP1-CUL1-F-box protein) E3 ubiquitin-protein ligase complex that mediates the ubiquitination and subsequent proteasomal degradation of target proteins [
].
TBCC-domain containing 1 (TBCCD1) has a role in internal cell organization in animals, Chlamydomonas reinhardtii, and trypanosomes [
]. In humans, it is required for centrosome and Golgi apparatus positioning [].
This protein family is restricted to the Actinomycetales, including Mycobacterium, Rhodococcus, Nocardia, Gordonia, and others. The invariant motif HEXXH, at the core of the best-conserved region in the protein, suggests metallohydrolase activity, as does local sequence similarity in this region to other metallohydrolases.
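The HEXXH signature is easy to scan for computationally. A minimal Python sketch using a regular expression; the function name and toy sequence are hypothetical, and a motif hit on its own is only weak evidence for metallohydrolase activity:

```python
import re

# HEXXH: His, Glu, any two residues, His.
# The lookahead makes each match zero-width, so overlapping occurrences are all reported.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(seq: str) -> list:
    """Return (1-based start position, matched segment) for each HEXXH occurrence."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq)]


# Hypothetical toy sequence, not a real family member.
print(find_hexxh("MAHEFGHKLHEAAHEIIH"))
# [(3, 'HEFGH'), (10, 'HEAAH'), (14, 'HEIIH')]
```

The lookahead matters in the last two hits, which share the His at position 14.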
Members of this family are absolutely restricted to the Mollicutes (Mycoplasma and Ureaplasma). All have a signal peptide, usually of the lipoprotein type, suggesting surface expression. Most members have lengths of about 280 residues, but some members have a nearly full-length duplication. The most nearly invariant residue, a Trp, is part of a strongly conserved 9-residue motif, [ND]-W-[LY]-[WF]-X-[LF]-X-N-[LI], where X is usually hydrophobic. Because the hydrophobic six-residue core of this motif almost always contains three to four aromatic residues, we name this family aromatic cluster surface protein. Multiple paralogs may occur in a given Mycoplasma, usually clustered on the genome.
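The conserved motif is written in PROSITE-style notation. A minimal Python sketch that converts it to a regular expression and searches a sequence; the converter (a hypothetical helper) handles only the syntax used in this motif, the toy sequence is hypothetical, and X is treated as any residue rather than enforcing the usual hydrophobic preference:

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern to a Python regex.

    Only handles the elements used in this motif: single residues,
    [..] alternatives and X wildcards (no repeats or {..} exclusions).
    """
    parts = []
    for element in pattern.split("-"):
        if element.upper() == "X":
            parts.append(".")                  # X: any residue
        elif element.startswith("[") and element.endswith("]"):
            parts.append(element)              # [ND]: same set syntax in regex
        else:
            parts.append(re.escape(element))   # fixed residue, e.g. W or N
    return "".join(parts)


MOTIF = "[ND]-W-[LY]-[WF]-X-[LF]-X-N-[LI]"
MOTIF_RE = re.compile(prosite_to_regex(MOTIF))

print(prosite_to_regex(MOTIF))  # [ND]W[LY][WF].[LF].N[LI]

# Hypothetical toy sequence containing one match (NWLWALANL).
hit = MOTIF_RE.search("MKKNWLWALANLSS")
print(hit.start() + 1, hit.group())  # 4 NWLWALANL
```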
Arginine vasopressin-induced protein 1, also known as VIP32, may be involved in MAP kinase activation, epithelial sodium channel (ENaC) down-regulation and cell cycling [
].
This repeat domain is designated SSRS51, Streptococcal and Staphylococcal Surface Protein Repeat of size 51. Up to twelve tandem repeats can occur, on some of the longest proteins of their respective species. Nearly all member proteins carry the C-terminal sortase target sequence, LPXTG. The repeat structure and probable surface location suggest a possible adhesion function. A protein with this class of repeats may have other classes as well.
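Because the sortase target sequence sits near the C-terminus, a rough screen for candidate surface-anchored members can look for LPXTG close to the sequence end. A minimal Python sketch; the function name and the 50-residue window are illustrative assumptions, and real sortase substrates are usually also followed by a hydrophobic stretch and a charged tail that this check ignores:

```python
import re

# LPXTG: Leu, Pro, any residue, Thr, Gly.
LPXTG = re.compile(r"LP.TG")


def has_c_terminal_lpxtg(seq: str, window: int = 50) -> bool:
    """True if an LPXTG motif starts within the last `window` residues.

    The default window is an assumed cutoff, not a published rule.
    """
    seq = seq.upper()
    tail_start = max(0, len(seq) - window)
    return any(m.start() >= tail_start for m in LPXTG.finditer(seq))


# Hypothetical toy sequences.
near_c_terminus = "M" + "A" * 100 + "LPETG" + "A" * 20
near_n_terminus = "LPETG" + "A" * 100
print(has_c_terminal_lpxtg(near_c_terminus))  # True
print(has_c_terminal_lpxtg(near_n_terminus))  # False
```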
Developmental pluripotency-associated protein 2 (Dppa2, also known as ECAT15-2) and Dppa4 (also known as ECAT15-1) have a common DNA binding domain known as the SAP motif. Dppa2 has been found to bind to the regulatory region of Nkx2-5 in embryonic stem (ES) cells [
]. Dppa4 has been found to bind both DNA and histone H3; this binding is necessary for the resistance of chromatin structure to MNase and for the proper localization of Dppa4 in ES cell nuclei [].
Growth arrest-specific protein 1 (GAS1) is a glycosylphosphatidylinositol-anchored protein involved in growth suppression [
,
,
]. It inhibits cell proliferation when overexpressed in normal and transformed cell lines, and reduces tumour cell growth []. It also promotes apoptosis [], plays a role in mouse embryonic development [], and suppresses melanoma metastases [].
HMGXB3 (HMG-box containing 3) belongs to the High Mobility Group superfamily, and participates in a range of cellular processes including cell migration and proliferation [
,
].
This entry describes a family of integral membrane proteins. Some members of this family have been proposed to function as a thallium-specific efflux pump [
].
Ty elements are yeast retrotransposons. A 5.7 kb transcript encodes p3, a fusion protein of TYA and TYB. The TYA protein is analogous to the gag protein of retroviruses. TYA is cleaved to form a 46 kDa protein, which can form mature virion-like particles [
]. This entry corresponds to the capsid protein from Ty1 and Ty2 transposons. Yeast retrotransposon Ty1 produces its proteins as precursors that are subsequently cleaved by an aspartic protease encoded by the element. Cleavage of the gag and gag-Pol polyprotein precursors is a critical step in proliferation of retroviruses and retroelements. These cleavage events are essential for transposition as they release the active reverse transcriptase and integrase and they modify the structure of the virus-like particles in a way that is analogous to the morphological changes that occur during retrovirus core maturation [
,
,
].
This entry represents a group of calcium uptake proteins, including MICU1/2/3 from humans. They contain the conserved EF-hand Ca2+-binding domains. MICU1 and MICU2 are the main regulators of the mitochondrial Ca(2+)-uniporter (MCU) [
]. MICU2 forms a heterodimer with MICU1 to modulate MCU channel activity []. MICU3 is a tissue-specific enhancer of mitochondrial calcium uptake [].
This family of proteins found in eukaryotes represents sperm acrosome-associated protein 9 (SPACA9, previously known as C9orf9 or MAST). Sperm acrosome-associated protein 9 has been suggested to form a complex with calcium-binding proteins calreticulin and caldendrin localized to the acrosome. Despite this, no known protein interaction motifs have been identified in MAST [
].
This entry represents a group of ubiquitin domain-containing proteins, including UBTD1 and UBTD2. UBTD1 regulates cellular senescence through a positive feedback loop with TP53 [
]. UBTD1 and UBE2D (E2 ubiquitin conjugating enzyme) have been shown to form a stable, stoichiometric complex []. UBTD2, also known as DC-UbP, interacts with deubiquitinating enzyme USP5 and the Ub-activating enzyme UbE1 [].
Protein B2 binds double-stranded RNA (dsRNA) with high affinity and suppresses the host RNA silencing-based antiviral response. B2 is expressed by Flock House virus (FHV), an insect virus, as a counter-defense mechanism against antiviral RNA silencing during infection. In vitro, B2 binds to dsRNA as a dimer and inhibits its cleavage by Dicer. B2 blocks cleavage of the FHV genome by Dicer and also the incorporation of FHV small interfering RNAs into the RNA-induced silencing complex [
].
This entry represents the C-terminal DNA-binding domain found in centromere-binding protein ParB, which is required for stable segregation. The C-terminal domain has a ribbon-helix-helix (RHH) motif with a C-terminal loop (residues 119-128) following helix alpha-2. The domain forms a dimer with the C-terminal region of the beta chain [
].
This entry includes cullin-associated NEDD8-dissociated proteins 1 (CAND1, also known as TIP120A) and 2 (CAND2); these proteins have a C-terminal TATA-binding protein interacting (TIP20) domain. CAND1 is required for the assembly of the SCF E3 ubiquitin ligase complex. The SCF ubiquitin E3 ligase consists of SKP1, CUL1 and an F-box protein, and it regulates ubiquitin-dependent proteolysis. CAND1 binds to CUL1, preventing it from associating with the other components that form the ligase. Neddylation of CUL1 (or the presence of SKP1 and ATP) dissociates it from CAND1, allowing the ligase complex to form [
,
,
]. CAND1 also interacts with CUL3, a component of the Cul3-dependent E3 ubiquitin ligase complex []. CAND1 has been proposed to be an F-box protein exchange factor: as substrates of the ligase complex are degraded by the proteasome and depleted, the ligase complex enters an intermediate, deneddylated state in which CAND1 can bind, promoting dissociation of the substrate-recognition subunit and recruitment of a new one []. CAND2 is uncharacterised but is assumed to have roles similar to those of CAND1.
Proteins in this family, which includes yeast Shr3, are membrane-localised chaperones that are required for correct plasma membrane localisation of amino acid permeases (AAPs) [
]. Shr3 prevents AAPs from aggregating and assists in their correct folding. In the absence of Shr3, AAPs are retained in the ER.
Proteins in this entry contain the U-box domain, which has been suggested to be a modified RING finger motif where the metal-coordinating cysteines and histidines have been replaced with other amino acids [
]. RING finger protein 37 (RNF37, also known as Ubox5) belongs to the U box protein family, whose members are ubiquitin-protein ligases [].
KXD1 is part of the BORC complex that may regulate lysosome positioning [
]. This entry also includes CG10681 from fruit flies and KxDL motif-containing protein LO9-177 from rice. LO9-177 contributes to the promotion of leaf inclination and grain size by modulating cell elongation.
Nucleolar pre-ribosomal-associated protein 1 (Npa1) is required for ribosome biogenesis and operates in the same functional environment as Rsa3p and Dbp6p during early maturation of 60S ribosomal subunits [
]. The protein partners of Npa1p include eight putative helicases as well as the novel Npa2p factor. Npa1p can also associate with a subset of H/ACA and C/D small nucleolar RNPs (snoRNPs) involved in the chemical modification of residues in the vicinity of the peptidyl transferase centre []. The protein has also been referred to as Urb1.
Quorum-sensing regulator protein G (QseG, also known as YfhG) is involved in the regulation of virulence and metabolism in enterohemorrhagic Escherichia coli (EHEC) [
]. It is required for pedestal formation in host epithelial cells during infection and for translocation of effector molecules into host epithelial cells [
].
The vanadium-binding protein Vanabin2 contains four α-helices connected by nine disulphide bonds. Vanadium accumulates in ascidians, but the biological reason for this remains unclear [
].
This domain can be found in animal CCDC34 and CCDC181 proteins. CCDC34 promotes cell proliferation and invasive properties in human cancer [
]. CCDC181 is a microtubule-binding protein that may play a role in mediating ciliary motility [].
CEP120 is a centrosomal protein required for the recruitment of CEP295 to the proximal end of new-born centrioles at the centriolar microtubule wall during early S phase in a PLK4-dependent manner [
]. It has been shown to interact with SPICE1 and CPAP []. This entry also includes CEP120-like proteins from plants and fungi, which do not have centrosomes. Mutations of the CEP120 gene cause short-rib thoracic dysplasia 13 with or without polydactyly (SRTD13) [
] and Joubert syndrome 31 (JBTS31) [].
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA-based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. L35 is a basic protein of 60 to 70 amino acid residues from the large subunit [
]. Like many basic polypeptides, L35 completely inhibits ornithine decarboxylase when present unbound in the cell, but the inhibitory function is abolished upon its incorporation into ribosomes []. It belongs to a family of ribosomal proteins, including L35 from bacteria, plant chloroplasts, red algal chloroplasts and cyanelles. In plants it is a nuclear-encoded gene product, which suggests a chloroplast-to-nucleus relocation during the evolution of higher plants []. The core structure of L35 has an α-β(3)-α fold arranged in two layers (α/β).
This domain is found in cell division proteins which are required for kinetochore-spindle association [
]. Proteins containing this domain include budding yeast Spc105 and fission yeast Spc7. Spc7 is a component of the NMS (Ndc80-MIND-Spc7) supercomplex, which has a role in kinetochore function during late meiotic prophase and throughout the mitotic cell cycle []. Spc105 and Kre28 form a kinetochore complex, which is required for kinetochore binding by a discrete subset of kMAPs (BIM1, BIK1 and SLK19) and motors (CIN8, KAR3) [].
This entry includes IQ domain-containing proteins F1, F2, F3, F5, and F6 from vertebrates. In mice, IQCF1 is an acrosomal protein involved in sperm capacitation and the acrosome reaction [
].
CCDC78 is a component of the deuterosome, a structure that promotes de novo centriole amplification in multiciliated cells that can generate more than 100 centrioles [
]. CCDC78 does not have a kinesin motor domain. Mutations in the CCDC78 gene cause centronuclear myopathy 4 (CNM4), a congenital muscle disorder characterised by progressive muscular weakness and wasting involving mainly limb girdle, trunk, and neck muscles. It may also affect distal muscles [].
The F-actin capping protein binds in a calcium-independent manner to the fast-growing ends of actin filaments (barbed ends), thereby blocking the exchange of subunits at these ends. Unlike gelsolin and severin, this protein does not sever actin filaments. The F-actin capping protein is a heterodimer composed of two unrelated subunits: alpha and beta. Neither of the subunits shows sequence similarity to other filament-capping proteins []. This entry represents the alpha and beta subunits of the F-actin-capping protein.
The alpha subunit (CAPZA) is a protein of about 268 to 286 amino acid residues and the beta subunit (CAPZB) is about 280 amino acid residues. Their sequences are well conserved in eukaryotic species [
]. In Drosophila, mutations in the alpha and beta subunits cause actin accumulation and subsequent retinal degeneration []. In humans, CAPZA and CAPZB are part of the WASH complex that controls the fission of endosomes [].
Elongator complex protein 2 (Elp2) is a component of the RNA polymerase II elongator complex, which is a major histone acetyltransferase component of the RNA polymerase II (RNAPII) holoenzyme. The eukaryotic elongator complex has been associated with many cellular activities, including transcriptional elongation [
,
], but its main function is tRNA modification [,
]. It is required for the formation of 5-methoxycarbonylmethyl (mcm5) and 5-carbamoylmethyl (ncm5) groups on uridine nucleosides present at the wobble position of many tRNAs [,
].
NST1 is a family of proteins that appear to be involved, directly or indirectly, in the salt sensitivity of some cellular functions in yeast, without affecting sodium accumulation. NST1 negatively affects salt tolerance through an interaction with the splicing factor Msl1p, which highlights the importance of efficient RNA processing under salt stress conditions [
].
The damage avoidance-tolerance pathway(s) requires functional recA, recF, recO, and recR genes, suggesting the mechanism to be daughter strand gap repair. The ruvABC genes or the recG gene is also required. The RecG pathway appears to be more active than the RuvABC pathway [
]. RecO may contain a mononucleotide-binding fold [].
The cobalt transport protein CbiN is part of the active cobalt transport system involved in the uptake of cobalt into the cell for cobalamin (vitamin B12) biosynthesis. It has been suggested that CbiN may function as the periplasmic binding protein component of the active cobalt transport system [].
The function of macrophage-expressed gene 1 protein (MPEG1) is not known. MPEG1 is a single-pass type I membrane protein; it is expressed in macrophages and peripheral blood monocytes [
].
This entry represents a group of bacterial proteins, including MT1774 and Rv1733c from Mycobacterium tuberculosis. Their function is not known. Homologues are known only from Actinobacteria.
Protein phosphorylation, which plays a key role in most cellular activities, is a reversible process mediated by protein kinases and phosphoprotein phosphatases. Protein kinases catalyse the transfer of the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. Phosphoprotein phosphatases catalyse the reverse process.
Protein kinases fall into three broad classes, characterised with respect to substrate specificity []: serine/threonine-protein kinases, tyrosine-protein kinases, and dual specificity protein kinases (e.g. MEK, which phosphorylates both Thr and Tyr on target proteins). Protein kinase function is evolutionarily conserved from Escherichia coli to human [
]. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation []. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins. The catalytic subunits of protein kinases are highly conserved, and several structures have been solved [], leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases []. The protein kinase D family of enzymes consists of three isoforms: PKD1 (PKCmu), PKD2, and PKD3 (PKCnu). They all share a similar architecture with regulatory sub-domains that play specific roles in the activation, translocation and function of the enzymes. The PKD enzymes have recently been implicated in very diverse cellular functions, including Golgi organisation and plasma membrane-directed transport, metastasis, immune responses, apoptosis and cell proliferation []. Each isoform is differentially regulated through phosphorylation [].
This group of proteins consists of TMP21 (also known as transmembrane emp24 domain-containing protein 10) and related proteins, which are members of the p24 family. The p24 family is a widely conserved family of transmembrane proteins that plays a functional role in the initiation of assembly of COPI (Coat protein I) coated vesicles. COPI coated vesicles are involved in protein transport within the early secretory pathway [
,
]. p24 proteins are major membrane components of COPI- and COPII-coated vesicles and are implicated in cargo selectivity of ER-to-Golgi transport [
,
]. Multiple members of the p24 family are found in all eukaryotes, from yeast to mammals. Members of the p24 family are type I membrane proteins with a signal peptide at the amino terminus, a lumenal coiled-coil (extracytosolic) domain, a single transmembrane domain with conserved amino acids, and a short cytoplasmic tail. They may be grouped into at least three subfamilies based on primary sequence []. One subfamily comprises yeast Emp24p and mammalian p24A. Another subfamily comprises yeast Erv25p and mammalian Tmp21, and the third subfamily comprises mammalian gp25L proteins.
This entry represents a protein family specific to the genus Plasmodium. The merozoite surface antigen 2 (MSA-2) may play a role in merozoite attachment to the erythrocyte. This protein has been proposed as a candidate for a protective vaccine against malaria [
,
].
A number of uncharacterised integral membrane proteins from yeast contain an internal duplication due to duplicated genes. Duplicated copies of genes may be classified in two types of cluster organisation. The first type includes genes sharing a significant level of identity in the amino acid sequences of their predicted protein product. They are recovered on two different chromosomes, transcribed in the same orientation and the distance between them is conserved. The second type of cluster is based on one gene unit tandemly repeated. This duplication is itself repeated elsewhere in the genome. The basic gene unit is recovered many times in the genome and is a component of a multigene family of unknown function. These organisations in clusters of genes suggest a 'Lego organisation' of the yeast chromosomes []. The proteins belonging to this family are of unknown function.
The N-end rule pathway is a ubiquitin (Ub)-dependent proteolytic system that mediates and regulates the degradation of intracellular proteins through the recognition of their N-terminal residues. NTAQ1 (also known as Nta1 in fungi) is an N-terminal amidohydrolase, which converts N-terminal Asn and Gln to Asp and Glu, respectively, the first step of the hierarchically organized N-end rule pathway []. The structure of Nta1 has been determined [].
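The Nta1 reaction described above amounts to a rewrite of the first residue only. A minimal Python sketch of that rule on one-letter sequences; the function name is hypothetical, it models only this deamidation step (not the downstream steps of the pathway), and it assumes the input already begins at the mature N-terminus:

```python
# Deamidation performed by Nta1 as described above: N-terminal Asn -> Asp, Gln -> Glu
# (one-letter codes N -> D, Q -> E). Other N-terminal residues are not substrates.
NTA1_CONVERSIONS = {"N": "D", "Q": "E"}


def nta1_deamidate(seq: str) -> str:
    """Return the sequence after Nta1-style deamidation of its first residue.

    Assumes `seq` already starts at the mature N-terminus (e.g. after any
    initiator Met removal); internal Asn/Gln residues are left unchanged.
    """
    if not seq:
        return seq
    first = seq[0]
    return NTA1_CONVERSIONS.get(first, first) + seq[1:]


print(nta1_deamidate("NQAK"))  # DQAK (only the N-terminal Asn changes)
print(nta1_deamidate("QNAK"))  # ENAK
print(nta1_deamidate("MNAK"))  # MNAK (Met is not a substrate)
```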
The Golgi apparatus protein 1 (GLG1), which is located in Golgi cisterns of various cell types, can bind fibroblast growth factor and E-selectin. Sixteen cysteine-rich GLG1 repeats form the core of the protein and are located in the lumen. The C-terminal part of GLG1 is composed of a transmembrane region and a short cytoplasmic tail. The Cys-rich GLG1 repeat is a ~60 amino acid module that contains 4 Cys residues, which can form intrachain disulphide bridges [
].
This entry represents WDR91 and its homologues. WDR91 is part of the WDR81-WDR91 complex that functions as a negative regulator of the phosphatidylinositol 3-kinase (PI3K) activity associated with endosomal membranes via BECN1, a core subunit of the PI3K complex. WDR91 has been shown to be recruited to endosomes by interacting with active, GTP-bound Rab7 and to inhibit Rab7-associated phosphatidylinositol 3-kinase activity [
].This entry also includes Sorf-1, a WDR91 homologue from C. elegans. Sorf-1 and Sorf-2 form a complex with Beclin1 and inhibit the activity of the PI3K complex [
].
F-box only protein 28 (FBXO28) is required for proper mitotic progression. It may regulate topoisomerase IIalpha decatenation activity and plays an important role in maintaining genomic stability [
].
FdhD is a protein essential for the activity of formate dehydrogenases (FDHs) [
], but it is not a component of the membrane-bound formate dehydrogenase []. In Escherichia coli, it has been shown to act as a sulfurtransferase, transferring sulfur from IscS to the molybdenum cofactor before the cofactor is inserted into formate dehydrogenase []. This sulfur transfer between IscS and FDH is an indispensable step for FDH activity [].
Flaviviruses are small, enveloped RNA viruses that use arthropods such as mosquitoes for transmission to their vertebrate hosts; they include Yellow fever virus, West Nile virus, Tick-borne encephalitis virus, Japanese encephalitis virus and Dengue virus 2 [
]. Flavivirus virions contain three structural proteins: the core nucleocapsid protein C and the envelope glycoproteins M () and E. The virion consists of a nucleocapsid covered by a lipoprotein envelope, where the nucleocapsid is a complex of capsid protein C and the viral RNA. Capsid protein C is a dimeric α-helical protein, and its interaction with RNA is critical for the production of viable virus particles [
].
The bacterial ribosomal protein L25 is bound to 5S rRNA along with L5 and L18, forming a separate domain of the ribosome [
]. The solution structure of protein L25 uncomplexed with RNA shows two significantly disordered loops and a closed β-barrel domain with a complex topology that has significant structural similarities to the N-terminal domain of the Thermus thermophilus ribosomal protein TL5, to the general stress protein CTC, and to the C-terminal anticodon-binding domain of Escherichia coli glutaminyl-tRNA synthetase (GlnRS) [,
]. GlnRS contains a duplication consisting of two L25-like β-barrel domains with swapped N-terminal strands. This superfamily represents the C-terminal domain, which has a mainly β-strand structure, found in ribosomal L25-like proteins.
This family consists of nucleolar protein 4 (NOL4) [
,
] and nucleolar protein 4-like (NOL4L). NOL4 has been identified as a methylation target in cervical [] and head and neck [] cancers.
This family includes testis-specific expressed protein 55 (also known as TSCPA in human) from animals, which is involved in normal spermatogenesis [
,
].
Rpf2 is a conserved protein essential for the maturation of 25S rRNA and for assembly of the 60S ribosomal subunit [
]. It is part of a complex (Rpf2/Rrs1/rpL5/rpL11) that functions in intermediate stages of 66S pre-ribosome maturation and assembles into 90S pre-ribosomes containing 35S pre-rRNA [].
This entry represents a group of homeodomain-containing proteins from animals, including PHTF1 and PHTF2 from mammals. In rat, PHTF1 is an integral membrane protein abundantly expressed in the testis [
]. It is localised in the endoplasmic reticulum saccules apposed to the trans face of the Golgi apparatus [].
Proteins in this entry contain N-terminal PAH (paired amphipathic helix) repeats, a histone deacetylase interacting domain, and a Sin3, C-terminal domain. Sin3 proteins have at least three PAH domains (PAH1, PAH2, and PAH3). They are components of a co-repressor complex that silences transcription, playing important roles in the transition between proliferation and differentiation. Sin3 proteins are recruited to the DNA by various DNA-binding transcription factors such as the Mad family of repressors, Mnt/Rox, PLZF, MeCP2, p53, REST/NRSF, MNFbeta, Sp1, TGIF and Ume6 [
]. Sin3 acts as a scaffold protein that in turn recruits histone-binding proteins RbAp46/RbAp48 and histone deacetylases HDAC1/HDAC2, which deacetylate the core histones resulting in a repressed state of the chromatin []. The PAH domains are protein-protein interaction domains through which Sin3 fulfils its role as a scaffold. The PAH2 domain of Sin3 can interact with a wide range of unrelated and structurally diverse transcription factors that bind using different interaction motifs. For example, the Sin3 PAH2 domain can interact with the unrelated Mad and HBP1 factors using alternative interaction motifs that involve binding in opposite helical orientations []. The Sin3, C-terminal domain forms interactions with histone deacetylases [].
The polyprotein encoded by the Flavivirus genome contains the capsid protein C (core protein), the matrix protein (envelope protein M), the major envelope protein E, a number of small non-structural proteins (NS1, NS2A, NS2B, NS4A and NS4B), the helicase (NS3) and the RNA-directed RNA polymerase (NS5) [].
Bms1p and Tsr1p represent a new family of factors required for ribosome biogenesis. Each is independently required for 40S ribosomal subunit biogenesis. Bms1p, a protein required for pre-rRNA processing, contains an evolutionarily conserved guanine nucleotide-binding (G) domain with five conserved polypeptide loops, designated G1 to G5, which form contact sites with the guanine nucleotide or coordinate the Mg(2+) ion. Sequences resembling G1 (consensus [GA]-x(4)-G-K-[ST], also known as the P-loop), G4 (consensus [NT]-K-x-D) and G5 (consensus S-[AG]) are present in all Bms1 proteins and either fully conform to the consensus or contain at most single conservative substitutions. The G2 motif (consensus G-P-[IV]-T) contains a T residue involved in coordinating the Mg(2+) required for GTP hydrolysis. The G3 motif diverges from the consensus found in G proteins, D-x(2)-G: the D residue is replaced by a conserved E residue. In contrast, Tsr1p lacks a P-loop and is not predicted to bind GTP. It functions at a later step of 40S ribosome production, possibly in the assembly and/or export of 43S pre-ribosomal subunits to the cytosol [
,
,
].
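The consensus patterns above use PROSITE-style syntax ("x" for any residue, "x(n)" for n residues, square brackets for alternatives). As a minimal sketch, assuming only that simple subset of the syntax plus "{...}" exclusions, the patterns can be converted to Python regular expressions and scanned against a sequence. The helper names and the toy sequence are hypothetical; the toy sequence was constructed to contain each motif once and is not a real Bms1p.

```python
import re

# G-domain consensus patterns as quoted in the text (PROSITE-style syntax).
G_MOTIFS = {
    "G1": "[GA]-x(4)-G-K-[ST]",
    "G2": "G-P-[IV]-T",
    "G3": "D-x(2)-G",
    "G4": "[NT]-K-x-D",
    "G5": "S-[AG]",
}

# One pattern element: a residue, x, [..] or {..}, optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        residue, low, high = _ELEMENT.fullmatch(element).groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):  # {..} means "any residue except these"
            residue = "[^" + residue[1:-1] + "]"
        if high is not None:
            residue += "{%s,%s}" % (low, high)
        elif low is not None:
            residue += "{%s}" % low
        parts.append(residue)
    return "".join(parts)


def find_motif(sequence: str, pattern: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=" + prosite_to_regex(pattern) + ")")
    return [match.start() for match in regex.finditer(sequence)]


TOY = "MAGAAAAGKSLLGPVTDAAGNKADSA"  # made-up sequence, not a real Bms1p
for name, pattern in G_MOTIFS.items():
    print(name, prosite_to_regex(pattern), find_motif(TOY, pattern))
```

Because the text notes that Bms1 proteins carry E instead of D in G3, a scan of a real Bms1 sequence would need a variant such as "E-x(2)-G"; the canonical G3 pattern is kept here only because it is the consensus quoted above.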
This entry includes proteins with a hyaluronan/mRNA-binding protein domain, found in the HABP4 family of hyaluronan-binding proteins and in the PAI-1 mRNA-binding protein PAI-RBP1 (also known as SERBP1). HABP4 has been observed to bind hyaluronan (a glycosaminoglycan), but it is not known whether this is its primary role in vivo. It has also been observed to bind RNA, but with a lower affinity than for hyaluronan [
]. PAI-1 mRNA-binding protein specifically binds the mRNA of plasminogen activator inhibitor type 1 (PAI-1) and is thought to be involved in regulating mRNA stability []. However, in both cases the sequence motifs predicted to be important for ligand binding are not conserved throughout the family, so it is not known whether members of this family share a common function. Hyaluronan/mRNA-binding proteins may be involved in nuclear functions such as chromatin remodelling and the regulation of transcription [
,
].
Nmd3 acts as an adapter for the XPO1/CRM1-mediated export of the 60S ribosomal subunit [
,
]. It may activate the circularly permuted GTPase Lsg1 during 60S ribosome biogenesis [].
DRC1 is a key component of the nexin-dynein regulatory complex (N-DRC), essential for N-DRC integrity. It is required for the assembly and regulation of specific classes of inner dynein arm motors. It may also function to restrict dynein-driven microtubule sliding, thus aiding in the generation of ciliary bending [
]. Mutations in the DRC1 gene cause primary ciliary dyskinesia 21 (CILD21), a disorder characterised by abnormalities of motile cilia []. DRC2, also known as CCDC65, is an essential component of the nexin-dynein regulatory complex (N-DRC) [
]. DRC2 is required for its co-assembly with DRC1 to form the base plate of the N-DRC.
RbpA binds to RNA polymerase (RNAP) and stimulates transcription from promoters recognised by the principal sigma factor, but not from those recognised by alternative sigma factors [
]. RbpA stimulates transcription from several promoters dependent on the principal sigma factor HrdB (SigA), but not from a SigR-dependent promoter. Stimulation occurs in the presence of the transcription initiation inhibitor rifampicin [].
This entry represents a domain superfamily found in mitochondrial 39S ribosomal protein L28, mitochondrial 54S ribosomal protein L24 and 50S ribosomal protein L28. These proteins belong to the ribosomal protein L28 family and are components of the mitochondrial or non-mitochondrial large ribosomal subunits. Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to the aminoacyl-tRNA, and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About two-thirds of the mass of the ribosome consists of RNA and one-third of protein. The proteins are named according to the ribosomal subunit to which they belong: the small (S1 to S31) and the large (L1 to L44). They usually decorate the rRNA cores of the subunits.
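The elongation cycle described above (aminoacyl-tRNA delivery to the A site, peptide transfer from the P-site tRNA, translocation and release of the deacylated tRNA) can be illustrated as a small state machine. This is a toy sketch: the class and method names are hypothetical, it assumes the chain starts with an initiator Met already in the P site, and it ignores decoding, GTP hydrolysis and the details of the exit site(s).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyRibosome:
    """Hypothetical toy model of the A and P sites during translation elongation."""

    p_site: List[str] = field(default_factory=lambda: ["M"])  # chain on the P-site tRNA
    a_site: Optional[List[str]] = None  # chain on the A-site tRNA, None if the site is empty
    released_trnas: int = 0  # deacylated tRNAs that have left the ribosome

    def deliver(self, amino_acid: str) -> None:
        """An aminoacyl-tRNA (brought in complex with EF-Tu and GTP) enters the empty A site."""
        if self.a_site is not None:
            raise RuntimeError("A site already occupied")
        self.a_site = [amino_acid]

    def transfer_peptide(self) -> None:
        """The chain on the P-site tRNA is transferred onto the A-site aminoacyl-tRNA."""
        if self.a_site is None:
            raise RuntimeError("no aminoacyl-tRNA in the A site")
        self.a_site = self.p_site + self.a_site
        self.p_site = []  # the P-site tRNA is now deacylated

    def translocate(self) -> None:
        """EF-G and GTP move the new peptidyl-tRNA to the P site; the deacylated tRNA leaves."""
        if self.a_site is None:
            raise RuntimeError("nothing to translocate")
        self.p_site = self.a_site
        self.a_site = None
        self.released_trnas += 1

    def elongate(self, amino_acid: str) -> None:
        """One full cycle: delivery, peptide transfer and translocation."""
        self.deliver(amino_acid)
        self.transfer_peptide()
        self.translocate()


ribosome = ToyRibosome()
for residue in "AKG":
    ribosome.elongate(residue)
print("".join(ribosome.p_site), ribosome.released_trnas)  # MAKG 3
```

Each call to `elongate` extends the chain by exactly one residue and releases one deacylated tRNA, mirroring the "extended by one residue" step in the text.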
Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
].
This entry has so far been found only in Actinobacteria, including at least five species of Mycobacterium, three of Corynebacterium, and Nocardia farcinica, always as a single copy per genome. The function is unknown.
Parvoviruses are among the smallest viruses, with linear, non-segmented, single-stranded DNA genomes averaging about 5000 nucleotides. Parvoviruses infecting a wide range of invertebrates and vertebrates have been described, and they are well known for causing enteric disease in mammals. The genome contains two large ORFs, NS1 and VP1; other ORFs are found in some subtypes, and different gene products can arise from splice variants and the use of different start codons [
,
]. The Parvovirus coat protein VP1, together with VP2, forms a capsomer. Both proteins are produced from the same transcript using alternative translation start codons; as a result, VP1 and VP2 differ only in their N-terminal region. VP2 is involved in packaging the viral DNA [
]. The mature virion contains three capsid proteins (VP1, VP2 and VP3) and a non-capsid protein, NS1. VP3 may arise from a third start codon with a favourable translation initiation context, which is present at position 3067 in the ChPV genome and has been described in goose and Muscovy duck parvoviruses [
].
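Why two in-frame start codons on one transcript yield proteins that differ only at the N terminus can be shown with a minimal sketch that translates a DNA-alphabet transcript with the standard genetic code from two start positions. The transcript is made up (it is not a parvovirus sequence), the function name is hypothetical, and the sketch assumes both starts are in the same reading frame, as the text implies; non-AUG starts, splicing and the ChPV position 3067 are not modelled.

```python
# Standard genetic code (NCBI translation table 1), codon bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from(transcript: str, start: int) -> str:
    """Translate `transcript` from 0-based `start` until the first stop codon."""
    protein = []
    for pos in range(start, len(transcript) - 2, 3):
        amino_acid = CODON_TABLE[transcript[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up transcript with two in-frame ATG start codons, at positions 0 and 9.
TRANSCRIPT = "ATGGCTAAAATGGAACCGTAA"
vp1_like = translate_from(TRANSCRIPT, 0)  # upstream start: longer protein
vp2_like = translate_from(TRANSCRIPT, 9)  # downstream start: N-terminally shorter
print(vp1_like, vp2_like)  # MAKMEP MEP
```

The shorter product is exactly the C-terminal part of the longer one, which is the relationship the text describes between VP2 and VP1.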
Proteins in this entry are usually encoded within mobilisation-related contexts. These include a CRISPR-associated gene region in Geobacter sulfurreducens PCA and plasmids in Agrobacterium tumefaciens and Coxiella burnetii. They are found together with mobile mystery protein B, a member of the Fic protein family (
). Mobile mystery protein A is encoded by the upstream member of the gene pair and contains a helix-turn-helix domain.
The function of this protein is unknown. It is often found as part of a two-gene operon with
, a protein that appears to span the membrane seven times. It has so far been found in the bacteria Anabaena sp. (strain PCC 7120), Agrobacterium tumefaciens, Rhizobium meliloti, and Gloeobacter violaceus.
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected for integration as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the CRISPR locus is transcribed into a precursor CRISPR RNA (pre-crRNA), which is subsequently processed into mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems Cas6 acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). Cas6 is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. This entry represents a family of Cas5 proteins that includes DevS from Myxococcus xanthus, as well as related proteins from Leptospira interrogans and Gemmata obscuriglobus. These proteins are encoded in clusters of CRISPR-associated (cas) genes; in the special case of M. xanthus, DevS has taken on a role as a key regulator of fruiting body development. This entry is related to
,
, and
.
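The repeat/spacer layout of a CRISPR locus described above can be made concrete with a minimal sketch that splits an array on a known repeat and reports the spacers between consecutive repeat copies. The function name, repeat and array are made up for illustration, and the sketch assumes every repeat copy matches exactly; real arrays often contain degenerate repeats and need approximate matching.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the spacers that lie between consecutive copies of `repeat`.

    Hypothetical helper: assumes the repeat sequence is known and that every
    copy in the array matches it exactly.
    """
    pieces = array.split(repeat)
    # pieces[0] is the flank before the first repeat and pieces[-1] the flank
    # after the last one; only the pieces in between are spacers.
    return [piece for piece in pieces[1:-1] if piece]


REPEAT = "GTTTTAGAGC"  # made-up repeat unit
ARRAY = "AAA" + REPEAT + "ACGTTGCAAT" + REPEAT + "TTGGCCAAGT" + REPEAT + "CCC"
spacers = extract_spacers(ARRAY, REPEAT)
print(spacers)                    # ['ACGTTGCAAT', 'TTGGCCAAGT']
print({len(s) for s in spacers})  # {10}: regularly sized spacers
```

Dropping the first and last pieces reflects that sequence outside the outermost repeats is flanking DNA rather than a spacer.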
HSPB8, also known as HSP22, is a small heat shock protein that interacts with itself and with cvHSP (HSPB7), MKBP (HSPB2), HSP27, αB-crystallin and HSP20 [
,
].
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected for integration as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the CRISPR locus is transcribed into a precursor CRISPR RNA (pre-crRNA), which is subsequently processed into mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems Cas6 acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). Cas6 is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. This entry is found in the C-terminal region of a family of CRISPR-associated proteins of the Hmari subtype [
]. Except for some sequences from halophilic archaea, this domain contains a pair of CXXC motifs.
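Checking a sequence for the CXXC motifs mentioned above is a simple pattern search. The sketch below reports every C-x-x-C occurrence and tests for a pair; the function names and example sequence are hypothetical, and treating a "pair" as two motifs that share no residues is an assumption, since overlapping motifs such as C-x-x-C-x-x-C share a cysteine.

```python
import re

# A zero-width lookahead lets finditer report overlapping motifs (e.g. in C-x-x-C-x-x-C).
CXXC = re.compile(r"(?=C..C)")


def cxxc_positions(sequence: str) -> list[int]:
    """Return 0-based start positions of every C-x-x-C motif in `sequence`."""
    return [match.start() for match in CXXC.finditer(sequence)]


def has_cxxc_pair(sequence: str) -> bool:
    """Return True if at least two CXXC motifs that share no residues are present."""
    count, next_free = 0, 0
    for start in cxxc_positions(sequence):
        if start >= next_free:  # motif does not overlap the previously counted one
            count += 1
            next_free = start + 4
    return count >= 2


example = "MKCPACLLRSCAGCE"  # made-up sequence, not a real Cas protein
print(cxxc_positions(example), has_cxxc_pair(example))  # [2, 10] True
```

Under this assumption, "CAACAAC" contains two overlapping motifs (at positions 0 and 3) but does not count as a pair.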