This family includes variable charge X-linked proteins (VCX1, VCX2 and VCX3) and basic protein Y1, also known as variable charge Y1 (VCY1). Their function is unknown, but they may mediate a process in spermatogenesis or play a role in sex ratio distortion. The VCX/Y family has members on both the X and Y chromosomes, but all appear to be expressed exclusively in male germ cells.
The hok/gef family of Gram-negative bacterial proteins is toxic to cells when overexpressed, killing the cells from within by interfering with a vital function in the cell membrane. Some family members (flm) increase the stability of unstable RNA, some (pnd) induce the degradation of stable RNA at higher than optimum growth temperatures, while others affect the release of cellular magnesium by membrane alterations. The proteins are short (50-70 residues), consisting of an N-terminal hydrophobic (possibly membrane-spanning) domain and a C-terminal periplasmic region, which contains the toxic domain. The C-terminal region contains a conserved cysteine residue that mediates homodimerisation in the gef protein, although dimerisation is not necessary for the toxic effect.
Ribosomal protein L24e/L24 is a ribosomal protein found in eukaryotes (L24) and in archaea (L24e, distinct from archaeal L24). L24e/L24 is located on the surface of the large subunit, adjacent to proteins L14 and L3, and near the translation factor binding site. L24e/L24 appears to play a role in the kinetics of peptide synthesis, and may be involved in interactions between the large and small subunits, either directly or through other factors. In mouse, a deletion mutation in L24 has been identified as the cause of the belly spot and tail (Bst) mutation, which results in disrupted pigmentation, somitogenesis and retinal cell fate determination. L24 may be an important protein in eukaryotic reproduction: in shrimp, L24 expression is elevated in the ovary, suggesting a role in oogenesis, and in Arabidopsis, L24 has been proposed to have a specific function in gynoecium development. The crystal structure of the L24e protein from Halobacterium marismortui (Haloarcula marismortui) has been determined. The protein is composed of a single structural domain which forms an alpha/beta zinc-binding fold.
This family consists of various capsid proteins from members of the Herpesviridae. The capsid protein 2 (formerly known as VP23) in Human herpesvirus 1 (HHV-1) (Human herpes simplex virus 1) forms a triplex together with VP19C; these triplexes fit between and link together adjacent capsomers formed by VP5 and VP26. VP3, along with the scaffolding proteins, helps to form normal capsids by defining the curvature of the shell and the size of the particle.
Nuclear receptor-interacting protein 1 (also known as nuclear factor RIP140) modulates transcriptional activation by steroid receptors such as NR3C1, NR3C2 and ESR1. It also modulates transcriptional repression by nuclear hormone receptors.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. This entry represents protein L18e from the 50S ribosomal subunit of archaea.
This superfamily consists of several invasion associated locus B (IalB) proteins and related sequences. IalB is known to be a major virulence factor in Bartonella bacilliformis, where it was shown to have a direct role in human erythrocyte parasitism. IalB is up-regulated in response to environmental cues signalling vector-to-host transmission. Such environmental cues include, but are not limited to, temperature, pH, oxidative stress, and haemin limitation. It is also thought that IalB aids B. bacilliformis survival under stress-inducing environmental conditions.
The bacterial protein RecR is an important regulator in the RecFOR homologous recombination pathway during DNA repair. It acts with RecF and RecO, forming a complex that facilitates the loading of RecA onto ssDNA. RecR is a zinc metalloprotein consisting of an N-terminal helix-hairpin-helix (HhH) motif, a middle region containing a zinc finger motif and a Toprim domain, and a C-terminal domain comprising a divergent Walker B motif and a C-terminal helix.
This family includes a group of LytTR domain-containing proteins, such as the transcriptional regulatory proteins LytR and BtsR, the sensory transduction protein LytT and the accessory gene regulator protein A. The LytTR domain is a DNA-binding, potential winged helix-turn-helix (wHTH) domain, named after the Bacillus subtilis LytT and Staphylococcus aureus LytR response regulators, which are involved in the regulation of cell autolysis. Members of this entry are bacterial cytoplasmic proteins that regulate the production of important virulence factors, such as extracellular polysaccharides, toxins and bacteriocins. These response regulators of microbial two-component signal transduction systems contain N-terminal CheY-like domains; the LytTR domain in the C-terminal part is expected to bind to specific DNA sequences in the upstream regions of target genes.
Members of this family belong to a large group that also contains thiamine monophosphate kinase, hydrogenase maturation factor HypE, AIR synthase, FGAM synthase, selenophosphate synthase, and other groups. In AIR synthase, the N-terminal domain forms the dimer interface of the protein and, upon dimerisation, forms the putative ATP binding domain, while the cleft formed between the N- and C-terminal domains is postulated to be a sulphate binding site. A similar structure-function relationship may exist in members of this family; however, there are no experimental data demonstrating their biochemical activity.
This family includes small CPxCG-related zinc finger archaeal proteins, which are thought to bind DNA and to be potential transcriptional regulators. These zinc fingers are characterised by the specific CPxCG pattern (variants CPxCx and CxxCG) and a second, potentially more general Cys/His pattern, with the two patterns lying 7-40 residues apart. This family includes Nif-regulating protein A (Q8PW88), which forms its zinc finger motif from the patterns CxxCG and HxxxH (the latter is unique to this protein and located 24 residues away from the CxxCG motif). It is a DNA-binding protein involved in nitrogen regulation, as it enhances the transcription of the nitrogen fixation (nif) operon under nitrogen-limited conditions.
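The CPxCG pattern and its two variants can be illustrated with a short scan over a protein sequence. The following is only a minimal sketch: it assumes that "x" stands for any single residue, and the function names and the toy sequence are hypothetical, not taken from any real family member or annotation tool.

```python
import re

# Assumption: "x" in CPxCG / CPxCx / CxxCG means "any one residue".
# The lookahead keeps overlapping windows, which a plain search would skip.
CORE = re.compile(r"(?=(C..C.))")


def classify(window):
    """Label a 5-residue C..C. window, or return None if it is neither variant."""
    has_p = window[1] == "P"
    has_g = window[4] == "G"
    if has_p and has_g:
        return "CPxCG"
    if has_p:
        return "CPxCx"
    if has_g:
        return "CxxCG"
    return None


def find_cpxcg_like(seq):
    """Return (0-based start, label, matched residues) for each hit in seq."""
    hits = []
    for match in CORE.finditer(seq):
        label = classify(match.group(1))
        if label is not None:
            hits.append((match.start(), label, match.group(1)))
    return hits


# Made-up toy sequence for illustration only.
toy = "MKCPECGAACAACGAACPACA"
for hit in find_cpxcg_like(toy):
    print(hit)
```

A fuller check would also look for the second Cys/His pattern 7-40 residues away; that spacing test is left out here to keep the sketch short.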
This entry represents the ribosomal protein L35A, which is required for the proliferation and viability of hematopoietic cells and plays a role in 60S ribosomal subunit formation. Its structure adopts a six-stranded anti-parallel β-barrel analogous to the "tRNA binding motif" fold.
This entry represents a group of uncharacterised hypothetical proteins from archaea and bacteria, including the 8.4 kDa protein MTH865 from Methanobacterium thermoautotrophicum. The NMR structure of MTH865 reveals an EF-hand-like fold consisting of four helices in two hairpins.
MukB is a 170 kDa protein that is involved in ATP-dependent chromosome partitioning during cell division in Escherichia coli. Its domain structure is reminiscent of the eukaryotic motor proteins kinesin and myosin. MukB, like the SMC (structural maintenance of chromosomes) proteins with which it shares function, has a dimeric structure and similar domain architecture. It is a filamentous protein with globular domains at the ends, and it has DNA-binding and nucleotide-binding abilities. It forms a homodimer with a rod-and-hinge structure, with a pair of large, C-terminal globular domains at one end and a pair of small, N-terminal globular domains at the other. The N-terminal domain carries a putative Walker A nucleotide-binding region and the C-terminal domain has been shown to bind DNA.
SRP-independent targeting protein 3 (SND3, previously known as PHO88) is localized to the endoplasmic reticulum (ER). SND3 works together with SND1 and SND2; these proteins function in parallel with the SRP and GET pathways to target a broad range of substrates to the ER. The SND proteins constitute an alternative targeting route to the ER. SND3/PHO88 was first identified as a Saccharomyces cerevisiae protein involved in inorganic phosphate transport.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. Ribosomal protein L29 is one of the proteins from the large ribosomal subunit. L29 belongs to a family of ribosomal proteins of 63 to 138 amino-acid residues which, on the basis of sequence similarities, groups red algal L29, bacterial L29, mammalian L35, Caenorhabditis elegans L35 (ZK652.4) and yeast L35. L29 is located on the surface of the large ribosomal subunit, where it participates in forming a protein ring that surrounds the polypeptide exit channel, providing structural support for the ribosome. L29 is involved in forming the translocon binding site, along with L19, L22, L23, L24, and L31e. In addition, L29 and L23 form the interaction site for trigger factor (TF) on the ribosomal surface, adjacent to the exit tunnel. L29 forms numerous interactions with L23 and with the 23S rRNA. This family includes eubacterial and archaeal L29 and eukaryotic L35 ribosomal proteins, which constitute the uL29 family.
During development of the Drosophila retina, the bride of sevenless (boss) gene is required in photoreceptor neuron R8 for the development of photoreceptor neuron R7, suggesting that boss encodes or regulates an R7-specific inductive cue. The induction of cells neighbouring the R8 photoreceptor neuron to assume an R7 cell fate is believed to be mediated by a direct interaction of boss with sevenless. An alternative model has been proposed in which the boss protein functions as a receptor, based on a (superficial) similarity to the G-protein-coupled family of membrane receptors. The boss gene encodes a protein of 896 amino acids with a putative N-terminal signal sequence, a large N-terminal extracellular domain, seven transmembrane (TM) segments and a C-terminal cytoplasmic tail. The boss protein from Drosophila virilis (Fruit fly) shares a high level of amino acid identity with the Drosophila melanogaster (Fruit fly) homologue, with the highest levels of similarity in the TM domains.
This entry represents the PHD domain, a DNA-binding domain, of the additional sex combs (ASX) and additional sex combs-like (AsxL) proteins. The Asx protein acts as an enhancer of trithorax and polycomb, displaying bidirectional homoeotic phenotypes in Drosophila, which suggests that it is required for maintenance of both activation and silencing of Hox genes. Asx is required for normal adult haematopoiesis and its function depends on its cellular context.
Proteins in this family contain a microtubule-binding domain that is found in MAP2, MAP4, Tau, and their homologues. All isoforms contain a conserved C-terminal domain containing tubulin-binding repeats, and an N-terminal projection domain of varying size. The projection domain has a net negative charge and exerts a long-range repulsive force. This provides a mechanism that can regulate microtubule spacing, which might facilitate efficient organelle transport. MAP2 may stabilise microtubules against depolymerisation. MAP4 is a non-neuronal microtubule-associated protein that promotes microtubule assembly. Tau promotes microtubule assembly and stability, and might be involved in the establishment and maintenance of neuronal polarity. Hyperphosphorylated forms of Tau have been linked to Alzheimer disease.
Kinase associated protein B (KapB) is one of the major histidine kinases that provide phosphate input in the phosphorelay to produce Spo0A~P, the key transcription factor controlling the initiation of sporulation in Bacillus subtilis. It forms an anti-parallel beta sheet with an extending alpha-helical region.
This entry represents a group of plant proteins, including PDV1 and PDV2. They mediate recruitment of the dynamin-related protein ARC5 to the plastid division site.
Many bacteria are covered in a layer of surface-associated polysaccharide called the capsule. These capsules can be divided into four groups depending upon the organisation of the genes responsible for capsule assembly, the assembly pathway and regulation. This family plays a role in group 1 capsule biosynthesis. It is likely to be involved in the later stages of capsule assembly. Structurally, Wzi consists of an 18-stranded β-barrel with a periplasmic helical bundle that has a role in the recognition of the capsular polysaccharide. It is predicted that the Wzi-polysaccharide interaction is critical in the initialisation step of functional capsule biosynthesis and in the later steps of capsule assembly.
This entry represents envelope proteins from a variety of retroviruses. It includes the GP41 subunit of the envelope protein complex from Human immunodeficiency virus (HIV) and Simian immunodeficiency virus (SIV), which mediates membrane fusion during viral entry. It has a core composed of a six-helix bundle and is folded by its trimeric N- and C-terminal heptad repeats (NHR and CHR). Derivatives of this protein prevent HIV-1 from entering cell lines and primary human CD4+ cells in vitro, making it an attractive subject of gene therapy studies against HIV and related retroviruses. The entry also represents envelope proteins from Bovine immunodeficiency virus, Feline immunodeficiency virus and Equine infectious anemia virus (EIAV), as well as the Gp36 protein from Mouse mammary tumor virus (MMTV) and Human endogenous retrovirus (HERV).
Cep55 is a centrosome protein that plays a role in mitotic exit and cytokinesis. It is a regulator of the PI3K/AKT pathway and has been linked to cancers.
This family represents eukaryotic coiled-coil domain-containing protein 13 (CCDC13). Human CCDC13 is a satellite protein required for ciliogenesis and genome stability.
ARMC2 is required for sperm flagellum axoneme organization and function. It is involved in axonemal central pair complex assembly and/or stability.
This family includes tetratricopeptide repeat protein 36 (TTC36) and homologues. The function of TTC36 is unknown. This entry also includes HmgX from Neosartorya fumigata, which is part of the L-tyrosine degradation gene cluster that mediates the biosynthesis of the brownish pigment pyomelanin as an alternative melanin.
Effectors of transcription (ETs) are plant-specific regulatory proteins characterized by the presence of two to five C-terminal DNA- and Zn-binding repeats, and a highly conserved cysteine pattern. There are three ETs in Arabidopsis thaliana. AtET2 plays a role in the regulation of the plant hormone GA (gibberellin) and the cell-cycle-related protein GASA4. Regulation of the GA response has also been observed for Brassica napus ET, and could be a conserved feature of these proteins.
Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes. They share ~25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains. Anion exchange proteins participate in pH and cell volume regulation. They are glycosylated, plasma-membrane transport proteins that exchange hydrogen carbonate (HCO3-) for chloride (Cl-) in a reversible, electroneutral manner. To date, three anion exchanger isoforms have been identified (AE1-3), AE1 being the previously characterised erythrocyte band 3 protein. They share a predicted topology of 12-14 transmembrane (TM) domains, but have differing distribution patterns and cellular localisation. The best characterised isoform, AE1, is known to be the most abundant membrane protein in mature erythrocytes. It has a molecular mass of ~95 kDa and consists of two major domains. The N-terminal 390 residues form a water-soluble, highly elongated domain that serves as an attachment site for the binding of the membrane skeleton and other cytoplasmic proteins. The remainder of the protein is a 55 kDa hydrophobic domain that is responsible for catalysing anion exchange. The function of the analogous domains of AE2 and AE3 remains to be determined. AE3 is an anion exchanger that is primarily expressed in the brain and heart. Several tissue-specific variants have been identified, which arise due to both alternative promoter and exon usage. Two AE3-encoding cDNAs have been isolated from human heart. These clones share long portions of common sequence but have different 5' ends, therefore encoding distinct N-terminal amino acid sequences. The longer AE3 polypeptide (1232 amino acids) displays ~96% amino acid sequence identity to the rat and mouse AE3 'brain isoforms'. The shorter polypeptide (1034 amino acids) corresponds to the rat AE3 'cardiac isoform'. Studies of Cl- transport suggest that both isoforms are capable of anion exchange. Mutations in AE3 have been linked to epilepsy and to various retinal diseases.
Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes. They share ~25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains. Anion exchange proteins participate in pH and cell volume regulation. They are glycosylated, plasma-membrane transport proteins that exchange hydrogen carbonate (HCO3-) for chloride (Cl-) in a reversible, electroneutral manner. To date, three anion exchanger isoforms have been identified (AE1-3), AE1 being the previously characterised erythrocyte band 3 protein. They share a predicted topology of 12-14 transmembrane (TM) domains, but have differing distribution patterns and cellular localisation. The best characterised isoform, AE1, is known to be the most abundant membrane protein in mature erythrocytes. It has a molecular mass of ~95 kDa and consists of two major domains. The N-terminal 390 residues form a water-soluble, highly elongated domain that serves as an attachment site for the binding of the membrane skeleton and other cytoplasmic proteins. The remainder of the protein is a 55 kDa hydrophobic domain that is responsible for catalysing anion exchange. The function of the analogous domains of AE2 and AE3 remains to be determined. AE2 (~1240 amino acids) is a non-erythroid anion exchanger. It was cloned from choroid plexus but has been detected in many organs including the gastrointestinal tract and kidney. It is expressed in both epithelial and non-epithelial cells, and may be present in the Golgi apparatus in addition to the cell membrane. Three AE2 N-terminal variants have been described, arising due to the presence of alternative promoter sites within the gene. They are referred to as AE2a-c and have differing distribution patterns: AE2a is expressed in all tissues; AE2b exhibits a more restricted distribution, with highest levels in the stomach; and AE2c is expressed only in the stomach.
Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes. They share ~25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains. Anion exchange proteins participate in pH and cell volume regulation. They are glycosylated, plasma-membrane transport proteins that exchange hydrogen carbonate (HCO3-) for chloride (Cl-) in a reversible, electroneutral manner. To date, three anion exchanger isoforms have been identified (AE1-3), AE1 being the previously characterised erythrocyte band 3 protein. They share a predicted topology of 12-14 transmembrane (TM) domains, but have differing distribution patterns and cellular localisation. The best characterised isoform, AE1, is known to be the most abundant membrane protein in mature erythrocytes. It has a molecular mass of ~95 kDa and consists of two major domains. The N-terminal 390 residues form a water-soluble, highly elongated domain that serves as an attachment site for the binding of the membrane skeleton and other cytoplasmic proteins. The remainder of the protein is a 55 kDa hydrophobic domain that is responsible for catalysing anion exchange. The function of the analogous domains of AE2 and AE3 remains to be determined. Naturally occurring mutations have been characterised in the AE1 gene, which give rise to forms of several inherited human diseases. Around 20% of hereditary spherocytosis cases arise from heterozygosity for AE1 mutations, and result in the absence or decrease of the mutant protein in the red cell membrane. Similarly, familial distal renal tubular acidosis, a condition associated with kidney stones, has been shown to be associated with mutations of AE1 of the renal collecting duct alpha-intercalated cell, and it has been postulated that such mutations may affect the targeting of the AE1 protein, which is usually directed to the basolateral membrane of these cells. Some of the proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook Academic Press, London / San Diego, (1997)]. Band 3 anion transport protein (Anion exchange protein 1) belongs to the Diego blood group system and is associated with Di(a/b), Wr(a/b), Wd(a), Rb(a) and WARR antigens.
This domain is predominantly found in various hypothetical archaeal proteins. Its exact function has not yet been defined. It possesses a zinc ribbon fold.
Demands on the proteasome increase during environmental stresses. TMA17 (ADC17) is a stress-responsive gene that functions as a proteasome assembly chaperone. It interacts with the amino terminus of Rpt6 to assist formation of the Rpt6-Rpt3 ATPase pair, an early step in proteasome assembly.
This group of proteins is functionally uncharacterised. It includes YqcI and YcgG from Bacillus subtilis. The alignment contains a conserved FPC motif at the N terminus and a CPF motif at the C terminus.
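As an illustration of the motif arrangement described for this group, the sketch below looks for an FPC motif followed later in the sequence by a CPF motif. It is a hedged example only: the function name, the requirement that the motifs do not overlap, and the toy sequence are assumptions made for illustration, not part of the entry definition.

```python
def terminal_motifs(seq, n_motif="FPC", c_motif="CPF"):
    """Return (first n_motif start, last c_motif start), 0-based, or None.

    None is returned when either motif is missing or when the last c_motif
    does not lie entirely after the first n_motif.
    """
    n_pos = seq.find(n_motif)    # leftmost FPC
    c_pos = seq.rfind(c_motif)   # rightmost CPF
    if n_pos == -1 or c_pos == -1:
        return None
    if c_pos < n_pos + len(n_motif):
        return None
    return n_pos, c_pos


# Made-up toy sequence, not a real YqcI/YcgG protein.
toy = "MAFPCKKKKKKCPFGG"
print(terminal_motifs(toy))
```

How close to each terminus the motifs must lie is not stated in the entry, so the sketch deliberately checks only their order.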
This entry represents the large envelope protein S from hepatitis B virus. The large envelope protein exists in two topological conformations, one termed 'external' or Le-HBsAg and the other 'internal' or Li-HBsAg. In its external conformation the protein attaches the virus to cell receptors and thereby initiates infection. This interaction determines the species specificity and liver tropism. In its internal conformation the protein plays a role in virion morphogenesis and mediates the contact with the nucleocapsid like a matrix protein.
This entry represents a group of animal proteins, including protein POLR1D, isoform 2 from humans. The function of this isoform is not clear. However, mutations of the POLR1D gene have been linked to Treacher Collins syndrome, a mandibulofacial dysostosis caused by mutations in genes involved in ribosome biogenesis and synthesis.
This family contains several Phlebovirus non-structural proteins, which act as a major determinant of virulence by antagonising interferon beta gene expression or by downregulating the host basal transcription factor TFIIH subunit p62.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid the elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surfaced-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waal's contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. L32 is a protein from the large ribosomal subunit that contains a surface-exposed globular domain and a finger-like projection that extends into the RNA core to stabilize the tertiary structure. L32 does not appear to play a role in forming the A (aminoacyl), P (peptidyl) or E (exit) sites of the ribosome, but does interact with 23S rRNA, which has a "kink-turn" secondary structure motif. L32 is overexpressed in human prostate cancer and has been identified as a stably expressed housekeeping gene in macrophages of human chronic obstructive pulmonary disease (COPD) patients. In Schizosaccharomyces pombe, L32 has also been suggested to play a role as a transcriptional regulator in the nucleus. Found in archaea and eukaryotes, this protein is known as L32 in eukaryotes and L32e in archaea [
,
,
,
,
,
,
].
This family is involved in the chronological life-span of S. cerevisiae. Over-expression leads to an extended viability of wild-type strains, indicating a role in regulation [
]. Four conserved cysteine residues in the N-terminal region are involved in zinc binding and have been shown to be essential for protein function. Ecl1-family proteins are essential for sexual development induced by zinc depletion in S. pombe. They are also important for the cellular response to zinc limitation [].
Peroxin-22 is an integral peroxisomal membrane protein. The N terminus of peroxin-22 is located in the matrix, while the C terminus is located in the cytosol. Peroxin-22 interacts with the ubiquitin-conjugating enzyme Pex4p, anchoring it at the peroxisomal membrane [
]. This association is required for ubiquitination of peroxisomal import receptor Pex5p mediated by Pex4p [,
,
].
This entry includes Gp16 and related proteins [
]. Gp16 is a component of the cylindrical core that assembles on the inner surface of the capsid during capsid formation and plays a role in viral DNA ejection into the host cell []. Bacteriophage T7 proteins Gp15 and Gp16 form a spiral ring complex that binds to both the viral DNA and the host inner membrane []. T7 is a member of the Podoviridae, having short, noncontractile tails that are too short to span a bacterial cell envelope. Its internal core, which consists of Gp14/Gp15/Gp16, is essential for both virion morphogenesis and ejection of its genome [
].
This entry includes Gp15 and related proteins. Gp15 is a component of the cylindrical core that assembles on the inner surface of the capsid during capsid formation and plays a role in viral DNA ejection into the host cell [
]. Bacteriophage T7 proteins Gp15 and Gp16 form a spiral ring complex that binds to both the viral DNA and the host inner membrane []. T7 is a member of the Podoviridae, having short, noncontractile tails that are too short to span a bacterial cell envelope. Its internal core, which consists of Gp14/Gp15/Gp16, is essential for both virion morphogenesis and ejection of its genome [
].
This group is distantly related (sharing some sequence motifs) both to the integrated thioesterase domains (TEI) found in type I polyketide synthases (PKSs) and non-ribosomal peptide synthetases (NRPSs), and to the related stand-alone (non-integrated) type II thioesterase (TEII; see
for a full description). Therefore, members of this group are stand-alone proteins related to both TEI and TEII.
Bradyrhizobium japonicum Bll6574 protein (
) is similar to proteins in this group, but has a LysM domain (
) at the C terminus.
This entry includes Gp14 and related proteins. Gp14 is a component of the cylindrical core that assembles on the inner surface of the capsid during capsid formation and plays a role in viral DNA ejection into the host cell [
]. T7 is a member of the Podoviridae, having short, noncontractile tails that are too short to span a bacterial cell envelope. Its internal core, which consists of Gp14/Gp15/Gp16, is essential for both virion morphogenesis and ejection of its genome [
].
Members of this family are integral membrane proteins that belong to the small multidrug resistance (SMR) protein family. They confer resistance to a wide range of toxic compounds by removing them from the cells. The efflux is coupled to an influx of protons. An example is Escherichia coli EmrE (
) which plays a role in pH and osmotic stress response and biofilm formation. It confers resistance to a wide range of toxic compounds, including ethidium, methyl viologen [
,
], benzalkonium, propidium, dequalinium and various aromatic cationic antibiotics, and is also involved in ethidium bromide efflux [,
]. It simultaneously binds and co-transports proton and drug [,
]. Structural studies of EmrE revealed that the transmembrane (TM) helices undergo a complex reorientation to bind and transport diverse substrates. It may function not only as a proton-coupled antiporter but also as a proton-coupled symporter or uncoupled uniporter that potentially confers susceptibility rather than resistance, because the inward proton motive force and negative-inside membrane potential in bacteria would lead to concentrative uptake of toxic cations [].
The enzyme responsible for nitrogen fixation, the nitrogenase, shows a high degree of conservation of structure, function, and amino acid sequence across wide phylogenetic ranges. All known Mo-nitrogenases consist of two components, component I (also called dinitrogenase, or Fe-Mo protein), an alpha2beta2 tetramer encoded by the nifD and nifK genes, and component II (dinitrogenase reductase, or Fe protein), a homodimer encoded by the nifH gene [
,
] which has an Fe4S4 cluster bound between the subunits and two ATP-binding domains. The Fe protein supplies energy by ATP hydrolysis, and transfers electrons from reduced ferredoxin or flavodoxin to component I for the reduction of molecular nitrogen to ammonia [,
]. Nitrogenase contains two unusual rare metal clusters; one of them is the iron molybdenum cofactor (FeMo-co), which is considered to be the site of dinitrogen reduction and whose biosynthesis requires the products of the nifNE operon and of some other nif genes []. It has been proposed that nifNE might serve as a scaffold upon which FeMo-co is built and then inserted into component I []. This entry represents the nitrogenase iron protein (component II), which is encoded by nifH.
This domain represents VP4, a minor capsid protein from the Dicistroviridae which is processed from the capsid polyprotein. The Dicistroviridae are a group of small, RNA-containing viruses that are closely structurally related to the Picornaviridae. VP4 is a short, extended polypeptide chain found within the viral capsid, at the interface between the external protein shell and the packaged RNA genome [
].
C2CD5, also known as CDP138 or KIAA0528, is a C2 domain-containing phosphoprotein. It is a substrate for protein kinase Akt2, and it may be involved in the regulation of GLUT4 vesicle-plasma membrane fusion in response to insulin. The C2 domain of C2CD5 was shown to be capable of binding Ca(2+) and lipid membranes [
]. Other studies indicate that C2CD5 is a CDK5- and FIBP-interacting protein, forming a complex with these proteins that is involved in cell proliferation and migration [].
Polyadenylate-binding protein-interacting protein 5/6
Type:
Family
Description:
This entry represents a group of plant proteins, including CID5 (also known as IPD1) and CID6 from Arabidopsis. They contain a CUE-like domain. CID5 regulates the endocycle leading to hypocotyl elongation and this function is controlled by blue and far-red light [
].
Eukaryotic and prokaryotic molybdoenzymes require a molybdopterin cofactor
(MoCF or Moco) for their activity. The biosynthesis of this cofactor involves a complex multistep enzymatic pathway. In bacteria, the final step in MoCF synthesis is the attachment of mononuclear Mo to MPT, a process that requires MoeA and which is enhanced by MogA in an Mg2+-ATP-dependent manner [
]. In eukaryotes, MogA and MoeA are fused into a single polypeptide chain, such as in Drosophila protein cinnamon [
]. The corresponding mammalian protein gephyrin has also been implicated in the anchoring of glycinergic receptors to the cytoskeleton at inhibitory synapses [,
].
Ribosome biogenesis protein C1orf109 is involved in the cytoplasmic maturation steps of pre-60S ribosomal particles by promoting the release of shuttling protein RSL24D1/RLP24 from the pre-ribosomal particles [
]. It was first identified as the downstream target of the protein kinase casein kinase 2 (CK2), which is upregulated in cancer cells. C1orf109 has been found to be upregulated in cancer cells and has therefore been suggested to be important in the regulation of cancer cell proliferation. The precise mechanism by which CK2 targets C1orf109 is not well characterised. It is localised in the nucleus and cytoplasm [].
Ribosomal protein L22 (L17 in eukaryotes) is a core protein of the large ribosomal subunit. It is the only ribosomal protein that interacts with all six domains of 23S rRNA, and is one of the proteins important for directing the proper folding and stabilizing the conformation of 23S rRNA. L22 is the largest protein contributor to the surface of the polypeptide exit channel, the tunnel through which the polypeptide product passes. L22 is also one of six proteins located at the putative translocon binding site on the exterior surface of the ribosome [
,
].
Ribosomal protein L17 is one of the proteins from the large ribosomal subunit. Bacterial L17 is a protein of 120 to 130 amino-acid residues while yeast YmL8 is
twice as large (238 residues). The N-terminal half of YmL8 is colinear with the sequence of L17 from Escherichia coli.
Nse5 and Nse6 are non-structural nuclear proteins that are critical for chromosome segregation in fission yeast [
]. Nse5 forms a dimer with Nse6 and facilitates DNA repair as part of the Smc5-Smc6 holocomplex. Nse6 is also known as KRE29.
SPG5 (stationary phase protein 5) was identified as a protein required for survival at high temperature during stationary phase in Saccharomyces cerevisiae [
]. SPG5 is a positive regulator of the proteasome that is strongly induced in the stationary phase. It is critical for survival of cells that have ceased to proliferate due to nutrient limitation [].
The tumour suppressor protein adenomatous polyposis coli (APC) has a nuclear export activity as well as many different intracellular functions. The region close to the N terminus, termed APC-(129-250), consists of three α-helices forming two separate antiparallel coiled coils [
]. This entry represents a domain towards the N terminus that covers part of this region.
CDCP1 is a cell surface glycoprotein that has been recognized both as a tumour marker and as a potential target to disrupt progression of cancer [
]. CDCP1 has been shown to modulate cell-substratum adhesion and motility in colon cancer cell lines [].
This family consists of rotaviral non-structural RNA binding protein 34 (NS34 or NSP3). The NSP3 protein has been shown to bind viral RNA. The NSP3 protein consists of 3 conserved functional domains; a basic region which binds ssRNA, a region containing heptapeptide repeats mediating oligomerisation and a leucine zipper motif [
]. NSP3 may play a central role in replication and assembly of genomic RNA structures []. Rotaviruses have a dsRNA genome and are a major cause of acute gastroenteritis in the young of many species [].
The movement of bipartite Geminiviruses such as squash leaf curl virus (SqLCV) requires the cooperative
interaction of two essential virus-encoded movement proteins, BR1 and BL1. Recent studies of SqLCV and bean dwarf mosaic virus have shown that BR1 and BL1 act in a cooperative manner to move the viral genome intracellularly from the nucleus to the cytoplasm and across the cell wall from cell to cell. BR1 is a nuclear shuttle protein, and it has been proposed to bind newly replicated viral ssDNA genomes and move these between the nucleus and cytoplasm. These BR1-genome complexes are then directed to the cell periphery through interactions between BR1 and BL1, where, as the result of BL1 action, the complexes are moved to adjacent uninfected cells. The precise
mechanism by which BL1 acts to transport these genome complexes across the cell wall, and whether this may differ in different cell types, remains at issue [
].
This protein is part of the 50S ribosomal subunit. It forms a cluster with proteins L3 and L24e, part of which may contact the 16S rRNA in two inter-subunit bridges [
].
Non-structural protein NSP7 has been implicated in viral RNA replication and is predominantly α-helical in structure. Its central core is an N-terminal helical bundle (HB), with helices HB1, HB2 and HB3, forming a triple-stranded antiparallel coiled coil with a right-handed superhelical pitch. It is part of the RNA-dependent RNA polymerase (RdRp) heterotetramer which consists of one NSP7, two NSP8 molecules and the catalytic NSP12, defined as the minimal core component for mediating coronavirus RNA synthesis [
,
,
,
,
,
]. NSP7 and NSP8 form a complex that adopts a hollow cylinder-like structure []. The dimensions of the central channel and positive electrostatic properties of the cylinder imply that it confers processivity on RNA-dependent RNA polymerase []. NSP7 and NSP8 play a role in the stabilisation of NSP12 regions involved in RNA binding, and are essential for a highly active NSP12 polymerase complex [,
,
,
].
Viral non-structural protein NSP8 is part of the RNA-dependent RNA polymerase (RdRp) complex and forms a heterotetramer consisting of one molecule of NSP7, two copies of NSP8 and one of NSP12 [
]. NSP8 and NSP7 adopt a hollow cylinder-like structure [,
] in which the dimensions of the central channel and positive electrostatic properties of the cylinder imply that it confers processivity on RdRp [,
]. NSP7 and NSP8 are co-factors for the catalytic NSP12 that play a role in the stabilisation of NSP12 regions involved in RNA binding and are essential for a highly active NSP12 polymerase complex []. It has been demonstrated that NSP8 from human coronavirus 229E acts as an oligo(U)-templated polyadenylyltransferase but also has robust (mono/oligo) adenylate transferase activities []. NSP8 has N-terminal and C-terminal D/ExD/E conserved motifs. The N-terminal motif is critical for RNA polymerase activity as these residues are part of the Mg2+-binding active site []. NSP8 has a 'golf club'-like structure composed of a long α-helix N-terminal 'shaft' subdomain and an α/β C-terminal 'head' subdomain consisting of three α-helices and seven β-strands (
). The seven β-strands form an open-barrel with two antiparallel β-sheets packed orthogonally. More than half the residues in the C-terminal domain are hydrophobic, and the whole domain forms a tight hydrophobic core [
,
,
]. Together with NSP9, NSP8 suppresses protein integration into the cell membrane, thus disrupting host immune defenses [
].
NSP9 is a single-stranded RNA-binding viral protein involved in RNA synthesis, essential for the coronavirus replication [
,
,
]. The dimerisation of NSP9 is essential for binding and orienting RNA for subsequent use by the replicase machinery. NSP9 is composed of seven antiparallel β-strands and a single α-helix that are arranged into a single compact domain and form a cone-shaped β-barrel flanked by the C-terminal α-helix () [
,
]. The NSP9 dimer interface is formed by the N-finger motifs and the parallel association of the GXXXG motifs of the C-terminal α-helices. The N- and C-terminal regions are more conserved than the central core, and the GXXXG motif is strictly conserved [,
]. NSP9 binds to discrete regions on the 7SL RNA component of the signal recognition particle (SRP) and interferes with protein trafficking to the cell membrane upon infection, which disrupts essential host functions and suppresses host immune defenses [].
Cip1-interacting zinc finger protein may regulate the subcellular localization of p21(CIP/WAF1) [
,
]. p21(CIP/WAF1) has a critical role in the negative control of cell growth. The Drosophila melanogaster protein zinc finger protein on ecdysone puffs, which may have a role in the activation of early and late genes [
], is also included in this entry.
The bacterial and archaeal proteins in this family have no known function. A study suggests that some archaeal members may be G-quadruplex binding proteins [
].
ParB is a component of the par system which mediates accurate DNA partition during cell division. It recognises A-box and B-box DNA motifs. ParB forms an asymmetric dimer with 2 extended helix-turn-helix (HTH) motifs that bind to A-boxes. The HTH motifs emanate from a beta sheet coiled coil DNA binding module [
]. Both DNA binding elements are free to rotate around a flexible linker, which enables them to bind to complex arrays of A- and B-box elements on adjacent DNA arms of the looped partition site [].
SoxZ forms an antiparallel beta structure and forms a complex with SoxY. Sulphur oxidation occurs at the thiol of a conserved cysteine residue of the SoxY subunit [
].
This entry includes death effector domain-containing proteins, DEDD and DEDD2. DEDD is a scaffold protein that directs CASP3 to certain substrates and facilitates their ordered degradation during apoptosis [
]. DEDD2 is involved in the regulation of nuclear events mediated by the extrinsic apoptosis pathway [].
Calcium-binding tyrosine phosphorylation-regulated protein
Type:
Family
Description:
CABYR was originally isolated from spermatozoa and was thought to be testis specific, but has been observed in lung and brain tumours [
]. It is a polymorphic calcium binding protein that is phosphorylated during capacitation [].
This domain contains a P-loop motif that is characteristic of the AAA superfamily. It is found in P-loop containing proteins such as chromosomal cassette SCCmec type IVc uncharacterized protein CR006 and arcadin-4, an actin-like component of the cytoskeleton of the archaeon
Pyrobaculum calidifontis [
].
Members of this strictly bacterial protein family show similarity to class II glutamine amidotransferases. They are distinguished by appearing in a genome context with, and usually adjacent to or between, members of families
(methyltransferase EgtD-like) and
(ergothioneine biosynthesis protein EgtB) [
].
RWDD3, also known as RSUME, is an RWD-containing protein that enhances SUMO conjugation by interacting with the SUMO conjugase Ubc9, increasing Ubc9 thioester formation and thereby favouring sumoylation of specific targets. It also increases IkappaB levels and stabilizes HIF-1alpha during hypoxia, leading to inhibition of NF-kappaB and increased HIF-1 transcriptional activity [
,
].
Sed1 is a major structural GPI-cell wall glycoprotein from Saccharomyces cerevisiae [
]. It may have a role in mitochondrial genome maintenance []. This family also includes the Sed1 paralogue Spi1, a cell wall protein that plays a role in resistance to cell wall stress [], and related proteins from fungi.