The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. This entry represents a shared N-terminal region, of about 43 amino acids in length, found in a number of Cas proteins. This region is widely distributed and was designated Cas5. Proteins containing this region are generally 210 to 265 amino acids in length and show little or no homology between their C-terminal regions. The best characterised protein in this entry is DevS (
) from Myxococcus xanthus, a Cas protein that appears to participate in a species-specific developmental pathway. Cas5 is found within a cluster of three cas genes associated with CRISPR structures in many bacterial species, named cas1B, cas5 and cas6 [
].
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. This entry represents a Cas5 family of Cas proteins unique to the hmari subtype of cas genes and CRISPR repeats, which is the only subtype present in Haloarcula marismortui ATCC 43049. The hmari type, though uncommon, is also found in the Aquificae, Thermotogae, Firmicutes, and Dictyoglomi.
TTRAP is a family of small bacterial proteins largely from Clostridium difficile. From comparative and other structural studies of the PDB:
,
, it has been suggested that this family is required for interacting with other proteins in order to facilitate the transfer of the transposon CTn4 between different bacterial species. PDB:
comprises an α-helical fold of four α-helices leading to the production of two clefts, the larger of which displays two highly conserved residues in close proximity, Glu-8 and Lys-48. The gene concerned is part of an operon within transposon CTn4, and is expressed alongside a putative DNA primase, a DNA topoisomerase and conjugal transfer proteins [].
These conserved hypothetical proteins have so far been found only in the Cyanobacteria. They are about 170 amino acids long and contain a CxxCx(14)CxxH motif near the N terminus.
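The CxxCx(14)CxxH motif described above can be written as a regular expression, where each "x" stands for any residue. The following is a minimal sketch under stated assumptions: the function name, the 60-residue window used to approximate "near the N terminus", and the toy sequence are all illustrative and are not taken from the entry.

```python
import re

# Regex form of CxxCx(14)CxxH: "x" is any single residue.
MOTIF = re.compile(r"C.{2}C.{14}C.{2}H")


def find_motif(sequence, n_terminal_window=60):
    """Return the 0-based start of the first motif match if it begins
    inside the N-terminal window, otherwise None.

    n_terminal_window is an assumed cutoff chosen for illustration only.
    """
    match = MOTIF.search(sequence.upper())
    if match and match.start() < n_terminal_window:
        return match.start()
    return None


# Synthetic toy sequence (not a real protein) used only to exercise the pattern.
toy = "MSTA" + "CAAC" + "G" * 14 + "CAAH" + "K" * 20
print(find_motif(toy))
```

A pattern scan like this only flags candidate sequences; it does not by itself establish family membership.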
Eukaryotic ribosomes consist of a small 40S subunit and a large 60S subunit. Rps10 is a component of the 40S ribosomal subunit. It plays an important role in mammalian ribosome-associated quality control (QC) pathways [
].
NifX is involved in the biosynthesis of the iron-molybdenum cofactor (FeMo-Co) found in the dinitrogenase enzyme of the nitrogenase complex in nitrogen-fixing bacteria. This complex catalyses the reduction of atmospheric dinitrogen to ammonia. The role of NifX in cofactor biosynthesis is not fully understood, though it appears to be associated with mature FeMo-Co and may be involved in the addition of heterometal and/or homocitrate [
].
FosX is a Mn(II)-dependent fosfomycin-specific epoxide hydrolase [
]. Fosfomycin inhibits the enzyme UDP-N-acetylglucosamine-3-enolpyruvyltransferase (MurA), which catalyzes the first committed step in bacterial cell wall biosynthesis. FosX catalyzes the addition of a water molecule to the C1 position of the antibiotic with inversion of the configuration at C1 in the presence of Mn(II). The hydrated fosfomycin loses its inhibitory activity []. FosX is evolutionarily related to glyoxalase I and type I extradiol dioxygenases.
The human protein polybromo-1 (PBRM1, also known as BAF180) is part of a SWI/SNF chromatin-remodeling complex [
]. It was shown that polybromo bromodomains bind to histone H3 at specific acetyl-lysine positions. Bromodomains are found in many chromatin-associated proteins and in nuclear histone acetyltransferases. They interact specifically with acetylated lysine, but not all the bromodomains in polybromo may bind to acetyl-lysine [,
]. Polybromo contains 6 bromodomains. This entry represents the fifth one (BD5). Mutations in BD5 moderately affect PBRM1 chromatin association [
].
Bacillus spores are protected by a protein shell consisting of over 50 different polypeptides, known as the coat. CotO has an important morphogenetic role in coat assembly; it is involved in the assembly of at least 5 different coat proteins, including CotB, CotG, CotS, CotSA and CotW. It is likely to act at a late stage of coat assembly [
].
This uncharacterised protein is found in prophage regions of Shewanella oneidensis MR-1, Vibrio vulnificus (strain YJ016), Yersinia pseudotuberculosis IP 32953, and Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966/NCIB 9240). It appears to have regions of sequence similarity to the phage lambda antitermination protein Q.
The GvpN protein is associated with the production of gas vesicles, which some prokaryotes produce to give cells buoyancy [
,
]. It belongs to a larger family of ATPases [].
Proteins in this entry are found, so far, only in the Gammaproteobacteria. Their function is currently unknown. The location on the chromosome is usually not far from housekeeping genes. Some proteins have been annotated in public databases as DNA-binding protein inhibitor-related, putative transcriptional regulators, or hypothetical DNA binding proteins.
Peroxisomal proteins, called peroxins, participate in the process of peroxisome assembly. Specific amino acid sequences (PTS, or peroxisomal targeting signals) at the C terminus (PTS1) or N terminus (PTS2) of peroxisomal matrix proteins mark them for import. The protein receptors Pex5 and Pex7 accompany their cargoes (containing a PTS1 or a PTS2 amino acid sequence, respectively) into the peroxisome. Ubiquitination appears to be crucial for the export of Pex5 from the peroxisome to the cytosol. The monoubiquitination of the PTS1 receptor Pex5 is catalyzed by the E2 enzyme Pex4. Pex22 anchors Pex4 to the peroxisome membrane [
,
]. Moreover, Pex4 requires the interaction with Pex22 to display full activity []. The N-terminal portion of Pex22 has a membrane anchoring function, whereas the C-terminal domain harbours its biological activity. The soluble C-terminal domain of Pex22 is essential for peroxisomal biogenesis and efficient monoubiquitination of the import receptor Pex5 [].
Homologues of the DNA-repair protein RAD52 are present in all plants. There are two identified RAD52 genes in Arabidopsis thaliana (RAD52-1 and RAD52-2), which encode four open reading frames (ORFs) through differential splicing. Each product is specifically localized to the nucleus, mitochondria, or chloroplast, which suggests a role in maintaining nuclear and organellar genomes. The diverse plant RAD52 proteins affect DNA repair and fertility [
].
MBD8 is a protein with a putative Methyl-CpG-Binding Domain (MBD) that is involved in control of flowering time in the C24 ecotype of Arabidopsis thaliana [
].
Proteins in this family are encoded within a conserved four-gene neighbourhood found sporadically in a phylogenetically broad range of bacteria including: Nocardia farcinica, Symbiobacterium thermophilum, Streptomyces avermitilis (Actinobacteria), Geobacillus kaustophilus (Firmicutes), Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1)) and Ralstonia solanacearum (Betaproteobacteria).
This entry represents an N-terminal domain of tape measure proteins that characterises the Caudoviruses. The characteristics of the protein distribution suggest prophage matches in addition to the phage matches. Proteins containing this domain are found either in bacteriophages or in bacteria, where they are encoded in bacteriophage and prophage regions. Most members are 800 to 1800 amino acids long, making them among the longest predicted proteins of their respective phage genomes, where they are encoded in tail protein regions. The roughly 80-residue domain described here usually begins between residues 100 and 250. Many members are known or predicted phage tail tape measure proteins, a minor tail component that regulates tail length.
Proteins in this entry are encoded within a conserved four-gene neighbourhood found sporadically in a phylogenetically broad range of bacteria including: Nocardia farcinica, Symbiobacterium thermophilum, and Streptomyces avermitilis (Actinobacteria), Geobacillus kaustophilus (Firmicutes), Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1)) and Ralstonia solanacearum (Betaproteobacteria). These proteins average over 1400 amino acids in length.
Proteins in this family are encoded within a conserved four-gene neighbourhood found sporadically in a phylogenetically broad range of bacteria including: Nocardia farcinica, Symbiobacterium thermophilum, Streptomyces avermitilis (Actinobacteria), Geobacillus kaustophilus (Firmicutes), Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1)) and Ralstonia solanacearum (Betaproteobacteria).
Proteins in this family are encoded within a conserved four-gene neighbourhood found sporadically in a phylogenetically broad range of bacteria including: Nocardia farcinica, Symbiobacterium thermophilum, Streptomyces avermitilis (Actinobacteria), Geobacillus kaustophilus (Firmicutes), Azoarcus sp. (strain EbN1) (Aromatoleum aromaticum (strain EbN1)) and Ralstonia solanacearum (Betaproteobacteria).
This family includes the circadian oscillating protein COP23 from Cyanothece sp. (strain PCC 8801),
. The levels of this peripheral membrane protein display a circadian oscillation [
].
This entry includes CLAVATA3/ESR (CLE)-related protein 41/44 (CLE41/44) from Arabidopsis and TDIF from Zinnia violacea. TDIF functions as a phloem-derived, non-cell-autonomous signal that controls stem cell fate in the procambium [
]. Tracheary element differentiation inhibitory factor (TDIF) and CLE41/CLE44 peptides act as inhibitors of tracheary element differentiation. TDIF/CLE41/CLE44 promote the proliferation of vascular cells and prevent them from differentiating into xylem by regulating WOX4 expression through the TDR/PXY receptor [
,
,
]. CLE42 binds PXL2 (PXY-like 2) []. TDIF-related peptides also enhance axillary bud formation []. The CLE proteins share a C-terminal CLE domain with CLV3, a small secreted polypeptide that is active as a ligand and has a function in shoot apical meristem maintenance [
]. In general, the CLE-related proteins are C-terminally processed to generate an active CLE peptide []. These small secreted peptide hormones then function in a variety of developmental and physiological processes [,
]. CLV3 and other A-type CLE peptides are involved in the suppression of stem cell development in the root and shoot, in contrast to the B-type CLE peptides (CLE41 and CLE44 in Arabidopsis), which are not [].
The Notch signaling pathway regulates genes involved in cell fate decision throughout development. Its activity relies notably on the CSL transcription factors. BEND6 is a BEN-solo (bears a single BEN domain) factor that binds CSL via its BEN domain and acts as a CSL co-repressor. BEND6 associates with and represses Notch target genes, antagonizing Notch signaling in neural stem cells, thereby opposing their self-renewal and promoting neurogenesis [
]. This entry also includes BEN-solo factors from Drosophila, namely protein insensitive, Elba1 and Elba2, which share functional attributes with BEND6. They recognise CCAATTGG palindromes to mediate transcriptional repression [
].
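The CCAATTGG site mentioned above is palindromic in the DNA sense: it equals its own reverse complement. A minimal sketch of that check, plus a naive scan for the site, is given here; the function names and the toy sequence are hypothetical and serve only as illustration.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(dna):
    """True if the sequence reads the same on both strands (5' to 3')."""
    return dna.upper() == reverse_complement(dna)


def find_sites(sequence, site="CCAATTGG"):
    """Return 0-based start positions of exact matches to the site."""
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - len(site) + 1)
            if sequence.startswith(site, i)]


# Synthetic toy sequence (not a real genomic region).
toy = "TTCCAATTGGAA"
print(is_reverse_palindrome("CCAATTGG"))
print(find_sites(toy))
```

Because the site is its own reverse complement, scanning one strand is enough to find matches on both.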
This entry represents the nuclear anchorage protein 1 (ANC-1) and related proteins, from arthropods and nematodes (roundworms). ANC-1 contains a conserved Klarsicht/ANC-1/Syne homology (KASH) domain at its C terminus that can associate with Sad1p/UNC-84 (SUN)-domain proteins of the inner nuclear membrane within the perinuclear space of the nuclear envelope (NE) [
]. In C. elegans, ANC-1 connects nuclei to the cytoskeleton by interacting with UNC-84 at the nuclear envelope and with actin in the cytoplasm [,
]. This entry also includes the Klarsicht protein from Drosophila melanogaster, a protein that links microtubule motors and various cellular structures. It is important for controlling the migration and positioning of nuclei in photoreceptors and muscles [,
].
PRAP, or proline-rich acidic protein 1, is a family of eukaryotic proteins. PRAP is abundantly expressed in the epithelial cells of the human liver, kidney, gastrointestinal tract, and cervix. It is significantly down-regulated in hepatocellular carcinoma and right colon adenocarcinoma compared with the respective adjacent normal tissues [
]. It is expressed in the epithelial cells of the mouse and rat gastrointestinal tracts, and pregnant mouse uterus. PRAP plays an important role in maintaining normal growth suppression [].
This entry includes NOPCHAP1 from animals and New4 from fission yeast. NOPCHAP1 is a client-loading PAQosome/R2TP complex cofactor that selects NOP58 to promote box C/D small nucleolar ribonucleoprotein (snoRNP) assembly [
]. The function of New4 is not clear.
This entry includes Vps38 from budding yeasts. Vps38 is involved in endosome-to-Golgi retrograde transport as part of the VPS34 PI3-kinase complex II [
]. The class III phosphatidylinositol-3-kinase (PI3K) known as Vps34 regulates intracellular membrane trafficking in endocytic sorting, cytokinesis and autophagy. Vps34 forms complexes with other proteins: Vps15, Vps30 (encoded by VPS30/ATG6 in yeast, equivalent to mammalian Beclin 1, encoded by BECN1) and either Vps38 (UVRAG) or Atg14 (ATG14L). Although the N-terminal domains of Vps30, Vps38 and Atg14 differ, the overall similarity of their domain organizations suggests that these proteins may have evolved from a common ancestor [].
This domain can be found in Psb31, an extrinsic protein found in photosystem II (PSII) present in Chaetoceros gracilis [
]. PSII is a multisubunit membrane protein complex located in the thylakoid membranes of oxygenic photosynthetic organisms from cyanobacteria to higher plants. The four helices in the N-terminal domain are arranged in an up-down-up-down fold and are similar in structure to the PsbQ protein from spinach, despite low sequence homology [].
The papillomavirus E6 oncoproteins are small zinc-binding proteins that share a conserved zinc-binding CXXC motif and do not have identified intrinsic enzymatic activity. E6 proteins are thought to act as adapter proteins, thereby altering the function of E6-associated cellular proteins. This model for E6 function is best supported by observations of human papillomavirus type 16 (HPV-16) E6 (16E6), which can alter the metabolism of the p53 tumor suppressor through association with a cellular E3 ubiquitin ligase called E6AP. HPV-16 E6 interacts with an 18-amino-acid sequence in E6AP, and in an as yet ill-defined fashion the E6AP-16E6 complex binds to p53, inducing the ubiquitin-dependent degradation of the trimolecular complex. 16E6 apparently functions as an adapter protein in the complex with p53, since E6AP does not interact with p53 in the absence of E6 and since the degradation of p53 requires both E6 and E6AP [
,
,
]. Despite the similarity in structure of the E6 oncoproteins, studies have indicated surprising biochemical diversity among E6 oncoproteins of different papillomavirus types. E6 proteins from the cancer-associated human papillomaviruses (HPVs) form a complex with a cellular protein termed E6-AP and, together with E6-AP, bind to the p53 tumor suppressor protein, thereby degrading p53 through ubiquitin-mediated proteolysis. E6 proteins from the non-cancer-associated HPV types do not bind E6-AP or degrade p53. Bovine papillomavirus E6 (BE6) binds E6-AP but fails either to complex with p53 or to degrade associated proteins, implying that BE6 might transform cells through a mechanism different from that of the HPVs. In addition to targeting p53, the E6 proteins of both cancer-associated HPVs and BPV-1 have been shown to associate with a cellular calcium-binding protein localized to the endoplasmic reticulum [
,
].
This family of proteins of unknown function gets its name from its position in the mammalian genome: Fanconi anemia group D2 protein opposite strand transcript protein.
This entry includes C2 domain-containing protein 2 (C2CD2) and C2CD2-like (C2CD2L) proteins. C2CD2L, also known as TMEM24, is an endoplasmic reticulum (ER)-anchored membrane protein that binds lipids and transports the phosphatidylinositol 4,5-bisphosphate [PI(4,5)P2] precursor phosphatidylinositol between bilayers, allowing replenishment of PI(4,5)P2 hydrolyzed during signaling []. The function of C2CD2 is not known.
Nuclear transition protein 1 (TP1) is one of the spermatid-specific proteins []. TP1 is a basic protein well conserved in mammalian species. In mammals, the second stage of spermatogenesis is characterised by the conversion of nucleosomal chromatin to the compact, non-nucleosomal and transcriptionally inactive form found in the sperm nucleus. This condensation is associated with a double-protein transition. The first transition corresponds to the replacement of histones by several spermatid-specific proteins (also called transition proteins) which are themselves replaced by protamines during the second transition.
This entry includes proteins with an MMS19, N-terminal domain. Most homologues also contain an MMS19, C-terminal domain. MMS19 is a key component of the cytosolic iron-sulfur protein assembly (CIA) complex, a multiprotein complex that mediates the incorporation of iron-sulfur clusters into apoproteins specifically involved in DNA metabolism and genomic integrity [
,
]. In humans, MMS19 acts as an adapter between early-acting CIA components and a subset of cellular target iron-sulfur proteins such as ERCC2/XPD, FANCJ and RTEL1, thereby playing a key role in nucleotide excision repair (NER) and RNA polymerase II (POL II) transcription [
,
]. It is also part of the MMXD (MMS19-MIP18-XPD) complex, which plays a role in chromosome segregation, probably by facilitating iron-sulfur cluster assembly into ERCC2/XPD []. In budding yeasts, the mms19 mutants were originally isolated in a screen for mutants hypersensitive to the alkylating agent methyl methanesulfonate (MMS) [
]. Unlike human MMS19, Mms19 in budding yeasts (also known as Met18) does not participate directly in NER []. In fission yeast, Mms19 is part of a silencing complex named the Rik1-Dos2 complex, which contains Dos2, Rik1, Mms19 and Cdc20. This complex regulates RNA Pol II activity in heterochromatin, and is required for DNA replication and heterochromatin assembly [
].
WIT1 and WIT2 are tail-anchored proteins [
] required for the nuclear envelope anchoring of RANGAP (Ran GTPase activating protein) in root tips in Arabidopsis []. They could be involved in determining plant nuclear shape [] and pollen tube guidance [].
This is the N-terminal domain of the cellulose-binding protein CttA present in Ruminococcus flavefaciens. CttA mediates attachment of the bacterium to its substrate via two carbohydrate-binding modules. The domain is known as the X-module and lacks a true hydrophobic core. Unlike the X-modules in other types of CohE-XDoc complexes, it does not contribute to the binding surface. This X-module appears to serve as an extended spacer, which separates the cellulose-binding modules at the N terminus of CttA from the bacterial cell wall. The domain does not share structural similarity with other known X-modules from cellulolytic bacteria but does show similarity to the G5-1 module of StrH from Streptococcus pneumoniae [
].
Mcp1 is a mitochondrial protein involved in mitochondrial lipid homeostasis in yeast. It is a Vps13 adaptor protein, promoting vacuole-mitochondria contacts [
,
,
,
].
COMMD10 is a member of the COMMD family defined by the presence of a conserved and unique motif termed the COMM (copper metabolism gene MURR1) domain, which functions as an interface for protein-protein interactions [
].
COMMD9 is a member of the COMMD family defined by the presence of a conserved and unique motif termed the COMM (copper metabolism gene MURR1) domain, which functions as an interface for protein-protein interactions [
]. COMMD9 modulates Na+ transport in epithelial cells by regulation of apical cell surface expression of amiloride-sensitive sodium channel (ENaC) subunits [].
COMM domain-containing protein 1, also called Murr1, is involved in copper homeostasis [
,
]. It has been identified as a regulator of the human delta epithelial sodium channel []. It has been shown to directly link to early endosomes through its interaction with a protein complex containing CCDC22, CCDC93, and C16orf62 []. The COMM domain is found at the C terminus of a variety of proteins; presumably all COMM domain-containing proteins are located in the nucleus and the COMM domain plays a role in protein-protein interactions. Several family members have been shown to bind and inhibit NF-kappaB [
,
].
COMMD2 is a member of the COMMD family defined by the presence of a conserved and unique motif termed the COMM (copper metabolism gene MURR1) domain, which functions as an interface for protein-protein interactions [
]. Several family members have been shown to bind and inhibit NF-kappaB [].
COMMD4 is a member of the COMMD family defined by the presence of a conserved and unique motif termed the COMM (copper metabolism gene MURR1) domain, which functions as an interface for protein-protein interactions [
].
This entry includes COMM domain-containing proteins 4/6/7/8 (COMMD4/6/7/8), which may modulate the activity of cullin-RING E3 ubiquitin ligase (CRL) complexes [
]. COMM (copper metabolism gene MURR1) domain proteins constitute a family initially identified as interacting partners of COMMD1 (previously known as MURR1), the prototype member of this protein family. COMMD1 is a multifunctional protein that has been shown to participate in two apparently distinct activities: regulation of the transcription factor NF-kappa-B and control of copper metabolism. The family is defined by the presence of a C-terminal motif termed the COMM domain, which functions as an interface for protein-protein interactions. The proteins designated as COMMD or COMM domain containing 1-10 are extensively conserved in multicellular eukaryotic organisms [
,
].
COMMD3 is a member of the COMMD family defined by the presence of a conserved and unique motif termed the COMM (copper metabolism gene MURR1) domain, which functions as an interface for protein-protein interactions [
]. COMMD3 modulates Na+ transport in epithelial cells by regulation of apical cell surface expression of amiloride-sensitive sodium channel (ENaC) subunits [].
COMMD7 is a member of the COMMD family defined by the presence of a conserved and unique motif termed the COMM (copper metabolism gene MURR1) domain, which functions as an interface for protein-protein interactions [
].
COMMD5 (also known as HCaRG) is a member of the COMMD family defined by the presence of a conserved and unique motif termed the COMM (copper metabolism gene MURR1) domain, which functions as an interface for protein-protein interactions [
]. It is a nuclear protein involved in the regulation of cell proliferation []. It is also a regulator of renal epithelial cell growth and differentiation, causing G2/M cell cycle arrest [].
This entry represents the C-terminal domain of SduA, the only component of the antiviral defense system Shedu. Expression of Shedu in Bacillus subtilis (strain BEST7003) confers resistance to phages phi105, phi29, rho14 and, to a lesser extent, SPP1 [
]. This domain may have endonuclease activity.
FlgL (or hook-associated protein 3, HAP3) proteins are flagellar hook-associated proteins encoded in bacterial flagellar operons [
,
]. An N-terminal region of about 150 residues and a C-terminal region of about 85 residues are conserved in this family, though members show considerable length heterogeneity between these two well-conserved terminal regions; proteins in this family range from 287 to more than 500 residues in length.
Proteins in this family contain a C2H2, LYAR-type zinc finger. Included in this family is cell growth-regulating nucleolar protein (LYAR), which maintains the appropriate processing of 47S/45S pre-rRNA to 32S/30S pre-rRNAs and their subsequent processing to produce 18S and 28S rRNAs [
]. It is also a repressor of the gamma-globin promoter [] and oxidative stress genes [].
MBD5 and MBD6 are two members of the methyl-CpG-binding domain (MBD) family. They do not bind methylated DNA [
], but instead they interact with the PR-DUB Polycomb protein complex in a mutually exclusive manner. They may share a function through their interaction with PR-DUB. MBD6, but not MBD5, is recruited to sites of DNA damage []. MBD5 is highly expressed in oocytes [].
Coiled-coil domain containing protein 40 (CCDC40) plays an evolutionarily conserved role in the assembly of motile cilia and establishment of the left-right axis [
,
]. CCDC40 may serve as a part of the axoneme structural scaffold. It has been shown to be required for axonemal recruitment of CCDC39 []. CCDC40 mutation is a cause of primary ciliary dyskinesia [].
The myelin sheath is a multi-layered membrane, unique to the nervous system, that functions as an insulator to greatly increase the velocity of axonal impulse conduction [
]. Myelin proteolipid protein (PLP or lipophilin) [] is the major myelin protein from the central nervous system (CNS). It probably plays an important role in the formation or maintenance of the multilamellar structure of myelin. In humans, point mutations in PLP are the cause of Pelizaeus-Merzbacher disease (PMD), a neurologic disorder of myelin metabolism. In animals, dysmyelinating diseases such as mouse 'jimpy' (jp), rat md, or dog 'shaking pup' are also caused by mutations in PLP. PLP is a highly conserved [
] hydrophobic protein of 276 to 280 amino acids which seems to contain four transmembrane segments and two disulphide bonds, and which covalently binds lipids (at least six palmitate groups in mammals) []. PLP is highly related to M6, a neuronal membrane glycoprotein [
].
Smaug participates in regulation of transcript stability and degradation [
]. In Drosophila, Smaug is a translational repressor of the maternal mRNA of Nanos, a protein required for the formation of the anterior-posterior body axis. Repression depends on the Smaug protein binding to two Smaug recognition elements (SREs) in the nanos 3' UTR. The SAM domain interacts specifically with the Nanos mRNA regulatory regions []. Moreover, Smaug is also involved in regulating the degradation of specific maternal transcripts in the early Drosophila embryo via recruitment of the CCR4/POP2/NOT deadenylase []. This entry represents the SAM (sterile alpha motif) domain of Smaug, which is a C-terminal RNA binding domain. This domain interacts with stem-loop structures of mRNA [
].
Bovine calicivirus is a positive-stranded ssRNA virus that causes gastroenteritis [
]. The calicivirus genome contains two open reading frames, ORF1 and ORF2 [,
]. ORF1 encodes a non-structural polypeptide, which has RNA helicase, cysteine protease and RNA polymerase activity. The regions of the polyprotein in which these activities lie are similar to proteins produced by the picornaviruses [,
]. ORF2 encodes a structural protein []. This signature finds ORF2, the structural coat protein. Two different families of caliciviruses can be distinguished on the basis of sequence similarity, namely those classified as small round structured viruses (SRSVs) and those classed as non-SRSVs. Rabbit hemorrhagic disease virus (RHDV), which causes a highly contagious disease of wild and domestic rabbits, belongs to the family Caliciviridae [
]. The capsid protein self-assembles to form an icosahedral capsid with T=3 symmetry. It is about 38 nm in diameter and consists of 180 capsid proteins. The capsid encapsulates the genomic RNA and VP2 proteins and attaches the virion to target cells by binding histo-blood group antigens present on gastroduodenal epithelial cells. The Shell domain (S domain) contains elements essential for the formation of the icosahedron. The Protruding domain (P domain) is divided into sub-domains P1 and P2. A hypervariable region in P2 is thought to play an important role in receptor binding and immune reactivity. This is the calicivirus coat protein (
) C-terminal region.
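The figure of 180 capsid proteins for a T=3 capsid follows from the standard Caspar-Klug relation for icosahedral capsids, in which the triangulation number is T = h^2 + hk + k^2 and the capsid contains 60T subunits. The relation is general background rather than something stated in this entry, and the function names below are illustrative only.

```python
def triangulation_number(h, k):
    """Caspar-Klug triangulation number T = h^2 + h*k + k^2
    for non-negative integer lattice steps h and k."""
    return h * h + h * k + k * k


def capsid_subunits(t):
    """Number of protein subunits in an icosahedral capsid: 60 * T."""
    return 60 * t


# T = 3 corresponds to h = 1, k = 1.
t = triangulation_number(1, 1)
print(t, capsid_subunits(t))
```

With h = 1 and k = 1 this gives T = 3 and 60 x 3 = 180 subunits, matching the count given for the calicivirus capsid.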
This domain of unknown function is found in O-mannosyl-transferases TMTC1-4, and constitutes a loop between TM7 and TM8 located in the ER lumen that contains a small hydrophobic, but not membrane-embedded helix. This loop is critical for catalysis and binding of ligands, especially the lipid-linked sugar moiety [
]. TMTCs transfer mannosyl residues to the hydroxyl group of serine or threonine residues. The four members of the TMTC family are O-mannosyl-transferases acting primarily on the cadherin superfamily; each member has distinct roles in decorating the cadherin domains with O-linked mannose glycans at specific regions. These proteins also act as O-mannosyl-transferases on other proteins such as PDIA3 [
].
Members of this family are putative ATP-binding sugar transporter-like proteins. This entry also includes the phage head-tail joining protein, known as Gp10 from Bacteriophage N15. It plays a role in virion assembly by joining the head and the tail at the last step of morphogenesis.
The 28 kDa AKAP (AKAP28) is highly enriched in human airway axonemes. The mRNA for AKAP28 is up-regulated as primary airway cells differentiate and is specifically expressed in tissues containing cilia and/or flagella. Homologs of AKAP28 are present in all animals.
This family contains the coiled-coil domain-containing protein 51 (CCDC51), also known as mitochondrial potassium channel (MITOK). In complex with ABCB8/MITOSUR it forms a protein complex localised in the mitochondria that mediates ATP-dependent potassium currents across the inner membrane.
MOSMO, also known as Atthog (attenuator of hedgehog) in mouse, acts as a negative regulator of hedgehog (Hh) signaling, probably by promoting internalization and subsequent degradation of Smoothened (SMO). SMO is an oncoprotein that transduces the Hh signal across the membrane. In the absence of Atthog, SMO is stabilized at the cell surface and concentrated in the ciliary membrane.
This entry represents the bacterial protein ApaG, which contains a single ApaG domain of about 125 amino acids. The Salmonella typhimurium ApaG domain protein, CorD, is involved in Co(2+) resistance and Mg(2+) efflux. Tertiary structures from different ApaG proteins show a fold of several β-sheets. The ApaG domain may be involved in protein-protein interactions, which could be implicated in substrate specificity.
CEP70 localizes to the centrosome throughout the cell cycle and binds to the key centrosomal component, gamma-tubulin. It plays an important role in the regulation of microtubule dynamics, mitotic spindle formation, cell migration, and ciliogenesis. It has been shown to regulate cancer growth and metastasis.
Members of this family are nucleolar RNA-associated proteins (Nrap), which are highly conserved from yeast (Saccharomyces cerevisiae) to human. In the mouse, Nrap is ubiquitously expressed and is specifically localised in the nucleolus. Nrap is a large nucleolar protein (of more than 1000 amino acids). Nrap appears to be associated with ribosome biogenesis by interacting with the pre-rRNA primary transcript. This domain has a nucleotidyltransferase structure.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. L32 is a protein from the large ribosomal subunit that contains a surface-exposed globular domain and a finger-like projection that extends into the RNA core to stabilize the tertiary structure. L32 does not appear to play a role in forming the A (aminoacyl), P (peptidyl) or E (exit) sites of the ribosome, but does interact with 23S rRNA, which has a "kink-turn" secondary structure motif. L32 is overexpressed in human prostate cancer and has been identified as a stably expressed housekeeping gene in macrophages of human chronic obstructive pulmonary disease (COPD) patients. In Schizosaccharomyces pombe, L32 has also been suggested to play a role as a transcriptional regulator in the nucleus. Found in archaea and eukaryotes, this protein is known as L32 in eukaryotes and L32e in archaea. This entry represents the archaeal 50S L32e ribosomal proteins.
Flaviviruses encode a single polyprotein. This is cleaved into three structural and seven non-structural proteins. The NS4B protein is small and poorly conserved among the Flaviviruses. NS4B contains multiple hydrophobic potential membrane-spanning regions. NS4B may form membrane components of the viral replication complex and could be involved in membrane localisation of NS3 and NS5.
LITAF (LPS-induced TNF-activating factor) (also known as SIMPLE; small integral membrane protein of the late endosome) is an endosome-associated integral membrane protein important for multivesicular body (MVB) sorting. It is a monotypic membrane protein with both termini exposed to the cytoplasm and is anchored to membranes via an in-plane helical membrane anchor, present within the highly conserved C-terminal region known as the 'LITAF domain' or 'SIMPLE-like domain'. The LITAF domain consists of conserved cysteines separated by a 22-residue hydrophobic region. LITAF domains are found throughout the eukaryotes, suggesting ancient conserved functions, with multiple instances of expansion, especially in the metazoa. Proteins containing the LITAF domain include the following. Vertebrate lipopolysaccharide-induced tumor necrosis factor-alpha factor (LITAF): in humans, several mutations in LITAF cause the autosomal dominant inherited peripheral neuropathy, Charcot-Marie-Tooth disease type 1 (CMT1), and these mutations map to the LITAF domain. Eukaryotic cell death-inducing p53-target protein 1 (CDIP1). Arabidopsis thaliana GSH-induced LITAF domain protein (GILP), which acts as a membrane anchor, bringing other regulators of programmed cell death (PCD) to the plasma membrane.
Budding yeast Pin2 was identified as a protein that, when overexpressed, can induce the [PIN+] prion phenotype, which is a prerequisite for prion formation by Sup35, referred to as the [PSI+] prion. Pin2 localization is dependent on both exo- and endocytosis. It is an exomer-dependent cargo that localizes at the plasma membrane of the bud early in the cell cycle, and at the bud neck at cytokinesis. The prion-like domain (PLD) in Pin2 serves as a Pin2 retention signal in the trans-Golgi network (TGN) and may act as a stress-response element. Under environmental stress, Pin2 is endocytosed, and the PLD aggregates and causes sequestration of Pin2. This aggregation is reversible upon stress removal and Pin2 can be re-exported to the plasma membrane. Why Pin2 needs to be retained under stress is not clear, but it may be related to the observation that Pin2 interacts with various components of the cell-wall integrity pathway.
MrpL25 is a yeast mitochondria-specific ribosomal (mitoribosome) protein. MrpL25, also known as mL59 in S. cerevisiae, may be located near the tunnel exit site.
The function of this family of yeast proteins is not known. Mug8 may have a role in meiosis and septation in Schizosaccharomyces pombe. The Saccharomyces cerevisiae homologue Msb1 may be involved in positive regulation of 1,3-beta-glucan synthesis and the Pkc1p-MAPK pathway.
The spindle assembly checkpoint (SAC) ensures the fidelity of chromosome segregation. In plants, Msp1 (also known as AtPRD2) has been identified as one of the SAC components. Msp1 is required for meiotic spindle organization and DNA double-strand break formation. In flowering plants, gametophyte formation relies on meiosis. In meiosis I, the homologous chromosomes are separated into two daughter cells. In meiosis II, the sister chromatids are then separated into newly formed daughter cells. During prophase I, several events occur: sister chromatid cohesion, homologous chromosome synapsis, recombination, crossover formation and chromosome segregation. Homologous recombination is initiated by the formation of DNA double-strand breaks (DSBs). The formation of DSBs is catalyzed by Spo11 and its homologues. So far, six Arabidopsis proteins, AtSPO11-1, AtSPO11-2, AtPRD1, AtPRD2, AtPRD3 and AtDFO, have been shown to be involved in DSB formation.
Carbohydrate-binding domain-containing protein Cthe_2159 (Type: Family)
Cthe_2159 from Clostridium thermocellum is the first representative of a novel family of cellulose and/or acid-sugar binding β-helix proteins that share structural similarities with polysaccharide lyases.
Bqt4 is a Schizosaccharomyces pombe protein that anchors telomeres to the inner nuclear membrane during both vegetative growth and meiosis. This is required for telomere clustering to the spindle pole body to form the bouquet arrangement of chromosomes during meiotic prophase.
UBact is a ubiquitin-like protein found in Gram-negative bacteria. It differs from ubiquitin and most ubiquitin-like proteins (including the mycobacterial Pup) in that it lacks the characteristic C-terminal di-glycine motif. The UBact conjugation-degradation system appears to be homologous to the pupylation system for proteasomal degradation in bacteria.
This entry represents GATOR complex protein WDR24, a component of the GATOR subcomplex GATOR2. GATOR2 is part of the amino acid-sensing branch of the TORC1 signaling pathway, activating mTORC1 by inhibition of the GATOR1 subcomplex. The GATOR2 complex consists of MIOS, SEC13, SEH1L, WDR24 and WDR59. WDR24 has a secondary role in the GATOR-independent promotion of lysosome acidity and regulation of autophagic flux. This entry also includes the restriction of telomere capping protein 1 (Rtc1 or Sea2) from the budding yeast Saccharomyces cerevisiae. It is a component of the SEA complex, termed GATOR in mammals, which coats the vacuolar membrane and is involved in intracellular trafficking, autophagy, response to nitrogen starvation, and amino acid biogenesis.
This entry represents the Ragulator complex protein LAMTOR2. The Ragulator complex consists of LAMTOR1, LAMTOR2, LAMTOR3, LAMTOR4 and LAMTOR5. The Ragulator complex is involved in amino acid sensing and activation of the mTORC1 signalling pathway. The complex is activated by amino acids through a mechanism involving the lysosomal V-ATPase, and functions as a guanine nucleotide exchange factor activating the Rag small GTPases. This entry also includes uncharacterised proteins from bacteria and archaea.
This entry includes 60S ribosomal protein L35 from eukaryotes. Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome.
This family of membrane/coat proteins is found in a number of different ssRNA plant virus groups, including Potexvirus, Hordeivirus and Carlavirus.
The presequence translocase-associated motor (PAM) drives the completion of preprotein translocation into the mitochondrial matrix. The Pam17 subunit is required for formation of a stable complex between cochaperones Pam16 and Pam18 and promotes the association of Pam16-Pam18 with the presequence translocase. Mitochondria lacking Pam17 are selectively impaired in the import of matrix proteins.