Search our database by keyword

Examples

  • Search this entire website. Enter identifiers, names or keywords for genes, pathways, authors, ontology terms, etc. (e.g. eve, embryo, zen, allele)
  • Use OR to search for either of two terms (e.g. fly OR drosophila) or quotation marks to search for phrases (e.g. "dna binding").
  • Boolean search syntax is supported, e.g. dros* for partial matches or fly AND NOT embryo to exclude a term.

Search results 12001 to 12100 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: SNX8/Mvp1, PX domain
Type: Domain
Description: Sorting nexin-8 (SNX8) belongs to the sorting nexin family, which contains a conserved PX (phox homology) domain that is responsible for binding to specific phosphoinositides. It may play a role in intracellular protein transport from early endosomes to the trans-Golgi network. The yeast counterpart of SNX8, Mvp1, is required for sorting proteins to the vacuole. The PX domain is a phosphoinositide (PI) binding module present in many proteins with diverse functions. Sorting nexins (SNXs) make up the largest group among PX domain-containing proteins. They are involved in regulating membrane traffic and protein sorting in the endosomal system. The PX domain of SNXs binds PIs and targets the protein to PI-enriched membranes. SNXs differ from each other in PI-binding specificity and affinity, and in the presence of other protein-protein interaction domains, which help determine subcellular localization and specific function in the endocytic pathway. Some SNXs are localized in early endosome structures such as clathrin-coated pits, while others are located in late structures of the endocytic pathway.
Protein Domain
Name: Chromogranin, conserved site
Type: Conserved_site
Description: Granins (chromogranins or secretogranins) are a family of acidic proteins present in the secretory granules of a wide variety of endocrine and neuro-endocrine cells. The exact function(s) of these proteins is not yet known, but they seem to be the precursors of biologically active peptides and/or they may act as helper proteins in the packaging of peptide hormones and neuropeptides. Apart from their subcellular location and the abundance of acidic residues (Asp and Glu), these proteins do not share many structural similarities. Only one short region, located in the C-terminal section, is conserved in all these proteins, such as:
  • Chromogranin A (CGA): a protein of about 420 residues; it is the precursor of the peptide pancreastatin, which strongly inhibits glucose-induced insulin release from the pancreas.
  • Secretogranin 1 (chromogranin B): a sulfated protein of about 600 residues.
  • Secretogranin 2 (chromogranin C): a sulfated protein of about 650 residues.
Chromogranins and secretogranins together share a C-terminal motif, whereas chromogranins A and B share a region of high similarity in their N-terminal section; this region includes two cysteine residues involved in a disulphide bond.
Protein Domain
Name: Type III secretion system filament chaperone CesA
Type: Family
Description: This family represents a chaperone for the type III secretion system (TTSS) translocon protein EspA; the chaperone prevents EspA from self-polymerising. The TTSS is a highly specialised bacterial protein secretory pathway, similar in many ways to the flagellar system, that is essential for the pathogenesis of many Gram-negative bacteria. The twenty or so proteins making up the TTSS apparatus, referred to as the needle complex, allow the injection of virulence proteins (known as effectors) directly into the cytoplasm of the eukaryotic host cells they infect. The injection process itself, however, is mediated by a subset of extracellular proteins (EspA, EspB and EspD) that are secreted by the needle complex to the bacterial surface and assembled into the type III translocon. EspA polymerises into an extracellular filament and, as with other fibrous proteins, is apt to undergo massive polymerisation when overexpressed. CesA is the secretion chaperone protein that binds to EspA. CesA is dimeric and helical; it traps EspA in a monomeric state and inhibits its polymerisation.
Protein Domain
Name: Regulator of G-protein signalling 3, RGS domain
Type: Domain
Description: RGS3 is a member of the R4 subfamily of the RGS family, a diverse group of multifunctional proteins that regulate cellular signalling events downstream of G-protein coupled receptors (GPCRs). Signalling is initiated when GPCRs bind to their ligands, triggering the replacement of GDP bound to the G-alpha subunits of heterotrimeric G proteins with GTP. RGSs inhibit signal transduction by increasing the GTPase activity of G protein alpha subunits, thereby driving them into their inactive GDP-bound form. This activity defines them as GTPase activating proteins (GAPs). Regulator of G protein signaling 3 (RGS3) contains a membrane-targeting C2 domain, one PDZ domain, and an RGS (Regulator of G-protein Signalling) domain. RGS3 has been shown to inhibit Galpha-q- and Galpha-i-mediated signaling by acting as a GTPase-activating protein. RGS3 exists as several splice isoforms. A short form, RGS3S, induced apoptosis when overexpressed, whereas PDZ-RGS3 has been linked to cell migration through interaction with Ephrin receptors. Outside of the GPCR pathways, RGS3 interacts with neuroligin, estrogen receptor-alpha, and 14-3-3. This entry represents the RGS domain of RGS3.
Protein Domain
Name: RGS10, RGS domain
Type: Domain
Description: The RGS (Regulator of G-protein Signaling) domain is an essential part of the RGS10 protein. RGS10 is a member of the RGS protein family, a diverse group of multifunctional proteins that regulate cellular signaling events downstream of G-protein coupled receptors (GPCRs). RGS10 is one of the smallest proteins of the RGS family; its structure is little more than the RGS domain. As major G-protein regulators, RGS domain-containing proteins are involved in many crucial cellular processes such as regulation of intracellular trafficking, glial differentiation, embryonic axis formation, skeletal and muscle development, and cell migration during early embryogenesis. RGS10 belongs to the R12 RGS subfamily, which includes RGS12 and RGS14, all of which are highly selective for G-alpha-i1 over G-alpha-q. RGS10 exists in 2 splice isoforms, RGS10A and RGS10B. Although the expression of RGS10 is ubiquitous, the highest levels are found in the brain and immune system. RGS10A is expressed in osteoclasts and is a key component in the RANKL signaling mechanism for osteoclast differentiation.
Protein Domain
Name: Netrin module, non-TIMP type
Type: Domain
Description: The netrin (NTR) module is a domain of about 130 residues found in the C-terminal parts of netrins, complement proteins C3, C4, and C5, secreted frizzled-related proteins, and type I procollagen C-proteinase enhancer proteins (PCOLCEs), as well as in the N-terminal parts of tissue inhibitors of metalloproteinases (TIMPs). The proteins harboring the NTR domain fulfill diverse biological roles ranging from axon guidance and regulation of Wnt signalling to the control of the activity of metalloproteinases. The NTR domain can be found associated with other domains such as CUB, WAP, Kazal, Kunitz, Ig-like, laminin N-terminal, laminin-type EGF or frizzled. The NTR domain is implicated in inhibition of zinc metalloproteinases of the metzincin family. The NTR module is a basic domain containing six conserved cysteines, which are likely to form internal disulphide bonds, and several conserved blocks of hydrophobic residues (including a YLLLG-like motif). The NTR module consists of a β-barrel with two terminal α-helices packed side by side against the face of the β-barrel. This entry includes most netrin modules, but excludes those found in TIMPs.
Protein Domain
Name: tRNA(Ile)-lysidine synthase, N-terminal
Type: Domain
Description: This entry represents the N-terminal domain of tRNA(Ile)-lysidine synthetase, which ligates lysine onto the cytidine present at position 34 of the AUA codon-specific tRNA(Ile) that contains the anticodon CAU, in an ATP-dependent manner. Cytidine is converted to lysidine, thus changing the amino acid specificity of the tRNA from methionine to isoleucine. The N-terminal region contains the highly conserved SGGXDS motif, predicted to be a PP-loop motif involved in ATP binding. The only examples in which the wobble position of a tRNA must discriminate between G and A of mRNA are AUA (Ile) vs. AUG (Met) and UGA (stop) vs. UGG (Trp). In all bacteria, the wobble position of the tRNA(Ile) recognizing AUA is lysidine, a lysine derivative of cytidine. This domain is found, apparently, in all bacteria in a single copy. Eukaryotic sequences appear to be organellar. The domain architecture of this protein is variable; some members, including characterised proteins of Escherichia coli and Bacillus subtilis known to be tRNA(Ile)-lysidine synthetase, include a conserved 50-residue domain that many other members lack. This protein belongs to the ATP-binding PP-loop family. It appears in the literature and protein databases as TilS, YacA, and putative cell cycle protein MesJ (a misnomer). The PP-loop motif appears to be a modified version of the P-loop of nucleotide binding domain that is involved in phosphate binding. It is named the PP-motif because it appears to be part of a previously uncharacterised ATP pyrophosphatase domain. ATP sulfurylases, E. coli NtrL, and B. subtilis OutB consist of this domain alone. In other proteins, the pyrophosphatase domain is associated with amidotransferase domains (type I or type II), a putative citrulline-aspartate ligase domain or a nitrilase/amidase domain.
The HUP domain class (after HIGH-signature proteins, UspA, and PP-ATPase) groups together PP-loop ATPases, the nucleotide-binding domains of class I aminoacyl-tRNA synthetases, UspA protein (USPA domains), photolyases, and electron transport flavoproteins (ETFP). The HUP domain is a distinct class of alpha/beta domain.
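The SGGXDS motif mentioned in the tRNA(Ile)-lysidine synthase entry above can be located in a protein sequence with a simple pattern match. The following is a minimal sketch in Python, assuming that X stands for any single amino acid; the function name find_pp_loop_motifs and the example sequence are hypothetical illustrations and are not taken from any database entry.

```python
import re

# SGGXDS, reading X as "any one residue" (an assumption about the notation).
# The lookahead lets overlapping occurrences be reported as well.
PP_LOOP_PATTERN = re.compile(r"(?=(SGG[A-Z]DS))")


def find_pp_loop_motifs(sequence):
    """Return (1-based start position, matched text) for each SGGXDS hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in PP_LOOP_PATTERN.finditer(seq)]


# Made-up sequence, for illustration only.
example = "MKLAVALSGGVDSSVLLHLL"
print(find_pp_loop_motifs(example))  # [(8, 'SGGVDS')]
```

A hit from such a scan only flags a candidate; a curated domain signature is a far stronger test of whether a protein actually belongs to the PP-loop family.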
Protein Domain
Name: Zinc finger, HIT-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the HIT-type zinc finger, which contains 7 conserved cysteines and one histidine that can potentially coordinate two zinc atoms. It is named after the protein that first defined the domain, the yeast HIT1 protein. The HIT-type zinc finger displays some sequence similarities to the MYND-type zinc finger.
The function of this domain is unknown, but it is mainly found in nuclear proteins involved in gene regulation and chromatin remodeling. This domain is also found in the thyroid receptor interacting protein 3 (TRIP-3), which specifically interacts with the ligand binding domain of the thyroid receptor.
Protein Domain
Name: ENTH/VHS
Type: Homologous_superfamily
Description: This superfamily represents domains with a multi-helical, α-α 2-layered structural fold as found in: the ENTH domain of Epsin; the VHS domain of Hrs, Tom1, and ADP-ribosylation factors; the RPR domain of PCF11 protein; and the N-terminal domain of phosphoinositide-binding clathrin adaptor. The epsin NH2-terminal homology (ENTH) domain is a membrane interacting module composed of a superhelix of α-helices. It is present at the NH2-terminus of proteins that often contain consensus sequences for binding to clathrin coat components and their accessory factors, and therefore function as endocytic adaptors. ENTH domain-containing proteins have additional roles in signalling and actin regulation and may have yet other actions in the nucleus. The ENTH domain is structurally similar to the VHS domain. The ENTH domain is approximately 150 amino acids long. The ENTH domain forms a compact globular structure, composed of eight α-helices connected by loops of varying length. Three helical hairpins that are stacked consecutively with a right-handed twist determine the general topology of the domain. This stacking gives the ENTH domain a rectangular appearance when viewed face on. The most highly conserved amino acids fall roughly into two classes: internal residues that are involved in packing and therefore are necessary for structural integrity, and solvent accessible residues that may be involved in protein-protein interactions. VHS domains are found at the N termini of select proteins involved in intracellular membrane trafficking. The domain consists of eight helices arranged in a superhelix. The surface of the domain has two main features: a basic patch on one side due to several conserved positively charged residues on helix 3 and a negatively charged ridge on the opposite side, formed by residues on helix 2.
Comparison of the two VHS domains and the ENTH domain reveals a conserved surface, composed of helices 2 and 4, that is utilised for protein-protein interactions. In addition, VHS domain-containing proteins are also often localized to membranes. It has therefore been suggested that the conserved positively charged surface of helix 3 in VHS and ENTH domains plays a role in membrane binding.
Protein Domain
Name: CRISPR-associated Cas3-type HD domain superfamily
Type: Homologous_superfamily
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents the HD domain, which is found in a number of Cas proteins that tend to be found near CRISPR repeats.
These domains can be found either separately or as the N-terminal region of Cas3, the helicase-containing CRISPR-associated protein. CRISPR loci appear to be mobile elements with a wide host range. The Cas3-type HD domain has nuclease activity against ssDNA and ssRNA.
Protein Domain
Name: Cse2 superfamily
Type: Homologous_superfamily
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This superfamily represents the Cse2 family of Cas proteins, a novel family of endoribonucleases that cleave single-stranded RNAs preferentially within U-rich regions, including Cse2 from Thermus thermophilus.
These proteins are found in the CRISPR/Cas subtype Ecoli regions of many bacteria (most of which are mesophiles), and not in Archaea. This family is also known as Cse2 Type I-E.
Protein Domain
Name: Apx/Shrm Domain 2
Type: Domain
Description: This entry represents the ASD2 motif of the Shroom proteins. The Shroom family is a small group of related proteins that are defined by sequence similarity and in most cases by some link to the actin cytoskeleton. The Shroom (Shrm) protein family is found only in animals. Proteins of this family are predicted to be utilised in multiple morphogenic and developmental processes across animal phyla to regulate cell shape or intracellular architecture in an actin and myosin-dependent manner. The Shrm family consists of:
  • Shrm1 (formerly Apx), first found in Xenopus. Human Shroom1 links a membrane bound protein to the actin cytoskeleton.
  • Shrm2 (formerly Apxl), a protein involved in the morphogenesis, maintenance, and/or function of vascular endothelial cells.
  • Shrm3 (formerly Shroom), a protein necessary for neural tube closure in vertebrate development, as deficiency in Shrm results in spina bifida. Shrm3 is also conserved in some invertebrates, as orthologues can be found in sea urchins.
  • Shrm4, a regulator of cytoskeletal architecture that may play an important role in vertebrate development. It is implicated in X-linked mental retardation in humans.
This protein family is based on the conservation of a specific arrangement of an N-terminal PDZ domain, a centrally positioned sequence motif termed ASD1 (Apx/Shrm Domain 1) and a C-terminal motif termed ASD2. Shrm2 and Shrm3 contain all three domains, while Shrm4 contains the PDZ and ASD2 domains, but lacks a discernible ASD1 element. To date, the ASD1 and ASD2 elements have only been found in Shrm-related proteins and do not appear in combination with other conserved domains. ASD1 is required for targeting actin, while ASD2 is capable of eliciting an actomyosin based constriction event. ASD2 is the most highly conserved sequence element shared by Shrm1, Shrm2, Shrm3, and Shrm4.
It possesses a well conserved series of leucine residues that exhibit spacing consistent with that of a leucine zipper motif.
Protein Domain
Name: Zinc finger, FCS-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The FCS-type zinc finger, named after the signature FCS residues associated with the third cysteine, is present in several polycomb group proteins. The FCS zinc finger is expected to adopt a Zn-ribbon like fold. Molecular data suggest that this domain can bind RNA in a non-sequence-specific manner. Proteins known to contain an FCS-type zinc finger include:
  • Drosophila polyhomeotic (ph) protein and vertebrate homologues. They are members of the Polycomb repressive complex 1 (PRC1), an inhibitory complex that acts by preventing chromatin remodeling and transcription.
  • Drosophila Sex comb on midleg (Scm) and vertebrate homologues. They are also members of the PRC1.
  • Drosophila Lethal(3) malignant brain tumor (l(3)mbt) protein and vertebrate homologues.
Protein Domain
Name: Apx/Shrm Domain 1
Type: Domain
Description: This entry represents the ASD1 (Apx/Shrm Domain 1) motif of the Shroom proteins. The Shroom family is a small group of related proteins that are defined by sequence similarity and in most cases by some link to the actin cytoskeleton. The Shroom (Shrm) protein family is found only in animals. Proteins of this family are predicted to be utilised in multiple morphogenic and developmental processes across animal phyla to regulate cell shape or intracellular architecture in an actin and myosin-dependent manner. The Shrm family consists of:
  • Shrm1 (formerly Apx), first found in Xenopus. Human Shroom1 links a membrane bound protein to the actin cytoskeleton.
  • Shrm2 (formerly Apxl), a protein involved in the morphogenesis, maintenance, and/or function of vascular endothelial cells.
  • Shrm3 (formerly Shroom), a protein necessary for neural tube closure in vertebrate development, as deficiency in Shrm results in spina bifida. Shrm3 is also conserved in some invertebrates, as orthologues can be found in sea urchins.
  • Shrm4, a regulator of cytoskeletal architecture that may play an important role in vertebrate development. It is implicated in X-linked mental retardation in humans.
This protein family is based on the conservation of a specific arrangement of an N-terminal PDZ domain, a centrally positioned sequence motif termed ASD1 (Apx/Shrm Domain 1) and a C-terminal motif termed ASD2. Shrm2 and Shrm3 contain all three domains, while Shrm4 contains the PDZ and ASD2 domains, but lacks a discernible ASD1 element. To date, the ASD1 and ASD2 elements have only been found in Shrm-related proteins and do not appear in combination with other conserved domains. ASD1 is required for targeting actin, while ASD2 is capable of eliciting an actomyosin based constriction event. ASD2 is the most highly conserved sequence element shared by Shrm1, Shrm2, Shrm3, and Shrm4.
It possesses a well conserved series of leucine residues that exhibit spacing consistent with that of a leucine zipper motif.
Protein Domain
Name: TonB-dependent receptor-like
Type: Family
Description: Proteins in this family are predominantly bacterial TonB-dependent transporters that have an N-terminal plug domain and a C-terminal β-barrel domain. The plug domain acts as the channel gate, blocking the pore until a ligand binds, at which point it undergoes conformational changes that open the channel. Members of this entry are multipass transmembrane proteins. In Escherichia coli the TonB protein interacts with outer membrane receptor proteins that carry out high-affinity binding and energy-dependent uptake of specific substrates into the periplasmic space. These substrates are either poorly permeable through the porin channels or are encountered at very low concentrations. In the absence of TonB, these receptors bind their substrates but do not carry out active transport. TonB-dependent regulatory systems consist of six components: a specialised outer membrane-localized TonB-dependent receptor (TonB-dependent transducer) that interacts with its energizing TonB-ExbBD protein complex, a cytoplasmic membrane-localized anti-sigma factor and an extracytoplasmic function (ECF)-subfamily sigma factor. The TonB complex senses signals from outside the bacterial cell and transmits them via two membranes into the cytoplasm, leading to transcriptional activation of target genes. Energy for active transport is again obtained through physical interaction with TonB. The proteins that are currently known or presumed to interact with TonB include BtuB, CirA, FatA, FcuT, FecA, FhuA, FhuE, FepA (which is important for iron uptake and allows the bacterium to extract iron from the environment), FptA, HemR, IrgA, IutA, PfeA, PupA and Tbp1. The TonB protein also interacts with some colicins. The family includes the vitamin B12 transporter BtuB from Escherichia coli, which translocates vitamin B12 (cyanocobalamin) across the outer membrane to the periplasmic space.
A conserved sequence of seven residues, the Ton box, faces the periplasm and interacts with the inner membrane TonB protein to energize this active transport cycle. The TonB-dependent receptor SusC from Bacteroides thetaiotaomicron, which transports starch oligosaccharides from the surface of the outer membrane to the periplasm for subsequent degradation, is also a member of this family.
Protein Domain
Name: CRISPR-associated protein, Csm2 Type III-A
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents Csm2 Type III-A, a family of CRISPR-associated proteins. These proteins are found adjacent to a characteristic short, palindromic repeat cluster termed CRISPR, a probable mobile DNA element.
This family is also known as TM1810/Csm2.
Protein Domain
Name: CRISPR-associated protein, Csd1-type
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents the Csd1 (CRISPR/Cas Subtype DVULG protein 1) family of Cas proteins, which tend to be found near CRISPR repeats of the DVULG subtype of CRISPR/Cas locus.
The species range for this subtype, so far, is exclusively bacterial and mesophilic, although CRISPR loci in general are particularly common among archaea and thermophilic bacteria.
Protein Domain
Name: CRISPR-associated protein, Csx14
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a family of Cas proteins found in a variety of bacterial and archaeal species, including Pelotomaculum thermopropionicum (strain DSM 13744/JCM 10971/SI), Thermoanaerobacter tengcongensis MB4, Roseiflexus sp. (strain RS-1), Thermoplasma volcanium, Picrophilus torridus, and Methanospirillum hungatei. The function of these proteins is unknown.
Protein Domain
Name: ApbE-like superfamily
Type: Homologous_superfamily
Description: This superfamily was previously known as ApbE. Its members were thought to be involved in thiamine biosynthesis, and are alternatively known as Mg2+-dependent flavin transferases that catalyse the covalent attachment of FMN to a threonine residue in bacterial flavoproteins. It was renamed Ftp (flavin-trafficking protein) in 2013 following the characterisation of TP0796, a lipoprotein from Treponema pallidum that is classified as a putative member of the ApbE superfamily. FAD pyrophosphatase catalyses the hydrolysis of FAD, forming AMP and FMN. To date, the Ftp (TP0796) of T. pallidum is the first bacterial FAD pyrophosphatase shown to have a strict requirement for Mg2+ for its catalytic activity. Other Ftp homologs (formerly known as ApbE proteins) are present in the genomes of numerous bacteria and in lower eukaryotes, such as Trypanosoma spp. (agents of sleeping sickness and Chagas disease) and Leishmania spp. (agents of leishmaniasis), but the eukaryotic homologs appear to be fused with a multidomain fumarate reductase. Other members of the ApbE superfamily are related to the periplasmic ApbE lipoprotein of Salmonella typhimurium. In S. typhimurium, ApbE has been shown to be involved in thiamine biosynthesis and may serve in the conversion of aminoimidazole ribotide to 4-amino-5-hydroxymethyl-2-methylpyrimidine. T. pallidum is predicted to lack the thiamine pathway as well as the enzymes involved in aminoimidazole ribotide metabolism. The periplasmic location of ApbE also raises the question of how it could participate in a cytoplasmic pathway. ApbE proteins are relatively understudied biochemically, and representative crystal structures (PDB entries and ) have failed to definitively elucidate their functions. Some studies have shown that some Ftp family proteins bind FAD and that the Ftp protein from Vibrio harveyi transfers the FMN portion of FAD to a subunit of the integral inner membrane Nqr redox pump. The crystal structure of Ftp from T. pallidum displays the highly conserved Ftp fold and the active site/FAD-binding site shared by all known Ftp-like proteins.
Protein Domain
Name: Helicase Cas3, CRISPR-associated, Anaes-subtype
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a subfamily of Cas3 DEAH-box helicases found in Actinomyces naeslundii MG1, Geobacter sulfurreducens PCA, Gemmata obscuriglobus UQM 2246, and Desulfotalea psychrophila. The proteins include both DEAH and HD motifs. Cas3 is one of four protein families (Cas1 to Cas4) that are associated with CRISPR elements and always occur near a repeat cluster, usually in the order cas3-cas4-cas1-cas2.
Protein Domain
Name: CRISPR system subtype II-B RNA-guided endonuclease Cas9/Csx12
Type: Family
Description: This entry represents a family of large Cas proteins that have so far only been found in CRISPR/Cas loci in Francisella tularensis, Wolinella succinogenes DSM 1740, and Legionella pneumophila (strain Paris). One region of this large protein shows sequence similarity to HNH endonuclease. The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability.
Protein Domain
Name: CRISPR-associated endoribonuclease Cas6/Csy4, subtype I-F/YPEST
Type: Family
Description: This protein family, typified by YPO2462 of Yersinia pestis, is a CRISPR-associated (Cas) family strictly associated with the Ypest subtype of CRISPR/Cas locus. It is designated Csy4, for CRISPR/Cas Subtype Ypest protein 4. In Pseudomonas aeruginosa, crRNA biogenesis requires the endoribonuclease Csy4, which binds and cleaves the repetitive sequence of the CRISPR transcript. The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability.
Protein Domain
Name: AF1862-like domain superfamily
Type: Homologous_superfamily
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This superfamily represents the structural domain found in a family of Cas proteins as represented by TM1791.1 from Thermotoga maritima and AF1862 from Archaeoglobus fulgidus. This family of Cas proteins is found in both archaeal and bacterial species.
Protein Domain
Name: CRISPR-associated Cas3-type HD domain
Type: Domain
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents the HD domain, which is found in a number of Cas proteins that tend to be found near CRISPR repeats. These domains can be found either separately or as the N-terminal region of Cas3, the helicase-containing CRISPR-associated protein. CRISPR loci appear to be mobile elements with a wide host range. The Cas3-type HD domain has nuclease activity against ssDNA and ssRNA.
Protein Domain
Name: 5'-3' exonuclease, alpha-helical arch, N-terminal
Type: Domain
Description: The N-terminal and internal 5'-3' exonuclease (53EXO) domains are commonly found together, and are most often associated with 5' to 3' nuclease activities. The XPG protein signatures are never found outside the 53EXO domains, whereas the 53EXO domains themselves are found in more diverse proteins. The number of amino acids that separate the two 53EXO domains, and the presence of accompanying motifs, allow the diagnosis of several protein families. In the eubacterial type A DNA polymerases, the N-terminal and internal domains are separated by a few amino acids, usually four. The pattern DNA_POLYMERASE_A is always present towards the C terminus. Several eukaryotic structure-dependent endonucleases and exonucleases have the 53EXO domains separated by 24 to 27 amino acids, and the XPG protein signatures are always present. In several proteins from Herpesviridae, the two 53EXO domains are separated by 50 to 120 amino acids. These proteins are implicated in the inhibition of the expression of the host genes. Eukaryotic DNA repair proteins with 600 to 700 amino acids between the 53EXO domains all carry the XPG protein signatures. This entry represents the N-terminal resolvase-like domain, which has a 3-layer alpha/beta/alpha core structure and contains an α-helical arch.
Protein Domain
Name: Cullin, conserved site
Type: Conserved_site
Description: Cullins are a family of hydrophobic proteins that act as scaffolds for ubiquitin ligases (E3). Cullins are found throughout eukaryotes. Humans express seven cullins (Cul1, 2, 3, 4A, 4B, 5 and 7), each forming part of a multi-subunit ubiquitin ligase complex. Cullin-RING ubiquitin ligases (CRLs), such as Cul1 (SCF), play an essential role in targeting proteins for ubiquitin-mediated destruction; as such, they are diverse in terms of composition and function, regulating many different processes from glucose sensing and DNA replication to limb patterning and circadian rhythms. The catalytic core of CRLs consists of a RING protein and a cullin family member. For Cul1, the C-terminal cullin-homology domain binds the RING protein. The RING protein appears to function as a docking site for ubiquitin-conjugating enzymes (E2s). Other proteins contain a cullin-homology domain, such as the APC2 subunit of the anaphase-promoting complex/cyclosome and the p53 cytoplasmic anchor PARC; both APC2 and PARC have ubiquitin ligase activity. The N-terminal region of cullins is more variable, and is used to interact with specific adaptor proteins. This entry represents a conserved site found in various cullin proteins.
Protein Domain
Name: Lanthionine synthetase C-like
Type: Family
Description: The LanC-like protein superfamily encompasses a highly divergent group of peptide-modifying enzymes, including the eukaryotic and bacterial lanthionine synthetase C-like proteins (LanC); subtilin biosynthesis protein SpaC from Bacillus subtilis; epidermin biosynthesis protein EpiC from Staphylococcus epidermidis; nisin biosynthesis protein NisC from Lactococcus lactis; GCR2 from Arabidopsis thaliana; and many others. The 3D structure of the lantibiotic cyclase from L. lactis has been determined by X-ray crystallography to 2.5 Å resolution. The globular structure is characterised by an all-α fold, in which an outer ring of helices envelops an inner toroid composed of 7 shorter, hydrophobic helices. This 7-fold hydrophobic periodicity has led several authors to claim various members of the family, including eukaryotic LanC-1 and GCR2, to be novel G protein-coupled receptors; some of these claims have since been corrected. The C terminus of the lantibiotic biosynthesis protein LanM is homologous to LanC. LanC-like protein 1 (included in this family) has been shown to function as a glutathione transferase, and as such has been renamed Glutathione S-transferase LANCL1.
Protein Domain
Name: MoaB/Mog domain
Type: Domain
Description: The MoaB/Mog domain, also known as the Cnx1G domain, is found in the bacterial molybdenum cofactor (Moco) biosynthesis proteins MoaB and Mog and at the N terminus of eukaryotic Moco biosynthesis proteins, such as the Drosophila protein cinnamon, the Arabidopsis protein Cnx1 and the mammalian protein gephyrin. These proteins are involved in the final steps of Moco synthesis. In E. coli, two proteins, MogA and MoeA, are essential for Mo insertion into molybdopterin, while in plants and animals one fusion protein with two domains (G and E domains) fulfils this function. The G domain of Cnx1 and gephyrin shares similarity with MoaB/Mog, while their E domain displays similarities to the sulfurtransferase rhodanese (homologous to E. coli MoeA/chlE). Structurally, MogA is folded into a compact molecule with alpha/beta/alpha architecture and forms a trimer. The Cnx1G domain has been shown to bind molybdopterin. This domain is also found at the N terminus of FAD synthases belonging to a family that catalyses the adenylation of flavin mononucleotide (FMN) to form the coenzyme flavin adenine dinucleotide (FAD). CinA from Thermus thermophilus has been shown to have both nicotinamide mononucleotide deamidase and ADP-ribose pyrophosphatase activities, with the ADP-ribose pyrophosphatase activity attributed to the N-terminal domain.
Protein Domain
Name: RNase P subunit Pop5/Rpp14/Rnp2-like
Type: Family
Description: This entry contains ribonuclease P (Rnp) proteins from eukaryotes and archaea. Rnp is a ubiquitous ribozyme that catalyzes a Mg2+-dependent hydrolysis to remove the 5'-leader sequence of precursor tRNA (pre-tRNA). Archaeal and eukaryotic RNase P each contain a single RNA; archaeal RNase P has four or five proteins, while eukaryotic RNase P has 9 or 10 proteins. Eukaryotic and archaeal RNase P RNAs function cooperatively with protein subunits in catalysis. Human RNase P is composed of the single protein Pop1 and three subcomplexes: the Rpp20-Rpp25 heterodimer, the Pop5-Rpp14-(Rpp30)2-Rpp40 heteropentamer, and the Rpp21-Rpp29-Rpp38 heterotrimer. Although Pop5 and Rpp14 have similar protein structures, they share very limited sequence similarity. Moreover, the C-terminal fragments after the conserved beta sheets in Pop5 and Rpp14 exhibit distinct structural features that mediate interactions with Pop1 and Rpp40, respectively. In the hyperthermophilic archaeon Pyrococcus horikoshii OT3, RNase P is composed of the RNase P RNA (pRNA) and five proteins (PhoPop5, PhoRpp38, PhoRpp21, PhoRpp29, and PhoRpp30). Proteins in this entry include Rnp2 (also known as Pop5) from archaea and Pop5/Rpp14 from humans.
Protein Domain
Name: Deglycase PfpI
Type: Family
Description: PfpI from Pyrococcus furiosus functions as a protein deglycase that repairs methylglyoxal- and glyoxal-glycated proteins. PfpI-like proteins have been found in bacteria, archaea, some plants and amoebae. The structure of Pyrococcus horikoshii PfpI shows an alpha/beta sandwich fold, which consists of a central beta sheet flanked by two helices and two strands on one side, and by six helices and two strands on the other side. The fold resembles that of glutamine amidotransferases (GATase) of class I, which are characterised by a conserved Cys-His-Glu active site. The catalytically essential cysteine is located at a nucleophile elbow. PfpI forms a hexameric structure and the active sites are formed at the interfaces between three pairs of monomers. Cys and His form a triad with a glutamate residue from an adjacent monomer. Other proteins in this group include Bacillus subtilis general stress protein 18 (GSP18/yfkM) and E. coli protein yhbO, which is involved in protection against environmental stresses, has deglycase activity, and can repair glyoxal- and methylglyoxal-glycated proteins and nucleic acids. This group of proteins forms part of the MEROPS peptidase family C56 (PfpI endopeptidase, clan PC(C)).
Protein Domain
Name: Divalent ion tolerance protein, CutA
Type: Family
Description: The CutA family of proteins, which are associated with divalent ion tolerance, is found in a large variety of species. In E. coli, two operons at the cutA locus contain genes that encode three proteins, CutA1, CutA2 and CutA3. CutA1 proteins are found in the cytoplasm, while CutA2 (50 kDa) and CutA3 (24 kDa) are located in the inner membrane. Although the role of E. coli CutA1 is not clear, studies on the E. coli cutA locus describe some mutations that lead to an increase in copper sensitivity, thus suggesting a role in ion tolerance. To date, the structures of CutA proteins from several species have been solved. The crystal structures of the E. coli and rat CutA1 proteins show both these proteins to be trimeric in the crystal as well as in solution. Trimerisation seems to be supported by the formation of beta sheets between the subunits. This trimeric structure suggests the protein may be involved in signal transduction due to architectural similarities with PII signal transducer proteins. Recent studies propose that mammalian CutA1 in the neuronal cell membrane acts as an anchor for acetylcholinesterase (AChE).
Protein Domain
Name: Lepidopteran low molecular weight lipoprotein
Type: Family
Description: This family includes Lepidopteran low molecular weight (30 kDa) lipoproteins (30KPs), which belong to the Lipoprotein 11 family. They are abundant proteins found in the hemolymph, mainly synthesized in the fat body and then secreted into the hemolymph during the last instar larval stage. They have very similar amino acid and nucleotide sequences, which are also similar to those of microvitellogenin in Manduca sexta. Structure determination of some members revealed that they belong to the β-sheet superfamily and share a characteristic fold, with differences in loop conformations thought to confer ligand specificity. Studies in Bombyx mori showed that these proteins function as storage proteins during growth and development but are used only during embryonic development, when they are thought to act as storage units of amino acids for use in de novo synthesis of other proteins. They are also involved in energy transport and metabolic processes such as release of diacylglycerol and the transport of steroids and hormones. 30KPs may be involved in defense response mechanisms as they can be conjugated to glucose, dextran, maltose and glycoproteins. These proteins may also inhibit apoptosis, thereby having a role in cell survival.
Protein Domain
Name: Gamma-synuclein
Type: Family
Description: Synucleins are small, soluble proteins expressed primarily in neural tissue and in certain tumors. The family includes three known proteins: alpha-synuclein, beta-synuclein, and gamma-synuclein. All synucleins have in common a highly conserved α-helical lipid-binding motif with similarity to the class-A2 lipid-binding domains of the exchangeable apolipoproteins. Synuclein family members are not found outside vertebrates, although they have some conserved structural similarity with plant 'late-embryo-abundant' proteins. The alpha- and beta-synuclein proteins are found primarily in brain tissue, where they are seen mainly in presynaptic terminals. The gamma-synuclein protein is found primarily in the peripheral nervous system and retina, but its expression in breast tumors is a marker for tumor progression. Normal cellular functions have not been determined for any of the synuclein proteins, although some data suggest a role in the regulation of membrane stability and/or turnover. Mutations in alpha-synuclein are associated with rare familial cases of early-onset Parkinson's disease, and the protein accumulates abnormally in Parkinson's disease, Alzheimer's disease, and several other neurodegenerative illnesses. As with other synucleins, gamma-synuclein, or persyn, is believed to be involved in the pathogenesis of human neurodegenerative diseases. Persyn influences neurofilament network integrity.
Protein Domain
Name: Ion-transport regulator, FXYD family
Type: Family
Description: The FXYD protein family contains at least seven members in mammals. Two other family members that are not obvious orthologs of any identified mammalian FXYD protein exist in zebrafish. All these proteins share a signature sequence of six conserved amino acids comprising the FXYD motif in the NH2-terminus, and two glycines and one serine residue in the transmembrane domain. FXYD proteins are widely distributed in mammalian tissues, with prominent expression in tissues that perform fluid and solute transport or that are electrically excitable. Initial functional characterisation suggested that FXYD proteins act as channels or as modulators of ion channels; however, studies have revealed that most FXYD proteins have another specific function and act as tissue-specific regulatory subunits of the Na,K-ATPase. Each of these auxiliary subunits produces a distinct functional effect on the transport characteristics of the Na,K-ATPase that is adjusted to the specific functional demands of the tissue in which the FXYD protein is expressed. FXYD proteins appear to preferentially associate with Na,K-ATPase alpha1-beta isozymes, and affect their function in a way that renders them operationally complementary or supplementary to coexisting isozymes.
Protein Domain
Name: PPE family, C-terminal
Type: Domain
Description: The human pathogen Mycobacterium tuberculosis harbours a large number of genes that encode proteins whose N-termini contain the characteristic motifs Pro-Glu (PE) or Pro-Pro-Glu (PPE). A subgroup of the PE proteins contains polymorphic GC-rich sequences (PGRS), while a subgroup of the PPE proteins contains major polymorphic tandem repeats (MPTR). The function of most of these proteins remains unknown. However, the PE_PGRS proteins from Mycobacterium marinum are secreted by components of the ESX-5 system that belongs to the recently defined type VII secretion systems. It has also been reported that the PE_PGRS family of proteins contains multiple calcium-binding and glycine-rich sequence motifs GGXGXD/NXUX. This sequence repeat constitutes a calcium-binding parallel β-roll or parallel β-helix structure and is found in RTX toxins secreted by many Gram-negative bacteria. This domain family is found in bacteria, and is approximately 90 amino acids in length. The family is found in association with . There is a conserved SVP sequence motif. There is a single completely conserved residue W that may be functionally important. The proteins in this family are PE/PPE proteins implicated in immunostimulation and virulence.
Protein Domain
Name: Regulator of G-protein signaling 14, Ras-binding domain 1
Type: Domain
Description: RGS14 is a regulator of G protein signaling (RGS) protein and contains an N-terminal RGS domain, two tandem Ras-binding domains (RBDs) and a G protein regulatory (GPR, also referred to as a GoLoco) motif. It regulates G protein nucleotide exchange and hydrolysis by acting as a GTPase-activating protein (GAP) through its RGS domain, and as a guanine nucleotide dissociation inhibitor (GDI) through its GoLoco motif. Both domains of RGS14 target members of the Gialpha subclass. It is a microtubule-associated protein that may modulate microtubule dynamics and spindle formation and play an essential role during mammalian cell division. RGS14 regulates the activation of alphaMbeta2 integrin during phagocytosis. It is a key regulator of signalling pathways linking synaptic plasticity in CA2 pyramidal neurons to hippocampal-based learning and memory. RGS14 binds activated H-Ras-GTP through its first RBD and interacts with Rap2-GTP and RAF kinases through the second tandem RBD. This entry represents the first RBD. The RBD is structurally similar to the β-grasp fold of ubiquitin, a common structure involved in protein-protein interactions. RGS14 modulates neuronal physiology, and all of its binding partners have roles in synaptic plasticity.
Protein Domain
Name: Frag1/DRAM/Sfk1
Type: Family
Description: This entry includes Frag1, DRAM and Sfk1 proteins. Frag1 (FGF receptor activating protein 1, also known as post-GPI attachment to proteins factor 2) is a protein that is conserved from fungi to humans. There are four potential iso-prenylation sites throughout the peptide, CILW (x2), CIIW and CIGL. Frag1 is a membrane-spanning protein that is ubiquitously expressed in adult tissues, suggesting an important cellular function. DRAM is a family of proteins conserved from nematodes to humans with six hydrophobic transmembrane regions and an endoplasmic reticulum signal peptide. It is a lysosomal protein that induces macroautophagy as an effector of p53-mediated death, where p53 is the tumour-suppressor gene that is frequently mutated in cancer. Expression of DRAM is stress-induced. DRAM-3 (also known as modulator of macroautophagy TMEM150B) has been shown to be a modulator of both macroautophagy and cell survival under starvation conditions. This region is also part of a family of small plasma membrane proteins, referred to as Sfk1, that may act together with or upstream of Stt4p to generate normal levels of the essential phospholipid PI4P, thus allowing proper localisation of Stt4p to the actin cytoskeleton.
Protein Domain
Name: SNX17/27/31
Type: Family
Description: SNX17/27/31 are known as PX-FERM proteins, a sub-group of the PX superfamily that contain a defining PtdInsP-binding PX domain and a C-terminal 4.1, ezrin, radixin, moesin (FERM) domain with an atypical tertiary structure. They are modular peripheral membrane proteins that act as central scaffolds mediating protein-lipid interactions, cargo binding, and regulatory protein recruitment. SNX17 is a critical regulator of endosomal recycling of numerous receptors, channels, and other transmembrane proteins. It binds to Asn-Pro-Xaa-Tyr (NPxY) sequences in the cytoplasmic tails of cargo such as LDL receptors and the amyloid precursor protein. SNX27 is a core component of the SNX27-retromer, a multiprotein complex composed of SNX27, the WASH complex and the retromer complex. It specifically binds and directs sorting of a subset of transmembrane proteins containing a PDZ-binding motif at the C terminus. Its PDZ domain mediates binding to the retromer complex via direct interaction with VPS26 (VPS26A or VPS26B), while its PX domain mediates binding to phosphatidylinositol 3-phosphate (PtdIns3P) and localization to early endosome membranes. SNX31 is a sorting nexin that binds to PtdIns3P-enriched lipids via its N-terminal PX domain. It also binds to uroplakins.
Protein Domain
Name: PAT complex subunit CCDC47
Type: Family
Description: This family represents CCDC47 proteins, which are a component of the PAT complex, an endoplasmic reticulum (ER)-resident membrane multiprotein complex that facilitates the insertion of multi-pass membrane proteins into membranes. The PAT complex, formed by CCDC47 and Asterix proteins, acts as an intramembrane chaperone by directly interacting with nascent transmembrane domains (TMDs), releasing its substrates upon correct folding, and is needed for optimal biogenesis of multi-pass membrane proteins. WDR83OS/Asterix is the substrate-interacting subunit of the PAT complex, whereas CCDC47 is required to maintain the stability of WDR83OS/Asterix. The PAT complex favors binding to TMDs with exposed hydrophilic amino acids within the lipid bilayer and provides a membrane-embedded, partially hydrophilic environment in which TMD1 binds. CCDC47 is associated with various membrane-associated processes and is a component of a ribosome-associated ER translocon complex involved in multi-pass membrane protein transport into the ER membrane and biogenesis. It is also involved in the regulation of calcium ion homeostasis in the ER, and is required for proper protein degradation via the ERAD (ER-associated degradation) pathway. This entry also includes the uncharacterised proteins YNR021W from S. cerevisiae, C2G5.01 from S. pombe and At5g49945 from Arabidopsis.
Protein Domain
Name: START-like domain
Type: Domain
Description: This entry represents the START-like domain, which is distantly related to the START domain. START (StAR-related lipid-transfer) is a lipid-binding domain in StAR, HD-ZIP and signalling proteins. StAR (Steroidogenic Acute Regulatory protein) is a mitochondrial protein that is synthesised in response to luteinising hormone stimulation. Expression of the protein in the absence of hormone stimulation is sufficient to induce steroid production, suggesting that this protein is required in the acute regulation of steroidogenesis. Representatives of the START domain family have been shown to bind different ligands such as sterols (StAR protein) and phosphatidylcholine (PC-TP). Ligand binding by the START domain can also regulate the activities of other domains that co-occur with the START domain in multidomain proteins, such as Rho-gap, the homeodomain, and the thioesterase domain. The crystal structure of the START domain of human MLN64 shows an alpha/beta fold built around a U-shaped incomplete β-barrel. Most importantly, the interior of the protein encompasses a 26 x 12 x 11 Angstrom hydrophobic tunnel that is apparently large enough to bind a single cholesterol molecule. The START domain structure revealed an unexpected similarity to that of the birch pollen allergen Bet v 1 and to bacterial polyketide cyclases/aromatases.
Protein Domain
Name: Cyclophilin-type peptidyl-prolyl cis-trans isomerase, cyclophilin A-like
Type: Family
Description: Cyclophilins exhibit peptidyl-prolyl cis-trans isomerase (PPIase) activity, accelerating protein folding by catalysing the cis-trans isomerisation of proline imidic peptide bonds in oligopeptides. They also have protein chaperone-like functions and are the major high-affinity binding proteins for the immunosuppressive drug cyclosporin A (CsA) in vertebrates. Cyclophilins are found in all prokaryotes and eukaryotes, and have been structurally conserved throughout evolution, implying their importance in cellular function. They share a common 109 amino acid cyclophilin-like domain (CLD) and additional domains unique to each member of the family. The CLD contains the PPIase activity, while the unique domains are important for selection of protein substrates and subcellular compartmentalisation. This entry includes eukaryotic, bacterial and archaeal proteins which exhibit peptidyl-prolyl cis-trans isomerase (PPIase, rotamase) activity and in addition bind the immunosuppressive drug cyclosporin A (CsA). Immunosuppression in vertebrates is believed to be the result of the cyclophilin A-cyclosporin protein-drug complex binding to and inhibiting the protein phosphatase calcineurin. This entry also includes proteins that do not have peptidyl-prolyl cis-trans isomerase activity, such as CWC27 from humans.
Protein Domain
Name: DnaJ homolog subfamily A member 1/2-like
Type: Family
Description: The DnaJ proteins, also known as heat shock protein 40 (Hsp40 or Hsc40), are proteins originally identified in Escherichia coli that act as cochaperones to the molecular chaperone DnaK (Hsp70), which is responsible for several cellular processes such as rescuing misfolded proteins, folding polypeptide chains, transport of polypeptides through membranes, assembly and disassembly of protein complexes, and control of regulatory proteins. Structurally, the DnaJ protein consists of an N-terminal conserved domain (called 'J' domain) of about 70 amino acids, a glycine-rich region ('G' domain) of about 30 residues, a central domain containing four repeats of a CXXCXGXG motif ('CRR' domain) and a C-terminal region of 120 to 170 residues. This entry represents a group of DnaJ domain containing proteins, including DNJA1/2/4 from humans, Scj1/Mas5/Xdj1/Apj1 from budding yeasts, and DNAJ2/3/19 from Arabidopsis. In humans, DNAJA1, DNAJA2 and HSC70 have been shown to play key roles in both the folding and degradation of wild-type and mutant CFTR (cystic fibrosis transmembrane conductance regulator). They also function as co-chaperones for HSPA1B and negatively regulate the translocation of BAX from the cytosol to mitochondria in response to cellular stress, thereby protecting cells against apoptosis.
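The "four repeats of a CXXCXGXG motif" mentioned above can be checked on a sequence with a simple pattern scan. The following is a minimal sketch, assuming that each X stands for any residue; the function name and the toy sequence are hypothetical and are not taken from any database entry.

```python
import re

# Assumption: every X in CXXCXGXG means "any residue".
# The lookahead reports overlapping candidates as well, should they occur.
CRR_REPEAT = re.compile(r"(?=(C..C.G.G))")

def find_crr_repeats(seq):
    """Return (start, matched_text) for every CXXCXGXG occurrence (0-based start)."""
    return [(m.start(), m.group(1)) for m in CRR_REPEAT.finditer(seq.upper())]

# Made-up sequence for illustration only, not a real DnaJ protein.
toy = "MKCDTCHGSGAAACPECNGTGLLCSKCRGQGPPCTVCGGKGE"
hits = find_crr_repeats(toy)
print(len(hits), hits)
```

A real 'CRR' domain would be judged on more than a pattern count (for example, spacing between repeats), so this only illustrates how the motif notation reads.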
Protein Domain
Name: Cytochrome P450
Type: Family
Description: Cytochrome P450 enzymes are a superfamily of haem-containing mono-oxygenases that are found in all kingdoms of life, and which show extraordinary diversity in their reaction chemistry. In mammals, these proteins are found primarily in microsomes of hepatocytes and other cell types, where they oxidise steroids, fatty acids and xenobiotics, and are important for the detoxification and clearance of various compounds, as well as for hormone synthesis and breakdown, cholesterol synthesis and vitamin D metabolism. In plants, these proteins are important for the biosynthesis of several compounds such as hormones, defensive compounds and fatty acids. In bacteria, they are important for several metabolic processes, such as the biosynthesis of the antibiotic erythromycin in Saccharopolyspora erythraea (Streptomyces erythraeus). Cytochrome P450 enzymes use haem to oxidise their substrates, using protons derived from NADH or NADPH to split the oxygen so a single atom can be added to a substrate. They also require electrons, which they receive from a variety of redox partners. In certain cases, cytochrome P450 can be fused to its redox partner to produce a bi-functional protein, such as with P450BM-3 from Bacillus megaterium, which has haem and flavin domains. Organisms produce many different cytochrome P450 enzymes (at least 58 in humans), which together with alternative splicing can provide a wide array of enzymes with different substrate and tissue specificities. Individual cytochrome P450 proteins follow the nomenclature: CYP, followed by a number (family), then a letter (subfamily), and another number (protein); e.g. CYP3A4 is the fourth protein in family 3, subfamily A. In general, family members should share >40% identity, while subfamily members should share >55% identity. Cytochrome P450 proteins can also be grouped by two different schemes. One scheme was based on a taxonomic split: class I (prokaryotic/mitochondrial) and class II (eukaryotic microsomes).
The other scheme was based on the number of components in the system: class B (3-components) and class E (2-components). These classes merge to a certain degree. Most prokaryotes and mitochondria (and fungal CYP55) have 3-component systems (class I/class B) - a FAD-containing flavoprotein (NAD(P)H-dependent reductase), an iron-sulphur protein and P450. Most eukaryotic microsomes have 2-component systems (class II/class E) - NADPH:P450 reductase (FAD and FMN-containing flavoprotein) and P450. There are exceptions to this scheme, such as 1-component systems that resemble class E enzymes. The class E enzymes can be further subdivided into five sequence clusters, groups I-V, each of which may contain more than one cytochrome P450 family (e.g. CYP1 and CYP2 are both found in group I). The divergence of the cytochrome P450 superfamily into B- and E-classes, and further divergence into stable clusters within the E-class, appears to be very ancient, occurring before the appearance of eukaryotes. This family also includes germacrene A hydroxylase (GAO1) from plants such as lettuce (Lactuca sativa). GAO1 is required for the biosynthesis of germacrene-derived sesquiterpene lactones, which are characteristic natural products in members of the Asteraceae.
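The CYP naming rule and the identity guidelines described in this entry can be illustrated with a short parser. This is a minimal sketch: the names `CypName`, `parse_cyp` and `expected_grouping` are hypothetical, the pattern accepts one or more subfamily letters (an assumption that goes slightly beyond the single letter described above), and the 40% and 55% thresholds are the general guidelines quoted in the description rather than strict rules.

```python
import re
from typing import NamedTuple, Optional

class CypName(NamedTuple):
    family: int      # number after "CYP"
    subfamily: str   # letter(s) after the family number
    protein: int     # final number

# "CYP" + family number + subfamily letter(s) + protein number
CYP_PATTERN = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)$")

def parse_cyp(name: str) -> Optional[CypName]:
    """Split a name such as CYP3A4 into its parts, or return None if it does not fit."""
    m = CYP_PATTERN.match(name.strip().upper())
    if m is None:
        return None
    return CypName(int(m.group(1)), m.group(2), int(m.group(3)))

def expected_grouping(percent_identity: float) -> str:
    """Apply the rough >40% (family) and >55% (subfamily) identity guidelines."""
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family"
    return "different families"

print(parse_cyp("CYP3A4"))
print(expected_grouping(48.0))
```

A family-level name such as CYP55 has no subfamily or protein number, so `parse_cyp` returns `None` for it in this sketch.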
Protein Domain
Name: CELF1/2, RNA recognition motif 1
Type: Domain
Description: The human CELF family has six members, which can be divided into two subfamilies based on their phylogeny: CELF1-2 and CELF3-6. This entry represents the RNA recognition motif 1 (RRM1) of the CELF-1 and CELF-2 proteins. CELF-1 and CELF-2 belong to the CELF (CUGBP and ETR-3 Like Factor)/Bruno-like protein family, whose members play important roles in the regulation of alternative splicing and translation. CELF-1 and CELF-2 share sequence similarity with the Drosophila Bruno protein and bind to the Bruno response elements (cis-acting sequences in the 3'-untranslated region (UTR) of oskar mRNA). The human CELF-1 (also known as CUG-BP or BRUNOL-2) binds to RNA substrates and recruits PARN deadenylase. It preferentially targets UGU-rich mRNA elements. CELF-1 has been implicated in the onset of type 1 myotonic dystrophy (DM1), a neuromuscular disease associated with an unstable CUG triplet expansion in the 3'-UTR of the DMPK (myotonic dystrophy protein kinase) gene. CELF-1 contains three highly conserved RNA recognition motifs (RRMs): two consecutive RRMs (RRM1 and RRM2) situated in the N-terminal region, followed by a linker region and the third RRM (RRM3) close to the C terminus of the protein. The Xenopus homologue of CELF-1 is EDEN-BP (embryo deadenylation element-binding protein), which mediates sequence-specific deadenylation of Eg5 mRNA. It binds specifically to the EDEN motif in the 3'-untranslated regions of maternal mRNAs and targets these mRNAs for deadenylation and translational repression. The two N-terminal RRMs of EDEN-BP are necessary for the interaction with EDEN, as is a part of the linker region (between RRM2 and RRM3). Oligomerization of EDEN-BP is required for specific mRNA deadenylation and binding. CELF-2 (also known as CUGBP2 or ETR-3) shares high sequence identity with CELF-1, but shows different binding specificity; it binds preferentially to sequences with UG repeats and UGUU motifs.
It also binds to the 3'-UTR of cyclooxygenase-2 messages, affecting both translation and mRNA stability, and binds to apoB mRNA, regulating its C to U editing. CELF-2 also contains three highly conserved RRMs. It binds to RNA via the first two RRMs, which are also important for localization in the cytoplasm. The splicing activation or repression activity of CELF-2 on some specific substrates is mediated by RRM1/RRM2. Both RRM1 and RRM2 of CELF-2 can activate cardiac troponin T (cTNT) exon 5 inclusion. In addition, CELF-2 possesses a typical arginine and lysine-rich nuclear localization signal (NLS) in the C terminus, within RRM3. Proteins containing this motif also include Drosophila melanogaster Bruno protein, which plays a central role in regulation of Oskar (Osk) expression in flies. It mediates repression by binding to regulatory Bruno response elements (BREs) in the Osk mRNA 3' UTR. The full-length Bruno protein contains three RRMs, two located in the N-terminal half of the protein and the third near the C terminus, separated by a linker region.
Protein Domain
Name: STAT transcription factor, all-alpha domain
Type: Domain
Description: The STAT protein (Signal Transducers and Activators of Transcription) family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors, hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signalling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation. Signalling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at the specific receptor tyrosine residues, which then serve as docking sites for STATs and other signalling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs, on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerise, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers. The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids.
The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggest that this family arose from a single primordial gene. STATs share six structurally and functionally conserved domains including: an N-terminal domain (ND) that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain (CCD) that is implicated in protein-protein interactions; a DNA-binding domain (DBD) with an immunoglobulin-like fold similar to p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain that acts as a phosphorylation-dependent switch to control receptor recognition and DNA-binding; and a C-terminal transactivation domain. The crystal structure of the N terminus of Stat4 reveals a dimer. The interface of this dimer is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerisation promotes cooperativity of binding to tandem GAS elements and with the transcriptional coactivator CBP/p300. This entry represents the all-alpha helical domain, which consists of four long helices arranged in a bundle with a left-handed twist (coiled-coil), which in turn forms a right-handed superhelix.
Protein Domain
Name: STAT transcription factor, coiled coil
Type: Homologous_superfamily
Description: The STAT protein (Signal Transducers and Activators of Transcription) family contains transcription factors that are specifically activated to regulate gene transcription when cells encounter cytokines and growth factors, hence they act as signal transducers in the cytoplasm and transcription activators in the nucleus. Binding of these factors to cell-surface receptors leads to receptor autophosphorylation at a tyrosine, the phosphotyrosine being recognised by the STAT SH2 domain, which mediates the recruitment of STAT proteins from the cytosol and their association with the activated receptor. The STAT proteins are then activated by phosphorylation via members of the JAK family of protein kinases, causing them to dimerise and translocate to the nucleus, where they bind to specific promoter sequences in target genes. In mammals, STATs comprise a family of seven structurally and functionally related proteins: Stat1, Stat2, Stat3, Stat4, Stat5a, Stat5b and Stat6. STAT proteins play a critical role in regulating innate and acquired host immune responses. Dysregulation of at least two STAT signalling cascades (i.e. Stat3 and Stat5) is associated with cellular transformation. Signalling through the JAK/STAT pathway is initiated when a cytokine binds to its corresponding receptor. This leads to conformational changes in the cytoplasmic portion of the receptor, initiating activation of receptor-associated members of the JAK family of kinases. The JAKs, in turn, mediate phosphorylation at the specific receptor tyrosine residues, which then serve as docking sites for STATs and other signalling molecules. Once recruited to the receptor, STATs also become phosphorylated by JAKs, on a single tyrosine residue. Activated STATs dissociate from the receptor, dimerise, translocate to the nucleus and bind to members of the GAS (gamma activated site) family of enhancers. The seven STAT proteins identified in mammals range in size from 750 to 850 amino acids.
The chromosomal distribution of these STATs, as well as the identification of STATs in more primitive eukaryotes, suggest that this family arose from a single primordial gene. STATs share six structurally and functionally conserved domains including: an N-terminal domain (ND) that strengthens interactions between STAT dimers on adjacent DNA-binding sites; a coiled-coil STAT domain (CCD) that is implicated in protein-protein interactions; a DNA-binding domain (DBD) with an immunoglobulin-like fold similar to p53 tumour suppressor protein; an EF-hand-like linker domain connecting the DNA-binding and SH2 domains; an SH2 domain that acts as a phosphorylation-dependent switch to control receptor recognition and DNA-binding; and a C-terminal transactivation domain. The crystal structure of the N terminus of Stat4 reveals a dimer. The interface of this dimer is formed by a ring-shaped element consisting of five short helices. Several studies suggest that this N-terminal dimerisation promotes cooperativity of binding to tandem GAS elements and with the transcriptional coactivator CBP/p300. This entry represents a domain consisting of four long helices that forms a bundle with a left-handed twist (coiled coil), in a right-handed superhelix.
Protein Domain
Name: Dictyostelium (slime mold) repeat
Type: Repeat
Description: Several Dictyostelium species have proteins that contain conserved repeats. These proteins have been variously described as 'extracellular matrix protein B', 'cyclic nucleotide phosphodiesterase inhibitor precursor', 'prestalk protein precursor', 'putative calmodulin-binding protein CamBP64', and 'cysteine-rich, acidic integral membrane protein precursor' as well as 'hypothetical protein'.
Protein Domain
Name: Zinc finger, PHD-finger
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the PHD (homeodomain) zinc finger domain [ ], which is a C4HC3 zinc-finger-like motif found in nuclear proteins thought to be involved in chromatin-mediated transcriptional regulation. 
The PHD finger motif is reminiscent of, but distinct from, the C3HC4 type RING finger. The function of this domain is not yet known but, in analogy with the LIM domain, it could be involved in protein-protein interaction and be important for the assembly or activity of multicomponent complexes involved in transcriptional activation or repression. Alternatively, the interactions could be intra-molecular and be important in maintaining the structural integrity of the protein. In similarity to the RING finger and the LIM domain, the PHD finger is thought to bind two zinc ions.
Protein Domain
Name: Zinc finger, PHD-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the PHD (homeodomain) zinc finger domain [ ], which is a C4HC3 zinc-finger-like motif found in nuclear proteins thought to be involved in chromatin-mediated transcriptional regulation. 
The PHD finger motif is reminiscent of, but distinct from, the C3HC4 type RING finger. The function of this domain is not yet known but, in analogy with the LIM domain, it could be involved in protein-protein interaction and be important for the assembly or activity of multicomponent complexes involved in transcriptional activation or repression. Alternatively, the interactions could be intra-molecular and be important in maintaining the structural integrity of the protein. In similarity to the RING finger and the LIM domain, the PHD finger is thought to bind two zinc ions.
Protein Domain
Name: CRISPR-associated protein, Cas6
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents Cas6, a broadly distributed, highly divergent Cas family, as represented by TM1814 from Thermotoga maritima.
TM1814 contains a C-terminal motif GXGXXXXXGXG, where each X between two Gly is hydrophobic and the spacer XXXXX contains (usually) one Arg or Lys. Members of this protein family are found associated with several different CRISPR/cas system subtypes. Cas6 proteins share the ability to recognise and cleave a single phosphodiester bond in a short repeated sequence of the pre-crRNA transcript.
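The C-terminal motif described for TM1814 can be expressed as a pattern scan. The sketch below is illustrative only: the entry does not say which residues count as hydrophobic, so the set used here is an assumption, as are the function name and the toy sequence. Because the Arg/Lys in the spacer is only "usual", it is reported as a flag rather than required.

```python
import re

# Assumption: residues treated as hydrophobic (not specified in the entry).
HYDROPHOBIC = "AVILMFWY"

# G-X-G, a five-residue spacer, then G-X-G, with each single X hydrophobic.
MOTIF = re.compile(rf"G[{HYDROPHOBIC}]G(?P<spacer>.{{5}})G[{HYDROPHOBIC}]G")

def find_glycine_rich_motifs(seq):
    """Return (start, matched_text, spacer_has_arg_or_lys) for each non-overlapping hit."""
    hits = []
    for m in MOTIF.finditer(seq.upper()):
        spacer = m.group("spacer")
        has_basic = any(residue in "RK" for residue in spacer)
        hits.append((m.start(), m.group(0), has_basic))
    return hits

# Made-up sequence for illustration only, not a real Cas6 protein.
toy = "MSTGVGAEKSLGIGPQ"
print(find_glycine_rich_motifs(toy))
```

Given how divergent the family is said to be, a match or miss from a pattern like this would not by itself indicate family membership.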
Protein Domain
Name: Wnt inhibitory factor (WIF)-1
Type: Family
Description: Wnt proteins constitute a large family of secreted molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Drosophila). It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease. Wnt-mediated signalling is believed to proceed initially through binding to cell surface receptors of the frizzled family; the signal is subsequently transduced through several cytoplasmic components to β-catenin, which enters the nucleus and activates the transcription of several genes important in development. Several non-canonical Wnt signalling pathways have also been elucidated that act independently of β-catenin. Canonical and non-canonical Wnt signalling branches are highly interconnected, and cross-regulate each other. Members of the Wnt gene family are defined by their sequence similarity to mouse Wnt-1 and Wingless in Drosophila. They encode proteins of ~350-400 residues in length, with orthologues identified in several, mostly vertebrate, species. Very little is known about the structure of Wnts as they are notoriously insoluble, but they share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters.
Fifteen major Wnt gene families have been identified in vertebrates, with multiple subtypes within some classes. Wnt inhibitory factor-1 (WIF-1) is a secreted protein that binds to Wnt proteins and inhibits their activities. It was first identified as an EST from human retina; orthologues have since been identified in mouse, rat, Xenopus and zebrafish. WIF-1 proteins comprise an N-terminal signal sequence, a family-specific WIF domain, five epidermal growth factor (EGF) repeats and a hydrophilic C terminus. The interaction of Wnt proteins with WIF-1 and other inhibitors, such as frizzled-related protein, is thought to fine-tune their activity.
Protein Domain
Name: ERM family, FERM domain C-lobe
Type: Domain
Description: This entry represents the C-lobe of FERM domain found in the ERM family members, including ezrin, radixin, moesin and merlin. They are composed of a N-terminal FERM (ERM) domain, a coiled coil region (CRR), and a C-terminal domain CERMAD (C-terminal ERM association domain) which has an F-actin-binding site (ABD). Two actin-binding sites have been identified in the middle and N-terminal domains. Merlin is structurally similar to the ERM proteins, but instead of an actin-binding domain (ABD), it contains a C-terminal domain (CTD), just like the proteins from the 4.1 family. Activated ezrin, radixin and moesin are thought to be involved in the linking of actin filaments to CD43, CD44, ICAM1-3 cell adhesion molecules, various membrane channels and receptors, such as the Na+/H+ exchanger-3 (NHE3), cystic fibrosis transmembrane conductance regulator (CFTR), and the beta2-adrenergic receptor [ , ]. The ERM proteins exist in two states, a dormant state in which the FERM domain binds to its own C-terminal tail and thereby precludes binding of some partner proteins, and an activated state, in which the FERM domain binds to one of many membrane binding proteins and the C-terminal tail binds to F-actin [, ]. The FERM domain has a cloverleaf tripart structure composed of: (1) FERM_N (A-lobe or F1); (2) FERM_M (B-lobe, or F2); and (3) FERM_C (C-lobe or F3). The C-lobe/F3 within the FERM domain is part of the PH domain family. Like most other ERM members they have a phosphoinositide-binding site in their FERM domain. The FERM C domain is the third structural domain within the FERM domain. The FERM domain is found in the cytoskeletal-associated proteins such as ezrin, moesin, radixin, 4.1R, and merlin. These proteins provide a link between the membrane and cytoskeleton and are involved in signal transduction pathways. 
The FERM domain is also found in protein tyrosine phosphatases (PTPs), the tyrosine kinases FAK and JAK, in addition to other proteins involved in signaling. This domain is structurally similar to the PH and PTB domains and consequently is capable of binding to both peptides and phospholipids at different sites [ , ].
Protein Domain
Name: Cullin, N-terminal
Type: Domain
Description: Cullins are a family of hydrophobic proteins that act as scaffolds for ubiquitin ligases (E3). Cullins are found throughout eukaryotes. Humans express seven cullins (Cul1, 2, 3, 4A, 4B, 5 and 7), each forming part of a multi-subunit ubiquitin complex. Cullin-RING ubiquitin ligases (CRLs), such as Cul1 (SCF) [ ], play an essential role in targeting proteins for ubiquitin-mediated destruction; as such, they are diverse in terms of composition and function, regulating many different processes from glucose sensing and DNA replication to limb patterning and circadian rhythms. The catalytic core of CRLs consists of a RING protein and a cullin family member. For Cul1, the C-terminal cullin-homology domain binds the RING protein. The RING protein appears to function as a docking site for ubiquitin-conjugating enzymes (E2s). Other proteins contain a cullin-homology domain, such as the APC2 subunit of the anaphase-promoting complex/cyclosome and the p53 cytoplasmic anchor PARC; both APC2 and PARC have ubiquitin ligase activity. The N-terminal region of cullins is more variable, and is used to interact with specific adaptor proteins [, , ]. This entry represents the N-terminal region of cullin proteins, which consists of several domains, including a cullin repeat domain, a 4-helical bundle domain, an alpha+beta domain, and a winged helix-like domain.
Protein Domain
Name: GOSR2/Membrin/Bos1
Type: Family
Description: This entry consists of a group of SNAREs (soluble N-ethylmaleimide-sensitive fusion protein-attachment protein receptors), including Golgi SNAP receptor complex member 2 (GOSR2), vesicle transport through interaction with t-SNAREs homologue 1 (VTI1) and the related proteins membrin (Memb11 and Memb12) in plants and Bos1 in yeasts. Bos1 is necessary for vesicular transport from the ER to the Golgi [ ]. GOSR2 is involved in protein transport through the Golgi apparatus []. In Arabidopsis, Memb11 is a v-SNARE involved in anterograde protein trafficking at the ER-Golgi interface []. In humans, defects in GOSR2 are the cause of progressive myoclonic epilepsy type 6 (EPM6), a neurologic disorder characterised by onset of ataxia in the first years of life, followed by action myoclonus and seizures later in childhood, and loss of independent ambulation in the second decade [ ]. SNARE proteins are a family of membrane-associated proteins characterised by an α-helical coiled-coil domain called the SNARE motif [ ]. These proteins are classified as v-SNAREs and t-SNAREs based on their localisation on vesicle or target membrane; another classification scheme defines R-, Qa-, Qb- and Qc-SNAREs according to sequence similarities []. SNAREs are localised to distinct membrane compartments of the secretory and endocytic trafficking pathways, and contribute to the specificity of intracellular membrane fusion processes.
Protein Domain
Name: Calreticulin/calnexin, P domain superfamily
Type: Homologous_superfamily
Description: The type-I integral membrane protein calnexin (CNX) and its soluble paralog calreticulin (CRT) are members of a family of molecular chaperones that function in the endoplasmic reticulum (ER) of eukaryotic cells. These calcium-binding proteins are lectins that bind newly synthesised N-linked glycoproteins to help promote efficient folding and oligomeric assembly. The chaperones act to retain the glycoproteins in the ER while they are still incompletely folded, ensuring that the ER quality control machinery can dispose of misfolded glycoproteins. This family of molecular chaperones is conserved among plants, fungi, and animals. The P domain contains a high-affinity calcium-binding site and is thought to be involved in either substrate binding or protein-protein interactions. The P domain forms part of the lumenal region in CNX. In both CRT and CNX the P domain forms a protrusion, or arm, extending from the core protein. The amino acid sequence of the P domain is highly conserved and is characteristic of this family of lectins. The structure of the P domain consists of a non-globular proline-rich hairpin fold. The P domain is composed of multiple copies of two types of proline-rich repeat sequences, a 17 amino acid type 1 motif and a 14 amino acid type 2 motif, with the arrangement 111222 in CRT and 11112222 in CNX [ , ].
Protein Domain
Name: AAA+ ATPase domain
Type: Domain
Description: The AAA+ superfamily of ATPases is found in all kingdoms of living organisms where they participate in diverse cellular processes including membrane fusion, proteolysis and DNA replication. Although the terms AAA+ and AAA are often used loosely and interchangeably, the classical AAA family members are distinguished by their possession of the SRH region in the AAA module. Many AAA+ proteins are involved in similar processes to those of AAA proteins (facilitation of protein folding and unfolding, assembly or disassembly of protein complexes, protein transport and degradation), but others function in replication, recombination, repair and transcription. For a review see [ ]. The proteins in this superfamily are characterised by the structural conservation of a central ATPase domain of about 250 amino acids called the AAA+ module. Typically, the AAA+ domain can be divided into two structural subdomains, an N-terminal P-loop NTPase α-β-α subdomain that is connected to a smaller C-terminal all-α subdomain. The α-β-α subdomain adopts a Rossmann fold and contains several motifs involved in ATP binding and hydrolysis, including classical motifs Walker A and Walker B [ , ]. The all-α subdomain [] is much less conserved across AAA+ proteins. This entry represents the AAA+ ATPase domain found in a range of proteins, including Holliday junction ATP-dependent DNA helicase RuvB from Mycobacterium sp.
Protein Domain
Name: K Homology domain
Type: Domain
Description: The K homology (KH) domain was first identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. An evolutionarily conserved sequence of around 70 amino acids, the KH domain is present in a wide variety of nucleic acid-binding proteins. The KH domain binds RNA, and can function in RNA recognition [ ]. It is found in multiple copies in several proteins, where they can function cooperatively or independently. For example, in the AU-rich element RNA-binding protein KSRP, which has 4 KH domains, KH domains 3 and 4 behave as independent binding modules to interact with different regions of the AU-rich RNA targets []. The solution structure of the first KH domain of FMR1 [] and of the C-terminal KH domain of hnRNP K [] determined by nuclear magnetic resonance (NMR) revealed a β-α-α-β-β-α structure. Proteins containing KH domains include: Bacterial and organelle PNPases [ ]. Archaeal and eukaryotic exosome subunits [ ]. Eukaryotic and prokaryotic RS3 ribosomal proteins [ ]. Vertebrate Fragile X messenger ribonucleoprotein 1 (FMR1) [ ]. Vigilin, which has 14 KH domains [ ]. AU-rich element RNA-binding protein KSRP. hnRNP K, which contains 3 KH domains. Human onconeural ventral antigen-1 (NOVA-1) [ ]. According to structural analyses [ , , ], the KH domain can be separated into two groups: type 1 and type 2.
Protein Domain
Name: Agenet-like domain
Type: Domain
Description: Fragile X messenger ribonucleoprotein 1 (FMR1/FMRP), and its autosomal paralogues, RNA-binding proteins FXR1/2 (Fragile X-related protein 1/2), comprise a family of RNA-binding proteins that are involved in the regulation of alternative mRNA splicing, mRNA stability, mRNA dendritic transport and postsynaptic local protein synthesis of a subset of mRNAs, playing a crucial role in neuronal development and synaptic plasticity [ , , , ]. These proteins are highly similar to one another and also retain a highly conserved domain architecture. The two ribonucleoprotein K homology (KH) domains and the cluster of arginine and glycine residues that constitute the RGG box comprise a large region that is important for RNA binding and polyribosome association. In addition, two Agenet-like domains exist in tandem within the N-terminal regions of FMRP family proteins. The Agenet-like domain belongs to the "Royal" domain superfamily, which also contains the Tudor, chromo, MBT, PWWP and plant Agenet domains. Biochemical analysis of the tandem Agenet-like domains reveals their ability to preferentially recognise trimethylated peptides in a sequence-specific manner [ , , , ]. The Agenet-like domain folds into a bent four-stranded antiparallel β-sheet with a fifth strand closing the cavity of the sheet, similar to a thumb across a semiclosed hand [ , ].
Protein Domain
Name: GGA3, VHS domain
Type: Domain
Description: ADP-ribosylation factor-binding protein GGA3, also known as Golgi-localised gamma ear-containing ARF-binding protein 3, plays a role in protein sorting and trafficking between the trans-Golgi network (TGN) and endosomes. It is required for the lysosomal degradation of BACE (beta-site APP-cleaving enzyme), the protease that initiates the production of beta-amyloid, which causes Alzheimer's disease [ ]. It also plays a key role in GABA transmission, which is important in the regulation of anxiety-like behaviours []. GGA3 mediates the ARF-dependent recruitment of clathrin to the TGN [] and binds ubiquitinated proteins and membrane cargo molecules []. GGA3 belongs to the GGA family of proteins, which have a multidomain structure consisting of an N-terminal VHS domain linked by a short proline-rich linker to a GAT (GGA and TOM) domain, which is followed by a long flexible linker to the C-terminal appendage, the GAE (Gamma-Adaptin Ear) domain. This entry represents the VHS domain, which has a superhelical structure similar to the structure of the ARM (Armadillo) repeats and is present at the N terminus of these proteins. The VHS domain of GGA proteins binds to the acidic-cluster dileucine (DxxLL) motif found on the cytoplasmic tails of cargo proteins trafficked between the trans-Golgi network and the endosomal system [, , ].
Protein Domain
Name: Diaphanous autoregulatory (DAD) domain
Type: Domain
Description: Formins participate in the assembly of the actin and microtubule cytoskeletons in processes like cell division, migration, and development. Diaphanous-related formins (DRF) contain an N-terminal GTPase-binding domain (GBD) and a C-terminal diaphanous autoregulatory domain (DAD). DRFs are regulated by an autoinhibitory interaction of the C-terminal DAD with the DRF N-terminal armadillo repeat-like region in the DID or GBD/FH3 domain [ , , ]. This autoinhibition is released upon competitive binding of an activated Rho GTPase to the GBD. The release of DAD allows the catalytic formin homology 2 (FH2) domain to then nucleate and elongate nonbranched actin filaments. The DAD domain is a ~32 amino acid autoinhibitory domain, which facilitates intramolecular binding. The DAD core forms an α-helical structure and the C-terminal part of the domain contains several basic residues that form a basic region [ , , , ]. Proteins known to contain a DAD domain include: Fruit fly protein diaphanous, which plays an important role during cytokinesis. Mammalian diaphanous-related formins (DRF) 1-3, which act as Rho GTPase effectors during cytoskeletal remodelling. Saccharomyces cerevisiae (Baker's yeast) proteins BNI1 and BNI1-related protein 1 (BNR1). Emericella nidulans (Aspergillus nidulans) cytokinesis protein sepA, which participates in two actin-mediated processes, septum formation and polarized growth. Mammalian disheveled-associated activator of morphogenesis (DAAM) proteins. Mammalian formin-like 1 protein (Fmnl1) or formin-related protein gene in leukocytes (FRL).
Protein Domain
Name: PE-PGRS family, N-terminal
Type: Domain
Description: The human pathogen Mycobacterium tuberculosis harbours a large number of genes that encode proteins whose N-termini contain the characteristic motifs Pro-Glu (PE) or Pro-Pro-Glu (PPE). A subgroup of the PE proteins contains polymorphic GC-rich sequences (PGRS), while a subgroup of the PPE proteins contains major polymorphic tandem repeats (MPTR). The function of most of these proteins remains unknown [ ]. However, the PE_PGRS proteins from Mycobacterium marinum are secreted by components of the ESX-5 system that belongs to the recently defined type VII secretion systems []. It has also been reported that the PE_PGRS family of proteins contains multiple calcium-binding and glycine-rich sequence motifs GGXGXD/NXUX. This sequence repeat constitutes a calcium-binding parallel β-roll or parallel β-helix structure and is found in RTX toxins secreted by many Gram-negative bacteria []. This family is named after a PE motif near the amino terminus. The carboxyl termini of proteins in this family are variable and fall into several classes. The largest class of PE proteins is the highly repetitive PGRS class, which has a high glycine content. The function of these proteins is uncertain, but it has been suggested that they may be related to antigenic variation of Mycobacterium tuberculosis [ ] and also immune modulation [].
Protein Domain
Name: CST complex subunit Ten1, fungal
Type: Family
Description: Stn1 and Ten1 are DNA-binding proteins with specificity for telomeric DNA substrates and both protect chromosome termini from unregulated resection and regulate telomere length [ ]. Stn1 complexes with Ten1 and Cdc13 to function as a telomere-specific replication protein A (RPA)-like complex []. These three interacting proteins associate with the telomeric overhang in budding yeast, whereas a single protein known as Pot1 (protection of telomeres-1) performs this function in fission yeast, and a two-subunit complex consisting of POT1 and TPP1 associates with telomeric ssDNA in humans. S. pombe has Stn1- and Ten1-like proteins that are essential for chromosome end protection. Stn1 orthologues exist in all species that have Pot1, whereas Ten1-like proteins can be found in all fungi. Fission yeast Stn1 and Ten1 localise at telomeres in a manner that correlates with the length of the ssDNA overhang, suggesting that they specifically associate with the telomeric ssDNA. Two separate protein complexes are required for chromosome end protection in fission yeast. Protection of telomeres by multiple proteins with OB-fold domains is conserved in eukaryotic evolution []. Ten1 is one of the three components of the CST complex, which, in conjunction with the Shelterin complex, helps protect telomeres from attack by DNA-repair mechanisms []. This entry represents Ten1 from fungi.
Protein Domain
Name: Protein-tyrosine phosphatase, YopH, N-terminal domain superfamily
Type: Homologous_superfamily
Description: This entry represents the N-terminal domain of YopH protein tyrosine phosphatase (PTP). This domain has a compact structure composed of four α-helices and two β-hairpins. Helices alpha-1 and alpha-3 are parallel to each other and antiparallel to helices alpha-2 and alpha-4. This domain targets YopH for secretion from the bacterium and translocation into eukaryotic cells, and has phosphotyrosyl peptide-binding activity, allowing for recognition of p130Cas and paxillin [ ]. YopH from Yersinia sp. is essential for pathogenesis, as it allows the bacteria to resist phagocytosis by host macrophages through its ability to dephosphorylate host proteins, thereby interfering with the host signalling process. Yersinia has one of the most active PTP enzymes known. YopH contains a loop of ten amino acids (the WPD loop) that covers the entrance of the active site of the enzyme during substrate binding []. A homologous domain is found in YscM (Yop secretion protein M or Yop proteins translocation protein M), which acts as a Yop protein translocation protein. Several Yop proteins are involved in pathogenesis. YscM is produced by the virulence operon virC, which encodes thirteen genes, yscA-M [ ]. Transcription of the virC operon was subjected to the same regulation as the yop genes.
Protein Domain
Name: Immunoglobulin/albumin-binding domain superfamily
Type: Homologous_superfamily
Description: This superfamily represents immunoglobulin and albumin-binding (GA module) domains from various bacterial proteins, which share a common fold consisting of a left-handed three-helical bundle (mirror topology to spectrin-like fold). The Staphylococcus aureus virulence factor protein A (SpA) contains five highly homologous Ig-binding domains in tandem (designated domains E, D, A, B and C) that share this common structure. Protein A can exist in both secreted and membrane-bound forms, and has two distinct Ig-binding activities: each domain can bind Fc-gamma (the constant region of IgG involved in effector functions) and Fab (the Ig fragment responsible for antigen recognition) [ ]. Protein G-related albumin-binding (GA) modules occur on the surface of numerous Gram-positive bacterial pathogens. Protein G of group C and G Streptococci interacts with the constant region of IgG and with human serum albumin. The GA module is found in a range of bacterial cell surface proteins [ , ]. GA modules may promote bacterial growth and virulence in mammalian hosts by scavenging albumin-bound nutrients and camouflaging the bacteria. Variations in sequence give rise to differences in structure and function between GA modules in different proteins, which could alter pathogenesis and host specificity due to their varied affinities for different species of albumin []. Proteins containing a GA module include PAB from Peptostreptococcus magnus [].
Protein Domain
Name: Proteinase inhibitor I47, latexin
Type: Family
Description: This family consists of several animal-specific latexins and latexin-related proteins that belong to MEROPS proteinase inhibitor family I47, clan I- [ ]. Latexin, a protein possessing inhibitory activity against rat carboxypeptidase A1 (CPA1) and CPA2 (MEROPS peptidase family M14A), is expressed in a neuronal subset in the cerebral cortex and cells in other neural and non-neural tissues of rat [, ]. OCX-32, the 32kDa eggshell matrix protein, is present at high levels in the uterine fluid during the terminal phase of eggshell formation, and is localised predominantly in the outer eggshell. The timing of OCX-32 secretion into the uterine fluid suggests that it may play a role in the termination of mineral deposition [ ]. OCX-32 protein possesses limited identity (32%) to two unrelated proteins: latexin and a skin protein that is encoded by a retinoic acid receptor-responsive gene, TIG1. Tazarotene Induced Gene 1 (TIG1) is a putative 228 transmembrane protein with a small N-terminal intracellular region, a single membrane-spanning hydrophobic region, and a large C-terminal extracellular region containing a glycosylation signal. TIG1 is up-regulated by retinoic acid receptor-specific but not by retinoid X receptor-specific synthetic retinoids []. TIG1 may be a tumour suppressor gene whose diminished expression is involved in the malignant progression of prostate cancer [].
Protein Domain
Name: Protein-tyrosine phosphatase, YopH, N-terminal
Type: Domain
Description: This entry represents the N-terminal domain of YopH protein tyrosine phosphatase (PTP). This domain has a compact structure composed of four α-helices and two β-hairpins. Helices alpha-1 and alpha-3 are parallel to each other and antiparallel to helices alpha-2 and alpha-4. This domain targets YopH for secretion from the bacterium and translocation into eukaryotic cells, and has phosphotyrosyl peptide-binding activity, allowing for recognition of p130Cas and paxillin [ ]. YopH from Yersinia sp. is essential for pathogenesis, as it allows the bacteria to resist phagocytosis by host macrophages through its ability to dephosphorylate host proteins, thereby interfering with the host signalling process. Yersinia has one of the most active PTP enzymes known. YopH contains a loop of ten amino acids (the WPD loop) that covers the entrance of the active site of the enzyme during substrate binding []. A homologous domain is found in YscM (Yop secretion protein M or Yop proteins translocation protein M), which acts as a Yop protein translocation protein. Several Yop proteins are involved in pathogenesis. YscM is produced by the virulence operon virC, which encodes thirteen genes, yscA-M [ ]. Transcription of the virC operon was subjected to the same regulation as the yop genes.
Protein Domain
Name: Trypanosome variant surface glycoprotein, A-type, N-terminal domain
Type: Domain
Description: This entry represents the N-terminal domain of the variant surface glycoproteins from trypanosomes. Proteins containing this domain include a variety of surface proteins such as variant surface glycoprotein and protein PAG1. The trypanosome parasite expresses these proteins to evade the immune response [ ]. The variant surface glycoprotein (VSG) of Trypanosoma brucei forms a coat on the surface of the parasite; by the expression of a series of antigenically distinct VSGs in the surface coat the parasite escapes the host immune response. The 2.9 Å resolution crystal structure of the N-terminal domain of one variant, MITat 1.2, has been determined [ ]. The "top" of the protein, which in the surface coat may be exposed to the external environment, is formed from the ends of the two long helices, a short three-stranded β-sheet, and a strand having irregular conformation that packs above these secondary structure elements. Two conserved disulphide bridges are in this part of the molecule. Several elements of the MITat 1.2 sequence, which contribute to the formation of the helix bundle structure, have been identified. These elements can be found in the sequences of several different VSGs, suggesting that to some extent the VSG structure is conserved in those variants [ , ].
Protein Domain
Name: Prohibitin
Type: Family
Description: This entry describes proteins similar to prohibitin (a lipid raft-associated integral membrane protein). Individual proteins of the SPFH (band 7) domain superfamily may cluster to form membrane microdomains which may in turn recruit multiprotein complexes [ ]. These microdomains, in addition to being stable scaffolds, may also be dynamic units with their own regulatory functions [, ]. Prohibitin is a mitochondrial inner-membrane protein which may act as a chaperone for the stabilization of mitochondrial proteins. Human prohibitin forms a hetero-oligomeric complex with Bap-37 (prohibitin 2, an SPFH domain carrying homologue). This complex may protect non-assembled membrane proteins against proteolysis by the m-AAA protease [, ]. Prohibitin and Bap-37 yeast homologues have been implicated in yeast longevity [] and in the maintenance of mitochondrial morphology. Sequence comparisons suggest that the prohibitin gene is an analogue of Cc, a Drosophila melanogaster gene that is vital for normal development []. Genes that negatively regulate proliferation inside the cell are of considerable interest because of the implications in processes such as development and cancer []. Prohibitin acts as a cytoplasmic anti-proliferative protein, is widely expressed in a variety of tissues and inhibits DNA synthesis. Studies have suggested that prohibitin may be a suppressor gene and is associated with tumour development and/or progression of at least some breast cancers [].
Protein Domain
Name: Patellin
Type: Family
Description: This is a family of proteins widely distributed across the plant kingdom. Patellin (PATL) proteins are yeast Sec14 protein (Sec14p)-like phosphatidylinositol transfer proteins (PITPs) involved in several biological processes such as plant development and stress tolerance regulation, playing a role in membrane-trafficking events [ , ]. In Arabidopsis, this family consists of six members, PATL1-6, which are characterised by the presence of two conserved domains observed in other membrane trafficking-related proteins: a Sec14p-like lipid-binding domain and a GOLD domain following the N-terminal variable domain. The GOLD domain is found in proteins related to Golgi function, membrane homeostasis and vesicle trafficking []. PATL1, the defining member of this family, is associated with the cell plate and has a regulatory role during cytokinesis; it is thought to be involved in clathrin-dependent endocytosis that aids cell plate remodeling and completion []. These proteins are able to interact with phosphoinositides, and their differing affinities for individual phosphoinositides result in their involvement in different processes and signalling pathways. PATLs also respond to auxin and play a role in the regulation of PIN1, which suggests that they play a redundant and crucial role in polarity and patterning in Arabidopsis [, ]. This entry also includes some CRAL-TRIO domain-containing proteins from fungi.
Protein Domain
Name: CHAP domain
Type: Domain
Description: The CHAP (cysteine, histidine-dependent amidohydrolases/peptidases) domain is a region between 110 and 140 amino acids that is found in proteins from bacteria, bacteriophages, archaea and eukaryotes of the Trypanosomidae family. Many of these proteins are uncharacterised, but it has been proposed that they may function mainly in peptidoglycan hydrolysis. The CHAP domain is found in a wide range of protein architectures; it is commonly associated with bacterial type SH3 domains and with several families of amidase domains. It has been suggested that CHAP domain containing proteins utilise a catalytic cysteine residue in a nucleophilic-attack mechanism [ , ]. The CHAP domain contains two invariant residues, a cysteine and a histidine. These residues form part of the putative active site of CHAP domain containing proteins. Secondary structure predictions show that the CHAP domain belongs to the alpha + beta structural class, with the N-terminal half largely containing predicted alpha helices and the C-terminal half principally composed of predicted beta strands [, ]. Some proteins known to contain a CHAP domain are listed below: Bacterial and trypanosomal glutathionylspermidine amidases. A variety of bacterial autolysins. A Nocardia aerocolonigenes putative esterase. Streptococcus pneumoniae choline-binding protein D. Methanosarcina mazei protein MM2478, a putative chloride channel. Several phage-encoded peptidoglycan hydrolases. Cysteine peptidases belonging to MEROPS peptidase family C51 (D-alanyl-glycyl endopeptidase, clan CA).
Protein Domain
Name: NTF2-like domain superfamily
Type: Homologous_superfamily
Description: This superfamily represents a domain covering the whole length of the nuclear transport factor 2 (NTF2). It has a β-α(2)-β insertion after the main helix. Ntf2 protein is a nuclear envelope protein facilitating protein transport into the nucleus [ ]. Besides Ntf2, proteins containing this domain include protein kinases, sucrose phosphatases, bacterial ring-hydroxylating dioxygenase beta subunit, protein NXF and many other uncharacterised proteins.
Protein Domain
Name: OTU domain
Type: Domain
Description: A homology region containing four conserved motifs has been identified in proteins from eukaryotes, several groups of viruses and the pathogenic bacterium Chlamydia pneumoniae []. None of these proteins has a known biochemical function, but low sequence similarity with the polyprotein regions of arteriviruses has led to the suggestion that it could possess cysteine protease activity []. In this case, the conserved cysteine and aspartate in motif I and the histidine in motif IV could be the catalytic residues. Motifs II and III have a more limited sequence conservation and could be involved in substrate recognition []. It has been proposed that the eukaryotic proteins containing an OTU domain could mediate proteolytic events involved in signalling associated with the modification of chromatin structure and control of cell proliferation [ ]. In viruses, proteins containing this domain are annotated as replicase or RNA-dependent RNA polymerase. The eukaryotic sequences are related to the Ovarian Tumour (OTU) gene in Drosophila, cezanne deubiquitinating peptidase and tumor necrosis factor, alpha-induced protein 3 (MEROPS peptidase family C64) and otubain 1 and otubain 2 (MEROPS peptidase family C65). A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan.
Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families [ ]. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate. Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [ ]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues [ ].
Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel []. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Protein Domain
Name: MutS, connector domain superfamily
Type: Homologous_superfamily
Description: Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication [ ]. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base []. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch []. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level []. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA. 
MutS is a modular protein with a complex structure [ ], and is composed of:
  • N-terminal mismatch-recognition domain, which is similar in structure to tRNA endonuclease.
  • Connector domain, which is similar in structure to Holliday junction resolvase ruvC.
  • Core domain, which is composed of two separate subdomains that join together to form a helical bundle; from within the core domain, two helices act as levers that extend towards (but do not touch) the DNA.
  • Clamp domain, which is inserted between the two subdomains of the core domain at the top of the lever helices; the clamp domain has a β-sheet structure.
  • ATPase domain (connected to the core domain), which has a classical Walker A motif.
  • HTH (helix-turn-helix) domain, which is involved in dimer contacts.
The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair. Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. Human MSH has been implicated in non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein []. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions []. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts [].
This superfamily represents the connector domain (domain 2) found in proteins of the MutS family. The structure of the MutS connector domain consists of a parallel β-sheet surrounded by four alpha helices, which is similar to the structure of the Holliday junction resolvase ruvC.
Protein Domain
Name: Proteinase inhibitor I25C, fetuin, conserved site
Type: Conserved_site
Description: The cystatins are a superfamily of similar proteins present in mammals, birds, fish, insects, plants and protozoa. In general they are potent peptidase inhibitors [ , , , ] belonging to MEROPS inhibitor family I25, clan IH. The type 1 cystatins or stefins (A and B) are mainly intracellular, the type 2 cystatins (C, D, E/M, F, G, S, SN and SA) are extracellular, and the type 3 cystatins (L- and H-kininogens) are intravascular proteins. All true cystatins inhibit cysteine peptidases of the papain family (MEROPS peptidase family C1), and some also inhibit legumain family enzymes (MEROPS peptidase family C13). These peptidases play key roles in physiological processes, such as intracellular protein degradation (cathepsins B, H and L), are pivotal in the remodelling of bone (cathepsin K), and may be important in the control of antigen presentation (cathepsin S, mammalian legumain). Moreover, the activities of such peptidases are increased in pathophysiological conditions, such as cancer metastasis and inflammation. Additionally, such peptidases are essential for several pathogenic parasites and bacteria. Thus in animals cystatins not only have the capacity to regulate normal body processes and perhaps cause disease when down-regulated, but in other organisms may also participate in defence against biotic and abiotic stress.
This family of proteinase inhibitors are cystatins belonging to MEROPS inhibitor family I25 (clan IH), subfamily I25C. They are primarily metalloprotease inhibitors, which include snake venom anti-hemorrhagic factors and the mammalian fetuins, for example:
  • Anti-hemorrhagic factor BJ46a, which is a potent inhibitor of atrolysin and jararhagin (MEROPS peptidase family M12) from the venomous snake Bothrops jararaca [ ].
  • Anti-hemorrhagic factor, HSF, from the Japanese Habu snake (Trimeresurus flavoviridis). HSF contains two N-terminal cystatin domains which show a remarkable sequence homology (about 50%) to those of plasma glycoproteins such as alpha 2-HS (human) and fetuin (bovine) and to a lesser extent to that of HRG (human). In spite of the presence of cystatin domains, HSF does not inhibit cysteine proteinases such as papain and cathepsin B but does inhibit several metalloproteases in Habu venom [ ].
  • Mammalian fetuins have been demonstrated to bind to matrix metalloproteinases (MMPs, MEROPS peptidase family M10). This binding protects the MMPs from autolytic degradation without interfering with their enzymatic activity [ ].
Fetuins are known to consist of three domains: two tandemly arranged N-terminal cystatin domains (D1 and D2) and an unrelated domain (D3) located in the C-terminal region [ , ]. When compared with the other members of this family, D3, especially its N-terminal half, varies greatly due to deletions, insertions or substitutions. Sequence comparisons suggest that the conformation of the human alpha2HS glycoprotein differs greatly from that of other members of this family. Human fetuin is a heterodimer of chains A and B, which are derived by cleavage of a connecting peptide from a common precursor. It is synthesised in the liver and selectively concentrated in bone matrix. It has a wide functional diversity, having been shown to be involved in immune response, bone formation and resorption. Mammalian fetuin, also called alpha-2-HS-glycoprotein, bone sialic acid-containing protein (BSP), countertrypin or PP63, is expressed in a tissue- and development-specific pattern, which seems to be significantly different between species [ , ].
Protein Domain
Name: HssA/B family related
Type: Family
Description: This entry includes a group of Dictyostelium discoideum (slime mould) proteins, including hssA/B proteins. HssA (also known as NK20) is a high-glycine-serine-rich small protein (8.37 kDa), which has strong homology to previously reported cyclic-adenosine-monophosphate-inducible 2C and 7E proteins [ ]. This protein may form extended coil structures []. The serine-rich chaperone protein 1 (hssl63) suppresses aggregation of proteins with long polyglutamine tracts by promoting their degradation by the proteasome [].
Protein Domain
Name: Bunyavirus nucleocapsid (N) , N-terminal domain
Type: Homologous_superfamily
Description: Orthobunyaviruses are enveloped viruses with a genome consisting of 3 ssRNA segments (called L, M and S). The nucleocapsid protein (also known as nucleoprotein) is encoded on the small (S) genomic RNA. The N protein is the major component of the nucleocapsids. This protein is thought to interact with the L protein, virus RNA and/or other N proteins [ ].
This superfamily represents the N-terminal domain found in Bunyavirus nucleocapsid (N) protein.
Protein Domain
Name: Tetracyclin repressor SlmA-like, C-terminal domain
Type: Domain
Description: TetR family regulators are involved in the transcriptional control of multidrug efflux pumps, pathways for the biosynthesis of antibiotics, response to osmotic stress and toxic chemicals, control of catabolic pathways, differentiation processes, and pathogenicity [ ]. The TetR proteins identified in multiple genera of bacteria and archaea share a common helix-turn-helix (HTH) structure in their DNA-binding domain. However, TetR proteins can work in different ways: they can bind a target operator directly to exert their effect (e.g. TetR binds the Tet(A) gene to repress it in the absence of tetracycline), or they can be involved in complex regulatory cascades in which the TetR protein can either be modulated by another regulator or TetR can trigger the cellular response.
This entry represents the C-terminal domain found in a number of different TetR transcription regulator proteins including SlmA proteins found in E. coli. Unlike other TetR proteins, SlmA functions not as a transcription regulator but rather as an NO (nucleoid occlusion) factor [ ]. TetR regulates the expression of the membrane-associated tetracycline resistance protein, TetA, which exports the tetracycline antibiotic out of the cell before it can attach to the ribosomes and inhibit protein synthesis []. TetR blocks transcription from the genes encoding both TetA and TetR in the absence of antibiotic. The C-terminal domain is multi-helical and is interlocked in the homodimer with the helix-turn-helix (HTH) DNA-binding domain [].
Protein Domain
Name: Tandem C2 domains nuclear protein, C2A domain
Type: Domain
Description: Tac2-N (tandem C2 protein in nucleus) contains two C2 domains and a short C terminus including a WHXL motif, which are key in stabilizing transport vesicles to the plasma membrane by binding to the plasma membrane. However, unlike the usual carboxyl-terminal-type (C-type) tandem C2 proteins, it lacks a transmembrane domain, a Slp-homology domain, and a Munc13-1-interacting domain. Homology search analysis indicates that no known protein motifs are located in its N terminus, making Tac2-N a novel class of Ca2+-independent, C-type tandem C2 proteins [ ].
C2 domains fold into an 8-stranded β-sandwich that can adopt 2 structural arrangements: type I and type II, distinguished by a circular permutation involving their N- and C-terminal beta strands. Many C2 domains are Ca2+-dependent membrane-targeting modules that bind a wide variety of substances including phospholipids, inositol polyphosphates, and intracellular proteins. Most C2 domain proteins are either signal transduction enzymes that contain a single C2 domain, such as protein kinase C, or membrane trafficking proteins which contain at least two C2 domains, such as synaptotagmin 1. However, there are a few exceptions to this including RIM isoforms and some splice variants of piccolo/aczonin and intersectin which only have a single C2 domain. C2 domains with a calcium binding region have negatively charged residues, primarily aspartates, that serve as ligands for calcium ions [ , ].
This entry represents the first C2 domain of Tac2-N.
Protein Domain
Name: Tandem C2 domains nuclear protein, C2B domain
Type: Domain
Description: Tac2-N (tandem C2 protein in nucleus) contains two C2 domains and a short C terminus including a WHXL motif, which are key in stabilizing transport vesicles to the plasma membrane by binding to the plasma membrane. However, unlike the usual carboxyl-terminal-type (C-type) tandem C2 proteins, it lacks a transmembrane domain, a Slp-homology domain, and a Munc13-1-interacting domain. Homology search analysis indicates that no known protein motifs are located in its N terminus, making Tac2-N a novel class of Ca2+-independent, C-type tandem C2 proteins [].
C2 domains fold into an 8-stranded β-sandwich that can adopt 2 structural arrangements: type I and type II, distinguished by a circular permutation involving their N- and C-terminal beta strands. Many C2 domains are Ca2+-dependent membrane-targeting modules that bind a wide variety of substances including phospholipids, inositol polyphosphates, and intracellular proteins. Most C2 domain proteins are either signal transduction enzymes that contain a single C2 domain, such as protein kinase C, or membrane trafficking proteins which contain at least two C2 domains, such as synaptotagmin 1. However, there are a few exceptions to this including RIM isoforms and some splice variants of piccolo/aczonin and intersectin which only have a single C2 domain. C2 domains with a calcium binding region have negatively charged residues, primarily aspartates, that serve as ligands for calcium ions [ , ].
This entry represents the second C2 domain of Tac2-N.
Protein Domain
Name: Serum albumin/Alpha-fetoprotein/Afamin
Type: Family
Description: A number of serum transport proteins are known to be evolutionarily related, including albumin, alpha-fetoprotein, vitamin D-binding protein and afamin [ , , ]. Albumin is the main protein of plasma; it binds water, cations (such as Ca2+, Na+ and K+), fatty acids, hormones, bilirubin and drugs - its main function is to regulate the colloidal osmotic pressure of blood. Alpha-fetoprotein (alpha-fetoglobulin) is a foetal plasma protein that binds various cations, fatty acids and bilirubin. Vitamin D-binding protein binds to vitamin D and its metabolites, as well as to fatty acids. The biological role of afamin (alpha-albumin) has not yet been characterised. The 3D structure of human serum albumin has been determined by X-ray crystallography to a resolution of 2.8 Å [ ]. It comprises three homologous domains that assemble to form a heart-shaped molecule []. Each domain is a product of two subdomains that possess common structural motifs []. The principal regions of ligand binding to human serum albumin are located in hydrophobic cavities in subdomains IIA and IIIA, which exhibit similar chemistry. Structurally, the serum albumins are similar, each domain containing five or six internal disulphide bonds between conserved cysteines (C), which occur in the following pattern: xxCxxxxxxxxxxxxxxxxCCxxCxxxxCxxxxxCCxxxCxxxxxxxxxCxxxxxxxxxxxxxxCCxxxxCxxxx
This entry includes serum albumin, alpha-fetoprotein and afamin.
Protein Domain
Name: snRNP70, RNA recognition motif
Type: Domain
Description: This entry represents the RNA recognition motif (RRM) of snRNP70 (also known as U1-70K), a key component of the spliceosomal U1 snRNP, which is essential for recognition of the pre-mRNA 5' splice-site and the subsequent assembly of the spliceosome [ ]. U1-70K plays an essential role in targeting the U1 snRNP to the 5' splice site through protein-protein interactions with regulatory RNA-binding splicing factors, such as the RS protein ASF/SF2. It can also specifically bind to stem-loop I of the U1 small nuclear RNA (U1 snRNA) contained in the U1 snRNP complex. It also mediates the binding of U1C, another U1-specific protein, to the U1 snRNP complex [ ].
U1-70K contains a conserved RNA recognition motif (RRM), followed by an adjacent glycine-rich region at the N-terminal half, and two serine/arginine-rich (SR) domains at the C-terminal half. The RRM is responsible for the binding of stem-loop I of the U1 snRNA molecule. Additionally, the most prominent immunodominant region that can be recognized by auto-antibodies from autoimmune patients may be located within the RRM. The SR domains are involved in protein-protein interaction with SR proteins that mediate 5' splice site recognition [ ]. For instance, the first SR domain is necessary and sufficient for ASF/SF2 binding [, ].
Proteins containing this domain also include Drosophila U1-70K, which is an essential splicing factor required for viability in flies, but its SR domain is dispensable [ ].
Protein Domain
Name: MoaB/Mog-like domain superfamily
Type: Homologous_superfamily
Description: MoaB/Mog domain, also known as Cnx1G domain, is found in the bacterial molybdenum cofactor (Moco) biosynthesis protein MoaB/Mog and N-terminal of the eukaryotic MoCF biosynthesis proteins, such as the Drosophila protein cinnamon, the Arabidopsis protein cnx1 and the mammal protein gephyrin [ ]. These proteins are involved in the final steps of Moco synthesis. In E. coli two proteins, MogA and MoeA are essential for Mo insertion into molybdopterin, while in plants and animals one fusion protein with two domains (G and E domains) fulfils this function. The G domain of Cnx1 and gephyrin shares similarity with MoaB/Mog, while their E domain displays similarities to the sulfurtransferase rhodanese (homologous to E. coli MoeA/chlE) []. Structurally, MogA is folded into a compact molecule with alpha/beta/alpha architecture and forms a trimer []. The Cnx1G domain has been shown to bind molybdopterin []. This domain is also found in N-terminal of the FAD synthases belonging to a family that catalyses the adenylation of flavin mononucleotide (FMN) to form flavin adenine dinucleotide (FAD) coenzyme [ ]. CinA from Thermus thermophilus is shown to have both nicotinamide mononucleotide deamidase and ADP-ribose pyrophosphatase activities, with ADP-ribose pyrophosphatase activity attributed to the N-terminal domain []. The structure of the MoaB/Mog domain has three layers (alpha/beta/alpha) with mixed β-sheet of five strands where the strand 5 is antiparallel to the rest.
Protein Domain
Name: Homeobox domain, metazoa
Type: Domain
Description: The homeobox domain (homeodomain) is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [ , ]. Proteins containing homeobox domains are likely to play an important role in development - most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure.
The HTH motif is characterised by 2 α-helices, which make intimate contacts with the DNA and are joined by a short turn. The second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions, which occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA. The first helix helps to stabilise the structure. The motif is very similar in sequence and structure in a wide range of DNA-binding proteins (for example, cro and repressor proteins, homeotic proteins, amongst others). One of the principal differences between HTH motifs in these different proteins arises from the stereo-chemical requirement for glycine in the turn (position 9 of the motif), which is needed to avoid steric interference of the β-carbon with the main chain: for cro and repressor proteins the glycine appears to be mandatory, while for many of the homeotic and other DNA-binding proteins the requirement is relaxed. This entry represents the eukaryotic homeobox domain.
Protein Domain
Name: PKD domain
Type: Domain
Description: The polycystic kidney disease (PKD) domain is an 80-90 amino acid module originally found in 16 copies in the extracellular segment of polycystin-1, a large cell surface glycoprotein. Polycystin-1 is encoded by the PKD1 gene, which is mutated in autosomal dominant polycystic kidney disease (ADPKD). Although its function is unknown, it may be involved in protein-protein and protein-carbohydrate interactions based on its predicted domain structure. One or more copies of the PKD domain are also found in several other extracellular proteins from higher organisms, eubacteria, and archaebacteria. Single copies of the PKD domain are found in the heavily glycosylated melanocyte cell-surface proteins Pmel 17, MMP and Nmp. Some bacterial collagenases and proteases also contain a single PKD domain adjacent to their catalytic domains, whereas four copies are present in the heavily glycosylated surface layer protein of archaebacteria []. The PKD modules are often observed, within the same protein sequence, in association with FnIII domains [].
The most conserved motif is the WDFGDGS sequence that is found in the central part of many PKD domains and could play a structural role [ , ]. Determination of the solution structure of the first PKD domain from human polycystin-1 has shown that the module is built from two β-sheets, one of three strands and one of four strands, which are packed face-to-face [].
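As a rough illustration of how the conserved WDFGDGS sequence could be used to flag candidate PKD domains, the sketch below scans a protein sequence for exact occurrences of the motif. The function name `find_pkd_motif` is hypothetical, and exact matching is a simplifying assumption: the entry describes WDFGDGS only as the most conserved motif, so real domains may carry substitutions that this search would miss.

```python
# Minimal sketch: locate exact copies of the WDFGDGS motif in a protein sequence.
# Assumption: exact string matching; divergent PKD domains will not be found.
PKD_MOTIF = "WDFGDGS"


def find_pkd_motif(sequence, motif=PKD_MOTIF):
    """Return the 0-based start positions of every occurrence of the motif."""
    seq = sequence.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        # Continue searching one residue further along the sequence.
        start = seq.find(motif, start + 1)
    return positions


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(find_pkd_motif("AAWDFGDGSAAWDFGDGS"))
```

A hit only suggests where a PKD domain might be centred; profile-based methods are better suited to assigning the domain itself.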
Protein Domain
Name: 3a-like viroporin, transmembrane domain, alpha/betacoronavirus
Type: Domain
Description: All coronaviruses have a similar genomic structure comprising two large open reading frames (ORFs) (ORF1a and ORF1b) encoding the coronavirus replicase. At the 3' end, the genome encodes four structural proteins (S, E, M and N) and a variable number of accessory proteins. Accessory proteins play an important role in virus-host interactions, especially in antagonizing or regulating host immunity and virus adaptation to the host. There are large variations in the number of accessory proteins (1-10) among coronaviruses. Betacoronaviruses (bCoVs) have 3-5 accessory proteins, except for SARS-CoV and SARS-CoV-2, which possess the largest number of accessory proteins among all coronaviruses (10 and 9, respectively). 3a-like accessory proteins are found in multiple alpha and betacoronavirus lineages that infect bats and humans. They are transmembrane proteins of the viroporin family that form ion channels in the host membrane and have been implicated in inducing apoptosis, pathogenicity, and virus release. The induction of cytokine storms in COVID-19 patients might be linked to ORF3a-mediated activation of the inflammasome. 3a-like viroporins contain a transmembrane domain (TM) and a cytosolic domain (CD) [ , , , , , , ].
This is the transmembrane domain (TM) of 3a-like viroporins, which is composed of three helices. The 3a-like viroporin forms a dimer and the six transmembrane helices of the dimer form an ion channel with polar/charged residues in the interior of the channel capable of conducting cations [ ].
Protein Domain
Name: IRSp53, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of IRSp53. The SH3 domain of IRSp53 has been shown to bind the proline-rich C terminus of EspFu (E. coli secreted protein F-like from prophage U) [ ].
IRSp53, also known as IRS-58 or BAIAP2 (brain-specific angiogenesis inhibitor 1-associated protein 2), is an I-BAR (Bin/amphiphysin/Rvs) domain containing protein. The BAR domain forms an anti-parallel all-helical dimer, with a curved (banana-like) shape, that promotes membrane tubulation. BAR domain proteins can be classified into three types: BAR, F-BAR and I-BAR. BAR and F-BAR proteins generate positive membrane curvature, while I-BAR proteins induce negative curvature [ ].
IRSp53 is an adaptor protein that acts at the membrane-actin interface, coupling membrane deformation with F-actin polymerisation [ ]. It is involved in the formation of filopodia and lamellipodia in cultured mesenchymal cells and contributes to assembly/maintenance of tight junctions in cultured epithelial cells []. IRSp53 contains an N-terminal I-BAR domain, followed by a partial CRIB domain and an SH3 domain. It binds to the small GTPases Cdc42 and Rac1, and to WAVE1 []. IRSp53 binds Rac through its I-BAR domain and WAVE through its SH3 domain, and thus contributes to membrane ruffling []. Its SH3 domain also interacts with other regulators of actin dynamics, such as WAVE2, Mena, mDia1, Dynamin1, Eps8 and N-WASP []. This protein has been associated with human breast cancer and Gilles de la Tourette syndrome [ ].
Protein Domain
Name: PKD domain superfamily
Type: Homologous_superfamily
Description: The polycystic kidney disease (PKD) domain is an 80-90 amino acid module originally found in 16 copies in the extracellular segment of polycystin-1, a large cell surface glycoprotein. Polycystin-1 is encoded by the PKD1 gene, which is mutated in autosomal dominant polycystic kidney disease (ADPKD). Although its function is unknown, it may be involved in protein-protein and protein-carbohydrate interactions based on its predicted domain structure. One or more copies of the PKD domain are also found in several other extracellular proteins from higher organisms, eubacteria, and archaebacteria. Single copies of the PKD domain are found in the heavily glycosylated melanocyte cell-surface proteins Pmel 17, MMP and Nmp. Some bacterial collagenases and proteases also contain a single PKD domain adjacent to their catalytic domains, whereas four copies are present in the heavily glycosylated surface layer protein of archaebacteria []. The PKD modules are often observed, within the same protein sequence, in association with FnIII domains [].
The most conserved motif is the WDFGDGS sequence that is found in the central part of many PKD domains and could play a structural role [ , ]. Determination of the solution structure of the first PKD domain from human polycystin-1 has shown that the module is built from two β-sheets, one of three strands and one of four strands, which are packed face-to-face [].
Protein Domain
Name: GB1/RHD3-type guanine nucleotide-binding (G) domain
Type: Domain
Description: The P-loop guanosine triphosphatases (GTPases) control a multitude of biological processes, ranging from cell division, cell cycling, and signal transduction, to ribosome assembly and protein synthesis. GTPases exert their control by interchanging between an inactive GDP-bound state and an active GTP-bound state, thereby acting as molecular switches. The common denominator of GTPases is the highly conserved guanine nucleotide-binding (G) domain that is responsible for binding and hydrolysis of guanine nucleotides. The GB1/RHD3 GTPase family contains a large G domain (~230 amino acids). It is widespread in eukaryotes, but not detectable in bacteria or archaea. One conserved subfamily is typified by the Arabidopsis protein root hair defective 3 (RHD3), whose orthologs are present in all crown group eukaryotes. The other subfamily is typified by the interferon gamma-induced antiviral GB1 protein that is conserved in animals. The other GTPases of this subfamily are the brain finger proteins (BFPs), in which the GTPase domain is combined with an N-terminal RING finger domain, which implicates these proteins in ubiquitin-mediated signaling. Most members of this family have a large C-terminal, α-helical extension that probably participates in protein-protein interactions. The GB1/RHD3-type G domain has a low intrinsic affinity for nucleotide and often depends on nucleotide-dependent homodimerization to facilitate GTP hydrolysis [, , , , , ].
The large GB1/RHD3-type G domain consists of a six-stranded β-sheet surrounded by eight helices. It contains the conserved sequence elements of GTP-binding proteins with modifications [ , , ].
Protein Domain
Name: Rhabdovirus glycoprotein
Type: Family
Description: Different families of ssRNA negative-strand viruses contain glycoproteins responsible for forming spikes on the surface of the virion. The glycoprotein spike is made up of a trimer of glycoproteins. These proteins are frequently abbreviated to G protein. The channel formed by the glycoprotein spike is thought to function in a similar manner to the Influenza virus M2 protein channel, thus allowing a signal to pass across the viral membrane to signal for viral uncoating [ , ].
Protein Domain
Name: Liprin-beta, SAM domain repeat 1
Type: Domain
Description: Liprin-beta proteins contain three copies (repeats) of the SAM (sterile alpha motif) domain, which is a protein-protein interaction domain. This entry represents the repeat 1 domain. Liprin-beta may form heterodimers with liprin-alpha proteins through their SAM domains [ ]. Liprin beta1 has been shown to interact with metastasis-associated protein S100A4 (Mts1), and this interaction results in the inhibition of liprin-beta-1 phosphorylation by protein kinase C and protein kinase CK2 in vitro [].
Protein Domain
Name: Liprin-beta, SAM domain repeat 2
Type: Domain
Description: This entry represents the SAM (sterile alpha motif) domain repeat 2 of liprin-beta. Liprin-beta proteins contain three copies (repeats) of the SAM domain, which is a protein-protein interaction domain. They may form heterodimers with liprin-alpha proteins through their SAM domains [ ]. Liprin beta1 has been shown to interact with metastasis-associated protein S100A4 (Mts1), and this interaction results in the inhibition of liprin-beta-1 phosphorylation by protein kinase C and protein kinase CK2 in vitro [].
Protein Domain
Name: Ubiquitin-conjugating enzyme/RWD-like
Type: Homologous_superfamily
Description: This superfamily represents a structural domain with an α-β(4)-α(3) core fold. Domains of this structure are found in:
  • Ubiquitin conjugating enzyme E2, as well as related proteins such as ubiquitin carrier protein 4 and ubiquitin-protein ligase W [ ].
  • The UEV domain in tumour susceptibility gene 101 [ ] and vacuolar protein sorting-associated protein [].
  • RWD domain, found in RING finger and WD repeat-containing proteins, such as EIF-2 kinase 4 (GCN2-like protein) [ ].
  • UFC1-like domain found in Ufm1-conjugating enzyme 1 [ ].
Protein Domain
Name: ABC transporter, periplasmic substrate-binding protein, predicted
Type: Family
Description: Bacterial high affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria, their solute-binding proteins are membrane-anchored lipoproteins [, ].
This group represents a predicted ABC transporter, periplasmic substrate-binding protein.
Protein Domain
Name: Snf7 family
Type: Family
Description: Snf7 family members are small coiled-coil proteins that share protein sequence similarity with budding yeast Snf7, which is part of the ESCRT-III complex that is required for endosome-mediated trafficking via multivesicular body (MVB) formation and sorting [ ].
Proteins in this entry also include human CHMPs (charged multivesicular body proteins), budding yeast Did4/Did2, Arabidopsis vacuolar protein sorting-associated proteins and the archaean Sulfolobus acidocaldarius cell division protein B, which is a component required for cell division, forming polymers with segregating nucleoids [ ].
Protein Domain
Name: Capsid, astroviral
Type: Family
Description: The astrovirus genome is apparently organised with nonstructural proteins encoded at the 5' end and structural proteins at the 3' end [ ]. Proteins in this family are encoded by astrovirus ORF2, one of the three astrovirus ORFs (1a, 1b, 2). The proteins contain a viral RNA-dependent RNA polymerase motif [ ]. The 87 kDa precursor polyprotein undergoes an intracellular cleavage to form a 79 kDa protein. Subsequently, extracellular trypsin cleavage yields the three proteins forming the infectious virion [].
Protein Domain
Name: HIT-like superfamily
Type: Homologous_superfamily
Description: The histidine triad motif (HIT) consists of the conserved sequence HXHXHXX (where X is a hydrophobic amino acid) at the enzymatic catalytic centre, in which the second histidine is strictly conserved and participates in catalysis with the third histidine [ , , ]. Proteins containing HIT domains form a superfamily of nucleotide hydrolases and transferases that act on the alpha-phosphate of ribonucleotides [, ]. They are highly conserved from archaea to humans and are involved in galactose metabolism, DNA repair, and tumor suppression []. HIT-containing proteins can be divided in five families based on catalytic specificities, sequence compositions, and structural similarities of its members: Hint family of protein kinase-interacting proteins, the most ancient class in this superfamily. These include adenosine 5'-monophosphoramide hydrolases (e.g. HIT-nucleotide-binding protein, or HINT) [ , ]. They also have a conserved zinc-binding motif C-X-X-C (where C is a cysteine residue and X is a hydrophobic residue), and a zinc ion is coordinated by these cysteine residues, together with the first histidine residue [].Fragile HIT protein, or FINT, whose name is due to its high rate of mutation at its locus on chromosome 3 in many cancers has been characterised as a tumor suppressor and plays a role in the hydrolysis of dinucleotide polyphosphates [ , ]. HINT and FINT HIT domains have a topology similar to that found in the N-terminal of protein kinases []. GalT family. These include specific nucleoside monophosphate transferases (e.g. galactose-1-phosphate uridylyltransferase, diadenosine tetraphosphate phosphorylase, and adenylyl sulphate:phosphate adenylytransferase). These HIT domains are a duplication consisting of 2 HIT-like motifs. This family binds zinc and iron [ , ].Aprataxin, which hydrolyses both dinucleotide polyphosphates and phophoramidates, and is involved in DNA repair systems [ , ].mRNA decapping enzyme family. 
These include enzymes such as DcpS and Dcp2. The HIT-domain is usually C-terminal in these proteins. This superfamily also includes CDP-diacylglycerol pyrophosphatases (CDH), which play a role in phospholipid metabolism and regulate phosphatidylinositol levels, and the C-terminal CwfJ domains of the CWF19-like protein DRN1 from Saccharomyces cerevisiae, which is involved in branched RNA metabolism, modulating the turnover of lariat-intron pre-mRNAs by the lariat-debranching enzyme DBR1 and its homologues. These C-terminal CwfJ domains contain evolutionarily conserved cysteine and histidine residues in an arrangement similar to the CCCH class of zinc fingers.
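The HXHXHXX pattern described above can be searched for in a protein sequence with a regular expression. The following is a minimal Python sketch, not part of the database entry: the function name `find_hit_motifs` and the set of residues treated as hydrophobic (`AVILMFWYC`) are illustrative assumptions, since the entry does not define which residues count as hydrophobic, and the test sequence is made up.

```python
import re

# Assumption: the entry does not say which residues count as "hydrophobic";
# this set is one illustrative choice and can be changed.
HYDROPHOBIC = "AVILMFWYC"

# H-X-H-X-H-X-X with X restricted to the hydrophobic set.
# The lookahead lets overlapping occurrences be reported as well.
HIT_PATTERN = re.compile(
    rf"(?=(H[{HYDROPHOBIC}]H[{HYDROPHOBIC}]H[{HYDROPHOBIC}]{{2}}))"
)


def find_hit_motifs(sequence):
    """Return (0-based start, matched text) for each HXHXHXX occurrence."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in HIT_PATTERN.finditer(seq)]


# Toy sequence, invented for illustration only.
print(find_hit_motifs("AAHLHLHVLGG"))
```

A match only indicates that the sequence pattern is present; it does not by itself show that a protein belongs to the superfamily, which is defined by structural similarity.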
Protein Domain
Name: SMAD MH1 domain superfamily
Type: Homologous_superfamily
Description: Smad proteins are signal transducers and transcriptional comodulators of the TGF-beta superfamily of ligands, which play a central role in regulating a broad range of cellular responses, including cell growth, differentiation, and specification of developmental fate, in diverse organisms from Caenorhabditis elegans to humans. Ligand binding to specific transmembrane receptor kinases induces receptor oligomerisation and phosphorylation of the receptor-specific Smad protein (R-Smad) in the cytoplasm. The R-Smad proteins regulate distinct signalling pathways. Smad1, 5 and 8 mediate the signals of bone morphogenetic proteins (BMPs), while Smad2 and 3 mediate the signals of activins and TGF-betas. Upon ligand stimulation, R-Smad proteins are phosphorylated at the conserved C-terminal tail sequence, SS*xS* (where S* denotes a site of phosphorylation). Phosphorylated R-Smad proteins form heteromeric complexes with Smad4 and are translocated into the nucleus. In the nucleus, the heteromeric complexes function as gene-specific transcription activators by binding to promoters and interacting with transcriptional coactivators. Smad6 and Smad7 are inhibitory Smad proteins that inhibit TGF-beta signalling by interfering with either receptor-mediated phosphorylation or hetero-oligomerisation between Smad4 and R-Smad proteins. Smad proteins comprise two conserved MAD homology domains, one in the N terminus (MH1) and one in the C terminus (MH2), separated by a more variable, proline-rich linker region. The MH1 domain has a role in DNA binding and negatively regulates the functions of the MH2 domain, whereas the MH2 domain is responsible for transactivation and mediates phosphorylation-triggered heteromeric assembly between Smad4 and R-Smad. The MH1 domain adopts a compact globular fold, with four alpha helices, six short beta strands, and five loops. 
The N-terminal half of the sequence consists of three alpha helices, and the C-terminal half contains all six beta strands, which form two small beta sheets and one beta hairpin. The fourth alpha helix is located in the hydrophobic core of the molecule, surrounded by the N-terminal three alpha helices on one side and by the two small beta sheets and the beta hairpin on the other side. These secondary structural elements are connected by five intervening surface loops. The MH1 domain employs a novel DNA-binding motif, an 11-residue beta hairpin formed by strands B2 and B3, to contact DNA in the major groove. Two residues in the L3 loop and immediately preceding strand B2 also contribute significantly to DNA recognition. The beta hairpin appears to protrude outward from the globular MH1 core.
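The C-terminal SS*xS* tail mentioned above can be checked for programmatically. The Python sketch below is illustrative only: the function name `smad_tail_phosphosites` is hypothetical, the test sequence is invented, and the code assumes that the motif occupies the last four residues of the sequence and that x may be any residue.

```python
import re

# S-S-x-S anchored at the C-terminal end of the sequence.
# Assumption: "x" is any single residue (uppercase one-letter code).
SSXS_TAIL = re.compile(r"SS[A-Z]S$")


def smad_tail_phosphosites(sequence):
    """Return 1-based positions of the two starred serines in a C-terminal
    SS*xS* tail, or None if the sequence does not end with SSxS."""
    seq = sequence.strip().upper()
    if not SSXS_TAIL.search(seq):
        return None
    n = len(seq)
    # In SS*xS*, the starred serines are the 2nd and 4th motif residues,
    # i.e. the third-from-last and the last residue of the sequence.
    return (n - 2, n)


# Toy sequence, invented for illustration only.
print(smad_tail_phosphosites("MAAASSVS"))
```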
Protein Domain
Name: CRISPR-associated protein, CasD
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents CasD proteins, which include CT1976 from Chlorobium tepidum. 
It shares a small N-terminal homology region with members of several other CRISPR/Cas subtypes; the families that share this region are denoted Cas5. The family is designated Subtype I-E, for CRISPR-associated protein Cas5, Ecoli subtype. CasD is a component of Cascade, which participates in CRISPR interference, the third stage of CRISPR immunity. Cascade binds both crRNA and, in a sequence-specific manner, negatively supercoiled dsDNA targets. This leads to the formation of an R-loop in which the crRNA binds the target DNA, displacing the noncomplementary strand. Cas3 is recruited to Cascade, nicks target DNA and then unwinds and cleaves the target, leading to DNA degradation and invader neutralization.
Protein Domain
Name: Immunoglobulin/major histocompatibility complex, conserved site
Type: Conserved_site
Description: The basic structure of immunoglobulin (Ig) molecules is a tetramer of two light chains and two heavy chains linked by disulphide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains (CH1 to CH4). Ig molecules are highly modular proteins, in which the variable and constant domains have clear, conserved sequence patterns. The domains in Ig and Ig-like molecules are grouped into four types: V-set (variable), C1-set (constant-1), C2-set (constant-2) and I-set (intermediate). Structural studies have shown that these domains share a common core Greek-key β-sandwich structure, with the types differing in the number of strands in the β-sheets as well as in their sequence patterns. Immunoglobulin-like domains that are related in both sequence and structure can be found in several diverse protein families. Ig-like domains are involved in a variety of functions, including cell-cell recognition, cell-surface receptors, muscle structure and the immune system. Major Histocompatibility Complex (MHC) glycoproteins are heterodimeric cell surface receptors that function to present antigen peptide fragments to T cells responsible for cell-mediated immune responses. MHC molecules can be subdivided into two groups on the basis of structure and function: class I molecules present intracellular antigen peptide fragments (~10 amino acids) on the surface of the host cells to cytotoxic T cells; class II molecules present exogenously derived antigenic peptides (~15 amino acids) to helper T cells. MHC class I and II molecules are assembled and loaded with their peptide ligands via different mechanisms. 
However, both present peptide fragments rather than entire proteins to T cells, and are required to mount an immune response. Some of the proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook. Academic Press, London / San Diego (1997)]. Lutheran blood group glycoprotein (B-CAM cell surface glycoprotein) (Auberger B antigen) (F8/G253 antigen) belongs to the Lutheran blood group system and is associated with Lu(a/b), Au(a/b), LU6 to LU20 antigens.
Protein Domain
Name: CRISPR-associated protein, Csf1 family
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents the Csf1 family of Cas proteins. Members of this family are found near CRISPR repeats in Acidithiobacillus ferrooxidans ATCC 23270, Azoarcus sp. 
(strain EbN1), and Rhodoferax ferrireducens (strain DSM 15236/ATCC BAA-621/T118). In the latter two species, the CRISPR/cas locus is found on a plasmid. This family is one of several characteristic of a type of cas gene cluster we designate Aferr after A. ferrooxidans, where it is both chromosomal and the only type of cas gene cluster found. The gene is designated csf1 (CRISPR/cas Subtype as in A. ferrooxidans protein 1), as it lies closest to the repeats.
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom