
Search results 12201 to 12300 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: DNA mismatch repair, conserved site
Type: Conserved_site
Description: Mismatch repair contributes to the overall fidelity of DNA replication. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The sequences of some proteins involved in mismatch repair in different organisms have been found to be evolutionarily related. These proteins include:
  • Escherichia coli and Salmonella typhimurium mutL protein. MutL is required for dam-dependent methyl-directed DNA repair.
  • Streptococcus pneumoniae hexB protein. The Hex system is nick directed.
  • Yeast proteins PMS1 and MLH1.
  • Human protein MLH1, which is involved in a form of hereditary nonpolyposis colon cancer (HNPCC).
This entry represents a perfectly conserved heptapeptide located in the N-terminal section of these proteins.
Protein Domain
Name: Sulfur oxidation c-type cytochrome SoxX, type II
Type: Family
Description: Members of this family are SoxX, a c-type cytochrome with a CxxCH motif, part of a heterodimer with SoxA. SoxAX cytochromes play a key role in bacterial thiosulfate oxidation. There are three distinct types of SoxAX proteins. Type I and II SoxAX proteins are heterodimers, while the heterotrimeric SoxAXK proteins form the third group. Type I SoxAX proteins have SoxA subunits with two heme groups and SoxX subunits with a molecular mass of approximately 14 kDa, while the Type II SoxA proteins contain only one heme group and are associated with SoxX proteins that have a molecular mass of approximately 20 kDa. The increased molecular mass in these SoxX proteins is largely due to the presence of an N-terminal extension.
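The CxxCH motif named in this description can be located in a sequence with a simple pattern scan. The following is a minimal sketch, assuming the sequence is a plain one-letter amino acid string; the function name `find_cxxch` and the toy sequence are hypothetical and are not taken from any SoxX protein.

```python
import re

# CxxCH: Cys, any two residues, Cys, His.
# The lookahead lets overlapping occurrences be reported as well.
CXXCH = re.compile(r"(?=(C..CH))")

def find_cxxch(sequence):
    """Return (1-based start position, matched text) for each CxxCH occurrence."""
    return [(m.start() + 1, m.group(1)) for m in CXXCH.finditer(sequence.upper())]

toy_sequence = "MKAACGACHGLL"  # hypothetical sequence, for illustration only
print(find_cxxch(toy_sequence))  # [(5, 'CGACH')]
```

A pattern hit only shows that the residues are present; it says nothing about whether heme is actually bound at that site.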
Protein Domain
Name: Sulfur oxidation c-type cytochrome SoxX
Type: Family
Description: Members of this family are SoxX, a c-type cytochrome with a CxxCH motif, part of a heterodimer with SoxA. SoxAX cytochromes play a key role in bacterial thiosulfate oxidation. There are three distinct types of SoxAX proteins. Type I and II SoxAX proteins are heterodimers, while the heterotrimeric SoxAXK proteins form the third group. Type I SoxAX proteins have SoxA subunits with two heme groups and SoxX subunits with a molecular mass of approximately 14 kDa, while the Type II SoxA proteins contain only one heme group and are associated with SoxX proteins that have a molecular mass of approximately 20 kDa. The increased molecular mass in these SoxX proteins is largely due to the presence of an N-terminal extension.
Protein Domain
Name: Npc2 like, ML domain
Type: Domain
Description: The MD-2-related lipid-recognition (ML) domain is implicated in lipid recognition, particularly in the recognition of pathogen-related products. It has an immunoglobulin-like β-sandwich fold similar to that of E-set Ig domains. This domain is present in proteins from plants, animals and fungi, including the following proteins:
  • Epididymal secretory protein E1 (also known as Niemann-Pick C2 protein, Npc2), which is known to bind cholesterol. Niemann-Pick disease type C2 is a fatal hereditary disease characterised by accumulation of low-density lipoprotein-derived cholesterol in lysosomes.
  • House-dust mite allergen proteins such as Der f 2 from Dermatophagoides farinae and Der p 2 from Dermatophagoides pteronyssinus.
This entry refers to the ML domain found in metazoan Npc2 as well as some similar proteins.
Protein Domain
Name: Thiamine/thiamin pyrophosphate-binding periplasmic protein, ABC transporter
Type: Family
Description: Bacterial high affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria, their solute-binding proteins are membrane-anchored lipoproteins [ , ].This entry includes thiB from Proteobacteria and its homologues from Archaea, Actinobacteria, Euryarchaeota and other bacteria. ThiB is a thiamine/thiamine pyrophosphate-binding protein and part of the ABC transporter complex ThiBPQ involved in thiamine import. ThiBPQ is required for transport of thiamine and thiamine pyrophosphate in Salmonella typhimurium []. This entry also includes tbpA from E. coli [].
Protein Domain
Name: Cyclohexadienyl dehydratase PheC
Type: Domain
Description: This entry represents the periplasmic binding protein type 2 (PBP2) domain found in cyclohexadienyl dehydratase PheC. Proteins containing this domain catalyze the decarboxylation of prephenate to phenylpyruvate in the alternative phenylalanine biosynthesis pathway in some proteobacteria and archaea. The PheC proteins belong to the PBP2 superfamily of periplasmic binding proteins that differ in size and ligand specificity, but have similar tertiary structures consisting of two globular subdomains connected by a flexible hinge. They have been shown to bind their ligand in the cleft between these subdomains in a manner resembling a Venus flytrap. Since the PheC proteins are so similar to periplasmic binding proteins (PBPs), it is evolutionarily plausible that several pre-existing PBPs might have been recruited to perform the enzymatic function.
Protein Domain
Name: COMMD1 N-terminal domain
Type: Domain
Description: COMM (copper metabolism gene MURR1) domain proteins constitute a family initially identified as interacting partners of COMMD1 (previously known as MURR1), the prototype member of this protein family. COMMD1 is a multifunctional protein that has been shown to participate in two apparently distinct activities, regulation of the transcription factor NF-kappa-B and control of copper metabolism. The family is defined by the presence of a C-terminal motif termed COMM domain, which functions as an interface for protein-protein interactions. The proteins designated as COMMD or COMM domain containing 1-10 are extensively conserved in multicellular eukaryotic organisms [ , ].This helical domain is found at the N terminus of COMMD1. This domain adopts an α-helical structure unlike any other helical protein [ ].
Protein Domain
Name: Sprouty
Type: Family
Description: Sprouty (Spry) and Spred (Sprouty related EVH1 domain) proteins have been identified as inhibitors of the Ras/mitogen-activated protein kinase (MAPK) cascade, a pathway crucial for developmental processes initiated by activation of various receptor tyrosine kinases. These proteins share a conserved, C-terminal cysteine-rich region, the SPR domain. This domain has been defined as a novel cytosol to membrane translocation domain. It has been found to be a PtdIns(4,5)P2-binding domain that targets the proteins to a cellular localization that maximizes their inhibitory potential. It also mediates homodimer formation of these proteins. In the Spred proteins, the SPR domain can occur in association with the WH1 domain, which is located in the N-terminal part of the proteins.
Protein Domain
Name: Clathrin, heavy chain, linker, core motif
Type: Domain
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport [ ]. Clathrin coats contain both clathrin (acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors [, ].Clathrin is a trimer composed of three heavy chains and three light chains, each monomer projecting outwards like a leg; this three-legged structure is known as a triskelion [, ]. The heavy chains form the legs, their N-terminal β-propeller regions extending outwards, while their C-terminal α-α-superhelical regions form the central hub of the triskelion. Peptide motifs can bind between the β-propeller blades. The light chains appear to have a regulatory role, and may help orient the assembly and disassembly of clathrin coats as they interact with hsc70 uncoating ATPase []. Clathrin triskelia self-polymerise into a curved lattice by twisting individual legs together. The clathrin lattice forms around a vesicle as it buds from the TGN, plasma membrane or endosomes, acting to stabilise the vesicle and facilitate the budding process []. The multiple blades created when the triskelia polymerise are involved in multiple protein interactions, enabling the recruitment of different cargo adaptors and membrane attachment proteins []. 
This entry represents the core motif for the α-helical zigzag linker region connecting the conserved N-terminal β-propeller region to the C-terminal α-α-superhelical region in clathrin heavy chains [ ].
Protein Domain
Name: ShKT domain
Type: Domain
Description: BgK, a 37-residue peptide toxin from the sea anemone Bunodosoma granulifera, and ShK, a 35-residue peptide toxin from the sea anemone Stichodactyla helianthus, are potent inhibitors of K+ channels. There is a large superfamily of proteins that contains domains (referred to as ShKT domains) resembling these two toxins. Many of these proteins are metallopeptidases, whereas others are prolyl-4-hydroxylases, tyrosinases, peroxidases, oxidoreductases, or proteins containing epidermal growth factor-like domains, thrombospondin-type repeats, or trypsin-like serine protease domains. The ShKT domain has also been called NC6 (nematode six-cysteine) domain, SXC (six-cysteine) domain and ICR (ion channel regulator). The ShKT domain is short (36 to 42 amino acids), with six conserved cysteines and a number of other conserved residues. The fold adopted by the ShKT domain contains two nearly perpendicular stretches of helices, with no additional canonical secondary structures. The globular architecture of the ShKT domain is stabilised by three disulfides, one of them linking the two helices. In venomous creatures, the ShKT domain may have been modified to give rise to potent ion channel blockers, whereas the incorporation of this domain into plant oxidoreductases and prolyl hydroxylases and into worm astacin-like metalloproteases and trypsin-like serine proteases produced enzymes with potential channel-modulatory activity. Some proteins known to contain a ShKT domain are listed below:
  • Caribbean sea anemone ShK, a potassium channel toxin.
  • Sea anemone BgK, a potassium channel toxin.
  • Toxocara canis family of secreted mucins Tc-MUC-1 to -5, which are implicated in immune evasion. They combine two evolutionarily distinct modules, the mucin and ShKT domains.
  • Some Caenorhabditis elegans astacin-like proteins (nematode astacins, NAS), metalloproteases.
  • Vertebrate cysteine-rich secretory proteins (Crisp).
  • Mammalian microfibrillar-associated protein 2 (MFAP2 or MAGP1), a matrix protein.
  • Plant prolyl 4-hydroxylase.
Protein Domain
Name: Signal peptidase complex subunit 1
Type: Family
Description: Translocation of polypeptide chains across the endoplasmic reticulum (ER) membrane is triggered by signal sequences. Subsequently, the signal recognition particle interacts with its membrane receptor and the ribosome-bound nascent chain is targeted to the ER, where it is transferred into a protein-conducting channel. At some point, a second signal sequence recognition event takes place in the membrane and translocation of the nascent chain through the membrane occurs. The signal sequence of most secretory and membrane proteins is cleaved off at this stage. Cleavage is carried out by the signal peptidase complex (SPC) as soon as the lumenal domain of the translocating polypeptide is large enough to expose its cleavage site to the enzyme. The signal peptidase complex is possibly also involved in proteolytic events in the ER membrane other than the processing of the signal sequence, for example the further digestion of the cleaved signal peptide or the degradation of membrane proteins. This family represents the signal peptidase complex subunit 1 (SPCS1) and its homologues, such as Spc1 from budding yeasts. Mammalian signal peptidase is a complex of five different polypeptide chains, while the budding yeast SPC comprises four proteins. Budding yeast Spc1 has been shown to be a nonessential component of the signal peptidase complex. However, the Drosophila spase12 (yeast Spc1 homologue) null alleles are embryonic lethal. Interestingly, human SPC12 has been linked to post-translational processing of proteins involved in virion assembly and secretion from flaviviruses. Signal peptidase complex-like protein DTM1 from Oryza sativa (rice) also belongs to this family. This protein functions in tapetum development during early meiosis. It may play a role in the endoplasmic reticulum (ER) membrane in the early stages of tapetum development in anthers.
Protein Domain
Name: SH2 domain
Type: Domain
Description: The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a conserved sequence region between the oncoproteins Src and Fps. Similar sequences were later found in many other intracellular signal-transducing proteins. SH2 domains function as regulatory modules of intracellular signalling cascades by interacting with high affinity with phosphotyrosine-containing target peptides in a sequence-specific and strictly phosphorylation-dependent manner. SH2 domains recognise between 3 and 6 residues C-terminal to the phosphorylated tyrosine in a fashion that differs from one SH2 domain to another. They are found in a wide variety of protein contexts, e.g. in association with catalytic domains of phospholipase Cγ (PLCγ) and the non-receptor protein tyrosine kinases; within structural proteins such as fodrin and tensin; and in a group of small adaptor molecules, e.g. Crk and Nck. The domains are frequently found as repeats in a single protein sequence and will then often bind both mono- and di-phosphorylated substrates. The structure of the SH2 domain belongs to the α+β class, its overall shape forming a compact flattened hemisphere. The core structural elements comprise a central hydrophobic anti-parallel β-sheet, flanked by 2 short α-helices. The loop between strands 2 and 3 provides many of the binding interactions with the phosphate group of its phosphopeptide ligand, and is hence designated the phosphate binding loop. The phosphorylated ligand binds perpendicular to the β-sheet and typically interacts with the phosphate binding loop and a hydrophobic binding pocket that interacts with a pY+3 side chain. The N- and C-termini of the domain are close together in space and on the opposite face from the phosphopeptide binding surface, and it has been speculated that this has facilitated their integration into surface-exposed regions of host proteins.
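The pY+n numbering used in this description counts residues C-terminal to the phosphorylated tyrosine. The sketch below illustrates that indexing only; the function name `residues_after_ptyr` and the peptide are hypothetical, and the phosphotyrosine position is supplied by the caller because a plain one-letter sequence does not record phosphorylation.

```python
def residues_after_ptyr(peptide, ptyr_index, n=3):
    """Return up to n residues immediately C-terminal to the phosphotyrosine.

    ptyr_index is the 0-based position of the phosphorylated tyrosine.
    """
    if peptide[ptyr_index] != "Y":
        raise ValueError("ptyr_index must point at a tyrosine (Y)")
    return peptide[ptyr_index + 1 : ptyr_index + 1 + n]

peptide = "EPQYEEIPI"  # hypothetical peptide; the Y at index 3 is treated as pY
window = residues_after_ptyr(peptide, 3, n=3)
print(window)      # EEI  (positions pY+1 to pY+3)
print(window[-1])  # I    (the pY+3 residue)
```

Passing `n=6` returns the widest window mentioned in the description, truncated if the peptide ends first.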
Protein Domain
Name: MIF4G-like, type 2
Type: Domain
Description: This entry represents an MIF4G-like domain. MIF4G domains share a common structure but can differ in sequence. This entry is designated "type 2", and is found in nuclear cap-binding proteins and eIF4G.The MIF4G domain is a structural motif with an ARM (Armadillo) repeat-type fold, consisting of a 2-layer alpha/alpha right-handed superhelix. Proteins usually contain two or more structurally similar MIF4G domains connected by unstructured linkers. MIF4G domains are found in several proteins involved in RNA metabolism, including eIF4G (eukaryotic initiation factor 4-gamma), eIF-2b (translation initiation factor), UPF2 (regulator of nonsense transcripts 2), and nuclear cap-binding proteins (CBP80, CBC1, NCBP1), although the sequence identity between them may be low [ ]. The nuclear cap-binding complex (CBC) is a heterodimer. Human CBC consists of a large CBP80 subunit and a small CBP20 subunit, the latter being critical for cap binding. CBP80 contains three MIF4G domains connected with long linkers, while CBP20 has an RNP (ribonucleoprotein)-type domain that associates with domains 2 and 3 of CBP80 [ ]. The complex binds to 5'-cap of eukaryotic RNA polymerase II transcripts, such as mRNA and U snRNA. The binding is important for several mRNA nuclear maturation steps and for nonsense-mediated decay. It is also essential for nuclear export of U snRNAs in metazoans []. Eukaryotic translation initiation factor 4 gamma (eIF4G) plays a critical role in protein expression, and is at the centre of a complex regulatory network. Together with the cap-binding protein eIF4E, it recruits the small ribosomal subunit to the 5'-end of mRNA and promotes the assembly of a functional translation initiation complex, which scans along the mRNA to the translation start codon. The activity of eIF4G in translation initiation could be regulated through intra- and inter-protein interactions involving the ARM repeats []. In eIF4G, the MIF4G domain binds eIF4A, eIF3, RNA and DNA.
Protein Domain
Name: MIF4G-like, type 1
Type: Domain
Description: This entry represents an MIF4G-like domain. MIF4G domains share a common structure but can differ in sequence. This entry is designated "type 1", and is found in nuclear cap-binding proteins and eIF4G.The MIF4G domain is a structural motif with an ARM (Armadillo) repeat-type fold, consisting of a 2-layer alpha/alpha right-handed superhelix. Proteins usually contain two or more structurally similar MIF4G domains connected by unstructured linkers. MIF4G domains are found in several proteins involved in RNA metabolism, including eIF4G (eukaryotic initiation factor 4-gamma), eIF-2b (translation initiation factor), UPF2 (regulator of nonsense transcripts 2), and nuclear cap-binding proteins (CBP80, CBC1, NCBP1), although the sequence identity between them may be low [ ]. The nuclear cap-binding complex (CBC) is a heterodimer. Human CBC consists of a large CBP80 subunit and a small CBP20 subunit, the latter being critical for cap binding. CBP80 contains three MIF4G domains connected with long linkers, while CBP20 has an RNP (ribonucleoprotein)-type domain that associates with domains 2 and 3 of CBP80 []. The complex binds to 5'-cap of eukaryotic RNA polymerase II transcripts, such as mRNA and U snRNA. The binding is important for several mRNA nuclear maturation steps and for nonsense-mediated decay. It is also essential for nuclear export of U snRNAs in metazoans []. Eukaryotic translation initiation factor 4 gamma (eIF4G) plays a critical role in protein expression, and is at the centre of a complex regulatory network. Together with the cap-binding protein eIF4E, it recruits the small ribosomal subunit to the 5'-end of mRNA and promotes the assembly of a functional translation initiation complex, which scans along the mRNA to the translation start codon. The activity of eIF4G in translation initiation could be regulated through intra- and inter-protein interactions involving the ARM repeats [ ]. In eIF4G, the MIF4G domain binds eIF4A, eIF3, RNA and DNA.
Protein Domain
Name: Zinc finger, Dof-type
Type: Domain
Description: This entry consists of proteins containing a Dof domain, which is a zinc finger DNA-binding domain that shows resemblance to the Cys2 zinc finger, although it has a longer putative loop where an extra Cys residue is conserved [ ]. AOBP, a DNA-binding protein in pumpkin (Cucurbita maxima), contains a 52 amino acid Dof domain, which is highly conserved in several DNA-binding proteins of higher plants.Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target.
Protein Domain
Name: ARID DNA-binding domain
Type: Domain
Description: The AT-rich interaction domain (ARID) is an ~100-amino acid DNA-binding module found in a large number of eukaryotic transcription factors that regulate cell proliferation, differentiation and development. The ARID domain appears as a single-copy motif and can be found in association with other domains, such as JmjC, JmjN, Tudor and PHD-type zinc finger. The basic structure of the ARID domain appears to be a series of six α-helices separated by β-strands, loops, or turns, but the structured region may extend to an additional helix at either or both ends of the basic six. Based on primary sequence homology, ARID proteins can be partitioned into three structural classes:
  • Minimal ARID proteins that consist of a core domain formed by six α-helices.
  • ARID proteins that supplement the core domain with an N-terminal α-helix.
  • Extended-ARID proteins, which contain the core domain and additional α-helices at their N- and C-termini.
Minimal ARIDs are distributed in all eukaryotes, while extended ARIDs are restricted to metazoans. The ARID domain binds DNA as a monomer, recognizing the duplex through insertion of a loop and an α-helix into the major groove, and by extensive non-specific anchoring contacts to the adjacent sugar-phosphate backbone. Some proteins known to contain an ARID domain are listed below:
  • Eukaryotic transcription factors of the jumonji family.
  • Mammalian Bright, a B-cell-specific trans-activator of IgH transcription.
  • Mammalian PLU-1, a protein that is upregulated in breast cancer cells.
  • Mammalian RBP1 and RBP2, retinoblastoma binding factors.
  • Mammalian Mrf-1 and Mrf-2, transcriptional modulators of the cytomegalovirus major intermediate-early promoter.
  • Drosophila melanogaster Dead ringer protein, a transcriptional regulatory protein required for early embryonic development.
  • Yeast SWI1 protein, from the SWI/SNF complex involved in chromatin remodeling and broad aspects of transcription regulation.
  • Drosophila melanogaster Osa. It is structurally related to SWI1 and associates with the brahma complex, which is the Drosophila equivalent of the SWI/SNF complex.
Protein Domain
Name: Zinc finger, RanBP2-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the zinc finger domain found in RanBP2 proteins. Ran is an evolutionary conserved member of the Ras superfamily that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Ran binding protein 2 (RanBP2) is a 358kDa nucleoporin located on the cytoplasmic side of the nuclear pore complex which plays a role in nuclear protein import [ ]. 
RanBP2 contains multiple zinc fingers which mediate binding to RanGDP [].
Protein Domain
Name: HIT-like domain
Type: Domain
Description: The histidine triad motif (HIT) consists of the conserved sequence HXHXHXX (where X is a hydrophobic amino acid) at the enzymatic catalytic centre, in which the second histidine is strictly conserved and participates in catalysis with the third histidine. Proteins containing HIT domains form a superfamily of nucleotide hydrolases and transferases that act on the alpha-phosphate of ribonucleotides. They are highly conserved from archaea to humans and are involved in galactose metabolism, DNA repair, and tumor suppression. HIT-containing proteins can be divided into five families based on the catalytic specificities, sequence compositions, and structural similarities of their members:
  • Hint family of protein kinase-interacting proteins, the most ancient class in this superfamily. These include adenosine 5'-monophosphoramide hydrolases (e.g. HIT-nucleotide-binding protein, or HINT). They also have a conserved zinc-binding motif C-X-X-C (where C is a cysteine residue and X is a hydrophobic residue), and a zinc ion is coordinated by these cysteine residues, together with the first histidine residue.
  • Fragile HIT protein, or FHIT, whose name is due to the high rate of mutation at its locus on chromosome 3 in many cancers, has been characterised as a tumor suppressor and plays a role in the hydrolysis of dinucleotide polyphosphates. HINT and FHIT HIT domains have a topology similar to that found in the N-terminal region of protein kinases.
  • GalT family. These include specific nucleoside monophosphate transferases (e.g. galactose-1-phosphate uridylyltransferase, diadenosine tetraphosphate phosphorylase, and adenylyl sulphate:phosphate adenylyltransferase). These HIT domains are a duplication consisting of 2 HIT-like motifs. This family binds zinc and iron.
  • Aprataxin, which hydrolyses both dinucleotide polyphosphates and phosphoramidates, and is involved in DNA repair systems.
  • mRNA decapping enzyme family. These include enzymes such as DcpS and Dcp2. The HIT domain is usually C-terminal in these proteins.
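The HXHXHXX pattern given in this description can be written as a regular expression once "hydrophobic" is made concrete. The sketch below assumes the residue set A, V, I, L, M, F, W, Y for X; that set is an illustrative choice, not one stated in the entry, and the function name and toy sequence are hypothetical.

```python
import re

# Assumption: residues counted as hydrophobic for the X positions.
HYDROPHOBIC = "AVILMFWY"

# H-X-H-X-H-X-X with X restricted to the hydrophobic set; the lookahead
# allows overlapping matches to be reported.
HIT_MOTIF = re.compile(
    rf"(?=(H[{HYDROPHOBIC}]H[{HYDROPHOBIC}]H[{HYDROPHOBIC}]{{2}}))"
)

def find_hit_motif(sequence):
    """Return (1-based start position, matched text) for each HXHXHXX hit."""
    return [(m.start() + 1, m.group(1)) for m in HIT_MOTIF.finditer(sequence.upper())]

toy_sequence = "GGHVHVHVLPR"  # hypothetical sequence, for illustration only
print(find_hit_motif(toy_sequence))  # [(3, 'HVHVHVL')]
```

Widening or narrowing `HYDROPHOBIC` changes which sequences match, so the set should be chosen to suit the definition of hydrophobicity being used.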
Protein Domain
Name: Methylthiotransferase, conserved site
Type: Conserved_site
Description: The methylthiotransferase (MTTase) or miaB-like family is named after the (dimethylallyl)adenosine tRNA MTTase miaB protein, which catalyses a C-H to C-S bond conversion in the methylthiolation of tRNA. A related bacterial enzyme rimO performs a similar methylthiolation, but on a protein substrate. RimO acts on the ribosomal protein S12 and forms a separate MTTase subfamily. The miaB-subfamily includes mammalian CDK5 regulatory subunit-associated proteins and similar proteins in other eukaryotes. Two other subfamilies, yqeV and CDKAL1, are named after a Bacillus subtilis and a human protein, respectively. While yqeV-like proteins are found in bacteria, CDKAL1 subfamily members occur in eukaryotes and in archaebacteria. The likely MTTases from these 4 subfamilies contain an N-terminal MTTase domain, a central radical generating fold and a C-terminal TRAM domain (see ). The core forms a radical SAM fold (or AdoMet radical), containing a cysteine motif CxxxCxxC that binds a [4Fe-4S] cluster [, , ]. A reducing equivalent from the [4Fe-4S]+ cluster is used to cleave S-adenosylmethionine (SAM) to generate methionine and a 5'-deoxyadenosyl radical. The latter is thought to produce a reactive substrate radical that is amenable to sulphur insertion [ , ]. The N-terminal MTTase domain contains 3 cysteines that bind a second [4Fe-4S]cluster, in addition to the radical-generating [4Fe-4S] cluster, which could be involved in the thiolation reaction. The C-terminal TRAM domain is not shared with other radical SAM proteins outside the MTTase family. The TRAM domain can bind to RNA substrate and seems to be important for substrate recognition. The tertiary structure of the central radical SAM fold has six beta/alpha motifs resembling a three-quarter TIM barrel core (see ) [ ]. The N-terminal MTTase domain might form an additional [beta/alpha]2 TIM barrel unit [ ]. 
This entry represents a conserved site containing three of the conserved cysteines that form the motif in the central radical SAM fold.
Protein Domain
Name: Methylthiotransferase, N-terminal
Type: Domain
Description: The methylthiotransferase (MTTase) or miaB-like family is named after the (dimethylallyl)adenosine tRNA MTTase miaB protein, which catalyses a C-H to C-S bond conversion in the methylthiolation of tRNA. A related bacterial enzyme rimO performs a similar methylthiolation, but on a protein substrate. RimO acts on the ribosomal protein S12 and forms a separate MTTase subfamily. The miaB-subfamily includes mammalian CDK5 regulatory subunit-associated proteins and similar proteins in other eukaryotes. Two other subfamilies, yqeV and CDKAL1, are named after a Bacillus subtilis and a human protein, respectively. While yqeV-like proteins are found in bacteria, CDKAL1 subfamily members occur in eukaryotes and in archaebacteria. The likely MTTases from these 4 subfamilies contain an N-terminal MTTase domain, a central radical generating fold and a C-terminal TRAM domain (see ). The core forms a radical SAM fold (or AdoMet radical), containing a cysteine motif CxxxCxxC that binds a [4Fe-4S] cluster [, , ]. A reducing equivalent from the [4Fe-4S]+ cluster is used to cleave S-adenosylmethionine (SAM) to generate methionine and a 5'-deoxyadenosyl radical. The latter is thought to produce a reactive substrate radical that is amenable to sulphur insertion [ , ]. The N-terminal MTTase domain contains 3 cysteines that bind a second [4Fe-4S]cluster, in addition to the radical-generating [4Fe-4S] cluster, which could be involved in the thiolation reaction. The C-terminal TRAM domain is not shared with other radical SAM proteins outside the MTTase family. The TRAM domain can bind to RNA substrate and seems to be important for substrate recognition. The tertiary structure of the central radical SAM fold has six beta/alpha motifs resembling a three-quarter TIM barrel core (see ) [ ]. The N-terminal MTTase domain might form an additional [beta/alpha]2 TIM barrel unit [ ].
Protein Domain
Name: Methylthiotransferase
Type: Family
Description: The methylthiotransferase (MTTase) or miaB-like family is named after the (dimethylallyl)adenosine tRNA MTTase miaB protein, which catalyses a C-H to C-S bond conversion in the methylthiolation of tRNA. A related bacterial enzyme, RimO, performs a similar methylthiolation, but on a protein substrate. RimO acts on the ribosomal protein S12 and forms a separate MTTase subfamily. The miaB-subfamily includes mammalian CDK5 regulatory subunit-associated proteins and similar proteins in other eukaryotes. Two other subfamilies, yqeV and CDKAL1, are named after a Bacillus subtilis and a human protein, respectively. While yqeV-like proteins are found in bacteria, CDKAL1 subfamily members occur in eukaryotes and in archaebacteria [ ]. The likely MTTases from these 4 subfamilies contain an N-terminal MTTase domain, a central radical-generating fold and a C-terminal TRAM domain. The core forms a radical SAM fold (or AdoMet radical), containing a cysteine motif CxxxCxxC that binds a [4Fe-4S] cluster [, , ]. A reducing equivalent from the [4Fe-4S]+ cluster is used to cleave S-adenosylmethionine (SAM) to generate methionine and a 5'-deoxyadenosyl radical. The latter is thought to produce a reactive substrate radical that is amenable to sulphur insertion [ , ]. The N-terminal MTTase domain contains 3 cysteines that bind a second [4Fe-4S] cluster, in addition to the radical-generating [4Fe-4S] cluster, which could be involved in the thiolation reaction. The C-terminal TRAM domain is not shared with other radical SAM proteins outside the MTTase family. The TRAM domain can bind to RNA substrate and seems to be important for substrate recognition. The tertiary structure of the central radical SAM fold has six beta/alpha motifs resembling a three-quarter TIM barrel core []. The N-terminal MTTase domain might form an additional [beta/alpha]2 TIM barrel unit [ ].
Protein Domain
Name: Methylthiotransferase, N-terminal domain superfamily
Type: Homologous_superfamily
Description: The methylthiotransferase (MTTase) or miaB-like family is named after the (dimethylallyl)adenosine tRNA MTTase miaB protein, which catalyses a C-H to C-S bond conversion in the methylthiolation of tRNA. A related bacterial enzyme, RimO, performs a similar methylthiolation, but on a protein substrate. RimO acts on the ribosomal protein S12 and forms a separate MTTase subfamily. The miaB-subfamily includes mammalian CDK5 regulatory subunit-associated proteins and similar proteins in other eukaryotes. Two other subfamilies, yqeV and CDKAL1, are named after a Bacillus subtilis and a human protein, respectively. While yqeV-like proteins are found in bacteria, CDKAL1 subfamily members occur in eukaryotes and in archaebacteria. The likely MTTases from these 4 subfamilies contain an N-terminal MTTase domain, a central radical-generating fold and a C-terminal TRAM domain. The core forms a radical SAM fold (or AdoMet radical), containing a cysteine motif CxxxCxxC that binds a [4Fe-4S] cluster [, , ]. A reducing equivalent from the [4Fe-4S]+ cluster is used to cleave S-adenosylmethionine (SAM) to generate methionine and a 5'-deoxyadenosyl radical. The latter is thought to produce a reactive substrate radical that is amenable to sulphur insertion [ , ]. The N-terminal MTTase domain contains 3 cysteines that bind a second [4Fe-4S] cluster, in addition to the radical-generating [4Fe-4S] cluster, which could be involved in the thiolation reaction. The C-terminal TRAM domain is not shared with other radical SAM proteins outside the MTTase family. The TRAM domain can bind to RNA substrate and seems to be important for substrate recognition. The tertiary structure of the central radical SAM fold has six beta/alpha motifs resembling a three-quarter TIM barrel core [ ]. The N-terminal MTTase domain might form an additional [beta/alpha]2 TIM barrel unit [ ].
Protein Domain
Name: Clathrin, heavy chain, linker
Type: Homologous_superfamily
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. These vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transport [ ]. Clathrin coats contain both clathrin (which acts as a scaffold) and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors [, ]. Clathrin is a trimer composed of three heavy chains and three light chains, each monomer projecting outwards like a leg; this three-legged structure is known as a triskelion [ , ]. The heavy chains form the legs, their N-terminal β-propeller regions extending outwards, while their C-terminal α-α-superhelical regions form the central hub of the triskelion. Peptide motifs can bind between the β-propeller blades. The light chains appear to have a regulatory role, and may help orient the assembly and disassembly of clathrin coats as they interact with hsc70 uncoating ATPase []. Clathrin triskelia self-polymerise into a curved lattice by twisting individual legs together. The clathrin lattice forms around a vesicle as it buds from the TGN, plasma membrane or endosomes, acting to stabilise the vesicle and facilitate the budding process []. The multiple blades created when the triskelia polymerise are involved in multiple protein interactions, enabling the recruitment of different cargo adaptors and membrane attachment proteins [].
This entry represents the α-helical zigzag linker region connecting the conserved N-terminal β-propeller region to the C-terminal α-α-superhelical region in clathrin heavy chains [ ].
Protein Domain
Name: ARID DNA-binding domain superfamily
Type: Homologous_superfamily
Description: The AT-rich interaction domain (ARID) is an ~100-amino acid DNA-binding module found in a large number of eukaryotic transcription factors that regulate cell proliferation, differentiation and development [, ]. The ARID domain appears as a single-copy motif and can be found in association with other domains, such as JmjC, JmjN, Tudor and PHD-type zinc finger []. The basic structure of the ARID domain appears to be a series of six α-helices separated by β-strands, loops, or turns, but the structured region may extend to an additional helix at either or both ends of the basic six. Based on primary sequence homology, ARID proteins can be partitioned into three structural classes: minimal ARID proteins, which consist of a core domain formed by six α-helices; ARID proteins that supplement the core domain with an N-terminal α-helix; and extended-ARID proteins, which contain the core domain and additional α-helices at their N- and C-termini. Minimal ARIDs are distributed in all eukaryotes, while extended ARIDs are restricted to metazoans. The ARID domain binds DNA as a monomer, recognizing the duplex through insertion of a loop and an α-helix into the major groove, and by extensive non-specific anchoring contacts to the adjacent sugar-phosphate backbone [ , , ]. Some proteins known to contain an ARID domain are listed below: Eukaryotic transcription factors of the jumonji family. Mammalian Bright, a B-cell-specific trans-activator of IgH transcription. Mammalian PLU-1, a protein that is upregulated in breast cancer cells. Mammalian RBP1 and RBP2, retinoblastoma binding factors. Mammalian Mrf-1 and Mrf-2, transcriptional modulators of the cytomegalovirus major immediate-early promoter. Drosophila melanogaster Dead ringer protein, a transcriptional regulatory protein required for early embryonic development. Yeast SWI1 protein, from the SWI/SNF complex involved in chromatin remodeling and broad aspects of transcription regulation. Drosophila melanogaster Osa, which is structurally related to SWI1 and associates with the brahma complex, the Drosophila equivalent of the SWI/SNF complex.
Protein Domain
Name: Zinc finger, RanBP2-type superfamily
Type: Homologous_superfamily
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [ , , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the zinc finger domain superfamily found in RanBP2 proteins. Ran is an evolutionarily conserved member of the Ras superfamily that regulates all receptor-mediated transport between the nucleus and the cytoplasm. Ran binding protein 2 (RanBP2) is a 358 kDa nucleoporin located on the cytoplasmic side of the nuclear pore complex, which plays a role in nuclear protein import [ ].
RanBP2 contains multiple zinc fingers which mediate binding to RanGDP [].
Protein Domain
Name: CRISPR-associated protein, TM1793
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a highly divergent family of Cas proteins found in at least ten different archaeal and bacterial species. This family includes TM1793 from Thermotoga maritima.
Protein Domain
Name: CRISPR-associated protein, Cmr3
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a highly divergent family of Cas proteins, found in at least ten different archaeal and bacterial species, including TM1793 from Thermotoga maritima.
Protein Domain
Name: CRISPR-associated protein, CT1975
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a family of Cas proteins, which includes CT1975 of Chlorobium tepidum. This family is also known as Cse4/CasC and Cas7 Type I-E [ ].
Protein Domain
Name: CRISPR-associated protein, CXXC-CXXC
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a conserved region from an otherwise highly divergent protein found in the Tneap subtype of CRISPR/Cas regions. This Cys-rich region features two CXXC motifs.
Protein Domain
Name: TOG domain
Type: Domain
Description: XMAP215/Dis1 proteins, such as Alp14 and XMAP215, increase microtubule dynamic polymerization rates by recruiting soluble alpha/beta-tubulin via their conserved TOG domains to polymerizing microtubule plus ends [ , ]. A TOG domain contains HEAT repeats. This entry represents a structural domain with an armadillo (ARM)-like fold, consisting of a multi-helical fold comprised of two curved layers of α-helices arranged in a regular right-handed superhelix, where the repeats that make up this structure are arranged about a common axis [ ]. These superhelical structures present an extensive solvent-accessible surface that is well suited to binding large substrates such as proteins and nucleic acids. Domains and repeats with an ARM-like fold have been found in a number of proteins, including: ARM repeat domain, found in beta-catenins, importins, karyopherin and exportins; HEAT repeat domain, found in protein phosphatase 2a and initiation factor eIF4G; PHAT domain, found in the RNA-binding protein Smaug; leucine-rich repeat variant, which contains an FeS cluster; Pumilio repeat domain, found in Pumilio protein; regulatory subunit H of V-type ATPases; PBS lyase HEAT-like repeat; Mo25 protein; MIF4G domain-like, found in eukaryotic initiation factor eIF4G, translation initiation factor eIF-2b epsilon and nuclear cap-binding protein CBP80; the N-terminal domain of eukaryotic translation initiation factor 3 subunit 12; the C-terminal domain of leukotriene A4 hydrolase; the helical domain of phosphoinositide 3-kinase; the N-terminal fragment of adaptin alpha-C and beta subunits; the proximal leg segment and the linker domain of the clathrin heavy chain. The sequence similarity among these different repeats or domains is low; however, they exhibit considerable structural similarity. Furthermore, the number of repeats present in the superhelical structure can vary between orthologues, indicating that rapid loss/gain of repeats has occurred frequently in evolution.
A common phylogenetic origin has been proposed for the armadillo and HEAT repeats [ ].
Protein Domain
Name: SH2 domain superfamily
Type: Homologous_superfamily
Description: The Src homology 2 (SH2) domain is a protein domain of about 100 amino-acid residues first identified as a conserved sequence region between the oncoproteins Src and Fps [ ]. Similar sequences were later found in many other intracellular signal-transducing proteins []. SH2 domains function as regulatory modules of intracellular signalling cascades by binding with high affinity to phosphotyrosine-containing target peptides in a sequence-specific and strictly phosphorylation-dependent manner. SH2 domains recognise between 3 and 6 residues C-terminal to the phosphorylated tyrosine in a fashion that differs from one SH2 domain to another [ , , , ]. They are found in a wide variety of protein contexts, e.g. in association with catalytic domains of phospholipase Cγ (PLCγ) and the non-receptor protein tyrosine kinases; within structural proteins such as fodrin and tensin; and in a group of small adaptor molecules, e.g. Crk and Nck. The domains are frequently found as repeats in a single protein sequence and will then often bind both mono- and di-phosphorylated substrates. The structure of the SH2 domain belongs to the α+β class, its overall shape forming a compact flattened hemisphere. The core structural elements comprise a central hydrophobic anti-parallel β-sheet, flanked by 2 short α-helices. The loop between strands 2 and 3 provides many of the binding interactions with the phosphate group of its phosphopeptide ligand, and is hence designated the phosphate binding loop. The phosphorylated ligand binds perpendicular to the β-sheet and typically interacts with the phosphate binding loop and a hydrophobic binding pocket that interacts with a pY+3 side chain. The N- and C-termini of the domain are close together in space and on the opposite face from the phosphopeptide binding surface, and it has been speculated that this has facilitated their integration into surface-exposed regions of host proteins [ ].
Protein Domain
Name: ORF8, SARS-CoV-2
Type: Family
Description: This entry represents the ORF8 immunoglobulin (Ig) domain protein of Severe acute respiratory syndrome (SARS) coronavirus 2 (SARS-CoV-2, also known as 2019 novel coronavirus, 2019-nCoV) and related Sarbecovirus ORF8 proteins. ORF8 is an accessory protein that is not shared by all members of subgenus Sarbecovirus. The presence and location of ORF8 in the SARS-CoV-2 genome has led to its classification with SARS-CoV [ , ]. ORF8 is a potential pathogenicity factor which evolves rapidly to counter the immune response and facilitate the transmission between hosts []. ORF8 has been suggested to be one of the relevant genes in the study of human adaptation of the virus [, ]. The ORF8 protein is a fast-evolving protein in SARS-related CoVs, with a tendency to recombine and undergo deletions. During the early phases of the SARS (SARS-CoV) epidemic in 2002, human isolates were found to possess a unique continuous ORF8 with 366 nucleotides and a predicted protein with 122 amino acids. During the middle and late phases of the SARS epidemic, two functional ORFs (ORF8a and ORF8b) emerged; they are predicted to encode two small proteins, 8a with 39 amino acids and 8b with 84 amino acids. Interestingly, SARS-CoV-2 ORF8 has not undergone any significantly measurable deletion events, so its function as a full-length protein might be more important to its pathogenicity [ ]. ORF8 plays a role in modulating the host immune response [], possibly by down-regulating major histocompatibility complex class I (MHC-I) []. It may inhibit expression of some members of the IFN-stimulated gene (ISG) family, including host IGF2BP1/ZBP1, MX1 and MX2, and DHX58 []. ORF8 also binds to the IL17RA receptor, leading to IL17 pathway activation and an increased secretion of pro-inflammatory factors, contributing to cytokine storm during COVID-19 infection [].
Protein Domain
Name: ORF8, bat coronavirus Rf1-like
Type: Family
Description: This subfamily includes the ORF8 immunoglobulin (Ig) domain proteins of bat coronavirus Rf1 (Bat SARS CoV Rf1) and Bat CoV 273/2005, which have been classified previously as type II ORF8 proteins. ORF8 is an accessory protein that is not shared by all members of subgenus Sarbecovirus. The presence and location of ORF8 in the SARS-CoV-2 genome has led to its classification with SARS-CoV [ , ]. ORF8 is a potential pathogenicity factor which evolves rapidly to counter the immune response and facilitate the transmission between hosts []. ORF8 has been suggested to be one of the relevant genes in the study of human adaptation of the virus [, ]. The ORF8 protein is a fast-evolving protein in SARS-related CoVs, with a tendency to recombine and undergo deletions. During the early phases of the SARS (SARS-CoV) epidemic in 2002, human isolates were found to possess a unique continuous ORF8 with 366 nucleotides and a predicted protein with 122 amino acids. During the middle and late phases of the SARS epidemic, two functional ORFs (ORF8a and ORF8b) emerged; they are predicted to encode two small proteins, 8a with 39 amino acids and 8b with 84 amino acids. Interestingly, SARS-CoV-2 ORF8 has not undergone any significantly measurable deletion events, so its function as a full-length protein might be more important to its pathogenicity [ ]. ORF8 plays a role in modulating the host immune response [], possibly by down-regulating major histocompatibility complex class I (MHC-I) []. It may inhibit expression of some members of the IFN-stimulated gene (ISG) family, including host IGF2BP1/ZBP1, MX1 and MX2, and DHX58 []. ORF8 also binds to the IL17RA receptor, leading to IL17 pathway activation and an increased secretion of pro-inflammatory factors, contributing to cytokine storm during COVID-19 infection [].
Protein Domain
Name: CRISPR-associated protein, CXXC-CXXC domain
Type: Domain
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a conserved domain of about 65 amino acids found in otherwise highly divergent proteins encoded in CRISPR-associated regions. This domain features two CXXC motifs.
Protein Domain
Name: CRISPR-assoc protein, NE0113/Csx13
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a Cas protein family found in both bacteria and archaea, including the ring nuclease SSO2081 from Saccharolobus solfataricus [ ].
Protein Domain
Name: CRISPR-associated protein, NE0113
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ].
This entry represents a minor family of Cas proteins encoded in cas gene clusters in Vibrio vulnificus (strain YJ016), Nitrosomonas europaea (strain ATCC 19718), Mannheimia succiniciproducens (strain MBEL55E), and Verrucomicrobium spinosum.
Protein Domain
Name: Voltage-dependent L-type calcium channel subunit beta-1-4, N-terminal A domain
Type: Domain
Description: Ca2+ ions are unique in that they not only carry charge but they are also the most widely used of diffusible second messengers. Voltage-dependent Ca2+ channels (VDCC) are a family of molecules that allow cells to couple electrical activity to intracellular Ca2+ signalling. The opening and closing of these channels by depolarizing stimuli, such as action potentials, allows Ca2+ ions to enter neurons down a steep electrochemical gradient, producing transient intracellular Ca2+ signals. Many of the processes that occur in neurons, including transmitter release, gene transcription and metabolism, are controlled by Ca2+ influx occurring simultaneously at different cellular locales. The pore is formed by the alpha-1 subunit, which incorporates the conduction pore, the voltage sensor and gating apparatus, and the known sites of channel regulation by second messengers, drugs, and toxins [ ]. The activity of this pore is modulated by four tightly-coupled subunits: an intracellular beta subunit; a transmembrane gamma subunit; and a disulphide-linked complex of alpha-2 and delta subunits, which are proteolytically cleaved from the same gene product. Properties of the protein including gating voltage-dependence, G protein modulation and kinase susceptibility can be influenced by these subunits. Voltage-gated calcium channels are classified as T, L, N, P, Q and R, and are distinguished by their sensitivity to pharmacological blockers, single-channel conductance kinetics, and voltage-dependence. On the basis of their voltage activation properties, the voltage-gated calcium channel classes can be further divided into two broad groups: the low (T-type) and high (L, N, P, Q and R-type) threshold-activated channels. This entry represents the N-terminal A domain of the beta subunit of the voltage-dependent L-type calcium channel, which precedes the SH3 domain and may be involved in protein-protein interactions [ ].
These proteins are thought to regulate the channel properties through protein-protein interactions with non-Ca2+-channel proteins. They show an N-terminal α-helix [, ].
Protein Domain
Name: CRISPR-associated RAMP TM1809
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a family of Cas proteins found in the RAMP-2 subtype of CRISPR/Cas locus and designated the TM1809 family.
Protein Domain
Name: Vitellinogen, beta-sheet shell domain superfamily
Type: Homologous_superfamily
Description: Vitellinogen precursors provide the major egg yolk proteins that are a source of nutrients during early development of oviparous vertebrates and invertebrates. Vitellinogen precursors are multi-domain apolipoproteins that are cleaved into distinct yolk proteins. Different vitellinogen precursors exist, which are composed of variable combinations of yolk protein components; however, the cleavage sites are conserved [ , ]. In vertebrates, a complete vitellinogen is composed of an N-terminal signal peptide for export, followed by four regions that can be cleaved into yolk proteins: lipovitellin-1, phosvitin, lipovitellin-2, and a von Willebrand factor type D domain (YGP40). Vitellinogens are post-translationally glycosylated and phosphorylated in the endoplasmic reticulum and Golgi complex of hepatocytes, before being secreted into the circulatory system to be taken up by oocytes. In the ovary, vitellinogens bind to specific Vtgr receptors on oocyte membranes to become internalised by endocytosis, where they are cleaved into yolk proteins by cathepsin D. YGP40 is released into the yolk plasma before or during compartmentation of the lipovitellin-phosvitin complex into the yolk granule. The different yolk proteins have distinct roles. Phosvitins are important in sequestering calcium, iron and other cations for the developing embryo. Phosvitins are among the most phosphorylated (10%) proteins in nature, the high concentration of phosphate groups providing efficient metal-binding sites in clusters [ , ]. Lipovitellins are involved in lipid and metal storage, and contain a heterogeneous mixture of about 16% (w/w) noncovalently bound lipid, most being phospholipid. Lipovitellin-1 contains two chains, LV1N and LV1C [, ]. This entry represents the β-sheet shell domain superfamily found in vitellinogen, which generally corresponds to the lipovitellin-2 peptide product. This domain consists of several large open β-sheets [ ]. It is often found C-terminal to and .
Protein Domain
Name: Vitellinogen, open beta-sheet, subdomain 1
Type: Homologous_superfamily
Description: Vitellinogen precursors provide the major egg yolk proteins that are a source of nutrients during early development of oviparous vertebrates and invertebrates. Vitellinogen precursors are multi-domain apolipoproteins that are cleaved into distinct yolk proteins. Different vitellinogen precursors exist, which are composed of variable combinations of yolk protein components; however, the cleavage sites are conserved [ , ]. In vertebrates, a complete vitellinogen is composed of an N-terminal signal peptide for export, followed by four regions that can be cleaved into yolk proteins: lipovitellin-1, phosvitin, lipovitellin-2, and a von Willebrand factor type D domain (YGP40). Vitellinogens are post-translationally glycosylated and phosphorylated in the endoplasmic reticulum and Golgi complex of hepatocytes, before being secreted into the circulatory system to be taken up by oocytes. In the ovary, vitellinogens bind to specific Vtgr receptors on oocyte membranes to become internalised by endocytosis, where they are cleaved into yolk proteins by cathepsin D. YGP40 is released into the yolk plasma before or during compartmentation of the lipovitellin-phosvitin complex into the yolk granule. The different yolk proteins have distinct roles. Phosvitins are important in sequestering calcium, iron and other cations for the developing embryo. Phosvitins are among the most phosphorylated (10%) proteins in nature, the high concentration of phosphate groups providing efficient metal-binding sites in clusters [ , ]. Lipovitellins are involved in lipid and metal storage, and contain a heterogeneous mixture of about 16% (w/w) noncovalently bound lipid, most being phospholipid. Lipovitellin-1 contains two chains, LV1N and LV1C [, ]. This entry represents an open β-sheet subdomain found in vitellinogen, which generally corresponds to a domain within the lipovitellin-1 peptide product. This domain adopts a structure consisting of one of several large open β-sheets.
This subdomain is also found in apolipophorins, the major component of lipophorin, which mediates transport for various types of lipids in hemolymph.
Protein Domain
Name: CRISPR system single-strand-specific deoxyribonuclease Cas10/Csm1
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents Csm1 (CRISPR/Cas Subtype Mtube Protein 1), a single-strand-specific deoxyribonuclease (ssDNase) that digests both linear and circular ssDNA [ , ].
Protein Domain
Name: CRISPR system ring nuclease SSO1393-like
Type: Domain
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents a conserved region of about 150 amino acids found in a family of Cas proteins, such as the ring nuclease SSO1393 from Saccharolobus solfataricus [ ].
Protein Domain
Name: CRISPR-associated RAMP Csx10
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This entry represents the Csx10 family of Cas proteins, which are largely restricted to cyanobacteria [ ].
Protein Domain
Name: Flavivirus NS3, peptidase S7
Type: Domain
Description: The viral genome of Flavivirus is a positive strand RNA that encodes a single polyprotein precursor. Processing of the polyprotein precursor into mature proteins is carried out by the host signal peptidase and by NS3 serine protease, which requires NS2B as a cofactor. Pathogenic members of the flavivirus family, including West Nile Virus (WNV) and Dengue Virus (DV), are growing global threats for which there are no specific treatments. The genome encodes three structural proteins found in the mature virion (C, prM, and E) and seven "nonstructural" (i.e., not part of the virion architecture) proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5). Full-length NS3 is a bifunctional protein. The N-terminal 175 residues comprise a chymotrypsin-like protease, while the C-terminal portion is a helicase. The NS2B protein, which is located in the polypeptide precursor immediately upstream of the NS3 protease domain, functions as the cofactor for NS3 protease. A 35-48 residue central portion is required for protease activity in vitro, while N- and C-terminal flanking hydrophobic regions are predicted to anchor the NS2B-NS3 complex into the host endoplasmic reticulum membrane. The two component flaviviral enzyme NS2B-NS3 cleaves the viral polyprotein precursor within the host cell, a process that is required for viral replication [ , , ]. The NS3 protease forms the MEROPS peptidase family S7 (flavivirin family), clan PA. The NS3 protease has a classical serine protease catalytic triad (His, Asp, and Ser). The enzymatic activity is enhanced by interacting with the central 40 amino acids of NS2B, which acts as an essential cofactor. The NS3 protease domain has an overall structure of two barrels made of six beta sheets each, with the active site located in the cleft between the barrels. The NS2B hydrophilic core cofactor contributes one of the N-terminal beta sheets [, , ].
Protein Domain
Name: Glutaredoxin-like protein, YruB
Type: Family
Description: Glutaredoxins [ , , ], also known as thioltransferases (disulphide reductases), are small proteins of approximately one hundred amino-acid residues which utilise glutathione and NADPH as cofactors. Oxidized glutathione is regenerated by glutathione reductase. Together these components compose the glutathione system []. Glutaredoxin functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin (TRX), which functions in a similar way, glutaredoxin possesses an active centre disulphide bond [ ]. It exists in either a reduced or an oxidized form where the two cysteine residues are linked in an intramolecular disulphide bond. It contains a redox active CXXC motif in a TRX fold and uses a similar dithiol mechanism employed by TRXs for intramolecular disulfide bond reduction of protein substrates. Unlike TRX, GRX has preference for mixed GSH disulfide substrates, in which it uses a monothiol mechanism where only the N-terminal cysteine is required. The flow of reducing equivalents in the GRX system goes from NADPH -> GSH reductase -> GSH -> GRX -> protein substrates [ , , , ]. By altering the redox state of target proteins, GRX is involved in many cellular functions including DNA synthesis, signal transduction and the defense against oxidative stress. Glutaredoxin has been sequenced in a variety of species. On the basis of extensive sequence similarity, it has been proposed [ ] that Vaccinia virus protein O2L is most probably a glutaredoxin. Finally, it must be noted that Bacteriophage T4 thioredoxin also seems to be evolutionarily related. In position 5 of the pattern T4 thioredoxin has Val instead of Pro. This glutaredoxin-like protein family contains the conserved CxxC motif and includes the Clostridium pasteurianum protein YruB which has been cloned from a rubredoxin operon [ ].
Somewhat related to NrdH, it is unknown whether this protein actually interacts with glutathione/glutathione reductase, or, like NrdH, some other reductant system.
Protein Domain
Name: E3 ubiquitin-protein ligase MGRN1/RNF157-like
Type: Family
Description: This entry represents a group of confirmed and predicted E3 ubiquitin ligases, including MGRN1/RNF157 from humans and LOG2/LUL1-4 from Arabidopsis. MGRN1 is a cytosolic E3 ubiquitin-protein ligase that inhibits signalling through the G protein-coupled melanocortin receptors-1 (MC1R), -2 (MC2R) and -4 (MC4R) via ubiquitylation-dependent and -independent processes [ ]. It suppresses chaperone-associated misfolded protein aggregation and toxicity []. MGRN1 interacts with cytosolic prion proteins (PrPs) that are linked with neurodegeneration [ ]. It also interacts with expanded polyglutamine proteins, and suppresses misfolded polyglutamine aggregation and cytotoxicity. Moreover, MGRN1 inhibits melanocortin receptor signaling by competition with Galphas, suggesting a novel pathway for melanocortin signaling from the cell surface to the nucleus []. Furthermore, MGRN1 interacts with and ubiquitylates TSG101, a key component of the endosomal sorting complex required for transport (ESCRT)-I, and regulates endosomal trafficking. A null mutation in the gene encoding MGRN1 causes spongiform neurodegeneration, suggesting a link between dysregulation of endosomal trafficking and spongiform neurodegeneration [, ]. RNF157 is a cytoplasmic E3 ubiquitin ligase predominantly expressed in brain. In cultured neurons, it promotes neuronal survival in an E3 ligase-dependent manner. In contrast, it supports growth and maintenance of dendrites independent of its E3 ligase activity. RNF157 interacts with and ubiquitinates the adaptor protein APBB1 (amyloid beta precursor protein-binding, family B, member 1 or Fe65), which regulates neuronal survival, but not dendritic growth downstream of RNF157. The nuclear localization of APBB1 together with its interaction partner RNA-binding protein SART3 (squamous cell carcinoma antigen recognized by T cells 3 or Tip110) is crucial to trigger apoptosis [ ].
Both MGRN1 and RNF157 contain a modified C3HC5-type RING-HC finger, and a functionally uncharacterized region, known as domain associated with RING2 (DAR2), N-terminal to the RING finger. In Arabidopsis, LOG2 is a predicted E3 ubiquitin ligase that interacts with GDU1 and is involved in the regulation of amino acid export from plant cells [ ].
Protein Domain
Name: Flagellar M-ring, N-terminal
Type: Domain
Description: This domain is found at the N terminus of the flagellar M-ring protein FliF. It can also be found in YscJ lipoprotein, where it covers most of the sequence. FliF is the major protein of the M-ring in bacterial flagellar basal body []. The basal body consists of four rings (L, P, S and M) surrounding the flagellar rod, which is believed to transmit motor rotation to the filament [ ]. The M ring is integral to the inner membrane of the cell, and may be connected to the rod via the S (supramembrane) ring, which lies just distal to it. The L and P rings reside in the outer membrane and periplasmic space, respectively. FliF lacks a signal peptide and is predicted to have considerable α-helical structure, including an N-terminal sequence that is likely to be membrane-spanning [ ]. Overall, however, FliF has a relatively hydrophilic sequence, with a high charge density, especially towards its C terminus []. The type III secretion system is of great interest as it is used to transport virulence factors from the pathogen directly into the host cell [] and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis []. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. One of the outer membrane protein subunit families, termed "K" here for nomenclature purposes, aids in the structural assembly of the invasion complex []. It is also described as a lipoprotein. These lipoproteins include PrgK and SsaJ from Salmonella, MxiJ from Shigella, YscJ from Yersinia, and, from the plant enteropathogens, NolT (Rhizobium) and HrcJ (Erwinia).
Protein Domain
Name: Selenophosphate synthetase
Type: Family
Description: The UGA (TGA) codon is normally a termination codon; however, it is also used as a selenocysteine (Sec) codon by numerous organisms []. Sec is the 21st amino acid and is inserted into selenoproteins (proteins that include a selenocysteine (Se-Cys) amino acid residue). The synthesis of Sec and its incorporation into proteins requires the activity of a number of proteins, one of which is selenophosphate synthetase (SPS), also known as the SelD gene product [, ]. SPS catalyses the production of the selenium donor compound monoselenophosphate (MSP) from selenide and ATP. MSP is then used to synthesize Sec from seryl-tRNAs []. SPS was initially identified in E. coli as the product of the gene selD, one of four essential selenoprotein synthesis genes (selA-D) [ ]. SelC is the tRNA itself, SelD acts as a donor of reduced selenium, SelA modifies a serine residue on SelC into selenocysteine, and SelB is a selenocysteine-specific translation elongation factor. 3' or 5' non-coding elements of mRNA have been found as probable structures for directing selenocysteine incorporation. Later, the selD homologues from eukaryotes, bacteria, and archaea were identified []. In mammals, two gene products, SPS1 and SPS2, are proposed to be selenophosphate synthetases. SPS1 may be involved in Sec recycling via a selenium salvage pathway, whereas SPS2 may play a role in the synthesis of selenophosphate [ ]. SPS2 is a selenoprotein and could serve as an autoregulator of selenoprotein synthesis []. Drosophila SPS1 (UniProt: O18373) lacks selenide-dependent SPS activity due to an arginine substitution of the critical Cys (or Sec) residue in the catalytic domain of the enzyme when expressed in E. coli [ ]. Drosophila SPS2 (also known as Dsps2) is a selenoprotein that contains a UGA stop codon in the catalytic centre of the enzyme; nevertheless, the read-through activity can be provided by a mammalian-like SECIS element in its 3'UTR [].
Protein Domain
Name: ISC system FeS cluster assembly, IscX
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] [ ]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [ , ]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [ ]. This entry represents IscX proteins (also known as hypothetical protein YfhJ) that are part of the ISC system. IscX is active as a monomer. The structure of YfhJ is an orthogonal α-bundle [ ]. YfhJ is a small acidic protein that binds IscS, and contains a modified winged helix motif that is usually found in DNA-binding proteins []. YfhJ/IscX can bind Fe, and may function as an Fe donor in the assembly of FeS clusters.
Protein Domain
Name: CRISPR-associated endonuclease Cas1, C-terminal domain
Type: Homologous_superfamily
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [ ]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [ , , ]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [ ]. This superfamily corresponds to the C-terminal domain found in Cas1, a CRISPR-associated protein. Cas1 is common to all CRISPR-containing prokaryotes and may be involved in linking DNA segments to CRISPR.
Protein Domain
Name: Signal recognition particle SRP54, helical bundle
Type: Domain
Description: This entry represents the N-terminal helical bundle domain of the 54kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 of the signal recognition particle has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M-domain that binds the 7S RNA and also binds the signal sequence. The extreme C-terminal region is glycine-rich, lower in complexity, and poorly conserved between species. Other proteins with this domain include signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures (in conjunction with SRP) the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; and bacterial FtsY protein, which is believed to play a similar role to that played by the eukaryotic docking protein.
Protein Domain
Name: Proteolipid membrane potential modulator
Type: Family
Description: Proteolipid membrane potential modulator is an evolutionarily conserved proteolipid in the plasma membrane which, in S. pombe, is transcriptionally regulated by the Spc1 stress MAPK (mitogen-activated protein kinase) pathway. It functions to modulate the membrane potential, particularly to resist high cellular cation concentration. In eukaryotic organisms, stress-activated mitogen-activated protein kinases play crucial roles in transmitting environmental signals that regulate gene expression, allowing the cell to adapt to cellular stress. Pmp3-like proteins are highly conserved in bacteria, yeast, nematode and plants [ ]. Proteins in this entry include PMP3 as well as several other proteins that have been shown [ ] to be evolutionarily related. These are small proteins of 52 to 140 amino-acid residues that contain two transmembrane domains and belong to the UPF0057 (PMP3) protein family.
Protein Domain
Name: PDCD5-like
Type: Family
Description: This protein family is found in archaea and eukaryota. Proteins in this family contain a predicted DNA-binding domain [ ] and may function as DNA-binding proteins. Methanobacterium thermoautotrophicum MTH1615 was predicted to bind DNA based on structural proteomics data, and this was confirmed by the demonstration that it can interact non-specifically with a randomly chosen 20-mer of double stranded DNA []. This suggests that the human protein may be involved in nucleic acid binding or metabolism. The human PDCD5 gene (programmed cell death protein 5, also known as TFAR19) encodes a protein which shares significant homology to the corresponding proteins of species ranging from yeast to mice. PDCD5 exhibits a ubiquitous expression pattern and its expression is up-regulated in tumour cells undergoing apoptosis. PDCD5 may play a general role in the apoptotic process [ , ].
Protein Domain
Name: BCNT-C domain
Type: Domain
Description: Vertebrate BCNT (named after Bucentaur) protein is found in the nucleus and cytosol. Gene duplication of the ancestral BCNT gene leads to the h-type BCNT or craniofacial development protein 1 (CFDP1) gene and the ruminant-specific p97BCNT or craniofacial development protein 2 (CFDP2) gene. The h-type BCNT proteins contain a highly conserved 82-amino acid region at the C terminus (BCNT-C) that is not present in p97BCNT. Instead, ruminant p97BCNT contains a region derived from the endonuclease domain of a retrotransposable element RTE-1 [, ]. In addition to h-type BCNT proteins, a BCNT-C domain is also found in Drosophila YETI, a protein that binds to a microtubule-based motor kinesin-1, and the yeast SWR1-complex protein 5 (SWC5) or AOR1 (actin overexpression resistant 1), a component of the SWR1 chromatin remodeling complex [ , ]. The entry represents the entire BCNT-C domain.
Protein Domain
Name: JAB1/MPN/MOV34 metalloenzyme domain
Type: Domain
Description: This domain is known as the MPN domain [ ], PAD-1-like domain [], JABP1 domain [] or JAMM domain []. Proteins with this domain include proteasome regulatory subunits, eukaryotic initiation factor 3 (eIF3) subunits and regulators of transcription factors. They are metalloenzymes that function as the ubiquitin isopeptidase/deubiquitinase in the ubiquitin-based signalling and protein turnover pathways in eukaryotes []. Versions of the domain in prokaryotic cognates of the ubiquitin-modification pathway are predicted to have a similar role []. The archaeal (H. volcanii) JAMM domain containing protein, HvJAMM1, cleaves ubiquitin-like small archaeal modifier proteins (SAMP1/2) from protein conjugates [ ]. The bacterial JAMM domain containing protein QbsD from Pseudomonas fluorescens cleaves the C-terminal amino acid residues of the sulfur carrier protein QbsE prior to the formation of the carboxy-terminal thiocarboxylate [].
Protein Domain
Name: Autotransporter beta-domain
Type: Domain
Description: Secretion of protein products occurs by a number of different pathways in bacteria. One of these pathways known as the type V pathway was first described for the IgA1 protease [ ]. The protein component that mediates secretion through the outer membrane is contained within the secreted protein itself, hence the proteins secreted in this way are called autotransporters. This family corresponds to the presumed integral membrane β-barrel domain that transports the protein. This domain is found at the C terminus of the proteins it occurs in. The N terminus contains the variable passenger domain that is translocated across the membrane. Once the passenger domain is exported it is cleaved auto-catalytically in some proteins, in others a different protease is used and in some cases no cleavage occurs [].
Protein Domain
Name: SRP/SRP receptor, N-terminal
Type: Homologous_superfamily
Description: This entry represents the N-terminal helical bundle domain of the 54kDa SRP54 component, a GTP-binding protein that interacts with the signal sequence when it emerges from the ribosome. SRP54 of the signal recognition particle has a three-domain structure: an N-terminal helical bundle domain, a GTPase domain, and the M-domain that binds the 7S RNA and also binds the signal sequence. The extreme C-terminal region is glycine-rich, lower in complexity and poorly conserved between species. Other proteins with this domain include signal recognition particle receptor alpha subunit (docking protein), an integral membrane GTP-binding protein which ensures (in conjunction with SRP) the correct targeting of nascent secretory proteins to the endoplasmic reticulum membrane; and bacterial FtsY protein, which is believed to play a similar role to that played by the eukaryotic docking protein.
Protein Domain
Name: Autotransporter beta-domain superfamily
Type: Homologous_superfamily
Description: Secretion of protein products occurs by a number of different pathways in bacteria. One of these pathways known as the type V pathway was first described for the IgA1 protease [ ]. The protein component that mediates secretion through the outer membrane is contained within the secreted protein itself, hence the proteins secreted in this way are called autotransporters [, ]. This superfamily corresponds to the presumed integral membrane β-barrel domain that transports the protein. This domain is found at the C terminus of the proteins it occurs in. The N terminus contains the variable passenger domain that is translocated across the membrane. Once the passenger domain is exported it is cleaved auto-catalytically in some proteins, in others a different protease is used and in some cases no cleavage occurs [, ].
Protein Domain
Name: p53-like tetramerisation domain superfamily
Type: Homologous_superfamily
Description: The p53 protein is a tetrameric transcription factor that plays a central role in the prevention of neoplastic transformation [ ]. Oligomerization appears to be essential for the tumour suppressing activity of p53. p53 can be divided into different functional domains: an N-terminal transactivation domain, a proline-rich domain, a DNA-binding domain, a tetramerisation domain and a C-terminal regulatory region. The tetramerisation domain of human p53 extends from residues 325 to 356, and has a 4-helical bundle fold. The tetramerisation domain is essential for DNA binding, protein-protein interactions, post-translational modifications, and p53 degradation [ , ]. This superfamily also includes the structurally related C-terminal SARAH (Salvador-RassF-Hippo) domain, which is involved in dimerisation and is found in tumor suppressors such as Serine/threonine-protein kinase 4 (human Hippo homologue, Mst), Serine/threonine-protein kinase 3, Serine/threonine-protein kinase cst-1 and Serine/threonine-protein kinase hippo [].
Protein Domain
Name: Shark-like, N-terminal SH2 domain
Type: Domain
Description: This entry represents the SH2 domain found in Drosophila shark protein and hydra protein HTK16. Shark and HTK16 are non-receptor protein-tyrosine kinases that contain two SH2 domains, five ankyrin (ANK)-like repeats, and a potential tyrosine phosphorylation site in the carboxyl-terminal tail which resembles the phosphorylation site in members of the src family. Like the mammalian non-receptor protein-tyrosine kinases ZAP-70 and syk, they do not have SH3 domains. However, the presence of ANK repeats makes them unique among protein-tyrosine kinases. Both tyrosine kinases and ANK repeats have been shown to transduce developmental signals, and SH2 domains are known to participate intimately in tyrosine kinase signaling [ ]. Drosophila Shark transduces intracellularly the intercellular signal from Crumbs, a protein necessary for proper organization of ectodermal epithelia [ ]. It is essential for Draper-mediated signalling [].
Protein Domain
Name: Ferrous iron transporter FeoA domain
Type: Domain
Description: This entry represents the core domain of the ferrous iron (Fe2+) transport protein FeoA found in bacteria. This domain also occurs at the C terminus in related proteins. This suggests that this domain may be metal-binding. In most cases this is likely to be either iron or manganese. The transporter Feo is composed of three proteins: FeoA, a small, soluble SH3-domain protein probably located in the cytosol, which is composed of a β-barrel with two α-helices [ ]; FeoB, a large protein with a cytosolic N-terminal G-protein domain and a C-terminal integral inner-membrane domain containing two 'Gate' motifs, which likely functions as the Fe2+ permease; and FeoC, a small protein apparently functioning as an [Fe-S]-dependent transcriptional repressor [ , ]. Feo allows the bacterial cell to acquire iron from its environment.
Protein Domain
Name: Pumilio homologue 3
Type: Family
Description: Proteins in this family contain a CPL domain, which is a C-terminal domain found in Penguin-like proteins, and Pumilio RNA-binding repeats. Proteins include the following: Pumilio homologue 3 (PUM3; also known as Puf-A), which inhibits the poly(ADP-ribosyl)ation activity of PARP1 and the degradation of PARP1 by CASP3 following genotoxic stress [ ]. Tertiary structure determination has revealed that PUM3 has eleven PUM repeats arranged in an L-shape, and in contrast to classical PUF proteins, PUM3 forms sequence-independent interactions with DNA or RNA, mediated by conserved basic residues []. Pumilio homology domain family member 6 from Saccharomyces cerevisiae and Schizosaccharomyces pombe, which is an RNA-binding protein involved in post-transcriptional regulation and represses ASH1 mRNA translation. ASH1 is a protein determinant for mating-type switching [ ]. Protein penguin from Drosophila melanogaster.
Protein Domain
Name: PDZD7, Harmonin N-like domain
Type: Domain
Description: Human PDZ domain-containing protein 7 (PDZD7) is a scaffolding protein which associates with the Usher syndrome protein network, and localizes to the stereocilia ankle link. Usher syndrome is the leading cause of genetic deaf-blindness. PDZD7 has a role in Usher syndrome type 2 (and not in USH1) in humans. Whirlin, Usherin and GPR98 are other USH2 proteins. The latter two form the ankle links and whirlin is thought to be a scaffold for protein interactions at these links. PDZD7, whirlin, and harmonin (an USH1 protein) have a similar domain composition [ , ]. The domain represented here is a putative protein-binding module based on its sequence similarity to the N-terminal domain of harmonin []. Cooperative effects of mutations in PDZD7 and Usherin, and in PDZD7 and GPR98, result in a digenic USH2 phenotype [].
Protein Domain
Name: Lipopolysaccharide core heptose(II)-phosphate phosphatase
Type: Family
Description: This entry represents lipopolysaccharide core heptose(II)-phosphate phosphatase, which catalyzes the dephosphorylation of heptose(II) of the outer membrane lipopolysaccharide core. These proteins are related to cofactor-dependent phosphoglucomutases so distantly that the characteristic domain is not detected. Nonetheless, the relationship has been shown by sequence analysis and molecular modelling [ ], on the basis of which it is predicted that these proteins are phosphatases acting on large substrates, perhaps phosphoproteins. Sequence analysis also predicts that they are periplasmic, having a signal sequence and two pairs of Cys residues close enough to form disulphide bonds. Known properties of these proteins are rather disparate. AfrS is required for the transcriptional activation of AfrA, the major protein of AF/R1 pili in Escherichia coli RDEC-1. Ais proteins are aluminium-inducible proteins of unknown function.
Protein Domain
Name: Pinkbar, SH3
Type: Domain
Description: This entry represents the SH3 domain of Pinkbar. Brain-specific angiogenesis inhibitor 1-associated protein 2-like protein 2 (Baiap2l2), also known as planar intestinal and kidney specific BAR domain protein (Pinkbar), is an I-BAR (Bin/amphiphysin/Rvs) domain containing protein. The BAR domain forms an anti-parallel all-helical dimer, with a curved (banana-like) shape, that promotes membrane tubulation. BAR domain proteins can be classified into three types: BAR, F-BAR and I-BAR. BAR and F-BAR proteins generate positive membrane curvature, while I-BAR proteins induce negative curvature [ ]. In humans Baiap2l2 is specifically expressed in intestinal epithelial cells, where it localises to Rab13-positive vesicles and to the plasma membrane at intercellular junctions [ ]. The BAR domain of Baiap2l2 does not induce membrane protrusions or invaginations; instead, it promotes the formation of planar membrane sheets [].
Protein Domain
Name: MPP6, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of MPP6 (also known as VAM-1), which is a scaffolding protein that binds to Veli-1, a homologue of Caenorhabditis Lin-7 [ ]. MPP6 belongs to the membrane-associated guanylate kinase (MAGUK) p55 subfamily. The membrane-associated guanylate kinase (MAGUK) p55 subfamily (also known as MPP subfamily) members include the Drosophila Stardust protein and its vertebrate homologues, MPP1-7. They contain the core of three domains characteristic of MAGUK (membrane-associated guanylate kinase) proteins: PDZ, SH3, and guanylate kinase (GuK). In addition, they also contain the Hook (Protein 4.1 Binding) motif in between the SH3 and GuK domains [ ]. MPP2-7 have two additional L27 domains at their N terminus. The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain [].
Protein Domain
Name: G-protein gamma-like domain
Type: Domain
Description: This entry represents the G protein gamma subunit and the GGL (G protein gamma-like) domain, which are related in sequence and are comprised of an extended α-helical polypeptide. The G protein gamma subunit forms a stable dimer with the beta subunit, but it does not make any contact with the alpha subunit, which contacts the opposite face of the beta subunit. The GGL domain is found in several RGS (regulators of G protein signaling) proteins. GGL domains can interact with beta subunits to form novel dimers that prevent gamma subunit binding, and may prevent heterotrimer formation by inhibiting alpha subunit binding. The interaction between G protein beta-5 neuro-specific isoforms and RGS GGL domains may represent a general mode of binding between β-propeller proteins and their partners [ ].
Protein Domain
Name: Regulator of G-protein signalling 5, RGS domain
Type: Domain
Description: The RGS (Regulator of G-protein Signalling) domain is an essential part of the RGS5 protein. RGS5 is a member of the R4 subfamily of the RGS family, a diverse group of multifunctional proteins that regulate cellular signalling events downstream of G-protein coupled receptors (GPCRs) [ ]. Signalling is initiated when GPCRs bind to their ligands, triggering the replacement of GDP bound to the G-alpha subunits of heterotrimeric G proteins with GTP. RGSs inhibit signal transduction by increasing the GTPase activity of G protein alpha subunits, thereby driving them into their inactive GDP-bound form. This activity defines them as GTPase activating proteins (GAPs). RGS5 may have a role in pericyte (peri-endothelial cell) and vascular smooth muscle cell development and function [ , , ]. It is also expressed in the heart, where it is upregulated by beta-adrenergic mediated induction [].
Protein Domain
Name: ELO family, conserved site
Type: Conserved_site
Description: This entry represents a conserved site found in the ELO (Elongation of fatty acids protein) family members, such as: Mammalian proteins ELOVL1 (Elongation of very long chain fatty acids protein 1) to ELOVL4 [ ]. These proteins all seem to be involved in the synthesis of very long chain fatty acids. Yeast ELO1, ELO2 and ELO3 [ ]. They seem to be components of membrane-bound fatty acid elongation systems. Caenorhabditis elegans hypothetical protein C40H1.4. Caenorhabditis elegans hypothetical protein D2024.3. The proteins have from 271 to 435 amino acid residues. Structurally, they seem to be formed of three sections: an N-terminal region with two transmembrane domains, a central hydrophilic loop and a C-terminal region that contains from one to three transmembrane domains. This entry represents a conserved region that contains three histidines. This region is located in the hydrophilic loop.
Protein Domain
Name: Zinc finger, DksA/TraR C4-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog), however they are now recognised to bind DNA, RNA, protein and/or lipid substrates [, , , , ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents domains identified in zinc finger-containing members of the DksA/TraR family. DksA is a critical component of the rRNA transcription initiation machinery that potentiates the regulation of rRNA promoters by ppGpp and the initiating NTP. In delta-dksA mutants, rRNA promoters are unresponsive to changes in amino acid availability, growth rate, or growth phase. 
In vitro, DksA binds to RNAP, reduces open complex lifetime, inhibits rRNA promoter activity, and amplifies effects of ppGpp and the initiating NTP on rRNA transcription [ , ]. The dksA gene product suppresses the temperature-sensitive growth and filamentation of a dnaK deletion mutant of Escherichia coli. Gene knockout [] and deletion [] experiments have shown the gene to be non-essential, mutations causing a mild sensitivity to UV light, but not affecting DNA recombination []. In Pseudomonas aeruginosa, dksA is a novel regulator involved in the post-transcriptional control of extracellular virulence factor production [ ]. The proteins contain a C-terminal region thought to fold into a 4-cysteine zinc finger. Other proteins found to contain a similar zinc finger domain include: the traR gene products encoded on the E. coli F and R100 plasmids [ , ]; the traR gene products encoded on Salmonella spp. plasmids pED208 and pSLT; the dnaK suppressor; hypothetical proteins from bacteria and bacteriophage; FHL4, LIM proteins from Homo sapiens (Human) and Mus musculus (Mouse) [ ].
Protein Domain
Name: Anion exchange, conserved site
Type: Conserved_site
Description: Bicarbonate (HCO3-) transport mechanisms are the principal regulators of pH in animal cells. Such transport also plays a vital role in acid-base movements in the stomach, pancreas, intestine, kidney, reproductive organs and the central nervous system. Functional studies have suggested four different HCO3- transport modes. Anion exchanger proteins exchange HCO3- for Cl- in a reversible, electroneutral manner [ ]. Na+/HCO3- co-transport proteins mediate the coupled movement of Na+ and HCO3- across plasma membranes, often in an electrogenic manner [ ]. Na+-driven Cl-/HCO3- exchange and K+/HCO3- exchange activities have also been detected in certain cell types, although the molecular identities of the proteins responsible remain to be determined. Sequence analysis of the two families of HCO3- transporters that have been cloned to date (the anion exchangers and Na+/HCO3- co-transporters) reveals that they are homologous. This is not entirely unexpected, given that they both transport HCO3- and are inhibited by a class of pharmacological agents called disulphonic stilbenes [ ]. They share ~25-30% sequence identity, which is distributed along their entire sequence length, and have similar predicted membrane topologies, suggesting they have ~10 transmembrane (TM) domains. Anion exchange proteins participate in pH and cell volume regulation. They are glycosylated, plasma-membrane transport proteins that exchange hydrogen carbonate (HCO3-) for chloride (Cl-) in a reversible, electroneutral manner [, ]. To date three anion exchanger isoforms have been identified (AE1-3), AE1 being the previously-characterised erythrocyte band 3 protein. They share a predicted topology of 12-14 transmembrane (TM) domains, but have differing distribution patterns and cellular localisation. The best characterised isoform, AE1, is known to be the most abundant membrane protein in mature erythrocytes. It has a molecular mass of ~95kDa and consists of two major domains.
The N-terminal 390 residues form a water-soluble, highly elongated domain that serves as an attachment site for the binding of the membrane skeleton and other cytoplasmic proteins. The remainder of the protein is a 55kDa hydrophobic domain that is responsible for catalysing anion exchange. The function of the analogous domains of AE2 and AE3 remains to be determined [ ]. Naturally-occurring mutations have been characterised in the AE1 gene, which give rise to forms of several human diseases: included are spherocytosis, affecting red blood cells, and familial distal renal tubular acidosis, a kidney disease associated with the formation of kidney stones []. This entry represents two conserved small regions of approximately 11 and 14 residues found in AE1-3 mostly from chordates. The first region contains four clustered positively charged residues and is located just before the first transmembrane segment from the integral domain. The second region contains a lysine, which is the covalent binding site for the isothiocyanate group of DIDS, an inhibitor of anion exchange.
Protein Domain
Name: rRNA small subunit methyltransferase B, enterobacteriaceae
Type: Family
Description: RNA (C5-cytosine) methyltransferases (RCMTs) catalyse the transfer of a methyl group to the 5th carbon of a cytosine base in RNA sequences to produce C5-methylcytosine. RCMTs use the cofactor S-adenosyl-L-methionine (SAM) as a methyl donor [ ]. The catalytic mechanism of RCMTs involves an attack by the thiolate of a Cys residue on position 6 of the target cytosine base to form a covalent link, thereby activating C5 for methyl-group transfer. Following the addition of the methyl group, a second Cys residue acts as a general base in the beta-elimination of the proton from the methylated cytosine ring. The free enzyme is restored and the methylated product is released []. Numerous putative RCMTs have been identified in archaea, bacteria and eukaryota [ , ]; most are predicted to be nuclear or nucleolar proteins []. The Escherichia coli Ribosomal RNA Small-subunit Methyltransferase Beta (RSMB) FMU (FirMicUtes) represents the first protein identified and characterised as a cytosine-specific RNA methyltransferase. RSMB was reported to catalyse the formation of C5-methylcytosine at position 967 of 16S rRNA [, ]. A classification of RCMTs has been proposed on the basis of sequence similarity [ ]. According to this classification, RCMTs are divided into 8 distinct subfamilies []. Recently, a new RCMT subfamily, termed RCMT9, was identified []. Members of the RCMT family contain a core domain, responsible for the cytosine-specific RNA methyltransferase activity. This 'catalytic' domain adopts the Rossmann fold for the accommodation of the cofactor SAM []. The RCMT subfamilies are also distinguished by N-terminal and C-terminal extensions, variable both in size and sequence []. The rRNA small subunit methyltransferase B (RsmB) protein, often referred to as Fmu, has been demonstrated to methylate only C967 of the 16S ribosomal RNA and to produce only m5C at that position [ ]. The structure of the E. coli protein has been determined [].
It contains three subdomains which share structural homology to DNA m5C methyltransferases and two RNA binding protein families. The N-terminal sequence shares homology to another (noncatalytic) RNA binding protein, e.g. the ribosomal RNA antiterminator protein NusB. The catalytic lobe of the N1 domain comprises the conserved core identified in all of the putative RNA m5C MTase sequences. Although the N1 domain is structurally homologous to known RNA binding proteins, there is no clear sequence motif that defines its role in RNA binding and recognition. At the functional centre of the catalytic lobe is the MTase domain of Fmu (residues 232-429), which adopts a fold typical of known AdoMet-dependent methyltransferases. In spite of the lack of a conserved RNA binding motif in the N1 domain, the close association of the N1 and MTase domains suggests that any RNA bound in the active site of the MTase domain is likely to interact with the N1 domain. This entry is specific for the enterobacterial RsmB proteins.
Protein Domain
Name: Dynamin-type guanine nucleotide-binding (G) domain
Type: Domain
Description: This entry represents the dynamin-type guanine nucleotide-binding (G) domain. Members of the dynamin GTPase family appear to be ubiquitous. They catalyze diverse membrane remodelling events in endocytosis, cell division, and plastid maintenance. Their functional versatility also extends to other core cellular processes, such as maintenance of cell shape or centrosome cohesion. Members of the dynamin family are characterised by their common structure and by conserved sequences in the GTP-binding domain. The minimal distinguishing architectural features that are common to all dynamins and are distinct from other GTPases are the structure of the large GTPase domain (~280 amino acids) and the presence of two additional domains: the middle domain and the GTPase effector domain (GED), which are involved in oligomerization and regulation of the GTPase activity. In many dynamin family members, the basic set of domains is supplemented by targeting domains, such as: pleckstrin-homology (PH) domain, proline-rich domains (PRDs), or by sequences that target dynamins to specific organelles, such as mitochondria and chloroplasts [ , , ]. The dynamin-type G domain consists of a central eight-stranded β-sheet surrounded by seven alpha helices and two one-turn helices. It contains the five canonical guanine nucleotide binding motifs (G1-5). The P-loop (G1) motif (GxxxxGKS/T) is also present in ATPases (Walker A motif) and functions as a coordinator of the phosphate groups of the bound nucleotide. A conserved threonine in switch-I (G2) and the conserved residues DxxG of switch-II (G3) are involved in Mg(2+) binding and GTP hydrolysis. The nucleotide binding affinity of dynamins is typically low, with specificity for GTP provided by the mostly conserved N/TKxD motif (G4). The G5 or G-cap motif is involved in binding the ribose moiety [, , ]. Some proteins containing a dynamin-type G domain are listed below [ , ]: Animal dynamin, the prototype for this family.
The role of dynamin in endocytosis is well established. Additional roles were proposed in vesicle budding from the trans-Golgi network (TGN) and the budding of caveolae from the plasma membrane []. Vertebrate Mx proteins, a group of interferon (IFN)-induced GTPases involved in the control of intracellular pathogens [, ]. Eukaryotic Drp1 (Dnm1 in yeast) mediates mitochondrial and peroxisomal fission. Eukaryotic Eps15 homology (EH)-domain-containing proteins (EHDs), ATPases implicated in clathrin-independent endocytosis and recycling from endosomes. The dynamin-type G domains of EHDs bind to adenine rather than to guanine nucleotide [, ]. Yeast to human OPA1/Mgm1 proteins. They are found between the inner and outer mitochondrial membranes and are involved in mitochondrial fusion. Yeast to human mitofusin/fuzzy onions 1 (Fzo1) proteins, involved in mitochondrial dynamics [, ]. Yeast vacuolar protein sorting-associated protein 1 (Vps1), involved in vesicle trafficking from the Golgi. Escherichia coli clamp-binding protein CrfC (or Yjda), important for the colocalization of sister nascent DNA strands after replication fork passage during DNA replication, and for positioning and subsequent partitioning of sister chromosomes []. Nostoc punctiforme bacterial dynamin-like protein (BDLP) [ , ].
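The G1, G3 and G4 motifs quoted in the description above are short sequence patterns, so they can be scanned for with regular expressions. The following is a minimal sketch, not the method used to build this entry: the function name `find_g_motifs`, the motif labels and the toy sequence are all hypothetical, and "x" is assumed to mean any single residue. Patterns this short also match by chance, so a hit on its own does not identify a dynamin-type G domain.

```python
import re

# Motif patterns transcribed from the description above; "x" is taken to mean
# any residue. The dictionary keys are illustrative labels only.
G_MOTIFS = {
    "G1 (P-loop, GxxxxGKS/T)": r"G.{4}GK[ST]",
    "G3 (switch-II, DxxG)": r"D.{2}G",
    "G4 (N/TKxD)": r"[NT]K.D",
}


def find_g_motifs(sequence):
    """Return the 0-based start positions of each motif pattern in a protein sequence."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in G_MOTIFS.items():
        # Wrapping the pattern in a lookahead also reports overlapping matches.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Toy sequence invented for illustration; it is not a real dynamin.
toy_sequence = "MAGQSSAGKSVLEDAAGNTKLDT"
for motif_name, starts in find_g_motifs(toy_sequence).items():
    print(motif_name, starts)
```

The G2 and G5 motifs are left out because the description gives no explicit sequence pattern for them.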
Protein Domain
Name: Hypoxia induced protein, domain
Type: Domain
Description: The hypoxia induced gene 1 (HIG1) or hypoglycemia/hypoxia inducible mitochondrial protein (HIMP1) is up-regulated by stresses of the microenvironment such as low oxygen or low glucose conditions. HIG1 is a mitochondrial inner membrane protein, which is ubiquitously expressed. It is predicted to be an integral membrane protein consisting of two hydrophobic helices, 21-23 residues in length, that might tend to form a hairpin-like loop across the bilayer. HIG1 could be involved in apoptotic or cytoprotective signals. HIG1 is a member of a well conserved eukaryote protein family. The predicted transmembrane helix (TMH) and loop regions represent the most highly conserved regions in these proteins [, ]. The profile we developed covers the predicted TMH and loop regions. This domain is found in proteins thought to be involved in the response to hypoxia [ ]. It is also found in altered inheritance of mitochondria proteins.
Protein Domain
Name: TRAP transporter solute receptor DctP
Type: Family
Description: The tripartite ATP-independent periplasmic (TRAP) transporters are substrate-binding protein (SBP)-dependent secondary transporters found in prokaryotes. They consist of an SBP of the DctP or TAXI families and two integral membrane proteins that form the DctQ and DctM protein families [ ]. This entry represents the DctP family of the substrate-binding proteins. They are part of the DctP-TRAP transporter involved in binding extracellular solutes for transport across the bacterial cytoplasmic membrane. Proteins in this family include DctP from R. capsulatus, SiaP from Haemophilus influenzae [], DctB from Bacillus subtilis [], and TeaA from Halomonas elongata []. The structure of the SiaP receptor has revealed an overall topology similar to ATP binding cassette ESR (extracytoplasmic solute receptors) proteins []. Upon binding of sialic acid, SiaP undergoes domain closure about a hinge region and kinking of an α-helix hinge component [].
Protein Domain
Name: Snx41/Atg20, PX domain
Type: Domain
Description: The Phox Homology (PX) domain is a phosphoinositide (PI) binding module present in many proteins with diverse functions. Sorting nexins (SNXs) make up the largest group among PX domain containing proteins. They are involved in regulating membrane traffic and protein sorting in the endosomal system. The PX domain of SNXs binds PIs and targets the protein to PI-enriched membranes [, ]. SNXs differ from each other in PI-binding specificity and affinity, and the presence of other protein-protein interaction domains, which help determine subcellular localization and specific function in the endocytic pathway [, , ]. This entry represents the PX domain found in a group of fungal sorting nexins, including Snx41 and Snx42 (also called Atg20). SNX41 and SNX42 form dimers with SNX4, and are required in protein recycling from the sorting endosome (post-Golgi endosome) back to the late Golgi in yeast [ ].
Protein Domain
Name: MPP2, SH3 domain
Type: Domain
Description: This entry represents the SH3 domain of MPP2. MPP2 is a scaffolding protein that interacts with the non-receptor tyrosine kinase c-Src in epithelial cells to negatively regulate its activity and morphological function [ ]. MPP2 belongs to the membrane-associated guanylate kinase (MAGUK) p55 subfamily. The membrane-associated guanylate kinase (MAGUK) p55 subfamily (also known as MPP subfamily) members include the Drosophila Stardust protein and its vertebrate homologues, MPP1-7. They contain the core of three domains characteristic of MAGUK (membrane-associated guanylate kinase) proteins: PDZ, SH3, and guanylate kinase (GuK). In addition, they also contain the Hook (Protein 4.1 Binding) motif in between the SH3 and GuK domains [ ]. MPP2-7 have two additional L27 domains at their N terminus. The GuK domain in MAGUK proteins is enzymatically inactive; instead, the domain mediates protein-protein interactions and associates intramolecularly with the SH3 domain [].
Protein Domain
Name: CXC chemokine, conserved site
Type: Conserved_site
Description: Most members of this family of low-molecular weight proteins seem to have mitogenic, chemotactic or inflammatory activities. They are released by phagocytes, mesenchymal cells and a wide variety of tissue cells, upon exposure to inflammation []. These small cytokines are also called intercrines or chemokines. They are cationic proteins of 70 to 100 amino acid residues that share four conserved cysteine residues involved in two disulphide bonds. The family can be split into two groups, depending on the spacing of the two N-terminal Cys residues: in one group (CxC), the cysteines are separated by a single amino acid; in the second (C-C), they are adjacent []. The CxC group includes such factors as interleukin-8, platelet factor 4, melanoma growth stimulatory activity protein, macrophage inflammatory protein 2, platelet basic protein, and several others. The C-C group includes the monocyte chemotactic proteins and macrophage inflammatory proteins, amongst others.
Protein Domain
Name: NolX
Type: Family
Description: This family consists of Rhizobium NolX and Xanthomonas HrpF proteins. The interaction between the plant pathogen Xanthomonas campestris pv. vesicatoria (strain 85-10) and its host plants is controlled by hrp genes (hypersensitive reaction and pathogenicity), which encode a type III protein secretion system. Among type III-secreted proteins are avirulence proteins, effectors involved in the induction of plant defence reactions. HrpF, which is dispensable for protein secretion but required for AvrBs3 recognition in planta, is thought to function as a translocator of effector proteins into the host cell [ ]. NolX, a Glycine max (Soybean) cultivar specificity protein, is secreted by a type III secretion system (TTSS) and shows homology to HrpF. It is not known whether NolX functions at the bacterium-plant interface or acts inside the host cell. NolX is expressed in planta only during the early stages of nodule development [].
Protein Domain
Name: Haemagglutinin-neuraminidase
Type: Family
Description: This entry represents the haemagglutinin-neuraminidase (HN) glycoprotein found in a variety of paramyxoviruses (negative-stranded RNA viruses), including Mumps virus, Human parainfluenza virus 3, and the avian pathogen Newcastle disease virus. Some paramyxoviruses have two surface glycoproteins, HN and a fusion protein (F). HN is a multi-functional protein with three distinct functions: a receptor-binding (haemagglutinin) activity, a receptor-destroying (neuraminidase) activity, and a membrane fusion activity that fuses the viral envelope to the host cell membrane in order to infect the cell. The fusion activity involves an interaction between HN and the fusion protein. In other viruses, such as influenza A and B viruses, haemagglutinin and neuraminidase occur as separate glycoproteins. The haemagglutinin-neuraminidase glycoprotein has a six-bladed β-propeller structure, and bears structural similarity to influenza A and B virus neuraminidase, bacterial neuraminidase, trypanosomal neuraminidase and trans-sialidase [ , ].
Protein Domain
Name: Tic22-like
Type: Family
Description: Chloroplast function requires the import of nuclear encoded proteins from the cytoplasm across the chloroplast double membrane. This is accomplished by two protein complexes, the Toc complex located at the outer membrane and the Tic complex located at the inner membrane [ ]. The Toc complex recognises specific proteins by a cleavable N-terminal sequence and is primarily responsible for translocation through the outer membrane, while the Tic complex translocates the protein through the inner membrane. This entry represents Tic22, a core member of the Tic complex. Tic22 is a soluble protein and the only Tic complex component of the intermembrane space that has been shown to interact with preproteins during import. It may also act as a chaperone that prevents misfolding or missorting of the preprotein to the intermembrane space, and may serve as a component that links the Toc and Tic complexes [ ].
Protein Domain
Name: Ras GTPase-activating domain
Type: Domain
Description: This entry represents a conserved domain in the RasGAPs (Ras GTPase-activating proteins). This domain is also known as the RasGAP domain. Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyze GTP to GDP []. This intrinsic GTPase activity of Ras is regulated by a family of proteins collectively known as 'GAP' or GTPase-activating proteins [, ]. RasGAP proteins are usually quite large (from 765 residues for sar1 to 3079 residues for IRA2) but share only a limited (about 250 residues) region of sequence similarity, referred to as the 'catalytic domain' or RasGAP domain. The most conserved region within this domain contains a 15 residue motif which seems to be characteristic of this family of proteins []. Note: There are distinctly different GAPs for the Rap and Rho/Rac subfamilies of Ras-like proteins (reviewed in reference [ ]) that do not share sequence similarity with RasGAPs.
Protein Domain
Name: SUZ-C domain
Type: Domain
Description: The SUZ-C domain is a conserved motif found in one or more copies in several RNA-binding proteins [ ]. It is always found at the C terminus of the protein and appears to be required for localization of the protein to specific subcellular structures. The domain was first characterised in the C. elegans protein SZY-20, a centrosome-associated RNA-binding protein that negatively regulates centrosome assembly. SZY-20 and its animal orthologs share three prominent blocks of conservation. The N-terminal block is a short element, which is found exclusively in orthologs of SZY-20. The third block, the SUZ-C domain, is at the extreme C terminus and is defined by a characteristic pattern of two highly conserved glycines and one absolutely conserved proline. The SUZ and SUZ-C domains occur independently in proteins outside of the SZY-20 family. Many of these proteins contain known RNA-binding domains. The SUZ-C domain is a putative RNA-binding domain [ , ].
Protein Domain
Name: Phlebovirus glycoprotein G2, fusion domain
Type: Domain
Description: Heartland virus (HRTV) is a tick-borne virus found in America that can infect humans. It belongs to the newly defined family Phenuiviridae, order Bunyavirales. HRTV contains three single-stranded RNA segments (L, M, and S). The M segment of the virus encodes a polyprotein precursor that is cleaved into two glycoproteins, Gn and Gc. Gc is a fusion protein facilitating virus entry into host cells [ ]. G2 (also known as Gc) is necessary for optimal expression of glycoprotein G1 (also known as Gn) and for efficient production of virus-like particles, and possibly for cell infection, as G2 is a determinant for cell fusion []. Phleboviral G2 fusion glycoprotein is both functionally and structurally analogous to the fusion glycoproteins of alphaviruses and flaviviruses, all of which recognize the DC-SIGN receptor to enable viral attachment []. This domain consists of the N-terminal and middle domains, known as the fusion domain, of several Phlebovirus glycoprotein G2 sequences.
Protein Domain
Name: LARG, RGS domain
Type: Domain
Description: The RGS domain is an essential part of the leukemia-associated RhoGEF protein (LARG, also known as Rho guanine nucleotide exchange factor 12), a member of the RhoGEF subfamily of the RGS protein family. The RhoGEFs are peripheral membrane proteins that regulate essential cellular processes, including cell shape, cell migration, cell cycle progression, and gene transcription, by linking signals from heterotrimeric G-alpha12/13 protein-coupled receptors to Rho GTPase activation, leading to various cellular responses, such as actin reorganization and gene expression [, ]. The RhoGEF subfamily includes p115RhoGEF, LARG, PDZ-RhoGEF, and its rat specific splice variant GTRAP48. The RGS domain of RhoGEFs has very little sequence similarity with the canonical RGS domain of the RGS proteins and is often referred to as the RH (RGS Homology) domain [ ]. In addition to being a G-alpha13 effector, the LARG protein also functions as a GTPase-activating protein (GAP) [].
Protein Domain
Name: Neocarzinostatin-like
Type: Homologous_superfamily
Description: Neocarzinostatin (NCS) is a member of the macromolecular antitumour antibiotic family. NCS consists of two components, a polypeptide (apo-NCS) and a nonprotein chromophore. The crystal structure of NCS is similar to that of the related chromoproteins actinoxanthin and macromomycin [ ]. The chromophore displays an unusual bicyclic dienediyne structure. The chromoprotein intercalates into the DNA, where its cycloaromatisation produces a biradical intermediate that has the ability to abstract hydrogens from the sugar moiety of DNA. This causes single- and double-strand breaks in the DNA [ ]. In addition to their ability to cleave DNA at sites specific for each chromophore, results indicate that these chromoproteins also possess proteolytic activity against histones, with histone H1 as the preferred substrate []. This entry represents a domain found in antitumour antibiotic chromoproteins from the neocarzinostatin family [ ] and in the thiamine biosynthesis protein X.
Protein Domain
Name: P22 tailspike C-terminal domain
Type: Domain
Description: The tailspike protein of Salmonella bacteriophage P22 is a viral adhesion protein that mediates attachment of the viral protein to host cell-surface lipopolysaccharide. The tailspike protein displays both receptor-binding and receptor-destroying properties, inactivating the receptor by endoglycosidase activity. P22 tailspike is a homotrimer composed of 666-amino-acid polypeptide chains. It consists of three main structural elements: the head-binding domain at the N terminus, the β-helix in the centre of the protein, and the β-prism and caudal fin at the C terminus [ ]. The P22 tailspike protein contains similar structural elements to pectin lyase []. The binding of the disaccharide product occurs within a positively charged cleft formed by loops extending from the surface of the β-helix structure. Amino acid residues responsible for recognition of the disaccharide, as well as potential catalytic residues, have been identified []. This entry represents the C-terminal domain of the tailspike protein.
Protein Domain
Name: CXC chemokine
Type: Family
Description: Most members of this family of low-molecular weight proteins seem to have mitogenic, chemotactic or inflammatory activities. They are released by phagocytes, mesenchymal cells and a wide variety of tissue cells, upon exposure to inflammation []. These small cytokines are also called intercrines or chemokines. They are cationic proteins of 70 to 100 amino acid residues that share four conserved cysteine residues involved in two disulphide bonds. The family can be split into two groups, depending on the spacing of the two N-terminal Cys residues: in one group (CxC), the cysteines are separated by a single amino acid; in the second (C-C), they are adjacent []. The CxC group includes such factors as interleukin-8, platelet factor 4, melanoma growth stimulatory activity protein, macrophage inflammatory protein 2, platelet basic protein, and several others. The C-C group includes the monocyte chemotactic proteins and macrophage inflammatory proteins, amongst others. This entry represents the CxC family of chemokines.
Protein Domain
Name: Ras GTPase-activating protein, conserved site
Type: Conserved_site
Description: This entry represents a conserved site in the RasGAPs (Ras GTPase-activating proteins). Ras proteins are membrane-associated molecular switches that bind GTP and GDP and slowly hydrolyze GTP to GDP [ ]. This intrinsic GTPase activity of Ras is regulated by a family of proteins collectively known as 'GAP' or GTPase-activating proteins [, ]. The Ras GTPase-activating proteins are quite large (from 765 residues for sar1 to 3079 residues for IRA2) but share only a limited (about 250 residues) region of sequence similarity, referred to as the 'catalytic domain' or RasGAP domain. The most conserved region within this domain contains a 15 residue motif which seems to be characteristic of this family of proteins []. This entry represents this conserved site. Note: There are distinctly different GAPs for the Rap and Rho/Rac subfamilies of Ras-like proteins (reviewed in reference [ ]) that do not share sequence similarity with RasGAPs.
Protein Domain
Name: Calsyntenin, C-terminal
Type: Domain
Description: This entry represents the cytoplasmic C-terminal domain of calsyntenin (CLSTN) proteins 1, 2 and 3 (also known as alcadein-alpha, beta and gamma). These are postsynaptic Ca2+-binding proteins and evolutionarily conserved type I membrane proteins. CLSTN (also known as Alc) forms a tripartite complex with APP (amyloid beta-protein precursor) and X11L (a neuron-specific adaptor protein) in brain; this complex stabilizes both APP and CLSTN proteins metabolically. Deficiencies in the X11L-mediated interaction between CLSTN and APP and/or CTFbeta enhance the production of amyloid beta-protein and have been linked to the development or progression of Alzheimer's disease [ ]. Moreover, CLSTN1 strongly associates with kinesin-1 light chains (KLC1) and acts as a cargo that regulates kinesin-1 function. It may affect the transport of APP-containing vesicles by kinesin-1 []. This domain includes the WD motifs required for KLC1 interaction (although one WD motif is sufficient), and the NP motif [].
Protein Domain
Name: Chondromodulin/Tenomodulin
Type: Family
Description: This entry includes chondromodulin (CNMD, also known as Leukocyte cell-derived chemotaxin 1) and chondromodulin-like protein (TNMD, Tenomodulin). CNMD is a transmembrane glycoprotein with two cleaved portions: the N-terminal portion contains a surfactant protein referred to as chondrosurfactant protein, and the C-terminal portion of the precursor protein is known as Chm-1. Mature Chm-1 is cleaved from pre-Chm-1 at a furin-like processing site and can then be secreted into the extracellular matrix. Chm-1 is involved in the regulation of tissue angiogenesis, cartilage development and homeostasis. It has also been linked to the onset and progression of diseases, such as osteoarthritis, infective endocarditis, and cancer [ ]. Chondromodulin-like protein, also known as Tenomodulin or TNMD, shares protein sequence similarity with pre-Chm-1. However, TNMD is located on the cell surface. It is expressed in dense hypovascular connective tissue. It is involved in cell adhesion, determination of cell morphology, cell aging and bone mineral density [ ].
Protein Domain
Name: Oberon, PHD finger domain
Type: Domain
Description: This entry represents a plant homeodomain (PHD) finger domain of Oberon proteins from plants. Oberon is necessary for maintenance and/or establishment of both the shoot and root apical meristems in Arabidopsis. Oberon proteins are made up of a PHD finger domain and a coiled-coil domain. The PHD-finger domain is found in a wide variety of proteins involved in the regulation of chromatin structure [ ]. Oberon proteins mediate the expression of TMO7, a direct target of Monopteros (MP), through modification of, or binding to, chromatin at the TMO7 locus. TMO7 stands for Target of Monopteros 7; MP is also known as Auxin response factor 5 []. This domain can also be found in protein VERNALIZATION INSENSITIVE 3 (VIN3) and related proteins. VIN3 is involved in both the vernalization and photoperiod pathways in Arabidopsis [ ]. VIN3-like protein 1 (also known as VRN5) functions in the epigenetic silencing of Arabidopsis FLOWERING LOCUS C [].
Protein Domain
Name: VAMP5, SNARE motif
Type: Domain
Description: VAMP-5 (vesicle-associated membrane protein 5) belongs to the R-SNARE subgroup of SNAREs [ ]. SNARE (soluble N-ethylmaleimide-sensitive factor attachment protein receptor) proteins contain coiled-coil helices (called SNARE motifs), which mediate the interactions between SNARE proteins, and a transmembrane domain [, ]. The SNARE complex mediates membrane fusion, important for trafficking of newly synthesized proteins, recycling of pre-existing proteins and organelle formation. SNARE proteins are classified into four groups, Qa-, Qb-, Qc- and R-SNAREs, depending on whether the residue in the hydrophilic centre layer of the four-helical bundle is a glutamine (Q) or arginine (R). Qa-, as well as Qb- and Qc-SNAREs, are localized to target organelle membranes, while R-SNAREs are localized to vesicle membranes. They form unique complexes, consisting of one member of each subgroup, that mediate fusion between a specific type of vesicle and its target organelle. Their SNARE motifs form twisted and parallel heterotetrameric helix bundles [ ].
Protein Domain
Name: Gamma-glutamyl cyclotransferase-like
Type: Domain
Description: Gamma-glutamyl cyclotransferase (GGCT) catalyzes the formation of pyroglutamic acid (5-oxoproline) from dipeptides containing gamma-glutamyl, and is a dimeric protein. In Homo sapiens, the protein is encoded by the gene C7orf24 [ ]. The enzyme participates in the gamma-glutamyl cycle, which plays a pre-eminent role in glutathione homoeostasis []. The synthesis and metabolism of glutathione (L-gamma-glutamyl-L-cysteinylglycine) ties the gamma-glutamyl cycle to numerous cellular processes; glutathione acts as a ubiquitous reducing agent in reductive mechanisms involved in protein and DNA synthesis, transport processes, enzyme activity, and metabolism. AIG2 (avrRpt2-induced gene) is an Arabidopsis protein that exhibits RPS2- and avrRpt2-dependent induction early after infection with Pseudomonas syringae pv maculicola strain ES4326 carrying avrRpt2 [ , ]. The avirulence gene avrRpt2 can convert virulent strains of P. syringae to avirulence on Arabidopsis thaliana, soybean, and bean. This GGCT-like domain is also found in bacterial tellurite-resistance proteins (TrgB) [], butirosin biosynthesis protein BtrG [], and in proteins described as ChaC, involved in a novel pathway for glutathione degradation [].
Protein Domain
Name: Germin, manganese binding site
Type: Binding_site
Description: Germins and germin-like proteins [ ] are a family of hexameric ubiquitous plant glycoproteins. They are not restricted to germinating grains, as was initially thought (hence the name 'germins'), but exist in all organs and developmental stages. They are all partly associated with the extracellular matrix. A wide range of functions has been uncovered for germins and germin-like proteins: some act as oxalate oxidases or as superoxide dismutases, while others seem to be structural proteins or receptors for auxins or other proteins. Germins and germin-like proteins are highly similar to slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation, a process induced by various forms of environmental stress which leads to encystment and dormancy. The signature pattern for this entry is located in the central region and contains three residues, two histidines and a glutamate, which are implicated in the binding of a manganese ion [ ].
Protein Domain
Name: Germin
Type: Family
Description: Germins (also known as Oxalate oxidases) and germin-like proteins [ ] are a family of hexameric ubiquitous plant glycoproteins. They are not restricted to germinating grains, as was initially thought (hence the name 'germins'), but exist in all organs and developmental stages. All are at least partly associated with the extracellular matrix. A wide range of functions has been uncovered for germins and germin-like proteins: some act as oxalate oxidases or as superoxide dismutases, while others seem to be structural proteins or receptors for auxins or other proteins. Germin catalyses the manganese-dependent oxidative decarboxylation of oxalate to carbon dioxide and hydrogen peroxide (H2O2) [ , , ]. It is widespread in fungi and various plant tissues and may play a role in plant signaling and defense. Germins and germin-like proteins are highly similar to slime mold spherulins 1a and 1b, which are proteins that accumulate specifically during spherulation, a process induced by various forms of environmental stress which leads to encystment and dormancy.
Protein Domain
Name: PAS fold
Type: Domain
Description: PAS domains are found in many signalling proteins, where they are used as a signal sensor domain [ ]. PAS domains appear in archaea, bacteria and eukaryotes. Several PAS-domain proteins are known to detect their signal by way of an associated cofactor. Heme, flavin, and a 4-hydroxycinnamyl chromophore are used in different proteins. The PAS domain was named after three proteins that it occurs in: Per (period circadian protein), Arnt (Ah receptor nuclear translocator protein) and Sim (single-minded protein). PAS domains are often associated with PAC domains. It appears that these domains are directly linked, and that together they form the conserved 3D PAS fold. The division between the PAS and PAC domains is caused by major differences in sequences in the region connecting these two motifs [ ]. In human PAS kinase, this region has been shown to be very flexible, and adopts different conformations depending on the bound ligand []. Probably the most surprising identification of a PAS domain was that in EAG-like K-channels [ ].
Protein Domain
Name: Cornichon
Type: Family
Description: This entry represents a group of conserved transmembrane proteins found in fungi, plants and animals. Proteins in this entry include budding yeast Erv14/15, Drosophila Cornichon and human CNIH1/2/3/4. The Drosophila cornichon protein (gene: cni), the founding member of this family, is an integral component of the COPII-coated vesicles that mediate cargo export from the yeast endoplasmic reticulum (ER) [ ]. It is required in the germline for dorsal-ventral signalling. Dorsal-ventral pattern formation involves a reorganisation of the microtubule network correlated with the movement of the oocyte nucleus, and depends on the initial correct establishment of the anterior-posterior axis via a signal from the oocyte produced by cornichon and gurken and received by torpedo protein in the follicle cells []. Erv14 is a COPII-coated vesicle protein involved in vesicle formation and incorporation of specific secretory cargo. It is required for axial budding [ , ]. CNIH1 is involved in the selective transport and maturation of TGF-alpha family proteins [ ].
Protein Domain
Name: EV matrix domain superfamily
Type: Homologous_superfamily
Description: Ebola virus sp. are non-segmented, negative-strand RNA viruses that cause severe haemorrhagic fever in humans with high rates of mortality. The virus matrix protein VP40 is a major structural protein that plays a central role in virus assembly and budding at the plasma membrane of infected cells. VP40 proteins associate with cellular membranes, interact with the cytoplasmic tails of glycoproteins, and bind to the ribonucleoprotein complex. The VP40 monomer consists of two domains, the N-terminal oligomerization domain and the C-terminal membrane-binding domain, connected by a flexible linker. Both the N- and C-terminal domains fold into beta sandwich structures of similar topology [ ]. Within the N-terminal domain are two overlapping L-domains with the sequences PTAP and PPEY at residues 7 to 13, which are required for efficient budding []. L-domains are thought to mediate their function in budding through their interaction with specific host cellular proteins, such as tsg101 and vps-4 []. This entry represents the N- and C-terminal domains of the VP40 matrix protein.
Protein Domain
Name: Hantavirus glycoprotein Gc
Type: Domain
Description: The medium (M) genome segment of Hantaviruses (family Bunyaviridae) encodes the two virion glycoproteins [ ], Gn and Gc (also known as G1 and G2, respectively), as a polyprotein precursor [, ]. Gn and Gc form homotetramers at the surface of the virion, which attach the virion to host cell receptors including integrin beta3/ITGB3 and induce internalization, predominantly through clathrin-dependent endocytosis [, ]. This entry represents the polyprotein region which forms the Gc glycoprotein. It has been shown that the N-terminal region of glycoprotein Gc has the conserved CNP motif, suggested to be an integrin-binding motif [ ]. Gc protein has a typical class II fusion protein fold consisting of a central β-sandwich domain (termed domain I) made of eight β-strands arranged in two antiparallel β-sheets, domain II, which has an elongated shape with two subdomains (a central, opened β-barrel proximal to domain I and a distal β-sandwich 'tip'), and domain III, which has an Ig-like fold [, ].
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom