Search our database by keyword

Examples

  • Search this entire website. Enter identifiers, names or keywords for genes, pathways, authors, ontology terms, etc. (e.g. eve, embryo, zen, allele)
  • Use OR to search for either of two terms (e.g. fly OR drosophila) or quotation marks to search for phrases (e.g. "dna binding").
  • Boolean search syntax is supported: e.g. dros* for partial matches or fly AND NOT embryo to exclude a term
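
The query forms listed above can be assembled programmatically. The following is only a minimal sketch: the helper names (`phrase`, `any_of`, `prefix`, `excluding`) are hypothetical and not part of this site, and the sketch only builds the query strings shown in the examples; it makes no assumption about how the search engine parses them.

```python
# Hypothetical helpers that build query strings in the syntax described above.
# They do not contact any server; they only format text.

def phrase(text: str) -> str:
    """Wrap a multi-word phrase in quotation marks, e.g. "dna binding"."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Join terms with OR so that a match on any one term is accepted."""
    return " OR ".join(terms)


def prefix(stem: str) -> str:
    """Append * to request partial matches, e.g. dros*."""
    return f"{stem}*"


def excluding(term: str, unwanted: str) -> str:
    """Require one term while excluding another with AND NOT."""
    return f"{term} AND NOT {unwanted}"


if __name__ == "__main__":
    for query in (
        any_of("fly", "drosophila"),
        phrase("dna binding"),
        prefix("dros"),
        excluding("fly", "embryo"),
    ):
        print(query)
```

Each helper returns a plain string that can be pasted into the search box.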

Search results 13901 to 14000 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: Ribonucleotide reductase regulator NrdR-like
Type: Family
Description: Ribonucleotide reductases (RNRs) are essential enzymes which catalyse the reduction of ribonucleotides to their respective deoxyribonucleotides, thus providing the precursors necessary for DNA synthesis. Proteins in this entry are orthologues of the novel transcriptional regulator NrdR from Streptomyces coelicolor. This protein negatively regulates the expression of the RNR genes in this organism. NrdR, like all proteins in this entry, contains an ATP-cone domain which may regulate DNA-binding activity by sensing deoxyribonucleotides, though since this domain resembles the allosteric effector region of some RNRs it has also been suggested that it may allow NrdR to directly modulate the activity of some reductases, in addition to its function as a transcriptional regulator. NrdR-type proteins have been detected in a large number of bacterial genomes. They are encoded only once in any particular genome and are usually clustered near RNR genes or genes involved in chromosome replication. Currently, only a single archaeon, Natronomonas pharaonis (strain DSM 2160/ATCC 35678), has been shown to encode one of these proteins.
Protein Domain
Name: LPXTG cell wall anchor domain
Type: Domain
Description: Surface proteins from Gram-positive cocci are covalently linked to the bacterial cell wall by sortase, a membrane-anchored transpeptidase that cleaves proteins between the threonine and the glycine of a conserved LPxTG motif, with the formation of a thioester between the conserved cysteine of sortase and the threonine carboxyl group. The newly liberated C terminus of the threonine is transferred via an amide bond exchange to the amino group of the pentaglycine wall crossbridge, thereby tethering the C-terminal end of the surface protein to the bacterial peptidoglycan. Surface proteins from Gram-positive cocci contain an N-terminal signal peptide and a C-terminal sorting signal. The 35-residue sorting signal is composed of a conserved LPxTG motif, a hydrophobic domain, and a tail of positively charged residues. In the case of immunoglobulin A1 proteases, the typical Gram-positive cell wall anchor motif LPxTG is located in their N-terminal regions, in contrast with other known streptococcal and staphylococcal proteins. This entry represents a domain covering the LPxTG motif, the hydrophobic stretch and the positively charged region.
Protein Domain
Name: Frizzled 4, cysteine-rich domain
Type: Domain
Description: The cysteine-rich domain (CRD) is an essential extracellular portion of the frizzled 4 (Fz4) receptor, and is required for binding Wnt proteins, which play fundamental roles in many aspects of early development, such as cell and tissue polarity, neural synapse formation, and the regulation of proliferation. Fz proteins serve as Wnt receptors for multiple signal transduction pathways, including both beta-catenin-dependent and -independent cellular signaling, as well as the planar cell polarity pathway and the Ca(2+)-modulating signaling pathway. CRD-containing Fzs have been found in diverse species from amoebas to mammals. Ten different frizzled proteins are found in vertebrates. Frizzled 4 (Fz4) activates the Ca(2+)/calmodulin-dependent protein kinase II and protein kinase C of the Wnt/Ca(2+) signaling pathway during retinal angiogenesis. Mutations in Fz4 lead to familial exudative vitreoretinopathy (FEVR), a hereditary ocular disorder characterized by failure of the peripheral retinal vascularization. In addition, the interplay between Fz4 and norrin as a receptor-ligand pair plays an important role in vascular development in the retina and inner ear in a Wnt-independent manner.
Protein Domain
Name: SFRP1, cysteine-rich domain
Type: Domain
Description: The cysteine-rich domain (CRD) is an essential part of the secreted frizzled-related protein 1 (SFRP1), which is an antagonist of the Wnt-Frizzled pathway and is involved in regulating many processes, such as vascular cell proliferation, heart morphogenesis, bone trabecular formation and lung morphogenesis. SFRP1 is expressed in many tissues and is involved in the regulation of Wnt signaling in osteoblasts, leading to enhanced trabecular bone formation in adults; it has also been shown to control the growth of retinal ganglion cell axons and the elongation of the antero-posterior axis. In general, SFRPs antagonize the activation of Wnt signaling by binding to the CRD domains of frizzled (Fz) proteins, thereby preventing Wnt proteins from binding to these receptors. SFRPs are also known to have functions unrelated to Wnt, as enhancers of procollagen cleavage by the TLD proteinases. SFRPs and Fz proteins both contain CRD domains, but SFRPs lack the seven-pass transmembrane domain which is an integral part of Fzs.
Protein Domain
Name: Cytochrome c, class III
Type: Family
Description: Cytochromes c (cytC) can be defined as electron-transfer proteins having one or several haem c groups, bound to the protein by one or, more generally, two thioether bonds involving sulphydryl groups of cysteine residues. The fifth haem iron ligand is always provided by a histidine residue. CytC possess a wide range of properties and function in a large number of different redox processes. Ambler recognised four classes of cytC. Class III comprises the low redox potential multiple haem cytochromes: cyt C7 (trihaem), C3 (tetrahaem), and high-molecular-weight cytC, HMC (hexadecahaem), with only 30-40 residues per haem group. The haem c groups, all bis-histidinyl coordinated, are structurally and functionally nonequivalent and present different redox potentials in the range 0 to -400 mV. The 3D structures of a number of cyt C3 proteins have been determined. The proteins consist of 4-5 α-helices and 2 β-strands wrapped around a compact core of four non-parallel haems, which present a relatively high degree of exposure to the solvent. The overall protein architecture, haem plane orientations and iron-iron distances are highly conserved.
Protein Domain
Name: STARD11, START domain
Type: Domain
Description: This entry represents the steroidogenic acute regulatory protein (StAR)-related lipid transfer (START) domain of STARD11. STARD11 is also known as Collagen type IV alpha-3-binding protein, or COL4A3BP. STARD11 acts as a non-vesicular ceramide-carrier protein. STARD11 is synthesized from two major transcripts: a long one encoding Goodpasture-antigen-binding protein (GPBP), also named CERTL; and a shorter one lacking one exon, GPBPdelta26/CERT (also known as CERT). It is likely that these two carry out different functions in specific sub-cellular locations. GPBP/CERTL exists in multiple isoforms originating from alternative translation initiation sites and post-translational modifications. Recombinant STARD11 binds and phosphorylates Goodpasture antigen, the C-terminal region of the alpha3 chain of collagen IV, which is involved in the autoimmune disease Goodpasture disease. GPBP/CERTL contains an N-terminal pleckstrin homology (PH) domain, which targets the protein to the Golgi, a middle region containing two serine-rich domains (SR1, SR2), a FFAT (two phenylalanine amino acids in an acidic tract) motif which is involved in endoplasmic reticulum targeting, and this C-terminal START domain. The shorter splicing variant, CERT, lacks the SR2 domain.
Protein Domain
Name: Calsequestrin, C-terminal TRX-fold domain
Type: Domain
Description: Calsequestrin contains three redox-inactive TRX-fold domains. This entry represents the C-terminal TRX-fold domain. Calsequestrin is the principal calcium-binding protein present in the sarcoplasmic reticulum of cardiac and skeletal muscle. It is a highly acidic protein that is able to bind over 40 calcium ions and acts as an internal calcium store in muscle. Sequence analysis has suggested that calcium is not bound in distinct pockets via EF-hand motifs, but rather via presentation of a charged protein surface. Two forms of calsequestrin have been identified. The cardiac form is present in cardiac and slow skeletal muscle and the fast skeletal form is found in fast skeletal muscle. The release of calsequestrin-bound calcium (through a calcium release channel) triggers muscle contraction. The active protein is not highly structured, more than 50% of it adopting a random coil conformation. When calcium binds there is a structural change whereby the α-helical content of the protein increases from 3 to 11%. Both forms of calsequestrin are phosphorylated by casein kinase II, but the cardiac form is phosphorylated more rapidly and to a higher degree.
Protein Domain
Name: Calsequestrin, middle TRX-fold domain
Type: Domain
Description: Calsequestrin contains three redox-inactive TRX-fold domains. This entry represents the middle TRX-fold domain. Calsequestrin is the principal calcium-binding protein present in the sarcoplasmic reticulum of cardiac and skeletal muscle. It is a highly acidic protein that is able to bind over 40 calcium ions and acts as an internal calcium store in muscle. Sequence analysis has suggested that calcium is not bound in distinct pockets via EF-hand motifs, but rather via presentation of a charged protein surface. Two forms of calsequestrin have been identified. The cardiac form is present in cardiac and slow skeletal muscle and the fast skeletal form is found in fast skeletal muscle. The release of calsequestrin-bound calcium (through a calcium release channel) triggers muscle contraction. The active protein is not highly structured, more than 50% of it adopting a random coil conformation. When calcium binds there is a structural change whereby the α-helical content of the protein increases from 3 to 11%. Both forms of calsequestrin are phosphorylated by casein kinase II, but the cardiac form is phosphorylated more rapidly and to a higher degree.
Protein Domain
Name: Calsequestrin, N-terminal TRX-fold domain
Type: Domain
Description: Calsequestrin contains three redox-inactive TRX-fold domains. This entry represents the N-terminal TRX-fold domain. Calsequestrin is the principal calcium-binding protein present in the sarcoplasmic reticulum of cardiac and skeletal muscle. It is a highly acidic protein that is able to bind over 40 calcium ions and acts as an internal calcium store in muscle. Sequence analysis has suggested that calcium is not bound in distinct pockets via EF-hand motifs, but rather via presentation of a charged protein surface. Two forms of calsequestrin have been identified. The cardiac form is present in cardiac and slow skeletal muscle and the fast skeletal form is found in fast skeletal muscle. The release of calsequestrin-bound calcium (through a calcium release channel) triggers muscle contraction. The active protein is not highly structured, more than 50% of it adopting a random coil conformation. When calcium binds there is a structural change whereby the α-helical content of the protein increases from 3 to 11%. Both forms of calsequestrin are phosphorylated by casein kinase II, but the cardiac form is phosphorylated more rapidly and to a higher degree.
Protein Domain
Name: Spike glycoprotein S1, coronavirus
Type: Domain
Description: The type I glycoprotein S of Coronavirus, trimers of which constitute the typical viral spikes, is assembled into virions through noncovalent interactions with the M protein. The spike glycoprotein is translated as a large polypeptide that is subsequently cleaved to S1 and S2. Both chimeric S proteins appeared to cause cell fusion when expressed individually, suggesting that they were biologically fully active. The spike is a type I membrane glycoprotein that possesses a conserved transmembrane anchor and an unusual cysteine-rich (cys) domain that bridges the putative junction of the anchor and the cytoplasmic tail. The coronavirus (SARS-CoV) S1 subunit is composed of two distinct domains: an N-terminal domain (S1 NTD) and a receptor-binding domain (S1 RBD), also referred to as the S1 CTD or domain B. Each of these domains has been implicated in binding to host receptors. However, most coronaviruses are not known to utilise both the S1 NTD and S1 RBD for viral entry. SARS-CoV makes use of its S1 RBD to bind to the human angiotensin-converting enzyme 2 (ACE2) as its host receptor.
Protein Domain
Name: Nuclear receptor coactivator, CREB-bp-like, interlocking
Type: Domain
Description: This entry represents the interlocking domain of the eukaryotic nuclear receptor coactivators CREBP and p300. The interlocking domain forms a 3-helical non-globular array that forms interlocked heterodimers with its target. Nuclear receptors are ligand-activated transcription factors involved in the regulation of many processes, including development, reproduction and homeostasis. Nuclear receptor coactivators act to modulate the function of nuclear receptors. Coactivators associate with promoters and enhancers primarily through protein-protein contacts to facilitate the interaction between DNA-bound transcription factors and the transcription machinery. Many of these coactivators are structurally related, including CBP (CREB-binding protein) and p300. CBP and p300 both have histone acetyltransferase activity. CBP/p300 proteins function synergistically to activate transcription, acting to remodel chromatin and to recruit RNA polymerase II and the basal transcription machinery. CBP is required for proper cell cycle control, differentiation and apoptosis. The interaction of CBP/p300 with transcription factors involves several small domains. The IBiD domain in the C-terminal region of CBP is responsible for CBP interaction with IRF-3, as well as with the adenoviral oncoprotein E1A, TIF-2 coactivator, and the IRF homologue KSHV IRF-1.
Protein Domain
Name: H/ACA RNP complex subunit Gar1/Naf1, Cbf5-binding domain
Type: Homologous_superfamily
Description: H/ACA ribonucleoprotein particles (RNPs) are a family of RNA pseudouridine synthases that specify modification sites through guide RNAs. The function of these H/ACA RNPs is essential for biogenesis of the ribosome, splicing of precursor mRNAs (pre-mRNAs), maintenance of telomeres and probably for additional cellular processes. All H/ACA RNPs contain a specific RNA component (snoRNA or scaRNA) and at least four proteins common to all such particles: Cbf5, Gar1, Nhp2 and Nop10. These proteins are highly conserved from yeast to mammals and homologues are also present in archaea. The H/ACA protein complex contains a stable core composed of Cbf5 and Nop10, to which Gar1 and Nhp2 subsequently bind. Naf1 is an RNA-binding protein required for the maturation of the box H/ACA snoRNP complex and for ribosome biogenesis. During assembly of the H/ACA snoRNP complex, Naf1 associates with the complex; it disappears during maturation and is replaced by Gar1 to yield the mature H/ACA snoRNP complex. The core domain of Naf1 is homologous to the core domain of Gar1, suggesting that they share a common Cbf5 binding surface.
Protein Domain
Name: Tetracyclin repressor-like, C-terminal domain superfamily
Type: Homologous_superfamily
Description: This entry represents the C-terminal domain found in a number of different TetR transcription regulator proteins. TetR regulates the expression of the membrane-associated tetracycline resistance protein, TetA, which exports the tetracycline antibiotic out of the cell before it can attach to the ribosomes and inhibit protein synthesis. TetR blocks transcription from the genes encoding both TetA and TetR in the absence of antibiotic. The C-terminal domain is multi-helical and is interlocked in the homodimer with the helix-turn-helix (HTH) DNA-binding domain. Other members of the TetR family of transcriptional regulators carry this C-terminal domain. These include: QacR from Staphylococcus aureus, a multidrug binding protein that represses transcription of the qacA multidrug transporter gene; EthR, a repressor from Mycobacterium tuberculosis implicated in ethionamide drug resistance; CprB, a gamma-butyrolactone autoregulator/receptor from Streptomyces coelicolor that acts as a DNA-binding protein; YcdC, a hypothetical transcriptional regulator from Escherichia coli; YsiA, YfiR, and YxaF, hypothetical transcriptional regulators from Bacillus subtilis; and YbiH, a hypothetical transcriptional regulator from Salmonella typhimurium.
Protein Domain
Name: SF3B4, RNA recognition motif 2
Type: Domain
Description: This entry represents the RNA recognition motif 2 (RRM2) of SF3B4, also termed pre-mRNA-splicing factor SF3b 49kDa (SF3b50), or spliceosome-associated protein 49 (SAP 49). SF3B4 is a component of the multiprotein complex splicing factor 3b (SF3B), an integral part of the U2 small nuclear ribonucleoprotein (snRNP) and the U11/U12 di-snRNP. SF3B is essential for the accurate excision of introns from pre-messenger RNA, and is involved in the recognition of the pre-mRNA's branch site within the major and minor spliceosomes. SF3B4 functions to tether U2 snRNP with pre-mRNA at the branch site during spliceosome assembly. It is an evolutionarily highly conserved protein with orthologues across diverse species. SF3B4 contains two closely adjacent N-terminal RNA recognition motifs (RRMs). It binds directly to pre-mRNA and also interacts directly and highly specifically with another SF3B subunit called SAP 145. Mutations in the SF3B4 gene cause Nager syndrome, a form of acrofacial dysostosis which affects the development of the face, hands, and arms. This entry also includes the orthologue from Schizosaccharomyces pombe, spliceosome-associated protein 49.
Protein Domain
Name: SREK1, RNA recognition motif 2
Type: Domain
Description: This entry represents the RNA recognition motif 2 (RRM2) of SREK1 (also known as SRrp86). SREK1 belongs to a family of proteins containing regions rich in serine-arginine dipeptides (the SR protein family), which is involved in bridge-complex formation and splicing by mediating protein-protein interactions across either introns or exons. It may play a crucial role in determining tissue-specific patterns of alternative splicing. SREK1 can alter splice site selection by both positively and negatively modulating the activity of other SR proteins. For instance, SREK1 can activate SRp20 and repress SC35 in a dose-dependent manner both in vitro and in vivo. SREK1 contains two (some contain only one) RNA recognition motifs (RRMs), and two serine-arginine (SR)-rich domains (SR domains) separated by an unusual glutamic acid-lysine (EK) rich region. The RRM and SR domains are highly conserved among other members of the SR superfamily. However, the EK domain is unique to SREK1. It plays a modulatory role, controlling SR domain function through involvement in the inhibition of both constitutive and alternative splicing and in splice-site selection.
Protein Domain
Name: Tubulin/FtsZ, GTPase domain superfamily
Type: Homologous_superfamily
Description: This entry represents a GTPase domain found in all tubulin chains, such as tubulin alpha, beta and gamma chains, plant ARC3 and prokaryotic FtsZ and CetZ proteins. These proteins are involved in polymer formation. Tubulin is the major component of microtubules, while FtsZ (a homologue of eukaryotic tubulin) is the polymer-forming protein of bacterial cell division; it is part of a ring in the middle of the dividing cell that is required for constriction of the cell membrane and cell envelope to yield two daughter cells. FtsZ can polymerise into tubes, sheets, and rings in vitro and is ubiquitous in bacteria and archaea. CetZ co-exists with FtsZ in many archaea. CetZ does not affect cell division; instead, it is involved in cell shape control. Arabidopsis chloroplast protein ARC3 (At1g75010) is a Z-ring accessory protein involved in the initiation of plastid division and division site placement. The structure of the GTPase domain has three layers (α/β/α) with a parallel β-sheet of six strands.
Protein Domain
Name: UNC5B, death domain
Type: Domain
Description: This entry represents the death domain (DD) found in Uncoordinated-5B (UNC5B), which is a receptor for the secreted netrin-1 and plays a role in axonal guidance, angiogenesis, and apoptosis. UNC5B signaling is involved in the netrin-1-induced proliferation and migration of renal proximal tubular cells. It is also required for vascular patterning during embryonic development, and its activation inhibits sprouting angiogenesis. It belongs to the UNC-5 family. UNC5 proteins are transmembrane proteins with an extracellular domain consisting of two immunoglobulin repeats, two thrombospondin type-I modules and an intracellular region containing a ZU-5 domain, UPA domain and a DD. DDs (death domains) are protein-protein interaction domains found in a variety of domain architectures. Their common feature is that they form homodimers by self-association or heterodimers by associating with other members of the DD superfamily including CARD (Caspase activation and recruitment domain), DED (Death Effector Domain), and PYRIN. They serve as adaptors in signaling pathways and can recruit other proteins into signaling complexes.
Protein Domain
Name: Photosystem I PsaA/PsaB superfamily
Type: Homologous_superfamily
Description: Photosystem I (PSI) is an integral membrane protein complex that uses light energy to mediate electron transfer from plastocyanin to ferredoxin. PSI is found in the chloroplasts of plants and in cyanobacteria. The electron transfer components of the reaction centre of PSI are a primary electron donor P-700 (chlorophyll dimer) and five electron acceptors: A0 (chlorophyll), A1 (a phylloquinone) and three 4Fe-4S iron-sulphur centres: Fx, Fa, and Fb. PsaA and PsaB, two closely related proteins, are involved in the binding of P700, A0, A1, and Fx. PsaA and PsaB are both integral membrane proteins of 730 to 750 amino acids that seem to contain 11 transmembrane segments. The Fx 4Fe-4S iron-sulphur centre is bound by four cysteines; two of these cysteines are provided by the PsaA protein and the two others by PsaB. The two cysteines in both proteins are proximal and located in a loop between the ninth and tenth transmembrane segments. A leucine zipper motif seems to be present downstream of the cysteines and could contribute to dimerisation of PsaA/PsaB.
Protein Domain
Name: Homeobox domain engrailed
Type: Domain
Description: Proteins that regulate developmental gene expression are nuclear proteins that contain a conserved domain known as the homeobox, the flanking sequences of which differ considerably among different proteins. The homeodomain includes the helix-turn-helix (HTH) motif which binds to DNA. Most proteins which contain a homeobox domain can be classified, on the basis of their sequence characteristics, into three subfamilies: engrailed, antennapedia and paired. The engrailed subfamily plays an important role in Drosophila segmentation and neurogenesis, affecting genes in posterior compartments of the developing embryo. It is also required for the development of the central nervous system. Homologues found in other species may play a role in neurogenesis, possibly in both the compartmentalisation of the developing neural tube and specification of particular neuronal populations. Other members of the engrailed subfamily include Drosophila invected protein (inv); Apis mellifera (Honeybee) E30 and E60; Schistocerca americana (American grasshopper) G-En; mammalian and bird En-1 and En-2; Danio rerio (Zebrafish) (Brachydanio rerio) Eng-1, -2 and -3; Helobdella triserialis (Leech) Ht-En; and Caenorhabditis elegans ceh-16.
Protein Domain
Name: Cystic fibrosis transmembrane conductance regulator
Type: Family
Description: The ABC transporter family is a group of membrane proteins that use the hydrolysis of ATP to power the translocation of a wide variety of substrates across cellular membranes. ABC transporters minimally consist of two conserved regions: a highly conserved nucleotide-binding domain (NBD) and a less conserved transmembrane domain (TMD). Eukaryotic ABC proteins are usually organised either as full transporters (containing two NBDs and two TMDs), or as half transporters (containing one NBD and one TMD) that have to form homo- or heterodimers in order to constitute a functional protein. Cystic fibrosis transmembrane conductance regulator (CFTR, also known as ABCC7) is a eukaryotic protein belonging to the ABC-C subfamily of the ABC transporter family. CFTR protein is a chloride ion channel controlled by phosphorylation. It has a major role in electrolyte and fluid secretion. CFTR is important in the determination of fluid flow, ion concentration and transepithelial salt transport. Dysfunction of the CFTR channel causes the life-threatening disease cystic fibrosis, in which transepithelial ion transport is disrupted. Defective phosphorylation has been seen to be a cause of this altered activity.
Protein Domain
Name: Predicted nickel metalloenzyme maturation factor, AIR synthase-related
Type: Family
Description: The large subunit of [NiFe]-hydrogenase, like that of other nickel metalloenzymes, is synthesized as a precursor devoid of the metalloenzyme active site. This precursor then undergoes a complex post-translational maturation process that requires a number of accessory proteins. One such group of accessory proteins, whose members are from archaea, is required for hydrogenase maturation (though their exact function is still unknown). Proteins from this family have similar sequences to members of that group. Thus, members of this family are predicted to play a role in maturation of some nickel metalloenzyme(s). However, other functions cannot be ruled out. Both this family and that group of accessory proteins are related to proteins that use both purine derivatives and ATP as substrates. They belong to a superfamily of AIR synthase-related proteins that also includes PurM phosphoribosylaminoimidazole synthetase (AIR synthase), thiamine monophosphate kinase, phosphoribosylformylglycinamidine synthase (FGAM synthase), and selenophosphate synthase (selenide, water dikinase). They all contain two conserved domains and seem to dimerize. The N-terminal domain forms the dimer interface and is a putative ATP binding domain.
Protein Domain
Name: MICOS complex subunit MIC26/MIC27
Type: Family
Description: This entry includes MICOS complex subunit Mic26 from fungi and animals. Mammalian MIC26 and its paralogue, MIC27, can also be found in this entry. However, this entry does not include budding yeast Mic27. Budding yeast Mic26 is a component of the MICOS complex, a large protein complex of the mitochondrial inner membrane that plays crucial roles in the maintenance of crista junctions, inner membrane architecture, and formation of contact sites to the outer membrane. Human MIC26 (also known as apolipoprotein O, APOO) plays a crucial role in crista junction formation and mitochondrial function and can promote cardiac lipotoxicity by enhancing mitochondrial respiration and fatty acid metabolism in cardiac myoblasts. It promotes cholesterol efflux from macrophage cells and can be detected in HDL, LDL and VLDL. It is secreted by a microsomal triglyceride transfer protein (MTTP)-dependent mechanism, probably as a VLDL-associated protein that is subsequently transferred to HDL. Human MIC27 (also known as apolipoprotein O-like, APOOL) is also a subunit of the MICOS complex. It interacts with MIC26 and is involved in the formation of crista junctions.
Protein Domain
Name: Pleiotrophin/Midkine, C-terminal domain
Type: Domain
Description: Several extracellular heparin-binding proteins involved in regulation of growth and differentiation belong to a new family of growth factors. These growth factors are highly related proteins of about 140 amino acids that contain 10 conserved cysteines probably involved in disulphide bonds, and include pleiotrophin (also known as heparin-binding growth-associated molecule HB-GAM, heparin-binding growth factor 8 HBGF-8, heparin-binding neurotrophic factor HBNF and osteoblast specific protein OSF-1); midkine (MK); retinoic acid-induced heparin-binding protein (RIHB); and pleiotrophic factors alpha-1 and -2 and beta-1 and -2 from Xenopus laevis, the homologues of midkine and pleiotrophin respectively. Pleiotrophin is a heparin-binding protein that has neurotrophic activity and has mitogenic activity towards fibroblasts. It is highly expressed in brain and uterus tissues, but is also found in gut, muscle and skin. It is thought to possess an important brain-specific function. Midkine is a regulator of differentiation whose expression is regulated by retinoic acid, and, like pleiotrophin, is a heparin-binding growth/differentiation factor that acts on fibroblasts and nerve cells. This entry represents the C-terminal domain of pleiotrophin and midkine.
Protein Domain
Name: Pleiotrophin/Midkine, N-terminal domain
Type: Domain
Description: Several extracellular heparin-binding proteins involved in regulation of growth and differentiation belong to a new family of growth factors. These growth factors are highly related proteins of about 140 amino acids that contain 10 conserved cysteines probably involved in disulphide bonds, and include pleiotrophin (also known as heparin-binding growth-associated molecule HB-GAM, heparin-binding growth factor 8 HBGF-8, heparin-binding neurotrophic factor HBNF and osteoblast specific protein OSF-1); midkine (MK); retinoic acid-induced heparin-binding protein (RIHB); and pleiotrophic factors alpha-1 and -2 and beta-1 and -2 from Xenopus laevis, the homologues of midkine and pleiotrophin respectively. Pleiotrophin is a heparin-binding protein that has neurotrophic activity and has mitogenic activity towards fibroblasts. It is highly expressed in brain and uterus tissues, but is also found in gut, muscle and skin. It is thought to possess an important brain-specific function. Midkine is a regulator of differentiation whose expression is regulated by retinoic acid, and, like pleiotrophin, is a heparin-binding growth/differentiation factor that acts on fibroblasts and nerve cells. This entry represents the N-terminal domain of pleiotrophin and midkine.
Protein Domain
Name: LEM domain
Type: Domain
Description: The LEM (LAP2, emerin, MAN1) domain is a globular module of approximately 40 amino acids, which is mostly found in the nucleoplasmic portions of metazoan inner nuclear membrane proteins. The LEM domain has been shown to mediate binding to BAF (barrier-to-autointegration factor) and BAF-DNA complexes. BAF dimers bind to double-stranded DNA non-specifically and thereby bridge DNA molecules to form a large, discrete nucleoprotein complex. The resolution of the solution structure of the LEM domain reveals that it is composed of a three-residue N-terminal helical turn and two large parallel alpha helices interacting through a set of conserved hydrophobic amino acids. The two helices, which are connected by a long loop, are oriented at an angle of ~45 degrees. Proteins known to contain a LEM domain include: vertebrate inner nuclear membrane protein MAN1; vertebrate lamina-associated polypeptide 2 (LAP2) or thymopoietin; mammalian emerin (EMD) (in human, defects in EMD are a cause of X-linked Emery-Dreifuss muscular dystrophy (X-EDMD), an X-linked disorder characterised by early contractures, muscle wasting and weakness, and cardiomyopathy); Xenopus laevis Smad1 antagonistic effector (SANE); Drosophila melanogaster otefin (OTE); and Caenorhabditis elegans W01G7.5 protein.
Protein Domain
Name: Rubella membrane glycoprotein E1, domain 3
Type: Homologous_superfamily
Description: Rubella virus (RV), the sole member of the genus Rubivirus within the family Togaviridae, is a small enveloped, positive strand RNA virus. The nucleocapsid consists of 40S genomic RNA and a single species of capsid protein which is enveloped within a host-derived lipid bilayer containing two viral glycoproteins, E1 (58kDa) and E2 (42-46kDa). In virus infected cells, RV matures by budding either at the plasma membrane, or at the internal membranes depending on the cell type and enters adjacent uninfected cells by a membrane fusion process in the endosome, directed by E1-E2 heterodimers. The heterodimer formation is crucial for E1 transport out of the endoplasmic reticulum to the Golgi and plasma membrane. In RV E1, a cysteine at position 82 is crucial for the E1-E2 heterodimer formation and cell surface expression of the two proteins. E1 has been shown to be a type 1 membrane protein, rich in cysteine residues with extensive intramolecular disulphide bonds [ ]. This superfamily makes up the membrane-distal domain 3 found in rubella membrane glycoprotein E1. Structurally, it consists of 9 beta strands and 1 alpha helix.
Protein Domain
Name: Rubella membrane glycoprotein E1, domain 2
Type: Homologous_superfamily
Description: Rubella virus (RV), the sole member of the genus Rubivirus within the family Togaviridae, is a small enveloped, positive strand RNA virus. The nucleocapsid consists of 40S genomic RNA and a single species of capsid protein which is enveloped within a host-derived lipid bilayer containing two viral glycoproteins, E1 (58kDa) and E2 (42-46kDa). In virus infected cells, RV matures by budding either at the plasma membrane, or at the internal membranes depending on the cell type and enters adjacent uninfected cells by a membrane fusion process in the endosome, directed by E1-E2 heterodimers. The heterodimer formation is crucial for E1 transport out of the endoplasmic reticulum to the Golgi and plasma membrane. In RV E1, a cysteine at position 82 is crucial for the E1-E2 heterodimer formation and cell surface expression of the two proteins. E1 has been shown to be a type 1 membrane protein, rich in cysteine residues with extensive intramolecular disulphide bonds [ ]. This superfamily represents the domain 2 found in rubella membrane glycoprotein E1. Structurally, it consists of 6 beta strands and 2 alpha helices.
Protein Domain
Name: Pre-mRNA-splicing factor Cwc2/Slt11
Type: Family
Description: This entry contains proteins involved in the first step of pre-mRNA splicing to remove introns. Splicing is performed by the spliceosome, which is a complex of snRNAs U1, U2, U4, U5, and U6 and proteins such as pre-mRNA-processing factor 19 (Prp19). The yeast protein Cwc2 contains motifs known to bind RNA, a zinc finger and an RNA recognition motif. Because mutations in the gene are lethal and lead to accumulation of pre-mRNA, and reduced levels of U1, U4, U5 and U6 snRNAs and U4/U6 snRNA complex levels, Cwc2 has been proposed to link the Prp19 complex to the spliceosome during pre-mRNA splicing [ ]. Because Cwc2 is at the spliceosome catalytic centre, it induces an active conformation of the spliceosome's catalytic RNA elements []. Cwc2 is a component of the CWC complex []. The human homologue, known as RBM22 [], also translocates the protein SLU7 from the nucleus to the cytoplasm or vice versa in the case of the calcium-binding protein PDCD6 []. Also included in this entry is the pre-mRNA-splicing factor Slt11 (also known as Ecm2) [ ].
Protein Domain
Name: Saposin A-type domain
Type: Domain
Description: The saposin A-type domain is a ~40 amino acid domain present in the saposin precursor, prosaposin, in the propeptides that are cleaved off in the activation reaction. The domain is named after the small lysosomal proteins, saposins, which serve as sphingolipid hydrolase activator proteins in vertebrates. The mammalian saposins are synthesized as a single precursor molecule (prosaposin) which contains four saposin B-type domains yielding the active saposins A, B, C and D after proteolytic cleavage, and two saposin A-type domains in the extremities that are removed in the activation reaction. The saposin A-type domain may play a role in targeting, as propeptides containing the saposin A-type domain of the C terminus of prosaposin and of the N-terminal part of pulmonary surfactant-associated protein B are involved in the transport to the lysosome and to secretory granules (lamellar bodies, which are lysosomal-like organelles), respectively [ , ]. Some proteins known to contain a saposin A-type domain: Mammalian proactivator polypeptide, the saposin precursor (prosaposin) that is processed into saposins A, B, C and D. Mammalian pulmonary surfactant-associated protein B (SP-B), a surface tension reducing surfactant secreted by type II epithelial cells.
Protein Domain
Name: HSPA4L, nucleotide-binding domain
Type: Domain
Description: Human Heat shock protein family A member 4 like (HSPA4L) is expressed ubiquitously and predominantly in the testis. It is required for normal spermatogenesis and plays a role in osmotolerance. HSPA4L belongs to the 105/110kDa heat shock protein (HSP105/110) subfamily of the HSP70-like family [ , ]. HSP105/110s are believed to function generally as co-chaperones of HSP70 chaperones, acting as nucleotide exchange factors (NEFs), to remove ADP from their HSP70 chaperone partners during the ATP hydrolysis cycle [ ]. HSP70 chaperones assist in protein folding and assembly, and can direct incompetent "client" proteins towards degradation. Like HSP70 chaperones, HSP105/110s have an N-terminal nucleotide-binding domain (NBD) and a C-terminal substrate-binding domain (SBD). For HSP70 chaperones, the nucleotide sits in a deep cleft formed between the two lobes of the NBD. The two subdomains of each lobe change conformation between ATP-bound, ADP-bound, and nucleotide-free states [ ]. ATP binding opens up the substrate-binding site; substrate-binding increases the rate of ATP hydrolysis. HSP70 chaperone activity is also regulated by J-domain proteins [, , ].
Protein Domain
Name: Colipase, C-terminal
Type: Domain
Description: This entry represents the C-terminal domain of colipase proteins. Colipase [ , ] is a small protein cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It also binds to the bile-salt covered triacylglycerol interface, thus allowing the enzyme to anchor itself to the water-lipid interface. Efficient absorption of dietary fats is dependent on the action of pancreatic triglyceride lipase. Colipase binds to the C-terminal, non-catalytic domain of lipase, thereby stabilising an active conformation and considerably increasing the overall hydrophobic binding site. Structural studies of the complex and of colipase alone have revealed the functionality of its architecture [, ]. Colipase is a small protein with five conserved disulphide bonds. Structural analogies have been recognised between a developmental protein (Dickkopf), the pancreatic lipase C-terminal domain, the N-terminal domains of lipoxygenases and the C-terminal domain of alpha-toxin. These non-catalytic domains in the latter enzymes are important for interaction with the membrane. It has not been established if these domains are also involved in eventual protein cofactor binding as is the case for pancreatic lipase [].
Protein Domain
Name: Colipase, N-terminal
Type: Domain
Description: This entry represents the N-terminal domain of colipase proteins. Colipase [ , ] is a small protein cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It also binds to the bile-salt covered triacylglycerol interface, thus allowing the enzyme to anchor itself to the water-lipid interface. Efficient absorption of dietary fats is dependent on the action of pancreatic triglyceride lipase. Colipase binds to the C-terminal, non-catalytic domain of lipase, thereby stabilising an active conformation and considerably increasing the overall hydrophobic binding site. Structural studies of the complex and of colipase alone have revealed the functionality of its architecture [, ]. Colipase is a small protein with five conserved disulphide bonds. Structural analogies have been recognised between a developmental protein (Dickkopf), the pancreatic lipase C-terminal domain, the N-terminal domains of lipoxygenases and the C-terminal domain of alpha-toxin. These non-catalytic domains in the latter enzymes are important for interaction with the membrane. It has not been established if these domains are also involved in eventual protein cofactor binding as is the case for pancreatic lipase [ ].
Protein Domain
Name: Nuclear receptor coactivator, CREB-bp-like, interlocking domain superfamily
Type: Homologous_superfamily
Description: This entry represents the interlocking domain superfamily of the eukaryotic nuclear receptor coactivators CREBP and p300. The interlocking domain forms a 3-helical non-globular array that forms interlocked heterodimers with its target. Nuclear receptors are ligand-activated transcription factors involved in the regulation of many processes, including development, reproduction and homeostasis. Nuclear receptor coactivators act to modulate the function of nuclear receptors. Coactivators associate with promoters and enhancers primarily through protein-protein contacts to facilitate the interaction between DNA-bound transcription factors and the transcription machinery. Many of these coactivators are structurally related, including CBP (CREB-binding protein) and p300 [ ]. CBP and p300 both have histone acetyltransferase activity. CBP/p300 proteins function synergistically to activate transcription, acting to remodel chromatin and to recruit RNA polymerase II and the basal transcription machinery. CBP is required for proper cell cycle control, differentiation and apoptosis. The interaction of CBP/p300 with transcription factors involves several small domains. The IBiD domain in the C-terminal region of CBP is responsible for CBP interaction with IRF-3, as well as with the adenoviral oncoprotein E1A, TIF-2 coactivator, and the IRF homologue KSHV IRF-1 [ ].
Protein Domain
Name: Pleiotrophin/Midkine, N-terminal domain superfamily
Type: Homologous_superfamily
Description: Several extracellular heparin-binding proteins involved in regulation of growth and differentiation belong to a new family of growth factors. These growth factors are highly related proteins of about 140 amino acids that contain 10 conserved cysteines probably involved in disulphide bonds, and include pleiotrophin [ ] (also known as heparin-binding growth-associated molecule HB-GAM, heparin-binding growth factor 8 HBGF-8, heparin-binding neurotrophic factor HBNF and osteoblast specific protein OSF-1); midkine (MK) []; retinoic acid-induced heparin-binding protein (RIHB) []; and pleiotrophic factors alpha-1 and -2 and beta-1 and -2 from Xenopus laevis, the homologues of midkine and pleiotrophin respectively. Pleiotrophin is a heparin-binding protein that has neurotrophic activity and has mitogenic activity towards fibroblasts. It is highly expressed in brain and uterus tissues, but is also found in gut, muscle and skin. It is thought to possess an important brain-specific function. Midkine is a regulator of differentiation whose expression is regulated by retinoic acid, and, like pleiotrophin, is a heparin-binding growth/differentiation factor that acts on fibroblasts and nerve cells. This entry represents the N-terminal domain superfamily of pleiotrophin and midkine [ ].
Protein Domain
Name: Claudin-6
Type: Family
Description: Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands is affected by the cell types [ , , ]. They can also form heteropolymers with each other within and between tight junction strands []. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with other PDZ domain proteins, such as the scaffolding proteins ZO-1, -2 and -3 []. They also interact with non-tight junction proteins, such as cell adhesion proteins EpCam and tetraspanins and the signaling proteins, ephrin A and B and their receptors, EphA and EphB []. Claudin-6 was identified through searching expressed sequence tag (EST) databases for sequences similar to claudin-1 and -2 []. It was subsequently cloned and expressed in cells, where it was shown to concentrate at tight junctions. Human and mouse isoforms have been identified. Claudin-6 shares ~25-70% overall similarity with other claudin family members at the amino acid level, displaying highest similarity to claudin-9.
Protein Domain
Name: Claudin-16
Type: Family
Description: Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands is affected by the cell types [ , , ]. They can also form heteropolymers with each other within and between tight junction strands []. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with other PDZ domain proteins, such as the scaffolding proteins ZO-1, -2 and -3 []. They also interact with non-tight junction proteins, such as cell adhesion proteins EpCam and tetraspanins and the signaling proteins, ephrin A and B and their receptors, EphA and EphB []. Claudin-16 was originally termed paracellin-1. It was re-classified as claudin-16 on the basis of its sequence similarity to the claudin family [ ]. Claudin-16 is involved in renal paracellular Mg2+ resorption and is required for selective paracellular conductance []. Defects in the claudin-16 gene are associated with an autosomal recessive chronic interstitial nephritis with diffuse zonal fibrosis (CINF) [, ].
Protein Domain
Name: E3 ubiquitin-protein ligase parkin
Type: Family
Description: Parkinson's disease (PD) is a common neurodegenerative disorder with complex clinical features and a poorly understood aetiology. PD is accompanied by a progressive loss of dopamine-containing neurons in the substantia nigra, with patients suffering from rigidity, slowness of movement, tremor and disturbances of balance. Autosomal recessive juvenile parkinsonism (AR-JP) is a rare form of familial PD mapped to chromosome 6 and linked strongly to a pair of markers. One of these markers has been cloned, yielding a sequence that encodes a protein, 465 amino acids long []. The protein sequence, named parkin, shows moderate similarity with ubiquitin at the N terminus and a ring-finger domain at the C terminus. In normal individuals, parkin binds to the E2 ubiquitin-conjugating human enzyme 8 (UbcH8) through the C-terminal ring-finger domain. In the presence of UbcH8, parkin has ubiquitin-protein ligase activity and even catalyses its own ubiquitination. Furthermore, parkin appears to target the synaptic vesicle-associated protein CDCrel-1 for ubiquitination and thus promotes its degradation. The mutated forms of parkin implicated in AR-JP appear to be defective in terms of UbcH8 binding, E3 ubiquitin protein-ligase activity, self-ubiquitination, and CDCrel-1 binding and ubiquitination [].
Protein Domain
Name: Protein-arginine deiminase, C-terminal
Type: Domain
Description: In the presence of calcium ions, protein-arginine deiminase (PAD) enzymes catalyse the post-translational modification reaction responsible for the formation of citrulline residues from protein-bound arginine residues []. Four isotypes of PAD have been identified in mammals; a fifth may also exist. Non-mammalian vertebrates appear to have only a single PAD enzyme. All known natural substrates of PAD are proteins known to have an important structural function, such as keratin (PAD1), intermediate filaments or proteins associated with intermediate filaments. Citrullination may have consequences for the structural integrity and interactions of these proteins. Physiological levels of calcium appear to be too low to activate these enzymes, suggesting a link between PAD activation and loss of calcium homeostasis during terminal differentiation and cell death (apoptosis). In humans, PAD enzymes may be involved in cytoskeletal reorganization in the egg and early embryo [ ]. These enzymes abolish the methyltransferase activity of nicotinamide N-methyltransferase (NNMT) through its citrullination, which may play a role in a subset of breast cancers and several chronic disease conditions [, ].
Protein Domain
Name: FAR1 DNA binding domain
Type: Domain
Description: Phytochrome A is the primary photoreceptor for mediating various far-red light-induced responses in higher plants. It has been found that the proteins governing this response, which include FAR-RED ELONGATED HYPOCOTYL3 (FHY3) and FAR-RED-IMPAIRED RESPONSE1 (FAR1), are a pair of homologous proteins sharing significant sequence homology to mutator-like transposases. These proteins appear to be novel transcription factors, which are essential for activating the expression of FHY1 and FHL (for FHY1-like) and related genes, whose products are required for light-induced phytochrome A nuclear accumulation and subsequent light responses in plants. The FRS (FAR1 Related Sequences) family of proteins share a similar domain structure to mutator-like transposases, including an N-terminal C2H2 zinc finger domain, a central putative core transposase domain, and a C-terminal SWIM motif (named after SWI2/SNF and MuDR transposases). It seems plausible that the FRS family represent transcription factors derived from mutator-like transposases [ , ]. This entry represents a domain found in FAR1 and FRS proteins. It contains a WRKY-like fold and is therefore most likely a zinc-binding DNA-binding domain.
Protein Domain
Name: Guanylate-binding protein/Atlastin, C-terminal
Type: Domain
Description: Guanylate-binding protein is a GTPase that is induced by interferon (IFN)-gamma. GTPases induced by IFN-gamma are key to the protective immunity against microbial and viral pathogens. These GTPases are classified into three groups: the small 47-kDa GTPases, the Mx proteins, and the large 65- to 67-kDa GTPases. Guanylate-binding proteins (GBP) fall into the last class. In humans, there are seven GBPs (hGBP1-7) []. Structurally, hGBP1 consists of two domains: a compact globular N-terminal domain harbouring the GTPase function, and an α-helical finger-like C-terminal domain. Human GBP1 is secreted from cells without the need of a leader peptide, and has been shown to exhibit antiviral activity against Vesicular stomatitis virus and Encephalomyocarditis virus, as well as being able to regulate the inhibition of proliferation and invasion of endothelial cells in response to IFN-gamma [ ]. This entry represents the C-terminal domain of the guanylate-binding protein. Proteins containing this domain also include Atlastin2/3. They are GTPases tethering membranes through formation of trans-homooligomers and mediating homotypic fusion of endoplasmic reticulum membranes [ ].
Protein Domain
Name: Mss4/translationally controlled tumour-associated TCTP
Type: Homologous_superfamily
Description: This superfamily represents a structural domain with a complex fold consisting of several coiled β-sheets. This domain exists as a duplication, consisting of a tandem repeat of two similar structural motifs. This entry represents copies of this structural motif in the following protein families: Mss4, which contains a zinc-binding site. Translationally controlled tumour-associated protein TCTP, which contains an insertion of an α-helix hairpin, and which lacks a zinc-binding site. Mss4 is a conserved accessory factor for Rab GTPases, which function as ubiquitous regulators of intracellular membrane trafficking [ ]. Mss4 acts to promote nucleotide release from exocytic but not endocytic Rab GTPases. Mss4 has a complex fold made of several coiled β-sheets, and consists of a duplication of tandem repeats of two similar structural motifs. It contains a zinc-binding site. Other proteins that show structural similarity to Mss4 include the translationally controlled tumour-associated proteins TCTPs, which contain an insertion of an alpha helical hairpin, and lack the zinc-binding site. TCTPs are a highly conserved and abundantly expressed family of eukaryotic proteins that are implicated in both cell growth and the human acute allergic response [ ].
Protein Domain
Name: Nesprin-1-like, second calponin homology domain
Type: Domain
Description: This entry represents the second calponin homology (CH) domain of SYNE-1 and similar proteins predominantly found in vertebrates and arthropods. CH domains are actin filament (F-actin) binding motifs. Nesprin-1, also called Synaptic nuclear envelope protein 1 (SYNE-1), is a multi-isomeric modular protein which forms a linking network between organelles and the actin cytoskeleton to maintain subcellular spatial organization. SYNE-1 also acts as a component of the LINC (LInker of Nucleoskeleton and Cytoskeleton) complex, which is involved in the connection between the nuclear lamina and the cytoskeleton. This protein is involved in the pathogenesis of Emery-Dreifuss muscular dystrophy and of a type of autosomal recessive cerebellar ataxia. SYNE-1 contains two copies of the CH domain [ , , , , , , , ]. The homologue from Drosophila melanogaster, named Muscle-specific protein 300 kDa, is also included in this group. It collaborates with Klar to promote even spacing of the myonuclei at the periphery of striated muscle fibres and is essential for anchoring nuclei, mitochondria and endoplasmic reticulum (ER) structures to the Z-disks [, ].
Protein Domain
Name: Nesprin-1-like, first calponin homology domain
Type: Domain
Description: This entry represents the first calponin homology (CH) domain of SYNE-1 and similar proteins predominantly found in vertebrates and arthropods. CH domains are actin filament (F-actin) binding motifs. Nesprin-1, also called Synaptic nuclear envelope protein 1 (SYNE-1), is a multi-isomeric modular protein which forms a linking network between organelles and the actin cytoskeleton to maintain subcellular spatial organization. SYNE-1 also acts as a component of the LINC (LInker of Nucleoskeleton and Cytoskeleton) complex, which is involved in the connection between the nuclear lamina and the cytoskeleton. This protein is involved in the pathogenesis of Emery-Dreifuss muscular dystrophy and of a type of autosomal recessive cerebellar ataxia. SYNE-1 contains two copies of the CH domain [ , , , , , , , ]. The homologue from Drosophila melanogaster, named Muscle-specific protein 300 kDa, is also included in this group. It collaborates with Klar to promote even spacing of the myonuclei at the periphery of striated muscle fibres and is essential for anchoring nuclei, mitochondria and endoplasmic reticulum (ER) structures to the Z-disks [, ].
Protein Domain
Name: TRIM45/56/19
Type: Family
Description: This entry represents a group of animal proteins defined by the TRIM/RBCC motif, including TRIM45/56/19 [ , , ]. These proteins have been shown to act as SUMO E3 ligases and play important roles in a variety of cellular functions including cell proliferation, differentiation, development, oncogenesis, and apoptosis. TRIM45 may act as a transcriptional repressor in the mitogen-activated protein kinase signaling pathway []; TRIM56 plays a key role in innate antiviral immunity by mediating ubiquitination of CGAS and STING1 [, ]. TRIM19, also known as PML, acts through its association with PML-nuclear bodies (PML-NBs) in a wide range of important cellular processes, such as tumor suppression, transcriptional regulation, apoptosis, senescence, DNA damage response, and viral defense mechanisms. It acts as the scaffold of PML-NBs allowing other proteins to shuttle in and out, a process which is regulated by SUMO-mediated modifications and interactions. PML exhibits antiviral activity as it stimulates the SUMOylation of a viral protein, which is supposed to serve as a cellular mechanism to compromise specific functions of the viral effector IE1p72 [].
Protein Domain
Name: ATPase, AAA-3
Type: Domain
Description: This entry includes some of the AAA proteins not detected by the model. AAA ATPases form a large, functionally diverse protein family belonging to the AAA+ superfamily of ring-shaped P-loop NTPases, which exert their activity through the energy-dependent unfolding of macromolecules. AAA ATPases contain a P-loop NTPase domain, which is the most abundant class of NTP-binding protein fold, and is found throughout all kingdoms of life [ ]. P-loop NTPase domains act to hydrolyse the beta-gamma phosphate bond of bound nucleoside triphosphate. There are two classes of P-loop domains: the KG (kinase-GTPase) division, and the ASCE division, the latter including the AAA+ group as well as several other ATPases. There are at least six major clades of AAA domains (metalloproteases, meiotic proteins, D1 and D2 domains of ATPases with two AAA domains, proteasome subunits, and BSC1), as well as several minor clades, some of which consist of hypothetical proteins [ ]. The domain organisation of AAA ATPases consists of a non-ATPase N-terminal domain that acts in substrate recognition, followed by one or two AAA domains (D1 and D2), one of which may be degenerate.
Protein Domain
Name: Trichohyalin
Type: Family
Description: The trichohyalin gene is a member of the "fused" gene family. Trichohyalin is a large structural protein abundant in the inner root sheath (IRS) of anagenic (growing) hair follicles and other sites of hard cornification [ , ]. It associates in regular arrays with keratin intermediate filaments. In humans, a number of genes specifying structural proteins expressed late during epidermal differentiation have been identified and found to be clustered on chromosome 1q21. Therefore, this region is named the epidermal differentiation complex (EDC). The proteins encoded by the EDC genes can be classified into three groups: the precursor proteins of the cell envelope (CE), the S100 family and the "fused" gene family [ ]. In some classifications, the "fused" gene family is classified as a subgroup within the S100 gene family [ ]. The "fused" gene family members contain EF hands and internal tandem repeats. The family consists of profilaggrin, trichohyalin, repetin, hornerin, the profilaggrin-related protein and cornulin (encoded by c1orf10). They are associated with keratin intermediate filaments and partially cross-linked to the CE [ ].
Protein Domain
Name: Chlamydia 15kDa cysteine-rich outer membrane
Type: Family
Description: Chlamydia is a genus of bacteria that cause the most common bacterial sexually transmitted diseases. They are obligate intracellular bacterial pathogens. Members of this genus lack a peptidoglycan layer, but as a substitute, it has been proposed that they have several cysteine-rich membrane proteins. These include the major outer membrane protein (MOMP). These proteins form disulphide bonds to provide rigidity to the cell wall. The alignment of the amino acid sequences of the MOMP from various serovars of Chlamydia shows that they have between seven and ten cysteine residues, seven of which are highly conserved []. The MOMP has been the focus of efforts to produce a vaccine for Chlamydia trachomatis []. The 15kDa cysteine-rich proteins in this entry are multi-pass outer membrane proteins. They are associated with the differentiation of reticulate bodies (RBs) into elementary bodies (EBs) [ ]. They immunolocalise to the inclusion membrane, which is the membrane that surrounds the intracellular parasite. These proteins are recognised by CD8+ T cells in both human and mouse infections, suggesting they gain access to the host cytoplasm.
Protein Domain
Name: Effector-associated domain 11
Type: Domain
Description: This entry represents the effector-associated domain 11 (EAD11). It is predicted to be an all α-helical domain [ ]. This domain is found in Roc-COR-CHAT protease, a recently characterised protease for the gasdermin substrate bGSDM. It cleaves the bGSDM precursor, releasing the pore-forming moiety, which integrates into the membrane and triggers cell death []. Effector-associated domains (EADs) are predicted to function as adaptor domains mediating protein-protein interactions. The EADs show a characteristic architectural pattern. One copy is always fused, typically at the N or C terminus, to a core component of a biological conflict system; examples include VMAP (vWA-MoxR associated protein), iSTAND (inactive STAND NTPase system), or GAP1 (GTPase-associated protein 1). Further copies of the same EAD are fused to either effector or signal-transducing domains, or additional EADs. EAD pairs are frequently observed together on the genome in conserved gene neighborhoods, but can also be severed from such neighborhoods and located in distant regions, indicating EAD-EAD protein domain coupling approximates the advantages of collinear transcription [ , ]. EADs are all small domains with no enzymatic features.
Protein Domain
Name: PHTF1/2, N-terminal
Type: Domain
Description: This domain is found in a group of homeodomain-containing proteins from animals, including PHTF1/2, and is typically between 101 and 140 amino acids in length. PHTF proteins do not display any sequence similarity to known or predicted proteins, but their conservation among species suggests an essential function. The 84kDa Phtf1 protein is an integral membrane protein, anchored to a cell membrane by six to eight trans-membrane domains, that is associated with a domain of the endoplasmic reticulum (ER) juxtaposed to the Golgi apparatus. It is present during meiosis and spermiogenesis, and, by the end of spermiogenesis, is released from the mature spermatozoon within the residual bodies [ ]. PHTF1 enhances the binding of FEM1B (feminisation homologue 1B) to cell membranes. Fem-1 was initially identified in the signaling pathway for sex determination, as well as being implicated in apoptosis, but its biochemical role is still unclear, and neither FEM1B nor PHTF1 is directly implicated in apoptosis in spermatogenesis. It is the ANK domain of FEM1B that is necessary for the interaction with the N-terminal region of PHTF1 [].
Protein Domain
Name: Glycophorin, conserved site
Type: Conserved_site
Description: Proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook Academic Press, London / San Diego, (1997)]. Glycophorin A (PAS-2) and glycophorin B (PAS-3) belong to the MNS blood group system and are associated with antigens that include M/N, S/s, U, He, Mi(a), M(c), Vw, Mur, M(g), Vr, M(e), Mt(a), St(a), Ri(a), Cl(a), Ny(a), Hut, Hil, M(v), Far, Mit, Dantu, Hop, Nob, En(a), ENKT, amongst others. Glycophorin A is the major sialoglycoprotein of the erythrocyte membrane [ ]. Structurally, glycophorin A consists of an N-terminal extracellular domain, heavily glycosylated on serine and threonine residues, followed by a transmembrane region and a C-terminal cytoplasmic domain. Other glycophorins in this entry such as Glycophorin B and Glycophorin E represent minor sialoglycoproteins in the erythrocyte membrane. This entry represents a short conserved region found in the transmembrane domain of glycophorins.
Protein Domain
Name: DltD
Type: Family
Description: The dlt operon (dltA to dltD) of Lactobacillus rhamnosus 7469 encodes four proteins responsible for the esterification of lipoteichoic acid (LTA) by D-alanine. These esters play an important role in controlling the net anionic charge of the poly(GroP) moiety of LTA. DltA and DltC encode the D-alanine-D-alanyl carrier protein ligase (Dcl) and D-alanyl carrier protein (Dcp), respectively. Whereas the functions of DltA and DltC are defined, the functions of DltB and DltD are unknown. In vitro assays showed that DltD bound Dcp for ligation with D-alanine by Dcl in the presence of ATP. In contrast, the homologue of Dcp, the Escherichia coli acyl carrier protein (ACP), involved in fatty acid biosynthesis, was not bound to DltD and thus was not ligated with D-alanine. DltD also catalyzed the hydrolysis of the mischarged D-alanyl-ACP. The hydrophobic N-terminal sequence of DltD was required for anchoring the protein in the membrane. It is hypothesized that this membrane-associated DltD facilitates the binding of Dcp and Dcl for ligation of Dcp with D-alanine and that the resulting D-alanyl-Dcp is translocated to the primary site of D-alanylation [ ].
Protein Domain
Name: Protein-arginine deiminase
Type: Family
Description: In the presence of calcium ions, protein-arginine deiminase (PAD) enzymes catalyse the post-translational modification reaction responsible for the formation of citrulline residues from protein-bound arginine residues. Four PAD isotypes have been identified in mammals, and a fifth may also exist. Non-mammalian vertebrates appear to have only a single PAD enzyme. All known natural substrates of PAD are proteins known to have an important structural function, such as keratin (PAD1), intermediate filaments or proteins associated with intermediate filaments. Citrullination may have consequences for the structural integrity and interactions of these proteins. Physiological levels of calcium appear to be too low to activate these enzymes, suggesting a link between PAD activation and loss of calcium homeostasis during terminal differentiation and cell death (apoptosis). In humans, PAD enzymes may be involved in cytoskeletal reorganization in the egg and early embryo. These enzymes abolish the methyltransferase activity of nicotinamide N-methyltransferase (NNMT) through its citrullination, which may play a role in a subset of breast cancers and several chronic disease conditions.
Protein Domain
Name: Knr4/Smi1-like domain
Type: Domain
Description: This domain is found in the yeast cell wall assembly regulator Smi1 (also known as Knr4). Saccharomyces cerevisiae Knr4 has a regulatory role in chitin deposition and in cell wall assembly. It is believed to connect the PKC1-SLT2 MAPK pathway with cell proliferation. It has been shown to interact with Bck2, a gene involved in cell cycle progression in S. cerevisiae (forming a complex), to allow PKC1 to coordinate the cell cycle (cell proliferation) with cell wall integrity. Knr4 also interacts with the tyrosine-tRNA synthetase protein encoded by Tys1 and is involved in the sporulation process. Proteins containing this domain also include the animal F-box only protein 3 (FBXO3). In humans, FBXO3 is a substrate recognition component of the SCF (SKP1-CUL1-F-box protein)-type E3 ubiquitin ligase complex. Interestingly, Smi1/Knr4 homologues from bacteria are potential immunity proteins in a subset of contact-dependent inhibitory toxin systems. Note: previously reported evidence that Knr4 may interact with nuclear matrix-association regions may be due to an artefact.
Protein Domain
Name: IPS1, CARD domain
Type: Domain
Description: This entry represents the CARD domain found in IPS1, also known as mitochondrial antiviral-signaling protein (MAVS). IPS1 is an adaptor protein that plays an important role in interferon induction in response to viral infection. It is crucial in triggering innate immunity and in developing adaptive immunity against viral pathogens. The CARD of IPS1 associates with the CARDs of two RNA helicases, RIG-I and MDA5, which bind viral RNA in the cytoplasm during the initial stage of the intracellular antiviral response, leading to the induction of type I interferons. In general, CARDs are death domains (DDs) found associated with caspases. They are known to be important in the signaling pathways for apoptosis, inflammation, and host-defense mechanisms. DDs are protein-protein interaction domains found in a variety of domain architectures. Their common feature is that they form homodimers by self-association or heterodimers by associating with other members of the DD superfamily, including PYRIN and DED (Death Effector Domain). They serve as adaptors in signaling pathways and can recruit other proteins into signaling complexes.
Protein Domain
Name: Selenophosphate synthetase, class I
Type: Family
Description: The UGA (TGA) codon is normally a termination codon; however, it is also used as a selenocysteine (Sec) codon by numerous organisms. Sec is the 21st amino acid, which is inserted into selenoproteins (proteins that include a selenocysteine (Se-Cys) amino acid residue). The synthesis of Sec and its incorporation into proteins requires the activity of a number of proteins, one of which is selenophosphate synthetase (SPS), also known as the SelD gene product. SPS catalyses the production of the selenium donor compound monoselenophosphate (MSP) from selenide and ATP. MSP is then used to synthesize Sec from seryl-tRNAs. SPS was initially identified in E. coli as the product of the gene selD, one of four essential selenoprotein synthesis genes (selA-D). SelC is the tRNA itself, SelD acts as a donor of reduced selenium, SelA modifies a serine residue on SelC into selenocysteine, and SelB is a selenocysteine-specific translation elongation factor. 3' or 5' non-coding elements of mRNA have been found as probable structures for directing selenocysteine incorporation. This entry represents the type I SPS, mostly from bacteria.
Protein Domain
Name: VapC-like Sll0205 protein, PIN domain
Type: Domain
Description: This entry represents the virulence associated protein C (VapC)-like PIN (PilT N terminus) domain of the Synechocystis sp. (strain PCC 6803) Sll0205 protein and other uncharacterized homologs. They are similar to the PIN domains of the Mycobacterium tuberculosis VapC and Neisseria gonorrhoeae FitB toxins of the prokaryotic toxin/antitoxin operons, VapBC and FitAB, respectively, which are believed to be involved in growth inhibition by regulating translation. These toxins are nearly always co-expressed with an antitoxin, a cognate protein inhibitor, forming an inert protein complex. Dissociation of the protein complex activates the ribonuclease activity of the toxin by an as yet undefined mechanism. VapC-like PIN domains are homologs of flap endonuclease-1 (FEN1)-like PIN domains, but lack the extensive arch/clamp region and the H3TH (helix-3-turn-helix) domain, an atypical helix-hairpin-helix-2-like region, seen in FEN1-like PIN domains. PIN domains within this subgroup contain four highly conserved acidic residues. These putative active site residues are thought to bind Mg2+ and/or Mn2+ ions and be essential for single-stranded ribonuclease activity.
Protein Domain
Name: TrkH potassium transport family
Type: Family
Description: The Trk system is a low to medium affinity potassium uptake system, widely found in both bacteria and archaea, where the uptake of K(+) is believed to be linked to H(+) symport. The core Trk system consists of two proteins, the integral membrane K(+)-translocating protein TrkH (or TrkG), and the regulatory NAD-binding peripheral membrane protein TrkA. In Escherichia coli the activity of TrkH is dependent on the ATP-binding protein SapD (also known as TrkE), which is part of the SapABCDF ABC transporter, involved in putrescine export. Not all Trk systems are dependent on SapD, however; it is thought that these may utilise ATP-binding proteins from other ABC transporters. Also included in this entry is the homologous subunit J of a V-type Na(+) ATP synthase found in some bacteria such as Enterococcus hirae. The function of this subunit is unknown, but as K(+) transport is dependent on the activity of the V-type Na(+) ATP synthase in this organism, it may function to exchange Na(+) for K(+).
Protein Domain
Name: Sorting nexin-33, BAR domain
Type: Domain
Description: Sorting nexins are a large family of evolutionarily conserved phosphoinositide-binding proteins that have roles in cargo sorting through the endosomal network. Sorting nexins contain at least a PX domain (a phospholipid-binding motif). Some nexins contain a few additional domains. The sorting nexin 9 subfamily includes SNX9, SNX18 and SNX33. They are characterised by the presence of an N-terminal SH3 domain, a PX domain that is a phosphoinositide-binding module, and a Bin/Amphiphysin/Rvs (BAR) domain at the C terminus, which allows membrane binding and bending. They are required for progression and completion of mitosis. SNX33 plays a role in maintaining cell shape and cell cycle progression through its interaction with WASp (Wiskott-Aldrich syndrome protein). It interferes with cellular prion protein (PrP) formation by modulation of its shedding. It may also promote the formation of macropinosomes (large endocytic organelles). This entry represents the BAR domain of SNX33. BAR domains are dimerization, lipid binding and curvature sensing modules found in many different proteins with diverse functions.
Protein Domain
Name: Claudin-2
Type: Family
Description: Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands is affected by cell type. They can also form heteropolymers with each other within and between tight junction strands. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with PDZ domain proteins, such as the scaffolding proteins ZO-1, -2 and -3. They also interact with non-tight junction proteins, such as the cell adhesion proteins EpCam and tetraspanins, and the signaling proteins ephrin A and B and their receptors, EphA and EphB. Claudin-2 was initially isolated as a peptide fragment from TJ-enriched junctional cell fractions. Following sequencing and similarity searching it was cloned and expressed in cells, where it was shown to concentrate at TJs. Human and mouse isoforms have been identified. Claudin-2 shares ~22-46% overall similarity with other claudin family members at the amino acid level, displaying highest similarity to claudin-14.
Protein Domain
Name: Class III cytochrome C
Type: Domain
Description: Cytochromes c (cytC) can be defined as electron-transfer proteins having one or several haem c groups, bound to the protein by one or, more generally, two thioether bonds involving sulphydryl groups of cysteine residues. The fifth haem iron ligand is always provided by a histidine residue. CytC possess a wide range of properties and function in a large number of different redox processes. Ambler recognised four classes of cytC. Class III comprises the low redox potential multiple haem cytochromes: C3 (tetrahaem), and high-molecular-weight cytC, HMC (hexadecahaem), with only 30-40 residues per haem group. The haem c groups, all bis-histidinyl coordinated, are structurally and functionally nonequivalent and present different redox potentials in the range 0 to -400 mV. The 3D structures of a number of cyt C3 proteins have been determined. The proteins consist of 4-5 α-helices and 2 β-strands wrapped around a compact core of four non-parallel haems, which present a relatively high degree of exposure to the solvent. The overall protein architecture, haem plane orientations and iron-iron distances are highly conserved.
Protein Domain
Name: Globin/Protoglobin
Type: Homologous_superfamily
Description: Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well studied family that is widely distributed in many organisms. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors. Several functionally different haemoglobins can coexist in the same species. The major types of globins include:
  • Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors.
  • Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle.
  • Neuroglobin: a myoglobin-like haemprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia. Neuroglobin belongs to a branch of the globin family that diverged early in evolution.
  • Cytoglobin: an oxygen sensor expressed in multiple tissues. Related to neuroglobin.
  • Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers.
  • Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation.
  • Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants.
  • Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin.
  • Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression.
  • Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors.
  • Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features.
This domain superfamily is found in the entire globin family of proteins, including the microbial globins.
Protein Domain
Name: Globin
Type: Domain
Description: Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well studied family that is widely distributed in many organisms. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors. Several functionally different haemoglobins can coexist in the same species. The major types of globins include:
  • Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors.
  • Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle.
  • Neuroglobin: a myoglobin-like haemprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia. Neuroglobin belongs to a branch of the globin family that diverged early in evolution.
  • Cytoglobin: an oxygen sensor expressed in multiple tissues. Related to neuroglobin.
  • Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers.
  • Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation.
  • Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants.
  • Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin.
  • Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression.
  • Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors.
  • Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features.
This entry covers most of the globin family of proteins, but it omits some bacterial globins and the protoglobins.
Protein Domain
Name: tRNA-cytidine(32) 2-sulfurtransferase
Type: Family
Description: The aminoacyl-tRNA synthetases (also known as aminoacyl-tRNA ligases) catalyse the attachment of an amino acid to its cognate transfer RNA molecule in a highly specific two-step reaction. These proteins differ widely in size and oligomeric state, and have limited sequence homology. The 20 aminoacyl-tRNA synthetases are divided into two classes, I and II. Class I aminoacyl-tRNA synthetases contain a characteristic Rossmann fold catalytic domain and are mostly monomeric. Class II aminoacyl-tRNA synthetases share an anti-parallel β-sheet fold flanked by α-helices, and are mostly dimeric or multimeric, containing at least three conserved regions. However, tRNA binding involves an α-helical structure that is conserved between class I and class II synthetases. In reactions catalysed by the class I aminoacyl-tRNA synthetases, the aminoacyl group is coupled to the 2'-hydroxyl of the tRNA, while, in class II reactions, the 3'-hydroxyl site is preferred. The synthetases specific for arginine, cysteine, glutamic acid, glutamine, isoleucine, leucine, methionine, tyrosine, tryptophan, valine, and some lysine synthetases (non-eukaryotic group) belong to class I synthetases. The synthetases specific for alanine, asparagine, aspartic acid, glycine, histidine, phenylalanine, proline, serine, threonine, and some lysine synthetases (non-archaeal group) belong to class II synthetases. Based on their mode of binding to the tRNA acceptor stem, both classes of tRNA synthetases have been subdivided into three subclasses, designated 1a, 1b, 1c and 2a, 2b, 2c. tRNA-cytidine(32) 2-sulfurtransferase (also known as 2-thiocytidine tRNA biosynthesis protein TtcA) is required for the thiolation of cytidine in position 32 of tRNA, to form 2-thiocytidine (s(2)C32). The modified nucleoside 2-thiocytidine (s(2)C) has so far been found in tRNA from archaea and bacteria. The TtcA protein family is characterised by the existence of both a PP-loop and a Cys-X(1)-X(2)-Cys motif in the central region of the protein, but can be divided into two distinct groups based on the presence and location of additional Cys-X(1)-X(2)-Cys motifs in terminal regions of the sequence. Mutant analysis showed that both cysteines in this central conserved Cys-X(1)-X(2)-Cys motif are required for the formation of s(2)C. The PP-loop motif appears to be a modified version of the P-loop of the nucleotide binding domain that is involved in phosphate binding. It is named the PP-motif because it appears to be part of a previously uncharacterised ATP pyrophosphatase domain. ATP sulfurylases, Escherichia coli NtrL, and Bacillus subtilis OutB consist of this domain alone. In other proteins, the pyrophosphatase domain is associated with amidotransferase domains (type I or type II), a putative citrulline-aspartate ligase domain or a nitrilase/amidase domain. This entry represents tRNA-cytidine(32) 2-sulfurtransferase (also known as 2-thiocytidine tRNA biosynthesis protein, TtcA) and its homologues.
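The Cys-X(1)-X(2)-Cys motif mentioned in the description above can be located in a sequence with a simple pattern scan. The following is a minimal sketch only: the function name find_cxxc and the toy sequence are made up for illustration and are not taken from this database or from any real TtcA protein. A pattern match alone does not establish that a motif is functional.

```python
import re

# Cys-X(1)-X(2)-Cys: a cysteine, any two residues, then a cysteine.
# The lookahead lets overlapping motifs (e.g. in "CAACAAC") both be reported.
CXXC = re.compile(r"(?=(C..C))")

def find_cxxc(sequence):
    """Return (1-based start position, matched motif) for each C-x-x-C motif."""
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(sequence.upper())]

# Hypothetical toy sequence, for illustration only.
toy = "MKSGCAACLLPPDGCEKCAA"
hits = find_cxxc(toy)
print(hits)
```

Positions are reported 1-based, as is usual for protein residues. Grouping hits into "central" versus "terminal" regions would need a definition of those regions, which the description does not give.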
Protein Domain
Name: PsbP, C-terminal
Type: Domain
Description: Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection. In PSII, the oxygen-evolving complex (OEC) is responsible for catalysing the splitting of water to O(2) and 4H+. The OEC is composed of a cluster of manganese, calcium and chloride ions bound to extrinsic proteins. In cyanobacteria there are five extrinsic proteins in the OEC (PsbO, PsbP-like, PsbQ-like, PsbU and PsbV), while in plants there are only three (PsbO, PsbP and PsbQ), PsbU and PsbV having been lost during the evolution of green plants. This entry represents the C-terminal domain found in PSII OEC protein PsbP. Both PsbP and PsbQ are regulators that are necessary for the biogenesis of optically active PSII. PsbP increases the affinity of the water oxidation site for chloride ions and provides the conditions required for high affinity binding of calcium ions. The crystal structure of PsbP from Nicotiana tabacum (Common tobacco) revealed a two-domain structure, where domain 1 may play a role in the ion retention activity in PSII, the N-terminal residues being essential for calcium and chloride ion retention activity. PsbP is encoded in the nuclear genome in plants.
Protein Domain
Name: Peptidase S9, prolyl oligopeptidase, catalytic domain
Type: Domain
Description: Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). This domain covers the active site serine of the serine peptidases belonging to MEROPS peptidase family S9 (prolyl oligopeptidase family, clan SC). The protein fold of the peptidase domain for members of this family resembles that of serine carboxypeptidase D, the type example of clan SC. Examples of protein families containing this domain are:
  • Prolyl endopeptidase (PE) (also called post-proline cleaving enzyme). PE is an enzyme that cleaves peptide bonds on the C-terminal side of prolyl residues. The sequence of PE has been obtained from a mammalian species (pig) and from bacteria (Flavobacterium meningosepticum and Aeromonas hydrophila); there is a high degree of sequence conservation between these sequences.
  • Escherichia coli protease II (oligopeptidase B) (gene prtB), which cleaves peptide bonds on the C-terminal side of lysyl and argininyl residues.
  • Dipeptidyl peptidase IV (DPP IV). DPP IV is an enzyme that removes N-terminal dipeptides sequentially from polypeptides having unsubstituted N-termini, provided that the penultimate residue is proline.
  • Saccharomyces cerevisiae (Baker's yeast) vacuolar dipeptidyl aminopeptidases A and B (DPAP A and DPAP B), encoded by the STE13 and DAP2 genes respectively. DPAP A is responsible for the proteolytic maturation of the alpha-factor precursor.
  • Acylamino-acid-releasing enzyme (acyl-peptide hydrolase). This enzyme catalyses the hydrolysis of the amino-terminal peptide bond of an N-acetylated protein to generate an N-acetylated amino acid and a protein with a free amino-terminus.
These proteins belong to MEROPS peptidase families S9A, S9B and S9C.
Protein Domain
Name: GPCR, family 3, extracellular calcium-sensing receptor-related
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs. The metabotropic glutamate receptors are functionally and pharmacologically distinct from the ionotropic glutamate receptors. They are coupled to G-proteins and stimulate the inositol phosphate/Ca2+ intracellular signalling pathway. At least eight sub-types of metabotropic receptor (GRM1-8) have been identified in cloning studies. The sub-types differ in their agonist pharmacology and signal transduction pathways. The calcium-sensing receptor (CaSR) is an integral membrane protein that senses changes in the extracellular concentration of calcium ions. The activity of the receptor is mediated by a G-protein that activates a phosphatidyl-inositol-calcium second messenger system. The sequences of the receptors show a high degree of similarity to the TM signature that characterises the metabotropic glutamate receptors. In addition, the sequences contain a large extracellular domain that includes clusters of acidic amino acid residues, which may be involved in calcium binding. Defects in CaSR that result in reduced activity of the receptor cause familial hypocalciuric hypercalcemia (FHH) and neonatal severe hyperparathyroidism (NSHPT), inherited conditions characterised by altered calcium homeostasis. FHH-affected individuals exhibit mild or modest hypercalcemia, relative hypocalciuria and inappropriately normal PTH levels. By contrast, NSHPT is a rare autosomal recessive life-threatening disorder characterised by high serum calcium concentrations, skeletal demineralisation and parathyroid hyperplasia. In addition, defects resulting from receptor activation at subnormal Ca2+ levels cause autosomal dominant hypocalcemia. This entry represents the extracellular calcium-sensing receptors and related proteins in GPCR family 3, such as the taste receptors.
Protein Domain
Name: Secreted aspartic endopeptidase
Type: Domain
Description: SAPs (secreted aspartic proteinases) are secreted from a group of pathogenic fungi, predominantly Candida species. They are secreted from the pathogen to degrade host proteins. SAP is one of the most significant extracellular hydrolytic enzymes produced by C. albicans. SAP proteins are encoded by a family of 10 SAP genes. All 10 SAP genes of C. albicans encode preproenzymes, approximately 60 amino acids longer than the mature enzyme, which are processed when transported via the secretory pathway. The mature enzymes contain sequence motifs typical for all aspartyl proteinases, including the two conserved aspartate residues of the active site and conserved cysteine residues implicated in the maintenance of the three-dimensional structure. Most Sap proteins contain putative N-glycosylation sites, but it remains to be determined which Sap proteins are glycosylated. A variety of fungal secreted aspartic peptidases are included in this entry:
  • barrierpepsin (MEROPS identifier A01.015)
  • candidapepsin SAP1 (A01.014)
  • candidapepsin SAP2 (A01.060)
  • candidapepsin SAP3 (A01.061)
  • candidapepsin SAP4 (A01.062)
  • candidapepsin SAP5 (A01.063)
  • candidapepsin SAP6 (A01.064)
  • candidapepsin SAP7 (A01.065)
  • candidapepsin SAP8 (A01.066)
  • candidapepsin SAP9 (A01.067)
  • candidapepsin SAP10 (A01.085)
  • candiparapsin (A01.038)
  • canditropsin (A01.037)
  • yapsin-1 (A01.030)
  • yapsin-2 (A01.031)
Not all of these are products secreted by pathogenic fungi. Barrierpepsin and yapsins are secreted by Saccharomyces cerevisiae. Barrierpepsin is secreted by yeasts of mating type a and processes the alpha-mating factor at the Leu-Lys bond, thereby inactivating it. This is a mechanism for optimizing the concentration of the mating factor. Yapsin-1 releases alpha-mating factor from its precursor. Aspartyl proteases (APs), also known as acid proteases (EC 3.4.23.-), are a widely distributed family of proteolytic enzymes known to exist in vertebrates, fungi, plants, retroviruses and some plant viruses. APs use an Asp dyad to hydrolyze peptide bonds. APs found in eukaryotic cells are α/β monomers composed of two asymmetric lobes ("bilobed"). Each of the lobes provides a catalytic Asp residue, positioned within the hallmark motif Asp-Thr/Ser-Gly, to the active site. The N- and C-terminal domains, although structurally related by a 2-fold axis, have only limited sequence homology except in the vicinity of the active site. This suggests that the enzymes evolved by an ancient duplication event. The enzymes specifically cleave bonds in peptides that are at least six residues in length, with hydrophobic residues in both the P1 and P1' positions. The active site is located at the groove formed by the two lobes, with an extended loop projecting over the cleft to form an 11-residue flap, which encloses substrates and inhibitors in the active site. Specificity is determined by nearest-neighbour hydrophobic residues surrounding the catalytic aspartates, and by three residues in the flap. The enzymes are mostly secreted from cells as inactive proenzymes that activate autocatalytically at acidic pH. Eukaryotic APs form peptidase family A1 of clan AA.
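The hallmark motif Asp-Thr/Ser-Gly described above can be written in one-letter code as D[TS]G. The sketch below scans a sequence for that pattern. It is illustrative only: the function name find_dtg_motifs and the toy sequence are hypothetical, and finding the pattern does not show that a protein is an aspartyl protease.

```python
import re

# Asp-Thr/Ser-Gly in one-letter code: D, then T or S, then G.
DTG = re.compile(r"D[TS]G")

def find_dtg_motifs(sequence):
    """Return (1-based position of the Asp, matched motif) for each hit."""
    return [(m.start() + 1, m.group(0)) for m in DTG.finditer(sequence.upper())]

# Hypothetical toy sequence with one motif per "lobe"; not a real protease.
toy = "AAIDTGSSLLVFDSGTT"
motifs = find_dtg_motifs(toy)
print(motifs)
```

In a bilobed eukaryotic AP as described above, one would expect one such motif from each lobe, so a sequence with exactly two well-separated hits is the pattern of interest. That expectation is a reading of the description, not something this scan checks.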
Protein Domain
Name: Tumour necrosis factor domain
Type: Domain
Description: Cytokines can be grouped into a family on the basis of sequence, functional and structural similarities [ , , ]. Tumor necrosis factor (TNF) (also known as TNF-alpha or cachectin) is a monocyte-derived cytotoxin that has been implicated in tumour regression, septic shock and cachexia [, ]. The protein is synthesised as a prohormone with an unusually long and atypical signal sequence, which is absent from the mature secreted cytokine []. A short hydrophobic stretch of amino acids serves to anchor the prohormone in lipid bilayers []. Both the mature protein and a partially-processed form of the hormone are secreted after cleavage of the propeptide []. There are a number of different families of TNF, but all these cytokines seem to form homotrimeric (or heterotrimeric in the case of LT-alpha/beta) complexes that are recognised by their specific receptors. The following cytokines can be grouped into a family on the basis of sequence, functional, and structural similarities [ , , ]: Tumor Necrosis Factor (TNF) (also known as cachectin or TNF-alpha) [ , ], a cytokine which has a wide variety of functions. It can cause cytolysis of certain tumor cell lines; it is involved in the induction of cachexia; it is a potent pyrogen, causing fever by direct action or by stimulation of interleukin-1 secretion; finally, it can stimulate cell proliferation and induce cell differentiation under certain conditions. Lymphotoxin-alpha (LT-alpha) and lymphotoxin-beta (LT-beta), two related cytokines that are produced by lymphocytes and are cytotoxic for a wide range of tumor cells in vitro and in vivo [ ]. T cell antigen gp39 (CD40L), a cytokine which seems to be important in B-cell development and activation. CD27L, a cytokine which plays a role in T-cell activation. It induces the proliferation of costimulated T cells and enhances the generation of cytolytic T cells. 
CD30L, a cytokine which induces proliferation of T cells. FASL, a cytokine involved in cell death [ ]. 4-1BBL, an inducible T cell surface molecule that contributes to T-cell stimulation. OX40L, a cytokine that co-stimulates T cell proliferation and cytokine production [ ]. TNF-related apoptosis inducing ligand (TRAIL), a cytokine that induces apoptosis [ ]. TNF-alpha is synthesised as a type II membrane protein which then undergoes post-translational cleavage, liberating the extracellular domain. CD27L, CD30L, CD40L, FASL, LT-beta, 4-1BBL and TRAIL also appear to be type II membrane proteins. LT-alpha is a secreted protein. All these cytokines seem to form homotrimeric (or heterotrimeric in the case of LT-alpha/beta) complexes that are recognised by their specific receptors. The PROSITE pattern for this family is located in a β-strand in the central section of the protein which is conserved across all members.
Protein Domain
Name: RTX, pore-forming domain
Type: Domain
Description: This is a hydrophobic pore-forming domain found towards the N terminus of RTX toxins [ ]. Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior [ , ]. Four principal exotoxin secretion systems have been described. In the type II and IV secretion systems, toxins are first exported to the periplasm by way of a cleaved N-terminal signal sequence; a second set of proteins is used for extracellular transport (type II), or the C terminus of the exotoxin itself is used (type IV). Type III secretion involves at least 20 molecules that assemble into a needle; effector proteins are then translocated through this without need of a signal sequence. In the Type I system, a complete channel is formed through both membranes, and the secretion signal is carried on the C terminus of the exotoxin. The RTX (repeats in toxin) family of cytolytic toxins belongs to the Type I secretion system, and its members are important virulence factors in Gram-negative bacteria, such as Escherichia coli ( ), Actinobacillus pleuropneumoniae ( ) and Kingella kingae ( ). They consist of a hydrophobic pore-forming domain at the N terminus that harbors four putative transmembrane α-helices, a typical glycine-rich repeat segment and a C-terminal signal sequence [ ]. The glycine-rich repeats are essential for binding calcium, and are critical for the biological activity of the secreted toxins []. RTX toxins can be divided into two groups: (i) hemolysins, which cause the lysis of erythrocytes and exhibit toxicity towards a wide range of cell types from various species; and (ii) leukotoxins, which exhibit narrow cell type and species specificity due to cell-specific binding through the beta2-integrins expressed on the cell surface of leukocytes []. 
All RTX toxin operons exist in the order rtxCABD: the RtxA protein is the structural component of the exotoxin, both RtxB and RtxD are required for its export from the bacterial cell, and RtxC is an acyl-carrier-protein-dependent acyl-modification enzyme required to convert RtxA to its active form []. Escherichia coli haemolysin (HlyA) is often quoted as the model for RTX toxins. Work on the corresponding rtxC gene product, HlyC [ ], has revealed that it acylates two internal lysine residues in the HlyA protein as a post-translational modification. To be pathogenic, the HlyA toxin must first bind Ca2+ ions to the set of glycine-rich repeats and then be activated by HlyC []. This has been demonstrated both in vitro and in vivo.
Protein Domain
Name: EDG-8 sphingosine 1-phosphate receptor
Type: Family
Description: G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [ ]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [, , , , ]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs [ ]. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [ , , ]. Sphingosine 1-phosphate (S1P) is released from activated platelets and is also produced by a number of other cell types in response to growth factors and cytokines [ ]. 
It is proposed to act both as an extracellular mediator and as an intracellular second messenger. Five G protein-coupled receptors have been identified that act as high affinity receptors for S1P and also as low affinity receptors for the related lysophospholipid, SPC [ ]. EDG-1, EDG-3, EDG-5 and EDG-8 share a high degree of similarity and are also referred to as lpB1, lpB3, lpB2 and lpB4, respectively. EDG-6 is referred to as lpC1, reflecting its more distant relationship to the other S1P receptors. EDG-8 is expressed predominantly in the white matter tracts of the brain and in the pancreas []. Upon binding of S1P, EDG-8 appears to couple to Gi and G12 proteins but not Gq family members. Unlike other EDG receptors, which activate MAP kinases and stimulate proliferation, EDG-8 causes inhibition of ERK MAP kinases and proliferation, and also inhibition of adenylyl cyclase [].
Protein Domain
Name: Glutaredoxin 2, C-terminal
Type: Domain
Description: Glutaredoxins [ , , ], also known as thioltransferases (disulphide reductases), are small proteins of approximately one hundred amino-acid residues which utilise glutathione and NADPH as cofactors. Oxidized glutathione is regenerated by glutathione reductase. Together these components compose the glutathione system []. Glutaredoxin functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin (TRX), which functions in a similar way, glutaredoxin possesses an active centre disulphide bond [ ]. It exists in either a reduced form or an oxidized form in which the two cysteine residues are linked by an intramolecular disulphide bond. It contains a redox active CXXC motif in a TRX fold and uses a dithiol mechanism similar to that employed by TRXs for intramolecular disulfide bond reduction of protein substrates. Unlike TRX, GRX has a preference for mixed GSH disulfide substrates, for which it uses a monothiol mechanism where only the N-terminal cysteine is required. The flow of reducing equivalents in the GRX system goes from NADPH -> GSH reductase -> GSH -> GRX -> protein substrates [ , , , ]. By altering the redox state of target proteins, GRX is involved in many cellular functions including DNA synthesis, signal transduction and the defense against oxidative stress. Glutaredoxin has been sequenced in a variety of species. On the basis of extensive sequence similarity, it has been proposed [ ] that Vaccinia virus protein O2L is most probably a glutaredoxin. Finally, it must be noted that Bacteriophage T4 thioredoxin also seems to be evolutionarily related. In position 5 of the pattern T4 thioredoxin has Val instead of Pro. Unlike other glutaredoxins, glutaredoxin 2 (Grx2) cannot reduce ribonucleotide reductase [ ]. Grx2 has significantly higher catalytic activity in the reduction of mixed disulphides with glutathione (GSH) compared with other glutaredoxins. 
It adopts a GST fold containing an N-terminal thioredoxin-fold domain and a C-terminal alpha helical domain. The active site residues (Cys9-Pro10-Tyr11-Cys12, in Escherichia coli Grx2, ), which are found at the interface between the N- and C-terminal domains, are identical to those of other glutaredoxins, but there is no other similarity between glutaredoxin 2 and other glutaredoxins. Grx2 is structurally similar to glutathione-S-transferases (GST), but there is no obvious sequence similarity []. The inter-domain contacts are mainly hydrophobic, suggesting that the two domains are unlikely to be stable on their own. Both domains are needed for correct folding and activity of Grx2. It is thought that the primary function of Grx2 is to catalyse reversible glutathionylation of proteins with GSH in cellular redox regulation including the response to oxidative stress. The N-terminal domain is .
Protein Domain
Name: Hepatocyte growth factor-regulated tyrosine kinase substrate/VPS27
Type: Family
Description: Members of this group are characterised by the presence of a VHS (Vps27p/Hrs/Stam) domain in the N-terminal portion followed by a FYVE domain and one or two ubiquitin-interacting motifs. VHS domains are found at the N termini of select proteins involved in intracellular membrane trafficking and are often localised to membranes. FYVE domains are membrane localisation domains that specifically bind phosphatidylinositol 3-phosphate, regulating membrane trafficking and signalling pathways [ ]. The ubiquitin-interacting motif is found in many proteins involved in the endocytic pathway and is capable of binding ubiquitin. The three-dimensional structure of the VHS and FYVE tandem domain unit of Drosophila melanogaster (Fruit fly) Hrs reveals a pyramidal structure in which the much larger VHS domain forms a rectangular base and the FYVE domain occupies the apical end. The VHS domain is composed of an unusual superhelix of eight alpha helices, and the FYVE domain is mainly built of loops, two double-stranded antiparallel sheets, and a helix stabilised by two tetrahedrally coordinated zinc atoms. Dimerisation creates two identical pockets designed for binding ligands with multiple negative charges such as citrate or phosphatidylinositol 3-phosphate [ ]. The FYVE domain of the Hrs orthologue in yeast is consistent with this structure []. Members of this group regulate endosome maturation and trafficking between endosomes and the degradative organelles (lysosome/vacuole) [ ]. Monoubiquitination functions as a signal for sorting transmembrane proteins into intraluminal vesicles of multivesicular bodies (MVB) and subsequent delivery to the degradative organelles []. The sorting of proteins into the inner vesicles of multivesicular bodies is required for many key cellular processes, ranging from the downregulation of activated signalling receptors to the proper stimulation of the immune response [, ]. 
A molecular machine that contains the ubiquitin-binding protein Hrs as well as three multi-subunit complexes, ESCRT (endosome-associated complex required for transport) -I, -II and -III, is essential for both sorting and MVB formation [, ]. A conserved sequence motif, P(T/S)AP, found in structural proteins of several RNA viruses (e.g. HIV gag), promotes release of virus from the cell by recruiting the ESCRT machinery to the viral budding sites at the plasma membrane. The same motif is also found in Hrs and recruits the ESCRT-I complex to endosomes through direct interaction with one of its components, TSG101. Fusion of Hrs with the gag gene of HIV-1 lacking this motif can complement a defect in virus budding. Further data indicate a wider role for Hrs in the regulation of endosome dynamics [ ]. Rat GalCer expression factor 1 (GEF-1), which shows high sequence similarity to Hrs, induces GalCer expression, morphological changes, and cell growth inhibition in COS-7 cells [ ].
Protein Domain
Name: Photosystem I PsaA/PsaB, conserved site
Type: Conserved_site
Description: Photosystem I (PSI) [ ] is an integral membrane protein complex that uses light energy to mediate electron transfer from plastocyanin to ferredoxin. PSI is found in the chloroplasts of plants and in cyanobacteria. The electron transfer components of the reaction centre of PSI are a primary electron donor P-700 (chlorophyll dimer) and five electron acceptors: A0 (chlorophyll), A1 (a phylloquinone) and three 4Fe-4S iron-sulphur centres: Fx, Fa, and Fb. PsaA and psaB, two closely related proteins, are involved in the binding of P700, A0, A1, and Fx. PsaA and psaB are both integral membrane proteins of 730 to 750 amino acids that seem to contain 11 transmembrane segments. The Fx 4Fe-4S iron-sulphur centre is bound by four cysteines; two of these cysteines are provided by the psaA protein and the two others by psaB. The two cysteines in both proteins are proximal and located in a loop between the ninth and tenth transmembrane segments. A leucine zipper motif seems to be present [ ] downstream of the cysteines and could contribute to dimerisation of psaA/psaB. This entry represents a conserved region that includes the two iron-sulphur binding cysteines.
Protein Domain
Name: E3 Ubiquitin ligase MUL1-like
Type: Domain
Description: This domain is found in mitochondrial ubiquitin ligase activator of NFKB 1 (MULAN, also known as MUL1) from animals and ubiquitin E3 ligase SP1/SP2/SPL1/SPL2 from Arabidopsis. MUL1 is a multifunctional E3 ubiquitin ligase anchored in the outer mitochondrial membrane with its RING finger domain facing the cytoplasm. MUL1 functions as a ubiquitin ligase to ubiquitinate molecules such as mitofusin2 (Mfn2), Akt, p53 and ULK1 through its RING finger domain, leading to protein degradation. Moreover, MUL1 can also act as a small ubiquitin-like modifier (SUMO) E3 ligase to sumoylate proteins such as dynamin-related protein 1 (Drp1), enhancing protein stabilization [ ]. It plays a role in the control of mitochondrial morphology, promotes mitochondrial fragmentation and influences mitochondrial localisation []. When over-expressed in human cells, it activates JNK through MAP3K7/TAK1 and induces caspase-dependent apoptosis []. MUL1 has also been shown to regulate the RIG-I mediated antiviral response []. Ubiquitin E3 ligase SP1 associates with TOC (translocon at the outer envelope membrane of chloroplasts) complexes and mediates ubiquitination of TOC components, promoting their degradation. SP1-mediated regulation of chloroplast protein import contributes to the organellar proteome changes that occur during plant development [ ]. It is also important for stress tolerance in plants [].
Protein Domain
Name: FMN-binding split barrel
Type: Homologous_superfamily
Description: The FMN-binding domain has a split β-barrel structure with a Greek-key topology that is related in structure to the ferredoxin reductase-like FAD-binding domain. The FMN-binding split barrel domain is found in pyridoxine 5'-phosphate oxidase (PNP oxidase), FMN-binding protein, ferric reductase, and in phenol 2-hydroxylase component B (PheA2). PNP oxidase ( ) is an FMN flavoprotein that catalyses the oxidation of pyridoxamine-5-P (PMP) and pyridoxine-5-P (PNP) to pyridoxal-5-P (PLP). This reaction serves as the terminal step in the de novo biosynthesis of PLP in Escherichia coli, and as a part of the salvage pathway of this coenzyme in both E. coli and mammalian cells [ , ]. The binding sites for FMN and for substrate have been highly conserved throughout evolution. The FMN-binding protein (FMN-bp) is one of the smallest proteins known to bind FMN. FMN-bp appears to participate in the electron-transfer pathway, and may have a structural relationship to the C-terminal domain of chymotrypsin [ , ]. Microbial ferric reductases are essential for generating more soluble ferrous iron for use in cellular proteins (assimilatory ferric reductases), and act as terminal reductases of the iron respiratory pathway of certain bacteria (dissimilatory iron reductases). Most assimilatory iron reductases are flavin enzymes [ ].
Protein Domain
Name: Blood group Rhesus C/E/D polypeptide
Type: Family
Description: Proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen Facts Book Academic Press, London / San Diego, (1997)]. The RH(CE) polypeptide (Rhesus C/E antigens) and RH(D) polypeptide (Rhesus D antigen) belong to the Rh blood group system and are associated with antigens that include C/c, E/e, D, f, C(e), C(w), C(x), V, E(w), G, Tar, VS, D(w), cE, amongst others. The Rh (Rhesus) blood group system is important in clinical medicine by virtue of being involved in haemolytic disease of the newborn, transfusion reactions, autoimmune haemolytic anaemias, and haemolytic reactions of non-immune origin [ ]. The RH locus from RH(D)-positive donors contains 2 homologous structural genes, one of which encodes the D protein that carries the major antigen of the Rh system. Hydropathy analysis of the RhD gene product reveals 13 hydrophobic domains, all of which have been assumed to be transmembrane (TM) []. The proteins in this entry are related to ammonium transport [ , ].
Protein Domain
Name: CYC/TB1, R domain
Type: Domain
Description: Members of the TCP family of transcription factors have so far only been found in plants, where they are implicated in processes related to cell proliferation. It appears that TCP domain (see ) proteins have been recruited during evolution to control cell division and growth in various developmental processes. The TCP proteins fall into two subfamilies, one including CYC and TB1 and the other including the PCFs. Most members of the CYC/TB1 subfamily have an R domain, predicted to form a coiled coil that may mediate protein-protein interactions [ , ]. The R domain is rich in polar residues (arginine, lysine and glutamic acid) and is predicted to form a hydrophilic α-helix [ ]. Some proteins known to contain an R domain are listed below: Antirrhinum majus (Garden snapdragon) cycloidea (CYC), which is involved in the control of floral symmetry, a character that has changed many times during plant evolution; Zea mays (Maize) teosinte branched 1 (TB1), which controls the development of apical dominance that contributed to the evolution of modern-day maize from its wild ancestor teosinte; and Arabidopsis thaliana (Mouse-ear cress) TCP2 and TCP3, which correlate with actively dividing regions of the floral meristem.
Protein Domain
Name: Transcription initiation factor TFIID, 23-30kDa subunit
Type: Family
Description: Transcription initiation factor TFIID is a multimeric protein complex that plays a central role in mediating promoter responses to various activators and repressors. The complex includes TATA binding protein (TBP) and various TBP-associated factors (TAFs). TFIID is a bona fide RNA polymerase II-specific TATA-binding protein-associated factor (TAF) and is essential for viability [ ]. This entry represents one of the TAFs, TAF10. TFIID acts to nucleate the transcription complex, recruiting the rest of the factors through a direct interaction with TFIIB. The TBP subunit of TFIID is sufficient for TATA-element binding and TFIIB interaction, and can support basal transcription. The protein belongs to the TAF2H family. TAF10 is part of other transcription regulatory multiprotein complexes (e.g., SAGA, TBP-free TAF-containing complex [TFTC], STAGA, and PCAF/GCN5). Several TAFs interact via histone-fold motifs. The histone fold (HFD) is the interaction motif involved in heterodimerization of the core histones and their assembly into the nucleosome octamer. The minimal HFD contains three α-helices linked by two loops. The HFD is found in core histones, TAFs and many other transcription factors. Five HF-containing TAF pairs have been described in TFIID: TAF6-TAF9, TAF4-TAF12, TAF11-TAF13, TAF8-TAF10 and TAF3-TAF10 [, , ].
Protein Domain
Name: ATPase, AFG1-like
Type: Family
Description: This P-loop motif-containing family of proteins includes AFG1, LACE1 and ZapE. ATPase family gene 1 (AFG1) is a 377 amino acid yeast protein with an ATPase motif typical of the family [ ]. AFG1-like ATPase (also known as lactation elevated 1 or LACE1), the mammalian homologue of AFG1, is a mitochondrial integral membrane protein that is essential for maintenance of the fused mitochondrial reticulum and lamellar cristae morphology. It has also been demonstrated that LACE1 mediates degradation of nuclear-encoded complex IV subunits COX4 (cytochrome c oxidase 4), COX5A and COX6A, and is required for normal activity of complexes III and IV of the respiratory chain [ ]. ZapE is a cell division protein found in Gram-negative bacteria. The bacterial cell division process relies on the assembly, positioning, and constriction of the FtsZ ring (the so-called Z-ring), a ring-like network that marks the future site of the septum of bacterial cell division. ZapE is a Z-ring associated protein required for cell division under low-oxygen conditions. It is an ATPase that appears at the constricting Z-ring late in cell division. It reduces the stability of FtsZ polymers in the presence of ATP in vitro [ ].
Protein Domain
Name: Mechanosensitive ion channel MscS-like, plants/fungi
Type: Family
Description: This entry represents a group of MscS-like (mechanosensitive channels of small conductance-like) proteins found in fungi and plants. Ten MscS-Like (MSL) proteins have been found in the genome of Arabidopsis thaliana [ , ]. In the fission yeast Schizosaccharomyces pombe the mechanosensitive ion channel proteins are known as Msy1 and Msy2 [ ]. Mechanosensitive (MS) channels provide protection against hypo-osmotic shock, responding both to stretching of the cell membrane and to membrane depolarisation. They are present in the membranes of organisms from the three domains of life: bacteria, archaea, and eukarya [ ]. There are two families of MS channels: large-conductance MS channels (MscL) and small-conductance MS channels (MscS or YGGB). The pressure threshold for MscS opening is 50% that of MscL []. The MscS family is much larger and more variable in size and sequence than the MscL family. Much of the diversity in MscS proteins occurs in the size of the transmembrane regions, which ranges from three to eleven transmembrane helices, although the three C-terminal helices are conserved.
Protein Domain
Name: Lipocalin
Type: Family
Description: The lipocalins are a diverse, interesting, yet poorly understood family of proteins composed, in the main, of extracellular ligand-binding proteins displaying high specificity for small hydrophobic molecules []. Functions of these proteins include transport of nutrients, control of cell regulation, pheromone transport, cryptic colouration, and the enzymatic synthesis of prostaglandins. For example, retinol-binding protein 4 transfers retinol from the stores in the liver to peripheral tissues []. The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel β-barrel fold well conserved within the family. Sequence similarity within the family is at a much lower level and would seem to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site [, ]. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes []. The anti-parallel β-barrel fold is also exploited by the fatty acid-binding proteins, which function similarly by binding small hydrophobic molecules. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif.
Protein Domain
Name: Helix-turn-helix motif
Type: Conserved_site
Description: Helix-turn-helix (HTH) motifs are found in all known DNA binding proteins that regulate gene expression. The motif consists of approximately 20 residues and is characterised by 2 α-helices, which make intimate contacts with the DNA and are joined by a short turn. The second helix of the HTH motif binds to DNA via a number of hydrogen bonds and hydrophobic interactions, which occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA [ ]. The first helix helps to stabilise the structure [ ]. The HTH motif is very similar in sequence and structure to the N-terminal region of the lambda [] and other repressor proteins, and has also been identified in many other DNA-binding proteins on the basis of sequence and structural similarity []. One of the principal differences between HTH motifs in these different proteins arises from the stereochemical requirement for glycine in the turn, which is needed to avoid steric interference of the β-carbon with the main chain: for cro and other repressors the Gly appears to be mandatory, while for many of the homeotic and other DNA-binding proteins the requirement is relaxed.
Protein Domain
Name: Trigger factor, C-terminal
Type: Domain
Description: In the Escherichia coli cytosol, a fraction of the newly synthesised proteins requires the activity of molecular chaperones for folding to the native state. The major chaperones implicated in this folding process are the ribosome-associated Trigger Factor (TF), and the DnaK and GroEL chaperones with their respective co-chaperones. Trigger Factor is an ATP-independent chaperone and displays chaperone and peptidyl-prolyl-cis-trans-isomerase (PPIase) activities in vitro. It is composed of three domains: an N-terminal ribosome-binding domain (RBD) which mediates association with the large ribosomal subunit, a central PPIase domain with homology to FKBP proteins, and a C-terminal substrate-binding domain (SBD) which forms the central body of the protein and has two helical arms that create a cavity [ ]. The association of its N-terminal domain with the ribosomal protein L23, located next to the peptide tunnel exit, is essential for the interaction with nascent polypeptides and for its in vivo function []. This entry represents the C-terminal domain of bacterial TF proteins, which has a multi-helical structure consisting of an irregular array of long and short helices. This domain is structurally similar to the peptide-binding domain of the bacterial porin chaperone SurA [ ].
Protein Domain
Name: GroES-like superfamily
Type: Homologous_superfamily
Description: GroES (chaperonin 10) is an oligomeric molecular chaperone, which functions in protein folding and possibly in intercellular signalling, being found on the surface of various prokaryotic and eukaryotic cells, as well as being released from cells. Secreted chaperonins are thought to act as intercellular signals, interacting with a variety of cell types, including leukocytes, vascular endothelial cells and epithelial cells, as well as activating key cellular activities such as the synthesis of cytokines and adhesion proteins [ ]. GroES works as a co-chaperone with GroEL (chaperonin 60) during protein folding. The polypeptide substrate is captured by GroEL, which binds the co-chaperone GroES and ATP, and discharges the substrate into a unique microenvironment inside the chaperone, which promotes productive folding. After hydrolysis of ATP, the polypeptide is released into solution []. GP31 from Bacteriophage T4 is functionally equivalent to GroES. GroES folds as a partly opened β-barrel. The N-terminal domain of alcohol dehydrogenase-like proteins has a GroES-like fold, the C-terminal domain having a classical Rossmann fold [ ]. These proteins include alcohol dehydrogenase, which contains a zinc-finger subdomain within the GroES-like domain, ketose reductase (sorbitol dehydrogenase), formaldehyde dehydrogenase, quinone oxidoreductase and 2,4-dienoyl-CoA reductase.
Protein Domain
Name: Toxin CdiA-like, Filamentous hemagglutinin motif repeats
Type: Repeat
Description: This entry represents several repeats of the filamentous hemagglutinin (FHA-1) motif, each of approximately 20 amino acids. The repeats are found in the N-terminal domain of Toxin CdiA from Escherichia coli and similar proteins, mostly from bacterial species, including several plant and animal pathogens. The FHA-1 repeats found in the extracellular filament of CdiA form an elongated β-helix, with each ~20-residue motif extending the helix [ , ]. CdiA is the toxic component of a toxin-immunity protein module, which functions as a cellular contact-dependent growth inhibition (CDI) system. CDI modules allow bacteria to communicate with and inhibit the growth of closely related neighboring bacteria in a contact-dependent fashion. CDI is neutralized by its cognate immunity protein CdiI, but not by non-cognate CdiI from other bacteria [ , ]. Some proteins with nuclease activity on top of the toxin activity are also included in this group, such as tRNA nuclease CdiA from Escherichia coli. This protein cleaves tRNA (CUG-Gln) in the acceptor stem between C70 and A71 [ ]. 16S rRNA endonuclease CdiA from Enterobacter cloacae cleaves 16S rRNA in vivo and in vitro between adenine 1493 and guanosine 1494 of E. coli 16S rRNA [ ].
Protein Domain
Name: DNA binding HTH domain, AraC-type
Type: Domain
Description: Many bacterial transcription regulation proteins bind DNA through a 'helix-turn-helix' (HTH) motif. One major subfamily of these proteins [, ] is related to the arabinose operon regulatory protein AraC [ , ]. Except for celD [ ], all of these proteins seem to be positive transcriptional factors. Although the sequences belonging to this family differ somewhat in length, in nearly every case the HTH motif is situated towards the C terminus in the third quarter of most of the sequences. The minimal DNA binding domain spans roughly 100 residues and comprises two HTH subdomains: the classical HTH domain and another HTH subdomain with similarity to the classical HTH domain but with an insertion of one residue in the turn region. The N-terminal and central regions of these proteins are presumed to interact with effector molecules and may be involved in dimerisation [ ]. The known structure of MarA ( ) shows that the AraC domain is alpha helical and that the two HTH subdomains both bind the major groove of the DNA. The two HTH subdomains are separated by only 27 angstroms, which causes the cognate DNA to bend. This entry represents the full AraC domain containing the two HTH subdomains.
Protein Domain
Name: Glycoprotein hormone receptor family
Type: Family
Description: Glycoprotein hormones (or gonadotropins) are protein hormones that include the mammalian hormones follicle-stimulating hormone (FSH, also known as follitropin), luteinizing hormone (LH, also known as lutropin), thyroid-stimulating hormone (TSH, also known as thyrotropin) and human chorionic gonadotropin (hCG). These hormones are central to the complex endocrine system that regulates normal growth, sexual development, and reproductive function []. The hormones FSH, LH and TSH are secreted by the anterior pituitary gland [, ], while the choriogonadotropins are secreted by the placenta []. Glycoprotein hormone receptors are members of the rhodopsin-like G-protein coupled receptor (GPCR) family. They function as receptors for the pituitary hormones thyrotropin (TSH receptor), follitropin (FSH receptor) and lutropin (LH receptor). In mammals the LH receptor is also the receptor for the placental hormone human chorionic gonadotropin (hCG), so it is designated the lutropin-choriogonadotropic hormone receptor (LHCG receptor). The receptors share close sequence similarity, and are characterised by large extracellular domains believed to be involved in hormone binding via leucine-rich repeats (LRR) [ ]. This entry represents the glycoprotein hormone receptor family, which includes the follicle-stimulating hormone receptor, the lutropin-choriogonadotropic hormone receptor and the thyroid-stimulating hormone receptor.
Protein Domain
Name: PP7, metallophosphatase domain
Type: Domain
Description: PP7 is a plant phosphoprotein phosphatase that is highly expressed in a subset of stomata and thought to play an important role in sensory signaling. PP7 acts as a positive regulator of signaling downstream of cryptochrome blue light photoreceptors [ ]. PP7 also controls amplification of phytochrome signaling, and interacts with nucleoside diphosphate kinase 2 (NDPK2), a positive regulator of phytochrome signaling. In addition, PP7 interacts with heat shock transcription factor HSF and up-regulates protective heat shock proteins [, ]. PP7 may also play a role in salicylic acid-dependent defense signaling []. The PPP (phosphoprotein phosphatase) family, to which PP7 belongs, is one of two known protein phosphatase families specific for serine and threonine. The PPP family also includes: PP2A, PP2B (calcineurin), PP4, PP5, PP6, Bsu1, RdgC, PrpE, PrpA/PrpB, and ApA4 hydrolase. The PPP catalytic domain is defined by three conserved motifs (-GDXHG-, -GDXVDRG- and -GNHE-). The PPP enzyme family is ancient, with members found in all eukaryotes and in most bacterial and archaeal genomes. Dephosphorylation of phosphoserines and phosphothreonines on target proteins plays a central role in the regulation of many cellular processes [ , ]. PPPs belong to the metallophosphatase (MPP) superfamily.
Protein Domain
Name: Ribosomal S6 modification enzyme RimK/Lysine biosynthesis enzyme LysX
Type: Family
Description: Escherichia coli RimK adds additional Glu residues to the native Glu-Glu C terminus of ribosomal protein S6. Mutation of the Glu-Glu terminus to Lys-Glu blocked addition. S6 has the C-terminal sequence Glu-Glu in few species, suggesting the homologue of rimK may have a function other than S6 modification in those species. However, most species having a member of this protein subfamily do not have an S6 homologue ending in Glu-Glu. Proteins in this family include the characterised LysX from Thermus thermophilus [ ], which is part of a well-organised lysine biosynthesis gene cluster []. LysX is believed to carry out an ATP-dependent acylation of the amino group of alpha-aminoadipate in the prokaryotic version of the fungal AAA lysine biosynthesis pathway. No species having a sequence in this family contains the elements of the more common diaminopimelate lysine biosynthesis pathway, and none has been shown to be a lysine auxotroph. These sequences have mainly received the name of the related enzyme, "ribosomal protein S6 modification protein RimK". RimK has been characterised in Escherichia coli, and acts by ATP-dependent condensation of S6 with glutamate residues [ ].
Protein Domain
Name: WHIM2 domain
Type: Domain
Description: This entry represents the WHIM2 domain found in diverse eukaryotic chromatin proteins, such as animal BAZ/WAL and BPTF proteins, plant RLT, and yeast Itc1. This domain contains the D-TOX E motif (also known as the Williams-Beuren syndrome DDT (WSD) motif) that is conserved from yeasts to animals [ , , ]. The WHIM2 domain is a conserved alpha helical motif that, along with the WHIM1 and WHIM3 motifs and the DDT domain, forms an alpha helical module found in diverse eukaryotic chromatin proteins [ ]. Based on the Ioc3 structure, this module is inferred to interact with nucleosomal linker DNA and the SLIDE domain of ISWI proteins [, ]. The resulting complex forms a protein ruler that measures out the spacing between two adjacent nucleosomes []. The acidic residue from the GxD signature of WHIM2 is a major determinant of the interaction between the ISWI and WHIM motifs. The N-terminal portion of the WHIM2 motif also contacts the inter-nucleosomal linker DNA. The module shows great diversity in domain architecture and is often combined with other domains that recognize modified histone peptides or bind DNA, some of which discriminate methylated DNA [].
Protein Domain
Name: DNA replication regulator HobA superfamily
Type: Homologous_superfamily
Description: Proteins in this superfamily are approximately 180 amino acids in length and are found exclusively in epsilon-proteobacteria. The crystal structure of HobA from Helicobacter pylori has been reported at 1.7 Å resolution; HobA represents a modified Rossmann fold consisting of a five-stranded parallel β-sheet (beta1-5) flanked on one side by the alpha-2, alpha-3 and alpha-6 helices and on the other by alpha-4 and alpha-5. The alpha-1 helix extends away from the globular part of the protein and has minimal interaction with it. Four monomers interact to form a tetrameric molecule. Four calcium atoms bind to the tetramer and these binding sites may have functional relevance. The closest structural homologue of HobA is a sugar isomerase (SIS) domain containing protein, the phosphoheptose isomerase from Pseudomonas aeruginosa. The SIS proteins share strong sequence homology with DiaA from Escherichia coli; yet, HobA and DiaA share no sequence homology [ ]. HobA is a novel protein essential for initiation of H. pylori chromosome replication. It interacts specifically via DnaA with the oriC-DnaA complex. It is possible that HobA is essential for the correct formation and stabilisation of the orisome by facilitating the spatial positioning of DnaA at oriC [ ].
Protein Domain
Name: Rab35
Type: Family
Description: Ras-related protein Rab35 is a member of the large Rab GTPase family. Rab35 is involved in many cellular functions, including endocytic recycling, cytokinesis and endosomal trafficking [ ]. It is one of several Rab proteins found to participate in the regulation of osteoclast cells in rats []. In addition, Rab35 has been identified as a protein that interacts with nucleophosmin-anaplastic lymphoma kinase (NPM-ALK) in human cells. Overexpression of NPM-ALK is a key oncogenic event in some anaplastic large-cell lymphomas; since Rab35 interacts with NPM-ALK, it may provide a target for cancer treatments []. Rabs are regulated by GTPase activating proteins (GAPs), which interact with GTP-bound Rab and accelerate the hydrolysis of GTP to GDP. Guanine nucleotide exchange factors (GEFs) interact with GDP-bound Rabs to promote the formation of the GTP-bound state. Rabs are further regulated by guanine nucleotide dissociation inhibitors (GDIs), which facilitate Rab recycling by masking C-terminal lipid binding and promoting cytosolic localization. Most Rab GTPases contain a lipid modification site at the C terminus, with sequence motifs CC, CXC, or CCX. Lipid binding is essential for membrane attachment, a key feature of most Rab proteins [ , ].
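The C-terminal lipid modification motifs named above (CC, CXC, CCX) can be checked with a few lines of Python. This is a minimal sketch, not a prenylation predictor: it assumes that X stands for any residue and that the motif sits at the very end of the sequence. The function name and the example sequences are hypothetical and are not taken from this database.

```python
def rab_cterminal_motifs(sequence: str) -> list[str]:
    """Return which of the motifs CC, CXC, CCX the C terminus matches.

    Assumption: X is any residue and the motif occupies the last two
    (CC) or last three (CXC, CCX) positions of the sequence.
    """
    seq = sequence.upper()
    found = []
    if seq.endswith("CC"):
        found.append("CC")
    tail = seq[-3:]
    if len(tail) == 3:
        if tail[0] == "C" and tail[2] == "C":
            found.append("CXC")
        if tail.startswith("CC"):
            found.append("CCX")
    return found


# Synthetic tails for illustration only, not real Rab sequences.
for example in ("MAGGCC", "MAGCSC", "MAGCCS", "MAGKL"):
    print(example, rab_cterminal_motifs(example))
```

A tail such as CCC satisfies all three definitions, so the function returns a list rather than a single label.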
Protein Domain
Name: Transcription regulator Myc
Type: Family
Description: The class III basic helix-loop-helix (bHLH) transcription factors have proliferative and apoptotic roles and are characterised by the presence of a leucine zipper adjacent to the bHLH domain. The myc oncogene was first discovered in small-cell lung cancer cell lines where it is found to be deregulated []. The Myc protein contains an N-terminal transcriptional regulatory domain followed by a nuclear localization signal and a C-terminal basic DNA binding domain tethered to a helix-loop-helix-leucine zipper (HLH-Zip) dimerization motif. Myc forms a heterodimer with Max, and this complex regulates cell growth through direct activation of genes involved in cell replication [, , ]. The 'leucine zipper' is a structure that is believed to mediate the function of several eukaryotic gene regulatory proteins. The zipper consists of a periodic repetition of leucine residues at every seventh position, and regions containing them appear to span eight turns of α-helix. The leucine side chains that extend from one helix interact with those from a similar helix, hence facilitating dimerisation in the form of a coiled-coil. Leucine zippers are present in many gene regulatory proteins, including the CREB proteins, Jun/AP1 transcription factors, fos oncogene and fos-related proteins, C-myc, L-myc and N-myc oncogenes, and so on.
Protein Domain
Name: Toll-interacting protein, C2 domain
Type: Domain
Description: Tollip is a part of the Interleukin-1 receptor (IL-1R) signaling pathway. Tollip is proposed to link serine/threonine kinase IRAK to IL-1Rs as well as inhibiting phosphorylation of IRAK [ ]. The TOLLIP-dependent selective autophagy pathway plays an important role in clearance of cytotoxic polyQ protein aggregates []. There is a single C2 domain present in Tollip. C2 domains fold into an 8-stranded β-sandwich that can adopt 2 structural arrangements, type I and type II, distinguished by a circular permutation involving their N- and C-terminal beta strands. Many C2 domains are Ca2+-dependent membrane-targeting modules that bind a wide variety of substances including phospholipids, inositol polyphosphates, and intracellular proteins. Most C2 domain proteins are either signal transduction enzymes that contain a single C2 domain, such as protein kinase C, or membrane trafficking proteins which contain at least two C2 domains, such as synaptotagmin 1. However, there are a few exceptions to this, including RIM isoforms and some splice variants of piccolo/aczonin and intersectin, which only have a single C2 domain. C2 domains with a calcium binding region have negatively charged residues, primarily aspartates, that serve as ligands for calcium ions [, , , , , , , ].
Protein Domain
Name: Archease
Type: Family
Description: The archease superfamily of proteins is represented in all three domains of life. Archease genes are generally located adjacent to genes encoding proteins involved in DNA or RNA processing and therefore have been predicted to be modulators or chaperones involved in DNA or RNA metabolism. Many of the roles of archeases remain to be established experimentally. The function of one of the archeases from the hyperthermophile Pyrococcus abyssi has been determined. The gene encoding the archease (PAB1946) is located in a bicistronic operon immediately upstream from a second open reading frame (PAB1947), which encodes a tRNA m5C methyltransferase. The methyltransferase catalyses m5C formation at several cytosines within tRNAs, with a preference for C49. The archease protects the tRNA (cytosine-5-)-methyltransferase PAB1947 against aggregation and increases its specificity. The archease exists in monomeric and oligomeric states, with only the oligomeric forms able to bind the methyltransferase [ ]. The function of this family of archeases as chaperones is supported by structural analysis of the archease from Methanobacterium thermoautotrophicum, which shows homology to heat shock protein 33, a chaperone protein that inhibits the aggregation of partially denatured proteins [].
Protein Domain
Name: GspL, cytoplasmic actin-ATPase-like domain
Type: Domain
Description: The general secretion pathway of Gram-negative bacteria is responsible for extracellular secretion of a number of different proteins, including proteases and toxins. This pathway supports secretion of proteins across the cell envelope in two distinct steps, in which the second step, involving translocation through the outer membrane, is assisted by at least 13 different gene products. GspL is predicted to contain a large cytoplasmic domain and has been shown to interact with the autophosphorylating cytoplasmic membrane protein GspE. It is thought that the tri-molecular complex of GspL, GspE and GspM might be involved in regulating the opening and closing of the secretion pore and/or transducing energy to the site of outer membrane translocation [ ]. This N-terminal domain is found in general secretion pathway protein L sequences from several Gram-negative bacteria. It is a cytoplasmic domain that shows structural homology with the superfamily of actin-like ATPases. However, it is entirely missing domains 1B and 2B of the actin-like ATPases. As domain 2B of the actin-like superfamily is critically important for binding the adenosine part of ATP and is absent altogether in cyto-EpsL, it is unlikely that EpsL is an ATP-binding protein [ ].
Protein Domain
Name: FOX1, RNA recognition motif
Type: Domain
Description: This entry represents the RNA recognition motif (RRM) of the C. elegans RNA binding protein Fox (feminizing locus on X)-1 and its homologues. The Fox-1 family proteins are evolutionarily conserved regulators of tissue-specific alternative splicing in metazoans [ ]. They bind specifically to an RNA element, UGCAUG []. In mammals, there are three Fox-1 homologues, RBFOX1-3 (RNA binding protein Fox-1 homologue 1-3). RBFOX1 is predominantly expressed in neurons, skeletal muscle and heart [ ]. RBFOX1 binds to the C terminus of ataxin-2 and forms an ataxin-2/A2BP1 complex involved in RNA processing []. RBFOX2 is expressed in ovary, whole embryo, and human embryonic cell lines in addition to neurons and muscle []. RBFOX2 activates splicing of neuron-specific exons []. RBFOX3 is a nuclear RNA-binding protein that regulates alternative splicing of the RBFOX2 pre-mRNA, producing a message encoding a dominant negative form of the RBFOX2 protein. Its message is detected exclusively in post-mitotic regions of embryonic brain []. RBFOX3 modulates brain and muscle-specific splicing of exon EIIIB of fibronectin, exon N1 of c-src, and calcitonin/CGRP []. Members in this family harbour one RNA recognition motif (RRM).
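As an illustration of the UGCAUG element mentioned above, the following Python sketch lists where that hexamer occurs in a nucleotide sequence. It is a plain string search under stated assumptions: DNA input is converted to RNA by replacing T with U, positions are 0-based, and the function name and test sequences are hypothetical. An occurrence of the hexamer says nothing by itself about whether a Fox-1 family protein actually binds there.

```python
FOX_ELEMENT = "UGCAUG"  # RNA element quoted in the description above


def find_fox_elements(sequence: str) -> list[int]:
    """Return 0-based start positions of every UGCAUG occurrence.

    Assumption: the input is a plain nucleotide string; any T is
    treated as U so that DNA-style input is also accepted.
    """
    rna = sequence.upper().replace("T", "U")
    positions = []
    start = rna.find(FOX_ELEMENT)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping occurrences are also reported.
        start = rna.find(FOX_ELEMENT, start + 1)
    return positions


# Synthetic sequence for illustration only.
print(find_fox_elements("AAUGCAUGCCTGCATGAA"))
```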
Protein Domain
Name: Frataxin/CyaY superfamily
Type: Homologous_superfamily
Description: The eukaryotic proteins in this entry include frataxin, the protein that is mutated in Friedreich's ataxia [ ], and related sequences. Friedreich's ataxia is a progressive neurodegenerative disorder caused by loss-of-function mutations in the gene encoding frataxin (FRDA). Frataxin mRNA is predominantly expressed in tissues with a high metabolic rate (including liver, kidney, brown fat and heart). Mouse and yeast frataxin homologues contain a potential N-terminal mitochondrial targeting sequence, and human frataxin has been observed to co-localise with a mitochondrial protein. Furthermore, disruption of the yeast gene has been shown to result in mitochondrial dysfunction. Friedreich's ataxia is thus believed to be a mitochondrial disease caused by a mutation in the nuclear genome (specifically, expansion of an intronic GAA triplet repeat) [, , ]. The bacterial proteins in this entry are iron-sulphur cluster (FeS) metabolism CyaY proteins homologous to eukaryotic frataxin. Partial Phylogenetic Profiling [ ] suggests that CyaY most likely functions as part of the ISC system for FeS cluster biosynthesis, a role supported by experimental data in some species [, ]. The structure of Frataxin/CyaY has an α-β(5)-alpha fold arranged in two layers (alpha/beta) with a meander antiparallel sheet.
Protein Domain
Name: CBS domain-containing protein, bacteria
Type: Domain
Description: CBS domains are evolutionarily conserved structural domains found in a variety of functionally unrelated proteins from all kingdoms of life. These domains pair together to form an intramolecular dimeric structure (CBS pair), termed the Bateman domain [ , , , ]. CBS domains have been shown to bind mainly ligands with an adenosyl group such as AMP, ATP and S-AdoMet, but may also bind metal ions or nucleic acids [, ]. Hence, they play an essential role in the regulation of the activities of numerous proteins, and mutations in them are associated with several hereditary diseases [, , ]. CBS domains are found attached to a wide range of other protein domains, suggesting that CBS domains may play a regulatory role, making proteins sensitive to adenosyl-carrying ligands. The region containing the CBS domains in cystathionine-beta synthase is involved in regulation by S-AdoMet []. CBS domain pairs from AMPK bind AMP or ATP []. The CBS domains from IMPDH, which bind ATP, have been shown to have a role in the regulation of adenylate nucleotide synthesis [, ]. This entry represents a group of uncharacterised bacterial proteins containing a pair of CBS domains.
Protein Domain
Name: Metallothionein, family 5, Diptera
Type: Family
Description: Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium, and nickel. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds [ , , ] species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units. This original classification system has been found to be limited, in the sense that it does not allow clear differentiation of patterns of structural similarities, either between or within classes. Consequently, all class I and class II MTs (the proteinaceous sequences) have now been grouped into families of phylogenetically related and thus alignable sequences. Diptera (Drosophila, family 5) MTs are 40-43 residue proteins that contain 10 conserved cysteines arranged in five Cys-X-Cys groups. In particular, the consensus pattern C-G-x(2)-C-x-C-x(2)-Q-x(5)-C-x-C-x(2)-D-C-x-C has been found to be diagnostic of family 5 MTs. The protein is found primarily in the alimentary canal, and its induction is stimulated by ingestion of cadmium or copper [ ]. Mercury, silver and zinc induce the protein to a lesser extent. Family 5 includes the subfamilies d1 and d2. Only one d2 member is known so far. Both subfamilies hit the same entry.
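The consensus pattern quoted above uses PROSITE-style notation, where x is any residue and (n) is a repeat count. As a hedged sketch, it can be translated into a Python regular expression and used to scan a protein sequence. The helper names below are hypothetical, the translator handles only the syntax elements that appear in this one pattern, and the example sequence is synthetic rather than a real metallothionein.

```python
import re

# Consensus pattern copied from the family 5 description above.
PROSITE_PATTERN = "C-G-x(2)-C-x-C-x(2)-Q-x(5)-C-x-C-x(2)-D-C-x-C"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Supports only literal residues, 'x' for any residue, and '(n)'
    repeat counts; other PROSITE syntax raises ValueError.
    """
    parts = []
    for element in pattern.split("-"):
        match = re.fullmatch(r"([A-Zx])(?:\((\d+)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = match.groups()
        token = "." if residue == "x" else residue
        if count is not None:
            token += "{" + count + "}"
        parts.append(token)
    return "".join(parts)


def find_family5_motif(sequence: str):
    """Return the (start, end) span of the first match, or None."""
    hit = re.search(prosite_to_regex(PROSITE_PATTERN), sequence.upper())
    return hit.span() if hit else None


# Synthetic sequence built to satisfy the pattern, for illustration only.
print(prosite_to_regex(PROSITE_PATTERN))
print(find_family5_motif("MM" + "CGAACACAAQAAAAACACAADCAC"))
```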
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom