Search results 11501 to 11600 out of 30763 for seed protein

Category restricted to ProteinDomain

Category: ProteinDomain
Type Details Score
Protein Domain
Name: SMAD/FHA domain superfamily
Type: Homologous_superfamily
Description: FHA and SMAD (MH2) domains share a common structure consisting of a sandwich of eleven β-strands in two sheets with Greek key topology. Forkhead-associated (FHA) domains were originally identified as a sequence profile of about 75 amino acids, whereas the full-length domain is closer to 150 amino acids. FHA domains are found in transcription factors, kinesin motors, and a variety of other signalling molecules in organisms ranging from eubacteria to humans. FHA domains are protein-protein interaction domains that are specific for phosphoproteins. FHA-containing proteins function in maintaining cell-cycle checkpoints, DNA repair and transcriptional regulation. FHA domain proteins include the Chk2/Rad53/Cds1 family of proteins, which contain one or more FHA domains as well as a Ser/Thr kinase domain. SMAD (Mothers against decapentaplegic (MAD) homologue) domain proteins are found in a range of species from nematodes to humans. These highly conserved proteins contain an N-terminal MH1 domain that contacts DNA and is separated by a short linker region from the C-terminal MH2 domain, the latter showing a striking similarity to FHA domains. SMAD proteins mediate signalling by the TGF-beta/activin/BMP-2/4 cytokines from receptor Ser/Thr protein kinases at the cell surface to the nucleus. SMAD proteins fall into three functional classes: the receptor-regulated SMADs (R-SMADs), including SMAD1, -2, -3, -5, and -8, each of which is involved in a ligand-specific signalling pathway; the comediator SMADs (co-SMADs), including SMAD4, which interact with R-SMADs to participate in signalling; and the inhibitory SMADs (I-SMADs), including SMAD6 and -7, which block the activation of R-SMADs and co-SMADs, thereby negatively regulating signalling pathways.
Domains with this fold are also found as the transactivation domain of interferon regulatory factor 3 (IRF3), which has a weak homology to SMAD domains, and the N-terminal domain of EssC protein in Staphylococcus aureus.
Protein Domain
Name: CTLH, C-terminal LisH motif
Type: Domain
Description: The 33-residue LIS1 homology (LisH) motif is found in eukaryotic intracellular proteins involved in microtubule dynamics, cell migration, nucleokinesis and chromosome segregation. The LisH motif is likely to possess a conserved protein-binding function, and it has been proposed that LisH motifs contribute to the regulation of microtubule dynamics, either by mediating dimerization or by binding cytoplasmic dynein heavy chain or microtubules directly. The LisH motif is found associated with other domains, such as WD-40, SPRY, Kelch, AAA ATPase, RasGEF, or HEAT. The secondary structure of the LisH domain is predicted to be two α-helices. Some proteins known to contain a LisH motif are listed below:
  • Animal LIS1. It regulates cytoplasmic dynein function. In Homo sapiens (human), children with defects in LIS1 suffer from Miller-Dieker lissencephaly, a brain malformation that results in severe retardation, epilepsy and an early death.
  • Emericella nidulans (Aspergillus nidulans) nuclear migration protein nudF, the orthologue of LIS1.
  • Eukaryotic RanBPM, a Ran binding protein involved in microtubule nucleation.
  • Eukaryotic Nopp140, a nucleolar phosphoprotein.
  • Mammalian treacle, a nucleolar protein. In human, defects in treacle are the cause of Treacher Collins syndrome (TCS), an autosomal dominant disorder of craniofacial development.
  • Animal muskelin. It acts as a mediator of cell spreading and cytoskeletal responses to the extracellular matrix component thrombospondin 1.
  • Animal transducin beta-like 1 protein (TBL1).
  • Plant tonneau.
  • Arabidopsis thaliana LEUNIG, a putative transcriptional corepressor that regulates AGAMOUS expression during flower development.
  • Fungal aimless RasGEF.
  • Leishmania major katanin-like protein.
The C-terminal to LisH (CTLH) motif is a predicted α-helical sequence of unknown function that is found adjacent to the LisH motif in a number of these proteins but is absent in others (e.g. LIS1). The CTLH domain can also be found in the absence of the LisH motif, as in:
  • Arabidopsis thaliana (Mouse-ear cress) hypothetical protein MUD21.5.
  • Saccharomyces cerevisiae yeast protein RMD5.
Protein Domain
Name: HTH APSES-type DNA-binding domain superfamily
Type: Homologous_superfamily
Description: The APSES (ASM-1, Phd1, StuA, EFG1, and Sok2) domain is a ~110-residue sequence-specific DNA-binding domain found in a family of fungal transcription factors and other DNA-binding proteins. The APSES domain is found associated with ankyrin repeats and a heterodimerization domain. The APSES domain consists of a six-stranded β-sheet segment folded against two pairs of α-helices in the topology of the winged helix-turn-helix (HTH) family of proteins. The alphaA/alphaB helical pair corresponds to the HTH motif. Some proteins known to contain an APSES domain are listed below:
  • Saccharomyces cerevisiae (Baker's yeast) Mlu-1 box binding protein (MBP1); Mlu-1 and Swi6 together make up the transcription factor complex MBF (Mlu-1-binding factor), which binds to the Mlu1 cell-cycle box (MCB) elements found in the promoters of many DNA synthesis genes.
  • Schizosaccharomyces pombe (Fission yeast) cell division cycle-related proteins res1/sct1 and res2/pct1, which are homologous to MBP1.
  • S. pombe start control protein cdc10.
  • Emericella nidulans (Aspergillus nidulans) cell pattern formation-associated protein stuA. It regulates the transformation of undifferentiated hyphal elements into a complex multicellular structure.
  • S. cerevisiae transcription factor PHD1.
  • S. cerevisiae protein SOK2. It represses the transition from unicellular growth to pseudohyphae.
  • Candida albicans (Yeast) enhanced filamentous growth protein (EFG1).
  • Neurospora crassa ascospore maturation 1 protein (Asm-1). It has a role in spore maturation.
  • S. pombe bouquet formation protein 4 (Bqt4), which is a nuclear membrane protein that connects telomeres to the nuclear envelope (NE) during both vegetative growth and meiosis. Bqt4 does not seem to bind DNA, but acts as an adaptor between Bqt3 and Rap1.
Protein Domain
Name: Transcription regulator HTH, APSES-type DNA-binding domain
Type: Domain
Description: The APSES (ASM-1, Phd1, StuA, EFG1, and Sok2) domain is a ~110-residue sequence-specific DNA-binding domain found in a family of fungal transcription factors and other DNA-binding proteins. The APSES domain is found associated with ankyrin repeats and a heterodimerization domain. The APSES domain consists of a six-stranded β-sheet segment folded against two pairs of α-helices in the topology of the winged helix-turn-helix (HTH) family of proteins. The alphaA/alphaB helical pair corresponds to the HTH motif. Some proteins known to contain an APSES domain are listed below:
  • Saccharomyces cerevisiae (Baker's yeast) Mlu-1 box binding protein (MBP1); Mlu-1 and Swi6 together make up the transcription factor complex MBF (Mlu-1-binding factor), which binds to the Mlu1 cell-cycle box (MCB) elements found in the promoters of many DNA synthesis genes.
  • Schizosaccharomyces pombe (Fission yeast) cell division cycle-related proteins res1/sct1 and res2/pct1, which are homologous to MBP1.
  • S. pombe start control protein cdc10.
  • Emericella nidulans (Aspergillus nidulans) cell pattern formation-associated protein stuA. It regulates the transformation of undifferentiated hyphal elements into a complex multicellular structure.
  • S. cerevisiae transcription factor PHD1.
  • S. cerevisiae protein SOK2. It represses the transition from unicellular growth to pseudohyphae.
  • Candida albicans (Yeast) enhanced filamentous growth protein (EFG1).
  • Neurospora crassa ascospore maturation 1 protein (Asm-1). It has a role in spore maturation.
  • S. pombe bouquet formation protein 4 (Bqt4), which is a nuclear membrane protein that connects telomeres to the nuclear envelope (NE) during both vegetative growth and meiosis. Bqt4 does not seem to bind DNA, but acts as an adaptor between Bqt3 and Rap1.
Protein Domain
Name: Death-like domain superfamily
Type: Homologous_superfamily
Description: The death domain (DD) is a conserved region of about 80 residues found on death receptors, which is required for death signalling as well as a variety of non-apoptotic functions. Proteins containing this domain include the low affinity neurotrophin receptor p75, Fas, FADD (Fas-associated death domain protein), TNFR-1 (tumour necrosis factor receptor-1), Pelle protein kinase, and the Tube adaptor protein. The induction of apoptosis also relies on the presence of a second domain, called the death effector domain. The death effector domain (DED) occurs in proteins that regulate programmed cell death, including both pro- and anti-apoptotic proteins; many of these proteins are also involved in controlling cellular activation and proliferation pathways. Proteins containing this domain include FADD (DED N-terminal, DD C-terminal), PEA-15 (phosphoprotein enriched in astrocytes 15 kDa), caspases and FLIP. The induction of apoptosis results in the activation of caspases, a family of aspartyl-specific cysteine proteases that are the main executioners of apoptosis. For example, the DED of FADD recruits two DED-containing caspases, caspase-8 and caspase-10, to form the death-inducing signal complex, which initiates apoptosis. Proteins containing the caspase recruitment domain (CARD) are involved in the recruitment and activation of caspases during apoptosis. Other CARD proteins participate in NF-kappaB signalling pathways associated with innate or adaptive immune responses. Proteins containing CARD include Raidd, APAF-1 (apoptotic protease activating factor 1), procaspase 9 and iceberg (inhibitor of interleukin-1-beta generation). The DD shows strong structural similarity to both DED and CARD. They all display a 6-helical closed bundle fold, with Greek key topology and an internal pseudo two-fold symmetry.
However, despite their overall similarity in topology, each domain forms specialised interactions, typically only with members of its own subfamily, for example DED with DED. This superfamily represents the death domain and other structurally similar domains, including DED, CARD and the DAPIN domain.
Protein Domain
Name: Zinc finger, MIZ-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents MIZ-type zinc finger domains. Miz1 (Msx-interacting-zinc finger) is a zinc finger-containing protein with homology to the yeast protein Nfi-1. Miz1 is a sequence-specific DNA-binding protein that can function as a positive-acting transcription factor. Miz1 binds to the homeobox protein Msx2, enhancing the specific DNA-binding ability of Msx2.
Other proteins containing this domain include the human PIAS family (protein inhibitor of activated STAT protein).
Protein Domain
Name: FRMD4A/B, FERM domain C-lobe
Type: Domain
Description: FERM domain-containing protein 4A (FRMD4A) is part of the Par-3/FRMD4A/cytohesin-1 complex that activates Arf6, a central player in actin cytoskeleton dynamics and membrane trafficking, during junctional remodelling and epithelial polarization. The Par-3/Par-6/aPKC/Cdc42 complex regulates the conversion of primordial adherens junctions (AJs) into belt-like AJs and the formation of linear actin cables. When primordial AJs are formed, Par-3 recruits the scaffolding protein FRMD4A, which connects Par-3 and the Arf6 guanine-nucleotide exchange factor (GEF), cytohesin-1. FERM domain-containing protein 4B (FRMD4B, also called GRP1-binding protein, GRSP1) is a novel member of GRP1 signalling complexes that are recruited to plasma membrane ruffles in response to insulin receptor signalling. The GRSP1/FRMD4B protein contains a FERM protein domain as well as two coiled coil domains and may function as a scaffolding protein. GRP1 and GRSP1 interact through the coiled coil domains in the two proteins. The FERM domain has a cloverleaf tripartite structure composed of: (1) FERM_N (A-lobe or F1); (2) FERM_M (B-lobe or F2); and (3) FERM_C (C-lobe or F3). The C-lobe/F3 within the FERM domain is part of the PH domain family. Like most other ERM members, they have a phosphoinositide-binding site in their FERM domain. The FERM C domain is the third structural domain within the FERM domain. The FERM domain is found in cytoskeletal-associated proteins such as ezrin, moesin, radixin, 4.1R, and merlin. These proteins provide a link between the membrane and cytoskeleton and are involved in signal transduction pathways. The FERM domain is also found in protein tyrosine phosphatases (PTPs) and the tyrosine kinases FAK and JAK, in addition to other proteins involved in signalling. This domain is structurally similar to the PH and PTB domains and consequently is capable of binding to both peptides and phospholipids at different sites.
Protein Domain
Name: Conjugal transfer, TrbG/VirB9/CagX
Type: Family
Description: Several bacterial pathogens utilise conjugation machines to export effector molecules during infection. Such systems are members of the type IV or 'adapted conjugation' secretion family. The prototypical type IV system is the Agrobacterium tumefaciens T-DNA transfer machine, which delivers oncogenic nucleoprotein particles to plant cells. Other pathogens, including Bordetella pertussis, Legionella pneumophila, Brucella spp. and Helicobacter pylori (Campylobacter pylori), use type IV machines to export effector proteins to the extracellular milieu or the mammalian cell cytosol. Conjugation machines of Gram-negative bacteria consist of two surface structures: the mating channel, through which the DNA transfer intermediate and proteins are translocated, and the conjugal pilus for contacting recipient cells. Various conjugative pili have been visualised, but to date there is no ultrastructural information about the mating channel. Recent work on the A. tumefaciens T-DNA transfer system has focused on identifying interactions among the VirB protein subunits and defining steps in the transporter assembly pathway. There are three functional groups of VirB proteins: proteins localised exocellularly forming the T-pilus or other adhesive structures; mating-channel components; and cytoplasmic membrane ATPases. Although all of these proteins probably assemble as a supramolecular complex, as yet there is no direct evidence for a physical association between the conjugative pilus and the mating channel. Several lines of evidence suggest that VirB6-VirB10 are probable channel subunits. VirB6, a highly hydrophobic protein, is thought to span the cytoplasmic membrane several times and presently is the best candidate for a channel-forming protein. VirB7, an outer membrane lipoprotein, interacts with itself and with VirB9 via disulphide bonds between unique reactive cysteines present in each protein.
The VirB7-VirB9 heterodimer localises at the outer membrane and plays a critical role in stabilising other VirB proteins during assembly of the transfer machine. VirB9 is also required for formation of chemically crosslinked VirB10 oligomers, probably corresponding to homotrimers.
Protein Domain
Name: LIM-domain binding protein/SEUSS
Type: Family
Description: This entry includes the LIM-domain binding proteins and similar proteins, such as protein Chip from Drosophila, SEUSS from Arabidopsis and Adn1 from fission yeasts. The LIM-domain binding protein binds to the LIM domain of LIM homeodomain proteins, which are transcriptional regulators of development. Nuclear LIM interactor (NLI) / LIM domain-binding protein 1 (LDB1) is located in the nuclei of neuronal cells during development; it is co-expressed with Isl1 in early motor neuron differentiation and has a suggested role in the Isl1-dependent development of motor neurons. It is suggested that these proteins act synergistically to enhance transcriptional efficiency by acting as co-factors for LIM homeodomain and Otx class transcription factors, both of which have essential roles in development. The Drosophila protein Chip is required for segmentation and activity of a remote wing margin enhancer. Chip is a ubiquitous chromosomal factor required for normal expression of diverse genes at many stages of development. It is suggested that Chip cooperates with different LIM domain proteins and other factors to structurally support remote enhancer-promoter interactions. SEUSS is a transcriptional corepressor from Arabidopsis thaliana. SEUSS contains two glutamine-rich domains and a highly conserved domain that shares sequence identity with the dimerisation domain of the LIM-domain-binding transcription co-regulators in animals. Several proteins in this entry are transcriptional regulators in fungi. In fission yeasts, adhesion defective protein 1 (Adn1) is a probable transcriptional regulator involved in cell adhesion. In Aspergillus fumigatus, the transcriptional activator ptaB forms a complex with somA to control biofilm formation.
In Candida albicans, MFG1 (morphogenetic regulator of filamentous growth protein 1) has a role in all morphogenetically distinct forms of filamentous growth, including invasive growth and biofilm formation, probably by forming a complex with FLO8 and MSS1 which binds the promoter of the FLO11 gene.
Protein Domain
Name: M matrix/glycoprotein, gammacoronavirus
Type: Family
Description: This family consists of various Gammacoronavirus matrix proteins, which are transmembrane glycoproteins. M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions), since virus-like particle (VLP) formation in many coronaviruses requires only the M and envelope (E) proteins for efficient virion assembly. The M protein or E1 glycoprotein is implicated in virus assembly. The E1 viral membrane protein is required for formation of the viral envelope and is transported via the Golgi complex.
Protein Domain
Name: M matrix/glycoprotein, alphacoronavirus
Type: Family
Description: This family consists of various Alphacoronavirus matrix proteins, which are transmembrane glycoproteins. M proteins play a critical role in protein-protein interactions (as well as protein-RNA interactions), since virus-like particle (VLP) formation in many coronaviruses requires only the M and envelope (E) proteins for efficient virion assembly. The M protein or E1 glycoprotein is implicated in virus assembly. The E1 viral membrane protein is required for formation of the viral envelope and is transported via the Golgi complex.
Protein Domain
Name: WD40-like beta propeller
Type: Repeat
Description: WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD40 repeats usually assume a 7-8 bladed β-propeller fold, but proteins have been found with 4 to 16 repeated units, which also form a circularised β-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. Repeated WD40 motifs act as a site for protein-protein or protein-DNA interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins. The specificity of the proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (beta subunit is a β-propeller), TAFII transcription factor, and E3 ubiquitin ligase. In Arabidopsis spp., several WD40-containing proteins act as key regulators of plant-specific developmental events. This region appears to be related to the repeat. This model is likely to miss copies within a sequence.
Protein Domain
Name: Homeobox-like domain superfamily
Type: Homologous_superfamily
Description: Homeobox domain (also known as homeodomain) proteins are transcription factors that share a related DNA binding homeodomain. The homeodomain was first identified in a number of Drosophila homeotic and segmentation proteins, but is now known to be well conserved in many other animals, including vertebrates. The domain binds DNA through a helix-turn-helix (HTH) structure. The HTH motif is characterised by two α-helices, which make intimate contacts with the DNA and are joined by a short turn. The second helix binds to DNA via a number of hydrogen bonds and hydrophobic interactions, which occur between specific side chains and the exposed bases and thymine methyl groups within the major groove of the DNA. The first helix helps to stabilise the structure. Many proteins contain homeodomains, including Drosophila Engrailed, yeast mating type proteins, hepatocyte nuclear factor 1a and HOX proteins. The homeodomain motif is very similar in sequence and structure to domains in a wide range of DNA-binding proteins, including recombinases, Myb proteins, GARP response regulators, human telomeric proteins (hTRF1), paired domain proteins (PAX), yeast RAP1, centromere-binding proteins CENP-B and ABP-1, transcriptional regulators (TyrR), AraC-type transcriptional activators, and tetracycline repressor-like proteins (TetR, QacR, YcdC).
Protein Domain
Name: LSM domain, eukaryotic/archaea-type
Type: Domain
Description: This domain is found in Lsm (like-Sm) proteins, which have a core structure consisting of an open β-barrel with an SH3-like topology. Lsm proteins have diverse functions, and are thought to be important modulators of RNA biogenesis and function. The Sm proteins form part of specific small nuclear ribonucleoproteins (snRNPs) that are involved in the processing of pre-mRNAs to mature mRNAs, and are a major component of the eukaryotic spliceosome. Most snRNPs consist of seven Sm proteins (B/B', D1, D2, D3, E, F and G) arranged in a ring on a uridine-rich sequence (Sm site), plus a small nuclear RNA (snRNA) (either U1, U2, U5 or U4/6). All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker. Other snRNPs, such as U7 snRNP, can contain different Lsm proteins. Lsm proteins are also found in archaebacteria, which do not have any splicing apparatus, suggesting a more general role for Lsm proteins. Archaeal Lsm proteins have been shown to bind to small RNAs and are probably involved in many cellular processes. Archaeal Lsm proteins are likely to represent the ancestral Lsm domain.
Protein Domain
Name: Beta-retroviral matrix superfamily
Type: Homologous_superfamily
Description: Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, which predominantly consist of four closely packed α-helices interconnected through loops, their primary sequences can be very different. This entry represents matrix proteins from beta-retroviruses such as Mason-Pfizer monkey virus (MPMV) (Simian Mason-Pfizer virus) and Mouse mammary tumor virus (MMTV). This entry also identifies matrix proteins from several eukaryotic endogenous retroviruses, which arise when one or more copies of the retroviral genome become integrated into the host genome.
Protein Domain
Name: WD40-repeat-containing domain superfamily
Type: Homologous_superfamily
Description: WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD40 repeats usually assume a 7-8 bladed β-propeller fold, but proteins have been found with 4 to 16 repeated units, which also form a circularised β-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. Repeated WD40 motifs act as a site for protein-protein or protein-DNA interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins. The specificity of the proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (beta subunit is a β-propeller), TAFII transcription factor, and E3 ubiquitin ligase. In Arabidopsis spp., several WD40-containing proteins act as key regulators of plant-specific developmental events. This entry identifies a domain superfamily containing adjacent WD repeats. Its structure consists of seven 4-stranded β-sheet motifs.
Protein Domain
Name: Gamma-retroviral matrix domain superfamily
Type: Homologous_superfamily
Description: Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, which predominantly consist of four closely packed α-helices interconnected through loops, their primary sequences can be very different. This entry represents matrix proteins from gamma-retroviruses, such as Moloney murine leukemia virus (MoMLV), Feline leukemia virus (FLV), and Feline sarcoma virus (FESV). This entry also identifies matrix proteins from several eukaryotic endogenous retroviruses, which arise when one or more copies of the retroviral genome become integrated into the host genome.
Protein Domain
Name: WDR19, WD40 repeat
Type: Repeat
Description: This entry represents the WD-40 repeat found in WDR19 and its homologue, dyf-2. They are part of the IFT complex A (IFT-A), a complex required for retrograde ciliary transport. WD-40 repeats (also known as WD or beta-transducin repeats) are short ~40 amino acid motifs, often terminating in a Trp-Asp (W-D) dipeptide. WD40 repeats usually assume a 7-8 bladed β-propeller fold, but proteins have been found with 4 to 16 repeated units, which also form a circularised β-propeller structure. WD-repeat proteins are a large family found in all eukaryotes and are implicated in a variety of functions ranging from signal transduction and transcription regulation to cell cycle control and apoptosis. Repeated WD40 motifs act as a site for protein-protein or protein-DNA interaction, and proteins containing WD40 repeats are known to serve as platforms for the assembly of protein complexes or mediators of transient interplay among other proteins. The specificity of the proteins is determined by the sequences outside the repeats themselves. Examples of such complexes are G proteins (beta subunit is a β-propeller), TAFII transcription factor, and E3 ubiquitin ligase. In Arabidopsis spp., several WD40-containing proteins act as key regulators of plant-specific developmental events.
Protein Domain
Name: NIF system FeS cluster assembly, NifU, N-terminal
Type: Domain
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity.
SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets. In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N-terminal, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen. This entry represents the N-terminal of NifU and homologous proteins. NifU contains two domains: an N-terminal and a C-terminal domain. These domains exist either together or on different polypeptides, both domains being found in organisms that do not fix nitrogen (e.g. yeast), so they have a broader significance in the cell than nitrogen fixation.
Protein Domain
Name: Insulin-like growth factor-binding protein, IGFBP
Type: Domain
Description: Insulin-like Growth Factor Binding Proteins (IGFBP) are a group of vertebrate secreted proteins, which bind to IGF-I and IGF-II with high affinity and modulate the biological actions of IGFs. The IGFBP family has six distinct subgroups, IGFBP-1 through 6, based on conservation of gene (intron-exon) organisation, structural similarity, and binding affinity for IGFs. Across species, IGFBP-5 exhibits the most sequence conservation, while IGFBP-6 exhibits the least sequence conservation. The IGFBPs contain inhibitor domain homologues, which are related to MEROPS protease inhibitor family I31 (equistatin, clan IX). All IGFBPs share a common domain architecture. While the N-terminal (IGF binding protein domain) and the C-terminal (thyroglobulin type-1 repeat) domains are conserved across vertebrate species, the mid-region is highly variable with respect to protease cleavage sites and phosphorylation and glycosylation sites. IGFBPs contain 16-18 conserved cysteines located in the N-terminal and the C-terminal regions, which form 8-9 disulphide bonds. As demonstrated for human IGFBP-5, the N terminus is the primary binding site for IGF. This region, comprising Val49, Tyr50, Pro62 and Lys68-Leu75, forms a hydrophobic patch on the surface of the protein. The C terminus is also required for high affinity IGF binding, as well as for binding to the extracellular matrix and for nuclear translocation of IGFBP-3 and -5. IGFBPs are unusually pleiotropic molecules. Like other binding proteins, IGFBPs can prolong the half-life of IGFs via high affinity binding of the ligands. In addition to functioning as simple carrier proteins, serum IGFBPs also serve to regulate the endocrine and paracrine/autocrine actions of IGF by modulating the IGF available to bind to signalling IGF-I receptors. Furthermore, IGFBPs can function as growth modulators independent of IGFs. 
For example, IGFBP-5 stimulates markers of bone formation in osteoblasts lacking functional IGFs. The binding of an IGFBP to its putative receptor on the cell membrane may stimulate the signalling pathway independent of an IGF receptor, to mediate the effects of IGFBPs in certain target cell types. IGFBP-1 and -2, but not other IGFBPs, contain a C-terminal Arg-Gly-Asp integrin-binding motif. Thus, IGFBP-1 can also stimulate cell migration of CHO and human trophoblast cells through an action mediated by alpha 5 beta 1 integrin. Finally, IGFBPs transported into the nucleus (via the nuclear localisation signal) may also exert IGF-independent effects by transcriptional activation of genes. This entry represents the binding proteins to which insulin-like growth factors (IGF-I and IGF-II) bind with high affinity in extracellular fluids. These IGF-binding proteins (IGFBP) prolong the half-life of the IGFs and have been shown to either inhibit or stimulate the growth-promoting effects of the IGFs on cells in culture. They seem to alter the interaction of IGFs with their cell surface receptors. There are at least six different IGFBPs and they are structurally related. The following growth-factor inducible proteins are structurally related to IGFBPs and could function as growth-factor binding proteins: mouse protein cyr61 and its probable chicken homologue, protein CEF-10; human connective tissue growth factor (CTGF) and its mouse homologue, protein FISP-12; and vertebrate protein NOV.
Protein Domain
Name: Zinc finger, FLYWCH-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. C2H2-type (classical) zinc fingers (Znf) were the first class to be characterised. They contain a short β hairpin and an α helix (β/β/α structure), where a single zinc atom is held in place by Cys(2)His(2) (C2H2) residues in a tetrahedral array. C2H2 Znf's can be divided into three groups based on the number and pattern of fingers: triple-C2H2 (binds single ligand), multiple-adjacent-C2H2 (binds multiple ligands), and separated paired-C2H2. 
C2H2 Znf's are the most common DNA-binding motifs found in eukaryotic transcription factors, and have also been identified in prokaryotes. Transcription factors usually contain several Znf's (each with a conserved β/β/α structure) capable of making multiple contacts along the DNA. The C2H2 Znf motifs recognise DNA sequences by binding to the major groove of DNA via a short α-helix in the Znf, with each Znf spanning 3-4 bases of the DNA. C2H2 Znf's can also bind to RNA and protein targets. This entry represents a potential FLYWCH Zn-finger domain found in a number of eukaryotic proteins. FLYWCH is a C2H2-type zinc finger characterised by five conserved hydrophobic residues, containing the conserved sequence motif: F/Y-X(n)-L-X(n)-F/Y-X(n)-WXCX(6-12)CX(17-22)HXH, where X indicates any amino acid. This domain was first characterised in Drosophila Modifier of mdg4 proteins, Mod(mdg4), putative chromatin modulators involved in higher order chromatin domains. Mod(mdg4) proteins share a common N-terminal BTB/POZ domain, but differ in their C-terminal region, most containing C-terminal FLYWCH zinc finger motifs. The FLYWCH domain in Mod(mdg4) proteins has a putative role in protein-protein interactions; for example, Mod(mdg4)-67.2 interacts with DNA-binding protein Su(Hw) via its FLYWCH domain. FLYWCH domains have been described in other proteins as well, including suppressor of killer of prune, Su(Kpn), which contains 4 terminal FLYWCH zinc finger motifs in a tandem array and a C-terminal glutathione S-transferase (GST) domain.
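The FLYWCH consensus quoted in the entry above can be approximated as a regular expression for scanning protein sequences. The following is only a minimal sketch, not a curated signature: it assumes that X(n) means "one or more residues of any kind", and the names `FLYWCH_PATTERN`, `find_flywch_candidates` and the toy sequence are hypothetical, introduced purely for illustration.

```python
import re

# Motif from the entry text: F/Y-X(n)-L-X(n)-F/Y-X(n)-WXCX(6-12)CX(17-22)HXH
# Assumption: X(n) is interpreted as one or more residues of any kind (lazy match).
FLYWCH_PATTERN = re.compile(r"[FY].+?L.+?[FY].+?W.C.{6,12}C.{17,22}H.H")


def find_flywch_candidates(sequence):
    """Return (start, end) spans of regions that match the approximate motif."""
    return [m.span() for m in FLYWCH_PATTERN.finditer(sequence.upper())]


# Synthetic sequence constructed to satisfy the pattern; it is not a real protein.
toy = (
    "MA" + "F" + "AA" + "L" + "AA" + "Y" + "AA"
    + "W" + "A" + "C" + "A" * 8 + "C" + "A" * 20 + "H" + "A" + "H" + "GG"
)

if __name__ == "__main__":
    print(find_flywch_candidates(toy))
```

A regular expression of this kind only flags candidate regions; domain assignments in the database rely on curated signatures rather than on a simple pattern match.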
Protein Domain
Name: NIF system FeS cluster assembly, NifU, C-terminal
Type: Domain
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. 
SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets. In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N-terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen. This entry represents the C-terminal domain of NifU and homologous proteins. NifU contains two domains: an N-terminal and a C-terminal domain. These domains exist either together or on different polypeptides. Both domains are found in organisms that do not fix nitrogen (e.g. yeast), so they have a broader significance in the cell than nitrogen fixation.
Protein Domain
Name: FeS assembly scaffold SufA, proteobacteria
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. 
SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets. In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N-terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen. This entry represents the SufA protein of the SUF system of iron-sulphur cluster biosynthesis from proteobacteria. SufA acts as a scaffold in which Fe and S are assembled into FeS clusters. This system performs FeS biosynthesis even during oxidative stress and tends to be absent in obligate anaerobic and microaerophilic bacteria.
Protein Domain
Name: Mitochondrial inner membrane translocase complex, subunit Tim17
Type: Family
Description: The mitochondrial protein translocase (MPT) family, which brings nuclear-encoded preproteins into mitochondria, is very complex, with 19 currently identified protein constituents. These proteins include several chaperone proteins, four proteins of the outer membrane translocase (Tom) import receptor, five proteins of the Tom channel complex, five proteins of the inner membrane translocase (Tim) and three "motor" proteins. The inner membrane translocase is a complex of a number of proteins, including the Tim17, Tim23 and Tim44 subunits. Tim17 and Tim23 are thought to form the translocation channel of the inner membrane.
Protein Domain
Name: Clathrin/coatomer adaptor, adaptin-like, N-terminal
Type: Domain
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain, being responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis. While clathrin mediates endocytic protein transport from ER to Golgi, coatomers (COPI, COPII) primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins. 
Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This entry represents the N-terminal domain of various adaptins from different AP clathrin adaptor complexes (including AP1, AP2, AP3 and AP4), and from the beta and gamma subunits of various coatomer (COP) adaptors. This domain has a 2-layer alpha/alpha fold that forms a right-handed superhelix, and is a member of the ARM repeat superfamily. The N-terminal regions of the various AP adaptor proteins share strong sequence identity; by contrast, the C-terminal domains of different adaptins share similar structural folds, but have little sequence identity. It has been proposed that the N-terminal domain interacts with another uniform component of the coated vesicles.
Protein Domain
Name: Clathrin adaptor, appendage, Ig-like subdomain superfamily
Type: Homologous_superfamily
Description: Proteins synthesized on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. Clathrin coats contain both clathrin and adaptor complexes that link clathrin to receptors in coated vesicles. Clathrin-associated protein complexes are believed to interact with the cytoplasmic tails of membrane proteins, leading to their selection and concentration. The two major types of clathrin adaptor complexes are the heterotetrameric adaptor protein (AP) complexes, and the monomeric GGA (Golgi-localising, Gamma-adaptin ear domain homology, ARF-binding proteins) adaptors. All AP complexes are heterotetramers composed of two large subunits (adaptins), a medium subunit (mu) and a small subunit (sigma). Each subunit has a specific function. Adaptin subunits recognise and bind to clathrin through their hinge region (clathrin box), and recruit accessory proteins that modulate AP function through their C-terminal appendage domains. By contrast, GGAs are monomers composed of four domains, which have functions similar to AP subunits: an N-terminal VHS (Vps27p/Hrs/Stam) domain, a GAT (GGA and Tom1) domain, a hinge region, and a C-terminal GAE (gamma-adaptin ear) domain. The GAE domain is similar to the AP gamma-adaptin ear domain, being responsible for the recruitment of accessory proteins that regulate clathrin-mediated endocytosis. While clathrin mediates endocytic protein transport from ER to Golgi, coatomers (COPI, COPII) primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins. 
Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This superfamily represents a β-sandwich structural motif found in the appendage (ear) domain of alpha-, beta- and gamma-adaptin from AP clathrin adaptor complexes, the GAE (gamma-adaptin ear) domain of GGA adaptor proteins, and the appendage domain of the gamma subunit of coatomer complexes. These domains have an immunoglobulin-like β-sandwich fold containing 7 or 8 strands in 2 β-sheets in a Greek key topology. Although the appendage domains from AP/GGA adaptins and coatomers share a similar fold, there is little sequence identity between them. However, they share similar motif-based cargo recognition and accessory factor recruitment mechanisms.
Protein Domain
Name: Ubiquilin
Type: Family
Description: Ubiquitin is a protein of seventy-six amino acid residues, found in all eukaryotic cells, whose sequence is extremely well conserved from protozoans to vertebrates. It is widely known as a post-translational tag used to signal a protein's hydrolytic destruction. Other functions of ubiquitin depend on its differential internal isopeptide linkages. In addition, several ubiquitin-like proteins have been discovered from genome-sequencing efforts, other structural studies, and genetic screens. These new data show that proteins with the ubiquitin domain are adaptable, transposable genetic elements, which have been appended to other genes and utilised for many different cellular functions, depending on the ubiquitin-like protein's identity, subcellular location, and method of covalent attachment. The post-translational ligation of proteins to members of the ubiquitin superfamily can signal many different fates for the target protein. Ubiquitin is a globular protein, the last four C-terminal residues (Leu-Arg-Gly-Gly) extending from the compact structure to form a 'tail' important for its function. This function is mediated by the covalent conjugation of ubiquitin to target proteins, by an isopeptide linkage between the C-terminal glycine and the epsilon amino group of lysine residues in the target proteins. Ubiquilin is a ubiquitin-like (UBL) protein with an N-terminal UBL domain and a C-terminal Ub-associated (UBA) domain.
Protein Domain
Name: START-like domain superfamily
Type: Homologous_superfamily
Description: START (StAR-related lipid-transfer) is a lipid-binding domain in StAR, HD-ZIP and signalling proteins. StAR (Steroidogenic Acute Regulatory protein) is a mitochondrial protein that is synthesised in response to luteinising hormone stimulation. Expression of the protein in the absence of hormone stimulation is sufficient to induce steroid production, suggesting that this protein is required in the acute regulation of steroidogenesis. Representatives of the START domain family have been shown to bind different ligands such as sterols (StAR protein) and phosphatidylcholine (PC-TP). Ligand binding by the START domain can also regulate the activities of other domains that co-occur with the START domain in multidomain proteins such as Rho-gap, the homeodomain, and the thioesterase domain. The crystal structure of the START domain of human MLN64 shows an alpha/beta fold built around a U-shaped incomplete β-barrel. Most importantly, the interior of the protein encompasses a 26 x 12 x 11 Angstroms hydrophobic tunnel that is apparently large enough to bind a single cholesterol molecule. The START domain structure revealed an unexpected similarity to that of the birch pollen allergen Bet v 1 and to bacterial polyketide cyclases/aromatases. This superfamily represents an α/β sandwich structural domain found in a wide variety of protein families, including STAR-related lipid transfer proteins and homeobox-leucine zipper proteins.
Protein Domain
Name: RNA recognition motif domain
Type: Domain
Description: Many eukaryotic proteins containing one or more copies of a putative RNA-binding domain of about 90 amino acids are known to bind single-stranded RNAs. The largest group of single strand RNA-binding proteins is the eukaryotic RNA recognition motif (RRM) family that contains an eight amino acid RNP-1 consensus sequence. RRM proteins have a variety of RNA binding preferences and functions, and include heterogeneous nuclear ribonucleoproteins (hnRNPs), proteins implicated in regulation of alternative splicing (SR, U2AF, Sxl), protein components of small nuclear ribonucleoproteins (U1 and U2 snRNPs), and proteins that regulate RNA stability and translation (PABP, La, Hu). The RRM in heterodimeric splicing factor U2 snRNP auxiliary factor (U2AF) appears to have two RRM-like domains with specialised features for protein recognition. The motif also appears in a few single stranded DNA binding proteins. The typical RRM consists of four anti-parallel β-strands and two α-helices arranged in a β-α-β-β-α-β fold with side chains that stack with RNA bases. Specificity of RNA binding is determined by multiple contacts with surrounding amino acids. A third helix is present during RNA binding in some cases. The RRM is reviewed in a number of publications. This entry also includes some bacterial putative RNA-binding proteins.
Protein Domain
Name: Octanoyltransferase
Type: Family
Description: Lipoate-protein ligase B (gene lipB), alternatively known as octanoyltransferase, is an enzyme that creates an amide linkage that joins the free carboxyl group of lipoic acid to the ε-amino group of a specific lysine residue in lipoate-dependent enzymes. It catalyses the reaction: octanoyl-[acyl-carrier-protein] + protein = protein N6-(octanoyl)lysine + acyl carrier protein. Lipoyl(octanoyl) transferase catalyses the first committed step in the biosynthesis of lipoyl cofactor. The lipoyl cofactor is essential for the function of several key enzymes involved in oxidative metabolism, as it converts apoprotein into the biologically active holoprotein. Examples of such lipoylated proteins include pyruvate dehydrogenase (E2 domain), 2-oxoglutarate dehydrogenase (E2 domain), the branched-chain 2-oxoacid dehydrogenases and the glycine cleavage system (H protein). Lipoyl-ACP can also act as a substrate, although octanoyl-ACP is likely to be the true substrate. The other enzyme involved in the biosynthesis of lipoyl cofactor is lipoyl synthase. An alternative lipoylation pathway involves lipoate-protein ligase, which can lipoylate apoproteins using exogenous lipoic acid (or its analogues). Such an enzyme has also been found in fungi, where it is located in the mitochondria. It also seems to exist in plants and is encoded in the chloroplast genome of the red alga Cyanidium caldarium.
Protein Domain
Name: Type II/III secretion system
Type: Family
Description: This family includes protein D, which is involved in the general (type II) secretion pathway (GSP) within Gram-negative bacteria, a signal sequence-dependent process responsible for protein export, and protein G from the type III secretion system. A number of proteins are involved in the GSP; one of these is known as protein D (GSPD protein), the most probable location of which is the outer membrane. This suggests that protein D constitutes the apparatus of the accessory mechanism, and is thus involved in transporting exoproteins from the periplasm, across the outer membrane, to the extracellular environment. The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. Protein G aids in the structural assembly of the invasion complex.
Protein Domain
Name: DIX domain superfamily
Type: Homologous_superfamily
Description: Proteins of the dishevelled family (Dsh and Dvl) play a key role in the transduction of the Wg/Wnt signal from the cell surface to the nucleus: in response to Wnt signal, they block the degradation of beta-catenin by interacting with the scaffolding protein axin. The N terminus of proteins of the dishevelled family and the C terminus of proteins of the axin family share a region of homology of about 85 amino acids, which has been called DIX for DIshevelled and aXin. The DIX domain is found associated with PDZ and DEP domains in proteins of the dishevelled family and with an RGS domain in proteins of the axin family. DIX has been shown to be a protein-protein interaction domain that is important for homo- and hetero-oligomerization of proteins of the dishevelled and axin families. The DIX domain has also been shown to be a signalling module that can target proteins to actin stress fibres and cytoplasmic vesicles to control Wnt signalling. The Dvl2 DIX domain has been shown to form a predominantly helical structure. A third type of DIX domain-possessing protein, known as Coiled-coil-DIX1 (Ccd1) or Dixin, forms homomeric and heteromeric complexes with Dvl and Axin and is a positive regulator of Wnt signalling.
Protein Domain
Name: Matrix protein, lentiviral and alpha-retroviral, N-terminal
Type: Homologous_superfamily
Description: Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds, which predominantly consist of four closely packed α-helices that are interconnected through loops, their primary sequences can be very different. This entry represents the N-terminal domain from matrix proteins from primate lentiviruses, such as human and simian immunodeficiency viruses (HIV and SIV, respectively), equine lentiviruses, such as Equine infectious anemia virus (EIAV), avian alpha-retroviruses, such as Rous sarcoma virus (RSV), and type C retroviruses, such as Avian sarcoma virus. This entry also identifies matrix proteins from several eukaryotic endogenous retroviruses, which arise when one or more copies of the retroviral genome become integrated into the host genome.
Protein Domain
Name: TFP11/STIP/Ntr1
Type: Family
Description: This entry includes tuftelin-interacting protein 11 (TFP11) from humans, septin/tuftelin-interacting protein 1 (STIP1) from Drosophila melanogaster and Ntr1 (also known as Spp382) from budding yeasts. Ntr1 forms the NTR (NineTeen complex-Related) complex with Ntr2 and Prp43 (a DExD/H-box RNA helicase) that catalyses disassembly of intron-lariat spliceosomes (ILS) and defective earlier spliceosomes. Ntr1 has been shown to interact with Prp43 through the N-terminal G-patch domain, with Ntr2 through a middle region, and with itself through the carboxyl half of the protein. The G-patch domain of Ntr1 activates Prp43 for spliceosome disassembly, while its C-terminal domain may have a safeguarding role preventing Prp43-mediated disassembly of wild-type spliceosomes other than the intron-lariat spliceosome. Septin and tuftelin interacting proteins (STIPs) are G-patch domain proteins involved in spliceosome disassembly. The mouse protein, known as TFT11, was originally identified as a protein interacting with tuftelin, one of the presumed enamel matrix proteins. The Drosophila protein STP1 was originally identified as a septin-interacting protein. In both cases these interactions were identified by a yeast two-hybrid system and their function and direct physical association were not characterised. Subsequent studies show that these proteins are widely expressed and function as splicing factors. STIP is essential for embryogenesis in Caenorhabditis elegans.
Protein Domain
Name: SARS-like ORF8 accessory protein, Ig-like domain
Type: Domain
Description: Coronaviruses (CoVs) have a similar genomic structure and encode four structural proteins (S, E, M and N) and a variable number of accessory proteins. Accessory proteins play an important role in virus-host interactions, especially in antagonizing or regulating host immunity and virus adaptation to the host. There are large variations in the number of accessory proteins (1-10) among coronaviruses. BetaCoVs have 3-5 accessory proteins, except for SARS-CoV and SARS-CoV-2, which possess the largest number of accessory proteins among all coronaviruses (10 and 9, respectively). ORF8 is the most variable accessory protein among those encoded by SARS-related coronaviruses (SARSr-CoVs) and is not shared by all members of subgenus Sarbecovirus. SARSr ORF8 accessory proteins are characterized by the presence of an N-terminal hydrophobic signal peptide, a conserved N-glycosylation site, and enough cysteine residues with the potential to form disulfide bonds, drawing their picture as structurally stable potential ER-resident proteins. There is functional overlap between these proteins with involvement in immune modulation, which is probably accomplished through involvement in protein quality control. When ORF8 is exogenously overexpressed in cells, it disrupts IFN-I signaling. Unlike ORF8a/b of SARS-CoV, the SARS-CoV-2 ORF8 downregulates MHC-I in cells. This entry represents the immunoglobulin (Ig)-like domain from SARSr ORF8.
Protein Domain
Name: DIX domain
Type: Domain
Description: Proteins of the dishevelled family (Dsh and Dvl) play a key role in the transduction of the Wg/Wnt signal from the cell surface to the nucleus: in response to the Wnt signal, they block the degradation of beta-catenin by interacting with the scaffolding protein axin. The N terminus of proteins of the dishevelled family and the C terminus of proteins of the axin family share a region of homology of about 85 amino acids, which has been called DIX for DIshevelled and aXin. The DIX domain is found associated with PDZ and DEP domains in proteins of the dishevelled family and with an RGS domain in proteins of the axin family. DIX has been shown to be a protein-protein interaction domain that is important for homo- and hetero-oligomerization of proteins of the dishevelled and axin families. The DIX domain has also been shown to be a signalling module that can target proteins to actin stress fibres and cytoplasmic vesicles to control Wnt signalling. The Dvl2 DIX domain has been shown to form a predominantly helical structure. A third type of DIX domain-possessing protein, known as Coiled-coil-DIX1 (Ccd1) or Dixin, forms homomeric and heteromeric complexes with Dvl and Axin and is a positive regulator of Wnt signaling.
Protein Domain
Name: RNA-binding domain superfamily
Type: Homologous_superfamily
Description: Many eukaryotic proteins containing one or more copies of a putative RNA-binding domain of about 90 amino acids are known to bind single-stranded RNAs. The largest group of single-stranded RNA-binding proteins is the eukaryotic RNA recognition motif (RRM) family that contains an eight amino acid RNP-1 consensus sequence. RRM proteins have a variety of RNA binding preferences and functions, and include heterogeneous nuclear ribonucleoproteins (hnRNPs), proteins implicated in regulation of alternative splicing (SR, U2AF, Sxl), protein components of small nuclear ribonucleoproteins (U1 and U2 snRNPs), and proteins that regulate RNA stability and translation (PABP, La, Hu). The RRM in heterodimeric splicing factor U2 snRNP auxiliary factor (U2AF) appears to have two RRM-like domains with specialised features for protein recognition. The motif also appears in a few single-stranded DNA-binding proteins. The typical RRM consists of four anti-parallel β-strands and two α-helices arranged in a β-α-β-β-α-β fold with side chains that stack with RNA bases. Specificity of RNA binding is determined by multiple contacts with surrounding amino acids. A third helix is present during RNA binding in some cases. The RRM is reviewed in a number of publications. This entry also includes some bacterial putative RNA-binding proteins.
Protein Domain
Name: Membrane insertase YidC/ALB, C-terminal
Type: Domain
Description: This entry represents the C-terminal domain of membrane insertase YidC from bacteria, ALBINO3-like proteins from plants and similar proteins. YidC is a bacterial membrane protein which is required for the insertion and assembly of inner membrane proteins. The well-characterised YidC protein from Escherichia coli and its close homologues contain a large N-terminal periplasmic domain. This protein interacts with the SecYEG protein-conducting channel and the accessory proteins SecDF-YajC to form the bacterial holo-translocon (HTL). Plant ALBINO3-like proteins are required for the insertion of some light harvesting chlorophyll-binding proteins (LHCP) into the chloroplast thylakoid membrane.
Protein Domain
Name: Metallo-dependent phosphatase-like
Type: Homologous_superfamily
Description: This entry represents a domain found in metallo-dependent phosphatases. Proteins containing this domain include: purple acid phosphatases; DNA double-strand break repair nucleases; 5'-nucleotidases (syn. UDP-sugar hydrolase); protein serine/threonine phosphatases; phosphoesterase-related proteins; TT1561-like proteins; YfcE-like proteins; hypothetical protein aq_1666; DR1281-like proteins; TTHA0625-like proteins; GpdQ-like proteins; and ADPRibase-Mn-like proteins. Metallophosphatases (MPPs), also known as metallophosphoesterases, phosphodiesterases (PDEs), binuclear metallophosphoesterases, and dimetal-containing phosphoesterases (DMPs), represent a diverse superfamily of enzymes with a conserved domain containing an active site consisting of two metal ions (usually manganese, iron, or zinc) coordinated with octahedral geometry by a cage of histidine, aspartate, and asparagine residues.
Protein Domain
Name: 14-3-3 domain
Type: Domain
Description: The 14-3-3 proteins are a large family of approximately 30 kDa acidic proteins which exist primarily as homo- and heterodimers within all eukaryotic cells. These are structurally similar phospho-binding proteins that regulate multiple signaling pathways. There is a high degree of sequence identity and conservation between all the 14-3-3 isotypes, particularly in the regions which form the dimer interface or line the central ligand binding channel of the dimeric molecule. Each 14-3-3 protein sequence can be roughly divided into three sections: a divergent amino terminus, the conserved core region and a divergent carboxyl terminus. The conserved middle core region of the 14-3-3s encodes an amphipathic groove that forms the main functional domain, a cradle for interacting with client proteins. The monomer consists of nine helices organised in an antiparallel manner, forming an L-shaped structure. The interior of the L-structure is composed of four helices: H3 and H5, which contain many charged and polar amino acids, and H7 and H9, which contain hydrophobic amino acids. These four helices form the concave amphipathic groove that interacts with target peptides. The 14-3-3 proteins mainly bind proteins containing phosphothreonine or phosphoserine motifs; however, exceptions to this rule do exist. Extensive investigation of the 14-3-3 binding site of the mammalian serine/threonine kinase Raf-1 has produced a consensus sequence for 14-3-3-binding, RSxpSxP (in the single-letter amino-acid code, where x denotes any amino acid and p indicates that the next residue is phosphorylated). The 14-3-3 proteins appear to effect intracellular signalling in one of three ways: by direct regulation of the catalytic activity of the bound protein, by regulating interactions between the bound protein and other molecules in the cell by sequestration or modification, or by controlling the subcellular localisation of the bound ligand. 
Proteins appear to initially bind to a single dominant site and then subsequently to many, much weaker secondary interaction sites. The 14-3-3 dimer is capable of changing the conformation of its bound ligand whilst itself undergoing minimal structural alteration. This entry represents the structural domain found in 14-3-3 proteins.
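The RSxpSxP consensus described above can be turned into a simple sequence scan. The following is a minimal sketch, not part of the database record: the function name and the toy sequence are hypothetical, and because phosphorylation state cannot be read from a plain amino-acid sequence, the scan only reports candidate sites whose second serine would need to be phosphorylated.

```python
import re

# Candidate 14-3-3 sites: R-S-x-S-x-P, where x is any residue.
# The lookahead lets overlapping candidates be reported as well.
CANDIDATE_SITE = re.compile(r"(?=RS[A-Z]S[A-Z]P)")


def find_candidate_1433_sites(sequence):
    """Return (motif_start, serine_index) pairs, 0-based.

    serine_index points at the serine that the consensus marks as
    phosphorylated (the residue following 'p' in RSxpSxP).
    """
    seq = sequence.upper()
    return [(m.start(), m.start() + 3) for m in CANDIDATE_SITE.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to illustrate the scan.
    print(find_candidate_1433_sites("MKRSASEPLL"))
```

A match is only a sequence-level hint; whether the site is actually phosphorylated and bound has to be established experimentally.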
Protein Domain
Name: Zinc finger, ClpX C4-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The ClpX heat shock protein of Escherichia coli is a member of the universally conserved Hsp100 family of proteins, and possesses a putative zinc finger motif of the C4 type. This presumed zinc binding domain (ZBD) is found at the N terminus of the ClpX protein. ClpX is an ATPase which functions both as a substrate specificity component of the ClpXP protease and as a molecular chaperone. 
ZBD is a member of the treble clef zinc finger family, a motif known to facilitate protein-ligand, protein-DNA, and protein-protein interactions, and it forms a constitutive dimer that is essential for the degradation of some, but not all, ClpX substrates.
Protein Domain
Name: CRISPR-associated protein, GSU0054
Type: Family
Description: The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents a rare CRISPR-associated protein. 
CRISPR-associated proteins typically are found near CRISPR repeats and other CRISPR-associated proteins, have low levels of sequence identity, have sequence relationships that suggest lateral transfer, and show some sequence similarity to DNA-active proteins such as helicases and repair proteins.
Protein Domain
Name: Zinc finger, SWIM-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the SWIM (SWI2/SNF2 and MuDR) zinc-binding domain, which is found in a variety of prokaryotic and eukaryotic proteins, such as mitogen-activated protein kinase kinase kinase 1 (or MEKK1). 
It is also found in the related protein MEX (MEKK1-related protein X), a testis-expressed protein that acts as an E3 ubiquitin ligase through the action of E2 ubiquitin-conjugating enzymes in the proteasome degradation pathway; the SWIM domain is critical for MEX ubiquitination. SWIM domains are also found in the homologous recombination protein Sws1, as well as in several hypothetical proteins.
Protein Domain
Name: F-box domain
Type: Domain
Description: First identified in cyclin-F as a protein-protein interaction motif, the F-box is a conserved domain that is present in numerous proteins with a bipartite structure. Through the F-box, these proteins are linked to the Skp1 protein and the core of SCF (Skp1-cullin-F-box protein ligase) complexes. SCF complexes constitute a new class of E3 ligases. They function in combination with the E2 enzyme Cdc34 to ubiquitinate G1 cyclins, Cdk inhibitors and many other proteins, to mark them for degradation. The binding of the specific substrates by SCF complexes is mediated by divergent protein-protein interaction motifs present in F-box proteins, like WD40 repeats, leucine rich repeats or ANK repeats.
Protein Domain
Name: F-box-like domain superfamily
Type: Homologous_superfamily
Description: First identified in cyclin-F as a protein-protein interaction motif, the F-box is a conserved domain that is present in numerous proteins with a bipartite structure. Through the F-box, these proteins are linked to the Skp1 protein and the core of SCF (Skp1-cullin-F-box protein ligase) complexes. SCF complexes constitute a new class of E3 ligases. They function in combination with the E2 enzyme Cdc34 to ubiquitinate G1 cyclins, Cdk inhibitors and many other proteins, to mark them for degradation. The binding of the specific substrates by SCF complexes is mediated by divergent protein-protein interaction motifs present in F-box proteins, like WD40 repeats, leucine rich repeats or ANK repeats.
Protein Domain
Name: Peptidase A2A, retrovirus, catalytic
Type: Domain
Description: This group of aspartic peptidases belongs to the peptidase clan AA. The clan includes the single domain aspartic proteases from retroviruses, retrotransposons, and badnaviruses (plant dsDNA viruses) which are active as homodimers. While fungal and mammalian pepsins are bilobal proteins with structurally related N- and C-termini, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Family A2 includes the peptidase (retropepsin, EC 3.4.23.16) from the human immunodeficiency virus and other retroviruses. In most retroviruses, the peptidase is encoded as a segment of a polyprotein (usually the pol polyprotein, which includes the peptidase, a reverse transcriptase, RNase H and an integrase, but occasionally the gag polyprotein) which it cleaves during viral maturation to release individual proteins. Some retrotransposon polyproteins also contain a homologous, retropepsin-like peptidase which is also a member of family A2. Family A3 includes peptidases from the double-stranded DNA plant viruses known as badnaviruses or pararetroviruses. The viral genome includes genes (ORFs IV and V) that encode polyproteins. The ORF V polyprotein contains the peptidase and a reverse transcriptase. The peptidase processes the ORF IV polyprotein, which includes the viral coat protein. Family A9 includes peptidases from spumaretroviruses; the peptidase is a component of either the gag or pol polyprotein, which it processes. The structure has been solved for the peptidase from simian foamy virus and shows a retropepsin-like fold. Family A11 includes polyprotein-processing peptidases from retrotransposons such as the copia transposon from Drosophila melanogaster. 
No tertiary structure has been solved for any member of the family, and family A11 is included in clan AA on the basis of the similar motif around the active site Asp. Family A28 includes the yeast DNA-damage inducible protein 1 which is a component of the DNA repair pathway. The tertiary structure shows a retropepsin-like fold. This peptidase is not a component of a polyprotein. Family A32 includes the bacterial PerP peptidase which converts the transmembrane factor PodJ from a form that recruits proteins for pilus formation, to a truncated form that recruits proteins for stalk formation. This converts the bacterium from a motile form to the sessile form found in biofilms. Aspartic peptidases, also known as aspartyl proteases (EC 3.4.23.-), are widely distributed proteolytic enzymes known to exist in vertebrates, fungi, plants, protozoa, bacteria, archaea, retroviruses and some plant viruses. All known aspartic peptidases are endopeptidases. A water molecule, activated by two aspartic acid residues, acts as the nucleophile in catalysis. Aspartic peptidases can be grouped into five clans, each of which shows a unique structural fold. Peptidases in clan AA are either bilobed (family A1 or the pepsin family) or are a homodimer (all other families in the clan, including retropepsin from HIV-1/AIDS). Each lobe consists of a single domain with a closed β-barrel and each lobe contributes one Asp to form the active site. Most peptidases in the clan are inhibited by the naturally occurring small-molecule inhibitor pepstatin. Clan AC contains the single family A8: the signal peptidase 2 family. Members of the family are found in all bacteria. Signal peptidase 2 processes the premurein precursor, removing the signal peptide. The peptidase has four transmembrane domains and the active site is on the periplasmic side of the cell membrane. 
Cleavage occurs on the amino side of a cysteine where the thiol group has been substituted by a diacylglyceryl group. Site-directed mutagenesis has identified two essential aspartic acid residues which occur in the motifs GNXXDRX and FNXAD (where X is a hydrophobic residue). No tertiary structures have been solved for any member of the family, but because of the intramembrane location, the structure is assumed not to be pepsin-like. Clan AD contains two families of transmembrane endopeptidases: A22 and A24. These are also known as "GXGD peptidases" because of a common GXGD motif which includes one of the pair of catalytic aspartic acid residues. Structures are known for members of both families and show a unique, common fold with up to nine transmembrane regions. The active site aspartic acids are located within a large cavity in the membrane into which water can gain access. Clan AE contains two families, A25 and A31. Tertiary structures have been solved for members of both families and show a common fold consisting of an α-β-α sandwich, in which the β-sheet is five-stranded. Clan AF contains the single family A26. Members of the clan are membrane proteins with a unique fold. Homologues are known only from bacteria. The structure of omptin (also known as OmpT) shows a cylindrical barrel containing ten β-strands inserted in the membrane with the active site residues on the outer surface. There are two families of aspartic peptidases for which neither structure nor active site residues are known and these are not assigned to clans. Family A5 includes thermopsin, an endopeptidase found only in thermophilic archaea. Family A36 contains sporulation factor SpoIIGA, which is known to process and activate sigma factor E, one of the transcription factors that controls sporulation in bacteria.
Protein Domain
Name: Retropepsins
Type: Domain
Description: This group of aspartic peptidases belongs to the peptidase clan AA. The clan includes the single domain aspartic proteases from retroviruses, retrotransposons, and badnaviruses (plant dsDNA viruses) which are active as homodimers. While fungal and mammalian pepsins are bilobal proteins with structurally related N- and C-termini, retropepsins are half as long as their fungal and mammalian counterparts. The monomers are structurally related to one lobe of the pepsin molecule and retropepsins function as homodimers. The active site aspartate occurs within a motif (Asp-Thr/Ser-Gly), as it does in pepsin. Family A2 includes the peptidase (retropepsin, EC 3.4.23.16) from the human immunodeficiency virus and other retroviruses. In most retroviruses, the peptidase is encoded as a segment of a polyprotein (usually the pol polyprotein, which includes the peptidase, a reverse transcriptase, RNase H and an integrase, but occasionally the gag polyprotein) which it cleaves during viral maturation to release individual proteins. Some retrotransposon polyproteins also contain a homologous, retropepsin-like peptidase which is also a member of family A2. Family A3 includes peptidases from the double-stranded DNA plant viruses known as badnaviruses or pararetroviruses. The viral genome includes genes (ORFs IV and V) that encode polyproteins. The ORF V polyprotein contains the peptidase and a reverse transcriptase. The peptidase processes the ORF IV polyprotein, which includes the viral coat protein. Family A9 includes peptidases from spumaretroviruses; the peptidase is a component of either the gag or pol polyprotein, which it processes. The structure has been solved for the peptidase from simian foamy virus and shows a retropepsin-like fold. Family A11 includes polyprotein-processing peptidases from retrotransposons such as the copia transposon from Drosophila melanogaster. 
No tertiary structure has been solved for any member of the family, and family A11 is included in clan AA on the basis of the similar motif around the active site Asp. Family A28 includes the yeast DNA-damage inducible protein 1 which is a component of the DNA repair pathway. The tertiary structure shows a retropepsin-like fold. This peptidase is not a component of a polyprotein. Family A32 includes the bacterial PerP peptidase which converts the transmembrane factor PodJ from a form that recruits proteins for pilus formation, to a truncated form that recruits proteins for stalk formation. This converts the bacterium from a motile form to the sessile form found in biofilms. Aspartic peptidases, also known as aspartyl proteases (EC 3.4.23.-), are widely distributed proteolytic enzymes known to exist in vertebrates, fungi, plants, protozoa, bacteria, archaea, retroviruses and some plant viruses. All known aspartic peptidases are endopeptidases. A water molecule, activated by two aspartic acid residues, acts as the nucleophile in catalysis. Aspartic peptidases can be grouped into five clans, each of which shows a unique structural fold. Peptidases in clan AA are either bilobed (family A1 or the pepsin family) or are a homodimer (all other families in the clan, including retropepsin from HIV-1/AIDS). Each lobe consists of a single domain with a closed β-barrel and each lobe contributes one Asp to form the active site. Most peptidases in the clan are inhibited by the naturally occurring small-molecule inhibitor pepstatin. Clan AC contains the single family A8: the signal peptidase 2 family. Members of the family are found in all bacteria. Signal peptidase 2 processes the premurein precursor, removing the signal peptide. The peptidase has four transmembrane domains and the active site is on the periplasmic side of the cell membrane. 
Cleavage occurs on the amino side of a cysteine where the thiol group has been substituted by a diacylglyceryl group. Site-directed mutagenesis has identified two essential aspartic acid residues which occur in the motifs GNXXDRX and FNXAD (where X is a hydrophobic residue). No tertiary structures have been solved for any member of the family, but because of the intramembrane location, the structure is assumed not to be pepsin-like. Clan AD contains two families of transmembrane endopeptidases: A22 and A24. These are also known as "GXGD peptidases" because of a common GXGD motif which includes one of the pair of catalytic aspartic acid residues. Structures are known for members of both families and show a unique, common fold with up to nine transmembrane regions. The active site aspartic acids are located within a large cavity in the membrane into which water can gain access. Clan AE contains two families, A25 and A31. Tertiary structures have been solved for members of both families and show a common fold consisting of an α-β-α sandwich, in which the β-sheet is five-stranded. Clan AF contains the single family A26. Members of the clan are membrane proteins with a unique fold. Homologues are known only from bacteria. The structure of omptin (also known as OmpT) shows a cylindrical barrel containing ten β-strands inserted in the membrane with the active site residues on the outer surface. There are two families of aspartic peptidases for which neither structure nor active site residues are known and these are not assigned to clans. Family A5 includes thermopsin, an endopeptidase found only in thermophilic archaea. Family A36 contains sporulation factor SpoIIGA, which is known to process and activate sigma factor E, one of the transcription factors that controls sporulation in bacteria.
Protein Domain
Name: Peptidase M16, zinc-binding site
Type: Binding_site
Description: Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation which is usually zinc, but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role. Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases. A number of proteases dependent on divalent cations for their activity have been shown to belong to one family, on the basis of sequence similarity. These enzymes are listed below: Insulinase (also known as insulysin or insulin-degrading enzyme or IDE), a cytoplasmic enzyme which seems to be involved in the cellular processing of insulin, glucagon and other small polypeptides. Escherichia coli protease III (pitrilysin) (gene ptr), a periplasmic enzyme that degrades small peptides. Mitochondrial processing peptidase (MPP). This enzyme removes the transit peptide from the precursor form of proteins imported from the cytoplasm across the mitochondrial inner membrane. It is composed of two non-identical homologous subunits termed alpha and beta. 
The beta subunit seems to be catalytically active while the alpha subunit has probably lost its activity. Nardilysin (N-arginine dibasic convertase or NRD convertase); this mammalian enzyme cleaves peptide substrates on the N terminus of Arg residues in dibasic stretches. Klebsiella pneumoniae protein pqqF. This protein is required for the biosynthesis of the coenzyme pyrrolo-quinoline-quinone (PQQ). It is thought to be a protease that cleaves peptide bonds in a small peptide (gene pqqA), thus providing the glutamate and tyrosine residues necessary for the synthesis of PQQ. Saccharomyces cerevisiae (Baker's yeast) protein AXL1, which is involved in axial budding. Eimeria bovis sporozoite developmental protein. E. coli hypothetical protein yddC and HI1368, the corresponding Haemophilus influenzae protein. Bacillus subtilis hypothetical protein ymxG. Caenorhabditis elegans hypothetical proteins C28F5.4 and F56D2.1. It should be noted that in addition to the above enzymes, this family also includes the core proteins I and II of the mitochondrial bc1 complex (also called cytochrome c reductase or complex III), but the situation as to the activity or lack of activity of these subunits is quite complex: in mammals and yeast, core proteins I and II lack enzymatic activity; in Neurospora crassa and in potato, core protein I is equivalent to the beta subunit of MPP; in Euglena gracilis, core protein I seems to be active, while subunit II is inactive. These proteins do not share many regions of sequence similarity; the most noticeable is in the N-terminal section. This region includes a conserved histidine followed, two residues later, by a glutamate and another histidine. In pitrilysin, it has been shown that this H-x-x-E-H motif is involved in enzyme activity; the two histidines bind zinc and the glutamate is necessary for catalytic activity. Non-active members of this family have lost from one to three of these active site residues. 
This signature pattern only detects active members of the M16 peptidase family.
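The two motifs named in this description (HEXXH and H-x-x-E-H) can be located in a sequence with simple regular expressions. The following is a minimal sketch only: the function name `find_motif`, the `MOTIFS` table and the example sequences are hypothetical, and the patterns encode just the residue spacing stated above, not the curated signature that this entry actually uses.

```python
import re

# Illustrative patterns only; these are NOT the curated database signatures.
# A lookahead is used so that overlapping matches are also reported.
MOTIFS = {
    "HEXXH": re.compile(r"(?=HE..H)"),  # general metalloprotease motif
    "HXXEH": re.compile(r"(?=H..EH)"),  # H-x-x-E-H motif described for this family
}

def find_motif(sequence, name):
    """Return the 0-based start position of every match of the named motif."""
    pattern = MOTIFS[name]
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Made-up example sequences (assumptions, not real proteins)
print(find_motif("AAHFLEHMM", "HXXEH"))
print(find_motif("GVHEAAHAL", "HEXXH"))
```

A hit from such a pattern says only that the residues are present with the right spacing; as the description notes, inactive family members have lost one to three of these residues, so a missing hit is equally informative.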
Protein Domain
Name: Octanoyltransferase, conserved site
Type: Conserved_site
Description: Lipoate-protein ligase B (gene lipB), alternatively known as octanoyltransferase, is an enzyme that creates an amide linkage that joins the free carboxyl group of lipoic acid to the ε-amino group of a specific lysine residue in lipoate-dependent enzymes. It catalyses the reaction: octanoyl-[acyl-carrier-protein] + protein = protein N6-(octanoyl)lysine + acyl carrier protein. Lipoyl(octanoyl) transferase catalyses the first committed step in the biosynthesis of lipoyl cofactor. The lipoyl cofactor is essential for the function of several key enzymes involved in oxidative metabolism, as it converts apoprotein into the biologically active holoprotein. Examples of such lipoylated proteins include pyruvate dehydrogenase (E2 domain), 2-oxoglutarate dehydrogenase (E2 domain), the branched-chain 2-oxoacid dehydrogenases and the glycine cleavage system (H protein). Lipoyl-ACP can also act as a substrate, although octanoyl-ACP is likely to be the true substrate. The other enzyme involved in the biosynthesis of lipoyl cofactor is lipoyl synthase. An alternative lipoylation pathway involves lipoate-protein ligase, which can lipoylate apoproteins using exogenous lipoic acid (or its analogues). Such an enzyme has also been found in fungi, where it is located in the mitochondria. It also seems to exist in plants and is encoded in the chloroplast genome of the red alga Cyanidium caldarium. This entry represents a conserved region, located in the central part of the enzyme; this region contains one of two conserved histidines.
Protein Domain
Name: Nucleocapsid (N) protein, C-terminal domain, coronavirus
Type: Domain
Description: The Nucleocapsid (N) protein is a highly immunogenic phosphoprotein also implicated in viral genome replication and in modulating cell signalling pathways. The N protein interacts with genomic and subgenomic RNA molecules. Together with the envelope protein M, it participates in genome condensation and packaging. The N protein is abundantly expressed during infection and is capable of inducing protective immune responses against SARS-CoV and SARS-CoV-2. Coronavirus (CoV) nucleocapsid (N) proteins have 3 highly conserved domains: the N-terminal domain (NTD) (N1b), the C-terminal domain (CTD) (N2b) and the N3 region. The N1b and N2b domains from SARS CoV, infectious bronchitis virus (IBV), human CoV 229E and mouse hepatitis virus (MHV) display similar topological organisations. N proteins form dimers, which are asymmetrically arranged into octamers via their N2b domains. Both domains contribute to the binding of viral RNA genome. This entry represents the C-terminal domain of the nucleocapsid proteins from Coronavirus. The C-terminal domain of the N protein (N-CTD) is involved in dimerization, and is thus also called the dimerization domain. Structurally, the C-terminal domain forms a tightly intertwined dimer with an intermolecular four-stranded central β-sheet platform flanked by alpha helices, indicating that the basic building block for coronavirus nucleocapsid formation is a dimeric assembly of N protein.
Protein Domain
Name: Glycosyl transferase family 39/83
Type: Domain
Description: The biosynthesis of disaccharides, oligosaccharides and polysaccharides involves the action of hundreds of different glycosyltransferases. These enzymes catalyse the transfer of sugar moieties from activated donor molecules to specific acceptor molecules, forming glycosidic bonds. A classification of glycosyltransferases using nucleotide diphospho-sugar, nucleotide monophospho-sugar and sugar phosphates (EC 2.4.1.-) and related proteins into distinct sequence-based families has been described. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. The same three-dimensional fold is expected to occur within each of the families. Because 3-D structures are better conserved than sequences, several of the families defined on the basis of sequence similarities may have similar 3-D structures and therefore form 'clans'. Dolichyl-phosphate-mannose-protein mannosyltransferase proteins belong to the glycosyltransferase family 39 and are responsible for O-linked glycosylation of proteins. They catalyse the reaction: Dolichyl phosphate D-mannose + protein -> dolichyl phosphate + O-D-mannosyl-protein. The transfer of mannose to seryl and threonyl residues of secretory proteins is catalyzed by a family of protein mannosyltransferases in Saccharomyces cerevisiae coded for by seven genes (PMT1-7). Protein O-glycosylation is essential for cell wall rigidity and cell integrity, and this protein modification is vital for S. cerevisiae. Undecaprenyl phosphate-alpha-4-amino-4-deoxy-L-arabinose arabinosyl transferase proteins belong to the glycosyltransferase family 83. They catalyse the reaction: 4-amino-4-deoxy-alpha-L-arabinopyranosyl di-trans,octa-cis-undecaprenyl phosphate + lipid IV(A) = lipid II(A) + di-trans,octa-cis-undecaprenyl phosphate.
Protein Domain
Name: Nucleocapsid protein, C-terminal
Type: Homologous_superfamily
Description: The Nucleocapsid (N) protein is a highly immunogenic phosphoprotein also implicated in viral genome replication and in modulating cell signalling pathways. The N protein interacts with genomic and subgenomic RNA molecules. Together with the envelope protein M, it participates in genome condensation and packaging. The N protein is abundantly expressed during infection and is capable of inducing protective immune responses against SARS-CoV and SARS-CoV-2. Coronavirus (CoV) nucleocapsid (N) proteins have 3 highly conserved domains: the N-terminal domain (NTD) (N1b), the C-terminal domain (CTD) (N2b) and the N3 region. The N1b and N2b domains from SARS CoV, infectious bronchitis virus (IBV), human CoV 229E and mouse hepatitis virus (MHV) display similar topological organisations. N proteins form dimers, which are asymmetrically arranged into octamers via their N2b domains. Both domains contribute to the binding of viral RNA genome. This entry represents the C-terminal domain of the nucleocapsid proteins from Coronavirus and Arterivirus. The C-terminal domain of the N protein (N-CTD) is involved in dimerization, and is thus also called the dimerization domain. Structurally, the C-terminal domain forms a tightly intertwined dimer with an intermolecular four-stranded central β-sheet platform flanked by alpha helices, indicating that the basic building block for coronavirus nucleocapsid formation is a dimeric assembly of N protein.
Protein Domain
Name: SPX domain
Type: Domain
Description: The SPX domain is named after SYG1/Pho81/XPR1 proteins. This 180-residue domain is found at the amino terminus of a variety of proteins. In the yeast protein SYG1, the N terminus directly binds to the G-protein beta subunit and inhibits transduction of the mating pheromone signal, suggesting that all the members of this family are involved in G-protein associated signal transduction. The C terminus of these proteins often has an EXS domain. The N-termini of several proteins involved in the regulation of phosphate transport, including the putative phosphate level sensors PHO81 from Saccharomyces cerevisiae and NUC-2 from Neurospora crassa, are also members of this family. NUC-2 contains several ankyrin repeats. Several members of this family are XPR1 proteins: the xenotropic and polytropic retrovirus receptor confers susceptibility to infection with Murine leukemia virus (MLV). The similarity between SYG1, phosphate regulators and XPR1 sequences has been previously noted, as has the additional similarity to several predicted proteins, of unknown function, from Drosophila melanogaster, Arabidopsis thaliana, Caenorhabditis elegans, Schizosaccharomyces pombe, and Saccharomyces cerevisiae. In addition, given the similarities between XPR1 and SYG1 and phosphate regulatory proteins, it has been proposed that XPR1 might be involved in G-protein associated signal transduction and may itself function as a phosphate sensor.
Protein Domain
Name: DsbB-like superfamily
Type: Homologous_superfamily
Description: Disulphide bonds contribute to folding, maturation, stability, and regulation of proteins, in particular those localized out of the cytosol. Oxidation of selected pairs of cysteines to disulphide in vivo requires cellular factors present in the bacterial periplasmic space or in the endoplasmic reticulum of eukaryotic cells. The disulfide bond formation protein B (DsbB) is a component of the pathway that leads to disulphide bond formation in periplasmic proteins of Escherichia coli and other bacteria. The DsbB protein oxidises the periplasmic protein DsbA, which in turn oxidises cysteines in other periplasmic proteins in order to make disulphide bonds. DsbB acts as a redox potential transducer across the cytoplasmic membrane. It is a membrane protein which spans the membrane four times, with both the N- and C-termini of the protein in the cytoplasm. Each of the periplasmic domains of the protein has two essential cysteines. The two cysteines in the first periplasmic domain are in a Cys-X-Y-Cys configuration that is characteristic of the active site of other proteins involved in disulphide bond formation, including DsbA and protein disulphide isomerase. The structure of DsbB contains four transmembrane (TM) helices with both termini oriented toward the cytoplasm. The four TM segments are arranged into a four-helix-bundle configuration. In addition to these TM helices, a short helix with a horizontal axis exists in the periplasmic region.
Protein Domain
Name: Bone sialoprotein 2
Type: Family
Description: The major non-collagenous matrix proteins (chondroitin-sulphate glycoproteins decorin and biglycan, sialoproteins osteopontin and bone sialoprotein, osteocalcin and osteonectin) have important roles in formation and organisation of the collagen matrix and nucleation and growth of hydroxyapatite crystals. Bone sialoprotein (BSP) is a major structural protein of the bone matrix that is specifically expressed by fully differentiated osteoblasts and is highly specific for mineralising tissues. The expression of BSP is normally restricted to mineralised connective tissues of bones and teeth; it is thought to be responsible for nucleation of hydroxyapatite crystals. BSP also displays a high affinity for calcium ions. The mature protein has a molecular weight of around 33 kDa and consists predominantly of Glu and Gly residues. It is subject to extensive post-translational modification, and is predominantly phosphorylated at Ser residues. Post-translational modifications can also cause BSP to act as an inhibitor of hydroxyapatite crystal formation. Ectopic expression of BSP occurs in various lesions, including oral and extraoral carcinomas, in which it has been associated both with the formation of microcrystalline deposits and with the metastasis of cancer cells to bone. The expression of this protein may be altered with the cancer secretome. It has also been related to osteoarthritis. This entry represents the Bone sialoprotein 2 and similar proteins mainly found in animals.
Protein Domain
Name: Pneumovirinae attachment membrane glycoprotein G
Type: Family
Description: This family of proteins contains the major surface glycoprotein of turkey rhinotracheitis virus (TRTV), also known as avian pneumovirus (APV), the aetiological agent of turkey rhinotracheitis (TRT), and of other Metapneumoviruses. The major surface glycoprotein is the attachment (G) protein, which, by analogy with other respiratory syncytial viruses (RSV), has been proposed to be responsible for virus binding to its cell receptor. The APV G gene and its predicted protein have several features in common with their RSV counterparts. Both G proteins are type II glycoproteins and both the RSV G and APV G proteins are heavily O-glycosylated. In both RSV and APV, the G protein is the most variable protein and is a major target for neutralizing antibodies.
Protein Domain
Name: SUF system FeS cluster assembly, SufB
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. 
SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets. In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen. This entry represents SufB, which is part of the SUF system and is a component of the SufBCD complex.
Protein Domain
Name: Small GTPase Rho
Type: Family
Description: Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structures at the level of 30-55% similarity. Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique for the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions, switch I and switch II, that surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop) that connects the B1 strand and the A1 helix is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg2+ and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are sequentially similar to Walker A and Walker B boxes that are found in other nucleotide binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg2+ binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues directly interacting with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base. The small GTPase superfamily can be divided into at least 8 different families, including: Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus. Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. 
Required for the import of proteins into the nucleus and also for RNA export. Rab small GTPases. GTP-binding proteins involved in vesicular traffic. Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation. Ras small GTPases. GTP-binding proteins involved in signalling pathways. Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII) which promotes the formation of transport vesicles from the endoplasmic reticulum (ER). Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking. Roc small GTPases domain. Small GTPase domain always found associated with the COR domain. This entry represents the Rho subfamily of Ras-like small GTPases. The small GTPase-like protein LIP2 (light insensitive period 2) from Arabidopsis thaliana is implicated in control of the plant circadian rhythm. The crystal structures of a number of the members of this entry have been determined: Rnd3/RhoE, RhoA and Cdc42.
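The NKXD consensus of the G4 loop mentioned in this description can be searched for with a one-line regular expression. This is a minimal sketch under stated assumptions: the function name `find_nkxd` and the test sequences are hypothetical, and the pattern captures only the four-residue consensus, not the full Rho family signature.

```python
import re

def find_nkxd(sequence):
    """Return 0-based start positions of N-K-x-D, where x is any residue."""
    # Lookahead so that overlapping occurrences would also be reported.
    return [m.start() for m in re.finditer(r"(?=NK.D)", sequence.upper())]

# Made-up example sequences (assumptions, not real GTPases)
print(find_nkxd("GGNKCDLL"))
print(find_nkxd("GGNKCELL"))
```

Because a four-residue pattern matches by chance in many unrelated proteins, a hit is best read as a candidate G4 loop to be checked against the other loops (G1-G5) described above.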
Protein Domain
Name: Fe-S cluster assembly domain superfamily
Type: Homologous_superfamily
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. 
SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets. In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.
Protein Domain
Name: CD34 antigen
Type: Family
Description: The CD34 group of monoclonal antibodies recognises CD34 (also termed CD34 antigen), a 105-120 kDa cell surface glycoprotein, which is selectively expressed by human myeloid and lymphoid progenitor cells, including the haemopoietic stem cell. The protein is also expressed on vascular endothelial cells. Here, it is concentrated on the surface of the interdigitating processes, suggesting a possible involvement in cell interactions or adhesion, by mediating the attachment of stem cells to the bone marrow extracellular matrix, or directly to stromal cells. The restricted pattern of expression of CD34 in haemopoiesis suggests that it may have a significant function in the earliest stages of blood cell differentiation in the bone marrow. CD34 is a phosphoprotein shown to be activated by protein kinase C (PKC) in a developmental stage-specific manner. Analysis of the human CD34 sequence reveals that the protein appears to be a type I transmembrane (TM) molecule. The predicted internal portion of the protein appears to retain basic amino acid residues adjacent to Ser residues, presenting at least two potential target sites for PKC phosphorylation. In addition, there are two other consensus motifs that correspond to potential target sites for Ca2+/calmodulin-dependent kinase and/or protease activated kinase I. The protein is not strongly similar to other known proteins, but some weak similarities do exist: e.g., to the S+T region (a region rich in potential O-linked carbohydrate attachment sites), the TM domain and cytoplasmic domain of cell surface proteins such as leukosialin, a major sialoglycoprotein of rat and human leukocytes; to the N-terminal glycosylated region of CD45 (the leukocyte common antigen); and to groups of interrelated proteins involved in cell adhesion or the regulation of complement. A homologue of human CD34 is expressed in mouse. 
The amino acid sequences only diverge significantly at their N-termini, which are predicted to be highly glycosylated and whose functions are probably modulated by carbohydrate. The observed pattern of expression of the murine CD34 gene is consistent with that of the human antigen. That CD34 is also highly expressed outside haematopoiesis, by vascular endothelial cells and by fibroblasts in differentiated tissue, suggests a role common to a variety of cell types. Concentration of CD34 on the interdigitating membrane projections of adjacent capillary endothelial cells has strengthened the idea that it functions in the control of events leading to cell-cell or cell-matrix adhesion, a role that could be modulated by variation in its levels of glycosylation. The conservation between the human and mouse cysteine-rich domain in the extracellular part of the protein, and the exceptionally high conservation of the cytoplasmic domain, imply that the protein is more than a carrier for either carbohydrate or negatively charged terminal sialic acid residues (a role postulated for leukosialin/sialophorin). The highly conserved domain may serve to provide an internal signal of external contact with a ligand.
Protein Domain
Name: Zinc finger, AN1-type
Type: Domain
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the AN1-type zinc finger domain, which has a dimetal (zinc)-bound alpha/beta fold. This domain was first identified as a zinc finger at the C terminus of AN1, a ubiquitin-like protein in Xenopus laevis. 
The AN1-type zinc finger contains six conserved cysteines and two histidines that could potentially coordinate 2 zinc atoms. Certain stress-associated proteins (SAP) contain an AN1 domain, often in combination with A20 zinc finger domains (SAP8) or C2H2 domains (SAP16). For example, the human protein Znf216 has an A20 zinc-finger at the N terminus and an AN1 zinc-finger at the C terminus, acting to negatively regulate the NFkappaB activation pathway and to interact with components of the immune response like RIP, IKKgamma and TRAF6. The interaction of Znf216 with IKKgamma and RIP is mediated by the A20 zinc-finger domain, while its interaction with TRAF6 is mediated by the AN1 zinc-finger domain; therefore, both zinc-finger domains are involved in regulating the immune response. The AN1 zinc finger domain is also found in proteins containing a ubiquitin-like domain, which are involved in the ubiquitination pathway. Proteins containing an AN1-type zinc finger include: Ascidian posterior end mark 6 (pem-6) protein. Human AWP1 protein (associated with PRK1), which is expressed during early embryogenesis. Human immunoglobulin mu binding protein 2 (SMUBP-2), mutations in which cause muscular atrophy with respiratory distress type 1.
Protein Domain
Name: AN1-like Zinc finger
Type: Homologous_superfamily
Description: Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the AN1-type zinc finger domain, which has a dimetal (zinc)-bound alpha/beta fold. This domain was first identified as a zinc finger at the C terminus of AN1, a ubiquitin-like protein in Xenopus laevis. 
The AN1-type zinc finger contains six conserved cysteines and two histidines that could potentially coordinate 2 zinc atoms. Certain stress-associated proteins (SAP) contain an AN1 domain, often in combination with A20 zinc finger domains (SAP8) or C2H2 domains (SAP16). For example, the human protein Znf216 has an A20 zinc-finger at the N terminus and an AN1 zinc-finger at the C terminus, acting to negatively regulate the NFkappaB activation pathway and to interact with components of the immune response like RIP, IKKgamma and TRAF6. The interaction of Znf216 with IKKgamma and RIP is mediated by the A20 zinc-finger domain, while its interaction with TRAF6 is mediated by the AN1 zinc-finger domain; therefore, both zinc-finger domains are involved in regulating the immune response. The AN1 zinc finger domain is also found in proteins containing a ubiquitin-like domain, which are involved in the ubiquitination pathway. Proteins containing an AN1-type zinc finger include: Ascidian posterior end mark 6 (pem-6) protein. Human AWP1 protein (associated with PRK1), which is expressed during early embryogenesis. Human immunoglobulin mu binding protein 2 (SMUBP-2), mutations in which cause muscular atrophy with respiratory distress type 1.
Protein Domain
Name: Prenyl protease-related
Type: Family
Description: Members of this protein family are fusion proteins of exosortase (N-terminal) and a CAAX prenyl protease domain (C-terminal). Members are restricted to the alpha Proteobacteria. The variant C-terminal protein sequence VPEID-CTERM occurs only in these species, often adjacent. In eukaryotes, CAAX prenyl protease catalyzes three covalent modifications, including the cleavage and acylation at the C terminus of certain proteins in a process connected to protein sorting. This entry describes a bacterial protein homologous to one domain of the CAAX-processing enzyme. Members of this family are found in genomes that carry a predicted protein sorting system, PEP-CTERM/exosortase, usually in the vicinity of the EpsH homologue that is the hallmark of the system. The function of this protein is unknown, but it may relate to protein modification.
Protein Domain
Name: Outer membrane lipoprotein LolB
Type: Family
Description: This protein, LolB, is known so far only in the gamma subdivision of the Proteobacteria. It is a processed, lipid-modified outer membrane protein. In Escherichia coli, lipoproteins are anchored to the periplasmic side of either the inner or outer membrane through N-terminal lipids, depending on the lipoprotein-sorting signal present at position 2. Five Lol proteins are involved in the sorting and outer membrane localization of lipoproteins. LolCDE, an ATP-binding cassette (ABC) transporter in the inner membrane, releases outer membrane-directed lipoproteins from the inner membrane in an ATP-dependent manner, leading to the formation of a water-soluble complex between the lipoprotein and LolA. The LolA-lipoprotein complex crosses the periplasm and then interacts with the outer membrane receptor LolB, which is essential for the anchoring of lipoproteins to the outer membrane.
Protein Domain
Name: LanC-like protein, eukaryotic
Type: Family
Description: The LanC-like protein superfamily encompasses a highly divergent group of peptide-modifying enzymes, including the eukaryotic and bacterial lanthionine synthetase C-like proteins (LanC) [ , , ]; subtilin biosynthesis protein SpaC from Bacillus subtilis [, ]; epidermin biosynthesis protein EpiC from Staphylococcus epidermidis []; nisin biosynthesis protein NisC from Lactococcus lactis [, , ]; GCR2 from Arabidopsis thaliana (Mouse-ear cress) []; and many others. The 3D structure of the lantibiotic cyclase from L. lactis has been determined by X-ray crystallography to 2.5 Å resolution [ ]. The globular structure is characterised by an all-α fold, in which an outer ring of helices envelops an inner toroid composed of 7 shorter, hydrophobic helices. This 7-fold hydrophobic periodicity has led several authors to claim various members of the family, including eukaryotic LanC-1 and GCR2, to be novel G protein-coupled receptors [, ]; some of these claims have since been corrected [, , ]. The eukaryotic lanthionine synthetase C-like proteins 1-3 are relatives of the bacterial lanthionine synthetase components C (LanC) [ , , ]. They are ubiquitous in nature, being variously expressed in brain, spinal cord, pituitary gland, kidney, heart, skeletal muscle, pancreas, ovary and testis. LanC-like protein 1 is a glutathione-binding protein []. LanC-like protein 2 is a bystander gene co-amplified and overexpressed with epidermal growth factor receptor (EGFR) in 20% of glioblastomas; its exogenous expression in a sarcoma cell line decreases the expression of ABCB1 (P-glycoprotein 1) and increases cellular sensitivity to the anticancer drug adriamycin [].
Protein Domain
Name: DNA/RNA helicase, ATP-dependent, DEAH-box type, conserved site
Type: Conserved_site
Description: A number of eukaryotic and prokaryotic proteins have been characterised [ , , ] on the basis of their structural similarity. They all seem to be involved in ATP-dependent, nucleic-acid unwinding. There are two subfamilies of such proteins, the D-E-A-D-box and D-E-A-H-box families. Proteins that belong to the subfamily which have His instead of the second Asp are said to be 'D-E-A-H-box' proteins [, , ]. Proteins currently known to belong to this subfamily include yeast PRP2, PRP16, PRP22 and PRP43, involved in various ATP-requiring steps of the pre-mRNA splicing process; fission yeast prh1, which may be involved in pre-mRNA splicing; Drosophila male-less (mle) protein, required in males for dosage compensation of X chromosome-linked genes; yeast RAD3, a DNA helicase involved in excision repair of DNA damaged by UV light, bulky adducts or cross-linking agents; fission yeast rad15 (rhp3) and mammalian DNA excision repair protein XPD (ERCC-2); yeast CHL1 (or CTF1), which is important for chromosome transmission and normal cell cycle progression in G(2)/M; yeast TPS1, Caenorhabditis elegans hypothetical proteins C06E1.10 and K03H1.2; Poxviruses' early transcription factor 70kDa subunit, which acts with RNA polymerase to initiate transcription from early gene promoters; Vaccinia virus putative helicase I8; and Escherichia coli putative RNA helicase hrpA. All these proteins share a number of conserved sequence motifs. Some of them are specific to this family while others are shared by other ATP-binding proteins or by proteins belonging to the helicases 'superfamily' [ ].
Protein Domain
Name: Haemerythrin
Type: Family
Description: The hemerythrin family is composed of hemerythrin proteins found in invertebrates, and a broader collection of bacterial and archaeal homologues. Hemerythrin is an oxygen-binding protein found in the vascular system and coelomic fluid, or in muscles (myohemerythrin) in invertebrates [ ]. Many of the homologous proteins found in prokaryotes are multi-domain proteins with signal-transducing domains such as the GGDEF diguanylate cyclase domain () and methyl-accepting chemotaxis protein (MCP) signalling domain ( ). Most hemerythrins are oxygen-carriers with a bound non-haem iron, but at least one example is a cadmium-binding protein, apparently with a role in sequestering toxic metals rather than in binding oxygen. The prokaryote with the most instances of this domain is Magnetococcus sp. MC-1, a magnetotactic bacterium. Hemerythrins and myohemerythrins [ , ] are small proteins of about 110 to 129 amino acid residues that bind two iron atoms. They are left-twisted 4-α-helical bundles, which provide a hydrophobic pocket where dioxygen binds as a peroxo species, interacting with adjacent aliphatic side chains via van der Waals forces []. In both hemerythrins and myohemerythrins, the active centre is a binuclear iron complex, bound directly to the protein via 7 amino acid side chains []: 5 His, 1 Glu and 1 Asp []. Ovohemerythrin [], a yolk protein from the leech Theromyzon tessulatum, seems to belong to this family of proteins; it may play a role in the detoxification of free iron after a blood meal []. This entry represents the haemerythrin family of proteins.
Protein Domain
Name: Ezrin/radixin/moesin
Type: Family
Description: This entry represents the ERM family of proteins. The ERM family consists of three closely-related proteins, ezrin, radixin and moesin [ ]. Ezrin was first identified as a constituent of microvilli [], radixin as a barbed, end-capping actin-modulating protein from isolated junctional fractions [], and moesin as a heparin binding protein [], which is particularly important in immunity, acting on both T- and B-cell homeostasis and self-tolerance [, ]. Members of this family have been associated with axon-associated Schwann cell (SC) motility and the maintenance of the polarity of these cells []. A tumour suppressor molecule responsible for neurofibromatosis type 2 (NF2) is highly similar to ERM proteins and has been designated merlin (moesin-ezrin-radixin-like protein) []. ERM molecules contain 3 domains: an N-terminal globular domain, an extended α-helical domain and a charged C-terminal domain [ ]. Ezrin, radixin and merlin also contain a polyproline region between the helical and C-terminal domains. The N-terminal domain is highly conserved, and is also found in merlin, band 4.1 proteins and members of the band 4.1 superfamily, designated the FERM domain []. ERM proteins crosslink actin filaments with plasma membranes. They co-localise with CD44 at actin filament-plasma membrane interaction sites, associating with CD44 via their N-terminal domains and with actin filaments via their C-terminal domains []. The α-helical region is involved in intramolecular masking of protein-protein interaction sites, which regulates the activity of these proteins [].
Protein Domain
Name: Bcl-2, Bcl-2 homology region 1-3
Type: Domain
Description: B cell CLL/lymphoma-2 (Bcl-2) and related proteins comprise the Bcl-2 family. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [ ]. Though originally characterised with respect to their roles in controlling outer mitochondrial membrane integrity and apoptosis, the members of the Bcl-2 family are involved in numerous cellular pathways []. Bcl-2 and its relatives are functionally classified as either antiapoptotic or proapoptotic. All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Antiapoptotic BCL-2 proteins contain four Bcl-2 homology domains (BH1-4). The major antiapoptotic proteins are Bcl-2-related gene A1 (A1), Bcl-2, Bcl-2-related gene, long isoform (Bcl-xL), Bcl-w, and myeloid cell leukemia 1 (MCL-1). They preserve outer mitochondrial membrane (OMM) integrity by directly inhibiting the proapoptotic Bcl-2 proteins []. The proapoptotic Bcl-2 members are divided into the effector proteins and the BH3-only proteins. The effector proteins Bcl-2 antagonist killer 1 (BAK) and Bcl-2-associated x protein (BAX) were originally described to contain only BH1-3; however, structure-based alignments revealed a conserved BH4 motif [ ]. Upon activation, BAK and BAX homo-oligomerise into proteolipid pores within the OMM to promote MOMP (mitochondrial outer membrane permeabilisation). The BH3-only proteins function in distinct cellular stress scenarios and are subdivided based on their ability to interact with the antiapoptotic or both the antiapoptotic and the effector proteins []. This entry covers the Bcl-2 homology regions 1, 2 and 3 (BH1-3).
Protein Domain
Name: Bcl-2-like superfamily
Type: Homologous_superfamily
Description: B cell CLL/lymphoma-2 (Bcl-2) and related proteins comprise the Bcl-2 family. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [ ]. Though originally characterised with respect to their roles in controlling outer mitochondrial membrane integrity and apoptosis, the members of the Bcl-2 family are involved in numerous cellular pathways []. Bcl-2 and its relatives are functionally classified as either antiapoptotic or proapoptotic. All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Antiapoptotic BCL-2 proteins contain four Bcl-2 homology domains (BH1-4). The major antiapoptotic proteins are Bcl-2-related gene A1 (A1), Bcl-2, Bcl-2-related gene, long isoform (Bcl-xL), Bcl-w, and myeloid cell leukemia 1 (MCL-1). They preserve outer mitochondrial membrane (OMM) integrity by directly inhibiting the proapoptotic Bcl-2 proteins []. The proapoptotic Bcl-2 members are divided into the effector proteins and the BH3-only proteins. The effector proteins Bcl-2 antagonist killer 1 (BAK) and Bcl-2-associated x protein (BAX) were originally described to contain only BH1-3; however, structure-based alignments revealed a conserved BH4 motif [ ]. Upon activation, BAK and BAX homo-oligomerise into proteolipid pores within the OMM to promote MOMP (mitochondrial outer membrane permeabilisation). The BH3-only proteins function in distinct cellular stress scenarios and are subdivided based on their ability to interact with the antiapoptotic or both the antiapoptotic and the effector proteins [].
Protein Domain
Name: Nucleocapsid (N) protein, N-terminal domain, coronavirus
Type: Domain
Description: The Nucleocapsid (N) protein is a highly immunogenic phosphoprotein also implicated in viral genome replication and in modulating cell signalling pathways. The N protein interacts with genomic and subgenomic RNA molecules. Together with the envelope protein M, it participates in genome condensation and packaging. The N protein is abundantly expressed during infection and is capable of inducing protective immune responses against SARS-CoV and SARS-CoV-2 [ , , , , ]. Coronavirus (CoV) nucleocapsid (N) proteins have 3 highly conserved domains: the N-terminal domain (NTD) (N1b), the C-terminal domain (CTD) (N2b) and the N3 region. The N1b and N2b domains from SARS CoV, infectious bronchitis virus (IBV), human CoV 229E and mouse hepatitis virus (MHV) display similar topological organisations. N proteins form dimers, which are asymmetrically arranged into octamers via their N2b domains [ ]. Both domains contribute to the binding of the viral RNA genome [, ]. This entry represents the N-terminal domain of the nucleocapsid (N) protein predominantly from Coronavirus. It exhibits a U-shaped structure, with two arms rich in basic residues, providing a module for specific interaction with RNA [ , ]. The overall structure of the N-terminal domain found in SARS-CoV-2 is similar to other reported coronavirus nucleocapsid protein N-terminal domains, but the surface charge distribution patterns are different. This domain also interacts with the viral membrane protein during virion assembly and plays a critical role in enhancing the efficiency of virus transcription and assembly. It has been identified as an important drug target [].
Protein Domain
Name: CD44 antigen-like
Type: Family
Description: CD44, also known as the hyaluronan receptor [], is a polymorphic cell-surface glycoprotein synthesised in a variety of cells. The protein interacts with actin-based cytoskeletons, and co-localises with ERM proteins (ezrin, radixin and moesin) at actin filament-plasma membrane interaction sites. CD44 may be involved in cell migration, adhesion and differentiation in normal cells, as well as in metastasis in cancer cells. It is a receptor for extracellular materials, such as soluble or cell-bound hyaluronic acid, collagen, fibronectin and serglycin. The protein has a single membrane-spanning domain and has a heavily glycosylated extracellular domain; its cytoplasmic domain is reportedly associated with an ankyrin-like protein []. Some of the proteins in this group are responsible for the molecular basis of the blood group antigens, surface markers on the outside of the red blood cell membrane. Most of these markers are proteins, but some are carbohydrates attached to lipids or proteins [Reid M.E., Lomas-Francis C. The Blood Group Antigen FactsBook. Academic Press, London / San Diego, (1997)]. CD44 antigen (Phagocytic glycoprotein I) belongs to the Indian blood group system and is associated with In(a/b) antigen. In addition to CD44 antigens, this entry also includes lymphatic vessel endothelial hyaluronic acid receptor 1, also known as cell surface retention sequence binding protein-1 (CRSBP-1). CRSBP-1, a membrane glycoprotein, can mediate cell-surface retention of secreted growth factors containing CRS motifs such as PDGF-BB. The role of CRSBP-1 is unknown [ ]. This entry appears to be restricted to metazoa.
Protein Domain
Name: Nucleocapsid protein, N-terminal
Type: Homologous_superfamily
Description: The Nucleocapsid (N) protein is a highly immunogenic phosphoprotein also implicated in viral genome replication and in modulating cell signalling pathways. The N protein interacts with genomic and subgenomic RNA molecules. Together with the envelope protein M, it participates in genome condensation and packaging. The N protein is abundantly expressed during infection and is capable of inducing protective immune responses against SARS-CoV and SARS-CoV-2 [ , , , , ]. Coronavirus (CoV) nucleocapsid (N) proteins have 3 highly conserved domains: the N-terminal domain (NTD) (N1b), the C-terminal domain (CTD) (N2b) and the N3 region. The N1b and N2b domains from SARS CoV, infectious bronchitis virus (IBV), human CoV 229E and mouse hepatitis virus (MHV) display similar topological organisations. N proteins form dimers, which are asymmetrically arranged into octamers via their N2b domains [ ]. Both domains contribute to the binding of the viral RNA genome [, ]. This entry represents the N-terminal domain of the nucleocapsid (N) protein predominantly from Coronavirus. It exhibits a U-shaped structure, with two arms rich in basic residues, providing a module for specific interaction with RNA [ , ]. The overall structure of the N-terminal domain found in SARS-CoV-2 is similar to other reported coronavirus nucleocapsid protein N-terminal domains, but the surface charge distribution patterns are different. This domain also interacts with the viral membrane protein during virion assembly and plays a critical role in enhancing the efficiency of virus transcription and assembly. It has been identified as an important drug target [].
Protein Domain
Name: Ezrin/radixin/moesin, C-terminal
Type: Domain
Description: The ERM family consists of three closely-related proteins, ezrin, radixin and moesin [ ]. Ezrin was first identified as a constituent of microvilli [], radixin as a barbed, end-capping actin-modulating protein from isolated junctional fractions [], and moesin as a heparin binding protein [], which is particularly important in immunity, acting on both T- and B-cell homeostasis and self-tolerance [, ]. Members of this family have been associated with axon-associated Schwann cell (SC) motility and the maintenance of the polarity of these cells []. A tumour suppressor molecule responsible for neurofibromatosis type 2 (NF2) is highly similar to ERM proteins and has been designated merlin (moesin-ezrin-radixin-like protein) []. ERM molecules contain 3 domains: an N-terminal globular domain, an extended α-helical domain and a charged C-terminal domain []. Ezrin, radixin and merlin also contain a polyproline region between the helical and C-terminal domains. The N-terminal domain is highly conserved, and is also found in merlin, band 4.1 proteins and members of the band 4.1 superfamily, designated the FERM domain []. ERM proteins crosslink actin filaments with plasma membranes. They co-localise with CD44 at actin filament-plasma membrane interaction sites, associating with CD44 via their N-terminal domains and with actin filaments via their C-terminal domains []. The α-helical region is involved in intramolecular masking of protein-protein interaction sites, which regulates the activity of these proteins []. This entry represents the C-terminal domain of the ERM family of proteins, which corresponds to the actin-binding tail domain [, ].
Protein Domain
Name: RNA polymerase, phosphoprotein P, C-terminal XD, paramyxovirinae
Type: Homologous_superfamily
Description: Paramyxovirinae has a negative-sense ssRNA genome that is packaged by the viral nucleoprotein (N) within a helical nucleocapsid. The N-RNA (nucleoprotein-RNA) complex is used as a template for both transcription and replication. During viral genome replication, the synthesis of viral RNA and its encapsidation by N are concomitant. Viral transcription and replication are carried out by viral RNA-dependent RNA polymerase, which consists of two proteins: L polymerase and phosphoprotein P. The L polymerase carries the enzyme activity. Phosphoprotein P binds the viral nucleocapsid, and positions the L polymerase on the template for transcription and replication formed by nucleoprotein-RNA (N-RNA). Phosphoprotein P, an indispensable subunit of the viral polymerase complex, is a modular protein organised into two moieties that are both functionally and structurally distinct: a well-conserved C-terminal moiety that contains all the regions required for transcription, and a poorly conserved, intrinsically unstructured N-terminal moiety that provides several additional functions required for replication. The N-terminal moiety is responsible for binding to newly synthesized free N(0) (nucleoprotein that has not yet bound RNA), in order to prevent the binding of N(0) to cellular RNA. The C-terminal moiety consists of an oligomerisation domain, an N-RNA (nucleoprotein-RNA)-binding domain and an L polymerase-binding domain. This entry represents the XD domain found at the extreme C-terminal of the C-terminal moiety of phosphoprotein P from several Paramyxovirinae, including Measles virus [ , ], Canine distemper virus and Sendai virus [, , ].
Protein Domain
Name: Bcl-2 family
Type: Family
Description: B cell CLL/lymphoma-2 (Bcl-2) and related proteins comprise the Bcl-2 family. Bcl-2 proteins are central regulators of caspase activation, and play a key role in cell death by regulating the integrity of the mitochondrial and endoplasmic reticulum (ER) membranes [ ]. Though originally characterised with respect to their roles in controlling outer mitochondrial membrane integrity and apoptosis, the members of the Bcl-2 family are involved in numerous cellular pathways []. Bcl-2 and its relatives are functionally classified as either antiapoptotic or proapoptotic. All members contain at least one of four conserved motifs, termed Bcl-2 Homology (BH) domains. Antiapoptotic BCL-2 proteins contain four Bcl-2 homology domains (BH1-4). The major antiapoptotic proteins are Bcl-2-related gene A1 (A1), Bcl-2, Bcl-2-related gene, long isoform (Bcl-xL), Bcl-w, and myeloid cell leukemia 1 (MCL-1). They preserve outer mitochondrial membrane (OMM) integrity by directly inhibiting the proapoptotic Bcl-2 proteins []. The proapoptotic Bcl-2 members are divided into the effector proteins and the BH3-only proteins. The effector proteins Bcl-2 antagonist killer 1 (BAK) and Bcl-2-associated x protein (BAX) were originally described to contain only BH1-3; however, structure-based alignments revealed a conserved BH4 motif [ ]. Upon activation, BAK and BAX homo-oligomerise into proteolipid pores within the OMM to promote MOMP (mitochondrial outer membrane permeabilisation). The BH3-only proteins function in distinct cellular stress scenarios and are subdivided based on their ability to interact with the antiapoptotic or both the antiapoptotic and the effector proteins [].
Protein Domain
Name: Ribosomal S24e conserved site
Type: Conserved_site
Description: Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [ , ]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis.
In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [ , ]. This conserved site is found in S24 ribosomal proteins from eukaryotes and archaebacteria.
Protein Domain
Name: Transcription regulator HTH, Crp-type, conserved site
Type: Conserved_site
Description: The crp-type HTH domain is a DNA-binding, winged helix-turn-helix (wHTH) domain of about 70-75 amino acids present in transcription regulators of the crp-fnr family, involved in the control of virulence factors, enzymes of aromatic ring degradation, nitrogen fixation, photosynthesis, and various types of respiration. The crp-fnr family is named after the first members identified in Escherichia coli: the well characterised cyclic AMP receptor protein CRP or CAP (catabolite activator protein) and the fumarate and nitrate reductase regulator Fnr. crp-type HTH domain proteins occur in most bacteria and in chloroplasts of red algae. The DNA-binding HTH domain is located in the C-terminal part; the N-terminal part of the proteins of the crp-fnr family contains a nucleotide-binding domain and a dimerisation/linker helix occurs in between. The crp-fnr regulators predominantly act as transcription activators, but can also be important repressors, and respond to diverse intracellular and exogenous signals, such as cAMP, anoxia, redox state, oxidative and nitrosative stress, carbon monoxide, nitric oxide or temperature [ ]. The structure of the crp-type DNA-binding domain shows that the helices (H) forming the helix-turn-helix motif (H2-H3) are flanked by two β-hairpin (B) wings, in the topology H1-B1-B2-H2-H3-B3-B4. Helix 3 is termed the recognition helix, as in most wHTHs it binds the DNA major groove [ , , ]. Some proteins known to contain a Crp-type HTH domain: E. coli crp (also known as cAMP receptor), a protein that complexes with cAMP and regulates the transcription of several catabolite-sensitive operons. E. coli fnr, a protein that activates genes for proteins involved in a variety of anaerobic electron transport systems. Rhizobium leguminosarum fnrN, a transcription regulator of nitrogen fixation. 
Rhodobacter sphaeroides (Rhodopseudomonas sphaeroides) fnrL, a transcription activator of genes for heme biosynthesis, bacteriochlorophyll synthesis and the light-harvesting complex LHII. Rhizobiaceae fixK, a protein that regulates nitrogen fixation genes, both positively and negatively. Lactobacillus casei fnr-like protein flp, a putative regulatory protein linked to the trpDCFBA operon. Cyanobacteria ntcA, a regulator of the expression of genes subject to nitrogen control. Xanthomonas campestris clp, a protein involved in the regulation of phytopathogenicity. Clp controls the production of extracellular enzymes, xanthan gum and pigment, either positively or negatively.
Protein Domain
Name: UmuC domain
Type: Domain
Description: In Escherichia coli, UV and many chemicals appear to cause mutagenesis by a process of translesion synthesis that requires DNA polymerase III and the SOS-regulated proteins UmuD, UmuC and RecA. This machinery allows replication to continue through DNA lesions, and therefore avoids lethal interruption of DNA replication after DNA damage [ ]. UmuC is a well conserved protein in prokaryotes, with a homologue in yeast species. Proteins known to contain an UmuC domain are listed below: E. coli MucB protein, a plasmid-borne analogue of the UmuC protein. Saccharomyces cerevisiae (Baker's yeast) Rev1 protein, a homologue of UmuC also required for normal induction of mutations by physical and chemical agents. Salmonella typhimurium ImpB protein, a plasmid-borne analogue of the UmuC protein. Bacterial UmuC protein. E. coli DNA-damage-inducible protein P (DinP). S. typhimurium SamB, a plasmid-associated homologue of UmuC.
Protein Domain
Name: Type III secretion, needle-protein-like superfamily
Type: Homologous_superfamily
Description: This entry represents bacterial type III secretion system needle-like proteins. Type III secretion systems are essential virulence determinants for many Gram-negative bacterial pathogens, acting to translocate proteins, usually virulence factors, out across both inner and outer membranes of bacteria and into the cytoplasm of the host cell. These proteins include: Needle proteins, including MxiH, YscF, EscF, PscF and EprI, that form the needle of the injection apparatus. For instance, MxiH is an extracellular alpha helical needle that is required for translocation of effector proteins into host cells, and once inside, the effector proteins subvert normal cell function to aid infection [ ]. YscI (Yop proteins translocation protein I) in Yersinia and HrpB (hypersensitivity response and pathogenicity protein B) in plant pathogens such as Pseudomonas syringae. YscI is involved in the translocation of Yop proteins across the bacterial membrane or in the specific control of this function.
Protein Domain
Name: Bet v I type allergen
Type: Family
Description: This family consists of a number of plant proteins that are structurally related [ , , ] and seem to be involved in pathogen defence response. These proteins include: Bet v I, the major pollen allergen from white birch. Bet v I is the main cause of type I allergic reactions observed in early spring. Aln g I, the major pollen allergen from alder. Api G I, the major allergen from celery. Car b I, the major pollen allergen from hornbeam. Cor a I, the major pollen allergen from hazel. Mal d I, the major pollen allergen from apple. Asparagus wound-induced protein AoPR1. Kidney bean pathogenesis-related proteins 1 and 2. Parsley pathogenesis-related proteins PR1-1 and PR1-3. Pea disease resistance response proteins pI49, pI176 and DRRG49-C. Pea abscisic acid-responsive proteins ABR17 and ABR18. Potato pathogenesis-related proteins STH-2 and STH-21. Soybean stress-induced protein SAM22. Major strawberry allergen proteins Fra a 1-2, 1-3 and 1.04 to 1.08.
Protein Domain
Name: Type III secretion, needle-protein-like
Type: Family
Description: This entry represents bacterial type III secretion system needle-like proteins. Type III secretion systems are essential virulence determinants for many Gram-negative bacterial pathogens, acting to translocate proteins, usually virulence factors, out across both inner and outer membranes of bacteria and into the cytoplasm of the host cell. These proteins include: Needle proteins, including MxiH, YscF, EscF, PscF and EprI, that form the needle of the injection apparatus. For instance, MxiH is an extracellular alpha helical needle that is required for translocation of effector proteins into host cells, and once inside, the effector proteins subvert normal cell function to aid infection [ ]. YscI (Yop proteins translocation protein I) in Yersinia and HrpB (hypersensitivity response and pathogenicity protein B) in plant pathogens such as Pseudomonas syringae. YscI is involved in the translocation of Yop proteins across the bacterial membrane or in the specific control of this function.
Protein Domain
Name: Coatomer beta subunit (COPB1)
Type: Family
Description: Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer [ ]. While clathrin mediates endocytic protein transport, and transport from ER to Golgi, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins []. For example, the coatomer COP1 (coat protein complex 1) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and for budding from Golgi membranes []. Activated small guanine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated to ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least an alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This group represents the coatomer beta subunit.
Protein Domain
Name: Coatomer beta' subunit (COPB2)
Type: Family
Description: Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer [ ]. While clathrin mediates endocytic protein transport, and transport from ER to Golgi, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins []. For example, the coatomer COP1 (coat protein complex 1) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi []. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and for budding from Golgi membranes []. Activated small guanine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated to ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least an alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This group represents the coatomer beta' subunit.
Protein Domain
Name: GTD-binding domain
Type: Domain
Description: The GTD-binding domain is a plant-specific protein-protein interaction domain. It emerged in primitive land plants and founded a multigene family that is conserved in all flowering plants. Proteins with GTD-binding domains fall into four groups: groups 1-3 contain the GTD-binding domain in the C-terminal half of the protein and one (group 2) or more (group 1) predicted transmembrane domains, or an endoplasmic reticulum signal peptide (group 3), at the N terminus, whereas group 4 contains the GTD-binding domain near the N terminus. GTD-binding domain proteins may constitute a family of myosin receptors, which are associated with the surface of specific plant organelles, bind to the globular tail domain (GTD) of myosin motor proteins, and thereby promote actin-dependent organelle motility. It seems likely that myosin binding is a common property of the GTD-binding domain, whereas the ability of FLOURY1 to bind maize-specific zeins is a specific feature of this endoplasmic reticulum (ER)-associated protein.
The GTD-binding domain is predicted to adopt a coiled-coil structure.
Some proteins known to contain a GTD-binding domain are listed below:
  • Maize FLOURY1 (FL1), which belongs to group 1. Its GTD-binding domain facilitates the localization of 22 kDa alpha-zein.
  • Arabidopsis myosin binding (MyoB) proteins 1-6 and 7, which belong respectively to groups 3 and 4. They bind to myosin XI.
  • Tobacco RAC5 interacting subapical pollen tube protein (RISAP), which belongs to group 3. It binds via its GTD-binding domain to the GTD of a pollen tube myosin XI.
  • Lily LLP13, which belongs to group 3. It is likely a cytoskeleton-binding protein that binds with intermediate filaments (IFs) that potentially exist in pollen tubes.
Protein Domain
Name: Coatomer gamma subunit
Type: Family
Description: Proteins synthesised on the ribosome and processed in the endoplasmic reticulum are transported from the Golgi apparatus to the trans-Golgi network (TGN), and from there via small carrier vesicles to their final destination compartment. This traffic is bidirectional, to ensure that proteins required to form vesicles are recycled. Vesicles have specific coat proteins (such as clathrin or coatomer) that are important for cargo selection and direction of transfer. While clathrin mediates endocytic protein transport, and transport from ER to Golgi, coatomers primarily mediate intra-Golgi transport, as well as the reverse Golgi to ER transport of dilysine-tagged proteins. For example, the coatomer COPI (coat protein complex I) is responsible for reverse transport of recycled proteins from Golgi and pre-Golgi compartments back to the ER, while COPII buds vesicles from the ER to the Golgi. Coatomers reversibly associate with Golgi (non-clathrin-coated) vesicles to mediate protein transport and budding from Golgi membranes. Activated small guanosine triphosphatases (GTPases) attract coat proteins to specific membrane export sites, thereby linking coatomers to export cargos. As coat proteins polymerise, vesicles are formed and budded from membrane-bound organelles. Coatomer complexes also influence Golgi structural integrity, as well as the processing, activity, and endocytic recycling of LDL receptors. In mammals, coatomer complexes can only be recruited by membranes associated with ADP-ribosylation factors (ARFs), which are small GTP-binding proteins. Coatomer complexes are hetero-oligomers composed of at least alpha, beta, beta', gamma, delta, epsilon and zeta subunits. This group represents the coatomer gamma subunit.
Protein Domain
Name: SidE, PDE domain
Type: Domain
Description: This entry represents the phosphodiesterase (PDE) domain from the SidE family of proteins found in the Dot/Icm pathway of Legionella pneumophila bacteria. This domain catalyses the conjugation of ADP-ribosylated ubiquitin (ADPR-Ub), previously synthesised by the mono-ADP-ribosyltransferase (mART) domain, to a serine residue on substrates to generate a protein-phosphoribosyl-Ub (PR-Ub) product. These pathogen proteins are secreted bacterial effector proteins, which are translocated inside the host cell using the Dot/Icm secretion system. The SidE family of enzymes carries out ubiquitination of host cell proteins in an E1/E2-independent manner.
The SidE family includes four large proteins (SidE, SdeA, SdeB and SdeC) that are required for efficient intracellular bacterial replication and catalyse ubiquitination in an E1/E2-independent manner. These proteins contain four domains: a DUB domain, a phosphodiesterase (PDE) domain, a mono-ADP-ribosyltransferase (mART) domain, and a coiled-coil (CC) domain.
Ubiquitination is a post-translational modification that regulates many cellular processes; the conventional ubiquitination cascade culminates in a covalent linkage between the C terminus of ubiquitin (Ub) and a target protein. SidE family proteins can catalyse the non-canonical ubiquitination of several different substrate proteins, including Rab small GTPases, Reticulon-4 (Rtn4), and Rag small GTPases, as well as SidE proteins themselves. This specificity resides in the specific ubiquitin-binding surfaces of mART and the unique features of the PDE domain.
The DUB activity of SdeA is important for regulating the dynamics of ubiquitin association with the bacterial phagosome, but is not necessary for its role in intracellular bacterial replication.
Protein Domain
Name: Hepatitis C virus core protein, chain A superfamily
Type: Homologous_superfamily
Description: This superfamily represents the N-terminal chain A region of the Hepatitis C virus core protein. The HCV core protein is located at the N terminus of the polyprotein and is followed by the signal sequence located between the core protein and the E1 envelope glycoprotein. This signal sequence targets the nascent HCV polyprotein to the endoplasmic reticulum (ER), allowing the translocation of E1 to the ER lumen. Cleavage by a signal peptidase in the ER lumen releases the N-terminal end of E1, leaving the 191-amino-acid (aa) core protein anchored by its C-terminal signal peptide. This 191 aa polypeptide, also known as p23, is the immature form of the core protein; p23 is further processed by an intramembrane protease, the signal peptide peptidase (SPP), that removes the ER anchor, releasing p21, the N-terminal 179 aa mature form of the core protein. Core protein (p21) is responsible for packaging viral RNA to form a viral nucleocapsid, and it also promotes virion budding. Two domains have been identified in the mature form of the HCV core protein, based on predicted structural and functional characteristics. Domain I, corresponding to the N-terminal region of approximately 120 aa, is a highly basic domain that is probably involved in the recruitment of viral RNA during particle morphogenesis. Domain II, located between aa 120 and aa 175, is a hydrophobic region predicted to form one or two α-helices that are probably involved in the association of core with the ER membrane and lipid droplets.
Protein Domain
Name: FeS cluster assembly SUF system, ATPase SufC
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.
The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.
This entry represents SufC, which acts as an ATPase in the SUF system. SufC belongs to the ATP-binding cassette transporter family but is no longer thought to be part of a transporter. The complex is reported as cytosolic or associated with the membrane.
Protein Domain
Name: Small GTPase Tem1/Spg1
Type: Family
Description: Small GTPases form an independent superfamily within the larger class of regulatory GTP hydrolases. This superfamily contains proteins that control a vast number of important processes and possess a common, structurally preserved GTP-binding domain. Sequence comparisons of small G proteins from various species have revealed that they are conserved in primary structure at the level of 30-55% similarity.
Crystallographic analysis of various small G proteins revealed the presence of a 20 kDa catalytic domain that is unique for the whole superfamily. The domain is built of five alpha helices (A1-A5), six β-strands (B1-B6) and five polypeptide loops (G1-G5). A structural comparison of the GTP- and GDP-bound forms allows one to distinguish two functional loop regions, switch I and switch II, that surround the gamma-phosphate group of the nucleotide. The G1 loop (also called the P-loop) that connects the B1 strand and the A1 helix is responsible for the binding of the phosphate groups. The G3 loop provides residues for Mg2+ and phosphate binding and is located at the N terminus of the A2 helix. The G1 and G3 loops are sequentially similar to Walker A and Walker B boxes that are found in other nucleotide binding motifs. The G2 loop connects the A1 helix and the B2 strand and contains a conserved Thr residue responsible for Mg2+ binding. The guanine base is recognised by the G4 and G5 loops. The consensus sequence NKXD of the G4 loop contains Lys and Asp residues directly interacting with the nucleotide. Part of the G5 loop located between B6 and A5 acts as a recognition site for the guanine base.
The small GTPase superfamily can be divided into at least 8 different families, including:
  • Arf small GTPases. GTP-binding proteins involved in protein trafficking by modulating vesicle budding and uncoating within the Golgi apparatus.
  • Ran small GTPases. GTP-binding proteins involved in nucleocytoplasmic transport. Required for the import of proteins into the nucleus and also for RNA export.
  • Rab small GTPases. GTP-binding proteins involved in vesicular traffic.
  • Rho small GTPases. GTP-binding proteins that control cytoskeleton reorganisation.
  • Ras small GTPases. GTP-binding proteins involved in signalling pathways.
  • Sar1 small GTPases. Small GTPase component of the coat protein complex II (COPII) which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
  • Mitochondrial Rho (Miro). Small GTPase domain found in mitochondrial proteins involved in mitochondrial trafficking.
  • Roc small GTPase domain. Small GTPase domain always found associated with the COR domain.
This entry includes Tem1 from budding yeasts and Spg1 from fission yeasts. They are GTPases involved in the regulation of the cell cycle. In Schizosaccharomyces pombe, Spg1 is required for the localisation of Cdc7 (part of the septation initiation network) to the spindle pole body (SPB). It is regulated negatively by a GTPase-activating protein (GAP) comprising two subunits: Byr4 and Cdc16. In anaphase B, Spg1 is localised on the new SPB. In Saccharomyces cerevisiae, Tem1 is associated with the mitotic exit network (MEN). It is involved in termination of M phase of the cell cycle.
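The NKXD consensus of the G4 loop described above lends itself to a simple pattern search. The following is a minimal Python sketch, not part of the database entry: the function name find_g4_motifs and the example sequence are hypothetical, and a match to this short pattern is only a hint, not evidence that a protein is a small GTPase.

```python
import re

# NKXD consensus of the G4 loop: Asn, Lys, any residue (X), Asp.
G4_PATTERN = re.compile(r"NK[A-Z]D")


def find_g4_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for each NKXD hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(0)) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Made-up peptide used purely for illustration.
    example = "MTEYKLVVVGNKCDLAAR"
    for position, motif in find_g4_motifs(example):
        print(position, motif)
```

Positions are reported 1-based, as is conventional for protein sequences.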
Protein Domain
Name: Kinesin motor domain, conserved site
Type: Conserved_site
Description: Kinesin is a microtubule-associated force-producing protein that may play a role in organelle transport. The kinesin motor activity is directed toward the microtubule's plus end. Kinesin is an oligomeric complex composed of two heavy chains and two light chains. The maintenance of the quaternary structure does not require interchain disulphide bonds.
The heavy chain is composed of three structural domains: a large globular N-terminal domain which is responsible for the motor activity of kinesin (it is known to hydrolyse ATP, to bind and move on microtubules); a central α-helical coiled-coil domain that mediates the heavy chain dimerisation; and a small globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous organelles.
The kinesin motor domain comprises five motifs, namely N1 (P-loop), N2 (Switch I), N3 (Switch II), N4 and L2 (KVD finger). It has a mixed eight-stranded β-sheet core with flanking solvent-exposed α-helices and a small three-stranded antiparallel β-sheet in the N-terminal region.
A number of proteins have been found that contain a domain similar to the kinesin 'motor' domain:
  • Drosophila melanogaster claret segregational protein (ncd). Ncd is required for normal chromosomal segregation in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the microtubule's minus end.
  • Homo sapiens CENP-E. CENP-E is a protein that associates with kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement and/or spindle elongation.
  • H. sapiens mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed toward the microtubule's plus end.
  • Saccharomyces cerevisiae KAR3 protein, which is essential for nuclear fusion during mating. KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis.
  • S. cerevisiae CIN8 and KIP1 proteins, which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to produce an outwardly directed force acting upon the poles.
  • Emericella nidulans (Aspergillus nidulans) bimC, which plays an important role in nuclear division.
  • A. nidulans klpA.
  • Caenorhabditis elegans unc-104, which may be required for the transport of substances needed for neuronal cell differentiation.
  • C. elegans osm-3.
  • Xenopus laevis Eg5, which may be involved in mitosis.
  • Arabidopsis thaliana KatA, KatB and katC.
  • Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both proteins seem to play a role in the rotation or twisting of the microtubules of the flagella.
  • C. elegans hypothetical protein T09A5.2.
The kinesin motor domain is located in the N-terminal part of most of the above proteins, with the exception of KAR3, klpA, and ncd, where it is located in the C-terminal section.
The kinesin motor domain contains about 330 amino acids. An ATP-binding motif of type A is found near positions 80 to 90; the C-terminal half of the domain is involved in microtubule binding.
The signature pattern for this entry is derived from a conserved decapeptide inside the microtubule-binding region.
Protein Domain
Name: Kinesin motor domain
Type: Domain
Description: Kinesin is a microtubule-associated force-producing protein that may play a role in organelle transport. The kinesin motor activity is directed toward the microtubule's plus end. Kinesin is an oligomeric complex composed of two heavy chains and two light chains. The maintenance of the quaternary structure does not require interchain disulphide bonds.
The heavy chain is composed of three structural domains: a large globular N-terminal domain which is responsible for the motor activity of kinesin (it is known to hydrolyse ATP, to bind and move on microtubules); a central α-helical coiled-coil domain that mediates the heavy chain dimerisation; and a small globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous organelles.
The kinesin motor domain comprises five motifs, namely N1 (P-loop), N2 (Switch I), N3 (Switch II), N4 and L2 (KVD finger). It has a mixed eight-stranded β-sheet core with flanking solvent-exposed α-helices and a small three-stranded antiparallel β-sheet in the N-terminal region.
A number of proteins have been found that contain a domain similar to the kinesin 'motor' domain:
  • Drosophila melanogaster claret segregational protein (ncd). Ncd is required for normal chromosomal segregation in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the microtubule's minus end.
  • Homo sapiens CENP-E. CENP-E is a protein that associates with kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement and/or spindle elongation.
  • H. sapiens mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed toward the microtubule's plus end.
  • Saccharomyces cerevisiae KAR3 protein, which is essential for nuclear fusion during mating. KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis.
  • S. cerevisiae CIN8 and KIP1 proteins, which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to produce an outwardly directed force acting upon the poles.
  • Emericella nidulans (Aspergillus nidulans) bimC, which plays an important role in nuclear division.
  • A. nidulans klpA.
  • Caenorhabditis elegans unc-104, which may be required for the transport of substances needed for neuronal cell differentiation.
  • C. elegans osm-3.
  • Xenopus laevis Eg5, which may be involved in mitosis.
  • Arabidopsis thaliana KatA, KatB and katC.
  • Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both proteins seem to play a role in the rotation or twisting of the microtubules of the flagella.
  • C. elegans hypothetical protein T09A5.2.
The kinesin motor domain is located in the N-terminal part of most of the above proteins, with the exception of KAR3, klpA, and ncd, where it is located in the C-terminal section.
The kinesin motor domain contains about 330 amino acids. An ATP-binding motif of type A is found near positions 80 to 90; the C-terminal half of the domain is involved in microtubule binding.
Protein Domain
Name: SUF system FeS cluster assembly associated
Type: Family
Description: Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems.
The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as an S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly.
The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA, acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N terminus, assembling an FeS cluster that is transferred to nitrogenase apoproteins. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen.
This entry is a subset of a larger family. Many members of the larger family are candidate ring hydroxylating complex subunits. However, members of the narrower family defined here are all found as part of the FeS assembly SUF system locus, in a subset of SUF-positive proteobacteria.
Protein Domain
Name: Kinesin motor domain superfamily
Type: Homologous_superfamily
Description: Kinesin is a microtubule-associated force-producing protein that may play a role in organelle transport. The kinesin motor activity is directed toward the microtubule's plus end. Kinesin is an oligomeric complex composed of two heavy chains and two light chains. The maintenance of the quaternary structure does not require interchain disulphide bonds.
The heavy chain is composed of three structural domains: a large globular N-terminal domain which is responsible for the motor activity of kinesin (it is known to hydrolyse ATP, to bind and move on microtubules); a central α-helical coiled-coil domain that mediates the heavy chain dimerisation; and a small globular C-terminal domain which interacts with other proteins (such as the kinesin light chains), vesicles and membranous organelles.
The kinesin motor domain comprises five motifs, namely N1 (P-loop), N2 (Switch I), N3 (Switch II), N4 and L2 (KVD finger). It has a mixed eight-stranded β-sheet core with flanking solvent-exposed α-helices and a small three-stranded antiparallel β-sheet in the N-terminal region.
A number of proteins have been found that contain a domain similar to the kinesin 'motor' domain:
  • Drosophila melanogaster claret segregational protein (ncd). Ncd is required for normal chromosomal segregation in meiosis, in females, and in early mitotic divisions of the embryo. The ncd motor activity is directed toward the microtubule's minus end.
  • Homo sapiens CENP-E. CENP-E is a protein that associates with kinetochores during chromosome congression, relocates to the spindle midzone at anaphase, and is quantitatively discarded at the end of the cell division. CENP-E is probably an important motor molecule in chromosome movement and/or spindle elongation.
  • H. sapiens mitotic kinesin-like protein-1 (MKLP-1), a motor protein whose activity is directed toward the microtubule's plus end.
  • Saccharomyces cerevisiae KAR3 protein, which is essential for nuclear fusion during mating. KAR3 may mediate microtubule sliding during nuclear fusion and possibly mitosis.
  • S. cerevisiae CIN8 and KIP1 proteins, which are required for the assembly of the mitotic spindle. Both proteins seem to interact with spindle microtubules to produce an outwardly directed force acting upon the poles.
  • Emericella nidulans (Aspergillus nidulans) bimC, which plays an important role in nuclear division.
  • A. nidulans klpA.
  • Caenorhabditis elegans unc-104, which may be required for the transport of substances needed for neuronal cell differentiation.
  • C. elegans osm-3.
  • Xenopus laevis Eg5, which may be involved in mitosis.
  • Arabidopsis thaliana KatA, KatB and katC.
  • Chlamydomonas reinhardtii FLA10/KHP1 and KLP1. Both proteins seem to play a role in the rotation or twisting of the microtubules of the flagella.
  • C. elegans hypothetical protein T09A5.2.
The kinesin motor domain is located in the N-terminal part of most of the above proteins, with the exception of KAR3, klpA, and ncd, where it is located in the C-terminal section.
The kinesin motor domain contains about 330 amino acids. An ATP-binding motif of type A is found near positions 80 to 90; the C-terminal half of the domain is involved in microtubule binding.
Interestingly, the kinesin motor domain has a striking structural similarity to the core of the catalytic domain of the actin-based motor myosin.
Protein Domain
Name: Hepatitis C virus, NS3 protease, Peptidase S29
Type: Domain
Description: Although Hepatitis A virus, Hepatitis B virus, and Hepatitis C virus have similar names, because they all cause liver inflammation, these are distinctly different viruses both genetically and clinically. The Hepatitis C virus (HCV) is a small (50-80 nm in diameter), enveloped, single-stranded, positive-sense RNA virus. It is a member of the family Flaviviridae. There are seven genotypes and a number of subtypes with diverse geographic distributions. The genome of HCV consists of a single open reading frame. At the 5' and 3' ends of the RNA are the UTR regions, which are not translated into proteins but are important to translation and replication of the viral RNA. The 5' UTR has a ribosome binding site (IRES, internal ribosome entry site) that starts the translation of a unique polyprotein that is later cut by cellular and viral proteases into 10 active structural and non-structural smaller proteins.
This signature identifies the Hepatitis C virus NS3 protein as a serine protease which belongs to MEROPS peptidase family S29 (hepacivirin family, clan PA(S)), which has a trypsin-like fold. The non-structural (NS) protein NS3 is one of the NS proteins involved in replication of the HCV genome. The NS2 proteinase, a zinc-dependent enzyme, performs a single proteolytic cut to release the N terminus of NS3. The action of NS3 proteinase (NS3P), which resides in the N-terminal one-third of the NS3 protein, then yields all remaining non-structural proteins. The C-terminal two-thirds of the NS3 protein contain a helicase. The functional relationship between the proteinase and helicase domains is unknown. NS3 has a structural zinc-binding site and requires cofactor NS4. It has been suggested that the NS3 serine protease of hepatitis C is involved in cell transformation and that the ability to transform requires an active enzyme.
Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases.
Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC).
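The link between the linear order of the catalytic triad and the clan (HDS for PA, DHS for SB, SDH for SC) can be illustrated with a short Python sketch. This is an illustration only and not part of the database entry: the function names triad_order and guess_clan are hypothetical, the residue positions are illustrative inputs, and because the order only "commonly" reflects clan relationships, the result is a hint rather than a classification.

```python
# Triad orders named in the description above, mapped to their clans.
CLAN_BY_ORDER = {"HDS": "PA", "DHS": "SB", "SDH": "SC"}


def triad_order(positions: dict[str, int]) -> str:
    """Return the one-letter residues (H, D, S) sorted by sequence position."""
    return "".join(sorted(positions, key=lambda residue: positions[residue]))


def guess_clan(positions: dict[str, int]) -> str:
    """Look up the clan suggested by the triad order, or 'unknown'."""
    return CLAN_BY_ORDER.get(triad_order(positions), "unknown")


if __name__ == "__main__":
    # Illustrative positions: His before Asp before Ser gives the PA-type order.
    print(guess_clan({"H": 57, "D": 102, "S": 195}))
    # Asp before His before Ser gives the SB-type order.
    print(guess_clan({"D": 32, "H": 64, "S": 221}))
```

A dictionary keyed by residue letter keeps the sketch short; a real analysis would take the catalytic positions from a curated annotation rather than from hand-entered numbers.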
Protein Domain
Name: Chaperone lipoprotein, PulS/OutS-like
Type: Family
Description: This family comprises lipoproteins from gamma proteobacterial species: the pullulanase secretion protein PulS of Klebsiella pneumoniae, the lipoprotein OutS of Erwinia chrysanthemi and the functionally uncharacterised type II secretion protein EtpO from Escherichia coli O157:H7. PulS and OutS have been shown to interact with and facilitate insertion of secretins into the outer membrane, suggesting a chaperone-like, or piloting, function for members of this family. These proteins consist of 4 alpha helices arranged into a slightly elongated, C-shaped helical bundle. Within PulS, the helices form a binding groove which interacts with a disordered region on another protein, PulD, to form a PulS-PulD complex. PulD then oligomerises into a dodecameric outer membrane complex in the formation of a type II secretion system. In addition to the PulS/OutS proteins, this entry also includes other functionally uncharacterised proteins, such as YacC from Escherichia coli.
Protein Domain
Name: PulS/OutS-like superfamily
Type: Homologous_superfamily
Description: This superfamily comprises lipoproteins from gamma proteobacterial species: the pullulanase secretion protein PulS of Klebsiella pneumoniae, the lipoprotein OutS of Erwinia chrysanthemi and the functionally uncharacterised type II secretion protein EtpO from Escherichia coli O157:H7. PulS and OutS have been shown to interact with and facilitate insertion of secretins into the outer membrane, suggesting a chaperone-like, or piloting, function for members of this superfamily. These proteins consist of 4 alpha helices arranged into a slightly elongated, C-shaped helical bundle. Within PulS, the helices form a binding groove which interacts with a disordered region on another protein, PulD, to form a PulS-PulD complex. PulD then oligomerises into a dodecameric outer membrane complex in the formation of a type II secretion system. In addition to the PulS/OutS proteins, this entry also includes other functionally uncharacterised proteins, such as YacC from Escherichia coli.
Protein Domain
Name: N-end rule aminoacyl transferase
Type: Family
Description: This entry represents a family of aminoacyl-transferases that includes prokaryotic aspartate/glutamate leucyltransferase and eukaryotic arginine-tRNA-protein transferase. Arginine-tRNA-protein transferase catalyses the post-translational conjugation of arginine to the N terminus of a protein. In eukaryotes, this functions as part of the N-end rule pathway of protein degradation by conjugating a destabilising amino acid to the N-terminal aspartate or glutamate of a protein, targeting the protein for ubiquitin-dependent proteolysis. In Saccharomyces cerevisiae, Cys20, Cys23, Cys94 and/or Cys95 are thought to be important for activity. Of these, only Cys94 appears to be completely conserved in this family. Aspartate/glutamate leucyltransferase (also known as bacterial protein transferase or Bpt) functions in the N-end rule pathway of protein degradation, where it conjugates Leu from its aminoacyl-tRNA to the N-termini of proteins containing an N-terminal aspartate or glutamate. This protein shows sequence similarity to the eukaryotic N-end rule pathway component arginyl-transferase Ate1.
Protein Domain
Name: Glycine dehydrogenase (decarboxylating)
Type: Family
Description: The P protein is part of the glycine decarboxylase multienzyme complex (GDC), also annotated as glycine cleavage system or glycine synthase. GDC consists of four proteins: P, H, L and T. The P protein binds the alpha-amino group of glycine through its pyridoxal phosphate cofactor; carbon dioxide is released, and the remaining methylamine moiety is then transferred to the lipoamide cofactor of the H protein. The reaction catalysed by this protein is: Glycine + lipoylprotein = S-aminomethyldihydrolipoylprotein + CO2. The subunit composition of glycine cleavage system P proteins has been classified into two types. Those from eukaryotes and some of the P proteins from prokaryotes (e.g. Escherichia coli) are in the homodimeric form. The rest of those from prokaryotes are heterotetrameric, with two different subunits which, based on sequence similarities, correspond respectively to the N- and C-terminal halves of the eukaryotic subunit. This entry represents the P protein homodimeric subfamily, which is found in eukaryotes and some prokaryotes, such as E. coli.
Protein Domain
Name: N-end rule aminoacyl transferase, C-terminal
Type: Domain
Description: This entry represents the C-terminal region of aminoacyl-transferases found in both eukaryotic (arginine-tRNA-protein transferase) and prokaryotic (aspartate/glutamate leucyltransferase) enzymes. Arginine-tRNA-protein transferase catalyses the post-translational conjugation of arginine to the N terminus of a protein. In eukaryotes, this functions as part of the N-end rule pathway of protein degradation by conjugating a destabilising amino acid to the N-terminal aspartate or glutamate of a protein, targeting the protein for ubiquitin-dependent proteolysis. In Saccharomyces cerevisiae, Cys20, Cys23, Cys94 and/or Cys95 are thought to be important for activity. Of these, only Cys94 appears to be completely conserved in this family. Aspartate/glutamate leucyltransferase (also known as bacterial protein transferase or Bpt) functions in the N-end rule pathway of protein degradation, where it conjugates Leu from its aminoacyl-tRNA to the N-termini of proteins containing an N-terminal aspartate or glutamate. This protein shows sequence similarity to the eukaryotic N-end rule pathway component arginyl-transferase Ate1.
The Legume Information System (LIS) is a research project of the USDA-ARS:Corn Insects and Crop Genetics Research in Ames, IA.
InterMine © 2002 - 2022 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, United Kingdom