The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11. The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation.
The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP. Med19 represents a family of conserved proteins that are members of the multi-protein coactivator Mediator complex.
Metallothioneins (MT) are small proteins that bind heavy metals, such as zinc, copper, cadmium and nickel. They have a high content of cysteine residues that bind the metal ions through clusters of thiolate bonds. An empirical classification into three classes has been proposed by Fowler and coworkers and by Kojima. Members of class I are defined to include polypeptides related in the positions of their cysteines to equine MT-1B, and include mammalian MTs as well as MTs from crustaceans and molluscs. Class II groups MTs from a variety of species, including sea urchins, fungi, insects and cyanobacteria. Class III MTs are atypical polypeptides composed of gamma-glutamylcysteinyl units. This original classification system has been found to be limited, in that it does not allow clear differentiation of patterns of structural similarity, either between or within classes. Subsequently, a new classification was proposed on the basis of sequence similarity derived from phylogenetic relationships, which essentially proposes an MT family for each main taxonomic group of organisms. Mollusc MTs are 64-75 residue proteins. They usually contain 18-23 Cys residues, at least 13 of which are totally conserved. The protein sequence is divided into two structural domains. The Cys residues are arranged in C-X-C groups, and a C-X-X-C grouping is also observed. In particular, the consensus pattern C-x-C-x(3)-C-T-G-x(3)-C-x-C-x(3)-C-x-C-K has been shown to be diagnostic of family 2 metallothioneins. This pattern is located at the C terminus of the sequence. These proteins show more similarity to the vertebrate metallothioneins than to those from other invertebrate phyla, and on this basis they are classified as class I metallothioneins. The protein is induced by cadmium and binds divalent cations of several transition elements, including cadmium, zinc and copper. Family 2 includes the subfamilies mo1, mo2, mog and mo, all of which hit the same entry except subfamily mog.
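The consensus pattern quoted above is written in PROSITE-style syntax, where "x" is any residue and "x(3)" is any three residues. As a minimal sketch, it can be translated into a regular expression and used to scan a protein sequence. The function names (`prosite_to_regex`, `find_motif`) and the test sequence are illustrative assumptions, not part of the entry, and only the two element types that occur in this pattern are handled.

```python
import re

# Consensus pattern quoted in the entry above (PROSITE-style syntax).
PROSITE_PATTERN = "C-x-C-x(3)-C-T-G-x(3)-C-x-C-x(3)-C-x-C-K"


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Hypothetical helper: only literal residues (e.g. "C") and the
    wildcard "x" with an optional repeat count "x(n)" are supported.
    """
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if wildcard:
            count = wildcard.group(1)
            parts.append("." if count is None else f".{{{count}}}")
        elif re.fullmatch(r"[A-Z]", element):
            parts.append(element)
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
    return "".join(parts)


MT_REGEX = re.compile(prosite_to_regex(PROSITE_PATTERN))


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in MT_REGEX.finditer(sequence.upper())]


if __name__ == "__main__":
    # Synthetic sequence built to contain the pattern; NOT a real metallothionein.
    synthetic = "MS" + "CACAAACTGAAACACAAACACK"
    print(prosite_to_regex(PROSITE_PATTERN))
    print(find_motif(synthetic))
```

A match only shows that a sequence contains the motif; whether it belongs to the family would still depend on the other features described above, such as length and overall cysteine content.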
Haemagglutinin-esterase fusion glycoprotein (HEF) is a multi-functional protein embedded in the viral envelope of several viruses, including influenza C virus, coronaviruses and toroviruses. HEF is required for infectivity, and functions to recognise the host cell surface receptor, to fuse the viral and host cell membranes, and to destroy the receptor upon host cell infection. The haemagglutinin region of HEF is responsible for receptor recognition and membrane fusion, and bears a strong resemblance to the sialic acid-binding haemagglutinin found in influenza A and B viruses, except that it binds 9-O-acetylsialic acid. The esterase region of HEF is responsible for the destruction of the receptor, an action that is carried out by neuraminidase in influenza A and B viruses. The esterase domain is similar in structure to Streptomyces scabies esterase, and to acetylhydrolase, thioesterase I and rhamnogalacturonan acetylesterase. The haemagglutinin-esterase glycoprotein HEF must be cleaved by the host's trypsin-like proteases to produce two peptides (HEF1 and HEF2) in order for the virus to be infectious. Once HEF is cleaved, the newly exposed N terminus of the HEF2 peptide acts to fuse the viral envelope to the cellular membrane of the host cell, which allows the virus to infect the host cell. The haemagglutinin-esterase glycoprotein is a trimer, where each monomer is composed of three domains: an elongated stem active in membrane fusion, an esterase domain, and a receptor-binding domain, where the stem and receptor-binding domains together resemble influenza A virus haemagglutinin. Two of these domains are composed of non-contiguous sequence: the receptor-binding haemagglutinin domain is inserted into a surface loop of the esterase domain, and the esterase domain is inserted into a surface loop of the haemagglutinin stem. This entry represents the receptor-binding haemagglutinin domain of the haemagglutinin-esterase glycoprotein.
Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection. This entry represents the low molecular weight transmembrane protein PsbK found in PSII, where it is tightly associated with the antenna protein CP43 (PsbC). PsbK is required for accumulation of the PSII complex, and may participate in the assembly and stability of the PSII complex. In particular, PsbK may be involved in the binding of plastoquinone and in maintaining the dimeric organisation of PSII.
Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection. This entry represents the low molecular weight transmembrane protein PsbT found in PSII, which is thought to be associated with the D1 (PsbA) - D2 (PsbD) heterodimer. PsbT may be involved in the formation and/or stabilisation of dimeric PSII complexes, because in the absence of this protein dimeric PSII complexes were found to be less abundant. Furthermore, although PsbT does not confer photo-protection, it is required for the efficient recovery of photo-damaged PSII.
Polycystic kidney diseases (PKD) are disorders characterised by large numbers of cysts distributed throughout grossly enlarged kidneys. Cyst development is associated with impairment of kidney function, and ultimately kidney failure and death. Most cases of autosomal dominant PKD result from mutations in the PKD1 gene that cause premature protein termination. A second gene for autosomal dominant polycystic kidney disease has been identified by positional cloning. The predicted 968-amino acid sequence of the PKD2 gene product (polycystin-2) contains 6 transmembrane domains, with intracellular N- and C-termini. Polycystin-2 shares some similarity with the family of voltage-activated calcium (and sodium) channels, and contains a potential calcium-binding domain. Polycystin-2 is strongly expressed in ovary, foetal and adult kidney, testis, and small intestine. Polycystin-1 requires the presence of this protein for stable expression and is believed to interact with it via its C terminus. All mutations between exons 1 and 11 result in a truncated polycystin-2 that lacks a calcium-binding EF-hand domain and the cytoplasmic domains required for the interaction of polycystin-2 with polycystin-1. PKD2, although clinically milder than PKD1, has a deleterious impact on life expectancy. This entry contains proteins belonging to the polycystin family including Mucolipin and Polycystin-1 and -2 (PKD1 and PKD2). The domain contains the cation channel region of PKD1 and PKD2 proteins. PKD1 and PKD2 may function through a common signalling pathway that is necessary for normal tubulogenesis. The PKD2 gene product has six transmembrane spans with intracellular amino- and carboxyl-termini. Mucolipin is a cationic channel which probably plays a role in the endocytic pathway and in the control of membrane trafficking of proteins and lipids. It could play a major role in the calcium ion transport regulating lysosomal exocytosis.
Transcriptional activators are believed to stimulate gene expression via protein-protein interactions with the basal machinery. The cAMP-regulated transcription factor CREB has been shown to stimulate target gene expression, in part by associating with the coactivator paralogues p300 and CREB binding protein (CBP). CBP and p300 bind to the Ser-133-phosphorylated kinase-inducible domain (KID) of CREB via a region of approximately 90 residues referred to as the KIX domain, which is highly conserved in CBP homologues from Caenorhabditis elegans and Drosophila melanogaster. In addition to CREB, the KIX domain of CBP also recognises the transactivation domains of other nuclear factors, including Myb, Jun, cubitus interruptus, and HTLV-1 virally encoded Tax protein. Thus the KIX domain appears to be a common docking site on CBP for many transcriptional activators. The KIX domain is found in association with other domains, such as the bromodomain, the ZZ-type zinc finger, or the TAZ-type zinc finger. The KIX domain of CBP is composed of three mutually interacting alpha helices, designated alpha1, alpha2 and alpha3, and two short 3(10) helices, G1 and G2, that together with the interconnecting loops define a compact structural domain with an extensive hydrophobic core. Helices alpha1 and alpha3 constitute the primary interacting surface for the phosphorylated KID domain (pKID), forming a hydrophobic patch on the protein surface that is large enough to accommodate up to 3 turns of an amphipathic alpha helix, designated alphaB, in pKID. A second alpha helix in pKID, referred to as alphaA, interacts with a different face of the alpha3 helix of KIX. The two helices of pKID are arranged at an angle of about 90 degrees and essentially wrap around the alpha3 helix of KIX.
Activator protein-2 (AP-2) transcription factors constitute a family of closely related and evolutionarily conserved proteins that bind to the DNA consensus sequence 5'-GCCNNNGGC-3' and stimulate target gene transcription. Five different isoforms of AP-2 have been identified in mammals, termed AP-2 alpha, beta, gamma, delta and epsilon. Each family member shares a common structure, possessing a proline/glutamine-rich domain in the N-terminal region, which is responsible for transcriptional activation, and a helix-span-helix domain in the C-terminal region, which mediates dimerisation and site-specific DNA binding. The AP-2 family have been shown to be critical regulators of gene expression during embryogenesis. They regulate the development of facial prominence and limb buds, and are essential for cranial closure and development of the lens; they have also been implicated in tumorigenesis. AP-2 protein expression levels have been found to affect cell transformation, tumour growth and metastasis, and may predict survival in some types of cancer. Mutations in human AP-2 have been linked with branchio-oculo-facial syndrome and Char syndrome, congenital birth defects characterised by craniofacial deformities and patent ductus arteriosus, respectively. AP-2 alpha was initially isolated from human HeLa cells. The protein was shown to bind to enhancer regions of the SV40 and human metallothionein IIA promoters, and to stimulate RNA synthesis. AP-2 alpha gene knockout in mice causes neural-tube defects during embryogenesis, leading to craniofacial abnormalities and anencephaly. In humans, deletion of chromosome 6 region 6p24-p25, which includes the AP-2 alpha gene, is associated with microphthalmia, corneal clouding and a number of other dysmorphic features, including hypertelorism, micrognathia, dysplastic ears, thin limbs, and congenital cardiac defects. This entry represents the N-terminal region of these proteins, including the transcriptional activation domain.
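The consensus sequence 5'-GCCNNNGGC-3' given in this entry can be read as a simple search pattern, with N standing for any of the four bases. The following is a minimal sketch of how candidate sites might be located in a DNA string; the function name `find_ap2_sites` and the example sequence are assumptions made for illustration, and a consensus match alone does not demonstrate binding.

```python
import re

# 5'-GCCNNNGGC-3' with N = any base. The lookahead lets overlapping
# candidate sites be reported as well.
AP2_SITE = re.compile(r"(?=(GCC[ACGT]{3}GGC))")


def find_ap2_sites(dna: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every consensus match on the given strand.

    Hypothetical helper. The consensus is its own reverse complement
    (GCC pairs with GGC and N with N), so a match on one strand is also
    a match at the same position on the other strand.
    """
    return [(m.start(), m.group(1)) for m in AP2_SITE.finditer(dna.upper())]


if __name__ == "__main__":
    # Made-up sequence containing one consensus site.
    print(find_ap2_sites("ttGCCTGAGGCaa"))
```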
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the zinc finger domain found in NAD-dependent DNA ligases. DNA ligases catalyse the crucial step of joining the breaks in duplex DNA during DNA replication, repair and recombination, utilizing either ATP or NAD(+) as a cofactor. This domain is a small zinc binding motif that is presumably DNA binding. It is found only in NAD-dependent DNA ligases.
Animal lectins display a wide variety of architectures. They are classified according to the carbohydrate-recognition domain (CRD), of which there are two main types, S-type and C-type. C-type lectins display a wide range of specificities. They require Ca2+ for their activity. They are found predominantly but not exclusively in vertebrates. They can be classified into a number of subgroups based on their function and structure:
Endocytic lectins - Membrane-bound receptors that mediate endocytosis of glycoproteins
Collectins - Represented by the soluble mannose-binding proteins of mammalian serum and liver
Selectins - Membrane-bound proteins involved in inflammation. There are three main divisions, CD62E, CD62L and CD62P.
CD62E (also called E-selectin, ELAM-1 or LECAM-2), CD62L (also called L-selectin, LAM-1, LECAM-1, Leu-8, MEL-14 or TQ-1) and CD62P (also called P-selectin, granule membrane protein-140, GMP-140 or platelet activation dependent granule-external membrane protein, PADGEM) belong to this group.
CD62E mediates leukocyte rolling on activated endothelium at inflammatory sites and may also support tumor cell adhesion during hematogenous metastasis, and play a role in angiogenesis. CD62L mediates lymphocyte homing to high endothelial venules of peripheral lymphoid tissue and leukocyte rolling on activated endothelium at inflammatory sites. Interaction of CD62P with PSGL-1 mediates tethering and rolling of leukocytes on the surface of activated endothelial cells, the first step in leukocyte extravasation and migration towards sites of inflammation. CD62P mediates rolling of platelets on endothelial cells and CD62P-mediated interactions are also involved in platelet-mediated delivery of lymphocytes to high endothelial venules. Members of the selectin superfamily have the same domain structure: an N-terminal lectin domain followed by an EGF repeat; a variable number (between 2 and 9) of complement regulatory elements; a single trans-membrane region; and a short cytoplasmic anchor. Some studies have found distinct carbohydrate structures on leukocytes that are adhered to by selectins, suggesting that selectins are involved in the selective trafficking of blood-borne components of the immune system.
Inteins (for INternal proTEINs) are in-frame intervening sequences that disrupt the coding region of a host gene. They are post-translationally excised from a protein precursor by a self-catalytic protein splicing process. Most inteins are bifunctional proteins mediating both protein splicing and DNA cleavage. The domain involved in splicing is formed by the two terminal splicing regions, which are separated by a small linker in mini-inteins or a 200- to 250-amino-acid homing endonuclease in larger inteins. Homing endonucleases are rare-cutting enzymes encoded by inteins and introns. By making a site-specific double-strand break in the intronless or inteinless alleles, these nucleases create recombinogenic ends which engage in a gene conversion process that duplicates the intron or intein. There are four families of homing endonucleases classified by conserved sequence motifs. Homing endonucleases found in inteins generally belong to the dodecapeptide (DOD) family, but an HNH endonuclease is also found. Endonucleases of the DOD family contain one or two copies of a 10-residue sequence known as a dodecapeptide or LAGLIDADG motif. They recognise long, pseudopalindromic homing sites of 14-30 bp in length and cleave their homing site DNA to generate 4 nt 3' extensions. The DOD endonucleases found in inteins contain 2 dodecapeptide motifs and are active as monomers. Resolution of the 3D structure of PI-Sce revealed that the endonuclease domain consists of alpha/beta motifs related by pseudo two-fold symmetry. The two α-helices containing the dodecapeptide motifs form the axis of symmetry. This entry covers the conserved central intein Blocks C, D, E and H. Blocks C and E are the dodecapeptide motifs that are required for endonuclease activity and each contains an endonuclease active site Asp or Glu.
Septins constitute a eukaryotic family of guanine nucleotide-binding proteins, most of which polymerise to form filaments. Members of the family were first identified by genetic screening for Saccharomyces cerevisiae (Baker's yeast) mutants defective in cytokinesis. Temperature-sensitive mutations in four genes, CDC3, CDC10, CDC11 and CDC12, were found to cause cell-cycle arrest and defects in bud growth and cytokinesis. The protein products of these genes localise at the division plane between mother and daughter cells, indicating a role in mother-daughter separation during cytokinesis. Members of the family were therefore termed septins to reflect their role in septation and cell division. The identification of septin homologues in higher eukaryotes, which localise to the cleavage furrow in dividing cells, supports an orthologous function in cytokinesis. Septins have since been identified in most eukaryotes, except plants. Septins are approximately 40-50 kDa in molecular mass, and typically comprise a conserved central core domain (more than 35% sequence identity between mammalian and yeast homologues) flanked by more divergent N- and C-termini. Most septins possess a P-loop motif in their N-terminal domain (which is characteristic of GTP-binding proteins), and a predicted C-terminal coiled-coil domain. A number of septin interaction partners have been identified in yeast, many of which are components of the budding site selection machinery, kinase cascades or of the ubiquitination pathway. It has been proposed that septins may act as a scaffold that provides an interaction matrix for other proteins. In mammals, septins have been shown to regulate vesicle dynamics. Mammalian septins have also been implicated in a variety of other cellular processes, including apoptosis, carcinogenesis and neurodegeneration. Northern blotting studies indicate that septin 3 expression is high in the brain but undetectable in other tissues, suggesting that the protein is primarily neuronal.
MetR, a member of the LysR family, is a positive regulator for the metA, metE, metF, and metH genes. The sulfur-containing amino acid methionine is the universal initiator of protein synthesis in all known organisms, and its derivatives S-adenosylmethionine (SAM) and autoinducer-2 (AI-2) are involved in various cellular processes. SAM plays a central role as methyl donor in methylation reactions, which are essential for the biosynthesis of phospholipids, proteins, DNA and RNA. The interspecies signaling molecule AI-2 is involved in the cell-cell communication process (quorum sensing) and gene regulation in bacteria. Although methionine biosynthetic enzymes and metabolic pathways are well conserved in bacteria, the regulation of methionine biosynthesis involves various regulatory mechanisms. In Escherichia coli and Salmonella enterica serovar Typhimurium, MetJ and MetR regulate the expression of methionine biosynthetic genes. The MetJ repressor negatively regulates the E. coli met genes, except for metH. Several of these genes are also under the positive control of MetR with homocysteine as a co-inducer. In Bacillus subtilis, the met genes are controlled by the S-box termination-antitermination system. This substrate-binding domain shows significant homology to the type 2 periplasmic binding proteins (PBP2). The PBP2 are responsible for the uptake of a variety of substrates such as phosphate, sulfate, polysaccharides, lysine/arginine/ornithine, and histidine. The PBP2 bind their ligand in the cleft between their two domains in a manner resembling a Venus flytrap. After binding their specific ligand with high affinity, they can interact with a cognate membrane transport complex comprised of two integral membrane domains and two cytoplasmically located ATPase domains. This interaction triggers the ligand translocation across the cytoplasmic membrane energized by ATP hydrolysis. Besides transport proteins, the PBP2 superfamily includes the substrate-binding domains from ionotropic glutamate receptors, LysR-like transcriptional regulators, and unorthodox sensor proteins involved in signal transduction.
Double-stranded (ds) DNA typically adopts the so-called B-conformation, while dsRNA is usually in the A-conformation. Both DNA and RNA can also adopt a Z-form double helix that is characterised by a left-handed helical arrangement, a zigzag pattern of the phosphodiester backbone and the absence of major grooves. Z-DNA is believed to play a role in transcription by relieving torsional strain induced within the DNA template by the movement of RNA polymerases, and Z-DNA may also promote genetic instability. Physiological functions of Z-RNA remain unknown. This entry represents the Z-binding domain (ZBD), also referred to as Zalpha or Zbeta, which is a 78-amino-acid protein fold that specifically binds to Z-DNA as well as to Z-RNA but not to B-DNA. ZBDs have been identified in four proteins: ADAR1, ZBP1, E3L and PKZ. ADAR1 and ZBP1 are mammalian proteins implicated in antiviral innate immune responses. E3L is a poxvirus protein known to antagonise this host response. Lastly, PKZ is a fish protein related to mammalian PKR, also involved in antiviral immunity. This suggests an important role of ZBDs in innate antiviral immune responses and may imply that Z-DNA and/or Z-RNA trigger such a host defense response. The Z-binding domain displays an α/β architecture with three α-helices packed against three antiparallel β-strands. It belongs to the winged helix-turn-helix (wHTH) domain superfamily. The three helices form the core of the domain, with helices 2 and 3 forming the helix-turn-helix unit. Helix 1 is joined to helix 2 by a β-strand, β1. C-terminal to helix 3 is the 'wing', formed by two antiparallel β-strands (β2 and β3), which hydrogen bond to each other and to β1, forming a three-stranded β-sheet.
The N-acetyltransferases (NAT) (EC 2.3.1.-) are enzymes that use acetyl coenzyme A (CoA) to transfer an acetyl group to a substrate, a reaction implicated in various functions from bacterial antibiotic resistance to mammalian circadian rhythm and chromatin remodeling. The Gcn5-related N-acetyltransferases (GNAT) catalyze the transfer of the acetyl group from the CoA donor to a primary amine of the acceptor. The GNAT proteins share a domain composed of four conserved sequence motifs A-D. This GNAT domain is named after yeast GCN5 (from General Control Nonrepressed) and related histone acetyltransferases (HATs) like Hat1 and PCAF. HATs acetylate lysine residues of amino-terminal histone tails, resulting in transcription activation. Another category of GNAT, the aminoglycoside N-acetyltransferases, confers antibiotic resistance by catalyzing the acetylation of amino groups in aminoglycoside antibiotics. GNAT proteins can also have anabolic and catabolic functions in both prokaryotes and eukaryotes. The acetyltransferase/GNAT domain forms a structurally conserved fold of 6 to 7 beta strands (B) and 4 helices (H) in the topology B1-H1-H2-B2-B3-B4-H3-B5-H4-B6, followed by a C-terminal strand which may be from the same monomer or contributed by another. Motifs D (B2-B3), A (B4-H3) and B (B5-H4) are collectively called the HAT core, while the N-terminal motif C (B1-H1) is less conserved. This entry represents the ATAT-type of the GNAT domain. Proteins containing this domain include alpha-tubulin N-acetyltransferase, originally known as mechanosensory abnormality protein 17 (Mec-17), as it is the protein product of one of the 18 genes required for the development and function of the touch receptor neuron for gentle touch. Mec-17 specifically acetylates 'Lys-40' in alpha-tubulin on the lumenal side of microtubules. It is an inefficient enzyme, and its activity is enhanced when tubulin is incorporated in microtubules. It may affect microtubule stability and regulate microtubule dynamics.
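The GNAT topology and motif assignments described above can be checked with a few lines of code. The following is only an illustrative sketch: the function and variable names are hypothetical, and the data are just the topology string and the motif-to-element pairs quoted in the entry.

```python
# Topology string quoted in the entry: B = beta strand, H = helix.
TOPOLOGY = "B1-H1-H2-B2-B3-B4-H3-B5-H4-B6"

# Motif-to-element assignments as stated in the entry.
MOTIFS = {
    "C": ("B1", "H1"),
    "D": ("B2", "B3"),
    "A": ("B4", "H3"),
    "B": ("B5", "H4"),
}


def parse_topology(topology):
    """Split the dash-separated topology into an ordered list of elements."""
    return topology.split("-")


def count_elements(elements):
    """Count strands and helices by their one-letter prefix."""
    strands = sum(1 for e in elements if e.startswith("B"))
    helices = sum(1 for e in elements if e.startswith("H"))
    return strands, helices


def motif_spans(elements, motifs):
    """Return {motif: (index of first element, index of second element)}."""
    index = {name: i for i, name in enumerate(elements)}
    return {m: (index[a], index[b]) for m, (a, b) in motifs.items()}


elements = parse_topology(TOPOLOGY)
strands, helices = count_elements(elements)
spans = motif_spans(elements, MOTIFS)

# Motifs sorted by where they start along the chain (N- to C-terminus).
motif_order = sorted(spans, key=lambda m: spans[m][0])

print(strands, helices, motif_order)
```

The sketch makes the layout explicit: motif C sits at the N-terminus, and the HAT core motifs D, A and B follow one another along the chain.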
This family of proteins, previously known as DUF2540, includes Uncharacterized protein MJ0366 from Methanocaldococcus jannaschii and similar proteins predominantly found in Methanococci. This protein adopts the 3-1 trefoil knot conformation.
Proteins in this entry are siderophore-interacting FAD-binding proteins. This entry includes the vibriobactin utilization protein ViuB, which is involved in the removal of iron from iron-vibriobactin complexes, as well as several hypothetical proteins.
This entry represents a protein-glutamine gamma-glutamyltransferase which may play a role in the assembly of the spore coat proteins by catalysing epsilon-(gamma-glutamyl)lysine cross-links. It catalyses the reaction: protein glutamine + alkylamine = protein N(5)-alkylglutamine + NH(3).
This domain is found in uncharacterised proteins and in tripartite motif-containing protein 41 (TRIM41). TRIM41 functions as an E3 ligase that catalyzes the ubiquitin-mediated degradation of protein kinase C.
This family is restricted to fungi. The function of proteins in this family is not known. Computational analysis of large-scale protein-protein interaction data suggests that these proteins may be involved in glutathione metabolism.
Pathogenic Neisseria spp. possess a repertoire of phase-variable opacity proteins that mediate various pathogen/host cell interactions. These proteins are integral membrane proteins related to other porins and the Haemophilus influenzae OpA protein.
Proteins containing this domain are members of the ParB/sulfiredoxin superfamily found in polyvalent proteins prototyped by the version in the phage P1 ddRB protein. These proteins are predicted to function as nucleases.
This small family includes Bublin coiled-coil proteins (BBLNs) found in eukaryotes (also known as UPF0184 proteins), including the human BBLN, previously known as UPF0184 protein C9orf16, and other uncharacterised proteins. Their function is currently unknown.
This domain occurs within longer proteins that contain lantibiotic dehydratase domains, such as epidermin biosynthesis protein EpiB and nisin biosynthesis protein NisB. It also occurs in single-domain bacteriocin biosynthesis proteins.
This entry represents the N-terminal domain of PARMER_03128, an uncharacterized protein from Parabacteroides merdae. Proteins containing this domain also include related proteins from Bacteroidetes. Structurally, they resemble domains found in streptococcal surface proteins such as SpaP.
This superfamily represents an immunoglobulin-like domain found in immunogenic proteins MPB63/MPT63 and in a group of related proteins that fall into the antigen MPT63/MPB63 (immunoprotective extracellular protein) superfamily, such as uncharacterised lipoprotein YjhA.
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature conserved motif (LSGGQ) specific to the ABC transporter; and the Q-motif or Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette. More than 50 subfamilies have been described based on a phylogenetic and functional classification. This entry represents ABC transporter substrate-binding proteins, including KPN01854 from Klebsiella pneumoniae, which occur in both Gram-positive and Gram-negative species.
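The conserved motifs of the ABC module named above (Walker A, Walker B and the LSGGQ signature) lend themselves to a simple pattern scan. The sketch below is illustrative only. Apart from LSGGQ, the regular expressions are assumed textbook-style consensus patterns that are not given in the entry, the test sequence is invented, and the function name is hypothetical. Real annotation relies on profile-based methods rather than bare regular expressions.

```python
import re

# Assumed, simplified consensus patterns (only LSGGQ comes from the entry text).
WALKER_A = re.compile(r"G.{4}GK[ST]")          # P-loop: G-x(4)-G-K-[ST] (assumption)
SIGNATURE = re.compile(r"LSGGQ")               # ABC signature motif named in the entry
WALKER_B = re.compile(r"[AVILMFYW]{4}DE?")     # four hydrophobic residues, then Asp (assumption)

PATTERNS = {
    "walker_a": WALKER_A,
    "signature": SIGNATURE,
    "walker_b": WALKER_B,
}


def find_motifs(seq):
    """Return {motif_name: [0-based start positions]} for each pattern."""
    return {
        name: [m.start() for m in pattern.finditer(seq)]
        for name, pattern in PATTERNS.items()
    }


# Invented toy sequence containing one copy of each motif; not a real protein.
toy_seq = "MKTGPSGSGKSTAAAQLSGGQRRLLLLDEPH"
hits = find_motifs(toy_seq)
print(hits)
```

A sequence with no hits under these crude patterns could still be a genuine ABC module, so a negative result should not be over-interpreted.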
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Leukotrienes (LT) are potent lipid mediators derived from arachidonic acid metabolism. They can be divided into two classes, based on the presence or absence of a cysteinyl group. Leukotriene B4 (LTB4) does not contain such a group, whereas LTC4, LTD4, LTE4 and LTF4 are cysteinyl leukotrienes. LTB4 is one of the most effective chemoattractant mediators known, and is produced predominantly by neutrophils and macrophages. It is involved in a number of events, including: stimulation of leukocyte migration from the bloodstream; activation of neutrophils; inflammatory pain; host defence against infection; increased interleukin production and transcription. It is found in elevated concentrations in a number of inflammatory and allergic conditions, such as asthma, psoriasis, rheumatoid arthritis and inflammatory bowel disease, and has been implicated in the pathogenesis of these diseases. Binding sites for LTB4 have been observed in membrane preparations from leukocytes, macrophages and spleen. Two receptors for LTB4 have since been cloned (BLT1 and BLT2); both are members of the rhodopsin-like G-protein-coupled receptor superfamily. The leukotriene B4 type 1 receptor (BLT1) has been cloned from Homo sapiens (Human), Mus musculus (Mouse) and Rattus norvegicus (Rat), and was found to be identical to a previously cloned receptor, P2Y7. This receptor was originally thought to be a purinoceptor but is now widely accepted to bind LTB4. BLT1 has also been reported to act as a coreceptor for macrophage-tropic Human immunodeficiency virus 1 (HIV-1) strains. BLT1 is expressed primarily in peripheral leukocytes and peritoneal macrophages, with lower levels of expression detected in the spleen and thymus of humans. Activation of the receptor by LTB4 leads to the production of inositol trisphosphate and an increase in intracellular calcium levels. This response is sensitive to pertussis toxin in some cell types. The receptor also causes chemotaxis and inhibition of forskolin-stimulated adenylyl cyclase activity in a pertussis toxin-sensitive manner. It has been demonstrated that BLT1 can couple to Gi2 and G16 G-proteins, depending on the cell type in which it is expressed.
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Lysophospholipids (LPs), such as lysophosphatidic acid (LPA), sphingosine 1-phosphate (S1P) and sphingosylphosphorylcholine (SPC), have long been known to act as signalling molecules in addition to their roles as intermediates in membrane biosynthesis. They have roles in the regulation of cell growth, differentiation, apoptosis and development, and have been implicated in a wide range of pathophysiological conditions, including: blood clotting, corneal wounding, subarachnoid haemorrhage, inflammation and colitis. A number of G protein-coupled receptors bind members of the lysophospholipid family; these include: the cannabinoid receptors; platelet activating factor receptor; OGR1, an SPC receptor identified in ovarian cancer cell lines; PSP24, an orphan receptor that has been proposed to bind LPA; and at least 8 closely related receptors, the EDG family, that bind LPA and S1P. LPA is found in all cell types in small quantities (associated with membrane biosynthesis) but is produced in significant quantities by some cellular sources, accounting for the levels of LPA in serum. LPA is also found in elevated levels in ovarian cancer ascites, and acts to stimulate proliferation and promote survival of the cancer cells. The effects of LPA on the proliferation and morphology of a number of other cell types have been well documented. However, identification of the mechanisms by which these effects are accomplished has been complicated by a number of factors, such as: a lack of antagonists, difficulty in ligand-binding experiments and the responsiveness of many cell types to LPA. The G protein-coupled receptors EDG-2, EDG-4 and EDG-7 have now been identified as high affinity receptors for LPA. It has been suggested that these receptors should now be referred to as lpA1, lpA2 and lpA3 respectively.
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF). GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices. Lysophospholipids (LPs), such as lysophosphatidic acid (LPA), sphingosine 1-phosphate (S1P) and sphingosylphosphorylcholine (SPC), have long been known to act as signalling molecules in addition to their roles as intermediates in membrane biosynthesis. They have roles in the regulation of cell growth, differentiation, apoptosis and development, and have been implicated in a wide range of pathophysiological conditions, including: blood clotting, corneal wounding, subarachnoid haemorrhage, inflammation and colitis. A number of G protein-coupled receptors bind members of the lysophospholipid family; these include: the cannabinoid receptors; platelet activating factor receptor; OGR1, an SPC receptor identified in ovarian cancer cell lines; PSP24, an orphan receptor that has been proposed to bind LPA; and at least 8 closely related receptors, the EDG family, that bind LPA and S1P. S1P is released from activated platelets and is also produced by a number of other cell types in response to growth factors and cytokines. It is proposed to act both as an extracellular mediator and as an intracellular second messenger. The cellular effects of S1P include growth related effects, such as proliferation, differentiation, cell survival and apoptosis, and cytoskeletal effects, such as chemotaxis, aggregation, adhesion, morphological change and secretion. The molecule has been implicated in control of angiogenesis, inflammation, heart-rate and tumour progression, and may play an important role in a number of disease states, such as atherosclerosis, and breast and ovarian cancer. Recently, 5 G protein-coupled receptors have been identified that act as high affinity receptors for S1P, and also as low affinity receptors for the related lysophospholipid, SPC. EDG-1, EDG-3, EDG-5 and EDG-8 share a high degree of similarity, and are also referred to as lpB1, lpB3, lpB2 and lpB4, respectively. EDG-6 is referred to as lpC1, reflecting its more distant relationship to the other S1P receptors.
These proteins are integral membrane proteins with four transmembrane-spanning helices. The most conserved region in an alignment of these proteins is an HPP motif. The function of these proteins is uncertain, but they may be transporters.
Proteins in this family include O-mannosyl-transferase and dolichyl-phosphate-mannose--protein mannosyltransferase. O-mannosyl-transferase transfers mannosyl residues to the hydroxyl group of serine or threonine residues. Dolichyl-phosphate-mannose--protein mannosyltransferase transfers mannose from Dol-P-mannose to Ser or Thr residues on proteins.
This domain is found in various hypothetical bacterial proteins that are similar to the Escherichia coli protein YheO. Their function is unknown, but a few members are annotated as being HTH-containing proteins and putative DNA-binding proteins.
This protein family contains uncharacterised proteins mainly found in Alphabaculovirus, including Uncharacterized 9.7 kDa protein from Orgyia pseudotsugata multicapsid polyhedrosis virus (OpMNPV) and Uncharacterized 9.9 kDa protein in LEF8-FP intergenic region from Autographa californica nuclear polyhedrosis virus (AcMNPV).
This domain is found as a tandem repeat in streptococcal cell surface proteins, such as the IgG-binding protein G. These proteins are type I membrane proteins that bind to the constant Fc region of IgG with high affinity.
This domain is found in diverse bacterial polysaccharide biosynthesis proteins including the CapD protein from Staphylococcus aureus, the WalL protein, mannosyl-transferase, and several putative epimerases. The CapD protein is required for biosynthesis of type 1 capsular polysaccharide.
The YecR-like family of lipoproteins includes the YecR protein from Escherichia coli. This family of proteins is found in bacteria and viruses and is functionally uncharacterised. Proteins in this family are approximately 110 amino acids in length.
This family consists of several viral hemorrhagic septicemia virus non-virion (NV) proteins. The NV protein is a nonstructural protein absent from mature virions although it is present in infected cells. The function of this protein is unknown.
This family represents proteins found in Poxvirus, including Protein C10 and Protein C4. Vaccinia virus protein C4 plays a role in the inhibition of host NF-kappa-B activation. It blocks translocation of the p65/RELA subunit into the host nucleus.
This entry represents a group of proteins related to archaeal non-canonical single-stranded DNA-binding proteins (ThermoDBPs) named ThermoDBP-related proteins. These proteins have been related to genome maintenance in particularly challenging environments, but their specific function is still unknown.
Thiol (cysteine) proteases (EC 3.4.22.-) are a family of proteolytic enzymes which contain an active site cysteine. Catalysis proceeds through a thioester intermediate and is facilitated by a nearby histidine side chain; an asparagine completes the essential catalytic triad. Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad (cysteine-histidine) or triad. Modification of the catalytic triad, especially of its first amino acid (cysteine), has been postulated as a suitable target for a chemical modulation of enzyme function. This is the case for silicateins, where the cysteine residue has been replaced by a serine. Silicateins represent a group of enzymes possessing bi-functional activity; in addition to the silica-condensing activity, they possess a proteolytic (cathepsin-like) activity. The sequences around the three active site residues are well conserved. This entry represents the cysteine active site. The catalytic triad is completed by the histidine and asparagine active sites, which are described in separate entries. Together, these active-site signatures mainly detect proteases of the C1 family, including papain and several cathepsins.
A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One subdomain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
Flavivirus envelope glycoprotein E, Stem/Anchor domain superfamily
Type: Homologous_superfamily
Description: This entry describes the C-terminal domain, containing a stem region followed by two transmembrane anchor domains, of the envelope protein E. This protein is cleaved from the large flavivirus polyprotein, which yields three structural and seven nonstructural proteins.
Flavivirus envelope glycoprotein E, Stem/Anchor domain
Type: Domain
Description: This entry describes the C-terminal domain, containing a stem region followed by two transmembrane anchor domains, of the envelope protein E. This protein is cleaved from the large flavivirus polyprotein, which yields three structural and seven nonstructural proteins.
Human immunodeficiency virus 1, envelope glycoprotein Gp120
Type: Domain
Description: The envelope glycoprotein Gp160 of HIV1 is cleaved into the surface protein Gp120 and the transmembrane protein Gp41. The entry of HIV requires interaction of viral Gp120 with the CD4 glycoprotein and a chemokine receptor on the cell surface.
This superfamily represents the coat protein-binding C-terminal domain of the scaffolding protein from P22-like viruses. Bacteriophage scaffolding protein is required for successful condensation of DNA within the capsid. The scaffolding protein is lost from the structure during packaging.
This family of proteins is functionally uncharacterised. These proteins are found in bacteria and are typically between 136 and 304 amino acids in length. This region represents a transmembrane region found at the C terminus of the proteins.
This domain is found in a family of prokaryotic proteins that have no known function. Proteins belonging to this family include hypothetical proteins from eubacteria and archaebacteria. Some of these proteins also contain the Von Willebrand factor, type A domain.
This entry represents the RNA recognition motif (RRM) of plant La-like proteins related to the La autoantigen. A variety of La-related proteins (LARPs or La ribonucleoproteins), with differing domain architecture, appear to function as RNA-binding proteins in eukaryotic cellular processes.
Arteriviruses encode four envelope proteins, GL, GS, M and N. GL envelope glycoprotein is heterogeneously glycosylated with N-acetyllactosamine in a cell-type-specific manner. The GL glycoprotein expresses the neutralization determinants. This entry is specific for the GL envelope glycoproteins from equine arteritis virus.
This entry represents a region consisting of HEAT repeats that is found in a group of eukaryotic proteins, including HEAT repeat-containing protein 5A/B (HEATR5A/B) from vertebrates, Pof6 interactor protein 1 (SIP1), AP-1 accessory protein LAA1 from yeast and SWEETIE from Arabidopsis thaliana.
This entry includes tetratricopeptide-like repeats not detected by other related models. The tetratricopeptide repeat (TPR) motif is a protein-protein interaction module found in multiple copies in a number of functionally different proteins that facilitates specific interactions with a partner protein(s).
This entry includes budding yeast Gip2 (GLC7-interacting protein 2) and its paralogue, Pig2 (Protein Interacting with Gsy2). Pig2 is a putative type-1 protein phosphatase (PP1) targeting subunit that tethers the Glc7p type-1 protein phosphatase to the Gsy2p glycogen synthase.
This protein family represents a group of predicted mannosyltransferases, MA4085 type. This group of proteins includes the probable dolichyl-phosphate-hexose flippase Agl23 from the extremely halophilic archaeon Haloarcula hispanica. This protein may be involved in both N- and O-glycosylation of S-layer glycoproteins.
This entry includes major capsid proteins from Myoviridae. Bacteriophage T4 major capsid protein, also known as Gp23, self-associates to form hexamers, building most of the capsid in association with pentons made of the capsid vertex protein and one dodecamer of the portal protein.
This entry represents PyrE-like proteins from archaea. They share protein sequence similarity to the PyrE protein (Orotate phosphoribosyltransferase) from Salmonella typhimurium [
]. PyrE-like proteins may have some orotate phosphoribosyltransferase activity, but not enough to ensure growth in the absence of the physiological OPRT [].
Phosphoprotein associated with glycosphingolipid-enriched microdomains 1
Type:
Family
Description:
PAG, or Cbp/PAG (Csk binding protein/phospho-protein associated with glycosphingolipid-enriched microdomains) is a transmembrane protein that has a negative regulatory role in T-cell activation through being an adapter for C-terminal Src kinase, Csk. This family of proteins is found in eukaryotes [
,
,
,
].
This domain is found exclusively in plant proteins, where it is associated with a homeobox domain, which suggests that these proteins may be homeodomain transcription factors. Proteins containing this domain include BEL1-like homeodomain protein 1 (BLH1) from Arabidopsis thaliana. BLH1 interacts with KNAT2 and KNAT5 [
] and affects plant development [].
This superfamily represents the N-terminal domain of vacuolar protein sorting-associated proteins and callose synthases. This domain contains seven alpha helices arranged into two antiparallel three-helix bundle modules. It has been proposed that vacuolar protein sorting-associated protein Vta1 interacts with Vps60 and Vps46/Did2 via this N-terminal domain [].
Glycoprotein L from Cytomegalovirus serves as a chaperone for the correct folding and surface expression of glycoprotein H (gH) [
]. Glycoprotein L is a member of the heterotrimeric gCIII complex of glycoproteins which also includes gH and gO and has an essential role in viral fusion [].
The proteins in this entry contain a domain rich in positionally conserved cysteine residues. Most proteins contain 7 or 8 cysteine residues. The domain is found in disease-related proteins including von Willebrand factor, Alpha tectorin, Zonadhesin and Mucin. It is often found in proteins containing
and
.
Members of this protein family are the ThiH proteins (also known as 2-iminoacetate synthase) involved in thiamine biosynthesis [
,
]. They are homologues of the BioB protein of biotin biosynthesis. Genes encoding these proteins are generally found in operons with other thiamine biosynthesis genes [].
The alpha-2-macroglobulin receptor-associated protein (RAP) is an intracellular glycoprotein that binds to the alpha-2-macroglobulin receptor and other members of the low density lipoprotein receptor family. The protein inhibits binding of all currently known ligands of these receptors [
]. Two different studies have provided conflicting domain boundaries.
The function of Sterile alpha motif domain-containing protein 3 (SAMD3) is not clear. The SAM (sterile alpha motif) domain of SAMD3 is a putative protein-protein interaction domain. SAM is a widespread domain in signaling and regulatory proteins. In many cases SAM mediates dimerization/oligomerization [
,
].
This domain is found near the C terminus of leucine-rich repeat-containing proteins in the CARMIL family, whose members are potential regulators of actin capping proteins. In leucine-rich repeat-containing protein 16A (LRRC16A) it includes the region responsible for interaction with F-actin-capping protein subunit alpha-2 (CAPZA2) [
].
116kDa U5 small nuclear ribonucleoprotein component, N-terminal
Type:
Domain
Description:
This entry represents the N-terminal domain of U5 small nuclear ribonucleoprotein (snRNP) protein (U5-116kDa). It is also found in Snu114p, a 114kDa protein homologous to U5-116kDa that has been identified in Saccharomyces cerevisiae [
]. Both proteins seem to have a role in the pre-mRNA splicing process.
This is a thioredoxin-like domain found in AIMP2 proteins (aminoacyl tRNA synthetase complex interacting multifunctional protein 2). AIMP2 is a component of the human multi-tRNA synthetase complex (MSC). MSC is a macromolecular protein complex consisting of nine different ARSs and three ARS-interacting multifunctional proteins (AIMPs) [
].
This domain is found in a large number of proteins including magnesium-dependent endonucleases and phosphatases involved in intracellular signalling [
]. Proteins containing this domain include AP endonuclease proteins (), DNase I proteins (
), Synaptojanin, an inositol-1,4,5-trisphosphate phosphatase (
) and Sphingomyelinase (
).
Dodecin flavoprotein is a small dodecameric flavin-binding protein from Halobacterium salinarium that contains two flavins stacked in a single binding pocket between two tryptophan residues to form an aromatic tetrad [
]. This entry represents Dodecin and structurally related proteins, including the functionally uncharacterised proteins YdgH and YjfN.
This entry represents a domain found in acetylene hydratase (Ahy) and other related proteins. The acetylene hydratase of Pelobacter acetylenicus is a tungsten iron-sulfur protein involved in the fermentation of acetylene to ethanol and acetate [
]. Proteins containing this domain belong to the molybdopterin-binding superfamily of proteins.
Glycine reductase is a complex with two selenoprotein subunits, A and B. This entry represents the glycine reductase selenoprotein B. Closely related proteins not matched by this entry include selenoprotein B subunits of betaine reductase and sarcosine reductase. All contain selenocysteine incorporated during translation at a specific UGA codon.
This family of lipoproteins is found in Borrelia spirochetes. They are known as variable large proteins (vlp). The function of these proteins is uncertain, but they may serve to evade the host immune response by switching from one surface-exposed variable major outer membrane lipoprotein to another [].
This family represents a group of proteins found enriched in fusobacteria. These proteins contain many repeats of a DKNXXYY motif. The repeats are spaced at intervals of about 35 amino acid residues. These proteins are likely to be associated with the membrane. The specific function of these proteins is unknown.
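The repeat pattern described above can be illustrated with a minimal sketch, assuming that X in DKNXXYY stands for any residue. The function names and the toy sequence are hypothetical and are not taken from any member of the family; the example only shows how motif positions and their spacing could be checked.

```python
import re

# Assumption: "X" in DKNXXYY means "any residue", so the motif maps to a
# regular expression with two wildcard positions. The lookahead lets
# overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=(DKN..YY))")


def find_motif_starts(sequence):
    """Return the 0-based start position of every DKNXXYY match."""
    return [match.start() for match in MOTIF.finditer(sequence)]


def repeat_spacings(starts):
    """Return the distance between each pair of consecutive motif starts."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Hypothetical toy sequence: one motif copy followed by 28 filler residues,
# giving a 35-residue unit that is repeated three times.
unit = "DKNAGYY" + "A" * 28
toy_sequence = unit * 3

starts = find_motif_starts(toy_sequence)
spacings = repeat_spacings(starts)
print(starts, spacings)
```

On real sequences the spacing would be expected to vary around 35 residues rather than match it exactly, so any threshold applied to the spacings would be a further assumption.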
This domain constitutes the N-terminal region of the paralogous homeobox proteins HoxA9, HoxB9, HoxC9 and HoxD9. The N-terminal region is thought to act as a transcription activation region. Activation may be by interaction with proteins such as Btg proteins, which are thought to recruit a multi-protein Ccr4-like complex [
].
This protein family, through its member sll1638 from Synechocystis sp. (strain PCC 6803), was shown to be part of cyanobacterial photosystem II. It is homologous to (but quite diverged from) the chloroplast PsbQ protein, called oxygen-evolving enhancer protein 3 (OEE3). We designate this cyanobacterial protein PsbQ by homology.
This leucine-zipper domain can be found in MIP1 proteins and in putative Rho GTPase-activating proteins. MIP1 proteins, here largely from plants, are subunits of the TORC2 (rictor-mTOR) protein complex controlling cell growth and proliferation [
]. The leucine-zipper is likely to be the region that interacts with plant MADS-box factors [].
The CRIPT protein is a cytoskeletal protein involved in microtubule production. This C-terminal domain is essential for binding to the PDZ3 domain of the SAP90 protein, one of a super-family of PDZ-containing proteins that play an important role in coupling the membrane ion channels with their signalling partners [
].
Quercetin 2,3-dioxygenase, C-terminal cupin domain
Type:
Domain
Description:
This entry represents the C-terminal domain of quercetin 2,3-dioxygenase (also known as quercetinase), such as the YhhW protein from E. coli. This is one of the two cupin domains that make up the protein [
]. Proteins containing this domain also include the Pirin-like protein, YhaK, from E. coli [].
Merozoite surface protein 7 (MSP7) is a protein of the malaria parasite that has been found to be associated with processed fragments from the MSP1 protein in a complex involved in red blood cell invasion [
]. This entry represents a C-terminal domain found in MSP7 and other Plasmodium merozoite surface proteins.
This domain is found in proteins expressed by bacterial and archaeal species. One such protein is immunity protein SdpI (
) from Bacillus subtilis, which provides protection for the cell against the toxic effects of its own SdpC killing factor, and also functions as a receptor/signal transduction protein [
].
This entry represents viral glycoprotein N (also known as BLRF1, U46, 53, and UL73). These UL73-like envelope glycoproteins, which associate in a high molecular mass complex with their counterpart protein gM, induce neutralizing antibody responses in the host. These glycoproteins are highly polymorphic, particularly in the N-terminal region [
].
This entry represents a conserved region located towards the N-terminal end of ImpA and related proteins. ImpA is an inner membrane protein, which has been suggested to be involved with proteins that are exported and associated with colony variations in Actinobacillus actinomycetemcomitans [
]. Note that many members are hypothetical proteins.
This entry represents the C-terminal HhH/H2TH-like domain of Mab-21 and Mab-21-like proteins, and related proteins from animals. In Caenorhabditis elegans, these proteins are required for several aspects of embryonic development [
,
]. This entry also includes inositol 1,4,5-trisphosphate receptor-interacting proteins, which are predicted to contain a partial Mab-21 domain [].
This domain is found in the N-terminal region of the receptor-binding protein of bacteriophage TP901-1, which infects Lactococcus lactis. The receptor-binding protein of phage TP901-1 is termed the lower baseplate protein (BppL) and is trimeric in nature. The N-terminal domain of BppL plugs into the upper baseplate protein (BppU) [
].
This entry represents a conserved region located towards the C-terminal end of ImpA and related proteins. ImpA is an inner membrane protein, which has been suggested to be involved with proteins that are exported and associated with colony variations in Actinobacillus actinomycetemcomitans [
]. Note that many members are hypothetical proteins.
This is the SPRY domain of HECTD4 (HECT domain-containing protein 4). HECTD4 is a probable E3 ubiquitin-protein ligase, as the HECT (Homologous to the E6-AP Carboxyl Terminus) domain has been identified in proteins that belong to a particular E3 ubiquitin-protein ligase family [
]. The role of HECTD4 is not known.
This family includes Tll0287 protein from Thermosynechococcus vestitus (
) and similar bacterial proteins. Tll0287 is a c-type heme protein that shows a β-sheet with four antiparallel β-strands surrounded by five α-helices. This structure is similar to some kinases and sensor proteins, but its specific function is unknown [
].
This family consists of several Siphovirus and other phage tail component proteins, including bacteriophage SPP1 distal tail protein Dit (also known as Gp19.1 or Gp19) [
], as well as some bacterial proteins of unknown function. The characteristics of the protein distribution suggest prophage matches in addition to the phage matches.
This is a family of Herpesvirus proteins including UL14. UL14 protein is a minor component of the virion tegument [
] and is expressed late in infection. UL14 protein can influence the intracellular localization patterns of a number of proteins belonging to the capsid or the DNA encapsidation machinery [].
This entry represents the Ig-like domain of Glycoprotein E (gE) from herpesvirus. This protein forms a complex with glycoprotein I (gI), functioning as an immunoglobulin G (IgG) Fc binding protein. gE is involved in virus spread but is not essential for propagation. This domain is identified as the Fc-binding domain [
,
].
This domain (also known as the Brl domain) was named after the yeast Sec63 (or NPL1) protein, in which it was found. This protein is required for assembly of functional endoplasmic reticulum translocons [
,
]. Other yeast proteins containing this domain include pre-mRNA splicing helicase BRR2, HFM1 protein and putative helicases.
The SecA ATPase is involved in the insertion and retraction of preproteins through the plasma membrane. This domain has been found to cross-link to preproteins, which is thought to indicate a role in preprotein binding. The preprotein cross-linking domain is composed of two subdomains that are inserted within the ATPase domain [
].
The piwi domain [
] is a protein domain found in piwi proteins and a large number of related nucleic acid-binding proteins, especially those that bind and cleave RNA. The function of the domain is double-stranded RNA-guided hydrolysis of single-stranded RNA, as has been determined in the argonaute family of related proteins [].
AT hooks are DNA-binding motifs with a preference for A/T rich regions. These motifs are found in a variety of proteins, including the high mobility group (HMG) proteins [
], in DNA-binding proteins from plants [] and in hBRG1 protein, a central ATPase of the human switching/sucrose non-fermenting (SWI/SNF) remodeling complex [].
A number of amino acid exporter proteins belong to this family. LeuE encodes an exporter of leucine and some other structurally unrelated amino acids [
]. This family also includes threonine efflux protein RhtC [], arginine exporter protein ArgO/YggA [], as well as a number of uncharacterised proteins from a variety of sources.
This entry represents proteins from the Golgi membrane protein 1 (GOLM1) and cancer susceptibility candidate 4 (CASC4), also known as GOLM2, family. GOLM1 is a Golgi-localised protein that is upregulated by viral infection [
], but whose precise function is unknown. CASC4, like GOLM1, appears to be a single-pass membrane protein.
This entry represents GSG1 (germ cell-specific gene 1 protein), a protein specifically expressed in testicular germ cells [
]. It has been shown to target testis-specific poly(A) polymerase to the endoplasmic reticulum through protein-protein interactions []. Overexpression of the human homologue may be involved in tumourigenesis of human testicular germ cell tumours [].
The SecA ATPase is involved in the insertion and retraction of preproteins through the plasma membrane. This domain superfamily has been found to cross-link to preproteins, which is thought to indicate a role in preprotein binding. The preprotein cross-linking domain is composed of two subdomains that are inserted within the ATPase domain [
].