In Gram-negative bacteria and eukaryotes, formylglycinamide ribonucleotide amidotransferase is a single protein. In archaea and Gram-positive bacteria it is formed from three proteins. This entry describes the PurQ protein of Bacillus subtilis (where PurL, PurQ, and PurS are required for phosphoribosylformylglycinamidine synthase activity) and functionally equivalent proteins from other bacteria and archaea.
This entry represents the N-terminal domain of glycoprotein E (gE) from herpesvirus. This protein forms a complex with glycoprotein I (gI), functioning as an immunoglobulin G (IgG) Fc binding protein. gE is involved in virus spread but is not essential for propagation. This domain interacts with gI.
This entry represents the N-terminal nucleotidyltransferase domain of Mab-21, Mab-21-like proteins and related proteins from animals. In Caenorhabditis elegans, these proteins are required for several aspects of embryonic development. This entry also includes inositol 1,4,5-trisphosphate receptor-interacting proteins, which are predicted to contain a partial Mab-21 domain.
This entry represents the immunoglobulin domain that is found in hundreds of proteins of different functions. Examples include antibodies, the giant muscle kinase titin and receptor tyrosine kinases. Immunoglobulin-like domains may be involved in protein-protein and protein-ligand interactions. This domain does not include the first and last strand of the immunoglobulin-like domain.
Clostridial species have a layer of surface proteins surrounding their membrane. This layer is composed of a high molecular weight protein and a low molecular weight protein. This entry represents the N-terminal domain of the low molecular weight protein, a structural domain that contains a conserved LGDG sequence motif.
Members of this protein family are Lep, a protein that binds the outer membrane lipoprotein LipL41 and appears to function as a chaperone important for LipL41 expression. LipL41 is the third most abundant lipoprotein in the pathogen Leptospira interrogans, but is found in saprophytic Leptospira species as well and is not essential for virulence.
PAN domains have significant functional versatility, fulfilling diverse biological functions by mediating protein-protein or protein-carbohydrate interactions. These domains contain a hairpin-loop-like structure, similar to knottins, but the pattern of disulphide bonds differs. The PAN-3 or CW domain is associated with a number of Caenorhabditis elegans hypothetical proteins.
Proteins containing this domain include aarF domain containing kinase 2 (ADCK2) from humans, YPL109C from budding yeasts and SPBC21C3.03 from fission yeasts. Eukaryotes contain at least three ABC1-like proteins; in humans, these are ADCK3 and the putative protein kinases named ADCK1 and ADCK2. YPL109C and SPBC21C3.03 are uncharacterised mitochondrial proteins.
Brefeldin A sensitivity protein-related, domain of unknown function DUF2421
Type:
Domain
Description:
This domain is found in several uncharacterised proteins and in Brefeldin A-sensitivity protein 4, which is a zinc finger protein containing five transmembrane domains. The Brefeldin A-sensitivity protein 4 null mutant exhibits strongly fragmented vacuoles and sensitivity to brefeldin A, a drug known to affect intracellular transport.
This entry represents the C-terminal domain of replication protein A (RPA) interacting protein (RIP). RPA is a single-stranded DNA-binding protein involved in DNA replication, repair, and recombination. RPA interacting protein is involved in the import of RPA into the nucleus. The C-terminal domain contains a putative zinc finger.
Characterised proteins in this family include the probable iron export permease protein FetB, which is part of the ABC transporter complex FetAB and is probably involved in iron export, and aluminum sensitive 3 protein, also part of an ABC transporter required for aluminum (Al) resistance. This family also includes integral membrane proteins that are currently uncharacterised.
This group represents a genome polyprotein from Flavivirus. The polyprotein is cleaved by peptidases (including the NS3 protein, a serine endopeptidase) into 14 chains, including the RNA-directed RNA polymerase NS5 (non-structural protein 5), which has three activities: mRNA (guanine-N(7)-)-methyltransferase, mRNA (nucleoside-2'-O-)-methyltransferase and RNA-directed RNA polymerase.
Hypothetical archaeal and bacterial proteins make up this superfamily. A few proteins are annotated as potential metal-binding proteins, and the members of this family do have four highly conserved cysteine residues, but no further literature evidence was found in this regard. The structure of Clostridium perfringens protein CPE0013 has been solved.
This entry represents the E3 ubiquitin-protein ligase SINA-like proteins from plants, including SINL1-11 from Arabidopsis. Seven in absentia (SINA) proteins are E3 ligases containing an N-terminally located RING finger domain, followed by the conserved SINA domain that is involved in substrate binding and dimerization. Plant SINA-like proteins are putative E3 ubiquitin ligases.
This model describes iron-only hydrogenases of anaerobic and microaerophilic bacteria and protozoa. These proteins represent a subdivision within a larger family which includes proteins such as nuclear prelamin A recognition factor in animals. These proteins show some heterogeneity in terms of periplasmic, cytosolic or hydrogenosomic location, NAD or NADP dependence, and overall protein length.
The portal protein is a bacteriophage component that forms a hole, or portal, enabling DNA passage during packaging and ejection. It also forms the junction between the phage head (capsid) and the tail proteins. This entry represents one particular subfamily of portal proteins consisting of the Bacillus phage SPP1 protein and similar sequences.
This family includes prokaryotic proteins of unknown function, as well as a protein annotated as the pit accessory protein from Rhizobium meliloti (Sinorhizobium meliloti). However, while it has been hypothesised that this protein may play a role in orthophosphate transport (Pit stands for phosphate transport), its exact function is unknown.
The herpes simplex virus type 1 gene UL47 encodes the tegument proteins referred to collectively as VP13/14, which are believed to be differentially modified forms of the same protein. These proteins have been shown to target to the nucleus. The function of this family is unknown, but it contains a number of Herpesviridae proteins.
This domain contains three transmembrane helices. Proteins containing this domain are important for glycine utilisation, being identified as glycine transporters. Some proteins containing this domain are also important for alanine utilisation. In these proteins this domain is found in pairs. An archaeal protein which contains this domain is a TRIC-type potassium channel.
The two products of the lrgAB operon are potential membrane proteins. LrgA and LrgB are antiholin-like proteins thought to control murein hydrolase activity and penicillin tolerance and to inhibit the holin-like proteins from the cidAB operon, CidA and CidB. The regulatory processes mediated by these proteins are still largely unknown. This family represents LrgB and CidB.
This domain can be found at the N-terminal end of several plant proteins, including VAN3-binding protein from Arabidopsis thaliana (also known as FORKED1), a component of the auto-regulatory loop which enables auxin canalisation by recruitment of the PIN1 auxin efflux protein to the cell membrane. This domain is frequently found on proteins containing at the C terminus.
This entry represents a hemerythrin cation-binding domain that occurs in hemerythrins, myohemerythrin and related proteins. This domain binds iron in hemerythrin, but can bind other metals in related proteins, such as cadmium in a Nereis diversicolor protein. This domain is also found in Repair of iron centres (Ric) proteins.
This group of proteins includes the Escherichia coli proteins YgfB and YecA, and similar proteins mainly found in gammaproteobacteria. The function of these proteins is unknown. The crystal structure is known for the Haemophilus influenzae member YgfB, which folds into seven α-helices arranged into two domains: a four-helix bundle and a three-helix up-down bundle.
BH0479 of Bacillus halodurans is a hypothetical protein which contains a tetratricopeptide repeat (TPR) structural motif. The TPR motif is often involved in mediating protein-protein interactions. This protein is likely to function as a dimer. The first 48 amino acids are not present in the clone construct. This entry represents the tetratricopeptide-like repeats classified as TPR_5 in Pfam.
This family includes UDP-arabinopyranose mutases and putative alpha-1,4-glucan-protein synthase [UDP-forming]. The former catalyses the interconversion of UDP-L-arabinopyranose (UDP-Arap) and UDP-L-arabinofuranose (UDP-Araf), while the latter catalyses the reaction between UDP-glucose and a protein to produce UDP and alpha-D-glucosyl-protein. UDP-arabinopyranose mutases are essential for cell wall establishment and plant development. This entry also includes uncharacterized prokaryotic proteins.
This domain can be found in some uncharacterised hypothetical proteins from archaea and bacteria. Although their biological roles remain unclear, the family members show significant sequence similarity to the dimeric 2-deoxyuridine 5'-triphosphate nucleotidohydrolase (dUTP pyrophosphatase or dUTPase) and NTP-PPase MazG proteins. However, unlike typical tandem-domain MazG proteins, this group of proteins contains a single MazG-like domain.
This entry describes a repeat originally found in proteins from two species of Verrucomicrobia (Chthoniobacter flavus and Verrucomicrobium spinosum) and in four different proteins of Gloeobacter violaceus PCC7421. In the Verrucomicrobial species, the repeat region is followed by a PEP-CTE protein-sorting signal, suggesting an extracellular location. Members of this group of proteins are found in different bacterial taxa.
ApoM is a 25 kDa plasma protein associated with high-density lipoproteins (HDLs). ApoM is important in the formation of pre-β-HDL and also in increasing cholesterol efflux from macrophage foam cells. Lipoproteins consist of lipids solubilized by apolipoproteins. ApoM lacks an external amphipathic motif and is uniquely secreted to plasma without cleavage of its terminal signal peptide.
Electron transfer flavoprotein-ubiquinone oxidoreductase
Type:
Family
Description:
Electron-transfer flavoprotein-ubiquinone oxidoreductase (ETF-QO) in the inner mitochondrial membrane accepts electrons from electron-transfer flavoprotein, which is located in the mitochondrial matrix, and reduces ubiquinone in the mitochondrial membrane. The two redox centres in the protein, FAD and a [4Fe4S] cluster, are present in a 64 kDa monomer. This family also includes probable electron transfer flavoprotein-ubiquinone oxidoreductases from bacteria.
Proteins in this entry are identified as putative alpha-1,2-mannosidases based on the characterisation of the protein from Microbacterium sp. M-90. Most members of this family appear to have signal sequences and may, like the M-90 protein, be secreted. Some of the proteins from the dental pathogen Porphyromonas gingivalis have been described as immunoreactive with periodontitis patient serum.
Insertion elements are mobile elements in DNA, usually encoding proteins required for transposition, for example transposases. Protein InsA is absolutely required for transposition of insertion element 1. This entry represents a short zinc-binding domain found in IS1 InsA family proteins. It is found at the N terminus of the protein and may be a DNA-binding domain.
Calponin is a smooth muscle-specific, actin-, tropomyosin- and calmodulin-binding protein believed to be involved in regulation or modulation of contraction; interaction of the protein with actin inhibits actomyosin MgATPase activity. Multiple isoforms are found in smooth muscle. Calponin is a basic protein of ~34 kDa. The protein contains three repeats of a well-conserved 26-residue domain.
This putative protein sorting/processing domain occurs about ten times per genome in members of the Thaumarchaeota. Its putative handling protein, a member of the archaeosortase/exosortase protein family, is exceptional in having a Ser rather than Cys at the putative active site. The highly conserved motif resembles that of the PEF-CTERM protein sorting domain family, but membership does not overlap.
This entry describes a domain of about 40 residues with an invariant TQ dipeptide in an almost invariant TQxA[VI]W motif. This domain occurs in surface-expressed proteins of Gram-positive bacteria, many of which are anchored by LPXTG-containing sortase target domains. Proteins in this entry may also have the Cna protein B-type domain and fibronectin-binding protein signal sequence.
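As a rough illustration, the TQxA[VI]W motif described above can be written as a regular expression and scanned for in a protein sequence. This is only a sketch: the function name `find_tqxa_motifs` and the example sequence are hypothetical, and because the motif is described as almost (not strictly) invariant, an exact-match scan may miss genuine members of this entry.

```python
import re

# TQxA[VI]W: invariant TQ dipeptide, any residue, A, then V or I, then W.
TQXA_MOTIF = re.compile(r"TQ.A[VI]W")


def find_tqxa_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each exact motif hit."""
    return [(m.start(), m.group()) for m in TQXA_MOTIF.finditer(sequence.upper())]


# Hypothetical sequence fragment, made up for illustration only.
example = "MKTQYAVWLPETG"
hits = find_tqxa_motifs(example)
```

A hit only indicates that the short motif is present; it does not by itself establish that the surrounding ~40 residues form the domain.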
The RuvB protein makes up part of the RuvABC resolvasome, which catalyses the resolution of Holliday junctions that arise during genetic recombination and DNA repair. Branch migration is catalysed by the RuvB protein, which is targeted to the Holliday junction by the structure-specific RuvA protein. This entry represents the N-terminal domain of the protein.
This entry includes the N-terminal ubiquitin-like domain from proteins such as NEDD8 ultimate buster 1 (NUB1). NUB1 is an adaptor protein which negatively regulates the levels of the ubiquitin-like protein Nedd8, as well as of neddylated proteins, through proteasomal degradation. It has been shown to be regulated by Mdm2 (an E3 ubiquitin ligase) through ubiquitination on its lysine 159.
This group of proteins includes thioredoxins, glutaredoxins, protein-disulphide isomerases, amongst others, some of which have several such domains. The sequence of proteins in this group at the redox-active disulphide site, CPYC, matches glutaredoxins rather than thioredoxins, although overall the sequence seems closer to thioredoxins. Proteins may be involved in a ribonucleotide-reducing system component distinct from thioredoxin or glutaredoxin.
Protein ubiquitination is a reversible posttranslational modification which affects a large number of cellular processes including protein degradation, trafficking, cell signaling and the DNA damage response. Dedicated deubiquitinases exist which hydrolyze the isopeptide bonds. Ubiquitin-specific proteases (USPs) are the largest family of deubiquitinating enzymes. USP domains consist of a common conserved catalytic core which is interspersed at five points with insertions, some of which are as large as the catalytic domain itself. The insertions can fold into independent domains that can be involved in the regulation of deubiquitinase activity. As commonly found in signaling proteins, many USP deubiquitinases have a modular architecture, and contain not only a catalytic domain but also additional protein-protein interaction and localization domains. Most USP domains cleave the isopeptide linkage between two ubiquitin molecules, and hence contain (at least) two ubiquitin-binding sites: one for the distal ubiquitin, the C terminus of which is linked to the Lys residue on the proximal ubiquitin, and a second, proximal binding site. The USP domain forms the peptidase family C19. The USP catalytic core can be divided into six conserved boxes that are present in all USP domains. Box 1 contains the catalytic Cys residue, box 5 contains the catalytic His, and box 6 contains the catalytic Asp/Asn residue. All boxes show several additional conserved features and residues. Boxes 3 and 4 each contain a Cys-X-X-Cys motif; together these have been shown to constitute a functional zinc-binding motif. Potentially, zinc binding facilitates folding of the USP core, helping the interaction of sequence motifs that lie a few hundred residues apart. USP domains share a common conserved fold. The USP domain resembles an open hand containing Thumb, Palm and Fingers subdomains. The catalytic triad resides between the Thumb (Cys) and Palm (His/Asp) subdomains. This entry represents two conserved sites for the USP domain. The first one is around the catalytic cysteine in box 1, and the second around the catalytic histidine in box 5.
Members of this group are poly(A) polymerases (polynucleotide adenylyltransferases, PAP). In eukaryotes, polyadenylation of pre-mRNA plays an essential role in the initiation step of protein synthesis, as well as in the export and stability of mRNAs. Poly(A) polymerase, the central enzyme of the polyadenylation machinery, is a template-independent RNA polymerase that specifically incorporates ATP at the 3' end of mRNA. The catalytic domain of poly(A) polymerase shares substantial structural homology with other nucleotidyl transferases such as DNA polymerase beta and kanamycin transferase. The three invariant aspartates of the catalytic triad ligate two of the three active site metals. One of these metals also contacts the adenine ring. Other conserved, catalytically important residues contact the nucleotide. These contacts, taken together with metal coordination of the adenine base, provide a structural basis for ATP selection by poly(A) polymerase. The central domain of poly(A) polymerase shares structural similarity with the allosteric activity domain of ribonucleotide reductase R1, which comprises a four-helix bundle and a three-stranded mixed β-sheet. Even though the two enzymes bind ATP, the ATP-recognition motifs are different. The C-terminal domain is predicted to be an RNA-binding domain because it folds into a compact domain reminiscent of the RNA-recognition motif fold. The C-terminal region beyond the predicted RNA-binding domain is only conserved in vertebrates and is dispensable for catalytic activity in vitro. The extended C-terminal domain of vertebrate PAPs is rich in serines and threonines, and enzyme activity can be downregulated by phosphorylation at multiple sites. The extreme C terminus of PAP is also the target for another type of regulation. The U1A protein, a component of the U1 snRNP which functions in 5' splice site recognition, is known to inhibit polyadenylation of its own mRNA by binding to PAP. The C terminus of PAP is also involved in protein-protein interactions with the splicing factor U2AF65 and the snRNP protein U1-70K.
A carbohydrate-binding module (CBM) is defined as a contiguous amino acid sequence within a carbohydrate-active enzyme with a discrete fold having carbohydrate-binding activity. A few exceptions are CBMs in cellulosomal scaffolding proteins and rare instances of independent putative CBMs. The requirement of CBMs existing as modules within larger enzymes sets this class of carbohydrate-binding protein apart from other non-catalytic sugar-binding proteins such as lectins and sugar transport proteins. CBMs were previously classified as cellulose-binding domains (CBDs) based on the initial discovery of several modules that bound cellulose. However, additional modules in carbohydrate-active enzymes are continually being found that bind carbohydrates other than cellulose yet otherwise meet the CBM criteria, hence the need to reclassify these polypeptides using more inclusive terminology. The previous classification of cellulose-binding domains was based on amino acid similarity. Groupings of CBDs were called "Types" and numbered with Roman numerals (e.g. Type I or Type II CBDs). In keeping with the glycoside hydrolase classification, these groupings are now called families and numbered with Arabic numerals. Families 1 to 13 are the same as Types I to XIII. This entry represents the family previously known as cellulose-binding domain family VI (CBD VI). CBM6 binds to amorphous cellulose, xylan, mixed beta-(1,3)(1,4)-glucan and beta-1,3-glucan. CBM6 adopts a classic lectin-like β-jelly roll fold, predominantly consisting of five antiparallel β-strands on one face and four antiparallel β-strands on the other face. It contains two potential ligand-binding sites, named cleft A and cleft B. These clefts include aromatic residues which are probably involved in substrate binding. Cleft B is located on the concave surface of one β-sheet, and cleft A on one edge of the protein between the loops that connect the inner and outer β-sheets of the jelly roll fold. The multiple binding clefts confer the extensive range of specificities displayed by the domain.
Annexins are a group of calcium-binding proteins that associate reversibly with membranes. They bind to phospholipid bilayers in the presence of micromolar free calcium concentration. The binding is specific for calcium and for acidic phospholipids. Annexins have been claimed to be involved in cytoskeletal interactions, phospholipase inhibition, intracellular signaling, anticoagulation, and membrane fusion. Annexins are widely distributed among eukaryotes but largely absent in prokaryotes and yeast. They are classified according to the evolutionary divisions of the eukaryotes into five families: A (ANXA, vertebrates, including humans), B (ANXB, invertebrates), C (ANXC, fungi), D (ANXD, true plants), E (ANXE, protists). Each of these proteins consists of a unique N-terminal domain followed by four or eight copies (in annexin A6) of a conserved segment of approximately 70 residues. The tertiary structure of annexins is evolutionarily conserved; a single molecule resembles a slightly curved disk with the calcium and phospholipid-binding sites located on the more convex surface and the more concave surface facing the cytoplasm. Each single annexin repeat (sometimes known as an 'endonexin fold') is composed of five α-helices (A-E). Four of them (A, B, D and E) are arranged in parallel and form a tightly packed helix-loop-helix bundle. In contrast, helix C is almost perpendicular and covers the remaining four on the surface. Each of the repeats has the potential to have a type II Ca2+-binding bipartite motif, located on two different α-helices (GxGT-(38-40 residues)-D/E), but typically some of them are non-functional. The core of the helix bundle is composed largely of hydrophobic residues, while hydrophilic residues are exposed on the surface of the protein and between the repeats. The N-terminal domain, which varies in length, amino acid composition, and determinants of hydrophobicity, plays an important role in mediating the interaction of annexins with other intracellular protein partners, such as cytoplasmic proteins of the S100 family. This region spans positions 9 to 61 of the repeat and includes the only perfectly conserved residue (an arginine in position 22).
Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation, which is usually zinc but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role.
Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases. This group of metallopeptidases belongs to MEROPS peptidase family M24 (clan MG), subfamilies M24A and M24B. Methionine aminopeptidase (MAP) is responsible for the removal of the amino-terminal (initiator) methionine from nascent eukaryotic cytosolic and cytoplasmic prokaryotic proteins if the penultimate amino acid is small and uncharged. All MAPs studied to date are monomeric proteins that require cobalt ions for activity.
Two subfamilies of MAP enzymes are known to exist. While evolutionarily related, they share only a limited amount of sequence similarity, mostly clustered around the residues shown to be involved in cobalt binding. The first subfamily consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second is made up of archaeal MAP and eukaryotic MAP-2. The second subfamily also includes proteins which do not seem to be MAPs but are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein.
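The difference between the loose HEXXH motif and the more stringent 'abXHEbbHbc' definition given above can be sketched as two regular expressions. The residue classes used here are assumptions made for illustration, not part of the source: 'b' (uncharged) is taken as any residue other than D, E, K, R and H, 'c' (hydrophobic) as A, V, L, I, M, F, W or Y, proline is excluded from both classes, and 'a' is left unconstrained because the text says only that it is most often valine or threonine. The function name and the test fragments are hypothetical.

```python
import re

# Assumed residue classes (illustrative choices, not defined in the source text).
UNCHARGED = "ACFGILMNQSTVWY"   # 'b': excludes D, E, K, R, H and proline
HYDROPHOBIC = "AVLIMFWY"       # 'c'

LOOSE_HEXXH = re.compile(r"HE..H")
# a  b  X  H  E  b  b  H  b  c
STRINGENT_HEXXH = re.compile(
    rf"[A-Z][{UNCHARGED}][A-Z]HE[{UNCHARGED}][{UNCHARGED}]H[{UNCHARGED}][{HYDROPHOBIC}]"
)


def classify_hexxh(sequence: str) -> str:
    """Report whether a sequence has a stringent, loose-only, or no HEXXH motif."""
    seq = sequence.upper()
    if STRINGENT_HEXXH.search(seq):
        return "stringent"
    if LOOSE_HEXXH.search(seq):
        return "loose-only"
    return "none"


# Hypothetical fragments, made up for illustration only.
stringent_example = classify_hexxh("GDVVAHELTHAVTDY")
loose_example = classify_hexxh("AAAHEKKHAA")
```

Under these assumptions the first fragment satisfies the stringent pattern, while the second contains HEXXH but has charged residues at the 'b' positions.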
Zinc finger, DNA-directed DNA polymerase, family B, alpha
Type:
Domain
Description:
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. The DNA polymerase alpha zinc finger domain adopts an α-helix-like structure, followed by three turns, all of which involve proline. The resulting motif is a helix-turn-helix motif, in contrast to other zinc finger domains, which show an anti-parallel sheet and helix conformation. Zinc binding occurs through four cysteine residues positioned to bind the metal centre in a tetrahedral coordination geometry. The function of this domain is uncertain: it has been proposed that the zinc finger motif may be an essential part of the DNA-binding domain. It is involved in providing a structural platform for interactions with both the oligonucleotide/oligosaccharide-binding (OB) and phosphoesterase domains of the B subunit.
Molecular chaperones are a diverse family of proteins that function to protect proteins in the intracellular milieu from irreversible aggregation during synthesis and in times of cellular stress. The bacterial molecular chaperone DnaK is an enzyme that couples cycles of ATP binding, hydrolysis, and ADP release by an N-terminal ATP-hydrolysing domain to cycles of sequestration and release of unfolded proteins by a C-terminal substrate-binding domain. Dimeric GrpE is the co-chaperone for DnaK, and acts as a nucleotide exchange factor, stimulating the rate of ADP release 5000-fold. DnaK is itself a weak ATPase; ATP hydrolysis by DnaK is stimulated by its interaction with another co-chaperone, DnaJ. Thus the co-chaperones DnaJ and GrpE are capable of tightly regulating the nucleotide-bound and substrate-bound state of DnaK in ways that are necessary for the normal housekeeping functions and stress-related functions of the DnaK molecular chaperone cycle. Besides stimulating the ATPase activity of DnaK through its J-domain, DnaJ also associates with unfolded polypeptide chains and prevents their aggregation. Thus, DnaK and DnaJ may bind to one and the same polypeptide chain to form a ternary complex. The formation of a ternary complex may result in cis-interaction of the J-domain of DnaJ with the ATPase domain of DnaK. An unfolded polypeptide may enter the chaperone cycle by associating first either with ATP-liganded DnaK or with DnaJ. DnaK interacts with both the backbone and side chains of a peptide substrate; it thus shows binding polarity and admits only L-peptide segments. In contrast, DnaJ has been shown to bind both L- and D-peptides and is assumed to interact only with the side chains of the substrate. This domain consists of the C-terminal region of the DnaJ protein. The function of this domain is unknown. It is found associated with
and
.
DnaJ is a chaperone associated with the Hsp70 heat-shock system involved in protein folding and renaturation after stress. The two C-terminal domains, CTDI and CTDII (this entry), are necessary for maintaining the J-domains in their specific relative positions.
Peptidase S11, D-Ala-D-Ala carboxypeptidase A, C-terminal
Type:
Domain
Description:
Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes [
]. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases [].Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base [
]. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [,
].This entry contains proteins that are annotated as penicillin-binding protein 5 and 6. These belong to MEROPS peptidase family S11 (D-Ala-D-Ala carboxypeptidase A family, clan SE). Penicillin-binding protein 5 expressed by Escherichia coli functions as a D-alanyl-D-alanine carboxypeptidase. It is composed of two domains that are oriented at approximately right angles to each other. The N-terminal domain (
) is the catalytic domain. The C-terminal domain, this entry, is organised into a sandwich of two anti-parallel β-sheets, and has a relatively hydrophobic surface as compared to the N-terminal domain. Its precise function is unknown; it may mediate interactions with other cell wall-synthesising enzymes, thus allowing the protein to be recruited to areas of active cell wall synthesis. It may also function as a linker domain that positions the active site in the catalytic domain closer to the peptidoglycan layer, to allow it to interact with cell wall peptides [
].
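The clan-specific ordering of the catalytic triad described above (HDS in clan PA, DHS in clan SB, SDH in clan SC) can be sketched as a simple lookup. This is a minimal illustration rather than a classification method: the function names are hypothetical, the residue positions in the usage lines are illustrative numbering only, and a matching order is merely consistent with a clan, not evidence of membership.

```python
# Linear order of the catalytic triad in each clan, as given in the text.
TRIAD_ORDER_BY_CLAN = {"PA": "HDS", "SB": "DHS", "SC": "SDH"}


def triad_order(positions):
    """Return the triad residues (one-letter codes) sorted by sequence position.

    `positions` maps a residue letter ("H", "D" or "S") to its position.
    """
    if set(positions) != {"H", "D", "S"}:
        raise ValueError("expected exactly one position each for H, D and S")
    return "".join(sorted(positions, key=positions.get))


def clans_with_order(positions):
    """Return the clans whose stated triad order matches the observed order."""
    order = triad_order(positions)
    return [clan for clan, expected in TRIAD_ORDER_BY_CLAN.items() if expected == order]


# Illustrative residue numbering, chosen only to show each ordering.
print(clans_with_order({"H": 57, "D": 102, "S": 195}))  # ['PA']
print(clans_with_order({"D": 32, "H": 64, "S": 221}))   # ['SB']
```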
Neurotransmitter transport systems are integral to the release, re-uptake and recycling of neurotransmitters at synapses. High affinity transport proteins found in the plasma membrane of presynaptic nerve terminals and glial cells are responsible for the removal of released transmitters from the extracellular space, thereby terminating their actions [
]. Plasma membrane neurotransmitter transporters fall into two structurally and mechanistically distinct families. The majority of the transporters constitute an extensive family of homologous proteins that derive energy from the co-transport of Na+ and Cl-, in order to transport neurotransmitter molecules into the cell against their concentration gradient. The family has a common structure of 12 presumed transmembrane helices and includes carriers for gamma-aminobutyric acid (GABA), noradrenaline/adrenaline, dopamine, serotonin, proline, glycine, choline, betaine and taurine. They are structurally distinct from the second more-restricted family of plasma membrane transporters, which are responsible for excitatory amino acid transport. The latter couple glutamate and aspartate uptake to the cotransport of Na+ and the counter-transport of K+, with no apparent dependence on Cl- [
]. In addition, both of these transporter families are distinct from the vesicular neurotransmitter transporters [,
]. The human noradrenaline transporter is the primary target for widely-used tricyclic antidepressant drugs, such as desipramine. Cloning studies have revealed a single noradrenaline transporter isoform. The cDNA sequence predicts a protein of ~617 amino acids, with 12-13 highly hydrophobic regions, as would be expected for a member of the Na+- and Cl--coupled neurotransmitter superfamily []. In humans, this protein has been related to generalized anxiety disorder, bipolar disorder and schizophrenia [,
,
]. Related noradrenaline transporter species have been identified in both Drosophila and Rana catesbeiana (bullfrog). In mammalian species, mRNA for the transporter is localised to the brainstem and the adrenal glands. Chromosome mapping has placed the encoding gene on chromosome 16q12.2 in humans []. Recently, a bovine noradrenaline transporter variant has been identified that shows altered targeting to the plasma membrane. This has implicated the cytoplasmic-facing C terminus in the intracellular trafficking of this protein [
].
[NiFe] hydrogenases function in H2 metabolism in a variety of microorganisms, enabling them to use H2 as a source of reducing equivalents under aerobic and anaerobic conditions. [NiFe] hydrogenases consist of two subunits, hydrogenase large and hydrogenase small. The large subunit contains the binuclear [NiFe] active site, while the small subunit binds at least one [4Fe-4S] cluster []. Energy-converting [NiFe] hydrogenases (or [NiFe]-hydrogenase-3-type) form a distinct group within the [NiFe] hydrogenase family [,
,
]. Members of this subgroup include: hydrogenase 3 and 4 (Hyc and Hyf) from Escherichia coli; CO-induced hydrogenase (Coo) from Rhodospirillum rubrum; Mbh hydrogenase from Pyrococcus furiosus; Eha and Ehb hydrogenases from Methanothermobacter species; and Ech hydrogenase from Methanosarcina barkeri. Energy-converting [NiFe] hydrogenases are membrane-bound enzymes with a six-subunit core: the large and small hydrogenase subunits, plus two hydrophilic proteins and two integral membrane proteins. Their large and small subunits show little sequence similarity to other [NiFe] hydrogenases, except for key conserved residues coordinating the active site and [FeS] cluster. They nonetheless show considerable sequence similarity to the six-subunit, energy-conserving NADH:quinone oxidoreductases (complex I), which are present in cytoplasmic membranes of many bacteria and in inner mitochondrial membranes, although the reactions they catalyse differ significantly from those of complex I. Energy-converting [NiFe] hydrogenases function as ion pumps. Eha and Ehb hydrogenases contain extra subunits in addition to those shared by other energy-converting [NiFe] hydrogenases (or [NiFe]-hydrogenase-3-type). Eha contains a 6[4Fe-4S] polyferredoxin, a 10[4Fe-4S] polyferredoxin, ten other predicted integral membrane proteins (EhaA
, EhaB
, EhaC
, EhaD
, EhaE
, EhaF
, EhaG
, EhaI
, EhaK
, EhaL
and
) and four hydrophilic subunits (EhaM, EhaR, EhaS, EhaT) [
,
]. The ten predicted integral membrane proteins are absent from Ech, Coo, Hyc and Hyf complexes, which may have simpler membrane components than Eha. Eha and Ehb catalyse the reduction of low-potential redox carriers (e.g. ferredoxins or polyferredoxins), which then might function as electron donors to oxidoreductases. This entry represents proteins that are predicted to be the hydrophilic EhaR subunits of Eha-type energy-converting [NiFe] hydrogenase complexes.
The serum paraoxonases/arylesterases are enzymes that catalyse the hydrolysis of the toxic metabolites of a variety of organophosphorus insecticides. The enzymes hydrolyse a broad spectrum of organophosphate substrates, including paraoxon and a number of aromatic carboxylic acid esters (e.g., phenyl acetate), and hence confer resistance to organophosphate toxicity []. Mammals have 3 distinct paraoxonase types, termed PON1-3 [,
]. In mice and humans, the PON genes are found on the same chromosome in close proximity. PON activity has been found in a variety of tissues, with highest levels in liver and serum; the source of serum PON is thought to be the liver. Unlike mammals, fish and avian species lack paraoxonase activity.
Human and rabbit PONs appear to have two distinct Ca2+ binding sites, one required for stability and one required for catalytic activity. The Ca2+ dependency of PONs suggests a mechanism of hydrolysis where Ca2+ acts as the electrophilic catalyst, like that proposed for phospholipase A2. The paraoxonase enzymes, PON1 and PON3, are high density lipoprotein (HDL)-associated proteins capable of preventing oxidative modification of low density lipoproteins (LDL) []. Although PON2 has oxidative properties, the enzyme does not associate with HDL.
Within a given species, PON1, PON2 and PON3 share ~60% amino acid sequence identity, whereas between mammalian species particular PONs (1, 2 or 3) share 79-90% identity at the amino acid level. Human PON1 and PON3 share numerous conserved phosphorylation and N-glycosylation sites; however, it is not known whether the PON proteins are modified at these sites, or whether modification at these sites is required for activity in vivo [
]. Rabbit and human serum PON1 also hydrolyse a variety of lactones and cyclic carbonate esters, including naturally occurring lactones and pharmacological agents []. Humans have 2 common PON1 allozymes, determined by the presence of either arginine or glutamine at position 191. The A-type allozyme (glutamine at position 191) causes low paraoxonase activity []; this polymorphism is associated with variations in cholesterol and lipoprotein levels.
[NiFe] hydrogenases function in H2 metabolism in a variety of microorganisms, enabling them to use H2 as a source of reducing equivalents under aerobic and anaerobic conditions. [NiFe] hydrogenases consist of two subunits, hydrogenase large and hydrogenase small. The large subunit contains the binuclear [NiFe] active site, while the small subunit binds at least one [4Fe-4S] cluster [
]. Energy-converting [NiFe] hydrogenases (or [NiFe]-hydrogenase-3-type) form a distinct group within the [NiFe] hydrogenase family [,
,
]. Members of this subgroup include: hydrogenase 3 and 4 (Hyc and Hyf) from Escherichia coli; CO-induced hydrogenase (Coo) from Rhodospirillum rubrum; Mbh hydrogenase from Pyrococcus furiosus; Eha and Ehb hydrogenases from Methanothermobacter species; and Ech hydrogenase from Methanosarcina barkeri. Energy-converting [NiFe] hydrogenases are membrane-bound enzymes with a six-subunit core: the large and small hydrogenase subunits, plus two hydrophilic proteins and two integral membrane proteins. Their large and small subunits show little sequence similarity to other [NiFe] hydrogenases, except for key conserved residues coordinating the active site and [FeS] cluster. They nonetheless show considerable sequence similarity to the six-subunit, energy-conserving NADH:quinone oxidoreductases (complex I), which are present in cytoplasmic membranes of many bacteria and in inner mitochondrial membranes, although the reactions they catalyse differ significantly from those of complex I. Energy-converting [NiFe] hydrogenases function as ion pumps. Eha and Ehb hydrogenases contain extra subunits in addition to those shared by other energy-converting [NiFe] hydrogenases (or [NiFe]-hydrogenase-3-type). Eha contains a 6[4Fe-4S] polyferredoxin, a 10[4Fe-4S] polyferredoxin, ten other predicted integral membrane proteins (EhaA, EhaB
, EhaC
, EhaD
, EhaE
, EhaF
, EhaG
, EhaI
, EhaK
, EhaL
and
) and four hydrophilic subunits (EhaM, EhaR, EhaS, EhaT) [
,
]. The ten predicted integral membrane proteins are absent from Ech, Coo, Hyc and Hyf complexes, which may have simpler membrane components than Eha. Eha and Ehb catalyse the reduction of low-potential redox carriers (e.g. ferredoxins or polyferredoxins), which then might function as electron donors to oxidoreductases. This entry represents proteins that are predicted to be the hydrophilic EhaM subunits of Eha-type energy-converting [NiFe] hydrogenase complexes.
Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation, which is usually zinc but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role.
Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site []. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases []. The ADAMTSs (a disintegrin and metalloproteinase domain with thrombospondin type-1 modules) are a family of zinc-dependent metalloproteinases that play important roles in a variety of normal and pathological conditions. These enzymes show a complex domain organisation including signal sequence, propeptide, metalloproteinase domain (see
), disintegrin-like
domain (see ), central TS-1 motif (see
), cysteine-rich
region, and a variable number of TS-like repeats at the C-terminal region. The GON domain is an approximately 200-residue module, whose presence is the hallmark of a subfamily of structurally and evolutionarily related ADAMTSs, called GON-ADAMTSs. The GON domain is characterised by the presence of several conserved cysteine residues and is likely to be globular [], []. Some proteins known to contain a GON domain are listed below:
Mammalian ADAMTS-9
Mammalian ADAMTS-20
Caenorhabditis elegans gon-1, a protease required for gonadal morphogenesis
Proteins containing the GON domain belong to MEROPS peptidase subfamily M12B (adamalysin, clan MA).
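The HEXXH motif and its more stringent 'abXHEbbHbc' form, described earlier in this entry, can be sketched as regular expressions. This is a minimal illustration under stated assumptions, not a validated metalloprotease detector: the residue classes chosen for 'b' (uncharged) and 'c' (hydrophobic) are one possible reading of the text, proline is excluded from every non-fixed position as an interpretation of "never found in this site", 'a' is left unconstrained because it is only "most often" valine or threonine, and the test peptides are hypothetical.

```python
import re

# Assumed residue classes (one-letter codes); these are interpretations.
UNCHARGED = "ACFGILMNQSTVWY"   # 'b': excludes D, E, K, R, H and proline
HYDROPHOBIC = "AFILMVWY"       # 'c': excludes proline
NOT_PRO = "[^P]"               # 'a' and 'X': any residue except proline

# Loose motif: His-Glu-X-X-His.
HEXXH = re.compile(r"HE..H")

# Stringent motif: a b X H E b b H b c.
STRICT = re.compile(
    f"{NOT_PRO}[{UNCHARGED}]{NOT_PRO}HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]"
)


def find_motif(pattern, sequence):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# Hypothetical peptides used only to exercise the patterns.
print(find_motif(HEXXH, "MKTVVAHELTHAVGG"))   # [(6, 'HELTH')]
print(find_motif(STRICT, "MKTVVAHELTHAVGG"))  # [(3, 'VVAHELTHAV')]
print(find_motif(STRICT, "AAHEDKHAA"))        # []
```

The second peptide contains HEXXH but fails the stringent form because charged residues occupy the 'b' positions, which shows how the longer pattern narrows the set of hits.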
This is a GIY-YIG domain mainly found in phage proteins, such as the putative endonuclease SegE from Bacteriophage T4. This domain is also found in some uncharacterised bacterial sequences. Proteins containing this domain are approximately 145-300 amino acids in length. Nucleases of the GIY-YIG family are involved in many cellular processes, including DNA repair and recombination, transfer of mobile genetic elements, and restriction of incoming foreign DNA. The GIY-YIG superfamily groups together nucleases characterised by the presence of a domain of typically ~100 amino acids, with two short motifs "GIY" and "YIG" in the N-terminal part, followed by an Arg residue in the centre and a Glu residue in the C-terminal part [
,
,
,
,
]. The GIY-YIG domain forms a compact structural domain, which serves as a scaffold for the coordination of a divalent metal ion required for catalysis of the phosphodiester bond cleavage. The GIY-YIG domain has an α/β-sandwich architecture with a central three-stranded antiparallel β-sheet flanked by three helices. The three-stranded antiparallel β-sheet contains the GIY-YIG sequence elements. The most conserved and putative catalytic residues are located on a shallow, concave surface and include a metal coordination site [
,
,
,
]. The GIY-YIG domain has been implicated in a variety of cellular processes involving DNA cleavage, from self-propagation with or without introns, to restriction of foreign DNA, to DNA repair and maintenance of genome stability [
]. Some proteins known to contain a GIY-YIG domain include: eukaryotic Slx-1 proteins, which are involved in the maintenance of the rDNA copy number and have a C-terminal RING finger Zn-binding domain; mammalian ankyrin repeat and LEM domain-containing protein 1 (ANKLE1); bacterial and archaeal UvrC subunits of (A)BC excinucleases, which remove damaged nucleotides by incising the damaged strand on both sides of the lesion; phage T4 endonucleases SegA to E, probably involved in the movement of the endonuclease-encoding DNA; and phage T4 intron-associated endonuclease 1 (I-TevI), specific to the thymidylate synthase (td) gene splice junction and involved in intron homing.
This entry represents the C-terminal transmembrane domain of Frizzled-4 from arthropods and Mom-5 from nematodes. Frizzleds are seven transmembrane-spanning proteins that constitute an unconventional class of G protein-coupled receptors [
]. They have important regulatory roles during embryonic development [,
]. Frizzleds expose their large N terminus on the extracellular side. The N-terminal, extracellular cysteine-rich domain (CRD) has been implicated as the Wnt binding domain and its structure has been solved [
]. The cysteine-rich domain of Frizzled (Fz) is shared with several receptor tyrosine kinases that have roles in development, including the muscle-specific receptor tyrosine kinase (MuSK), the neuronal specific kinase (NSK2), and ROR1 and ROR2. The cytoplasmic side of many Fz proteins has been shown to interact with the PDZ domains of PSD-95 family members and is thought to have a role in the assembly of signalling complexes. The conserved cytoplasmic motif of Fz, Lys-Thr-X-X-X-Trp, is required for activation of the beta-catenin pathway, and for membrane localisation and phosphorylation of Dsh. In Drosophila melanogaster, the frizzled locus is involved in planar cell polarity, which is the coordination of the cytoskeleton of epidermal cells to produce a parallel array of cuticular hairs and bristles [
,
]. In the wild-type wing, all hairs point towards the distal tip [], whereas in Fz mutants, the orientation of individual hairs with respect both to their neighbours and to the organism as a whole is altered. In the developing wing, Fz function is required for cells to respond to the extracellular polarity signal as well as the proximal-distal transmission of an intracellular polarity signal. In Caenorhabditis elegans, protein mom-5 is the equivalent of frizzled [
,
]. Three main signalling pathways are activated by agonist-activated Frizzled proteins: the Fz/beta-catenin pathway, the Fz/Ca2+ pathway and the Fz/PCP (planar cell polarity) pathway [
]. The Wnt/beta-catenin pathway is the best studied signalling pathway involving Fz receptors. In the Wnt/beta-catenin pathway the first downstream cytoplasmic components activated by Fz signalling include Dishevelled (Dsh) and/or its regulatory kinases.
Sea anemones are a rich source of lethal pore-forming peptides and proteins, known collectively as cytolysins or actinoporins. Cytolysins fall into several groups based on their structure and function, and share conserved regions: a surface-exposed lipid/carbohydrate-binding module that mediates non-specific binding of the toxin to cell membranes, allowing a wide range of species to be targeted, and protein-protein binding surfaces that contribute to the oligomerization of membrane-bound actinoporin monomers [
,
]. This entry represents the most numerous group, the 20 kDa highly basic peptides. These cytolysins form cation-selective pores in sphingomyelin-containing membranes. Examples include equinatoxins (from Actinia equina), sticholysins (from Stichodactyla helianthus), magnificalysins (from Heteractis magnifica), and tenebrosins (from Actinia tenebrosa), which exhibit pore-forming, haemolytic, cytotoxic, and heart stimulatory activities. This entry also includes related proteins from fish. Cytolysins adopt a stable soluble structure, which undergoes a conformational change when brought in contact with a membrane, leading to an active, membrane-bound form that inserts spontaneously into the membrane. They often oligomerize on the membrane surface before puncturing the lipid bilayer, causing the cell to lyse. The 20 kDa sea anemone cytolysins require a phosphocholine lipid headgroup for binding; however, sphingomyelin is required for the toxin to promote membrane permeability [
]. The crystal structures of equinatoxin II [] and sticholysin II [] both revealed a compact β-sandwich consisting of ten strands in two sheets flanked on each side by two short α-helices, which is a similar topology to osmotin. It is believed that the β-sandwich structure attaches to the membrane, while a three-turn α-helix lying on the surface of the β-sheet may be involved in membrane pore formation, possibly by the penetration of the membrane by the helix. This entry also includes bryoporin from the moss Physcomitrella patens, which shares structural similarity with sea anemone actinoporins. The bryoporin gene is upregulated by various abiotic stresses, most strongly by dehydration. Overexpression of the bryoporin gene significantly heightens drought tolerance in P. patens [
].
Wnt proteins constitute a large family of secreted molecules that are involved in intercellular signalling during development. The name derives from the first 2 members of the family to be discovered: int-1 (mouse) and wingless (Drosophila) [
]. It is now recognised that Wnt signalling controls many cell fate decisions in a variety of different organisms, including mammals []. Wnt signalling has been implicated in tumourigenesis, early mesodermal patterning of the embryo, morphogenesis of the brain and kidneys, regulation of mammary gland proliferation and Alzheimer's disease [,
]. Wnt-mediated signalling is believed to proceed initially through binding to cell surface receptors of the frizzled family; the signal is subsequently transduced through several cytoplasmic components to beta-catenin, which enters the nucleus and activates the transcription of several genes important in development []. Several non-canonical Wnt signalling pathways have also been elucidated that act independently of beta-catenin. Canonical and noncanonical Wnt signalling branches are highly interconnected, and cross-regulate each other []. Members of the Wnt gene family are defined by their sequence similarity to mouse Wnt-1 and Wingless in Drosophila. They encode proteins of ~350-400 residues in length, with orthologues identified in several, mostly vertebrate, species. Very little is known about the structure of Wnts as they are notoriously insoluble, but they share the following features characteristic of secretory proteins: a signal peptide, several potential N-glycosylation sites and 22 conserved cysteines [] that are probably involved in disulphide bonds. The Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are therefore likely to signal over only a few cell diameters. Fifteen major Wnt gene families have been identified in vertebrates, with multiple subtypes within some classes. In humans, 19 Wnt proteins have been identified that share 27% to 83% amino-acid sequence identity and a conserved pattern of 23 or 24 cysteine residues [
]. Wnt genes are highly conserved between vertebrate species sharing overall sequence identity and gene structure, and are slightly less conserved between vertebrates and invertebrates.
Muscarinic acetylcholine receptors are members of the rhodopsin-like G-protein coupled receptor family. They play several important roles; they mediate many of the effects of acetylcholine in the central and peripheral nervous system and modulate a variety of physiological functions, such as airway, eye and intestinal smooth muscle contraction, heart rate and glandular secretions. The receptors have a widespread tissue distribution and are a major drug target in human disease. They may be effective therapeutic targets in Alzheimer's disease, schizophrenia, Parkinson's disease and chronic obstructive pulmonary disease [
,
]. There are five muscarinic acetylcholine receptor subtypes, designated M1-5 [
,
,
,
,
]. The family can be further divided into two broad groups based on their primary coupling to G-proteins. M2 and M4 receptors couple to the pertussis-toxin sensitive Gi proteins, whereas M1, M3 and M5 receptors couple to Gq proteins [,
], which activate phospholipase C. The different subtypes can also couple to a wide range of diverse signalling pathways, some of which are G protein-independent [,
,
]. All subtypes seem to serve as autoreceptors [
], and knockout mice reveal the important neuromodulatory role played by this receptor family [,
,
]. Muscarinic acetylcholine receptor M2 is highly expressed in the heart and has a profound role in the control of myocyte contraction [
,
,
]. The release of acetylcholine from vagal parasympathetic neurones reduces the heart beating frequency by acting almost exclusively at M2 muscarinic receptors [,
,
]. The M2 receptor is found at much lower levels in the CNS, where it has limited distribution []. It is co-expressed with the M3 muscarinic receptors in smooth muscle [] and it appears to play a much smaller role in the smooth muscle contractile response []. M2 receptors are also expressed in the thermo-regulatory centres of the hypothalamus and are likely to be involved in the regulation of body temperature [].
Muscarinic acetylcholine receptors are members of the rhodopsin-like G-protein coupled receptor family. They play several important roles; they mediate many of the effects of acetylcholine in the central and peripheral nervous system and modulate a variety of physiological functions, such as airway, eye and intestinal smooth muscle contraction, heart rate and glandular secretions. The receptors have a widespread tissue distribution and are a major drug target in human disease. They may be effective therapeutic targets in Alzheimer's disease, schizophrenia, Parkinson's disease and chronic obstructive pulmonary disease [
,
]. There are five muscarinic acetylcholine receptor subtypes, designated M1-5 [
,
,
,
,
]. The family can be further divided into two broad groups based on their primary coupling to G-proteins. M2 and M4 receptors couple to the pertussis-toxin sensitive Gi proteins, whereas M1, M3 and M5 receptors couple to Gq proteins [,
], which activate phospholipase C. The different subtypes can also couple to a wide range of diverse signalling pathways, some of which are G protein-independent [,
,
]. All subtypes seem to serve as autoreceptors [
], and knockout mice reveal the important neuromodulatory role played by this receptor family [,
,
]. Muscarinic acetylcholine M3 receptors are located in smooth muscles, endocrine glands, exocrine glands, lungs, pancreas and the brain [
,
,
,
,
]. They are found in high levels in neuronal cells of the CNS, their distribution largely overlapping with that of M1 and M4 receptor subtypes. The primary role of M3 muscarinic receptors is contraction of smooth muscle, particularly of airway, ileum, iris and bladder [,
,
]. The receptors are also expressed on pancreatic beta-cells and in areas of the brain where they influence insulin secretion [,
,
], indicating that M3 receptors might be an important target for understanding the mechanisms of type 2 diabetes mellitus. M3 receptors are also involved in exocrine secretion, particularly of saliva [,
].
Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane [
,
,
]. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection []. The low molecular weight transmembrane protein PsbX found in PSII is associated with the oxygen-evolving complex. Its expression is light-regulated. PsbX appears to be involved in the regulation of the amount of PSII [
], and may be involved in the binding or turnover of quinone molecules at the Qb (PsbA) site []. PsbX can be divided into two subfamilies: this one, which is found only in Prochlorococcus, and a shorter one with a much wider taxonomic distribution. The proteins in this subfamily are currently uncharacterised.
Oxygenic photosynthesis uses two multi-subunit photosystems (I and II) located in the cell membranes of cyanobacteria and in the thylakoid membranes of chloroplasts in plants and algae. Photosystem II (PSII) has a P680 reaction centre containing chlorophyll 'a' that uses light energy to carry out the oxidation (splitting) of water molecules, and to produce ATP via a proton pump. Photosystem I (PSI) has a P700 reaction centre containing chlorophyll that takes the electron and associated hydrogen donated from PSII to reduce NADP+ to NADPH. Both ATP and NADPH are subsequently used in the light-independent reactions to convert carbon dioxide to glucose using the hydrogen atom extracted from water by PSII, releasing oxygen as a by-product. PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane [
,
,
]. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection []. The low molecular weight transmembrane protein PsbX found in PSII is associated with the oxygen-evolving complex. Its expression is light-regulated. PsbX appears to be involved in the regulation of the amount of PSII [
], and may be involved in the binding or turnover of quinone molecules at the Qb (PsbA) site []. PsbX can be divided into two subfamilies: this one and a longer one found in most but not all Prochlorococcus. The best characterised proteins (from Synechocystis and Thermosynechococcus elongatus) are found in this subfamily.
This entry represents the NOPS domain and the C-terminal coiled-coil region of PSF (also known as SFPQ). The C-terminal coiled-coil region functions in mediating DBHS dimerization, while some surface-exposed basic residues within the NOPS domain may be involved in nucleic acid binding [
]. PSF is a member of the DBHS (Drosophila behavior human splicing) family. It participates in a wide range of gene regulatory processes and cellular response pathways. It has been shown to affect the alternative splicing of CD45 and Tau and to regulate the 3' polyadenylation of mRNAs. It is often localised in the paraspeckles and may be involved in the nuclear retention of mRNAs. It is involved in translation and transcription. It can bind directly to DNA double-strand breaks (DSBs) and plays a role in DNA repair. PSF can also be utilized as an essential host factor for viral RNA multiplication and replication [
,
]. In addition to the common DBHS core, which encompasses RRM1 and RRM2, the protein-protein interaction NOPS domain and the coiled-coil domain, PSF features additional domains, such as an RGG motif and a proline-rich region in its N terminus []. Members of the DBHS family are characterised by a core domain arrangement consisting of tandem RNA recognition motifs (RRMs), a conserved intervening sequence referred to as a NONA/ParaSpeckle (NOPS) domain, and a ~100 amino acid coiled-coil domain. They include p54nrb (also known as NONO), PTB-associated splicing factor/splicing factor proline-glutamine rich (PSF or SFPQ) and PSPC1 (paraspeckle protein component 1). They are found in the nucleoplasm and can be triggered by binding to local high concentrations of various nucleic acids to form microscopically visible nuclear bodies, paraspeckles or large complexes such as DNA repair foci. They may also function cytoplasmically and on the cell surface in defined cell types. All three DBHS proteins are conserved throughout vertebrate species, while flies, worms, and yeast express a single DBHS protein [
,
].
This entry represents the 6-Cys domain. The 6-Cysteine (6-Cys) domain is found in Plasmodium proteins that are expressed in all stages of the parasite life cycle in both the vertebrate and mosquito hosts. The domain is of roughly 120 amino acids and contains six positionally conserved cysteines. It might occur in 1-14 copies per protein, either as the sole globular domain or in combination with β-helix-forming hexapeptide repeats, as is the case for sequestrin. It is generally found in tandem pairs of A-type and B-type domains. Previously believed to be exclusive to Plasmodium, the 6-Cys domain also exists in proteins found in all members of the aconoidasidan (hematozoan) clade of Apicomplexa, which unites the haemosporidians (Plasmodium) and piroplasms [
,
,
,
,
]. The 6-Cys domain is a β-sandwich formed by two sheets with a mixture of parallel and antiparallel strands. Three disulfide bonds are present in the 6-Cys domain with C1-C2, C3-C6, and C4-C5 connectivity. C1-C2 and C3-C6 pin together the two sheets of the β-sandwich, whereas C4-C5 links an ancillary loop to the core domain [
,
]. Some Plasmodium proteins known to contain a 6-Cys domain are listed below:
Pfs48/45, involved in male/female gamete fusion in the mosquito midgut. It is predicted to be glycosylphosphatidylinositol (GPI)-anchored to the gamete surface.
Pfs230, involved in male/female gamete fusion in the mosquito midgut. It is a soluble protein that associates with the gamete membrane by binding to Pfs48/45.
Pfs47, involved in male/female gamete fusion in the mosquito midgut. It is found on female gametes.
P52, expressed in the sporozoite. May be required for invasion of hepatocytes and to promote normal development in the liver.
P36, expressed in the sporozoite. May be required for invasion of hepatocytes and to promote normal development in the liver.
Pf41, located on the surface of merozoites and has a signal sequence.
Pf38, located on the surface of merozoites. It has a signal sequence and is GPI-anchored to the merozoite surface.
Pf12, located on the surface of merozoites. It has a signal sequence and is GPI-anchored to the merozoite surface.
Pf92, located on the surface of merozoites. It has a signal sequence and is GPI-anchored to the merozoite surface.
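The disulfide connectivity described for the 6-Cys domain (C1-C2, C3-C6, C4-C5) can be illustrated with a minimal Python sketch. It assumes the input is a single domain sequence containing exactly six cysteines; the function name and the toy sequence are hypothetical and are not taken from any annotated protein.

```python
# Connectivity stated in the entry text, as 1-based cysteine ordinals.
CONNECTIVITY = [(1, 2), (3, 6), (4, 5)]


def disulfide_pairs(sequence):
    """Return 0-based sequence positions of the paired cysteines.

    Assumption: `sequence` is one 6-Cys domain with exactly six cysteines.
    """
    cys = [i for i, aa in enumerate(sequence.upper()) if aa == "C"]
    if len(cys) != 6:
        raise ValueError(f"expected 6 cysteines, found {len(cys)}")
    return [(cys[a - 1], cys[b - 1]) for a, b in CONNECTIVITY]


if __name__ == "__main__":
    toy_domain = "ACACAACACAACAC"  # hypothetical toy sequence
    for left, right in disulfide_pairs(toy_domain):
        print(f"Cys at {left} pairs with Cys at {right}")
```

Real domains would first need to be delimited (for example from a domain-boundary annotation) before the six-cysteine assumption holds.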
Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation, which is usually zinc but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role.
Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site []. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases []. This group of metallopeptidases belongs to MEROPS peptidase family M48 (subfamily M48B). The members of this set of proteins are mostly described as probable protease htpX homologue (
).
HtpX is a zinc-dependent endoprotease member of the membrane-localized proteolytic system in E. coli, which participates in the proteolytic quality control of membrane proteins in conjunction with FtsH, a membrane-bound and ATP-dependent protease. Biochemical characterisation revealed that HtpX undergoes self-degradation upon cell disruption or membrane solubilisation. It can also degrade casein and cleave solubilised membrane proteins, for example, SecY [
]. Expression of HtpX in the plasma membrane is under the control of CpxR, with the metalloproteinase active site of HtpX located on the cytosolic side of the membrane. This suggests a potential role for HtpX in the response to mis-folded proteins [].
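The HEXXH motif and its more stringent 'abXHEbbHbc' form lend themselves to a simple pattern search. The following Python sketch is illustrative only: the residue classes used for 'b' (uncharged, taken here as any residue other than D, E, K or R) and 'c' (hydrophobic, taken here as A, V, L, I, M, F, W or Y) are assumptions, 'a' is left unconstrained because the text only says it is most often valine or threonine, and the test sequence is hypothetical.

```python
import re

# Assumed residue classes (not defined in the entry text).
UNCHARGED = "ACFGHILMNPQSTVWY"   # every residue except D, E, K, R
HYDROPHOBIC = "AVLIMFWY"

# Lookaheads allow overlapping matches to be reported.
HEXXH = re.compile(r"(?=HE..H)")
STRICT = re.compile(
    rf"(?=.[{UNCHARGED}].HE[{UNCHARGED}][{UNCHARGED}]H[{UNCHARGED}][{HYDROPHOBIC}])"
)


def find_hexxh(sequence):
    """0-based start positions of every HEXXH occurrence."""
    return [m.start() for m in HEXXH.finditer(sequence.upper())]


def find_strict(sequence):
    """0-based start positions of every 'abXHEbbHbc' occurrence."""
    return [m.start() for m in STRICT.finditer(sequence.upper())]


if __name__ == "__main__":
    toy = "GGVVAHELTHAVGG"  # hypothetical sequence
    print(find_hexxh(toy), find_strict(toy))
```

A match to either pattern is only a hint; as the text notes, HEXXH is relatively common, so structural or family-level evidence is still needed to call a metalloprotease.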
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. Members of this entry bind to a 5'-GAGAG-3' DNA consensus binding site, and contain a Cys2-His2 zinc finger core as well as an N-terminal extension containing two highly basic regions. The zinc finger core binds in the DNA major groove and recognises the first three GAG bases of the consensus in a manner similar to that seen in other classical zinc finger-DNA complexes. The second basic region forms a helix that interacts in the major groove recognising the last G of the consensus, while the first basic region wraps around the DNA in the minor groove and recognises the A in the fourth position of the consensus sequence [
].
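As a small illustration of the 5'-GAGAG-3' consensus mentioned above, the sketch below scans a DNA string for the site on both strands. It is a plain exact-match search, not a model of zinc finger binding, and the function names and example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_sites(dna, site="GAGAG"):
    """Return (position, strand) for exact matches to `site` on either strand.

    Positions are 0-based on the given (forward) strand.
    """
    dna = dna.upper()
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        elif window == rc_site:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_sites("TTGAGAGAGTTCTCTCAA"))  # hypothetical sequence
```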
Zona occludens (ZO), or tight junctions (TJ), are specialised membrane domains found at the most apical region of polarised epithelial and endothelial cells. They create a primary barrier, preventing paracellular transport of solutes, and restricting the lateral diffusion of membrane lipids and proteins, thus maintaining cellular polarity [
]. They also act as diffusion barriers within plasma membranes, creating and maintaining apical and basolateral membrane domains. Under freeze-fracture electron microscopy, TJs appear as a network of continuous anastomosing intramembranous strands. These strands consist mainly of claudins and occludin (), which are transmembrane proteins that polymerise within plasma membranes to form fibrils []. Recently, the molecular architecture of tight junctions has begun to be elucidated. One group of proteins thought to be major components of TJs is the claudin family []. Immunofluorescence studies have shown that claudins are targeted to and incorporated into tight junctions []. Furthermore, when claudins are introduced into cells that lack tight junctions, networks of strands and grooves form at cell-cell contact sites that closely resemble native TJs []. The claudin protein family is encoded by at least 17 human genes, with many homologues cloned from other species. Tissue distribution patterns for the claudin family members are distinct. Claudin-1 and -2, for example, are expressed at high levels in the liver and kidney, whereas claudin-3 mRNA is detected mainly in the lung and liver [
,
]. This suggests that multiple claudin family members may be involved in tight junction strand formation in a tissue-dependent manner. Hydropathy analysis suggests that all claudins share a common transmembrane (TM) topology. Each family member is predicted to possess four TM domains with intracellular N and C termini. Although their C-terminal cytoplasmic domain sequences vary, most claudin family members share a common motif of -Y-V in this region. This has been postulated as a possible binding motif for PDZ domains of other tight junction-associated membrane proteins, such as ZO-1 (
).
This entry represents a conserved region located in an extracellular loop between the first and second transmembrane regions.
Type III secreted modular tyrosine phosphatase, SptP/YopH
Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior [
]. There have been four secretion systems described in animal enteropathogens, such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia []. The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis [
]. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. Exotoxins secreted by the type III system do not possess a secretion signal, and are considered unique for this reason [
]. Salmonella/Yersinia spp. secrete a protein, SptP, with tyrosine phosphatase activity [,
]. SptP exerts its function by acting as a GTPase-activating protein, and counteracts the effects of SopE []. This is similar to the Yersinia exotoxin YopH, which degrades host cell signal transduction []. X-ray crystal structures of the Yersinia tyrosine phosphatase (PTPase) in complex with tungstate and nitrate have been solved to 2.4 Å resolution [
]. The fold belongs to the α-β class. The crystal structure reveals that the nucleophilic cysteine (Cys 403) is positioned at the centre of a distinctive phosphate-binding loop. This loop is at the hub of several hydrogen-bond arrays that not only stabilise a bound oxyanion, but may activate Cys 403 as a reactive thiolate. Binding of tungstate triggers a conformational change that traps the oxyanion and swings Asp 356, an important catalytic residue that resides in a flexible loop, by ~6 Å into the active site [].
Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes [
]. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Many families of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence []. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases []. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base [
]. The geometric orientations of the catalytic residues are similar between families, despite different protein folds []. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) [,
]. This entry contains proteins that are annotated as penicillin-binding protein 5 and 6. These belong to MEROPS peptidase family S11 (D-Ala-D-Ala carboxypeptidase A family, clan SE). Penicillin-binding protein 5 expressed by Escherichia coli functions as a D-alanyl-D-alanine carboxypeptidase. It is composed of two domains that are oriented at approximately right angles to each other. The N-terminal domain (
) is the catalytic domain. The C-terminal domain, this entry, is organised into a sandwich of two anti-parallel β-sheets, and has a relatively hydrophobic surface as compared to the N-terminal domain. Its precise function is unknown; it may mediate interactions with other cell wall-synthesising enzymes, thus allowing the protein to be recruited to areas of active cell wall synthesis. It may also function as a linker domain that positions the active site in the catalytic domain closer to the peptidoglycan layer, to allow it to interact with cell wall peptides [
].
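The relationship between the linear order of the catalytic triad and clan membership (HDS in clan PA, DHS in clan SB, SDH in clan SC) can be expressed as a short lookup. This Python sketch is illustrative: the function names are hypothetical, and the residue numbers in the usage example are the conventional chymotrypsin (His57, Asp102, Ser195) and subtilisin (Asp32, His64, Ser221) numbering, used here only as sample input.

```python
# Triad orders and clans as given in the entry text.
CLAN_BY_ORDER = {
    "HDS": "PA (chymotrypsin clan)",
    "DHS": "SB (subtilisin clan)",
    "SDH": "SC (carboxypeptidase clan)",
}


def triad_order(positions):
    """Return the triad letters sorted by sequence position.

    `positions` maps one-letter residue codes (H, D, S) to their
    position in the primary sequence.
    """
    if set(positions) != {"H", "D", "S"}:
        raise ValueError("expected exactly the residues H, D and S")
    return "".join(sorted(positions, key=positions.get))


def clan_for_triad(positions):
    """Look up the clan for a triad, or None if the order is not listed."""
    return CLAN_BY_ORDER.get(triad_order(positions))


if __name__ == "__main__":
    print(clan_for_triad({"H": 57, "D": 102, "S": 195}))
    print(clan_for_triad({"D": 32, "H": 64, "S": 221}))
```

The lookup returns None for any order that the text does not list, rather than guessing a clan.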
Janus kinases (JAKs) are tyrosine kinases that function in membrane-proximal signalling events initiated by a variety of extracellular factors binding to cell surface receptors [
]. Many type I and II cytokine receptors lack a protein tyrosine kinase domain and rely on JAKs to initiate the cytoplasmic signal transduction cascade. Ligand binding induces oligomerisation of the receptors, which then activates the cytoplasmic receptor-associated JAKs. These subsequently phosphorylate tyrosine residues along the receptor chains with which they are associated. The phosphotyrosine residues are a target for a variety of SH2 domain-containing transducer proteins. Amongst these are the signal transducers and activators of transcription (STAT) proteins, which, after binding to the receptor chains, are phosphorylated by the JAK proteins. Phosphorylation enables the STAT proteins to dimerise and translocate into the nucleus, where they alter the expression of cytokine-regulated genes. This system is known as the JAK-STAT pathway. Four mammalian JAK family members have been identified: JAK1, JAK2, JAK3, and TYK2. They are relatively large kinases of approximately 1150 amino acids, with molecular weights of ~120-130 kDa. Their amino acid sequences are characterised by the presence of 7 highly conserved domains, termed JAK homology (JH) domains. The C-terminal domain (JH1) is responsible for the tyrosine kinase function. The next domain in the sequence (JH2) is known as the tyrosine kinase-like domain, as its sequence shows high similarity to functional kinases but does not possess any catalytic activity. Although the function of this domain is not well established, there is some evidence for a regulatory role on the JH1 domain, thus modulating catalytic activity. The N-terminal portion of the JAKs (spanning JH7 to JH3) is important for receptor association and non-catalytic activity, and consists of JH3-JH4, which is homologous to the SH2 domain, and lastly JH5-JH7, which is a FERM domain. The FERM domain has a cloverleaf tripartite structure composed of A-lobe or F1, B-lobe or F2, and C-lobe or F3 [
]. This entry represents the C-lobe/F3 of the FERM domain of JAK2.
This superfamily represents a structural domain consisting of segregated alpha and beta regions in 3-layers. Homologous domains with this structure are found in:
3,4-dihydroxy-2-butanone 4-phosphate synthase (
) (DHBP synthase) (RibB)
A family of eukaryotic and prokaryotic hypothetical proteins that includes YrdC and YciO from Escherichia coli and MTH1692 from the archaeon Methanothermobacter thermautotrophicus (Methanobacterium thermoformicicum)
DHBP synthase RibB catalyses the conversion of D-ribulose 5-phosphate to formate and 3,4-dihydroxy-2-butanone 4-phosphate, the latter serving as the biosynthetic precursor for the xylene ring of riboflavin [
]. In Photobacterium leiognathi, the riboflavin synthesis genes ribB (DHBP synthase), ribE (riboflavin synthase), ribH (lumazine synthase) and ribA (GTP cyclohydrolase II) all reside in the lux operon []. RibB is sometimes found as a bifunctional enzyme with GTP cyclohydrolase II that catalyses the first committed step in the biosynthesis of riboflavin (). No sequences with significant homology to DHBP synthase are found in the metazoa.
The YrdC family of hypothetical proteins are widely distributed in eukaryotes and prokaryotes and occur as: (i) independent proteins, (ii) with C-terminal extensions, and (iii) as domains in larger proteins, some of which are implicated in regulation [
]. YrdC from Escherichia coli preferentially binds to double-stranded RNA and DNA. YrdC is predicted to be an rRNA maturation factor, as deletions in its gene lead to immature ribosomal 30S subunits and, consequently, fewer translating ribosomes []. Therefore, YrdC may function by keeping an rRNA structure needed for proper processing of 16S rRNA, especially at lower temperatures. Threonylcarbamoyl-AMP synthase (Sua5) is an example of a multi-domain protein that contains an N-terminal YrdC-like domain and a C-terminal Sua5 domain. Sua5 was identified in Saccharomyces cerevisiae (Baker's yeast) as a suppressor of a translation initiation defect in the cytochrome c gene and is required for formation of a threonylcarbamoyl group on adenosine at position 37 in tRNAs [,
]. HypF is involved in the synthesis of the active site of [NiFe]-hydrogenases [
].
The Mediator complex is a coactivator involved in the regulated transcription of nearly all RNA polymerase II-dependent genes. Mediator functions as a bridge to convey information from gene-specific regulatory proteins to the basal RNA polymerase II transcription machinery. The Mediator complex, having a compact conformation in its free form, is recruited to promoters by direct interactions with regulatory proteins and serves for the assembly of a functional preinitiation complex with RNA polymerase II and the general transcription factors. On recruitment the Mediator complex unfolds to an extended conformation and partially surrounds RNA polymerase II, specifically interacting with the unphosphorylated form of the C-terminal domain (CTD) of RNA polymerase II. The Mediator complex dissociates from the RNA polymerase II holoenzyme and stays at the promoter when transcriptional elongation begins. The Mediator complex is composed of at least 31 subunits: MED1, MED4, MED6, MED7, MED8, MED9, MED10, MED11, MED12, MED13, MED13L, MED14, MED15, MED16, MED17, MED18, MED19, MED20, MED21, MED22, MED23, MED24, MED25, MED26, MED27, MED29, MED30, MED31, CCNC, CDK8 and CDC2L6/CDK11. The subunits form at least three structurally distinct submodules. The head and the middle modules interact directly with RNA polymerase II, whereas the elongated tail module interacts with gene-specific regulatory proteins. Mediator containing the CDK8 module is less active than Mediator lacking this module in supporting transcriptional activation.
The head module contains: MED6, MED8, MED11, SRB4/MED17, SRB5/MED18, ROX3/MED19, SRB2/MED20 and SRB6/MED22. The middle module contains: MED1, MED4, NUT1/MED5, MED7, CSE2/MED9, NUT2/MED10, SRB7/MED21 and SOH1/MED31. CSE2/MED9 interacts directly with MED4. The tail module contains: MED2, PGD1/MED3, RGR1/MED14, GAL11/MED15 and SIN4/MED16. The CDK8 module contains: MED12, MED13, CCNC and CDK8. Individual preparations of the Mediator complex lacking one or more distinct subunits have been variously termed ARC, CRSP, DRIP, PC2, SMCC and TRAP. Med10 is one of the protein subunits of the Mediator complex, tethered to Med14 (Rgr1) protein. Med10 specifically mediates basal-level HIS4 transcription via Gcn4. In addition, there is a putative requirement for Med10 in Bas2-mediated transcription [
].
Secretoglobins are relatively small, secreted, disulphide-bridged dimeric proteins with encoding genes sharing substantial sequence similarity [
,
]. Secretoglobins have a four-helical structure and, in the case of uteroglobin, form homodimers, whereas allergen Fel d 1 forms a tetramer of two heterodimers (chains 1 and 2). The conservation of this primary and quaternary structure indicates that the genome of the eutherian common ancestor of cats, rodents, and primates contained a similar gene pair. Uteroglobin (blastokinin or Clara cell protein CC10) is a mammalian steroid-inducible secreted protein originally isolated from the uterus of rabbits during early pregnancy [
]. The mucosal epithelia of several organs that communicate with the external environment express uteroglobin. Its tissue-specific expression is regulated by steroid hormones, and is augmented in the uterus by non-steroidal prolactin. Uteroglobin may be a multi-functional protein with anti-inflammatory/immunomodulatory properties, acting to inhibit phospholipase A2 activity [
,
], and binding to (and possibly sequestering) several hydrophobic ligands such as progesterone, retinols, polychlorinated biphenyls, phospholipids and prostaglandins [,
]. In addition, uteroglobin has anti-chemotactic, anti-allergic, anti-tumourigenic and embryo growth-stimulatory properties. Uteroglobin may have a homeostatic role against oxidative damage, inflammation, autoimmunity and cancer [,
,
,
]. However, the true biological function of uteroglobin is poorly understood. Uteroglobin consists of a disulphide-linked homodimer with a large hydrophobic pocket located between the two monomers []. Each monomer is composed of four helices that do not form a canonical four helix-bundle motif but rather a boomerang-shaped structure in which helices H1, H3, and H4 are able to bind a homodimeric partner []. The hydrophobic pocket binds steroids, particularly progesterone, with high specificity. Uteroglobin is a member of the secretoglobin superfamily. This entry represents the uteroglobin and secretoglobin family 1C (SCGB1C). SCGB1C1 has been shown to be localised to Bowman's glands in the olfactory mucosa [
]. It is thought to act as an odorant-binding protein, with ligands appearing to be small, hydrophobic molecules [].
Inwardly-rectifying potassium channels (Kir) are the principal class of two-TM domain potassium channels. They are characterised by the property of inward-rectification, which is described as the ability to allow large inward currents and smaller outward currents. Inwardly rectifying potassium channels (Kir) are responsible for regulating diverse processes including: cellular excitability, vascular tone, heart rate, renal salt flow, and insulin release [
]. To date, around twenty members of this superfamily have been cloned, which can be grouped into six families by sequence similarity, and these are designated Kir1.x-6.x [,
]. Cloned Kir channel cDNAs encode proteins of ~370-500 residues. Both the N- and C-termini are thought to be cytoplasmic, and the N terminus lacks a signal sequence. Kir channel alpha subunits possess only 2TM domains linked with a P-domain. Thus, Kir channels share similarity with the fifth and sixth domains, and P-domain of the other families. It is thought that four Kir subunits assemble to form a tetrameric channel complex, which may be hetero- or homomeric [
]. The Kir3.x channel family is gated by G-proteins following G-protein coupled receptor (GPCR) activation. They are widely distributed in neuronal, atrial, and endocrine tissues and play key roles in generating late inhibitory postsynaptic potentials, slowing the heart rate and modulating hormone release. They are directly activated by G-protein beta-gamma subunits released from G-protein heterotrimers of the G(i/o) family upon appropriate receptor stimulation. Kir3.4 is thought to associate with Kir3.1, to form hetero-tetrameric acetylcholine-activated K+ channels, in the heart. Their activation, following stimulation of the vagus nerve, leads to slowing of the heart, and reduction in contractile force. In the brain, Kir3.4 distribution has been found to be quite restricted, being found in some neuronal populations, such as Purkinje cells and neurones of the globus pallidus and the ventral pallidum [
]. A recent study has suggested that Kir3.4 may confer mechano-sensitive properties on Kir channels, since channels containing it are inactivated by membrane stretch forces [
]. Mutations of the Kir3.4 gene have been linked to Long QT syndrome 13 (LQT13) [
] and Hyperaldosteronism, familial, 3 (HALD3) [].
Lysosome-related organelles comprise a group of specialised intracellular compartments that include melanosomes and platelet dense granules in mammals and eye pigment granules in insects. Hermansky-Pudlak syndrome (HPS) is a disorder of lysosome-related organelle biogenesis. Genes associated with HPS encode subunits of three complexes that are known as biogenesis of lysosome-related organelles complex (BLOC)-1, -2 and -3 [
]. There are eight known HPS proteins of the BLOCs [,
]. Organelles affected in HPS include the melanosome, resulting in hypopigmentation, and the platelet delta (dense) granule, resulting in prolonged bleeding times. HPS in humans or mice is caused by mutations in any of 15 genes, five of which encode subunits of BLOC-1. BLOC-1 and BLOC-2 act sequentially in the same pathway. Melanosome maturation requires at least two cargo transport pathways directly from early endosomes to melanosomes: one pathway is mediated by AP-3, and the other by BLOC-1 and BLOC-2 [
]. The adaptor protein AP-3 complex is a component of the cellular machinery that controls protein sorting from endosomes to lysosomes and melanosomes. BLOC-1 interacts physically and functionally with AP-3 to facilitate the trafficking of a known AP-3 cargo, CD63, and of tyrosinase-related protein 1 (Tyrp1). BLOC-1 also interacts with BLOC-2 to facilitate Tyrp1 trafficking by a mechanism apparently independent of AP-3 function. Both BLOC-1 and -2 predominantly localise to early endosome-associated tubules []. Complex-2 (BLOC-2) contains the HPS3, HPS5 and HPS6 proteins as subunits. Fibroblasts deficient in the BLOC-2 subunits HPS3 or HPS6 have normal basal secretion function of the lysosomal enzyme beta-hexosaminidase [
]. This entry includes the Hps6 subunit of the BLOC-2 complex, which may regulate the synthesis and function of lysosomes and of highly specialized organelles [
]. It acts as cargo adapter for the dynein-dynactin motor complex to mediate the transport of lysosomes from the cell periphery to the perinuclear region. Hps6 facilitates retrograde lysosomal trafficking by linking the motor complex to lysosomes, and perinuclear positioning of lysosomes is crucial for the delivery of endocytic cargos to lysosomes, for lysosome maturation and function [].
Prokaryotic cells have a defence mechanism against a sudden heat-shock stress. Commonly, they induce a set of proteins that protect cellular proteins from being denatured by heat. Among such proteins are the GroE and DnaK chaperones whose transcription is regulated by a heat-shock repressor protein HrcA. HrcA is a winged helix-turn-helix repressor that negatively regulates the transcription of dnaK and groE operons by binding the upstream CIRCE (controlling inverted repeat of chaperone expression) element. In Bacillus subtilis this element is a perfect 9 base pair inverted repeat separated by a 9 base pair spacer. The crystal structure of a heat-inducible transcriptional repressor, HrcA, from Thermotoga maritima has been reported at 2.2 Å resolution. HrcA is composed of three domains: an N-terminal winged helix-turn-helix domain (WHTH), a GAF-like domain, and an inserted dimerizing domain (IDD). The IDD shows a unique structural fold with an anti-parallel β-sheet composed of three β-strands sided by four α-helices. HrcA crystallises as a dimer, which is formed through hydrophobic contact between the IDDs and a limited contact that involves conserved residues between the GAF-like domains [
]. The structural studies suggest that the inactive form of HrcA is the dimer and this is converted to its DNA-binding form by interaction with GroEL, which binds to a conserved C-terminal sequence region [,
]. Comparison of the HrcA-CIRCE complexes from B. subtilis and Bacillus thermoglucosidasius (Geobacillus thermoglucosidasius), which grow at vastly different ranges of temperature, shows that the thermostability profiles are consistent with the difference in growth temperatures, suggesting that HrcA can function as a thermosensor to detect temperature changes in cells []. Any increase in temperature causes the dissociation of HrcA from the CIRCE complex with the concomitant activation of transcription of the groE and dnaK operons. This domain represents the winged helix-turn-helix DNA-binding domain which is located close to the N terminus of HrcA. This domain is also found at the N terminus of a set of uncharacterised proteins that have two C-terminal CBS domains.
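The CIRCE architecture described above, a perfect 9 base pair inverted repeat separated by a 9 base pair spacer, can be searched for with a short sliding-window scan. The Python sketch below is a minimal illustration that finds any such perfect inverted repeat; it does not check for a specific CIRCE consensus, and the function names and test sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(dna, arm=9, spacer=9):
    """Return 0-based start positions of perfect inverted repeats.

    A hit is a window of `arm` bases, then `spacer` arbitrary bases,
    then the exact reverse complement of the first arm.
    """
    dna = dna.upper()
    total = 2 * arm + spacer
    hits = []
    for i in range(len(dna) - total + 1):
        left = dna[i:i + arm]
        right = dna[i + arm + spacer:i + total]
        if right == reverse_complement(left):
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: two flanking bases, a 9-bp arm,
    # a 9-bp spacer, and the reverse complement of the arm.
    toy = "GG" + "TTAGCACTC" + "AAAAAAAAA" + "GAGTGCTAA" + "CC"
    print(find_inverted_repeats(toy))
```

Tolerating mismatches in the arms would require replacing the exact comparison with a mismatch count, which is left out here to keep the sketch faithful to the "perfect" repeat described in the text.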
Cbl is a member of the LysR transcriptional regulators that comprise the largest family of prokaryotic transcription factors. Cbl shows high sequence similarity to CysB, the LysR-type transcriptional activator of genes involved in sulfate and thiosulfate transport, sulfate reduction, and cysteine synthesis [
]. In Escherichia coli, the function of Cbl is required for expression of sulfate starvation-inducible (ssi) genes, coupled with the biosynthesis of cysteine from the organic sulfur sources (sulfonates). The ssi genes include the ssuEADCB and tauABCD operons encoding uptake systems for organosulfur compounds, aliphatic sulfonates, and taurine []. The genes in these operons encode an ABC-type transport system required for uptake of aliphatic sulfonates and a desulfonation enzyme. Both Cbl and CysB are required for expression of the tau and ssu genes. Like many other members of the LTTR family, Cbl is composed of two functional domains joined by a linker helix involved in oligomerization: an N-terminal HTH (helix-turn-helix) domain, which is responsible for the DNA-binding specificity, and a C-terminal substrate-binding domain, which is structurally homologous to the type 2 periplasmic binding proteins. As also observed in the periplasmic binding proteins, the C-terminal domain of the bacterial transcriptional repressor undergoes a conformational change upon substrate binding which in turn changes the DNA binding affinity of the repressor []. The structural topology of this substrate-binding domain is most similar to that of the type 2 periplasmic binding proteins (PBP2). The PBP2 are responsible for the uptake of a variety of substrates such as phosphate, sulfate, polysaccharides, lysine/arginine/ornithine, and histidine. The PBP2 bind their ligand in the cleft between these domains in a manner resembling a Venus flytrap. After binding their specific ligand with high affinity, they can interact with a cognate membrane transport complex comprised of two integral membrane domains and two cytoplasmically located ATPase domains. This interaction triggers the ligand translocation across the cytoplasmic membrane energized by ATP hydrolysis. Besides transport proteins, the PBP2 superfamily includes the substrate-binding domains from ionotropic glutamate receptors, LysR-like transcriptional regulators, and unorthodox sensor proteins involved in signal transduction [
,
,
].
CysB is a transcriptional activator of genes involved in sulfate and thiosulfate transport, sulfate reduction, and cysteine synthesis. In Escherichia coli, the regulation of transcription in response to sulfur source is attributed to two transcriptional regulators, CysB and Cbl [
]. CysB, in association with Cbl, downregulates the expression of ssuEADCB operon which is required for the utilization of sulfur from aliphatic sulfonates, in the presence of cysteine []. Also, Cbl and CysB together directly function as transcriptional activators of tauABCD genes, which are required for utilization of taurine as sulfur source for growth []. Like many other members of the LTTR family, CysB is composed of two functional domains joined by a linker helix involved in oligomerization: an N-terminal HTH (helix-turn-helix) domain, which is responsible for the DNA-binding specificity, and a C-terminal substrate-binding domain, which is structurally homologous to the type 2 periplasmic binding proteins [
]. As also observed in the periplasmic binding proteins, the C-terminal domain of the bacterial transcriptional repressor undergoes a conformational change upon substrate binding which in turn changes the DNA binding affinity of the repressor []. The structural topology of this substrate-binding domain is most similar to that of the type 2 periplasmic binding proteins (PBP2). The PBP2 are responsible for the uptake of a variety of substrates such as phosphate, sulfate, polysaccharides, lysine/arginine/ornithine, and histidine. The PBP2 bind their ligand in the cleft between these domains in a manner resembling a Venus flytrap. After binding their specific ligand with high affinity, they can interact with a cognate membrane transport complex comprised of two integral membrane domains and two cytoplasmically located ATPase domains. This interaction triggers the ligand translocation across the cytoplasmic membrane energized by ATP hydrolysis. Besides transport proteins, the PBP2 superfamily includes the substrate-binding domains from ionotropic glutamate receptors, LysR-like transcriptional regulators, and unorthodox sensor proteins involved in signal transduction [
,
,
].
This entry represents the N-terminal domain of Small EDRK-rich factors (SERFs), including SERF1/2 from human. Proteins containing this domain are short proteins that are rich in aspartate, glutamate, lysine and arginine [
]. SERF1/2 are positive regulators of amyloid protein aggregation and proteotoxicity; they induce conformational changes in amyloid proteins, driving them into compact formations preceding the formation of aggregates [,
,
].
G protein is a modulator in various transmembrane signalling systems. The beta and gamma subunits are required for the GTPase activity, for replacement of GDP by GTP, and for G protein-effector interaction. This entry represents the G protein (guanine nucleotide-binding proteins) beta subunit, including Ste4 from budding yeasts [
], Git5 from fission yeasts [] and GNB1-5 from animals [].
The Impact protein is a translational regulator that ensures constant high levels of translation under amino acid starvation. It acts by interacting with Gcn1/Gcn1L1, thereby preventing activation of Gcn2 protein kinases (EIF2AK1 to 4) and subsequent down-regulation of protein synthesis. It is evolutionarily conserved from eukaryotes to archaea [
]. This entry represents the N-terminal domain of the Impact proteins.
This family contains several plant plasma membrane proteins termed DREPPs as they are developmentally regulated plasma membrane polypeptides [
], including the salt stress root protein RS1 and plasma membrane-associated cation-binding protein 1 (PCAP1). DREPPs are thought to be signal proteins []. In Arabidopsis thaliana, AtPCAP1 may be involved in intracellular signaling through interaction with PtdInsPs and calmodulin (CaM) [
].
This entry includes small ubiquitin-related modifier (SUMO) proteins. SUMOs are small proteins that are covalently attached to lysines as post-translational modifications and are used to control multiple cellular processes including signal transduction, nuclear transport and DNA replication and repair [
]. Unlike ubiquitin, they are not involved in protein degradation. This entry also contains the C-terminal Rad60 DNA repair protein SUMO-like domain.
This entry represents a group of plant-specific O-fucosyltransferases and their homologues [
]. O-fucosyltransferase-like proteins are GDP-fucose dependent enzymes with similarities to the family 1 glycosyltransferases (GT1). They are soluble ER proteins that may be proteolytically cleaved from a membrane-associated preprotein, and are involved in the O-fucosylation of protein substrates, the core fucosylation of growth factor receptors, and other processes [,
].
Protein families that contain at least one copy of this domain include citrate lyase ligase, pantoate-beta-alanine ligase, glycerol-3-phosphate cytidyltransferase [
], ADP-heptose synthase, phosphocholine cytidylyltransferase, lipopolysaccharide core biosynthesis protein KdtB, the bifunctional protein NadR, archaeal FAD synthase RibL [], and a number whose function is unknown. Many of these proteins are known to use CTP or ATP and release pyrophosphate.
This entry represents the N-terminal domain of replication protein A (RPA) interacting protein. RPA is a single stranded DNA-binding protein involved in DNA replication, repair, and recombination [
]. RPA interacting protein is involved in the import of RPA into the nucleus [,
]. The N-terminal domain is rich in basic residues and is responsible for interaction with importin beta [,
].
This family consists of the wound-inducible basic proteins from plants. The metabolic activities of plants are dramatically altered upon mechanical injury or pathogen attack. A large number of proteins accumulate at wound or infection sites, such as the wound-inducible basic proteins. These proteins are small (47 amino acids in length), have no signal peptides, and are hydrophilic and basic [
].
The Gag polyprotein is a core viral polyprotein that undergoes specific enzymatic cleavages in vivo to yield the mature protein. It is involved in capsid formation and genome binding. Shortly after infection, interaction between incoming particle-associated Gag proteins and host dynein allows centrosomal targeting of the viral genome (associated with Gag), prior to nucleus translocation and integration into the host genome [
,
].
This entry includes the sporulation inhibitor of replication (sirA) protein from Bacillus subtilis and related proteins. SirA is a DnaA-interacting protein that inhibits initiation of replication in diploid Bacillus sp. cells committed to the developmental pathway leading to formation of a dormant spore. SirA protein synthesis is induced by the response regulator Spo0A, the master regulator for entry into sporulation [
].
ENOX proteins are growth-related cell surface proteins that catalyse both hydroquinone or NADH oxidation and protein disulfide-thiol interchange [
]. The two enzymatic activities oscillate with a period length of 24 minutes and play a role in control of the ultradian cellular biological clock [,
]. ENOX proteins may play roles in cancer, cellular time-keeping, growth, aging and neurodegenerative diseases [].
Type I protein secretion is a system in some Gram-negative bacteria to export proteins (often proteases) across both inner and outer membranes to the extracellular medium. This entry contains one of three proteins of the type I secretion apparatus. Targeted proteins are not cleaved at the N terminus, but rather carry signals located toward the extreme C terminus to direct type I secretion.
The Impact protein is a translational regulator that ensures constant high levels of translation under amino acid starvation. It acts by interacting with Gcn1/Gcn1L1, thereby preventing activation of Gcn2 protein kinases (EIF2AK1 to 4) and subsequent down-regulation of protein synthesis. It is evolutionarily conserved from eukaryotes to archaea [
]. This entry represents the N-terminal domain superfamily of the Impact proteins.
This family consists of several Chordopoxvirus A20R proteins. The A20R protein is required for DNA replication, is associated with the processive form of the viral DNA polymerase, and directly interacts with the viral proteins encoded by the D4R, D5R, and H5R open reading frames. A20R may contribute to the assembly or stability of the multiprotein DNA replication complex [
].
This family consists of several Caenorhabditis elegans specific ly-6-related HOT and ODR proteins. These proteins are involved in the olfactory system. Odr-2 mutants are known to be defective in the ability to chemotax to odorants that are recognised by the two AWC olfactory neurons. Odr-2 encodes a membrane-associated protein related to the Ly-6 superfamily of GPI-linked signalling proteins [
].
This entry is represented by Autographa californica nuclear polyhedrosis virus (AcMNPV) protein Ac75, which is required for the egress of nucleocapsids from the nucleus, formation of intranuclear microvesicles and subsequent budded virion formation, in addition to protein Ac93. Ac75 is not an integral membrane protein, but it interacts with integral membrane protein Ac76 and is associated with the nuclear membrane [
].
This entry represents a group of E3 ubiquitin-protein ligases, including SP1 and SPL1 from Arabidopsis. They are E3 ubiquitin-protein ligases involved in the regulation of protein import in the chloroplast [
]. SP1 is associated with TOC (translocon at the outer envelope of chloroplasts) complexes and can mediate ubiquitination of TOC components, promoting their degradation []. This entry also includes some uncharacterised bacterial proteins.
This family represents a set of transmembrane proteins including various interferon-induced transmembrane proteins, synapse differentiation-inducing gene protein 1, and tumor suppressor candidate 5 and homologues. Interferon-induced transmembrane protein 1 (also known as human leukocyte antigen CD225) regulates vesicular membrane fusion events and is essential for different physiological processes such as interferon-induced cell growth suppression, neurotransmission and metabolism [
,
,
].
This entry represents a group of plant BAG domain containing proteins, including BAG5/6/7/8 from Arabidopsis. The plant BAG domain containing proteins are multifunctional proteins that regulate cytoprotective processes from pathogen attack to abiotic stress and development. AtBAG6 appears to play a role within basal defense pathways, while AtBAG7 is involved in the maintenance of the unfolded protein response in ER [
].
This domain is a putative nucleic acid binding zinc finger and is found at the N terminus of proteins that also contain an adjacent XS domain and, in some proteins, a C-terminal XH domain [
]. Proteins containing this domain include protein SUPPRESSOR OF GENE SILENCING 3 (SGS3), which is required for post-transcriptional gene silencing and natural virus resistance [,
].
The UBX domain is found in ubiquitin-regulatory proteins, which are members of the ubiquitination pathway, as well as a number of other proteins including FAF-1 (FAS-associated factor 1), the human Rep-8 reproduction protein and several hypothetical proteins from yeast. The function of the UBX domain is not known although the fragment of avian FAF-1 containing the UBX domain causes apoptosis of transfected cells.
This domain is located at the C terminus of the Duffy-antigen binding protein of Plasmodium spp. Duffy-antigen binding protein is an antigen on these parasites which enables them to invade erythrocytes [
]. Proteins containing this domain are typically between 449 and 1061 amino acids in length. These proteins are found in association with . There are two conserved sequence motifs: NKNGG and QKHDF.
This entry represents the M157 glycoprotein, a divergent form of MHC class I-like proteins which is the protein product of Murid herpesvirus 1. This protein is unique in its ability to engage both activating (Ly49H) and inhibitory (Ly49I) natural killer cell receptors. M157 is involved in intra- and intermolecular interactions within and between its domains to form a compact MHC-like molecule [
].
This small family of proteins is functionally uncharacterized. It is found mainly in Firmicutes. Proteins in this family are around 130 amino acids in length. Based on NMR structure 2MCT, it forms an alpha/beta structure with a six-stranded antiparallel beta-sheet flanked by a single alpha helix. The only protein with a similar structure is a putative lipoprotein (PDB code 4R7R).