This entry represents a conserved domain found at the N terminus of a number of fungal proteins. These proteins are largely annotated as being hyphally-regulated cell wall proteins that are induced in response to hyphal development. Proteins containing this domain include Hpf1 from budding yeasts and Hyr1 from Candida albicans. Hpf1 can reduce haziness in white wine [
]. Hyr1 is a GPI-anchored hyphal cell wall protein required for hyphal growth and virulence. It is involved in evasion of innate immune cells by conferring resistance to neutrophil killing [,
].
This family includes Melanocyte protein PMEL (also known as PMEL-17), Transmembrane glycoprotein NMB (GPNMB) and Transmembrane protein 130 (TMEM130), previously described as the melanocyte protein PMEL-17-related family. Reflecting the highly conserved domain architecture of these family members, which includes the PKD, KLD and transmembrane (TM) domains, the family has been renamed the PKD- and KLD-Associated Transmembrane (PKAT) protein family [
]. TMEM130 has been identified as the most ancient paralogue of PMEL and GPNMB. PMEL and GPNMB share overlapping phenotypes and disease associations, such as melanin-based pigmentation, cancer, neurodegenerative disease and glaucoma.
Low molecular weight S layer protein, N-terminal, subdomain
Type:
Homologous_superfamily
Description:
Clostridial species have a layer of surface proteins surrounding their membrane. This layer is composed of a high molecular weight protein and a low molecular weight protein. This superfamily represents a subdomain of the N-terminal domain of the low molecular weight S layer protein (LMW SLP). Structurally, it consists of 2 alpha helices and 5 beta strands and adopts a 2-layer sandwich architecture. Within this subdomain, one of the beta strands is formed by C-terminal residues, while the rest is contributed by a contiguous stretch of the N-terminal polypeptide chain [
].
This family consists of several eukaryotic GPP34-like proteins. GPP34 (also known as Golgi phosphoprotein 3) localises to the Golgi complex and is conserved from Saccharomyces cerevisiae to humans. The cytosolically exposed location of GPP34 suggests a role as a novel coat protein in Golgi trafficking [
]. The budding yeast GPP34 homologue, also known as Vps74, is a phosphatidylinositol-4-phosphate-binding protein that links Golgi membranes to the cytoskeleton and may contribute to the tensile force required for vesicle budding from the Golgi []. This family also includes uncharacterized proteins from bacteria.
The portal protein is a bacteriophage component that forms a hole, or portal, enabling DNA passage during packaging and ejection. It also forms the junction between the phage head (capsid) and the tail proteins [
]. This entry represents a family of phage minor structural proteins. They are proposed to be portal proteins on the basis of their position within the phage gene order, presence in mature phage, size, and conservation across a number of complete genomes of tailed phage that lack other candidate portal proteins [
].
LIM-homeodomain (LIM-HD) proteins are transcription factors that are critical in the development of many cell types and tissues. The activity of LIM-HD proteins is dependent on the essential cofactor LIM domain-binding protein 1 (Ldb1). This entry represents a 30-residue LIM interaction domain (LID) that binds to the LIM domains of all LIM-HD and closely related LIM-only (LMO) proteins. Isl1 (Insulin gene enhancer protein 1) and Isl2 each contain a LID-like sequence in their C-terminal regions, the Lhx3-binding domain (LBD), which binds the LIM domains of Lhx3 and Lhx4 [
].
This entry describes a domain of unknown function found in the predicted extracellular domain of a number of putative membrane-bound proteins. One of these is protein psr, described as a penicillin-binding protein 5 (PBP-5) synthesis repressor. Another is Bacillus subtilis LytR, described as a transcriptional attenuator of itself and the LytABC operon, where LytC is N-acetylmuramoyl-L-alanine amidase. A third is CpsA, a putative regulatory protein involved in exocellular polysaccharide biosynthesis. These proteins share a short putative N-terminal cytoplasmic domain and a transmembrane domain that together form a signal anchor.
Gibberellins are plant hormones that have a major impact on growth signalling. DELLA proteins are transcriptional regulators of growth-related proteins; they lack a DNA-binding domain and exert their negative regulation of gibberellin responses through interaction with other transcription factors [
]. DELLAs are downregulated when gibberellins bind to their receptor GID1 [,
], which then forms a complex with DELLA proteins and targets them for degradation by the 26S proteasome. The N-terminal region of DELLA proteins contains the conserved DELLA and TVHYNP motifs, which are important for GID1 binding and proteolysis of the DELLA proteins [,
].
Proteins in this entry contain between eight and ten predicted transmembrane regions and are thought to function as transporters. The best characterised of these proteins is PecM from Erwinia chrysanthemi. PecM is an integral membrane protein that influences the activity of the virulence regulator PecS and appears to be necessary for complete efflux of the blue pigment indigoidine [
,
]. This entry also includes some proteins predicted to be amino acid metabolite efflux pumps, and several proteins for which no functional predictions have been made.
The sugar transporters belong to a family of membrane proteins responsible for the transport of various sugars in a wide range of prokaryotic and eukaryotic organisms [
]. These integral membrane proteins are predicted to comprise twelve membrane-spanning domains. It is likely that the transporters evolved from an ancient protein present in living organisms before the divergence of prokaryotes and eukaryotes [
]. In mammals, these proteins are expressed in a number of organs [
]. Proteins transporting sugars and other substrates are related within this broader family. Sialic acid transporters belong to this group.
This superfamily consists of bacterial proteins involved in a general secretion pathway (GSP) for the export of proteins (also called the type II pathway) [
]. These are proteins of about 400 amino acids that are highly hydrophobic and are thought to be integral proteins of the inner membrane. This domain consists of a bundle of six antiparallel helices and has a novel fold []. Proteins with this domain form a platform for the type II secretion machinery, as well as for type IV pili and archaeal flagella [].
The surface of many fungal spores is covered by a hydrophobic sheath, the rodlet layer, whose main component is a protein known as the rodlet protein [,
]. The rodlet proteins of Neurospora crassa (gene eas) and Emericella nidulans (gene rodA) are evolutionarily related to proteins found in the cell wall of fruiting bodies of the mushroom Schizophyllum commune (Bracket fungus) []. Collectively, these low-molecular-weight, cysteine-rich (eight conserved cysteines), hydrophobic proteins are known as hydrophobins. This entry represents a cysteine-rich conserved site found at the C-terminal end of hydrophobin sequences.
This domain is found in members of the junctin, junctate and aspartyl beta-hydroxylase protein families. Junctate is an integral ER/SR membrane calcium-binding protein that arises from alternative splicing of the same gene that generates aspartyl beta-hydroxylase and junctin [
]. Aspartyl beta-hydroxylase catalyses the post-translational hydroxylation of aspartic acid or asparagine residues contained within epidermal growth factor (EGF) domains of proteins []. This domain is also found in several eukaryotic triadin proteins. Triadin is a ryanodine receptor and calsequestrin binding protein located in the junctional sarcoplasmic reticulum of striated muscles [
].
Proteins in this entry form part of the NADH:ubiquinone oxidoreductase complex I. Complex I is the first multisubunit inner membrane protein complex of the mitochondrial electron transport chain and it transfers two electrons from NADH to ubiquinone. The mammalian complex I is composed of 45 different subunits. The proteins in this entry represent a component of the iron-sulphur (IP) fragment of the enzyme, that is not involved in catalysis. These proteins carry four highly conserved cysteine residues, but these do not appear to be in a configuration which would favour metal binding, so the exact function of the protein is uncertain [
].
This entry represents the HDS (homology downstream of Sec7) domain found towards the C terminus of guanine nucleotide exchange factors involved in Golgi transport, such as budding yeast Sec7, Mon2 and BIG1-like proteins [
,
]. Sec7 is involved in the secretory pathway as a protein-binding scaffold for the COPII-COPI protein switch during maturation of the VTC intermediate compartments in Golgi compartment biogenesis []. Sec7 has four conserved HDS domains (HDS1-4) that integrate the signals from several small GTPases, including Arf1 itself, to switch Sec7 from a strongly autoinhibited to a strongly autoactivated form [].
Mandelate racemase (MR) and muconate lactonising enzyme (MLE) are two bacterial enzymes involved in aromatic acid catabolism. They catalyse mechanistically distinct reactions, yet they are related at the level of their primary, quaternary (homooctamer) and tertiary structures [
,
]. A number of other proteins also appear to be evolutionarily related to these two enzymes. These include various plasmid-encoded chloromuconate cycloisomerases, Escherichia coli protein rspA [
], E. coli bifunctional DGOA protein, a number of D-galactonate dehydratases and D-mannonate dehydratases, and a hypothetical protein from Streptomyces ambofaciens []. This entry represents the C-terminal region of these proteins.
The adaptor protein complexes mediate both the recruitment of clathrin to membranes and the recognition of sorting signals within the cytosolic tails of transmembrane cargo molecules [
]. Adaptor protein complex 1 (AP-1) is a heterotetramer composed of two large adaptins (gamma-type subunit AP1G1 and beta-type subunit AP1B1), a medium adaptin (mu-type subunit AP1M1 or AP1M2) and a small adaptin (sigma-type subunit AP1S1, AP1S2 or AP1S3). Subunits of clathrin-associated adaptor protein complex 1 play a role in protein sorting in the late Golgi/trans-Golgi network (TGN) and/or in endosomes. This group represents an adaptor protein complex beta subunit.
Blue (type 1) copper proteins are small proteins that bind a single copper atom and are characterised by an intense electronic absorption band near 600 nm [
,
]. The best-known members of this class are the plant chloroplastic plastocyanins, which exchange electrons with cytochrome c6, and the distantly related bacterial azurins, which exchange electrons with cytochrome c551.
Several subclasses of the blue copper proteins, such as the amicyanins, azurins, plastocyanins and rusticyanins, have been identified. Although there is an appreciable amount of divergence in the sequences of these proteins, the copper ligand sites are conserved.
Members of this family are the PhnK protein of C-P lyase systems for the utilisation of phosphonates. These systems resemble phosphonatase-based systems in having a three-component ABC transporter, consisting of a permease, a phosphonate-binding protein and an ATP-binding cassette (ABC) protein. They differ, however, in typically having ten or more additional genes, many of which are believed to form a membrane-associated complex. This protein (PhnK) and the adjacently encoded PhnL resemble transporter ATP-binding proteins but are suggested, based on mutagenesis studies, to be part of this complex rather than part of a transporter per se [
].
This short motif directs methylation of the conserved phenylalanine residue. It is most often found at the N terminus of pilins and other proteins involved in secretion.
There is a cleavage site G^FxxxE followed by a hydrophobic stretch. The new N-terminal residue produced after cleavage, usually Phe, is methylated. Separate domains of the prepilin peptidase appear to be responsible for cleavage and methylation. Proteins with this N-terminal region include type IV pilins and other components of pilus biogenesis, competence proteins, and type II secretion proteins. Typically several proteins in a single operon have this region.
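The G^FxxxE rule can be turned into a simple sequence scan. The Python sketch below is a simplification for illustration, not a published predictor: it assumes cleavage between G and F, and it uses a hypothetical crude hydrophobic residue set and hypothetical thresholds (`window`, `min_hydrophobic`) to check for the hydrophobic stretch after the conserved E. The example sequence is also hypothetical.

```python
import re

# Assumed simplified form of the G^FxxxE site: the peptidase is taken to cut
# between G and F, so the F becomes the new (methylated) N-terminal residue.
# The zero-width lookahead lets overlapping candidate sites be reported.
CLEAVAGE_SITE = re.compile(r"(?=GF...E)")

# Hypothetical crude hydrophobic residue set used to test the following stretch.
HYDROPHOBIC = set("AVLIMFWC")


def find_prepilin_sites(seq: str, window: int = 10, min_hydrophobic: float = 0.6) -> list[int]:
    """Return 0-based indices of the residue (F) that becomes the mature N terminus.

    A candidate G^FxxxE site is kept only if the `window` residues after the
    conserved E have at least a `min_hydrophobic` fraction of hydrophobic residues.
    """
    hits = []
    for match in CLEAVAGE_SITE.finditer(seq):
        new_n_terminus = match.start() + 1      # index of F, just after the cut
        stretch_start = new_n_terminus + 5      # first residue after the E
        stretch = seq[stretch_start:stretch_start + window]
        if len(stretch) < window:
            continue  # too close to the end to judge the hydrophobic stretch
        fraction = sum(residue in HYDROPHOBIC for residue in stretch) / window
        if fraction >= min_hydrophobic:
            hits.append(new_n_terminus)
    return hits


# Hypothetical pilin-like N terminus, for illustration only.
example = "MNTQKGFTLIELMIVVAIIGILAAIAIPQYQ"
print(find_prepilin_sites(example))   # [6]
print(example[6:11])                  # FTLIE
```

Checking the stretch after the E, rather than only the motif itself, reflects the text's point that the cleavage site is followed by a hydrophobic stretch; real predictors use more careful scoring.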
This entry describes a short conserved region at the N terminus called the tonB-box [,
,
], which is involved in the interaction of the protein with the tonB protein []. In Escherichia coli the tonB protein interacts with outer membrane receptor proteins that carry out high-affinity binding and energy-dependent uptake of specific substrates into the periplasmic space. These substrates are either poorly permeable through the porin channels or are encountered at very low concentrations. In the absence of tonB these receptors bind their substrates but do not carry out active transport. The tonB protein also interacts with some colicins.
Protein phosphatases with Kelch-like domains (PPKL) are members of the phosphoprotein phosphatases family present only in plants and alveolates. They have been described as positive effectors of brassinosteroid (BR) signaling in plants. This family consists of Bsu1-like (BSL) proteins, such as BSL1, BSL2, and BSL3. BSL1 and BSL2/BSL3 belong to two ancient evolutionary clades that have been highly conserved in land plants. In contrast, the Bsu1 proteins are exclusively found in the Brassicaceae and display a remarkable sequence divergence [
]. This entry consists of the metallophosphatase domain of BSL proteins, and does not include Bsu1.
G-patch domain and KOW motifs-containing protein, KOW 1
Type:
Domain
Description:
GPKOW is also known as T54 protein or MOS2 homologue [
]. It contains one G-patch domain and two KOW motifs. GPKOW is a nuclear protein that is regulated by the catalytic (C) subunit of protein kinase A (PKA) and binds RNA in vivo []. PKA may be involved in regulating multiple steps in post-transcriptional processing of pre-mRNAs. The KOW domain is an RNA-binding motif that, so far, is shared among some families of ribosomal proteins, the essential bacterial transcriptional elongation factor NusG, the eukaryotic chromatin elongation factor Spt5, the higher eukaryotic KIN17 proteins and Mtr4 [].
G-patch domain and KOW motifs-containing protein, KOW 2
Type:
Domain
Description:
GPKOW is also known as T54 protein or MOS2 homologue [
]. It contains one G-patch domain and two KOW motifs. GPKOW is a nuclear protein that is regulated by the catalytic (C) subunit of protein kinase A (PKA) and binds RNA in vivo []. PKA may be involved in regulating multiple steps in post-transcriptional processing of pre-mRNAs. The KOW domain is an RNA-binding motif that, so far, is shared among some families of ribosomal proteins, the essential bacterial transcriptional elongation factor NusG, the eukaryotic chromatin elongation factor Spt5, the higher eukaryotic KIN17 proteins and Mtr4 [].
This entry represents the CpcT/CpeT biliprotein lyase, which has been shown to covalently attach chromophores to cysteine residue(s) of phycobiliproteins [
,
]. These proteins contain a conserved PYR motif in the amino-terminal half of the protein that may be functionally important. In the chromatically adapting cyanobacterium Fremyella diplosiphon, the proteins have been shown to be induced by green light as part of the cpeCDESTR operon []. Structurally, CpcT forms a dimer and adopts a calyx-shaped β-barrel fold. Each CpcT subunit adopts a conical, goblet-shaped β-barrel with 10 antiparallel β-strands; this fold is characteristic of the FABP subfamily of the calycin superfamily.
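Locating the PYR motif in the amino-terminal half of a sequence amounts to a positional filter on exact matches. The minimal Python sketch below assumes that the "amino-terminal half" means the first `len(seq) // 2` residues and that the motif must lie entirely within it; the function name and the test sequence are hypothetical.

```python
def pyr_motif_positions(seq: str, motif: str = "PYR") -> list[int]:
    """Return 0-based start positions of `motif` lying wholly in the N-terminal half."""
    half = len(seq) // 2              # assumption: N-terminal half = first len//2 residues
    positions = []
    start = seq.find(motif)
    while start != -1:
        if start + len(motif) <= half:
            positions.append(start)
        start = seq.find(motif, start + 1)  # continue scanning after this hit
    return positions


# Hypothetical sequence: one PYR in the N-terminal half, one in the C-terminal half.
print(pyr_motif_positions("AAPYR" + "A" * 11 + "PYRA"))  # [2]
```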
This entry represents a domain found in proteins of unknown function [
], some of which are described as heat shock protein (HslJ). In Helicobacter pylori (Campylobacter pylori) the protein is secreted and implicated in motility. In Leishmania spp. it is described as an essential protein, overexpression of which in Leishmania amazonensis increases virulence [
]. In E. coli, hslJ is not induced by heat shock. It encodes a putative outer-membrane protein that is CysB-regulated and is involved in novobiocin resistance []. A pair of cysteine residues shows correlated conservation, suggesting that they form a disulphide bond.
This superfamily consists of several Vesiculovirus matrix proteins. The matrix (M) protein of vesicular stomatitis virus (VSV) expressed in the absence of other viral components causes many of the cytopathic effects of VSV, including an inhibition of host gene expression and the induction of cell rounding. It has been shown that M protein also induces apoptosis in the absence of other viral components. It is thought that the activation of apoptotic pathways causes the inhibition of host gene expression and cell rounding by M protein [
]. Structurally, the M protein consists of a single globular domain [].
This entry represents the SH2 domain of SH2B adapter protein 1. SH2B adapter protein 1 (SH2B1) belongs to the SH2B family of adapter proteins [
]. It promotes insulin expression and glucose metabolism in beta-cells in mice []. It is also involved in brain-derived neurotrophic factor-induced neurite outgrowth []. The SH2B family contains three adaptor proteins: SH2B1, SH2B2 and SH2B3 [
]. Typical SH2B proteins contain an SH2 (Src homology 2) and a PH (pleckstrin homology) domain. They serve as adaptors involved in signalling by the receptors for growth factors, such as insulin-like growth factor 1, platelet-derived growth factor and nerve growth factor [].
Thiamine-binding lipoprotein (Cypl) and related proteins are specific to Mycoplasma species. The gene encoding Cypl, also known as p37, is part of an operon encoding two additional proteins that are highly similar to components of the periplasmic binding-protein-dependent transport systems of Gram-negative bacteria. It has been suggested that p37 is part of a homologous, high-affinity transport system in Mycoplasma hyorhinis, a Gram-positive bacterium [
]. Its structure indicates that it is a thiamine pyrophosphate-binding protein []. The monomer is a mixed alpha/beta fold split into two globular domains (domains I and II) connected by two linker regions and the C-terminal helix.
UL11 (also known as cytoplasmic envelopment protein 3) is conserved throughout the herpesvirus family. It is a membrane-associated tegument protein that is incorporated into the HSV virion. UL11 and UL51, either singly or in combination, are involved in virion envelopment and/or egress. Both proteins are fatty acylated: UL11 is acylated by both myristic and palmitic acids, while UL51 is monoacylated by palmitic acid [
]. UL11 homologue (U71) from HHV-6A/B is a myristylated virion protein which is expressed at the early stage of the lytic cycle []. Proteins in this entry are from the muromegalovirus and roseolovirus groups.
The pathogenesis-related protein-4 (PR-4) family is a group of proteins containing a BARWIN-like domain. This entry includes BARWIN from barley, PR4A/B from wheat, and AtHEL from Arabidopsis. In barley, BARWIN is induced by wounding or by pathogens [
]. Homologues of the BARWIN protein are also involved in the plant response to fungal infection and mechanical wounding [,
]. Although PR-4 proteins have been classified as chitinases, several BARWIN homologues have been shown to possess ribonuclease activity [,
]. Antifungal DNase activity has also been observed, together with RNase activity, for the Theobroma cacao TcPR-4b protein [].
Tissue inhibitor of metalloproteinase, conserved site
Type:
Conserved_site
Description:
Tissue inhibitors of metalloproteinases (TIMP) are a family of proteins [
] that can form complexes with extracellular matrix metalloproteinases (such as collagenases) and irreversibly inactivate them. TIMPs are proteins of about 200 amino acid residues, 12 of which are cysteines involved in disulfide bonds [
]. The basic structure of this type of inhibitor is shown in the following schematic representation:
CxCxCxxxxxxxxxxxxxxxxxCxxxxxxxxxCxxxxxxxCxCxCxCxCxxxxxCxxCxxx
'C': conserved cysteine involved in a disulfide bond.
This conserved site is found at the N-terminal extremity of these proteins and includes three conserved cysteines.
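As a quick consistency check, the TIMP schematic can be read programmatically: it should contain 12 conserved cysteines (six disulfide bonds), with the N-terminal conserved site covering the first three. This Python sketch assumes that the lengths of the 'x' runs in the schematic are illustrative rather than literal residue spacings, so it only counts and locates the 'C' characters and does not use the schematic as a search pattern.

```python
# Schematic copied from the text: 'C' marks a conserved cysteine involved in a
# disulfide bond, 'x' marks any other residue. The 'x' run lengths are assumed
# to be illustrative only, so positions are schematic positions, not residues.
TIMP_SCHEMATIC = "CxCxCxxxxxxxxxxxxxxxxxCxxxxxxxxxCxxxxxxxCxCxCxCxCxxxxxCxxCxxx"


def cysteine_positions(schematic: str) -> list[int]:
    """Return 0-based positions of every conserved cysteine ('C') in the schematic."""
    return [i for i, residue in enumerate(schematic) if residue == "C"]


positions = cysteine_positions(TIMP_SCHEMATIC)
n_terminal_site = positions[:3]  # the three cysteines of the N-terminal conserved site

print(len(positions))        # 12, matching the 12 disulfide-bonded cysteines
print(len(positions) // 2)   # 6 disulfide bonds
print(n_terminal_site)       # [0, 2, 4]
```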
Serine/arginine-rich (SR) proteins constitute a conserved family of RNA-binding proteins with roles in both constitutive and alternative splicing [
]. Typically, SR proteins contain an N-terminal RNA-binding domain and a C-terminal RS domain that promotes protein-protein interactions []. This entry represents the second RNA recognition motif (RRM2) of serine/arginine-rich splicing factor 1 (SRSF1 or SF2/ASF). SF2/ASF regulates constitutive and alternative splicing [
]. It has also been shown to regulate diverse aspects of gene expression, including mRNA stability [], translation [] and transcription []. SF2/ASF is upregulated in several human tumours, including lung, colon, kidney, liver, pancreas and breast tumours [].
Members of this protein family are annotated as CitX and contain the CitX domain, which is also found in the bifunctional CitXG protein of the citrate lyase system. CitX transfers the prosthetic group 2'-(5''-triphosphoribosyl)-3'-dephospho-CoA to the citrate lyase gamma chain, an acyl carrier protein. This enzyme may be designated holo-ACP synthase, holo-citrate lyase synthase, or apo-citrate lyase phosphoribosyl-dephospho-CoA transferase. In a few genera, including Haemophilus, this protein occurs as a fusion with CitG, an enzyme involved in prosthetic group biosynthesis. This CitX family is readily distinguished from the holo-ACP synthases of other enzyme systems.
Proteins in this entry are found in a variety of bacteria, and are ATPases involved in type III secretion systems. This entry includes SCTN (also known as YscN) from Yersinia enterocolitica, which is a component of the YOP (Yersinia outer protein) secretion machinery that acts as a molecular motor to provide the energy that is required for the export of proteins [
]. It may be required for type III secretion apparatus formation, proper protein secretion, host cell invasion and virulence []. This protein is required for the stabilization of the major sorting platform component (or C-ring component) [].
The Impact protein is a translational regulator that ensures constant high levels of translation under amino acid starvation. It acts by interacting with Gcn1/Gcn1L1, thereby preventing activation of Gcn2 protein kinases (EIF2AK1 to 4) and subsequent down-regulation of protein synthesis. It is evolutionarily conserved from eukaryotes to archaea [
]. This entry represents Impact family members found mostly in bacteria, although some archaeal sequences are also included. Crystallography of the Escherichia coli protein YigZ shows a two-domain structure, in which the C-terminal domain is suggested to bind nucleic acids. The function of these proteins is unknown [
].
Arenaviridae are single-stranded RNA viruses. The arenavirus S RNAs that have been characterised include conserved terminal sequences, an ambisense arrangement of the coding regions for the glycoprotein precursor (GPC) and nucleocapsid (N) proteins, and an intergenic region capable of forming a base-paired "hairpin" structure. The resulting mature proteins are glycoproteins G1 and G2 and the N protein [
]. This entry represents the N-terminal domain of the arenavirus nucleocapsid protein, which encapsulates the viral ssRNA. The domain folds into a novel structure with a deep cavity for binding the m7GpppN cap structure that is required for viral RNA transcription [
].
This entry represents the N-terminal region of MCM proteins. This region is composed of three structural domains. Firstly a four helical bundle, secondly a zinc binding motif and thirdly an OB-like fold [
]. MCM proteins are DNA-dependent ATPases required for the initiation of
eukaryotic DNA replication [,
,
]. In eukaryotes there is a family of six proteins, MCM2 to MCM7. They were first identified in yeast, where most of them have a direct role in the initiation of chromosomal DNA replication by interacting directly with autonomously replicating sequences (ARS). They were thus called minichromosome maintenance proteins, MCM proteins [
].
RIP proteins are receptor-interacting serine/threonine-protein kinases or cell death proteins [
]. The RHIM (RIP homotypic interaction motif) domain is involved in virus recognition. It is necessary for the recruitment of RIP and RIP3 by the IFN-inducible protein DNA-dependent activator of IRFs (DAI), also known as DLM-1 or Z-DNA binding protein 1 (ZBP1). Both RIP kinases contribute to DAI-induced NF-kappaB activation. RIP3 undergoes autophosphorylation on binding to DAI []. The RHIM domain is also located at the C terminus of TIR-domain-containing adapter-inducing IFN-beta (TRIF). It is essential for TRIF-induced apoptosis and has been shown to contribute to TRIF-induced NF-kappaB activation [
].
The ospA and ospB genes encode the major outer membrane proteins of the Lyme disease spirochaete Borrelia burgdorferi [
]. The deduced gene products, OspA and OspB, contain 273 and 296 residues, respectively []. The two Osp proteins show a high degree of sequence similarity, suggesting that they diverged relatively recently. Molecular analysis and sequence comparison of OspA and OspB with other proteins have revealed similarity to the signal peptides of prokaryotic lipoproteins [,
]. This superfamily represents a structural domain, consisting of a 21-stranded sheet partly folded upon itself at the ends, found in outer surface lipoproteins.
This entry represents a family of conserved hypothetical proteins that almost invariably pair with an uncharacterised radical SAM protein. The pair occurs in about twenty percent of completed prokaryotic genomes. About forty percent of the members of this family occur as fusion proteins in which the C-terminal domain belongs to the uracil-DNA glycosylase family, a DNA repair family (uracil in DNA arises from deamination of cytosine). The linkage by gene clustering and correlated species distribution to a radical SAM protein, and by gene fusion to a DNA repair protein family, suggests a role in DNA modification and/or repair.
The major vault protein is the major polypeptide component of a large cellular ribonucleoprotein complex found in the cytoplasm of eukaryotic cells. Several roles for vaults have been proposed. Vault proteins have been associated with the development of multidrug resistance [
]. They have also been implicated in the regulation of several cellular processes, including transport mechanisms, signal transmission and immune responses []. This domain is found in the major vault protein and has been called the shoulder domain [
]. This family also includes two bacterial proteins, suggesting that some bacteria may possess vault particles.
These are proteins of 150 to 160 residues that contain three transmembrane segments. A conserved region between the first and second transmembrane domains has been used as a signature for these proteins. This entry contains the following mammalian proteins, which are evolutionarily related []:
Leukotriene C4 synthase (gene LTC4S), an enzyme that catalyzes the production of LTC4 from LTA4.
Microsomal glutathione S-transferase II (GST-II) (gene GST2), an enzyme that can also produce LTC4 from LTA4.
5-lipoxygenase activating protein (gene FLAP), a protein that seems to be required for the activation of 5-lipoxygenase.
These sequences represent the ELAV/Hu subfamily of splicing factors found in metazoa. ELAV stands for the Drosophila Embryonic lethal abnormal visual protein [
]. ELAV-like splicing factors are also known in humans as HuB (ELAV-like protein 2), HuC (ELAV-like protein 3), HuD (ELAV-like protein 4) and HuR (ELAV-like protein 1). HuR is ubiquitously expressed, whereas HuB, HuC and HuD are neuron-specific []. HuD stands for the human paraneoplastic encephalomyelitis antigen D, of which there are four variants in humans []. These genes are most closely related to the sex-lethal subfamily of splicing factors found in Dipteran insects.
This entry includes Protein GlcG from Escherichia coli, Corrinoid adenosyltransferase PduO from Salmonella typhimurium and many uncharacterised proteins from bacteria and fungi. GlcG controls the expression of the genes of the glycolate pathway [
,
]. The structure of GlcG is composed of an α-β(2)-α(3)-β(2)-α fold, similar to the Roadblock/LC7 domain. PduO converts cob(I)alamin to adenosylcobalamin (adenosylcob(III)alamin), the cofactor for propanediol dehydratase [,
,
]. Extracellular haem-binding protein from Streptomyces reticuli (HbpS) is also included in this entry. This protein interacts with the SenS/SenR two-component signal transduction system. Iron binds to surface-exposed lysine residues of an octameric assembly of the protein [].
This family consists of several mammalian protein kinase A anchoring protein 3 (PRKA3) or A-kinase anchor protein 110kDa (AKAP 110) sequences. Agents that increase intracellular cAMP are potent stimulators of sperm motility. Anchoring inhibitor peptides, designed to disrupt the interaction of the cAMP-dependent protein kinase A (PKA) with A kinase-anchoring proteins (AKAPs), are potent inhibitors of sperm motility. PKA anchoring is a key biochemical mechanism controlling motility. AKAP110 shares compartments with both RI and RII isoforms of PKA and may function as a regulator of both motility- and head-associated functions such as capacitation and the acrosome reaction [
].
The protein MutL is essential in mismatch repair, as it coordinates multiple protein-protein interactions that signal strand removal upon mismatch recognition by MutS. MutL is composed of two structurally conserved domains connected by a variable flexible linker: an N-terminal ATPase domain and a C-terminal dimerisation domain [
]. The latter harbours the endonuclease activity of the protein. Structural studies have shown that this C-terminal region is organised into a dimerisation subdomain and a regulatory subdomain connected by a helical lever spanning the conserved endonuclease motif. This superfamily represents the regulatory subdomain of the C-terminal domain of the MutL protein [
].
This family includes LarE, one of three accessory proteins (the others being LarB and LarC) present in the lactate racemization operon larA-E. LarA is a nickel-dependent lactate racemase, and the accessory proteins are required for the incorporation of Ni in the lactate racemase apoprotein [
]. The accessory proteins LarB, LarC, and LarE are widely distributed in microorganisms. Among them, LarC and LarE have been found to be Ni-containing proteins []. In Lactobacillus plantarum, LarE catalyzes the ATP-dependent incorporation of two sulfur atoms in pyridinium-3,5-biscarboxylic acid mononucleotide (P2CMN) to yield pyridinium-3,5-bisthiocarboxylic acid mononucleotide (P2TMN) [
].
Proteins in this entry contain a flavodoxin-like fold, which is characterised by an open, twisted alpha/beta structure consisting of a five-stranded parallel β-sheet whose strands are connected by α-helices that surround the sheet. Flavodoxins are electron-transfer proteins that function in various electron transport systems. They bind one FMN molecule, which serves as a redox-active prosthetic group, and are functionally interchangeable with ferredoxins [
]. This domain superfamily can be found in flavodoxins, FMN-dependent NADH-azo compound oxidoreductases, ribonucleotide reductase stimulatory proteins, the glutathione-regulated potassium-efflux system ancillary protein KefG, the N-terminal region of the sulfite reductase [NADPH] flavoprotein alpha-component and the C-terminal region of the anaerobic nitric oxide reductase flavorubredoxin.
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not; instead, they bind other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
, ]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents UBP-type zinc finger domains, which display some similarity with the Zn-binding domain of the insulinase family. The UBP-type zinc finger domain is found only in a small subfamily of ubiquitin C-terminal hydrolases (deubiquitinases or UBP) [
,
]. All members of this subfamily are isopeptidase T enzymes, which are known to cleave isopeptide bonds between ubiquitin moieties. Some of the proteins containing a UBP zinc finger include: Homo sapiens (Human) deubiquitinating enzyme 13 (UBPD); Human deubiquitinating enzyme 5 (UBP5); Dictyostelium discoideum (Slime mold) deubiquitinating enzyme A (UBPA); Saccharomyces cerevisiae (Baker's yeast) deubiquitinating enzyme 8 (UBP8); and Yeast deubiquitinating enzyme 14 (UBP14).
This entry represents the atypical RRM, named xRRM, found in La and La-related proteins (LaRPs). They belong to an ancient superfamily of proteins that are conserved in nearly all eukaryotes, except Plasmodium. These proteins are broadly involved in critical processes of RNA use and metabolism in the nucleus and the cytoplasm. The LaRP superfamily is distinguished by a conserved bipartite RNA-binding unit called the La-module, composed of a Lupus antigen motif (LaM) followed by an RNA-Recognition motif (RRM). Beyond this, each LaRP family is characterized by distinct family-specific domains and motifs that contribute to structure and function. Genuine La proteins and La-related protein 7 (LARP7) bind to the non-coding RNAs transcribed by RNA polymerase III (RNAPIII), which end in UUU. The La-module of these proteins binds the UUU-3'OH, protecting the RNA from degradation, while other domains may be important for RNA folding or other functions. The La and LARP7 protein families have a C-terminal domain that contains a novel class of atypical RRM, named xRRM (for atypical RRM with extended alpha3), which uses a unique mode of single- and double-strand RNA binding [
,
,
,
,
]. The overall fold of the xRRM is that of an RRM, but with several atypical features. Unusual features of the xRRM include the absence of the conserved RNP1 and RNP2 aromatic sequences on the beta3 and beta1 strands, respectively, which are typically involved in nucleotide recognition; the presence of an additional helix, alpha3, that lies across the β-sheet surface where single-stranded nucleotides usually bind; and a C-terminal tail required for RNA binding that is disordered in the free xRRM but forms an alpha3 extension (alpha3x) on binding RNA. The front face of the xRRM consists of an antiparallel β-sheet with helix alpha3 lying across it, perpendicular to the β-strand axis. The back side of the xRRM consists of alpha helices. The xRRM interacts with both single- and double-stranded RNA using the β-sheet surface and the C-terminal tail, which forms a helical extension of alpha3 (alpha3x) that binds to the RNA major groove [
,
,
,
,
].
Cdc6 (also known as Cell division cycle 6 or Cdc18) functions as a regulator at the early stages of DNA replication by helping to recruit and load the Minichromosome Maintenance Complex (MCM) onto DNA, and may have additional roles in the control of mitotic entry. Precise duplication of chromosomal DNA is required for genomic stability. Cdc6 has an essential role in DNA replication, and irregular expression of Cdc6 may lead to genomic instability. Cdc6 over-expression is observed in many cancerous lesions. DNA replication begins when an origin recognition complex (ORC) binds to a replication origin site on the chromatin. Studies indicate that Cdc6 interacts with ORC through the Orc1 subunit, and that this association increases the specificity of the ORC-origin interaction. Further studies suggest that hydrolysis of Cdc6-bound ATP promotes the association of the replication licensing factor Cdt1 with origins through an interaction with Orc6, and this in turn promotes the loading of the MCM2-7 helicase onto chromatin. The MCM2-7 complex promotes the unwinding of DNA origins and the binding of additional factors to initiate DNA replication. S-Cdk (the S-phase cyclin and cyclin-dependent kinase complex) prevents rereplication by causing the Cdc6 protein to dissociate from ORC and by preventing the Cdc6 and MCM proteins from reassembling at any origin. By phosphorylating Cdc6, S-Cdk also triggers Cdc6 ubiquitination. The Cdc6 protein is composed of three domains. The N-terminal AAA+ domain contains Walker A and B motifs and Sensor-1 and Sensor-2 motifs. The central region contains a conserved nucleotide-binding/ATPase domain and is a member of the ATPase superfamily [
,
,
,
,
]. The C-terminal domain of cell division control protein 6 (CDC6) adopts a winged-helix fold, consisting of a five-helix bundle (α15-α19) backed on one side by three β-strands (β6-β8). It has been shown that this domain acts as a DNA-localisation factor; however, its exact function is, as yet, unknown. Putative functions include (1) mediation of protein-protein interactions and (2) regulation of nucleotide binding and hydrolysis. Mutagenesis studies have shown that this domain is essential for appropriate CDC6 activity [
].
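To make the once-per-cycle licensing logic in the Cdc6 description concrete, here is a deliberately simplified state model in Python. It is a hypothetical sketch, not a published model: the `Origin` class, the `try_license` and `fire` functions and the single boolean standing in for S-Cdk activity are illustrative assumptions, and so is the choice to treat a fired origin as no longer licensed.

```python
from dataclasses import dataclass


@dataclass
class Origin:
    """Hypothetical toy state of a single replication origin."""
    orc_bound: bool = False
    cdc6_bound: bool = False
    cdt1_bound: bool = False
    mcm_loaded: bool = False


def try_license(origin: Origin, s_cdk_active: bool) -> bool:
    """Attempt licensing; return True if it is permitted and MCM2-7 is now loaded."""
    if s_cdk_active:
        # S-Cdk causes Cdc6 to dissociate from ORC and blocks Cdc6/MCM reassembly.
        origin.cdc6_bound = False
        return False
    origin.orc_bound = True   # ORC binds the replication origin
    origin.cdc6_bound = True  # Cdc6 associates with ORC (via Orc1)
    origin.cdt1_bound = True  # Cdc6 ATP hydrolysis promotes Cdt1 recruitment (via Orc6)
    origin.mcm_loaded = True  # MCM2-7 helicase is loaded onto chromatin
    return True


def fire(origin: Origin) -> bool:
    """Initiate replication if MCM2-7 is loaded; return True if the origin fired.

    Modelling assumption: once fired, the origin no longer counts as licensed.
    """
    if not origin.mcm_loaded:
        return False
    origin.mcm_loaded = False
    return True


if __name__ == "__main__":
    ori = Origin()
    print(try_license(ori, s_cdk_active=False))  # G1: licensing is allowed
    print(fire(ori))                             # S phase: the origin fires once
    print(try_license(ori, s_cdk_active=True))   # S-Cdk blocks relicensing
    print(fire(ori))                             # so the origin cannot fire again
```

Run as a script, it licenses an origin in G1, fires it once, and then shows that with S-Cdk active the same origin can be neither relicensed nor fired again. Real licensing control involves further layers (for example Cdc6 ubiquitination), which this sketch ignores.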
This entry represents a family of pyridoxal 5'-phosphate synthase subunits, also known as the pdxT/SNO family. This family belongs to a superfamily with a triad glutamine amidotransferase fold, characterised by a conserved Cys-His-Glu active site []. Two regions are highly conserved across all taxa: the PGGEST motif and the FHPE(LT) motif [
]. PdxT/SNO proteins are an alpha/beta three-layer sandwich containing a seven-stranded, twisted, mixed parallel β-sheet flanked by six α-helices on the N-terminal stretch of the sheet, four on one side and two on the other [
]. They are involved in vitamin B6 biosynthesis. Proteins belonging to the pdxT/SNO family include: Bacillus subtilis glutamine amidotransferase subunit pdxT; Haemophilus influenzae glutamine amidotransferase subunit pdxT; Methanococcus jannaschii glutamine amidotransferase subunit pdxT; and yeast probable glutamine amidotransferases SNO1, SNO2 and SNO3. These are hydrophilic proteins of about 19 to 25 kDa. The term vitamin B6 is used to refer collectively to the compound pyridoxine and its vitameric forms, pyridoxal, pyridoxamine, and their phosphorylated derivatives. Vitamin B6 is required by all organisms and plays an essential role as a cofactor for enzymatic reactions. Plants, fungi, bacteria, archaebacteria, and protists synthesise vitamin B6, whereas animals and some highly specialised obligate pathogens obtain it nutritionally. Vitamin B6 has two distinct biosynthetic pathways, which do not coexist in any organism. The pdxA/pdxJ pathway, which has been extensively characterised in Escherichia coli, is found in the gamma subdivision of the proteobacteria. A second pathway of vitamin B6 synthesis, involving the pdxS/SNZ and pdxT/SNO protein families, which are completely unrelated in sequence to the pdxA/pdxJ proteins, is found in plants, fungi, protists, archaebacteria, and most bacteria.
PdxS/SNZ and pdxT/SNO proteins form a complex which serves as a glutamine amidotransferase to supply ammonia as a source of the ring nitrogen of vitamin B6 [
]. PdxT/SNO and pdxS/SNZ appear to encode, respectively, the glutaminase subunit, which produces ammonia from glutamine, and the synthase subunit, which combines ammonia with five- and three-carbon phosphosugars to form vitamin B6 [
].
The recA gene product is a multifunctional enzyme that plays a role in homologous recombination, DNA repair and induction of the SOS response [
]. In homologous recombination, the protein functions as a DNA-dependent ATPase, promoting synapsis, heteroduplex formation and strand exchange between homologous DNAs []. RecA also acts as a protease cofactor that promotes autodigestion of the lexA product and phage repressors. The proteolytic inactivation of the lexA repressor by an activated form of recA may cause a derepression of the 20 or so genes involved in the SOS response, which regulates DNA repair, induced mutagenesis, delayed cell division and prophage induction in response to DNA damage []. RecA is a protein of about 350 amino acid residues. Its sequence is very well conserved [
,
,
] among eubacterial species. It is also found in the chloroplast of plants []. RecA-like proteins are found in archaea and diverse eukaryotic organisms, such as fission yeast, mouse and human. In the filament visualised by X-ray crystallography, β-strand 3, the loop C-terminal to β-strand 2, and α-helix D of the core domain form one surface that packs against α-helix A and β-strand 0 (the N-terminal domain) of an adjacent monomer during polymerisation []. The core ATP-binding site domain is well conserved, with 14 invariant residues. It contains the nucleotide-binding loop between β-strand 1 and α-helix C. The Escherichia coli sequence GPESSGKT matches the consensus sequence (G/A)XXXXGK(T/S) for the Walker A box (also referred to as the P-loop) found in a number of nucleoside triphosphate (NTP)-binding proteins. Another nucleotide-binding motif, the Walker B box, is found at β-strand 4 in the RecA structure. The Walker B box is characterised by four hydrophobic amino acids followed by an acidic residue (usually aspartate). Nucleotide specificity and additional ATP-binding interactions are contributed by the amino acid residues at β-strand 2 and the loop C-terminal to that strand, all of which are more than 90% conserved among bacterial RecA proteins. This entry represents a subset of the RecA-like proteins.
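The Walker A consensus and the Walker B description above translate naturally into regular expressions. The following Python sketch scans a protein sequence for both; the function name `find_walker_motifs` and, in particular, the set of residues treated as hydrophobic for Walker B are assumptions, since the text does not define that set. A regex hit only flags a candidate; it does not by itself show that the stretch forms a nucleotide-binding site.

```python
import re

# Walker A (P-loop) consensus quoted in the text: (G/A)XXXXGK(T/S).
WALKER_A = re.compile(r"[GA].{4}GK[TS]")

# Walker B: four hydrophobic residues followed by an acidic residue (D or E).
# ASSUMPTION: the text does not say which residues count as hydrophobic;
# this set is an illustrative choice, not a published definition.
HYDROPHOBIC = "AVLIMFWC"
WALKER_B = re.compile(rf"[{HYDROPHOBIC}]{{4}}[DE]")


def find_walker_motifs(sequence: str) -> dict[str, list[tuple[int, str]]]:
    """Return (1-based start, matched text) for each non-overlapping hit of each motif."""
    sequence = sequence.upper()
    return {
        name: [(m.start() + 1, m.group()) for m in pattern.finditer(sequence)]
        for name, pattern in (("Walker A", WALKER_A), ("Walker B", WALKER_B))
    }


if __name__ == "__main__":
    # The E. coli RecA Walker A sequence from the text, then a made-up Walker B-like stretch.
    print(find_walker_motifs("GPESSGKT"))
    print(find_walker_motifs("MKLVIVD"))
```

For example, `find_walker_motifs("GPESSGKT")` reports the E. coli RecA Walker A sequence quoted above as a single hit starting at position 1.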
The ~200 amino acid TBC/rab GTPase-activating protein (GAP) domain is well conserved across species and has been found in a wide range of different proteins from plant adhesion molecules to mammalian oncogenes. The name TBC derives from the name of the murine protein Tbc1 in which this domain was first identified based on its similarity to sequences in the tre-2 oncogene, and the yeast regulators of mitosis, BUB2 and cdc16 [
]. The connection of this domain with rab GTPase activation stems from subsequent in-depth sequence analyses and alignments [] and from work demonstrating that it appears to contain the catalytic activities of the yeast rab GAPs GYP1 and GYP7 []. The TBC/rab GAP domain has also been named PTM after three proteins known to contain it: the Drosophila pollux, the human oncoprotein TRE17 (oncoTRE17), and a myeloid cell line-expressed protein [
]. The TBC/rab GAP domain contains six conserved motifs named A to F []. A conserved arginine residue in sequence motif B has been shown to be critical for full GAP activity []. Resolution of the 3D structure of the TBC/rab GAP domain of GYP1 has shown that it is a fully α-helical, V-shaped molecule. The conserved arginine residue lies beside the narrow cleft on the concave side of the V-shaped molecule. It has been proposed that this cleft is the binding site for the GTPase. The conserved arginine residue probably functions as a catalytic arginine finger analogous to that seen in Ras- and Rho-GAPs. The two key features of the arginine finger activation mechanism appear to be (i) the positioning of the catalytically essential GTPase glutamine side chain via a hydrogen bonding interaction between the glutamine carbamoyl NH2 group and the main-chain carbonyl group of the GAP arginine, and (ii) the polarisation of the gamma-phosphate group, or the stabilisation of charge on it, via interaction with the positively charged guanidino group of the GAP arginine side chain [].
The EH (for Eps15 Homology) domain is a protein-protein interaction module of approximately 95 residues which was originally identified as a repeated sequence present in three copies at the N terminus of the tyrosine kinase substrates Eps15 and Eps15R [
,
]. The EH domain was subsequently found in several proteins implicated in endocytosis, vesicle transport and signal transduction in organisms ranging from yeast to mammals. EH domains are present in one to three copies and they may include calcium-binding domains of the EF-hand type [,
]. Eps15 is divided into three domains: domain I contains signatures of a regulatory domain, including a candidate tyrosine phosphorylation site and EF-hand-type calcium-binding domains; domain II presents the characteristic heptad repeats of coiled-coil rod-like proteins; and domain III displays a repeated aspartic acid-proline-phenylalanine motif similar to a consensus sequence of several methylases []. EH domains have been shown to bind specifically but with moderate affinity to peptides containing short, unmodified motifs through predominantly hydrophobic interactions. The target motifs are divided into three classes: class I consists of the consensus Asn-Pro-Phe (NPF) sequence; class II consists of aromatic and hydrophobic di- and tripeptide motifs, including the Phe-Trp (FW), Trp-Trp (WW), and Ser-Trp-Gly (SWG) motifs; and class III contains the His-(Thr/Ser)-Phe motif (HTF/HSF) [
,
]. The structures of several EH domains have been solved by NMR spectroscopy. The fold consists of two helix-loop-helix motifs characteristic of EF-hand domains, connected by a short antiparallel β-sheet. The target peptide is bound in a hydrophobic pocket between two alpha helices. Sequence analysis and structural data indicate that not all the EF-hands are capable of binding calcium because of substitutions of the calcium-liganding residues in the loop [,
,
]. This domain is often implicated in the regulation of protein transport/sorting and membrane trafficking. Messenger RNA translation initiation and cytoplasmic poly(A) tail shortening require the poly(A)-binding protein (PAB) in yeast. The PAB-dependent poly(A) ribonuclease (PAN) is organised into distinct domains containing repeated sequence elements [
].
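The three classes of EH-domain target motifs listed above can be checked in a peptide with a short scan. This is an illustrative Python sketch: `scan_eh_targets` and its motif table are hypothetical, class II is limited to the three examples the text names even though that class is described as broader, and a motif match says nothing about binding affinity or structural context.

```python
import re

# EH-domain target motifs as named in the text. Class II is described as
# "including" these motifs, so this table is illustrative, not exhaustive.
EH_TARGET_MOTIFS = {
    "class I": ("NPF",),
    "class II": ("FW", "WW", "SWG"),
    "class III": ("HTF", "HSF"),
}


def scan_eh_targets(peptide: str) -> dict[str, list[tuple[str, int]]]:
    """Return, per motif class, (motif, 1-based position) for every occurrence found."""
    peptide = peptide.upper()
    hits: dict[str, list[tuple[str, int]]] = {}
    for motif_class, motifs in EH_TARGET_MOTIFS.items():
        found = [
            (motif, m.start() + 1)
            for motif in motifs
            # Zero-width lookahead so overlapping hits (e.g. in 'WWW') are all reported.
            for m in re.finditer(f"(?={motif})", peptide)
        ]
        if found:
            hits[motif_class] = sorted(found, key=lambda hit: hit[1])
    return hits


if __name__ == "__main__":
    # Made-up peptide containing one class I and one class II motif.
    print(scan_eh_targets("ANPFSWGK"))
```

A made-up peptide such as `"ANPFSWGK"` yields one class I hit (NPF at position 2) and one class II hit (SWG at position 5).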
Geminiviruses are characterised by a genome of circular single-stranded DNA encapsidated in twinned (geminate) quasi-isometric particles, from which the group derives its name [
]. Most geminiviruses can be divided into two subgroups on the basis of host range and/or insect vector: those that infect dicotyledonous plants and are transmitted by the same whitefly species, and those that infect monocotyledonous plants and are transmitted by different leafhopper vectors. The whitefly-transmitted African cassava mosaic virus, Tomato golden mosaic virus (TGMV) and Bean golden mosaic virus (BGMV) possess a bipartite genome. By contrast, only a single DNA component has been identified for the leafhopper-transmitted Maize streak virus (MSV) and Wheat dwarf virus (WDV) [,
]. Beet curly top virus (BCTV) and Tobacco yellow dwarf virus belong to a third possible subgroup. Like MSV and WDV, BCTV is transmitted by a specific leafhopper species, yet like the whitefly-transmitted geminiviruses it has a host range confined to dicotyledonous plants. Sequence comparison of the whitefly-transmitted Squash leaf curl virus (SqLCV) and Tomato yellow leaf curl virus (TYLCV) with the genomic components of TGMV and BGMV reveals a close evolutionary relationship [
,
,
]. Amino acid sequence alignments of Potato yellow mosaic virus (PYMV) proteins with those encoded by other geminiviruses show that PYMV is closely related to geminiviruses isolated from the New World, especially in the putative coat protein gene regions []. Comparison of MSV DNA-encoded proteins with those of other geminiviruses infecting monocotyledonous plants, including Panicum streak virus [] and Miscanthus streak virus (MiSV) [], reveals high levels of similarity. The AL1 protein is the replication initiator protein (Rep) of geminiviruses, a replicon-specific initiator enzyme and an essential component of the replisome [
]. In the geminivirus Rep protein, this N-terminal region is crucial for origin recognition, DNA cleavage and nucleotidyl transfer []. It is found in association with .
Members of this group are involved in transmembrane signalling. In both prokaryotes and mitochondria they are localized to the outer membrane, and have been shown to bind and transport dicarboxylic tetrapyrrole intermediates of the haem biosynthetic pathway [
,
]. They are associated with the major outer membrane porins (in prokaryotes) and with the voltage-dependent anion channel (in mitochondria) []. Rhodobacter sphaeroides TspO (previously CrtK) is involved in signal transduction, functioning as a negative regulator of the expression of some photosynthesis genes (PpsR/AppA repressor/antirepressor regulon). This down-regulation is believed to be in response to oxygen levels. TspO works through (or modulates) the PpsR/AppA system and acts upstream of the site of action of these regulatory proteins [
]. It has been suggested that the TspO regulatory pathway works by regulating the efflux of certain tetrapyrrole intermediates of the haem/bacteriochlorophyll biosynthetic pathways in response to the availability of molecular oxygen, thereby causing the accumulation of a biosynthetic intermediate that serves as a corepressor for the regulated genes []. A homologue of the TspO protein in Rhizobium meliloti (Sinorhizobium meliloti) is involved in regulating expression of the ndi locus in response to stress conditions []. There is evidence that the S. meliloti TspO acts through, or in addition to, the FixL regulatory system. In animals, translocator protein (TSPO), previously known as peripheral-type benzodiazepine receptor (PBR, MBR), is a mitochondrial protein located in the outer mitochondrial membrane, where it forms a complex with several proteins of the mitochondrial permeability transition pore (MPTP). TSPO is involved in multiple processes, including regulation of cell death, cholesterol transport and steroid biosynthesis, mitochondrial respiration and oxidation, and mitochondrial protein import [,
]. These observations suggest that fundamental aspects of this receptor and the downstream signal transduction pathway are conserved in bacteria and higher eukaryotic mitochondria. The alpha-3 subdivision of the purple bacteria is considered to be a likely source of the endosymbiont that ultimately gave rise to the mitochondrion. Therefore, it is possible that the mammalian PBR remains both evolutionarily and functionally related to the TspO of R. sphaeroides.
Anthrax toxin is a plasmid-encoded toxin complex produced by the Gram-positive, spore-forming bacterium Bacillus anthracis. The toxin consists of three non-toxic proteins: the protective antigen (PA), the lethal factor (LF) and the edema factor (EF) [
]. These component proteins self-assemble at the surface of host cell receptors, yielding a series of toxic complexes that can produce shock-like symptoms and death. Anthrax toxin is one of a large group of Bacillus and Clostridium exotoxins referred to as binary toxins, which have independent enzymatic (A moiety) and binding (B moiety) components. The LF and EF proteins are the enzymes (A moiety) that act on cytosolic substrates, while PA is a multi-functional protein (B moiety) that binds to cell surface receptors, mediates the assembly and internalisation of the complexes, and delivers them to the host cell endosome []. Once PA is attached to the host receptor [], it must be cleaved by a host cell surface (furin family) protease before it is able to bind EF and LF. Cleavage of the N terminus of PA enables the C-terminal fragment to self-associate into a ring-shaped heptameric complex (prepore) that can bind LF or EF competitively. The PA-LF/EF complex is then internalised by endocytosis and delivered to the endosome, where PA forms a pore in the endosomal membrane to translocate LF and EF to the cytosol. LF is a Zn-dependent metalloprotease that cleaves and inactivates mitogen-activated protein kinase kinases (MAPKKs), kills macrophages, and causes death of the host by inhibiting cell proliferation [,
]. EF is a calcium- and calmodulin-dependent adenylyl cyclase that can cause edema (fluid-filled swelling) when associated with PA. EF is not toxic by itself, and is required for the survival of germinated Bacillus spores within macrophages at the early stages of infection. EF dramatically elevates the level of host intracellular cAMP, a ubiquitous messenger that integrates many processes of the cell; increases in cAMP can interfere with host intracellular signalling []. This entry represents the N- and C-terminal domains found in both lethal factor and edema factor proteins of anthrax toxin.
This entry represents C3a, C4a and C5a anaphylatoxins, which are protein fragments generated enzymatically in serum during activation of complement molecules C3, C4, and C5. They induce smooth muscle contraction. These fragments are homologous to a three-fold repeat in fibulins. Complement components C3, C4 and C5 are large glycoproteins that have important functions in the immune response and host defence [
]. They have a wide variety of biological activities and are proteolytically activated by cleavage at a specific site, forming a- and b-fragments []. A-fragments form distinct structural domains of approximately 76 amino acids, coded for by a single exon within the complement protein gene. The C3a, C4a and C5a components are referred to as anaphylatoxins [,
]: they cause smooth muscle contraction, histamine release from mast cells, and enhanced vascular permeability []. They also mediate chemotaxis, inflammation, and generation of cytotoxic oxygen radicals []. The proteins are highly hydrophilic, with a mainly α-helical structure held together by 3 disulphide bridges []. Fibulins are secreted glycoproteins that become incorporated into a fibrillar extracellular matrix when expressed by cultured cells or added exogenously to cell monolayers [
,
]. The five known members of the family share an elongated structure and many calcium-binding sites, owing to the presence of tandem arrays of epidermal growth factor-like domains. They have overlapping binding sites for several basement-membrane proteins, tropoelastin, fibrillin, fibronectin and proteoglycans, and they participate in diverse supramolecular structures. The amino-terminal domain I of fibulin consists of three anaphylatoxin-like (AT) modules, each approximately 40 residues long and containing four or six cysteines. The structure of an AT module was determined for the complement-derived anaphylatoxin C3a, and was found to be a compact α-helical fold that is stabilised by three disulphide bridges in the pattern Cys1-4, Cys2-5 and Cys3-6 (where Cys is cysteine). The bulk of the remaining portion of the fibulin molecule is a series of nine EGF-like repeats [].
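The Cys1-4, Cys2-5, Cys3-6 notation refers to the order of cysteines along the chain, not to residue numbers. The Python sketch below converts such ordinal pairs into residue positions for a given sequence; the helper name `disulfide_pairs` and the toy sequence are hypothetical, and the code simply assumes that every cysteine in the input belongs to the module.

```python
# Disulfide connectivity of the anaphylatoxin-like (AT) module as given in the
# text: Cys1-Cys4, Cys2-Cys5, Cys3-Cys6 (ordinal positions among the cysteines).
AT_CONNECTIVITY = ((1, 4), (2, 5), (3, 6))


def disulfide_pairs(sequence: str, connectivity=AT_CONNECTIVITY) -> list[tuple[int, int]]:
    """Translate ordinal cysteine pairs into 1-based residue positions in `sequence`.

    Assumes every cysteine in `sequence` belongs to the module being described.
    """
    cys_positions = [i + 1 for i, residue in enumerate(sequence.upper()) if residue == "C"]
    needed = max(max(pair) for pair in connectivity)
    if len(cys_positions) < needed:
        raise ValueError(f"need at least {needed} cysteines, found {len(cys_positions)}")
    return [(cys_positions[a - 1], cys_positions[b - 1]) for a, b in connectivity]


if __name__ == "__main__":
    # Hypothetical toy sequence with six cysteines; not a real AT module.
    print(disulfide_pairs("ACDCEFCGHCIKCLMC"))
```

For the toy sequence in the script, the pairs come out as residues 2-10, 4-13 and 7-16. Passing `connectivity=((1, 4), (2, 6), (3, 5))` instead applies the C1-C4, C2-C6, C3-C5 arrangement given for the pacifastin domain.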
Viruses in the order Picornavirales infect different vertebrate, invertebrate, and plant hosts and are responsible for a variety of human, animal, and plant diseases. These viruses have a single-stranded, positive-sense RNA genome that is generally translated into a large precursor polyprotein, which is proteolytically cleaved after translation to generate mature functional viral proteins. This process is usually mediated by one or more proteases, and a 3C (for the family Picornaviridae) or 3C-like (3CL) protease (for other families) plays a central role in the cleavage of the viral precursor polyprotein. In addition to this key role, the 3C/3C-like protease is able to cleave a number of host proteins to remodel the cellular environment for virus reproduction [
,
,
,
,
,
]. The Picornavirales 3C/3C-like protease domain forms the MEROPS peptidase family C3 (picornain family) of clan PA. The 3C/3CL protease domain adopts a chymotrypsin-like fold with a cysteine nucleophile in place of the commonly found serine, which suggests that the cysteine and serine perform an analogous catalytic function. The catalytic triad is made up of a histidine, an aspartate/glutamate and the conserved cysteine, in this sequential order. The 3C/3CL protease domain folds into two antiparallel beta barrels that are linked by a loop with a short α-helix in its middle, and the domain is flanked by two other α-helices at the N and C termini. The two barrels are topologically equivalent and are each formed by six antiparallel beta strands, with the first four organised into a Greek key motif. The active-site residues are located in the cleft between the two barrels, with the nucleophilic Cys from the C-terminal barrel and the general acid-base His-Glu/Asp from the N-terminal barrel [
,
,
]. This entry represents a rice tungro spherical waikavirus-type peptidase that belongs to MEROPS peptidase family C3G. It is a picornain 3C-type protease and is responsible for the self-cleavage of the polyproteins encoded by a number of positive-sense, single-stranded RNA plant viral genomes. The protease activity is located towards the C-terminal end of the polyprotein, immediately N-terminal to the putative RNA polymerase [
,
].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are not matched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
]. This family represents a group of animal proteins that play important roles in both physiological and disease states [
]. Proteins in this family are frequently overexpressed in common tumors. Consequently, they are considered a possible therapeutic target in several tumors, particularly in prostate, breast, and lung cancer, but their role in some CNS/neural tumors (gliomas, neuroblastomas, medulloblastomas) may also be of interest [].
This entry represents the pacifastin domain superfamily. Proteins containing this domain are proteinase inhibitors belonging to MEROPS inhibitor family I19 (clan IW) and share a pacifastin domain of ~35 residues, which contains a characteristic pattern of six conserved cysteine residues (C-x(9,12)-C-N-x-C-x-C-x(2,3)-G-x(3,6)-C-T-x(3)-C). The pacifastin domain consists of a twisted β-sheet composed of three antiparallel strands and is stabilised by a conserved pattern of disulfide bridges (C1-C4, C2-C6, C3-C5) [,
,
,
,
,
]. Proteins containing this domain were first isolated from Locusta migratoria migratoria (migratory locust). These were HI, LMCI-1 (PMP-D2) and LMCI-2 (PMP-C) [
,
,
]; five additional members, SGPI-1 to 5, were identified in Schistocerca gregaria (desert locust) [,
], and a heterodimeric serine protease inhibitor (pacifastin) was isolated from the hemolymph of Pacifastacus leniusculus (signal crayfish) []. Pacifastin is a 155 kDa protein composed of two covalently linked subunits, which are separately encoded. The heavy chain of pacifastin (105 kDa) is related to transferrins, containing three transferrin lobes, two of which seem to be active for iron binding []. A number of the members of the transferrin family are also serine peptidases belonging to MEROPS peptidase family S60 (). The light chain of pacifastin (44 kDa) is the proteinase inhibitory subunit, and has nine cysteine-rich inhibitory domains that are homologous to each other. The locust inhibitors share a conserved array of six cysteine residues with the pacifastin light chain. The structures of members of this family reveal that they are composed of a triple-stranded antiparallel β-sheet connected by three disulphide bridges [
]. The biological functions of the locust inhibitors are not fully understood. LMCI-1 and LMCI-2 were shown to inhibit the endogenous proteolytic activating cascade of prophenoloxidase [
]. Expression analysis shows that the genes encoding the SGPI precursors are differentially expressed in a time-, stage- and hormone-dependent manner. This entry also includes SCO-spondin, which has a multidomain organisation. The SCO-spondin protein is a special feature of the chordate phylum. This protein is expressed in the central nervous system (CNS) from the time a dorsal neural tube appears in the course of phylogenetic evolution [
].
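The pacifastin cysteine pattern is written in PROSITE-style notation, where `x` is any residue and `x(n,m)` a run of n to m residues. The following Python sketch converts that notation into a regular expression and applies it to a made-up sequence. The converter `prosite_to_regex` is a hypothetical helper that handles only the element types used here, not the full PROSITE syntax.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression.

    Handles only: 'x' (any residue), repeat counts such as 'x(3)' or 'x(9,12)',
    single residues and '[...]' alternatives. Exclusions '{...}' and the
    terminal anchors '<' and '>' are deliberately not supported in this sketch.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"([A-Zx]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "." if residue == "x" else residue
        if low is not None:
            # x(n) -> .{n}; x(n,m) -> .{n,m}
            token += f"{{{low},{high}}}" if high is not None else f"{{{low}}}"
        parts.append(token)
    return "".join(parts)


PACIFASTIN_PATTERN = "C-x(9,12)-C-N-x-C-x-C-x(2,3)-G-x(3,6)-C-T-x(3)-C"

if __name__ == "__main__":
    regex = prosite_to_regex(PACIFASTIN_PATTERN)
    print(regex)
    # Hypothetical toy sequence built to fit the pattern; not a real inhibitor.
    toy = "MK" + "C" + "A" * 10 + "CNKCRC" + "SS" + "G" + "QPLE" + "CT" + "EKA" + "C" + "LL"
    match = re.search(regex, toy)
    print(match.span() if match else None)
```

The converted expression is `C.{9,12}CN.C.C.{2,3}G.{3,6}CT.{3}C`, and the toy sequence matches it over Python indices 2 to 32 (end-exclusive). A match identifies a candidate cysteine framework only, not a functional inhibitor.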
Signal transduction response regulator, chemotaxis, protein-glutamate methylesterase
Type:
Domain
Description:
Two-component signal transduction systems enable bacteria to sense, respond, and adapt to a wide range of environments, stressors, and growth conditions [
]. Some bacteria contain as many as 200 two-component systems that need tight regulation to prevent unwanted cross-talk [
]. These pathways have been adapted to respond to a wide variety of stimuli, including nutrients, cellular redox state, changes in osmolarity, quorum signals, antibiotics, and more []. Two-component systems are composed of a sensor histidine kinase (HK) and its cognate response regulator (RR) []. The HK catalyses its own auto-phosphorylation, followed by the transfer of the phosphoryl group to the receiver domain on the RR; phosphorylation of the RR usually activates an attached output domain, which can then effect changes in cellular physiology, often by regulating gene expression. Some HKs are bifunctional, catalysing both the phosphorylation and dephosphorylation of their cognate RR. The input stimuli can regulate either the kinase or phosphatase activity of the bifunctional HK. A variant of the two-component system is the phospho-relay system. Here a hybrid HK auto-phosphorylates and then transfers the phosphoryl group to an internal receiver domain, rather than to a separate RR protein. The phosphoryl group is then shuttled to a histidine phosphotransferase (HPT) and subsequently to a terminal RR, which can evoke the desired response [
,
].This entry represents the signal transduction response regulator CheB involved in chemotaxis. CheB methylesterase is responsible for removing the methyl group from the gamma-glutamyl methyl ester residues in the methyl-accepting chemotaxis proteins (MCP). The enzyme catalyses the reaction: protein L-glutamate O-methyl ester and water is converted to protein L-glutamate and methanol. CheB
is regulated through phosphorylation by CheA. The N-terminal region of the protein is similar to that of other regulatory components of sensory transduction systems. The Myxococcus xanthus FrzG protein also belongs to this family, and is required for the normal aggregation of cells during fruiting body formation.
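The order of phosphoryl transfer described above, direct HK-to-RR transfer in a classic two-component system versus the longer hybrid HK, internal receiver, HPT and terminal RR route of a phosphorelay, can be sketched as a toy model. This is illustrative only: the `Component` class and `relay` function are hypothetical, each component is assumed to carry at most one phosphoryl group, and kinetics, phosphatase activity and cross-talk are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Component:
    """A signalling component that can carry at most one phosphoryl group (toy model)."""
    name: str
    phosphorylated: bool = False


def relay(chain: list[Component]) -> list[str]:
    """Autophosphorylate the first component, then hand the phosphoryl group
    down the chain one step at a time. Returns a log of each event."""
    if not chain:
        raise ValueError("chain must contain at least one component")
    log = [f"{chain[0].name} autophosphorylates"]
    chain[0].phosphorylated = True
    for donor, acceptor in zip(chain, chain[1:]):
        # The donor loses the phosphoryl group as the acceptor gains it.
        donor.phosphorylated = False
        acceptor.phosphorylated = True
        log.append(f"{donor.name} -> {acceptor.name}")
    return log


# Classic two-component system: the HK transfers directly to its cognate RR.
two_component = [Component("HK"), Component("RR")]

# Phosphorelay: hybrid HK -> internal receiver -> HPT -> terminal RR.
phosphorelay = [
    Component("hybrid HK"),
    Component("internal receiver"),
    Component("HPT"),
    Component("terminal RR"),
]

if __name__ == "__main__":
    for system in (two_component, phosphorelay):
        print(relay(system))
        print("phosphorylated:", [c.name for c in system if c.phosphorylated])
```

In both cases only the last component ends up phosphorylated; the phosphorelay simply adds intermediate hand-offs, which is where additional regulatory control points could be introduced.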
Major Histocompatibility Complex (MHC) glycoproteins are heterodimeric cell surface receptors that function to present antigen peptide fragments to T cells responsible for cell-mediated immune responses. MHC molecules can be subdivided into two groups on the basis of structure and function: class I molecules present intracellular antigen peptide fragments (~10 amino acids) on the surface of the host cells to cytotoxic T cells; class II molecules present exogenously derived antigenic peptides (~15 amino acids) to helper T cells. MHC class I and II molecules are assembled and loaded with their peptide ligands via different mechanisms. However, both present peptide fragments rather than entire proteins to T cells, and both are required to mount an immune response. Class I MHC glycoproteins are expressed on the surface of all nucleated somatic cells, with the exception of neurons. MHC class I receptors present peptide antigens that are synthesised in the cytoplasm, which include self-peptides (presented for self-tolerance) as well as foreign peptides (such as viral proteins). These antigens are generated from degraded protein fragments that are transported into the endoplasmic reticulum by TAP proteins (transporter associated with antigen processing), where they can bind MHC class I molecules before being transported to the cell surface via the Golgi apparatus [
,
]. MHC class I receptors display antigens for recognition by cytotoxic T cells, which can destroy virus-infected or malignant (surfeit of self-peptides) cells. MHC class I molecules are composed of two chains: an MHC alpha chain (heavy chain) and a beta-2 microglobulin chain (light chain), of which only the alpha chain spans the membrane. The alpha chain has three extracellular domains (alpha1-3, with alpha1 at the N terminus), a transmembrane region and a C-terminal cytoplasmic tail. The soluble extracellular beta-2 microglobulin chain associates primarily with the alpha3 domain and is necessary for MHC stability. The alpha1 and alpha2 domains of the alpha chain are referred to as the recognition region, because the peptide antigen binds in a deep groove between these two domains. This entry represents the alpha chain C-terminal tail domain.
The Rab11 GTPase regulates recycling of internalized plasma membrane receptors and is essential for completion of cytokinesis. A family of Rab11-interacting proteins (FIPs) that share a conserved C-terminal Rab-binding domain (RBD) selectively recognise the active form of Rab11. FIPs are diverse in sequence length and composition toward their N termini, presumably a feature that underpins their specific roles in Rab11-mediated vesicle trafficking. They have been divided into three subfamilies (classes I, II and III) on the basis of domain architecture. Class I FIPs comprise a subfamily of three proteins (Rip11/pp75/FIP5, Rab-coupling protein (RCP), and FIP2) that possess an N-terminal C2 domain, localize to recycling endosomes, and regulate plasma membrane recycling. The class II subfamily consists of two proteins (FIP3/eferin/arfophilin and FIP4) with tandem EF hands and a proline-rich region. Class II FIPs localize to recycling endosomes and the trans-Golgi network and have been implicated in the regulation of membrane trafficking during cytokinesis. The class III subfamily consists of a single protein, FIP1, which contains no obvious homology domains or motifs other than the FIP-RBD [
,
,
,
]. The FIP-RBD domain is also found in the Rab6-interacting protein Erc1/Elks. Erc1 is the regulatory subunit of the IKK complex and probably recruits IkappaBalpha/NFKBIA to the complex [
]. It may be involved in the organisation of the cytomatrix at the nerve terminal active zone (CAZ), which regulates neurotransmitter release. It may also be involved in vesicle trafficking at the CAZ, as well as in Rab6-regulated endosome-to-Golgi transport []. The FIP-RBD domain consists of an N-terminal long α-helix, followed by a 90-degree bend at a conserved proline residue, a 3(10) helix and a C-terminal short β-strand, adopting an L shape. The long α-helix forms a parallel coiled-coil homodimer that interacts symmetrically with two Rab11 molecules, one on each side, forming a quaternary Rab11-(FIP)2-Rab11 complex. The Rab11-interacting region of FIP-RBD is confined to the C-terminal 24 amino acids, which cover the C-terminal half of the long α-helix and the short β-strand [
,
,
,
]. This entry represents the FIP-RBD C-terminal domain.
Eukaryotic eIF-5A was initially thought to function as a translation initiation factor, based on its ability to stimulate methionyl-puromycin synthesis. However, subsequent work revealed a role for eIF5A in translation elongation [
,
]. Depletion or inactivation of eIF-5A in the yeast Saccharomyces cerevisiae (Baker's yeast) resulted in the accumulation of polysomes and an increase in ribosomal transit times. Addition of recombinant eIF-5A from yeast, but not a derivative lacking hypusine, enhanced the rate of tripeptide synthesis in vitro. Moreover, inactivation of eIF-5A mimicked the effects of the eEF2 inhibitor sordarin, indicating that eIF-5A might function together with eEF2 to promote ribosomal translocation. Finally, it was shown that eIF5A is specifically required to promote peptide-bond formation between consecutive proline residues. It has been proposed to stimulate the peptidyl-transferase activity of the ribosome and facilitate the reactivity of poor substrates like proline []. eIF-5A is a cofactor for the Rev and Rex transactivator proteins of human immunodeficiency virus-1 and T-cell leukaemia virus I, respectively [
,
,
]. IF-5A is the only protein in eukaryotes and archaea known to contain the unusual amino acid hypusine (Nε-(4-amino-2-hydroxybutyl)lysine), which is absolutely required for its function. The first step in the post-translational modification of lysine to hypusine is catalysed by the enzyme deoxyhypusine synthase, the structure of which has been reported []. The archaeal IF-5A proteins have not been studied as comprehensively as their eukaryotic homologues, though the crystal structure of the Pyrobaculum aerophilum protein has been determined. Unmodified P. aerophilum IF-5A adopts a β structure with two domains and three separate hydrophobic cores. The lysine (Lys42) that is post-translationally modified by deoxyhypusine synthase lies at one end of the IF-5A molecule, in a turn between strands β4 and β5, and is freely solvent accessible. The C-terminal domain is homologous to the cold-shock protein CspA of E. coli, which has a well-characterised RNA-binding fold, suggesting that IF-5A is involved in RNA binding [
]. This entry represents the archaeal IF-5A proteins.
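The text notes that eukaryotic eIF5A is specifically required for peptide-bond formation between consecutive proline residues. As an illustration only, the hypothetical helper `proline_runs` below scans a one-letter protein sequence for runs of consecutive prolines and counts the Pro-Pro bonds in each run; it merely locates such stretches and is not a validated predictor of eIF5A dependence.

```python
from __future__ import annotations

import re


def proline_runs(sequence: str, min_run: int = 2) -> list[tuple[int, int, int]]:
    """Return (1-based start, run length, number of Pro-Pro bonds) for each run
    of at least `min_run` consecutive prolines in a one-letter protein sequence."""
    if min_run < 2:
        raise ValueError("a Pro-Pro bond needs at least two consecutive prolines")
    pattern = re.compile(f"P{{{min_run},}}")  # e.g. 'P{2,}' when min_run == 2
    runs = []
    for match in pattern.finditer(sequence.upper()):
        length = len(match.group())
        # A run of n prolines contains n - 1 consecutive Pro-Pro peptide bonds.
        runs.append((match.start() + 1, length, length - 1))
    return runs


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real eIF5A substrate.
    print(proline_runs("MAPPPGKPLPP"))  # [(3, 3, 2), (10, 2, 1)]
```

The isolated proline at position 8 is ignored because a single proline forms no Pro-Pro bond.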
This entry represents the Sirtuin family, class III subfamily. Proteins in this subfamily include the NAD-dependent protein deacylase sirtuin-5. Sirtuin-5 is an NAD-dependent lysine demalonylase and desuccinylase that specifically removes malonyl and succinyl groups from target proteins [
]. The sirtuin (also known as Sir2) family is broadly conserved from bacteria to humans. Yeast Sir2 (silent mating-type information regulation 2),
the founding member, was first isolated as part of the SIR complex required for maintaining a modified chromatin structure at telomeres. Sir2 functions in transcriptional silencing, cell cycle progression, and chromosome stability [
]. Although most sirtuins in eukaryotic cells are located in the nucleus, others are cytoplasmic or mitochondrial. This family is divided into five classes (I-IV and U) on the basis of a phylogenetic analysis of 60 sirtuins from a wide array of organisms [
]. Class I and class IV are further divided into three and two subgroups, respectively. The U-class sirtuins are found only in Gram-positive bacteria []. The S. cerevisiae genome encodes five sirtuins, Sir2 and four additional proteins termed 'homologues of sir two' (Hst1p-Hst4p) []. The human genome encodes seven sirtuins, with representatives from classes I-IV [,
]. Sirtuins are responsible for a newly classified chemical reaction, NAD-dependent protein deacetylation. The final products of the reaction are the
deacetylated peptide, nicotinamide and O-acetyl-ADP-ribose []. In nuclear sirtuins this deacetylation reaction is mainly directed against acetylated lysines of histones []. Sirtuins typically consist of two optional and highly variable N- and C-terminal domains (50-300 aa) and a conserved catalytic core domain (~250 aa). Mutagenesis experiments suggest that the N- and C-terminal regions help direct the catalytic core domain to different targets [
,
]. The 3D structure of an archaeal sirtuin in complex with NAD reveals that the protein consists of a large domain with a Rossmann fold and a small domain containing a three-stranded zinc ribbon motif. NAD is bound in a pocket between the two domains [
].
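The NAD-dependent deacetylation described in this entry can be summarised as a single reaction scheme. This is a sketch written from standard sirtuin biochemistry rather than taken from the cited works; in particular, the consumption of water follows the usual enzyme-classification form of the equation and should be treated as an assumption here.

```latex
\[
\text{protein-}N^{6}\text{-acetyl-L-lysine} + \mathrm{NAD}^{+} + \mathrm{H_{2}O}
\;\longrightarrow\;
\text{protein-L-lysine} + O\text{-acetyl-ADP-ribose} + \text{nicotinamide}
\]
```

Written this way, NAD is a co-substrate that is consumed (split into nicotinamide and the ADP-ribose moiety that accepts the acetyl group), not a recycled cofactor.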
Anthrax toxin is a plasmid-encoded toxin complex produced by the Gram-positive, spore-forming bacterium Bacillus anthracis. The toxin consists of three non-toxic proteins: the protective antigen (PA), the lethal factor (LF) and the edema factor (EF) [
]. These component proteins self-assemble at the surface of host cell receptors, yielding a series of toxic complexes that can produce shock-like symptoms and death. Anthrax toxin is one of a large group of Bacillus and Clostridium exotoxins referred to as binary toxins, which have separate enzymatic (A moiety) and binding (B moiety) components. The LF and EF proteins are the enzymes (A moiety) that act on cytosolic substrates, while PA is a multifunctional protein (B moiety) that binds to cell surface receptors, mediates the assembly and internalisation of the complexes, and delivers them to the host cell endosome []. Once attached to the host receptor [], PA must be cleaved by a host cell surface (furin family) protease before it can bind EF and LF. Cleavage of the N terminus of PA enables the C-terminal fragment to self-associate into a ring-shaped heptameric complex (prepore) that can bind LF or EF competitively. The PA-LF/EF complex is then internalised by endocytosis and delivered to the endosome, where PA forms a pore in the endosomal membrane to translocate LF and EF into the cytosol. LF is a Zn-dependent metalloprotease that cleaves and inactivates mitogen-activated protein kinase kinases (MAPKKs), kills macrophages, and causes death of the host by inhibiting cell proliferation [,
]. EF is a calcium- and calmodulin-dependent adenylyl cyclase that can cause edema (fluid-filled swelling) when associated with PA. EF is not toxic by itself and is required for the survival of germinated Bacillus spores within macrophages at the early stages of infection. EF dramatically elevates the level of host intracellular cAMP, a ubiquitous messenger that integrates many cellular processes; increases in cAMP can interfere with host intracellular signalling []. This entry represents the central domain found in the lethal factor protein of anthrax toxin.
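The ordering constraints in the PA pathway described above (receptor binding, then furin-family cleavage, then prepore formation, before LF or EF can bind) can be captured as a small state machine. This is a hypothetical toy model: the step names are informal paraphrases, it tracks a single PA assembly, and it ignores stoichiometry and the competition between LF and EF.

```python
# Ordered steps of PA-mediated toxin entry, paraphrased from the text (toy model).
STEPS = (
    "PA binds host cell receptor",
    "furin-family protease cleaves PA N terminus",
    "C-terminal PA fragments form heptameric prepore",
    "prepore binds LF or EF",
    "complex internalised by endocytosis",
    "PA pore translocates LF/EF into cytosol",
)


class ToxinAssembly:
    """Tracks progress through STEPS and rejects any out-of-order step."""

    def __init__(self) -> None:
        self.completed = 0  # number of steps finished so far

    def advance(self, step: str) -> None:
        expected = STEPS[self.completed] if self.completed < len(STEPS) else None
        if step != expected:
            raise RuntimeError(f"cannot do {step!r}; next required step is {expected!r}")
        self.completed += 1

    @property
    def done(self) -> bool:
        return self.completed == len(STEPS)


if __name__ == "__main__":
    assembly = ToxinAssembly()
    assembly.advance(STEPS[0])
    try:
        # LF/EF binding before furin cleavage is rejected, as the text requires.
        assembly.advance(STEPS[3])
    except RuntimeError as err:
        print(err)
    for step in STEPS[1:]:
        assembly.advance(step)
    print("translocation complete:", assembly.done)
```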
Anthrax toxin is a plasmid-encoded toxin complex produced by the Gram-positive, spore-forming bacterium Bacillus anthracis. The toxin consists of three non-toxic proteins: the protective antigen (PA), the lethal factor (LF) and the edema factor (EF) [
]. These component proteins self-assemble at the surface of host cell receptors, yielding a series of toxic complexes that can produce shock-like symptoms and death. Anthrax toxin is one of a large group of Bacillus and Clostridium exotoxins referred to as binary toxins, which have separate enzymatic (A moiety) and binding (B moiety) components. The LF and EF proteins are the enzymes (A moiety) that act on cytosolic substrates, while PA is a multifunctional protein (B moiety) that binds to cell surface receptors, mediates the assembly and internalisation of the complexes, and delivers them to the host cell endosome []. Once attached to the host receptor [], PA must be cleaved by a host cell surface (furin family) protease before it can bind EF and LF. Cleavage of the N terminus of PA enables the C-terminal fragment to self-associate into a ring-shaped heptameric complex (prepore) that can bind LF or EF competitively. The PA-LF/EF complex is then internalised by endocytosis and delivered to the endosome, where PA forms a pore in the endosomal membrane to translocate LF and EF into the cytosol. LF is a Zn-dependent metalloprotease that cleaves and inactivates mitogen-activated protein kinase kinases (MAPKKs), kills macrophages, and causes death of the host by inhibiting cell proliferation [,
]. EF is a calcium- and calmodulin-dependent adenylyl cyclase that can cause edema (fluid-filled swelling) when associated with PA. EF is not toxic by itself and is required for the survival of germinated Bacillus spores within macrophages at the early stages of infection. EF dramatically elevates the level of host intracellular cAMP, a ubiquitous messenger that integrates many cellular processes; increases in cAMP can interfere with host intracellular signalling []. This entry includes the lethal factor and edema factor proteins of anthrax toxin.
This entry includes ADAM-TS and ADAM-TS-like proteins, which are closely related to the ADAM family (A Disintegrin and Metalloproteinase) [
,
,
]. ADAM-TS proteases are zinc metalloendopeptidases, most of whose substrates are extracellular matrix (ECM) components, whereas ADAM-TS-like proteins lack a metalloprotease domain, reside in the ECM and have regulatory roles []. Examples of ADAM-TS-like proteins are papilin [] and punctin []. Proteolysis of the extracellular matrix plays a critical role in establishing tissue architecture during development and in tissue degradation in diseases such as cancer, arthritis, Alzheimer's disease and a variety of inflammatory conditions [
,
]. The proteolytic enzymes responsible for this process are members of diverse protease families, including the secreted zinc metalloproteases (MPs) []. ADAM-TS (A Disintegrin and Metalloproteinase with Thrombospondin Motifs), a subfamily of the MP family closely related to the ADAM family (A Disintegrin and Metalloproteinase), consists of at least 20 members sharing a high degree of sequence similarity and conserved domain organisation [
,
]. The defining domains of the ADAM-TS family are (from N to C terminus) a pre-pro metalloprotease domain of the reprolysin type, a snake venom disintegrin-like domain, a thrombospondin type-I (TS) module, a cysteine-rich region, and a cysteine-free (spacer) domain []. Domain organisation C-terminal to the spacer domain varies among certain ADAM-TS members, principally in the number of additional TS domains. These enzymes have a broad role in vascular biology and cardiovascular pathophysiology []. Members of the ADAM-TS family have been implicated in a range of diseases [
,
,
]. For instance, members of this family have been found to participate directly in processes in the central nervous system (CNS) such as the regulation of brain plasticity []. ADAM-TS1 is reported to be involved in inflammation and cancer cachexia [], whilst recessively inherited ADAM-TS2 mutations cause Ehlers-Danlos syndrome type VIIC, a disorder characterised clinically by severe skin fragility []. ADAM-TS4 is an aggrecanase involved in arthritic destruction of cartilage [].
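The defining N- to C-terminal domain order given above, with a variable number of extra TS domains after the spacer, lends itself to a simple architecture check. The classifier below is a hypothetical sketch: the domain labels are invented shorthand, the pre-pro region is folded into the "metalloprotease" label, only additional TS domains are accepted after the spacer (the text says the variation is "principally" in TS number, so real members may carry other C-terminal domains), and absence of a metalloprotease domain is used as the sole ADAM-TS-like criterion.

```python
from __future__ import annotations

# Defining ADAM-TS domains, N- to C-terminal (hypothetical labels; "metalloprotease"
# stands for the pre-pro metalloprotease region of the reprolysin type).
CORE = ("metalloprotease", "disintegrin-like", "TS", "cys-rich", "spacer")


def classify(architecture: list[str]) -> str:
    """Classify an ordered list of domain labels using simplified rules."""
    if "metalloprotease" not in architecture:
        return "ADAM-TS-like candidate"
    n = len(CORE)
    core_matches = tuple(architecture[:n]) == CORE
    # Only extra TS domains are accepted after the spacer in this simplified check.
    tail_ok = all(domain == "TS" for domain in architecture[n:])
    return "ADAM-TS" if core_matches and tail_ok else "unclassified"


if __name__ == "__main__":
    print(classify(list(CORE)))                    # minimal ADAM-TS layout
    print(classify(list(CORE) + ["TS", "TS"]))     # extra C-terminal TS modules
    print(classify(["TS", "cys-rich", "spacer"]))  # no protease domain
```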
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions (including various autocrine, paracrine and endocrine processes). They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups. We use the term clan to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence [
]. The currently known clan members include the rhodopsin-like GPCRs, the secretin-like GPCRs, the cAMP receptors, the fungal mating pheromone receptors, and the metabotropic glutamate receptor family. The rhodopsin-like GPCRs themselves represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [,
,
]. A cluster of four intronless GPCR genes, sharing significant sequence similarity with one another, has been identified on human chromosome 19q13.1, downstream from the CD22 gene []. The receptors have been named GPR40, GPR41, GPR42 and GPR43. The GPR42 protein sequence shares more than 98% amino acid identity with GPR41, and its gene is located on a possible polymorphic insert []. GPR40 has recently been shown to bind long-chain free fatty acids, molecules that have a role in various cellular processes, including regulation of insulin secretion [,
]. Expression of GPR40 is restricted to the pancreas, with high levels in the islets and pancreatic beta cell lines []. Upon activation, GPR40 appears to couple predominantly to Gq and partially to Gi proteins, and it has been shown to amplify glucose-stimulated insulin secretion from pancreatic beta cells. The receptor may therefore be a target for anti-diabetic drugs [].
Pancreatic ribonucleases (RNase A) are pyrimidine-specific endonucleases found in high quantity in the pancreas of certain mammals and of some reptiles []. Specifically, the enzymes catalyse endonucleolytic cleavage of RNA to 3'-phosphomononucleotides and 3'-phosphooligonucleotides ending in C-P or U-P, via 2',3'-cyclic phosphate intermediates. Ribonuclease can unwind the DNA helix by complexing with single-stranded DNA; the complex arises by an extended multi-site cation-anion interaction between lysine and arginine residues of the enzyme and phosphate groups of the nucleotides. Other proteins belonging to the pancreatic RNase family include: bovine seminal vesicle and brain ribonucleases; kidney non-secretory ribonucleases []; liver-type ribonucleases []; angiogenin, which induces vascularisation of normal and malignant tissues; eosinophil cationic protein [], a cytotoxin and helminthotoxin with ribonuclease activity; and frog liver ribonuclease and frog sialic acid-binding lectin. The sequence of pancreatic RNases contains four conserved disulphide bonds and three amino acid residues involved in the catalytic activity.
Pancreatic ribonucleases () are pyrimidine-specific endonucleases present in high quantity in the pancreas of a number of mammalian taxa and of a few reptiles [
,
]. As shown in the following schematic representation of the sequence of pancreatic RNases, there are four conserved disulphide bonds and three amino acid residues involved in the catalytic activity.
xxxxx#xxxxxxCxxxxxxC#xxxxxxxCxxCxxxCxxxxxCxxxxxCxxxxxxCxxx#xxx
'C': conserved cysteine involved in a disulphide bond. '#': active site residue.
A number of other proteins belong to the pancreatic RNase family; these are listed below.
Bovine seminal vesicle and bovine brain ribonucleases.
The kidney non-secretory ribonucleases (also known as eosinophil-derived neurotoxin (EDN) []).
Liver-type ribonucleases [].
Angiogenin, which induces vascularization of normal and malignant tissues. It abolishes protein synthesis by specifically hydrolyzing cellular tRNAs.
Eosinophil cationic protein (ECP) [], a cytotoxin and helminthotoxin with ribonuclease activity.
Frog liver ribonuclease and frog sialic acid-binding lectin [].
The signature pattern for these proteins includes five conserved residues: a cysteine involved in a disulphide bond, a lysine involved in the catalytic activity and three other residues important for substrate binding.
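The entries in this section describe families in terms of a "signature pattern" and positions within it. Such signatures are commonly written in PROSITE-style pattern syntax: elements separated by '-', 'x' for any residue, '[..]' for allowed residues, '{..}' for excluded residues, '(n)' or '(n,m)' for repeats, and '<' or '>' for N- or C-terminal anchors. As an illustration only, the hypothetical helper `prosite_to_regex` below converts such a pattern into a Python regular expression. The RNase signature itself is not given in the text, so the demo uses a made-up pattern, and the converter deliberately rejects less common forms such as anchors written inside brackets.

```python
import re

# One PROSITE element: a residue, 'x', [allowed] or {excluded}, optionally with (n) or (n,m).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C terminus
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if high is not None:
            core += "{" + low + "," + high + "}"
        elif low is not None:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("C-x(2)-C")  # hypothetical demo pattern
    print(regex)                            # C.{2}C
    hit = re.search(regex, "MKTCPYCAA")
    print(hit.start() + 1, hit.group())     # 4 CPYC
```

Reporting `hit.start() + 1` converts Python's 0-based offset into the 1-based residue numbering used for protein sequences.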
Glutaredoxins [
,
,
], also known as thioltransferases (disulphide reductases), are small proteins of approximately one hundred amino acid residues that utilise glutathione and NADPH as cofactors. Oxidized glutathione is regenerated by glutathione reductase. Together these components form the glutathione system []. Glutaredoxin functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin (TRX), which functions in a similar way, glutaredoxin possesses an active centre disulphide bond [
]. It exists in either a reduced form or an oxidized form, in which the two cysteine residues are linked in an intramolecular disulphide bond. It contains a redox-active CXXC motif in a TRX fold and uses a dithiol mechanism similar to that employed by TRXs for intramolecular disulphide bond reduction of protein substrates. Unlike TRX, GRX prefers mixed GSH disulphide substrates, for which it uses a monothiol mechanism in which only the N-terminal cysteine is required. The flow of reducing equivalents in the GRX system goes from NADPH -> GSH reductase -> GSH -> GRX -> protein substrates [
,
,
,
]. By altering the redox state of target proteins, GRX is involved in many cellular functions, including DNA synthesis, signal transduction and the defense against oxidative stress. Glutaredoxin has been sequenced in a variety of species. On the basis of extensive sequence similarity, it has been proposed [
] that Vaccinia virus protein O2L is most probably a glutaredoxin. Finally, bacteriophage T4 thioredoxin also appears to be evolutionarily related; at position 5 of the pattern, T4 thioredoxin has Val instead of Pro. This group of proteins, found in bacteria and archaea, contains a C-terminal domain with homology to bacterial and eukaryotic glutaredoxins, including a CPYC motif. An N-terminal domain has even more distant homology to glutaredoxins. The overall domain structure appears to be related to bacterial alkylhydroperoxide reductases, but the homology may be distant enough that the function of this family is wholly different.
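The redox-active CXXC motif, and the CPYC variant noted for this bacterial and archaeal group, can be located in a sequence with a simple motif scan. This is a minimal sketch, assuming a one-letter amino-acid string as input; the function name is hypothetical, and a CXXC match alone does not show that a protein is a glutaredoxin, since thioredoxins share the motif.

```python
from __future__ import annotations

import re

# Zero-width lookahead so overlapping motifs (e.g. in 'CxxCxxC') are all reported.
_CXXC = re.compile(r"(?=(C..C))")


def find_cxxc(sequence: str) -> list[tuple[int, str, bool]]:
    """Return (1-based start, motif, is_CPYC) for every C-x-x-C motif in `sequence`."""
    hits = []
    for match in _CXXC.finditer(sequence.upper()):
        motif = match.group(1)
        hits.append((match.start() + 1, motif, motif == "CPYC"))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real glutaredoxin.
    print(find_cxxc("MKVCPYCGKCAAC"))
    # [(4, 'CPYC', True), (7, 'CGKC', False), (10, 'CAAC', False)]
```

The lookahead matters here: a plain `C..C` search would consume the second cysteine of the first motif and miss the overlapping motif that starts there.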
Multicopper oxidases [
,
] are enzymes that possess three spectroscopically different copper centres. These centres are called type 1 (or blue), type 2 (or normal) and type 3 (or coupled binuclear). The enzymes that belong to this family include: Laccase (EC 1.10.3.2) (urushiol oxidase), an enzyme found in fungi and plants, which oxidizes many different types of phenols and diamines. L-ascorbate oxidase (EC 1.10.3.3), a higher plant enzyme. Ceruloplasmin (EC 1.16.3.1) (ferroxidase), a protein found in the serum of mammals and birds, which oxidizes a great variety of inorganic and organic substances. Structurally, ceruloplasmin exhibits internal sequence homology and seems to have evolved from the triplication of a copper-binding domain similar to that found in laccase and ascorbate oxidase. In addition to the above enzymes, there are a number of proteins which, on the
basis of sequence similarities, can be said to belong to this family. These proteins are: Copper resistance protein A (copA) from a plasmid in Pseudomonas syringae. This protein seems to be involved in the resistance of the microbial host to copper. Blood coagulation factor V (Fa V). Blood coagulation factor VIII (Fa VIII). Yeast FET3 [
], which is required for ferrous iron uptake. Yeast hypothetical protein YFL041w and SpAC1F7.08, the fission yeast homologue. Factors V and VIII act as cofactors in blood coagulation and are structurally similar []. Their sequence consists of a triplicated A domain, a B domain and a duplicated C domain, in the following order: A-A-B-A-C-C. The A-type domain is related to the multicopper oxidases. This entry is drawn from a conserved region which, in ascorbate oxidase, laccase, the third domain of ceruloplasmin, and copA, contains five residues that are known to be involved in the binding of copper centres. However, the entry makes no assumption about the presence of copper-binding residues and can therefore detect domains that have lost the ability to bind copper (such as those in Fa V and Fa VIII).
The ~200 amino acid TBC/rab GTPase-activating protein (GAP) domain is well conserved across species and has been found in a wide range of different proteins from plant adhesion molecules to mammalian oncogenes. The name TBC derives from the name of the murine protein Tbc1 in which this domain was first identified based on its similarity to sequences in the tre-2 oncogene, and the yeast regulators of mitosis, BUB2 and cdc16 [
]. The connection of this domain with rab GTPase activation stems from subsequent in-depth sequence analyses and alignments [] and recent work demonstrating that it appears to contain the catalytic activities of the yeast rab GAPs GYP1 and GYP7 []. The TBC/rab GAP domain has also been named PTM after three proteins known to contain it: the Drosophila pollux, the human oncoprotein TRE17 (oncoTRE17), and a myeloid cell line-expressed protein [
]. The TBC/rab GAP domain contains six conserved motifs named A to F []. A conserved arginine residue in sequence motif B has been shown to be critical for full GAP activity []. The 3D structure of the TBC/rab GAP domain of GYP1 shows a fully α-helical, V-shaped molecule. The conserved arginine residue is positioned at the side of the narrow cleft on the concave side of the V-shaped molecule. It has been proposed that this cleft is the binding site for the GTPase. The conserved arginine residue probably functions as a catalytic arginine finger analogous to that seen in Ras- and Rho-GAPs. The two key features of the arginine finger activation mechanism appear to be (i) the positioning of the catalytically essential GTPase glutamine side chain via a hydrogen bond between the glutamine carbamoyl-NH2 group and the main-chain carbonyl group of the GAP arginine, and (ii) the polarization of the gamma-phosphate group, or the stabilization of charge on it, via interaction with the positively charged guanidinium group of the GAP arginine side chain [].
Janus kinases (JAKs) are tyrosine kinases that function in membrane-proximal signalling events initiated by a variety of extracellular factors binding to cell surface receptors [
]. Many type I and II cytokine receptors lack a protein tyrosine kinase domain and rely on JAKs to initiate the cytoplasmic signal transduction cascade. Ligand binding induces oligomerisation of the receptors, which then activates the cytoplasmic receptor-associated JAKs. These subsequently phosphorylate tyrosine residues along the receptor chains with which they are associated. The phosphotyrosine residues are a target for a variety of SH2 domain-containing transducer proteins. Amongst these are the signal transducers and activators of transcription (STAT) proteins, which, after binding to the receptor chains, are phosphorylated by the JAK proteins. Phosphorylation enables the STAT proteins to dimerise and translocate into the nucleus, where they alter the expression of cytokine-regulated genes. This system is known as the JAK-STAT pathway. Four mammalian JAK family members have been identified: JAK1, JAK2, JAK3, and TYK2. They are relatively large kinases of approximately 1150 amino acids, with molecular weights of ~120-130 kDa. Their amino acid sequences are characterised by the presence of seven highly conserved domains, termed JAK homology (JH) domains. The C-terminal domain (JH1) is responsible for the tyrosine kinase function. The next domain in the sequence (JH2) is known as the tyrosine kinase-like domain, as its sequence shows high similarity to functional kinases but does not possess any catalytic activity. Although the function of this domain is not well established, there is some evidence for a regulatory role on the JH1 domain, thus modulating catalytic activity. The N-terminal portion of the JAKs (spanning JH7 to JH3) is important for receptor association and non-catalytic activity, and consists of JH3-JH4, which is homologous to the SH2 domain, and lastly JH5-JH7, which is a FERM domain. This entry represents the non-receptor tyrosine kinase JAK2 [
]. JAK2 was initially cloned using a PCR-based strategy utilising primers corresponding to conserved motifs within the catalytic domain of protein-tyrosine kinases []. In common with JAK1 and TYK2, and by contrast with JAK3, JAK2 appears to be ubiquitously expressed.
Glutaredoxins [
,
,
], also known as thioltransferases (disulphide reductases), are small proteins of approximately one hundred amino-acid residues which utilise glutathione and NADPH as cofactors. Oxidized glutathione is regenerated by glutathione reductase. Together these components compose the glutathione system []. Glutaredoxin (GRX) functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin (TRX), which functions in a similar way, glutaredoxin possesses an active centre disulphide bond [
]. It exists in either a reduced or an oxidized form; in the oxidized form, the two cysteine residues are linked in an intramolecular disulphide bond. It contains a redox-active CXXC motif in a TRX fold and uses a dithiol mechanism similar to that of TRXs to reduce intramolecular disulphide bonds in protein substrates. Unlike TRX, GRX has a preference for mixed glutathione (GSH) disulphide substrates, which it reduces by a monothiol mechanism requiring only the N-terminal cysteine. The flow of reducing equivalents in the GRX system goes from NADPH -> GSH reductase -> GSH -> GRX -> protein substrates [
,
,
,
]. By altering the redox state of target proteins, GRX is involved in many cellular functions, including DNA synthesis, signal transduction and the defense against oxidative stress. Glutaredoxin has been sequenced in a variety of species. On the basis of extensive sequence similarity, it has been proposed [
] that Vaccinia virus protein O2L is most probably a glutaredoxin. Bacteriophage T4 thioredoxin also appears to be evolutionarily related; at position 5 of the pattern, T4 thioredoxin has Val instead of Pro. In Saccharomyces cerevisiae (baker's yeast), monothiol glutaredoxin 5 (Grx5p) is essential for the mitochondrial machinery involved in the synthesis and assembly of iron/sulphur centres [
]. Absence of Grx5p leads to constitutive oxidative damage, exacerbating that caused by external oxidants. Proteins in this group contain a conserved structural domain, PICOT-HD (amino acids 46-132 in Grx5p) [
].
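To illustrate how a short sequence motif such as the redox-active CXXC motif can be located in a protein sequence, here is a minimal Python sketch using a regular expression. The function name `find_cxxc` and the toy sequence are hypothetical, chosen only for illustration. A CXXC match on its own is not diagnostic of a glutaredoxin; real annotation relies on profile, family and structural evidence such as the TRX fold.

```python
import re

# Hypothetical helper: find every C-X-X-C motif (two cysteines separated by
# any two residues) in a one-letter amino-acid sequence. The lookahead lets
# overlapping motifs be reported, e.g. "CAACAAC" matches at positions 1 and 4.
CXXC = re.compile(r"(?=(C..C))")


def find_cxxc(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) pairs for each CXXC match."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CXXC.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real glutaredoxin.
    print(find_cxxc("MAQKTVVIYTKPGCPYCVRAK"))
```

The regex only checks residue spacing; distinguishing the dithiol and monothiol mechanisms described above would require knowledge of the protein's structure and substrates, which a pattern match cannot provide.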
The Notch receptor is a large, cell surface transmembrane protein involved in a wide variety of developmental processes in higher organisms [
]. It becomes activated when its extracellular region binds to ligands located on adjacent cells. Much of this extracellular region is composed of EGF-like repeats, many of which can be O-fucosylated. A number of these O-fucosylated repeats can in turn be further modified by the action of a beta-1,3-N-acetylglucosaminyltransferase enzyme known as Fringe []. Fringe potentiates the activation of Notch by Delta ligands, while inhibiting activation by Serrate/Jagged ligands. This regulation of Notch signalling by Fringe is important in many processes []. Four distinct Fringe proteins have so far been studied in detail: Drosophila Fringe (Dfng) and its three mammalian homologues Lunatic Fringe (Lfng), Radical Fringe (Rfng) and Manic Fringe (Mfng). Dfng, Lfng and Rfng have all been shown to play important roles in developmental processes within their host, though the phenotype of mutants can vary between species; for example, Rfng mutants are retarded in wing development in chickens but have no obvious phenotype in mice [
,
,
]. Mfng mutants have not, so far, been characterised. Biochemical studies indicate that the Fringe proteins are fucose-specific transferases requiring manganese for activity and utilising UDP-N-acetylglucosamine as a donor substrate []. The three mammalian proteins show distinct variations in their catalytic efficiencies with different substrates. Dfng, a glucosaminyltransferase that controls the response of the Notch receptor to specific ligands, is localised to the Golgi apparatus [
] (not secreted as previously thought). Modification of Notch occurs through glycosylation by Dfng. This entry consists of Fringe proteins and related glycosyltransferase enzymes including:
Beta-1,3-glucosyltransferase, which glucosylates O-linked fucosylglycan on thrombospondin type 1 repeat domains [
].
Core 1 beta-1,3-galactosyltransferase 1, which generates the core T antigen, a precursor for many extended O-glycans in glycoproteins; the enzyme plays a central role in many processes, such as angiogenesis, thrombopoiesis and kidney homeostasis development [
].
This entry represents a Sec39 domain, which can be found in yeast Sec39, human NBAS (neuroblastoma-amplified sequence) [
] and Arabidopsis MIP2 (MAG2-INTERACTING PROTEIN 2) []. These proteins may be involved in Golgi-to-ER transport. Sec39 was originally identified as a protein involved in ER-Golgi transport in a large-scale promoter shut-down analysis of essential yeast genes [
]. A subsequent study found that Sec39p (Dsl3p) is required for Golgi-ER retrograde transport and is part of a very stable protein complex that also includes Dsl1p (in mammals ZW10), Tip20p (Rint-1) and the ER localized Q-SNARE proteins Ufe1p (syntaxin-18), Sec20p and Use1p []. This was confirmed in a genome-wide analysis of protein complexes [].
This domain is characteristic of cilia- and flagella-associated protein 20 (CFA20). CFA20 is a cilium- and flagellum-specific protein that plays a role in axonemal structure organisation and motility [
,
]. In Chlamydomonas reinhardtii, it stabilises outer doublet microtubules (DMTs) of the axoneme and may work as a scaffold for intratubular proteins, such as tektin and PACRG, to produce the beak structures in DMT1 [,
]. Other proteins contain a domain with homology to CFA20. WDR90/POC16 contains such a domain in its N terminus, followed by a large C-terminal domain with multiple WD40 repeats [
]. This domain is also present in the N terminus of uncharacterised protein C3orf67.
Nitrobindins (Nbs), constituting a heme-protein family spanning from bacteria to Homo sapiens, display an all-β-barrel structural organization. Proteins containing this domain are putatively related to fatty acid-binding proteins (FABPs) [
]. This domain can be found in THAP4 from mammals and At1g79260 from Arabidopsis. THAP4 catalyzes the heme-based conversion of peroxynitrite into nitrate (NO3-) in vitro []. At1g79260 is a nitrophorin-like heme-binding protein that may reversibly bind nitric oxide (NO) and be involved in NO transport []. This entry also includes the β-barrel domain of Caenorhabditis elegans protein male abnormal 7 (Mab-7), which plays an important role in determining body shape and sensory ray morphology [].
The nuclease-related domain (NERD) is found in a broad range of bacterial proteins, as well as in a few archaeal and plant proteins. Most NERD-containing proteins have a single domain, sometimes with additional (predicted) transmembrane helices. In a few instances, proteins containing NERD domains have additional domains (mostly involved in DNA processing), such as the HRDC domain, the UvrD/REP helicase, the DNA-binding C4 zinc finger, or the serine/threonine and tyrosine protein kinases. In all multidomain proteins in which it is present, the NERD domain is found at the N terminus. The NERD domain is predicted to function in DNA processing and may have a nuclease function [
].
Snf8 (also known as Vps22p/Eap30) is a subunit of ESCRT-II, a protein complex involved in driving protein sorting from endosomes to lysosomes. The multivesicular body (MVB) protein-sorting pathway targets transmembrane proteins either for degradation or for function in the vacuole/lysosomes. The signal for entry into this pathway is monoubiquitination of protein cargo, which results in incorporation of cargo into luminal vesicles at late endosomes. Another crucial player is phosphatidylinositol 3-phosphate (PtdIns(3)P), which is enriched on early endosomes and on the luminal vesicles of MVBs. ESCRT (endosomal sorting complex required for transport)-I, -II and -III complexes are critical for MVB budding and sorting of monoubiquitinated cargo into the luminal vesicles [].
A number of plant and fungal proteins that bind N-acetylglucosamine (e.g. solanaceous lectins of tomato and potato, plant endochitinases, the wound-induced proteins: hevein, win1 and win2, and the Kluyveromyces lactis killer toxin alpha subunit) contain this domain [
]. The domain may occur in one or more copies and is thought to be involved in recognition or binding of chitin subunits [,
]. In chitinases, as well as in the potato wound-induced proteins, the 43-residue domain directly follows the signal sequence and is therefore at the N terminus of the mature protein; in the killer toxin alpha subunit it is located in the central section of the protein.
This entry represents a domain found in the universal stress protein UspA [
], which is a small cytoplasmic bacterial protein whose expression is enhanced when the cell is exposed to stress agents. UspA enhances the rate of cell survival during prolonged exposure to such conditions, and may provide a general "stress endurance" activity. The crystal structure of Haemophilus influenzae UspA [
] reveals an alpha/beta fold similar to that of the Methanocaldococcus jannaschii (Methanococcus jannaschii) MJ0577 protein, which binds ATP [
], though UspA lacks ATP-binding activity. Proteins containing this domain include the TeaD protein from Halomonas elongata. TeaD regulates ectoine uptake by the transporter TeaABC and shows ATP-dependent oligomerisation [].
The SWIRM domain is a small α-helical domain of about 85 amino acid residues found in eukaryotic chromosomal proteins. It is named after the proteins SWI3, RSC8 and MOIRA, in which it was first recognised. This domain mediates protein-protein interactions in the assembly of chromatin-protein complexes [
,
]. The yeast SWI3 SWIRM structure revealed that it forms a four-helix globular domain containing a helix-turn-helix motif []. The SWIRM domain can be linked to different domains, such as the ZZ-type zinc finger, the Myb DNA-binding domain, the HORMA domain, the amino-oxidase domain, the chromo domain, and the JAB1/PAD1 domain.
This family consists of several bacterial secretion monitor precursor (SecM) proteins. SecM is known to regulate SecA expression by translational coupling within the secM-secA operon. Translational pausing at a specific Pro residue 5 residues before the end of the protein may allow disruption of an mRNA repressor helix that normally suppresses secA translation initiation. The eubacterial protein secretion machinery consists of a number of soluble and membrane-associated components. One critical element is the SecA ATPase, which acts as a molecular motor to promote protein secretion at translocation sites that consist of SecYE (the SecA receptor) and the SecG and SecDF-YajC proteins, which regulate SecA membrane cycling [].
Four small, soluble proteins (DsrE, DsrF, DsrH and DsrC) are encoded in the dsr gene region of the phototrophic sulphur bacterium Chromatium vinosum D. The dsrAB genes encoding dissimilatory sulphite reductase are part of the gene cluster dsrABEFHCMK. The remaining proteins encoded by the cluster are a transmembrane protein (DsrM) with similarity to haem-b-binding polypeptides and a soluble protein (DsrK) resembling the [4Fe-4S]-cluster-containing heterodisulphide reductase from methanogenic archaea. DsrE is a small soluble protein involved in intracellular sulphur reduction []. In E. coli, the DsrEFH homologue TusBCD interacts with the DsrC homologue TusE in a sulphur relay system during 2-thiouridine biosynthesis []. This family includes DsrE/F and the homologues TusD/C.
The cysteine-rich domain (CRD) is an essential part of secreted frizzled-related protein 5 (SFRP5), which is a positive regulator of BMP and Wnt signaling during retinal and gastrointestinal development [
,
]. In general, SFRPs antagonize the activation of Wnt signaling by binding to the CRD domains of frizzled (Fz) proteins, thereby preventing Wnt proteins from binding to these receptors. SFRPs are also known to have functions unrelated to Wnt, as enhancers of procollagen cleavage by the TLD proteinases [
]. SFRPs and Fz proteins both contain CRD domains, but SFRPs lack the seven-pass transmembrane domain which is an integral part of Fzs [,
].
Members of the Arenaviridae are single-stranded RNA viruses. The arenavirus S RNAs that have been characterised contain conserved terminal sequences, an ambisense arrangement of the coding regions for the precursor glycoprotein (GPC) and nucleocapsid (N) proteins, and an intergenic region capable of forming a base-paired "hairpin" structure. The resulting mature proteins are the glycoproteins G1 and G2 and the N protein [
]. The C-terminal domain of the arenavirus nucleocapsid protein (which encapsulates the viral ssRNA) has an RNase H-like fold. This domain has 3'-5' exoribonuclease activity involved in suppressing interferon induction [
]. It forms a typical alpha/beta/alpha sandwich architecture and contains a CCHE zinc-binding site near the 3'-5' exonuclease active site.
This family of proteins shows considerable homology to FabH, the beta-ketoacyl-acyl carrier protein synthase III of bacteria and plants, and has active site residues identical to those of FabH. The archaeal species in which it is found, however, do not have a readily detectable homologue of acyl carrier protein itself, suggesting the condensation of the acyl group with some other carrier. In Methanocaldococcus jannaschii (Methanococcus jannaschii), Cys-112 is the site of acyl group attachment, and His-234 and Asn-237 are also active site residues by homology to FabH.
Closely related bacterial families include a polyketide antibiotic (2,4-diacetylphloroglucinol) biosynthesis protein from Pseudomonas fluorescens and an uncharacterised protein from Staphylococcus carnosus.
This superfamily includes MAGE (melanoma antigen-encoding gene) proteins, which are expressed in a wide variety of tumours but not in normal cells, with the exception of male germ cells, placenta and, possibly, cells of the developing embryo. Their cellular function is currently unknown. This superfamily also contains the yeast protein Nse3. The Nse3 protein is part of the Smc5-6 complex and interacts with NSE4/EID family proteins [
,
]. Nse3 has been demonstrated to be important in meiosis []. MAGE proteins share a conserved domain known as the MAGE homology domain (MHD). This superfamily entry represents the domain that contains the winged helix motif WH1.
This superfamily includes MAGE (melanoma antigen-encoding gene) proteins, which are expressed in a wide variety of tumours but not in normal cells, with the exception of male germ cells, placenta and, possibly, cells of the developing embryo. Their cellular function is currently unknown. This superfamily also contains the yeast protein Nse3. The Nse3 protein is part of the Smc5-6 complex and interacts with NSE4/EID family proteins [
,
]. Nse3 has been demonstrated to be important in meiosis []. MAGE proteins share a conserved domain known as the MAGE homology domain (MHD). This superfamily entry represents the domain that contains the winged helix motif WH2.
Endothelial cell-specific chemotaxis regulator (ECSCR, also known as endothelial cell-specific molecule 2, ECSM2) is a novel cell surface protein that regulates endothelial chemotaxis and tube formation; it also interacts with filamin A, implying a role in angiogenesis via modulation of the actin cytoskeleton [
]. ECSCR is a 205-amino-acid protein containing a putative transmembrane (TM) domain, a conserved intracellular domain and a variable extracellular domain, but no other known functional sites and no similarity to any other known proteins. It has been shown that the protein is a marker of primary hemangioblasts and endothelial progenitors, but not of other hematopoietic progenitor cells [
], but little else is known about the protein.
The Streptococcus pneumoniae pilus backbone protein RrgB has three tandem domains with Lys-to-Asn isopeptide bonds, but these three regions are extremely divergent in sequence. This entry represents the D2 domain [
]. It occurs just once in many surface proteins but up to twenty times in some pilin subunit proteins. Three of every four members have the typical Gram-positive C-terminal motif, LPXTG, although in many cases this motif may be involved in pilin subunit cross-linking rather than cell wall attachment. Proteins with this domain include fimbrial proteins with lectin-like adhesion functions, and the majority of characterised members are involved in surface adhesion to host structures [,
].
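As a rough illustration of what "a C-terminal LPXTG motif" means in practice, the following minimal Python sketch reports LPXTG occurrences (Leu-Pro-any residue-Thr-Gly) that fall within the last residues of a sequence. The function name `find_lpxtg` and the 50-residue window are hypothetical assumptions made for this example, not a published cut-off; genuine sortase-substrate prediction also considers the downstream hydrophobic segment and charged tail.

```python
import re

# LPXTG motif: Leu-Pro-(any residue)-Thr-Gly. The motif cannot overlap
# itself, so a plain (non-lookahead) pattern finds every occurrence.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(sequence: str, window: int = 50) -> list[int]:
    """Return 1-based start positions of LPXTG motifs that begin within the
    last `window` residues. The default window is an illustrative assumption."""
    seq = sequence.upper()
    offset = max(0, len(seq) - window)
    # Pattern.finditer(string, pos) starts the search at `pos`, while
    # match positions are still reported relative to the full string.
    return [m.start() + 1 for m in LPXTG.finditer(seq, offset)]


if __name__ == "__main__":
    # Toy sequence: an LPETG motif placed 20 residues from the C terminus.
    toy = "M" + "A" * 100 + "LPETG" + "S" * 20
    print(find_lpxtg(toy))
```

Restricting the search to a C-terminal window reflects the text's point that the motif is C-terminal; widening or narrowing `window` changes which members would be counted, so the value should be treated as a tunable assumption.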
This entry represents the archaeal-type gltB proteins. Proteins in this entry share sequence similarity with a region of unknown function found in the large subunit of glutamate synthase, which is encoded by gltB and found in most bacteria and eukaryotes. They are predicted to be homologous to the C-terminal domain of glutamate synthase on the basis of sequence similarity together with genome organization data showing that they are encoded in gene clusters with other annotated glutamate synthase (GltS) proteins. Proteins in this entry are found in archaea, but are also present in a few bacteria, likely as a result of lateral gene transfer [
].