Succinate dehydrogenase/fumarate reductase type B, transmembrane subunit
Type:
Family
Description:
Succinate dehydrogenase (SDH) is a membrane-bound complex of two main components: a membrane-extrinsic component composed of an FAD-binding flavoprotein and an iron-sulphur protein, and a hydrophobic component composed of a cytochrome b and a membrane anchor protein. The cytochrome b component is a mono-haem transmembrane protein belonging to a family that includes: Cytochrome b-556 from bacterial SDH (gene sdhC). Cytochrome b560 from the mammalian mitochondrial SDH complex, which is encoded in the mitochondrial genome of some algae and in the plant Marchantia polymorpha. Cytochrome b from yeast mitochondrial SDH complex (gene SDH3 or CYB3). Protein cyt-1 from Caenorhabditis elegans. These cytochromes are proteins of about 130 residues that comprise three transmembrane regions. There are two conserved histidines which may be involved in binding the haem group. This family also includes the subunit C (the cytochrome B subunit) of type B fumarate reductases.
This entry contains serine endopeptidases belonging to the MEROPS peptidase family S8 (subtilisin family, clan SB). Limited proteolysis of most large protein precursors is carried out in vivo by the subtilisin-like pro-protein convertases. Many important biological processes such as peptide hormone synthesis, viral protein processing and receptor maturation involve proteolytic processing by these enzymes. The subtilisin-serine protease (SRSP) family hormone and pro-protein convertases (furin, PC1/3, PC2, PC4, PACE4, PC5/6, and PC7/7/LPC) act within the secretory pathway to cleave polypeptide precursors at specific basic sites, generating their biologically active forms. Serum proteins, pro-hormones, receptors, zymogens, viral surface glycoproteins and bacterial toxins, amongst others, are activated by this route. The SRSPs share the same domain structure, including a signal peptide, the pro-peptide, the catalytic domain, the P/middle or homo B domain, and the C terminus.
This entry represents a domain found in Escherichia coli UbiB, known in Providencia stuartii as AarF, which is required for ubiquinone (CoQ) biosynthesis. Some proteins with this domain are described as aarF domain-containing protein kinases (ADCKs). This domain is also found in yeast ABC1 proteins required for function of the mitochondrial bc1 complex, in which CoQ functions as an essential cofactor. The function of these proteins is not clear. Along with ABC1, UbiB is part of a large family of proteins that contain motifs found in eukaryotic-type protein kinases, but it is not known whether they have kinase activity or how this would relate to their requirement for the monooxygenase step in CoQ synthesis. A role in regulation of this step by phosphorylation has been speculated.
The Streptococcus pneumoniae psaA gene encodes a protein with significant similarity to previously reported streptococcal proteins, SsaB (80% similarity) and FimA (92.3% similarity), from Streptococcus sanguis and Streptococcus parasanguis. These homologues are associated with bacterial adhesion, and PsaA may play a similar role. The SsaB protein has a putative hydrophobic 19-amino-acid signal sequence yielding a 32,620-Mr secreted protein. SsaB is hydrophilic and appears not to have a hydrophobic membrane anchor in its C-terminal region. A high degree of similarity exists between S. sanguis ssaB and type 1 fimbrial genes. Comparison of the gene products reveals close similarity of the two proteins. It is thought that ssaB adhesion may play a role in oral colonisation by binding either to a receptor on saliva or to a receptor on Actinomyces. This family includes the adhesins and related periplasmic binding proteins.
This entry represents the core domain of the ferrous iron (Fe2+) transport protein FeoA found in bacteria. This domain also occurs at the C terminus in the diphtheria toxin repressor (DtxR). DtxR is an iron-binding repressor that contains two domains separated by a short linker. The C-terminal domain adopts a fold similar to eukaryotic Src homology 3 domains, but its functional role is unknown. The transporter Feo is composed of three proteins: FeoA, a small, soluble SH3-domain protein probably located in the cytosol; FeoB, a large protein with a cytosolic N-terminal G-protein domain and a C-terminal integral inner-membrane domain containing two 'Gate' motifs, which likely functions as the Fe2+ permease; and FeoC, a small protein apparently functioning as an [Fe-S]-dependent transcriptional repressor. Feo allows the bacterial cell to acquire iron from its environment.
This domain contains a P-loop motif that is characteristic of the AAA superfamily. Proteins containing this domain are DEAD-like helicases belonging to superfamily (SF)1, a diverse family of proteins involved in ATP-dependent RNA or DNA unwinding. Similar to SF2 helicases, SF1 helicases do not form toroidal structures like SF3-6 helicases. Their helicase core consists of two similar protein domains that resemble the fold of the recombination protein RecA. Proteins containing this domain include helicases, such as Dna2 and Nam7. Dna2 is a DNA replication factor with single-stranded DNA-dependent ATPase, ATP-dependent nuclease (5'-flap endonuclease) and helicase activities. Nam7 (also known as Upf1) is an ATP-dependent RNA helicase involved in the nonsense-mediated mRNA decay (NMD) pathway. This domain can also be found in some virus polyproteins. This domain is also known as HelicC.
Copper resistance protein K (CopK) is a periplasmic dimeric protein which is strongly up-regulated in the presence of copper, leading to a high periplasmic accumulation. CopK has two different binding sites for Cu(I), each with a different affinity for the metal. Binding of the first Cu(I) ion (CuI) induces a conformational change of CopK which involves dissociation of the dimeric apo-protein. Binding of a second Cu(I) (CuII) further increases the plasticity of the protein. The binding of CuI greatly enhances the specific affinity for CuII and, in turn, CuII binding increases the specific affinity for CuI, although to a lesser extent. This type of cooperative behaviour is unprecedented in copper binding proteins. CopK has features in common with functionally related proteins, such as a structure consisting of an all-beta fold and a methionine-rich Cu(I) binding site.
The SOCS box was first identified in SH2-domain-containing proteins of the suppressor of cytokine signalling (SOCS) family but was later also found in: the WSB (WD-40-repeat-containing proteins with a SOCS box) family, the SSB (SPRY domain-containing proteins with a SOCS box) family, the ASB (ankyrin-repeat-containing proteins with a SOCS box) family, and ras and ras-like GTPases. The SOCS box found in these proteins is a C-terminal domain of about 50 amino acids composed of two blocks of well-conserved residues separated by between 2 and 10 non-conserved residues. The C-terminal conserved region is an L/P-rich sequence of unknown function, whereas the N-terminal conserved region is a consensus BC box, which binds to the Elongin BC complex. It has been demonstrated that this association couples bound proteins to the ubiquitination or proteasomal compartments.
This entry represents the RNA recognition motif 1 (RRM1) of RBM6 (also known as lung cancer antigen NY-LU-12, or protein G16, or RNA-binding protein DEF-3), which has been predicted to be a nuclear factor based on its nuclear localization signal. RBM6 shows high sequence similarity to RNA-binding protein 5 (RBM5). Both RBM6 and RBM5 specifically bind poly(G) RNA. They contain two RNA recognition motifs (RRMs), also termed RBDs (RNA binding domains) or RNPs (ribonucleoprotein domains), two C2H2-type zinc fingers, a nuclear localization signal, and a G-patch/D111 domain. In contrast to RBM5, RBM6 has two additional unique domains: the decamer repeat occurring more than 20 times, and the POZ (poxvirus and zinc finger) domain. The POZ domain may be involved in protein-protein interactions and inhibit binding of target sequences by zinc fingers.
This entry represents the RNA recognition motif 2 (RRM2) of RBM6 (also known as lung cancer antigen NY-LU-12, or protein G16, or RNA-binding protein DEF-3), which has been predicted to be a nuclear factor based on its nuclear localization signal. RBM6 shows high sequence similarity to RNA-binding protein 5 (RBM5). Both RBM6 and RBM5 specifically bind poly(G) RNA. They contain two RNA recognition motifs (RRMs), also termed RBDs (RNA binding domains) or RNPs (ribonucleoprotein domains), two C2H2-type zinc fingers, a nuclear localization signal, and a G-patch/D111 domain. In contrast to RBM5, RBM6 has two additional unique domains: the decamer repeat occurring more than 20 times, and the POZ (poxvirus and zinc finger) domain. The POZ domain may be involved in protein-protein interactions and inhibit binding of target sequences by zinc fingers.
Bromodomains are found in a variety of mammalian, invertebrate and yeast DNA-binding proteins. Bromodomains are highly conserved α-helical motifs that can specifically interact with acetylated lysine residues on histone tails. In some proteins, the classical bromodomain has diverged to such an extent that parts of the region are either missing or contain an insertion (e.g., mammalian protein HRX, Caenorhabditis elegans hypothetical protein ZK783.4, yeast protein YTA7). The bromodomain may occur as a single copy, or in duplicate. This domain is present in proteins involved in a wide range of functions such as acetylating histones, remodeling chromatin, and recruiting other factors necessary for transcription, thus playing a critical role in the regulation of transcription. The bromodomain structure consists of four helices arranged in a bundle fold that is a minor mirror variant of the up-and-down topology.
This family represents a group of Poxvirus Bcl-2-like proteins that function as immunomodulators to evade the host innate immune response through the inhibition of apoptosis or blocking the activation of pro-inflammatory transcription factors, such as interferon (IFN) regulatory factor-3 (IRF-3) and nuclear factor-kappaB (NF-kappaB). These proteins have low sequence identity but high structural similarity to the eukaryotic Bcl-2 protein family. They have a Bcl-2 fold which comprises a central hydrophobic α-helix that is surrounded by an additional layer of 6-7 amphipathic α-helices. Included in this family are proteins B14, A52, A46, C6, K7, N1, and N2. Protein N1 has the unusual dual ability of modulating apoptosis and inflammatory signalling. Protein K7, in addition to its anti-inflammatory activity, forms a complex with RNA helicase DDX3 and antagonises interferon-beta promoter induction.
Prickle has been implicated in regulation of cell movement in the planar cell polarity (PCP) pathway, which requires the conserved Frizzled/Dishevelled (Dsh). Prickle interacts with Dishevelled, thereby modulating the activity of Frizzled/Dishevelled and PCP signalling. Two forms of prickle have been identified, namely prickle 1 and prickle 2. These are differentially expressed; prickle 1 is found in fetal heart and haematological malignancies, while prickle 2 is expressed in fetal brain, adult cartilage, pancreatic islet, and some types of tumorous cells. Prickle contains an N-terminal PET domain and three C-terminal LIM domains. This entry represents the PET domain of prickle. The PET domain is a protein-protein interaction domain, usually found in conjunction with the LIM domain, which is also involved in protein-protein interactions. PET-containing proteins serve as adaptors or scaffolds to support the assembly of multimeric protein complexes.
This is the C-terminal domain of DSC E3 ubiquitin ligase complex subunit 3 (Dsc3), which contains two transmembrane helices and a characteristic GFDRL sequence motif. Dsc3 is a component of the DSC E3 ubiquitin ligase complex (a Golgi-specific protein ubiquitination system) that functions in protein homeostasis under non-stress conditions, playing a role in protein quality control through the endosome and Golgi-associated degradation (EGAD) pathway, which targets membrane proteins at the Golgi and endosomes for degradation by cytosolic proteasomes. Dsc3 is also involved in endocytic protein trafficking. The yeast DSC E3 ubiquitin ligase complex is the homologue of the Hrd1 E3 ligase complex from mammals, in which Dsc1, Dsc2 and Dsc3 correspond to Hrd1, Der1, and Usa1, respectively. Dsc3 is a Herp-like protein that acts as a bridge between Dsc1 and Dsc2 for their interaction.
Sorting nexin-18 (SNX18) is localized to peripheral endosomal structures, and acts in a trafficking pathway that is clathrin-independent but relies on AP-1 and PACS1. It binds FIP5 and is required for apical lumen formation. It may also play a role in axonal elongation. SNXs are Phox homology (PX) domain containing proteins that are involved in regulating membrane traffic and protein sorting in the endosomal system. SNX18 also contains BAR and SH3 domains. The PX domain is a phosphoinositide (PI) binding module present in many proteins with diverse functions. Sorting nexins (SNXs) make up the largest group among PX domain containing proteins. They are involved in regulating membrane traffic and protein sorting in the endosomal system. The PX domain of SNXs binds PIs and targets the protein to PI-enriched membranes.
Arenaviridae are single-stranded RNA viruses. The Arenaviridae S RNAs that have been characterised include conserved terminal sequences, an ambisense arrangement of the coding regions for the precursor glycoprotein (GPC) and nucleocapsid (N) proteins and an intergenic region capable of forming a base-paired "hairpin" structure. The mature glycoproteins that result are G1 and G2 and the N protein. Tacaribe virus (TACV) is an arenavirus that is genetically and antigenically closely related to Junin arenavirus (JUNV), the aetiological agent of Argentine haemorrhagic fever (AHF). It is well established that TACV protects experimental animals fully against an otherwise lethal challenge with JUNV. It has been established that it is the heterologous glycoprotein that protects against JUNV challenge. A recombinant vaccinia virus that expresses JUNV glycoprotein precursor (VV-GJun) protected seventy-two percent of the animals inoculated with two doses of VV-GJun against the lethal JUNV challenge.
This domain superfamily is found in apolipoprotein C-II (apoC-II). ApoC-II is a surface constituent of plasma lipoproteins and the activator for lipoprotein lipase (LPL). It is therefore central for lipid transport in blood. Lipoprotein lipase is a key enzyme in the regulation of triglyceride levels in human serum. It is the C-terminal helix of apoC-II that is responsible for the activation of LPL. The active peptide of apoC-II occurs at residues 44-79 and has been shown to reverse the symptoms of genetic apoC-II deficiency in a human subject. Micellar SDS, a commonly used mimetic of the lipoprotein surface, inhibits the aggregation of apoC-II and induces a stable structure containing approximately 60% α-helix. The first 12 residues of apoC-II are structurally heterogeneous but the rest of the protein forms a predominantly helical structure.
Pheromone binding proteins (PBPs) are abundant, secreted antennal proteins present in olfactory sensilla. They lie in the lymph that surrounds the dendrites of olfactory receptor neurons. The olfactory receptors of terrestrial animals exist in an aqueous environment, yet detect odorants that are primarily hydrophobic. The aqueous solubility of hydrophobic odorants is thought to be greatly enhanced via odorant binding proteins which exist in the extracellular fluid surrounding the odorant receptors. This family is composed of pheromone binding proteins (PBP), which are male-specific and associate with pheromone-sensitive neurons, and general-odorant binding proteins (GOBP). GOBPs have been found in moth antennae, and share a high level of sequence similarity with the PBPs. They are believed to carry odorants rather than pheromones. This family is distinct from the vertebrate odorant-binding proteins (OBPs).
This entry represents the PBPs and GOBPs in Lepidoptera (butterflies and moths).
This entry represents the pleckstrin homology (PH) domain from 3-phosphoinositide-dependent protein kinase 1 (PDK1) type proteins.
PDK1 plays an important role in insulin and growth factor signalling cascades. It phosphorylates and activates many members of the AGC (cAMP-dependent, cGMP-dependent, protein kinase C (PKC)) family of protein kinases, including protein kinase B (PKB, also known as Akt), p70 ribosomal S6-kinase (S6K), serum and glucocorticoid responsive kinase (SGK), p90 ribosomal S6 kinase (RSK), and PKC. PDK1 contains an N-terminal serine/threonine kinase domain followed by a PH domain. Following binding of the PH domain to PtdIns(3,4,5)P3 and PtdIns(3,4)P2, PDK1 activates these enzymes by phosphorylating a Ser/Thr residue in their activation loop. Pleckstrin homology (PH) domains are small modular domains that occur in a large variety of signalling proteins, where they serve as simple targeting domains that bind lipids.
Microsomal triglyceride transfer protein large subunit (MTTP) catalyzes the transport of triglyceride, cholesteryl ester, and phospholipid between phospholipid surfaces, and is required for the secretion of plasma lipoproteins that contain apolipoprotein B. It is a heterodimer consisting of a large MTP alpha-subunit and a protein disulfide isomerase (PDI) beta-subunit. Mutations in microsomal triglyceride transfer protein (MTP) cause abetalipoproteinemia. The MTP large subunit has an N-terminal β-barrel domain, a central α-helical domain and a C-terminal 2-β-sheet domain. This is the highly conserved MTP C-terminal domain involved in lipid-binding and transfer activity, with a preference for neutral 3 chain-containing lipids. The β-sheets are named A and C and form a sandwich comprising the lipid-binding site between them. Half of the A-sheet interacts with the β-barrel. This domain also interacts with the a' and b' domains of PDI.
RGS14 is a regulator of G protein signaling (RGS) protein and contains an N-terminal RGS domain, two tandem Ras-binding domains (RBDs) and a G protein regulatory (GPR, also referred to as a GoLoco) motif. It regulates G protein nucleotide exchange and hydrolysis by acting as a GTPase-activating protein (GAP) through its RGS domain, and as a guanine nucleotide dissociation inhibitor (GDI) through its GoLoco motif. Both domains of RGS14 target members of the Gialpha subclass. It is a microtubule-associated protein that may modulate microtubule dynamics and spindle formation and play an essential role during mammalian cell division. RGS14 regulates the activation of alphaMbeta2 integrin during phagocytosis. It is a key regulator of signalling pathways linking synaptic plasticity in CA2 pyramidal neurons to hippocampal-based learning and memory. This entry represents the RGS domain of RGS14.
The SOCS box was first identified in SH2-domain-containing proteins of the suppressor of cytokine signalling (SOCS) family but was later also found in: the WSB (WD-40-repeat-containing proteins with a SOCS box) family, the SSB (SPRY domain-containing proteins with a SOCS box) family, the ASB (ankyrin-repeat-containing proteins with a SOCS box) family, and ras and ras-like GTPases. The SOCS box found in these proteins is a C-terminal domain of about 50 amino acids composed of two blocks of well-conserved residues separated by between 2 and 10 non-conserved residues. The C-terminal conserved region is an L/P-rich sequence of unknown function, whereas the N-terminal conserved region is a consensus BC box, which binds to the Elongin BC complex. It has been demonstrated that this association couples bound proteins to the ubiquitination or proteasomal compartments.
This entry represents the PX domain found in Sorting nexin-5 (SNX5). The PX domain of SNX5 binds phosphatidylinositol-3-phosphate (PI3P) and PI(3,4)P2. SNX5 is localized to a subdomain of the early endosome and is recruited to the plasma membrane following EGF stimulation and elevation of PI(3,4)P2 levels. The Phox Homology (PX) domain is a phosphoinositide (PI) binding module present in many proteins with diverse functions. Sorting nexins (SNXs) make up the largest group among PX domain containing proteins. They are involved in regulating membrane traffic and protein sorting in the endosomal system. The PX domain of SNXs binds phosphoinositides (PIs) and targets the protein to PI-enriched membranes. SNXs differ from each other in PI-binding specificity and affinity, and the presence of other protein-protein interaction domains, which help determine subcellular localization and specific function in the endocytic pathway.
Limited proteolysis of most large protein precursors is carried out in vivo by the subtilisin-like pro-protein convertases. Many important biological processes such as peptide hormone synthesis, viral protein processing and receptor maturation involve proteolytic processing by these enzymes. The subtilisin-serine protease (SRSP) family hormone and pro-protein convertases (furin, PC1/3, PC2, PC4, PACE4, PC5/6, and PC7/7/LPC) act within the secretory pathway to cleave polypeptide precursors at specific basic sites, generating their biologically active forms. Serum proteins, pro-hormones, receptors, zymogens, viral surface glycoproteins and bacterial toxins, amongst others, are activated by this route. The SRSPs share the same domain structure, including a signal peptide, the pro-peptide, the catalytic domain, the P/middle or homo B domain, and the C terminus. This entry represents subtilisin-like serine proteases related to TK1689 from Thermococcus kodakaraensis. TK1689 has high resistance to heat, denaturants, detergents and chelating agents.
This family consists of haemolysin expression modulating protein (Hha) from Escherichia coli and its enterobacterial homologues, such as YmoA from Yersinia enterocolitica, and RmoA encoded on the R100 plasmid. These proteins act as modulators of bacterial gene expression. Members of the Hha/YmoA/RmoA family act in conjunction with members of the H-NS family, participating in the thermoregulation of different virulence factors and in plasmid transfer. Hha, along with the chromatin-associated protein H-NS, is involved in the regulation of expression of the toxin alpha-haemolysin in response to osmolarity and temperature. YmoA modulates the expression of various virulence factors, such as Yop proteins and YadA adhesin, in response to temperature. RmoA is a plasmid R100 modulator involved in plasmid transfer. The Hha family of proteins displays striking similarity to the oligomerization domain of the H-NS proteins.
Helicases have been classified in 5 superfamilies (SF1-SF5). All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B (Mg2+-binding aspartic acid) motifs. For the two largest groups, commonly referred to as SF1 and SF2, a total of seven characteristic motifs has been identified. These two superfamilies encompass a large number of DNA and RNA helicases from archaea, eubacteria, eukaryotes and viruses that seem to be active as monomers or dimers. RNA and DNA helicases are considered to be enzymes that catalyze the separation of double-stranded nucleic acids in an energy-dependent manner.
The various structures of SF1 and SF2 helicases present a common core with two α-β RecA-like domains. The structural homology with the RecA recombination protein covers the five contiguous parallel beta strands and the tandem alpha helices. ATP binds to the amino proximal α-β domain, where the Walker A (motif I) and Walker B (motif II) are found. The N-terminal domain also contains motif III (S-A-T) which was proposed to participate in linking ATPase and helicase activities. The carboxy-terminal α-β domain is structurally very similar to the proximal one even though it is bereft of an ATP-binding site, suggesting that it may have originally arisen through gene duplication of the first one.
Some members of helicase superfamilies 1 and 2 are listed below:
DEAD-box RNA helicases. The prototype of DEAD-box proteins is the translation initiation factor eIF4A. The eIF4A protein is an RNA-dependent ATPase which functions together with eIF4B as an RNA helicase.
DEAH-box RNA helicases. Mainly pre-mRNA-splicing factor ATP-dependent RNA helicases.
Eukaryotic DNA repair helicase RAD3/ERCC-2, an ATP-dependent 5'-3' DNA helicase involved in nucleotide excision repair of UV-damaged DNA.
Eukaryotic TFIIH basal transcription factor complex helicase XPB subunit. An ATP-dependent 3'-5' DNA helicase which is a component of the core-TFIIH basal transcription factor, involved in nucleotide excision repair (NER) of DNA and, when complexed to CAK, in RNA transcription by RNA polymerase II. It acts by opening DNA either around the RNA transcription start site or the DNA.
Eukaryotic ATP-dependent DNA helicase Q. A DNA helicase that may play a role in the repair of DNA that is damaged by ultraviolet light or other mutagens.
Bacterial and eukaryotic antiviral SKI2-like helicase. SKI2 has a role in the 3'-mRNA degradation pathway, repressing dsRNA virus propagation by specifically blocking translation of viral mRNAs, perhaps recognizing the absence of CAP or poly(A).
Bacterial DNA-damage-inducible protein G (DinG). A probable helicase involved in DNA repair and perhaps also replication.
Bacterial primosomal protein N' (PriA). PriA protein is one of seven proteins that make up the restart primosome, an apparatus that promotes assembly of replisomes at recombination intermediates and stalled replication forks.
Bacterial ATP-dependent DNA helicase recG. It has a critical role in recombination and DNA repair, helping process Holliday junction intermediates to mature products by catalyzing branch migration. It has a DNA unwinding activity characteristic of helicases with a 3' to 5' polarity.
A variety of DNA and RNA virus helicases and transcription factors.
This entry represents the DNA-binding domain of classical SF1 and SF2 helicases. It does not recognize bacterial DinG and eukaryotic Rad3, which differ from other SF1-SF2 helicases by the presence of a large insert after the Walker A.
Amyloid-beta precursor protein (APP, or A4) is associated with Alzheimer's disease (AD), because one of its breakdown products, amyloid-beta (A-beta), aggregates to form amyloid or senile plaques. Mutations in APP or in proteins that process APP have been linked with early-onset, familial AD. Individuals with Down's syndrome carry an extra copy of chromosome 21, which contains the APP gene, and almost invariably develop amyloid plaques and Alzheimer's symptoms. APP is important for neurogenesis and neuronal regeneration, either through the intact protein, or through its many breakdown products. APP consists of a large N-terminal extracellular region containing heparin-binding and copper-binding sites, a Kunitz domain, an E2 domain, a short hydrophobic transmembrane domain, and a short C-terminal intracellular domain. The N-terminal region is similar in structure to cysteine-rich growth factors and appears to function as a cell surface receptor, contributing to neurite growth, neuronal adhesion, axonogenesis and cell mobility. APP acts as a kinesin I membrane receptor to mediate the axonal transport of beta-secretase and presenilin 1. The N-terminal domain can regulate neurite outgrowth through its binding to heparin and collagen I and IV, which are components of the extracellular matrix. APP is also coupled to apoptosis-inducing pathways, and is involved in copper homeostasis/oxidative stress through copper ion reduction, where copper-metallated APP induces neuronal death. The C-terminal intracellular domain appears to be involved in transcription regulation through protein-protein interactions. APP can promote transcription activation through binding to APBB1/Tip60, and may bind to the adaptor protein FE65 to transactivate a wide variety of different promoters. APP can be processed by different sets of enzymes: In the non-amyloidogenic (non-plaque-forming) pathway, APP is cleaved by alpha-secretase to yield a soluble N-terminal sAPP-alpha (neuroprotective) and a membrane-bound CTF-alpha. CTF-alpha is broken down by presenilin-containing gamma-secretase to yield soluble p3 and membrane-bound AICD (nuclear signalling). In the amyloidogenic (plaque-forming) pathway, APP is broken down by beta-secretase to yield soluble sAPP-beta and membrane-bound CTF-beta. CTF-beta is broken down by gamma-secretase to yield soluble amyloid-beta and membrane-bound AICD. Amyloid-beta is required for neuronal function, but can aggregate to form amyloid plaques that seem to disrupt brain cells by clogging points of cell-cell contact. The E2 domain is the largest of the conserved domains in the amyloidogenic glycoproteins. The structure of E2 consists of two coiled-coil sub-structures connected through a continuous helix, and bears an unexpected resemblance to the spectrin family of protein structures. E2 can reversibly dimerise in solution, and the dimerisation occurs along the longest dimension of the molecule in an antiparallel orientation, which enables the N-terminal substructure of one monomer to pack against the C-terminal substructure of a second monomer. The high degree of conservation of residues at the putative dimer interface suggests that the E2 dimer observed in the crystal could be physiologically relevant. Heparan sulphate proteoglycans, the putative ligands for the precursor present in extracellular matrix, bind to E2 at a conserved and positively charged site near the dimer interface. The E2 domain is also known as CAPPD (for central APP domain).
Human epidermal growth factor (EGF)-like module containing mucin-like hormone receptor 1 (EMR1) is a surface receptor of unknown function that belongs to the EGF-seven-transmembrane (EGF-TM7) family of G-protein coupled receptors [
]. Human EMR1 has been reported to be expressed exclusively on eosinophils []. It is the the human homologue of F4/80, a monoclonal antibody that recognises a Mus musculus (Mouse) macrophage-restricted cell surface glycoprotein that has been extensively used to characterise macrophage populations in a wide range of immunological studies []. Little is known about its possible role in macrophage differentiation and function. The sequence of the F4/80 protein is similar to two protein superfamilies: the N-terminal region contains seven epidermal growth factor (EGF)-like domains, while the C-terminal region contains seven hydrophobic regions whose signature is consistent with membership of the secretin-like superfamily of GPCRs. The EGF and GPCR domains are separated from each other by a serine/threonine-rich domain, a feature reminiscent of mucin-like, single-span, integral membrane glycoproteins with adhesive properties [].This family also comprises EMR3, a marker for mature granulocytes [
], and EMR4 [,
].G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligands they bind; those that are unmatched to known natural ligands are designated orphan GPCRs, or unclassified GPCRs []. The secretin-like GPCRs include secretin [
], calcitonin [], parathyroid hormone/parathyroid hormone-related peptides [] and vasoactive intestinal peptide [], all of which activate adenylyl cyclase and the phosphatidyl-inositol-calcium pathway. These receptors contain seven transmembrane regions, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins. However, there is no significant sequence identity between these families, so the secretin-like receptors bear their own unique '7TM' signature. Their N terminus is probably located on the extracellular side of the membrane and is potentially glycosylated. This N-terminal region contains a long conserved region that allows the binding of large peptidic ligands such as glucagon, secretin, VIP and PACAP; it contains five conserved cysteine residues that could be involved in disulphide bonds. The C-terminal region of these receptors is probably cytoplasmic. Every receptor gene in this family is encoded on multiple exons, and several of these genes are alternatively spliced to yield functionally distinct products.
This is the PH domain of ephexins, which is believed to act with the DH domain in mediating protein-protein interactions [
,
,
,
]. This entry includes ephexin family members [
,
,
] which comprise ephexin-1 to 5 and related animal proteins, such as ARHGEF26, also called SGEF (SH3 domain-containing GEF), which shows structural similarities with ephexins []. ARHGEF26 is highly expressed in liver and may play a role in regulating membrane dynamics []. A common feature of these proteins, apart from their high sequence homology, is that they are the direct downstream proteins of Eph receptors, a large subfamily of receptor tyrosine kinases that is activated by ephrins and involved in various cellular processes such as axon guidance, formation of tissue boundaries, long-term potentiation, angiogenesis, and cancer. They are essential for normal function of neurons and their development []. Ephexin-1 (also called NGEF/neuronal guanine nucleotide exchange factor) contributes to the homeostatic modulation of presynaptic neurotransmitter release and plays crucial roles in axon guidance [
]. Literature data about Ephexin-2 (also known as RhoGEF19) are limited; however, its intrinsic role as a GEF for RhoA seems to be clear. It is involved in convergent extension, a developmental step of anterior-posterior axis extension in Xenopus gastrulation, through RhoA activation, and it also participates in pronephric tubulogenesis of Xenopus and zebrafish. Elevated levels of Ephexin-2 result in increased RhoA activity, which causes increased cancer proliferation, migration, and invasion []. Ephexin-3 (also called Rho guanine nucleotide exchange factor 5/RhoGEF5) is ubiquitously expressed in many tissues, such as colon, kidney, trachea, prostate, liver, and pancreas, with a tendency to be highly expressed in tissues containing epithelial cells. It functions as a GEF for RhoA. It plays a role in cell migration and adhesion, as it is involved in Src-induced podosome formation and its deletion causes defects in immature dendritic cell migration in vivo [
]. Ephexin-4 (also called RhoGEF16) acts downstream of EphA2 to promote ligand-independent breast cancer cell migration and invasion toward epidermal growth factor through activation of RhoG. Activated RhoG in turn recruits ELMO2 and Dock4 to form a complex with EphA2 at the tips of cortactin-rich protrusions in migrating breast cancer cells [
,
,
]. Ephexin-5 (also known as RhoGEF15 and Vsm-RhoGEF) is the specific GEF for RhoA activation and the regulation of vascular smooth muscle contractility. It is also involved in angiogenesis, as it mediates VEGF-induced modulation of Rho GTPase activity. It interacts with EPHA4. PH domains have diverse functions, but in general are involved in targeting proteins to the appropriate cellular location or in the interaction with a binding partner. Ephexin-5 is highly expressed in the brain, especially in the hippocampus, where it may act as a beacon locating sites of new spine formation, keeping them in check until incoming activity promotes spine formation at these sites [
]. Members of the Ephexin family contain a RhoGEF (DH) domain followed by a PH domain and an SH3 domain, except Ephexin-5, in which the SH3 domain is absent [
,
,
]. The ephexin PH domain is believed to act with the DH domain in mediating protein-protein interactions [,
,
].
This entry represents the C-terminal domain of the endoplasmic reticulum vesicle transporter proteins. Proteins included in this entry are conserved from plants and fungi to humans. Erv46 (ERGIC3) works in close conjunction with Erv41 (ERGIC2) and together they form a complex which cycles between the endoplasmic reticulum and Golgi complex. Erv46-41 interacts strongly with the endoplasmic reticulum glucosidase II. Mammalian glucosidase II comprises a catalytic alpha-subunit and a 58kDa beta subunit, which is required for ER localisation. All proteins identified biochemically as Erv41p-Erv46p interactors are localised to the early secretory pathway and are involved in protein maturation and processing in the ER and/or sorting into COPII vesicles for transport to the Golgi [
]. Proteins containing this domain also include disulfide isomerase (PDI)-C subfamily members from Arabidopsis. They are chimeric proteins containing the thioredoxin (Trx) domain of PDIs, and the conserved N- and C-terminal domains of Erv cargo receptors [
].
Proteins containing this domain are found in all the three major phyla of life: archaebacteria, eubacteria, and eukaryotes. In
Bacillus subtilis, TenA is one of a number of proteins that enhance the expression of extracellular enzymes, such as alkaline protease, neutral protease and levansucrase [
] and has been identified as a Thiaminase 2 []. The THI-4 protein, which is involved in thiamine biosynthesis, also contains this domain. The C-terminal part of these proteins consistently shows significant sequence similarity to TenA proteins. This similarity was first noted with the Neurospora crassa THI-4 []. This domain is also found in bacterial coenzyme PQQ synthesis protein C or PQQC. Pyrroloquinoline quinone (PQQ) is the prosthetic group of several bacterial enzymes, including methanol dehydrogenase of methylotrophs and the glucose dehydrogenase of a number of bacteria [
]. PQQC has been found to be required in the synthesis of PQQ, but its function is unclear.
SMARCAL1 (SWI/SNF-related, matrix-associated, actin-dependent regulator of chromatin, subfamily A-like1), also known as DNA-dependent ATPase A and HARP (Hep-A-related proteins), maintains genome integrity during DNA replication. SMARCAL1 has ATP-dependent annealing helicase activity, which helps to stabilise stalled replication forks and facilitate DNA repair during replication. Biochemically, SMARCAL1 can bind to DNA that contains single- and double-stranded regions such as forks and DNA hairpins. DNA binding activates its ATPase activity, and this activity promotes DNA single-stranded annealing [
,
]. SMARCAL1 is a multifunctional protein. The ATPase domain, which lies in the C-terminal half of the protein, is split into two regions of primary amino acid sequence by a 115-amino-acid linker sequence. The N-terminal half of the protein contains a highly conserved binding domain for the ssDNA-binding protein replication protein A (RPA), and one or two HARP domain(s). The evolutionarily conserved HARP domain determines the annealing helicase activity required for the in vivo and in vitro functions of SMARCAL1 [,
,
].
This entry includes cytoplasmic fragile X messenger ribonucleoprotein 1-interacting proteins from humans and their homologues, such as Sra-1 (specifically Rac1-associated protein 1) from Drosophila and PIROGI from Arabidopsis. In humans, there are two members, CYFIP1 and CYFIP2. They both interact with FMRP (Fragile X messenger ribonucleoprotein 1), which is responsible for pathologic manifestations in the Fragile X Syndrome. CYFIP1 interacts with the small GTPase Rac1 [
,
]. CYFIP1 represses cap-dependent translation of mRNA by interacting with the initiation factor eIF4E []. CYFIP1 and CYFIP2 are part of the Wiskott-Aldrich syndrome protein-family verprolin-homologous protein (WAVE) complex that regulates actin polymerization at synapses []. Drosophila Sra-1 interacts with Kette and Wasp. It is required for neuronal and bristle development in Drosophila [
]. PIROGI is part of a WAVE complex that activates the ARP2/3 complex and is involved in regulation of actin organization [
].
UBA domains are a commonly occurring sequence motif of approximately 45 amino acid residues that are found in diverse proteins involved in the ubiquitin/proteasome pathway, DNA excision-repair, and cell signalling via protein kinases [
]. HHR23A, the human homologue of yeast Rad23A, is a nucleotide excision-repair protein that contains both an internal and a C-terminal UBA domain. The fold of the UBA domain consists of a compact three-helical bundle with a right-handed twist, and has a conserved hydrophobic surface patch for protein-protein interactions. UBA-like domains can be found in other proteins as well, such as the TS-N domain in the elongation factor Ts (EF-Ts), which catalyses the recycling of the GTPase EF-Tu required for the binding of aminoacyl-tRNA to the ribosomal A site []; and the C-terminal domain of TAP/NXF1, which functions in nuclear export through the interaction of its UBA-like domain with FG nucleoporins [].
This family includes DUF34/metal-binding proteins from bacteria, NIF3 from budding yeasts and NIF3-like proteins from animals. This entry includes the DUF34/metal-binding protein/NIF3 proteins, which are widely distributed across superkingdoms. They were previously annotated as GTP cyclohydrolase 1 type 2 [
] but a recent comprehensive literature review and integrative bioinformatic analysis revealed that these annotations are misleading, as they were based on a single set of in vitro results examining the NIF3 homolog of Helicobacter pylori []. Instead, these proteins show varied phenotypes, with a unifying functional role as metal-binding proteins []. NIF3 interacts with the yeast transcriptional coactivator Ngg1p, which is part of the ADA complex; the exact function of this interaction is unknown [
,
]. The structure of the Methanocaldococcus jannaschii MJ0927 NIF3 protein has been determined [
,
]. It binds to both single-stranded and double-stranded DNA [].
This group represents Protein numb and similar proteins from animals. This protein plays key roles in cell fate determination [
]. Members of this protein family contain a PID domain, a type of PTB domain []. NUMB from Drosophila is required in determination of cell fate during sensory organ formation in embryos [
]. It recruits alpha-Adaptin, and this physical interaction plays a role in downregulating Notch, presumably by stimulating endocytosis of Notch [,
]. Numb-related protein 1 (NUMB1) from Caenorhabditis elegans is involved in the tethering and targeting of pkc-3 to modulate the intracellular distribution of the kinase [
]. Mammalian Numb-like protein (NUMBL) plays a role in the process of neurogenesis and is required throughout embryonic neurogenesis to maintain neural progenitor cells []. It inhibits glioma cell migration and invasion by suppressing TRAF5-mediated NF-kappaB activation []. It has a role in tumorigenesis [].
This domain is found at the C terminus of phage P4 alpha protein and related proteins. Phage P4 DNA replication depends on the product of the alpha gene, which has origin recognition ability, DNA helicase activity, and DNA primase activity. The structure of the protein can be summarised as follows: The N terminus provides the primase activity, the central region is the helicase/nucleoside triphosphatase domain and the ori DNA recognition resides in the C-terminal 1/3 of the protein [
].
The domain is also found at the C terminus of a number of proteins from orthopox viruses including vaccinia virus D5. D5 encodes a 90 kDa protein that is transiently expressed at early times after infection. It has a nucleoside triphosphatase activity which is independent of common nucleic acid cofactors and it can hydrolyse all the common ribo- and deoxyribonucleoside triphosphates to diphosphates in the presence of a divalent cation [
].
HECW1 (HECT, C2 and WW domain containing E3 ubiquitin protein ligase 1), also known as NEDL1, is an HECT-type E3 ubiquitin protein ligase highly expressed in favorable neuroblastomas [
]. NEDL1 is thought to normally function in the quality control of cellular proteins by eliminating misfolded proteins. This is thought to be accomplished via a mechanism analogous to that of ER-associated degradation by forming tight complexes and aggregating misfolded proteins that have escaped ubiquitin-mediated degradation []. NEDL1 is thought to stimulate p53-mediated apoptosis []. NEDL1 shares extensive sequence and structural similarity with NEDL2, including a C2 domain at the N terminus, two WW domains in the middle of the protein, and a HECT domain at the C terminus [
]. NEDL2 regulates the stability of p73 [] and functions as a regulator of the metaphase to anaphase transition []. This entry represents the C2 domain of NEDL1 and NEDL2.
The gene encoding Nuclear Testis (NUT) protein is found fused to BRD3 or BRD4 genes in some aggressive types of carcinoma due to chromosomal translocations [
,
]. Proteins of the BRD family contain two bromodomains that bind transcriptionally active chromatin through associations with acetylated histones H3 and H4 [,
]. Such proteins are crucial for the regulation of cell cycle progression. On the other hand, little is known about NUT protein function. NUT has a Nuclear Export Sequence (NES) as well as a Nuclear Localization Signal (NLS), both located towards the C-terminal end of the protein [
,
]. A fused NUT-GFP protein has shown either cytoplasmic or nuclear localisation, suggesting that it is subject to nuclear/cytoplasmic shuttling. Consistent with this possibility, treatment with leptomycin B, an inhibitor of CRM1-dependent nuclear export, has been shown to result in re-distribution of NUT-GFP to the nucleus [,
].
This entry represents the RNA recognition motif 3 (RRM3) of HuD (also known as ELAV-like protein 4), one of the neuronal members of the Hu family. The neuronal Hu proteins play important roles in neuronal differentiation, plasticity and memory. HuD has been implicated in various aspects of neuronal function, such as the commitment and differentiation of neuronal precursors as well as synaptic remodeling in mature neurons [
]. HuD also functions as an important regulator of mRNA expression in neurons by interacting with AU-rich RNA element (ARE) and stabilizing multiple transcripts []. Moreover, HuD regulates the nuclear processing/stability of N-myc pre-mRNA in neuroblastoma cells []. Like other Hu proteins, HuD contains three RNA recognition motifs (RRMs). RRM1 and RRM2 may cooperate in binding to an ARE. RRM3 may help to maintain the stability of the RNA-protein complex, and might also bind to poly(A) tails or be involved in protein-protein interactions [].
This is the death domain of RAIDD (RIP-associated ICH-1 homologous protein with a death domain), also known as CRADD (Caspase and RIP adaptor). RAIDD is an adaptor protein that together with the p53-inducible protein PIDD and caspase-2, forms the PIDDosome complex, which is required for caspase-2 activation and plays a role in mediating stress-induced apoptosis. RAIDD contains an N-terminal Caspase Activation and Recruitment Domain (CARD), which interacts with the caspase-2 CARD, and a C-terminal DD, which interacts with the DD of PIDD [
,
]. DDs (Death domains) are protein-protein interaction domains found in a variety of domain architectures. Their common feature is that they form homodimers by self-association or heterodimers by associating with other members of the DD superfamily including CARD (Caspase activation and recruitment domain), DED (Death Effector Domain), and PYRIN. They serve as adaptors in signaling pathways and can recruit other proteins into signaling complexes [
,
].
The transcription factor activator protein (AP)-1 consists of Jun (c-Jun, JunB, and JunD), Fos (c-Fos, FosB, Fra1, and Fra2), ATF and JDP family members [
]. They are basic leucine zipper transcription factors that play a central role in regulating gene transcription in various biological processes []. This entry includes Fos, ATF-3 and JDP family members. AP-1 proteins have an α-helical bZIP domain, which contains a basic DNA-binding region and regularly spaced leucine residues known as the leucine zipper motif [
]. They have similar protein structures and can either form homodimers or form heterodimers with other AP-1 proteins (predominantly with Jun proteins), which can then bind to TRE-like sequences (consensus sequence 5'-TGAG/CTCA-3') []. Each of these proteins is expressed in different tissues and can be regulated in different ways, which means that every cell type has a complex mixture of AP-1 dimers with subtly different functions [].
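The TRE consensus quoted above can be illustrated with a short scan. This is a minimal sketch, not a binding-site predictor: it assumes the notation 5'-TGAG/CTCA-3' means TGA, then G or C, then TCA, and the function name and toy sequence are invented for illustration. Because the consensus is its own reverse complement (TGACTCA reads as TGAGTCA on the opposite strand), scanning one strand is enough for exact matches.

```python
import re

# Assumed reading of the consensus 5'-TGAG/CTCA-3': TGA, then G or C, then TCA.
TRE_CONSENSUS = re.compile(r"TGA[GC]TCA")


def find_tre_sites(dna):
    """Return 0-based start positions of exact TRE consensus matches.

    Hypothetical helper for illustration; real AP-1 sites often deviate
    from the consensus, which this exact-match scan will not detect.
    """
    return [match.start() for match in TRE_CONSENSUS.finditer(dna.upper())]


# Toy sequence made up for this example (not taken from any genome).
toy_dna = "GGTGACTCAGGATGAGTCATT"
print(find_tre_sites(toy_dna))
```

An exact-match scan is the simplest choice here; a position weight matrix would be the usual next step if degenerate TRE-like sites need to be scored.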
The TerD domain is found in TerD family proteins that include the paralogous TerD, TerA, TerE, TerF and TerZ proteins [
,
]. It is found in a stress response operon with TerB and TerC. TerD has a maximum of two calcium binding sites [,
] depending on the conservation of aspartates []. It has various fusions to nuclease domains, RNA binding domains, ubiquitin related domains, and metal binding domains. The ter gene products lie at the centre of membrane-linked metal recognition complexes with regulatory ramifications encompassing phosphorylation-dependent signal transduction, RNA-dependent regulation, biosynthesis of nucleoside-like metabolites and DNA processing linked to novel pathways []. The TerD domain is also found in cAMP binding protein, chemical-damaging agent resistance proteins and general stress proteins. The cellular slime mould, Dictyostelium discoideum, contains a cAMP-binding protein, CABP1, which is composed of two subunits. The C-terminal half of these subunits contains this domain [
].
Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands is affected by the cell types [
,
,
]. They can also form heteropolymers with each other within and between tight junction strands []. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with other PDZ domain proteins, such as the scaffolding proteins ZO-1, -2 and -3 []. They also interact with non-tight junction proteins, such as the cell adhesion proteins EpCam and tetraspanins and the signaling proteins ephrin A and B and their receptors, EphA and EphB []. Claudin-1 was the first member of the claudin family to be identified as
a tight junction component []. The human isoform of claudin-1 was originally termed senescence-associated epithelial membrane protein 1
(SEMP1) [], but has since been reclassified.
Caveolae are 50-100 nm invaginations located at the plasma membrane of many cell types and are known to transport molecules across endothelial cells [
]. Caveolae require the caveolin protein for formation. Caveolins may act as scaffolding proteins within caveolar membranes by compartmentalizing and concentrating signalling molecules. Mammals have three caveolin proteins: caveolin-1 (Cav-1, or VIP21), caveolin-2 and caveolin-3 (or M-caveolin). Various classes of signalling molecules, including G-protein subunits, receptor and non-receptor tyrosine kinases, endothelial nitric oxide synthase (eNOS), and small GTPases, bind Cav-1 through its 'caveolin-scaffolding domain' [
]. Caveolins are proteins of about 20 kDa that form high molecular mass homo-oligomers. Structurally they seem to have N-terminal and C-terminal hydrophilic segments and a long central transmembrane domain that probably forms a hairpin in the membrane. Both extremities are known to face the cytoplasm. Caveolae are enriched with cholesterol and Cav-1 is one of the few proteins that binds cholesterol tightly and specifically.
This superfamily represents the C-terminal domain of UPF0234 uncharacterised proteins, which includes YajQ. It is also found in UPF0381 uncharacterised proteins. In Pseudomonas syringae, YajQ functions as a host protein involved in the temporal control of bacteriophage Phi6 gene transcription. It has been shown to bind to the phage's major structural core protein P1, most likely activating transcription by acting indirectly on the RNA polymerase. YajQ may remain bound to the phage particles throughout the infection period [
,
]. Earlier, YajQ was characterized as a putative nucleic acid-binding protein based on the similarity of its (ferredoxin-like) three-dimensional topology with that of RNP-like RNA-binding domains [,
]. The polypeptide chain of YajQ is folded into two domains with identical folding topology. Each domain has a four-stranded antiparallel β-sheet flanked on one side by two α-helices. This structural motif is a characteristic feature of many RNA-binding proteins [
].
Caveolae are 50-100 nm invaginations located at the plasma membrane of many cell types and are known to transport molecules across endothelial cells [
]. Caveolae require the caveolin protein for formation. Caveolins may act as scaffolding proteins within caveolar membranes by compartmentalizing and concentrating signalling molecules. Mammals have three caveolin proteins: caveolin-1 (Cav-1, or VIP21), caveolin-2 and caveolin-3 (or M-caveolin). Various classes of signalling molecules, including G-protein subunits, receptor and non-receptor tyrosine kinases, endothelial nitric oxide synthase (eNOS), and small GTPases, bind Cav-1 through its 'caveolin-scaffolding domain' [
]. Caveolins are proteins of about 20 kDa that form high molecular mass homo-oligomers. Structurally they seem to have N-terminal and C-terminal hydrophilic segments and a long central transmembrane domain that probably forms a hairpin in the membrane. Both extremities are known to face the cytoplasm. Caveolae are enriched with cholesterol and Cav-1 is one of the few proteins that binds cholesterol tightly and specifically.
This is a C2H2 zinc-finger domain found in ZHX proteins such as ZHX1. ZHXs are multidomain proteins comprising two C2H2 zinc finger motifs and five homeodomains. Both homeodomains and zinc fingers are short protein modules involved in protein-DNA and/or protein-protein interactions; they are frequently associated with roles in transcriptional regulation. All members of the ZHX family are reported to be able to form both homo- and heterodimers via the region containing homeodomain 1 [
]. ZHX1 is a transcriptional repressor which is ubiquitously expressed. It interacts with nuclear factor Y subunit A (NFYA) and DNA methyl transferase 3B (DNMT3B) for its repression activity. Changes in expression profiles of the rat ZHX1 ortholog have been associated with glomerular disease. In addition to the five homeodomains, ZHX1 contains two N-terminal C2H2 zinc fingers. It forms homodimers via a homeodomain and can also form heterodimers with ZHX3 [].
Protein 4.1 (EPB4.1) is a major structural element of the erythrocyte membrane skeleton. It plays a key role in regulating physical properties of the membrane, such as mechanical stability and de-formability, by stabilising spectrin-actin interactions [
,
,
]. It is required for dynein-dynactin complex and NUMA1 recruitment at the mitotic cell cortex during anaphase []. The protein has been shown to associate with the nuclear mitotic apparatus, the contractile apparatus and tight junctions. Defects in this protein can cause elliptocytosis type 1 and hereditary pyropoikilocytosis. Note: Band 4.1 and Band 7 proteins refer to human erythrocyte membrane proteins separated by SDS polyacrylamide gels and stained with Coomassie blue [
]. This entry represents the N-terminal F1 sub-domain of the FERM (Four.1 protein, Ezrin, Radixin, Moesin) domain found in EPB4.1. This domain is also known as the N-terminal ubiquitin-like structural domain of the FERM domain (FERM_N).
Proteins in the transcriptional enhancer factor family (also known as TEAD family, includes TEAD 1 to 4) play a key role in the Hippo signaling pathway, a pathway involved in organ size control and tumour suppression by restricting proliferation and promoting apoptosis. The core of this pathway is composed of a kinase cascade wherein MST1/MST2, in complex with its regulatory protein SAV1, phosphorylates and activates LATS1/2 in complex with its regulatory protein MOB1, which in turn phosphorylates and inactivates YAP1 oncoprotein and WWTR1/TAZ. TEAD transcription factors act by mediating gene expression of YAP1 and WWTR1/TAZ, thereby regulating cell proliferation, migration and epithelial mesenchymal transition (EMT) induction [
,
]. This entry includes protein Scalloped from Drosophila melanogaster, which in combination with protein Vestigial promotes wing formation [
], and in combination with Yorkie promotes transcriptional activity and movement of Yorkie to the nucleus [].
The GLEYA domain is related to lectin-like binding domains found in the Saccharomyces cerevisiae Flo proteins and the Candida glabrata Epa proteins [
]. It is a carbohydrate-binding domain that is found in fungal adhesins (also referred to as agglutinins or flocculins) []. Adhesins with a GLEYA domain possess a typical N-terminal signal peptide and a domain of conserved sequence repeats, but lack glycosylphosphatidylinositol (GPI) anchor attachment signals []. They contain a conserved motif G(M/L)(E/A/N/Q)YA, hence the name GLEYA. Based on sequence homology, it is suggested that the GLEYA domain would predominantly contain beta sheets []. The GLEYA domain is also found in a Schizosaccharomyces pombe protein thought to be a kinetochore protein (Sim4 complex subunit); however, no direct evidence for kinetochore association has been found [
]. Furthermore, a global protein localisation study in S. pombe identified it as a secreted protein localized to the Golgi complex [].
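The conserved motif G(M/L)(E/A/N/Q)YA described above maps directly onto a regular expression. The following is a minimal sketch under that reading of the notation; the helper name and the toy sequence are invented for illustration, and a motif hit alone does not establish that a protein carries a GLEYA domain.

```python
import re

# The motif G(M/L)(E/A/N/Q)YA from the text, written as a character-class regex.
GLEYA_MOTIF = re.compile(r"G[ML][EANQ]YA")


def find_gleya_motifs(protein):
    """Return (0-based start, matched residues) for each motif occurrence.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), m.group()) for m in GLEYA_MOTIF.finditer(protein.upper())]


# Toy sequence made up for this example (not a real adhesin).
toy_protein = "MKTGLEYASSGMQYAPLG"
print(find_gleya_motifs(toy_protein))
```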
Syntrophins are scaffold proteins that associate with the Duchenne muscular dystrophy protein dystrophin and the dystrophin-related proteins, utrophin and dystrobrevin, to form the dystrophin glycoprotein complex (DGC). There are five members (alpha, beta1, beta2, gamma1, and gamma2), all of which contain a split (also called joined) PH domain and a PDZ domain (PHN-PDZ-PHC). The split PH domain of alpha-syntrophin adopts a canonical PH domain fold and together with PDZ forms a supramodule functioning synergistically in binding to inositol phospholipids. The alpha-syntrophin PH-PDZ supramodule showed strong binding to phosphoinositides PI(3,5)P2 and PI(5)P, modest binding to PI(3,4)P2 and PI(4,5)P2, and weak binding to PI(3)P, PI(4)P, and PI(3,4,5)P. There are a large number of signaling proteins that bind to the PDZ domain of syntrophins: nitric oxide synthase (nNOS), aquaporin-4, voltage-gated sodium channels, potassium channels, serine/threonine protein kinases, and the ATP-binding cassette transporter A1 [
].
Escherichia coli HscA (heat shock cognate protein A, also called Hsc66) belongs to the heat shock protein 70 (HSP70) family of chaperones that assist in protein folding and assembly and can direct incompetent 'client' proteins towards degradation. Typically, HSP70s have a nucleotide-binding domain (NBD) and a substrate-binding domain (SBD). The nucleotide sits in a deep cleft formed between the two lobes of the NBD. The two subdomains of each lobe change conformation between ATP-bound, ADP-bound, and nucleotide-free states. ATP binding opens up the substrate-binding site; substrate-binding increases the rate of ATP hydrolysis. HSP70 chaperone activity is regulated by various co-chaperones: J-domain proteins and nucleotide exchange factors (NEFs). HscA's partner J-domain protein is HscB; it does not appear to require a NEF, and has been shown to be induced by cold-shock [
]. The HscA-HscB chaperone/co-chaperone pair is involved in [Fe-S] cluster assembly or transfer [
,
].
This entry represents the HUS regulatory domain found towards the N terminus of guanine nucleotide exchange factors involved in Golgi transport, such as budding yeast protein Sec7, protein Mon2 and BIG1-like proteins [
,
]. Sec7 and its homologues are guanine nucleotide exchange factors (GEFs) involved in the secretory pathway [
]. The full-length Sec7 functions proximally in the secretory pathway as a protein binding scaffold for the coat protein complexes COPII-COPI []. The COPII-COPI-protein switch is necessary for maturation of the vesicular-tubular cluster, VTC, intermediate compartments for Golgi compartment biogenesis. This N-terminal domain, however, does not appear to bind either COP or ARF []. Mon2 is distantly related to the Arf1 guanine nucleotide exchange factors (GEFs), such as Sec7. However, it lacks the Sec7 domain that catalyses nucleotide exchange on Arf1. Instead, Mon2 acts as a scaffold to recruit the Golgi-localised pool of Dop1 [
].
Limited proteolysis of most large protein precursors is carried out in vivo by the subtilisin-like pro-protein convertases. Many important biological processes such as peptide hormone synthesis, viral protein processing and receptor maturation involve proteolytic processing by these enzymes [
]. The subtilisin-serine protease (SRSP) family hormone and pro-protein convertases (furin, PC1/3, PC2, PC4, PACE4, PC5/6, and PC7/7/LPC) act within the secretory pathway to cleave polypeptide precursors at specific basic sites, generating their biologically active forms. Serum proteins, pro-hormones, receptors, zymogens, viral surface glycoproteins, bacterial toxins, amongst others, are activated by this route []. The SRSPs share the same domain structure, including a signal peptide, the pro-peptide, the catalytic domain, the P/middle or homo B domain, and the C terminus. This entry contains predicted subtilisin-related peptidases, predominantly found in cyanobacterial species. These peptidases belong to MEROPS peptidase family S8A (subtilisin family, clan SB), and are unassigned.
Proteins in this entry include HemW (also known as oxygen-independent coproporphyrinogen-III oxidase-like protein). HemW is a heme chaperone catalyzing the insertion of heme into hemoproteins [
]. Experimentally determined examples of oxygen-independent coproporphyrinogen III oxidase, an enzyme that replaces HemF function under anaerobic conditions, belong to a family of proteins called HemN. This family contains a closely related protein, shorter at the amino end and lacking the region containing the motif PYRT[SC]YP found in members of the hemN family. Several species, including Escherichia coli, Helicobacter pylori, Aquifex aeolicus, and Chlamydia trachomatis, have members of both this family and the Escherichia coli hemN family. The member of this family from Bacillus subtilis was shown to complement a hemF/hemN double mutant of Salmonella typhimurium and to prevent accumulation of coproporphyrinogen III under anaerobic conditions, but the exact role of this protein is still uncertain. It is found in a number of species that do not synthesize haem de novo.
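The motif PYRT[SC]YP mentioned above is already written in pattern notation, so checking a sequence for it is straightforward. The sketch below is illustrative only: the function name and test strings are invented, and the presence or absence of the motif is one distinguishing feature noted in the text, not a complete rule for assigning a protein to either family.

```python
import re

# Motif reported in the text for members of the hemN family.
HEMN_MOTIF = re.compile(r"PYRT[SC]YP")


def has_hemn_motif(protein):
    """Return True if the sequence contains PYRT[SC]YP.

    Hypothetical helper for illustration; not a family classifier.
    """
    return HEMN_MOTIF.search(protein.upper()) is not None


# Toy fragments made up for this example (not real protein sequences).
print(has_hemn_motif("AAPYRTSYPGG"))
print(has_hemn_motif("AAPYRTAYPGG"))
```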
H/ACA ribonucleoprotein particles (RNPs) are a family of RNA pseudouridine synthases that specify modification sites through guide RNAs. The function of these H/ACA RNPs is essential for biogenesis of the ribosome, splicing of precursor mRNAs (pre-mRNAs), maintenance of telomeres and probably for additional cellular processes [
]. All H/ACA RNPs contain a specific RNA component (snoRNA or scaRNA) and at least four proteins common to all such particles: Cbf5, Gar1, Nhp2 and Nop10. These proteins are highly conserved from yeast to mammals and homologues are also present in archaea []. The H/ACA protein complex contains a stable core composed of Cbf5 and Nop10, to which Gar1 and Nhp2 subsequently bind []. In eukaryotes Nop10 is a nucleolar protein that is specifically associated with H/ACA snoRNAs. It is essential for normal 18S rRNA production and rRNA pseudouridylation by the ribonucleoprotein particles containing H/ACA snoRNAs (H/ACA snoRNPs). Nop10 is probably necessary for the stability of these RNPs [
].
This entry represents a group of LSM domain containing proteins functioning in RNA processing, including U6 snRNA-associated Sm-like protein LSm4 and small nuclear ribonucleoproteins Sm D1 and D3. LSm4 is a component of LSm protein complexes, which are involved in RNA processing and may function in a chaperone-like manner. It binds specifically to the 3'-terminal U-tract of U6 snRNA [
,
]. SmD1 is involved in pre-mRNA splicing. It binds snRNA U1, U2, U4 and U5, which contain a highly conserved structural motif called the Sm binding site. It also binds telomerase RNA and is required for its accumulation [
,
]. SmD3 is a core protein of small nuclear ribonucleoprotein (snRNP) essential for splicing of primary transcripts [
]. It appears to function in the U7 snRNP complex that is involved in histone 3'-end processing. It binds to the downstream cleavage product (DCP) of histone pre-mRNA in a U7 snRNP dependent manner [].
Spo11 is a meiosis-specific protein that is responsible for the initiation of recombination during the early stages of meiosis through the formation of DNA double-strand breaks (DSBs) by a type II DNA topoisomerase-like activity [
,
]. These DSBs initiate homologous recombination, which is required for chromosomal segregation and generation of genetic diversity during meiosis. Spo11 acts in conjunction with several other proteins, including Rec102 in yeast, to bring about meiotic recombination []. Mouse and human homologues of Spo11 have been cloned and characterised. The proteins are 82% identical and share ~25% identity with other family members. Mouse Spo11 has been localised to chromosome 2H4, and human SPO11 to chromosome 20q13.2-q13.3, a region amplified in some breast and ovarian tumours []. Similarity between SPO11 and archaebacterial TOP6A proteins points to evolutionary specialisation of a DNA-cleavage function for meiotic recombination [
]. Note that the yeast SPO11 protein shares far less similarity to other SPO11 proteins than the human and mouse homologues do to each other.
This NYN domain is found in Meiosis regulator and mRNA stability factor 1 (MARF1, also known as limkain-b1) [
,
] and in uncharacterised proteins. The NYN domains are found in the eukaryotic proteins typified by the Nedd4-binding protein 1 and the bacterial YacP-like proteins. The NYN (for Nedd4-BP1, YacP-like Nuclease) domain shares a common protein fold with two other previously characterised groups of nucleases, namely the PIN (PilT N-terminal) and FLAP/5' -->3' exonuclease superfamilies. These proteins share a common set of 4 acidic conserved residues that are predicted to constitute their active site. Based on the conservation of the acidic residues and structural elements it has been suggested that PIN and NYN domains are likely to bind only a single metal ion, unlike the FLAP/5' -->3' exonuclease superfamily, which binds two metal ions [
]. Based on conserved gene neighbourhoods the bacterial members are likely to be components of the processome/degradosome that process tRNAs or ribosomal RNAs.
This family represents the biotin carboxylase subunit found usually as a component of acetyl-CoA carboxylase. Acetyl-CoA carboxylase (
) is a heterohexamer of biotin carboxyl carrier protein, biotin carboxylase (
), and two subunits of carboxyl transferase in a 2:2 complex. In the first step of long-chain fatty acid synthesis, biotin carboxylase catalyses the carboxylation of the carrier protein and then the transcarboxylase transfers the carboxyl group to form malonyl-CoA. Homologous domains are found in eukaryotic forms of acetyl-CoA carboxylase and in a number of other carboxylases (e.g. pyruvate carboxylase). In some systems, the biotin carboxyl carrier protein and this protein (biotin carboxylase) may be shared by different carboxyltransferases. However, this model is not intended to identify the biotin carboxylase domain of propionyl-CoA carboxylase. The model should hit the full length of proteins, except for chloroplast transit peptides in plants. If it hits only a domain of a longer protein, there may be a problem with the identification.
This family represents Asterix proteins, which are a component of the PAT complex, an endoplasmic reticulum (ER)-resident membrane multiprotein complex that facilitates the insertion of multi-pass membrane proteins into membranes [
]. The PAT complex, formed by CCDC47 and Asterix proteins, acts as an intramembrane chaperone by directly interacting with nascent transmembrane domains (TMDs), releasing its substrates upon correct folding, and is needed for optimal biogenesis of multi-pass membrane proteins []. WDR83OS/Asterix is the substrate-interacting subunit of the PAT complex, whereas CCDC47 is required to maintain the stability of WDR83OS/Asterix [
,
]. WDR83OS/Asterix associates with the first transmembrane domain (TMD1) of the nascent chain, independently of the N-glycosylation of the chain and irrespective of the amino acid sequence and transmembrane topology of TMD1 [,
]. The PAT complex favors the binding to TMDs with exposed hydrophilic amino acids within the lipid bilayer and provides a membrane-embedded partially hydrophilic environment in which TMD1 binds []. This entry also includes the ER membrane uncharacterised protein from Schizosaccharomyces pombe C18B11.08c.
It has been shown [
,
,
] that the following carbohydrate and purine kinases are evolutionarily related and can be grouped into a single family, which is known [] as the 'pfkB family':
Fructokinase (
) (gene scrK).
6-phosphofructokinase isozyme 2 (
) (phosphofructokinase-2) (gene pfkB). pfkB is a minor phosphofructokinase isozyme in Escherichia coli and is not evolutionarily related to the major isozyme (gene pfkA). Plant 6-phosphofructokinases also belong to this family.
Ribokinase (
) (gene rbsK).
Adenosine kinase (
) (gene ADK).
2-dehydro-3-deoxygluconokinase (
) (gene: kdgK).
1-phosphofructokinase (
) (fructose 1-phosphate kinase) (gene fruK).
Inosine-guanosine kinase (
) (gene gsk).
Tagatose-6-phosphate kinase (
) (phosphotagatokinase) (gene lacC).
E. coli hypothetical protein yeiC.
E. coli hypothetical protein yeiI.
E. coli hypothetical protein yhfQ.
E. coli hypothetical protein yihV.
Yeast hypothetical protein YJR105w.
All the above kinases are proteins of 280 to 430 amino acid residues that share a few regions of sequence similarity.
Note: some bacterial fructokinases belong to the ROK family (see
).
The armadillo (Arm) repeat is an approximately 40 amino acid long tandemly repeated sequence motif first identified in the Drosophila melanogaster segment polarity gene armadillo involved in signal transduction through wingless. Animal Arm-repeat proteins function in various processes, including intracellular signalling and cytoskeletal regulation, and include such proteins as beta-catenin, the junctional plaque protein plakoglobin, the adenomatous polyposis coli (APC) tumour suppressor protein, and the nuclear transport factor importin-alpha, amongst others [
]. A subset of these proteins is conserved across eukaryotic kingdoms. In higher plants, some Arm-repeat proteins function in intracellular signalling like their mammalian counterparts, while others have novel functions []. The 3-dimensional fold of an armadillo repeat is known from the crystal structure of beta-catenin, where the 12 repeats form a superhelix of alpha helices with three helices per unit [
]. The cylindrical structure features a positively charged groove, which presumably interacts with the acidic surfaces of the known interaction partners of beta-catenin.
The cysteine-rich secretory proteins (Crisp) are predominantly found in the mammalian male reproductive tract as well as in the venom of reptiles. This family includes mammalian testis-specific protein (Tpx-1), also known as cysteine-rich secretory protein 2 (CRISP2) [
]; venom allergen 5 from vespid wasps and venom allergen 3 from fire ants, which are potent allergens that mediate allergic reactions to stings of insects of the order Hymenoptera []; scoloptoxins from Scolopendra dehaani (Thai centipede) []; plant pathogenesis proteins of the PR-1 family [], which are synthesised during pathogen infection or other stress-related responses; allurin, a sperm chemoattractant [], serotriflin [], etc. The precise function of some of these proteins is still unclear. Tpx-1 or CRISP2 may regulate the activity of some ion channels and thereby regulate calcium fluxes during sperm capacitation [
]. This entry also includes several Tabinhibitin proteins and allergen Tab y 5.0101 from horsefly salivary glands [
,
] and antigen 5 like allergen Cul n 1 from biting midge salivary glands [].
Protein phosphatases remove phosphate groups from various proteins that are
the key components of a number of signalling pathways in eukaryotes and prokaryotes. Protein phosphatases that dephosphorylate Ser and Thr residues
are classified into the phosphoprotein (PPP) and the protein phosphatase Mg(2+)- or Mn(2+)-dependent (PPM) families. The core structure of PPMs is the
300-residue PPM-type phosphatase domain that catalyzes the dephosphorylation of phosphoserine- and phosphothreonine-containing proteins. The PPM-type
phosphatase domain is found as a module in diverse structural contexts and is modulated by targeting and regulatory subunits [
,
,
,
]. The PP2C-type phosphatase domain consists of 10 segments of β-strands and 5
segments of α-helix and comprises a pair of detached subdomains. The first is a small β-sandwich with strand beta1 packed against strands beta2 and
beta3; the second is a larger β-sandwich in which a four-stranded β-sheet packs against a three-stranded β-sheet with flanking α-helices [
,
]. This entry represents a conserved aspartate residue involved in divalent cation binding [
].
This entry represents the translocase of chloroplast 159 family of proteins (Tocs), GTPases involved in protein precursor import into chloroplasts, which recognise chloroplast-destined precursor proteins and regulate their presentation to the translocation channel through GTP hydrolysis [
]. They have three domains: the N-terminal A-domain is acidic, repetitive, weakly conserved, readily removed by proteolysis during chloroplast isolation, and not required for protein translocation [,
], and the other domains are designated G (GTPase) and M (membrane anchor) []. The signature for this family includes most of the G domain and all of M. There are at least two distinct Toc159 subtypes, Toc159 and Toc132/Toc120. Toc159 is expressed in young photosynthetic tissues and seems to be specialised in the import of nuclear encoded photosynthetic preproteins from the cytoplasm to the chloroplast, whereas Toc132/Toc120 are expressed relatively prominently in nonphotosynthetic tissues and seem to be specialised in the import of nuclear encoded non-photosynthetic preproteins from the cytoplasm to the chloroplast [].
The HAMP domain (present in Histidine kinases, Adenyl cyclases, Methyl-accepting proteins and Phosphatases) is an approximately 50-amino acid α-helical region common to chemoreceptors and histidine kinases that is present in several multidomain sensor proteins that participate in a variety of signal transduction processes. It is found in bacterial sensor and chemotaxis proteins and in eukaryotic histidine kinases. The bacterial proteins are usually integral membrane proteins and part of a two-component signal transduction pathway. One or several copies of the HAMP domain can be found in association with other domains, such as the histidine kinase domain, the bacterial chemotaxis sensory transducer domain, the PAS repeat, the EAL domain, the GGDEF domain, the protein phosphatase 2C-like domain, the guanylate cyclase domain, or the response regulatory domain. It has been suggested that the HAMP domain plays a role in regulating the phosphorylation or methylation of homodimeric receptors by transmitting the conformational changes in periplasmic ligand-binding domains to cytoplasmic signalling kinase and methyl-acceptor domains [
,
,
].
This family is comprised of antitumour antibiotic chromoproteins, as represented by neocarzinostatin [
]. These chromoproteins consist of a noncovalently bound, labile enediyne chromophore and its stabilising carrier apoprotein. The protein component of the chromophore displays an unusual bicyclic dienediyne structure. The chromoprotein intercalates into the DNA, where its cycloaromatisation produces a biradical intermediate that has the ability to abstract hydrogens from the sugar moiety of DNA. This causes single- and double-strand breaks in the DNA []. In addition to their ability to cleave DNA at sites specific for each chromophore, results indicate that these chromoproteins also possess proteolytic activity against histones, with histone H1 as the preferred substrate []. Neocarzinostatin has 2 disulphide bridges and is kidney-shaped with 2 defined domains that hold a binding cavity. The larger domain forms a 7-stranded antiparallel β-barrel and the smaller domain consists of 2 anti-parallel strands of β-sheet that are perpendicular to each other [
]. Other members of this family include macromycin, actinoxanthine, kedarcidin [], and C-1027 [].
The bromodomain and extraterminal (BET) proteins are a class of transcriptional regulators whose members can be found in animals, plants and fungi. BET proteins are involved in diverse cellular phenomena such as meiosis, cell-cycle control, and homeosis and have been suggested to modulate
chromatin structure and affect transcription via a sequence-independent mechanism. BET proteins are defined as having one (plants) or two (animals/yeast) bromodomains and an Extra Terminal (ET) domain. The ET domain consists of three separate regions, only one of which, the N-terminal ET (NET) domain is conserved in all BET proteins. The function of the NET domain is assumed to be protein binding [,
,
,
]. The structure of the NET domain comprises three α-helices and a characteristic loop region of an irregular but well-defined structure. The NET structure has an acidic patch that forms a continuous ridge with a hydrophobic cleft, which may interact with other proteins and/or DNA [
].
This superfamily consists of several Mastadenovirus E4 ORF3 proteins. This domain is annotated as pleiotropic. Early proteins E4 ORF3 and E4 ORF6 have complementary functions during viral infection, which include efficient viral DNA replication, late protein expression, and prevention of concatenation of viral genomes. However, characterisation of E4 ORF3 has revealed several unique functions, such as the reorganisation of nuclear structures known as PML oncogenic domains (PODs). The function of PODs remains unclear, although they have been implicated in several critical cellular processes such as regulation of transcription, apoptosis, transformation, and response to interferon [
].
This entry represents several Nif (B, X and Y) proteins, which are involved in the biosynthesis of the iron-molybdenum cofactor (FeMo-co) found in the dinitrogenase enzyme of the nitrogenase complex in nitrogen-fixing bacteria. The nitrogenase complex catalyses the reduction of atmospheric dinitrogen to ammonia, and is composed of an iron metalloprotein (dinitrogenase reductase; homodimer of NifH;
) and a Fe-Mo metalloprotein (dinitrogenase; heterotetramer of NifD and NifK;
). The pathway for the synthesis of the Fe-Mo cofactor involves several proteins, including NifB, NifE, NifH, NifN, NifQ, NifV and NifX. NifB appears to be an iron-sulphur source for FeMo-co biosynthesis, while NifX may be associated with the mature FeMo-co, in particular with the addition of homocitrate during the last step of biosynthesis [
]. The NifX protein shows sequence similarity with the C terminus of NifB [], as well as to the conserved protein MTH1175 from the archaeon Methanobacterium thermoautotrophicum, which displays a ribonuclease H-like motif of three layers, alpha/beta/alpha, with a single mixed β-sheet [].
Atrophin-1 is the protein product of the dentatorubral-pallidoluysian atrophy (DRPLA) gene. DRPLA (OMIM:125370) is a progressive neurodegenerative disorder. It is caused by the expansion of a CAG repeat in the DRPLA gene on chromosome 12p. This results in an extended polyglutamine region in atrophin-1, which is thought to confer toxicity to the protein, possibly through altering its interactions with other proteins [
,
]. The expansion of a CAG repeat is also the underlying defect in six other neurodegenerative disorders, including Huntington's disease. One interaction of expanded polyglutamine repeats that is thought to be pathogenic is that with the short glutamine repeat in the transcriptional coactivator CREB binding protein, CBP. This interaction draws CBP away from its usual nuclear location to the expanded polyglutamine repeat protein aggregates that are characteristic of the polyglutamine neurodegenerative disorders. This interferes with CBP-mediated transcription and causes cytotoxicity [
].
This entry includes Atrophin-1 and related proteins.
This entry includes SURF1 and Shy1 proteins. The surfeit locus 1 gene (SURF1 or surf-1) encodes a conserved protein of about 300 amino-acid residues that seems to be involved in the biogenesis of
cytochrome c oxidase []. Vertebrate SURF1 is evolutionarily related to yeast protein Shy1, which is a mitochondrial inner membrane protein required for assembly of cytochrome c oxidase [
]. There seem to be two transmembrane regions in these proteins, one in the N-terminal region, the other in the C-terminal region.
Defects in SURF1 are a cause of Leigh syndrome (LS). LS is a severe neurological disorder characterised by bilaterally symmetrical necrotic lesions in subcortical brain regions that is commonly associated with systemic cytochrome c oxidase (COX) deficiency [
,
,
]. The surfeit locus gene SURF4 (or surf-4) encodes a conserved integral eukaryotic membrane protein of about 270 to 300 amino-acid residues that seems to be located in the endoplasmic reticulum [
].
The REJ (Receptor for Egg Jelly) domain is found in PKD1
and the sperm receptor for egg jelly
[
]. The exact function of this domain is unknown. The domain is 600 amino acids long so is probably composed of multiple structural domains. There are six completely conserved cysteine residues that may form disulphide bridges. This region contains tandem PKD-like domains. Sequence similarity between a region of the autosomal dominant polycystic kidney disease (ADPKD) protein, polycystin-1, and a sea urchin sperm glycoprotein involved in fertilization, the receptor for egg jelly (suREJ), has been known for some time. The suREJ protein binds the glycoprotein coat of the egg (egg jelly), triggering the acrosome reaction, which transforms the sperm into a fusogenic cell. The sequence similarity and expression pattern suggest that the predicted human PKDREJ protein is a mammalian equivalent of the suREJ protein and therefore may have a central role in human fertilization [
].
This entry represents the RNA recognition motifs (RRM) Yra1 and Mlo3 found in budding yeast and fission yeast, respectively. Yra1 is an essential nuclear RNA-binding protein. It belongs to the evolutionarily conserved REF (RNA and export factor binding proteins) family of hnRNP-like proteins. Yra1 possesses potent RNA annealing activity and interacts with a number of proteins involved in nuclear transport and RNA processing. It binds to the mRNA export factor Mex67p/TAP and couples transcription to export in yeast. Yra1 is associated with Pse1 and Kap123, two members of the beta-importin family, further mediating transport of Yra1 into the nucleus. In addition, the co-transcriptional loading of Yra1 is required for autoregulation. Yra1 consists of two highly conserved N- and C-terminal boxes and a central RNA recognition motif (RRM) [
]. Mlo3, also termed mRNA export protein mlo3, has been identified in fission yeast as a protein that causes defects in chromosome segregation when overexpressed. It shows high sequence similarity with Yra1 [
].
This entry represents the RNA recognition motif 2 (RRM2) of HuR, also known as ELAV-like protein 1 (ELAV-1), the ubiquitously expressed Hu family member [
,
]. HuR binds to AU-rich RNA element (ARE) in target mRNAs and stabilizes them against degradation. It also regulates the nuclear import of proteins []. It has a variety of biological functions mostly related to the regulation of cellular response to DNA damage and other types of stress. HuR has an anti-apoptotic function during early cell stress response []. HuR may be important in muscle differentiation, adipogenesis, suppression of inflammatory response and modulation of gene expression in response to chronic ethanol exposure and amino acid starvation []. Like other Hu proteins, HuR contains three RNA recognition motifs (RRMs). RRM1 and RRM2 may cooperate in binding to an AU-rich RNA element (ARE). RRM3 may help to maintain the stability of the RNA-protein complex, and might also bind to poly(A) tails or be involved in protein-protein interactions [
].
This entry represents the Death Domain (DD) of Myeloid Differentiation primary response protein 88 (MyD88). MyD88 is an adaptor protein involved in interleukin-1 receptor (IL-1R)- and Toll-like receptor (TLR)-induced activation of nuclear factor-kappaB (NF-kB) and mitogen activated protein kinase pathways that lead to the induction of proinflammatory cytokines [
]. It is a key component in the signaling pathway of pathogen recognition in the innate immune system. MyD88 contains an N-terminal DD and a C-terminal Toll/IL-1 Receptor (TIR) homology domain that mediates interaction with TLRs and IL-1R []. In general, DDs are protein-protein interaction domains found in a variety of domain architectures. Their common feature is that they form homodimers by self-association or heterodimers by associating with other members of the DD superfamily including CARD (Caspase activation and recruitment domain), DED (Death Effector Domain), and PYRIN [
]. They serve as adaptors in signaling pathways and can recruit other proteins into signaling complexes [].
H/ACA ribonucleoprotein particles (RNPs) are a family of RNA pseudouridine synthases that specify modification sites through guide RNAs. The function of these H/ACA RNPs is essential for biogenesis of the ribosome, splicing of precursor mRNAs (pre-mRNAs), maintenance of telomeres and probably for additional cellular processes [
]. All H/ACA RNPs contain a specific RNA component (snoRNA or scaRNA) and at least four proteins common to all such particles: Cbf5, Gar1, Nhp2 and Nop10. These proteins are highly conserved from yeast to mammals and homologues are also present in archaea []. The H/ACA protein complex contains a stable core composed of Cbf5 and Nop10, to which Gar1 and Nhp2 subsequently bind []. In eukaryotes Nop10 is a nucleolar protein that is specifically associated with H/ACA snoRNAs. It is essential for normal 18S rRNA production and rRNA pseudouridylation by the ribonucleoprotein particles containing H/ACA snoRNAs (H/ACA snoRNPs). Nop10 is probably necessary for the stability of these RNPs [
]. The Nop10 domain structure has a rubredoxin-like fold.
Barrier-to-autointegration factor (BAF) is an essential protein that is highly conserved in metazoan evolution, and which may act as a DNA-bridging protein [
]. BAF binds directly to double-stranded DNA, to transcription activators, and to inner nuclear membrane proteins, including lamin A filament proteins that anchor nuclear-pore complexes in place, and nuclear LEM-domain proteins that bind to lamin filaments and chromatin. New findings suggest that BAF has structural roles in nuclear assembly and chromatin organisation, represses gene expression and might interlink chromatin structure, nuclear architecture and gene regulation in metazoans []. BAF can be exploited by retroviruses to act as a host component of pre-integration complexes, which promote the integration of the retroviral DNA into the host chromosome by preventing autointegration of retroviral DNA [
]. BAF might contribute to the assembly or activity of retroviral pre-integration complexes through direct binding to the retroviral proteins p55 Gag and matrix, as well as to DNA.
Vesicular carriers mediate a continuous flux of proteins and lipids between the endoplasmic reticulum (ER) and the Golgi. Anterograde and retrograde transport is mediated by distinct sets of cytosolic coat proteins, the COPII and COPI coats, respectively, which act on the membrane to capture cargo proteins into nascent vesicles [
]. Sec23 is a component of the coat protein complex II (COPII). Polymerization of the coat requires the recruitment of the Sec13/Sec31 complex (coat outer shell) by the Sec23/Sec24 complex. The Sec23/Sec24 coat complex then sorts the fusion machinery (SNAREs) into vesicles as they bud from the ER. Sec23 has been shown to interact in a sequential manner with other proteins (Sar1, TRAPPI and Hrr25) to control the direction of anterograde membrane flow [
]. This entry represents the C-terminal domain of Sec23. This domain is distantly related to gelsolin-like repeats and the actin depolymerizing domains found in cofilin and similar proteins. The function of the Sec23 C-terminal domain is unclear.
The RH1 (RILP homology 1) protein-protein interaction domain is found in the
following animal Rab36-binding proteins:
Rab interacting lysosomal proteins (RILP),
RILP-like 1 (RILP-L1),
RILP-like 2 (RILP-L2),
JNK-interacting protein 3 (JIP3),
JNK-interacting protein 4 (JIP4).
It binds to the myosin Va globular tail domain (MyoVa-GTD) through mainly
hydrophobic interactions. The RH1 domain adopts an all-helical structure and forms a
homodimer with a four-helix bundle conformation to interact with MyoVa-GTD. The RH1 homodimer is structurally separated into two parts, the N-terminal
four-helix bundle formed by alpha2 and alpha3N and the C-terminal coiled coil formed by alpha3C. The four-helix bundle in the RH1 dimer is mainly stabilized
by forming a hydrophobic core. The N-terminal small helix (alpha1) and its following loop pack on alpha2 from the same molecule and alpha3 from the
neighbouring molecule and thus contribute to the bundle stability. The RH1 homodimer is further strengthened by a coiled coil formed by the C-terminal
half of the alpha3-helix []. This entry represents the entire RH1 domain.
The entry describes a hydrophobic sequence region that is duplicated to form the AbrB protein of Escherichia coli (not to be confused with a Bacillus subtilis protein with the same gene symbol). In some species, notably the Cyanobacteria and Thermus thermophilus, proteins consist of a single copy rather than two copies. The member from Pseudomonas putida, PP_1415 (
), was suggested to be an ammonia monooxygenase characteristic of heterotrophic nitrifiers, based on an experimental indication of such activity in the organism and a glimmer of local sequence similarity between parts of the P. putida protein and an instance of the AmoA protein (
) from Nitrosomonas europaea [
]; we do not believe the sequence similarity to be meaningful. The member from E. coli (b0715, ybgN) appears to be the largely uncharacterised AbrB (aidB regulator) protein of E. coli [], although we did not manage to trace the origin of association of the article to the sequence.
Glutathione peroxidase (GSHPx) (
) is an enzyme that catalyses the reduction of hydroperoxides by glutathione [
]. Its main function is to protect against the damaging effect of endogenously formed hydroperoxides. In higher vertebrates, several forms of GSHPx are known, including a ubiquitous cytosolic form (GSHPx-1), a gastrointestinal cytosolic form (GSHPx-GI), a plasma secreted form (GSHPx-P), and an epididymal secretory form (GSHPx-EP). In addition to these characterised forms, the sequence of a protein of unknown function [
] has been shown to be evolutionarily related to those of GSHPx's. Escherichia coli protein btuE, a periplasmic protein involved in the transport of vitamin B12, is also evolutionarily related to GSHPx's; the significance of this relationship is not yet clear. This conserved site consists of an octapeptide located in the central section of these proteins.
N-terminal RING finger/B-box/coiled coil (RBCC) or tripartite motif (TRIM) proteins, which are found in metazoa, are involved in a vast array of intracellular functions. They appear to function as part of large protein complexes and possess ubiquitin-protein isopeptide ligase activity. The following RBCC proteins contain an ~60-residue COS (C-terminal subgroup one signature) domain, which is also found in a distantly related non-RBCC microtubule-binding protein, GLFND:
Vertebrate MID1 and MID2, which associate with microtubules through homo- and heterodimerization.
Animal TRIM9, which plays a regulatory role in synaptic vesicle exocytosis.
Mammalian TRIM nine-like (TNL).
Mammalian TRIM36, which could play a regulatory role in exocytosis of the sperm vesicle.
Mammalian tripartite, fibronectin type III and C-terminal B30.2/SPRY (TRIFIC).
Mammalian muscle-specific RING finger (MURF) family. MURF proteins have an ability to form both homo- and heterodimers with each other and to associate with the microtubule cytoskeleton.
In addition to RBCC, the COS domain is also found in association with B30.2/SPRY or fibronectin type-III (FN3) domains. The COS domain is predicted to consist of two α-helical coils [
].
The major royal jelly proteins (MRJPs) comprise 12.5% of the mass, and 82-90% of the protein content [
], of honeybee (Apis mellifera) royal jelly. Royal jelly is a substance secreted by the cephalic glands of nurse bees [] and it is used to trigger development of a queen bee from a bee larva. The biological function of the MRJPs is unknown, but they are believed to play a major role in nutrition due to their high essential amino acid content []. Two royal jelly proteins, MRJP3 and MRJP5, contain a tandem repeat that results from a high genetic variability. This polymorphism may be useful for genotyping individual bees []. This family also includes related proteins such as protein yellow-f and yellow-f2 from Drosophila, which are dopachrome-conversion enzymes responsible for catalysing the conversion of dopachrome into 5,6-dihydroxyindole in the melanization pathway [
, [
]. This family of yellow-like proteins has only been identified within insects and a number of bacterial species [].
Claudins form the paracellular tight junction seal in epithelial tissues. In humans, 24 claudins (claudin 1-24) have been identified. Their ability to polymerise and form strands is affected by the cell types [
,
,
]. They can also form heteropolymers with each other within and between tight junction strands []. Most of the claudins (claudin-12 being the exception) have a C-terminal PDZ-binding motif that can interact with PDZ domain proteins, such as the scaffolding proteins ZO-1, -2 and -3 []. They also interact with non-tight junction proteins, such as the cell adhesion proteins EpCAM and tetraspanins, and the signalling proteins ephrin A and B and their receptors, EphA and EphB []. Claudin-3 was originally termed rat ventral prostate 1 protein (RVP1) and Clostridium perfringens enterotoxin receptor 2 (CPETR2). It was reclassified as claudin-3 on the basis of cDNA similarity with claudins-1 and -2, and antibody studies that showed it to be expressed at tight junctions [
].
The human pathogen Mycobacterium tuberculosis harbours a large number of genes that encode proteins whose N-termini contain the characteristic motifs Pro-Glu (PE) or Pro-Pro-Glu (PPE). A subgroup of the PE proteins contains polymorphic GC-rich sequences (PGRS), while a subgroup of the PPE proteins contains major polymorphic tandem repeats (MPTR). The function of most of these proteins remains unknown [
]. However, the PE_PGRS proteins from Mycobacterium marinum are secreted by components of the ESX-5 system that belongs to the recently defined type VII secretion systems []. It has also been reported that the PE_PGRS family of proteins contains multiple calcium-binding and glycine-rich sequence motifs GGXGXD/NXUX. This sequence repeat constitutes a calcium-binding parallel β-roll or parallel β-helix structure and is found in RTX toxins secreted by many Gram-negative bacteria []. This domain is found C-terminal to the PE and PPE domains. The secondary structure of this domain is predicted to be a mixture of alpha helices and beta strands [
].
Ebola virus sp. are non-segmented, negative-strand RNA viruses that cause severe haemorrhagic fever in humans with high rates of mortality. The virus matrix protein VP40 is a major structural protein that plays a central role in virus assembly and budding at the plasma membrane of infected cells. VP40 proteins associate with cellular membranes, interact with the cytoplasmic tails of glycoproteins, and bind to the ribonucleoprotein complex. The VP40 monomer consists of two domains, the N-terminal oligomerization domain and the C-terminal membrane-binding domain, connected by a flexible linker. Both the N- and C-terminal domains fold into beta sandwich structures of similar topology [
]. Within the N-terminal domain are two overlapping L-domains with the sequences PTAP and PPEY at residues 7 to 13, which are required for efficient budding []. L-domains are thought to mediate their function in budding through their interaction with specific host cellular proteins, such as tsg101 and vps-4 []. This entry describes the VP40 C-terminal domain.
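The overlap between the two L-domain motifs can be illustrated with a minimal Python sketch. The toy sequence below is hypothetical (placeholder alanines around the PTAPPEY stretch) and is not the real VP40 sequence; it is built only so that the motifs fall at residues 7 to 13 as described above. The helper names `find_motif` and `shared_residues` are likewise assumptions made for illustration.

```python
# Hypothetical toy sequence: six placeholder residues, then PTAPPEY, then
# more placeholders. This is NOT the real VP40 sequence; it only places the
# motifs at residues 7-13 as stated in the description.
TOY_N_TERMINUS = "AAAAAA" + "PTAPPEY" + "AAAA"


def find_motif(sequence, motif):
    """Return the 1-based (start, end) of the first occurrence, or None."""
    index = sequence.find(motif)
    if index == -1:
        return None
    return (index + 1, index + len(motif))


def shared_residues(span_a, span_b):
    """Return the 1-based residue positions covered by both spans."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    return list(range(start, end + 1))


ptap = find_motif(TOY_N_TERMINUS, "PTAP")
ppey = find_motif(TOY_N_TERMINUS, "PPEY")
overlap = shared_residues(ptap, ppey)
print(ptap, ppey, overlap)
```

In this toy layout the two motifs share a single proline, which is what "overlapping" means for a PTAPPEY stretch.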
A sequence of about thirty to forty amino-acid residues found in epidermal growth factor (EGF) has been shown [,
,
,
] to be present, in a more or less conserved form, in a large number of other, mostly animal, proteins. The list of proteins currently known to contain one or more copies of an EGF-like pattern is large and varied. The functional significance of EGF domains in what appear to be unrelated proteins is not yet clear. However, a common feature is that these repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted (exception: prostaglandin G/H synthase). The EGF domain includes six cysteine residues which have been shown (in EGF) to be involved in disulphide bonds. The main structure is a two-stranded β-sheet followed by a loop to a C-terminal short two-stranded sheet. Subdomains between the conserved cysteines vary in length.
This entry contains EGF domains found in a variety of extracellular and membrane proteins.
This group of proteins contains serine peptidases belonging to the MEROPS peptidase family S54 (Rhomboid, clan ST). They are integral membrane proteins related to the Drosophila melanogaster (Fruit fly) rhomboid protein. Members of this family are found in archaea, bacteria and eukaryotes. The D. melanogaster rhomboid protease cleaves type-1 transmembrane domains using a catalytic triad composed of serine, histidine and asparagine contributed by different transmembrane domains. It cleaves the transmembrane proteins Spitz, Gurken and Keren within their transmembrane domains to release a soluble TGFalpha-like growth factor. Cleavage occurs in the Golgi, following translocation of the substrates from the endoplasmic reticulum membrane by Star, another transmembrane protein. The growth factors are then able to activate the epidermal growth factor receptor [
,
]. Few substrates of mammalian rhomboid homologues have been determined, but rhomboid-like protein 2 (MEROPS S54.002) has been shown to cleave ephrin B3 [
]. Parasite-encoded rhomboid enzymes are also important for invasion of host cells by Toxoplasma and the malaria parasite.
The calcium-binding domain found in S100 and CaBP-9k proteins is a subfamily of the EF-hand calcium-binding domain [
]. S100s are small dimeric acidic calcium and zinc-binding proteins abundant in the brain, with S100B playing an important role in modulating the proliferation and differentiation of neurons and glial cells []. S100 proteins have two different types of calcium-binding sites: a low affinity one with a special structure, and a 'normal' EF-hand type high-affinity site. Calbindin-D9k (CaBP-9k) also belongs to this family of proteins, but it does not form dimers. CaBP-9k is a cytosolic protein expressed in a variety of tissues. Although its precise function is unknown, it appears to be under the control of the steroid hormones oestrogen and progesterone in the female reproductive system [
]. In the intestine, CaBP-9k may be involved in calcium absorption by mediating intracellular diffusion []. This entry represents a subdomain of the calcium-binding domain found in S100, CaBP-9k, and related proteins.
The axin interaction dorsalization-associated (Aida) protein was characterised in zebrafish as a protein that utilizes its C-terminal region to interact with axis formation inhibitor (Axin), which is a microtubule-interacting scaffold protein for several distinct signalling proteins in the Wnt cascade. The C-terminal region of the Aida protein is a distinct version of the C2 domain. This Aida-type C2 domain is found in the C-terminal region of the proteins and is critical for interactions with the cytoskeleton in the context of cellular adhesion points. Accordingly, it is combined with diverse domains related to cytoskeletal functions, e.g. EF hands, coiled coils, IQ calmodulin-binding motifs, ankyrin repeats and the myosin head motor domain, or with a second lipid-binding domain, e.g. the PH domain. The Aida-type C2 domain is found only in the metazoan, choanoflagellate, chromist and chlorophyte lineages [
,
]. This domain has a predominantly β-strand globular fold composed of an antiparallel β-sandwich with two β-sheets, and three short α-helices to stabilize the conformation [
].
This entry represents a group of RING membrane-anchor proteins, including RNF5/RNF185 from humans and AtRMA1/AtRMA2/AtRMA3 from Arabidopsis. They contain a RING finger motif and a C-terminal membrane-anchoring domain. RNF5 and RNF185 are E3 ubiquitin-protein ligases [
,
]. AtRMA1, AtRMA2 and AtRMA3 are endoplasmic reticulum (ER)-localized Arabidopsis homologues of the human ER membrane-anchored E3 ubiquitin-protein ligase RING finger protein 5 (RNF5) [
]. AtRMAs possess E3 ubiquitin ligase activity, and may play a role in the growth and development of Arabidopsis. The AtRMA1 and AtRMA3 genes are predominantly expressed in major tissues, such as cotyledons, leaves, shoot-root junction, roots, and anthers, while AtRMA2 expression is restricted to the root tips and leaf hydathodes. AtRMA1 probably functions with the Ubc4/5 subfamily of E2 enzymes. AtRMA2 is likely involved in the cellular regulation of auxin binding protein 1 (ABP1) expression levels through interaction with ABP1. AtRMA proteins contain an N-terminal C3HC4-type RING-HC finger and a transmembrane-anchoring domain in their extreme C-terminal region [].
Glycerate/sugar phosphate transporter, conserved site
Type:
Conserved_site
Description:
Proteins in this group are involved in the transport system that mediates the uptake of a number of sugar phosphates as well as the regulatory components that are responsible for induction of this transport system by external glucose 6-phosphate. In Escherichia coli its role in transmembrane signalling may involve sugar-phosphate-binding sites and transmembrane orientations similar to those of the transport protein [
]. The following proteins in this entry, involved in the uptake of phosphorylated metabolites, are evolutionarily related [
,
]:
E. coli, Bacillus subtilis and Haemophilus influenzae glycerol-3-phosphate transporter (gene glpT).
Salmonella typhimurium phosphoglycerate transporter (gene pgtP).
E. coli and S. typhimurium hexose-6-phosphate transporter (gene uhpT).
E. coli and S. typhimurium protein uhpC. UhpC is necessary for the expression of uhpT and seems to act jointly with the uhpB sensor/kinase protein.
Human glucose 6-phosphate translocase [
].
These proteins of about 50 kDa apparently contain 12 transmembrane regions. This entry represents a conserved region in the central section of these transporters.
This family consists of the eukaryotic protein 2',3'-cyclic nucleotide 3'-phosphodiesterase (CNP). CNP is one of the earliest myelin-related proteins expressed in differentiating oligodendrocytes and Schwann cells. CNP is abundant in the central nervous system and in oligodendrocytes. This protein is also found in mammalian photoreceptor cells, testis and lymphocytes. Although the biological function of CNP is unknown, it is thought to play a significant role in the formation of the myelin sheath, where it comprises 4% of total protein. CNP selectively cleaves 2',3'-cyclic nucleotides to produce 2'-nucleotides in vitro. Although physiologically relevant substrates with 2',3'-cyclic termini are still unknown, numerous cyclic phosphate-containing RNAs occur transiently within eukaryotic cells. Other known protein families capable of hydrolysing 2',3'-cyclic nucleotides include tRNA ligases and plant cyclic phosphodiesterases. The catalytic domains from all these proteins contain two tetra-peptide motifs H-X-T/S-X, where X is usually a hydrophobic residue. Mutation of either histidine in CNP abolishes enzymatic activity [
].
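The H-X-T/S-X tetra-peptide motif described above can be expressed as a simple pattern search. The following Python sketch is illustrative only: it treats X as any residue (the description says X is usually, not always, hydrophobic), the function name `find_hxtx_motifs` is an assumption, and the toy sequence is hypothetical rather than taken from a real CNP protein.

```python
import re

# H, any residue, T or S, any residue. The lookahead lets overlapping
# candidates be reported. Treating X as "any residue" is an assumption;
# the description only says X is usually hydrophobic.
HXTX_PATTERN = re.compile(r"(?=(H.[TS].))")


def find_hxtx_motifs(sequence):
    """Return a list of (1-based start, matched tetrapeptide) tuples."""
    return [(m.start() + 1, m.group(1)) for m in HXTX_PATTERN.finditer(sequence)]


# Hypothetical toy sequence containing two candidate motifs.
TOY_SEQUENCE = "MKHLTVAAGGHISLKK"
hits = find_hxtx_motifs(TOY_SEQUENCE)
print(hits)
```

A stricter variant could restrict the X positions to a chosen set of hydrophobic residues, at the cost of missing atypical family members.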
RPGR-interacting protein 1 (RPGRIP1) is mutated in the eye disease Leber congenital amaurosis (LCA) and its structural homologue, RPGRIP1-like (RPGRIP1L, also called NPHP8 or fantom), is mutated in many different ciliopathies [
,
]. Both are multidomain proteins that are predicted to interact with retinitis pigmentosa G-protein regulator (RPGR) []. Both consist of an N-terminal coiled coil domain, two C2 domains (C2N and C2C), and a C-terminal RPGR-interacting domain (RID). RID is a C2 domain with a canonical beta sandwich structure that does not bind Ca2+ and/or phospholipids and thus constitutes a unique type of protein-protein interaction module []. Both RPGRIP1 and RPGRIP1L interact with the ciliary transition zone protein nephrocystin 4 (NPHP4) via their C2C domain [
,
]. One hypothesis is that RPGRIP1 and RPGRIP1L function as cilium-specific scaffolds that recruit a Nek4 signalling network which regulates cilium stability []. The expression of RPGRIP1 seems to be limited to photoreceptors and amacrine cells in the retina [], whereas RPGRIP1L is found in other tissues as well.
Rubella virus is an enveloped positive-strand RNA virus of the family Togaviridae. Virions are composed of three structural proteins: a capsid and two membrane-spanning glycoproteins, E2 and E1. During virus assembly, the capsid interacts with genomic RNA to form nucleocapsids. It has been discovered that capsid phosphorylation serves to negatively regulate binding of viral genomic RNA. This may delay the initiation of nucleocapsid assembly until sufficient amounts of virus glycoproteins accumulate at the budding site and/or prevent non-specific binding to cellular RNA when levels of genomic RNA are low. It follows that at a late stage in replication, the capsid may undergo dephosphorylation before nucleocapsid assembly occurs [
]. This superfamily represents a domain found in the rubella capsid protein. Structurally, it consists of 5 beta strands and a two-turn alpha helix. The 5 beta strands are arranged anti-parallel and within the capsid protein are labelled ABCDE. The two-turn alpha helix is positioned between strands B and C. Within rubella virus, capsid protein forms a two-fold symmetric dimer.
Barrier-to-autointegration factor (BAF) is an essential protein that is highly conserved in metazoan evolution, and which may act as a DNA-bridging protein [
]. BAF binds directly to double-stranded DNA, to transcription activators, and to inner nuclear membrane proteins, including lamin A filament proteins that anchor nuclear-pore complexes in place, and nuclear LEM-domain proteins that bind to lamin filaments and chromatin. New findings suggest that BAF has structural roles in nuclear assembly and chromatin organisation, represses gene expression and might interlink chromatin structure, nuclear architecture and gene regulation in metazoans []. BAF can be exploited by retroviruses to act as a host component of pre-integration complexes, which promote the integration of the retroviral DNA into the host chromosome by preventing autointegration of retroviral DNA [
]. BAF might contribute to the assembly or activity of retroviral pre-integration complexes through direct binding to the retroviral proteins p55 Gag and matrix, as well as to DNA.