Protein phosphatase 1 regulatory subunit 3B/C/D, metazoa
Type:
Family
Description:
This entry represents protein phosphatase 1 regulatory subunit 3B/C/D from animals. These proteins act as glycogen-targeting subunits for PP1 (protein phosphatase 1) and regulate its activity.
Mitogen-activated protein (MAP) kinase kinase kinase 14
Type:
Family
Description:
This entry represents mitogen-activated protein kinase kinase kinase 14. Mammalian MAPKKK14 (also known as NIK) is a lymphotoxin beta-activated kinase which seems to be exclusively involved in the activation of NF-kappa-B and its transcriptional activity.
Mitogen-activated protein (MAP) kinase kinase kinase 7
Type:
Family
Description:
This entry represents mitogen-activated protein kinase kinase kinase 7 (MAP3K7, also known as TAK-1), which is a component of a protein kinase signal transduction cascade, acting as a mediator of TGF-beta signal transduction. MAP3K7 stimulates NF-kappa-B (NFKB) activation and the p38 MAPK pathway through the phosphorylation and activation of several MAP kinase kinases. MAP3K7 binds both upstream activators and downstream substrates in multi-molecular complexes. This entry also includes MOM-4, the C. elegans homologue of human MAP3K7, which is part of the Wnt signaling pathway and is essential for cell fate specification during embryo development. A small deletion of MAP3K7 and four other genes was found to be strongly associated with high-grade prostate cancers.
Mitogen-activated protein (MAP) kinase kinase kinase 8
Type:
Family
Description:
This entry represents mitogen-activated protein kinase kinase kinase 8 (MAP3K8, also known as COT or TPL2), which plays a role in the cell cycle. MAP3K8 is required for TLR4 activation of the MEK/ERK pathway. MAP3K8 is able to activate NF-kappa-B 1 (NFKB1) by stimulating proteasome-mediated proteolysis of NF-kappa-B 1/p105. MAP3K8 forms a ternary complex with NFKB1 and TNIP2. MAP3K8 is recruited to the CD40 complex via a mechanism dependent on TRAF-binding sites in CD40.
Tyrosine-protein kinase, chain length determinant protein EpsG
Type:
Family
Description:
Protein phosphorylation, which plays a key role in most cellular activities, is a reversible process mediated by protein kinases and phosphoprotein phosphatases. Protein kinases catalyse the transfer of the gamma phosphate from nucleotide triphosphates (often ATP) to one or more amino acid residues in a protein substrate side chain, resulting in a conformational change affecting protein function. Phosphoprotein phosphatases catalyse the reverse process.
Protein kinases fall into three broad classes, characterised with respect to substrate specificity:
Serine/threonine-protein kinases
Tyrosine-protein kinases
Dual specificity protein kinases (e.g. MEK, which phosphorylates both Thr and Tyr on target proteins)
Protein kinase function is evolutionarily conserved from Escherichia coli to human. Protein kinases play a role in a multitude of cellular processes, including division, proliferation, apoptosis, and differentiation. Phosphorylation usually results in a functional change of the target protein by changing enzyme activity, cellular location, or association with other proteins. The catalytic subunits of protein kinases are highly conserved, and several structures have been solved, leading to large screens to develop kinase-specific inhibitors for the treatment of a number of diseases.
Tyrosine-protein kinases can transfer a phosphate group from ATP to a tyrosine residue in a protein. These enzymes can be divided into two main groups:
Receptor tyrosine kinases (RTK), which are transmembrane proteins involved in signal transduction; they play key roles in growth, differentiation, metabolism, adhesion, motility, death and oncogenesis. RTKs are composed of 3 domains: an extracellular domain (binds ligand), a transmembrane (TM) domain, and an intracellular catalytic domain (phosphorylates substrate). The TM domain plays an important role in the dimerisation process necessary for signal transduction.
Cytoplasmic / non-receptor tyrosine kinases, which act as regulatory proteins, playing key roles in cell differentiation, motility, proliferation, and survival. An example is the Src family of protein-tyrosine kinases.
The proteins in this family are homologues of the EpsG protein found in Methylobacillus sp. 12S and are generally found in operons with other Eps homologues. The protein is believed to function as the protein tyrosine kinase component of the chain length regulator (along with the transmembrane component EpsF).
This entry represents a group of WD repeat and FYVE domain-containing proteins, including WDFY1 and WDFY2. WDFY1 mediates TLR3/4 signalling by recruiting TRIF. WDFY2, also known as Prof (propeller-FYVE protein), is involved in endocytosis.
Secreted effector protein SptP, N-terminal domain superfamily
Type:
Homologous_superfamily
Description:
This superfamily represents a domain found in the Salmonella effector protein SptP, which interacts with SicP chaperone dimers mainly through four regions of its chaperone-binding domain. The structure of the SptP-SicP complex contains four molecules of SicP, aligned in a linear fashion and arranged in two sets of tightly bound homodimers that bind two SptP molecules. The SicP homodimers do not interact with each other, but are held together by a molecular interface formed between two SptP molecules. Each SptP molecule is wrapped by three SicP chaperones (two chaperones from one homodimer and a third one from the opposite homodimer pair).
Uncharacterised protein family, Major Facilitator Superfamily MSF4
Type:
Family
Description:
The MFS is a very old, large and diverse superfamily that includes several hundred sequenced members. They catalyze uniport, solute:cation (H+ or Na+) symport and/or solute:H+ or solute:solute antiport. Most are 400-600 amino acyl residues in length and possess either 12 or 14 putative transmembrane α-helical spanners. They exhibit specificity for sugars, polyols, drugs, neurotransmitters, Krebs cycle metabolites, phosphorylated glycolytic intermediates, amino acids, peptides, osmolytes, nucleosides, organic anions, inorganic anions, etc. They are found ubiquitously in all three kingdoms of living organisms. The generalized transport reactions catalyzed by MFS porters are:
(1) Uniport: S (out) ---> S (in).
(2) Symport: S (out) + [H+ or Na+] (out) ---> S (in) + [H+ or Na+] (in).
(3) Antiport: S1 (out) + S2 (in) ---> S1 (in) + S2 (out), (S1 may be H+ or a solute).
This family comprises uncharacterised proteins from archaea, which may be major facilitators. It includes proteins from Archaeoglobus fulgidus and Aeropyrum pernix.
Structural accessory protein ORF7a, SARS-CoV-like, X4e domain
Type:
Domain
Description:
This entry includes the structural accessory protein ORF7a, also called NS7a, X4 and U122, of Severe Acute Respiratory Syndrome Coronaviruses (SARS-CoV) from the betacoronavirus subgenus Sarbecovirus (lineage B), including SARS-CoV-2. ORF7a/NS7a proteins from betacoronaviruses in the subgenus Sarbecovirus (lineage B) are not related to NS7a proteins from other coronavirus lineages. The structure of ORF7a shows similarities to the immunoglobulin-like fold, with some features resembling those of the D1 domain of ICAM-1, and suggests a binding activity to integrin I domains. In SARS-CoV-infected cells, ORF7a is expressed and retained intracellularly within the Golgi network. ORF7a is thought to play an important role during the SARS-CoV replication cycle. Expression studies of ORF7a have shown that biological functions include induction of apoptosis through a caspase-dependent pathway, activation of the p38 mitogen-activated protein kinase signaling pathway, inhibition of host protein translation, and suppression of cell growth progression. These results collectively suggested that ORF7a protein may be involved in virus-host interactions. Studies in SARS-CoV-2 revealed that ORF7a acts as an antagonist of host tetherin (BST2), disrupting its antiviral effect. ORF7a binds to BST2 and sequesters it to the perinuclear region, thereby preventing its antiviral function at the cell membrane.
This entry represents the X4 ectodomain from ORF7a (X4e), which forms a well defined β-sandwich fold. It is built up from seven β-strands: four strands form one β-sheet and the other three strands form a second sheet. The sheets are closely packed or 'sandwiched' against each other. Each sheet is amphipathic with the hydrophobic side facing inward. Two disulfide bonds link both sheets on opposite edges, thereby stabilizing the β-sandwich structure.
ECA polysaccharide chain length modulation protein WzzE
Type:
Family
Description:
This entry represents ECA polysaccharide chain length modulation protein WzzE from Enterobacteriaceae. WzzE is involved in Enterobacterial common antigen (ECA) biosynthesis, which is part of bacterial outer membrane biogenesis. ECA is a specific surface antigen (polysaccharide) shared by all members of the Enterobacteriaceae and is restricted to this family. WzzE modulates the polysaccharide chain length of enterobacterial common antigen. It is required for the assembly of the phosphoglyceride-linked form of ECA (ECA(PG)) and the water-soluble cyclic form of ECA (ECA(CYC)).
DNA double-strand break repair protein Mre11, archaea-type
Type:
Family
Description:
This entry includes DNA double-strand break repair protein Mre11 mainly from archaea. Similar to the eukaryotic Mre11-Rad50 complex, the archaeal Mre11-Rad50 complex is involved in the early steps of DNA double-strand break (DSB) repair. The Haloferax volcanii Mre11-Rad50 complex may restrain the repair of DSBs by HR (homologous recombination), allowing another pathway to act as the primary mode of repair.
Rad9, Rad1, Hus1-interacting nuclear orphan protein 1
Type:
Family
Description:
This entry represents Rad9, Rad1, Hus1-interacting nuclear orphan protein 1 (RHINO or RHNO1), which plays a role in DNA damage response (DDR) signalling upon genotoxic stresses such as ionizing radiation (IR) during the S phase. RHNO1 is recruited to sites of DNA damage through interaction with the Rad9-Rad1-Hus1 (9-1-1) complex and ATR activator TopBP1. It plays a role in ATR-mediated activation of Chk1 driven by TopBP1 and 9-1-1. It is involved in mammary carcinogenesis.
Outer membrane protein assembly factor BamC, C-terminal
Type:
Homologous_superfamily
Description:
This superfamily represents the C-terminal domain of outer membrane protein (OMP) assembly factor BamC, a component of the essential five-protein β-barrel assembly machinery (BAM) in E. coli. This BAM complex is responsible for recognition and assembly of outer membrane β-barrel proteins, although BamC is not essential for its functionality. BamC, together with the other two non-essential protein assembly factors BamE and BamB, exerts a rather non-specific influence on the stability of the complex and the folding kinetics of selected OMPs.
BamC has evolved through the gene duplication of two conserved domains known to mediate protein interactions in structurally related complexes.
Aurora kinase A and ninein-interacting protein (also known as AIBp or AUNIP) is a family of eukaryotic proteins necessary for the adequate functioning of Aurora-A, a protein involved in chromosome alignment, centrosome maturation, mitotic spindle assembly and aspects of tumourigenesis. AIBp is likely to act as a regulator of Aurora-A activity. It interacts with AURKA via its C terminus and with NIN via its N terminus.
ABC transporter, urea permease protein UrtB, archaeal
Type:
Family
Description:
This entry represents a small group of ABC transporter permease subunits found in archaea. Several lines of evidence suggest they are functionally analogous, as well as homologous, to the UrtB subunit of the Corynebacterium glutamicum urea transporter. All members of the operons in which they are encoded show sequence similarity to urea transport subunits, their genes are located near the urease structural subunits in two out of three species, and partial phylogenetic profiling identifies this permease subunit as closely matching the profile of urea utilization.
Secreted effector protein SifA, C-terminal domain superfamily
Type:
Homologous_superfamily
Description:
Salmonella typhimurium SifA is an effector protein that alters host cell physiology and promotes bacterial survival in host tissues. This protein is a member of the WxxxE family of bacterial TTSS effectors that mimic activated small GTPases, and it is required for endosomal tubulation and formation of Salmonella-induced filaments (Sifs), which are filamentous structures containing lysosomal membrane glycoproteins within epithelial cells. Sif formation is concomitant with intracellular bacterial replication.
This superfamily represents the C-terminal WxxxE-containing domain, which is organised in two three-helix bundles that form a V-shaped structure. The conserved WxxxE sequence is important for both the folding and the secretion of the SifA effector. This domain also includes a six-residue prenylation motif which influences the membrane association and targeting properties of SifA.
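The WxxxE signature named above (a tryptophan, any three residues, then a glutamate) can be located in a protein sequence with a simple pattern scan. The following is a minimal sketch in Python; the function name and the test sequences are hypothetical and are not taken from SifA or any real effector.

```python
import re

# W, any three residues, then E. The lookahead lets overlapping
# occurrences be reported as separate hits.
WXXXE = re.compile(r"(?=(W.{3}E))")


def wxxxe_positions(sequence: str) -> list[int]:
    """Return 0-based start positions of every WxxxE occurrence.

    Assumes an upper-case, one-letter amino acid sequence.
    """
    return [match.start() for match in WXXXE.finditer(sequence)]


# Hypothetical toy sequences, used only to exercise the pattern.
single_hit = wxxxe_positions("MAWKLIEQ")
overlapping_hits = wxxxe_positions("WWAAEE")
no_hit = wxxxe_positions("MAWKLIDQ")
```

A match only indicates that the motif is present; it says nothing about whether the surrounding region folds like the SifA C-terminal domain.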
Aleutian mink disease parvovirus non-structural protein 1-3
Type:
Domain
Description:
This entry represents a domain found in Aleutian mink disease parvovirus (ADV) non-structural protein 1-3 (NS1-3). It is approximately 50 amino acids in length. ADV utilizes caspase activity to cleave its major non-structural protein (NS1), enable the localization of this protein to the nucleus and facilitate virus replication. The smaller NS2/3 are also encoded during viral infection and are essential for the replication of the virus.
Photosystem II CP43 reaction centre protein superfamily
Type:
Homologous_superfamily
Description:
PSII is a multisubunit protein-pigment complex containing polypeptides both intrinsic and extrinsic to the photosynthetic membrane. Within the core of the complex, the chlorophyll and beta-carotene pigments are mainly bound to the antenna proteins CP43 (PsbC) and CP47 (PsbB), which pass the excitation energy on to the reaction centre proteins D1 (Qb, PsbA) and D2 (Qa, PsbD) that bind all the redox-active cofactors involved in the energy conversion process. The PSII oxygen-evolving complex (OEC) oxidises water to provide protons for use by PSI, and consists of OEE1 (PsbO), OEE2 (PsbP) and OEE3 (PsbQ). The remaining subunits in PSII are of low molecular weight (less than 10 kDa), and are involved in PSII assembly, stabilisation, dimerisation, and photo-protection. This superfamily represents a domain found in the intrinsic antenna protein CP43 (PsbC), which is one of two such proteins found in the reaction centre of PSII, both of which can bind to chlorophyll 'a' and beta-carotene, passing the excitation energy on to the reaction centre. It has a mainly α-helical structure.
Tetratricopeptide repeat protein POLLENLESS 3/SULFUR DEFICIENCY-INDUCED 1
Type:
Family
Description:
This entry represents a group of plant tetratricopeptide repeat proteins, including POLLENLESS 3 (also known as MS5) and SULFUR DEFICIENCY-INDUCED 1 (SDI1). MS5 is involved in the regulation of cell division after male meiosis I and II to facilitate exit from meiosis and transition to G1. MS5 mutants are male sterile: pollen tetrads undergo an extra round of division after meiosis II without chromosome replication, resulting in chromosome abnormalities. SDI1 is involved in the utilization of stored sulphate pools under sulphur-limiting conditions.
Cytochrome c biogenesis CcmF C-terminal-like mitochondrial protein
Type:
Family
Description:
This entry represents a group of plant proteins, including CCMFC. CCMFC forms a complex with CCMFN1, CCMFN2 and CCMH that performs the assembly of heme with c-type apocytochromes in mitochondria.
Uncharacterised protein family, calcium binding protein, CcbP
Type:
Family
Description:
CcbP is a Ca(2+) binding protein found in bacteria which is thought to bind Ca(2+) by protein surface charge. When bound to Ca(2+), the protein becomes more compact and the level of free calcium decreases. Within the bacteria, Ca(2+) has a role in the early stages of heterocyst differentiation. The free Ca(2+) concentration, which is regulated by CcbP, is critical for the differentiation process. Calcium signalling is widespread in bacterial species, and prokaryotic cells, like eukaryotes, are equipped with all the elements to maintain Ca(2+) homeostasis.
This family represents the lytic switch protein BZLF1 from Epstein-Barr virus (strain GD1) (HHV-4) (Human herpesvirus 4). It is a transcription factor that acts as a molecular switch to induce the transition from the latent to the lytic or productive phase of the virus cycle of infection in cells containing a highly methylated viral genome. It probably binds to silenced chromatin and recruits host chromatin-remodeling enzymes.
2-aminoethylphosphonate ABC transporter system, permease protein PhnU
Type:
Family
Description:
The enzyme phosphonatase catalyses the degradation of 2-aminoethylphosphonate (AEP) in bacteria. This allows them to metabolise a range of organophosphonate compounds, including 2-aminoethylphosphonate, as a sole source of carbon, energy and phosphorus for growth. The C-P bond in phosphonoacetaldehyde (Pald) is hydrolysed and a bi-covalent Lys53ethylenamine/Asp12 aspartylphosphate intermediate is formed. This step can also be catalysed by C-P lyase, with some bacteria having the genes for both pathways and some only for one of them. The 2-aminoethylphosphonate ABC transport system functions in the transport of 2-aminoethylphosphonate across the membrane for utilisation in the bacterial cell.
PhnU is probably part of the phnSTUV complex involved in 2-aminoethylphosphonate import. It is found in a region of the Salmonella typhimurium LT2 genome responsible for the catabolism of 2-aminoethylphosphonate via the phnWX pathway.
Small G protein signalling modulator 1/2, Rab-binding domain
Type:
Domain
Description:
This domain adopts a PH-like fold. It has been called the Rab-binding domain (RBD). Small G-protein signalling modulator 1/2 (also known as RUTBC2/1) bind to Rab9A via their Pleckstrin homology (PH) domain. RUTBC1 stimulates GTP hydrolysis by Rab32 and Rab33B, while RUTBC2 appears to be a GAP for Rab36, Rab9A and associated proteins controlling the recycling of mannose-6-phosphate receptors from late endosomes to the trans-Golgi.
Nucleolar GTP-binding protein 2, circularly permuted GTPase motif
Type:
Domain
Description:
This entry represents the circularly permuted GTPase motif of Nucleolar GTP-binding protein 2 (GNL2, also known as NGP-1) from animals and fungi. GNL2 is a GTPase that associates with pre-60S ribosomal subunits in the nucleolus and is required for their nuclear export and maturation.
This group of proteins also includes Nuclear/nucleolar GTPase 2 (NUG2) from plants.
HEAT repeat associated with sister chromatid cohesion protein
Type:
Repeat
Description:
This HEAT repeat is found most frequently in sister chromatid cohesion proteins such as Nipped-B. HEAT repeats are found tandemly repeated in many proteins, and they appear to serve as flexible scaffolding on which other components can assemble.
Nuclear pore complex protein Nup98-Nup96-like, autopeptidase S59 domain
Type:
Domain
Description:
Nuclear pore complexes (NPCs) facilitate all nucleocytoplasmic transport in eukaryotic cells, playing essential roles in cellular homeostasis. The NPC is a modular structure composed of multiple copies of ~30 proteins (nucleoporins, Nups) arranged into distinct subcomplexes. A number of these peptides are synthesised as precursors and undergo self-catalyzed cleavage. The largest NPC sub-complex is the heptameric Y-shaped mammalian Nup107-Nup160 complex (called Nup84 complex in budding yeast), an essential scaffolding component of the NPC. Nup98 and Nup96 are encoded by the same gene that produces a 190 kDa polyprotein with autoproteolytic activity which generates the N-terminal Nup98 and C-terminal Nup96 proteins, part of the Nup107-Nup160 subcomplex. The yeast homologue Nup145 undergoes a similar proteolytic event to produce Nup145N and Nup145C, which are part of the Nup84 complex. The function of the heptamer is to coat the curvature of the nuclear pore complex between the inner and outer nuclear membranes. Nup96, which is predicted to be an alpha helical solenoid, complexes with Sec13 in the middle of the heptamer. The interaction between Nup96 and Sec13 is the point of curvature in the heptameric complex.
The proteolytic cleavage site of yeast Nup145p has been mapped upstream of an evolutionarily conserved serine residue. After cleavage, Nup145C forms the heptameric Y-complex together with six other proteins, while Nup145N shuttles between the NPC and the nuclear interior.
Nup98, a component of the nuclear pore that plays its primary role in the export of RNAs, is expressed in two forms, derived from alternate mRNA splicing. Both forms are processed into two peptides through autoproteolysis mediated by the C-terminal domain of hNup98. The three-dimensional structure of the C-terminal domain reveals a novel protein fold, and thus a new class of autocatalytic proteases. The structure further reveals that the suggested nucleoporin RNA binding motif is unlikely to bind to RNA.
The following nucleoporins share an ~150-residue C-terminal domain responsible for NPC targeting:
Vertebrate Nup98, a component of the nuclear pore that plays its primary role in the export of RNAs.
Yeast Nup100, which plays an important role in several nuclear export and import pathways including poly(A)+ RNA and protein transport.
Yeast Nup116, involved in mRNA export and protein transport.
Yeast Nup145, involved in nuclear poly(A)+ RNA and tRNA export.
The NUP C-terminal domains of Nup98 and Nup145 possess peptidase S59 autoproteolytic activity. The autoproteolytic sites of Nup98 and Nup145 each occur immediately C-terminal to the NUP C-terminal domain. Thus, although this domain occurs in the middle of each precursor polypeptide, it winds up at the C-terminal end of the N-terminal cleavage product. Cleavage of the peptide chains is necessary for the proper targeting to the nuclear pore.
The NUP C-terminal domain adopts a predominantly β-strand structure. The molecule consists of a six-stranded β-sheet sandwiched against a two-stranded β-sheet and flanked by α-helical regions. The N-terminal helical region consists of two short helices, whereas the stretch on the opposite side of the molecule consists of a single, longer helix.
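The positional argument above (a domain in the middle of the precursor ends up at the C-terminal end of the N-terminal product) can be checked with a toy model. This is a minimal sketch in Python under stated assumptions: the function name, the coordinates and the placeholder sequence are all hypothetical and do not correspond to real Nup98 or Nup145 residues.

```python
def autocleave(precursor: str, domain_end: int) -> tuple[str, str]:
    """Split a precursor immediately C-terminal to the autoproteolytic domain.

    domain_end is the 0-based, end-exclusive coordinate of the domain,
    so the cut falls right after the last residue of the domain.
    """
    if not 0 < domain_end < len(precursor):
        raise ValueError("domain_end must lie inside the precursor")
    return precursor[:domain_end], precursor[domain_end:]


# Placeholder precursor: lower-case flanks, upper-case 'D' for the domain.
# These letters are markers only, not amino acid codes.
precursor = "nnnnnn" + "DDDDDD" + "cccccccc"
domain_start, domain_end = 6, 12
domain = precursor[domain_start:domain_end]

n_product, c_product = autocleave(precursor, domain_end)
```

In this model the domain sits in the middle of the precursor, yet after the cut it forms the C-terminal end of `n_product`, and `c_product` contains none of it.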
Cytochrome c oxidase assembly protein CtaG/Cox11, domain superfamily
Type:
Homologous_superfamily
Description:
Cytochrome c oxidase assembly protein is essential for the assembly of functional cytochrome oxidase protein. In eukaryotes it is an integral protein of the mitochondrial inner membrane. Cox11 is essential for the insertion of Cu(I) ions to form the CuB site. This is essential for the stability of other structures in subunit I, for example haems a and a3, and the magnesium/manganese centre. Cox11 is probably only required in sub-stoichiometric amounts relative to the structural units. The C-terminal region of the protein is known to form a dimer. Each monomer coordinates one Cu(I) ion via three conserved cysteine residues (111, 208 and 210) in Saccharomyces cerevisiae. Met 224 is also thought to play a role in copper transfer or stabilising the copper site.
The copper binding motif is composed of two highly conserved cysteines and is located on one side of a β-barrel structure.
Uncharacterised conserved protein UCP036920, phosphoglycerate mutase, plant X4/Y4
Type:
Family
Description:
Silene latifolia (White campion) and Silene dioica (Red campion) are plants with separate sexes and with X and Y sex chromosomes. The presence or absence of the Y chromosome determines which sex organs will develop. Y4 is a Y-linked gene that has a homologue on the X chromosome, X4. Both Y4 and X4 are expressed and potentially encode enzymes related to phosphoglycerate mutase and fructose-2,6-bisphosphatase. This group also includes sequences from plants that do not have separate sexes, such as Arabidopsis and other species of Silene, thus providing examples for comparison of substitution rates in the X- and Y-linked genes of plants.
ER membrane protein complex subunit 7, beta-sandwich domain
Type:
Domain
Description:
This is the β-sandwich domain found in ER membrane protein complex subunit 7 (EMC7), which is an integral membrane component of the EMC. EMC is widely conserved and involved in membrane protein biogenesis. It mediates the insertion into endoplasmic reticulum membranes of newly synthesized membrane proteins in an energy-independent manner, both post-translational insertion of tail-anchored proteins and co-translational insertion of multipass membrane proteins. This entry includes UPF0620 protein C83.10 from S. pombe, an orthologue of animal EMC7.
This domain is also found in nodal modulators, which have been identified as part of a protein complex that participates in the nodal signalling pathway during vertebrate development.
DNA recombination and repair protein RecA, monomer-monomer interface
Type:
Domain
Description:
The recA gene product is a multifunctional enzyme that plays a role in homologous recombination, DNA repair and induction of the SOS response. In homologous recombination, the protein functions as a DNA-dependent ATPase, promoting synapsis, heteroduplex formation and strand exchange between homologous DNAs. RecA also acts as a protease cofactor that promotes autodigestion of the lexA product and phage repressors. The proteolytic inactivation of the lexA repressor by an activated form of recA may cause a derepression of the 20 or so genes involved in the SOS response, which regulates DNA repair, induced mutagenesis, delayed cell division and prophage induction in response to DNA damage. RecA is a protein of about 350 amino acid residues. Its sequence is very well conserved among eubacterial species. It is also found in the chloroplast of plants. RecA-like proteins are found in archaea and diverse eukaryotic organisms, like fission yeast, mouse or human. In the filament visualised by X-ray crystallography, β-strand 3, the loop C-terminal to β-strand 2, and α-helix D of the core domain form one surface that packs against α-helix A and β-strand 0 (the N-terminal domain) of an adjacent monomer during polymerisation. The core ATP-binding site domain is well conserved, with 14 invariant residues. It contains the nucleotide binding loop between β-strand 1 and α-helix C. The Escherichia coli sequence GPESSGKT matches the consensus sequence of amino acids (G/A)XXXXGK(T/S) for the Walker A box (also referred to as the P-loop) found in a number of nucleoside triphosphate (NTP)-binding proteins. Another nucleotide binding motif, the Walker B box, is found at β-strand 4 in the RecA structure. The Walker B box is characterised by four hydrophobic amino acids followed by an acidic residue (usually aspartate). Nucleotide specificity and additional ATP-binding interactions are contributed by the amino acid residues at β-strand 2 and the loop C-terminal to that strand, all of which are greater than 90% conserved among bacterial RecA proteins.
The signature in this entry spans the entire monomer-monomer interface in RecA proteins.
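The two nucleotide-binding motifs described above translate directly into pattern searches. The sketch below, in Python, encodes the Walker A consensus (G/A)XXXXGK(T/S) as quoted in the text and checks it against the E. coli sequence GPESSGKT. The Walker B pattern is only an approximation: the text says "four hydrophobic amino acids followed by an acidic residue", and the hydrophobic residue set used here, as well as the Walker B test fragment, are assumptions made for illustration.

```python
import re

# Walker A (P-loop) consensus as given in the text: (G/A)XXXXGK(T/S).
WALKER_A = re.compile(r"[GA].{4}GK[TS]")

# Walker B: four hydrophobic residues followed by an acidic residue (D or E).
# ASSUMPTION: this particular hydrophobic alphabet is illustrative only.
HYDROPHOBIC = "AVILMFWC"
WALKER_B = re.compile(rf"[{HYDROPHOBIC}]{{4}}[DE]")


def find_motifs(pattern: re.Pattern, sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]


# The E. coli RecA P-loop sequence quoted in the text.
walker_a_hits = find_motifs(WALKER_A, "GPESSGKT")

# Hypothetical fragment, used only to exercise the Walker B pattern.
walker_b_hits = find_motifs(WALKER_B, "KSGAVIVVDSA")
```

Short patterns like these match by chance in many unrelated proteins, so a hit is a starting point for inspection rather than evidence of ATP binding.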
DNA recombination and repair protein RecA-like, ATP-binding domain
Type:
Domain
Description:
The recA gene product is a multifunctional enzyme that plays a role in homologous recombination, DNA repair and induction of the SOS response. In homologous recombination, the protein functions as a DNA-dependent ATPase, promoting synapsis, heteroduplex formation and strand exchange between homologous DNAs. RecA also acts as a protease cofactor that promotes autodigestion of the lexA product and phage repressors. The proteolytic inactivation of the lexA repressor by an activated form of recA may cause a derepression of the 20 or so genes involved in the SOS response, which regulates DNA repair, induced mutagenesis, delayed cell division and prophage induction in response to DNA damage. RecA is a protein of about 350 amino acid residues. Its sequence is very well conserved among eubacterial species. It is also found in the chloroplast of plants. RecA-like proteins are found in archaea and diverse eukaryotic organisms, like fission yeast, mouse or human. In the filament visualised by X-ray crystallography, β-strand 3, the loop C-terminal to β-strand 2, and α-helix D of the core domain form one surface that packs against α-helix A and β-strand 0 (the N-terminal domain) of an adjacent monomer during polymerisation. The core ATP-binding site domain is well conserved, with 14 invariant residues. It contains the nucleotide binding loop between β-strand 1 and α-helix C. The Escherichia coli sequence GPESSGKT matches the consensus sequence of amino acids (G/A)XXXXGK(T/S) for the Walker A box (also referred to as the P-loop) found in a number of nucleoside triphosphate (NTP)-binding proteins. Another nucleotide binding motif, the Walker B box, is found at β-strand 4 in the RecA structure. The Walker B box is characterised by four hydrophobic amino acids followed by an acidic residue (usually aspartate). Nucleotide specificity and additional ATP-binding interactions are contributed by the amino acid residues at β-strand 2 and the loop C-terminal to that strand, all of which are greater than 90% conserved among bacterial RecA proteins.
This entry represents the ATP-binding domain found in the N-terminal part of RecA proteins.
Type III secretion system inner membrane P protein
Type:
Family
Description:
Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior. There have been four secretion systems described in animal enteropathogens such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia. The type III secretion system is of great interest as it is used to transport virulence factors from the pathogen directly into the host cell and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself, type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. It is believed that the family of type III inner membrane proteins are used as structural moieties in a complex with several other subunits, including the ATPase necessary for driving the secretion system.
One such set of inner membrane proteins, termed "P" here for nomenclature purposes, includes the Salmonella and Shigella SpaP, the Yersinia YscR, the Erwinia HrcR, and the Xanthomonas Pro2 genes, as well as several FliP flagellar biosynthesis genes. FliP is an ~30 kDa protein containing three or four transmembrane (TM) regions.
Mitochondrial outer membrane transport complex protein Metaxin 1/3
Type:
Family
Description:
Metaxin is a mitochondrial outer membrane protein, which is involved in transport of proteins into the mitochondrion. Metaxin extends into the cytosol while anchored into the outer mitochondrial membrane at its C terminus. This entry contains Mitochondrial outer membrane import complex protein Metaxin from Arabidopsis thaliana and similar proteins from plants as well as Metaxin 1/3 from animals.
LIM domain and actin-binding protein 1, Lim domain
Type:
Domain
Description:
LIM domain and actin-binding protein 1 (LIMA1, also known as EPLIN) is a cytoskeleton-associated protein that regulates actin dynamics by cross-linking and stabilising filaments []. It is a tumour suppressor whose expression inversely correlates with cell growth, motility, invasion and cancer mortality. It interacts with and stabilises F-actin filaments and stress fibres, which correlates with its ability to suppress anchorage-independent growth [,
]. EPLIN was first identified as the product of a gene that is transcriptionally down-regulated or lost in a number of human epithelial tumour cells [,
]. In humans, there are two EPLIN isoforms, EPLIN alpha and EPLIN beta, both of which have a centrally located LIM domain that may mediate self-dimerisation. EPLIN inhibits Arp2/3 complex-mediated branching nucleation of actin filaments and stabilises actin filament networks []. EPLIN can be regulated through phosphorylation by extracellular signal-regulated kinase (ERK) []. This entry represents the LIM domain of EPLIN, which functions as an adaptor or scaffold to support the assembly of multimeric protein complexes. This domain shows two characteristic zinc finger motifs. The two zinc fingers contain eight conserved residues, mostly cysteines and histidines, which coordinately bond to two zinc atoms [
,
].
Cytochrome c oxidase cbb3 type, accessory protein FixG
Type:
Family
Description:
Members of this ferredoxin-like protein family include FixG, CcoG and RdxA. They are found exclusively in species with an operon encoding the cbb3 type of cytochrome c oxidase (cco-cbb3), and near the cco-cbb3 operon in about half the cases. The cco-cbb3 is found in a variety of proteobacteria and almost nowhere else; it is associated with oxygen use under microaerobic conditions. Some (but not all) of these proteobacteria are also nitrogen-fixing, hence the gene symbol fixG. FixG was shown to be essential for functional cco-cbb3 expression in Bradyrhizobium japonicum [
].
Type-F conjugative transfer system pilin assembly protein TraF
Type:
Family
Description:
This entry includes TraF, a protein that is part of a large group of proteins involved in conjugative transfer of plasmid DNA, specifically in the F-type system. TraF has been predicted to contain a thioredoxin fold and has been shown to be localized to the periplasm [
]. Unlike the related protein TrbB, TraF does not contain a conserved pair of cysteines and has been shown not to function as a thiol disulphide isomerase by complementation of an Escherichia coli DsbA defect [
]. The protein is believed to be involved in pilin assembly []. Even more closely related than TrbB is a clade of genes that do contain the CXXC motif, but it is unclear whether these are involved in type-F conjugation systems per se.
Type-F conjugative transfer system pilin assembly protein TrbC
Type:
Family
Description:
This entry represents TrbC, a protein that is an essential component of the F-type conjugative pilus assembly system (also known as a type IV secretion system) for the transfer of plasmid DNA [
,
]. The N-terminal portion of these proteins is heterogeneous.
Cell division control protein 73, C-terminal domain superfamily
Type:
Homologous_superfamily
Description:
CDC73 is an RNA polymerase II accessory factor [
], and forms part of the Paf1 complex that has roles in post-initiation events []. More specifically, crystal structure analysis shows the C terminus to be a Ras-like domain that adopts a fold that is highly similar to GTPases of the Ras superfamily. The C-terminal domain contains a large but comparatively flat surface of highly conserved residues, devoid of ligand. Deletion of the Cdc73 C-domain significantly reduces Paf1C occupancy on active genes, which means that the Cdc73 C-domain plays a role in promoting association of Paf1C with chromatin []. The canonical nucleotide binding pocket is altered in CDC73, and there is no nucleotide ligand, but it contributes to histone methylation and Paf1 complex (Paf1C) recruitment to active genes. Thus, together with Rtf1, it combines to couple Paf1C to elongating polymerase [
].
Krev interaction trapped protein 1, FERM domain C-lobe
Type:
Domain
Description:
KRIT1, also known as CCM1, a Rap1-binding protein, is expressed in endothelial cells where it is present in cell-cell junctions and associated with junctional proteins [
]. Together with CCM2/MGC4607 and CCM3/PDCD10, KRIT1 constitutes a set of proteins, mutations of which are found in cerebral cavernous malformations, which are characterized by cerebral hemorrhages and vascular malformations in the central nervous system. KRIT1 possesses four ankyrin repeats, a FERM domain, and multiple NPXY sequences, one of which is essential for integrin cytoplasmic domain-associated protein-1alpha (ICAP1alpha) binding and all of which mediate binding of CCM2. KRIT1 localization is mediated by its FERM domain [
]. The FERM domain has a cloverleaf tripartite structure composed of: (1) FERM_N (A-lobe or F1); (2) FERM_M (B-lobe or F2); and (3) FERM_C (C-lobe or F3). The C-lobe/F3 within the FERM domain is part of the PH domain family. Like most other ERM members they have a phosphoinositide-binding site in their FERM domain. The FERM C domain is the third structural domain within the FERM domain. The FERM domain is found in the cytoskeletal-associated proteins such as ezrin, moesin, radixin, 4.1R, and merlin. These proteins provide a link between the membrane and cytoskeleton and are involved in signal transduction pathways. The FERM domain is also found in protein tyrosine phosphatases (PTPs), the tyrosine kinases FAK and JAK, in addition to other proteins involved in signaling. This domain is structurally similar to the PH and PTB domains and consequently is capable of binding to both peptides and phospholipids at different sites [
,
].
Branched-chain amino acid transport system II carrier protein
Type:
Family
Description:
Characterised members of the branched chain Amino Acid:Cation Symporter (LIVCS) family transport all three of the branched chain aliphatic amino acids (leucine (L), isoleucine (I) and valine (V)) [
]. They function by a Na+ or H+ symport mechanism and display 12 putative transmembrane helical spanners.
Inhibitor of growth protein 1, PHD-type zinc finger
Type:
Domain
Description:
Members of the ING (inhibitor of growth protein) family of tumour suppressors regulate cell cycle progression, apoptosis, and DNA repair as important cofactors of p53. Inhibitor of growth protein 1 (ING1) is the best characterised member of this family of five genes (ING1-5) with conserved plant homeodomains (PHDs) [
]. ING1 interacts with proliferating cell nuclear antigen (PCNA) and PCNA-interacting protein p15 (PAF) [
,
]. p53 and ING1 have been shown to functionally cooperate in the activation of apoptosis []. ING1 could also induce apoptosis by translocation from the nucleus to the mitochondria where interaction with BAX stabilises this protein, promoting its pro-apoptotic function []. ING1 functions in DNA demethylation [] and has a role in the early steps of multiple DNA repair pathways []. This entry represents the C-terminal plant homeodomain (PHD)-type zinc-finger domain of ING1.
Type III secretion system inner membrane R protein
Type:
Family
Description:
Secretion of virulence factors in Gram-negative bacteria involves transportation of the protein across two membranes to reach the cell exterior [
]. There have been four secretion systems described in animal enteropathogens such as Salmonella and Yersinia, with further sequence similarities in plant pathogens like Ralstonia and Erwinia []. The type III secretion system is of great interest, as it is used to transport virulence factors from the pathogen directly into the host cell [
] and is only triggered when the bacterium comes into close contact with the host. The protein subunits of the system are very similar to those of bacterial flagellar biosynthesis []. However, while the latter forms a ring structure to allow secretion of flagellin and is an integral part of the flagellum itself [], type III subunits in the outer membrane translocate secreted proteins through a channel-like structure. It is believed that the family of type III inner membrane proteins is used as structural moieties in a complex with several other subunits []. One such set of inner membrane proteins, labeled "R" here for nomenclature purposes, includes the Salmonella and Shigella SpaR, the Yersinia YscT, Rhizobium Y4YN, and the Erwinia HrcT genes [
]. The flagellar protein FliR also shares similarity, probably due to evolution of the type III secretion system from the flagellar biosynthetic pathway.
Outer dense fibre protein 1, alpha crystallin domain
Type:
Domain
Description:
ODF1 is a component of the outer dense fibres (ODF), which are cytoskeletal structures specifically found in the sperm tails of vertebrates. ODF1 has been assigned to the heat shock protein family based on its overall structural features and especially on its conserved alpha-crystallin domain, and it is also known as HSPB10 [
]. ODF1 is essential for tight linkage of the sperm head to the tail and for male fertility []. This entry represents the alpha crystallin domain (ACD) of ODF1.
Protein of unknown function UCP017292, zinc finger, CHY-type
Type:
Family
Description:
Proteins in this family contain a CHY-type zinc finger (
). However, unlike other CHY domain-containing proteins, these do not also contain a RING-type zinc finger (
).
Most of the proteins in this entry are from bacteria; however, some eukaryotic proteins in this entry have been characterised, such as Hot13 (helper of Tim protein) from budding yeasts. Hot13 is required for the assembly or recycling of the small Tim proteins in the mitochondrial intermembrane space, thereby participating in the import and insertion of multi-pass transmembrane proteins into the mitochondrial inner membrane [
].
Ribosomal protein S6 kinase alpha-3, C-terminal catalytic domain
Type:
Domain
Description:
This entry represents the C-terminal catalytic domain of ribosomal protein S6 kinase alpha-3. RPS6KA3 (also called RSK2) is expressed highly in the regions of the brain with high synaptic activity and plays a role in the maintenance and consolidation of excitatory synapses. It is a specific modulator of phospholipase D in calcium-regulated exocytosis [
]. Mutations in the RPS6KA3 gene cause Coffin-Lowry syndrome (CLS), a rare syndromic form of X-linked mental retardation characterised by growth and psychomotor retardation and skeletal abnormalities []. RSK2 is one of four RSK isoforms (RSK1-4) from distinct genes present in vertebrates. The most striking feature of RSKs is the presence of two functional and non-identical phosphotransferase domains [
]. RSKs contain an N-terminal kinase domain (NTD) that is characteristic of the AGC family and a C-terminal kinase domain (CTD) that is characteristic of the CAMK family. They are activated by signaling inputs from extracellular regulated kinase (ERK) and phosphoinositide dependent kinase 1 (PDK1). ERK phosphorylates and activates the CTD of RSK, serving as a docking site for PDK1, which phosphorylates and activates the NTD, which in turn phosphorylates all known RSK substrates. RSKs act as downstream effectors of mitogen-activated protein kinase (MAPK) and play key roles in mitogen-activated cell growth, differentiation, and survival [,
,
].
Type II secretion system protein GspD, conserved site
Type:
Conserved_site
Description:
A number of proteins are involved in the general secretion pathway (GSP); one of these is known as protein D (GSPD protein). Protein D is involved in the type II general secretion pathway within Gram-negative bacteria, a signal sequence-dependent process responsible for protein export [,
,
,
,
,
,
]. The most probable location of protein D is the outer membrane []. This suggests that protein D constitutes the apparatus of the accessory mechanism, and is thus involved in transporting exoproteins from the periplasm, across the outer membrane, to the extracellular environment.
Endoplasmic reticulum resident protein 44, TRX-like domain b'
Type:
Domain
Description:
ERp44 is an endoplasmic reticulum (ER)-resident protein that retrieves some ER-resident enzymes and immature oligomers of secretory proteins from the Golgi [
]. It is composed of three thioredoxin (Trx)-like domains (a, b, and b'), followed by a C-terminal extension (C tail). Domain a contains a rather unique CRFS motif. The cysteine in this motif (Cys29) is known to form mixed disulfide bonds with client proteins to promote their thiol-dependent retention. Domain a is followed by two redox inactive Trx-like domains, b and b' []. Interestingly, the interactions between ERp44 and client proteins are pH-dependent []. This entry represents domain b', the second redox inactive TRX-like domain of ERp44.
Sporulation stage II protein D, amidase enhancer LytB
Type:
Family
Description:
This entry describes a region which is found, typically in two or three proteins per genome, in Cyanobacteria and Firmicutes, and sporadically in other genomes. One example is SpoIID from Bacillus subtilis. Another, also from B. subtilis, is LytB which contains this region at the C terminus. LytB is encoded immediately upstream of an amidase, the autolysin LytC, and both these proteins show considerable homology in their N-terminal regions. Genes encoding proteins in this entry do not occur in conserved neighbourhoods, and many, such as SpoIID are monocistronic. One modeling study [
] has suggested that SpoIID may bind DNA, but the function of these proteins is so far unknown.
Nitrite and sulphite reductase 4Fe-4S domain containing protein
Type:
Family
Description:
Sulphite reductases (SiRs) and related nitrite reductases (NiRs) catalyse the six-electron reduction reactions of sulphite to sulphide, and nitrite to ammonia, respectively. The Escherichia coli SiR enzyme is a complex composed of two proteins, a flavoprotein alpha-component (SiR-FP) and a hemoprotein beta-component (SiR-HP), and has an alpha(8)beta(4) quaternary structure [
]. SiR-FP contains both FAD and FMN, while SiR-HP contains a Fe(4)S(4) cluster coupled to a sirohaem through a cysteine bridge. Electrons are transferred from NADPH to FAD, and on to FMN in SiR-FP, from which they are transferred to the metal centre of SiR-HP, where they reduce the sirohaem-bound sulphite.
Endoplasmic reticulum resident protein 44, TRX-like domain b
Type:
Domain
Description:
This entry represents the first redox inactive TRX-like domain b found in endoplasmic reticulum resident protein 44 (ERp44). ERp44 is an endoplasmic reticulum (ER)-resident protein, induced during stress, and involved in thiol-mediated ER retention. It contains an N-terminal TRX domain with a CXFS motif followed by two redox inactive TRX-like domains, homologous to the b and b' domains of PDI. Through the formation of reversible mixed disulfides, ERp44 mediates the ER localization of Ero1alpha, a protein that oxidizes protein disulfide isomerases into their active form [,
]. ERp44 also prevents the secretion of unassembled cargo protein with unpaired cysteines. In addition, it modulates the activity of inositol 1,4,5-trisphosphate type I receptor (IP3R1), an intracellular channel protein that mediates calcium release from the ER to the cytosol []. Similar to PDI, the b domain of ERp44 is likely involved in binding to substrates.
Stage V sporulation protein AA, N-terminal domain superfamily
Type:
Homologous_superfamily
Description:
This domain superfamily is found in bacteria, primarily Firmicutes, and is approximately 90 amino acids in length. There is a single completely conserved glycine (G) residue that may be functionally important. Most annotation associated with this domain suggests that it is involved in the fifth stage of sporulation; however, there is little published evidence to support this.
Effector protein HopAB, E3 ubiquitin ligase domain superfamily
Type:
Homologous_superfamily
Description:
HopAB family members are type III effector proteins that are secreted by the plant pathogen Pseudomonas syringae into the host plant to inhibit its immune system and facilitate the spread of the pathogen [
]. AvrPtoB, also called HopAB3, is the best studied member of the family. It suppresses host basal defenses by interfering with PAMP (pathogen-associated molecular pattern)-triggered immunity (PTI) through binding and inhibiting BAK1, a kinase which serves to activate defense signaling []. It also recognizes the kinase Pto to activate effector-triggered immunity (ETI) []. AvrPtoB contains an N-terminal region that contains two kinase-interacting domains (KID) and a C-terminal E3 ligase domain. The first KID recognizes the PTI-associated kinase Bti9 as well as Pto, and is referred to as the Pto-binding domain (PID). The second KID interacts with BAK1 and FLS2, which are leucine-rich repeat-containing receptor-like kinases, and is called the BAK1-interacting domain (BID) [
,
]. The family member HopPmaL is shorter and lacks the C-terminal E3 ligase domain []. The E3 ubiquitin ligase domain found in the bacterial protein AvrPtoB inhibits immunity-associated programmed cell death (PCD) when translocated into plant cells, probably by recruiting E2 enzymes and transferring ubiquitin molecules to cellular proteins involved in regulation of PCD and targeting them for degradation. The structure reveals a globular fold centred on a four-stranded β-sheet that packs against two helices on one face and has three very extended loops connecting the elements of secondary structure, with remarkable homology to the RING-finger and U-box families of proteins involved in ubiquitin ligase complexes in eukaryotes [
].
54S ribosomal protein L3, double-stranded RNA binding domain
Type:
Domain
Description:
MRPL3 (54S ribosomal protein L3) is a component of the mitochondrial ribosome (mitoribosome), a dedicated translation machinery responsible for the synthesis of mitochondrial genome-encoded proteins, including at least some of the essential transmembrane subunits of the mitochondrial respiratory chain. MRPL3 contains an RNase III-like domain and a double-stranded RNA binding motif (DSRM) [
,
]. DSRM is not sequence specific, but is highly specific for dsRNAs of various origin and structure. This entry represents the DSRM of MRPL3 from fungi.
DNA mismatch repair protein MutS, core domain superfamily
Type:
Homologous_superfamily
Description:
Mismatch repair contributes to the overall fidelity of DNA replication and is essential for combating the adverse effects of damage to the genome. It involves the correction of mismatched base pairs that have been missed by the proofreading element of the DNA polymerase complex. The post-replicative Mismatch Repair System (MMRS) of Escherichia coli involves MutS (Mutator S), MutL and MutH proteins, and acts to correct point mutations or small insertion/deletion loops produced during DNA replication [
]. MutS and MutL are involved in preventing recombination between partially homologous DNA sequences. The assembly of MMRS is initiated by MutS, which recognises and binds to mispaired nucleotides and allows further action of MutL and MutH to eliminate a portion of newly synthesized DNA strand containing the mispaired base []. MutS can also collaborate with methyltransferases in the repair of O(6)-methylguanine damage, which would otherwise pair with thymine during replication to create an O(6)mG:T mismatch []. MutS exists as a dimer, where the two monomers have different conformations and form a heterodimer at the structural level []. Only one monomer recognises the mismatch specifically and has ADP bound. Non-specific major groove DNA-binding domains from both monomers embrace the DNA in a clamp-like structure. Mismatch binding induces ATP uptake and a conformational change in the MutS protein, resulting in a clamp that translocates on DNA. MutS is a modular protein with a complex structure [
], and is composed of: N-terminal mismatch-recognition domain, which is similar in structure to tRNA endonuclease. Connector domain, which is similar in structure to Holliday junction resolvase RuvC. Core domain, which is composed of two separate subdomains that join together to form a helical bundle; from within the core domain, two helices act as levers that extend towards (but do not touch) the DNA. Clamp domain, which is inserted between the two subdomains of the core domain at the top of the lever helices; the clamp domain has a β-sheet structure. ATPase domain (connected to the core domain), which has a classical Walker A motif. HTH (helix-turn-helix) domain, which is involved in dimer contacts. The MutS family of proteins is named after the Salmonella typhimurium MutS protein involved in mismatch repair. Homologues of MutS have been found in many species including eukaryotes (MSH 1, 2, 3, 4, 5, and 6 proteins), archaea and bacteria, and together these proteins have been grouped into the MutS family. Although many of these proteins have similar activities to the E. coli MutS, there is significant diversity of function among the MutS family members. Human MSH has been implicated in hereditary non-polyposis colorectal carcinoma (HNPCC) and is a mismatch binding protein [
]. This diversity is even seen within species, where many species encode multiple MutS homologues with distinct functions []. Inter-species homologues may have arisen through frequent ancient horizontal gene transfer of MutS (and MutL) from bacteria to archaea and eukaryotes via endosymbiotic ancestors of mitochondria and chloroplasts []. The core domain of MutS adopts a multi-helical structure comprised of two subdomains, which are interrupted by the clamp domain. Two of the helices in the core domain comprise the levers that extend towards the DNA. This domain is found associated with Pfam:PF00488, Pfam:PF05188, Pfam:PF01624 and Pfam:PF05190. The aligned region corresponds with domain III, which is central to the structure of Thermus aquaticus MutS [
].
mRNA decapping protein 2, Box A domain superfamily
Type:
Homologous_superfamily
Description:
This superfamily represents the Box A domain specific to mRNA decapping protein 2, Dcp2, the catalytic subunit of the mRNA-decapping enzyme [
], which is found N-terminal to the NUDIX hydrolase domain. Removal of the cap structure is mediated by the Dcp1-Dcp2 complex [
]. The Box A domain has a four-helix fold with an orthogonal array of two alpha-hairpins.
Glutamate synthase, large subunit domain 3 stand-alone protein
Type:
Family
Description:
The large (alpha, GltB) subunit of bacterial glutamate synthase (GOGAT) consists of three domains: an N-terminal amidotransferase domain (or a related domain in archaeal GOGAT), a central domain with the FMN-binding domain, and a C-terminal domain. This family represents a stand-alone form of the C-terminal domain. The stand-alone form occurs in the archaeal type of GOGAT, where the large subunit is represented by three separate proteins, corresponding to the three domains of the "standard" bacterial enzyme [
]. Similar organisation of GOGAT with stand-alone domains has been found in some bacteria (e.g., members from Sinorhizobium meliloti, Thermotoga maritima), but its function is not clear in those organisms where the "standard" bacterial form is also present (e.g., Sinorhizobium meliloti).
This domain is also called the GXGXG structural domain (containing the repeated sequence motif G-XX-G-XXX-G). It has a right-handed β-helix topology comprising seven β-helical turns. It does not have a direct function in glutamate synthase activity but rather a structural function through extensive interactions with the amidotransferase and FMN-binding domains [
,
]. Originally, only the ORF encoding the central domain of GOGAT was recognised and annotated as GltB in archaea, and the rest of the large subunit was thought to be missing, which may lead to some misannotations [
]. This led to speculation that the archaeal form of the GOGAT large subunit is the ancestral minimum form of the enzyme. Later analysis showed, however, that in all archaea where the large subunit has been found, its entire sequence is represented by three separate ORFs []. Glutamate synthase (GOGAT, GltS) is a complex iron-sulphur flavoprotein that catalyses the reductive synthesis of L-glutamate from 2-oxoglutarate and L-glutamine via intramolecular channelling of ammonia, a reaction in the bacterial, yeast and plant pathways for ammonia assimilation [
]. GOGAT is a multifunctional enzyme that performs L-glutamine hydrolysis, conversion of 2-oxoglutarate into L-glutamate, and electron uptake from an electron donor []. There are four classes of GOGAT [
,
]:
1. Bacterial NADPH-dependent GOGAT (NADPH-GOGAT,
). This standard bacterial NADPH-GOGAT is composed of a large (alpha, GltB) subunit and a small (beta, GltD) subunit.
2. Ferredoxin-dependent form in cyanobacteria and plants (Fd-GOGAT,
) displays a single-subunit structure corresponding to the large bacterial subunit.
3. Pyridine-linked form in both photosynthetic and nonphotosynthetic eukaryotes (eukaryotic GOGAT or NADH-GOGAT,
) displays a single-subunit structure corresponding to the fusion of the small and the large bacterial subunits (
).
4. The archaeal type with stand-alone proteins corresponding to the N-terminal, FMN-binding, and the C-terminal domains of the large subunit [
,
] (,
,
), and to the small subunit.
MPH1 is a photosystem II (PSII) associated protein that participates in the maintenance of normal PSII activity under photoinhibitory stress, protecting PSII against photooxidative damage [
,
].
Biorientation of chromosomes in cell division protein 1-like
Type:
Family
Description:
This entry includes biorientation of chromosomes in cell division protein 1 (Bod1), and Bod1-like2 (BOD1L2). Bod1 is required for proper chromosome biorientation through the detection or correction of syntelic attachments in mitotic spindles [
]. The function of BOD1L2 is not clear.
Zinc finger MYND domain-containing protein 11, PWWP domain
Type:
Domain
Description:
This entry represents the PWWP domain of ZMY11, which specifically recognises DNA and histone methylated lysines.
Zinc finger MYND domain-containing protein 11 (ZMYND11 or ZMY11, also known as protein BS69) is a ubiquitously expressed nuclear protein acting as a transcriptional co-repressor in association with various transcription factors [
]. This protein specifically recognises H3K36me3 on H3.3 (H3.3K36me3) and regulates RNA polymerase II elongation []. It is critical for the repression of a transcriptional program that is essential for tumour cell growth. It was originally identified as an adenovirus 5 E1A-binding protein that inhibits E1A transactivation, as well as c-Myb transcription [,
, ]. It also mediates repression, at least in part, through interaction with the co-repressor N-CoR []. Moreover, it interacts with Toll-interleukin 1 receptor domain (TIR)-containing adaptor molecule-1 (TICAM-1, also named TRIF) to facilitate NF-kappaB activation and type I IFN induction. It associates with PIAS1, a SUMO E3 enzyme, and Ubc9, a SUMO E2 enzyme, and plays an inhibitory role in muscle and neuronal differentiation []. ZMY11 regulates Epstein-Barr virus (EBV) latent membrane protein 1 (LMP1)/C-terminal activation region 2 (CTAR2)-mediated NF-kappaB activation by interfering with the complex formation between TNFR-associated death domain protein (TRADD) and LMP1/CTAR2 [
,
]. It also cooperates with tumour necrosis factor receptor (TNFR)-associated factor 3 (TRAF3) in the regulation of EBV-derived LMP1/CTAR1-induced NF-kappaB activation []. Furthermore, ZMY11 is involved in the p53-p21Cip1-mediated senescence pathway []. ZMY11 contains a plant homeodomain (PHD) finger, a bromodomain, a proline-tryptophan-tryptophan-proline (PWWP) domain, and a MYeloid translocation protein 8, Nervy and DEAF-1 (MYND) domain [
,
].
Tripartite motif-containing protein 54, B-box-type 2 zinc finger
Type:
Domain
Description:
Tripartite motif-containing protein 54 (Trim54, also known as MuRF-3) is involved in muscle growth and development [
]. It serves as a regulator of the microtubule network of striated muscle cells []. It can function as an E3 ubiquitin ligase in ubiquitin-mediated muscle protein turnover []. This entry represents the B-box-type 2 zinc finger of Trim54, which is characterized by a CHC3H2 zinc-binding consensus motif.
Endoplasmic reticulum resident protein 29, C-terminal domain superfamily
Type:
Homologous_superfamily
Description:
ERp29 (also known as ERp28 and ERp31) is a ubiquitously expressed endoplasmic reticulum protein found in mammals [
]. This protein has an N-terminal thioredoxin-like domain, which is homologous to the domain of human protein disulphide isomerase (PDI). ERp29 may help mediate the chaperone function of PDI. The C-terminal ERp29 domain has a 5-helical bundle fold. ERp29 is thought to form part of the thyroglobulin folding complex []. The Drosophila homologue, Wind, is the product of windbeutel, an essential gene in the development of dorsal-ventral patterning. Wind is required for correct targeting of Pipe, a Golgi-resident type II transmembrane protein with homology to 2-O-sulfotransferase. The C-terminal domain of Wind is thought to provide a distinct site required for interaction with its substrate, Pipe [
].
Glutamate synthase large subunit domain 1 stand-alone protein
Type:
Family
Description:
The large (alpha, GltB) subunit of bacterial glutamate synthase (GOGAT, GltS) consists of three domains. This entry represents a stand-alone version of the N-terminal amidotransferase domain that is found in archaeal GOGAT, where the large subunit is represented by three separate proteins corresponding to the three domains of the "standard" bacterial enzyme [
]. Similar organisation of GOGAT with stand-alone domains has been found in some bacteria (e.g., Sinorhizobium meliloti and Thermotoga maritima), but its function is not clear in those organisms where the "standard" (integrated) bacterial form is also present (e.g., Sinorhizobium meliloti).
The amidotransferase domain of the bacterial GOGAT is characterised by a four layer alpha/beta/beta/alpha architecture [] and contains the typical catalytic centre. The N-terminal Cys-1 catalyses the hydrolysis of L-glutamine generating ammonia and the first molecule of L-glutamate []. Originally, only the ORF encoding the central domain of GOGAT was recognised and annotated as GltB in archaea, and the rest of the large subunit was thought to be missing, which may lead to some misannotations [
]. This led to speculation that the archaeal form of the GOGAT large subunit was the ancestral minimal form of the enzyme. Later analysis showed, however, that in all archaea where the large subunit has been found, its entire sequence is represented by three separate ORFs []. Glutamate synthase is a complex iron-sulphur flavoprotein that catalyses the reductive synthesis of L-glutamate from 2-oxoglutarate and L-glutamine via intramolecular channeling of ammonia, a reaction in the bacterial, yeast and plant pathways for ammonia assimilation [
]. GOGAT is a multifunctional enzyme that functions through three distinct active centres carrying out multiple reaction steps: L-glutamine hydrolysis, conversion of 2-oxoglutarate into L-glutamate, and electron uptake from an electron donor [].
Mitogen-activated protein kinase kinase kinase 2/3, PB1 domain
Type:
Domain
Description:
The PB1 domain is present in the two mitogen-activated protein kinase kinase kinases MEKK2 and MEKK3, which are members of the signaling kinase cascade involved in angiogenesis and early cardiovascular development. The PB1 domain of MEKK2 (and/or MEKK3) interacts with the PB1 domain of another member of the kinase cascade, MAP2K5/MEK5 [
]. A canonical PB1-PB1 interaction, which involves heterodimerization of two PB1 domains, is required for the formation of macromolecular signaling complexes ensuring specificity and fidelity during cellular signaling. The interaction between two PB1 domains depends on the type of PB1. There are three types of PB1 domains: type I, which contains an OPCA motif (an acidic amino acid cluster); type II, which contains a basic cluster; and type I/II, which contains both an OPCA motif and a basic cluster. Interactions of PB1 domains with other protein domains have been described as noncanonical PB1-interactions [
]. The MEKK2 and MEKK3 proteins contain a type II PB1 domain.
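The three PB1 types described above can be written as a small decision rule. The following is a minimal Python sketch; the function name and the boolean inputs are hypothetical, and detecting an OPCA motif or a basic cluster in a real sequence is outside its scope.

```python
def classify_pb1(has_opca_motif: bool, has_basic_cluster: bool) -> str:
    """Return the PB1 type implied by the two features named in the text.

    Hypothetical helper: the caller must already know whether the domain
    carries an OPCA motif (acidic cluster) and/or a basic cluster.
    """
    if has_opca_motif and has_basic_cluster:
        return "type I/II"
    if has_opca_motif:
        return "type I"
    if has_basic_cluster:
        return "type II"
    # Neither feature present: the three-way scheme does not apply.
    return "unclassified"


# MEKK2 and MEKK3 are described as carrying a type II PB1 domain,
# i.e. a basic cluster without an OPCA motif.
mekk2_type = classify_pb1(has_opca_motif=False, has_basic_cluster=True)
print(mekk2_type)
```

The "unclassified" branch is an addition for completeness; the entry text itself only names the three types.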
ABC transporter, vitamin B12 import, permease protein BtuC
Type:
Family
Description:
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. Most bacterial importers employ a periplasmic substrate-binding protein (PBP) that delivers the ligand to the extracellular gate of the TM domains. These proteins bind their substrates selectively and with high affinity, which is thought to ensure the specificity of the transport reaction. Binding proteins in Gram-negative bacteria are present within the periplasm, whereas those in Gram-positive bacteria are tethered to the cell membrane via the acylation of a cysteine residue that is an integral component of a lipoprotein signal sequence. In planta expression of a high-affinity iron-uptake system involving the siderophore chrysobactin in Erwinia chrysanthemi 3937 contributes greatly to invasive growth of this pathogen on its natural host, African violets [
]. The cobalamin (vitamin B12) and the iron transport systems share many common attributes and probably evolved from the same origin [,
]. This entry represents bacterial BtuC, which is a component of the vitamin B12 ABC transporter complex [
]. The BtuC proteins are the membrane-spanning subunits which engage with the ATP binding cassette BtuD. Its crystal structure has been resolved [].
This entry represents the RNA recognition motif 1 (RRM1) of DND1, an RNA-binding protein that is essential for maintaining viable germ cells in vertebrates [
,
]. It interacts with the 3'-untranslated region (3'-UTR) of multiple messenger RNAs (mRNAs) and prevents micro-RNA (miRNA) mediated repression of mRNA [,
]. For instance, DND1 binds the transcripts of the cell cycle inhibitor P27 (p27Kip1, CDKN1B) and of the cell cycle regulator and tumor suppressor LATS2 (large tumor suppressor, homologue 2 of Drosophila) []. It helps maintain their protein expression by blocking the inhibitory action of microRNAs (miRNAs) on these transcripts. DND1 may also impose another level of translational regulation to modulate expression of critical factors in embryonic stem (ES) cells. DND1 interacts specifically with apolipoprotein B editing complex 3 (APOBEC3), a multi-functional protein inhibiting retroviral replication. The DND1-APOBEC3 interaction may play a role in maintaining viability of germ cells and in preventing germ cell tumor development [
]. DND1 contains two conserved RNA recognition motifs (RRMs).
Probable RNA-binding protein 19, RNA recognition motif 2
Type:
Domain
Description:
This entry represents the RNA recognition motif 2 (RRM2) of RNA-binding protein 19 (RBM19). RBM19 (also known as RBD-1) is a nucleolar protein conserved in eukaryotes. It is involved in ribosome biogenesis by processing rRNA and is also essential for preimplantation development in mice [
,
]. RBM19 has a unique domain organization containing 6 conserved RNA recognition motifs (RRMs).
Probable RNA-binding protein 19, RNA recognition motif 1
Type:
Domain
Description:
This entry represents the RNA recognition motif 1 (RRM1) of RNA-binding protein 19 (RBM19). RBM19 (also known as RBD-1) is a nucleolar protein conserved in eukaryotes. It is involved in ribosome biogenesis by processing rRNA and is also essential for preimplantation development in mice [
,
]. RBM19 has a unique domain organization containing 6 conserved RNA recognition motifs (RRMs).
Probable RNA-binding protein 19, RNA recognition motif 3
Type:
Domain
Description:
This entry represents the RNA recognition motif 3 (RRM3) of RBM19. RBM19 (also known as RBD-1) is a nucleolar protein conserved in eukaryotes. It is involved in ribosome biogenesis by processing rRNA and is also essential for preimplantation development in mice [
,
]. RBM19 has a unique domain organization containing 6 conserved RNA recognition motifs (RRMs).
Probable RNA-binding protein 19, RNA recognition motif 4
Type:
Domain
Description:
This entry represents the RNA recognition motif 4 (RRM4) of RBM19. RBM19 (also known as RBD-1) is a nucleolar protein conserved in eukaryotes. It is involved in ribosome biogenesis by processing rRNA and is also essential for preimplantation development in mice [
,
]. RBM19 has a unique domain organization containing 6 conserved RNA recognition motifs (RRMs).
Probable RNA-binding protein 19, RNA recognition motif 6
Type:
Domain
Description:
This entry represents the RNA recognition motif 6 (RRM6) of RBM19. RBM19 (also known as RBD-1) is a nucleolar protein conserved in eukaryotes. It is involved in ribosome biogenesis by processing rRNA and is also essential for preimplantation development in mice [
,
]. RBM19 has a unique domain organization containing 6 conserved RNA recognition motifs (RRMs).
Probable RNA-binding protein 19, RNA recognition motif 5
Type:
Domain
Description:
This entry represents the RNA recognition motif 5 (RRM5) of RBM19. RBM19 (also known as RBD-1) is a nucleolar protein conserved in eukaryotes. It is involved in ribosome biogenesis by processing rRNA and is also essential for preimplantation development in mice [
,
]. RBM19 has a unique domain organization containing 6 conserved RNA recognition motifs (RRMs).
Probable RNA-binding protein 46, RNA recognition motif 1
Type:
Domain
Description:
RBM46 is also termed cancer/testis antigen 68 (CT68), a putative RNA-binding protein that shows high sequence homology with heterogeneous nuclear ribonucleoprotein R (hnRNP R) and heterogeneous nuclear ribonucleoprotein Q (hnRNP Q). Its biological function remains unclear. It has been shown to bind to and stabilize Cdx2 mRNA in early mouse embryos [
]. Like hnRNP R and hnRNP Q, RBM46 contains two well-defined and one degenerate RNA recognition motifs (RRMs) and a C-terminal double-stranded RNA binding motif (DSRM). This entry represents the RNA recognition motif 1 (RRM1) of RBM46.
Probable RNA-binding protein 46, RNA recognition motif 2
Type:
Domain
Description:
RBM46 is also termed cancer/testis antigen 68 (CT68), a putative RNA-binding protein that shows high sequence homology with heterogeneous nuclear ribonucleoprotein R (hnRNP R) and heterogeneous nuclear ribonucleoprotein Q (hnRNP Q). Its biological function remains unclear. It has been shown to bind to and stabilize Cdx2 mRNA in early mouse embryos [
]. Like hnRNP R and hnRNP Q, RBM46 contains two well-defined and one degenerate RNA recognition motifs (RRMs) and a C-terminal double-stranded RNA binding motif (DSRM). This entry represents the RNA recognition motif 2 (RRM2) of RBM46.
Zinc finger protein 638, RNA recognition motif 1/2
Type:
Domain
Description:
Zinc finger protein 638 (ZNF638, also known as NP220) is a transcription factor that binds to cytidine clusters in double-stranded DNA [
,
]. It interacts with splicing regulators and influences alternative splicing and may control adipogenesis through regulation of the relative amounts of differentiation-specific isoforms [,
]. ZNF638 also mediates transcriptional repression of unintegrated viral DNA by specifically binding to the cytidine clusters of retroviral DNA and mediating the recruitment of chromatin silencers, such as the HUSH complex, SETDB1 and the histone deacetylases HDAC1 and HDAC4 []. This entry represents the two RNA recognition motifs (RRM) of ZNF638.
RNA-binding protein Musashi homologue, RNA recognition motif 2
Type:
Domain
Description:
This entry represents the RNA recognition motif 2 (RRM2) of Musashi-1, which is a highly conserved RNA binding protein that was initially identified in Drosophila by its ability to regulate sensory organ development and asymmetric cell division [
]. Mammalian Musashi-1 has multiple functions in normal and abnormal processes by mediating different post-transcriptional processes. It has been implicated in the maintenance of the stem-cell state, differentiation, and tumorigenesis. It translationally regulates the expression of a mammalian numb gene by binding to the 3'-untranslated region of the mRNA of Numb, which encodes a membrane-associated inhibitor of Notch signaling, and further influences neural development []. Moreover, Musashi-1 represses translation by interacting with the poly(A)-binding protein and competes for binding of the eukaryotic initiation factor-4G (eIF-4G) []. Proteins containing this domain also include Musashi-2, which has been identified as a regulator of the hematopoietic stem cell (HSC) compartment and of leukemic stem cells after transplantation of cells with loss and gain of function of the gene [
]. It influences proliferation and differentiation of HSCs and myeloid progenitors, and further modulates normal hematopoiesis and promotes aggressive myeloid leukemia [,
]. Musashi-1 and Musashi-2 contain two conserved N-terminal tandem RNA recognition motifs (RRMs), also termed RBDs (RNA binding domains) or RNPs (ribonucleoprotein domains), along with other domains of unknown function.
Mitochondrial distribution and morphology protein family 31/32, fungi
Type:
Family
Description:
Proteins in this family are yeast mitochondrial inner membrane proteins Mdm31 and Mdm32. They are required for the maintenance of mitochondrial morphology, and the stability of mitochondrial DNA [
]. Mdm31 plays important roles in phospholipid biosynthesis in mitochondria [].
DNA mismatch repair protein Msh2, ATP-binding cassette domain
Type:
Domain
Description:
This entry represents the DNA mismatch repair protein Msh2 (homologous to bacterial MutS) from eukaryotes. The Msh2-Msh6 complex recognises base pair mismatches and small insertions/deletions in DNA and initiates repair [
]. Human Msh2-Msh6 complex has been shown to regulate BLM helicase in response to the damaged DNA forks during double-stranded break repair []. Mismatch repair (MMR) is one of five major DNA repair pathways. The mismatch repair system recognises and repairs mispaired or unpaired nucleotides that result from errors in DNA replication. The most extensively studied general MMR system is the MutHLS pathway of the bacterium Escherichia coli. In the first step of the MutHLS pathway, the MutS protein (in the form of a dimer) binds to the site of a mismatch in double-stranded DNA. Through a complex interaction between MutS, MutL and MutH, a section of the newly replicated DNA strand (and thus the strand with the replication error) at the location of the mismatch bound by MutS is targeted for removal [
]. Homologues of MutS have been found in many species including eukaryotes, Archaea and other bacteria, and together these proteins have been grouped into the MutS family. This entry represents the ATP-binding cassette domain of Msh2.
Atypical protein kinase C iota type, catalytic domain
Type:
Domain
Description:
PKCs are classified into three groups (classical, atypical, and novel) depending on their mode of activation and the structural characteristics of their N-terminal regulatory domain [
,
]. Atypical protein kinases C (aPKCs) have a PB1 and an atypical C1 domain, which only accepts phosphatidylserine []. In mammals there are two aPKC isoforms, zeta and iota/lambda (iota is the human orthologue and lambda the mouse orthologue) [
]. aPKCs are involved in many cellular functions including proliferation, migration, apoptosis, polarity maintenance and cytoskeletal regulation [,
]. They also play a critical role in the regulation of glucose metabolism and in the pathogenesis of type 2 diabetes [,
]. PKC-iota is directly implicated in carcinogenesis [
]. It is critical to oncogenic signalling mediated by Ras and Bcr-Abl. The PKC-iota gene is the target of tumour-specific gene amplification in many human cancers, and has been identified as a human oncogene. In addition to its role in transformed growth, PKC-iota also promotes invasion, chemoresistance, and tumour cell survival. Expression profiling of PKC-iota is a prognostic marker of poor clinical outcome in several human cancers []. PKC-iota also plays a role in establishing cell polarity, and has critical embryonic functions [].
Vitelline membrane outer layer protein I (VOMI) superfamily
Type:
Homologous_superfamily
Description:
Vitelline membrane outer layer protein I (VMO-I) is one of the proteins found in the outer layer of the vitelline membrane of eggs. VMO-I, lysozyme, and VMO-II are bound tightly to ovomucin fibrils of the egg yolk membrane. The structure of VMO-I [
] consists of three β-sheets forming Greek key motifs, which are related by an internal pseudo three-fold symmetry. It is a member of the β-prism-fold family and the structure of VMO-I has strong similarity to the structure of the delta-endotoxin, as well as a carbohydrate-binding site in the top region of the common fold []. VMO-I has been shown to synthesize N-acetylchito-oligosaccharides from N-acetylglucosamine.
Type II secretion system protein GspN, conserved site
Type:
Conserved_site
Description:
GspN is a cytoplasmic membrane component of the type II secretion system (T2SS). It is present in T2SS operons of a subset of Gram-negative species, such as Aeromonas hydrophila (gene exeN); Erwinia carotovora (gene outN); Klebsiella pneumoniae (gene pulN); or Vibrio cholerae (gene epsN) [
]. The size of the 'N' protein is around 250 amino acids. It apparently contains a single transmembrane domain located in the N-terminal section. The short N-terminal domain is predicted to be cytoplasmic and the large C-terminal domain periplasmic.
This entry represents a conserved site found towards the N terminus of bacterial type II secretion system protein GspN.
White spot syndrome virus structural envelope protein Vp28
Type:
Domain
Description:
This family of proteins is found in viruses. Proteins in this family are approximately 210 amino acids in length. There is a conserved NNT sequence motif. These proteins are structural envelope proteins in viruses. This is the beta barrel C-terminal domain. There is a protruding N-terminal domain which completes the proteins. Three of four envelope proteins in Shrimp white spot syndrome virus share sequence homology with each other and are present in this family - VP24, VP26 and VP28. VP19 is the other major envelope protein but shares no sequence homology with the other proteins. These proteins are essential for entry into cells of the crustacean host.
ABC transporter, methionine import, ATP-binding protein MetN, proteobacteria
Type:
Family
Description:
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature conserved motif (LSGGQ) specific to the ABC transporter; and the Q-motif or Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the alpha helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
]. Members of this family are the ATP-binding protein of the D-methionine ABC transporter complex from proteobacteria.
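As a rough illustration of the conserved ABC-module motifs described above, the following Python sketch scans a protein sequence for two of them. The LSGGQ signature is taken from the text. The Walker A pattern GxxxxGK[S/T] is the commonly cited P-loop consensus and is an assumption here, as are the function name and the toy sequence, which is invented for the example and is not a real MetN sequence.

```python
import re

# Assumed consensus for the Walker A / P-loop motif: G-x(4)-G-K-[ST].
WALKER_A = re.compile(r"G.{4}GK[ST]")
# ABC signature motif as given in the entry text.
SIGNATURE = re.compile(r"LSGGQ")


def find_abc_motifs(sequence: str) -> dict:
    """Return 0-based start positions of Walker A and signature matches."""
    seq = sequence.upper()
    return {
        "walker_a": [m.start() for m in WALKER_A.finditer(seq)],
        "signature": [m.start() for m in SIGNATURE.finditer(seq)],
    }


# Invented toy sequence, assembled in pieces so the motif positions are clear.
toy_sequence = "MKTLLEV" + "GPSGSGKST" + "LLRAINLLE" + "LSGGQ" + "QRVAIARAL"
hits = find_abc_motifs(toy_sequence)
print(hits)
```

Exact-string matching on LSGGQ is deliberately strict; real family members can deviate from the consensus, so profile-based methods are the usual choice for annotation.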
Autographa californica nuclear polyhedrosis virus (AcMNPV), Protein AC78
Type:
Family
Description:
This entry represents Protein AC78 from Autographa californica nuclear polyhedrosis virus (AcMNPV) and similar proteins from Baculovirus. AC78 is a late gene in the viral life cycle and encodes an envelope structural protein that plays an essential role in embedding the occlusion-derived virus (ODV) in the occlusion body []. Although AC78 is not essential for budding virus formation, nucleocapsid assembly or ODV formation, their numbers are significantly reduced if the gene is knocked out [].
Programmed cell death protein 10, dimerisation domain superfamily
Type:
Homologous_superfamily
Description:
Programmed cell death 10 protein (PDCD10/CCM3) is part of the CCM complex and is required for neuronal migration [
]. Outside of this complex, it is crucial in vascularization and in angiogenesis as it functions in vessel permeability and stability []. This protein plays an essential role in early embryonic angiogenesis and cardiovascular development. PDCD10/CCM3 interacts with a variety of proteins, including paxillin, the membrane receptor vascular endothelial growth factor receptor 2, the CCM complex component CCM2 and germinal centre kinase III proteins (GCKIII). PDCD10/CCM3 contains an N-terminal dimerisation domain and a C-terminal focal adhesion targeting-homology (FAT-H) domain [,
,
]. This entry represents the N-terminal dimerisation domain of PDCD10/CCM3, consisting of four α-helices. This domain is also found at the C-terminus of GCKIIIs (STK24/MST3, STK25, STK26/MST4), where it adopts a closely related fold. GCKIIIs are involved in the regulation of apoptosis, cell proliferation, polarity, migration, and cytoskeleton remodelling. This domain mediates homo- and heterodimerization. PDCD10/CCM3 forms a heterodimer with GCKIIIs analogous to the CCM3 homodimer [
,
].
Beet necrotic yellow vein virus, movement protein TGB3
Type:
Family
Description:
This entry represents Beet necrotic yellow vein virus movement protein TGB3 (also known as p15), which participates in the transport of viral RNA to the plasmodesmata [
].
Non-structural protein NSP3, SUD-C domain, bat CoV HKU9-like
Type:
Domain
Description:
This entry represents the SUD-C domain of Rousettus bat coronavirus (CoV) HKU9 non-structural protein 3 (NSP3) and other NSP3s from betacoronaviruses in the nobecovirus subgenus (D lineage). NSP3 of SARS coronavirus includes a SARS-unique domain (SUD) consisting of three globular domains separated by short linker peptide segments: SUD-N, SUD-M, and SUD-C. SUD-N and SUD-M are macro domains which bind G-quadruplexes (unusual nucleic-acid structures formed by consecutive guanosine nucleotides) [
]. SUD is not as specific to SARS CoV as originally thought and is also found in Rousettus bat CoV HKU9 and related bat CoVs []. Similar to SARS SUD-C, Rousettus bat CoV HKU9 SUD-C (HKU9 C) also adopts a frataxin-like fold that has structural similarity to DNA-binding domains of DNA-modifying enzymes. However, there is little sequence similarity between the two domains. SARS SUD-C has been shown to bind to single-stranded RNA and recognize purine bases more strongly than pyrimidine bases; it also regulates the RNA binding behavior of the SARS SUD-M macrodomain. It is not known whether HKU9 C functions in the same way [].
Autographa californica nuclear polyhedrosis virus (AcMNPV), Protein AC18
Type:
Family
Description:
This entry represents Protein AC18 from Autographa californica nuclear polyhedrosis virus (AcMNPV) and similar proteins from the viral family Baculoviridae. AC18 may play a role in occlusion-derived virions (ODV) formation and/or regulation of late viral gene expression [
]. It interacts with the protein FP25, which is related to polyhedrin formation [].
RecQ-mediated genome instability protein 1, C-terminal OB-fold domain
Type:
Domain
Description:
The dissolvasome is a protein complex that consists of BLM, a RecQ helicase that is the product of the gene mutated in Bloom syndrome, topoisomerase IIIalpha (Top3alpha) and the RMI (RecQ-mediated genome instability) subcomplex, comprised of RMI1 and RMI2. The dissolvasome acts on double Holliday junction intermediates in homologous recombination, creating non-crossover products in a process termed 'dissolution' [
,
]. This entry represents the C-terminal oligonucleotide-binding (OB-fold) domain of RecQ-mediated genome instability protein 1 (RMI1). This domain interacts with RMI2-OB folds to make up the RMI core complex. The RMI core interface is crucial for dissolvasome assembly and may have additional cellular roles as a docking hub for other proteins [
,
].
Type VI secretion system (T6SS), amidase immunity protein
Type:
Family
Description:
Tai4 is a novel type of immunity protein for the type VI secretion system (T6SS). T6SS has roles in interspecies interactions, as well as higher order host-infection, by injecting effector proteins into the periplasmic compartment of the recipient cells of closely related species. Pseudomonas aeruginosa delivers at least three effector proteins to other cells and thus has three specific cognate immunity proteins to protect itself. Tae4, or type VI amidase effector 4, in Enterobacter cloacae has a cognate Tai4 or type VI amidase immunity 4 protein [
]. The effector is Tae4.
Fibronectin type III domain-containing protein 5, C-terminal domain
Type:
Domain
Description:
This domain is found in fibronectin type III domain-containing protein 5 (FNDC5). FNDC5 is cleaved into irisin, a putative myokine that stimulates white-to-brown fat conversion in mice [
], though this role is disputed in humans [,
]. This domain is found C-terminal to the irisin domain [].
Spike (S) protein S1 subunit, N-terminal domain, MERS-CoV-like
Type:
Domain
Description:
The CoV Spike (S) protein is an envelope glycoprotein that plays the most important role in viral attachment, fusion, and entry into host cells, and serves as a major target for the development of neutralizing antibodies, inhibitors of viral entry, and vaccines. It is synthesised as a precursor protein that is cleaved into an N-terminal S1 subunit (~700 amino acids) and a C-terminal S2 subunit (~600 amino acids), which mediate attachment and membrane fusion, respectively. Three S1/S2 heterodimers assemble to form a trimer spike protruding from the viral envelope. The S1 subunit contains a receptor-binding domain (RBD), while the S2 subunit contains a hydrophobic fusion peptide and two heptad repeat regions. S1 contains two structurally independent domains, the N-terminal domain (NTD) and the C-terminal domain (C-domain). Depending on the virus, either the NTD or the C-domain can serve as the receptor-binding domain (RBD). Most CoVs, including SARS-CoV-2, SARS-CoV, and MERS-CoV, use the C-domain to bind their receptors. However, some CoVs use the NTD instead: mouse hepatitis virus (MHV), for example, uses it to bind its receptor, mouse carcinoembryonic antigen-related cell adhesion molecule 1a (mCEACAM1a). The S1 NTD contributes to the Spike trimer interface [
,
,
,
]. This entry represents the N-terminal domain (NTD) of the S1 subunit of the Spike (S) proteins from betacoronaviruses in the merbecovirus subgenus (C lineage), including Middle East respiratory syndrome (MERS) CoV and related bat CoVs.
Spike (S) protein S1 subunit, N-terminal domain, HKU9-like
Type:
Domain
Description:
The CoV Spike (S) protein is an envelope glycoprotein that plays the most important role in viral attachment, fusion, and entry into host cells, and serves as a major target for the development of neutralizing antibodies, inhibitors of viral entry, and vaccines. It is synthesised as a precursor protein that is cleaved into an N-terminal S1 subunit (~700 amino acids) and a C-terminal S2 subunit (~600 amino acids), which mediate attachment and membrane fusion, respectively. Three S1/S2 heterodimers assemble to form a trimer spike protruding from the viral envelope. The S1 subunit contains a receptor-binding domain (RBD), while the S2 subunit contains a hydrophobic fusion peptide and two heptad repeat regions. S1 contains two structurally independent domains, the N-terminal domain (NTD) and the C-terminal domain (C-domain). Depending on the virus, either the NTD or the C-domain can serve as the receptor-binding domain (RBD). Most CoVs, including SARS-CoV-2, SARS-CoV, and MERS-CoV, use the C-domain to bind their receptors. However, some CoVs use the NTD instead: mouse hepatitis virus (MHV), for example, uses it to bind its receptor, mouse carcinoembryonic antigen-related cell adhesion molecule 1a (mCEACAM1a). The S1 NTD contributes to the Spike trimer interface [
,
,
,
]. This entry represents the N-terminal domain (NTD) of the S1 subunit of the Spike (S) proteins from betacoronaviruses in the nobecovirus subgenus (D lineage), including Rousettus bat coronavirus HKU9 and related bat CoVs.
Infectious hypodermal and haematopoietic necrosis virus, capsid protein
Type:
Family
Description:
This entry represents the single capsid protein of infectious hypodermal and haematopoietic necrosis virus (IHHNV, also known as Penaeus stylirostris densovirus, PstDNV), a shrimp densovirus. Densoviruses are a subfamily of the parvoviruses. The capsid protein has an eight-stranded anti-parallel β-barrel 'jelly roll' motif similar to that found in many icosahedral viruses, including other parvoviruses. The N-terminal portion of the IHHNV coat protein adopts a 'domain-swapped' conformation relative to its twofold-related neighbour. The loops connecting the strands of the structurally conserved jelly roll motif differ considerably in structure and length from those of other parvoviruses. IHHNV was first reported as a highly lethal disease of juvenile shrimp in 1983, and has only one type of capsid protein, which lacks the phospholipase A2 activity that has been implicated as a requirement during parvoviral host cell infection. Recombinant virus-like particles are composed of 60 copies of the 37.5 kDa coat protein, which is the smallest parvoviral capsid protein reported thus far. The small size of the PstDNV capsid protein makes the system attractive as a model for studying assembly mechanisms of icosahedral virus capsids [
].
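The subunit count and subunit mass quoted above give a quick estimate of the protein mass of one virus-like particle. The short Python calculation below is only a back-of-the-envelope sketch: the variable names are illustrative, and the result covers the protein shell alone, assuming exactly 60 identical subunits and no packaged nucleic acid.

```python
# Figures taken from the entry text.
COPIES_PER_PARTICLE = 60
SUBUNIT_MASS_KDA = 37.5

# Protein-only mass of one virus-like particle (no genome included).
particle_mass_kda = COPIES_PER_PARTICLE * SUBUNIT_MASS_KDA
particle_mass_mda = particle_mass_kda / 1000

print(f"{particle_mass_kda:.0f} kDa = {particle_mass_mda:.2f} MDa")
```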
Spike (S) protein S1 subunit, N-terminal domain, SARS-CoV-like
Type:
Domain
Description:
This entry represents the N-terminal domain (NTD) of the S1 subunit of the Spike (S) proteins from betacoronaviruses in the sarbecovirus subgenus (B lineage), including the highly pathogenic human coronaviruses (CoVs) severe acute respiratory syndrome (SARS) CoV and SARS-CoV-2, also known as 2019 novel coronavirus (2019-nCoV) or COVID-19 virus [,
,
]. The CoV Spike (S) protein is an envelope glycoprotein that plays the most important role in viral attachment, fusion, and entry into host cells, and serves as a major target for the development of neutralizing antibodies, inhibitors of viral entry, and vaccines. It is synthesised as a precursor protein that is cleaved into an N-terminal S1 subunit (~700 amino acids) and a C-terminal S2 subunit (~600 amino acids), which mediate attachment and membrane fusion, respectively. Three S1/S2 heterodimers assemble to form a trimer spike protruding from the viral envelope. The S1 subunit contains a receptor-binding domain (RBD), while the S2 subunit contains a hydrophobic fusion peptide and two heptad repeat regions. S1 contains two structurally independent domains, the N-terminal domain (NTD) and the C-terminal domain (C-domain). Depending on the virus, either the NTD or the C-domain can serve as the receptor-binding domain (RBD). Most CoVs, including SARS-CoV-2, SARS-CoV, and MERS-CoV, use the C-domain to bind their receptors. However, some CoVs use the NTD instead: mouse hepatitis virus (MHV), for example, uses it to bind its receptor, mouse carcinoembryonic antigen-related cell adhesion molecule 1a (mCEACAM1a). The S1 NTD contributes to the Spike trimer interface [
,
,
,
].