This entry represents the archaeal oligosaccharyl transferases. Members of this protein family occur at one to three copies per genome in the same species of Euryarchaeota that contain the predicted protein-sorting enzyme archaeosortase and its cognate protein-sorting signal PGF-CTERM []. Oligosaccharyl transferase AglB (STT3) is responsible for transferring a pentasaccharide moiety to at least two S-layer glycoprotein sites in Haloferax volcanii []. Dolichyl-phosphooligosaccharide-protein glycotransferase 3 (aglB3) from Archaeoglobus fulgidus is required for the first step in attaching an N-linked carbohydrate to a newly synthesized protein: it transfers a glycan from the lipid carrier dolichol monophosphate to an asparagine residue within an Asn-X-Ser/Thr consensus motif [].
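The Asn-X-Ser/Thr consensus motif that AglB-type enzymes target can be located with a simple pattern scan. This is a minimal sketch, not a predictor of which sites are actually glycosylated: the function name `find_sequons` is hypothetical, and the optional exclusion of proline at the X position is a commonly applied restriction assumed here rather than something stated in the text above.

```python
import re


def find_sequons(seq: str, exclude_proline: bool = True) -> list[int]:
    """Return 1-based positions of Asn residues that start an Asn-X-Ser/Thr motif.

    Hypothetical helper for illustration. The lookahead (?=...) lets
    overlapping motifs be reported, e.g. both Asn residues in "NNST".
    """
    # X is any residue; optionally disallow Pro (an assumed, commonly used rule)
    x = "[^P]" if exclude_proline else "."
    pattern = re.compile(f"(?=N{x}[ST])")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    peptide = "MNASNPTQNGT"  # made-up test peptide
    print(find_sequons(peptide))                         # [2, 9]
    print(find_sequons(peptide, exclude_proline=False))  # [2, 5, 9]
```

Using a lookahead rather than a consuming match means adjacent or overlapping motifs are not silently skipped.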
Protein arginine deiminases (PADs) use a nucleophilic cysteine to hydrolyze guanidinium groups on arginine residues to form citrulline. This reaction, known as citrullination or deimination, results in the loss of positive charge, thereby affecting protein function and altering protein-protein and protein-nucleic acid interactions. Humans encode five PADs, designated PADs 1-4 and PAD6, which regulate numerous cellular processes. PADs are dysregulated in inflammatory diseases and cancer []. The PAD2 monomer consists of two immunoglobulin-like domains, IgG1 (residues 1-115) and IgG2 (residues 116-295), as well as a C-terminal catalytic domain (residues 296-665) []. This superfamily represents the first immunoglobulin-like domain found in protein arginine deiminases.
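The PAD2 domain boundaries given above can be encoded as a small lookup table, for example to label residue numbers when inspecting a structure or a list of modified sites. This is a sketch: the boundaries are the ones stated in the text, while the names `PAD2_DOMAINS` and `pad2_domain` are hypothetical.

```python
from bisect import bisect_right

# (first residue, last residue, label) for PAD2, as given in the text
PAD2_DOMAINS = [
    (1, 115, "IgG1 (first Ig-like domain)"),
    (116, 295, "IgG2 (second Ig-like domain)"),
    (296, 665, "C-terminal catalytic domain"),
]

# Domain start positions, sorted, for binary search
_STARTS = [start for start, _, _ in PAD2_DOMAINS]


def pad2_domain(residue: int) -> str:
    """Return the PAD2 domain label for a 1-based residue number."""
    i = bisect_right(_STARTS, residue) - 1
    if i < 0 or residue > PAD2_DOMAINS[i][1]:
        raise ValueError(f"residue {residue} is outside 1-665")
    return PAD2_DOMAINS[i][2]


print(pad2_domain(100))  # IgG1 (first Ig-like domain)
print(pad2_domain(300))  # C-terminal catalytic domain
```

Binary search over the start positions keeps the lookup correct at the exact boundaries (115/116 and 295/296).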
This entry represents RNA recognition motif 3 (RRM3) of HuC (also known as ELAV-like protein 3), one of the neuronal members of the Hu family. The neuronal Hu proteins play important roles in neuronal differentiation, plasticity and memory []. Like other Hu proteins, HuC contains three RNA recognition motifs (RRMs). RRM1 and RRM2 may cooperate in binding to an AU-rich RNA element (ARE) []. ARE binding by HuC can be inhibited by flavonoids. RRM3 may help to maintain the stability of the RNA-protein complex, and might also bind to poly(A) tails or be involved in protein-protein interactions [].
Sec16 is a multi-domain vesicle coat protein that mediates the movement of protein cargo between the organelles of the secretory pathway. The yeast protein Sec16 plays a key role in the formation of coat protein II (COPII) vesicles, which mediate protein transport from the endoplasmic reticulum (ER) to the Golgi apparatus []. This entry represents the N terminus of the Sec16 protein found in fungi. Overexpression of truncated mutants consisting of only the N terminus is lethal, and this portion does not appear to be essential for function, so it may act as a stabilising region [].
Escherichia coli NADPH-sulphite reductase (SiR) is a multimeric hemoflavoprotein composed of eight alpha-subunits (SiR-FP) and four beta-subunits (SiR-HP) that catalyses the six-electron reduction of sulphite to sulphide. This is one of several activities required for the biosynthesis of L-cysteine from sulphate. The alpha component of NADPH-sulphite reductase is a flavoprotein; the beta component is a hemoprotein []. The flavoprotein component catalyses electron flow from NADPH to FAD to FMN to the hemoprotein component. This entry describes an NADPH-dependent sulphite reductase flavoprotein subunit alpha. It binds one FAD and one FMN as prosthetic groups and contains an NADPH-binding domain [].
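As a worked check of the six-electron figure: the oxidation state of sulphur falls from +4 in sulphite to -2 in sulphide. Assuming, as is standard, that each NADPH supplies two electrons (as a hydride), three NADPH are consumed per sulphite. Sulphide is written as S^{2-} purely for charge bookkeeping (in solution it is largely protonated), so this is a stoichiometric sketch rather than the enzyme's exact reaction equation.

```latex
\begin{aligned}
\Delta n_{e^-} &= (+4) - (-2) = 6 \\
\mathrm{SO_3^{2-}} + 3\,\mathrm{NADPH} + 3\,\mathrm{H^+}
  &\longrightarrow \mathrm{S^{2-}} + 3\,\mathrm{NADP^+} + 3\,\mathrm{H_2O}
\end{aligned}
```

Both sides balance in atoms (1 S, 3 O, 6 H transferred) and in net charge (+1).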
This entry includes the zinc-dependent metalloprotease domain of BMP1 and tolloid-like proteins. Procollagen C-peptidase (BMP1, Bone morphogenetic protein 1; MEROPS identifier M12.005) and TLD (tolloid)-like metalloproteases play vital roles in extracellular matrix formation, by cleaving precursor proteins such as enzymes, structural proteins, and proteins involved in the mineralization of the extracellular matrix. The Drosophila protein tolloid (MEROPS identifier M12.010) and its Xenopus homologue xolloid (MEROPS identifier M12.015) cleave and inactivate Sog and chordin, respectively, which are inhibitors of Dpp (the Drosophila decapentaplegic gene product) and its homologue BMP4, involved in dorso-ventral patterning [, , ].
Sulphate-binding protein (gene sbp or sbpA) and thiosulphate-binding protein (gene cysP) are two structurally related periplasmic bacterial proteins which specifically bind sulphate and thiosulphate and are involved in the transport systems for these nutrients [, ]. These proteins contain two conserved regions, one in the N-terminal region and the other in the central part. The second region includes two adjacent amino acids (Ser-Gly) that, in sbp, are known to be essential for sulphate binding []. This entry represents a conserved site located in the N-terminal region of these proteins.
This protein family is the ATP-binding cassette subunit of a binding protein-dependent ABC transporter complex that strictly co-occurs with
. TIGRFAMs model
describes a protein domain that occurs singly or as one of up to three repeats in proteins of a number of Actinobacteria, including Propionibacterium acnes KPA171202. The
domain occurs both in an adjacent gene for the substrate-binding protein and in additional (often nearby) proteins, often with LPXTG-like sortase recognition signals. Homologous ATP-binding subunits outside the scope of this family include manganese transporter MntA in Synechocystis sp. PCC 6803 and chelated iron transporter subunits. The function of this transporter complex is unknown.
This entry includes LppA, a lipoprotein found in pathogenic mycobacteria. These lipoproteins may play a role in host-pathogen interactions; lipoproteins localised to the cell envelope of pathogenic bacteria are major determinants of virulence. The proteins are localised to the cell surface via N-terminal lipidation, carried out by the pro-lipoprotein diacylglyceryl transferase Lgt, which attaches a diacylglyceryl group to the sulfur atom of a crucial cysteine, and by the subsequently acting lipoprotein signal peptidase LspA, which cleaves the signal peptide just before the modified cysteine. When the peptidase is inactivated, the pathogen has difficulty replicating inside macrophages [].
This family consists of several mammalian selenoprotein S (SelS) sequences. SelS is a plasma membrane protein present in a variety of tissues and cell types. These proteins are involved in the degradation of misfolded endoplasmic reticulum (ER) luminal proteins, participating in their transfer from the ER to the cytosol, where they are destroyed by the proteasome in a ubiquitin-dependent manner []. They probably serve as a linker between DER1, which mediates the retro-translocation of misfolded proteins into the cytosol, and the ATPase complex VCP, which mediates the translocation and ubiquitination.
This entry represents a group of E3 ubiquitin-protein ligases, including UBR1, UBR2 and UBR3, which are part of the N-end rule pathway []. They recognize and bind to proteins bearing specific N-terminal residues, leading to their ubiquitination and subsequent degradation [, ]. The UBR1 protein was shown to bind specifically to proteins bearing N-terminal residues that are destabilising according to the N-end rule, but not to otherwise identical proteins bearing stabilising N-terminal residues []. UBR1 contains an N-terminal conserved region (the UBR-type zinc finger) which is also found in various proteins implicated in N-degron recognition [].
E3 ubiquitin-protein ligase Trim36, RING finger, HC subclass
Type:
Domain
Description:
The tripartite motif-containing protein (Trim) family is defined by the presence of a common domain structure composed of a RING finger, a B-box, and a coiled-coil motif []. Trim36 is an E3 ubiquitin-protein ligase that mediates ubiquitination and subsequent proteasomal degradation of target proteins. It interacts with centromere protein H (CENP-H), a kinetochore protein, and is involved in chromosome segregation and cell cycle regulation. This protein has been associated with anencephaly [, , ]. It may play a role in the acrosome reaction and fertilisation [, ]. This entry represents the HC-RING finger found at the N terminus of Trim36.
RING1 and YY1-binding protein/YY1-associated factor 2
Type:
Family
Description:
Proteins included in this entry have an N-terminal RanBP2-type zinc finger and a Yaf2/RYBP C-terminal binding motif. They include the Yaf2 and RYBP proteins, which are homologous components of the PRC1 complex []. RYBP is a zinc finger protein with an essential role during embryonic development, which binds transcription factors, Polycomb products and mediators of apoptosis []. RYBP also binds ubiquitin and Cbx proteins via its C-terminal docking module [, ]. RYBP is natively unstructured until it binds to the C-terminal region of the Polycomb protein Ring1B or to DNA []. Yaf2 binds to MYC and inhibits MYC-mediated transactivation [].
TTHA0281 is a hypothetical protein from Thermus thermophilus HB8 that belongs to the uncharacterized protein family UPF0150. The TTHA0281 monomer adopts an α-β-β-β-α fold and forms a homotetramer. Based on the properties and functions of structural homologues of the TTHA0281 monomer, such as the HicB protein from the HicAB cassette toxin-antitoxin system [], the TTHA0281 protein is speculated to be involved in RNA metabolism, including RNA binding and cleavage []. TTHA1013 is annotated as a hypothetical protein from Thermus thermophilus HB8. The overall fold of this protein consists of one α-helix and four β-strands [].
The HEAT repeat is a tandemly repeated module, 37-47 amino acids long, occurring in a number of cytoplasmic proteins, including the four name-giving proteins huntingtin, elongation factor 3 (EF3), the 65 kDa alpha regulatory subunit of protein phosphatase 2A (PP2A) and the yeast PI3-kinase TOR1 []. Arrays of HEAT repeats consist of 3 to 36 units forming a rod-like helical structure and appear to function as protein-protein interaction surfaces. It has been noted that many HEAT repeat-containing proteins are involved in intracellular transport processes. In the crystal structure of PP2A PR65/A [], the HEAT repeats consist of pairs of antiparallel α-helices [].
Protein arginine deiminases (PADs) use a nucleophilic cysteine to hydrolyze guanidinium groups on arginine residues to form citrulline. This reaction, known as citrullination or deimination, results in the loss of positive charge, thereby affecting protein function and altering protein-protein and protein-nucleic acid interactions. Humans encode five PADs, designated PADs 1-4 and PAD6, which regulate numerous cellular processes. PADs are dysregulated in inflammatory diseases and cancer []. The PAD2 monomer consists of two immunoglobulin-like domains, IgG1 (residues 1-115) and IgG2 (residues 116-295), as well as a C-terminal catalytic domain (residues 296-665) []. This entry represents the first immunoglobulin-like non-catalytic domain of protein-arginine deiminase.
Holliday junction recognition protein, HJURP, central domain
Type:
Domain
Description:
This entry represents a central, conserved region found in human Holliday junction recognition protein (HJURP) and similar proteins from vertebrates. This domain is not present in Holliday junction recognition proteins from fungi. However, both the N-terminal domain and a repeated domain further downstream, also of unknown function, occur in both vertebrate and fungal Holliday junction recognition proteins. HJURP is a conserved non-histone protein that interacts physically with the CENP-A/H4 heterotetramer (CENP-A being the centromere-specific histone H3 variant) and is essential for CENP-A deposition at centromeres in vivo [, , , , ]. This protein has been implicated in many cancers [].
This domain occurs as the C-terminal region in a number of proteins that have extensive collagen-like triple helix repeat regions. Member domains are predicted by TMHMM to have four or five transmembrane helices. Proteins in this entry are found mostly in the Firmicutes, but also in Acanthamoeba polyphaga mimivirus. This entry includes spore surface glycoprotein BclB from Bacillus anthracis, a protein of the exosporium. The exosporium is an additional outermost spore layer, lacking in Bacillus subtilis and most other spore formers, consisting of a basal layer and, above it, a nap of fine filaments. Proteins containing this domain also include some uncharacterised virus proteins.
This entry represents capsid proteins from bacteriophage P2 (GpN) and similar proteins from tailed bacteriophages and bacterial prophages, mainly among Proteobacteria. GpN undergoes proteolytic cleavage into three products:
- Minor capsid protein H1, a 39 kDa protein.
- Minor capsid protein H2, 38.6 kDa.
- Major capsid protein N*, 36.7 kDa [].
The capsid of P4, the satellite phage of P2, consists of those same gene products encoded by P2, although in this case the resulting capsids are smaller than those usually formed by P2 alone. P2 procapsids are built from subunits with the HK97 fold and have T = 7 dextro (T = 7d) symmetry [, ].
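The triangulation number fixes how many subunits an icosahedral shell contains. Assuming an idealised Caspar-Klug lattice with one major capsid protein per lattice position and ignoring any special (for example portal) vertex, which is an assumption since the text only gives T = 7, a T = 7 procapsid works out as follows:

```latex
\begin{aligned}
N_{\text{subunits}}  &= 60T = 60 \times 7 = 420 \\
N_{\text{pentamers}} &= 12 \quad (12 \times 5 = 60 \text{ subunits}) \\
N_{\text{hexamers}}  &= 10(T - 1) = 10 \times 6 = 60 \quad (60 \times 6 = 360 \text{ subunits})
\end{aligned}
```

The pentamer and hexamer contributions (60 + 360) sum to the 420 subunits given by 60T.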
The gamma-secretase-activating protein (GSAP) family includes the mammalian GSAPs and the insect pigeon proteins (also known as linotte proteins). GSAP is a gamma-secretase regulator. It specifically activates the production of beta-amyloid protein through interactions with both gamma-secretase and its substrate, the amyloid precursor protein carboxy-terminal fragment (APP-CTF) []. This has led to interest in the protein as a potential therapeutic target for the treatment of Alzheimer's disease []. Pigeon/linotte was initially identified as a gene that functions in adult Drosophila during associative learning []. This entry represents a domain found in the C-terminal region of GSAP family members.
Arteriviruses are enveloped, positive-stranded RNA viruses and include pathogens of major economic concern to the swine- and horse-breeding industries:
- Equine arteritis virus (EAV).
- Porcine reproductive and respiratory syndrome virus (PRRSV).
- Lactate dehydrogenase-elevating virus (LDV) of mice.
- Simian hemorrhagic fever virus.
The arterivirus replicase gene is composed of two open reading frames (ORFs). ORF1a is translated directly from the genomic RNA, whereas ORF1b can be expressed only by ribosomal frameshifting, yielding a 1ab fusion protein. Both replicase gene products are multidomain precursor proteins that are proteolytically processed into functional nonstructural proteins (nsps) by a complex proteolytic cascade directed by four (PRRSV/LDV) or three (EAV) proteinase domains encoded in ORF1a. The arterivirus replicase processing scheme involves the rapid autoproteolytic release of two or three N-terminal nsps (nsp1, or nsp1alpha and nsp1beta, and nsp2) and the subsequent processing of the remaining polyproteins by the "main protease" residing in nsp4, together resulting in a set of 13 or 14 individual nsps. The arterivirus nsp1 region contains a tandem of papain-like cysteine autoprotease domains (PCPalpha and PCPbeta), but in EAV PCPalpha has lost its enzymatic activity, resulting in the 'merge' of nsp1alpha and nsp1beta into a single nsp1 subunit. Thus, instead of three self-cleaving N-terminal subunits, EAV has two: nsp1 and nsp2. The PCPalpha and PCPbeta domains mediate the nsp1alpha|1beta and nsp1beta|2 cleavages, respectively. The catalytic dyad of the PCPalpha and PCPbeta domains is composed of Cys and His residues. In EAV, a Lys residue is found in place of the catalytic Cys residue, which explains the proteolytic deficiency of the EAV PCPalpha domain [, , , ]. The PCPalpha and PCPbeta domains form MEROPS peptidase families C31 and C32, respectively.
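The difference between the PRRSV and EAV processing schemes can be modelled as a toy function that walks the nsp1 region and cuts only at junctions whose papain-like protease keeps the catalytic Cys of its Cys/His dyad. This is an illustrative sketch of the logic described above, not a sequence-based predictor; the `Protease` record layout and the function name `n_terminal_products` are hypothetical, and release of nsp2 from downstream nsps is outside the model.

```python
from dataclasses import dataclass


@dataclass
class Protease:
    """Toy record for one papain-like protease domain (hypothetical layout)."""
    name: str         # e.g. "PCPalpha"
    nucleophile: str  # residue found at the catalytic Cys position
    site: str         # junction it would cleave, written "left|right"


def n_terminal_products(proteases: list[Protease]) -> list[str]:
    """Return the N-terminal subunits released by the nsp1-region PCPs.

    A junction is cut only if its protease keeps the catalytic Cys;
    otherwise the flanking subunits stay fused ("a+b"). The last item
    stands for the remaining polyprotein starting at that nsp.
    """
    products: list[str] = []
    current: list[str] = []
    for p in proteases:
        left, right = p.site.split("|")
        if not current:
            current.append(left)
        if p.nucleophile == "Cys":
            products.append("+".join(current))  # cleavage releases the fused block
            current = [right]
        else:
            current.append(right)  # inactive protease: junction stays uncut
    products.append("+".join(current))
    return products


prrsv = [
    Protease("PCPalpha", "Cys", "nsp1alpha|nsp1beta"),
    Protease("PCPbeta", "Cys", "nsp1beta|nsp2"),
]
eav = [
    Protease("PCPalpha", "Lys", "nsp1alpha|nsp1beta"),  # Lys in place of Cys
    Protease("PCPbeta", "Cys", "nsp1beta|nsp2"),
]

print(n_terminal_products(prrsv))  # ['nsp1alpha', 'nsp1beta', 'nsp2']
print(n_terminal_products(eav))    # ['nsp1alpha+nsp1beta', 'nsp2']
```

In the EAV case the fused "nsp1alpha+nsp1beta" product corresponds to the single nsp1 subunit described in the text.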
The PCPalpha and PCPbeta domains have a typical papain fold, which consists of a compact globular region containing sequentially connected left (L) and right (R) parts in the so-called standard orientation. The L subdomain of PCPalpha consists of four α-helices, while the R subdomain is formed by three antiparallel β-strands []. The L subdomain of PCPbeta consists of three α-helices, while the R subdomain is formed by four antiparallel β-strands []. The Cys and His residues face each other at the L-R interface and form the catalytic centre of the PCPalpha and PCPbeta domains [, ]. This entry represents the PCPalpha domain (peptidase C31).
A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, which orientates and activates the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase), an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. On the basis of sequence similarity, cysteine peptidases can be clustered into over 80 different families []. Clans CF, CM, CN, CO, CP and PD contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol-blocking reagents such as iodoacetate (iodoacetic acid), N-ethylmaleimide or p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [].
Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues []. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds.
Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One domain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other.
Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel []. The active site consists of a His/Cys catalytic dyad.
Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
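The CA/CE relationship described above can be checked mechanically: if one fold is a circular permutation of the other, the order of active-site residues in one clan should be a cyclic rotation of the order in the other. The sketch below encodes only the residue orders stated in the text (the names `CLAN_ORDER` and `is_circular_permutation` are hypothetical; clan CL is omitted because its dyad is given without a stated sequence order). A matching rotation is consistent with, but does not prove, the permutation hypothesis.

```python
# Active-site residue order (N- to C-terminal) as stated in the text above.
# Clan CA: the Gln that helps stabilise the acyl intermediate precedes the
# Cys/His/Asn triad (Asn may be Asp in some members).
CLAN_ORDER: dict[str, tuple[str, ...]] = {
    "CA": ("Gln", "Cys", "His", "Asn"),
    "CD": ("His", "Cys"),
    "CE": ("His", "Asn", "Gln", "Cys"),
}


def is_circular_permutation(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    """True if b can be obtained from a by moving a prefix of a to its end."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = a + a  # every rotation of a appears as a window of a + a
    return any(doubled[i:i + len(a)] == b for i in range(len(a)))


print(is_circular_permutation(CLAN_ORDER["CA"], CLAN_ORDER["CE"]))  # True
print(is_circular_permutation(CLAN_ORDER["CA"], CLAN_ORDER["CD"]))  # False
```

Rotating the CA order (Gln, Cys, His, Asn) by two positions gives exactly the CE order (His, Asn, Gln, Cys).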
This entry represents the C-terminal domain found in the Timeless (TIM) proteins. This domain is found in TIM homologues, mostly from animals. In human TIM (hTIM), this domain has been shown to bind to the PARP-1 catalytic domain []. The timeless gene in Drosophila melanogaster is involved in circadian rhythm control []. Drosophila contains two paralogs, dTIM and dTIM2, acting in clock/photoreception and chromosome integrity/photoreception, respectively. The mammalian TIMELESS (TIM) protein was originally identified based on its similarity to Drosophila dTIM, which interacts with the clock proteins dCRY and dPER and is essential for circadian rhythm generation and photo-entrainment in the fly []. However, phylogenetic sequence analysis has demonstrated that dTIM2 is likely to be the orthologue of mammalian TIM and other widely conserved TIM-like proteins in eukaryotes []. These proteins include Saccharomyces cerevisiae Tof1, Schizosaccharomyces pombe Swi1 and Caenorhabditis elegans TIM. They are not involved in the core clock mechanism, but instead play important roles in chromosome integrity, efficient cell growth and/or development [, ]; dTIM2 additionally functions in retinal photoreception [].
Saccharomyces cerevisiae Tof1 is a subunit of a replication-pausing checkpoint complex (Tof1-Mrc1-Csm3) that acts at the stalled replication fork to promote sister chromatid cohesion after DNA damage, facilitating gap repair of damaged DNA [, ]. Schizosaccharomyces pombe Swi1 and Swi3 form the fork protection complex that coordinates leading- and lagging-strand synthesis and stabilizes stalled replication forks []. In humans, Timeless forms a stable complex with its partner protein Tipin. The Timeless-Tipin complex has been reported to travel with the replication fork during unperturbed DNA replication. Moreover, the Timeless-Tipin-Claspin complex contributes to full activation of the ATR-Chk1 signaling pathway by recruiting Chk1 to arrested replication forks for sufficient ATR-mediated phosphorylation. Timeless also interacts with PARP-1, and this interaction is required for efficient homologous recombination repair [].
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents a CW-type zinc finger motif, named for its conserved cysteine and tryptophan residues. It is predicted to be a highly specialised mononuclear four-cysteine (C4) zinc finger that plays a role in DNA binding and/or promoting protein-protein interactions in complicated eukaryotic processes including chromatin methylation status and early embryonic development. Weak homology to members of
further supports these predictions. The domain is found exclusively in vertebrates, vertebrate-infecting parasites and higher plants [
].
The SEP (after shp1, eyc and p47) domain is a eukaryotic domain that occurs frequently, mainly as a single copy. In almost all proteins containing a SEP domain, it is closely followed by a UBX domain (see ). The function of the SEP domain is as yet unknown, but it has been proposed to act as a reversible competitive inhibitor of the lysosomal cysteine protease cathepsin L [, ]. The structure of the SEP domain comprises a four-stranded β-sheet and two α-helices. One side of the β-sheet faces alpha1 and alpha2. The longer helix alpha1 packs against the four-stranded β-sheet, whereas the shorter helix alpha2 is located at one edge of the globular structure formed by alpha1 and the four-stranded β-sheet. A number of highly conserved hydrophobic residues are present in the SEP domain; these are predominantly buried and form the hydrophobic core [, ].
Some proteins known to contain a SEP domain are listed below:
- Eukaryotic NSFL1 cofactor p37 (or p97 cofactor p37), an adapter protein required for Golgi and endoplasmic reticulum biogenesis. It is involved in Golgi and endoplasmic reticulum maintenance during interphase and in their reassembly at the end of mitosis.
- Eukaryotic NSFL1 cofactor p47 (or p97 cofactor p47), a major adaptor molecule of the cytosolic AAA-type ATPase (ATPases associated with various cellular activities) p97. p47 is required for the p97-regulated membrane reassembly of the endoplasmic reticulum (ER), the nuclear envelope and the Golgi apparatus.
- Vertebrate UBX domain-containing protein 4 (UBXD4).
- Plant UBA and UBX domain-containing protein.
- Saccharomyces cerevisiae (Baker's yeast) UBX domain-containing protein 1 or Suppressor of high-copy PP1 protein (shp1), the homologue of p47.
- Drosophila melanogaster (Fruit fly) eyes closed (eyc).
Zinc finger (Znf) domains are relatively small protein motifs which contain multiple finger-like protrusions that make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This zinc-finger like domain is distributed throughout the eukaryotic kingdom in NIPA (Nuclear interacting partner of ALK) and other proteins. NIPA is thought to perform an antiapoptotic role in nucleophosmin-anaplastic lymphoma kinase (ALK) mediated signalling events [
]. The domain is often repeated, with the second domain usually containing a large insert (approximately 90 residues) after the first three cysteine residues. The Schizosaccharomyces pombe protein containing this domain () is involved in mRNA export from the nucleus [
].
This entry includes retinoblastoma-associated protein (RB, also known as pRb or p105), retinoblastoma-like protein 1 (RBL1, also known as p107) and retinoblastoma-like protein 2 (RBL2, also known as RB2 or p130). Members of this entry contain a conserved domain named the 'pocket' that interacts with the LXCXE motif found in viral proteins, such as SV40 large T antigen []. This pocket consists of A- and B-boxes, which are found at the C-terminal end of the protein. This entry represents the B-box, which lies on the C-terminal side of the A-box. The crystal structure of the RB pocket bound to a nine-residue E7 peptide containing the LxCxE motif, shared by other RB-binding viral and cellular proteins, shows that the LxCxE peptide binds a highly conserved groove on the B-box portion of the pocket; the A-box portion (see ) appears to be required for the stable folding of the B-box. The extensive A-B interface is also highly conserved, suggesting that it may be an additional protein-binding site. The A and B boxes each contain the cyclin-fold structural motif, with the LxCxE-binding site on the B-box cyclin fold being similar to a Cdk2-binding site of cyclin A and to a TBP-binding site of TFIIB [, ].
In humans, RB is a tumour suppressor linked to several major cancers []. RB forms complexes with E2Fs and represses gene expression by recruiting chromatin remodeling factors, such as histone deacetylases (HDACs), to E2F-responsive promoters [, ]. Apart from E2Fs, RB also interacts with other transcription factors that govern cell differentiation [, ].
RBL1 and RBL2 are components of the DREAM complex, which represses cell cycle-dependent genes in quiescent cells and plays a role in the cell cycle-dependent activation of G2/M genes [, ].
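The LxCxE motif recognised by the B-box groove is a short linear pattern, so candidate occurrences can be listed with a regular expression. This is a minimal sketch: a pattern match alone says nothing about whether a protein actually binds the RB pocket, and the names `LXCXE` and `find_lxcxe` are hypothetical.

```python
import re

# LxCxE: Leu, any residue, Cys, any residue, Glu.
# The lookahead allows overlapping hits to be reported.
LXCXE = re.compile(r"(?=L.C.E)")


def find_lxcxe(seq: str) -> list[int]:
    """Return 1-based start positions of LxCxE motifs in a protein sequence."""
    return [m.start() + 1 for m in LXCXE.finditer(seq.upper())]


print(find_lxcxe("MHGDTPTLHEYMLDLQPETTDLYCYEQL"))  # [22]
```

Positions are reported 1-based to match the residue numbering used in structural descriptions.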
This entry represents the phosphatase domain within eukaryotic myotubularin-related proteins. Myotubularin is a dual-specific lipid phosphatase that dephosphorylates phosphatidylinositol 3-phosphate and phosphatidylinositol (3,5)-bi-phosphate [
]. Mutations in gene encoding myotubularin-related proteins have been associated with disease []. The protein exists as a dimer with twofold symmetry, in which the dimerization is mediated by the phosphatase domain [].Myotubularin phosphatases are members of the protein tyrosine phosphatase (PTP) superfamily. The PTP domain is found in a diverse group of enzymes that catalyse phosphoester hydrolysis using a cysteine nucleophile and an arginine residue that binds to oxygen atoms of the phosphate. These two catalytically essential residues are found in a Cys-x(5)-Arg motif, which is a hallmark of PTP domains. The PTP superfamily of enzymes includes tyrosine-specific, dual specificity, low molecular weight, and Cdc25 phosphatases. All of these enzymes utilise phosphoproteins as substrates. Unlike these members of PTPs, enzymes that contain the tensin and myotubularin PTP domain utilise the phosphoinositide as its physiologic substrate. Myotubularins are 3-phosphatases specific for membrane-embedded PtdIns3P and PtdIns(3,5)P2, two PIs that function within the endosomal-lysosomal pathway [,
]. The myotubularin phosphatase domain consists of a central seven-stranded β-sheet flanked by thirteen α-helices [
,
]. Although its core structure is similar to that of other PTP superfamily members, the myotubularin phosphatase domain is much larger. It contains an extra C-terminal region that may be involved in protein-protein interactions. The active-site motif forms a P-loop at the base of a substrate-binding pocket that is characteristic of PTP domains. This pocket is significantly deeper than those of other PTPs, which could explain the difference in substrate specificity. The myotubularin family includes catalytically inactive members, or pseudophosphatases, which contain inactivating substitutions in the phosphatase domain [
].
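The Cys-x(5)-Arg signature can be searched for with a regular expression, and a sequence lacking it can be flagged as a candidate pseudophosphatase. This is a minimal Python sketch under stated assumptions: the helper names and the short test strings are hypothetical, and motif presence or absence is only a first-pass heuristic, not a substitute for structure-aware or profile-based analysis.

```python
import re

# PTP signature: catalytic Cys, any five residues, then the phosphate-binding Arg.
PTP_SIGNATURE = re.compile(r"(?=(C.{5}R))")


def ptp_signature_sites(seq: str) -> list[int]:
    """Return the 1-based position of the Cys in every Cys-x(5)-Arg match."""
    return [m.start() + 1 for m in PTP_SIGNATURE.finditer(seq.upper())]


def is_candidate_pseudophosphatase(seq: str) -> bool:
    """Crude flag: True when the sequence contains no intact Cys-x(5)-Arg signature."""
    return not ptp_signature_sites(seq)


active = "HCSDGWDRTAQ"    # hypothetical intact signature (Cys at position 2)
inactive = "HLSDGWDRTAQ"  # same string with the Cys replaced by Leu
print(ptp_signature_sites(active), is_candidate_pseudophosphatase(inactive))  # [2] True
```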
Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions, which make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents a zinc finger motif found in nuclear hormone receptors and in erythroid transcription factor GATA-1. Nuclear hormone receptors usually have two copies of this motif, while GATA-1 has one copy. The zinc fingers in nuclear receptors are generally regarded as DNA-binding domains [
], while those in GATA-1 have been implicated in protein recognition (of FOG proteins) [,
].
Rare lipoprotein A (RlpA) contains a conserved region that has the double-psi β-barrel (DPBB) fold [
,
]. RlpA is a bacterial septal ring protein and a lytic transglycosylase that contributes to rod shape and daughter cell separation in Pseudomonas aeruginosa []. It has been shown to act as a suppressor of prc mutants in Escherichia coli []. The DPBB fold is often an enzymatic domain. Proteins containing this domain are quite diverse and may have several different functions. Another example of this domain is found in the N terminus of pollen allergens []. Some studies show that full-length RlpA from Pseudomonas aeruginosa is an outer membrane protein and a lytic transglycosylase with specificity for peptidoglycan lacking stem peptides. Residue D157 in Pseudomonas aeruginosa RlpA is critical for lytic activity []. Beta barrels are commonly observed in protein structures. They are classified in terms of two integral parameters: the number of strands in the sheet, n, and the shear number, S, a measure of the stagger of the strands in the β-sheet. These two parameters have been shown to determine the major geometrical features of β-barrels. Six-stranded β-barrels with a pseudo-twofold axis are found in several proteins. One such barrel, in which parallel strands form two psi structures, is known as the double-psi barrel. The first psi structure consists of the loop connecting strands beta1 and beta2 (a 'psi loop') and strand beta5, whereas the second psi structure consists of the loop connecting strands beta4 and beta5 and strand beta2. All the psi structures in double-psi barrels have a unique handedness, in that beta1 (beta4), beta2 (beta5) and the loop following beta5 (beta2) form a right-handed helix. This unique handedness may be related to the fact that the twisting angle between the parallel pair of strands is always larger than that between the antiparallel pair [].
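The statement that n and S determine the main geometric features of a β-barrel can be made concrete with the idealised barrel model of Murzin, Lesk and Chothia, which is not described in this entry and is brought in here as an assumption. In that model the strand tilt α satisfies tan α = S·a / (n·b) and the barrel radius is R = b / (2·sin(π/n)·cos α), with a ≈ 3.3 Å (rise per residue along a strand) and b ≈ 4.4 Å (distance between adjacent strands). The Python sketch below simply evaluates these formulas; the shear number in the example is hypothetical and is not a claim about the DPBB fold.

```python
import math

# Idealised geometry constants (assumed values from the standard barrel model).
RISE_PER_RESIDUE = 3.3  # a, in angstroms
STRAND_SPACING = 4.4    # b, in angstroms


def barrel_geometry(n: int, s: int) -> tuple[float, float]:
    """Return (strand tilt angle in degrees, barrel radius in angstroms)
    for a barrel with n strands and shear number s."""
    tilt = math.atan((s * RISE_PER_RESIDUE) / (n * STRAND_SPACING))
    radius = STRAND_SPACING / (2.0 * math.sin(math.pi / n) * math.cos(tilt))
    return math.degrees(tilt), radius


# Six strands with a hypothetical shear number of 10.
tilt_deg, radius = barrel_geometry(6, 10)
print(f"tilt = {tilt_deg:.1f} deg, radius = {radius:.2f} A")  # tilt = 51.3 deg, radius = 7.04 A
```

Increasing S at fixed n tilts the strands more steeply and widens the barrel, which is one sense in which the two integers fix the overall geometry.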
Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions, which make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the CysCysHisCys (CCHC)-type zinc finger domains, which have the sequence: C-X2-C-X4-H-X4-C
where X can be any amino acid and each number indicates the number of residues. These 18-residue CCHC zinc finger domains are found mainly in the nucleocapsid proteins of retroviruses, where they are required for viral genome packaging and for early stages of infection [
,
,
]. It is also found in eukaryotic proteins involved in RNA binding or single-stranded DNA binding [].
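The C-X2-C-X4-H-X4-C notation translates directly into a regular expression, with each X element becoming a run of that many arbitrary residues. The Python sketch below assumes only the simple notation used in this entry (literal residues, and X with an optional fixed count; ranges such as X(2,5) are not handled), and the function names and test string are hypothetical. Note that the core pattern itself spans 14 positions from the first to the last cysteine.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Convert notation such as 'C-X2-C-X4-H-X4-C' into a regular expression.

    Each hyphen-separated element is either a literal residue ('C', 'H', ...)
    or 'X' followed by an optional count ('X4' = any four residues).
    """
    parts = []
    for element in pattern.upper().split("-"):
        if element.startswith("X"):
            count = element[1:] or "1"
            parts.append(f".{{{count}}}")
        else:
            parts.append(element)
    return "".join(parts)


def find_motif(pattern: str, seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each occurrence of the pattern."""
    regex = re.compile(f"(?=({pattern_to_regex(pattern)}))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(seq.upper())]


CCHC = "C-X2-C-X4-H-X4-C"
print(pattern_to_regex(CCHC))                    # C.{2}C.{4}H.{4}C
print(find_motif(CCHC, "MVKCFNCGKEGHIAKNCRA"))   # [(4, 'CFNCGKEGHIAKNC')]
```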
Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions, which make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This model describes a putative zinc finger domain found in three closely spaced copies in Arabidopsis protein LSD1 and in two copies in other proteins from the same species. The motif resembles CxxCRxxLMYxxGASxVxCxxC [
]. This domain may play a role in the regulation of transcription, via either repression of a prodeath pathway or activation of an antideath pathway, in response to signals emanating from cells undergoing pathogen-induced hypersensitive cell death.
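Because the LSD1 motif is described as resembling CxxCRxxLMYxxGASxVxCxxC rather than matching it exactly, a scan for it has to tolerate some mismatches. One way to sketch this, under assumptions that are ours and not from the source, is to slide the 22-residue consensus along a sequence, require all four cysteines (presumed zinc ligands) to be present, and allow a small number of mismatches at the other specified positions. The function name, the mismatch threshold and the test strings are hypothetical.

```python
CONSENSUS = "CxxCRxxLMYxxGASxVxCxxC"  # lowercase x = any residue


def lsd1_like_windows(seq: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """Return (1-based start, mismatch count) for each window that keeps every
    consensus Cys and differs from the other specified residues at most
    max_mismatches times."""
    seq = seq.upper()
    width = len(CONSENSUS)
    hits = []
    for start in range(len(seq) - width + 1):
        mismatches = 0
        for expected, observed in zip(CONSENSUS, seq[start:start + width]):
            if expected == "x" or expected == observed:
                continue
            if expected == "C":  # a missing Cys disqualifies the window
                break
            mismatches += 1
        else:  # loop finished without hitting a missing Cys
            if mismatches <= max_mismatches:
                hits.append((start + 1, mismatches))
    return hits


exact = "CAACRAALMYAAGASAVACAAC"  # consensus with every x set to A
print(lsd1_like_windows("GG" + exact))  # [(3, 0)]
```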
Polo or Polo-like kinases, a subgroup of serine/threonine protein kinases, play multiple roles during the cell cycle. Polo kinases are required at several key points through mitosis, starting with control of the G2/M transition through phosphorylation of Cdc25C and mitotic cyclins. They are also involved in meiosis I as regulators of kinetochore function [
,
]. Polo kinases are characterised by an amino-terminal catalytic domain and a carboxy-terminal non-catalytic domain consisting of three blocks of conserved sequences, known as polo boxes, which form a single functional domain [
]. The domain is named after its founding member, encoded by the polo gene of Drosophila melanogaster []. This domain of around 70 amino acids has been found in species ranging from yeast to mammals. Polo boxes appear to mediate interactions with multiple proteins; some, but not all, of these proteins are substrates for the kinase domain of the molecule []. The crystal structure of the polo domain of the murine protein Sak is dimeric, consisting of two α-helices and two six-stranded β-sheets [
]. The topology of one polypeptide subunit of the dimer consists of, from N to C terminus, an extended strand segment, five β-strands, one α-helix (A) and a C-terminal β-strand. The β-strands from one subunit form a contiguous antiparallel β-sheet with β-strands from the second subunit. The two β-sheets pack with a crossing angle of 110 degrees, orienting their hydrophobic surfaces inward and their hydrophilic surfaces outward. Helix A, which is colinear with β-strand 6 of the same polypeptide, buries a large portion of the non-overlapping hydrophobic β-sheet surfaces. Interactions involving the two A helices make up the majority of the hydrophobic core and also of the dimer interface. Point mutations in the Polo box of the budding yeast Cdc5 protein abolish the ability of overexpressed Cdc5 to interact with the spindle poles and to organise cytokinetic structures [
].
Proteins in this family include the envelope glycoprotein and the pre-small/secreted glycoprotein from Filoviridae [
]. The envelope glycoprotein can be cleaved into three chains: GP1, GP2 and GP2-delta. GP1 is responsible for binding to receptors, such as CD209 and CLEC4M, on target cells. These interactions not only facilitate virus entry into cells, but also allow capture of viral particles by dendritic cells (DCs) and subsequent transmission to susceptible cells without infection of the DCs themselves (trans infection) [
]. GP2 acts as a class I viral fusion protein. It is responsible for penetration of the virus into the cell cytoplasm by mediating the fusion of the membrane of the endocytosed virus particle with the endosomal membrane [
]. GP1,2 peplomers mediate endothelial cell activation and decrease endothelial barrier function. They also mediate activation of primary macrophages. At terminal stages of viral infection, when its expression is high, GP1,2 down-modulates the expression of various host cell surface molecules that are essential for immune surveillance and cell adhesion [
]. GP2delta is part of the GP1,2delta complex released by the host ADAM17 metalloprotease. This secreted complex may play a role in the pathogenesis of the virus by efficiently blocking the neutralizing antibodies that would otherwise neutralize the virus surface glycoproteins GP1,2. It might therefore contribute to the lack of inflammatory reaction seen during infection in spite of the extensive necrosis and massive virus production. GP1,2delta does not seem to be involved in activation of primary macrophages [
]. The pre-small/secreted glycoprotein (sGP) seems to possess anti-inflammatory activity, as it can reverse the barrier-decreasing effects of TNF-alpha. It might therefore contribute to the lack of inflammatory reaction seen during infection in spite of the extensive necrosis and massive virus production. It does not seem to be involved in activation of primary macrophages, and it does not seem to interact specifically with neutrophils [
,
,
,
].
This entry represents a group of tRNA thiolation proteins. It includes the tRNA-cytidine(32) 2-sulfurtransferase (
), and Cytoplasmic tRNA 2-thiolation protein 1 (
).
tRNA-cytidine(32) 2-sulfurtransferase (also known as 2-thiocytidine tRNA biosynthesis protein TtcA) is required for the thiolation of cytidine in position 32 of tRNA, to form 2-thiocytidine (s(2)C32). The modified nucleoside 2-thiocytidine (s(2)C) has so far been found in tRNA from archaea and bacteria. The TtcA protein family is characterised by the existence of both a PP-loop and a Cys-X(1)-X(2)-Cys motif in the central region of the protein but can be divided into two distinct groups based on the presence and location of additional Cys-X(1)-X(2)-Cys motifs in terminal regions of the sequence. Mutant analysis showed that both cysteines in this central conserved Cys-X(1)-X(2)-Cys motif are required for the formation of s(2)C [
]. The PP-loop motif appears to be a modified version of the P-loop of nucleotide-binding domains, which is involved in phosphate binding [
]. It was named the PP-motif because it appears to be part of a previously uncharacterised ATP pyrophosphatase domain. ATP sulfurylases, Escherichia coli NtrL and Bacillus subtilis OutB consist of this domain alone. In other proteins, the pyrophosphatase domain is associated with amidotransferase domains (type I or type II), a putative citrulline-aspartate ligase domain or a nitrilase/amidase domain. Cytoplasmic tRNA 2-thiolation protein 1 (also known as Ncs6/Tuc1 in budding yeast) is responsible for 2-thiolation of mcm5S2U at the wobble positions of tRNA(Lys), tRNA(Glu) and tRNA(Gln) [
,
]. It binds tRNAs directly and probably acts by catalysing adenylation of tRNAs, an intermediate required for 2-thiolation. Its fission yeast homologue, Ctu1, forms a complex with Ctu2 (the Ncs2/Tuc2 homologue) and serves as a putative enzyme for the formation of 2-thiouridine []. This family also includes tRNA-5-methyluridine(54) 2-sulfurtransferase, which catalyses the 2-thiolation of the 5-methyluridine residue at position 54 in the T loop of tRNAs, yielding 5-methyl-2-thiouridine [].
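The grouping of TtcA proteins by the presence and location of additional Cys-X(1)-X(2)-Cys motifs can be illustrated by listing every CxxC motif and binning it by region. In this Python sketch the split of a sequence into equal N-terminal, central and C-terminal thirds is a hypothetical simplification (a real assignment would use domain boundaries), and the function name and test string are likewise illustrative.

```python
import re

# Cys, any two residues, Cys; the lookahead also reports overlapping motifs.
CXXC = re.compile(r"(?=(C..C))")


def cxxc_by_region(seq: str) -> dict[str, list[int]]:
    """Bin 1-based start positions of Cys-X-X-Cys motifs into N-terminal,
    central and C-terminal thirds of the sequence (equal thirds are an
    illustrative assumption, not a published boundary)."""
    seq = seq.upper()
    n = len(seq)
    bins: dict[str, list[int]] = {"N-terminal": [], "central": [], "C-terminal": []}
    for m in CXXC.finditer(seq):
        pos = m.start()  # 0-based start of the motif
        if pos < n / 3:
            bins["N-terminal"].append(pos + 1)
        elif pos < 2 * n / 3:
            bins["central"].append(pos + 1)
        else:
            bins["C-terminal"].append(pos + 1)
    return bins


print(cxxc_by_region("CAAC" + "A" * 22 + "CAAC"))
# {'N-terminal': [1], 'central': [], 'C-terminal': [27]}
```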
Electron transfer flavoproteins (ETFs) serve as specific electron acceptors for primary dehydrogenases, transferring the electrons to terminal respiratory systems. They can be functionally classified into constitutive "housekeeping" ETFs, mainly involved in the oxidation of fatty acids (Group I), and ETFs produced by some prokaryotes under specific growth conditions, which receive electrons only from the oxidation of specific substrates (Group II) [
]. ETFs are heterodimeric proteins composed of an alpha and beta subunit, and contain an FAD cofactor and AMP [
,
,
,
,
]. ETF consists of three domains: domains I and II are formed by the N- and C-terminal portions of the alpha subunit, respectively, while domain III is formed by the beta subunit. Domains I and III share an almost identical α-β-α sandwich fold, while domain II forms an α-β-α sandwich similar to that of bacterial flavodoxins. FAD is bound in a cleft between domains II and III, while domain III binds the AMP molecule. Interactions between domains I and III stabilise the protein, forming a shallow bowl in which domain II resides. The alpha subunit of both Group I and Group II ETFs is composed of domains I and II.
Many enterobacteria are able to convert carnitine, via crotonobetaine, to gamma-butyrobetaine in the presence of carbon and nitrogen sources under anaerobic conditions [
]. In Escherichia coli, the enzymes involved in this pathway are encoded by the caiTABCDE operon []. The adjacent but divergent fixABCD operon also appears to be necessary for carnitine metabolism []. The Fix proteins are homologous to proteins found in known electron transport pathways. This entry represents the predicted electron transfer protein beta subunit FixA, which is necessary for anaerobic carnitine reduction. FixA may be involved in transferring reductant to the CaiA protein.
The SEP (after shp1, eyc and p47) domain is a eukaryotic domain that occurs frequently, mainly as a single copy. In almost all SEP domain-containing proteins, the SEP domain is closely followed by a UBX domain (see
). The function of the SEP domain is not yet known, but it has been proposed to act as a reversible competitive inhibitor of the lysosomal cysteine protease cathepsin L [
,
]. The structure of the SEP domain comprises a β-sheet composed of four strands and two α-helices. One side of the β-sheet faces alpha1 and alpha2. The longer helix, alpha1, packs against the four-stranded β-sheet, whereas the shorter helix, alpha2, is located at one edge of the globular structure formed by alpha1 and the four-stranded β-sheet. A number of highly conserved hydrophobic residues are present in the SEP domain; these are predominantly buried and form the hydrophobic core [
,
]. Some proteins known to contain a SEP domain are listed below:
- Eukaryotic NSFL1 cofactor p37 (or p97 cofactor p37), an adapter protein required for Golgi and endoplasmic reticulum biogenesis. It is involved in Golgi and endoplasmic reticulum maintenance during interphase and in their reassembly at the end of mitosis.
- Eukaryotic NSFL1 cofactor p47 (or p97 cofactor p47), a major adaptor molecule of the cytosolic AAA-type ATPase (ATPases associated with various cellular activities) p97. p47 is required for the p97-regulated membrane reassembly of the endoplasmic reticulum (ER), the nuclear envelope and the Golgi apparatus.
- Vertebrate UBX domain-containing protein 4 (UBXD4).
- Plant UBA and UBX domain-containing protein.
- Saccharomyces cerevisiae (Baker's yeast) UBX domain-containing protein 1 or Suppressor of high-copy PP1 protein (shp1), the homologue of p47.
- Drosophila melanogaster (Fruit fly) eyes closed (eyc).
Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions, which make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few []. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents a putative C4-type zinc finger, DksA/TraR-type, found almost exclusively in prophage regions, actual phage, or conjugal transfer regions of the proteobacteria. Many proteins carrying this motif are small (about 70 amino acids) and appear to be homologous to, but smaller than, DksA (DnaK suppressor protein), which has been found to be critical for regulating transcription of ribosomal RNA. DksA contains a canonical zinc finger motif [
].
This entry represents Cas5, which helps process or stabilize pre-crRNA into individual crRNA units. Cas5 and Cas6 are also required for optimal crRNA processing and/or stability [
]. The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. The CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
].
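The repeat-spacer organisation of a CRISPR array can be illustrated by splitting an array at copies of a known direct repeat. This Python sketch assumes an idealised array with exact, identical repeats and no leader sequence or degenerate terminal repeat; the function name and the repeat and spacer sequences in the example are made up. It only recovers spacers from the DNA and does not model where Cas5 or Cas6 cleaves pre-crRNA.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive copies of `repeat`.

    Assumes exact, identical direct repeats; real arrays often contain
    degenerate repeats, a leader sequence and variable flanks.
    """
    pieces = array.upper().split(repeat.upper())
    # pieces[0] lies before the first repeat and pieces[-1] after the last,
    # so only the interior pieces are spacers.
    return [piece for piece in pieces[1:-1] if piece]


repeat = "GTTTTAGAGC"  # hypothetical repeat
array = repeat + "ACGTACGTACGT" + repeat + "TTGGCCAATTGG" + repeat
spacers = extract_spacers(array, repeat)
print(spacers)                    # ['ACGTACGTACGT', 'TTGGCCAATTGG']
print({len(s) for s in spacers})  # {12}
```

The final line checks the "regularly sized" property of the spacers by collecting their lengths into a set.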
This entry represents the CRISPR pre-crRNA endoribonuclease Cas5d, a sequence-specific endoribonuclease that cleaves pre-crRNA at G21 to generate mature crRNA [
]. The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. The CRISPR-Cas systems have been sorted into three major types. In types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
].
Rare lipoprotein A (RlpA) contains a conserved region that has the double-psi β-barrel (DPBB) fold [
,
]. RlpA is a bacterial septal ring protein and a lytic transglycosylase that contributes to rod shape and daughter cell separation in Pseudomonas aeruginosa []. It has been shown to act as a suppressor of prc mutants in Escherichia coli []. The DPBB fold is often an enzymatic domain. Proteins containing this domain are quite diverse and may have several different functions. Another example of this domain is found in the N terminus of pollen allergens [
]. Some studies show that full-length RlpA from Pseudomonas aeruginosa is an outer membrane protein and a lytic transglycosylase with specificity for peptidoglycan lacking stem peptides. Residue D157 in Pseudomonas aeruginosa RlpA is critical for lytic activity []. Beta barrels are commonly observed in protein structures. They are classified in terms of two integral parameters: the number of strands in the sheet, n, and the shear number, S, a measure of the stagger of the strands in the β-sheet. These two parameters have been shown to determine the major geometrical features of β-barrels. Six-stranded β-barrels with a pseudo-twofold axis are found in several proteins. One such barrel, in which parallel strands form two psi structures, is known as the double-psi barrel. The first psi structure consists of the loop connecting strands beta1 and beta2 (a 'psi loop') and strand beta5, whereas the second psi structure consists of the loop connecting strands beta4 and beta5 and strand beta2. All the psi structures in double-psi barrels have a unique handedness, in that beta1 (beta4), beta2 (beta5) and the loop following beta5 (beta2) form a right-handed helix. This unique handedness may be related to the fact that the twisting angle between the parallel pair of strands is always larger than that between the antiparallel pair [
].
Polo or Polo-like kinases, a subgroup of serine/threonine protein kinases, play multiple roles during the cell cycle. Polo kinases are required at several key points through mitosis, starting with control of the G2/M transition through phosphorylation of Cdc25C and mitotic cyclins. They are also involved in meiosis I as regulators of kinetochore function [
,
]. Polo kinases are characterised by an amino-terminal catalytic domain and a carboxy-terminal non-catalytic domain consisting of three blocks of conserved sequences, known as polo boxes, which form a single functional domain [
]. The domain is named after its founding member, encoded by the polo gene of Drosophila melanogaster []. This domain of around 70 amino acids has been found in species ranging from yeast to mammals. Polo boxes appear to mediate interactions with multiple proteins; some, but not all, of these proteins are substrates for the kinase domain of the molecule []. The crystal structure of the polo domain of the murine protein Sak is dimeric, consisting of two α-helices and two six-stranded β-sheets [
]. The topology of one polypeptide subunit of the dimer consists of, from N to C terminus, an extended strand segment, five β-strands, one α-helix (A) and a C-terminal β-strand. The β-strands from one subunit form a contiguous antiparallel β-sheet with β-strands from the second subunit. The two β-sheets pack with a crossing angle of 110 degrees, orienting their hydrophobic surfaces inward and their hydrophilic surfaces outward. Helix A, which is colinear with β-strand 6 of the same polypeptide, buries a large portion of the non-overlapping hydrophobic β-sheet surfaces. Interactions involving the two A helices make up the majority of the hydrophobic core and also of the dimer interface. Point mutations in the Polo box of the budding yeast Cdc5 protein abolish the ability of overexpressed Cdc5 to interact with the spindle poles and to organise cytokinetic structures [
].
Zinc finger (Znf) domains are relatively small protein motifs that contain multiple finger-like protrusions, which make tandem contacts with their target molecule. Some of these domains bind zinc, but many do not, instead binding other metals such as iron, or no metal at all. For example, some family members form salt bridges to stabilise the finger-like folds. They were first identified as a DNA-binding motif in transcription factor TFIIIA from Xenopus laevis (African clawed frog); however, they are now recognised to bind DNA, RNA, protein and/or lipid substrates [
,
,
,
,
]. Their binding properties depend on the amino acid sequence of the finger domains and of the linker between fingers, as well as on the higher-order structures and the number of fingers. Znf domains are often found in clusters, where fingers can have different binding specificities. There are many superfamilies of Znf motifs, varying in both sequence and structure. They display considerable versatility in binding modes, even between members of the same class (e.g. some bind DNA, others protein), suggesting that Znf motifs are stable scaffolds that have evolved specialised functions. For example, Znf-containing proteins function in gene transcription, translation, mRNA trafficking, cytoskeleton organisation, epithelial development, cell adhesion, protein folding, chromatin remodelling and zinc sensing, to name but a few [
]. Zinc-binding motifs are stable structures, and they rarely undergo conformational changes upon binding their target. This entry represents the CysCysHisCys (CCHC)-type zinc finger domain superfamily; these domains have the sequence: C-X2-C-X4-H-X4-C
where X can be any amino acid and each number indicates the number of residues. These 18-residue CCHC zinc finger domains are found mainly in the nucleocapsid proteins of retroviruses, where they are required for viral genome packaging and for early stages of infection [
,
,
]. It is also found in eukaryotic proteins involved in RNA binding or single-stranded DNA binding [].
The basic structure of immunoglobulin (Ig) molecules is a tetramer of two light chains and two heavy chains linked by disulphide bonds. There are two types of light chains: kappa and lambda, each composed of a constant domain (CL) and a variable domain (VL). There are five types of heavy chains: alpha, delta, epsilon, gamma and mu, all consisting of a variable domain (VH) and three (in alpha, delta and gamma) or four (in epsilon and mu) constant domains (CH1 to CH4). Ig molecules are highly modular proteins, in which the variable and constant domains have clear, conserved sequence patterns. The domains in Ig and Ig-like molecules are grouped into four types: V-set (variable;
), C1-set (constant-1;
), C2-set (constant-2;
) and I-set (intermediate;
) [
]. Structural studies have shown that these domains share a common core Greek-key β-sandwich structure, with the types differing in the number of strands in the β-sheets as well as in their sequence patterns [,
]. Immunoglobulin-like domains that are related in both sequence and structure can be found in several diverse protein families. Ig-like domains are involved in a variety of functions, including cell-cell recognition, cell-surface receptors, muscle structure and the immune system [
]. This entry represents I-set domains, which are found in several cell adhesion molecules, including vascular (VCAM), intercellular (ICAM), neural (NCAM) and mucosal addressin (MADCAM) cell adhesion molecules, as well as junction adhesion molecules (JAM). I-set domains are also present in several other diverse protein families, including several tyrosine-protein kinase receptors, the hemolymph protein hemolin, the muscle proteins titin, telokin, and twitchin, the neuronal adhesion molecule axonin-1 [
], the signalling molecule semaphorin 4D, which is involved in axonal guidance, immune function and angiogenesis [], and the Zwei Ig domain (zig) proteins, which are involved in the post-embryonic maintenance of neuronal soma and axon positions [,
].
This entry includes the FOG-type CCHC zinc finger. Zinc fingers (Znfs) are among the most common of all protein domains. They have traditionally been regarded as sequence-specific DNA-binding motifs; however, it has also become apparent that many Znfs mediate specific protein-protein interactions. Transcriptional cofactors of the Friend of GATA (FOG) family are diverse in sequence, but their individual Znf regions share considerable homology. They contain either eight or nine Znfs that are related to the classical CCHH Znfs (which have a conserved C-X(2,5)-C-X(12)-H-X(2,5)-H sequence). Several of the fingers in each FOG protein, however, have an altered consensus sequence in which the final zinc-binding histidine is replaced with a cysteine. Only these variant CCHC Znfs mediate interactions with the N-terminal zinc fingers (NFs) of GATA factors [
,
,
,
,
]. The FOG-type CCHC zinc finger comprises a short beta hairpin followed by an alpha helix. It resembles classical CCHH fingers in structure; the only notable difference is a short extended portion of the polypeptide backbone before the fourth zinc ligand. Two of the residues implicated in GATA binding are located on this extended portion of the backbone, immediately preceding the final cysteine, so the CCHC topology appears to play a key role in facilitating the interaction. The residues implicated in GATA binding are largely conserved across FOG family CCHC Znfs, although three of the residues that affect binding affinity show some variation [
,
]. Some proteins known to contain a FOG-type CCHC zinc finger are listed below:
Vertebrate Friend of GATA-1 (FOG-1), a zinc finger protein that interacts physically with the erythroid DNA-binding protein GATA-1 and modulates its transcriptional activity.
Mammalian FOG-2, which associates with GATA-1 and other mammalian GATA factors.
Drosophila melanogaster U-shaped (Ush), which interacts with the GATA factors Pannier and Serpent to control development in the fly.
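The classical consensus quoted above, C-X(2,5)-C-X(12)-H-X(2,5)-H, and the FOG-type variant in which the last His becomes Cys can both be expressed as regular expressions. The following is a hypothetical illustration, not the way InterPro or PROSITE actually scan sequences: `prosite_to_regex` and `classify_finger` are invented names, X is assumed to match any uppercase one-letter residue code, and the example sequences are made up.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern (e.g. 'C-X(2,5)-C') into a regex.

    Only single residues, the X wildcard and (n) / (n,m) repeat counts are
    handled, which is all the zinc finger consensus needs.
    """
    tokens = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        token = "[A-Z]" if residue.upper() == "X" else residue.upper()
        if high is not None:
            token += "{" + low + "," + high + "}"
        elif low is not None:
            token += "{" + low + "}"
        tokens.append(token)
    return "".join(tokens)

# Classical CCHH consensus from the text, and the FOG-type variant in which
# the final zinc-binding His is replaced by Cys.
CCHH = re.compile(prosite_to_regex("C-X(2,5)-C-X(12)-H-X(2,5)-H"))
CCHC = re.compile(prosite_to_regex("C-X(2,5)-C-X(12)-H-X(2,5)-C"))

def classify_finger(seq: str) -> str:
    """Return 'CCHC', 'CCHH' or 'none' for a single-finger sequence."""
    if CCHC.search(seq):
        return "CCHC"
    if CCHH.search(seq):
        return "CCHH"
    return "none"
```

A regex hit only shows that the ligand spacing fits the consensus; whether a real sequence folds, binds zinc or contacts a GATA N-terminal finger cannot be decided this way.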
This entry includes the ORF8 immunoglobulin (Ig) domain proteins of Bat SARS coronavirus HKU3-1, which have previously been classified as type III ORF8s. ORF8 is an accessory protein that is not shared by all members of the subgenus Sarbecovirus. The presence and location of ORF8 in the SARS-CoV-2 genome have led to its classification with SARS-CoV [
,
]. ORF8 is a potential pathogenicity factor that evolves rapidly to counter the immune response and facilitate transmission between hosts []. ORF8 has been suggested to be one of the relevant genes in studying the adaptation of the virus to humans [,
]. The ORF8 protein evolves rapidly in SARS-related CoVs, with a tendency to recombine and undergo deletions. During the early phase of the SARS (SARS-CoV) epidemic in 2002, human isolates were found to possess a single continuous ORF8 of 366 nucleotides, with a predicted protein of 122 amino acids. During the middle and late phases of the epidemic, two functional ORFs (ORF8a and ORF8b) emerged; they are predicted to encode two small proteins, 8a with 39 amino acids and 8b with 84 amino acids. Interestingly, SARS-CoV-2 ORF8 has not undergone any significant measurable deletion events, suggesting that its function as a full-length protein may be more important to its pathogenicity [
]. ORF8 plays a role in modulating the host immune response [], possibly by down-regulating major histocompatibility complex class I (MHC-I) []. It may inhibit the expression of some members of the IFN-stimulated gene (ISG) family, including host IGF2BP1/ZBP1, MX1, MX2 and DHX58 []. ORF8 also binds to the IL17RA receptor, leading to activation of the IL17 pathway and increased secretion of pro-inflammatory factors, which contributes to the cytokine storm during COVID-19 infection [].
S-crystallins are the major protein constituent of the cephalopod lens. Their primary sequences show a high degree of identity (41%) with the sigma-class glutathione transferase (GST) of the cephalopod digestive gland. Despite this sequence similarity, lens S-crystallin shows little if any GST activity: S-crystallins fail to bind to an S-hexylglutathione affinity column (a marker for glutathione affinity) and have very little GST activity in a typical substitution reaction with glutathione and 1-chloro-2,4-dinitrobenzene []. Nevertheless, pH-rate profiles indicate that any reactions that do take place proceed via the same mechanism as in GSTs [
]. Sequence analysis suggests that the S-crystallins arose from gene duplication of a cephalopod sigma-class GST [].
The three-dimensional structure of the sigma-class GST from squid has been determined to 2.4 Å resolution []. The protein is characterised by two domains, one with a 3-layer (αβα) sandwich architecture and the other largely helical. The modelled S-crystallin structure has a similar topology to the squid sigma-class GST, with longer helices 4 and 5 corresponding to a long insertion. The insertion places the active centre in a more closed conformation than in sigma-class GSTs and may explain the low affinity for glutathione [].
The function of glutathione S-transferases (GSTs) is the conjugation of reduced glutathione to a wide range of exogenous and endogenous hydrophobic electrophiles. The isoenzymes appear to play a central role in the parasite detoxification system. GSTs form homodimers, but in eukaryotes they can also form heterodimers of the A1 and A2 or YC1 and YC2 subunits. The GST domain is also found in S-crystallins from squid and in proteins with no known GST activity, such as eukaryotic elongation factor 1-gamma and the HSP26 family of stress-related proteins, which includes auxin-regulated proteins in plants and stringent starvation proteins in Escherichia coli.
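The 41% figure quoted for S-crystallins and the sigma-class GST is a percent identity between aligned sequences. As a rough sketch of what such a number measures, assuming the two sequences have already been aligned and that gap columns are simply skipped (one of several conventions; the source does not say which denominator was used), identity can be computed as follows. The function name and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns with a gap ('-') in either sequence are skipped; published
    figures may use a different denominator (e.g. the shorter sequence).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

print(percent_identity("ACDEFGHIKL", "ACDEFWWWWW"))  # 5 of 10 columns match
```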
Visual pigments [
,
] are the light-absorbing molecules that mediate vision. They consist of an apoprotein, opsin, covalently linked to the chromophore cis-retinal. Vision is initiated by the absorption of a photon by cis-retinal, which is isomerised to trans-retinal. This isomerisation leads to a conformational change in the protein. Opsins are integral membrane proteins with seven transmembrane regions that belong to family 1 of the G-protein coupled receptors.
In vertebrates, four different pigments are generally found. Rod cells, which mediate vision in dim light, contain the pigment rhodopsin. Cone cells, which function in bright light, are responsible for colour vision and contain three or more colour pigments (for example, in mammals: red, blue and green).
By contrast with vertebrate rhodopsin, which is found in rod cells, insect visual pigments are found in the photoreceptor cells of the ommatidia that make up the compound eyes. Each Drosophila eye has 800 ommatidia, each of which contains eight photoreceptor cells (designated R1-R8): R1-R6 are outer cells, while R7 and R8 are inner cells. Opsins RH3 and RH4 are sensitive to UV light [,
,
]. Each of the three types of cells (R1-R6, R7 and R8) expresses a specific opsin.
Proteins evolutionarily related to opsins include:
Squid retinochrome, also known as retinal photoisomerase, which converts various isomers of retinal into 11-cis retinal.
Mammalian opsin 3 (encephalopsin), which may play a role in encephalic photoreception.
Mammalian opsin 4 (melanopsin), which may mediate regulation of circadian rhythms and acute suppression of pineal melatonin.
Mammalian retinal pigment epithelium (RPE) RGR [
], a protein that may also act in retinal isomerisation.
The attachment site for retinal in the above proteins is a conserved lysine residue in the middle of the seventh transmembrane helix.
The homeobox is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [
,
]. Proteins containing homeobox domains are likely to play an important role in development; most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure.
Many homeodomain-containing proteins have now been sequenced and, while the regions flanking the homeodomain vary, characteristic conserved sequences upstream of the domain allow the proteins to be grouped into three subfamilies: the so-called antennapedia, engrailed and 'paired box' proteins. Antennapedia, which regulates the formation of leg structures in Drosophila, was one of the first homeotic genes studied and led to the discovery of the homeobox domain. Overexpression of this gene in the wrong segment of the fruit fly can lead to the formation of leg structures in that segment. For example, overexpression in the head segment can lead to the formation of legs instead of antennae (hence the name antennapedia). The sequences of the antennapedia proteins contain a conserved hexapeptide 5-16 residues upstream of the homeobox, the specific function of which is unclear. The six Drosophila proteins that belong to this group, antennapedia (Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx), are collectively known as the 'antennapedia' subfamily. In vertebrates, the corresponding Hox genes are known [
] as Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. Arg and Lys are most frequently found in the last position of the hexapeptide; other amino acids are found in only a few cases.
Alpha crystallin/Small heat shock protein, animal type
Type:
Family
Description:
The crystallins are water-soluble structural proteins that occur in high concentration in the cytoplasm of eye lens fibre cells. Four major groups of crystallin have been distinguished on the basis of size, charge and immunological properties: alpha-, beta- and gamma-crystallins occur in all vertebrate classes (though gamma-crystallins are low or absent in avian lenses); and delta-crystallin is found exclusively in reptiles and birds [
,
]. Alpha-crystallin occurs as large aggregates comprising two types of related subunits or chains (A and B) that are highly similar to the small (15-30 kDa) heat shock proteins (HSPs), particularly in their C-terminal halves. The relationship between these families is one of classic gene duplication and divergence from the small HSP family, allowing adaptation to novel functions. Divergence probably occurred before the evolution of the eye lens, as alpha-crystallin is found in small amounts in tissues outside the lens [
]. Alpha-crystallin has chaperone-like properties, including the ability to prevent the precipitation of denatured proteins and to increase cellular tolerance to stress [
]. It has been suggested that these functions are important for maintaining lens transparency and preventing cataracts. This is supported by the observation that alpha-crystallin mutations are associated with cataract formation.
HSPs can be divided into HSP100, HSP90, HSP70, HSP60, HSP40 and small heat shock proteins (sHSPs) according to their molecular weight and homology. Small heat shock proteins contain an alpha-crystallin domain and variable N- and C-terminal extensions. sHSPs are generally active as large oligomers consisting of multiple subunits, and are believed to be ATP-independent chaperones that prevent aggregation and are important in refolding in combination with other HSPs [
,
]. This entry represents a group of alpha-crystallin domain-containing proteins from animals, including the A and B subunits (or chains) of alpha-crystallin and related small heat shock proteins.
Electron transfer flavoproteins (ETFs) serve as specific electron acceptors for primary dehydrogenases, transferring the electrons to terminal respiratory systems. They can be functionally classified into constitutive, "housekeeping" ETFs, mainly involved in the oxidation of fatty acids (Group I), and ETFs produced by some prokaryotes under specific growth conditions, which receive electrons only from the oxidation of specific substrates (Group II) [
]. ETFs are heterodimeric proteins composed of an alpha and beta subunit, and contain an FAD cofactor and AMP [
,
,
,
,
]. ETF consists of three domains: domains I and II are formed by the N- and C-terminal portions of the alpha subunit, respectively, while domain III is formed by the beta subunit. Domains I and III share an almost identical α-β-α sandwich fold, while domain II forms an α-β-α sandwich similar to that of bacterial flavodoxins. FAD is bound in a cleft between domains II and III, while domain III binds the AMP molecule. Interactions between domains I and III stabilise the protein, forming a shallow bowl in which domain II resides. The alpha subunit of both Group I and Group II ETFs is composed of domains I and II.
Many enterobacteria are able to convert carnitine, via crotonobetaine, to gamma-butyrobetaine in the presence of carbon and nitrogen sources under anaerobic conditions [
]. In Escherichia coli, the enzymes involved in this pathway are encoded by the caiTABCDE operon []. The adjacent but divergently transcribed fixABCD operon also appears to be necessary for carnitine metabolism []. The Fix proteins are homologous to proteins found in known electron transport pathways. This entry represents the predicted electron transfer protein alpha subunit FixB, which is necessary for anaerobic carnitine reduction. FixB may be involved in transferring reductant to the CaiA protein.
The tripartite DENN (differentially expressed in neoplastic versus normal cells) domain is found in several proteins that share common structural features and have been shown to be guanine nucleotide exchange factors (GEFs) for Rab GTPases, which regulate practically all membrane trafficking events in eukaryotes. The tripartite DENN domain is composed of three distinct modules that are always associated due to functional and/or structural constraints: the upstream DENN (uDENN), the better-conserved central or core DENN (cDENN) and the downstream DENN (dDENN) regions. The tripartite DENN domain is found associated with other domains, such as RUN, PLAT, PH, PPR, WD-40, GRAM or C1. The function of the DENN domain remains unclear, although it appears to be a good candidate for a GTP/GDP exchange activity [,
,
,
,
]. Some proteins known to contain a tripartite DENN domain are listed below:
Rat Rab3 GDP/GTP exchange protein (Rab3GEP).
Human mitogen-activated protein kinase activating protein containing death domain (MADD), which is orthologous to Rab3GEP.
Caenorhabditis elegans regulator of presynaptic activity aex-3, the ortholog of Rab3GEP.
Mouse Rab6 interacting protein 1 (Rab6IP1).
Human SET domain-binding factor 1 (SBF1).
Human suppressor of tumorigenicity 5 (ST5).
Human C-MYC promoter-binding protein IRLB.
The DENN domain forms a heart-shaped structure, with the N-terminal residues forming one lobe and the C-terminal residues forming the other. The N-terminal lobe forms the uDENN domain and consists of a central antiparallel β-sheet layered between one helix on one side and two helices on the other. A long random-coil region links the two lobes. The C-terminal lobe is composed of the cDENN and dDENN domains. The cDENN domain is an alpha/beta three-layered sandwich domain with a central five-stranded sheet. The dDENN domain is an all-alpha helical domain whose core contains two alpha-hairpins that diverge rapidly in sequence [
,
]. This domain represents the entire tripartite DENN domain.
Homeobox protein, antennapedia type, conserved site
Type:
Conserved_site
Description:
The homeobox is a 60-residue motif first identified in a number of Drosophila homeotic and segmentation proteins, but now known to be well-conserved in many other animals, including vertebrates [
,
]. Proteins containing homeobox domains are likely to play an important role in development; most are known to be sequence-specific DNA-binding transcription factors. The domain binds DNA through a helix-turn-helix (HTH) structure.
Many homeodomain-containing proteins have now been sequenced and, while the regions flanking the homeodomain vary, characteristic conserved sequences upstream of the domain allow the proteins to be grouped into three subfamilies: the so-called antennapedia, engrailed and 'paired box' proteins. Antennapedia, which regulates the formation of leg structures in Drosophila, was one of the first homeotic genes studied and led to the discovery of the homeobox domain. Overexpression of this gene in the wrong segment of the fruit fly can lead to the formation of leg structures in that segment. For example, overexpression in the head segment can lead to the formation of legs instead of antennae (hence the name antennapedia). The sequences of the antennapedia proteins contain a conserved hexapeptide 5-16 residues upstream of the homeobox, the specific function of which is unclear. The six Drosophila proteins that belong to this group, antennapedia (Antp), abdominal-A (abd-A), deformed (Dfd), proboscipedia (pb), sex combs reduced (scr) and ultrabithorax (ubx), are collectively known as the 'antennapedia' subfamily. In vertebrates, the corresponding Hox genes are known [
] as Hox-A2, A3, A4, A5, A6, A7, Hox-B1, B2, B3, B4, B5, B6, B7, B8, Hox-C4, C5, C6, C8, Hox-D1, D3, D4 and D8. Caenorhabditis elegans lin-39 and mab-5 are also members of the 'antennapedia' subfamily. Arg and Lys are most frequently found in the last position of the hexapeptide; other amino acids are found in only a few cases.
This entry represents a structural domain found in the histidine-containing phosphocarrier protein HPr and its structural homologues, which include the catabolite repression protein Crh found in Bacillus subtilis [,
]. This domain has a two-layer alpha+beta structure with the overall architecture of an open-faced β-sandwich, in which a β-sheet is packed against three α-helices. The histidine-containing phosphocarrier protein (HPr) is a central component of the phosphoenolpyruvate-dependent sugar phosphotransferase system (PTS), which transfers metabolic carbohydrates across the cell membrane in many bacterial species [
,
]. PTS catalyses the phosphorylation of incoming sugar substrates concomitant with their translocation across the cell membrane. The general mechanism of the PTS is as follows: a phosphoryl group from phosphoenolpyruvate (PEP) is transferred to Enzyme I (EI) of the PTS, which in turn transfers it to the phosphoryl carrier protein (HPr) [,
]. Phospho-HPr then transfers the phosphoryl group to a sugar-specific permease complex (enzymes EII/EIII). HPr [
,
] is a small cytoplasmic protein of 70 to 90 amino acid residues. In some bacteria, HPr is a domain in a larger protein that includes an EIII(Fru) (IIA) domain and, in some cases, also the EI domain. A conserved histidine in the N-terminal section of HPr serves as the acceptor for the phosphoryl group from EI. In the central part of HPr there is a conserved serine which (in Gram-positive bacteria only) is phosphorylated by an ATP-dependent protein kinase, a process that probably plays a regulatory role in sugar transport. Regulatory phosphorylation at the conserved Ser residue does not appear to induce large structural changes in the HPr domain, in particular in the region of the active site [,
].
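The phosphoryl relay described above (PEP to EI, EI to HPr, HPr to the sugar-specific permease complex, and finally to the incoming sugar) can be pictured as a chain of single hand-offs. The sketch below is a toy bookkeeping model, not a kinetic or structural one: it tracks only which carrier holds the phosphoryl group, keeps the EII/EIII permease complex as one step because the text groups them together, and uses hypothetical function names.

```python
# Order of carriers taken from the text; the permease complex is kept as a
# single step because the text groups EII and EIII together.
PTS_CHAIN = ("PEP", "EI", "HPr", "EII/EIII", "sugar")

def transfer(state: dict, donor: str, acceptor: str) -> None:
    """Move one phosphoryl group from donor to acceptor (toy bookkeeping only)."""
    if not state[donor]:
        raise ValueError(f"{donor} carries no phosphoryl group to donate")
    if state[acceptor]:
        raise ValueError(f"{acceptor} is already phosphorylated")
    state[donor], state[acceptor] = False, True

def run_pts(chain=PTS_CHAIN):
    """Relay a single phosphoryl group from PEP to the incoming sugar."""
    state = {name: False for name in chain}
    state[chain[0]] = True  # PEP is the initial phosphoryl donor
    log = []
    for donor, acceptor in zip(chain, chain[1:]):
        transfer(state, donor, acceptor)
        log.append(f"{donor} -> {acceptor}")
    return state, log

state, log = run_pts()
```

Each step refuses to run unless the donor is loaded and the acceptor is free, which mirrors the strictly sequential order of the relay; the regulatory Ser phosphorylation of HPr is deliberately left out.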
This entry represents the GP-PDE domain. The glycerophosphodiester phosphodiesterases (GP-PDEs) were initially characterised in bacteria, where they function in the production of metabolic carbon and phosphate sources from glycerophosphodiesters and in adherence to and degradation of mammalian host-cell membranes. Mammalian GP-PDEs have been identified more recently and shown to be implicated in several physiological functions. GP-PDEs are involved in glycerol metabolism and catalyse the reaction of a glycerophosphodiester and water to an alcohol and sn-glycerol 3-phosphate. They display broad specificity for glycerophosphodiesters, such as glycerophosphocholine, glycerophosphoethanolamine, glycerophosphoglycerol and bis(glycerophosphoglycerol).
The GP-PDE domain adopts the ubiquitous triosephosphate isomerase (TIM) barrel alpha/beta fold. The TIM barrel consists of an eight-stranded parallel β-sheet barrel surrounded by eight α-helices. There is a small insertion in the conventional TIM barrel structure, referred to as the GDPD-insertion (GDPD-I), which comprises beta strands, α-helices (H3 and H4) and 3/10 helices. Although the TIM barrel and the small insertion are unique to the GP-PDE family, there are subtle differences in the size and topology of each domain [,
]. Some proteins known to contain a GP-PDE domain are listed below:
Bacterial glycerophosphoryl diester phosphodiesterase GlpQ (EC 3.1.4.46).
Bacterial glycerophosphoryl diester phosphodiesterase UgpQ (EC 3.1.4.46).
Mammalian glycerophosphodiester phosphodiesterase 1 (GDE1) (EC 3.1.4.44) (or MMIR16) [
], an integral membrane glycoprotein that interacts with regulator of G protein signaling (RGS) proteins. It hydrolyzes glycerophosphoinositols (GPIs), producing inositol and glycerol 3-phosphate.
Mammalian glycerophosphodiester phosphodiesterase domain-containing protein 5 (GDPD5) (EC 3.1.-.-) (or GDE2) [].
Mammalian glycerophosphoinositol inositolphosphodiesterase GDPD2 (or GDE3) (EC 3.1.4.43) [], which is up-regulated during osteoblast differentiation and can affect cell morphology. It hydrolyzes glycerophosphoinositol (GroPIns), producing inositol 1-phosphate and glycerol.
Mammalian glycerophosphodiester phosphodiesterase domain-containing protein 1 (GDPD1) (EC 3.1.-.-) (or GDE4) [].
Mammalian glycerophosphocholine phosphodiesterase GPCPD1 (EC 3.1.4.2) (or GDE5), which selectively hydrolyzes glycerophosphocholine (GroPCho) and controls skeletal muscle development [
].
Mammalian glycerophosphodiester phosphodiesterase domain-containing protein 4 (GDPD4) (EC 3.1.-.-) (or GDE6) [].
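The general GP-PDE reaction (glycerophosphodiester + water gives an alcohol + sn-glycerol 3-phosphate) can be illustrated with a small lookup over the substrates named in this entry. This is a hypothetical sketch: the alcohol for each substrate is inferred from the substrate name, the function name is invented, and it covers only the GlpQ/UgpQ/GDE1-type cleavage. GDPD2/GDE3 cuts the other phosphodiester bond (giving inositol 1-phosphate and glycerol) and is deliberately not modelled.

```python
# Alcohol released from each substrate by GlpQ/UgpQ/GDE1-type hydrolysis,
# inferred from the substrate names given in the text (assumed mapping).
HEAD_GROUPS = {
    "glycerophosphocholine": "choline",
    "glycerophosphoethanolamine": "ethanolamine",
    "glycerophosphoglycerol": "glycerol",
    "glycerophosphoinositol": "inositol",
}

def gp_pde_products(substrate: str) -> tuple:
    """Products of: glycerophosphodiester + H2O -> alcohol + sn-glycerol 3-phosphate."""
    try:
        alcohol = HEAD_GROUPS[substrate]
    except KeyError:
        raise ValueError(f"substrate not covered by this sketch: {substrate}") from None
    return alcohol, "sn-glycerol 3-phosphate"

print(gp_pde_products("glycerophosphocholine"))
```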
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected for integration as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second (expression) stage, the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into mature crRNAs. In the third (interference) stage, Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. This superfamily represents the CRISPR-associated endonuclease Cas1, N-terminal β-sandwich-like domain. Cas1 may play a role in the recognition, cleavage, and/or integration of foreign nucleic acids into CRISPRs.
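The three stages of the defense reaction (adaptation, expression, interference) can be sketched as a toy model operating on strings. This is a hypothetical illustration only: the `CrisprLocus` class and its methods are invented names, exact substring identity stands in for crRNA/target base pairing, and PAM recognition, Cas protein activities and the repeat-derived tags left on real crRNAs are all ignored. New spacers are placed at the front of the array, reflecting their typical addition at the leader end.

```python
from dataclasses import dataclass, field

@dataclass
class CrisprLocus:
    repeat: str
    spacers: list = field(default_factory=list)

    def adapt(self, invader: str, start: int, length: int = 8) -> None:
        """Adaptation: store a fragment of the invader as a new spacer."""
        if start < 0 or start + length > len(invader):
            raise ValueError("fragment lies outside the invader sequence")
        # New spacers are typically added at the leader end of the array.
        self.spacers.insert(0, invader[start:start + length])

    def express(self) -> list:
        """Expression: transcribe the array, then cut it into one guide per spacer."""
        pre_crrna = self.repeat + "".join(s + self.repeat for s in self.spacers)
        guides, pos = [], len(self.repeat)
        for spacer in self.spacers:
            # Cut by position so repeat-like letters inside a spacer cannot confuse it.
            guides.append(pre_crrna[pos:pos + len(spacer)])
            pos += len(spacer) + len(self.repeat)
        return guides

    def interfere(self, invader: str) -> bool:
        """Interference: True if any guide matches the invader (identity as a proxy)."""
        return any(guide in invader for guide in self.express())

locus = CrisprLocus(repeat="GGGGG")
virus = "ATCGATCGTACGTTAGCA"
locus.adapt(virus, start=4)
print(locus.interfere(virus))  # the stored spacer now recognises the invader
```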
The outer and inner segments of vertebrate rod photoreceptor cells contain phosducin, a soluble phosphoprotein that complexes with the beta/gamma-subunits of the GTP-binding protein, transducin. Light-induced changes in cyclic nucleotide levels modulate the phosphorylation of phosducin by protein kinase A [
]. The protein is thought to participate in the regulation of visual phototransduction or in the integration of photoreceptor metabolism. Similar proteins have been isolated from the pineal gland, and the protein is believed to play the same functional role in both the retina and the pineal gland []. This entry represents a domain found in members of the phosducin family. This domain has a thioredoxin-like fold [
].
Members of protein family FAM175 include the BRCA1-A complex subunit Abraxas 1 [
,
], BRISC complex subunit Abraxas 2 or Abro1 (Abraxas brother protein 1) [,
], and uncharacterised plant proteins. BRCA1-A complex subunit Abraxas is thought to act as a central scaffold protein that assembles the various components of the BRCA1-A complex and mediates the recruitment of BRCA1 [
,
]. Similarly, Abro1 probably acts as a scaffold that facilitates assembly of the various components of BRISC []; the protein does not interact with BRCA1, but binds polyubiquitin []. The primary sequences of these proteins contain an MPN-like domain []. This entry represents the plant members of FAM175.
The BRO1 domain is a protein domain of ~390 residues in length. It occurs in a number of eukaryotic proteins, such as yeast BRO1 and human PDCD6IP/Alix, which are involved in protein targeting to the vacuole or lysosome. The BRO1 domain of fungal and mammalian proteins binds to multivesicular body components (ESCRT-III proteins) such as yeast Snf7 and mammalian CHMP4b, and can function to target BRO1 domain-containing proteins to endosomes [
,
,
].The BRO1 domain has a boomerang shape composed of 14 α-helices and 3 β-sheets. It contains a TPR-like substructure in the central part [
]. The C terminus is less conserved.
Cytochrome C biogenesis protein, transmembrane domain
Type:
Domain
Description:
This entry represents the transmembrane domain of cytochrome c biogenesis proteins, also known as disulphide interchange proteins, such as DsbD from E. coli and DipZ from Mycobacterium. These proteins possess a protein disulphide isomerase-like domain that is not found within the aligned region of this family. DsbA and DsbC, periplasmic proteins of E. coli, are two key players in disulphide bond formation. DsbD generates a reducing source in the periplasm, which is required for maintaining proper redox conditions [
]. DipZ is essential for maintaining cytochrome c apoproteins in the correct conformations for the covalent attachment of haem groups to the appropriate pairs of cysteine residues [].
This entry contains the alpha subunit (acetyl-coenzyme A carboxylase carboxyl transferase, ACCA) of the acetyl coenzyme A carboxylase complex. ACCA catalyses the transfer of a carboxyl group carried on a biotinylated biotin carboxyl carrier protein (BCCP) to acetyl-CoA, forming malonyl-CoA; this is the first step in the synthesis of long-chain fatty acids. The acetyl-CoA carboxylase complex is a heterohexamer of biotin carboxyl carrier protein, biotin carboxylase and two non-identical carboxyl transferase subunits (alpha and beta) in a 2:2 association [
]. The reaction involves two steps:
Biotin carrier protein + ATP + HCO3- -> Carboxybiotin carrier protein + ADP + Pi
Carboxybiotin carrier protein + Acetyl-CoA -> Malonyl-CoA + Biotin carrier protein
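Adding the two steps and cancelling the carrier protein, which is consumed in one step and regenerated in the other, gives the overall carboxylation: ATP + HCO3- + acetyl-CoA -> ADP + Pi + malonyl-CoA. A minimal Python check of that bookkeeping, assuming each species is a plain string with stoichiometry 1 (the labels and the `side`/`net_reaction` helpers are hypothetical):

```python
from collections import Counter

def side(*species: str) -> Counter:
    """One side of a reaction as a multiset of species (stoichiometry 1 each)."""
    return Counter(species)

# The two steps listed in the text; BCCP is the biotin carboxyl carrier protein.
STEP1 = (side("BCCP", "ATP", "HCO3-"), side("carboxy-BCCP", "ADP", "Pi"))
STEP2 = (side("carboxy-BCCP", "acetyl-CoA"), side("malonyl-CoA", "BCCP"))

def net_reaction(*steps):
    """Sum the steps and cancel species that appear on both sides."""
    left, right = Counter(), Counter()
    for lhs, rhs in steps:
        left.update(lhs)
        right.update(rhs)
    shared = left & right  # carrier and intermediate are consumed and regenerated
    return left - shared, right - shared

substrates, products = net_reaction(STEP1, STEP2)
print(sorted(substrates), "->", sorted(products))
```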
The yeast protein Sec16 plays a key role in the formation of coat protein II vesicles, which mediate protein transport from the endoplasmic reticulum (ER) to the Golgi apparatus [
]. Mammals have two isoforms of this protein, Sec16A and Sec16B. Sec16A appears to be the primary orthologue, as it has the highest sequence similarity to the yeast protein. Sec16B is involved in the export of the peroxisomal membrane biogenesis factor peroxin 16 [
]. This entry represents the central conserved domain (CCD) of Sec16, found in all isoforms of this protein. The CCD is necessary for targeting of the protein to the ER [
].
The MD-2-related lipid-recognition (ML) domain is implicated in lipid recognition, particularly in the recognition of pathogen related products. It has an immunoglobulin-like β-sandwich fold similar to that of E-set Ig domains. This domain is present in proteins from plants, animals and fungi, including the following proteins:
Epididymal secretory protein E1 (also known as Niemann-Pick C2 protein - Npc2), which is known to bind cholesterol. Niemann-Pick disease type C2 is a fatal hereditary disease characterised by accumulation of low-density lipoprotein-derived cholesterol in lysosomes [
].
House-dust mite allergen proteins such as Der f 2 from Dermatophagoides farinae and Der p 2 from Dermatophagoides pteronyssinus [
].
This entry represents non-heme ferritins found in archaea and bacteria. They belong to a broad superfamily of ferritin-like diiron-carboxylate proteins. The ferritin protein shell is composed of 24 subunits arranged with 432 point-group symmetry. Each subunit, a four-helix bundle with a fifth short terminal helix, contains a dinuclear ferroxidase centre (H type) [
,
]. Unique to this group of proteins is a third metal site in the ferroxidase centre []. Iron storage involves the uptake of iron (II) at the protein shell, its oxidation by molecular oxygen at the ferroxidase centres, and the movement of iron (III) into the cavity for deposition as ferrihydrite [].
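The subunit count of the shell follows from its symmetry: the rotation group 432 (the octahedral group, with fourfold, threefold and twofold axes) contains exactly 24 rotations, so a shell in which each subunit is carried onto every other by one symmetry operation has 24 subunits. A short sketch that enumerates those rotations as the signed permutation matrices of determinant +1 (helper names are hypothetical):

```python
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 matrix given as nested tuples."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def matmul(x, y):
    """Product of two 3x3 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3))
        for r in range(3)
    )

def octahedral_rotations():
    """Proper rotations (det = +1) that map a cube centred on the origin onto itself.

    Every symmetry of the cube permutes the x, y, z axes and may flip their
    signs (48 signed permutation matrices); keeping det = +1 discards the
    reflections and leaves the rotation group 432.
    """
    group = set()
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = tuple(
                tuple(signs[r] if c == perm[r] else 0 for c in range(3))
                for r in range(3)
            )
            if det3(m) == 1:
                group.add(m)
    return group

print(len(octahedral_rotations()))  # one symmetry operation per subunit
```

Closure under `matmul` can be checked directly, confirming that the 24 matrices form a group rather than an arbitrary set.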
The tripartite motif-containing protein (Trim) family is defined by the presence of a common domain structure composed of a RING finger, a B-box, and a coiled-coil motif [
]. Trim36 is an E3 ubiquitin-protein ligase that mediates ubiquitination and subsequent proteasomal degradation of target proteins. It interacts with centromere protein H, a kinetochore protein, and is involved in chromosome segregation and cell cycle regulation. This protein has been associated with anencephaly [,
,
]. It may play a role in the acrosome reaction and fertilisation [,
]. This entry represents the B-box-type 2 zinc finger of Trim36, which is characterised by a CHC3H2 zinc-binding consensus motif.
Smac (Second Mitochondria-derived Activator of Caspase) and DIABLO (Direct IAP-Binding protein with Low pI) are 29 kDa mitochondrial precursor proteins. Apoptosis, or programmed cell death, is an essential process in metazoan development and homeostasis. Apoptosis is carried out by caspases, which are under tight regulatory control. Inhibitor-of-apoptosis proteins (IAPs) suppress apoptosis by inhibiting caspases, while the mitochondrial protein Smac/Diablo promotes apoptosis by suppressing IAPs. The four N-terminal residues of Smac/Diablo are responsible for the protein's interaction with IAPs [
]. The crystal structure of Smac/Diablo reveals a closed, three-helical bundle with a left-handed twist and shows that the protein homodimerises through an extensive hydrophobic interface [].
Unique short glycoprotein 2 (US2) and US3 are herpesviral proteins that interfere with the expression of major histocompatibility complex (MHC) class I molecules on the surface of infected cells. US2 is an endoplasmic reticulum resident transmembrane protein that binds to newly synthesized MHC class I heavy chains in the endoplasmic reticulum, redirecting them to the cytosol for proteasome-dependent destruction, thereby preventing their expression at the cell surface [
]. US3 is similarly an endoplasmic reticulum resident transmembrane glycoprotein that binds MHC class I molecules and prevents their exit from the endoplasmic reticulum. The endoplasmic reticulum retention signal of the US3 protein is contained in the luminal domain of the protein [].
This entry represents a protein family which is homologous to the HlyD membrane fusion protein of type I secretion systems. The genes encoding these proteins within prokaryotic genomes are associated with genes encoding a novel class of microcin (small bacteriocin) which has a propeptide region related to nitrile hydratase. The bacteriocin has been designated Nitrile Hydratase Propeptide Microcin, or NHPM. On this basis, proteins in this entry are designated NHPM bacteriocin system secretion proteins. Some, but not all, NHPM-class putative microcins belong to the TOMM (thiazole/oxazole modified microcin) class, as assessed by the presence of the scaffolding protein and/or cyclodehydratase in the same gene clusters.
Togavirin, also known as Sindbis virus core endopeptidase, is a serine protease resident at the N terminus of the p130 polyprotein of togaviruses [
]. The endopeptidase signature identifies the peptidase as belonging to the MEROPS peptidase family S3 (togavirin family, clan PA(S)). The polyprotein also includes structural proteins for the nucleocapsid core and for the glycoprotein spikes []. Togavirin is only active while part of the polyprotein; cleavage at a Trp-Ser bond results in a complete loss of activity []. Mutagenesis studies have identified the location of the His-Asp-Ser catalytic triad, and X-ray studies have revealed the protein fold to be similar to that of chymotrypsin [,
].
Testin contains a PET domain at the N terminus and three C-terminal LIM domains [
]. It is a cytoskeleton-associated focal adhesion protein that localizes along actin stress fibres, at cell-cell contact areas, and at focal adhesion plaques [,
]. Testin interacts with a variety of cytoskeletal proteins, including zyxin, mena, VASP, talin, and actin, and is involved in cell motility and adhesion events. Knockout mouse experiments have revealed a tumour suppressor function [,
]. This entry represents the testin PET domain. It is a protein-protein interaction domain, and PET-containing proteins serve as adaptors or scaffolds to support the assembly of multimeric protein complexes.
COMM (copper metabolism gene MURR1) domain proteins constitute a family initially identified as interacting partners of COMMD1 (previously known as MURR1), the prototype member of this protein family. COMMD1 is a multifunctional protein that has been shown to participate in two apparently distinct activities, regulation of the transcription factor NF-kappa-B and control of copper metabolism. The family is defined by the presence of a C-terminal motif termed COMM domain, which functions as an interface for protein-protein interactions. The proteins designated as COMMD or COMM domain containing 1-10 are extensively conserved in multicellular eukaryotic organisms [,
]. The leucine-rich, 70-85 amino acid long COMM domain is predicted to form a β-sheet [
].
This domain has an essentially invariant motif, Cys-Gly-Pro, followed by a highly hydrophobic transmembrane domain, always at the protein C terminus. So far it occurs strictly in the family Thermococcaceae (including Thermococcus and Pyrococcus) within the Euryarchaeota. It occurs in ten proteins per genome on average, and proteins with the domain may lack similarity elsewhere. The presumed sorting/processing enzyme for which this domain contains the recognition sequence is unknown, but it is unlikely to be a member of the exosortase/archaeosortase family. The Cys residue suggests a lipid modification. Upstream of this domain, most member proteins have an extremely Thr-rich sequence, suggesting O-linked glycosylation of archaeal surface proteins.
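The two features described above (a Cys-Gly-Pro motif near the C terminus followed by a hydrophobic transmembrane stretch) can be screened for with a simple heuristic. The sketch below is illustrative only: the function names and the 40-residue C-terminal search window are hypothetical choices, and the transmembrane test substitutes the classical Kyte-Doolittle rule (a 19-residue window with mean hydropathy of at least 1.6) for a real topology predictor.

```python
from typing import Optional

# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_window_hydropathy(segment: str, width: int = 19) -> Optional[float]:
    """Highest mean hydropathy over any `width`-residue window, or None if too short."""
    if len(segment) < width:
        return None
    scores = [KD.get(aa, 0.0) for aa in segment]  # unknown residues score 0
    current = sum(scores[:width])
    best = current
    for i in range(width, len(scores)):
        current += scores[i] - scores[i - width]  # slide the window by one residue
        best = max(best, current)
    return best / width


def has_cgp_tm_tail(seq: str, c_window: int = 40, threshold: float = 1.6) -> bool:
    """Hypothetical heuristic: the last Cys-Gly-Pro lies within the C-terminal
    window and is followed by a 19-residue stretch with mean hydropathy >= threshold."""
    seq = seq.upper()
    start = max(0, len(seq) - c_window)
    pos = seq.rfind("CGP", start)  # index in the full sequence, or -1 if absent
    if pos == -1:
        return False
    tail_score = max_window_hydropathy(seq[pos + 3:])
    return tail_score is not None and tail_score >= threshold


# Toy sequence: Thr-rich region, CGP, then a hydrophobic stretch.
example = "M" + "T" * 30 + "CGP" + "LLIVLAVLIVGAILLALVF" + "KR"
print(has_cgp_tm_tail(example))  # True
```

A motif hit alone says nothing about processing or lipidation; it only flags candidates that share the C-terminal architecture described in this entry.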
The ~180-residue NIDO domain is an extracellular domain of unknown function,
found in nidogen (entactin) and hypothetical proteins. The NIDO domain is found in association with other domains, such as nidogen G2 β-barrel (
), thyroglobulin type-1 (
), LDLRB (
), AMOP (
), EGF-like (
), VWFD, IPT/TIG, or sushi/CCP/SCR () [
,
,
,
]. Some proteins known to contain a NIDO domain are listed below:
Vertebrate nidogen-1 (NID-1) or entactin, a sulphated glycoprotein widely distributed in basement membranes.
Vertebrate nidogen-2 (NID-2) or osteonidogen, a cell adhesion glycoprotein which is widely distributed in basement membranes.
Vertebrate alpha-tectorin.
Mammalian mucin-4 (MUC4), a highly glycosylated membrane-bound protein.
Xenopus ID14, a putative matrix protein.
Guanine-nucleotide dissociation stimulator, CDC24, conserved site
Type:
Conserved_site
Description:
Ras proteins are membrane-associated molecular switches that bind GTP and GDP and
slowly hydrolyze GTP to GDP []. The balance between the GTP bound (active) and GDP bound (inactive) states is regulated by the
opposite action of proteins activating the GTPase activity and that of proteins which promote the loss of bound GDP and the uptake of fresh GTP [
,
]. The latter proteins are known as guanine-nucleotide dissociation stimulators (GDSs) or as
guanine-nucleotide releasing (or exchange) factors (GRFs). Proteins that act as GDSs can be classified into at least two families, one of which comprises the CDC24 family of proteins.
The tripartite motif-containing protein (Trim) family is defined by the presence of a common domain structure composed of a RING finger, a B-box, and a coiled-coil motif [
]. Trim36 is an E3 ubiquitin-protein ligase that mediates ubiquitination and subsequent proteasomal degradation of target proteins. It interacts with centromere protein-H, one of the kinetochore proteins, and is involved in chromosome segregation and cell cycle regulation. This protein has been associated with anencephaly [,
,
]. It may play a role in the acrosome reaction and fertilisation [,
]. This entry represents the B-box-type 1 zinc finger of Trim36, which is characterized by a C6H2 zinc-binding consensus motif.
A number of bacterial proteins seem to be involved in translocating specific
proteins across the bacterial cell membrane by a distinct secretory mechanism that does not require the cleavage of a signal peptide. They belong to the FHIPEP family, which stands for Flagella/Hr/Invasion Proteins Export Pore [
,
,
]. The N-terminal half of these proteins is highly conserved and seems to contain 6 to 8 transmembrane domains. The C-terminal half is less conserved and seemingly devoid of transmembrane domains. It is possible that these proteins serve as pores for the export of specific proteins. This entry represents a highly conserved hydrophilic region between two transmembrane regions.
The Dok family adapters are phosphorylated by different protein tyrosine kinases. Dok proteins are involved in processes such as modulation of cell differentiation and proliferation, as well as in control of cell spreading and migration. The Dok protein contains an N-terminal pleckstrin homology (PH) domain followed by a central phosphotyrosine binding (PTB) domain, which has a PH-like fold, and a proline- and tyrosine-rich C-terminal tail. The PH domain binds to acidic phospholipids and localizes proteins to the plasma membrane, while the PTB domain mediates protein-protein interactions by binding to phosphotyrosine-containing motifs [
]. This entry represents the PH domain found in Dok4, Dok5 and Dok6 [
,
].
The PRELI/MSF1 domain is a eukaryotic protein module which occurs in standalone form in several proteins, including the human PRELI protein and the yeast MSF1 protein, and as an amino-terminal domain in an orthologous group of proteins typified by human SEC14L1, which is conserved in all animals. In this group of proteins, the PRELI/MSF1 domain co-occurs with the CRAL-TRIO and GOLD domains. The PRELI/MSF1 domain is approximately 170 residues long and is predicted to adopt a globular alpha+beta fold with six beta strands and four alpha helices. It has been suggested that the PRELI/MSF1 domain may have a function associated with cellular membranes [
].
The streptococcal surface repeat (SSURE) domain is a protein fragment from streptococci found to bind the extracellular matrix protein fibronectin, but not collagen or submaxillary mucin. Anti-SSURE antibodies recognised the corresponding protein on the surface of streptococcal cells. The full-length proteins are thus fibronectin-binding surface adhesins [
]. The proteins are further characterised by an N-terminal motif resembling [YF]SIRKxxxGxxS[VIA] and a C-terminal LPXTG motif-containing region, which is characteristic of many surface proteins of Streptococcus and Streptomyces species. Cleavage between the Thr and Gly by sortase or a related enzyme leads to covalent anchoring of the new C-terminal Thr to the cell wall.
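The two consensus motifs above translate directly into regular expressions (each "x" becomes "."), and the Thr-Gly cleavage rule tells us what the anchored C terminus would look like. The following is a minimal sketch, assuming hypothetical search windows (first 60 and last 50 residues) and hypothetical names (`scan_surface_motifs`, `SurfaceMotifHits`); it is a pattern scan, not a predictor of sortase activity.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Consensus patterns translated to regular expressions ("x" = any residue).
YSIRK_RE = re.compile(r"[YF]SIRK.{3}G.{2}S[VIA]")  # N-terminal [YF]SIRKxxxGxxS[VIA]
LPXTG_RE = re.compile(r"LP.TG")                     # C-terminal sorting motif


@dataclass
class SurfaceMotifHits:
    ysirk_start: Optional[int]    # 0-based start of the N-terminal motif
    lpxtg_start: Optional[int]    # 0-based start of the last LPXTG in the C-terminal window
    anchored_form: Optional[str]  # sequence up to and including the Thr of LPXTG


def scan_surface_motifs(seq: str, n_window: int = 60, c_window: int = 50) -> SurfaceMotifHits:
    seq = seq.upper()
    # Search for the N-terminal motif only within the first n_window residues.
    ysirk = YSIRK_RE.search(seq[:n_window])
    # Search for LPXTG only within the last c_window residues; keep the last hit.
    c_start = max(0, len(seq) - c_window)
    lpxtg = None
    for match in LPXTG_RE.finditer(seq, c_start):
        lpxtg = match
    # Cleavage between Thr and Gly leaves ...LPXT as the new C terminus.
    anchored = seq[: lpxtg.start() + 4] if lpxtg else None
    return SurfaceMotifHits(
        ysirk_start=ysirk.start() if ysirk else None,
        lpxtg_start=lpxtg.start() if lpxtg else None,
        anchored_form=anchored,
    )


# Toy sequence with both motifs; the N-terminal motif starts at index 4.
toy = "MKKQ" + "YSIRKLSAGTASI" + "LLGAVAAS" + "D" * 100 + "LPETG" + "EESKLAVLLGLLLIGLRRK"
hits = scan_surface_motifs(toy)
print(hits.ysirk_start)  # 4
```

Restricting each search to one end of the sequence mirrors the positional description in the text; widening the windows would trade specificity for sensitivity.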
This family consists of several mammalian protein kinase A anchoring protein 3 (PRKA3) or A-kinase anchor protein 110kDa (AKAP 110) sequences. Agents that increase intracellular cAMP are potent stimulators of sperm motility. Anchoring inhibitor peptides, designed to disrupt the interaction of the cAMP-dependent protein kinase A (PKA) with A kinase-anchoring proteins (AKAPs), are potent inhibitors of sperm motility. PKA anchoring is a key biochemical mechanism controlling motility. AKAP110 shares compartments with both RI and RII isoforms of PKA and may function as a regulator of both motility- and head-associated functions such as capacitation and the acrosome reaction [
]. This entry represents a subgroup of the A-kinase anchor protein 110 kDa.
This entry corresponds to the Asp/Pro-rich domains (DP) of STI1/HOP and similar proteins [
], including TIC40, a co-chaperone during protein import into chloroplasts []. STI1/HOP proteins are co-chaperones that belong to the foldosome, a system consisting of Hsp70 and Hsp90 chaperones. STI1/HOP acts as an adapter protein capable of transferring client proteins from the first to the second molecular chaperone []. This domain is found in two copies in STI1, named DP1 and DP2 [,
,
], while in other proteins, such as the Hsp70-Hsp90 organising protein from P. falciparum, it is present as a single copy. This domain (specifically DP2 of STI1) is involved in client activation [].
This family consists of several tobravirus 2B proteins. The 2B protein is required for transmission by both Paratrichodorus pachydermus and Paratrichodorus
anemones nematodes []. Transmission of the tobravirus Tobacco rattle virus by trichodorid vector nematodes requires the viral coat protein (CP) and the 2B protein, a nonstructural protein encoded by RNA2, the smaller of the two viral genomic RNAs. It is hypothesized that the 2B protein functions by interacting with a small, flexible domain located at the C terminus of the CP, forming a bridge between the virus particle and the internal surface of the vector nematode feeding apparatus [].
This superfamily represents an actin-crosslinking domain with a β-trefoil structure, consisting of a triplet of β-hairpins packed against a six-stranded antiparallel β-barrel. Proteins containing this domain include fascin, which carries a tandem repeat of four copies of this domain, and the histidine-rich actin-binding protein hisactophilin. Actin-crosslinking proteins organise actin filaments into dynamic and complex subcellular scaffolds that orchestrate important mechanical functions, including cell motility and adhesion [
]. The fascins are a structurally unique and evolutionarily conserved group of actin cross-linking proteins. Fascins function in the organisation of two major forms of actin-based structures: dynamic, cortical cell protrusions and cytoplasmic microfilament bundles [
]. A related protein, hisactophilin, is essential for the osmoprotection of Dictyostelium cells [
,
].
Protein-L-isoaspartate(D-aspartate) O-methyltransferase (
) (PCMT) [
] (also known as L-isoaspartyl protein carboxyl methyltransferase) is an enzyme that catalyses the transfer of a methyl group from S-adenosylmethionine to the free carboxyl groups of D-aspartyl or L-isoaspartyl residues in a variety of peptides and proteins. The enzyme does not act on normal L-aspartyl residues. L-isoaspartyl and D-aspartyl residues are the products of the spontaneous deamidation and/or isomerisation of normal L-aspartyl and L-asparaginyl residues in proteins. PCMT plays a role in the repair and/or degradation of these damaged proteins; the enzymatic methyl esterification of the abnormal residues can lead to their conversion to normal L-aspartyl residues. The SAM domain is present in most of these proteins.
This domain is found in a number of proteins, including TPR protein (
) and yeast myosin-like protein 1 (MLP1,
). These proteins share a number of features; for example, they have coiled-coil regions and are associated with nuclear pores [
,
,
]. TPR is thought to be a component of nuclear pore complex-attached intranuclear filaments [], and is implicated in nuclear protein import []. Moreover, its N-terminal region is involved in the activation of oncogenic kinases, possibly by mediating the dimerisation of kinase domains or by targeting these kinases to the nuclear pore complex []. Mlp1 acts as a docking platform for heterogeneous nuclear ribonucleoproteins that are required for mRNA export [].
Fatty acid synthesis involves a set of reactions, starting with carboxylation of acetyl-CoA to malonyl-CoA. This is an irreversible reaction, catalysed by the acetyl-CoA carboxylase complex (
), a heterohexamer of biotin carboxyl carrier protein, biotin carboxylase and two non-identical carboxyl transferase subunits (alpha and beta) in a 2:2 association [
]. The reaction involves two steps:
Biotin carrier protein + ATP + HCO3- = Carboxybiotin carrier protein + ADP + Pi
Carboxybiotin carrier protein + Acetyl-CoA = Malonyl-CoA + Biotin carrier protein
In the first step, biotin carboxylase catalyses the carboxylation of the carrier protein to form an intermediate. Next, the transcarboxylase complex transfers the carboxyl group from the intermediate to acetyl-CoA, forming malonyl-CoA.
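Adding the two steps cancels the carrier protein and its carboxylated intermediate, giving the overall reaction catalysed by the complex. In this worked summation, BCCP abbreviates the biotin carboxyl carrier protein and BCCP-CO2- its carboxylated form; this shorthand is introduced here only for compactness.

```latex
\begin{aligned}
\mathrm{BCCP} + \mathrm{ATP} + \mathrm{HCO_3^-} &= \mathrm{BCCP\text{-}CO_2^-} + \mathrm{ADP} + \mathrm{P_i} \\
\mathrm{BCCP\text{-}CO_2^-} + \text{acetyl-CoA} &= \text{malonyl-CoA} + \mathrm{BCCP} \\
\text{acetyl-CoA} + \mathrm{ATP} + \mathrm{HCO_3^-} &= \text{malonyl-CoA} + \mathrm{ADP} + \mathrm{P_i}
\end{aligned}
```

The third line is the sum of the first two: the carrier protein is regenerated, so it acts catalytically, and one ATP is consumed per malonyl-CoA formed.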
Bromodomains are found in a variety of mammalian, invertebrate and yeast DNA-binding proteins [
]. Bromodomains are highly conserved α-helical motifs that can specifically interact with acetylated lysine residues on histone tails [,
]. In some proteins, the classical bromodomain has diverged to such an extent that parts of the region are either missing or contain an insertion (e.g., mammalian protein HRX, Caenorhabditis elegans hypothetical protein ZK783.4, yeast protein YTA7). The bromodomain may occur as a single copy, or in duplicate. This domain is present in proteins involved in a wide range of functions, such as acetylating histones, remodeling chromatin, and recruiting other factors necessary for transcription, thus playing a critical role in the regulation of transcription [
].
Amino acid permeases are integral membrane proteins involved in the transport of amino acids into the cell. A number of such proteins have been found to be evolutionarily related [
,
,
]. These proteins include several yeast specific and general amino acid permeases; Emericella nidulans (Aspergillus nidulans) proline transport protein (gene prnB); Trichoderma harzianum amino acid permease INDA1; Salmonella typhimurium L-asparagine permease (gene ansP); and several Escherichia coli and other bacterial permeases and transport proteins. These proteins seem to contain up to 12 transmembrane segments. This entry consists of members of the amino acid-polyamine-organocation (APC) superfamily []. Also included in this entry is the methylthioribose transporter mtrA from Bacillus subtilis, which transports methylthioribose into the cell [
].
Members of this family are integral membrane proteins. This family includes a protein with hemolytic activity from Bacillus cereus [
]. YOL002c (AdipoR-like receptor IZH2) from Saccharomyces cerevisiae encodes a protein that plays a key role in metabolic pathways that regulate lipid and phosphate metabolism [
,
]. In eukaryotes, members are seven-pass transmembrane proteins found to function as receptors with a broad range of apparent ligand specificities, including progestin and adiponectin (AdipoQ) receptors (AdipoR), and hence have been named PAQR proteins []. The mammalian members include progesterone-binding proteins []. Unlike G protein-coupled receptors (GPCRs), members of this family have an evolutionary ancestry that can be traced back to the Archaea.
This is a family of glycine cleavage H-proteins, part of the glycine cleavage system (GCS) found in bacteria, archaea, and the mitochondria of eukaryotes. GCS is a multienzyme complex consisting of 4 different components (P-, H-, T- and L-proteins) which catalyzes the oxidative cleavage of glycine [
]. The H-protein shuttles the methylamine group of glycine from the P-protein (glycine dehydrogenase) to the T-protein (aminomethyltransferase) via a lipoyl group attached to a completely conserved lysine residue []. This entry includes Abitram proteins. Abitram (also known as Simiate) functions in transcription regulation [
]. Furthermore, it associates with both G- and F-Actin and affects actin polymerization and actin turnover, influencing filopodia dynamics and arborization of neurons [].
This entry represents the RNA recognition motif 3 (RRM3) of HuB. HuB (also known as ELAV-like protein 2 or Hel-N1) is one of the neuronal members of the Hu family [
]. The neuronal Hu proteins play important roles in neuronal differentiation, plasticity and memory. HuB is also expressed in gonads. It is up-regulated during neuronal differentiation of embryonic carcinoma P19 cells []. Like other Hu proteins, HuB contains three RNA recognition motifs (RRMs). RRM1 and RRM2 may cooperate in binding to an AU-rich RNA element (ARE). RRM3 may help to maintain the stability of the RNA-protein complex, and might also bind to poly(A) tails or be involved in protein-protein interactions [].
This entry represents the RNA recognition motif 2 (RRM2) of HuB. HuB (also known as ELAV-like protein 2 or Hel-N1) is one of the neuronal members of the Hu family [
]. The neuronal Hu proteins play important roles in neuronal differentiation, plasticity and memory. HuB is also expressed in gonads. It is up-regulated during neuronal differentiation of embryonic carcinoma P19 cells []. Like other Hu proteins, HuB contains three RNA recognition motifs (RRMs). RRM1 and RRM2 may cooperate in binding to an AU-rich RNA element (ARE). RRM3 may help to maintain the stability of the RNA-protein complex, and might also bind to poly(A) tails or be involved in protein-protein interactions [].
Respiratory syncytial virus (RSV) has two major virion envelope proteins, the fusion (F) glycoprotein and the major attachment (G) glycoprotein, which are the two viral neutralisation antigens []. This entry represents the major surface glycoprotein G from RSV. G glycoprotein interacts with host CX3CR1, the receptor for the CX3C chemokine fractalkine, to modulate the immune response and facilitate infection [,
]. The G protein exists in two forms: the full-length form (mG), which is anchored by a transmembrane domain near the N terminus, and the secreted form (sG), which lacks the transmembrane domain because of an alternative initiation of translation []. sG helps the virus evade antibody-mediated restriction of replication by acting as an antigen decoy [].
This family consists of a series of short proteins of around 90 residues in length. The human protein
(or BC10) has been implicated in bladder cancer where the transcription of the gene coding for this protein is nearly completely abolished in highly invasive transitional cell carcinomas (TCCs) [
]. The protein is a small globular protein containing two transmembrane helices, and it is encoded by a multiply edited transcript. All the editing sites are found in either the 5'-UTR or the region encoding the N-terminal section of the protein, which is predicted to be outside the membrane. The three coding edits are all non-synonymous and are predicted to alter exposed residues []. The function of this family is unknown.
This family consists of several VirE2 proteins which seem to be specific to Agrobacterium tumefaciens and Rhizobium etli. VirE2 is known to interact, via its C terminus, with VirD4. A. tumefaciens transfers oncogenic DNA and effector proteins to plant cells during the course of infection. Substrate translocation across the bacterial cell envelope is mediated by a type IV secretion (TFS) system composed of the VirB proteins, as well as VirD4, a member of a large family of inner membrane proteins implicated in the coupling of DNA transfer intermediates to the secretion machine. VirE2 is therefore thought to be a protein substrate of a type IV secretion system which is recruited to a member of the coupling protein superfamily [].
This entry includes the ORF8b accessory protein from Middle East respiratory syndrome-related coronavirus (MERS-CoV) and related merbecoviruses (C lineage). The gene encoding ORF8b is an internal ORF that is overlapped by the N (nucleocapsid) protein gene (ORF8a) [
]. ORF8b appears to have no homologous proteins in Sarbecovirus (lineage B), which includes Severe acute respiratory syndrome (SARS) coronavirus (SARS-CoV) and SARS-CoV-2 (2019 novel coronavirus, 2019-nCoV). MERS-CoV ORF8b is not essential for viral replication. It is related to protein I (also known as accessory protein N2) of bovine enteric coronavirus-F15 strain (BECV-F15) and other related Embecoviruses; the gene encoding protein I is included in the N gene as an alternative ORF [,
,
].
This entry represents the RNA recognition motif (RRM) of BRAP2, also known as BRCA1-associated protein, a novel cytoplasmic protein interacting with the two functional nuclear localisation signal (NLS) motifs of BRCA1, a nuclear protein linked to breast cancer. It also binds to the SV40 large T antigen NLS motif and the bipartite NLS motif found in mitosin [
]. BRAP2 may serve as a cytoplasmic retention protein and play a role in the regulation of nuclear protein transport [,
]. It has been shown to act as a negative regulator of nuclear import of viral proteins []. It contains an N-terminal RNA recognition motif (RRM), followed by a C3HC4-type RING finger domain and a UBP-type zinc finger.
The insulin receptor substrate protein (IRS) family consists of IRS-1, -2, -3 and -4. IRS family proteins are adapter proteins that relay signals from receptor tyrosine kinases to downstream components of signalling pathways [
]. They are important regulatory factors in insulin signaling pathways []. When tyrosine-phosphorylated by the activated insulin receptor, the IRS proteins recruit and activate various adapter molecules or enzymes, such as phosphoinositide 3-kinase (PI3K) and mitogen-activated protein kinase (MAPK), to facilitate glucose uptake [], lipid metabolism [,
] and cell proliferation [,
]. IRS proteins consist of two highly conserved domains in the N-terminal region, a pleckstrin homology (PH) domain and a phosphotyrosine-binding (PTB) domain, followed by a long, non-conserved C-terminal domain.