RIN1, a member of the RIN (Ras interaction/interference) family, has multiple functional domains, including SH2 and proline-rich (PR) domains in the N-terminal region, and RIN-family homology (RH), VPS9 and Ras-association (RA) domains in the C-terminal region. RIN proteins function as Rab5-GEFs [
]. Previous studies showed that RIN1 interacts with EGF receptors via its SH2 domain and regulates trafficking and degradation of EGF receptors via its interaction with STAM, indicating a vital role for RIN1 in regulating endosomal trafficking of receptor tyrosine kinases (RTKs) []. RIN1 was first identified as a Ras-binding protein that suppresses the activated RAS2 allele in S. cerevisiae. RIN1 binds to activated Ras through its carboxyl-terminal domain, and this Ras-binding domain also binds to 14-3-3 proteins, as Raf-1 does []. The SH2 domain of RIN1 is thought to interact with phosphotyrosine-containing proteins, but the physiological partners for this domain are unknown. The proline-rich domain in RIN1 is similar to consensus SH3-binding regions.
Zeste white 10 (ZW10) was initially identified as a mitotic checkpoint protein involved in chromosome segregation as part of the RZZ (Rod-Zwilch-Zw10) complex, and then implicated in targeting cytoplasmic dynein and dynactin to mitotic kinetochores, but it is also important in non-dividing cells. Its roles there include targeting cytoplasmic dynein to Golgi and other membranes, and SNARE-mediated ER-Golgi trafficking as part of the Zw10-Rint1-Nag complex, whose subunits are orthologues of yeast Dsl1, Tip20 and Sec39, respectively [
,
,
]. Yeast DSL1 is a peripheral membrane protein required for transport between the Golgi and the endoplasmic reticulum, and is part of the Dsl1p complex (Dsl1-Tip20-Sec20-Sec39) [,
,
]. These proteins are members of the CATCHR (complexes associated with tethering containing helical rods) family, which includes subunits of evolutionarily related complexes such as the conserved oligomeric Golgi (COG), Golgi-associated retrograde protein (GARP) and exocyst complexes []. This entry represents the C-terminal domain of Zw10 and DSL1, which consists of an α-helical bundle.
This entry represents the ePHD finger of Peregrin (also known as BRPF1), which is a multi-domain protein that binds histones, mediates monocytic leukemic zinc-finger protein (MOZ)-dependent histone acetylation, and is required for Hox gene expression and segmental identity [
]. It is a close partner of the MOZ histone acetyltransferase (HAT) complex and a novel Trithorax group (TrxG) member with a central role during development [,
]. BRPF1 is primarily a nuclear protein that has a broad tissue distribution and is abundant in testes and spermatogonia. It contains a plant homeodomain (PHD) zinc finger followed by a non-canonical ePHD finger, a bromodomain and a proline-tryptophan-tryptophan-proline (PWWP) domain. This PHD finger binds to methylated lysine 4 of histone H3 (H3K4me), the bromodomain interacts with acetylated lysines on N-terminal tails of histones and other proteins, and the PWWP domain shows histone-binding and chromatin association properties. BRPF1 may be involved in chromatin remodeling. The extended plant homeodomain (ePHD) zinc finger is characterized as Cys2HisCys5HisCys2His [,
].
Sac2 is an evolutionarily conserved protein in multicellular organisms from nematode to human. Sac2 orthologues are also found in fungi such as Aspergillus species and Yarrowia lipolytica, but are absent from Saccharomyces cerevisiae. Sac2 is a phosphatidylinositol (PI) 4-phosphatase that specifically hydrolyzes phosphatidylinositol 4-phosphate (PI(4)P) to phosphatidylinositol. Besides the conserved N-terminal Sac1 domain, Sac2 contains a unique homology domain, homology Sac2 (hSac2), followed by a proline-rich C-terminal portion that varies in length between species. The hSac2 domain plays a role in Sac2 dimerization and intracellular localization. The hSac2 domain is also found in proteins encoded by the transformation-related protein 63 regulated 1 (TPRG1) and the tumor protein p63 regulated gene 1-like (TPRG1L) genes [
,
,
,
]. The hSac2 domain has a PH domain-like fold and consists of a core of two perpendicularly apposed beta sheets with a C-terminal alpha helix that seals the gap between the two sheets. In addition to the core, hSac2 has an extra N-terminal alpha helix and a short beta strand at the C-terminal end [].
The RH2 (RILP homology 2) Rab-binding domain is found in the following animal
proteins [,
,
]:
Rab-interacting lysosomal protein (RILP), which interacts with Rab7, Rab34, and Rab36.
RILP-like 1 (RILP-L1), which interacts with Rab12, Rab34, and Rab36.
RILP-like 2 (RILP-L2), which interacts with Rab34 and Rab36.
JNK-interacting protein 3 (JIP3), which interacts with Rab36 alone.
JNK-interacting protein 4 (JIP4), which interacts with Rab36 alone.
The RH2 domain is folded into a long helix alpha1 and a short helix alpha2 connected by a very tight loop. The RH2 domain forms a tightly associated four-helix homodimer in which both helices alpha1 and alpha2 are involved in dimerization. Such a homodimer binds to two separate Rab-GTP molecules on opposite sides, with both helices involved in the interaction with Rab. In the complex interface, although each Rab-GTP interacts with both helices of the RH2 domain, these two helices are contributed by two different molecules, with helix alpha1 coming from one protomer and helix alpha2 from the other protomer [].
This entry represents the entire RH2 domain.
LC3-interacting regions (LIRs), also known as ATG8-interacting motifs (AIMs), are short linear motifs (SLiMs) of autophagy receptors and adaptor proteins that facilitate the selective recruitment of autophagy substrates to the autophagosome. ATG8 proteins are highly conserved in eukaryotes and play a key role in selective autophagy, as they recognise and bind different autophagy receptors and adaptors containing a specific class of SLiMs. LIRs are characterised by degenerate sequences with a four-residue core involved in ATG8 binding, which follows the pattern [W/Y/F]-x-x-[L/I/V]. Based on the aromatic amino acid in position 1, they can be classified into W-type, Y-type and F-type. This entry represents the F-type LIR motif from cysteine protease ATG4, which contains the sequence FE[I/V]L. There are acidic residues (Asp and Glu) N-terminal to this motif, which may help stabilise binding or provide specificity toward a certain class of ATG8 proteins [
]. This entry includes proteins from chordates.
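The core pattern described for LIRs (an aromatic W, Y or F, two arbitrary residues, then L, I or V) and the ATG4 F-type variant (F, E, I or V, L) can be expressed as regular expressions. The following is a minimal sketch only: the function names and the example sequence are hypothetical, and a match on the four-residue core alone does not show that a site is a functional LIR.

```python
import re

# Core LIR pattern from the text: W/Y/F, any two residues, then L/I/V.
# A lookahead is used so that overlapping candidate cores are all reported.
LIR_CORE = re.compile(r"(?=[WYF]..[LIV])")

# F-type core described for ATG4 in the text: F, E, I or V, L.
ATG4_F_TYPE = re.compile(r"FE[IV]L")


def find_lir_cores(sequence):
    """Return (0-based start, four-residue core, type) for each candidate core."""
    seq = sequence.upper()
    hits = []
    for match in LIR_CORE.finditer(seq):
        start = match.start()
        core = seq[start:start + 4]
        hits.append((start, core, f"{core[0]}-type"))
    return hits


def is_atg4_f_type(core):
    """True if a four-residue core has the FE[I/V]L form given for ATG4."""
    return ATG4_F_TYPE.fullmatch(core.upper()) is not None


if __name__ == "__main__":
    # Hypothetical peptide, used only to illustrate the scan.
    for start, core, lir_type in find_lir_cores("DDEFEILAAWTTV"):
        print(start, core, lir_type, is_atg4_f_type(core))
```

The classification into W-, Y- and F-type simply reads the first residue of each matched core, mirroring the definition in the entry text.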
Signal recognition particle subunit SRP68, RNA-binding domain
Type:
Domain
Description:
Signal recognition particles (SRPs) are ribonucleoprotein complexes that target particular nascent pre-secretory proteins to the endoplasmic reticulum. The SRP complex targets the ribosome-nascent chain complex to the SRP receptor (SR), which is anchored in the ER, where SR compaction and GTPase rearrangement drive cotranslational protein translocation into the ER [
]. SRP68 is one of the two largest proteins found in SRPs (the other being SRP72), and it forms a heterodimer with SRP72. Heterodimer formation is essential for SRP function []. SRP68 binds to SRP RNA directly, while SRP72 binds the SRP RNA largely via nonspecific electrostatic interaction. The binding of SRP72 with SRP RNA enhances the affinity of SRP68 for the RNA. This entry describes the N-terminal RNA-binding domain (RBD) of SRP68, a tetratricopeptide-like module. Interactions between SRP68-RBD and SRP RNA (7SL RNA) are thought to facilitate a conformation of SRP RNA that is required for interactions with ribosomal RNA [,
,
].
This homology domain, GlyGly-CTERM, shares a species distribution with rhombosortase, a subfamily of rhomboid-like intramembrane serine proteases [
]. It is probably a recognition sequence for protein sorting and then cleavage by rhombosortase. Shewanella species have the largest number of target proteins per genome, up to thirteen. The domain occurs at the extreme carboxyl-terminus of a diverse set of proteins, most of which are enzymes with conventional signal sequences and with hydrolytic activities: nucleases, proteases, agarases, etc. The agarase AgaA from Vibrio sp. strain JT0107 is secreted into the medium, while the same protein heterologously expressed in E. coli is retained in the cell fraction []. This suggests cleavage and release in species with this domain. Both this suggestion and the structure of the domain (motif, hydrophobic predicted transmembrane helix, cluster of basic residues) closely parallel those of the LPXTG/sortase system and the PEP-CTERM/exosortase (EpsH) system. For this reason, the putative processing enzyme is designated rhombosortase.
This entry includes NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 10 (NDUFB10) and mitochondrial NADH-ubiquinone oxidoreductase 12 kDa subunit (also known as NDUS6). NADH-ubiquinone oxidoreductase subunit 10 (NDUFB10) is a member of a family of conserved proteins of up to 180 residues. It is one of the 41 protein subunits within the hydrophobic fraction of the NADH:ubiquinone oxidoreductase (complex I), a multiprotein complex located in the inner mitochondrial membrane whose main function is the transport of electrons from NADH to ubiquinone, which is accompanied by translocation of protons from the mitochondrial matrix to the intermembrane space. NDUFB10 is encoded in the nucleus [
,
]. NADH dehydrogenase [ubiquinone] iron-sulphur protein 6 subunit is an accessory subunit of the mitochondrial membrane respiratory chain NADH dehydrogenase (Complex I) that is believed not to be involved in catalysis. Complex I functions in the transfer of electrons from NADH to the respiratory chain and is composed of about 45 subunits. The immediate electron acceptor for the enzyme is believed to be ubiquinone.
Bacterial type IV pili are surface filaments critical for diverse biological processes including surface and host cell adhesion, colonisation, biofilm formation, twitching motility, DNA uptake during natural transformation and virulence [
,
]. The proteins necessary to form the type IV pilus inner-membrane complex are encoded by the pilMNOPQ operon: the cytoplasmic actin-like protein PilM, PilN, PilO, the periplasmic lipoprotein PilP and the outer-membrane secretin PilQ. The inner-membrane PilM/N/O/P complex is required for the optimal function of the outer-membrane secretin PilQ. This cluster is highly conserved across the type IV pilus-producing bacterial species, and all of these proteins have been shown to be essential for twitching motility [,
]. This entry represents type IV pilus inner membrane component PilO, which is necessary for proper folding of PilN. PilO and PilN depend on each other for stability, and their periplasmic domains interact to form a stable heterodimer, which is essential for the assembly of a functional complex [
,
].
Nitrobindins (Nbs) are evolutionarily conserved all-β-barrel heme-proteins displaying a highly solvent-exposed heme-Fe(III) atom. Mycobacterium tuberculosis Nb (Mt-Nb(III)) and the C terminus of Homo sapiens Nb (Hs-Nb(III)) share this β-barrel structure, suggesting that Nb may act as a sensor possibly modulating the THAP4 transcriptional activity residing in the N-terminal region [
]. Ferric nitrobindin-like proteins that lack the conserved His residue which binds heme iron are also included in the Nb family. Mt-Nb(III) is a peroxynitrite isomerase that converts peroxynitrite to nitrate. It may be required to scavenge reactive nitrogen and oxygen species produced by the host during the immune response [
]. In humans THAP4 catalyses the heme-based conversion of peroxynitrite into nitrate/NO3 in vitro [
]. At1g79260 is a nitrophorin-like heme-binding protein that may reversibly bind nitric oxide (NO) and be involved in NO transport []. This entry also includes Caenorhabditis elegans protein male abnormal 7 (Mab-7) which plays an important role in determining body shape and sensory ray morphology [].
In mice, Sema5A has been identified as a protein that induces inhibitory responses during optic nerve development [
]. It has also been identified as a candidate gene for idiopathic autism in humans []. Plexin B3 functions as a binding partner and receptor for Sema5A []. Sema5A is highly expressed in human pancreatic cancer cells and is associated with tumor growth, invasion and metastasis []. Sema5A belongs to the class 5 semaphorin family of proteins, which are transmembrane glycoproteins characterised by unique thrombospondin-specific repeats in the extracellular region of the protein []. Semaphorins are regulatory molecules in the development of the nervous system and in axonal guidance. They also play important roles in other biological processes, such as angiogenesis, immune regulation, respiration systems and cancer. The Sema domain is located at the N terminus and contains four disulfide bonds formed by eight conserved cysteine residues. It serves as a receptor-recognition and -binding module [
].
The multifunctional fibrillar adhesin CshA, which mediates binding to both host molecules and other microorganisms, is an important determinant of colonization by Streptococcus gordonii, an oral commensal and opportunistic pathogen of animals and humans. CshA binds the high-molecular-weight glycoprotein fibronectin (Fn) via an N-terminal non-repetitive region, and this protein-protein interaction has been proposed to promote S. gordonii colonization at multiple sites within the host. This 259 kDa polypeptide is organized in the form of a leader peptide (residues 1-41), a non-repetitive region (residues 42-778), 17 repeat domains (R1-R17, each about 101 residues), and a C-terminal cell wall anchor. The non-repetitive Fn-binding region of CshA in turn is composed of three distinct domains, designated as non-repetitive domain 1 (NR1, CshA(42-222)), non-repetitive domain 2 (NR2, CshA(223-540)), and non-repetitive domain 3 (NR3, CshA(582-814)). The NR2 domain of CshA has been shown to adopt a globular structure with a lectin-like fold and a ligand-binding site on its surface; its structural homologues are involved in binding carbohydrates or glycoproteins [
].
Dr adhesins bind the Dr blood group antigen component of decay-accelerating factor. These proteins contain both fimbriated and afimbriated adherence structures and mediate adherence of uropathogenic Escherichia coli to the urinary tract [
]. They also confer the mannose-resistant hemagglutination phenotype, which can be inhibited by chloramphenicol. The N-terminal portion of the mature protein is thought to be responsible for chloramphenicol sensitivity []. Afa/Dr adhesins display an afimbrial/fimbrial morphology and are exported to the bacterial surface by the chaperone-usher pathway, a widespread system among Gram-negative pathogens for the secretion of fimbrial proteins [
]. The AfaD protein caps the AfaE fibrils, where it can efficiently perform its role as an invasin []. The structure of the AfaD domain exhibits an immunoglobulin-like topology in which the two β-sheets pack against each other in a fashion analogous to AfaE and archetypal pilin domains. The role of AfaD as an initiator of fimbrial assembly and as the cap of the fibrillar structure is defined by the absence of a free N-terminal extension [
].
This entry includes eIF3m and COPS7A/B. eIF3m is a component of the eukaryotic translation initiation factor 3 (eIF-3) complex, which is involved in protein synthesis and, together with other initiation factors, stimulates binding of mRNA and methionyl-tRNAi to the 40S ribosome [
]. The COP9 signalosome (CSN) is a conserved protein complex that regulates the ubiquitin (Ubl) conjugation pathway by mediating the deneddylation of the cullin subunits of SCF-type E3 ligase complexes [
], which leads to a decrease in Ubl ligase activity of SCF-type complexes such as SCF, CSA or DDB2 []. Protein kinases CK2 and D, which phosphorylate proteins such as cJun and p53 resulting in their degradation by the ubiquitin-26S proteasome system, also bind to CSN [,
]. The mammalian CSN typically consists of eight subunits designated CSN1-CSN8. Fission yeast possesses a smaller version of the CSN, consisting of only six subunits, whereas a more distant CSN-like complex has been described in Saccharomyces cerevisiae [].
Beta-1,4-galactosyltransferase is responsible for catalysing the transfer of galactose onto proteins or lipids [
]. Studies on C. elegans have shown that the enzyme is required for susceptibility to pore-forming crystal toxins, together with glycosyltransferase genes bre-1, bre-2, bre-3 and bre-5 []. In mammals, the protein is found in long and short forms. The short form is preferentially located in the Golgi complex [
], where it catalyses the production of lactose in the lactating mammary gland; it may also be responsible for the synthesis of complex-type N-linked oligosaccharides of various glycoproteins, as well as the carbohydrate moieties of glycolipids. The long form is preferentially targeted to the plasma membrane of many cell types [], where it functions as a recognition molecule during various cell-cell and cell-matrix interactions, by binding to specific oligosaccharide ligands on opposing cells or in the extracellular matrix. This entry also includes other members of the glycosyltransferase 7 family, such as xylosylprotein 4-beta-galactosyltransferase and beta-N-acetyl-D-glucosaminide beta-1,4-N-acetylglucosaminyl-transferase.
Proteins containing this domain include the uncharacterised Sulfolobus tokodaii ST1585 protein [
]. It belongs to the MBL-fold metallo-hydrolase superfamily, which comprises mainly hydrolytic enzymes that carry out a variety of biological functions. The class B metallo-beta-lactamases (MBLs), from which this fold was named, account for only a small fraction of the activities included in this superfamily. Activities carried out by superfamily members include class B beta-lactamases, hydroxyacylglutathione hydrolases, AHL (acyl homoserine lactone) lactonases, persulfide dioxygenases, flavodiiron proteins, cleavage and polyadenylation specificity factors such as the Int9 and Int11 subunits of Integrator, Sdsa1-like and AtsA-like arylsulfatases, 5'-exonucleases human SNM1A and yeast Pso2p, ribonuclease J and ribonuclease Z, cyclic nucleotide phosphodiesterases, insecticide hydrolases, and proteins required for natural transformation competence. Classical members of the superfamily are di-, or less commonly mono-, zinc-ion-dependent hydrolases; however, the diversity of biological roles is reflected in variations in the active site metallo-chemistry [,
,
,
,
].
Ero1 and PDI form the disulfide relay system of the ER that supports correct disulfide bond formation of secretory proteins. This entry represents Ero1 (endoplasmic oxidoreductin-1) from yeasts and its homologues from mammals, Ero1-alpha and Ero1-beta. Ero1 is a flavoprotein that directly transfers disulfide bonds to disulfide isomerase PDI [
,
,
]. Ero1 acts as a thiol oxidoreductase responsible for catalyzing disulfide bond formation in nascent polypeptide substrates via electron transfer through protein disulfide isomerase (PDI) with oxygen acting as the final electron acceptor []. Newly generated disulfides are transferred from a FAD (flavin adenine dinucleotide)-associated active site via a "shuttle disulfide" cysteine pair in Ero1 to PDI and from there on to substrate proteins [
,
,
]. The activity of Ero1 is regulated by PDI (also known as Pdi1). This regulation of Ero1 through reduction and oxidation of regulatory bonds within Ero1 is essential for maintaining the proper redox balance in the ER [,
]. ERO1 has a multihelical structure consisting of two four-helical bundles.
This entry represents the second SH3 domain of Nck1. The second SH3 domain of Nck appears to prefer ligands containing the APxxPxR motif [
]. Nck1 (also called Nck-alpha) plays a crucial role in connecting signaling pathways of tyrosine kinase receptors and important effectors in actin dynamics and cytoskeletal remodeling [
]. It binds and activates RasGAP, resulting in the downregulation of Ras []. It is also involved in the signaling of endothelin-mediated inhibition of cell migration []. Cytoplasmic Nck proteins are non-enzymatic adaptor proteins composed of three SH3 (Src homology 3) domains and a C-terminal SH2 domain [
]. They regulate actin cytoskeleton dynamics by linking proline-rich effector molecules to protein tyrosine kinases and phosphorylated signaling intermediates []. They function downstream of the PDGFbeta receptor and are involved in Rho GTPase signaling and actin dynamics []. They associate with tyrosine-phosphorylated growth factor receptors or their cellular substrates [,
]. There are two vertebrate Nck proteins, Nck1 and Nck2.
This entry represents the third SH3 domain of Nck1. The third SH3 domain of Nck appears to prefer ligands with a PxAPxR motif [
]. Nck1 (also called Nck-alpha) plays a crucial role in connecting signaling pathways of tyrosine kinase receptors and important effectors in actin dynamics and cytoskeletal remodeling [
]. It binds and activates RasGAP, resulting in the downregulation of Ras []. It is also involved in the signaling of endothelin-mediated inhibition of cell migration []. Cytoplasmic Nck proteins are non-enzymatic adaptor proteins composed of three SH3 (Src homology 3) domains and a C-terminal SH2 domain [
]. They regulate actin cytoskeleton dynamics by linking proline-rich effector molecules to protein tyrosine kinases and phosphorylated signaling intermediates []. They function downstream of the PDGFbeta receptor and are involved in Rho GTPase signaling and actin dynamics []. They associate with tyrosine-phosphorylated growth factor receptors or their cellular substrates [,
]. There are two vertebrate Nck proteins, Nck1 and Nck2.
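The ligand preferences quoted for the second (APxxPxR) and third (PxAPxR) SH3 domains of Nck can be turned into a simple peptide scan. This is an illustrative sketch only: the domain labels, function names and example peptide are hypothetical, "x" is assumed to mean any residue, and a sequence match says nothing about binding affinity.

```python
import re

# Consensus motifs as quoted in the entry text; "x" stands for any residue.
NCK_SH3_MOTIFS = {
    "SH3-2": "APxxPxR",
    "SH3-3": "PxAPxR",
}


def motif_to_regex(motif):
    """Translate a consensus such as 'PxAPxR' into a regex that reports overlaps."""
    body = "".join("." if ch == "x" else ch for ch in motif)
    # Capture inside a lookahead so overlapping matches are not skipped.
    return re.compile(f"(?=({body}))")


def scan_peptide(peptide):
    """Return {domain label: [(0-based start, matched segment), ...]}."""
    seq = peptide.upper()
    return {
        domain: [(m.start(), m.group(1)) for m in motif_to_regex(motif).finditer(seq)]
        for domain, motif in NCK_SH3_MOTIFS.items()
    }


if __name__ == "__main__":
    # Hypothetical proline-rich peptide, used only to illustrate the scan.
    print(scan_peptide("GSAPTLPSRKPPAPPRQ"))
```

Keeping the motifs as consensus strings and converting them on demand makes it easy to add further preferences without writing each regular expression by hand.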
This entry represents the SH3 domain found in srGAP proteins 1, 2 and 3. srGAP1 (also called ARHGAP13) is a key GTPase activating protein (GAP) downstream of the Slit-Robo pathway, and has been shown to inhibit neuronal migration and glioma cell invasion by reducing the activation of Cdc42 [
]. srGAP2 (ARHGAP34) regulates neuronal morphogenesis through the ability of its F-BAR domain to regulate membrane deformation and induce filopodia formation []. srGAP3 (ARHGAP14) interacts with lamellipodin at the cell membrane and regulates Rac-dependent cellular protrusions []. The SLIT-ROBO Rho GTPase-activating protein (srGAP) family consists of four members: srGAP1, -2, -3 and -4. They contain F-BAR, RhoGAP and SH3 domains. Their RhoGAP domain is involved in negative regulation of Rho GTPase activities important for cytoskeleton rearrangement [
]. The srGAP family members have an "inverse F-BAR" or IF-BAR domain that is distinct from other F-BAR domains, such as that of FBP17. They are multifunctional adaptor proteins involved in various aspects of neuronal development [
].
This entry represents the SH3 domain of ASAP2 (Arf-GAP with SH3 domain, ANK repeat and PH domain-containing protein 2; also known as AMAP2 or PAG3), a PH domain-containing ArfGAP. It mediates the functions of Arf GTPases via dual mechanisms: it exhibits GTPase activating protein (GAP) activity towards class I (Arf1) and II (Arf5) Arfs; and it binds class III Arfs (GTP-Arf6) stably without GAP activity [
]. It binds paxillin and is implicated in Fcgamma receptor-mediated phagocytosis in macrophages and in cell migration []. ASAP2 contains an N-terminal BAR domain, followed by a Pleckstrin homology (PH) domain, an Arf GAP domain, ankyrin (ANK) repeats, and a C-terminal SH3 domain. ArfGAPs are a protein family containing the ArfGAP domain. There are 31 genes encoding ArfGAPs in humans [
]. They catalyse the hydrolysis of GTP that is bound to Arf, thereby converting Arf-GTP to Arf-GDP [].
PACSIN1 (protein kinase C and casein kinase substrate in neurons protein 1, also known as SYNDAPIN1) belongs to the PACSIN family, whose members contain an N-terminal F-BAR (FCH-BAR) domain and a C-terminal SH3 domain [
]. There are three tissue-specific PACSIN isoforms in mammals: PACSIN1 is enriched in neurons, PACSIN2 is ubiquitously expressed, and PACSIN3 is found mainly in muscle []. They are cytoplasmic phosphoproteins that play a role in vesicle formation and transport []. PACSIN1 is upregulated upon differentiation into neuronal cells [
]. The SH3 domain of PACSIN1 mediates activation of neural WASP (N-WASP), which is required to regulate actin polymerisation and is essential for proper neuromorphogenesis and cellular motility []. Two phosphorylation sites within the F-BAR domain of PACSIN1 are used for membrane tubulation regulation []. This entry represents the F-BAR domain of PACSIN1. F-BAR domains are dimerization modules that bind and bend membranes and are found in proteins involved in membrane dynamics and actin reorganization [
].
This entry represents the C-terminal domain of the reductive activator of CoFeSP (RACo), also known as the DUF4445 domain. Structural analysis of RACo indicates that it contains four regions: an N-terminal region that binds the [2Fe-2S] cluster, a linker region, a middle region, and the large C-terminal domain, which harbours the ATP-binding site. Structural studies show that the C-terminal domain contains the conserved topology characteristic of the ASKHA (acetate and sugar kinases/heat shock protein 70/actin) superfamily. Despite the low sequence identity shared between members of the ASKHA superfamily, they show a common central fold. Members of the ASKHA superfamily include proteins that catalyze phosphoryl transfers or hydrolysis of ATP in a variety of biological contexts. Asp, Asn, Glu, and Gln residues are well conserved in the core of the ASKHA proteins, where they interact with the phosphates of ATP and the bound Mg2+ ions []. Proteins containing this domain, such as PGA1_c15200 from Phaeobacter inhibens, are suggested to be involved in vitamin B12 reactivation [
].
The TCP transcription factor family was named after: teosinte branched 1 (tb1, Zea mays (Maize)) [
], cycloidea (cyc) (Antirrhinum majus) (Garden snapdragon) [] and PCF in rice (Oryza sativa) [,
]. The TCP genes code for structurally related proteins implicated in the evolution of key morphological traits []. However, the biochemical function of CYC and TB1 proteins remains to be demonstrated. One of the conserved regions is predicted to form a non-canonical basic helix-loop-helix (bHLH) structure. This domain is also found in two rice DNA-binding proteins, PCF1 and PCF2, where it has been shown to be involved in DNA binding and dimerization. This family of transcription factors is exclusive to higher plants. They can be divided into two groups, TCP-C and TCP-P, that appear to have separated following an early gene duplication event [
]. This duplication event may have led to functional divergence, and it has been proposed that the TCP-P subfamily are transcriptional repressors, while the TCP-C subfamily are transcriptional activators [].
Phosphopantetheine (or pantetheine 4' phosphate) is the prosthetic group of acyl carrier proteins (ACP) in some multienzyme complexes where it serves as a 'swinging arm' for the attachment of activated fatty acid and amino-acid groups [
]. The amino-terminal region of the ACP proteins is well defined and consists of four α helices arranged in a right-handed bundle held together by interhelical hydrophobic interactions. The Asp-Ser-Leu (DSL) motif is conserved in all of the ACP sequences, and the 4'-PP prosthetic group is covalently linked via a phosphodiester bond to the serine residue. The DSL sequence is present at the amino terminus of helix II, a region of the protein referred to as the recognition helix, which is responsible for the interaction of ACPs with the enzymes of type II fatty acid synthesis [
]. This entry represents the phosphopantetheine-binding domain from polyketide synthases. Polyketide synthases are large multidomain proteins involved in the synthesis of secondary metabolites [
].
Colipase [
,
] is a small protein cofactor needed by pancreatic lipase for efficient dietary lipid hydrolysis. It also binds to the bile-salt covered triacylglycerol interface, thus allowing the enzyme to anchor itself to the water-lipid interface. Efficient absorption of dietary fats is dependent on the action of pancreatic triglyceride lipase. Colipase binds to the C-terminal, non-catalytic domain of lipase, thereby stabilising an active conformation and considerably increasing the overall hydrophobic binding site. Structural studies of the complex and of colipase alone have revealed the functionality of its architecture [,
]. Colipase is a small protein with five conserved disulphide bonds. Structural analogies have been recognised between a developmental protein (Dickkopf), the pancreatic lipase C-terminal domain, the N-terminal domains of lipoxygenases and the C-terminal domain of alpha-toxin. These non-catalytic domains in the latter enzymes are important for interaction with membranes. It has not been established whether these domains are also involved in protein cofactor binding, as is the case for pancreatic lipase [
].
The GW domain or cell wall targeting (CWT) signal is a module of about 80-90 amino acids named for a conserved Gly-Trp (GW) dipeptide. GW domains have only been identified in Gram-positive bacteria and form a small protein family. They are divergent members of the SH3 family. However, GW domains are unlikely to mimic SH3 domains functionally, as their potential peptide-binding sites are destroyed or blocked. GW domains may constitute a motif for cell-surface anchoring in Listeria and other Gram-positive bacteria [,
]. The GW domain is composed of seven β-strands, five of which are organized into an open barrel conformation like that of SH3 domains. The eponymous GW dipeptide, located in the fourth β-strand, is more conserved in GW domains than in SH3 domains. Both the glycine and tryptophan are buried in GW proteins, while the equivalent residues in SH3 proteins are surface accessible, perhaps explaining the greater conservation in GW proteins [
,
].
This is a family of unknown function found in Mycobacterium phages. Family members include the full Gp79 protein found in Mycobacteriophage L5. Mycobacteriophage L5 is a phage isolated from Mycobacterium smegmatis. It forms stable lysogens in M. smegmatis and has a broad host range among the pathogenic mycobacteria. L5 encodes gene products (gp) toxic to the host M. smegmatis. Expression of gp79 interferes with the cell membrane or cell-wall synthesis of M. smegmatis, leading to altered cell morphology. It also has a bactericidal effect on E. coli. The N-terminal segment of gp79 (amino acids 1-41) shares sequence similarity with the signal peptide of the D-alanyl-D-alanine carboxypeptidase of Bacillus licheniformis. This enzyme removes C-terminal D-alanyl residues from sugar-peptide cell-wall precursors and is also a penicillin-binding protein (PBP). The homology of the hydrophobic N-terminal part of gp79 to a PBP signal peptide may indicate an interaction of gp79 with proteins or metabolites involved in the peptidoglycan synthesis of M. smegmatis [
].
Hydrophobic surface binding proteins are typically between 171 to 275 amino acids in length. Although the HsbA amino acid sequence suggests that HsbA may be hydrophilic, HsbA adsorbed to hydrophobic PBSA (Polybutylene succinate-co-adipate) surfaces in the presence of NaCl or CaCl2. When HsbA was adsorbed on the hydrophobic PBSA surfaces, it promoted PBSA degradation via the CutL1 polyesterase. CutL1 interacts directly with HsbA attached to the hydrophobic QCM electrode surface. These results suggest that when HsbA is adsorbed onto the PBSA surface, it recruits CutL1, and that when CutL1 is accumulated on the PBSA surface, it stimulates PBSA degradation [
]. This entry also includes an antigenic cell wall galactomannoprotein from Aspergillus fumigatus, which is a protein of 284 amino acid residues. It contains a serine- and threonine-rich region for O-glycosylation, a signal peptide, and a putative glycosylphosphatidylinositol attachment signal sequence. Ultrastructural analysis showed that the protein is present in the cell walls of hyphae and conidia [
].
Mitochondrial ribosomes synthesise core subunits of the inner membrane respiratory chain complexes. In mitochondria, ribosomes are mainly membrane-associated and translation is tightly coupled to the inner mitochondrial membrane. Mdm38/Letm1 is a conserved membrane receptor for mitochondrial ribosomes and is specifically involved in respiratory chain biogenesis. These proteins play a role in potassium and hydrogen ion exchange [
,
]. Some features found in LETM1, such as a transmembrane domain and a CK2 and PKC phosphorylation site [], are relatively conserved throughout the family. Deletion of LETM1 is thought to be involved in the development of Wolf-Hirschhorn syndrome in humans []. This entry represents the Letm1 ribosome-binding domain (RBD) of Letm1 from animals and MDM38 from yeast. This domain is necessary and sufficient for interaction with mitochondrial ribosomes. It exhibits a predominantly α-helical compact fold and contains extended, highly charged surface patches. The conserved regions most probably represent interaction sites for ribosomal RNA or protein subunits, or for proteins involved in mitochondrial protein biogenesis [
,
].
Saccharopepsin (or fungal proteinase A, MEROPS identifier A01.018) is a member of the aspartic proteinase superfamily. In Saccharomyces cerevisiae, saccharopepsin is targeted to the vacuole as a zymogen. Its activation at acidic pH can occur by two different pathways: a one-step process to release mature saccharopepsin, involving the intervention of proteinase B, or a step-wise pathway via the auto-activation product known as pseudo-saccharopepsin. Once active, saccharopepsin is essential to the activities of other yeast vacuolar hydrolases, including proteinase B and carboxypeptidase Y. The mature enzyme is bilobal, with each lobe providing one of the two catalytically essential aspartic acid residues in the active site. The crystal structure of free saccharopepsin shows that the flap loop atypically points directly into the S(1) pocket of the enzyme. Saccharopepsin preferentially cleaves substrates with residues such as Phe, Leu or Glu at the P1 position and Phe, Ile, Leu or Ala at P1'. Moreover, the enzyme is inhibited by IA3, a natural and highly specific inhibitor produced by S. cerevisiae [
].
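The P1/P1' preference described above can be illustrated with a minimal Python sketch. The function name and residue sets are hypothetical helpers that simply encode the residues listed in this description; they are not a full specificity model, and a match does not imply that cleavage actually occurs.

```python
# Residue sets copied from the preference stated in the description above.
# Illustrative assumption: a simple pairwise P1/P1' match, ignoring all other subsites.
P1_PREFERRED = set("FLE")         # Phe, Leu, Glu at P1
P1_PRIME_PREFERRED = set("FILA")  # Phe, Ile, Leu, Ala at P1'


def candidate_cleavage_sites(sequence):
    """Return 1-based positions of P1 residues whose P1-P1' pair matches the stated preference."""
    seq = sequence.upper()
    return [
        i + 1
        for i in range(len(seq) - 1)
        if seq[i] in P1_PREFERRED and seq[i + 1] in P1_PRIME_PREFERRED
    ]


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    print(candidate_cleavage_sites("GKFLAEAS"))
```

Each reported position is the P1 residue, so the candidate scissile bond lies between that residue and the next one.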
The repetin gene is a member of the "fused" gene family. It binds Ca2+ and is crosslinked to trichohyalin in the inner root sheath of mouse hair follicle [
], and to loricrin, SPR1, and SPR2 in the human foreskin and oral mucosa []. In humans, a number of genes specifying structural proteins expressed late during epidermal differentiation have been identified and found to be clustered on chromosome 1q21. Therefore, this region is named the epidermal differentiation complex (EDC). The proteins encoded by the EDC genes can be classified into three groups: the precursor proteins of the cell envelope (CE), the S100 family and the "fused" gene family [
]. In some classifications, the "fused" gene family is classified as a subgroup within the S100 gene family [
]. The "fused" gene family members contain EF hands and internal tandem repeats. The family consists of profilaggrin, trichohyalin, repetin, hornerin, the profilaggrin-related protein and cornulin (encoded by c1orf10). They are associated with keratin intermediate filaments and partially cross-linked to the CE [
].
Casein kinase, a ubiquitous well-conserved protein kinase involved in cell metabolism and differentiation, is characterised by its preference for Ser or Thr in acidic stretches of amino acids. The enzyme is a tetramer of 2 alpha- and 2 beta-subunits [
,
]. However, some species (e.g., mammals) possess 2 related forms of the alpha-subunit (alpha and alpha'), while others (e.g., fungi) possess 2 related beta-subunits (beta and beta') []. The alpha-subunit is the catalytic unit and contains regions characteristic of serine/threonine protein kinases. The beta-subunit is believed to be regulatory, possessing an N-terminal auto-phosphorylation site, an internal acidic domain, and a potential metal-binding motif [
]. The beta subunit contains, in its central section, a cysteine-rich motif, CX(n)C, that could be involved in binding a metal such as zinc [
]. The mammalian beta-subunit gene promoter shares common features with those of other mammalian protein kinases and is closely related to the promoter of the regulatory subunit of cAMP-dependent protein kinase [].
In Drosophila melanogaster, Nanos functions as a localised determinant of posterior pattern. Nanos RNA is localised to the posterior pole of the maturing egg cell and encodes a protein that emanates from this localised source. Nanos acts as a translational repressor and thereby establishes a gradient of the morphogen Hunchback [
]. Nanos comprises a non-conserved amino-terminus and highly conserved carboxy-terminal regions. The C-terminal region has two conserved Cys-Cys-His-Cys (CCHC)-type zinc-finger motifs that are indispensable for nanos function [
]. Xcat-2 from Xenopus encodes a protein with a nanos-like zinc finger domain. It is found in the vegetal cortical region and is inherited by the vegetal blastomeres during development, and is degraded very early in development. The localised and maternally restricted expression of Xcat-2 RNA suggests a role for its protein in setting up regional differences in gene expression that occur early in development []. This entry consists of the Nanos protein and homologues, including Xcat-2.
Rubella virus (RV), the sole member of the genus Rubivirus within the family Togaviridae, is a small, enveloped, positive-strand RNA virus. The nucleocapsid consists of 40S genomic RNA and a single species of capsid protein which is enveloped within a host-derived lipid bilayer containing two viral glycoproteins, E1 (58kDa) and E2 (42-46kDa). In virus-infected cells, RV matures by budding either at the plasma membrane or at internal membranes, depending on the cell type, and enters adjacent uninfected cells by a membrane fusion process in the endosome, directed by E1-E2 heterodimers. Heterodimer formation is crucial for E1 transport out of the endoplasmic reticulum to the Golgi and plasma membrane. In RV E1, a cysteine at position 82 is crucial for E1-E2 heterodimer formation and cell surface expression of the two proteins. E1 has been shown to be a type 1 membrane protein, rich in cysteine residues with extensive intramolecular disulphide bonds [
].This family is found together with and
.
Glucosyltransferase 3 (Gtf3, also known as Glucosyltransferase C, GtfC) catalyses the second step of glycosylation of serine-rich repeat glycoproteins (SRRPs) in Gram-positive bacteria and transfers glucosyl residues to GlcNAc-modified SRRPs. SRRPs were first identified in pathogenic bacteria but are also found in gut commensal bacteria. These proteins are O-glycosylated on serine or threonine residues to be exported via a specialised accessory secretion (SecA2/Y2) system. This secretion system is encoded in a gene cluster that includes the motor protein SecA2, the channel SecY2 and three to five accessory Sec proteins, but it also contains genes encoding a variable number of glycosyltransferases (GTs), among them Gtf3 [
,
,
]. Structural analysis revealed that Gtf3 forms a tetramer and has structural homology with other glycosyltransferases. The conserved residues Phe315 and Leu320 at the C terminus are critical for oligomerisation and UDP- or UDP-glucose binding. Gtf3 homologues constitute a unique subfamily of glycosyltransferases as they are highly conserved and only found in SRRP-containing bacteria [
].
This entry consists of the C-terminal region of several eukaryotic and archaeal RuvB-like 1 (Pontin or TIP49a) and RuvB-like 2 (Reptin or TIP49b) proteins. In zebrafish, the liebeskummer (lik) mutation causes development of hyperplastic embryonic hearts. lik encodes Reptin, a component of a DNA-stimulated ATPase complex. Beta-catenin and Pontin, a DNA-stimulated ATPase that is often part of complexes with Reptin, are in the same genetic pathways. The Reptin/Pontin ratio serves to regulate heart growth during development, at least in part via the beta-catenin pathway [
]. TBP-interacting protein 49 (TIP49) was originally identified as a TBP-binding protein, and two related proteins are encoded by individual genes, tip49a and tip49b. Although the function of this gene family has not been elucidated, its members are thought to play a critical role in nuclear events because they interact with various kinds of nuclear factors and have DNA helicase activities. TIP49a has been suggested to act as an autoantigen in some patients with autoimmune diseases [].
Hexon is a major coat protein found in various species-specific Adenoviruses, which are type II dsDNA viruses. Hexon coat proteins are synthesised during late infection and form homo-trimers. The 240 copies of the hexon trimer that are produced are organised so that 12 lie on each of the 20 facets. The central 9 hexons in a facet are cemented together by 12 copies of polypeptide IX. The penton complex, formed by the peripentonal hexons and base hexon (holding in place a fibre), lies at each of the 12 vertices [
]. The hexon coat protein is a duplication consisting of two domains with a similar fold packed together like the nucleoplasmin subunits. Within a hexon trimer, the domains are arranged around a pseudo 6-fold axis. The domains have a β-sandwich structure consisting of 8 strands in two sheets with a jelly-roll topology; each domain is heavily decorated with many insertions []. This entry represents the C-terminal domain of hexon coat proteins.
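The capsid counts quoted in this description can be cross-checked with simple arithmetic. The following Python sketch uses only the numbers given above. The function name and dictionary keys are hypothetical, and the assumption that every hexon outside the central nine of a facet is peripentonal is made here for illustration.

```python
# Numbers taken from the description above.
FACETS = 20
VERTICES = 12
HEXON_TRIMERS_PER_FACET = 12
CENTRAL_HEXONS_PER_FACET = 9
POLYPEPTIDE_IX_PER_FACET = 12


def capsid_counts():
    """Derive whole-capsid totals from the per-facet figures."""
    total_trimers = FACETS * HEXON_TRIMERS_PER_FACET
    central = FACETS * CENTRAL_HEXONS_PER_FACET
    # Assumption: hexons not among the central nine of a facet are peripentonal.
    peripentonal = total_trimers - central
    return {
        "hexon_trimers": total_trimers,
        "hexon_monomers": total_trimers * 3,  # each hexon is a homo-trimer
        "central_hexons": central,
        "peripentonal_hexons": peripentonal,
        "peripentonal_per_vertex": peripentonal // VERTICES,
        "polypeptide_IX_copies": FACETS * POLYPEPTIDE_IX_PER_FACET,
    }


if __name__ == "__main__":
    for name, value in capsid_counts().items():
        print(f"{name}: {value}")
```

The per-facet figure of 12 trimers reproduces the stated total of 240, which is a useful consistency check on the description.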
Hexon is a major coat protein found in various species-specific Adenoviruses, which are type II dsDNA viruses. Hexon coat proteins are synthesised during late infection and form homo-trimers. The 240 copies of the hexon trimer that are produced are organised so that 12 lie on each of the 20 facets. The central 9 hexons in a facet are cemented together by 12 copies of polypeptide IX. The penton complex, formed by the peripentonal hexons and base hexon (holding in place a fibre), lies at each of the 12 vertices [
]. The hexon coat protein is a duplication consisting of two domains with a similar fold packed together like the nucleoplasmin subunits. Within a hexon trimer, the domains are arranged around a pseudo 6-fold axis. The domains have a β-sandwich structure consisting of 8 strands in two sheets with a jelly-roll topology; each domain is heavily decorated with many insertions []. This entry represents the N-terminal domain of hexon coat proteins.
The vicinal oxygen chelate (VOC) family of enzymes catalyzes a highly diverse set of chemistries that derives from one common mechanistic trait: bidentate coordination to a divalent metal centre by a substrate, intermediate or transition state through vicinal oxygen atoms. The array of reactions catalyzed by this family is mediated structurally by a common fold and protein-chelating residues that secure and localize a metal ion. The common fold has topological symmetry, comprising two β-α-β-β-β units that form an incompletely closed barrel of β-sheet about the metal ion [
]. Members of this family include the glyoxalases I (GLO), the extradiol dioxygenases (DHBD), the bleomycin resistance proteins, the fosfomycin resistance proteins, and the methylmalonyl-CoA epimerases (MMCE) involved in the epimerization of (2S)-methylmalonyl-CoA to its (2R)-stereoisomer. The bleomycin resistance proteins are unique in that they do not possess a metal binding site and are not enzymes. They bind and sequester bleomycin and related compounds without degrading or transforming them [
,
,
].
Longin domains are evolutionarily conserved regions, widely distributed among eukaryotes, that are involved in the regulation of membrane dynamics and exhibit similarities in primary sequence and secondary structure. Longin-like domains are found in FUZ and related proteins, such as the MON1 and HPS1 proteins [
,
,
]. The MON1/CCZ1 complex (MC1) is the GDP/GTP exchange factor (GEF) for the Rab GTPase Ypt7/Rab7 during vesicular trafficking []. The HPS1/HPS4 complex (BLOC-3) is a Rab32 and Rab38 GEF and is required for biogenesis of melanosomes and platelet dense granules []. Inturned (INTU) and Fuzzy (FUZ) proteins interact as members of the ciliogenesis and planar polarity effector (CPLANE) complex that controls recruitment of intraflagellar transport machinery to the basal body of primary cilia [,
]. Structurally, these domains are composed of an alpha/beta fold which contains five anti-parallel β-strands organised as a central β-sheet, with two α-helices around it []. This entry represents the third Longin domain found in CCZ1, INTU and HPS4 proteins.
Longin domains are evolutionarily conserved regions, widely distributed among eukaryotes, that are involved in the regulation of membrane dynamics and exhibit similarities in primary sequence and secondary structure. Longin-like domains are found in FUZ and related proteins, such as the MON1 and HPS1 proteins [
, ,
]. The MON1/CCZ1 complex (MC1) is the GDP/GTP exchange factor (GEF) for the Rab GTPase Ypt7/Rab7 during vesicular trafficking []. The HPS1/HPS4 complex (BLOC-3) is a Rab32 and Rab38 GEF and is required for biogenesis of melanosomes and platelet dense granules []. Inturned (INTU) and Fuzzy (FUZ) proteins interact as members of the ciliogenesis and planar polarity effector (CPLANE) complex that controls recruitment of intraflagellar transport machinery to the basal body of primary cilia [,
]. Structurally, these domains are composed of an alpha/beta fold which contains five anti-parallel β-strands organised as a central β-sheet, with two α-helices around it []. This entry represents the second Longin domain found in INTU and CCZ1 proteins.
Longin domains are evolutionarily conserved regions, widely distributed among eukaryotes, that are involved in the regulation of membrane dynamics and exhibit similarities in primary sequence and secondary structure. Longin-like domains are found in FUZ and related proteins, such as the MON1 and HPS1 proteins [
,
,
]. The MON1/CCZ1 complex (MC1) is the GDP/GTP exchange factor (GEF) for the Rab GTPase Ypt7/Rab7 during vesicular trafficking []. The HPS1/HPS4 complex (BLOC-3) is a Rab32 and Rab38 GEF and is required for biogenesis of melanosomes and platelet dense granules []. Inturned (INTU) and Fuzzy (FUZ) proteins interact as members of the ciliogenesis and planar polarity effector (CPLANE) complex that controls recruitment of intraflagellar transport machinery to the basal body of primary cilia [,
]. Structurally, these domains are composed of an alpha/beta fold which contains five anti-parallel β-strands organised as a central β-sheet, with two α-helices around it []. This entry represents the first Longin domain found in FUZ, MON1 and HPS1 proteins.
Longin domains are evolutionarily conserved regions, widely distributed among eukaryotes, that are involved in the regulation of membrane dynamics and exhibit similarities in primary sequence and secondary structure. Longin-like domains are found in FUZ and related proteins, such as the MON1 and HPS1 proteins [
,
,
]. The MON1/CCZ1 complex (MC1) is the GDP/GTP exchange factor (GEF) for the Rab GTPase Ypt7/Rab7 during vesicular trafficking []. The HPS1/HPS4 complex (BLOC-3) is a Rab32 and Rab38 GEF and is required for biogenesis of melanosomes and platelet dense granules []. Inturned (INTU) and Fuzzy (FUZ) proteins interact as members of the ciliogenesis and planar polarity effector (CPLANE) complex that controls recruitment of intraflagellar transport machinery to the basal body of primary cilia [,
]. Structurally, these domains are composed of an alpha/beta fold which contains five anti-parallel β-strands organised as a central β-sheet, with two α-helices around it []. This entry represents the third Longin domain of FUZ, MON1 and HPS1 proteins.
Longin domains are evolutionarily conserved regions, widely distributed among eukaryotes, that are involved in the regulation of membrane dynamics and exhibit similarities in primary sequence and secondary structure. Longin-like domains are found in FUZ and related proteins, such as the MON1 and HPS1 proteins [,
,
]. The MON1/CCZ1 complex (MC1) is the GDP/GTP exchange factor (GEF) for the Rab GTPase Ypt7/Rab7 during vesicular trafficking []. The HPS1/HPS4 complex (BLOC-3) is a Rab32 and Rab38 GEF and is required for biogenesis of melanosomes and platelet dense granules []. Inturned (INTU) and Fuzzy (FUZ) proteins interact as members of the ciliogenesis and planar polarity effector (CPLANE) complex that controls recruitment of intraflagellar transport machinery to the basal body of primary cilia [,
]. Structurally, these domains are composed of an alpha/beta fold which contains five anti-parallel β-strands organised as a central β-sheet, with two α-helices around it []. This entry represents the second Longin domain found in FUZ, MON1 and HPS1 proteins.
Methionine aminopeptidase (MAP) catalyses the hydrolytic cleavage of the N-terminal methionine from newly synthesised polypeptides if the penultimate amino acid is small, with different tolerance to Val and Thr at this position [
]. All MAPs studied to date are monomeric proteins that require cobalt ions for activity. Two subfamilies of MAP enzymes are known to exist [,
]. Although evolutionarily related, they share only a limited amount of sequence similarity, mostly clustered around the residues shown, in the Escherichia coli MAP [], to be involved in cobalt-binding. The first family consists of enzymes from prokaryotes as well as eukaryotic MAP-1, while the second group is made up of archaeal MAP and eukaryotic MAP-2 and includes proteins which do not seem to be MAP, but that are clearly evolutionarily related, such as mouse proliferation-associated protein 1 and fission yeast curved DNA-binding protein.
This entry represents the archaeal MAP, which belongs to subfamily two [
].
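The substrate rule stated above (removal of the initiator methionine when the penultimate residue is small) can be sketched as a simple check. The function name is hypothetical, and the residue set below (Gly, Ala, Ser, Cys, Pro, Thr, Val) is an assumption based on the commonly cited rule of thumb; as noted above, tolerance to Val and Thr differs between enzymes, so these two are borderline cases.

```python
# Assumption: residues commonly treated as "small" at the penultimate position.
# Val and Thr are included, although tolerance to them varies between MAPs.
SMALL_PENULTIMATE = set("GASCPTV")


def initiator_met_likely_removed(sequence):
    """Return True if the sequence starts with Met followed by a small residue."""
    seq = sequence.upper()
    if len(seq) < 2 or seq[0] != "M":
        return False
    return seq[1] in SMALL_PENULTIMATE


if __name__ == "__main__":
    # Hypothetical N-terminal fragments used only to exercise the function.
    for fragment in ("MAKT", "MKLV", "MVSE"):
        print(fragment, initiator_met_likely_removed(fragment))
```

This is a qualitative heuristic only; it does not distinguish between the two MAP subfamilies described above.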
The biotin operon of Escherichia coli contains 5 structural genes involved in the synthesis of biotin. Transcription of the operon is regulated via one of these proteins, BirA. BirA is an asymmetric protein with 3 specific domains. The ligase reaction intermediate, biotinyl-5'-AMP, is the co-repressor that triggers DNA binding by BirA.
The α-helical N-terminal domain of the BirA protein has the helix-turn-helix structure of DNA-binding proteins with a central DNA recognition helix. BirA undergoes several conformational changes related to repressor function, and the N-terminal DNA-binding domain is connected to the rest of the molecule through a hinge which allows relocation of the domains during the reaction []. Two repressor molecules form the operator-repressor complex, with dimer formation occurring simultaneously with DNA binding. DNA binding may cause a conformational change which allows this co-operative interaction. In the dimer structure, the β-sheets in the central domain of each monomer are arranged side-by-side to form a single, seamless β-sheet.
Guided entry of tail-anchored proteins factor CAMLG
Type: Family
Description:
This entry represents the Guided entry of tail-anchored proteins factor CAMLG (also known as Calcium signal-modulating cyclophilin ligand CAML) which is required for the post-translational delivery of tail-anchored (TA) proteins to the endoplasmic reticulum [
,
,
]. Together with GET1/WRB, it acts as a membrane receptor for soluble GET3/TRC40, which recognizes and selectively binds the transmembrane domain of TA proteins in the cytosol [,
,
]. It is required for the stability of GET1 []. CAMLG also stimulates calcium signaling in T cells through its involvement in elevation of intracellular calcium [], and it is essential for the survival of peripheral follicular B cells. Although CAML shows no sequence similarity to yeast Get2, they share similar biochemical properties and topology [
], indicating that CAML is the functional homologue of Get2 in mammals, playing the same essential role in membrane insertion as Get2. It also shows similarity with ER membrane protein complex subunit 6 (EMC6) [].
This entry represents the CARD domain found in Receptor Interacting Protein 2 (RIP2). RIP2 harbours a C-terminal CARD domain and functions as an effector kinase downstream of the pattern recognition receptors from the Nod-like receptor (NLR) family, NOD1 and NOD2, which recognize bacterial peptidoglycans released upon infection. This cascade is implicated in inflammatory immune responses and the clearance of intracellular pathogens. RIP2 associates with NOD1 and NOD2 via CARD-CARD interactions [
]. In general, CARDs are death domains (DDs) found associated with caspases [
]. They are known to be important in the signaling pathways for apoptosis, inflammation, and host-defense mechanisms []. DDs are protein-protein interaction domains found in a variety of domain architectures. Their common feature is that they form homodimers by self-association or heterodimers by associating with other members of the DD superfamily including PYRIN and DED (Death Effector Domain). They serve as adaptors in signaling pathways and can recruit other proteins into signaling complexes [].
This entry represents the PHD finger of the Arabidopsis thaliana histone-lysine N-methyltransferases ATX1 and ATX2 (Arabidopsis trithorax-like proteins 1 and 2) and similar proteins. ATX1 and ATX2 are sister paralogs originating from a segmental chromosomal duplication; they are plant counterparts of the Drosophila melanogaster trithorax (TRX) and mammalian mixed-lineage leukemia (MLL1) proteins [
]. ATX1 is a methyltransferase that trimethylates histone H3 at lysine 4 (H3K4me3). It also acts as a histone modifier and as a positive effector of gene expression []. ATX1 regulates transcription from diverse classes of genes implicated in biotic and abiotic stress responses. It is involved in dehydration stress signaling in both abscisic acid (ABA)-dependent and ABA-independent pathways []. ATX2 is involved in dimethylating histone H3 at lysine 4 (H3K4me2) []. ATX1 and ATX2 are multi-domain proteins that consist of an N-terminal PWWP domain, FYRN- and FYRC (DAST, domain associated with SET in trithorax) domains, a canonical PHD finger, a non-canonical ePHD finger, and a C-terminal SET domain [].
This entry represents a group of dsDNA Baculovirus proteins that are required for the infectivity of occlusion bodies (OBs). They are structural proteins of the envelope of occlusion body-derived virions (ODVs) and are required only in the first steps of per os larval infection, as viruses produced in cells expressing the gene for this protein, but not containing it in their genomes, are able to produce successful infections. Baculoviruses are large DNA viruses that infect arthropods, mainly members of the order Lepidoptera. In their life cycle, they produce two kinds of particles: a budded, non-occluded virus (BV), which buds out of the infected cell and is responsible for the cell-to-cell transmission of the virus, and an occluded form, the OB, which is responsible for protecting the virus between encounters with larvae. A variable number of virions are included in the para-crystalline structure of the OB, which is mainly constituted by the virus-encoded polyhedrin protein; these virions are the ODVs [
].
The ABC transporter family is a group of membrane proteins that use the hydrolysis of ATP to power the translocation of a wide variety of substrates across cellular membranes. ABC transporters minimally consist of two conserved regions: a highly conserved nucleotide-binding domain (NBD) and a less conserved transmembrane domain (TMD). Eukaryotic ABC proteins are usually organised either as full transporters (containing two NBDs and two TMDs), or as half transporters (containing one NBD and one TMD), that have to form homo- or heterodimers in order to constitute a functional protein [
]. This entry represents Tap2 (also known as antigen peptide transporter 2), which is a eukaryotic protein belonging to the ABC transporter family. It plays a crucial role in the processing and presentation of the MHC class I-restricted antigens. It is a half transporter that forms a complex with Tap1. This complex translocates antigens from the cytoplasm to the endoplasmic reticulum for loading onto MHC class I molecules [
].
Folliculin (FLCN) is a tumor suppressor that enables nutrient-dependent activation of the mechanistic target of rapamycin complex 1 (mTORC1) protein kinase via its GTPase-activating protein (GAP) activity. It belongs to the DENN module family of proteins and contains a divergent DENN module comprising an N-terminal longin domain (also known as upstream DENN domain, u-DENN), followed by a DENN domain. It forms a complex with its partners, FNIP1 or FNIP2 (Folliculin interacting protein 1 or 2), which directly contacts the Rag GTPases RagC/D to stimulate GTP hydrolysis and thus promote the conversion to the GDP-bound state. FLCN-FNIP2 adopts an extended conformation with two pairs of heterodimerized domains. They contain longin domains that heterodimerize and contact both nucleotide binding domains of the Rag heterodimer, and C-terminal DENN domains which interact at the distal end of the structure [
,
,
]. This is a subdomain found at the C terminus in the DENN domain of folliculin.
This domain entry includes the N terminus of the actin-interacting protein sperm-specific antigen 2 (SSFA2), also known as Ki-ras-induced actin-interacting protein (KRAP) [
]. In this region are found the residues that interact with inositol 1,4,5-trisphosphate receptor (IP3R, also known as ITPR). SSFA2 was first localised as a membrane-bound form with extracellular regions suggesting it might be involved in the regulation of filamentous actin and signals from the outside of the cells []. It has now been shown to be critical for the proper subcellular localisation and function of IP3R. Inositol 1,4,5-trisphosphate receptor functions as the Ca2+ release channel on specialised endoplasmic reticulum membranes, so the subcellular localisation of IP3R is crucial for its proper function []. This entry also recognises a domain in Tespa1 and in uncharacterized coiled-coil protein CCDC129. Tespa1 (thymocyte-expressed positive selection-associated protein 1) is required for the development and maturation of T-cells [
]. Tespa1 shows sequence homology to SSFA2 and physically associates with IP3R in T and B lymphocytes [].
Glutathione S-transferases (GSTs) are soluble proteins with typical molecular masses of around 50kDa, each composed of two polypeptide subunits. GSTs catalyse the transfer of the tripeptide glutathione (gamma-glutamyl-cysteinyl-glycine; GSH) to a co-substrate (R-X) containing a reactive electrophilic centre to form a polar S-glutathionylated reaction product (R-SG). Each soluble GST is a dimer of approximately 26kDa subunits, typically forming a hydrophobic 50kDa protein with an isoelectric point in the pH range 4-5. The ability to form heterodimers greatly increases the diversity of the GSTs, but the functional significance of this mixing and matching of subunits has yet to be determined. Each GST subunit of the protein dimer contains an independent catalytic site composed of two components. The first is a binding site specific for GSH or a closely related homologue (the G site) formed from a conserved group of amino-acid residues in the amino-terminal domain of the polypeptide. The second component is a site that binds the hydrophobic substrate (the H site), which is much more structurally variable and is formed from residues in the carboxy-terminal domain. Between the two domains is a short variable linker region of 5-10 residues. The GST proteins have evolved by gene duplication to perform a range of functional roles. GSTs also have non-catalytic roles, binding flavonoid natural products in the cytosol prior to their deposition in the vacuole. Recent studies have also implicated GSTs as components of ultraviolet-inducible cell signalling pathways and as potential regulators of apoptosis. The mammalian GSTs active in drug metabolism are now classified into the alpha, mu and pi classes. Additional classes of GSTs have been identified in animals that do not have major roles in drug metabolism; these include the sigma GSTs, which function as prostaglandin synthases.
In cephalopods, however, sigma GSTs are lens S-crystallins, giving an indication of the functional diversity of these proteins. The soluble glutathione transferases can be divided into the phi, tau, theta, zeta and lambda classes. The theta and zeta GSTs have counterparts in animals, whereas the other classes are plant-specific. In the case of phi and tau GSTs, only subunits from the same class will dimerise. Within a class, however, the subunits can dimerise even if they are quite different in amino-acid sequence. An insect-specific delta class has also been described, and bacteria contain a prokaryote-specific beta class of GST.
Recombinant human omega-class GST shows glutathione-dependent thiol transferase and dehydroascorbate reduction activity. This sort of activity has not been observed in any other class of GSTs, but is associated with the glutaredoxins (thioltransferases). Members of this class of GST have a unique N-terminal extension, and a cysteine residue in the active site, which is different from the tyrosine and serine residues found at the active sites of other eukaryotic GSTs [
].
Histone proteins have central roles in both chromatin organisation (as structural units of the nucleosome) and gene regulation (as dynamic components that have a direct impact on DNA transcription and replication). Eukaryotic DNA wraps around a histone octamer to form a nucleosome, the first order of compaction of eukaryotic chromatin. The core histone octamer is composed of a central H3-H4 tetramer and two flanking H2A-H2B dimers. Each of the core histones contains a common structural motif, called the histone fold, which facilitates the interactions between the individual core histones. In addition to the core histones, there is a "linker histone" called H1 (or H5 in avian species). The linker histones present in all multicellular eukaryotes are the most divergent group of histones, with numerous cell type- and stage-specific variants. Linker histone H1 is an essential component of chromatin structure. H1 links nucleosomes into higher order structures.
Histone H5 performs the same function as histone H1, and replaces H1 in certain cells. The structure of GH5, the globular domain of the linker histone H5 is known [,
]. The fold is similar to the DNA-binding domain of the catabolite gene activator protein, CAP, thus providing a possible model for the binding of GH5 to DNA. The linker histones, which do not contain the histone fold motif, are critical to the higher-order compaction of chromatin, because they bind to internucleosomal DNA and facilitate interactions between individual nucleosomes. In addition, H1 variants have been shown to be involved in the regulation of developmental genes. A common feature of this protein family is a tripartite structure in which a globular (H15) domain of about 80 amino acids is flanked by two less structured N- and C-terminal tails. The H15 domain is also characterised by high sequence homology among the family of linker histones. The highly conserved H15 domain is essential for the binding of H1 or H5 to the nucleosome. It consists of a three helix bundle (I-III), with a β-hairpin at the C terminus. There is also a short three-residue stretch between helices I and II that is in the β-strand conformation. Together with the C-terminal β-hairpin, this strand forms the third strand of an antiparallel β-sheet [,
,
,
].
Histone H5 is a nuclear protein involved in the condensation of nucleosome chains into higher order structures. In this respect, it performs the same function as histone H1, and replaces H1 in certain cells. The structure of GH5, the globular domain (residues 22-100) of the linker histone H5, has been solved. The fold is similar to the DNA-binding domain of the catabolite gene activator protein, CAP, thus providing a possible model for the binding of GH5 to DNA. The structure comprises 3 α-helices and 2 short β-strands [
,
].
Over 70 metallopeptidase families have been identified to date. In these enzymes a divalent cation which is usually zinc, but may be cobalt, manganese or copper, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. In some families of co-catalytic metallopeptidases, two metal ions are observed in crystal structures ligated by five amino acids, with one amino acid ligating both metal ions. The known metal ligands are His, Glu, Asp or Lys. At least one other residue is required for catalysis, which may play an electrophilic role.
Many metalloproteases contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site []. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases [].
This group of sequences contains a diverse range of gene families, which include metallopeptidases belonging to MEROPS peptidase family M14 (carboxypeptidase A, clan MC), subfamilies M14A and M14B.
The carboxypeptidase A family can be divided into four subfamilies: M14A (carboxypeptidase A or digestive), M14B (carboxypeptidase H or regulatory), M14C (gamma-D-glutamyl-L-diamino acid peptidase I) and M14D (AGTPBP-1/Nna1-like proteins) []. Members of subfamily M14B have longer C-termini than those of subfamily M14A [], and carboxypeptidase M (a member of the H family) is bound to the membrane by a glycosylphosphatidylinositol anchor, unlike the majority of the M14 family, which are soluble [].
ATP/GTP binding protein (AGTPBP-1/Nna1)-like proteins are active metallopeptidases that act on cytosolic proteins such as alpha-tubulin, to remove a C-terminal tyrosine. Mutations in AGTPBP-1/Nna1 cause Purkinje cell degeneration (pcd). AGTPBP-1/Nna1-like proteins from the different phyla are highly diverse, but they all contain a unique N-terminal conserved domain right before the CP domain. It has been suggested that this N-terminal domain might act as a folding domain [
,
,
,
].
The zinc ligands have been determined as two histidines and a glutamate, and the catalytic residue has been identified as a C-terminal glutamate, but these do not form the characteristic metalloprotease HEXXH motif [
,
]. Members of the carboxypeptidase A family are synthesised as inactive molecules with propeptides that must be cleaved to activate the enzyme. Structural studies of carboxypeptidases A and B reveal the propeptide to exist as a globular domain, followed by an extended α-helix; this shields the catalytic site, without specifically binding to it, while the substrate-binding site is blocked by making specific contacts [,
].
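The stringent 'abXHEbbHbc' definition given above for metalloproteases can be illustrated with a short pattern search. This is only a minimal sketch under stated assumptions: the text does not enumerate which residues count as "uncharged" or "hydrophobic", so the `CHARGED` and `HYDROPHOBIC` sets, the function name `find_stringent_hexxh` and the toy sequence are all hypothetical choices made for illustration. Position 'a' is left unconstrained because the text says only that it is "most often" valine or threonine.

```python
import re

# Assumed residue classes; the entry text does not list them explicitly.
CHARGED = "DEKR"           # residues treated as charged (assumption)
HYDROPHOBIC = "AVLIMFWC"   # residues treated as hydrophobic (assumption)

# 'b' = uncharged residue; proline is excluded as well (assumption based on
# the statement that proline is never found in this site).
B = f"[^{CHARGED}P]"

# Layout: a b X H E b b H b c. The lookahead lets overlapping hits be reported.
STRINGENT = re.compile(rf"(?=([A-Z]{B}[A-Z]HE{B}{B}H{B}[{HYDROPHOBIC}]))")


def find_stringent_hexxh(seq):
    """Return (start, match) pairs for the stringent motif.

    Hits containing proline anywhere in the ten residues are discarded.
    """
    hits = []
    for m in STRINGENT.finditer(seq.upper()):
        hit = m.group(1)
        if "P" not in hit:
            hits.append((m.start(), hit))
    return hits


if __name__ == "__main__":
    toy = "MKTVVAHELTHAVGG"  # hypothetical sequence, not a database record
    print(find_stringent_hexxh(toy))
```

A sequence that satisfies plain HEXXH but carries a charged residue at a 'b' position (for example `HEDTH` in place of `HELTH`) is rejected by this stricter pattern, which is the point of the more stringent definition. Note that, as stated above, members of the carboxypeptidase A family do not contain the HEXXH motif, so this search is relevant to other metalloprotease families.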
Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well studied family that is widely distributed in many organisms [
]. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors []. Several functionally different haemoglobins can coexist in the same species. The major types of globins include:
Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates []. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors [].
Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle [].
Neuroglobin: a myoglobin-like haemoprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia []. Neuroglobin belongs to a branch of the globin family that diverged early in evolution.
Cytoglobin: an oxygen sensor expressed in multiple tissues. Related to neuroglobin [].
Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers [].
Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation.
Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants [].
Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin [].
Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression [].
Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors [].
Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features [].
This entry represents fairy shrimp polymeric haemoglobin. Unlike vertebrates, many invertebrates display extracellular haemoglobins dissolved in the haemolymph. Elimination from the haemolymph by excretory processes is avoided by the concatenation of globin domains into high molecular weight polymers [
,
,
].
This group of sequences represents the p45 (45kDa) precursor of caspases, which can be processed to produce the active p20 (20kDa) and p10 (10kDa) subunits. Caspases (Cysteine-dependent ASPartyl-specific proteASE) are cysteine peptidases that belong to the MEROPS peptidase family C14 (caspase family, clan CD) based on the architecture of their catalytic dyad or triad []. Caspases are tightly regulated proteins that require zymogen activation to become active, and once active can be regulated by caspase inhibitors. Activated caspases act as cysteine proteases, using the sulphydryl group of a cysteine side chain for catalysing peptide bond cleavage at aspartyl residues in their substrates. The catalytic cysteine and histidine residues are on the p20 subunit after cleavage of the p45 precursor.
Caspases are mainly involved in mediating cell death (apoptosis) [
,
,
]. They have two main roles within the apoptosis cascade: as initiators that trigger the cell death process, and as effectors of the process itself. Caspase-mediated apoptosis follows two main pathways, one extrinsic and the other intrinsic or mitochondrial-mediated. The extrinsic pathway involves the stimulation of various TNF (tumour necrosis factor) cell surface receptors on cells targeted to die by various TNF cytokines that are produced by cells such as cytotoxic T cells. The activated receptor transmits the signal to the cytoplasm by recruiting FADD, which forms a death-inducing signalling complex (DISC) with caspase-8. The subsequent activation of caspase-8 initiates the apoptosis cascade involving caspases 3, 4, 6, 7, 9 and 10. The intrinsic pathway arises from signals that originate within the cell as a consequence of cellular stress or DNA damage. The stimulation or inhibition of different Bcl-2 family receptors results in the leakage of cytochrome c from the mitochondria, and the formation of an apoptosome composed of cytochrome c, Apaf1 and caspase-9. The subsequent activation of caspase-9 initiates the apoptosis cascade involving caspases 3 and 7, among others. At the end of the cascade, caspases act on a variety of signal transduction proteins, cytoskeletal and nuclear proteins, chromatin-modifying proteins, DNA repair proteins and endonucleases that destroy the cell by disintegrating its contents, including its DNA. The different caspases have different domain architectures depending upon where they fit into the apoptosis cascades; however, they all carry the catalytic p10 and p20 subunits.
Caspases can have roles other than in apoptosis, such as caspase-1 (interleukin-1 beta convertase) (
), which is involved in the inflammatory process. The activation of apoptosis can sometimes lead to caspase-1 activation, providing a link between apoptosis and inflammation, such as during the targeting of infected cells. Caspases may also be involved in cell differentiation [
].
There are non-peptidase homologues in the caspase family, such as CASP8 and FADD-like apoptosis regulator (CASH/c-FLIP), which suppresses death receptor induced apoptosis and TCR activation induced cell death by inhibiting caspase-8 activation [
,
,
].
This entry includes fibroblast growth factor 15/19/21 (FGF15/19/21).
Fibroblast growth factor 15 (FGF15) plays a key role in enterohepatic signaling, regulation of liver bile acid biosynthesis, gallbladder motility and metabolic homeostasis [
,
,
]. Mouse FGF15 has been shown to be stimulated when bile acids bind to farnesoid X receptor (FXR) [], and is therefore thought to be a factor in chronic bile acid diarrhoea and in certain metabolic disorders [].
FGF15 has been experimentally characterised in mouse, but has not been found in other species. However, there is an orthologous human protein, FGF19, and together they share about 50% amino acid identity and display similar endocrine functions, so are often referred to as FGF15/19 [
,
]. FGF15 and FGF19 differ from other FGFs due to subtle changes in their tertiary structure. They have low heparin binding affinity, enabling them to diffuse away from their site of secretion and signal to distant cells. FGF signaling through the FGF receptors is also different, as they require klotho protein cofactors rather than heparan sulfate proteoglycan [].
Fibroblast growth factor 19 (FGF19) plays a key role in enterohepatic signaling, regulation of liver bile acid biosynthesis, gallbladder motility and metabolic homeostasis [
,
,
]. Human FGF19 expression has been shown to be stimulated approximately 300-fold by physiological concentrations of bile acids including chenodeoxycholic acid, glycochenodeoxycholic acid and obeticholic acid in explants of ileal mucosa []. The protein is thought to be a factor in chronic bile acid diarrhoea and in certain metabolic disorders [,
]. FGF19 has been experimentally characterised in humans and other species, but has not been found in mouse. However there is an orthologous mouse protein, FGF15, and together they share about 50% amino acid identity and display similar endocrine functions, so are often referred to as FGF15/19 [,
]. FGF15 and FGF19 differ from other FGFs due to subtle changes in their tertiary structure. They have low heparin binding affinity, enabling them to diffuse away from their site of secretion and signal to distant cells. FGF signaling through the FGF receptors is also different, as they require klotho protein cofactors rather than heparan sulfate proteoglycan []. Unlike other members of the family that can bind several FGF receptors, FGF19 is specific for FGFR4 [].
FGF21 stimulates glucose uptake in differentiated adipocytes via the induction of glucose transporter SLC2A1/GLUT1 expression [
]. FGF21 has been shown to protect animals from diet-induced obesity when overexpressed in transgenic mice. It also lowers blood glucose and triglyceride levels when administered to diabetic rodents [], suggesting it may exhibit the therapeutic characteristics necessary for effective treatment of diabetes. Treatment of animals with FGF21 results in increased energy expenditure, fat utilisation and lipid excretion []. FGF21 is most abundantly expressed in the liver, and also expressed in the thymus at lower levels [].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs [].
The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
].
Prostanoids (prostaglandins (PG) and thromboxanes (TX)) mediate a wide variety of actions and play important physiological roles in the cardiovascular and immune systems, and in pain sensation in peripheral systems. PGI2 and TXA2 have opposing actions, involving regulation of the interaction of platelets with the vascular endothelium, while PGE2, PGI2 and PGD2 are powerful vasodilators and potentiate the action of various autacoids to induce plasma extravasation and pain sensation. To date, evidence for at least 5 classes of prostanoid receptor has been obtained. However, identification of subtypes and their distribution is hampered by expression of more than one receptor within a tissue, coupled with poor selectivity of available agonists and antagonists.
EP3 receptors mediate contraction in a wide range of smooth muscles, including gastrointestinal and uterine. They also inhibit neurotransmitter release in central and autonomic nerves through a presynaptic action, and inhibit secretion in glandular tissues (e.g., acid secretion from gastric mucosa, and sodium and water reabsorption in the kidney). mRNA is found in high levels in the kidney and uterus, and in lower levels in the brain, thymus, lung, heart, stomach and spleen. The receptors activate adenylate cyclase via an uncharacterised G-protein, probably of the Gi/Go class.
Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well studied family that is widely distributed in many organisms [
]. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors []. Several functionally different haemoglobins can coexist in the same species. The major types of globins include:
Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates []. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors [].
Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle [].
Neuroglobin: a myoglobin-like haemoprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia []. Neuroglobin belongs to a branch of the globin family that diverged early in evolution.
Cytoglobin: an oxygen sensor expressed in multiple tissues. Related to neuroglobin [].
Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers [].
Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation.
Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants [].
Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin [].
Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression [].
Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors [].
Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features [].
This entry represents a group of globins found in nematode worms. Nematodes have several very divergent globins, 35 having been described in Caenorhabditis elegans alone [
]. This entry includes the nematode globins: body wall globin (GlbB), cuticle globin (GlbC), eye globin (GlbE), intracellular globin (GlbM), and pseudocoelomic globin (GlbP) [,
,
,
].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice [
]. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs [].
GPCR fungal pheromone mating factor receptors form a distinct family of G-protein-coupled receptors, and are also known as Class D GPCRs.
The fungal pheromone mating factor receptors STE2 and STE3 are integral membrane proteins that may be involved in the response to mating factors on the cell membrane [
,
,
]. The amino acid sequences of both receptors contain high proportions of hydrophobic residues grouped into 7 domains, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins. However, while a similar 3D framework has been proposed to account for this, there is no significant sequence similarity either between STE2 and STE3, or between these and the rhodopsin-type family: the receptors therefore bear their own unique '7TM' signatures, which is why they have been given their own GPCR group: Class D fungal mating pheromone receptors.
The STE3 gene in Saccharomyces cerevisiae encodes the cell-surface receptor that binds the 13-residue lipopeptide a-factor. Several related fungal pheromone receptor sequences are known: these include pheromone B alpha 1 and B alpha 3, and pheromone B beta 1 receptors from Schizophyllum commune; pheromone receptor 1 from Ustilago hordei; and pheromone receptors 1 and 2 from Ustilago maydis. Members of the family share about 20% sequence identity.
The multiallelic mating type locus B alpha-1 of S. commune encodes a pheromone receptor and putative pheromone genes []. Analysis of this locus has provided evidence that pheromones and pheromone receptors govern recognition of self versus non-self, and sexual development in this homobasidiomycetous fungus [].
This entry represents pheromone B alpha type receptors.
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups []. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs [].
The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
].
In addition to their role in energy metabolism, purines (especially adenosine and adenine nucleotides) produce a wide range of pharmacological effects mediated by activation of cell surface receptors. Distinct receptors exist for adenosine. In the periphery, the main effects of adenosine include vasodilation, bronchoconstriction, immunosuppression, inhibition of platelet aggregation, cardiac depression, stimulation of nociceptive afferents, inhibition of neurotransmitter release and inhibition of the release of other factors, e.g. hormones. In the CNS, adenosine exerts a pre- and post-synaptic depressant action, reducing motor activity, depressing respiration, inducing sleep and relieving anxiety. The physiological role of adenosine is thought to be to adjust energy demands in line with oxygen supply. Many of the clinical actions of methylxanthines are thought to be mediated through antagonism of adenosine receptors. Four subtypes of receptor have been identified, designated A1, A2A, A2B and A3.
A3 receptors are found in high levels in the testis, and in lower levels in the lung, kidney and heart. They are also found in low levels in regions of the CNS (including the cerebral cortex, striatum and olfactory bulb). The presence in high levels in the testis has led to the suggestion that the receptor may play a role in reproduction. The A3 receptor inhibits adenylyl cyclase through a pertussis-toxin-sensitive G-protein, probably belonging to the Gi/Go class.
GPCR, family 2, calcitonin gene-related peptide, type 1 receptor
Type:
Family
Description:
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs [].
The secretin-like GPCRs include secretin [
], calcitonin [], parathyroid hormone/parathyroid hormone-related peptides [] and vasoactive intestinal peptide [], all of which activate adenylyl cyclase and the phosphatidyl-inositol-calcium pathway. These receptors contain seven transmembrane regions, in a manner reminiscent of the rhodopsins and other receptors believed to interact with G-proteins (however, there is no significant sequence identity between these families; the secretin-like receptors thus bear their own unique '7TM' signature). Their N-terminus is probably located on the extracellular side of the membrane and potentially glycosylated. This N-terminal region contains a long conserved region which allows the binding of large peptidic ligands such as glucagon, secretin, VIP and PACAP; this region contains five conserved cysteine residues, which could be involved in disulphide bonds. The C-terminal region of these receptors is probably cytoplasmic. Every receptor gene in this family is encoded on multiple exons, and several of these genes are alternatively spliced to yield functionally distinct products.
Calcitonin gene-related peptide (CGRP), for which the CGRP type 1 receptor (also known as calcitonin receptor-like receptor) is named, is a neuropeptide with diverse biological effects including potent vasodilator activity []. Messenger RNA for this receptor is predominantly expressed in the lung and heart, with specific localisation to lung alveolar cells and cardiac myocytes []. Mutations in the gene for this protein have been related to spontaneous miscarriage and subfertility []. In the rat lung, it is associated with blood vessels; the gene may therefore play an important role in the maintenance of vascular tone []. mRNA is also found in the cerebellum []. The ligand for this receptor-like protein remains to be discovered.
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences, rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs [].
The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
].
In addition to their role in energy metabolism, purines (especially adenosine and adenine nucleotides) produce a wide range of pharmacological effects mediated by activation of cell surface receptors. Distinct receptors exist for adenosine. In the periphery, the main effects of adenosine include vasodilation, bronchoconstriction, immunosuppression, inhibition of platelet aggregation, cardiac depression, stimulation of nociceptive afferents, inhibition of neurotransmitter release and inhibition of the release of other factors, e.g. hormones. In the CNS, adenosine exerts a pre- and post-synaptic depressant action, reducing motor activity, depressing respiration, inducing sleep and relieving anxiety. The physiological role of adenosine is thought to be to adjust energy demands in line with oxygen supply. Many of the clinical actions of methylxanthines are thought to be mediated through antagonism of adenosine receptors. Four subtypes of receptor have been identified, designated A1, A2A, A2B and A3.
A1 receptors are distributed widely in peripheral tissues (e.g., heart, adipose tissue, kidney, stomach and pancreas), where they have a mainly inhibitory role, and are also found in peripheral nerves (e.g., in the intestine and vas deferens). In the CNS, they are present in high levels, notably in the cerebral cortex, hippocampus, cerebellum, thalamus and striatum. The receptors inhibit adenylyl cyclase and voltage-dependent calcium channels, and activate potassium channels through a pertussis-toxin-sensitive G-protein, probably of the Gi/Go class.
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
]. In addition to their role in energy metabolism, purines (especially adenosine and adenine nucleotides) produce a wide range of pharmacological effects mediated by activation of cell surface receptors. Distinct receptors exist for adenosine. In the periphery, the main effects of adenosine include vasodilation, bronchoconstriction, immunosuppression, inhibition of platelet aggregation, cardiac depression, stimulation of nociceptive afferents, inhibition of neurotransmitter release and inhibition of the release of other factors, e.g. hormones. In the CNS, adenosine exerts a pre- and post-synaptic depressant action, reducing motor activity, depressing respiration, inducing sleep and relieving anxiety. The physiological role of adenosine is thought to be to adjust energy demands in line with oxygen supply. Many of the clinical actions of methylxanthines are thought to be mediated through antagonism of adenosine receptors. Four subtypes of receptor have been identified, designated A1, A2A, A2B and A3. A2B receptors are widespread in the human brain relative to A2A receptors. In the rat, by contrast, A2B receptor mRNA is found only at low levels in the brain and has a unique distribution in the periphery, with high levels occurring in the intestine and bladder. The receptor stimulates cAMP production through G proteins.
This entry represents the C-terminal conserved domain found in caspases, mostly from animals. This domain includes the core of the p45 (45 kDa) precursor of caspases, which can be processed to produce the active p20 (20 kDa) and p10 (10 kDa) subunits. Caspases (Cysteine-dependent ASPartyl-specific proteASE) are cysteine peptidases that belong to the MEROPS peptidase family C14 (caspase family, clan CD) based on the architecture of their catalytic dyad or triad [
]. Caspases from animals can be classified as subfamily C14A. Caspases are tightly regulated proteins that require zymogen activation to become active, and once active can be regulated by caspase inhibitors. Activated caspases act as cysteine proteases, using the sulphydryl group of a cysteine side chain for catalysing peptide bond cleavage at aspartyl residues in their substrates. The catalytic cysteine and histidine residues are on the p20 subunit after cleavage of the p45 precursor. Caspases are mainly involved in mediating cell death (apoptosis) [
,
,
]. They have two main roles within the apoptosis cascade: as initiators that trigger the cell death process, and as effectors of the process itself. Caspase-mediated apoptosis follows two main pathways, one extrinsic and the other intrinsic or mitochondrial-mediated. The extrinsic pathway involves the stimulation of various TNF (tumour necrosis factor) cell surface receptors on cells targeted to die by various TNF cytokines that are produced by cells such as cytotoxic T cells. The activated receptor transmits the signal to the cytoplasm by recruiting FADD, which forms a death-inducing signalling complex (DISC) with caspase-8. The subsequent activation of caspase-8 initiates the apoptosis cascade involving caspases 3, 4, 6, 7, 9 and 10. The intrinsic pathway arises from signals that originate within the cell as a consequence of cellular stress or DNA damage. The stimulation or inhibition of different Bcl-2 family receptors results in the leakage of cytochrome c from the mitochondria, and the formation of an apoptosome composed of cytochrome c, Apaf1 and caspase-9. The subsequent activation of caspase-9 initiates the apoptosis cascade involving caspases 3 and 7, among others. At the end of the cascade, caspases act on a variety of signal transduction proteins, cytoskeletal and nuclear proteins, chromatin-modifying proteins, DNA repair proteins and endonucleases that destroy the cell by disintegrating its contents, including its DNA. The different caspases have different domain architectures depending upon where they fit into the apoptosis cascades; however, they all carry the catalytic p10 and p20 subunits. Caspases can have roles other than in apoptosis, such as caspase-1 (interleukin-1 beta convertase) (
), which is involved in the inflammatory process. The activation of apoptosis can sometimes lead to caspase-1 activation, providing a link between apoptosis and inflammation, such as during the targeting of infected cells. Caspases may also be involved in cell differentiation [].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
]. Bombesins are peptide neurotransmitters whose biological activity resides in a common C-terminal sequence, WAXGHXM. In the periphery, bombesin-related peptides stimulate smooth muscle and glandular secretion. In the brain, these peptides are believed to play a role in homeostasis, thermoregulation and metabolism, and have been reported to elicit analgesia and excessive grooming, together with central regulation of a variety of peripheral effects. Mammalian bombesins are encoded by 2 genes. The preproGRP gene transcript encodes a precursor of 147 amino acids, which gives GRP and GRP18-27. The preproNMB gene transcript encodes a precursor of 117 amino acids, which is metabolised to neuromedin B. Receptors for these peptides have widespread distribution in peripheral tissue. High levels are found in smooth muscle and in the brain. The gastrin-releasing peptide receptor has a wide distribution in peripheral tissue. High levels are found in smooth muscle (e.g., intestine, stomach and bladder) and in secretory glands (e.g., pancreas). In the brain, it is found at high levels in the hypothalamus, and is present in other areas at lower levels (e.g., the olfactory tract, dentate gyrus and cortex). It is also found in various cell lines (e.g., Swiss 3T3 fibroblasts and small-cell lung carcinomas). GRP receptors activate the phosphoinositide pathway via a pertussis-toxin-insensitive G-protein, probably of the Gq/G11 class.
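The C-terminal consensus WAXGHXM mentioned in this entry can be checked against a peptide with a short pattern match. The following is a minimal sketch, assuming that X stands for any single amino acid and that the motif must sit at the very end of the peptide; the function name and the example strings are illustrative and are not taken from a curated sequence set.

```python
import re

# Assumption: "X" in WAXGHXM means any one residue (one-letter code),
# and the motif is anchored at the C terminus of the peptide.
BOMBESIN_MOTIF = re.compile(r"WA[A-Z]GH[A-Z]M$")


def has_bombesin_c_terminus(peptide: str) -> bool:
    """Return True if the peptide ends with the WAXGHXM consensus."""
    return BOMBESIN_MOTIF.search(peptide.strip().upper()) is not None


if __name__ == "__main__":
    for example in ("QQWAVGHLM", "NLWATGHFM", "WAVGHLMQ"):
        print(example, has_bombesin_c_terminus(example))
```

Anchoring the pattern with `$` reflects the statement that activity resides in the C-terminal sequence; dropping the anchor would instead report the motif anywhere in the peptide.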
Bacterial species have many methods of controlling gene expression and cell
growth. Regulation of gene expression in response to changes in cell density is termed quorum sensing [,
]. Quorum-sensing bacteria produce, release and respond to hormone-like molecules (autoinducers) that accumulate in the external environment as the cell population grows. Once a threshold of these molecules is reached, a signal transduction cascade is triggered that ultimately leads to behavioural changes in the bacterium []. Autoinducers are thus clearly important mediators of molecular communication. Conjugal transfer of Agrobacterium octopine-type Ti plasmids is activated by octopine, a metabolite released from plant tumours []. Octopine causes conjugal donors to secrete a pheromone, Agrobacterium autoinducer (AAI), and exogenous AAI further stimulates conjugation. The putative AAI synthase and an AAI-responsive transcriptional regulator have been found to be encoded by the Ti plasmid traI and traR genes, respectively. TraR and TraI are similar to the LuxR and LuxI regulatory proteins of Vibrio fischeri, and AAI is similar in structure to the diffusible V. fischeri autoinducer, the inducing ligand of LuxR. TraR activates target genes in the presence of AAI and also activates traR and traI themselves, creating two positive-feedback loops. TraR-AAI-mediated activation in wild-type Agrobacterium strains is enhanced by culturing on solid media, suggesting a possible role in cell density sensing []. Production of light by the marine bacterium V. fischeri and by recombinant hosts containing cloned lux genes is controlled by the density of the culture []. Density-dependent regulation of lux gene expression has been shown to require a locus consisting of the luxR and luxI genes. In these and other Gram-negative bacteria, N-(3-oxohexanoyl)-L-homoserine lactone (OHHL) acts as the autoinducer by binding to transcriptional regulatory proteins and activating them []. OHHL and related molecules, such as N-butanoyl- (BHL), N-hexanoyl- (HHL) and N-oxododecanoyl- (PAI) homoserine lactones, are produced by a family of proteins that share a high level of sequence similarity. Proteins that are currently members of this family include:
luxI from V. fischeri.
ahyI and asaI from Aeromonas species, which synthesize BHL and whose targets are ahyR and asaR respectively.
carI from Erwinia carotovora. The target of OHHL is carR, which activates genes involved in the biosynthesis of carbapenem antibiotics.
eagI from Enterobacter agglomerans. The target of OHHL is not yet known.
esaI from Erwinia stewartii.
expI from E. carotovora.
lasI from Pseudomonas aeruginosa, which synthesizes PAI and whose target is lasR, which activates the transcription of the elastase gene.
rhlI (or vsmI) from P. aeruginosa, which synthesizes BHL and HHL and whose target is rhlR.
swrI from Serratia liquefaciens, which synthesizes BHL.
yenI from Yersinia enterocolitica.
This entry represents protein sequences that match a pattern found in the best conserved region, which is located in the N terminus.
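The threshold behaviour described for quorum sensing in this entry can be illustrated with a toy calculation. This is only a sketch under simplifying assumptions that are not part of the entry: the population grows by a fixed factor per step, the autoinducer concentration is taken to be proportional to cell density, and every name and parameter (`steps_until_quorum`, `signal_per_cell`, `threshold`) is hypothetical.

```python
from typing import Optional


def steps_until_quorum(
    initial_density: float,
    growth_factor: float,
    signal_per_cell: float,
    threshold: float,
    max_steps: int = 1000,
) -> Optional[int]:
    """Return the first growth step at which the autoinducer level
    reaches the threshold, or None if it never does within max_steps.

    Toy model: autoinducer concentration = density * signal_per_cell.
    """
    density = initial_density
    for step in range(max_steps + 1):
        if density * signal_per_cell >= threshold:
            return step
        density *= growth_factor
    return None


if __name__ == "__main__":
    # Density doubles each step; the response triggers once the signal reaches 8.
    print(steps_until_quorum(1.0, 2.0, 1.0, 8.0))
```

The model ignores autoinducer degradation, diffusion and the positive-feedback loops noted for TraR, so it only shows the density-dependent switch and not its kinetics.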
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
]. Leukotrienes (LT) are potent lipid mediators derived from arachidonic acid metabolism. They can be divided into two classes based on the presence or absence of a cysteinyl group. Leukotriene B4 (LTB4) does not contain such a group, whereas LTC4, LTD4, LTE4 and LTF4 are cysteinyl leukotrienes. Cysteinyl leukotrienes (CysLTs), previously known as the "slow reacting substance of anaphylaxis", are produced predominantly by myeloid cells associated with inflammatory responses []. They are the most potent bronchoconstrictors known and also have pro-inflammatory effects, making them important mediators in the pathophysiology of human asthma [
]. CysLTs have also been implicated in a variety of other diseases, such as allergic rhinitis, inflammatory bowel disease and psoriasis []. Pharmacological studies of the effects of CysLTs have provided evidence for the existence of at least 2 distinct receptor subtypes, belonging to the G protein-coupled receptor family, designated CysLT1 and CysLT2 [,
]. CysLT1 is thought to mediate bronchospasm, plasma exudation, vasoconstriction, mucus secretion and eosinophil recruitment []. CysLT2 is less well defined, due to a lack of specific agonists and antagonists, but is thought to mediate some of the vascular effects attributed to CysLTs [,
]. Both receptor subtypes have now been cloned [,
].
G protein-coupled receptors (GPCRs) constitute a vast protein family that encompasses a wide range of functions, including various autocrine, paracrine and endocrine processes. They show considerable diversity at the sequence level, on the basis of which they can be separated into distinct groups [
]. The term clan can be used to describe the GPCRs, as they embrace a group of families for which there are indications of evolutionary relationship, but between which there is no statistically significant similarity in sequence []. The currently known clan members include rhodopsin-like GPCRs (Class A, GPCRA), secretin-like GPCRs (Class B, GPCRB), metabotropic glutamate receptor family (Class C, GPCRC), fungal mating pheromone receptors (Class D, GPCRD), cAMP receptors (Class E, GPCRE) and frizzled/smoothened (Class F, GPCRF) [,
,
,
,
]. GPCRs are major drug targets, and are consequently the subject of considerable research interest. It has been reported that the repertoire of GPCRs for endogenous ligands consists of approximately 400 receptors in humans and mice []. Most GPCRs are identified on the basis of their DNA sequences rather than the ligand they bind; those that are unmatched to known natural ligands are designated as orphan GPCRs, or unclassified GPCRs []. The rhodopsin-like GPCRs (GPCRA) represent a widespread protein family that includes hormone, neurotransmitter and light receptors, all of which transduce extracellular signals through interaction with guanine nucleotide-binding (G) proteins. Although their activating ligands vary widely in structure and character, the amino acid sequences of the receptors are very similar and are believed to adopt a common structural framework comprising 7 transmembrane (TM) helices [
,
,
]. Neuromedin U is a neuropeptide, first isolated from porcine spinal cord and expressed widely in the gastrointestinal, genitourinary and central nervous systems []. Neuromedin U has potent contractile activity on smooth muscle and this activity is believed to reside within the C-terminal portion of the peptide, which is highly conserved between species. Other roles for the peptide include: regulation of blood flow and ion transport in the intestine, regulation of adrenocortical function and increased blood pressure []. The roles of neuromedin U in the central nervous system are poorly understood, but may include: regulation of food intake, neuroendocrine control, modulation of dopamine actions and involvement in neuropsychiatric disorders. Two G protein-coupled receptor subtypes, with differing expression patterns, have been identified and shown to bind neuromedin U. The neuromedin U type 2 receptor (NMU2) is expressed most abundantly in the central nervous system, particularly in the medulla oblongata, pontine reticular formation, substantia nigra, spinal cord and thalamus []. High levels of expression have also been found in the thymus, thyroid and testes []. NMU2 has been detected at much lower levels in some peripheral tissues, including the kidney, lung, trachea and gastrointestinal tract.
Asparaginyl endopeptidase, also known as legumain, is a family of cysteine proteases found in many organisms. This group of cysteine peptidases belongs to the MEROPS peptidase family C13 (legumain family, clan CD). A type example is legumain from Canavalia ensiformis (Jack bean, Horse bean) [
]. Although legumains (also known as vacuolar processing enzymes) were first described from beans, homologues have been identified in plants, protozoa, vertebrates, and helminths [,
]. In blood-feeding helminths, asparaginyl endopeptidases (sometimes described as hemoglobinases) have been located in the gut and are considered to be involved in host hemoglobin digestion [,
,
,
]. Also included in family C13 of cysteine peptidases are GPI-anchor transamidases, which share significant homology with legumains. GPI-anchor transamidases mediate glycosylphosphatidylinositol (GPI) anchoring in the endoplasmic reticulum [
,
]. A cysteine peptidase is a proteolytic enzyme that hydrolyses a peptide bond using the thiol group of a cysteine residue as a nucleophile. Hydrolysis usually involves a catalytic triad consisting of the thiol group of the cysteine, the imidazolium ring of a histidine, and a third residue, usually asparagine or aspartic acid, to orientate and activate the imidazolium ring. In only one family of cysteine peptidases is the role of the general base assigned to a residue other than a histidine: in peptidases from family C89 (acid ceramidase) an arginine is the general base. Cysteine peptidases can be grouped into fourteen different clans, with members of each clan possessing a tertiary fold unique to the clan. Four clans of cysteine peptidases share structural similarities with serine and threonine peptidases and asparagine lyases. From sequence similarities, cysteine peptidases can be clustered into over 80 different families []. Clans CF, CM, CN, CO, CP and PD each contain only one family. Cysteine peptidases are often active at acidic pH and are therefore confined to acidic environments, such as the animal lysosome or plant vacuole. Cysteine peptidases can be endopeptidases, aminopeptidases, carboxypeptidases, dipeptidyl-peptidases or omega-peptidases. They are inhibited by thiol chelators such as iodoacetate, iodoacetic acid, N-ethylmaleimide or p-chloromercuribenzoate.
Clan CA includes proteins with a papain-like fold. There is a catalytic triad which occurs in the order: Cys/His/Asn (or Asp). A fourth residue, usually Gln, is important for stabilising the acyl intermediate that forms during catalysis, and this precedes the active site Cys. The fold consists of two subdomains with the active site between them. One subdomain consists of a bundle of helices, with the catalytic Cys at the end of one of them, and the other subdomain is a β-barrel with the active site His and Asn (or Asp). There are over thirty families in the clan, and tertiary structures have been solved for members of most of these. Peptidases in clan CA are usually sensitive to the small molecule inhibitor E64, which is ineffective against peptidases from other clans of cysteine peptidases [
]. Clan CD includes proteins with a caspase-like fold. Proteins in the clan have an α/β/α sandwich structure. There is a catalytic dyad which occurs in the order His/Cys. The active site His occurs in a His-Gly motif and the active site Cys occurs in an Ala-Cys motif; both motifs are preceded by a block of hydrophobic residues []. Specificity is predominantly directed towards residues that occupy the S1 binding pocket, so that caspases cleave aspartyl bonds, legumains cleave asparaginyl bonds, and gingipains cleave lysyl or arginyl bonds. Clan CE includes proteins with an adenain-like fold. The fold consists of two subdomains with the active site between them. One subdomain is a bundle of helices, and the other a β-barrel. The subdomains are in the opposite order to those found in peptidases from clan CA, and this is reflected in the order of active site residues: His/Asn/Gln/Cys. This has prompted speculation that proteins in clans CA and CE are related, and that members of one clan are derived from a circular permutation of the structure of the other. Clan CL includes proteins with a sortase B-like fold. Peptidases in the clan hydrolyse and transfer bacterial cell wall peptides. The fold shows a closed β-barrel decorated with helices with the active site at one end of the barrel []. The active site consists of a His/Cys catalytic dyad. Cysteine peptidases with a chymotrypsin-like fold are included in clan PA, which also includes serine peptidases. Cysteine peptidases that are N-terminal nucleophile hydrolases are included in clan PB. Cysteine peptidases with a tertiary structure similar to that of the serine-type aspartyl dipeptidase are included in clan PC. Cysteine peptidases with an intein-like fold are included in clan PD, which also includes asparagine lyases.
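The S1-directed specificity described for clan CD (caspases at aspartyl bonds, legumains at asparaginyl bonds, gingipains at lysyl or arginyl bonds) can be expressed as a simple lookup over a sequence. The sketch below is illustrative only: it assumes one-letter residue codes, checks nothing beyond the residue that would occupy the S1 pocket, and uses hypothetical names (`P1_SPECIFICITY`, `candidate_cleavage_sites`) and a made-up test sequence.

```python
from typing import Dict, List, Set

# Residues accepted in the S1 pocket, as stated in the entry text.
P1_SPECIFICITY: Dict[str, Set[str]] = {
    "caspase": {"D"},         # aspartyl bonds
    "legumain": {"N"},        # asparaginyl bonds
    "gingipain": {"K", "R"},  # lysyl or arginyl bonds
}


def candidate_cleavage_sites(sequence: str, peptidase: str) -> List[int]:
    """Return 1-based positions of residues that could occupy S1.

    Cleavage would fall on the C-terminal side of each reported residue.
    Real substrate recognition depends on more than this single position.
    """
    allowed = P1_SPECIFICITY[peptidase]
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa in allowed]


if __name__ == "__main__":
    seq = "DEVDGNKR"  # arbitrary illustrative sequence
    for name in P1_SPECIFICITY:
        print(name, candidate_cleavage_sites(seq, name))
```

Such a scan over-predicts sites, since neighbouring residues and substrate accessibility also matter; it is meant only to make the S1 rule concrete.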
The type I glycoprotein S of Coronavirus, trimers of which constitute the typical viral spikes, is assembled into virions through noncovalent interactions with the M protein. The spike glycoprotein is translated as a large polypeptide that is subsequently cleaved to S1 (
) and S2 [
]. The cleavage of S can occur at two distinct sites: S1/S2 or S2' []. The spike is present in two very different forms: pre-fusion (the form on mature virions) and post-fusion (the form after membrane fusion has been completed). The spike is cleaved sequentially by host proteases at two sites: first at the S1/S2 boundary (i.e. S1/S2 site) and second within S2 (i.e. S2' site). After the cleavages, S1 dissociates from S2, allowing S2 to transition to the post-fusion structure []. Both chimeric S proteins appeared to cause cell fusion when expressed individually, suggesting that they were biologically fully active [
]. The spike is a type I membrane glycoprotein that possesses a conserved transmembrane anchor and an unusual cysteine-rich (cys) domain that bridges the putative junction of the anchor and the cytoplasmic tail []. SARS-CoV S is largely uncleaved after biosynthesis. It can be later processed by endosomal cathepsin L, trypsin, thermolysin, and elastase, which have been shown to induce syncytia formation and virus entry. Other proteases that are of potential biological relevance in potentiating SARS-CoV S include TMPRSS2, TMPRSS11a, and HAT, which are localized on the cell surface and are highly expressed in the human airway [
]. The furin-like S2' cleavage site at KR/SF, with P1 and P2 basic residues and a P2' hydrophobic Phe downstream of the IFP, is identical between SARS-CoV-2 and SARS-CoV. One or more furin-like enzymes would cleave the S2' site at KR/SF [,
]. Deletion of the SARS-CoV-2 furin cleavage site suggests that it may not be required for viral entry but may affect replication kinetics, and altered sites have still been observed to be proteolytically cleaved. Several substitutions within the S2' cleavage domain of SARS-CoV-2 have been reported, including P812L/S/T, S813I/G, F817L, I818S/V, but further experimental study of their consequences and of the replication properties of the altered viruses is required to understand the role of furin cleavage in SARS-CoV-2 infection and virulence []. The S2 subunit normally contains multiple key components, including one or more fusion peptides (FP), a second proteolytic site (S2') and two conserved heptad repeats (HRs), driving membrane penetration and virus-cell fusion. The HRs can trimerize into a coiled-coil structure built of three HR1-HR2 helical hairpins presenting as a canonical six-helix bundle and drag the virus envelope and the host cell bilayer into close proximity, preparing for fusion to occur [
]. The fusion core is composed of HR1 and HR2 and at least three membranotropic regions that are denoted as the fusion peptide (FP), internal fusion peptide (IFP), and pretransmembrane domain (PTM). The HR regions are further flanked by the three membranotropic components. Both FP and IFP are located upstream of HR1, while PTM is distally downstream of HR2 and directly precedes the transmembrane domain of SARS-CoV S. All three of these components are able to partition into the phospholipid bilayer to disturb membrane integrity []. During the pandemic, many conservative amino acid changes in the FP segment of SARS-CoV-2 have been reported (i.e., L821I, L822F, K825R, V826L, T827I, L828P, A829T, D830G/A, A831V/S/T, G832C/S, F833S, I834T), although their impact is not known as the active conformation and mode of insertion of the SARS-CoV-2 fusion peptide have not been experimentally characterised. Differences in HR1 sequences between SARS-CoV and SARS-CoV-2 suggest that SARS-CoV-2 HR2 makes stronger interactions with HR1. However, the substitutions observed in the solvent accessible surface of the HR1 domain (e.g., D936Y, S943P, S939F) of SARS-CoV-2 do not seem to be involved in stabilizing interactions with HR2. Substitutions in HR2 (e.g., K1073N, V1176F) or the TM or cytoplasmic tail domains have also been observed, but further experimental work is required to determine the effects of these changes []. This entry represents the heptad repeat 1 (HR1) from coronavirus Spike glycoprotein, S2 subunit. This region forms a long trimeric helical coiled-coil structure with peptides from the HR2 region packing in an oblique antiparallel manner on the grooves of the HR1 trimer in a mixed extended and helical conformation. Packing of the helical parts of HR2 on the HR1 trimer grooves and formation of a six-helical bundle play an important role in the formation of a stable post-fusion structure.
In contrast to their extended helical conformations in the post-fusion state, the HR1 motifs within S2 form several shorter helices in their pre-fusion state [,
].
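The substitutions listed in this entry use a compact notation such as P812L/S/T or D830G/A. A small parser makes the convention explicit. This is a sketch that assumes the notation reads as reference residue, 1-based position, then one or more alternative residues separated by slashes; the function name `expand_substitution` is hypothetical and the code does not check positions against any reference sequence.

```python
import re
from typing import List, Tuple

# Assumed layout: <reference residue><position><alt>[/<alt>...]
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def expand_substitution(code: str) -> List[Tuple[str, int, str]]:
    """Expand e.g. 'P812L/S/T' into one (ref, position, alt) tuple per alternative."""
    match = SUBSTITUTION.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognised substitution: {code!r}")
    ref = match.group(1)
    position = int(match.group(2))
    alternatives = match.group(3).split("/")
    return [(ref, position, alt) for alt in alternatives]


if __name__ == "__main__":
    for code in ("P812L/S/T", "F817L", "D830G/A"):
        print(code, expand_substitution(code))
```

Expanding each code into separate tuples makes it straightforward to count or tabulate individual changes per position.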
Globins are haem-containing proteins involved in binding and/or transporting oxygen. They belong to a very large and well studied family that is widely distributed in many organisms [
]. Globins have evolved from a common ancestor and can be divided into three groups: single-domain globins, and two types of chimeric globins, flavohaemoglobins and globin-coupled sensors. Bacteria have all three types of globins, while archaea lack flavohaemoglobins, and eukaryotes lack globin-coupled sensors []. Several functionally different haemoglobins can coexist in the same species. The major types of globins include:
Haemoglobin (Hb): tetramer of two alpha and two beta chains, although embryonic and foetal forms can substitute the alpha or beta chain for ones with higher oxygen affinity, such as gamma, delta, epsilon or zeta chains. Hb transports oxygen from lungs to other tissues in vertebrates []. Hb proteins are also present in unicellular organisms where they act as enzymes or sensors [].
Myoglobin (Mb): monomeric protein responsible for oxygen storage in vertebrate muscle [].
Neuroglobin: a myoglobin-like haemoprotein expressed in vertebrate brain and retina, where it is involved in neuroprotection from damage due to hypoxia or ischemia []. Neuroglobin belongs to a branch of the globin family that diverged early in evolution.
Cytoglobin: an oxygen sensor expressed in multiple tissues. Related to neuroglobin [].
Erythrocruorin: highly cooperative extracellular respiratory proteins found in annelids and arthropods that are assembled from as many as 180 subunits into hexagonal bilayers [].
Leghaemoglobin (legHb or symbiotic Hb): occurs in the root nodules of leguminous plants, where it facilitates the diffusion of oxygen to symbiotic bacteroids in order to promote nitrogen fixation.
Non-symbiotic haemoglobin (NsHb): occurs in non-leguminous plants, and can be over-expressed in stressed plants [].
Flavohaemoglobins (FHb): chimeric, with an N-terminal globin domain and a C-terminal ferredoxin reductase-like NAD/FAD-binding domain. FHb provides protection against nitric oxide via its C-terminal domain, which transfers electrons to haem in the globin [].
Globin-coupled sensors: chimeric, with an N-terminal myoglobin-like domain and a C-terminal domain that resembles the cytoplasmic signalling domain of bacterial chemoreceptors. They bind oxygen, and act to initiate an aerotactic response or regulate gene expression [,].
Protoglobin: a single domain globin found in archaea that is related to the N-terminal domain of globin-coupled sensors [].
Truncated 2/2 globin: lack the first helix, giving them a 2-over-2 instead of the canonical 3-over-3 α-helical sandwich fold. Can be divided into three main groups (I, II and III) based on structural features [].
In vertebrates, hemoglobins (Hb) function to transport oxygen in blood plasma. Hb binds oxygen in the reduced [Fe(II)] state. Hb is composed of four globins in a tetrahedral arrangement, typically two alpha- and two beta-globins, where each monomer binds a heme group. The alpha and beta subunits are highly similar in sequence, but differ structurally in that the beta subunit contains an α-helix (the D helix) that is missing in the alpha subunit []. There is at least one heme-site ligand on each of the alpha and beta subunits. The imidazole ring of the 'proximal' His residue provides the fifth heme iron ligand; the other axial heme iron position remains essentially free for oxygen coordination. The binding of oxygen and carbon dioxide is associated with a variation of the heme iron coordination. Oxygen binding results in a transition from high-spin to low-spin iron, with accompanying changes in the Fe-N bond lengths and coordination geometry. In Hb, these subtle changes lead to the well-known cooperative effect of oxygen binding, which involves a relaxed (R) state when oxygen is bound and a tense (T) state upon oxygen release [,
]. The alpha or beta subunits are substituted in embryo and foetal Hb with subunits that have higher oxygen affinity (gamma, delta, epsilon, pi or zeta subunits). There are at least three types of human embryonic Hb (zeta2epsilon2, alpha2epsilon2, zeta2gamma2) and two foetal Hb (alpha2gamma2, alpha2delta2) [,
]. It has been hypothesised that the embryonic alpha-hemoglobin family diverged considerably earlier than the beta-hemoglobin line, as reflected in the greater diversity found amongst alpha sequences []. Alpha-like globins are derived from a common ancestor and have a more stable history than beta-globins []. This entry represents secreted globins and haemoglobins found primarily in annelid worms [
,
]. These proteins are very large, consisting of multiple subunits linked by disulphide bonds []. For example, the extracellular haemoglobin from Lumbricus terrestris (Common earthworm) consists of 12 subunits arranged in a hexagonal bilayer structure, where each of the 12 subunits is itself composed of disulphide-linked trimers (chains A, B, and C) and monomers (chain D).
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature motif (LSGGQ), specific to the ABC transporter; and the Q-motif (also called the Q-loop; between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site [,
,
]. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis [
,
,
,
,
,
]. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions [
]. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette [,
]. More than 50 subfamilies have been described based on a phylogenetic and functional classification [,
,
]. This family contains ABC-type bacteriocin transporters. In general, bacteriocins are agents which are responsible for killing or inhibiting closely related species or even different strains of the same species. Bacteriocins are encoded by bacterial plasmids. Bacteriocins are named after the producing species, and hence in the literature one encounters various names, e.g. leucocin from Leuconostoc gelidum, pediocin from Pediococcus acidilactici, sakacin from Lactobacillus sake, etc. Peptide bacteriocins are exported across the cytoplasmic membrane by a dedicated ATP-binding cassette (ABC) transporter. These ABC transporters have an N-terminal peptidase domain that belongs to MEROPS peptidase family C39 (clan CA), a central multi-pass transmembrane region and a C-terminal ABC transporter domain. These transporters have a dual function: (i) they remove the N-terminal leader peptide from the bacteriocin precursor by cleavage at a Gly-Gly bond and (ii) they transport the mature bacteriocin across the cytoplasmic membrane. This represents a novel strategy for secretion of bacterial proteins [
]. Many bacteria are known to regulate diverse physiological processes through this system, such as bioluminescence, regulation of sporulation, virulence factor expression, antibiotics production, competence for genetic transformation, and activation of biofilm formation [].
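The conserved sequence motifs of the ABC module described earlier in this entry lend themselves to a simple pattern scan. The following is a minimal sketch, not a validated classifier: the LSGGQ signature is taken from the text, while the Walker A pattern G-x(4)-G-K-[ST] is the commonly quoted consensus and is an assumption not stated in this entry. The function name and the toy sequence are hypothetical.

```python
import re

# ABC signature motif as quoted in the text.
SIGNATURE = re.compile(r"LSGGQ")
# Assumption: commonly quoted Walker A (P-loop) consensus G-x(4)-G-K-[ST];
# this pattern is not given in the entry itself.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_abc_motifs(seq):
    """Return 0-based start positions of candidate motifs in a protein sequence."""
    seq = seq.upper()
    return {
        "walker_a": [m.start() for m in WALKER_A.finditer(seq)],
        "signature": [m.start() for m in SIGNATURE.finditer(seq)],
    }


# Hypothetical toy sequence, built only to exercise the patterns.
toy = "MKT" + "GPSGSGKS" + "TLLRAI" + "LSGGQ" + "KQRV"
hits = find_abc_motifs(toy)
print(hits)
```

Real ABC cassettes often carry variants of these motifs, so an exact-match scan like this will miss some true members and should be treated only as a first-pass filter.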
Carbamoyl phosphate synthase (CPSase) is a heterodimeric enzyme composed of a small and a large subunit (with the exception of CPSase III, see below). CPSase catalyses the synthesis of carbamoyl phosphate from bicarbonate, ATP and glutamine or ammonia, and represents the first committed step in pyrimidine and arginine biosynthesis in prokaryotes and eukaryotes, and in the urea cycle in most terrestrial vertebrates [
,
]. This entry represents the small subunit of the glutamine-dependent form of carbamoyl phosphate synthase. The small subunit catalyses the hydrolysis of glutamine to ammonia, which is in turn used by the large chain to synthesize carbamoyl phosphate. The C-terminal domain of the small subunit of CPSase has glutamine amidotransferase activity.
In animals CPSase small subunit is part of a fusion protein, CAD, which combines enzymatic activities of the pyrimidine pathway (glutamine-dependent carbamyl phosphate synthetase (GLN-CPSase), aspartate transcarbamylase (ATCase), and dihydroorotase (DHOase)) [
]. In fungi, the CAD-like protein Ura2 is a fusion protein with CPSase and ATCase activity, but without DHOase activity, which is provided by a separate protein [].
Dynein is a multisubunit microtubule-dependent motor enzyme that acts as the force generating protein of eukaryotic cilia and flagella. The cytoplasmic isoform of dynein acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules. Dynein is composed of a number of ATP-binding large subunits, intermediate size subunits and small subunits. Among the small subunits, there is a family of highly conserved proteins which make up this family [
,
,
]. Proteins in this family act as one of several non-catalytic accessory components of the cytoplasmic dynein 1 complex that are thought to be involved in linking dynein to cargos and to adapter proteins that regulate dynein function, and may play a role in changing or maintaining the spatial distribution of cytoskeletal structures. In yeast, it was identified as a component of the nuclear pore complex where it may contribute to the stable association of the Nup82 subcomplex with the nuclear pore complex []. Both type 1 (DLC1) and 2 (DLC2) dynein light chains have a similar two-layer α-β core structure consisting of beta-alpha(2)-beta-X-beta(2) [
,
].
Initiation factor 2 binds to Met-tRNA, GTP and the small ribosomal subunit. The eukaryotic translation initiation factor EIF-2B is a complex made up of five different subunits, alpha, beta, gamma, delta and epsilon, and catalyses the exchange of EIF-2-bound GDP for GTP. This family includes initiation factor 2B alpha, beta and delta subunits from eukaryotes, related proteins from archaebacteria and IF-2 from prokaryotes. It also contains a subfamily of proteins found in eukaryotes, archaea (e.g. Pyrococcus furiosus) and eubacteria such as Bacillus subtilis and Thermotoga maritima. Many of these proteins were initially annotated as putative translation initiation factors despite the fact that there is no evidence for the requirement of an IF2 recycling factor in prokaryotic translation initiation. Recently, one of these proteins from B. subtilis has been functionally characterised as a 5-methylthioribose-1-phosphate isomerase (MTNA) [
]. This enzyme participates in the methionine salvage pathway catalysing the isomerisation of 5-methylthioribose-1-phosphate to 5-methylthioribulose-1-phosphate []. The methionine salvage pathway leads to the synthesis of methionine from methylthioadenosine, the end product of the spermidine and spermine anabolism in many species.
MORF4 (mortality factor on chromosome 4), MRG15 (MORF4-related gene on chromosome 15) and MRGX (MORF4-related gene on chromosome X) are members of the MRG protein family that were first identified as transcription factors involved in cellular senescence. All expressed members of the MRG family are localized to the nucleus and have predicted motifs that indicate they function as chromatin remodeling complex components. MORF4, MRG15 and MRGX share a common C-terminal part but a different N-terminal part. The C-terminal similarity of all MRG family members (MORF4, MRG15 and MRGX homologues) defines a new conserved protein domain. The ~170 amino acid MRG domain binds a plethora of transcriptional regulators and chromatin-remodeling factors, including the histone deacetylase transcriptional corepressor mSin3A and the nuclear protein PAM14 (protein-associated MRG, 14kDa) [
,
]. The MRG domain consists of three conserved blocks. It is predominantly hydrophobic, and consists of mainly α-helices that are arranged in a three layer sandwich topology. The hydrophobic core is stabilised by interactions among a number of conserved hydrophobic residues. The molecular surface is largely hydrophobic, but contains a few hydrophilic patches [
,
,
].
Gelsolin is a cytoplasmic, calcium-regulated, actin-modulating protein that binds to the barbed ends of actin filaments, preventing monomer exchange (end-blocking or capping) [
]. It can promote nucleation (the assembly of monomers into filaments), as well as sever existing filaments. In addition, this protein binds with high affinity to fibronectin. Plasma gelsolin and cytoplasmic gelsolin are derived from a single gene by alternate initiation sites and differential splicing. Sequence comparisons indicate an evolutionary relationship between gelsolin, villin, fragmin and severin []. Six large repeating segments occur in gelsolin and villin, and 3 similar segments in severin and fragmin. While the multiple repeats have yet to be related to any known function of the actin-severing proteins, the superfamily appears to have evolved from an ancestral sequence of 120 to 130 amino acid residues []. This gelsolin-like domain can also be found in the C-terminal region of members of the Sec23/Sec24 family. They are components of the coat protein complex II (COPII), which promotes the formation of transport vesicles from the endoplasmic reticulum (ER).
LITAF (LPS-induced TNF-activating factor) (also known as SIMPLE; small integral membrane protein of the late endosome) is an endosome-associated integral membrane protein important for multivesicular body (MVB) sorting. It is a monotopic membrane protein with both termini exposed to the cytoplasm and is anchored to membranes via an in-plane helical membrane anchor, present within the highly conserved C-terminal region known as the 'LITAF domain' or 'SIMPLE-like domain'. The LITAF domain consists of conserved cysteines separated by a 22 residue hydrophobic region. LITAF domains are found throughout the eukaryotes, suggesting ancient conserved functions, with multiple instances of expansion, especially in the metazoa [,
]. The LITAF domain consists of five β-sheets, three N-terminal and two C-terminal to the predicted hydrophobic anchor region, and is stabilized by the coordination of a zinc atom by two pairs of evolutionarily conserved cysteine residues. Consistent with a protein domain that resides in close proximity to membranes, specific residues within the LITAF domain interact with phosphoethanolamine (PE) head groups. The anchoring region of the LITAF domain is likely to embed into the cytosolic-facing monolayer of the membrane bilayer by adopting an amphipathic character [].
The S1 domain of around 70 amino acids, originally identified in ribosomal protein S1, is found in a large number of RNA-associated proteins. It has been shown that S1 proteins bind RNA through their S1 domains with some degree of sequence specificity. This type of S1 domain is found in translation initiation factor 1. The solution structure of one S1 RNA-binding domain from Escherichia coli polynucleotide phosphorylase has been determined [
]. It displays some similarity with the cold shock domain (CSD). Both the S1 and the CSD domain consist of an antiparallel beta barrel of the same topology with 5 beta strands. This fold is also shared by many other proteins of unrelated function and is known as the OB fold. However, the S1 and CSD fold can be distinguished from the other OB folds by the presence of a short 3(10) helix at the end of strand 3. This unique feature is likely to form a part of the DNA/RNA-binding site. This entry is specific for bacterial, chloroplastic and eukaryotic IF-1 type S1 domains.
This entry represents the C-terminal region of several eukaryotic and archaeal RuvB-like 1 (Pontin or TIP49a) and RuvB-like 2 (Reptin or TIP49b) proteins. The N-terminal domain contains the AAA ATPase, central region domain. In zebrafish, the liebeskummer (lik) mutation causes development of hyperplastic embryonic hearts. lik encodes Reptin, a component of a DNA-stimulated ATPase complex. Beta-catenin and Pontin, a DNA-stimulated ATPase that is often part of complexes with Reptin, are in the same genetic pathways. The Reptin/Pontin ratio serves to regulate heart growth during development, at least in part via the beta-catenin pathway [
]. TBP-interacting protein 49 (TIP49) was originally identified as a TBP-binding protein, and two related proteins are encoded by individual genes, tip49a and b. Although the function of this gene family has not been elucidated, they are supposed to play a critical role in nuclear events because they interact with various kinds of nuclear factors and have DNA helicase activities. TIP49a has been suggested to act as an autoantigen in some patients with autoimmune diseases [].
The MCM2-7 complex consists of six closely related proteins that are highly conserved throughout the eukaryotic kingdom. In eukaryotes, Mcm7 is a component of the MCM2-7 complex (MCM complex), which consists of six sequence-related AAA+ type ATPases/helicases that form a hetero-hexameric ring [
]. MCM2-7 complex is part of the pre-replication complex (pre-RC). In G1 phase, inactive MCM2-7 complex is loaded onto origins of DNA replication [,
,
]. During G1-S phase, MCM2-7 complex is activated to unwind the double stranded DNA and plays an important role in DNA replication fork elongation []. The components of the MCM2-7 complex are: DNA replication licensing factor MCM2, DNA replication licensing factor MCM3, DNA replication licensing factor MCM4, DNA replication licensing factor MCM5, DNA replication licensing factor MCM6 and DNA replication licensing factor MCM7. The human MCM7 gene has been localised to chromosome 7q21.3-q22.1. Increased expression of Mcm7 RNA and protein in MYCN-amplified neuroblastoma tumours and cell lines has been reported [
]. Furthermore, the Mcm7 protein has been shown to form complexes with the retinoblastoma protein [
]. These findings suggest Mcm7-directed DNA replication contributes to neoplastic transformation.
Proteins that bind cyclic nucleotides (cAMP or cGMP) share a structural domain of about 120 residues [
,
,
]. The best studied of these proteins is the prokaryotic catabolite gene activator (also known as the cAMP receptor protein) (gene crp) where such a domain is known to be composed of three α-helices and a distinctive eight-stranded, antiparallel β-barrel structure. There are six invariant amino acids in this domain, three of which are glycine residues that are thought to be essential for maintenance of the structural integrity of the β-barrel. cAMP- and cGMP-dependent protein kinases (cAPK and cGPK) contain two tandem copies of the cyclic nucleotide-binding domain. The cAPKs are composed of two different subunits, a catalytic chain and a regulatory chain, which contains both copies of the domain. The cGPKs are single chain enzymes that include the two copies of the domain in their N-terminal section. Vertebrate cyclic nucleotide-gated ion-channels also contain this domain. Two such cation channels have been fully characterised; one is found in rod cells where it plays a role in visual signal transduction.
CBS domains are evolutionarily conserved structural domains found in a variety of functionally unrelated proteins from all kingdoms of life. These domains pair together to form an intramolecular dimeric structure (CBS pair), termed the Bateman domain [
,
,
,
]. CBS domains have been shown to bind mainly ligands with an adenosyl group such as AMP, ATP and S-AdoMet, but may also bind metal ions, or nucleic acids [,
]. Hence, they play an essential role in the regulation of the activities of numerous proteins, and mutations in them are associated with several hereditary diseases [,
,
]. CBS domains are found attached to a wide range of other protein domains, suggesting that CBS domains may play a regulatory role, making proteins sensitive to adenosyl-carrying ligands. The region containing the CBS domains in cystathionine-beta synthase is involved in regulation by S-AdoMet []. CBS domain pairs from AMPK bind AMP or ATP []. The CBS domains from IMPDH, which bind ATP, have been shown to have a role in the regulation of adenylate nucleotide synthesis [,
].
The C-terminal domain of trigger factor and the peptide-binding domain of porin chaperone SurA share a multi-helical structure consisting of an irregular array of long and short helices. In the Escherichia coli cytosol, a fraction of the newly synthesised proteins requires the activity of molecular chaperones for folding to the native state. The major chaperones implicated in this folding process are the ribosome-associated Trigger Factor (TF), and the DnaK and GroEL chaperones with their respective co-chaperones. Trigger Factor is an ATP-independent chaperone and displays chaperone and peptidyl-prolyl-cis-trans-isomerase (PPIase) activities
in vitro. It is composed of three domains, an N-terminal domain which mediates association with the large ribosomal subunit, a central PPIase domain with homology to FKBP proteins, and a C-terminal substrate-binding domain [
,
]. The association of its N-terminal domain with the ribosomal protein L23, located next to the peptide tunnel exit, is essential for the interaction with nascent polypeptides and its in vivo function [,
]. The porin chaperone SurA facilitates correct folding of outer membrane proteins in Gram-negative bacteria [
,
].
Syntaxins A and B are nervous system-specific proteins implicated in the docking of synaptic vesicles with the presynaptic plasma membrane. Syntaxins are a family of receptors for intracellular transport vesicles. Each target membrane may be identified by a specific member of the syntaxin family [
]. Members of the syntaxin family [,
] have a size ranging from 30 kDa to 40 kDa; a C-terminal extremity which is highly hydrophobic and anchors the protein on the cytoplasmic surface of cellular membranes; and a central, well conserved region, the SNARE motif, which seems to be in a coiled-coil conformation. SNARE motifs assemble into parallel four helix bundles stabilised by the burial of the hydrophobic helix faces in the bundle core []. Monomeric SNARE motifs are disordered, so this assembly reaction is accompanied by a dramatic increase in α-helical secondary structure. The parallel arrangement of SNARE motifs within complexes brings the transmembrane anchors, and the two membranes, into close proximity. Epimorphin is related to neuronal and yeast vesicle targeting proteins. This entry represents the coiled-coil region of these proteins.
ACF (for ATP-utilising chromatin assembly and remodeling factor) is a chromatin-remodeling complex that catalyzes the ATP-dependent assembly of periodic nucleosome arrays. This reaction utilises the energy of ATP hydrolysis by ISWI, the smaller of the two subunits of ACF. Acf1, the large subunit of ACF, is essential for the full activity of the complex. The WAC (WSTF/Acf1/cbp146) domain is an ~110-residue module present at the N-termini of Acf1-related proteins in a variety of organisms. It is found in association with other domains such as the bromodomain, the PHD-type zinc finger, DDT or WAKS. The DNA-binding region of Acf1 includes the WAC domain, which is necessary for the efficient binding of ACF complex to DNA. It seems probable that the WAC domain will be involved in DNA binding in other related factors [,
]. Some proteins known to contain a WAC domain are the Drosophila melanogaster (Fruit fly) ATP-dependent chromatin assembly factor large subunit Acf1, human WSTF (Williams syndrome transcription factor), mouse cbp146, yeast imitation switch two complex protein 1 (ITC1 or YGL133w), and yeast protein YPL216w.
The NodB homology domain is a catalytic domain of ~200 amino acid residues, which has been named after its similarity to rhizobial NodB chitooligosaccharide deacetylase. It is found in members of carbohydrate esterase family 4 (CE4) and in PuuE proteins. Members of the CE4 family exhibit metal-dependent deacetylation of O- and N-acetylated polysaccharides, such as chitin, peptidoglycan, and acetylxylan. Proteins belonging to this family have conserved residues that are important for metal coordination (D-H-H triad) and enzymatic activity. CE4 enzymes typically require a divalent Zn(2+) or Ni(2+) metal ion that is usually coordinated by an aspartate and two histidine residues [,
,
,
]. PuuE proteins are allantoinases that catalyze the hydrolytic cleavage of the hydantoin ring of allantoin. The conserved D-H-H metal-binding triad is replaced by E-H-W in PuuE proteins. Amino acid substitutions are also observed for residues that have been implicated in catalysis, conferring metal independence to the enzyme [
]. The NodB homology domain adopts a deformed (beta/alpha) barrel fold comprising eight parallel β-strands, with the C-terminal ends of five of these strands forming the solvent-exposed active site region, surrounded by eight α-helices [,
,
].
These proteins are members of the C4-Dicarboxylate Uptake (Dcu) family. Most proteins in this family are predicted by the GES method to have 12 transmembrane regions; however, the one member whose membrane topology has been experimentally determined has 10 transmembrane regions, with both the N- and C-termini localized to the periplasm [
]. The DcuA and DcuB proteins are involved in the transport of aspartate, malate, fumarate and succinate in many species [,
,
], and are thought to function as antiporters with any two of these substrates. Since DcuA is encoded in an operon with the gene for aspartase, and DcuB is encoded in an operon with the gene for fumarase, their physiological functions may be to catalyze aspartate:fumarate and fumarate:malate exchange during the anaerobic utilization of aspartate and fumarate, respectively []. The Escherichia coli DcuA and DcuB proteins have very different expression patterns []. DcuA is constitutively expressed; DcuB is strongly induced anaerobically by FNR and C4-dicarboxylates, while it is repressed by nitrate and subject to CRP-mediated catabolite repression.
Monellin is an intensely sweet-tasting protein derived from Dioscoreophyllum cumminsii (Serendipity berry). The protein has a very high specificity for the sweet receptors, making it ~100,000 times sweeter than sugar on a molar basis and several thousand times sweeter on a weight basis []. Like the sweet-tasting protein thaumatin, it contains neither carbohydrates nor modified amino acids. Although there is no sequence similarity between the proteins, antibodies for thaumatin compete for monellin (and other sweet compounds, but not for chemically modified non-sweet monellin) and vice versa. It is thought that native conformations are important for the sweet taste. Monellin is a heterodimer, comprising an A chain of 44 amino acid residues, and a B chain of 50 residues []. The individual subunits are not sweet, nor do they block the sweet sensation of sucrose or monellin. However, blocking the single SH of monellin abolishes its sweetness, as does reaction of its methionyl residue with CNBr. The cysteinyl and methionyl residues are adjacent and it has been suggested that this part of the molecule is essential for its sweetness [
].
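The molar and weight-basis sweetness figures quoted for monellin can be related by a short calculation. The sketch below is a rough consistency check rather than a measured value. It assumes that "sugar" means sucrose (about 342.3 g/mol) and estimates the mass of monellin from its 94 residues using a rule-of-thumb average of 110 Da per residue. Both numbers are assumptions not taken from this entry, and the function name is hypothetical.

```python
# Assumption: the reference sugar is sucrose, molar mass ~342.3 g/mol.
SUCROSE_MW = 342.3
# Assumption: rule-of-thumb average residue mass of ~110 Da.
AVG_RESIDUE_MASS = 110.0

N_RESIDUES = 44 + 50              # A chain + B chain, from the text
MOLAR_SWEETNESS_RATIO = 100_000   # ~100,000x on a molar basis, from the text


def weight_basis_ratio(molar_ratio, mw_protein, mw_reference):
    """Convert a per-mole potency ratio into a per-gram potency ratio.

    One gram of a compound contains 1/MW moles, so the per-gram ratio is
    the per-mole ratio scaled by mw_reference / mw_protein.
    """
    return molar_ratio * mw_reference / mw_protein


monellin_mw = N_RESIDUES * AVG_RESIDUE_MASS
ratio = weight_basis_ratio(MOLAR_SWEETNESS_RATIO, monellin_mw, SUCROSE_MW)
print(f"estimated monellin mass: {monellin_mw:.0f} Da")
print(f"estimated weight-basis sweetness ratio: {ratio:.0f}")
```

Under these assumptions the estimate comes out at a few thousand, in line with the "several thousand times sweeter on a weight basis" stated above.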
The centromere is the chromosomal site that joins to microtubules during mitosis for proper segregation. Centromere protein A (CENP-A) is a histone H3 variant and an essential component of centromeres. Mis18 proteins are involved in the priming of centromeres for recruitment of CENP-A. They possess two structurally distinct domains: an N-terminal globular domain mainly comprised of β-strands and a C-terminal α-helical domain. The oligomerization of Mis18, mediated by its conserved N-terminal globular domain, is crucial for its centromere localization and function [
,
]. The Mis18 domain is mainly comprised of β-strands and has two conserved C-x-x-C motifs, which are signature motifs present in metal ion-binding proteins. The overall fold of the Mis18 domain is formed by antiparallel β-sheets: a three stranded (β1-β2-β9: β-sheet I) and a six stranded (β3-β4-β8-β7-β6-β5: β-sheet II) sheet, arranged approximately perpendicular to each other. The two β-sheets are held together by a Zn(2+) ion coordinated via the C-x-x-C motifs from loops L1 and L5. The Mis18 domain contains a cradle-shaped pocket that is implicated in protein/nucleic acid binding, which is required for Mis18 function [,
].
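The C-x-x-C motifs mentioned above can be located in a sequence with a simple regular expression. This is a minimal sketch: the function name and the toy sequence are hypothetical and not a real Mis18 sequence, and a C-x-x-C match alone is not evidence of zinc binding.

```python
import re

# C-x-x-C: cysteine, any two residues, cysteine. The lookahead lets
# overlapping motifs (e.g. CxxCxxC) each be reported.
CXXC = re.compile(r"(?=(C..C))")


def find_cxxc(seq):
    """Return 0-based start positions of every C-x-x-C motif in seq."""
    return [m.start() for m in CXXC.finditer(seq.upper())]


# Hypothetical toy sequence with two motifs separated by a spacer.
toy = "MAVCPKCGHT" + "A" * 20 + "LRCEGCNAV"
positions = find_cxxc(toy)
print(positions)
```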
Murine neutrophil gelatinase-associated lipocalin precursor (NGAL) exhibits a 7-10-fold increase in expression in cultured mouse kidney cells infected by Simian virus 40 (SV40) or other viruses [
]. NGAL has been identified as a major secretory product of lipopolysaccharide-stimulated cultured mouse macrophages, suggesting that the protein might function in defence against infection [
]. Recently, NGAL has been shown to be identical to SIP24, a previously identified secretory product of quiescent mouse fibroblasts induced by serum, dexamethasone, basic fibroblast growth factor, and phorbol ester [
]. Mouse plasma levels of NGAL rise as a result of increased expression levels in the liver, in response to intramuscular turpentine injection. Tumour necrosis factor can regulate NGAL expression in cultured liver cells. These findings indicate that NGAL is a positive acute phase protein and may possess immunosuppressive or anti-inflammatory properties, possibly linked to its regulation of neutrophil gelatinase or other plasma proteins [
]. The uterus is also a major site of NGAL synthesis, especially at parturition, when expression increases significantly, suggesting a physiological role for the protein in uterine secretions [].
Cytochromes c (cytC) can be defined as electron-transfer proteins having one or several haem c groups, bound to the protein by one or, more generally, two thioether bonds involving sulphydryl groups of cysteine residues. The fifth haem iron ligand is always provided by a histidine residue. CytC possess a wide range of properties and function in a large number of different redox processes. Ambler [
] recognised four classes of cytC. Class I includes the low-spin soluble cytC of mitochondria and bacteria, with the haem-attachment site towards the N terminus, and the sixth ligand provided by a methionine residue about 40 residues further on towards the C terminus. On the basis of sequence similarity, class I cytC were further subdivided into five subclasses, IA to IE. Class IE includes such bacterial proteins as cyt c5, cyt c-555 and Ectothiorhodospira cyt c-551. The 3D structure of cyt c5 from Azotobacter vinelandii has been determined []. The protein consists of 5 α-helices; three 'core' helices form a 'basket' around the haem group, with one haem edge exposed to the solvent.
This entry represents the ePHD finger of A. thaliana histone-lysine N-methyltransferase Arabidopsis trithorax-like proteins ATX1, -2, and similar proteins. The extended plant homeodomain (ePHD) zinc finger is characterized as Cys2HisCys5HisCys2His. ATX1 and -2 are sister paralogs originating from a segmental chromosomal duplication; they are plant counterparts of the Drosophila melanogaster trithorax (TRX) and mammalian mixed-lineage leukemia (MLL1) proteins [
]. ATX1 is a methyltransferase that trimethylates histone H3 at lysine 4 (H3K4me3). It also acts as a histone modifier and as a positive effector of gene expression []. ATX1 regulates transcription from diverse classes of genes implicated in biotic and abiotic stress responses. It is involved in dehydration stress signaling in both abscisic acid (ABA)-dependent and ABA-independent pathways []. ATX2 is involved in dimethylating histone H3 at lysine 4 (H3K4me2) []. ATX1 and ATX2 are multi-domain proteins that consist of an N-terminal PWWP domain, FYRN- and FYRC (DAST, domain associated with SET in trithorax) domains, a canonical PHD finger, a non-canonical ePHD finger, and a C-terminal SET domain [].