LIM domain only protein 7 (LMO7) contains a LIM zinc-binding domain and a PDZ domain. LMO7 is involved in the regulation of cell adhesion and signaling. In humans it has several isoforms; isoforms 2 and 4 are predominantly expressed in brain. It may have roles in embryonic development and cancer progression.
Phosphatidylinositol-glycan biosynthesis class X protein
Type:
Family
Description:
PIG-X is an essential component of glycosylphosphatidylinositol-mannosyltransferase I. It probably acts by stabilizing the mannosyltransferase PIG-M. It has also been shown to interact with reticulocalbin 1 (RCN1) and reticulocalbin 2 (RCN2), and may negatively regulate the expression of these two genes.
Proteins in this entry, which are synthesised by Saccharomycetes, adopt a structure consisting of a four-stranded β-sheet, with strand order β2-β1-β4-β3, and two α-helices, with an overall topology of β-β-α-β-β-α. They have no known function.
This entry represents a group of hypothetical proteins found in various bacteria and archaea. The structure of MTH1675 from Methanobacterium thermoautotrophicum has a three-layer alpha/beta/alpha structure, similar to that found in the C-terminal domain of pyruvate kinase.
This entry includes acetylaranotin biosynthesis cluster protein L (AtaL) from Aspergillus terreus, which is a non-ribosomal peptide synthetase. It is required for the biosynthesis of the toxin acetylaranotin, which is a disulfide-bridged cyclic dipeptide. There are a number of steps in the biosynthesis of this epipolythiodioxopiperazine toxin from two phenylalanines, but the specific function of AtaL has yet to be determined. Homologues are also known from bacteria.
The phosphorylated adaptor for RNA export (PHAX) protein is required for nuclear export of snRNAs in metazoans and is also involved in the intranuclear transport of snoRNAs to Cajal bodies.
The N-terminal propeptide of surfactant protein C adopts an α-helical structure, with turn and extended regions. Its main function is the stabilisation of metastable surfactant protein C (SP-C), since the latter can irreversibly transform from its native α-helical structure to β-sheet aggregates and form amyloid-like fibrils. The correct intracellular trafficking of proSP-C has also been reported to depend on the propeptide.
CHERP is an integral endoplasmic reticulum (ER) membrane protein involved in calcium homeostasis. It is also a Ca(2+)-dependent ALG-2-interactive target in the nucleus that participates in regulation of alternative splicing of inositol trisphosphate receptor type 1 (IP3R1) pre-mRNA.
Ric1 is part of the RIC1-RGP1 complex that acts as a guanine nucleotide exchange factor (GEF) for Rab6 GTPase, which is important for normal Golgi structure and function.
This entry represents a group of fatty acid-binding protein homologues from roundworms. They may play a role in sequestering potentially toxic fatty acids and their peroxidation products, or may be involved in the maintenance of the impermeable lipid layer of the eggshell.
IRG proteins (immunity-related GTPases) contain a Ras-like G domain and were first described as genes strongly induced by infection. Despite its phylogenetic relationship to the IRG proteins, IRGQ (FKSG27) is not a GTPase. Its function is not known.
Phosphonate ABC transporter, substrate-binding protein
Type:
Family
Description:
This entry represents a subset of the broader family of phosphate/phosphonate binding protein ABC transporter components. Based on their genomic context, many of the proteins in this entry can be associated with pathways for the metabolism of phosphonates, particularly the C-P lyase system; this includes the characterised PhnD protein from Escherichia coli. Note that not all of these proteins are encoded within a phosphonate metabolism context, but they can be assumed to bind phosphonate compounds even in the absence of such context. Bacterial high affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria the solute-binding proteins are membrane-anchored lipoproteins.
The two phosphocholine-preferring phospholipase C enzymes in Pseudomonas aeruginosa include the hemolytic protein PlcH, which can hydrolyse sphingomyelin as well as phosphatidylcholine. This entry represents PlcR, an accessory protein for PlcH with which it forms a heterodimer. PlcR is required for the secretion of PlcH and affects both its enzymatic and hemolytic activities. The P. aeruginosa protein is encoded immediately downstream of PlcH, but the equivalent proteins from various Burkholderia species are not.
Cbl (Casitas B-lineage lymphoma) is an adaptor protein that functions as a negative regulator of many signalling pathways that start from receptors at the cell surface. The N-terminal region of Cbl contains a Cbl-type phosphotyrosine-binding (Cbl-PTB) domain, which is composed of three evolutionarily conserved domains: an N-terminal four-helix bundle (4H) domain, an EF hand-like calcium-binding domain, and a divergent SH2-like domain. The calcium-bound EF-hand wedges between the 4H and SH2 domains, and roughly determines their relative orientation. The Cbl-PTB domain has also been named Cbl N-terminal (Cbl-N) or tyrosine kinase binding (TKB) domain. The N-terminal 4H domain contains four long α-helices. The C and D helices in this domain pack against the adjacent EF-hand-like domain, and a highly conserved loop connecting the A and B helices contacts the SH2-like domain. The EF-hand motif is similar to classical EF-hand proteins. The SH2-like domain retains the general helix-sheet-helix architecture of the SH2 fold, but lacks the secondary β-sheet, comprising β-strands D', E and F, and also a prominent BG loop. This entry represents the N-terminal four-helical bundle domain.
Salmonella invasion protein A (SipA) is a virulence factor that is translocated into host cells by a type III secretion system. In the host cell it binds to actin, stimulates actin polymerisation and counteracts F-actin destabilising proteins. This contributes towards cytoskeletal rearrangements that allow the entry of the pathogen into the host cell. This entry represents the N-terminal domain of SipA.
Calcineurin-binding protein cabin-1, MEF2-binding domain
Type:
Domain
Description:
The myocyte enhancer factor-2 (MEF2) binding domain, predominantly found in the calcineurin-binding protein CABIN 1, adopts an amphipathic α-helical structure, which allows it to bind a hydrophobic groove on the MEF2S domain, forming a triple-helical interaction. Interaction of this domain with MEF2 causes repression of transcription.
In zebrafish, CSAP (centriole, cilia and spindle-associated protein) plays a role in embryonic development, including proper left-right asymmetry formation. In human tissue culture cells it colocalises with polyglutamylated tubulin to centrioles, spindle microtubules, and cilia.
Tuberin/Ral GTPase-activating protein subunit alpha
Type:
Family
Description:
The activity of GTPases is regulated by the opposing effects of guanine nucleotide exchange factors (GEFs) and GTPase-activating proteins (GAPs). Tuberin (tuberous sclerosis 2 protein or Tsc2) is believed to be a tumor suppressor and is able to stimulate specific GTPases. It stimulates the intrinsic GTPase activity of the Ras-related proteins Rap1A and Rab5. In complex with Tsc1, it inhibits the nutrient-mediated or growth factor-stimulated phosphorylation of S6K1 and EIF4EBP1 by negatively regulating mTORC1 signaling. It acts as a GTPase-activating protein (GAP) for the small GTPase RheB, a direct activator of the protein kinase activity of mTORC1. Ral GTPase-activating protein subunit alpha is the catalytic subunit of the heterodimeric RalGAP complex, which acts as a GTPase activator for the Ras-like small GTPases RalA and RalB. RalGAP complexes share structural and catalytic similarities with the tuberous sclerosis tumor suppressor complex.
Centromere protein B (CENP-B) interacts with centromeric heterochromatin in chromosomes and binds to a specific subset of alphoid satellite DNA, called the CENP-B box. CENP-B may organise arrays of centromere satellite DNA into a higher order structure, which then directs centromere formation and kinetochore assembly in mammalian chromosomes. The CENP-B dimerisation domain is composed of two α-helices, which are folded into an antiparallel configuration. Dimerisation of CENP-B is mediated by this domain, in which monomers dimerise to form a symmetrical, antiparallel, four-helix bundle structure with a large hydrophobic patch in which 23 residues of one monomer form van der Waals contacts with the other monomer. This CENP-B dimer configuration may be suitable for capturing two distant CENP-B boxes during centromeric heterochromatin formation. This C-terminal domain contains the dimerisation domain, which has been delimited to the C-terminal 59 amino acid residues of CENP-B.
The HutP protein regulates the expression of Bacillus 'hut' structural genes by an anti-termination complex, which recognises three UAG triplet units, separated by four non-conserved nucleotides, on the RNA terminator region. L-Histidine and Mg2+ ions are also required. These proteins exhibit the structural elements of α/β proteins, arranged in the order α-α-β-α-α-β-β-β in the primary structure; the four antiparallel β-strands form a β-sheet in the order β1-β2-β3-β4, with two α-helices each on the front (α1 and α2) and at the back (α3 and α4) of the β-sheet.
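The recognition pattern described for HutP (three UAG triplets separated by four non-conserved nucleotides) can be illustrated with a simple sequence scan. This is only a minimal sketch: the function name `find_hutp_like_sites` and the example sequence are hypothetical, and a plain pattern match says nothing about actual binding, which also depends on L-histidine, Mg2+ and RNA structure.

```python
import re

# Assumed reading of the text: UAG, any 4 nucleotides, UAG, any 4 nucleotides, UAG.
# The lookahead lets overlapping candidate sites be reported as well.
HUTP_LIKE_SITE = re.compile(r"(?=(UAG[ACGU]{4}UAG[ACGU]{4}UAG))")


def find_hutp_like_sites(rna):
    """Return 0-based start positions of UAG-N4-UAG-N4-UAG patterns.

    DNA input is accepted and converted to RNA (T -> U) before scanning.
    """
    rna = rna.upper().replace("T", "U")
    return [match.start() for match in HUTP_LIKE_SITE.finditer(rna)]


if __name__ == "__main__":
    # Made-up example sequence, not taken from any real terminator region.
    example = "CCUAGAAAAUAGCCCCUAGGG"
    print(find_hutp_like_sites(example))
```

A scan like this is useful only for shortlisting candidate sites; the spacer length of four is taken directly from the description above.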
Protein L is a bacterial protein with immunoglobulin (Ig) light chain-binding properties. It contains a number of homologous b1 repeats towards the N terminus. These repeats have been found to be responsible for the interaction of protein L with Ig light chains.
Choline ABC transporter, substrate-binding protein
Type:
Family
Description:
Partial phylogenetic profiling suggests that the ABC transporter periplasmic binding proteins in this entry are involved in choline uptake for glycine betaine biosynthesis. Genomes often carry several paralogs, one encoded together with the permease and ATP-binding components, and another encoded next to a choline-sulphatase gene, suggesting that different members of this protein family interact with shared components and give some flexibility in substrate. Of the two proteins from Sinorhizobium meliloti 1021, one designated ChoX has been shown experimentally to bind choline (though not various related compounds such as betaine) and to be required for about 60% of choline uptake. In Agrobacterium tumefaciens, the ABC-type choline transporter is encoded by the chromosomally located choXWV operon (ChoX, binding protein; ChoW, permease; and ChoV, ATPase). The ChoXWV transporter functions as a high-affinity choline transporter.
Partial phylogenetic profiling suggests that the ABC transporter permease proteins in this entry are involved in choline uptake for glycine betaine biosynthesis.
Talin rod domain-containing protein 1 (TLNRD1), also known as mesoderm development candidate 1, is an actin-binding protein that may have an oncogenic function and regulates cell proliferation, migration and invasion in cancer cells.
EF-hand calcium-binding domain-containing protein 9
Type:
Family
Description:
EF-hand calcium-binding domain-containing protein 9 (EFCAB9) is a pH-dependent Ca2+ sensor modulating the channel activity and the domain organization of CatSper, a sperm-specific, pH-sensitive calcium channel essential for hyperactivated motility and male fertility.
Glutaredoxin domain-containing cysteine-rich protein 1
Type:
Family
Description:
Glutaredoxin domain-containing cysteine-rich protein 1 (GRXCR1) contains a region of similarity to glutaredoxin proteins and a cysteine-rich region at its C terminus. It may be involved in regulation of actin filament architecture in hair cell stereocilia. Mutations of the GRXCR1 gene cause autosomal recessive deafness 25 (DFNB25). This entry also includes the Drosophila orthologues CG31559 and CG12206, which are predicted to enable protein-disulfide reductase activity. CG31559 is possibly expressed in the embryonic/larval system, and CG12206 is expressed in embryonic/larval posterior spiracle, epithelium, and larval foregut atrium.
CDK5 regulatory subunit-associated protein 2 (CDK5RAP2) is a centrosomal protein that regulates centrosomal maturation by recruitment of a gamma-tubulin ring complex (gamma-TuRC) onto centrosomes. It interacts directly with EB1, a prototypic member of microtubule plus-end tracking proteins. The CDK5RAP2-EB1 complex regulates microtubule dynamics and stability. CDK5RAP2 also interacts with Cep169, a microtubule plus-end-tracking centrosomal protein; this interaction regulates stability of microtubules. Mutations in the CDK5RAP2 gene cause primary autosomal recessive microcephaly 3 (MCPH3).
ML domain, phosphatidylinositol/phosphatidylglycerol transfer protein
Type:
Domain
Description:
The MD-2-related lipid-recognition (ML) domain is implicated in lipid recognition, particularly in the recognition of pathogen-related products. It has an immunoglobulin-like β-sandwich fold similar to that of E-set Ig domains. This domain is present in proteins from plants, animals and fungi, including the following proteins:
Epididymal secretory protein E1 (also known as Niemann-Pick C2 protein, Npc2), which is known to bind cholesterol. Niemann-Pick disease type C2 is a fatal hereditary disease characterised by accumulation of low-density lipoprotein-derived cholesterol in lysosomes.
House-dust mite allergen proteins such as Der f 2 from Dermatophagoides farinae and Der p 2 from Dermatophagoides pteronyssinus.
This entry refers to the ML domain found in phosphatidylinositol/phosphatidylglycerol transfer protein (PG/PI-TP). PG/PI-TP has been shown to bind phosphatidylglycerol and phosphatidylinositol, but the biological significance of this is still obscure.
Protein Esc1, basic helix-loop-helix-zipper domain
Type:
Domain
Description:
Esc1 from fission yeasts is involved in the sexual differentiation process. This entry represents the basic helix-loop-helix-zipper (bHLHzip) domain of Esc1.
Members of this family consist of a β-sheet region followed by an α-helix and an unstructured C terminus. The β-sheet region contains a CXCX...XCXC sequence with Cys residues located in two proximal loops and pointing towards each other. The precise function of this set of bacterial proteins is, as yet, unknown.
This entry represents CXXC-type zinc finger proteins CXXC4 and CXXC5. CXXC4 is a potential tumour suppressor directly regulated by EZH2. It has been shown to inactivate mitogen-activated protein kinase signaling. CXXC5 is a retinoid-inducible nuclear factor (RINF) involved in normal myelopoiesis. It has been associated with various malignant tumours.
The SIAMESE-RELATED (SMR) family of cyclin-dependent kinase (CDK) inhibitors is a plant-specific family that consists of SIAMESE (SIM) and related proteins. SIM controls endoreplication during trichome development. SMR2 acts to restrict cell proliferation during leaf growth in Arabidopsis, and SIM, SMR1/LGO, and SMR2 play overlapping roles in controlling the transition from cell division to endoreplication.
PrdD is encoded in the proline reductase gene cluster. It is closely homologous to PrdA, which cleaves during maturation to create two subunits of the proline reductase complex, one of which has a Cys-derived pyruvoyl active site.
The anti-apoptotic protein p35 from baculovirus is thought to prevent the suicidal response of infected insect cells by inhibiting caspases. Ectopic expression of p35 in a number of transgenic animals or cell lines is also anti-apoptotic, giving rise to the hypothesis that the protein is a general inhibitor of caspases. This protein belongs to MEROPS proteinase inhibitor family I50, clan IQ. Purified recombinant p35 inhibits human caspase-1, -3, -6, -7, -8, and -10 but does not significantly inhibit unrelated serine or cysteine proteases, implying that p35 is a potent caspase-specific inhibitor. The interaction of p35 with caspase-3, as a model of the inhibitory mechanism, revealed classic slow-binding inhibition, with both active sites of the caspase-3 dimer acting equally and independently. Inhibition resulted from complex formation between the enzyme and inhibitor, which could be visualised under non-denaturing conditions, but was dissociated by SDS to give p35 cleaved at Asp87, the P1 residue of the inhibitor. Complex formation requires the substrate-binding cleft to be unoccupied. Infecting the insect cell line IPLB-Ld652Y with the baculovirus Autographa californica nuclear polyhedrosis virus (AcMNPV) results in global translation arrest, which correlates with the presence of the AcMNPV apoptotic suppressor, p35. However, the anti-apoptotic function of p35 in translation arrest is not solely due to caspase inactivation; rather, its activity enhances signalling to a separate translation arrest pathway, possibly by stimulating the late stages of the baculovirus infection cycle.
Members of this family occur in gene pairs with members of
. The N-terminal region contains several predicted transmembrane helix regions while the few invariant residues (G, CxxD, and W) occur in the C-terminal region.
Members of this family are found in a set of prokaryotic hypothetical proteins. Their exact function has not, as yet, been defined.
Early region 3 (E3) of human adenoviruses (Ads) codes for proteins that appear to control viral interactions with the host. This region, called CR1 (conserved region 1), is found three times in the 49 kDa protein encoded in the E3 region of Human adenovirus 19 (a subgroup D adenovirus). CR1 is also found in the 20.1 kDa protein of subgroup B adenoviruses. The function of this 80 amino acid region is unknown. This region is probably a divergent immunoglobulin domain.
Early region 3 (E3) of human adenoviruses (Ads) codes for proteins that appear to control viral interactions with the host. This region, called CR1 (conserved region 1), is found three times in the 49 kDa protein encoded in the E3 region of Human adenovirus 19 (a subgroup D virus). CR1 is also found in the 20.1 kDa protein of subgroup B adenoviruses. The function of this 80 amino acid region is unknown. This region is probably a divergent immunoglobulin domain.
The proteins in this insect family are each about 100 amino acids long and have six conserved cysteine residues. They all have a predicted signal peptide and are probably secreted. The function of the proteins is unknown.
The flagellar hook-associated protein 2 (HAP2 or FliD) forms the distal end of the flagellum, and plays a role in mucin-specific adhesion of the bacteria. This entry represents the N-terminal region of this family of proteins.
This is a family of outer surface proteins (Osp) from Borrelia spp. spirochetes. The family includes OspE, OspF, and OspEF-related proteins (Erp). These proteins are coded for on different circular plasmids in the Borrelia genome. Borrelia burgdorferi spirochetes, which cause Lyme borreliosis, survive for a long time in human serum due to their ability to successfully evade the complement system, an important arm of innate immunity. The outer surface protein E (OspE) of B. burgdorferi is required for this, since it recruits complement regulator factor H (FH) onto the bacterial surface to evade complement-mediated cell lysis.
Cilia- and flagella-associated protein 20/CFAP20DC
Type:
Family
Description:
This entry includes cilia- and flagella-associated protein 20 (CFA20) and CFAP20DC, also known as C3orf67. CFA20 is a cilium- and flagellum-specific protein that plays a role in axonemal structure organisation and motility. In Chlamydomonas reinhardtii, it stabilises outer doublet microtubules (DMTs) of the axoneme and may work as a scaffold for intratubular proteins, such as tektin and PACRG, to produce the beak structures in DMT1.
This superfamily represents a β-sandwich domain found at the C terminus of Z ring-associated protein D (ZapD) and in other, currently uncharacterised proteins. ZapD enhances FtsZ-ring assembly.
Selenocysteine tRNA 1 associated protein (also known as TRNAU1AP) is involved in steps ranging from the early stages of selenocysteine biosynthesis and tRNA(Sec) charging to the later steps resulting in the cotranslational incorporation of selenocysteine into selenoproteins. Selenium deficiency results in a variety of diseases, including cardiac disease. TRNAU1AP contains RNA recognition motifs (RRM) and a Tyr-rich region at the C terminus. The Tyr-rich region (amino acids 185-225) is conserved among several mammals, including human, chimp, dog, cattle, mouse and rat. Furthermore, constitutive deletion of exons corresponding to the Tyr-rich region in mouse resulted in embryonic lethality. This entry also includes uncharacterised protein C6orf52 from humans.
Factor VIII intron 22 protein, also known as CpG island protein, is contained within the largest factor VIII intron. Unlike factor VIII, it is produced abundantly in a wide variety of cell types.
This entry includes a group of leucine rich adaptor proteins, including LURAP1 (LRAP35A), LURAP1L and LRAP25. LURAP1 activates the canonical NF-kappa-B pathway, promotes proinflammatory cytokine production and promotes the antigen presenting and priming functions of dendritic cells. LRAP25, also known as C184M or MMTV receptor, is a Ski (a transcriptional co-repressor) binding protein that negatively regulates transforming growth factor-beta signaling by sequestering the Smad proteins in the cytoplasm. LRAP25 also serves as an adaptor protein of MRCK (myotonic dystrophy kinase-related Cdc42-binding kinase).
This family of proteins activates the canonical NF-kappa-B pathway, promotes proinflammatory cytokine production and promotes the antigen presenting and priming functions of dendritic cells.
MIG-18 functions together with MIG-17 in the regulation of directional migration of gonadal distal tip cells (DTCs) during gonadogenesis in Caenorhabditis elegans. This entry also includes other uncharacterized proteins from C. elegans.
Motile sperm domain-containing proteins 1 and 3 (MOSPD1/3) contain a motile sperm protein domain and two transmembrane domains. MOSPD1 is involved in mesenchymal versus epidermal cell differentiation. It is also involved in mesenchymal stem cell proliferation and differentiation.
This entry includes proteins RESPONSE TO LOW SULFUR 1-4 (LSU1-4) from Arabidopsis. These proteins may be involved in plant responses to environmental challenges; they are up-regulated in several abiotic and biotic stress conditions. Three out of four of the Arabidopsis LSU genes (namely LSU1, LSU2, and LSU3) are induced by sulfur deficiency. LSU expression prevents chloroplastic reactive oxygen species (ROS) production and proper stomatal closure during sulfur stress. LSU1 has been shown to interact with and stimulate the activity of the chloroplastic superoxide dismutase FSD2. Homologues are known only from plants.
This entry includes SAWADEE HOMEODOMAIN HOMOLOGS 1 and 2 (SHH1 and SHH2) from plants. In Arabidopsis, SHH1 is a homeodomain protein required for DNA methylation and for the accumulation of siRNAs at specific loci. It interacts with Pol-IV and acts upstream in the RdDM (RNA-directed DNA methylation) pathway. Its SAWADEE domain is a chromatin-binding module that functions as a dual lysine reader, probing for both unmethylated K4 and methylated K9 modifications on the histone 3 (H3) tail. The function of SHH2 is not clear.
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyse ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarise the attacking water molecule for hydrolysis; the signature conserved motif (LSGGQ) specific to the ABC transporter; and the Q-loop (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region form the nucleotide binding site. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly β-strand) contains Walker A and Walker B. The important residues for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the α-helical subdomain with the signature motif. It only seems to be required for structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, making contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release. In the dimer the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel β-sheet of armI by a two-fold axis. The ATP-Binding Cassette (ABC) superfamily forms one of the largest of all protein families with a diversity of physiological functions. Several studies have shown that there is a correlation between the functional characterisation and the phylogenetic classification of the ABC cassette. More than 50 subfamilies have been described based on a phylogenetic and functional classification. This entry represents the permease protein (thiP) of the thiamine ABC transporter (thiBPQ), required for the transport of thiamine and thiamine pyrophosphate (TPP) into the cell. It has been experimentally demonstrated that mutants in the various steps in the de novo synthesis of thiamine and its biologically active form, namely thiamine pyrophosphate, can be exogenously supplemented with thiamine, thiamine monophosphate (TMP) or thiamine pyrophosphate (TPP). Thiamine pyrophosphate (TPP) is a required cofactor synthesized de novo in Salmonella typhimurium. The primary role for TPP is in central metabolism as an electron carrier and nucleophile for such enzymes as pyruvate dehydrogenase, acetolactate synthase, and α-ketoglutarate dehydrogenase. Despite its importance in cellular physiology, neither the de novo biosynthetic pathway nor the salvage systems for thiamine are fully understood in any organism.
The enzyme responsible for nitrogen fixation, the nitrogenase, shows a high degree of conservation of structure, function, and amino acid sequence across wide phylogenetic ranges. All known Mo-nitrogenases consist of two components: component I (also called dinitrogenase, or Fe-Mo protein), an alpha2beta2 tetramer encoded by the nifD and nifK genes, and component II (dinitrogenase reductase, or Fe protein), a homodimer encoded by the nifH gene, which has an Fe4S4 cluster bound between the subunits and two ATP-binding domains. The Fe protein supplies energy by ATP hydrolysis, and transfers electrons from reduced ferredoxin or flavodoxin to component I for the reduction of molecular nitrogen to ammonia. Nitrogenase contains two unusual rare metal clusters; one of them is the iron molybdenum cofactor (FeMo-co), which is considered to be the site of dinitrogen reduction and whose biosynthesis requires the products of the nifNE operon and of some other nif genes. It has been proposed that NifNE might serve as a scaffold upon which FeMo-co is built and then inserted into component I. This entry represents the molybdenum-iron protein beta chain, encoded by nifK.
Nitrogenase molybdenum-iron cofactor biosynthesis protein
Type:
Family
Description:
The enzyme responsible for nitrogen fixation, the nitrogenase, shows a high degree of conservation of structure, function, and amino acid sequence across wide phylogenetic ranges. All known Mo-nitrogenases consist of two components: component I (also called dinitrogenase, or Fe-Mo protein), an alpha2beta2 tetramer encoded by the nifD and nifK genes, and component II (dinitrogenase reductase, or Fe protein), a homodimer encoded by the nifH gene, which has an Fe4S4 cluster bound between the subunits and two ATP-binding domains. The Fe protein supplies energy by ATP hydrolysis, and transfers electrons from reduced ferredoxin or flavodoxin to component I for the reduction of molecular nitrogen to ammonia. Nitrogenase contains two unusual rare metal clusters; one of them is the iron molybdenum cofactor (FeMo-co), which is considered to be the site of dinitrogen reduction and whose biosynthesis requires the products of the nifNE operon and of some other nif genes. It has been proposed that NifNE might serve as a scaffold upon which FeMo-co is built and then inserted into component I. This entry refers to the nitrogenase iron-molybdenum cofactor biosynthesis protein NifN, which forms an alpha2beta2 tetramer with NifE. NifN and NifE are structurally homologous to nitrogenase MoFe protein beta and alpha subunits respectively. NifB-co (an iron- and sulfur-containing precursor of FeMo-co) from NifB is transferred to the NifEN complex, where it is further processed to FeMo-co. The NifEN-bound precursor of FeMo-co has been identified as a molybdenum-free, iron- and sulfur-containing analog of FeMo-co. It has been suggested that this NifEN-bound precursor also acts as a cofactor precursor in nitrogenase systems which require a cofactor other than FeMo-co, i.e. iron-vanadium cofactor (FeVco) or iron-only cofactor (FeFeco).
Members of this protein family are found in O-antigen biosynthesis loci in Leptospira, as two tandem homologues in a polysaccharide biosynthesis region in the archaeon Methanoregula formicicum, in Rhizobium leguminosarum bv. trifolii WSM2297, etc. Members are more strongly conserved in the C-terminal region, where an invariant sequence PFTS is found.
Members of this protein family are related to the radical SAM enzymes HemN (oxygen-independent coproporphyrinogen III oxidase) and HutW (a putative heme utilisation enzyme) but lack the signature CxxxCxxC motif for 4Fe-4S binding. Members occur exclusively in Borrelia, which appears to live without iron, and are the only radical SAM enzyme homologue in any Borrelia genome. This enzyme has been designated PsgB (Pseudo-SAM, Genus Borrelia).
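The absence of the CxxxCxxC motif described above can be checked on a sequence with a simple pattern match. The following is a minimal sketch, assuming the motif is read literally as three cysteines separated by exactly three and two residues; the function name and the toy sequences are hypothetical and are not real HemN, HutW, or PsgB sequences.

```python
import re

# C-x(3)-C-x(2)-C: three cysteines separated by exactly three and two residues.
CX3CX2C = re.compile(r"C.{3}C.{2}C")


def has_cx3cx2c(sequence):
    """Return True if the protein sequence contains a CxxxCxxC motif."""
    return CX3CX2C.search(sequence.upper()) is not None


# Hypothetical toy sequences for illustration only.
with_motif = "MKTLACNYRCAFCSLG"     # contains C-NYR-C-AF-C
without_motif = "MKTLASNYRSAFSSLG"  # cysteines replaced by serine

print(has_cx3cx2c(with_motif))
print(has_cx3cx2c(without_motif))
```

A match only indicates that the motif is present in the sequence; it says nothing about whether a 4Fe-4S cluster is actually bound.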
NifB is a protein required for the biosynthesis of the iron-molybdenum (or iron-vanadium) cofactor used by the nitrogen-fixing enzyme nitrogenase. NifB belongs to the radical SAM family, and the FeMo cluster biosynthesis process requires S-adenosylmethionine.
Members of this family occur only in CRISPR/cas loci of the MYXAN type, but are present in a minority of such systems. This protein shows similar length and composition but only about 12 percent sequence identity to
/
.
Molybdate ABC transporter, substrate-binding protein
Type:
Family
Description:
This entry represents the molybdate ABC transporter periplasmic binding protein from bacteria and archaea. Radioactive labeling experiments have shown that ModA is a hydrophilic periplasmic-binding protein in Gram-negative organisms, and its counterpart in Gram-positive organisms is a lipoprotein. The other components of the system include ModB, an integral membrane protein, and ModC, the ATP-binding subunit. Almost all of them display a common beta/alpha folding motif and have similar tertiary structures consisting of two globular domains. Bacterial high affinity transport systems are involved in active transport of solutes across the cytoplasmic membrane. Most of the bacterial ABC (ATP-binding cassette) importers are composed of one or two transmembrane permease proteins, one or two nucleotide-binding proteins and a highly specific periplasmic solute-binding protein. In Gram-negative bacteria the solute-binding proteins are dissolved in the periplasm, while in archaea and Gram-positive bacteria they are membrane-anchored lipoproteins.
The Sec-independent protein export system TAT, or twin-arginine translocation, is composed of TatA, TatB, and TatC. The TAT system is unusual in Leptospira, with Lys replacing Arg in the second position of the twin-Arg motif. This protein, restricted to Leptospira and showing distant homology to the phosphoserine phosphatases RsbU and SpoIIE, is always encoded immediately downstream of the tatC gene and appears to be part of the variant TAT system. It lacks a TAT signal itself, and so is more likely to be part of the Sec-independent translocation machinery than to be a substrate. The suggested symbol is rktP, for RK-Translocation Phosphatase.
This entry represents Izumo sperm-egg fusion protein 2 (IZUMO2), a member of the IZUMO family. IZUMO is a protein with a single immunoglobulin (Ig) domain. Its expression has been found to be testis-specific. IZUMO is not detectable on the surface of fresh sperm, but becomes exposed after an exocytotic process, the acrosome reaction, has occurred. It is thought to bind to putative Izumo receptors on the oocyte. Studies have shown that knock-out mice (Izumo-/- males) were sterile despite normal mating behaviour and ejaculation, indicating the importance of the protein in fertilisation. IZUMO is a typical type I membrane glycoprotein with one immunoglobulin-like domain and a putative N-glycoside link motif (Asn 204). It contains cysteine residues thought to form a disulphide bridge, and a conserved GCL sequence motif.
This entry represents a group of plant proteins, including PRIN2 (Protein PLASTID REDOX INSENSITIVE 2) from Arabidopsis. PRIN2 is a chloroplastic protein essential for proper plant embryo development. It is localized to the plastid nucleoids and is required for full expression of genes transcribed by PEP (plastid encoded RNA polymerase), an enzyme that predominantly mediates the transcription of photosynthesis-related genes.
This entry represents a group of plant proteins, including Sigma factor binding protein 1/2 (SIB1/2) from Arabidopsis. They belong to the large plant-specific VQ motif-containing protein (VQP) family. SIB1 is a component of the RNA polymerase machinery responsible for transcription of plastid genes. SIB1 and SIB2 function as activators of WRKY33 in plant defense against necrotrophic pathogens. In general, Arabidopsis VQPs interact specifically with the C-terminal WRKY domains of group I and the sole WRKY domains of group IIc WRKY transcription factors. Arabidopsis VQPs reported to control stress responses include the calmodulin (CaM)-binding protein CamBP25 and VQ9, which regulate osmotic and salinity tolerance, respectively, the sigma factor binding proteins SIB1 and SIB2, which act as activators of WRKY33 in plant defence, and the negative regulator of the jasmonate defence pathway.
Zinc finger TRAF-type-containing protein 1, also known as cysteine and histidine-rich protein 1 (CYHR1), is an uncharacterized protein containing a degenerate RING-type zinc finger and a TRAF-type zinc finger. It binds to, and might assist in the intracellular trafficking of, the soluble beta-galactoside-binding protein galectin-3.
Escherichia coli has an iron(II) transport system (feo) which may make an important contribution to the iron supply of the cell under anaerobic conditions. FeoB has been identified as part of this transport system and may play a role in the transport of ferrous iron. FeoB is a large integral membrane protein of 700-800 amino acids. The N terminus contains a P-loop motif, suggesting that iron transport may be ATP dependent.
This entry represents a group of animal FLYWCH-type zinc finger-containing proteins, including FLYWCH1 and FLYWCH2. FLYWCH1 is a transcription modulator that suppresses the transcriptional activity of Wnt/beta-catenin signaling. It may function as a tumor suppressor in colorectal cancer.
A bacterial microcompartment is a protein-based metabolic organelle where several enzymes and metabolites are brought together. PduM is a structural component of the microcompartments involved in coenzyme B(12)-dependent 1,2-propanediol degradation by Salmonella enterica.
Phosphate ABC transporter substrate-binding protein
Type:
Family
Description:
Members of this family are the substrate-binding protein of the phosphate ABC transporter as found in Mollicutes genera such as Mycoplasma, Mesoplasma, and Spiroplasma. The most similar sequences outside this family are PstS in the phosphate binding protein family (
), but sequence architecture differs considerably. Members of this family are never lipoproteins.
PSRP5 is a component of the plastid ribosomal protein (PRP) complex. Plastid-specific 50S ribosomal proteins (PSRPs) may have evolved to perform functions unique to plastid translation and its regulation.
Spermatogenic leucine zipper protein 1 (Spz1) is a basic helix-loop-helix-leucine zipper (bHLH-Zip) transcription factor expressed specifically in the testis and epididymis of adult mice. It plays an important regulatory role during spermatogenesis. It also mediates mitogen-activated protein kinase cell proliferation, transformation, and tumorigenesis. TIP60-dependent acetylation of the SPZ1-TWIST complex has been shown to activate VEGF (vascular endothelial growth factor) expression, which regulates tumour metastasis and promotes several physiological and pathological events including angiogenesis, vascular hyperpermeability, cancer metastasis, and cancer stem cell transition.
Members of this protein family have a novel N-terminal sequence region. The C-terminal region of these proteins shares homology with proteins in the PpiC-type peptidyl-prolyl cis-trans isomerase entry (
). However, proteins in this family are not included in the IPR000297 entry, suggesting an origin within a branch of the PpiC family but subsequent neofunctionalisation with a rapid change of sequence. The genome context for members always includes an ATP-grasp enzyme associated with peptide modification and a short polypeptide likely to be the modification target.
Nuclear polyadenylated RNA-binding protein Nab2/ZC3H14
Type:
Family
Description:
Budding yeast Nab2 is a nuclear polyadenylated RNA-binding protein that contains seven CCCH-type zinc fingers and an RGG box. It is required for nuclear mRNA export and poly(A) tail length control. The tertiary structure has been solved and the protein is all helical (). Its animal homologue, known as ZC3H14, has been shown to control poly(A) tail length in neuronal cells.
Chloroplast protein CEST, also known as Ycf3-interacting protein 1 (Y3IP1), cooperates with Ycf3 in the assembly of stable photosystem I units in the thylakoid membrane. It also seems to induce tolerance to multiple environmental stresses and reduce photooxidative damage.
G patch domain-containing protein 3 (GPATCH3) is a transcriptional regulator during embryonic development, activating the CXCR4 promoter. The CXCR4 gene is involved in embryo neural crest cell migration. Because the GPATCH3 protein is detected in human tissues relevant to glaucoma, GPATCH3 may be involved in the pathogenesis of congenital glaucoma. GPATCH3 is also a negative regulator of retinoic acid-inducible gene I-like receptor (RLR)-mediated antiviral signaling pathways. It controls the response triggered by the presence of viral RNA by interacting with VISA (an adaptor of virus-triggered, RLR-mediated induction of innate antiviral responses) and disrupting the assembly of the virus-induced VISA signalosome. Proteins in this family contain a G-patch domain.
This domain is a chain that forms part of the integron cassette protein VCH_CASS14 present in Vibrio cholerae. Each monomer contains a deep binding pocket for small molecule substrates, formed by helices alpha-1 and alpha-2 and residues from the central four strands of the β-sheet. The pocket is extensively lined with hydrophobic side chains.
Sarcoplasmic reticulum histidine-rich calcium-binding protein
Type:
Family
Description:
The histidine-rich calcium-binding protein of sarcoplasmic reticulum (HRC) may play a role in the regulation of calcium sequestration or release in the SR of skeletal and cardiac muscle. This protein is very acidic (31% Asp and Glu) and rich in histidine (13%). The sequence of HRC contains 10 tandem repeats of a domain of 26 to 29 amino acid residues. This domain starts with an invariant hexapeptide (HRHRGH), followed by a stretch of acidic residues. The end of the domain consists of an almost invariant nonapeptide (STESDRHQA). The highly acidic central cores of each repeat are likely to constitute the calcium-binding sites of HRC.
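The repeat architecture described above (an HRHRGH start, an acidic core, and an STESDRHQA end) can be located in a sequence with a pattern search. The sketch below is illustrative only: it assumes both flanking peptides match exactly, whereas the nonapeptide is described as only almost invariant, and the toy sequence and function names are hypothetical rather than taken from a real HRC sequence.

```python
import re

HEAD = "HRHRGH"     # invariant hexapeptide that starts each repeat
TAIL = "STESDRHQA"  # nonapeptide at the end; assumed here to match exactly

# Non-greedy core so each match stops at the first TAIL after a HEAD.
REPEAT = re.compile(HEAD + r".*?" + TAIL)


def find_repeats(sequence):
    """Return every HEAD...TAIL segment found in the sequence."""
    return REPEAT.findall(sequence)


def acidic_fraction(segment):
    """Fraction of residues in the segment that are Asp (D) or Glu (E)."""
    return (segment.count("D") + segment.count("E")) / len(segment)


# Hypothetical toy data: two identical 27-residue repeats with short flanks.
toy_repeat = HEAD + "DEEDDEEDEEDE" + TAIL
toy_sequence = "MG" + toy_repeat * 2 + "KK"

for repeat in find_repeats(toy_sequence):
    print(len(repeat), round(acidic_fraction(repeat), 2))
```

Tolerating substitutions in the nonapeptide would require a looser pattern or a profile-based search, which is outside the scope of this sketch.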
This entry represents a group of plant proteins, including protein PLASTID MOVEMENT IMPAIRED 1 (PMI1) and PMIR1/2 from Arabidopsis. PMI1 and PMIR1 are required for efficient chloroplast photorelocation movement. However, PMIR2 does not seem to be necessary for chloroplast and nuclear photorelocation movements. PMI1 has been found to mediate ABA sensitivity during germination.
SCHIP-1 is a coiled-coil protein that specifically associates with schwannomin in vitro and in vivo. The product of the neurofibromatosis type 2 (NF2) tumour suppressor gene, known as schwannomin or merlin, is involved in NF2-associated and sporadic schwannomas and meningiomas. It is closely related to the ezrin-radixin-moesin family members, which link membrane proteins to the cytoskeleton. Association with SCHIP-1 can be observed only with some naturally occurring mutants of schwannomin, or a schwannomin spliced isoform lacking exons 2 and 3, but not with the schwannomin isoform exhibiting growth-suppressive activity. This entry represents the C-terminal domain of mammalian SCHIP-1. This domain contains a leucine zipper coiled-coil structure implicated in its dimerization. This domain is also found in its isoforms such as IQCJ-SCHIP-1 (IQ motif containing J-Schwannomin-Interacting Protein 1). The N-terminal part of IQCJ-SCHIP-1 possesses an additional IQ motif, which typically serves as a Ca2+-independent CaM-binding site.
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). In addition, there are many protein families known as CRISPR-associated sequences (Cas), which are encoded in the vicinity of CRISPR loci. CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a CRISPR cluster or filling the region between two repeat clusters. Cas genes and CRISPRs are found on mobile genetic elements such as plasmids, and have undergone extensive horizontal transfer. Cas proteins are thought to be involved in the propagation and functioning of CRISPRs. Some Cas proteins show similarity to helicases and repair proteins, although the functions of most are unknown. Cas families can be divided into subtypes according to operon organisation and phylogeny. Members of this family are the Cas4 protein of a novel CRISPR subtype, PREFRAN, found in Prevotella bryantii B14, Prevotella disiens FB035-09AN, Francisella tularensis subsp. novicida, Francisella philomiragia, Butyrivibrio proteoclasticus B316, Helcococcus kunzii ATCC 51366, etc.
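The repeat-spacer layout of a CRISPR cluster described above can be illustrated by splitting an array on its direct repeat. This is a minimal sketch, assuming the repeat sequence is already known and perfectly conserved; real arrays often carry degenerate repeats and need dedicated detection tools. The repeat, spacers, flanks, and function name are hypothetical toy values.

```python
def extract_spacers(array, repeat):
    """Return the spacer sequences lying between consecutive copies of repeat.

    Splitting on the repeat yields the leading flank, the spacers, and the
    trailing flank; only the pieces between two repeats are spacers.
    """
    parts = array.split(repeat)
    return parts[1:-1]


# Hypothetical toy array: flank + (repeat + spacer) units + final repeat + flank.
repeat = "GTTTTAGAGC"
spacers = ["AAACCCGGGT", "TTGACCATGA"]
array = "AC" + repeat + spacers[0] + repeat + spacers[1] + repeat + "GT"

found = extract_spacers(array, repeat)
print(found)
print({len(s) for s in found})  # regularly sized spacers share one length
```

A spacer set with a single shared length is consistent with the "regularly sized" spacers noted in the description, although it does not by itself establish that a region is a CRISPR cluster.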
I-BAR domain containing protein IRSp53/IRTKS/Pinkbar
Type:
Family
Description:
This entry represents a group of I-BAR (Bin/amphiphysin/Rvs) domain containing proteins, including IRSp53, IRTKS and Pinkbar. The BAR domain forms an anti-parallel all-helical dimer, with a curved (banana-like) shape, that promotes membrane tubulation. The BAR domain containing proteins can be classified into three types: BAR, F-BAR and I-BAR. BAR and F-BAR proteins generate positive membrane curvature, while I-BAR proteins induce negative curvature. Proteins in this family also contain an additional C-terminal SH3 domain.