This family includes the L2 minor capsid protein, a late protein from Human papillomavirus (HPV). HPVs are dsDNA viruses with no RNA stage in their replication cycle. Their dsDNA is contained within a capsid composed of 72 L1 capsomers and about 36 L2 minor capsid proteins. L2 minor capsid proteins enter the nucleus twice during infection: in the initial phase after virion disassembly, and in the productive phase when they assemble into replicated virions along with L1 major capsid proteins. L2 proteins contain two nuclear localisation signals (NLSs), one at the N terminus (nNLS) and the other at the C terminus (cNLS). L2 uses its NLSs to interact with a network of karyopherins in order to enter the nucleus via several import pathways. L2 from HPV types 11 and 16 was shown to interact with karyopherins Kapbeta(2) and Kapbeta(3). L2 capsid proteins can also interact with viral dsDNA, facilitating its release from the endocytic compartment after viral uncoating.
This entry represents NifA, a DNA-binding regulatory protein for nitrogen fixation. Not included in this group are: the homologue in Aquifex aeolicus (which lacks nitrogenase), transcriptional activators of alternative nitrogenases (VFe or FeFe instead of MoFe), and truncated forms. In diazotrophic proteobacteria, the sigma54-dependent activator NifA activates transcription of the nif (nitrogen fixation) genes by a conserved mechanism common to members of the enhancer binding protein family. Although NifA proteins have similar domain structures, both transcriptional regulation of nifA expression and posttranslational regulation of NifA activity by oxygen and fixed nitrogen vary significantly from one organism to another. In Klebsiella pneumoniae and Azotobacter vinelandii, nifA is co-ordinately transcribed with a second gene, nifL, whose product inhibits NifA activity in response to oxygen and fixed nitrogen.
This family represents Protein K7 from Orthopoxvirus. K7 is a Bcl-2-like protein which, through its interaction with the DEAD box RNA helicase DDX3X/DDX3, prevents TBK1/IKKepsilon-mediated IRF3 activation. It contributes to virulence by binding to the host TRAF6 and IRAK2, preventing host NF-kappa-B activation, and it affects the acute immune response to infection. In vaccinia virus, this protein has been linked to the increase in cellular histone methylation during infection.
This domain is found in eukaryotes, and is approximately 280 amino acids in length. The family is found in association with
. There is a single completely conserved glutamate residue (E) that may be functionally important. Dynactin has been associated with dynein, a microtubule motor protein which is involved in organelle transport, mitotic spindle assembly and chromosome segregation. Dynactin anchors dynein to specific subcellular structures.
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. Type III complexes share the Cas10 subunit but are subclassified as type IIIA (Csm) and type IIIB (Cmr), depending on their specificity for DNA or RNA targets, respectively. This family represents the CRISPR system Csm protein Csm5.
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. This entry represents the Cse3 (CRISPR/Cas Subtype Ecoli protein 3, also known as CasE) family of Cas proteins. The Thermus thermophilus HB8 family member has been crystallised and found to have a structure consisting of two domains with opposing parallel β-sheets, known as a β-sheet platform. This structure is similar to those found in the sex-lethal protein and poly(A)-binding protein and is consistent with an RNA-binding function.
Rab23 is a member of the Rab family of small GTPases. In mouse, Rab23 has been shown to function as a negative regulator in the sonic hedgehog (Shh) signaling pathway. Rab23 mediates the activity of Gli2 and Gli3, transcription factors that regulate Shh signaling in the spinal cord, primarily by preventing Gli2 activation in the absence of Shh ligand. Rab23 also regulates a step in the cytoplasmic signal transduction pathway that mediates the effect of Smoothened (one of two integral membrane proteins that are essential components of the Shh signaling pathway in vertebrates). In humans, Rab23 is expressed in the retina. Mice contain an isoform that shares 93% sequence identity with the human Rab23 and an alternatively spliced isoform that is specific to the brain. This isoform causes the murine open brain phenotype, indicating it may have a role in the development of the central nervous system. GTPase activating proteins (GAPs) interact with GTP-bound Rab and accelerate the hydrolysis of GTP to GDP. Guanine nucleotide exchange factors (GEFs) interact with GDP-bound Rabs to promote the formation of the GTP-bound state. Rabs are further regulated by guanine nucleotide dissociation inhibitors (GDIs), which facilitate Rab recycling by masking C-terminal lipid binding and promoting cytosolic localization. Most Rab GTPases contain a lipid modification site at the C terminus, with sequence motifs CC, CXC, or CCX. Lipid binding is essential for membrane attachment, a key feature of most Rab proteins.
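The C-terminal lipid modification motifs named in the Rab23 entry above (CC, CXC, CCX) can be checked with a short pattern match. The following is only a minimal sketch: the function name, the motif ordering and the toy sequences are assumptions introduced for illustration, X is taken to mean any residue, and a tail such as CCC would match more than one pattern (the first in the list is reported).

```python
import re

# Hypothetical helper, not part of any database API. The three patterns
# encode the C-terminal motifs named in the text; X is assumed to be any
# residue. Order matters only for ambiguous tails such as "CCC".
C_TERMINAL_MOTIFS = [
    ("CXC", re.compile(r"C[A-Z]C$")),
    ("CCX", re.compile(r"CC[A-Z]$")),
    ("CC", re.compile(r"CC$")),
]


def rab_c_terminal_motif(sequence):
    """Return the name of the first motif matching the C terminus, or None."""
    seq = sequence.strip().upper()
    for name, pattern in C_TERMINAL_MOTIFS:
        if pattern.search(seq):
            return name
    return None


if __name__ == "__main__":
    # Toy sequences invented for illustration; they are not real Rab proteins.
    for toy in ("MAAAGGCC", "MAAAGCSC", "MAAAGCCS", "MAAAGGGG"):
        print(toy, rab_c_terminal_motif(toy))
```

A match only indicates that the sequence ends in one of the listed motifs; it says nothing about whether the protein is actually lipid-modified.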
This entry represents Sm-like protein Lsm3. It can be found in the nuclear Lsm2-8 complex or in the cytoplasmic Lsm1-7 complex. The Lsm2-8 complex associates with multiple snRNP complexes containing the U6 snRNA (U4/U6 snRNP, U4/U6.U5 snRNP, and free U6 snRNP). It binds and stabilizes the 3'-terminal poly(U) tract of U6 snRNA and facilitates the assembly of U4/U6 di-snRNP and U4/U6.U5 tri-snRNP. The Lsm1-7 complex associates with deadenylated mRNA and promotes decapping in the 5'-3' mRNA decay pathway. The Sm and the Lsm proteins, characterised by the Sm-domain, have RNA-related functions. The Sm heptamer ring associates with four (U1, U2, U4, U5) snRNPs, while the Lsm2-8 heptamer is part of the U6 snRNP. Another Lsm heptameric complex, Lsm1-7, which differs from Lsm2-8 by one Lsm protein, functions in mRNA decapping, a crucial step in the mRNA degradation pathway.
This entry represents Sm-like protein Lsm4. It can be found in the nuclear Lsm2-8 complex or in the cytoplasmic Lsm1-7 complex. The Lsm2-8 complex associates with multiple snRNP complexes containing the U6 snRNA (U4/U6 snRNP, U4/U6.U5 snRNP, and free U6 snRNP). It binds and stabilizes the 3'-terminal poly(U) tract of U6 snRNA and facilitates the assembly of U4/U6 di-snRNP and U4/U6.U5 tri-snRNP. The Lsm1-7 complex associates with deadenylated mRNA and promotes decapping in the 5'-3' mRNA decay pathway. The Sm and the Lsm proteins, characterised by the Sm-domain, have RNA-related functions. The Sm heptamer ring associates with four (U1, U2, U4, U5) snRNPs, while the Lsm2-8 heptamer associates with the U6 snRNP. Another Lsm heptameric complex, Lsm1-7, which differs from Lsm2-8 by one Lsm protein, functions in mRNA decapping, a crucial step in the mRNA degradation pathway.
This entry represents Sm-like protein Lsm8. It is found in the nuclear Lsm2-8 complex, which associates with multiple snRNP complexes containing the U6 snRNA (U4/U6 snRNP, U4/U6.U5 snRNP, and free U6 snRNP). The Lsm2-8 complex binds and stabilizes the 3'-terminal poly(U) tract of U6 snRNA and facilitates the assembly of U4/U6 di-snRNP and U4/U6.U5 tri-snRNP. The Sm and the Lsm proteins, characterised by the Sm-domain, have RNA-related functions. The Sm heptamer ring associates with four (U1, U2, U4, U5) snRNPs, while the Lsm2-8 heptamer is part of the U6 snRNP. Another Lsm heptameric complex, Lsm1-7, which differs from Lsm2-8 by one Lsm protein, functions in mRNA decapping, a crucial step in the mRNA degradation pathway.
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA). The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the precursor CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability. Members of this protein family are Csp1, also known as Cas7, a CRISPR-associated (cas) gene marker for the Pging subtype of CRISPR/cas system, as found in Porphyromonas gingivalis W83 and Bacteroides forsythus ATCC 43037. This protein belongs to the family of DevR, a regulator of development in Myxococcus xanthus located in a cas gene region. A different branch of the DevR family, Cst2, is a marker for the Tneap subtype of CRISPR/cas system.
This entry represents Sm-like protein Lsm1. It can be found in the cytoplasmic Lsm1-7 complex, which associates with deadenylated mRNA and promotes decapping in the 5'-3' mRNA decay pathway. The Sm and the Lsm proteins, characterised by the Sm-domain, have RNA-related functions. The Sm heptamer ring associates with four (U1, U2, U4, U5) snRNPs, while the Lsm2-8 heptamer is part of the U6 snRNP. Another Lsm heptameric complex, Lsm1-7, which differs from Lsm2-8 by one Lsm protein, functions in mRNA decapping, a crucial step in the mRNA degradation pathway.
Proteins in this family have been inferred by homology to be related to both ArgE (acetylornithine deacetylase) and DapE (succinyl-diaminopimelate desuccinylase). The family includes yodQ from Bacillus subtilis, which is part of a transcriptional unit (yodT-yodS-yodR-yodQ-yodP-kamA) whose expression is upregulated during sporulation, and that may be involved in the production of N-acetyl-beta-lysine.
Insect cuticle is composed of proteins and chitin. The cuticular proteins seem to be specific to the type of cuticle (flexible or stiff) that occurs at particular stages of insect development. The proteins found in the flexible cuticle of larvae and pupae of different insects share a conserved C-terminal section; such a region is also found in the soft endocuticle of adult insects, as well as in other cuticular proteins, including those of arachnids. In addition, cuticular proteins share hydrophobic regions dominated by tetrapeptide repeats (A-A-P-A/V), which are presumed to be functionally important. Many insect cuticle proteins also include a 35-36 amino acid motif known as the R and R consensus. An extended form of this motif has been shown to bind chitin. It has no sequence similarity to the cysteine-containing chitin-binding domain of chitinases and some peritrophic membrane proteins, suggesting that arthropods have two distinct classes of chitin-binding proteins: those with the chitin-binding domains found in lectins, chitinases and peritrophic membranes (cysCBD), and those with the type of chitin-binding domains found in cuticular proteins (non-cysCBD). The cuticle protein signature has been found in locust cuticle proteins 7 (LM-7), 8 (LM-8) and 19 (LM-19) and endocuticle structural glycoprotein ABD-4; Hyalophora cecropia (Cecropia moth) cuticle proteins 12 and 66; Drosophila melanogaster (Fruit fly) larval cuticle proteins I, II, III and IV (LCP1 to LCP4); Drosophila pupal cuticle proteins PCP, EDG-78E and EDG-84E; Manduca sexta (Tobacco hawkmoth) cuticle protein LCP-14; Tenebrio molitor (Yellow mealworm) cuticle proteins ACP-20, A1A, A2B and A3A; and Araneus diadematus (Spider) cuticle proteins ACP 11.9, ACP 12.4, ACP 12.6, ACP 15.5 and ACP 15.7.
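The tetrapeptide repeat A-A-P-A/V mentioned in the cuticle entry above can be located in a sequence with a simple regular expression. This is a hedged sketch only: the function names and the toy sequence are hypothetical, and the pattern assumes the repeat is exactly A-A-P followed by A or V, with no allowance for degenerate copies.

```python
import re

# Assumed reading of the repeat in the text: A-A-P followed by A or V.
TETRAPEPTIDE = re.compile(r"AAP[AV]")
TANDEM_RUN = re.compile(r"(?:AAP[AV])+")


def count_tetrapeptide_repeats(sequence):
    """Count non-overlapping AAP[AV] tetrapeptides in the sequence."""
    return len(TETRAPEPTIDE.findall(sequence.upper()))


def longest_tandem_run(sequence):
    """Return the number of repeat units in the longest back-to-back run."""
    runs = TANDEM_RUN.findall(sequence.upper())
    return max((len(run) // 4 for run in runs), default=0)


if __name__ == "__main__":
    # Toy sequence invented for illustration, not a real cuticle protein.
    toy = "GAAPAAAPVAAPAK"
    print(count_tetrapeptide_repeats(toy), longest_tandem_run(toy))
```

Counting exact matches like this is a quick screen; a profile- or alignment-based method would be needed to capture diverged repeats.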
This family includes the minor capsid protein VIII from adenoviruses. Protein VIII is a structural component of the virion that lashes peripentonal hexons to the hexons situated in the facets through its interaction with the capsid vertex protein. Adenoviruses are responsible for diseases such as pneumonia, cystitis, conjunctivitis and diarrhoea, all of which can be fatal to patients who are immunocompromised.
Bacteriophage lambda encodes two repressors: the Cro repressor that acts to turn off early gene transcription during the lytic cycle, and the lambda or cI repressor that is required to maintain lysogenic growth. Together the Cro and cI repressors form a helix-turn-helix (HTH) superfamily. The lambda Cro repressor binds to DNA as a highly flexible dimer. The crystal structure of the lambda Cro repressor reveals an HTH DNA-binding protein with an alpha/beta fold that differs from other Cro family members, possibly by an evolutionary fold change. Most Cro proteins, such as Enterobacteria phage P22 Cro and Bacteriophage 434 Cro, have an all-alpha structure that is thought to be ancestral to lambda Cro, where the fourth and fifth helices are replaced by a β-sheet, possibly as a result of secondary structure switching rather than by nonhomologous replacement. This entry represents the lambda-type Cro repressor with an alpha/beta topology.
This family includes KIAA1549 from human, which has been implicated in pilocytic astrocytomas. In the majority of cases of pilocytic astrocytomas a tandem duplication produces an in-frame fusion of the gene encoding this protein and the BRAF oncogene. The resulting fusion protein has constitutive BRAF kinase activity and is capable of transforming cells. More recently, KIAA1549 has been described to play a role in photoreceptor function.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. This family consists of the 30S ribosomal subunit protein S22. This polypeptide is 47 amino acids in length and has a molecular weight of about 5 kDa. S22 is a stationary-phase-specific ribosomal protein that is assembled into ribosomal particles during the stationary phase. Together with other stationary-phase-specific ribosomal proteins, it contributes to compositional changes of ribosomes during the stationary phase. The significance of this change is not yet clear.
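The size quoted for S22 (47 amino acids, about 5 kDa) is consistent with a rough back-of-the-envelope estimate. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that is not taken from the entry itself; the function name is hypothetical, and a real calculation would sum the masses of the actual residues.

```python
# Rule-of-thumb assumption (not from the source text): the average mass of
# an amino acid residue in a protein is roughly 110 Da.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approximate_mass_kda(n_residues):
    """Estimate protein mass in kDa from chain length alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # 47 residues is the S22 length given in the entry.
    print(round(approximate_mass_kda(47), 2))
```

For 47 residues this gives roughly 5.2 kDa, in line with the stated value of about 5 kDa.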
This entry includes the major DNA-binding protein (DBP, UL57 or ICP8) from Herpesviruses. DBP binds single-stranded DNA, and the region encompassing residues 368-902 contains the DNA-binding site. UL5, UL8 and UL52 genes encode an essential heterotrimeric DNA helicase-primase that is responsible for concomitant DNA unwinding and primer synthesis at the viral DNA replication fork. DBP may stimulate DNA unwinding and enable bypass of cisplatin-damaged DNA by recruiting the helicase-primase to the DNA. DBP helps initiate DNA replication by binding to the origin-binding protein (UL9). It also reorganizes the host nucleus, leading to the formation of prereplicative sites and replication compartments.
The green fluorescent-like protein family consists of fluorescent proteins and non-fluorescent chromoproteins, derived from several species of Cnidarians, as well as certain diazotrophic bacteria. These proteins range in their absorption wavelength maximum, and are often classified by their colour: green, yellow, red and purple-blue. These colour differences arise from changes in the structure of the chromophore, which is generated internally by auto-catalysis. The chromophore comprises Ser65-Tyr66-Gly67 in Aequorea victoria (Jellyfish), which forms a five-membered ring after its modification. In the bioluminescent organism A. victoria, GFP acts to transform the blue light emitted from aequorin into green light. However, most organisms with GFP-like molecules are not bioluminescent, and in some cases are not even fluorescent. These proteins all display a β-can structure, which surrounds the chromophore and acts to shield it against quenching agents. The G2 domain of nidogen contains a β-can structure that exhibits extraordinary similarity to GFP, even though the two share only low sequence identity. Nidogen is a component of basement membranes, whose interactions with other basement membrane proteins contribute to the assembly and function of the basement membrane. The G2 domain serves as a protein-binding module. The structure is similar enough between GFP and the G2 domain of nidogen to suggest a common ancestral origin.
This entry represents the major capsid protein VP1 (viral protein 1) from Polyomaviruses, such as Murine polyomavirus (strain P16 small-plaque) (MPyV). Polyomaviruses are dsDNA viruses with no RNA stage in their life cycle. The virus capsid is composed of 72 icosahedral units, each of which is composed of five copies of VP1. The virus attaches to the cell surface by recognition of oligosaccharides terminating in alpha(2,3)-linked sialic acid. The capsid protein VP1 forms a pentamer. The complete capsid is composed of 72 VP1 pentamers, with a minor capsid protein, VP2 or VP3, inserted into the centre of each pentamer like a hairpin. This structure restricts the exposure of internal proteins during viral entry. Polyomavirus coat assembly is rigorously controlled by chaperone-mediated assembly. During viral infection, the heat shock chaperone hsc70 binds VP1 and co-localises it in the nucleus, thereby regulating capsid assembly.
Members of this family are very small proteins, about 47 residues each. An EIxxE motif present in most members of this family resembles the sites cleaved by the germination protease GPR in a number of small acid-soluble spore proteins (SASP). A role in sporulation is possible.
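The EIxxE motif described in the entry above can be searched for with a regular expression. This is a minimal sketch under stated assumptions: "x" is read as any single residue, the function name and toy sequence are hypothetical, and positions are reported 1-based as is usual for protein sequences.

```python
import re

# "EIxxE" read as E, I, any two residues, E. The lookahead lets overlapping
# occurrences be reported as well.
EIXXE = re.compile(r"(?=(EI[A-Z]{2}E))")


def find_eixxe(sequence):
    """Return (1-based start position, matched pentapeptide) for each hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in EIXXE.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence invented for illustration, not a real family member.
    print(find_eixxe("MSEIAKEGG"))
```

A hit shows only that the pattern is present; whether the site is actually cleaved by GPR cannot be inferred from the sequence match alone.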
The proteins in this family are related to gp34, the protein encoded by the m04 gene of viruses such as Murid herpesvirus 1. The m06 and m152 genes are expressed earlier in the intracellular replication phase of the viral life cycle, and their products function to inhibit MHC class I loading and export. gp34 is theorized to prevent immune reactions from NK cells, which would ordinarily recognise and attack cells lacking MHC.
Autophagy is a major survival mechanism in which eukaryotes recycle cellular nutrients during stress conditions. Atg22, Avt3 and Avt4 are partially redundant vacuolar effluxers, which mediate the efflux of leucine and other amino acids resulting from autophagy. This family also includes other transporter proteins.
Atg22 (also known as Aut4) protein functions as a vacuolar effluxer which mediates the efflux of amino acids resulting from autophagic degradation. The release of autophagic amino acids allows the maintenance of protein synthesis and viability during nitrogen starvation. Members of this family belong to the Major Facilitator Superfamily (MFS) of membrane transport proteins, which are thought to function through a single substrate binding site, alternating-access mechanism involving a rocker-switch type of movement.
The beta-lactamase-inhibitor protein (BLIP) is produced by Streptomyces species. BLIP acts as a potent inhibitor of beta-lactamases such as TEM-1, which is the most widespread resistance enzyme to penicillin antibiotics. BLIP binds competitively to TEM-1 and makes direct contacts with TEM-1 active site residues. BLIP is able to inhibit a variety of class A beta-lactamases, possibly through flexibility of its two domains. The two tandemly repeated domains of BLIP have an alpha(2)-beta(4) structure, with the β-hairpin loop from domain 1 inserting into the active site of beta-lactamase. BLIP shows no sequence similarity with BLIP-II, even though both bind to and inhibit TEM-1.
Endospores of B. subtilis are encased in a thick protein shell known as the spore coat. The coat's complex structure comprises an inner coat (IC) and an outer coat (OC), and includes more than 70 spore-specific proteins. This entry includes several sporulation-specific proteins including YjcZ and SscA, which is involved in spore germination and spore coat assembly. Proteins in this entry are found only in spore-forming species. A Gly-rich variable region is followed by a strongly conserved, highly hydrophobic region, predicted to form a transmembrane helix, ending with an invariant Gly. The consensus for this stretch is FALLVVFILLIIV.
Mouse LOC66273 isoform 2 (LI2) protein, a novel Mth938 domain-containing protein, may play a role in preadipocyte differentiation and adipogenesis.
The Sm and the Lsm proteins, characterised by the Sm-domain, have RNA-related functions. The Sm heptamer ring associates with four (U1, U2, U4, U5) snRNPs, while the Lsm2-8 heptamer is part of the U6 snRNP. Another Lsm heptameric complex, Lsm1-7, which differs from Lsm2-8 by one Lsm protein, functions in mRNA decapping, a crucial step in the mRNA degradation pathway.
KIF1 binding protein (KBP) is a binding partner for KIF1Balpha that regulates its transport function and thus represents a type of kinesin-interacting protein.
Competence is the ability of a cell to take up exogenous DNA from its environment, resulting in transformation. It is widespread among bacteria and is probably an important mechanism for the horizontal transfer of genes. DNA usually becomes available by the death and lysis of other cells. Competent bacteria use components of extracellular filaments called type 4 pili to create pores in their membranes and pull DNA through the pores into the cytoplasm. This process, including the development of competence and the expression of the uptake machinery, is regulated in response to cell-cell signalling and/or nutritional conditions. CoiA falls within a competence-specific operon in Streptococcus. It is required for optimal transformation.
This group represents competence protein CoiA found in bacteria belonging to the Bacilli class. It is required for optimal transformation. CoiA is a transient protein expressed specifically during competence and required for genetic transformation, but not for DNA uptake.
Proteins in this family bind to fibrinogen. Fibrinogen is capable of binding to a wide range of endogenous proteins and cell receptors during haemostasis, including binding to platelets to promote their aggregation. Included in this family is receptor FbsA from the pathogen Streptococcus agalactiae, which is responsible for causing endocarditis in humans. FbsA is considered an important virulence factor that is capable of binding fibrinogen, through which it elicits platelet aggregation and adherence to the extracellular matrix. This enables the bacteria to invade the pulmonary epithelium, which may be a prerequisite for infection.
The matrix protein plays a crucial role in virus assembly, and interacts with the RNP complex as well as with the viral membrane. It is found in Morbillivirus, Paramyxovirus and Pneumovirus.
This family includes FlgP from Vibrio cholerae, which is part of an operon with two genes, flgO and flgP, positively regulated by FlrC, the activator of class III flagellar genes. FlgP is an outer membrane lipoprotein required for motility that functions as a colonization factor.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome to which they belong: the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. A number of eukaryotic and archaebacterial ribosomal proteins can be grouped on the basis of sequence similarities. One of these families consists of mammalian ribosomal protein L24; yeast ribosomal protein L30A/B (Rp29) (YL21); Kluyveromyces lactis ribosomal protein L30; Arabidopsis thaliana ribosomal protein L24 homolog; Haloarcula marismortui ribosomal protein HL21/HL22; and Methanocaldococcus jannaschii (Methanococcus jannaschii) MJ1201. These proteins have 60 to 160 amino-acid residues. This entry represents L24e ribosomal proteins mostly from Archaea. Some eukaryotic proteins, such as RSL24D1 from humans, are included in this entry. The function of RSL24D1 is not clear.
It has recently been shown [
] that three yeast proteins, two of which are known to be induced by various stress conditions, are structurally related and are probably part of a larger family. These
proteins include cold-shock inducible protein TIR1 (also known as serine-rich protein 1, SRP1), which is induced by glucose [
] and cold shock []; temperature-shock inducible protein 1 (SRP2) [
]; seripauperins, which are closely related proteins of about 13 kDa (120 to 124 residues) and are generally encoded at the extremities of yeast chromosomes (e.g. PAU1, PAU2, PAU3, PAU4, PAU5, PAU6, YBR301w, YGL261c, YGR294w, YHL046c, YIL176c, YIR041w and YKL224c) []; and hypothetical proteins YIL011w, YJR150c and YJR151c. These proteins all seem to start with a putative signal sequence followed by a conserved domain of about 90 residues. In TIR1, TIR2, TIP1, YIL011w, YJR150c and YJR151c, this domain is followed by a repetitive serine- and alanine-rich region that is absent from the other members of this family.
The fire ant, Solenopsis invicta, exists in either monogyne or polygyne social forms: the former contains a single reproductive queen, the latter multiple queens. These social behavioural differences are associated with variation at gene Gp-9, colonies bearing only the B allelic variant being monogyne, those bearing both B and b-like variants being polygyne [
,
,
]. Gp-9 is a pheromone-binding protein, and may act as a homodimer. Little is known about Gp-9 at the molecular level, but it seems that differences in worker Gp-9 genotypes between social forms may lead to differences in their ability to recognise queens and hence regulate their numbers [
].
Sulfolobus virus-like particle SSV1 and its fusellovirus homologues can be found in many acidic (pH less than 4.0) hot springs (greater than 70 degrees C) around the world. SSV1 contains a 15.5-kb double-stranded DNA genome that encodes 34 proteins with greater than 50 amino acids [
]. A site-specific integrase and a DnaA-like protein have been previously identified by sequence homology, and three structural proteins have been isolated from purified virus and identified by N-terminal sequencing (VP1, VP2, and VP3).
This family represents Protein N1 and similar proteins from the genus Orthopoxvirus. N1 is a Bcl-2-like protein which contributes to virulence by preventing host NF-kappa-B activation in response to pro-inflammatory stimuli such as TNF-alpha or IL1B [
,
].
This entry represents GRB2-associated-binding proteins 1-4 (Gab1-4) and similar proteins from animals. Proteins in this family contain one PH domain. GAB1 from humans and its orthologues dos and soc-1 from Drosophila and C. elegans [
], respectively, function as adapter proteins that modulate intracellular signalling cascades triggered by activated receptor-type kinases. Human Gab1 plays a role in FGFR1 signalling, is involved in the MET/HGF signalling pathway and functions as an important upstream regulator in EGF-mediated activation of mTORCs [,
].
This entry represents KaiC domain-containing proteins that occur across a broad taxonomic range (Euryarchaeota, Aquificae, Dictyoglomi, Epsilonproteobacteria, and Firmicutes), but exclusively in thermophiles.
This entry represents a group of homeobox proteins from animals, including Hox-A/B10 from Danio rerio, Hox-A/C/D10 from humans and Homeobox protein abdominal-B from Drosophila melanogaster. The homeobox domain binds DNA through a helix-turn-helix (HTH) structure. Some proteins in this family are known to function as sequence-specific transcription factors [
,
].
This entry represents a group of homeobox proteins, including HOX-A1/B1/D1 (HXA1/B1/D1) and related proteins from animals. The homeobox domain is a DNA binding domain. Human HOX genes are critical for development of the central nervous system [
]. They have also been implicated in vascular development and angiogenesis, particularly through the regulation of genes involved in cell-cell or cell-extracellular matrix (ECM) interactions [
]. HXA1 has been linked to the Bosley-Salih-Alorainy syndrome (BSAS) or the Athabascan brainstem dysgenesis syndrome (ABDS) []. HXB1 is a transcription factor that plays important roles in early vertebrate development [,
,
,
,
] and its expression is associated with tumours [,
]. HOXD1 plays a significant role in endothelial cell functions by regulating the expression of ITGB1 []. Homeobox protein rough from Drosophila melanogaster is a member of this group of proteins. It is required to establish the unique cell identity of photoreceptors R2 and R5 and, consequently, for ommatidial assembly in the developing eye imaginal disk [
,
].
Retroviral matrix proteins (or major core proteins) are components of envelope-associated capsids, which line the inner surface of virus envelopes and are associated with viral membranes [
]. Matrix proteins are produced as part of Gag precursor polyproteins. During viral maturation, the Gag polyprotein is cleaved into major structural proteins by the viral protease, yielding the matrix (MA), capsid (CA), nucleocapsid (NC), and some smaller peptides. Gag-derived proteins govern the entire assembly and release of the virus particles, with matrix proteins playing key roles in Gag stability, capsid assembly, transport and budding. Although matrix proteins from different retroviruses appear to perform similar functions and can have similar structural folds which predominantly consist of four closely packed α-helices that are interconnected through loops, their primary sequences can be very different []. This entry represents matrix proteins from gamma-retroviruses, such as Moloney murine leukemia virus (MoMLV), Feline leukemia virus (FLV), and Feline sarcoma virus (FESV) [
,
]. This entry also identifies matrix proteins from several eukaryotic endogenous retroviruses, which arise when one or more copies of the retroviral genome become integrated into the host genome [].
Polycomb group proteins are transcriptional repressors that control processes ranging from the maintenance of cell fate decisions and stem cell pluripotency in animals to the control of flowering time in plants. Additional sex combs (Asx) is a member of the polycomb group which is required for maintenance of stable repression of homeotic genes during Drosophila development [
,
].
The domain forms an unusual alpha/beta fold where a six-stranded antiparallel β-sheet is wrapped around a central α-helix, flanked by an additional α-helix and a small sub-domain consisting of a single β-strand and a two-stranded antiparallel β-sheet [
]. It shows weak structural similarities to phosphoribosylformylglycinamidine synthases and some thioesterase superfamily members, but its function is unknown.
The Staphylococcus aureus lrgAB operon negatively regulates murein hydrolase activity and promotes tolerance to penicillin [
]. LrgB inhibits the expression or activity of extracellular murein hydrolases by interacting, possibly together with LrgA, with the holin-like proteins CidA and/or CidB.
T-cell immunomodulatory protein (Tip) is a modulator of T-cell function. It has a protective effect in a graft-versus-host disease model [
] and may protect the parasite against attack by the host immune system by immunomodulation []. This entry also includes LINKIN from C. elegans. LINKIN is a transmembrane protein required for maintaining tissue integrity through cell adhesion and apical polarization. It is suggested to be an adhesion molecule that uses its extracellular domain to bind molecules on the surface of neighbouring cells and its intracellular domain to regulate microtubule dynamics [
].
The lipocalins are a diverse, interesting, yet poorly understood family of proteins composed, in the main, of extracellular ligand-binding proteins displaying high specificity for small hydrophobic molecules [
]. Functions of these proteins include transport of nutrients, control of cell regulation, pheromone transport, cryptic colouration, and the enzymatic synthesis of prostaglandins. The crystal structures of several lipocalins have been solved and show a novel 8-stranded anti-parallel β-barrel fold well conserved within the family. Sequence similarity within the family is at a much lower level and would seem to be restricted to conserved disulphides and 3 motifs, which form a juxtaposed cluster that may act as a common cell surface receptor site [
,
]. By contrast, at the more variable end of the fold are found an internal ligand binding site and a putative surface for the formation of macromolecular complexes []. The anti-parallel β-barrel fold is also exploited by the fatty acid-binding proteins, which function similarly by binding small hydrophobic molecules. Similarity at the sequence level, however, is less obvious, being confined to a single short N-terminal motif. A number of lipocalins act in invertebrate colouration and are represented in this entry. These include: bilin binding protein from the cabbage white butterfly (Pieris brassicae), the closely related protein insecticyanin from Manduca sexta (Tobacco hawkmoth) and the lobster protein crustacyanin. Like other members of the family, they bind small molecules, and gain their colourant properties from interaction with their ligands. Crustacyanin (meaning `shell blue') is the general name given to the carotenoprotein complex found in the epicuticle, or calcified outer layer, of the lobster carapace. It acts as the dominant pigment of the lobster shell, giving rise to its characteristic blue colour. In solution, crustacyanin exists as an equilibrium mixture between several distinct forms, differing in their physical and spectral properties. The native, blue form (alpha-crustacyanin), which predominates in vivo, will, at low ionic strength, form alpha'-crustacyanin; this in turn changes to purple beta-crustacyanin on standing. The alpha to alpha' transition is favoured by low ionic strength and is reversible, while conversion into beta-crustacyanin is irreversible. Native alpha-crustacyanin is an octamer of heterodimers, totalling 16 separate polypeptide chains, with each dimer binding two molecules of astaxanthin; beta-crustacyanin corresponds to the free heterodimer.
Transmembrane (TMEM)-176A and 176B proteins are closely related to MS4A (membrane-spanning 4-domains subfamily A) proteins [
]. Their levels are significantly elevated in certain cancers [,
]. TMEM176B (LR8, Torid, Clast1) is broadly expressed, but was upregulated in antigen-presenting cells in a rat model of allograft tolerance []. Their role in the immune system is unclear.
This family includes type III secretion system (T3SS) chaperone proteins similar to Salmonella enterica SicP. In S. enterica, many serovars of which are serious human pathogens, the T3SS allows injection of the effector SptP, a virulence protein involved in bacterial invasion of the host cell. The chaperone SicP forms a complex with SptP at an early stage of the effector secretion process to avoid premature degradation; at a late stage the complex is dissociated, with the help of the ATPase InvC, which is part of the related T3SS injectisome, so that only SptP is secreted [
,
,
].
RELT-like protein 2 (RELL2) belongs to the RELT family. Overexpression of RELL2 induces activation of the MAPK14/p38 cascade [
]. RELT (receptor expressed in lymphoid tissues) is a member of the TNFR superfamily. The messenger RNA of RELT is especially abundant in hematologic tissues such as spleen, lymph node, and peripheral blood leukocytes as well as in leukemias and lymphomas. RELT is able to activate the NF-kappaB pathway and selectively binds tumor necrosis factor receptor-associated factor 1 [
]. RELT-like proteins 1 and 2 (RELL1 and RELL2) are two RELT homologues that bind to RELT. The expression of RELL1 at the mRNA level is ubiquitous, whereas expression of RELL2 mRNA is more restricted to particular tissues [].
RELT-like protein 1 (RELL1) belongs to the RELT family. Its overexpression induces activation of the MAPK14/p38 cascade [
]. RELT (receptor expressed in lymphoid tissues) is a member of the TNFR superfamily. The messenger RNA of RELT is especially abundant in hematologic tissues such as spleen, lymph node, and peripheral blood leukocytes as well as in leukemias and lymphomas. RELT is able to activate the NF-kappaB pathway and selectively binds tumor necrosis factor receptor-associated factor 1 [
]. RELT-like proteins 1 and 2 (RELL1 and RELL2) are two RELT homologues that bind to RELT. The expression of RELL1 at the mRNA level is ubiquitous, whereas expression of RELL2 mRNA is more restricted to particular tissues [].
Adenoviruses are responsible for diseases such as pneumonia, cystitis, conjunctivitis and diarrhoea, all
of which can be fatal to patients who are immunocompromised []. Viral infection commences with recognition of host cell receptors by means of specialised proteins on viral surfaces. The adenovirus
fibre protein `knob domain' at the C terminus is one such receptor-binding protein subunit. The crystal structure of the knob domain reveals a trimeric organisation, each subdomain folded into 2 functionally
distinct β-sheets. The V sheet is highly conserved, and provides contact surfaces in the formation of the trimer, while the R sheet is more variable, and may play a role in viral-receptor interactions. The
overall shape of the trimer resembles a 3-bladed propeller, with a central surface depression and 3 valleys formed by the symmetry-related R sheets. Sequence comparison of different types of adenovirus fibre protein
suggests an overall similarity in the structure of the knob domain. The main conserved regions lie in the central surface depression around the 3-fold symmetry axis [
]. The N terminus of the protein contains the 'shaft' region.
This family consists of several PV-1 (PLVAP) proteins, which seem to be specific to vertebrates. PV-1 is a component of the endothelial fenestral and stomatal diaphragms [
,
]. PV-1 is retained on the cell surface of endothelial cells by structures capable of forming diaphragms, but undergoes rapid internalization and degradation in the absence of these structures [].
This family consists of several nuclear disruption (Ndd) proteins from T4-like phages. Early in a Bacteriophage T4 infection, the phage ndd gene causes the rapid destruction of the structure of the Escherichia coli nucleoid. The targets of Ndd action may be the chromosomal sequences that determine the structure of the nucleoid [
].
This entry represents Mad1 and Cdc20-bound-Mad2 binding proteins that are involved in the cell-cycle surveillance mechanism called the spindle checkpoint [
]. This mechanism monitors the proper bipolar attachment of sister chromatids to spindle microtubules and ensures the fidelity of chromosome segregation during mitosis. A key player in mitosis is Mad2, which exhibits an unusual two-state behaviour. A Mad1-Mad2 core complex recruits cytosolic Mad2 to kinetochores through Mad2 dimerisation and converts Mad2 to a conformer amenable to Cdc20 binding. p31comet inactivates the checkpoint by binding to Mad1- or Cdc20-bound Mad2 in such a way as to stop Mad2 activation and to promote the dissociation of the Mad2-Cdc20 complex [].
Tra is a member of the regulatory pathway controlling female somatic sexual differentiation, regulated by Sxl. It activates dsx female-specific splicing by promoting the formation of a splicing enhancer complex which consists of tra, tra2 and sr proteins [
].
This family consists of several P-47 proteins from various Clostridium species [
] as well as related sequences from other bacteria. The function of this family is unknown.
This family represents a group of bacterial proteins that are required for the rotation of the flagellar motor and probably form part of a transmembrane proton channel used to energize the flagellar rotary motor. This entry includes MotA and related proteins, such as PomA and LafT [
,
]. These are integral membrane proteins that contain four transmembrane domains.
NIMIN-2 is a member of a novel family of proteins from Arabidopsis (also consisting of NIMIN-1 and NIMIN-3) that interact with NPR1/NIM1, a key regulator of systemic acquired resistance in plants [
].
The PAC (PALE CRESS) protein is required for leaf and chloroplast development [
]. PAC mutation arrests chloroplast development at an early stage, affecting the abundance and maturation of specific chloroplast-encoded transcripts. The PAC protein may be a nucleus-encoded factor that functions in plastid mRNA maturation and accumulation [].
The meiosis-specific kinetochore factor Meikin plays a crucial role in both mono-orientation and centromeric cohesion protection during meiosis I, partly by stabilizing the localization of the cohesin protector shugoshin, and also by recruiting Polo-like kinase PLK1 to the kinetochores. PLK1 is required for mono-orientation and the protection of centromeric cohesion [].
Centromere protein R (CENP-R, also known as NRIF3) is a transcription co-regulator that can have both co-activator and co-repressor functions [
,
]. It is involved in the co-activation of nuclear receptors for retinoid X (RXRs) and thyroid hormone (TRs) in a ligand-dependent fashion [,
]. It is a probable component of a centromeric complex involved in assembly of kinetochore proteins, mitotic progression and chromosome segregation [].
This family consists of several Enterobacterial FlhE flagellar proteins. The absence of FlhE results in a proton leak through the flagellar system, inappropriate secretion patterns, and cell death. FlhE is a member of the flhBAE operon. FlhA and FlhB are established components of the flagellar type III secretion system. However, the function of FlhE is not clear [
].
This family consists of several animal Gemin6 proteins. The exact function of Gemin6 is unknown, but it has been found to form part of the survival of motor neuron (SMN) complex. The SMN complex plays a key role in the biogenesis of spliceosomal small nuclear ribonucleoproteins (snRNPs) and other ribonucleoprotein particles [
].
This family of conserved hypothetical proteins has no known function. Proteins homologous to MJ0570 of Methanocaldococcus jannaschii (Methanococcus jannaschii) include the apparent orthologs in this family, the much longer protein YLR143W from Saccharomyces cerevisiae (Baker's yeast), and additional homologues from Archaeoglobus fulgidus and Pyrococcus horikoshii that appear to represent a second orthologous group.
Many bacterial pathogens deliver effector proteins into host cells via a type III secretion system. These effector proteins then alter the host cell's biology in ways that are advantageous to the pathogen. The NleG protein and its homologues form the largest family of effector proteins in the enterohemorrhagic Escherichia coli O157:H7, with 14 members identified in the Sakai strain alone [
]. NleG family members share a conserved C-terminal domain that forms a structure similar to the RING finger/U-box domain found in eukaryotic ubiquitin ligases []. They selectively interact with human E2 ubiquitin conjugating enzymes and exhibit in vitro activity typical of eukaryotic E3 ligases, though the role of this activity in pathogenesis is not yet known.
This small protein, designated YqfC in Bacillus subtilis, is both restricted to and universal in sporulating species of the Firmicutes, such as Bacillus subtilis and Clostridium perfringens. It is part of the sigma(E)-controlled regulon, and its mutation leads to a sporulation defect [
]. This protein is uncharacterized.
Transmembrane protein PVRIG, also known as CD112 receptor (CD112R), is the cell surface receptor for NECTIN2 (CD antigen CD112). CD112R functions as a coinhibitory receptor for T cells, competing with CD226 to bind to CD112 [
].
The sefABC genes make up part of a complex sef operon responsible for the expression and assembly of SEF14 fimbriae. sefA encodes a fimbrin, the structural subunit of SEF14 fimbriae [
]. Possession of SEF14 fimbriae alone does not appear to play a significant role in the pathogenesis of Salmonella enteritidis []. This family also includes adhesin CS22 and CS15 (antigen 8786), which share homology with fimbria SEF14 of Salmonella enteritidis [
].
Competence is the ability of a cell to take up exogenous DNA from its environment, resulting in transformation. It is widespread among bacteria and is probably an important mechanism for the horizontal transfer of genes. DNA usually becomes available by the death and lysis of other cells. Competent bacteria use components of extracellular filaments called type 4 pili to create pores in their membranes and pull DNA through the pores into the cytoplasm. This process, including the development of competence and the expression of the uptake machinery, is regulated in response to cell-cell signalling and/or nutritional conditions [
]. This family consists of several bacterial ComK proteins. ComK of Bacillus subtilis is a positive autoregulatory protein occupying a central position in the competence-signal-transduction network. It positively regulates the transcription of late competence genes, which specify morphogenetic and structural proteins necessary for construction of the DNA-binding and uptake apparatus, as well as the transcription of comK itself [
,
]. ComK specifically binds to the promoters of the genes that it affects. It has been found that ClpX plays an important role in the regulation of ComK at the post-transcriptional level [].
This entry represents the C-terminal domain of protein UNC80 from eukaryotes, a component of the NALCN sodium channel complex. NALCN is a voltage-independent cation channel activated by substance P, neurotensin, acetylcholine and noradrenaline that controls neuronal excitability. UNC80 forms a complex with UNC79 and both are key regulators of the channel and required for the proper expression and axonal localisation of NALCN. UNC80 is required for NALCN control by GPCRs and essential for its sensitivity to extracellular calcium. This protein acts as a scaffold for Src family tyrosine kinases (SFKs) and UNC79 to mediate interaction with NALCN [
,
,
].
This entry includes chromodomain-helicase-DNA-binding protein 8 (CHD8) from animals. CHD8 regulates transcription. It acts as a repressor in several pathways: repressing transcription by remodeling chromatin structure (binding to histone H3 di- and trimethylated on Lys4) and recruiting histone H1 to target genes [
]; suppressing p53/TP53-mediated apoptosis by recruiting histone H1 and preventing p53/TP53 transactivation activity []; and negatively regulating the Wnt signaling pathway by regulating beta-catenin activity []. By interacting with CTCF, it is involved in epigenetic remodeling []. CHD8 can also act as a transcription activator, participating in efficient U6 RNA polymerase III transcription by interacting with ZNF143. Gene knockout is embryonically lethal []. CHD8 has been linked to autism spectrum disorder (ASD) [].
Colicins, which are produced by bacteria carrying the corresponding Col plasmids, kill sensitive Escherichia coli cells using different mechanisms. Colicin E5 is a tRNase toxin. The immunity protein ImmE5 is a specific inhibitor of colicin E5 that is expressed to protect the host cells. It binds to the C-terminal ribonuclease domain (CRD) of E5 to prevent cell death [
,
].
Atg3 is the E2 enzyme for the LC3 lipidation process [
]. It is essential for autophagocytosis. The super protein complex, the Atg16L complex, consists of multiple Atg12-Atg5 conjugates. Atg16L has an E3-like role in the LC3 lipidation reaction. The activated intermediate, LC3-Atg3 (E2), is recruited to the site where the lipidation takes place []. Atg3 catalyses the conjugation of Atg8 and phosphatidylethanolamine (PE). Atg3 has an α/β-fold, and its core region is topologically similar to canonical E2 enzymes. Atg3 has two regions inserted in the core region and another with a long α-helical structure that protrudes from the core region as far as 30 Å [
]. It interacts with Atg8 through an intermediate thioester bond between Cys-288 and the C-terminal Gly of Atg8. It also interacts with the C-terminal region of the E1-like Atg7 enzyme.
Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites [
,
]. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively. In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome [
,
]. This entry consists of several eukaryotic mitochondrial 28S ribosomal protein S30 (or programmed cell death protein 9, PDCD9) sequences. The exact function is unknown, although it is known to be a component of the mitochondrial ribosome and of cellular apoptotic signalling pathways []. The entry also contains the mitochondrial 39S ribosomal protein L37 and 39S ribosomal protein S30.
This family of proteins is found in bacteriophages such as Bacteriophage phiMH2K and Bacteriophage Chp1. Proteins in this family are typically between 81 and 96 amino acids in length. These proteins appear to have a role in phage packaging.
TGDs are trigalactosyldiacylglycerol proteins required for chloroplast membrane lipid synthesis. The TGD1, -2, and -3 proteins form a putative ABC (ATP-binding cassette) transporter transporting ER-derived lipids through the inner envelope membrane of the chloroplast, while TGD4 binds phosphatidic acid and resides in the outer chloroplast envelope [
]. This entry represents TGD4, which is involved in endoplasmic reticulum-to-chloroplast lipid trafficking [
].
This family includes keratin-associated protein 7-1, a member of the type 7 family. In the hair cortex, hair keratin intermediate filaments are embedded in an interfilamentous matrix, consisting of hair keratin-associated proteins (KRTAP), which are essential for the formation of a rigid and resistant hair shaft through their extensive disulphide bond cross-linking with abundant cysteine residues of hair keratins. The matrix proteins include the high-sulphur and high-glycine-tyrosine keratins.
Thioredoxins (Trxs) are ubiquitous enzymes with a CXXC active site that catalyses the reduction of disulfide bonds. This entry represents a group of plant thioredoxin-like proteins, including AtCDSP32 from Arabidopsis and OsCDSP32 from rice. AtCDSP32 includes two Trx modules with one potential active site (219)CGPC(222) and three extra Cys; this region is responsible for the insulin reduction activity of the protein [
]. It forms a heterodimeric complex with MSRB1 (methionine sulfoxide reductase B1) via reduction of the sulfenic acid formed on the MSRB1 catalytic Cys after methionine sulfoxide reduction []. Thioredoxins [
,
,
,
] are small disulphide-containing redox proteins that have been found in all the kingdoms of living organisms. Thioredoxin serves as a general protein disulphide oxidoreductase. It interacts with a broad range of proteins by a redox mechanism based on reversible oxidation of two cysteine thiol groups to a disulphide, accompanied by the transfer of two electrons and two protons. The net result is the covalent interconversion of a disulphide and a dithiol. In the NADPH-dependent protein disulphide reduction, thioredoxin reductase (TR) catalyses the reduction of oxidised thioredoxin (trx) by NADPH using FAD and its redox-active disulphide; reduced thioredoxin then directly reduces the disulphide in the substrate protein []. Thioredoxin is present in prokaryotes and eukaryotes and the sequence around the redox-active disulphide bond is well conserved. All thioredoxins contain a cis-proline located in a loop preceding β-strand 4, which makes contact with the active site cysteines, and is important for stability and function [
]. Thioredoxin belongs to a structural family that includes glutaredoxin, glutathione peroxidase, bacterial protein disulphide isomerase DsbA, and the N-terminal domain of glutathione transferase []. Thioredoxins have a beta-alpha unit preceding the motif common to all these proteins. A number of eukaryotic proteins contain domains evolutionarily related to thioredoxin; most of them are protein disulphide isomerases (PDI). PDI [
,
,
] is an endoplasmic reticulum multi-functional enzyme that catalyses the formation and rearrangement of disulphide bonds during protein folding [
]. All PDIs contain two or three (ERp72) copies of the thioredoxin domain, each of which contributes to disulphide isomerase activity, but which are functionally non-equivalent []. Moreover, PDI exhibits chaperone-like activity towards proteins that contain no disulphide bonds, i.e. behaving independently of its disulphide isomerase activity []. The various forms of PDI which are currently known are: the PDI major isozyme, a multifunctional protein that also functions as the beta subunit of prolyl 4-hydroxylase, as a component of oligosaccharyl transferase, as thyroxine deiodinase, as glutathione-insulin transhydrogenase and as a thyroid hormone-binding protein; ERp60 (ER-60; 58 kDa microsomal protein), which was originally thought to be a phosphoinositide-specific phospholipase C isozyme and later to be a protease; ERp72; and ERp5. Bacterial proteins that act as thiol:disulphide interchange proteins and allow disulphide bond formation in some periplasmic proteins also contain a thioredoxin domain. These proteins include: Escherichia coli DsbA (or PrfA) and its orthologs in Vibrio cholerae (TcpG) and Haemophilus influenzae (Por); E. coli DsbC (or XpRA) and its orthologues in Erwinia chrysanthemi and H. influenzae; E. coli DsbD (or DipZ) and its H. influenzae orthologue; E. coli DsbE (or CcmG) and orthologues in H. influenzae; and Rhodobacter capsulatus (Rhodopseudomonas capsulata) (HelX) and Rhizobiaceae (CycY and TlpA).
This entry represents F-box proteins SNE and GID2. In Arabidopsis, SNE (SNEEZY or SLY2, At5g48170) and its closest homologue GID2 (also known as SLEEPY1, SLY1) are the F-box subunits of a Skp-Cullin-F-box (SCF) E3 ubiquitin ligase complex that positively regulates the gibberellin (GA) signaling pathway. SCF interacts with its substrates, the DELLA proteins, to promote their ubiquitination and degradation, and mediate GA responses [
]. SNE over-expression can partially compensate for the dwarf phenotype of the sly1-10 mutant []. It may function as a redundant positive regulator of GA signaling [].
This family of baculovirus proteins is represented by Autographa californica nuclear polyhedrosis virus (AcMNPV) protein AC11 (Orf11). ac11 is an early gene essential for budded-virus production and occlusion-derived-virus envelopment [
].
This entry includes the transmembrane protein 126 A/B (TMEM126A/B) from animals. Human TMEM126B participates in constructing the membrane arm of mitochondrial respiratory complex I [
].
This family consists of several bacteriophage T4-like capsid assembly (or portal) proteins. The exact mechanism by which the double-stranded (ds) DNA bacteriophages incorporate the portal protein at a unique vertex of the icosahedral capsid is unknown. In phage T4, there is evidence that this vertex, constituted by 12 subunits of gp20, acts as an initiator for the assembly of the major capsid protein and the scaffolding proteins into a prolate icosahedron of precise dimensions. Regulation of portal protein gene expression is an important control point in prohead assembly in bacteriophage T4 []. This family represents the protease responsible for the proteolysis of head proteins, a critical step in the morphogenesis of many tailed phages. Cleavage facilitates the conversion of the prohead to the mature capsid. All these cleavages are carried out by action at consensus S/A/G-X-E recognition sequences at 39 cleavage sites. Evidence of multiple processing sites in nine phiKZ proteins appears to represent a built-in mechanism by which the phage ensures that the majority of the propeptide regions are removed, and emphasizes the essential nature of processing in phiKZ-head morphogenesis []. The family is classified by MEROPS as a serine peptidase.
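As an illustration of the S/A/G-X-E consensus mentioned above, the following minimal Python sketch scans a protein sequence for candidate recognition motifs. The function name find_motif_sites and the toy sequence are hypothetical and chosen only for this example; a motif match alone does not show that a site is actually cleaved.

```python
import re

# Consensus from the text: S, A or G, then any residue (X), then E.
# The lookahead lets overlapping candidates be reported as well.
MOTIF = re.compile(r"(?=([SAG].E))")


def find_motif_sites(sequence):
    """Return (1-based start position, matched tripeptide) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real phage protein.
    toy_sequence = "MKSLEAGTEQQAAE"
    for position, tripeptide in find_motif_sites(toy_sequence):
        print(position, tripeptide)
```

A three-residue pattern of this kind is expected to match many times by chance in any sizeable proteome, so such a scan is best treated as a first filter ahead of experimental evidence.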
This is a family of conserved poxvirus proteins related to Protein B14 and Protein B22/C16 from Vaccinia virus. B14 contributes to virulence by binding to the host IKBKB subunit of the IKK complex and preventing host NF-kappa-B activation in response to pro-inflammatory stimuli such as TNF-alpha or IL1B. Mechanistically, it sterically hinders the direct contact between the kinase domains of IKBKB in the IKK complex containing IKBKB, CHUK/IKKA and NEMO [,
,
].
This family consists of several invasion associated locus B (IalB) proteins and related sequences. IalB is known to be a major virulence factor in Bartonella bacilliformis, where it was shown to have a direct role in human erythrocyte parasitism. IalB is up-regulated in response to environmental cues signalling vector-to-host transmission. Such environmental cues include, but are not limited to, temperature, pH, oxidative stress, and haemin limitation. It is also thought that IalB aids B. bacilliformis survival under stress-inducing environmental conditions [].