Histone-lysine N-methyltransferase ATXR3, also known as protein SET DOMAIN GROUP 2 (SDG2), is a histone methyltransferase specifically required for trimethylation of Lys-4 of histone H3 (H3K4me3) and is crucial for both sporophyte and gametophyte development [
,
]. This domain includes the POST_SET domain located at the C terminus of SDG2, although there is no significant sequence homology with other Arabidopsis SDGs outside of the SET domain [
].
Proline reductase-associated electron transfer protein PrdC
Type:
Family
Description:
Members of this family are encoded near the prdA and prdB genes for proteins of the proline reductase complex. They are induced by proline, and have been designated as PrdC [
]. Some members are selenoproteins (at two different positions), as is PrdB. Members are homologous to, but distinct from, electron transport protein RnfC [].
This entry includes type 2A encapsulin shell protein SrpI from Synechococcus elongatus and its homologues from Mycobacteria, such as
from M. leprae, which has been described as the most antigenic protein in human leprosy patients and useful for serodiagnosis of leprosy [
,
,
]. SrpI forms a nanocompartment that is probably involved in sulfur metabolism []. Some members of this entry contain an N-terminal cyclic nucleotide binding domain. This entry also includes proteins annotated as phage capsid proteins, such as
, suggesting that in bacteria this protein is a co-opted phage protein. There is a clear evolutionary relationship between the bacterial encapsulins and Caudovirales bacteriophages as they share the HK97 fold of the capsid proteins.
The plant high mobility group (HMG) proteins can be grouped into two families: HMGB (HMG 1/2) and HMGA (HMG I/Y). This entry represents a subset of the HMGB family from plants, including Arabidopsis HMGB1-5. HMGB proteins bind to chromosomal DNA via their DNA-binding motif, an HMG box, and induce structural changes of chromatin [
].
This superfamily represents a repeated domain found in the amino terminal half of the Major Vault Protein (MVP). Vaults are one of the largest cytoplasmic ribonucleoprotein particles, found in numerous eukaryotic species. The cellular function carried out by these ribonucleoproteins remains unclear, although roles in multidrug resistance and innate immunity have been suggested []. The primary sequence of MVP consists of seven short sequence repeats of ~55 amino acid residues followed by a central region and a coiled-coil protein association domain, which mediates the interactions between MVP molecules [].
Nitrogen regulation protein NtrY-like, N-terminal domain
Type:
Domain
Description:
This entry represents the probable sensor region of the two-component regulator protein NtrY, predominantly found in proteobacteria [
,
]. This region includes four probable transmembrane helices.
CcbP is a Ca(2+)-binding protein found in bacteria which is thought to bind Ca(2+) by protein surface charge. When bound to Ca(2+), the protein becomes more compact and the level of free calcium decreases. Within the bacteria, Ca(2+) has a role in the early stages of heterocyst differentiation. The free Ca(2+) concentration, which is regulated by CcbP, is critical for the differentiation process []. Calcium signalling is widespread in bacterial species, and prokaryotic cells, like eukaryotes, are equipped with all the elements to maintain Ca(2+) homeostasis []. This superfamily represents a domain of 5 anti-parallel beta strands arranged into a beta barrel from the calcium binding protein CcbP. The 2nd and 3rd strands are linked by a short 3(10)-helix. The 4th and 5th strands are also linked by a short 3(10)-helix.
NosL is one of the accessory proteins of the nos (nitrous oxide reductase) gene cluster. NosL is a monomeric protein of 18,540 MW that specifically and stoichiometrically binds Cu(I). The copper ion in NosL is ligated by a Cys residue, and one Met and one His are thought to serve as the other ligands. It is possible that NosL is a copper chaperone involved in metallocentre assembly [
]. This entry also contains HTH-type transcriptional repressors, including YcnK. YcnK may act as a negative transcriptional regulator of YcnJ in the presence of copper and may use copper as a corepressor. The gene, ycnK, is significantly induced under copper-limiting conditions [
].
BTB/POZ domain-containing protein KCTD18, C-terminal domain
Type:
Domain
Description:
This entry represents the C-terminal region of BTB/POZ domain-containing protein KCTD18. This domain is highly variable among the BTB domain-containing KCTD proteins and it has been proposed that it could be used to bind and recruit diverse cellular proteins destined for different processes such as ubiquitination and degradation or regulation of G-protein coupled receptors [
,
,
].
Stealth_CR2 is the second of several highly conserved regions on stealth proteins in metazoa and bacteria. There are up to four CR regions on all member proteins. CR2 carries a well-conserved NDD sequence motif. The domain is found in tandem with CR1, CR3 and CR4 on both potential metazoan hosts and pathogenic eubacterial species that are capsular polysaccharide phosphotransferases. The CR domains appear on eukaryotic proteins such as GNPTAB, N-acetylglucosamine-1-phosphotransferase subunits alpha/beta. Horizontal gene transfer of these sequence regions seems to have occurred between host and bacteria, enabling the bacteria to evade detection by the host innate immune system [
].
This is a family of proteins conserved in yeasts. Saccharomyces cerevisiae Aim21 may be involved in mitochondrial migration along actin filaments []. It may also interact with ribosomes [].
Contractile injection systems (CISs) are cell-puncturing nanodevices that share ancestry with contractile tail bacteriophages. This entry represents a group of proteins found mostly in bacteria as part of the Photorhabdus virulence cassette (PVC), one group of extracellular CISs [
]. This protein is the tape measure which dictates the length of the syringe tail. It is found in low copy number and is most likely not present in the mature syringes. Deletion of the tape measure protein Pvc14 from P. asymbiotica resulted in large variations in the length of the syringes [
]. The N- and C-terminal domains of these proteins are well conserved, but the central region varies widely in length.
DNA-dependent protein kinase catalytic subunit, CC5
Type:
Domain
Description:
This entry represents the C-terminal region of the Circular Cradle segment (CC) of the DNA-dependent protein kinase catalytic subunit (DNA-PKcs), containing most of the supersecondary structure CC4 and the complete CC5. Structural studies revealed that this domain contains the Ku-binding site B and one of the caspase-3 cleavage sites [
]. DNA-dependent protein kinase catalytic subunit (DNA-PKcs) is involved in DNA nonhomologous end joining (NHEJ): it is recruited by the Ku70/80 heterodimer to DNA ends and is required for double-strand break (DSB) repair. DNA-PKcs phosphorylates a number of protein substrates, including the heat shock protein 90 (HSP90), the transcription factors p53, specificity protein 1 (Sp1) and MYC23, and a majority of NHEJ factors. It folds into three well-defined large structural units, consisting of an N-terminal region (arranged in four supersecondary α-helical structures, N1 to N4), the Circular Cradle (consisting of five supersecondary α-helical structures, CC1 to CC5), and the C-terminal Head (comprising FAT, FRB, kinase, and FATC). The N-terminal and CC regions resemble HEAT repeats, and thus they are also referred to as N-HEAT and M-HEAT ('middle'), respectively. The N-terminal region likely mediates DNA binding and, together with the CCs, forms a ring through which Ku70/80 may present DNA for repair. The CCs form a curved elliptical ring that serves as a scaffold to maintain the integrity of the whole complex. It has been suggested that the binding of Ku or DNA activates the allosteric mechanism required for communication between the N terminus and the CC with the kinase in the Head [
,
,
,
].
Protein phosphatase 1 regulatory subunit 35 has been identified as a PP1 interacting protein [
] and shown to inhibit PP1-alpha phosphatase activity [].
This domain is known as VMAP-M0. Most members of this entry belong to the class Actinomycetia. This domain represents a highly variable central region of the vWA-MoxR associated protein (VMAP) of the classical ternary system (vWA-MoxR-VMAP) in NTP-dependent conflict systems. Using sequence similarity-based clustering it has been possible to identify 29 distinct versions named VMAP-M0-28, that may be involved in sensing of invasive entities [
]. It is predicted to adopt an all-α-helical secondary structure.
This family contains several mammalian ecotropic viral integration site 2A (EVI2A) proteins. The function of this protein is unknown, although it is thought to be a membrane protein and may function as an oncogene in retrovirus-induced myeloid tumours [
,
].
Acidic fibroblast growth factor intracellular-binding protein
Type:
Family
Description:
Acidic fibroblast growth factor (aFGF) intracellular binding protein (FIBP) is a protein found mainly in the nucleus that is thought to be involved in the intracellular function of aFGF [
].
This entry represents a family of plant proteins that contain a C3H1-type zinc finger, including the Zinc finger CCCH domain-containing protein 13 from Oryza sativa subsp. japonica (OsC3H13, also known as the BRI1-kinase domain-interacting protein 105) and the Zinc finger CCCH domain-containing protein 40 from Arabidopsis thaliana (AtC3H40).
Retrograde transport protein Dsl1 N-terminal domain
Type:
Domain
Description:
Dsl1 is a peripheral membrane protein, part of the DSL1 complex, required for transport between the Golgi and the endoplasmic reticulum [
]. It is localised to the ER membrane, and bridges the other two subunits of the complex, Tip20 and Sec39. The DSL1 complex, together with conserved oligomeric Golgi (COG), Golgi-associated retrograde protein (GARP), and the exocyst, is a member of the complexes associated with tethering containing helical rods (CATCHR) family, which are evolutionarily related and share structural features []. This complex interacts at its base with ER SNAREs and at its tip with COPI vesicles; it thus captures Golgi-derived COPI vesicles and facilitates their fusion with the ER, upstream of SNARE complex formation [,
,
]. The C-terminal domain is involved in binding to the Sec39 subunit of the Dsl1p complex [,
]. The N-terminal domain forms a complex with another subunit of the Dsl1p complex, Tip20; the two proteins form heterodimers by pairing their N termini [].
Retrograde transport protein Dsl1, C-terminal domain
Type:
Domain
Description:
Dsl1 is a peripheral membrane protein, part of the DSL1 complex, required for transport between the Golgi and the endoplasmic reticulum [
,
]. It is localised to the ER membrane, and bridges the other two subunits of the complex, Tip20 and Sec39. The DSL1 complex, together with conserved oligomeric Golgi (COG), Golgi-associated retrograde protein (GARP), and the exocyst, is a member of the complexes associated with tethering containing helical rods (CATCHR) family, which are evolutionarily related and share structural features []. This complex interacts at its base with ER SNAREs and at its tip with COPI vesicles; it thus captures Golgi-derived COPI vesicles and facilitates their fusion with the ER, upstream of SNARE complex formation [,
,
]. The C-terminal domain is involved in binding to the Sec39 subunit of the Dsl1p complex [,
]. The N-terminal domain forms a complex with another subunit of the Dsl1p complex, Tip20; the two proteins form heterodimers by pairing their N termini [].
The myelin sheath is a multi-layered membrane, unique to the nervous system, that functions as an insulator to greatly increase the velocity of axonal impulse conduction [
]. Myelin proteolipid protein (PLP or lipophilin) [] is the major myelin protein from the central nervous system (CNS). It probably plays an important role in the formation or maintenance of the multilamellar structure of myelin. In humans, point mutations in PLP are the cause of Pelizaeus-Merzbacher disease (PMD), a neurologic disorder of myelin metabolism. In animals, dysmyelinating diseases such as mouse 'jimpy' (jp), rat md, or dog 'shaking pup' are also caused by mutations in PLP. PLP is a highly conserved [
] hydrophobic protein of 276 to 280 amino acids which seems to contain four transmembrane segments and two disulphide bonds, and which covalently binds lipids (at least six palmitate groups in mammals) []. PLP is highly related to M6, a neuronal membrane glycoprotein [
].
Cysteine-rich and transmembrane domain-containing protein 1-like
Type:
Family
Description:
This family includes a group of eukaryotic proteins containing the uncharacterised cysteine-rich, transmembrane (TM) module, CYSTM, found in a wide range of tail-anchored membrane proteins, which may play a role in stress tolerance [
]. This entry includes CYSTM1 from animals, whose function is not clear, the uncharacterized protein YBR016W from yeast and AthCYSTM10 (also known as WINDHOSE 3) from Arabidopsis. AthCYSTM10 is involved in resistance to abiotic stress [].
Guanine nucleotide-binding protein subunit gamma 1/2
Type:
Family
Description:
This protein family represents Guanine nucleotide-binding protein subunit gamma 1 from Arabidopsis thaliana (GG1) and Guanine nucleotide-binding protein subunit gamma 2 (GG2) from Oryza sativa. These G proteins are involved as modulators or transducers in various transmembrane signaling systems [
]. Members of this family are only found in streptophytes. In Arabidopsis thaliana, GG1 and GG2 are functionally interchangeable in specific pathways [
].
Protein of unknown function DUF3556, transmembrane
Type:
Family
Description:
This family of transmembrane proteins is functionally uncharacterised. This protein is found in bacteria. Proteins in this family are typically between 576 and 592 amino acids in length.
The Bacteriophage T4 gene 59 helicase assembly protein (Gp59) is required for recombination-dependent DNA replication and repair, which is the predominant mode of DNA replication in the late stage of T4 infection. Gp59 accelerates the loading of the T4 gene 41 helicase during DNA synthesis by the T4 replication system in vitro. This protein binds to both T4 gene 41 helicase and T4 gene 32 single-stranded DNA binding protein, and to single and double-stranded DNA [
]. The structure of Gp59 helicase assembly protein reveals a novel α-helical bundle fold with two domains of similar size. Surface residues are predominantly basic (pI 9.37) with clusters of acidic residues, but exposed hydrophobic residues suggest sites for potential contact with DNA and with other protein molecules [
]. The N-terminal domain shares structural homology with the high mobility group (HMG) proteins from eukaryotic organisms and it has been suggested that it plays a role in duplex DNA binding ahead of the fork. The C-terminal domain interacts with the helicase (T4 gp41) and with SSB (single-stranded binding protein T4 gp32) [].
Kinetochore protein Cenp-F/LEK1, Rb protein-binding domain
Type:
Domain
Description:
This entry represents the Rb protein-binding domain from the centromere protein Cenp-F. Cenp-F is a centromeric kinetochore, microtubule-binding protein consisting of two 1,600-amino acid-long coils, that is involved in chromosome segregation during mitosis and is essential for the full functioning of the mitotic checkpoint pathway [
,
]. Cenp-F interacts with retinoblastoma protein (RB), CENP-E and BUBR1. This domain is at the very C terminus of the C-terminal coiled-coil region, and binds to the Rb family of tumour suppressors [].
CRISPR-associated protein Cas6, N-terminal domain superfamily
Type:
Homologous_superfamily
Description:
The CRISPR-Cas system is a prokaryotic defense mechanism against foreign genetic elements. The key elements of this defense system are the Cas proteins and the CRISPR RNA. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are a family of DNA direct repeats separated by regularly sized non-repetitive spacer sequences that are found in most bacterial and archaeal genomes [
]. CRISPRs appear to provide acquired resistance against mobile genetic elements (viruses, transposable elements and conjugative plasmids). CRISPR clusters contain sequences complementary to antecedent mobile elements and target invading nucleic acids. CRISPR clusters are transcribed and processed into CRISPR RNA (crRNA).
The defense reaction is divided into three stages. In the adaptation stage, the invader DNA is cleaved, and a piece of it is selected to be integrated as a new spacer into the CRISPR locus, where it is stored as an identity tag for future attacks by this invader. During the second stage (the expression stage), the CRISPR RNA (pre-crRNA) is transcribed and subsequently processed into the mature crRNAs. In the third stage (the interference stage), Cas proteins, together with crRNAs, identify and degrade the invader [
,
,
]. The CRISPR-Cas systems have been sorted into three major types. In CRISPR-Cas types I and III, the mature crRNA is generally generated by a member of the Cas6 protein family. Whereas in type III systems the Cas6 protein acts alone, in some type I systems it is part of a complex of Cas proteins known as Cascade (CRISPR-associated complex for antiviral defense). The Cas6 protein is an endoribonuclease necessary for crRNA production, whereas the additional Cas proteins that form the Cascade complex are needed for crRNA stability [
]. Cas6 is a member of the RAMP (repeat-associated mysterious protein) superfamily [
]. It is among the most widely distributed Cas proteins and is found in both bacteria and archaea []. Cas6 functions in the generation of CRISPR-derived guide RNAs for invader defense in prokaryotes []. The structure of this protein showed it adopts a tandem ferredoxin/RRM fold []. This superfamily represents the N-terminal domain of Cas6 proteins, which has a ferredoxin-like fold [].
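The repeat-spacer layout of a CRISPR locus described above (direct repeats separated by regularly sized spacers) can be illustrated with a minimal Python sketch. The function name, the repeat sequence and the spacer sequences are hypothetical placeholders chosen for illustration; they are not taken from any real genome or from this entry, and real arrays contain degenerate repeats that exact string matching would miss.

```python
def split_crispr_array(array: str, repeat: str) -> list[str]:
    """Return the spacers lying between consecutive exact copies of `repeat`."""
    pieces = array.split(repeat)
    # pieces[0] is the sequence before the first repeat and pieces[-1] the
    # sequence after the last repeat; only the pieces in between are spacers.
    return [piece for piece in pieces[1:-1] if piece]


# Hypothetical toy sequences (assumptions, not real CRISPR data).
REPEAT = "GTTTTAGA"
SPACERS = ["ACGTACGTAC", "TTGACCATGA"]
ARRAY = "AAAT" + REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT + "CCC"

spacers = split_crispr_array(ARRAY, REPEAT)
# A single value in this set means the spacers are regularly sized.
spacer_lengths = {len(s) for s in spacers}
```

In this toy array the three repeat copies delimit two spacers; each recovered spacer corresponds to one stored "identity tag" in the sense of the adaptation stage described above.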
Replication initiator protein RctB, central region
Type:
Domain
Description:
This region is found in the replication initiator protein RctB from Vibrio cholerae and similar sequences predominantly found in Vibrionales. RctB mediates oriCII-based replication and other regulatory processes. This protein is organised into four domains and adopts a dimeric configuration. The dimer interface lies in one of the two central domains, where a seven-stranded β-sheet forms a swapped configuration with the same domain of the other monomer. It seems to be involved in DNA binding [].
This entry represents the PorV protein. The Bacteroidetes type 9 secretion system (T9SS) contains at least 15 proteins. PorQ, PorU, PorV and PorZ form a cell surface-exposed attachment complex that proteolytically removes the substrate CTD following transport and covalently links some substrate proteins to a lipopolysaccharide anchor. A pool of PorV proteins in the outer membrane is thought to shuttle substrate proteins from the translocon to the attachment complex. The PorV protein has a membrane beta barrel topology [
].
In the nematode Caenorhabditis elegans, RNA interference (RNAi) silencing signals are efficiently taken up from the environment and transported between cells and tissues [
]. SID-5 is a C. elegans endosome-associated protein that promotes the transport of RNAi silencing signals between cells [].
Mitochondria fission protein Fis1, cytosolic domain
Type:
Domain
Description:
Fis1, along with Dnm1 and Mdv1, is an essential protein in mediating mitochondrial fission. Dnm1 and Fis1 are highly conserved, with a common mechanism in disparate species. In mutants of these proteins, mitochondrial fission is impaired, resulting in networks of undivided mitochondria. Fis1 appears to act via the recruitment of division complexes to the mitochondrial outer membrane. Dnm1 is recruited to mitochondria via interactions with the adaptor proteins Caf4 and Mdv1, which bind directly to Fis1 [
,
]. The Fis1 N terminus is cytosolic and tethered to the mitochondrial outer membrane via a C-terminal transmembrane domain. The cytosolic domain of Fis1 forms a six-helix bundle, in which the central four helices consist of two tandem tetratricopeptide repeat (TPR)-like motifs, known to mediate protein-protein interactions [
,
,
,
].
This is the N-terminal domain of Stage II sporulation protein E (SpoIIE), which includes the 10 transmembrane domains and the regulatory domain, the latter located adjacent to the PP2C phosphatase domain. The regulatory domain mediates the formation of dimers between SpoIIE molecules, required for phosphatase activation [
]. An N-terminal 45-residue long α-helix makes intramolecular contacts with the switch helices (alpha1 and alpha2) of the phosphatase domain [].
BTB/POZ domain-containing protein KCTD11/21, C-terminal domain
Type:
Domain
Description:
This entry represents the C-terminal region of BTB/POZ domain-containing protein KCTD11 (the KCASH1 protein, also known as REN: EGF-, NGF- and retinoic acid-responsive gene) and its homologue, KCTD21 (the KCASH2 protein). These members of the KCASH family (KCTD containing, Cullin3 adaptor, suppressor of Hedgehog) share high similarity and a BTB domain responsible for oligomerization and formation of a Cullin3 ubiquitin ligase complex. They are specifically expressed in brain and cerebellum where they have a role in neuronal differentiation, and suppressor activity on acetylation-dependent Hedgehog/Gli signalling through ubiquitination and degradation of histone deacetylase 1 (HDAC1). These proteins have been described as tumor suppressors that inhibit medulloblastoma growth by antagonizing Hedgehog signalling [
,
,
].
Baseplate structural protein Gp11, C-terminal domain
Type:
Homologous_superfamily
Description:
The bacteriophage baseplate controls host cell recognition, attachment, tail sheath contraction and viral DNA ejection. The baseplate is a multi-subunit assembly at the distal end of the tail, which is composed of long and short tail fibres [
]. The tail region is responsible for attachment to the host bacteria during infection: long tail fibres enable host receptor recognition, while irreversible attachment is via short tail fibres. Recognition and attachment induce a conformational transition of the baseplate from a hexagonal to a star-shaped structure. In viruses such as Bacteriophage T4, Gp11 acts as a structural protein to connect the short tail fibres to the baseplate, while Gp9 connects the baseplate with the long tail fibres. Both Gp9 and Gp11 are trimers. Each Gp11 monomer consists of three domains, which are entwined together in the trimer: the N-terminal domains of the three monomers form a central, trimeric, parallel coiled coil surrounded by the entwined middle finger domains; the C-terminal domains appear to be responsible for trimerisation []. This superfamily represents the C-terminal domain of Gp11.
F-actin-capping protein subunit beta, N-terminal domain
Type:
Homologous_superfamily
Description:
The F-actin capping protein (CP) is a heterodimer composed of two unrelated alpha and beta subunits. Neither of the subunits shows sequence similarity to other filament-capping proteins [
]. This superfamily represents the N-terminal domain of the beta subunit of F-actin capping protein. It is part of a protein of about 280 amino acid residues whose sequence is well conserved in eukaryotic species []. In Drosophila, mutations in the alpha and beta subunits cause actin accumulation and subsequent retinal degeneration [].
Non-structural protein NSP15, middle domain superfamily
Type:
Homologous_superfamily
Description:
The unique coronavirus transcription/replication machinery, comprised of multiple virus-encoded non-structural proteins (NSP), plays a vital role during initial and intermediate phases of the viral life cycle. NSP15 forms a hexamer made of dimers of trimers which is suggested to be a functional unit, responsible for the endoribonuclease activity. The NSP15 monomer consists of three domains: N-terminal, middle and C-terminal [
,
]. The catalytic function of NSP15 resides in the C-terminal NendoU domain. The active site carries six key residues conserved among SARS-CoV-2, SARS-CoV and MERS-CoV, suggesting that its activity is important for sustained replication in the host [,
]. This entry represents the NSP15 middle domain, which is made up of a major mixed β-sheet, three small ones and two short α-helices [
].
Mitogen-activated protein kinase kinase kinase, N-terminal
Type:
Domain
Description:
This domain represents the regulatory N-terminal non-catalytic region of Mitogen-activated protein kinase kinase kinase 4 (MEKK4), which contains a putative PH-domain. This N-terminal region acts as an autoinhibitory domain that must be dissociated from the catalytic domain to allow the activation of these kinases [
,
,
].
EGR1 is a transcriptional regulator that binds specifically to 9-bp target sequences containing two CpG sites that can potentially be methylated at four cytosine bases [
].
Gas vesicles are small, hollow, gas-filled protein structures found in several cyanobacterial and archaebacterial microorganisms []. They allow the positioning of the bacteria at the favourable depth for growth. Gas vesicles are hollow cylindrical tubes, closed by a hollow, conical cap at each end. Both the conical end caps and central cylinder are made up of 4-5 nm wide ribs that run at right angles to the long axis of the structure. Gas vesicles seem to be constituted of two different protein components, GVPa and GVPc. GVPa, a small protein of about 70 amino acid residues, is the main constituent of gas vesicles and forms the essential core of the structure. The sequence of GVPa is extremely well conserved. GvpJ and GvpM, two proteins encoded in the cluster of genes required for gas vesicle synthesis in the archaebacteria Halobacterium salinarium and Halobacterium mediterranei (Haloferax mediterranei), have been found [] to be evolutionarily related to GVPa. The exact function of these two proteins is not known, although they could be important for determining the shape of gas vesicles. The N-terminal domain of Aphanizomenon flos-aquae protein GvpA/J is also related to GVPa. This entry represents conserved sequence regions located in the N- and C-terminal domains of these proteins.
Ectoine/hydroxyectoine ABC transporter, permease protein EhuD
Type:
Family
Description:
Members of this entry are presumed to act as permease subunits of ectoine ABC transporters. Operons containing this gene also contain other genes of the ABC transporter and are typically found next to either ectoine utilization or ectoine biosynthesis operons.
Ectoine/hydroxyectoine ABC transporter, permease protein EhuC
Type:
Family
Description:
ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energise diverse biological systems. ABC transporters minimally consist of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per system) for function. In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode the four domains on the same polypeptide chain [
]. Members of this family are presumed to act as permease subunits of ectoine ABC transporters. Operons containing this gene also contain other genes of the ABC transporter and are typically found next to either ectoine utilization or ectoine biosynthesis operons. Permease subunits EhuC and EhuD are homologues.
This entry represents Tail tip protein (L) from Bacteriophage lambda and similar proteins found in tailed bacteriophages (Caudovirales) and prophages mostly from Proteobacteria. L is part of the distal tail tip complex which plays a role in DNA ejection during entry, and in tail assembly initiation during exit. The tail tip complex is assembled successively with three tail tip proteins J, one tail tip protein I, one tail tip protein L and one tail tip protein K. The tail tip complex interacts with tail measure protein to initiate tail tube assembly. The formation of the tail tip complex is completed by the addition of tail tip protein M, which is followed by tail tube polymerisation [
,
].
Bacteriophage lambda, Replication protein O, N-terminal
Type:
Domain
Description:
This entry represents the N-terminal domain of Replication protein O from Bacteriophage lambda and similar proteins found in tailed bacteriophages and bacterial prophages. O is necessary for the bidirectional replication of DNA. It interacts with the ori (origin of replication) region of the genome during the initiation of replication [
,
,
].
This group of sequences represents the acyl carrier protein (gamma subunit) of the holoenzyme citrate lyase (
) composed of alpha (
), beta (
), and acyl carrier protein subunits in a stoichiometric relationship of 6:6:6. Citrate lyase is an enzyme which converts citrate to oxaloacetate. In bacteria, this reaction is involved in citrate fermentation. The acyl carrier protein covalently binds the coenzyme of citrate lyase. The set contains an experimentally characterised member from Leuconostoc mesenteroides [
]. The sequences come from a wide range of Gram-positive bacteria. For Gram-negative bacteria, it appears that only sequences from the gamma proteobacteria are included.
Malonate decarboxylase/citrate lyase acyl carrier protein
Type:
Family
Description:
This family consists of the acyl carrier protein found in malonate decarboxylase and citrate lyase. This subunit has the same covalently bound prosthetic group, derived from and similar to coenzyme A, as does citrate lyase, although this protein and the acyl carrier protein of citrate lyase do not show significant sequence similarity. Both malonyl and acetyl groups are transferred to the prosthetic group for catalysis.
Growth arrest/DNA-damage-inducible protein-interacting protein 1
Type:
Family
Description:
Members of this family of proteins act as negative regulators of G1 to S cell cycle phase progression by inhibiting cyclin-dependent kinases. Inhibitory effects are additive with GADD45 proteins but occur also in the absence of GADD45 proteins. Furthermore, they act as repressors of the orphan nuclear receptor NR4A1 by inhibiting AB domain-mediated transcriptional activity [
]. They may be involved in the hormone-mediated regulation of NR4A1 transcriptional activity.
This presumed domain is found in members of the TMEM132 family. TMEM132A may be involved in embryonic and postnatal brain development [
]. TMEM132D may be a marker for oligodendrocyte differentiation [].
Zinc-finger BED domain-containing protein 3 (Zbed3) is a zinc finger protein that binds to axin and activates Wnt/beta-catenin signalling. Interaction with axin is important for Zbed3 to inhibit beta-catenin phosphorylation, hence inhibiting beta-catenin degradation [
]. Zbed3 has been associated with insulin resistance and type 2 diabetes mellitus in humans [,
]. This entry includes ZBED2 and ZBED3. The function of ZBED2 is not clear.
This entry represents a family of proteins that appear to form a plug in the type 9 secretion system pore [
]. Members are predominantly found in Bacteroidetes.
Cysteine and histidine-rich domain-containing protein RAR1
Type:
Family
Description:
This entry represents a group of CHORD domain-containing proteins from plants, including RAR1 (At5g51700) from Arabidopsis. RAR1 has two CHORD domains. RAR1 is required for several R gene-mediated resistance responses in monocotyledonous and dicotyledonous plant species against different pathogen classes [
,
].
CDC73 is an RNA polymerase II accessory factor [
], and forms part of the Paf1 complex that has roles in post-initiation events []. More specifically, crystal structure analysis shows the C terminus to be a Ras-like domain that adopts a fold that is highly similar to GTPases of the Ras superfamily. The C-terminal domain contains a large but comparatively flat surface of highly conserved residues, devoid of ligand. Deletion of the Cdc73 C-domain significantly reduces Paf1C occupancy on active genes, which means that the Cdc73 C-domain plays a role in promoting association of Paf1C with chromatin []. The canonical nucleotide binding pocket is altered in CDC73, and there is no nucleotide ligand, but it contributes to histone methylation and Paf1 complex (Paf1C) recruitment to active genes. Thus, together with Rtf1, it combines to couple Paf1C to elongating polymerase [
].
Tda11 was identified in a high-throughput screen as a protein affecting topoisomerase I-induced DNA damage [
]. It is a potential Cdc28p substrate []. Its function is not clear.
Altered inheritance of mitochondria protein 3 (Aim3) was first identified as an actin patch protein [
]. Later, it was found to inhibit barbed-end actin filament elongation []. It may be involved in mitochondrion organisation and biogenesis [].
This entry represents the first of several highly conserved regions on stealth proteins in metazoa and bacteria. There are up to four CR regions on all member proteins. CR1 carries a well-conserved IDVVYT sequence-motif. The domain is found in tandem with CR2, CR3 and CR4 on both potential metazoan hosts and pathogenic eubacterial species that are capsular polysaccharide phosphotransferases. The CR domains appear on eukaryotic proteins such as GNPTAB, N-acetylglucosamine-1-phosphotransferase subunits alpha/beta. Horizontal gene transfer of these sequence regions seems to have occurred between host and bacteria, allowing the bacteria to evade detection by the host innate immune system [
].
This entry represents the third of several highly conserved regions on stealth proteins from animals and bacteria. There are up to four CR regions on all member proteins. The domain is found in tandem with CR1, CR2 and CR4 on both potential metazoan hosts and pathogenic eubacterial species that are capsular polysaccharide phosphotransferases. The CR domains appear on eukaryotic proteins such as GNPTAB, N-acetylglucosamine-1-phosphotransferase subunits alpha/beta. Horizontal gene transfer of these sequence regions seems to have occurred between host and bacteria, allowing the bacteria to evade detection by the host innate immune system [
].
This entry represents the fourth conserved region on stealth proteins from animals and bacteria. There are four CR regions on mammalian members. CR4 carries a well-conserved CLND sequence-motif. The domain is found in tandem with CR1, CR2 and CR3 on both potential metazoan hosts and on pathogenic eubacterial species that are capsular polysaccharide phosphotransferases. The CR domains also appear on eukaryotic proteins such as GNPTAB, N-acetylglucosamine-1-phosphotransferase subunits alpha/beta. Horizontal gene transfer of these sequence regions seems to have occurred between host and bacteria, allowing the bacteria to evade detection by the host innate immune system [
].
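The CR1 and CR4 entries above each name a short conserved motif (IDVVYT and CLND, respectively). As a rough illustration of how such motifs might be located in a protein sequence, the following Python sketch scans for exact matches. The function name `find_motifs` and the toy sequence are hypothetical and not taken from any real stealth protein; exact string matching is a simplifying assumption, since "well-conserved" motifs can still vary between members.

```python
import re

# Motifs quoted in the CR1 and CR4 descriptions; exact matching is an assumption.
MOTIFS = {"CR1": "IDVVYT", "CR4": "CLND"}


def find_motifs(sequence, motifs=MOTIFS):
    """Return 1-based start positions of each motif in a protein sequence."""
    seq = sequence.upper()
    hits = {}
    for name, motif in motifs.items():
        # A lookahead pattern reports overlapping occurrences as well.
        pattern = "(?=" + re.escape(motif) + ")"
        hits[name] = [m.start() + 1 for m in re.finditer(pattern, seq)]
    return hits


# Hypothetical toy sequence built only to exercise the function.
toy_sequence = "MKT" + "IDVVYT" + "GGSAAG" + "CLND" + "KK"
toy_hits = find_motifs(toy_sequence)
print(toy_hits)
```

A profile-based search (for example a hidden Markov model) would be the more usual choice for real family assignment; the exact-match scan is only meant to make the motif descriptions concrete.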
This entry represents the phage-like element PBSX protein from Firmicutes. PBSX protein forms a hole, or portal, that enables DNA passage during packaging and ejection. It also forms the junction between the phage head (capsid) and the tail proteins. It functions as a dodecamer of a single polypeptide of average molecular weight of 40-90 kDa.
This entry represents the phage-like element PBSX protein from Proteobacteria. PBSX protein forms a hole, or portal, that enables DNA passage during packaging and ejection. It also forms the junction between the phage head (capsid) and the tail proteins. It functions as a dodecamer of a single polypeptide of average molecular weight of 40-90 kDa.
The ubiquitous protein Ser/Thr phosphatase-1 (PP1) interacts with dozens of regulatory proteins that are structurally unrelated. This entry represents one of the PP1 interactors, protein phosphatase 1 regulatory subunit 32 (PPP1R32) [
].
This domain can be found in N-acetylglucosamine binding protein (GbpA) from Vibrio cholerae, a bacterial pathogen that colonizes the chitinous exoskeleton of zooplankton as well as the human gastrointestinal tract. GbpA binds to GlcNAc oligosaccharides. Structural comparison shows that there are distant structural similarities between domain 2 of GbpA and the beta-domain of the flagellin protein p5. It is suggested that this domain interacts with the bacterial surface, and functions to project an alginate binding domain of the protein from the cell surface [
].
WD repeat and coiled-coil-containing protein family
Type:
Family
Description:
This family includes WD repeat and coiled-coil-containing protein (WDCP, previously known as C2orf44), which is found in eukaryotes and consists of around 721 amino acids. The N-terminal region contains two WD (tryptophan-aspartic acid) repeats (WD1 and WD2). WD repeats may be involved in a range of biological functions including apoptosis, transcriptional regulation and signal transduction. The C-terminal region contains a proline-rich sequence (PPRLPQR), and is predicted to have a leucine-rich coiled-coil region (CC) [
]. WDCP was identified in a proteomic screen to find signalling components that interact with Hck (hematopoietic cell kinase), a non-receptor tyrosine kinase. WDCP was shown to bind tightly and specifically to the SH3 domain of Hck in U937 human monocytic cells. WDCP was also shown to exist as an oligomer when expressed in mammalian cells. While the function of WDCP is unknown, it has been identified in a gene fusion event with anaplastic lymphoma kinase (ALK) in colorectal cancer patients [
].
EsaE interacts with the type VII secretion system (T7SS) ATPase, EssC. It is required for efficient secretion of EsaD and at least one further T7SS substrate. It is located both in the cytoplasm and cell membrane [
].
Cell-traversal protein for ookinetes and sporozoites
Type:
Domain
Description:
Cell-traversal protein for ookinetes and sporozoites (CelTOS) is a conserved protein that is essential for traversal of malaria parasites in both the mosquito vector and human host and is therefore critical for malaria transmission and disease pathogenesis. It specifically binds phosphatidic acid commonly present within the inner leaflet of plasma membranes, and potently disrupts liposomes composed of phosphatidic acid by forming pores. CelTOS resembles class I viral membrane fusion glycoproteins and a bacterial pore-forming toxin with roles in membrane binding and disruption. CelTOS forms an alpha helical dimer that resembles a tuning fork. Structural analysis indicates that it has a distinct structural architecture with two subdomains that independently resemble membrane binding and/or disrupting proteins and could simultaneously act during disruption [
].
Serine/Threonine/Tyrosine Kinase found in polyvalent protein
Type:
Family
Description:
This is a family of protein kinases found in polyvalent proteins of phages and prophages that, although preserving their active site residues for ATP-binding and phosphotransfer, appear to have lost the C-terminal subdomain characteristic of this superfamily [].
FeS assembly protein IscA, plant/cyanobacteria type
Type:
Family
Description:
This entry represents a group of iron-sulfur assembly proteins mainly from plants and cyanobacteria, including AtCPISCA (AT1G10500,
) from Arabidopsis. AtCPISCA has homology to bacterial IscA and SufA proteins that have a scaffold function during Fe-S cluster formation. It may serve as a scaffold in chloroplast Fe-S cluster assembly [
]. This clade is distinct from the proteobacteria clade shown in
.
Iron-sulphur (FeS) clusters are important cofactors for numerous proteins involved in electron transfer, in redox and non-redox catalysis, in gene regulation, and as sensors of oxygen and iron. These functions depend on the various FeS cluster prosthetic groups, the most common being [2Fe-2S] and [4Fe-4S] [
]. FeS cluster assembly is a complex process involving the mobilisation of Fe and S atoms from storage sources, their assembly into [Fe-S] form, their transport to specific cellular locations, and their transfer to recipient apoproteins. So far, three FeS assembly machineries have been identified, which are capable of synthesising all types of [Fe-S] clusters: ISC (iron-sulphur cluster), SUF (sulphur assimilation), and NIF (nitrogen fixation) systems. The ISC system is conserved in eubacteria and eukaryotes (mitochondria), and has broad specificity, targeting general FeS proteins [
,
]. It is encoded by the isc operon (iscRSUA-hscBA-fdx-iscX). IscS is a cysteine desulphurase, which obtains S from cysteine (converting it to alanine) and serves as a S donor for FeS cluster assembly. IscU and IscA act as scaffolds to accept S and Fe atoms, assembling clusters and transferring them to recipient apoproteins. HscA is a molecular chaperone and HscB is a co-chaperone. Fdx is a [2Fe-2S]-type ferredoxin. IscR is a transcription factor that regulates expression of the isc operon. IscX (also known as YfhJ) appears to interact with IscS and may function as an Fe donor during cluster assembly [
]. The SUF system is an alternative pathway to the ISC system that operates under iron starvation and oxidative stress. It is found in eubacteria, archaea and eukaryotes (plastids). The SUF system is encoded by the suf operon (sufABCDSE), and the six encoded proteins are arranged into two complexes (SufSE and SufBCD) and one protein (SufA). SufS is a pyridoxal-phosphate (PLP) protein displaying cysteine desulphurase activity. SufE acts as a scaffold protein that accepts S from SufS and donates it to SufA [
]. SufC is an ATPase with an unorthodox ATP-binding cassette (ABC)-like component. SufA is homologous to IscA [], acting as a scaffold protein in which Fe and S atoms are assembled into [FeS] cluster forms, which can then easily be transferred to apoprotein targets.
In the NIF system, NifS and NifU are required for the formation of metalloclusters of nitrogenase in Azotobacter vinelandii, and other organisms, as well as in the maturation of other FeS proteins. Nitrogenase catalyses the fixation of nitrogen. It contains a complex cluster, the FeMo cofactor, which contains molybdenum, Fe and S. NifS is a cysteine desulphurase. NifU binds one Fe atom at its N-terminal, assembling an FeS cluster that is transferred to nitrogenase apoproteins [
]. Nif proteins involved in the formation of FeS clusters can also be found in organisms that do not fix nitrogen [].
Telomere repeats-binding bouquet formation protein 2
Type:
Family
Description:
Telomere repeats-binding bouquet formation protein 2 (TERB2) is a meiosis-specific telomere-associated protein involved in meiotic telomere attachment to the inner nuclear membrane, a crucial step for homologous pairing and synapsis. It is a component of the MAJIN-TERB1-TERB2 complex, which promotes telomere cap exchange by mediating attachment of telomeric DNA to the inner nuclear membrane and replacement of the protective cap of telomeric chromosomes: in early meiosis, the MAJIN-TERB1-TERB2 complex associates with telomeric DNA and the shelterin/telosome complex. During prophase, the complex matures and promotes release of the shelterin/telosome complex from telomeric DNA [
].
tRNA selenocysteine 1-associated protein 1, C-terminal
Type:
Domain
Description:
This entry represents the C-terminal region of selenocysteine tRNA 1 associated proteins (also known as TRNAU1AP), which are involved throughout the selenocysteine pathway, from the early steps of selenocysteine biosynthesis and tRNA(Sec) charging to the later steps resulting in the cotranslational incorporation of selenocysteine into selenoproteins. Selenium deficiency results in a variety of diseases, including cardiac disease [
]. TRNAU1AP contains RNA recognition motifs (RRM) and a C-terminal Tyr-rich region. The Tyr-rich region (amino acids 185-225) is conserved among several mammals, including human, chimp, dog, cattle, mouse and rat. Furthermore, constitutive deletion of exons corresponding to the Tyr-rich region in mouse resulted in embryonic lethality [
].
Periplasmic flagella are the organelles of spirochete motility, and are structurally different from the flagella of other motile bacteria. They reside inside the cell within the periplasmic space, and confer motility in viscous gel-like media such as connective tissue [
]. The flagella are composed of an outer sheath of FlaA proteins and a core filament of FlaB proteins. Each species usually has several FlaA protein species [].
The SD domain is found N-terminal to the homeobox domain in the protein SIX1. As a transcription factor, SIX1 lacks intrinsic activation domains and thus needs to bind to the EYA family of co-factors in order to mediate transcriptional activation. The SD domain is necessary for this protein-protein interaction, binding to the C-terminal region of EYA (Eyes absent homologue proteins) [
].
Trafficking protein particle complex subunit 2-like
Type:
Family
Description:
The function of TRAPPC2L (trafficking protein particle complex subunit 2-like) is not known, but evidence suggests that it is part of the TRAPP II complex required for distinct tethering events at Golgi membranes, playing a role in vesicular transport from endoplasmic reticulum to Golgi [
]. Mutations in the TRAPPC2L gene cause an autosomal recessive disease characterised by encephalopathy with episodic rhabdomyolysis [].
This domain is found in the N-terminal region of fungal cell division control protein 13 (Cdc13), which is a single-stranded telomere binding protein required for telomere length regulation and genome stability. This domain contains an OB fold (known as OB1), which has weak affinity for long single-strand telomere DNA and is involved in Cdc13 dimerisation and recruitment of Pol alpha-primase to the telomeres for C-strand synthesis [
,
].
Hexon is a major coat protein found in various species-specific Adenoviruses, which are type II dsDNA viruses. Hexon coat proteins are synthesised during late infection and form homo-trimers. The 240 copies of the hexon trimer that are produced are organised so that 12 lie on each of the 20 facets. The central 9 hexons in a facet are cemented together by 12 copies of polypeptide IX. The penton complex, formed by the peripentonal hexons and base hexon (holding in place a fibre), lies at each of the 12 vertices []. The hexon coat protein is a duplication consisting of two domains with a similar fold packed together like the nucleoplasmin subunits. Within a hexon trimer, the domains are arranged around a pseudo 6-fold axis. The domains have a β-sandwich structure consisting of 8 strands in two sheets with a jelly-roll topology; each domain is heavily decorated with many insertions []. The major capsid proteins (vp54 and vp72) found in Iridoviruses, Phycodnaviruses, Asfarviruses and Ascoviruses share the same structure. All these viruses are type II dsDNA viruses with no RNA stage. This is the most abundant structural protein and can account for up to 45% of virion protein [
]. The structure of vp54 has been determined from Paramecium bursaria Chlorella virus 1 (PBCV-1), a very large icosahedral virus containing an internal membrane enclosed within a glycoprotein coat. The vp54 protein is a duplication consisting of two domains with a similar fold packed together like the nucleoplasmin subunits. The vp54 protein forms a trimer, where the domains are arranged around a pseudo 6-fold axis. The domains have a β-sandwich structure consisting of 8 strands in two sheets with a jelly-roll topology []. Similarly, the major capsid protein p3 from Bacteriophage PRD1 adopts a double-barrel structure comprising two eight-stranded viral β-barrels or jelly rolls, each of which contains a 12-residue α-helix. This protein then trimerises through a 'trimerisation loop' sequence, and is incorporated within the viral capsid [
]. This entry represents a structural domain consisting of a β-sandwich structure containing 8 strands in two sheets with a jelly-roll topology. This domain is found in several type II dsDNA viruses, including Adenoviruses, Iridoviruses, Phycodnaviruses, Asfarviruses, Ascoviruses, and Bacteriophage PRD1.
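The hexon counts quoted in the adenovirus description above can be checked with simple arithmetic. The short Python sketch below is only bookkeeping on the numbers given in the text (20 facets, 12 hexon trimers per facet, 9 central hexons per facet, 12 vertices, 12 copies of polypeptide IX per facet). The variable names are illustrative, and the step treating every non-central hexon as peripentonal is an assumption made for this sketch.

```python
# Numbers taken from the description above.
FACETS = 20
HEXON_TRIMERS_PER_FACET = 12
CENTRAL_HEXONS_PER_FACET = 9
POLYPEPTIDE_IX_PER_FACET = 12
VERTICES = 12
SUBUNITS_PER_TRIMER = 3  # hexon forms homo-trimers

# Total hexon trimers and hexon polypeptide chains per capsid.
total_trimers = FACETS * HEXON_TRIMERS_PER_FACET
total_hexon_chains = total_trimers * SUBUNITS_PER_TRIMER

# Central hexons and the polypeptide IX copies that cement them.
central_hexons = FACETS * CENTRAL_HEXONS_PER_FACET
total_polypeptide_ix = FACETS * POLYPEPTIDE_IX_PER_FACET

# Assumption for this sketch: all remaining hexons are peripentonal,
# shared evenly between the 12 vertices.
peripentonal_hexons = total_trimers - central_hexons
peripentonal_per_vertex = peripentonal_hexons // VERTICES

print(total_trimers, total_hexon_chains, central_hexons,
      total_polypeptide_ix, peripentonal_hexons, peripentonal_per_vertex)
```

The product of 20 facets and 12 trimers reproduces the 240 hexon trimers stated in the text, which is the main consistency check this sketch offers.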
Csm2 (chromosome segregation in meiosis protein 2) is a component of the Shu complex (also known as PCSS complex) involved in the error-free DNA post-replication repair (PRR) [
]. Psy3 first forms a complex with Csm2, and their L2 loops confer the DNA-binding activity to the Shu complex [,
]. The Shu complex binds to recombination sites and is required for Rad51 assembly and function during meiosis. Psy3-Csm2 constitutes a core sub-complex that can stabilise the Rad51-single-stranded DNA complex independently of nucleotide cofactor [].
This family of proteins with no known function is conserved from nematodes to humans. It includes members of the TMEM200 family of transmembrane proteins.
This entry represents a domain consisting of twelve helices that fold into a compact structure that contains the overall structural scaffold observed in other regulator of G protein signalling (RGS) proteins and three additional helical elements that pack closely to it. Helices 1-9 comprise the RGS fold, in which helices 4-7 form a classic antiparallel bundle adjacent to the other helices. Like other RGS structures, helices 7 and 8 span the length of the folded domain and form essentially one continuous helix with a kink in the middle. Helices 10-12 form an apparently stable C-terminal extension of the structural domain, and although other RGS proteins lack this structure, these elements are intimately associated with the rest of the structural framework by hydrophobic interactions. This domain binds to active G-alpha proteins, promoting GTP hydrolysis by the alpha subunit of heterotrimeric G proteins, thereby inactivating the G protein and rapidly switching off G protein-coupled receptor signalling pathways [
]. This RGS-like domain is found in Rho guanine nucleotide exchange factors (RhoGEF) such as Pdz-RhoGEF [
] and p115RhoGEF [].
This entry represents a domain found in the putative pilus assembly protein RcpC and RcpC-like protein CpaB. RcpC is a bacterial protein expressed from the tight-adherence tad locus. The Tad machine assembles type IVb pili, and RcpC could have a role modifying the Flp pilin [
]. It is an auxiliary protein that sits in the inner membrane and interacts with TadB and TadZ, an AAA ATPase []. A recent study has identified two tandem β-clip domains in RcpC95. β-Clip domains are known to interact with carbohydrate moieties in other systems [].
SNF1-related protein kinase regulatory subunit beta
Type:
Family
Description:
This family represents the regulatory beta subunits of the plant SNF1-related protein kinase (SnRK) complex, probably a trimeric complex consisting of an alpha catalytic subunit and beta and gamma non-catalytic regulatory subunits [
]. In Arabidopsis, there are two isoforms of the catalytic subunit (AKIN10 and AKIN11), three beta subunits (AKINbeta1, AKINbeta2, and AKINbeta3) and one gamma subunit (AKINg) [,
]. The SnRK complex in higher plants plays a role in signal transduction cascades regulating gene expression, carbohydrate and nitrogen metabolism [
]. Some known SnRK1 substrates are key metabolic enzymes such as sucrose phosphate synthase, nitrate reductase, and HMG-CoA reductase [], while others are transcription factors []. Its physical interaction with nitrate reductase specifically occurs via the AKINbeta1 subunit []. Beta subunits show conserved domains in different organisms. These domains include the Association with the SNF1 Complex (ASC) domain, which is important for interaction with the alpha and gamma subunits, and a specialised carbohydrate-binding module named the Glycogen Binding Domain (GBD). In Arabidopsis, only AKINbeta1 and AKINbeta2 exhibit both the GBD and the ASC domains [
,
]. AKINbeta3 could be a plant specific subunit [].
This entry represents the N-terminal domain of FliN. FliN is one of three proteins that form a switch-complex at the base of the basal body of the flagellum; the switch regulates the flagellum-motor [
].
Uncharacterised protein family DM11, Drosophila melanogaster
Type:
Family
Description:
This family consists of a group of Drosophila melanogaster proteins of unknown function. A number of these proteins are expressed in nonneuronal auxiliary cells within two nested subsets of chemosensory sensilla on the front legs of male D. melanogaster, suggesting they may be involved in pheromone response [
]. These proteins are predicted to have a single transmembrane domain at their amino terminus, probably to serve as a signal peptide, suggesting that they are soluble and secreted.
Bardet-Biedl syndrome 5 protein/sex-determination protein fem-3
Type:
Family
Description:
Bardet-Biedl syndrome 5 protein (BBS5) is part of the BBSome complex that may function as a coat complex required for sorting of specific membrane proteins to the primary cilia [
]. Mutations in the BBS5 gene cause Bardet-Biedl syndrome 5, which is a syndrome characterised by usually severe pigmentary retinopathy, early-onset obesity, polydactyly, hypogenitalism, renal malformation and mental retardation [,
]. Also included in this family is fem-3, required for male development in Caenorhabditis elegans. Together with fem-2, fem-3 associates with the CBC(fem-1) E3 ubiquitin-protein ligase complex which mediates the ubiquitination and subsequent proteasomal degradation of tra-1, a transcription factor that is the terminal effector of the sex-determination pathway [
].
Peptidoglycan recognition protein family domain, metazoa/bacteria
Type:
Domain
Description:
This domain is found in a family of animal peptidoglycan recognition proteins homologous to Bacteriophage T3 lysozyme [
], and some bacterial homologues. The bacteriophage molecule, but not its moth homologue, has been shown to have N-acetylmuramoyl-L-alanine amidase activity. One member of this family, Tag7, is a cytokine [].
Nonstructural protein 15, middle domain, coronavirus
Type:
Domain
Description:
The unique coronavirus transcription/replication machinery, comprising multiple virus-encoded non-structural proteins (NSP), plays a vital role during initial and intermediate phases of the viral life cycle. NSP15 forms a hexamer made of a dimer of trimers, which is suggested to be the functional unit responsible for the endoribonuclease activity. The NSP15 monomer consists of three domains: N-terminal, middle and C-terminal [
,
]. The catalytic function of NSP15 resides in the C-terminal NendoU domain. The active site carries six key residues conserved among SARS-CoV-2, SARS-CoV and MERS-CoV, suggesting that its activity is important for sustained replication in the host [,
]. This entry represents the non-catalytic middle domain from NSP15. This domain is formed by ten β-strands organised into three β-hairpins. It creates concave surfaces that may serve as interaction hubs with other proteins and RNA [
,
].