Codon Usage
Codon usages are often biased for guanine and cytosine in the third position and there are no introns within protein-coding genes.
From: Reference Module in Biomedical Sciences, 2014
Codon Usage
Raimi M. Redwan, ... Ranjeev Hari, in Encyclopedia of Bioinformatics and Computational Biology, 2019
Abstract
Codon usage in many organisms is known to be nonrandom, with evolutionary pressures such as natural selection and mutation biases at play. Codon choice also differs across taxonomic groups but is similar within a species. The variation is governed largely by third-position degeneracy, which makes synonymous codons redundant and so permits distinct patterns of codon usage. Here, we explore the various factors affecting codon usage and the bioinformatics software applications that assist in its analysis. We also briefly discuss the future outlook, where codon usage may be exploited in the fields of synthetic and systems biology.
Synthetic Biology, Part A
Thorsten Heidorn, ... Peter Lindblad, in Methods in Enzymology, 2011
3.2.3 Protein coding sequences: Codon usage and gene optimization
Cyanobacterial codon usage is often similar to that of other bacteria, such as E. coli, and in many strains there are few or no strongly unfavored codons. Nevertheless, among the model strains, the unicellular strains tend to have more codons that are used with a frequency below 10% for a specific amino acid than do the filamentous strains. Especially strains of Synechococcus typically have several codons used at very low frequency (based on codon tables from The Kazusa Codon Usage Database; Nakamura et al., 2000).
To improve expression of a gene in a specific cyanobacterium, codon usage may be adjusted to better fit the codon usage of the intended expression host strain. In a study using Synechocystis PCC 6803, heterologous expression of the plant enzyme isoprene synthase, under control of the psbA2 promoter, was enhanced about 10-fold after adjustment of the entire gene sequence to a Synechocystis codon usage (Lindberg et al., 2010).
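The "below 10% for a specific amino acid" criterion above can be illustrated with a short sketch. It assumes codon counts tallied from a coding sequence (a real analysis would more likely start from a published codon table, such as those in the Kazusa database); the function names `codon_counts` and `low_frequency_codons` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def codon_counts(cds):
    """Count in-frame codons in a coding sequence (DNA or RNA letters)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def low_frequency_codons(counts, threshold=0.10):
    """Return {codon: fraction} for codons used for less than `threshold`
    of their amino acid. Amino acids never observed are skipped."""
    totals = defaultdict(int)
    for codon, n in counts.items():
        aa = CODON_TABLE.get(codon)  # ignores codons with ambiguous bases
        if aa is not None:
            totals[aa] += n
    rare = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*" or totals[aa] == 0:
            continue
        fraction = counts.get(codon, 0) / totals[aa]
        if fraction < threshold:
            rare[codon] = fraction
    return rare
```

Comparing the size of the returned dictionary between strains is one simple way to express the observation that some strains have more low-frequency codons than others.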
Synthetic Biology, Part B
Randall A. Hughes, ... Andrew D. Ellington, in Methods in Enzymology, 2011
6.1 Codon optimization
Synonymous codon usage varies by organism; this phenomenon has been reviewed elsewhere both in terms of evolutionary implications (Hershberg and Petrov, 2008) and in terms of implications for heterologous expression of proteins (Gustafsson et al., 2004). The process by which an amino acid sequence is rendered as a DNA sequence with codon usage suitable to a given organism is known as codon optimization. At the most basic level, an amino acid sequence can be reverse-translated using highly utilized codons for an expression host, and this is almost automatically implemented for E. coli in many DNA editing programs. More sophisticated decision trees regarding codon-use can also be implemented; for example, next-best codons may be introduced to avoid creating undesirable elements in the DNA sequence (restriction sites, transcription terminators, inverted repeats). For example, the JCat Web site (www.jcat.de; Grote et al., 2005) uses codon adaptation index (CAI) values (Sharp and Li, 1987) to optimize codon usage for a wide range of prokaryotic hosts and a handful of eukaryotes, but can also take into account these additional DNA sequence features. Similarly, the OPTIMIZER website (http://genomes.urv.es/OPTIMIZER/; Puigbo et al., 2007) uses information from a database of genes that are predicted to be highly expressed (Puigbo et al., 2008) to optimize codon usage.
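As a rough illustration of the two ideas above, the sketch below reverse-translates a protein using the most frequent codon per amino acid and scores a sequence with the codon adaptation index, taken here as the geometric mean of each codon's relative adaptiveness (its count divided by the count of the most used synonymous codon in a reference set). The reference counts, the function names, and the 0.5 pseudocount for unobserved codons are assumptions for this example; this is not the algorithm used by JCat or OPTIMIZER.

```python
import math
from collections import defaultdict

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def naive_reverse_translate(protein, reference_counts):
    """Pick, for every residue, the codon with the highest reference count."""
    best = {}
    for codon, aa in CODON_TABLE.items():
        n = reference_counts.get(codon, 0)
        if aa not in best or n > best[aa][1]:
            best[aa] = (codon, n)
    return "".join(best[aa][0] for aa in protein.upper())


def relative_adaptiveness(reference_counts):
    """w = count / max count among synonymous codons (Met, Trp, stops skipped)."""
    max_per_aa = defaultdict(int)
    for codon, aa in CODON_TABLE.items():
        max_per_aa[aa] = max(max_per_aa[aa], reference_counts.get(codon, 0))
    w = {}
    for codon, aa in CODON_TABLE.items():
        if aa in "*MW" or max_per_aa[aa] == 0:
            continue
        # Assumption: unobserved codons get a 0.5 pseudocount to avoid log(0).
        w[codon] = max(reference_counts.get(codon, 0), 0.5) / max_per_aa[aa]
    return w


def cai(cds, w):
    """Geometric mean of w over the scored codons of a coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    logs = [math.log(w[cds[i:i + 3]])
            for i in range(0, usable, 3) if cds[i:i + 3] in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

A sequence built by `naive_reverse_translate` from the same reference counts scores 1.0 by construction, which is why the more sophisticated decision trees described above trade some CAI for the removal of undesirable sequence elements.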
Several groups have subjected sequence-based expression optimization strategies to extensive experimental tests. Kudla et al. generated a synthetic library of 154 iso-coding GFP genes with random silent mutations. There proved to be no correlation between CAI and GFP fluorescence; instead, the highest fluorescence was associated with weak RNA structures in the first 28 bases of the open reading frame (Kudla et al., 2009). Welch et al. (2009) constructed 40 iso-coding variants for two genes using a Monte Carlo approach that should have explored a wide range of parameters thought to affect expression (secondary structure, GC content, codon frequency). Based on the observed expression levels, they concluded that the use of codons corresponding to tRNAs that remain well charged during amino acid depletion led to optimal expression, regardless of CAI values, AT content, or secondary structure, although they also noted some effect of 5′ RNA structures. These authors have developed a codon-usage model based on this empirical optimization, and this has been commercialized through the company DNA2.0. Finally, Allert et al. (2010) examined 816 complete bacterial genomes for CAI values, AT content, and RNA secondary structure at several positions within open reading frames. They found that AT content at the 5′ and 3′ ends of genes was significantly higher than in the middle and used a Monte Carlo approach to explore whether this skewing impacted the expression of 285 synthetic variants of three genes. Indeed, increasing the AT content at the extremes (particularly at the 5′ end) was shown to increase expression levels of targeted protein sequences (Allert et al., 2010).
Taken together, these studies clearly indicate that merely recoding a gene using commonly used host codons will not maximize expression. Removing RNA structure (often by enhancing AT content) is also likely to be important. That said, the derivation of rules for the expression of synthetic genes is still in its infancy. Calculating and comparing the codon usages of the high-expressing variants from Allert et al. with the high-expressing codon usages from Welch et al. would be illuminating, at least for E. coli.
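The positional AT skew discussed above can be checked on any coding sequence with a few lines of code. The sketch below compares the AT fraction of a window at the 5′ end, in the middle, and at the 3′ end; the 30-nucleotide window and the function names are assumptions for illustration, and RNA secondary structure itself is not computed here, since that would need a dedicated folding program.

```python
def at_fraction(seq):
    """Fraction of A/T (or U) bases in a sequence."""
    seq = seq.upper().replace("U", "T")
    if not seq:
        return float("nan")
    return (seq.count("A") + seq.count("T")) / len(seq)


def at_profile(cds, window=30):
    """AT fraction in equal-sized windows at the 5' end, middle, and 3' end.

    The window size is an illustrative choice, not a value from the studies.
    """
    if len(cds) < 3 * window:
        raise ValueError("sequence is too short for three separate windows")
    mid_start = (len(cds) - window) // 2
    return {
        "5prime": at_fraction(cds[:window]),
        "middle": at_fraction(cds[mid_start:mid_start + window]),
        "3prime": at_fraction(cds[-window:]),
    }
```

Applied across a set of iso-coding variants, a profile like this gives one simple covariate to set against measured expression levels.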
Codon Usage Bias
P.M. Sharp, in Encyclopedia of Genetics, 2001
Applications
Gene Identification
Information on the codon usage profile of a species can be applied in genome sequencing projects to assess whether an open reading frame is indeed likely to be a gene. However, particularly in bacteria, mismatched codon bias may reflect the recent horizontal transfer of a gene from a species with different codon bias. In species where translational selection is effective it is possible to predict whether a gene is likely to be highly expressed.
Heterologous Gene Expression
Knowledge of codon bias may have applications in the field of biotechnology. Genes are often cloned and then inserted into another species for expression. The codon usage of a heterologous gene is often quite different from that of the host genome. Adjusting the codon usage of the foreign gene may enhance its expression, increasing the amount of protein product obtained. This effect has been reported a number of times, both in the case of heterologous expression of genes for protein production, and in the use of reporter genes such as that for jellyfish green fluorescent protein (GFP). However, contradictory reports exist. Because of the considerations discussed above concerning the manner in which optimal codon usage may be adaptive, it is quite surprising that optimizing codon usage can change the expression level of a single gene. It is possible that the effect is indirect, for example due to changes in mRNA structure or longevity. This area remains controversial and mysterious.
Codon Usage and Translational Selection
R. Hershberg, in Encyclopedia of Evolutionary Biology, 2016
Favored Codon Identity and Shifts in Codon Usage
Different organisms have distinct codon usages. Such variation in codon usage between organisms is likely to result in part from differences in background substitution biases. At the same time, it is also likely to result from differences in the identities of the codons that are favored by selection within different organisms.
Variation in the identity of favored codons between organisms raises two questions. First, which codons are favored in each organism, and are there general rules determining the identities of favored codons? Second, how are changes in the identity of favored codons achieved? After all, in order to obtain a change in codon usage one would have to change a very high number of sites within a genome. It is hard to imagine how such a change would be feasible in the face of selection to maintain codon usage. For a long time it was assumed that the identity of favored codons within each organism is a frozen accident: the choice of which tRNAs are most abundant was more or less random, and this drove a certain codon usage. Once this codon usage was established it was more or less frozen, since changing it would be very difficult in the face of selection. Changes in codon usage under this model would be made possible during long periods of relaxed selection (Bulmer, 1987).
Contrary to the idea that the choice of favored codon is random, we have demonstrated that in bacteria the identity of favored codons tracks the background substitution patterns of the genome (Hershberg and Petrov, 2009). In other words, in GC-rich bacteria favored codons will tend to be GC-rich, while they will be AT-rich in AT-rich bacteria. This leads to a trend we have dubbed 'going with the flow.' According to this trend, codon bias will always go in the direction of the background substitution biases of a bacterial genome, but to a more extreme extent. If the background substitution biases of a genome determine that this genome will be GC-rich, this will mean the synonymous sites of this genome will be even more GC-rich (Hershberg and Petrov, 2009; 2012). This can be seen by comparing the GC-content of synonymous sites to that of intergenic regions of the same genome (Figure 1).
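The comparison described above can be approximated with a minimal sketch. It uses GC content at third codon positions (GC3) as a stand-in for synonymous-site GC content, which is an assumption made for simplicity: not every third position is synonymous (ATG and TGG, for example, have no synonyms). The function names are hypothetical.

```python
def gc_fraction(seq):
    """GC fraction among unambiguous bases (A, C, G, T/U)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    return gc / (gc + at) if gc + at else float("nan")


def gc3(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3
    return gc_fraction(cds[2:usable:3])


def gc3_excess(coding_sequences, intergenic_sequences):
    """Pooled GC3 of the genes minus pooled GC content of intergenic regions.

    Under the 'going with the flow' trend, a positive value would be expected
    in a GC-rich genome and a negative value in an AT-rich genome.
    """
    third_positions = "".join(
        cds[2:len(cds) - len(cds) % 3:3] for cds in coding_sequences)
    return gc_fraction(third_positions) - gc_fraction("".join(intergenic_sequences))
```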
It is important to note that there is no trivial a priori reason to expect that tRNA abundance and the background substitution biases of a genome would be linked in such a manner. However, it suggests another mechanism for changes in codon usage that does not require long periods of relaxed selection. Suppose an organism starts experiencing a shift in nucleotide substitution biases. (Note that our model does not claim to explain why such shifts would happen. However, such shifts clearly do occur, or all organisms would have similar nucleotide contents.) At first, highly expressed genes will not change their nucleotide content alongside the rest of the genome, since they will be under selection to continue using the most favored codons, recognized by the most abundant tRNAs. Yet the remaining genes, which are not highly expressed, will shift their nucleotide content and with it their codon usage. Once a sufficient number of these genes are using codons that do not match the tRNA pool at all, it may become advantageous for the tRNAs matching the codons used most by this large group of genes to increase in abundance. After all, even though these genes are not highly expressed, there are many of them, and if all of them are translated inefficiently it could reduce the global efficiency of translation and also lead to aggregation of misfolded proteins. Once these tRNAs increase in abundance, the highly expressed genes can follow suit and also alter their codon usage. This may eventually result in a total shift in codon usage toward the direction determined by the genome's new background substitution biases (Hershberg and Petrov, 2009).
Calorimetry
Debashish Sahu, ... Scott A. Showalter, in Methods in Enzymology, 2016
2.1 Plasmid Construction
A synthetic gene with E. coli optimized codon usage and also encoding a nonnative N-terminal tryptophan was purchased from Geneart to encode FCP1 (residues 878–961 of the human polypeptide sequence). The DNA encoding this FCP1 construct was cloned into the pET-47b (Novagen) expression vector to create the plasmid pET-47b-FCP1W, which was transformed into E. coli BL21 (DE3) competent cells. The same procedure was repeated with an E. coli optimized synthetic gene encoding residues 434–517 of the human Rap74 polypeptide sequence, producing the plasmid pET-47b-Rap74.
Unique DNA
M. Ponomarenko, ... N. Kolchanov, in Brenner's Encyclopedia of Genetics (Second Edition), 2013
The 'Genomic Signature' of a Biological Species
In 1981, Ruth Nussinov discovered that the codon usage in protein genes varies from one species to the next. The molecular origin of this phenomenon was explained by Samuel Karlin in 1989: using the Monte Carlo method, he substantiated the statistical significance of the correlation between the differing abundance of synonymous codons and complementary palindromes in protein-encoding DNA, which also encodes the resistance of messenger RNAs to degradation during their translation into polypeptide chains.
In 1989, Edward Trifonov summarized all this into the hypothesis of quite diverse codes for genomic protein-encoding DNA: genetic, nucleosomal, mRNA-protecting, α-helix/β-strand-warranting, functional site, domain, globular, species-identifying, and so on. The results of their interplay are unique DNAs.
In 1995, Samuel Karlin discovered a statistically significant species specificity of nucleotide frequencies in full bacterial genomes and called this phenomenon the 'genomic signature' of biological species. In 1997, Nikolay Kolchanov found that the same is true of higher eukaryotes in the promoters of genes transcribed by RNA polymerase II. These particular promoters represent the regions of unique DNAs, which bind the initiation factors upstream of the transcription start sites of these genes.
Bacterial Genetics
David P. Clark, Nanette J. Pazdernik, in Molecular Biology (Second Edition), 2013
Whole Genome Sequencing of Bacteria
• For most bacteria, genetic information has been gathered by sequencing the whole genome.
The first bacterium to have its entire genome sequenced was Haemophilus influenzae. Rapid advances in sequencing technology have led to many completed sequences for a wide range of bacteria, including many important pathogens.
Additionally, differences in nitrogenous base content of DNA molecules and codon usage frequencies indicate segments of the genome with foreign origins.
• Genome specialization islands are blocks of contiguous genes usually with a "foreign" origin that perform some specialized function, such as virulence or biodegradation.
Comparisons of genome sequences between harmless and pathogenic bacterial relatives have indicated large sections of genes responsible for virulence. These are known as pathogenicity islands and are a specific type of specialization islands. Other "islands" might encode genes for the degradation of chemical pollutants, such as petroleum, herbicides, and industrial chemicals.
• Whole bacterial genomes have been chemically synthesized and successfully inserted into bacterial cells.
The Venter Institute has performed some feats once thought impossible. These include the complete chemical manufacturing of a bacterium's genome, followed by whole genome transformation into a new cell.
Cellulases
Joana L.A. Brás, ... Carlos M.G.A. Fontes, in Methods in Enzymology, 2012
2.2 Producing synthetic dockerin or cohesin genes
Under some circumstances the lack of bacterial genomic DNA, or inappropriate codon usage for obtaining high levels of gene expression in E. coli, might entail the production of synthetic genes in addition to the strategy described in Section 2.1.
1. Select the primary sequences of the required cohesin and dockerin genes and design the genes encoding the respective proteins with a codon usage that is compatible with a high level of expression in E. coli (gene design might be performed using Gene Designer by DNA2.0; https://www.dna20.com/genedesigner2; Villalobos et al., 2006). This dedicated software excludes undesired internal restriction sites, repetitive regions, or putative regulatory sequences.
2. Divide the designed gene into overlapping oligonucleotides (40 bp in length, with 20 bp overlaps). Dedicated software can be used to design primers whose overlapping regions have similar melting temperatures (e.g., Gene2Oligo http://berry.engin.umich.edu/gene2oligo; Rouillard et al., 2004). Design upstream and downstream primers incorporating the engineered restriction sites that will be used for the subsequent cloning reactions.
3. Assemble a 50-μl PCR reaction using 25 pmol of the upstream and downstream primers and 0.25 pmol of the internal primers.
4. Perform the PCR reaction as described in Section 2.1 (point 2) using a proofreading thermostable DNA polymerase. Perform a standard PCR cycle using a 55 °C annealing temperature and an extension period of at least 1 min/kb.
5. Check the result of the PCR reaction through agarose gel electrophoresis. Clone the PCR product of the estimated size into a blunt-ended vector as described above.
6. Sequence the synthetic gene to confirm that no mutations have accumulated during the amplification.
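The division of a designed gene into overlapping oligonucleotides (step 2) can be sketched as follows. This is a fixed-length tiling with oligonucleotides alternating between the top and bottom strands, which is an assumption about the assembly layout; it does not balance melting temperatures the way Gene2Oligo does, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_into_oligos(gene, length=40, overlap=20):
    """Tile a gene with oligos of `length` nt that overlap neighbours by `overlap` nt.

    Even-numbered oligos are taken from the top strand and odd-numbered ones
    from the bottom strand, so adjacent oligos can anneal across the overlap.
    The final oligo may be shorter than `length`.
    """
    if not 0 < overlap < length:
        raise ValueError("overlap must be positive and smaller than length")
    gene = gene.upper()
    step = length - overlap
    oligos = []
    for n, start in enumerate(range(0, max(len(gene) - overlap, 1), step)):
        fragment = gene[start:start + length]
        oligos.append(fragment if n % 2 == 0 else reverse_complement(fragment))
    return oligos
```

For a 100-bp design this yields four 40-nt oligonucleotides; the outermost primers carrying the restriction sites would still be designed separately, as the protocol describes.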
Codon Usage Bias
Codon usage bias refers to the fact that different organisms have differences in the frequency of occurrence of synonymous codons in their coding DNA, meaning that some codons are rarely used while other codons are frequently used in a particular organism.
From: Comprehensive Biotechnology (Second Edition), 2011
Codon Usage Bias
P.M. Sharp, in Encyclopedia of Genetics, 2001
Other Bacteria
Similar observations have been made for the gram-positive bacterium Bacillus subtilis. For some amino acids, such as Phe, it is the same codon (UUC) that is translationally optimal, but for others, such as Leu, the identity of the optimal codon is different (Table 1), correlated with a change in the abundance of the respective tRNAs. Although the abundance of tRNA species has not been quantified in other bacteria, similar observations of strong codon usage bias, specifically in highly expressed genes, have been made in a number of other species. Within any species, the pattern of codon usage and the abundance of tRNA species can be viewed as a highly coadapted system.
Closely related species, such as E. coli and Salmonella typhimurium, generally have very similar patterns of codon usage because the influences are similar. Bacteriophages exploit the translation machinery of their hosts, and often have similar codon usage patterns to their hosts.
However, selected codon usage bias is not ubiquitous among bacteria. The human pathogen Helicobacter pylori does not exhibit preferentially biased codon usage in highly expressed genes. Also, in many species with extremely biased base compositions, such as Mycoplasma and Streptomyces (Table 1), or Borrelia which is A+T-rich overall and exhibits a strong skew to G+T on the leading strand, there is little evidence of differently biased codon usage in highly expressed genes. In these species natural selection on codon usage has not been effective.
Integrative Plant Biochemistry
Daniel G. Panaccione, ... Christine M. Coyle, in Recent Advances in Phytochemistry, 2006
Codon Usage Bias
Differences in codon usage bias may be helpful in identifying genes that have been acquired by horizontal gene transfer. We investigated whether genes in the A. fumigatus ergot cluster had codon usage bias more like that of other A. fumigatus genes or more like those in the C. purpurea ergot cluster. Seven A. fumigatus genes that have apparent homologs in the C. purpurea ergot alkaloid cluster (refer to Fig. 2.5, Table 2.1) were studied for biases in the third position of codon families. (The dioxygenase genes were omitted due to the lack of an open reading frame in A. fumigatus and the presence of two different versions in C. purpurea.) The only readily apparent codon usage bias in the A. fumigatus copies of the shared ergot cluster genes was toward G in the third position of codons that had a G versus A option only. This includes codons for glutamic acid, glutamine, lysine, and a subset of the leucine and arginine codons. This preference also was seen in the coding sequences of a random sampling of seven A. fumigatus genes not associated with the ergot alkaloid cluster. Conversely, the seven C. purpurea genes that have homologs in the A. fumigatus cluster did not display this same codon usage bias (Fig. 2.6). When the data for these five codon sets were pooled, the proportion of codons that contained a G in the third position was significantly lower (P<0.01; Tukey-Kramer HSD test) in the C. purpurea ergot cluster genes than in the A. fumigatus ergot cluster genes or the random sampling of A. fumigatus genes. The means for the two A. fumigatus gene sets were not significantly different. No other codon usage biases were apparent in any of the data sets. The G+C content for these same three sets of seven genes also was investigated, and no significant differences were identified. The data do not support a recent transfer of the ergot alkaloid biosynthesis genes between A. fumigatus and C. purpurea.

Figure 2.6. Histogram indicating the percentage of codons for the indicated amino acids (single amino acid code used) in which G as opposed to A appears in the third position of the codon. For arginine and leucine codons, only the subset of codons for which A versus G is the sole possible difference (i.e., CGR, with R representing purine, for arginine, and UUR for leucine) were considered. Key: C.p. (white bars) represents the seven C. purpurea ergot alkaloid cluster genes with orthologs in the A. fumigatus cluster; A.f. ergot (black bars) represents the seven A. fumigatus ergot alkaloid cluster genes with orthologs in the C. purpurea cluster; and, A.f. other (grey bars) represents an arbitrary sample of seven A. fumigatus genes retrieved from GenBank that have no association with the ergot alkaloid gene cluster.
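The quantity plotted in Figure 2.6 can be computed with a short sketch like the one below, which reports the percentage of G-ending codons within each A-versus-G codon pair. The pairs follow the caption (including CGR for arginine and UUR for leucine, written here with T in place of U); the function name is hypothetical, and the significance test used in the study is not reproduced.

```python
from collections import Counter

# A-ending and G-ending codon for each pair named in the figure caption.
PAIRS = {
    "E": ("GAA", "GAG"),
    "Q": ("CAA", "CAG"),
    "K": ("AAA", "AAG"),
    "L": ("TTA", "TTG"),  # UUR subset of the leucine codons
    "R": ("CGA", "CGG"),  # CGR subset of the arginine codons
}


def g_ending_percent(coding_sequences):
    """Percentage of G-ending codons per pair, pooled over a set of genes.

    Returns None for a pair that never occurs in the supplied sequences.
    """
    counts = Counter()
    for cds in coding_sequences:
        cds = cds.upper().replace("U", "T")
        usable = len(cds) - len(cds) % 3
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    result = {}
    for label, (a_codon, g_codon) in PAIRS.items():
        total = counts[a_codon] + counts[g_codon]
        result[label] = 100 * counts[g_codon] / total if total else None
    return result
```

Running this separately on each of the three gene sets gives the three bars per amino acid shown in the histogram.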
Synthetic Biology, Part A
Kevin J. Morey, ... June I. Medford, in Methods in Enzymology, 2011
3.3 Consideration of codon bias when designing synthetic signaling systems
One last consideration when engineering a synthetic signaling system is codon usage bias. The presence of codon usage bias among different organisms has been well documented (Batard et al., 2000; Lessard et al., 2002; Suo et al., 2006). The preference of one codon over another by an organism can be a barrier to expressing bacterial genetic circuits in plants or testing plant proteins in bacteria. Codon optimization of heterologously expressed genes can improve expression levels and, in some cases, simply allow a gene to be heterologously expressed (Perlak et al., 1991).
Our system presents a stark example of conflicting codon usage bias. Four of the seven rare codons utilized by E. coli (used at a frequency < 0.5%) code for the amino acid arginine (Chen and Texada, 2006). These codons account for over 74% of the arginine codon usage in Arabidopsis (Nakamura et al., 2000), with the extremely rare AGG and AGA codons making up 57% of Arabidopsis usage. Hence, if not addressed when expressing heterologous genes, rare codon clustering can lead to drastically reduced protein levels and mRNA degradation (Li et al., 2006; Sunohara et al., 2004). For instance, the AHK4/CRE1 gene contains 40 of the rare arginine codons, including two arginine codons rare to E. coli appearing contiguously at positions 16 and 17, as well as at positions 133 and 134. Our laboratory has encountered gene instability/toxicity issues when expressing this protein in E. coli, a result that has also been reported elsewhere (Mizuno and Yamashino, 2010).
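Contiguous rare codons of the kind described above can be located with a simple scan. In the sketch below, AGA and AGG come from the text, while CGA and CGG are included as an assumption about which other arginine codons are meant, so the set is exposed as a parameter. The function names are hypothetical.

```python
# AGA and AGG are named in the text; CGA and CGG are an assumption.
RARE_ARG_CODONS = frozenset({"AGA", "AGG", "CGA", "CGG"})


def rare_codon_positions(cds, rare=RARE_ARG_CODONS):
    """1-based codon positions at which a rare codon occurs."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [i // 3 + 1 for i in range(0, usable, 3) if cds[i:i + 3] in rare]


def contiguous_runs(positions, min_len=2):
    """Group sorted positions into runs of at least `min_len` consecutive codons."""
    runs, current = [], []
    for position in positions:
        if current and position == current[-1] + 1:
            current.append(position)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [position]
    if len(current) >= min_len:
        runs.append(current)
    return runs
```

For a gene such as the one discussed above, `contiguous_runs(rare_codon_positions(cds))` would be expected to report pairs like positions 16 and 17.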
The effect of codon usage bias can be complicated during cloning in bacterial cells by the unwanted activity of promoters used to drive plant genes. The CaMV35S promoter, commonly used for constitutive expression of genes in plants, has been shown to have activity in E. coli (Assaad and Signer, 1990). Therefore, cloning a gene that has been codon-optimized for plants downstream of the CaMV35S promoter could potentially lead to instability in bacterial cells if the gene has a large number of rare E. coli codons. Hence, codon usage may need to be considered when simply cloning plant genes in E. coli as well as when developing a bacterial testing system.
Enzymes of Energy Technology
Vera Engelbrecht, Thomas Happe, in Methods in Enzymology, 2018
Expression strain/codon usage
If protein expression in BL21 is low, differences in chaperones, codon usage bias, posttranslational modifications, or disulfide bridge formation in the algae compared to the E. coli expression strain could be the reason. In the case of codon usage bias, it might be most convenient to use a gene sequence that is already optimized for E. coli. Even though many methods optimize codon usage by enriching high-frequency host codons (Chin, Chung, & Lee, 2014; Grote et al., 2005; Liu, Deng, Wang, & Wang, 2014; Lorimer et al., 2009; Puigbò, Guzmán, Romeu, & Garcia-Vallvé, 2007; Villalobos, Ness, Gustafsson, Minshull, & Govindarajan, 2006), a recent study suggests using varying ratios of low-frequency to high-frequency codons in order to synchronize the translational speed of the protein according to its structural elements (Tian et al., 2017). Another strategy for overcoming the codon usage problem is to increase the availability of underrepresented tRNAs, which is possible with E. coli strains such as BL21-CodonPlus (Stratagene) or Rosetta (DE3) (Novagen). These strains contain plasmids with extra genes for rare tRNAs (Rosano & Ceccarelli, 2014). If disulfide bond formation is thought to be the critical step in proper folding of the recombinant hydrogenase in E. coli, expression in Origami (Novagen) or Shuffle (NEB) strains could be helpful. Both strains have mutations that lead to an oxidative cytoplasmic environment which favors disulfide bond formation (Derman, Prinz, Belin, & Beckwith, 1993). Another possibility is that the E. coli chaperone network might not be efficient enough to help with the recombinant protein folding. In that case, it has proven helpful to either induce the E. coli chaperone network by the addition of 10 mM benzyl alcohol 20 min before induction or coexpress plasmid-encoded molecular chaperones (de Marco, Vigh, Diamant, & Goloubinoff, 2005).
➔ Optimize codon usage
➔ Induce E. coli chaperone network by adding 10 mM benzyl alcohol before induction
➔ Use different E. coli strains such as Rosetta, BL21-CodonPlus, Origami, or Shuffle
Computational Tools for Taxonomic Microbiome Profiling of Shotgun Metagenomes
Matthias Scholz, ... Nicola Segata, in Metagenomics for Microbiology, 2015
Compositional approaches for metagenomic binning
Compositional approaches compare the intrinsic properties of sequences without being reliant on direct nucleotide or protein sequence alignment. Such intrinsic properties that are known to be good organismal signatures include variations in GC-content, codon usage bias, and the distribution of k-mers of variable length, with the latter being considered the most important compositional feature for comparison. In a compositional approach, the first step is to build a statistical model of species- or genus-specific intrinsic properties by preprocessing reference genomes (the so-called training step). The second step is applying this model to compare and classify the metagenomic reads. There are several different approaches to achieve these goals; for example, PhyloPythia/PhyloPythiaS [25] adopts a support vector machine classifier based on k-mer statistics. Different methods use other state-of-the-art machine-learning tools and these include Phymm [26] and NBC [27], which are based on Bayesian models, and TACOA [28], which adopts a k-nearest neighbor-based strategy.
Because compositional approaches avoid computationally expensive sequence alignment, they usually run quickly. Like assembly-based approaches, they generalize well and perform well in classifying reads that lack closely related reference sequences. This capability arises because intrinsic sequence information is evolutionarily more conserved than nucleotide sequence homology. However, it comes at the expense of low discriminatory power when closely related sequences are present in the reference databases. For this reason, compositional taxonomic profiling is usually limited to genus-level resolution. Moreover, the low discriminatory power is further exacerbated by very short sequencing reads. Combining compositional with mapping-based approaches can mitigate both shortcomings.
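The two-step scheme (train a compositional model on reference genomes, then classify reads) can be illustrated with a deliberately simple nearest-profile classifier on k-mer frequencies. This is a sketch under stated assumptions and does not reproduce PhyloPythia, Phymm, NBC, or TACOA: the function names, the Euclidean distance, the choice of k = 2, and the toy reference sequences are all invented for the example.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=2):
    """Return the normalized frequency of every A/C/G/T k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def train(references, k=2):
    """Training step: one k-mer profile per reference genome."""
    return {name: kmer_profile(seq, k) for name, seq in references.items()}


def classify(read, model, k=2):
    """Classification step: assign the read to the closest reference profile."""
    profile = kmer_profile(read, k)

    def distance(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return min(model, key=lambda name: distance(profile, model[name]))


# Toy reference "genomes", made up for illustration only.
references = {
    "AT_rich": "ATATTTAAATATTAATTTAAAT",
    "GC_rich": "GCGCCGGGCCGCGGCGCCCG",
}
model = train(references)
print(classify("TTAATATA", model))
print(classify("CGGCGC", model))
```

With reads this short, each profile is estimated from only a handful of k-mers, which is one way to see why very short reads reduce discriminatory power.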
Emerging Applications of Molecular Imaging to Oncology
Il Minn, ... Martin G. Pomper, in Advances in Cancer Research, 2014
4.3 Codon optimization
The genetic code is degenerate: it has 64 different codons for 20 amino acids and the translational stop signals. Different species often prefer a particular codon for encoding an amino acid (Comeron & Aguade, 1998). This codon usage bias often makes it less efficient to express reporter genes from a different species. For this reason, reporter genes of nonhuman origin have been optimized by replacing codons (DNA sequences) with the ones more frequently used in humans. An excellent example is the optimization of GFP, a reporter from Aequorea victoria (Yang, Cheng, & Kain, 1996). Humanized gLuc, obtained by codon optimization, exhibited an approximately 1000-fold increase in signal intensity compared with its wild-type counterpart isolated from another marine organism, Gaussia princeps (Tannous et al., 2005). Recently, codon-optimized fLuc with a single mutation (S284T) has been shown to emit a red-shifted bioluminescent signal with enhanced intensity in human glioma cells (Caysa et al., 2009). Software is available online to aid in optimization of codons for desired species (Fox & Erill, 2010).
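The degeneracy described above can be checked directly by counting codons per amino acid in the standard genetic code. The following is a small sketch; the variable names are arbitrary, and the table is the standard code (NCBI translation table 1) written in its usual compact form.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons per amino acid ('*' marks stop codons).
degeneracy = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = sorted(aa for aa in degeneracy if aa != "*")

print(len(CODON_TABLE), "codons,", len(amino_acids), "amino acids")
print("stop codons:", stop_codons)
for aa in amino_acids:
    print(aa, degeneracy[aa])
```

Only methionine (M) and tryptophan (W) are encoded by a single codon, so every other amino acid offers a choice among synonymous codons.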
Agricultural and Related Biotechnologies
S. Ma, ... N.P.A. Hüner, in Comprehensive Biotechnology (Second Edition), 2011
4.23.4.4 Chloroplast Codon Optimization
It is well recognized that various organisms utilize certain codons in preference to others. Such preferential codon usage also occurs in chloroplasts. For example, the chloroplast of C. reinhardtii displays such codon bias, with codons containing adenine or uracil nucleotides in the third position favored over those with guanine or cytosine [12, 44]. Codon usage bias is an important factor limiting foreign gene expression in chloroplasts [12, 44]. Adapting foreign genes to the preferred codon usage of highly expressed chloroplast genes from Chlamydomonas may therefore be another effective strategy for increasing recombinant protein expression in algal chloroplasts. Franklin et al. [39] demonstrated that optimizing a GFP reporter to reflect chloroplast codon usage increased its expression at least 80-fold compared with its nonoptimized counterpart. Similarly, Mayfield and Schultz [45] showed increased expression of the bacterial luciferase reporter when a chloroplast codon-optimized version of this gene was transformed into the chloroplast of C. reinhardtii. These results may indicate the necessity for codon optimization of any gene for which high levels of protein production are desired when using algal chloroplasts as an expression platform.
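A third-position bias of this kind is commonly summarized as the fraction of codons ending in G or C. The helper below is a minimal sketch; the name `gc3` and the example sequences are assumptions for illustration, and the input is assumed to be an in-frame coding sequence in the DNA alphabet (so uracil appears as T).

```python
def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty, in-frame coding sequence")
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third_positions) / len(third_positions)


# Two made-up sequences encoding the same peptide (Ala-Ala-Lys-Phe).
at_ending = "GCTGCAAAATTT"
gc_ending = "GCCGCGAAGTTC"
print(gc3(at_ending))
print(gc3(gc_ending))
```

A low value for a set of highly expressed chloroplast genes would correspond to the preference for A or U in the third position described above.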
Bacillus Subtilis
A. Danchin, in Encyclopedia of Genetics, 2001
Codon Usage and Organization of the Cell's Cytoplasm
Because the genetic code is redundant, coding sequences exhibit highly variable patterns of codon usage. If there were no bias, all codons for a given amino acid should be used more or less equally. The genes of B. subtilis have been split into three classes on the basis of their codon usage bias. One class comprises the bulk of the proteins, another is made up of genes that are expressed at a high level during exponential growth, and a third class, with A+T-rich codons, corresponds to portions of the genome that have been horizontally exchanged. What is the source of such biases? Random mutations would be expected to have smoothed out any differences, but this is not the case. There are also systematic effects of context, with some DNA sequences being favored or selected against.
The cytoplasm of a cell is not a tiny test tube. One of the most puzzling features of the organization of the cytoplasm is that it accommodates the presence of a very long thread-like molecule, DNA, which is transcribed to generate a multitude of RNA threads that are usually as long as the whole cell. If mRNA molecules were left free in the cytoplasm, all kinds of knotted structures would arise. There must therefore exist some organizational principles that prevent mRNA molecules and DNA from becoming entangled. Several models, supported by experiments, postulate an arrangement where transcribed regions are present at the surface of a chromoid, in such a way that RNA polymerase does not have to circumscribe the double helix during transcription. Compartmentalization is important even for small molecules, despite the fact that they can diffuse quickly. In a B. subtilis cell growing exponentially in rich medium, the ribosomes occupy more than 15% of the cell's volume. The cytoplasm is therefore a ribosome lattice, in which the local diffusion rates of small molecules, as well as macromolecules, are relatively slow. Along the same lines, the calculated protein concentration of the cell is ca. 100–200 mg ml−1, a very high concentration.
The translational machinery requires an appropriate pool of elongation factors, aminoacyl-tRNA synthetases, and tRNAs. Counting the number of tRNA molecules adjacent to a given ribosome, one conceptualizes a small, finite number of molecules. As a consequence, a translating ribosome is an attractor that acts upon a limited pool of tRNA molecules. This situation provides a form of selective pressure, whose outcome would be adaptation of the codon usage bias of the translated message as a function of its position within the cytoplasm. If codon usage bias were to change from mRNA to mRNA, these different molecules would not see the same ribosomes during the life cycle. In particular, if two genes had very different codon usage patterns, this would predict that the corresponding mRNAs are not formed within the same sector of the cytoplasm.
When mRNA threads are emerging from DNA, they become engaged by the lattice of ribosomes, and they ratchet from one ribosome to the next, like a thread in a wiredrawing machine (note that this is exactly opposite to the view of translation presented in textbooks, where ribosomes are supposed to travel along fixed mRNA molecules). In this process, nascent proteins are synthesized on each ribosome, and spread throughout the cytoplasm by the linear diffusion of the mRNA molecule from one ribosome to the next. However, when mRNA disengages from DNA, the transcription complex must sometimes break up. Broken mRNA is likely to be a dangerous molecule because, if translated, it would produce a truncated protein. Such protein fragments are often toxic, because they can disrupt the architecture of multisubunit complexes (this explains why many nonsense mutants are dominant negative, rather than recessive). There exists a process that copes with this kind of accident in B. subtilis. When a prematurely terminated mRNA molecule reaches its end, the ribosome stops translating, does not dissociate, and waits. A specialized RNA, tmRNA, which is folded and processed at its 3′ end like a tRNA and charged with alanine, comes in, inserts its alanine at the C-terminus of the nascent polypeptide, then replaces the mRNA within the ribosome, where it is translated as ASFNQNVALAA. This tail is a protein tag that is then used to direct the truncated protein to a proteolytic complex (ClpA, ClpX), where it is degraded.
The organization of the ribosome lattice, coupled to the organization of the transcribing surface of the chromoid, ensures that mRNA molecules are translated parallel to each other, in such a way that they do not make knots. Polycistronic operons ensure that proteins having related functions are coexpressed locally, permitting channeling of the corresponding pathway intermediates. In this way, the structure of mRNA molecules is coupled to their fate in the cell, and to their function in compartmentalization. Genes translated sequentially in operons are physiologically and structurally connected. This is also true for mRNAs that are translated parallel to each other, suggesting that several RNA polymerases are engaged in the transcription process simultaneously, yoked as draft animals. Indeed, if there is correlation of function and/or localization in one dimension, there exists a similar constraint in the orthogonal directions. Because ribosomes attract tRNA molecules, they bring about a local coupling between these molecules and the codons being translated. This predicts that a given ribosome would preferentially translate mRNAs having similar patterns of codon usage. As a consequence, as one moves away from a strongly biased ribosome, there would be less and less availability of the most biased tRNAs. This creates a selection pressure for a gradient of codon usage as one goes away from the most biased messages and ribosomes, nesting transcripts around central core(s), formed of transcripts for highly biased genes. Finally, ribosome synthesis creates a repulsive force that pushes DNA strands away from each other, in particular from regions near the origin of replication. Together these processes result in a gene gradient along the chromosome, which is an important element of the architecture of the cell.
Impact of the host on plant virus evolution
Xiao-fei Cheng, ... Hui-zhong Wang, in Plant Virus–Host Interaction, 2014
Impact of the host on viral synonymous codon choice
Due to the degeneracy of the genetic code, all amino acids, except methionine and tryptophan, are encoded by more than one codon. Codons encoding the same amino acid are known as synonymous codons. The individual synonymous codons for a given amino acid are not used at similar frequencies in different genes or organisms, indicating a bias in codon usage (Grantham et al 1980). Synonymous codon usage is determined by many factors, such as translational selection, mutation pressure, gene transfer, amino acid conservation, RNA stability, hypersaline adaptation, and growth conditions (Ermolaeva 2001, Lynn et al 2002, Paul et al 2008). For viruses, the viral mutational preference is thought to be the most important factor that shapes viral synonymous codon usage (Jenkins & Holmes 2003, Adams & Antoniw 2004). However, the translational pressure due to tRNA availability, nucleotide abundance, and selection of CpG-suppressed clones in the host cell by the immune system also affects viral synonymous codon usage and even determines the synonymous codon usage bias in some viral genes or particular viruses (Karlin et al 1990, Zhou et al 1999, Woo et al 2007, Lobo et al 2009, Aragonès et al 2010). For example, Chantawannakul and Cutler (2008) found that the nucleotide composition at all codon positions and the synonymous codon usage of viruses infecting the honeybee show a high degree of resemblance to those of the honeybee, suggesting that long-term convergent evolution between the honeybee and its associated viruses results in the adaptation of virus synonymous codon usage to that of the host.
The first comprehensive analysis of the synonymous codon usage of plant viruses was carried out by Adams and Antoniw (2004). In this study, the synonymous codon usage bias of 385 plant viruses was measured with the effective number of codons (ENC), a simple method to quantify how far the codon usage of a gene departs from equal usage of synonymous codons (Wright 1990), and was correlated with the viral nucleotide composition, host type, and mode of transmission. They found that the ENC values of these viruses were positively correlated with viral GC content in the third codon position but not with the type of host they infect or the mode of transmission. As a result, they concluded that mutational bias, rather than translational selection, accounts for the observed variations in synonymous codon usage in plant viruses, and that there is no obvious impact of host translational selection on viral synonymous codon usage. However, there are several pitfalls in their study. First, it is arbitrary to use only one indicator, the ENC value, to evaluate viral synonymous codon usage. Second, a direct comparison of the relative synonymous codon usage (RSCU) between the viruses and their respective hosts was not performed. Third, it is also arbitrary to categorize all the plant viruses based on the type of host plants they infect without considering features of the viral genome, for example, genome type (ssDNA, dsDNA, ssRNA, or dsRNA) and genome polarity (positive or negative), because viruses with different genome features originated separately and may differ from each other greatly in many aspects, including the nucleotide composition and mutation bias.
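The relative synonymous codon usage (RSCU) mentioned above is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally; values above 1 indicate a codon used more often than expected. The sketch below computes it for a single in-frame DNA coding sequence. The function name `rscu` and the example sequence are assumptions for illustration, and stop codons and amino acids absent from the sequence are skipped by choice.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Return {codon: RSCU} for every amino acid present in the sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds), 3))

    # Group sense codons into synonymous families.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not used; RSCU is undefined
        expected = total / len(codons)  # count per codon under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Made-up example: four leucine codons, three CTG and one CTT.
values = rscu("CTGCTGCTGCTT")
print(round(values["CTG"], 2), round(values["CTT"], 2), values["TTA"])
```

Comparing RSCU vectors computed this way for a virus and for its host (for example by correlation) is one way to make the direct comparison that the text notes was missing; the choice of comparison statistic is left open here.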
In fact, detailed analysis of the synonymous codon usage of begomoviruses (circular ssDNA viruses, Geminiviridae) showed that translational selection can be detected in the genomes of begomoviruses, especially in the highly expressed genes, although mutation bias appears to be the major determinant of the overall synonymous codon usage of begomoviruses (Xu et al 2008). Interestingly, we found a high degree of similarity of the synonymous codon usage between CTV and its citrus host (Cheng et al 2012). Additionally, the synonymous codon usage resemblance between woody plant-infecting closteroviruses and their woody hosts is higher than that between herbaceous plant-infecting closteroviruses and their herbaceous hosts (Cheng et al 2012). This result further confirms the influence of the host on synonymous codon usage in plant viruses. In another study, we also found that lineage-specific synonymous codon usage exists in viruses within the Bunyaviridae and two phylogenetically related genera, Tenuivirus and Emaravirus, although the synonymous codon usage of most of these viruses shows a high degree of resemblance, suggesting that the mutational preference is the major factor influencing synonymous codon usage (our unpublished data).
In conclusion, several basic deductions can be drawn from the above studies. First, the synonymous codon usage of viruses within the same genus is always highly similar. In other words, mutational pressure is the major factor determining the overall synonymous codon usage. Second, translational pressure from the host also affects the viral synonymous codon usage, even if not in all plant viruses. Third, the influence of host translational pressure may be stronger in the genes that are highly expressed than in those expressed at lower levels. Fourth, the impact of host translational selection may be important in some particular plant viruses, such as those that coevolved with their plant hosts.