8 Optimization of Codon Usage
Codon bias exists in filamentous fungi. A genome-wide analysis revealed an average GC content of 57% in the nuclear genes and confirmed an earlier finding that suggested a bias toward cytosine at the third codon position in N. crassa (Radford & Parish, 1997). Another analysis of 45 highly or poorly expressed genes in A. nidulans indicated that, although the GC content of the genome is close to 50%, codon usage is highly biased toward approximately 20 "optimal codons," which are characterized by ending in C or G (Lloyd & Sharp, 1991). In A. awamori and A. niger, codons totaling 51,434 bp were analyzed and the optimal codon usage was defined, which aided the design of a synthetic chymosin gene with A. awamori-preferred codons (Cardoza et al., 2003).
Codon usage is tightly associated with the recruitment of transfer RNAs (tRNAs) to the ribosome. A general principle is that rare codons (those used at a frequency below 15%) in consecutive positions or in clusters lead to inefficient translation (Kinnaird, Burns, & Fincham, 1991). Optimization of codon usage has been widely used in the production of heterologous proteins in bacterial (Haas, Park, & Seed, 1996; Kane, 1995), fungal (Huang et al., 2008; Sinclair & Choy, 2002), plant (Tregoning et al., 2003), and mammalian (Kim, Oh, & Lee, 1997; Massaer et al., 2001) host cells. Overexpression of tRNAs specific for the rare codons in the host cell (as in the BL21-CodonPlus(DE3)-RIPL strain from Stratagene Inc.) tends to alleviate inefficient translation, thus enhancing expression of the target gene.
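The rule of thumb about consecutive rare codons can be illustrated with a short sketch. It assumes that "used less than 15%" refers to a codon's share among its synonymous codons; the function name `rare_codon_runs` and the usage values in `TOY_FREQ` are hypothetical and serve only as an illustration, not as measured data for any organism.

```python
def rare_codon_runs(cds, rel_freq, threshold=0.15, min_run=2):
    """Return (start_codon_index, run_length) for runs of consecutive rare codons.

    rel_freq maps a codon to its relative usage (0..1); codons missing from
    the table are treated as not rare (an assumption of this sketch).
    """
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    runs, start = [], None
    for idx, codon in enumerate(codons):
        is_rare = rel_freq.get(codon, 1.0) < threshold
        if is_rare and start is None:
            start = idx  # a new run of rare codons begins here
        elif not is_rare and start is not None:
            if idx - start >= min_run:
                runs.append((start, idx - start))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(codons) - start >= min_run:
        runs.append((start, len(codons) - start))
    return runs


# Hypothetical relative usage values, for illustration only.
TOY_FREQ = {"ATG": 1.0, "CTG": 0.50, "CTA": 0.04, "AGA": 0.05, "AGG": 0.03}

print(rare_codon_runs("ATGAGAAGGCTGCTAAGA", TOY_FREQ))
```

With the toy table, the codons at indices 1 to 2 (AGA, AGG) and 4 to 5 (CTA, AGA) are reported as clusters; a real analysis would use a codon usage table for the intended host.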
To overcome the hurdle of codon bias in expressing heterologous (or even endogenous) genes in filamentous fungi, the codons of the gene are most often optimized according to the codon usage of the host cell. When 20 codons of a thermophilic xylanase gene from the bacterium D. thermophilum were changed to those preferred in T. reesei, the xylanase was successfully expressed, whereas the original gene failed to express (Te'o et al., 2000). The native aequorin AeqA from the jellyfish Aequorea victoria was poorly expressed in N. crassa, with a yield of 0.15 μg/g total protein, owing to the presence of 44 rare codons in the gene. After optimization of the codons, expression rose to 2.26 μg aequorin/g total protein in N. crassa, 13.4 μg aequorin/g total protein in A. niger, and 21.8 μg aequorin/g total protein in A. awamori (Nelson et al., 2004). The luc gene encoding the luciferase of the firefly Photinus pyralis is an invaluable tool for molecular analyses of biological processes such as circadian rhythm. However, the gene was very poorly expressed in N. crassa. Optimization of the codons for its first 21 residues resulted in successful expression of this protein and facilitated the study of circadian rhythm in N. crassa (Morgan, Greene, & Bell-Pedersen, 2003). A more intensive optimization of its codons further increased the expression of luciferase by four orders of magnitude (Gooch et al., 2008).
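The simplest form of the optimization described above, replacing every codon with the host's most frequent synonymous codon, can be sketched as follows. This "one amino acid, one codon" scheme is a deliberate simplification and is not the method of any study cited here; the helper names and the host counts in `TOY_HOST_COUNTS` are hypothetical. The codon table is the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def preferred_codons(host_counts):
    """Map each amino acid to its most frequent codon in host_counts."""
    best = {}
    for codon, count in host_counts.items():
        aa = CODON_TO_AA[codon]
        if aa not in best or count > host_counts[best[aa]]:
            best[aa] = codon
    return best


def recode(cds, best):
    """Swap each codon for the preferred synonym; keep codons with no entry."""
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return "".join(best.get(CODON_TO_AA[c], c) for c in codons)


def translate(cds):
    """Translate an in-frame DNA coding sequence with the standard code."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, usable, 3))


# Hypothetical host codon counts, for illustration only.
TOY_HOST_COUNTS = {"AAA": 5, "AAG": 50, "TTT": 8, "TTC": 40, "GAA": 10, "GAG": 35}

original = "ATGAAATTTGAATAA"
optimized = recode(original, preferred_codons(TOY_HOST_COUNTS))
print(optimized)
print(translate(original) == translate(optimized))
```

The recoded sequence encodes the same protein as the input. Practical optimization tools weigh further factors, such as mRNA structure and GC content, rather than always choosing the single most frequent codon.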
16.5.2 PTC Mutation Suppressors
PTCs result when single base-pair substitutions create an erroneous stop codon within the open reading frame of a gene. Suppressors of PTCs, such as aminoglycoside antibiotics, are able to bind eukaryotic ribosomes and cause the insertion of a near cognate amino-acyl transfer RNA into the ribosomal A site.91 This process can allow the ribosome to "readthrough" the PTC and produce some full-length protein and has been extensively tested in proof-of-concept studies using aminoglycosides to suppress PTCs (gentamicin, amikacin, geneticin).91 There has been demonstrated efficacy in in vitro studies, in animal models of CF and muscular dystrophy, and in small numbers of CF patients. Currently, there is one oral compound, ataluren (PTC Therapeutics), in clinical trials to treat CF caused by PTCs. Ataluren was studied in three phase 2 randomized, dose-ascending, open-label trials in CF.83,92,93 Each study demonstrated short-term tolerability of ataluren, and two studies demonstrated improvements in CFTR function (across a number of PTC mutations) as measured by NPD. One study also demonstrated improvements in CFTR localization to the nasal cell membrane, whereas another demonstrated improvement in cough over 3 months.83,92 The third study failed to demonstrate improvements in NPD, and all three studies were limited by small numbers and absence of placebo groups.94 It was then studied in a large phase 3 randomized, 48-week, double-blind, placebo-controlled trial designed to test safety, efficacy, and tolerability. The results have only been reported in abstracts to date.95,96 The phase 3 study consisted of 232 patients and showed that ataluren was associated with a trend toward slower loss of FEV1 (−2.5% ataluren vs. −5.5% placebo) and fewer pulmonary exacerbations (23% decrease compared with placebo).97 The primary significant drug-related toxicity was to the kidney.
Gene Expression: Translation of the Genetic Code
Chang-Hui Shen, in Diagnostic Molecular Biology, 2019
Wobble Rules
All 64 codons have been assigned meaning, with 61 of them coding for amino acids and the remaining 3 serving as the termination signals, also called nonsense codons (Table 4.1). Multiple codons for a single amino acid are not randomly distributed but have one or two bases in common. The bases that are common to several codons are usually the first and second bases, with more room for variation in the third base, which is called the wobble base. The wobble rules indicate that a first-base anticodon U could recognize either an A or G in the codon third-base position, and a first-base anticodon G might recognize either U or C in the third-base position of the codon (Table 4.2). Because the degenerate codons for a given amino acid differ in the third base, a given tRNA can base-pair with several codons. Thus, fewer different tRNAs are needed.
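The wobble rules stated above can be sketched as a lookup from the first anticodon base to the third codon bases it can read. The U and G entries come from the text; the C, A, and inosine (I) entries follow Crick's wobble hypothesis and are added here as an assumption, since the text does not list them. The function name `codons_read_by` is hypothetical.

```python
# Watson-Crick complements used for codon positions 1 and 2.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# First anticodon base (5' end) -> third codon bases it can pair with.
WOBBLE = {
    "U": "AG",   # stated in the text
    "G": "UC",   # stated in the text
    "C": "G",    # standard pairing only (assumption, per Crick's rules)
    "A": "U",    # standard pairing only (assumption, per Crick's rules)
    "I": "UCA",  # inosine (assumption, per Crick's rules)
}


def codons_read_by(anticodon):
    """List the codons (5'->3') that an anticodon (5'->3') can decode.

    Pairing is antiparallel: anticodon base 3 pairs with codon base 1,
    base 2 with base 2, and anticodon base 1 is the wobble position.
    """
    first, second, third = anticodon
    prefix = COMPLEMENT[third] + COMPLEMENT[second]
    return [prefix + base for base in WOBBLE[first]]


print(codons_read_by("GAA"))  # phenylalanine anticodon
print(codons_read_by("UUU"))  # lysine anticodon
```

A single tRNA with anticodon GAA therefore covers both phenylalanine codons (UUU and UUC), which is why fewer tRNAs than codons are needed.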
Paul F. Agris, ... Emily Eruysal, in The Enzymes, 2017
5.5 Summary
Sixfold degenerate codons across two codon boxes differ not only in the nucleoside occupying their wobble positions but also in the nucleoside present in their first (and, for serine, second) positions. The tRNA isoacceptors decoding these degenerate codons must simultaneously recognize and discriminate between these divergent codons while remaining universally recognizable to a single aminoacyl-tRNA synthetase and able to bind the ribosomal A-site. In E. coli, posttranscriptional modifications assist the five arginine tRNA isoacceptors in distinguishing their cognate and wobble codons. In the tRNAArg1,2ICG isoacceptors, the variable presence of a 2-thiocytidine at position 32 inhibits the ability of the tRNA to wobble decode the rare arginine codon CGA, owing to steric and hydrogen bonding effects that result from the replacement of an oxygen with a larger and less electronegative sulfur atom. tRNAArg4UCU, by contrast, is also restricted in its wobble recognition of the near-cognate AGG codon, but depends for this effect not on its s2C32 modification but on the chemical identity of the codon nucleosides together with the presence of four endogenous posttranscriptional modifications. The E. coli arginine tRNA isoacceptors provide an excellent example of the functional specificity conferred by posttranscriptional modifications acting in unique local chemical environments, and they show how RNA modification-based strategies of cellular regulation might influence the cell's response to codon bias and the translation of rare codons.
Heterologous Expression of Proteins in Aspergillus
S. Gómez, ... M.C. Vega, in New and Future Developments in Microbial Biotechnology and Bioengineering, 2016
Codon Optimization
Codon optimization is a common strategy to increase heterologous expression levels, based on the premise that using codons frequently used in the expression host should increase expression yields by avoiding a shortage of charged tRNAs or transcriptional arrest of the RNA polymerase. In practice, codon optimization is accomplished by adjusting the codon adaptation index of the recombinant gene of interest to that of the average gene in Aspergillus. Multivariate statistical analysis has shown that the Aspergillus genome has a 50% G + C content and 19–20 most frequently used codons, most of them ending in G or C (Lloyd and Sharp, 1991; Tanaka et al., 2014).
Recent work on 160 highly expressed orthologous genes in seven different species of Aspergillus suggests the existence of a set of optimal codons conserved across the whole genus, as well as a G + C content similar between species. Natural selection acting at the level of translation (in speed and accuracy) is thought to be the main reason for the conservation of the codon adaptation bias in Aspergillus (Iriarte et al., 2012).
Gouka et al. (1996) were the first to report an analysis of the effect of codon optimization at the mRNA level. Using plant α-galactosidase as a template, a codon-optimized synthetic gene was designed for transformation into an A. awamori strain. Subsequently, mRNA was detected only in those cases where cells had been transformed with a plasmid bearing the optimized gene (Gouka et al., 1996). Recently, Tanaka et al. (2014) designed a synthetic gene, with more than 40% of its codons optimized, to express the house dust mite allergen Der f 7 in A. oryzae. Using this optimized gene, a 3–5-fold increase in protein yield was observed, which was paralleled by other effects such as an increase in the steady-state mRNA level and a reduction in prematurely polyadenylated transcripts (Tokuoka et al., 2008). Despite these successes, the true causes behind these effects are still poorly understood.
Bioinformatics tools, such as the Codon Usage Database (http://www.kazusa.or.jp/codon/), have been developed to help in synthetic gene construction strategies. A recent work by van den Berg et al. (2012) described the implementation of a sequence-based predictor for extracellular protein production in A. niger. For this purpose, an exhaustive experiment was conducted to express over 600 homologous and 2000 heterologous fungal genes in a protease-deficient A. niger strain, using a standardized expression cassette. Sequence-based analysis of these experiments revealed that the presence of tyrosine and asparagine residues in the primary sequence correlates positively with higher expression yields, whereas methionine and lysine content has the opposite effect (van den Berg et al., 2012).
Codon Usage
Raimi M. Redwan, ... Ranjeev Hari, in Encyclopedia of Bioinformatics and Computational Biology, 2019
Codon Adaptation in Composition of Codon
The codon composition of a gene is known to affect the fate of the protein being synthesized, as it influences the level of expression, protein folding, and the regulation of protein expression. This observation raises the same question as translational selection, discussed above: what role does codon bias play in determining a protein's expression level? Each species has its own set of "preferred codons," identified within the highly expressed genes, that are assumed to be translated efficiently to ensure optimal protein synthesis. Building on this finding, a metric known as the codon adaptation index (CAI) was developed to assess the relative adaptation of individual codons encoding a given amino acid. The index measures the ratio of a codon's frequency to the frequency of its other synonymous codons. These values are measured from highly expressed genes as the reference set, and the CAI for a gene can then be derived as the geometric mean of the relative adaptiveness values of all the codons within the gene (Sharp and Li, 1987). Similar to tAI, CAI also enables prediction of gene expression level, but on the basis of the choice of codons encoding the gene (Sen et al., 2007; Wu et al., 2005). The index was also shown to be highly correlated with the mRNA concentration of most of the genes tested in Coghlan and Wolfe (2000) and in the microarray study of Martín-Galiano et al. (2004). Nonetheless, the predictive power of the CAI has its caveats. The index relies on a set of reference genes, which might limit its value for predicting the expression of genes not reflected in the reference set (Martín-Galiano et al., 2004). The index also fails to include other positive factors influencing the expression value of the composed codons (Ermolaeva, 2001).
Consequently, a few studies have shown a disparity between the CAI and the actual expression value of the genes tested (dos Reis et al., 2003; Kudla et al., 2009). Nonetheless, the index is still widely used as a prediction tool, especially in de novo gene synthesis, in which the CAI value is taken into consideration by codon optimization algorithms (Chung and Lee, 2012; Condon and Thachuk, 2012; Nandagopal and Elowitz, 2011). Such algorithms are built on several factors, other than just the CAI, that are known to influence the expression level of genes through their codon composition. In addition, other indexes have been developed to predict expression value from codon usage. Some of the alternatives to CAI are relative codon-usage bias (Roymondal et al., 2009), expression measure E(g) (Roymondal et al., 2009), relative codon adaptation (Fox and Erill, 2010), and modified relative codon bias strength (Das et al., 2017).
The positive correlation between codon composition and expression level implies that synonymous mutations among orthologous gene products, be it across different individuals or tissues, may no longer be considered silent (Chamary et al., 2006; Kimchi-Sarfaty et al., 2007; Shields et al., 1988). Even though the codons chosen are synonymous with respect to the amino acids they encode, the choice determines mRNA structure and stability and affects protein structure and folding kinetics. This was shown in a study in which 154 synthetic orthologous genes carrying various synonymous codon variants showed different mRNA levels and degradation rates (Kudla et al., 2009). It is important to note that the variation was not caused entirely by the composition of preferred codons throughout the genes but by variation in the ribosomal binding site that changed the stability of mRNA folding.
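The CAI calculation described in this section (Sharp and Li, 1987) can be sketched in a few lines: each codon's relative adaptiveness is its count in the reference set divided by the count of the most frequent synonymous codon, and the CAI of a gene is the geometric mean of these values. The reference counts below are hypothetical, only three amino acids are tabulated to keep the example short, and skipping codons that are absent from the table or have zero weight is a simplification assumed for this sketch.

```python
import math

# Synonymous codon families for three amino acids (standard genetic code).
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
    "Glu": ["GAA", "GAG"],
}


def relative_adaptiveness(ref_counts):
    """w(codon) = count / count of the most used synonymous codon."""
    weights = {}
    for codons in SYNONYMS.values():
        best = max(ref_counts.get(c, 0) for c in codons)
        if best == 0:
            continue  # family not observed in the reference set
        for c in codons:
            weights[c] = ref_counts.get(c, 0) / best
    return weights


def cai(cds, weights):
    """Geometric mean of relative adaptiveness over the codons of cds."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    # Simplification: codons without a positive weight are skipped.
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    if not logs:
        raise ValueError("no codon in the sequence has a usable weight")
    return math.exp(sum(logs) / len(logs))


# Hypothetical counts from a set of highly expressed reference genes.
TOY_REF_COUNTS = {"AAA": 10, "AAG": 40, "TTT": 5, "TTC": 20, "GAA": 30, "GAG": 30}
W = relative_adaptiveness(TOY_REF_COUNTS)

print(cai("AAGTTCGAG", W))  # only the most used codons
print(cai("AAATTTGAA", W))  # two rarely used codons lower the index
```

A gene built only from the most frequent codons scores 1.0, while rarer synonymous choices pull the geometric mean down. As the text notes, the result depends entirely on which reference genes supply the counts.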
The Genetic Code
Julia E. Richards, R. Scott Hawley, in The Human Genome (Third Edition), 2011
Translation Requires an Adaptor Molecule Called tRNA
The codons in an mRNA molecule cannot and do not directly recognize the amino acids whose incorporation they direct. The cell uses an important organelle called a ribosome to carry out protein synthesis, and the ribosome reads the mRNA sequence through the use of an adaptor molecule called a transfer RNA (tRNA). Basically, one end of this adaptor recognizes one of the codons on the mRNA and the other end of the adaptor has the amino acid that goes with that codon. The way the adaptor recognizes the codon is by having an anticodon, a set of three bases on the tRNA molecule that can base-pair with the codon in the mRNA (Figure 4.3). Each tRNA has an anticodon at one end and the corresponding amino acid attached at the other end. It turns out that there is a specific tRNA molecule for all but three of the possible codons.

Figure 4.3. The process of translation. As a threonine tRNA sits in place with its threonine attached to the chain of amino acids that have already been added, an expended valine tRNA will have just left, with no amino acid attached, and a tryptophan tRNA is moving into place to add a tryptophan to the growing protein chain.
Three codons do not have a tRNA. These are the stop codons, and when they occur there is no tRNA that can fit into place and drop off an amino acid. This lack of a tRNA moving into place signals the ribosome to stop translating the content of this particular mRNA and to release the mRNA back into the cytoplasm where it can potentially be picked up by another ribosome to start the translation process again. The stop codons are UGA, UAA, and UAG.
How does the process start? The two parts of the ribosome, a large subunit and a small subunit, exist separately in the cytoplasm. When an mRNA is available to be translated the two pieces of the ribosome come together around the mRNA to form an intact ribosome in the process of carrying out translation (Figure 4.4). The ribosome moves along the mRNA, using tRNAs that match the codons on the mRNA as the mechanism for adding amino acids to the growing protein chain, so that each new amino acid added corresponds to the next codon on the mRNA. The result is that the order of amino acids in the protein is directly determined by the order of the codons on the mRNA. Once the ribosome reaches the stop codon it separates into its two subunits and releases the mRNA. Although human and bacterial ribosomes are quite similar in function, they have enough specific differences that it has been possible to develop some important antibiotics that target bacterial ribosomes while leaving the human ribosomes alone (Box 4.1).

Figure 4.4. Ribosome assembly and translation. The large and small subunits come together on the RNA and begin reading the coding sequence at the start codon by inserting a methionine at the first position of the protein chain. The ribosome moves along the mRNA (or the mRNA moves through the ribosome, depending on your perspective), and as each new codon on the mRNA moves into position the corresponding tRNA clicks into place, precisely positioned to allow the amino acid attached to it to become unattached to the tRNA and to become attached to the growing protein chain. Meanwhile a new tRNA is moving into place so that a new amino acid can be added. When the stop codon is reached, there is no corresponding tRNA to move into position and the ribosome subunits come apart and release both the mRNA and the completed protein chain.
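The reading process described here, starting at the start codon, adding one amino acid per codon, and stopping at UGA, UAA, or UAG, can be sketched as a small function. It is a deliberately simplified model: it starts at the first AUG it finds, which ignores how real ribosomes select a start site, and the function name `translate_mrna` is hypothetical. The codon table is the standard genetic code.

```python
BASES = "UCAG"
# Standard genetic code, codons enumerated in UCAG order ("*" marks a stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate_mrna(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Returns the protein in one-letter code, or "" if no AUG is present.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # no tRNA matches a stop codon, so the chain is released
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


print(sorted(STOP_CODONS))
# Met, then the Val, Thr, Trp order shown in Figure 4.3, then a stop codon.
print(translate_mrna("GGAUGGUUACUUGGUAAGGG"))
```

The table has 61 codons that map to amino acids and three that do not, which matches the statement that all but three codons have a tRNA.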
Box 4.1
The Ribosome
The human ribosome is built by bringing together a small ribosomal subunit and a large ribosomal subunit. Each ribosomal subunit is made of about 65% RNA and 35% protein, and the actual active parts of the organelle are the RNA components so this is sometimes classified as a ribozyme. The role of the ribosome is to bring together molecules in a very precise alignment that enables a biochemical reaction to proceed, so it has sometimes been classified as a molecular assembler. Hundreds of proteins are involved in the assembly of a ribosome. Although human and bacterial ribosomes read the same code and have many structural similarities, there are just enough key differences that some very important antibiotics are those that target the bacterial ribosomes while leaving the human ribosomes alone. In 2009 the Nobel Prize in Chemistry was awarded to Venkatraman Ramakrishnan, Thomas A. Steitz, and Ada E. Yonath for their work showing how different antibiotics interact with the three-dimensional structure of ribosomes.
Intrinsically Disordered Proteins
Jing Li, Vincent J. Hilser, in Methods in Enzymology, 2018
2.1 Mammalian Expression Vectors for GR Translational Isoforms and Luciferase Reporters
Codons to express human GR A isoform NTD and DBD two-domain construct in U-2 OS cells were optimized for mammalian cell expression, synthesized by DNA 2.0 (Menlo Park, CA), and inserted into the PJ603 mammalian expression vector under CMV promoter control. Plasmids for the two-domain constructs of B, C1, C2, C3, D1, D2, and D3 were made by inserting the codons for each respective isoform amplified from A isoform into the NheI and XhoI sites of the PJ603 vector.
Plasmid GRE2-Gluc, to express secreted Gaussia luciferase under the control of two tandem full-length GREs in the promoter, was made by inserting an oligonucleotide containing two tandem full-length GREs, 5′-aattcAGAACAggaTGTTCTgagatccgtagcAGAACAggaTGTTCTgagatccgtagcg-3′, into the EcoRI and BamHI sites of the pGluc-miniTK vector (NEB). Plasmid pCluc-miniTK2 vector (NEB) to express Cypridina luciferase was utilized as an internal control in the cotransfection to account for cell density differences and transfection efficiency differences in each well.
Computational Methods in Molecular Biology
Anders Krogh, in New Comprehensive Biochemistry, 1998
4.2 Coding regions
The codon structure is the most important feature of coding regions. Bases in triplets can be modeled with three states as shown in Fig. 10. The figure also shows how this model of coding regions can be used in a simple model of an unspliced gene that starts with a start codon (ATG), then consists of some number of codons, and ends with a stop codon.

Fig. 10. Top: a model of coding regions, where states one, two and three match the first, second and third codon positions, respectively. A coding region of any length can match this model, because of the transition from state three back to state one. Bottom: a simple model for unspliced genes with the first three states matching a start codon, the next three of the form shown to the left, and the last three states matching a stop codon (only one of the three possible stop codons is shown).
Since a codon is three bases long, the last state of the codon model must be at least of order two to correctly capture the codon statistics. The 64 probabilities in such a state are estimated by counting the number of each codon in a set of known coding regions. These numbers are then normalized properly. For example the probabilities derived from the counts of CAA, CAC, CAG and CAT are
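\[
\begin{aligned}
p(A \mid CA) &= \frac{c(CAA)}{c(CAA)+c(CAC)+c(CAG)+c(CAT)},\\
p(C \mid CA) &= \frac{c(CAC)}{c(CAA)+c(CAC)+c(CAG)+c(CAT)},\\
p(G \mid CA) &= \frac{c(CAG)}{c(CAA)+c(CAC)+c(CAG)+c(CAT)},\\
p(T \mid CA) &= \frac{c(CAT)}{c(CAA)+c(CAC)+c(CAG)+c(CAT)},
\end{aligned}
\]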
where c(xyz) is the count of codon xyz.
One of the characteristics of coding regions is the lack of stop codons. That is automatically taken care of, because p(A|TA), p(G|TA) and p(A|TG), corresponding to the three stop codons TAA, TAG and TGA, will automatically become zero.
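The counting and normalization described above can be sketched as follows: codons are counted in frame over a set of coding regions, and each count is divided by the total for codons sharing the same first two bases. The function name `third_base_probs` and the two toy sequences are hypothetical; a real model would be trained on a large set of known coding regions.

```python
from collections import Counter


def third_base_probs(coding_seqs):
    """Estimate p(z | xy) = c(xyz) / sum over z' of c(xyz') from in-frame codons.

    Returns a dict keyed by (z, xy). Prefixes never observed get 0.0 for
    every z (an assumption of this sketch; a real model would use
    pseudocounts or similar smoothing).
    """
    counts = Counter()
    for seq in coding_seqs:
        usable = len(seq) - len(seq) % 3
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    probs = {}
    for x in "ACGT":
        for y in "ACGT":
            total = sum(counts[x + y + z] for z in "ACGT")
            for z in "ACGT":
                probs[(z, x + y)] = counts[x + y + z] / total if total else 0.0
    return probs


# Toy coding regions (stop codons excluded), for illustration only.
TOY_SEQS = ["ATGCAACAGCATCAT", "ATGCACTAT"]
P = third_base_probs(TOY_SEQS)

print(P[("A", "CA")], P[("C", "CA")], P[("G", "CA")], P[("T", "CA")])
print(P[("A", "TA")], P[("G", "TA")], P[("A", "TG")])  # stop codons
```

Because TAA, TAG, and TGA never occur inside the training sequences, their probabilities come out as zero without any special handling, which is the point made in the text.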
For modeling codon statistics it is natural to use an ordinary (zeroth order) state as the first state of the codon model and a first order state for the second. However, there are actually also dependencies between neighboring codons, and therefore one may want even higher order states. In my own gene finder, I currently use three fourth order states, which is inspired by GeneMark [9], in which such models were first introduced. Technically speaking, such a model is called an inhomogeneous Markov chain, which can be viewed as a subclass of HMMs.
Chemical and Synthetic Biology Approaches To Understand Cellular Functions - Part A
M. Escarlet Díaz Galicia, ... Raik Grünberg, in Methods in Enzymology, 2019
5.1 Expression vectors
We codon-optimized PTK KD sequences, considering both E. coli codon and codon-pair usage, optimal GC content, enrichment of hidden stop codons and avoidance of sequence repetitions (Chin, Chung, & Lee, 2014). The synthesized genes were recombined into a synthetic protein expression vector (pJExpress411, originally from DNA 2.0, CA) which we had previously modified to include a RBS secondary structure insulation cassette from Mutalik et al. (2013). In our hands, the combination of codon optimization with 5′ mRNA secondary structure suppression very often gives a marked increase in protein expression yields. However, it did not clearly improve PTK KD yields over values reported in the literature (i.e., 22 μg/mL of Src by Albanese et al., 2018 or 5–15 mg/L of Src by Seeliger et al., 2005). In all constructs, the catalytic KD was fused to yellow fluorescent mCitrine (Griesbeck, Baird, Campbell, Zacharias, & Tsien, 2001) followed by a modified TwinStrep purification tag (Schmidt et al., 2013) (Fig. 3). The YopH catalytic domain (residues 164–468) was expressed from a low copy vector with Spectinomycin resistance (Addgene #79749) (Albanese et al., 2018), whereas the human PTP1B (fused to GST) was expressed from a high copy plasmid with ampicillin resistance (Addgene #8602).