The Genetic Code
Julia E. Richards, R. Scott Hawley, in The Human Genome (Third Edition), 2011
Translation Requires an Adaptor Molecule Called tRNA
The codons in an mRNA molecule cannot and do not directly recognize the amino acids whose incorporation they direct. The cell uses an important organelle called a ribosome to carry out protein synthesis, and the ribosome reads the mRNA sequence through the use of an adaptor molecule called a transfer RNA (tRNA). One end of this adaptor recognizes one of the codons on the mRNA, and the other end carries the amino acid that goes with that codon. The adaptor recognizes the codon by means of an anticodon, a set of three bases on the tRNA molecule that can base-pair with the codon in the mRNA (Figure 4.3). Each tRNA has an anticodon at one end and the corresponding amino acid attached at the other end. It turns out that there is a specific tRNA molecule for all but three of the possible codons.
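The codon-anticodon recognition described above can be sketched in a few lines. This is a minimal illustration that assumes strict Watson-Crick pairing only (real tRNAs also tolerate some "wobble" pairing, which is ignored here); the function names `anticodon_for` and `pairs_with` are hypothetical helpers, not part of any library.

```python
# Watson-Crick pairing rules for RNA bases (wobble pairing is deliberately ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the perfectly complementary anticodon, written 5'->3'."""
    # The two strands pair antiparallel, so complement each base and reverse.
    return "".join(PAIR[base] for base in reversed(codon.upper()))


def pairs_with(codon: str, anticodon: str) -> bool:
    """True if the anticodon (5'->3') is the exact complement of the codon."""
    return anticodon_for(codon) == anticodon.upper()


print(anticodon_for("AUG"))      # methionine codon -> CAU
print(pairs_with("UGG", "CCA"))  # tryptophan codon and its anticodon -> True
```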
Figure 4.3. The process of translation. As a threonine tRNA sits in place with its threonine attached to the chain of amino acids that have already been added, an expended valine tRNA will have just left, with no amino acid attached, and a tryptophan tRNA is moving into place to add a tryptophan to the growing protein chain.
Three codons do not have a tRNA. These are the stop codons, and when they occur there is no tRNA that can fit into place and drop off an amino acid. This lack of a tRNA moving into place signals the ribosome to stop translating the content of this particular mRNA and to release the mRNA back into the cytoplasm where it can potentially be picked up by another ribosome to start the translation process again. The stop codons are UGA, UAA, and UAG.
How does the process start? The two parts of the ribosome, a large subunit and a small subunit, exist separately in the cytoplasm. When an mRNA is available to be translated the two pieces of the ribosome come together around the mRNA to form an intact ribosome in the process of carrying out translation (Figure 4.4). The ribosome moves along the mRNA, using tRNAs that match the codons on the mRNA as the mechanism for adding amino acids to the growing protein chain, so that each new amino acid added corresponds to the next codon on the mRNA. The result is that the order of amino acids in the protein is directly determined by the order of the codons on the mRNA. Once the ribosome reaches the stop codon it separates into its two subunits and releases the mRNA. Although human and bacterial ribosomes are quite similar in function, they have enough specific differences that it has been possible to develop some important antibiotics that target bacterial ribosomes while leaving the human ribosomes alone (Box 4.1).
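The read-until-stop logic described above can be mimicked with a short sketch. It assumes the standard genetic code, starts at the first AUG it finds, and ignores everything the real ribosome does to pick the correct start site; `translate` and `CODON_TABLE` are illustrative names chosen here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
# One-letter amino acid codes; "*" marks the stop codons UAA, UAG and UGA.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon (or the end of the mRNA)."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: no amino acid is added and translation ends
        protein.append(amino_acid)
    return "".join(protein)


# Met-Thr-Val-Trp followed by the stop codon UAA.
print(translate("GGAUGACUGUUUGGUAAGG"))  # -> MTVW
```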

Figure 4.4. Ribosome assembly and translation. The large and small subunits come together on the mRNA and begin reading the coding sequence at the start codon by inserting a methionine at the first position of the protein chain. The ribosome moves along the mRNA (or the mRNA moves through the ribosome, depending on your perspective), and as each new codon on the mRNA moves into position the corresponding tRNA clicks into place, precisely positioned to allow the amino acid attached to it to be released from the tRNA and joined to the growing protein chain. Meanwhile a new tRNA is moving into place so that a new amino acid can be added. When the stop codon is reached, there is no corresponding tRNA to move into position and the ribosome subunits come apart and release both the mRNA and the completed protein chain.
Box 4.1
The Ribosome
The human ribosome is built by bringing together a small ribosomal subunit and a large ribosomal subunit. Each ribosomal subunit is made of about 65% RNA and 35% protein, and the actual active parts of the organelle are the RNA components so this is sometimes classified as a ribozyme. The role of the ribosome is to bring together molecules in a very precise alignment that enables a biochemical reaction to proceed, so it has sometimes been classified as a molecular assembler. Hundreds of proteins are involved in the assembly of a ribosome. Although human and bacterial ribosomes read the same code and have many structural similarities, there are just enough key differences that some very important antibiotics are those that target the bacterial ribosomes while leaving the human ribosomes alone. In 2009 the Nobel Prize in Chemistry was awarded to Venkatraman Ramakrishnan, Thomas A. Steitz, and Ada E. Yonath for their work showing how different antibiotics interact with the three-dimensional structure of ribosomes.
Intrinsically Disordered Proteins
Jing Li, Vincent J. Hilser, in Methods in Enzymology, 2018
2.1 Mammalian Expression Vectors for GR Translational Isoforms and Luciferase Reporters
Codons to express human GR A isoform NTD and DBD two-domain construct in U-2 OS cells were optimized for mammalian cell expression, synthesized by DNA 2.0 (Menlo Park, CA), and inserted into the PJ603 mammalian expression vector under CMV promoter control. Plasmids for the two-domain constructs of B, C1, C2, C3, D1, D2, and D3 were made by inserting the codons for each respective isoform amplified from A isoform into the NheI and XhoI sites of the PJ603 vector.
Plasmid GRE2-Gluc, to express secreted Gaussia luciferase under the control of two tandem full-length GREs in the promoter, was made by inserting an oligonucleotide containing two tandem full-length GREs, 5′-aattcAGAACAggaTGTTCTgagatccgtagcAGAACAggaTGTTCTgagatccgtagcg-3′, into the EcoRI and BamHI sites of the pGluc-miniTK vector (NEB). Plasmid pCluc-miniTK2 vector (NEB) to express Cypridina luciferase was utilized as an internal control in the cotransfection to account for cell density differences and transfection efficiency differences in each well.
Computational Methods in Molecular Biology
Anders Krogh, in New Comprehensive Biochemistry, 1998
4.2 Coding regions
The codon structure is the most important feature of coding regions. Bases in triplets can be modeled with three states as shown in Fig. 10. The figure also shows how this model of coding regions can be used in a simple model of an unspliced gene that starts with a start codon (ATG), then consists of some number of codons, and ends with a stop codon.

Fig. 10. Top: a model of coding regions, where states one, two and three match the first, second and third codon positions, respectively. A coding region of any length can match this model, because of the transition from state three back to state one. Bottom: a simple model for unspliced genes with the first three states matching a start codon, the next three of the form shown to the left, and the last three states matching a stop codon (only one of the three possible stop codons is shown).
Since a codon is three bases long, the last state of the codon model must be at least of order two to correctly capture the codon statistics. The 64 probabilities in such a state are estimated by counting the number of each codon in a set of known coding regions. These numbers are then normalized properly. For example the probabilities derived from the counts of CAA, CAC, CAG and CAT are
p(A|CA) = c(CAA) / [c(CAA) + c(CAC) + c(CAG) + c(CAT)],
p(C|CA) = c(CAC) / [c(CAA) + c(CAC) + c(CAG) + c(CAT)],
p(G|CA) = c(CAG) / [c(CAA) + c(CAC) + c(CAG) + c(CAT)],
p(T|CA) = c(CAT) / [c(CAA) + c(CAC) + c(CAG) + c(CAT)],
where c(xyz) is the count of codon xyz.
One of the characteristics of coding regions is the lack of stop codons. That is automatically taken care of, because p(A|TA), p(G|TA) and p(A|TG), corresponding to the three stop codons TAA, TAG and TGA, will automatically become zero.
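The counting and normalization just described can be sketched as follows. This is only an illustration of the estimate p(z|xy) = c(xyz) / Σ c(xy·), assuming the input sequences are already in frame and exclude the terminal stop codon; the function name `third_base_probs` and the toy sequences are made up for the example.

```python
from collections import Counter


def third_base_probs(coding_seqs):
    """Estimate p(z | xy) for the third codon position from in-frame codon counts."""
    counts = Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            counts[seq[i:i + 3]] += 1

    probs = {}
    for x in "ACGT":
        for y in "ACGT":
            total = sum(counts[x + y + z] for z in "ACGT")
            if total == 0:
                continue  # prefix xy never observed; leave it undefined
            for z in "ACGT":
                # Keyed as (third base, two-base prefix), i.e. p(z | xy).
                probs[(z, x + y)] = counts[x + y + z] / total
    return probs


# Toy "coding regions" (hypothetical data, no stop codons inside).
probs = third_base_probs(["CAACAGCATCAA", "TATTACTGGTGT"])
print(probs[("A", "CA")])  # c(CAA)=2 out of 4 CA* codons -> 0.5
print(probs[("A", "TA")])  # TAA never occurs -> 0.0
```

Because stop codons are absent from the training regions, the entries for TAA, TAG and TGA come out as zero without any special handling, as the text notes.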
For modeling codon statistics it is natural to use an ordinary (zeroth order) state as the first state of the codon model and a first order state for the second. However, there are actually also dependencies between neighboring codons, and therefore one may want even higher order states. In my own gene finder, I currently use three fourth order states, which is inspired by GeneMark [9], in which such models were first introduced. Technically speaking, such a model is called an inhomogeneous Markov chain, which can be viewed as a subclass of HMMs.
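To make the three-state idea concrete, here is a minimal sketch of the simplest variant mentioned above: a zeroth-order state for codon position one, a first-order state for position two and a second-order state for position three. It is not the fourth-order model used in the author's gene finder or in GeneMark; `train` and `log_prob` are hypothetical names, and no smoothing of zero counts is attempted.

```python
import math
from collections import Counter


def train(coding_seqs):
    """Count bases at position 1, pairs at positions 1-2, and whole codons."""
    c1, c2, c3 = Counter(), Counter(), Counter()
    for seq in coding_seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - 2, 3):
            x, y, z = seq[i], seq[i + 1], seq[i + 2]
            c1[x] += 1
            c2[x + y] += 1
            c3[x + y + z] += 1
    return c1, c2, c3


def log_prob(seq, model):
    """Log-probability of an in-frame sequence under the position-dependent chain."""
    c1, c2, c3 = model
    n = sum(c1.values())
    total = 0.0
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        x, y, z = seq[i], seq[i + 1], seq[i + 2]
        if c3[x + y + z] == 0:
            return -math.inf  # unseen codon (e.g. an in-frame stop)
        total += math.log(c1[x] / n)                  # state 1: p(x)
        total += math.log(c2[x + y] / c1[x])          # state 2: p(y | x)
        total += math.log(c3[x + y + z] / c2[x + y])  # state 3: p(z | xy)
    return total


model = train(["ATGCAACAGTGG"])  # toy training set with four codons
print(log_prob("CAA", model))    # log(1/4), since CAA is one of four codons
print(log_prob("TAA", model))    # -inf, stop codons never occur in coding regions
```

The state used depends on the position within the codon, which is what makes the chain inhomogeneous.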
Chemical and Synthetic Biology Approaches To Understand Cellular Functions - Part A
M. Escarlet Díaz Galicia, ... Raik Grünberg, in Methods in Enzymology, 2019
5.1 Expression vectors
We codon-optimized PTK KD sequences, considering both E. coli codon and codon-pair usage, optimal GC content, enrichment of hidden stop codons and avoidance of sequence repetitions (Chin, Chung, & Lee, 2014). The synthesized genes were recombined into a synthetic protein expression vector (pJExpress411, originally from DNA 2.0, CA) which we had previously modified to include a RBS secondary structure insulation cassette from Mutalik et al. (2013). In our hands, the combination of codon optimization with 5′ mRNA secondary structure suppression very often gives a marked increase in protein expression yields. However, it did not clearly improve PTK KD yields over values reported in the literature (i.e., 22 μg/mL of Src by Albanese et al., 2018 or 5–15 mg/L of Src by Seeliger et al., 2005). In all constructs, the catalytic KD was fused to yellow fluorescent mCitrine (Griesbeck, Baird, Campbell, Zacharias, & Tsien, 2001) followed by a modified TwinStrep purification tag (Schmidt et al., 2013) (Fig. 3). The YopH catalytic domain (residues 164–468) was expressed from a low copy vector with Spectinomycin resistance (Addgene #79749) (Albanese et al., 2018), whereas the human PTP1B (fused to GST) was expressed from a high copy plasmid with ampicillin resistance (Addgene #8602).

Genetic Code
Genetic code refers to the assignment of the codons to the amino acids, thus being the cornerstone template underlying the translation process.
From: Reference Module in Life Sciences, 2017
Related terms:
Nucleotide
Transfer RNA
Nucleotides
Epigenetics
Phosphoprotein
Nested Gene
Mutation
Codon
The Universal Genetic Code and Non-Canonical Variants☆
A.S. Rodin, S. Branciamore, in Reference Module in Life Sciences, 2017
Abstract
Genetic code refers to the assignment of the codons to the amino acids, thus being the cornerstone template underlying the translation process. Genetic code is largely invariant throughout all extant organisms; hence, it is often referred to as the "universal" or "canonical" genetic code. However, a number of extant deviations exist, in both nuclear and organelle (notably, mitochondrial) genomes. These are known as "deviant" or "non-canonical" codes. The emergence of the non-canonical codes raises a number of intriguing questions in regard to the origins and evolution of the universal genetic code and, importantly, has practical implications as certain human mitochondrial diseases have been shown to be linked to the mitochondrial code deviations and translational errors. On a fundamental level, universality (and presumed optimality) of the genetic code is a principal notion underlying its origins, evolution and functionality.
Gene Expression: Translation of the Genetic Code
Chang-Hui Shen, in Diagnostic Molecular Biology, 2019
Ribonucleotide Bases Are Used as Letters in the Genetic Code
The genetic code is written in linear form, using the ribonucleotides that compose mRNA molecules as letters. The ribonucleotide sequence is derived from the complementary nucleotide bases in the DNA template strand. Therefore, the mRNA sequence is the same as that of the DNA coding strand, except that uracil (U) takes the place of thymine (T). Each code word consists of three ribonucleotide letters, so the code is referred to as a triplet code: a sequence of three bases is needed to specify one amino acid. The genetic code translates the RNA sequences into the amino acid sequence (Fig. 4.17). Each group of three ribonucleotides, called a codon, specifies one amino acid. The code is unambiguous, as each triplet specifies only a single amino acid. A codon has to be at least three bases long, because one base gives only 4 possibilities and two bases give only 4^2 = 16, fewer than the 20 amino acids. With three bases, there are 4^3 = 64 codons, which is more than enough to encode the 20 amino acids. Therefore, the genetic code is degenerate, which means more than one triplet can encode the same amino acid. Each amino acid can have more than one codon, but no codon can encode more than one amino acid. Furthermore, the genetic code is universal, as the code can be used by all viruses, prokaryotes, archaea, and eukaryotes.
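The counting argument for a triplet code can be checked directly. The sketch below enumerates all codons over the four RNA bases and finds the shortest word length that yields at least 20 distinct words; `min_codon_length` is a name made up for this illustration.

```python
from itertools import product

# All possible three-letter words over the four RNA bases.
codons = ["".join(p) for p in product("ACGU", repeat=3)]
print(len(codons))  # 4**3 -> 64


def min_codon_length(n_amino_acids: int = 20, n_bases: int = 4) -> int:
    """Smallest word length whose number of possible words covers all amino acids."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


# One base gives 4 words and two bases give 16, so three bases are needed.
print(min_codon_length())  # -> 3
```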

Fig. 4.17. Flowchart demonstrating the central dogma of biology in which DNA is transcribed to mRNA, which is then translated into an amino acid sequence of a protein.
Translation
A. Liljas, in Encyclopedia of Genetics, 2001
Genetic Code
The genetic code is the universal dictionary by which genetic information is translated into the functional machinery of living organisms, the proteins. The words or 'codons' of the genetic message are three nucleotides long. Since there are four different nucleotides used in messenger RNA (mRNA), this results in a dictionary of 64 words. There are 20 amino acids that are normally used in proteins and which are translated. In addition the translation needs a definition of 'start' and 'stop.' The start codon defines the start of translation as well as the reading frame (the sequence of nucleotide triplets) that is to be translated. The start or initiator codon is identical to the methionine codon. Special mechanisms are used to identify the correct initiation site; in addition there are three stop codons. Thus 61 codons are available for 20 amino acids, and hence the genetic code is degenerate. In the case of leucine, serine, and arginine, there are as many as six codons, whereas methionine and tryptophan have only one codon.
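The degeneracy figures quoted above can be recomputed from the standard code table. The following is a small sketch; the table is built from the usual one-letter amino acid string in U, C, A, G order, and `degeneracy` is an illustrative helper name.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def degeneracy(table):
    """Map each amino acid (or '*') to the number of codons that encode it."""
    groups = defaultdict(list)
    for codon, amino_acid in table.items():
        groups[amino_acid].append(codon)
    return {amino_acid: len(cs) for amino_acid, cs in groups.items()}


DEGENERACY = degeneracy(CODON_TABLE)
sense_codons = sum(n for aa, n in DEGENERACY.items() if aa != "*")

print(sense_codons)                                      # -> 61
print(DEGENERACY["L"], DEGENERACY["S"], DEGENERACY["R"])  # -> 6 6 6
print(DEGENERACY["M"], DEGENERACY["W"])                  # -> 1 1
```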
The universal genetic code deviates slightly in mitochondria, where a few codons are translated in alternative ways. The most prevalent are methionine and tryptophan, which have two codons instead of the usual one. Different organisms use the degenerate genetic code differently. The usage of the codons is coupled to the availability of tRNAs that can translate them. Thus the codon usage can differ to the extent that a gene that is transferred from one organism to another cannot be translated unless the new organism is supplemented with extra tRNAs.
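A variant code can be represented as a small set of overrides on top of the standard table. The sketch below uses the vertebrate mitochondrial reassignments as they are commonly tabulated (UGA read as tryptophan, AUA as methionine, AGA and AGG as stops); treat that particular list as an assumption brought in for illustration, since the text above does not enumerate the changes. `codons_for` is a hypothetical helper.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumed overrides for the vertebrate mitochondrial code (not listed in the text).
VERT_MITO_CHANGES = {"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}
MITO = {**STANDARD, **VERT_MITO_CHANGES}


def codons_for(table, amino_acid):
    """Sorted list of codons that a given table assigns to one amino acid."""
    return sorted(codon for codon, aa in table.items() if aa == amino_acid)


print(codons_for(STANDARD, "M"), codons_for(MITO, "M"))  # ['AUG'] ['AUA', 'AUG']
print(codons_for(STANDARD, "W"), codons_for(MITO, "W"))  # ['UGG'] ['UGA', 'UGG']
```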
Peptide, Protein and Enzyme Design
C. Hu, J. Wang, in Methods in Enzymology, 2016
4 Synthetic Chemistry-Guided Unnatural Amino Acid Design
Genetic code expansion enables the usage of unnatural chemical groups, which are widely used in organic chemistry but are rare in organisms. At least three advantages can be achieved by doing that. First, the protein scaffold provides a secondary coordination sphere for the organic catalyst, which may enhance its performance, including turnover numbers and enantioselectivity (Durrenberger & Ward, 2014). Second, the structure containing the unnatural organic amino acid is genetically encoded. As a consequence, its self-assembly can be easily amplified or improved by directed evolution. Finally, adding unnatural organic molecules enables researchers to solve biological chemistry problems by organic chemical methods, which may be helpful in green chemistry and synthetic biology.
Some unnatural amino acids were inspired by organic chemistry studies. Thus, some unnatural organic molecules and powerful and highly developed synthetic chemistry methods can be introduced to molecular biology (Mann, 1989). In order to be properly incorporated, those organic molecules must first be converted into an unnatural amino acid. Based on their structural features, they can be converted into either a "tyrosine" or a "lysine". For instance, if the molecule contained an aromatic ring, it would be suitable for mimicking tyrosine. The tyrosine-type unnatural amino acid often contains an aromatic ring that bears the unnatural chemical groups, and a covalently linked aliphatic amino acid part (usually alanine) as the amino acid backbone. On the other hand, a lysine host is more suitable for flexible aliphatic chemical groups. The lysine mimic is usually composed of the unnatural aliphatic chemical groups and a lysine molecule. Usually they are covalently connected by an amide bond or carbamate.
Amino Acids, Peptides, Porphyrins, and Alkaloids
Dolph L. Hatfield, ... Byeong jae Lee, in Comprehensive Natural Products Chemistry, 1999
4.14.4 Universality of UGA as a Codon for Sec
The genetic code was previously thought to be used in the same way by all organisms and therefore was considered to be universal. However, it is now known that many changes have occurred in the genetic code during evolution65,66 and thus, it has been described as the "almost universal genetic code".73 In the present discussion of the assignment of Sec to the universal genetic code, the code will be referred to as the almost universal genetic code.73
Sec tRNAs that decode UGA in protein synthesis are widespread in nature.5,6,8 Initially, a Sec-tRNA that decodes UGA in protein synthesis was identified intracellularly in E. coli21 and in mammals.23 These studies clearly established the existence of selenocysteyl-tRNA[Ser]Sec and provided strong evidence that the Sec moiety in selenoproteins must arise by direct incorporation of Sec and not by posttranslational modification. The gene encoding Sec tRNA[Ser]Sec was subsequently found to be ubiquitous in the subkingdom Eubacteria74 and tRNA[Ser]Sec or its gene was found to be ubiquitous in the animal kingdom.75 Sec-tRNAs that decode UGA were also found in two very diverse protists, Tetrahymena borealis and Thalassiosira pseudonana,76 in a higher plant, Beta vulgaris, and in a filamentous fungus, Gliocladium virens.77 Several potential Sec-containing protein genes (i.e., genes that contained TGA in an open reading frame) and a Sec tRNA gene were found in the genome sequence of the archaeon, Methanococcus jannaschii.78 Each of these studies shows that UGA as a codon for Sec occurs in representative organisms from all five life kingdoms, Monera (with its two subkingdoms, Eubacteria and Archaebacteria), Protists, Plants, Animals, and Fungi (see Figure 3 for the delineation of organisms into five life kingdoms).79 Therefore, Sec should be assigned to UGA in the almost universal genetic code5 as shown in Figure 4.

Figure 3. Evolutionary tree showing the distribution of Sec in nature.

Figure 4. The almost universal genetic code showing the inclusion of Sec as the 21st amino acid.
In addition to UGA, AUG also has a dual function in the almost universal genetic code (see Figure 4).67,69 AUG serves both as a codon that initiates protein synthesis and a codon for methionine at internal positions of protein. The dual role of AUG has been known since the code was first deciphered67,68 and thus it is not surprising that a second codon, UGA, also has a dual function. Furthermore, the fact that two codons have now been identified in the almost universal genetic code with multiple functions raises the possibility that other codewords may also exist with multiple roles.
The genome of Saccharomyces cerevisiae has been sequenced.80 This organism does not appear to encode a Sec tRNA[Ser]Sec gene or any potential selenoprotein genes.8 A S. cerevisiae homologue of the glutathione peroxidase gene in mammals was found to contain a cysteine codon (TGT) at the position where the mammalian gene encodes Sec (codon TGA). Thus, this organism appears to lack the biosynthetic pathways for specific site incorporation of Sec into protein found in other life forms.8 This observation reflects genetic diversity and should not affect our proposal that Sec belongs in the almost universal genetic code. Other yeast forms, such as Candida albicans, are known to encode variations in the almost universal genetic code in their genomes.65,66,81
The fact that S. cerevisiae does not appear to have the system for incorporating Sec into specific sites of protein demonstrates that this means of utilizing selenium is not essential to life. Furthermore, E. coli mutants lacking the ability to incorporate Sec into specific sites of protein can grow normally under certain conditions.82 Do these findings imply that the incorporation of Sec into specific sites of protein is not important in nature? In mammalian systems, the synthesis of specific selenoproteins is essential to sustain life as removal of the Sec tRNA[Ser]Sec gene from the mouse genome by gene replacement or "gene knockout" is embryonically lethal.83 In E. coli, the selenoprotein, formate dehydrogenase, is required to detoxify formate under aerobic growth conditions.82 The ability to incorporate Sec into specific selenoproteins is, therefore, not essential to sustain life in E. coli, but provides these organisms with a selective advantage. Thus, the ability to synthesize specific selenoproteins is essential to some life forms, while it appears to provide only a selective advantage to others. In addition, since this process is widespread in nature, it must then be important as a requirement and/or as a selective advantage to virtually all organisms.
Codon Usage and Translational Selection
R. Hershberg, in Encyclopedia of Evolutionary Biology, 2016
Abstract
The genetic code is redundant, meaning that most amino acids are encoded by more than one codon. Codons encoding the same amino acid are referred to as synonymous codons. Different synonymous codons are not used equally within the protein-coding sequences of a genome. Rather, a phenomenon of codon bias, by which certain synonymous codons are consistently over represented relative to others, is ubiquitous across living organisms. In this article we discuss the neutral and selective causes of codon bias, focusing on translation optimization considerations as a major source of selection on codon usage.
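One common way to quantify codon bias is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The abstract does not name a specific measure, so RSCU is brought in here as an illustrative choice; the function name `rscu` and the toy sequence are assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code on DNA codons, in T, C, A, G order; "*" marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Relative synonymous codon usage for an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))

    synonyms = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        synonyms[amino_acid].append(codon)

    values = {}
    for amino_acid, group in synonyms.items():
        total = sum(counts[c] for c in group)
        if amino_acid == "*" or total == 0:
            continue  # skip stops and amino acids absent from the sequence
        for codon in group:
            # Observed count over the mean count across the synonymous group.
            values[codon] = counts[codon] * len(group) / total
    return values


usage = rscu("CTGCTGCTGTTA")       # four leucine codons: CTG x3, TTA x1
print(usage["CTG"], usage["TTA"])  # -> 4.5 1.5
```

A value above 1 means the codon is used more often than expected under equal usage, and a value below 1 means it is used less often.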
CLASSIFICATION OF BIOLOGICAL STRUCTURES
TOM BRODY, in Nutritional Biochemistry (Second Edition), 1999
Genetic Code
The genetic code is shown in Table 1.6. A total of 64 different combinations of the four DNA bases can occur, and 61 of these possible combinations are actually used to specify amino acids. Many of the amino acids are designated by more than one type of codon. This redundant situation is called degeneracy. The genetic code is thus degenerate. ATG codes for methionine. Methionine occurs at various positions in most proteins, and occurs as the first amino acid in essentially all proteins. For this reason, the codon ATG occurs at the beginning of coding regions of nearly all genes. ATG is called the start codon. At the very end of every coding region there is one stop codon. There exist three different stop codons, and these are TAA, TAG, and TGA. In mRNA, where the start and stop codons actually perform their function, the corresponding codons are AUG (start codon), UAA, UAG, and UGA (stop codons). With rare exceptions, stop codons never code for an amino acid. The sequence of codons that begins with ATG and ends with a stop codon is often called an open reading frame (ORF). The genetic code is the same for eukarya and bacteria, but differs somewhat for archaea.
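The ORF definition above (ATG, then in-frame codons, then the first stop codon) translates into a short scan. This is a minimal sketch that looks only at the three forward reading frames of a DNA string and ignores the reverse strand; `find_orfs` is an illustrative name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str) -> list:
    """Return ORFs (ATG ... stop, inclusive) found in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG that opened the current ORF, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                orfs.append(dna[start:i + 3])
                start = None  # look for the next ORF in this frame
    return orfs


# Only frame 2 (ATG AAA TGG TAA) contains a start followed by an in-frame stop.
print(find_orfs("CCATGAAATGGTAACC"))  # -> ['ATGAAATGGTAA']
```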
TABLE 1.6. The Genetic Code
First base   Second base in codon
in codon     T           C           A           G
T            TTT Phe     TCT Ser     TAT Tyr     TGT Cys
             TTC Phe     TCC Ser     TAC Tyr     TGC Cys
             TTA Leu     TCA Ser     TAA Stop    TGA Stop
             TTG Leu     TCG Ser     TAG Stop    TGG Trp
C            CTT Leu     CCT Pro     CAT His     CGT Arg
             CTC Leu     CCC Pro     CAC His     CGC Arg
             CTA Leu     CCA Pro     CAA Gln     CGA Arg
             CTG Leu     CCG Pro     CAG Gln     CGG Arg
A            ATT Ile     ACT Thr     AAT Asn     AGT Ser
             ATC Ile     ACC Thr     AAC Asn     AGC Ser
             ATA Ile     ACA Thr     AAA Lys     AGA Arg
             ATG Met     ACG Thr     AAG Lys     AGG Arg
G            GTT Val     GCT Ala     GAT Asp     GGT Gly
             GTC Val     GCC Ala     GAC Asp     GGC Gly
             GTA Val     GCA Ala     GAA Glu     GGA Gly
             GTG Val     GCG Ala     GAG Glu     GGG Gly
The genetic code indicates the amino acids that are coded for by the codons appearing in DNA and mRNA. To acquire a genetic code for the codons in mRNA, change each thymine (T) to uracil (U). In actual practice, scientists usually refer to a table that contains the DNA codons, in analyzing genetic information, and rarely use a table that contains the codons appearing in mRNA.
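The T-to-U conversion in the table note is a simple string substitution. The sketch below rebuilds the DNA-codon table (using one-letter amino acid codes rather than the three-letter codes in Table 1.6) and derives the mRNA version from it; the names `DNA_TABLE`, `MRNA_TABLE` and `dna_codon_to_mrna` are chosen here for illustration.

```python
from itertools import product

# DNA codon table in T, C, A, G order; "*" marks the stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
DNA_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def dna_codon_to_mrna(codon: str) -> str:
    """Rewrite a DNA codon as the corresponding mRNA codon (T becomes U)."""
    return codon.upper().replace("T", "U")


MRNA_TABLE = {dna_codon_to_mrna(codon): aa for codon, aa in DNA_TABLE.items()}

print(DNA_TABLE["ATG"], MRNA_TABLE["AUG"])  # -> M M
print(sorted(c for c, aa in MRNA_TABLE.items() if aa == "*"))  # ['UAA', 'UAG', 'UGA']
```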
The leap from information in mRNA to the sequence of amino acids in a polypeptide chain is bridged by transfer RNA. Transfer RNA molecules are relatively small compared to mRNA and proteins, and typically consist of only about 75 to 90 ribonucleotides. There exist about 40 distinct types of tRNA, and these share the task of aligning the 20 amino acids according to the sequence of ribonucleotide bases occurring in any molecule of mRNA. Since there exist more types of tRNA molecules (about 40) than amino acids (20), one can see that the collection of tRNA molecules is also redundant or degenerate.
Translation
A. Liljas, in Brenner's Encyclopedia of Genetics (Second Edition), 2013
The Genetic Code
The genetic code is the universal dictionary by which the genetic information is translated into the functional machinery of living organisms, the proteins. The words or the codons of the genetic message are three nucleotides long. Since there are four different nucleotides (A, C, G, U) used in the messenger RNA (mRNA), this leads to a dictionary of 64 words. Translation needs a definition of start and stop. The start codon at the same time defines the reading frame of the sequence of nucleotide triplets that are to be translated. The start or initiator codon is identical to the methionine codon. Special mechanisms are used to identify the correct initiation site. In addition, there are three stop codons. Thus 61 codons are available for 20 amino acids that are normally translated and used in proteins. Thus the genetic code is degenerate. In the case of leucine, serine, and arginine, there are as many as six codons, whereas methionine and tryptophan have only one codon each.
The codon usage is coupled to the availability of tRNAs that can translate them. The codon usage can differ to the extent that if a gene is transferred from one organism to another, it may not be possible to translate unless the gene is altered to match the codon usage of the recipient organism.
Messenger RNA
A. Liljas, in Brenner's Encyclopedia of Genetics (Second Edition), 2013
The Genetic Code
The genetic code is the universal dictionary by which the genetic information is translated into the functional machinery of living organisms, the proteins. The words or the codons of the genetic message are three nucleotides long. Since there are four different nucleotides used in the messenger RNA (mRNA; A, C, G, and U), this leads to a dictionary of 64 code words. A total of 20 amino acids is normally used in proteins, but translation also needs a definition of a start and a stop of the message. The start codon, AUG, defines the reading frame of the sequence of nucleotide triplets that will be translated. However, the start or initiator codon is identical to the methionine codon. Special mechanisms are used to identify the correct initiation site. With the three stop codons – UAA, UAG, and UGA – there are 61 codons available for the 20 amino acids. Therefore, the genetic code is degenerate. There are as many as six codons corresponding to leucine, serine, and arginine, whereas methionine and tryptophan have only one codon each.
The codon usage is coupled to the availability of tRNAs that can translate them. The codon usage can differ to the extent that if a gene is transferred from one organism to another, it may not be possible to translate unless the gene is altered to match the codon usage of the recipient organism.