
Chapter 131 - Genome-C


Genome-centric view of carbon processing in thawing permafrost

Ben J Woodcroft et al. Nature. 2018 Aug.


Abstract

As global temperatures rise, large amounts of carbon sequestered in permafrost are becoming available for microbial degradation. Accurate prediction of carbon gas emissions from thawing permafrost is limited by our understanding of these microbial communities. Here we use metagenomic sequencing of 214 samples from a permafrost thaw gradient to recover 1,529 metagenome-assembled genomes, including many from phyla with poor genomic representation. These genomes reflect the diversity of this complex ecosystem, with genus-level representatives for more than sixty per cent of the community. Meta-omic analysis revealed key populations involved in the degradation of organic matter, including bacteria whose genomes encode a previously undescribed fungal pathway for xylose degradation. Microbial and geochemical data highlight lineages that correlate with the production of greenhouse gases and indicate novel syntrophic relationships. Our findings link changing biogeochemistry to specific microbial lineages involved in carbon processing, and provide key information for predicting the effects of climate change on permafrost systems.

Comment in

Permafrost thawing and carbon metabolism.

Du Toit A. Nat Rev Microbiol. 2018. PMID: 30042479 No abstract available.

Similar articles

Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw.

Mackelprang R, et al. Nature. 2011. PMID: 22056985

Methane dynamics regulated by microbial community response to permafrost thaw.

McCalley CK, et al. Nature. 2014. PMID: 25341787

Methanotrophy across a natural permafrost thaw environment.

Singleton CM, et al. ISME J. 2018. PMID: 29955139 Free PMC article.

The subzero microbiome: microbial activity in frozen and thawing soils.

Nikrad MP, et al. FEMS Microbiol Ecol. 2016. PMID: 27106051 Review.

Dynamics of microbial communities and CO2 and CH4 fluxes in the tundra ecosystems of the changing Arctic.

Kwon MJ, et al. J Microbiol. 2019. PMID: 30656588 Review.


Cited by 41 articles

Iron mineral dissolution releases iron and associated organic carbon during permafrost thaw.

Patzner MS, et al. Nat Commun. 2020. PMID: 33303752

Thermogenic hydrocarbon biodegradation by diverse depth-stratified microbial populations at a Scotian Basin cold seep.

Dong X, et al. Nat Commun. 2020. PMID: 33203858 Free PMC article.

Metagenomic and Metatranscriptomic Analyses Revealed Uncultured Bacteroidales Populations as the Dominant Proteolytic Amino Acid Degraders in Anaerobic Digesters.

Mei R, et al. Front Microbiol. 2020. PMID: 33193263 Free PMC article.

Effects of set cathode potentials on microbial electrosynthesis system performance and biocathode methanogen function at a metatranscriptional level.

Ragab A, et al. Sci Rep. 2020. PMID: 33188217 Free PMC article.

High-quality bacterial genomes of a partial-nitritation/anammox system by an iterative hybrid assembly method.

Liu L, et al. Microbiome. 2020. PMID: 33158461 Free PMC article.


Publication types

Research Support, Non-U.S. Gov't

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Bacteria / genetics

Bacteria / isolation & purification

Bacteria / metabolism

Carbon / metabolism*

Fermentation

Freezing*

Fungi / genetics

Fungi / isolation & purification

Fungi / metabolism

Global Warming

Metagenome / genetics*

Methane / metabolism

Permafrost / chemistry*

Permafrost / microbiology*

Polysaccharides / metabolism

Soil Microbiology*

Sweden

Xylose / metabolism

Substances

Polysaccharides

Carbon

Xylose

Methane


What Is Genome Size?

Genome size refers to the amount of DNA contained in a haploid genome, expressed either as the number of base pairs, kilobases (1 kb = 1000 bp), or megabases (1 Mb = 1 000 000 bp), or as the mass of DNA in picograms (1 pg = 10⁻¹² g). Genome sizes of bacteriophages and viruses range from about 2 kb to over 1 Mb. Prokaryotic genomes range from about 500 kb to about 12 Mb. Eukaryotic genomes are diverse in size, ranging from ∼10 Mb in some fungi to >100 000 Mb in certain plants, salamanders, and lungfishes. Because they can be so large, eukaryotic genome sizes are usually expressed as the 'C-value' ('C' stands for 'constant', reflecting the fact that genome size is constant from cell to cell within a given organism or species). The C-value is the mass of DNA in picograms (1 pg ≈ 1 billion base pairs, or 1000 Mb, of DNA) in a haploid set of chromosomes (often measured from gametes).

There is no general agreement on how to express genome size in cases of polyploidy, where an individual, population, or species has more than two complete sets of chromosomes. In such cases, the C-value still refers to the mass of DNA in a single (haploid) set of chromosomes, which in polyploids may be only one-third, one-fourth, or less of the total mass of DNA in the cell. However, since the mass of DNA in the nucleus has important phenotypic correlates (see below), it is important not to lose this information. One possible solution is to decouple genome size and C-value. This was done many years ago by the botanist Michael Bennett, who also suggested the term 'nucleotype' for the nongenic effects of DNA content on the phenotype, independent of the informational content of the DNA (see below). Here, the C-value (or 1C) refers to the haploid genome size, and the 2C-value (or 2C) refers to the diploid DNA content, i.e., twice the C-value. Several online databases now provide convenient listings of genome sizes for a large number of organisms.
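The unit relationships above can be captured in a few helper functions. This is a minimal sketch that uses the text's rounded conversion of 1 pg ≈ 1000 Mb (more precise published factors are slightly lower); the function names are hypothetical and chosen only for illustration.

```python
# Rounded conversion quoted in the text: 1 pg of DNA corresponds to ~1000 Mb.
# More precise published factors are slightly lower; treat results as approximate.
MB_PER_PG = 1000.0


def pg_to_mb(pg: float) -> float:
    """Convert a DNA mass in picograms to an approximate length in megabases."""
    return pg * MB_PER_PG


def mb_to_pg(mb: float) -> float:
    """Convert a genome length in megabases to an approximate mass in picograms."""
    return mb / MB_PER_PG


def one_c_to_two_c(one_c_pg: float) -> float:
    """2C (diploid) DNA content is twice the haploid 1C value."""
    return 2.0 * one_c_pg


if __name__ == "__main__":
    # A leaf beetle with a C-value of 0.6 pg has a haploid genome of roughly 600 Mb.
    print(f"{pg_to_mb(0.6):.0f} Mb")
    # A 12 Mb prokaryotic genome corresponds to about 0.012 pg of DNA.
    print(f"{mb_to_pg(12):.3f} pg")
```

Keeping the factor in one named constant makes it easy to substitute a more precise value if needed.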

Invertebrates

Genome size has been reported to correlate negatively with overall developmental rate in a small sample of beetles in the genus Tribolium (Carreras et al., 1991), and with pupal development in ladybird beetles (Gregory et al., 2003). More generally in leaf beetles, Petitpierre and Juan (1994) noted that species with one generation per year possessed C-values greater than 0.6 pg, whereas those with multiple generations had genome sizes smaller than 0.5 pg. It similarly has been suggested that the rapid life cycles of aphids are linked to their small genome sizes (Ma et al., 1992; Gregory, 2002c). An inverse correlation between genome size and developmental rate has also been found in copepod crustaceans (McLaren et al., 1988; White and McLaren, 2000), which may be inescapable due to their programs of determinate growth. Polychaete annelids inhabiting harsh interstitial environments display smaller C-values than macrobenthic species, which is believed to relate to selection for rapid development and small body size (Soldi et al., 1994; Gambi et al., 1997).

Genome Size Under Mutational Pressure

Genome size depends on differences in the rates at which deletions and insertions occur and on the efficiency of natural selection in promoting or eliminating such changes (Lynch, 2007). Small changes in genome size are probably of negligible significance in terms of energetic costs of replication, particularly in multicellular eukaryotes, in which genomes are large and metabolic costs of locomotion and development are many orders of magnitude higher than those of genome duplication. Similarly, while large genomes have higher mutational liability, even in noncoding areas, the disadvantage of a small further increase is likely to be insignificant. Thus, a large fraction of noncoding regions are likely to evolve neutrally with respect to insertions
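The idea that genome size tracks the balance between insertion and deletion rates can be illustrated with a deliberately simplified, deterministic projection. All rates and lengths here are hypothetical values chosen for illustration, and selection, whose efficiency Lynch (2007) emphasizes, is ignored entirely; this is a toy, not a population-genetic model.

```python
def expected_net_change(ins_rate: float, mean_ins_len: float,
                        del_rate: float, mean_del_len: float) -> float:
    """Expected change in genome size (bp) per generation.

    ins_rate / del_rate: insertion / deletion events per genome per generation.
    mean_ins_len / mean_del_len: average bp gained / lost per event.
    Selection is deliberately ignored: every event is treated as neutral.
    """
    return ins_rate * mean_ins_len - del_rate * mean_del_len


def project_genome_size(initial_bp: float, generations: int,
                        ins_rate: float, mean_ins_len: float,
                        del_rate: float, mean_del_len: float) -> float:
    """Deterministic projection assuming constant rates (no drift, no selection)."""
    change = expected_net_change(ins_rate, mean_ins_len, del_rate, mean_del_len)
    return initial_bp + generations * change


if __name__ == "__main__":
    # Hypothetical rates: insertions add 50 bp/generation, deletions remove 10 bp.
    print(project_genome_size(1_000_000, 1_000, 0.5, 100, 0.25, 40))
```

A positive net change means insertion bias lets the genome grow; swapping the rates gives a deletion bias and a shrinking genome.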

Genome Sequences

A genome sequence underpins systems biology studies; it is now a prerequisite for metabolic engineering and can be obtained rapidly thanks to recent advances in next-generation DNA sequencing technologies. The ATCC 27405 type strain was the first C. thermocellum strain to have its genome sequenced; the U.S. Joint Genome Institute determined the sequence using the classical Sanger method (GenBank accession number CP000568). Professor J.H. David Wu (University of Rochester) and Dr. Michael E. Himmel (National Renewable Energy Laboratory) submitted the proposal to generate the ATCC 27405 genome sequence. Professor Wu's laboratory supplied DNA for the project, and the first draft sequence was made public in November 2003; however, repetitive sequences made the genome difficult to close, and the sequence was not finished until February 2007. The Glimmer [49] and Critica [50] gene prediction algorithms were originally used in combination to predict gene models, followed by a round of manual curation. More recently, an improved gene prediction algorithm was applied to the ATCC 27405 genome and its annotation was updated (GenBank accession number CP000568.1) [51]. A comprehensive comparison of the different annotation versions can be found at http://genome.ornl.gov/microbial/cthe/. As algorithms continue to improve and novel features such as small regulatory RNAs are discovered, further refinements to genome annotations are likely.
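Gene predictors such as Glimmer rely on statistical models of coding sequence (Glimmer uses interpolated Markov models). A much simpler baseline, shown here only as a hedged sketch of what "predicting gene models" starts from, scans the three forward reading frames for ATG-initiated open reading frames that end at a stop codon. This is not how Glimmer or Critica work; the function name, the minimum-length cutoff, and the forward-strand-only scan are assumptions for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_forward_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG-to-stop ORFs on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    Only the first ATG in each open stretch starts an ORF (nested ATGs are not
    reported), and ORFs running off the end of the sequence are dropped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length_in_codons = (i + 3 - start) // 3
                if length_in_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    # Tiny toy sequence; real genomes also need the reverse complement scanned.
    print(find_forward_orfs("CCATGAAATTTTAGCC", min_codons=2))
```

Real annotation pipelines also score candidate ORFs, resolve overlaps, and handle alternative start codons, which is where tools like those cited above come in.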

Since the first C. thermocellum genome sequence was generated, there has been a revolution in DNA sequencing technologies [52]. Twenty genome sequences for Clostridia species across multiple genera were recently determined [53], two of which were for C. thermocellum strains JW20 (DSM 4150) and LQRI (DSM 2360). Finished and draft genomes have been described for C. thermocellum strains DSM 1313 [54], YS and its derivative strain AD2 [55], and BC1 [23]. The genome sequence of strain ATCC 27405 has been used to design oligonucleotides for strain DSM 1313, indicating that the two strains are closely related [56], which was confirmed by subsequent in silico genome comparisons [4]. C. thermocellum DSM 1313 is the background strain for one recently developed genetic system (see below). Key genome features of the wild-type strains with available genome sequences are summarized in Table 1. Although there are strain-level differences in the gene content encoding transposases and restriction systems [4], many of the differences in genome size and number of predicted genes likely reflect differences in sequencing technologies, assembly methods, and gene prediction algorithms. Long-read technologies continue to develop, and we expect such approaches to be useful for improving genome assemblies [57], which will facilitate comparative genomic studies. Future comparative genomic studies may permit more refined bioinformatic predictions of genes, operons, and cis-regulatory motifs, and provide insights into phenotypic differences reported for strain BC1 or others, such as hypercellulase production [58].

Table 1. Summary statistics for wild-type C. thermocellum genome sequences

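| Strain | Status | Genome size (bp) | G + C (%) | Total genes | Protein-coding genes | rRNA operons | Ref.ᵃ |
|---|---|---|---|---|---|---|---|
| ATCC 27405 | Finished | 3,843,301 | 39 | 3,335 | 3,236 | 4 | 51 |
| DSM 1313 | Finished | 3,561,619 | 39 | 3,102 | 3,031 | 4 | 29 |
| YS | Draft | 3,843,301 | 39 | 3,081 | 3,026 | 1 | 55 |
| JW20 (DSM 4150) | Draft | 3,321,980 | 39 | 3,027 | 2,979 | 3ᵇ | 53 |
| LQRI (DSM 2360) | Draft | 3,454,608 | 39 | 3,147 | 3,091 | 1 | 53 |
| BC1 | Draft | 3,454,918 | 39 | 3,159 | 3,095 | 4 | 23 |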

ᵃ With the exception of strain BC1, data were obtained from the Integrated Microbial Genomes database on September 7, 2013.
ᵇ Three 16S rDNA genes were identified for strain JW20, but only single copies of the 5S and 23S genes were noted.
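The % G + C column in Table 1 summarizes base composition. As a hedged illustration of how such a value can be computed from an assembly, the sketch below counts G and C over unambiguous bases. Excluding ambiguous bases such as N from the denominator is an assumption; the table values come from the Integrated Microbial Genomes database, whose exact procedure may differ.

```python
def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    called = sum(counts.values())
    if called == 0:
        raise ValueError("sequence contains no unambiguous A/C/G/T bases")
    return 100.0 * (counts["G"] + counts["C"]) / called


if __name__ == "__main__":
    print(f"{gc_content('ATGCGCNNAT'):.1f}% G + C")
```

For a multi-contig draft assembly, the contigs would simply be concatenated (or their counts summed) before applying the same formula.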

These C. thermocellum genome sequences have been leveraged to produce a genome-scale metabolic model (the Roberts model) [59]. The model comprised 577 reactions, 525 intracellular metabolites, and 432 genes. In addition to providing a tool for predicting modifications that could improve fuel production, it highlighted gaps in metabolic pathways. Because these missing reactions are part of essential metabolic pathways, they represent either incorrectly annotated genes or situations in which C. thermocellum uses an unusual pathway. Future studies will be needed to resolve this question. The Roberts model has since been updated using RNA-seq data, further improving this tool [60].
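One simple way a metabolic reconstruction can reveal pathway gaps is dead-end metabolite detection: an internal metabolite that the network can consume but never produce (or vice versa) points to a missing or misannotated reaction. This is a hedged sketch of that check, not necessarily how gaps were identified in the published model [59]; the reaction IDs, metabolites, and handling of boundary (exchanged) metabolites are hypothetical.

```python
def find_dead_end_metabolites(
    reactions: dict[str, dict[str, float]],
    reversible: set[str],
    boundary: set[str],
) -> set[str]:
    """Return internal metabolites that are only produced or only consumed.

    reactions maps a reaction ID to {metabolite: stoichiometric coefficient};
    negative coefficients are substrates, positive ones are products.
    Reactions listed in `reversible` can run in either direction.
    Metabolites in `boundary` are exchanged with the medium and are skipped.
    """
    produced: set[str] = set()
    consumed: set[str] = set()
    for rxn_id, stoich in reactions.items():
        for met, coef in stoich.items():
            if rxn_id in reversible:
                # A reversible reaction can both make and use each participant.
                produced.add(met)
                consumed.add(met)
            elif coef > 0:
                produced.add(met)
            elif coef < 0:
                consumed.add(met)
    internal = (produced | consumed) - boundary
    return {met for met in internal if met not in produced or met not in consumed}


if __name__ == "__main__":
    # Hypothetical toy network: x5p is consumed but no reaction produces it.
    toy = {
        "UPTAKE": {"glc_e": -1, "g6p": 1},
        "PGI": {"g6p": -1, "f6p": 1},
        "LOWER": {"f6p": -1, "pyr": 2},
        "ORPHAN": {"x5p": -1, "f6p": 1},
    }
    print(find_dead_end_metabolites(toy, reversible={"PGI"}, boundary={"glc_e", "pyr"}))
```

Each flagged metabolite is a candidate for the kind of question raised above: either a reaction is missing from the annotation, or the organism uses an unusual route.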


Metabolic Engineering – Applications, Methods, and Challenges

Shang-Tian Yang, ... Yali Zhang, in Bioprocessing for Value-Added Products from Renewable Resources, 2007

4.5 Omics and high-throughput tools

The genome sequences of many microorganisms, including E. coli, S. cerevisiae, C. glutamicum, Bacillus subtilis, Lactococcus lactis, and C. acetobutylicum, have been completed, and many more are in progress. These genome sequences are available for functional genomics and metabolic engineering research. Besides genomic sequence information, transcriptomic, proteomic, and metabolomic data can be generated at an ever-increasing rate with high-throughput technologies such as DNA sequencers, microarrays (gene chips), two-dimensional gel electrophoresis combined with tandem mass spectrometry, and isotopic label distributions that probe metabolic phenotypes (see Table 4). Genome libraries can be analyzed to identify the genetic basis of relevant phenotypes [161, 196]. DNA microarrays can be used to efficiently analyze gene disruption or molecularly barcoded mutant libraries to identify genes essential to a particular phenotype [197, 198]. Transcriptional profiling, the analysis of genome-wide gene expression (the transcriptome), can identify genes whose expression levels are altered, either as a result or as a cause of the phenotype of interest [199, 200]. Proteomics enables quantitative profiling of cellular proteins using two-dimensional gel electrophoresis or chromatography coupled with mass spectrometry [201, 202]. High-throughput quantification of metabolites (the metabolome) by NMR, gas chromatography-mass spectrometry (GC-MS), liquid chromatography-mass spectrometry (LC-MS), and MALDI-TOF MS enables comparative analysis of metabolite profiles under genetic and environmental perturbations [203, 204]. Intracellular fluxes can then be obtained by metabolite balancing using computational methods such as metabolic flux analysis (MFA) and flux balance analysis (FBA), and by isotopomer experiments using ¹³C-labeled substrates [171–173]. These fluxomic data provide a better understanding of physiological states and phenotypic behaviors, and of the relationship between genes and their functions, also referred to as phenomics [205].
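Metabolite balancing rests on the steady-state assumption that, for every intracellular metabolite, production and consumption fluxes cancel (S·v = 0). When enough fluxes are measured, for example uptake and secretion rates, the remaining ones follow from these balances. The sketch below handles only the simplest, exactly determined case with a hypothetical branched network (v1: A → B, v2: B → C, v3: B → D, v4: C → E); real MFA usually deals with over- or underdetermined systems, and FBA resolves underdetermination by optimizing an objective, neither of which this code attempts.

```python
from fractions import Fraction


def solve_square(a: list[list[float]], b: list[float]) -> list[Fraction]:
    """Solve a @ x = b by Gauss-Jordan elimination (a must be square and non-singular)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick a row with a non-zero pivot; StopIteration here means a singular system.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def balance_fluxes(stoich: dict[str, dict[str, float]],
                   measured: dict[str, float],
                   unknown: list[str]) -> dict[str, Fraction]:
    """Solve steady-state balances (sum of coefficient * flux = 0) for unknown fluxes.

    stoich maps each intracellular metabolite to {reaction: coefficient}.
    Only the exactly determined case (#metabolites == #unknowns) is handled.
    """
    mets = list(stoich)
    if len(mets) != len(unknown):
        raise ValueError("this sketch only handles exactly determined systems")
    a = [[stoich[met].get(rxn, 0) for rxn in unknown] for met in mets]
    # Move the measured-flux terms to the right-hand side of each balance.
    b = [-sum(stoich[met].get(rxn, 0) * v for rxn, v in measured.items())
         for met in mets]
    return dict(zip(unknown, solve_square(a, b)))


if __name__ == "__main__":
    # Balances: B: v1 - v2 - v3 = 0;  C: v2 - v4 = 0.
    network = {"B": {"v1": 1, "v2": -1, "v3": -1}, "C": {"v2": 1, "v4": -1}}
    fluxes = balance_fluxes(network, measured={"v1": 10, "v3": 3}, unknown=["v2", "v4"])
    print({rxn: float(v) for rxn, v in fluxes.items()})
```

Exact fractions avoid rounding noise in this tiny example; a numerical least-squares solver would be the usual choice for larger, overdetermined systems.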

Table 4. Omic data and high-throughput experimental tools useful to metabolic engineering

| Omics | Data | Tools | References |
|---|---|---|---|
| Genomics | the DNA sequence of the genome | DNA sequencer | |
| Transcriptomics | the abundance of all mRNAs of a genome | cDNA microarray | [196] |
| Proteomics | the presence or absence of all proteins of the genome | 2D electrophoresis, mass spectrometry, protein microarray | [201, 202] |
| Metabolomics | intracellular concentrations of metabolites | GC-MS, LC-MS, NMR | [203, 204] |
| Fluxomics | the steady-state rates at which extracellular metabolites are produced | Flux and isotopomer balance | [171–173] |

In the post-genomic era, the vast amount of genome sequence information can be used to manipulate the metabolism of an organism, resulting in more efficient production strains [206, 207]. Through comparative analysis of wild-type and recombinant strains, genomics can be used to identify gene targets [208], and transcriptomics has been used to optimize fermentation conditions [209] and to understand regulatory mechanisms [199, 210]. Today, functional genomics, which combines transcriptomic, proteomic, and metabolomic data, can provide insights into cellular metabolism that are difficult to obtain with traditional approaches. Based on results from functional genomic studies, new metabolic pathways that are expressed under different conditions or stresses can be identified, and new strategies for rationally engineering metabolic pathways and cellular properties can be developed [160–162].


The isolation and improvement of industrially important microorganisms

Peter F. Stanbury, ... Stephen J. Hall, in Principles of Fermentation Technology (Third Edition), 2017

Application of genomics

Genomics uses the tools of gene sequencing and bioinformatics to study the biology of an organism at the chromosomal level. A whole genome sequence yields a wealth of new information on the functioning of an organism, both at and below the level of the genome itself. Comparing the sequence with gene databases enables prediction of the role of each gene and of the protein it encodes. Thus, the development of information systems, and of the means to interrogate them, has been as crucial to the success of genome investigation as the sequencing science itself. Three major DNA databases have been established: GenBank (USA), EMBL-Bank (Europe), and DDBJ (Japan); they receive information from laboratories and exchange it with each other on a daily basis. The information stored in these repositories is publicly available, enabling laboratories all over the world to benefit from, and contribute to, the development of the subject. Searching DNA and protein databases for matching sequences is made possible by algorithms such as BLAST (Basic Local Alignment Search Tool). Thus, the term in silico has joined in vivo and in vitro, marking a new era of biological exploration.
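BLAST's speed comes from a seed-and-extend heuristic: it first finds short exact word matches between the query and database sequences and only then extends them into longer alignments. The sketch below illustrates just that idea with a k-mer index and exact, ungapped extension. Real BLAST uses substitution scoring, X-drop extension, gapped alignment, and E-value statistics, none of which are reproduced here; the function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def build_word_index(subject: str, k: int) -> dict[str, list[int]]:
    """Map every k-letter word in `subject` to the positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def seed_and_extend(query: str, subject: str, k: int = 4) -> list[tuple[int, int, int]]:
    """Return (query_start, subject_start, length) of exact ungapped matches
    that contain at least one shared k-letter seed word."""
    index = build_word_index(subject, k)
    seen: set[tuple[int, int]] = set()
    matches = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            # Extend leftwards while the bases keep matching.
            qs, ss = q, s
            while qs > 0 and ss > 0 and query[qs - 1] == subject[ss - 1]:
                qs, ss = qs - 1, ss - 1
            if (qs, ss) in seen:
                continue  # this seed lies inside a match already reported
            seen.add((qs, ss))
            # Extend rightwards from the left end to the first mismatch.
            length = 0
            while (qs + length < len(query) and ss + length < len(subject)
                   and query[qs + length] == subject[ss + length]):
                length += 1
            matches.append((qs, ss, length))
    return matches


if __name__ == "__main__":
    print(seed_and_extend("GGACGTACC", "TTACGTATT", k=4))
```

Indexing the database once and reusing the index for many queries is what makes the seeding step cheap compared with full dynamic-programming alignment.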

The complete genome sequence of C. glutamicum ATCC 13032 was first elucidated in 2001 by the Japanese company Kyowa Hakko Kogyo Co., Ltd. (Nakagawa et al., 2001) and deposited in the public database (GenBank NC_003450). Kyowa's competitor, Ajinomoto, sequenced the genome of a closely related species, Corynebacterium efficiens, in 2002 (Fudou et al., 2002) and deposited it in 2003 (GenBank NC_004369). Quite independently, Kalinowski et al. (2003) published the sequence of C. glutamicum ATCC 13032, and in 2007 the sequence of C. glutamicum strain R was published by Yukawa, Omumasaba, Nonaka, Kos, and Okai (2007). Ohnishi et al. (2002) were the first to apply knowledge of the C. glutamicum genome sequence in an attempt to produce a "minimum mutation strain." Amino acid-producing strains developed by mutation and selection have proved to be highly successful commercial organisms. However, selecting desirable traits using, for example, analog resistance does not prevent the coselection of other mutations that negatively affect strain performance. Thus, strains that have undergone multiple mutation/selection procedures may have accumulated a range of undesirable mutations, making them less vigorous, slower growing, and less resistant to stressful conditions. Also, background mutations may confuse the interpretation of the overproduction mechanism, which may in fact not be due to a "selected" mutation, making further logical, directed strain improvement problematic. As discussed earlier, protoplast fusion was used in an attempt to remove deleterious markers by generating recombinants between high-producing strains (that lacked vigor) and wild types that grew well but did not overproduce. Ohnishi et al.'s more direct strategy was to compare key gene sequences of a high lysine-producing strain of C. glutamicum (B6) with those of the fully sequenced wild type to identify mutated genes that could be responsible for overproduction. The influence of the mutated genes on lysine production could then be assessed by introducing them sequentially into the wild type by allelic replacement (Fig. 3.31). Ohnishi et al. focused their initial attention on 16 genes of the terminal lysine pathway (Fig. 3.32) and, knowing the wild-type sequence, were able to design PCR primers based on the nucleotide sequences flanking each intact gene. The PCR products derived from the high producer were then sequenced and compared with the wild-type genes. Each of the following five genes was shown to contain a point mutation:


Figure 3.31. Ohnishi et al.'s Strategy for the Development of a Minimal Mutation L-Lysine Producing Strain of Corynebacterium glutamicum by the Sequential Addition to the Wild-Type of Mutations Identified From the Production Strain (Ohnishi et al., 2002)


Figure 3.32. L-Lysine Biosynthetic Pathway in C. glutamicum

Enzymes encoded by genes: asd, aspartate semialdehyde dehydrogenase; aspC, aspartate aminotransferase; dapA, dihydrodipicolinate synthase; dapB, dihydrodipicolinate reductase; dapC, succinyl-L-diaminopimelate aminotransferase; dapD, tetrahydrodipicolinate succinylase; dapE, succinyl-L-diaminopimelate desuccinylase; dapF, diaminopimelate epimerase; ddh, diaminopimelate dehydrogenase; hom, homoserine dehydrogenase; lysA, diaminopimelate decarboxylase; lysC, aspartokinase; lysE, lysine exporter; lysG, lysine exporter regulator; ppc, phosphoenolpyruvate carboxylase; pyc, pyruvate carboxylase.

Modified from Ohnishi et al. (2002). Further details are given in Fig. 3.37 and Fig. 3.38.

hom—coding for homoserine dehydrogenase

lysC—coding for aspartokinase

dapE—coding for succinyl-L-diaminopimelate desuccinylase

dapF—coding for diaminopimelate epimerase

pyc—pyruvate carboxylase

The two dap mutations (in dapE and dapF) were considered negligible because they resulted in neither an amino acid substitution nor a change to a rare codon. As shown in Fig. 3.16, the aspartate family of amino acids in C. glutamicum is controlled by the concerted inhibition of aspartokinase by lysine and threonine and by the inhibition of homoserine dehydrogenase by threonine. The mutant alleles of lysC and hom (designated lysC311 and hom59, respectively) were introduced individually into the wild-type strain by allelic replacement. The presence of lysC311 conferred resistance to the lysine analog S-(2-aminoethyl)-L-cysteine (AEC), and hom59 resulted in a partial requirement for homoserine, observations consistent with the history of the original producer strain (B6). Analog resistance of aspartokinase would be expected to reflect release from feedback inhibition, and partial auxotrophy for homoserine would deplete threonine, thereby lifting the inhibition of homoserine dehydrogenase. The lysC311 single mutant produced 50 g dm⁻³ lysine and the hom59 mutant 10 g dm⁻³, whereas a wild-type background carrying both mutations showed a synergistic production of 75 g dm⁻³ lysine. Crucially, this reconstructed strain resembled the wild type in its high growth rate and rate of glucose consumption, indicating that the background of deleterious mutations introduced by the many rounds of mutation and selection had been circumvented.
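The synergy claim can be checked with simple arithmetic, under the assumption (not stated in the text) that the wild-type baseline lysine titer is negligible, so that a purely additive effect of the two alleles would equal the sum of the single-mutant titers:

```latex
\underbrace{50}_{\textit{lysC311}} + \underbrace{10}_{\textit{hom59}}
  = 60~\mathrm{g\,dm^{-3}}
  \;<\; 75~\mathrm{g\,dm^{-3}}\ (\text{observed, } \textit{lysC311} + \textit{hom59}),
\qquad \frac{75}{60} = 1.25
```

Under that assumption, the double mutant exceeds the additive expectation by 15 g dm⁻³, or 25%.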

The final mutation revealed in this work was in pyc, coding for pyruvate carboxylase (pyc458), an anaplerotic enzyme that fixes carbon dioxide in the synthesis of oxaloacetate, the immediate precursor of the aspartate family. Previous work on the lysine fermentation had concentrated on the terminal pathway and had not addressed the supply of precursors. Further incorporation of pyc458, along with lysC311 and hom59, into the wild type resulted in a strain (designated AHP-3) producing 80 g dm⁻³ lysine and, importantly, with the highest production rate reported at that time, 3.0 g dm⁻³ h⁻¹, a consequence of the strain's high growth rate. Pyruvate carboxylase had not been a target in the strain improvement process used to develop strain B6, and no selection mechanism existed for its isolation. Thus, the mutation had been coselected along with selectable markers during the process, illustrating that the undefined background of the industrial strain included both desirable and undesirable lesions. Recall from our earlier discussion that the three key focal points for yield improvement are control of the terminal pathway, provision of precursors, and provision of NADPH. Accordingly, Ohnishi, Katahira, Mitsuhashi, Kakita, and Ikeda (2005) turned their attention to the supply of NADPH by investigating the genes of the pentose phosphate pathway, the major source of NADPH. Again, the wild-type gene sequences were used to design PCR primers based on the nucleotide sequences flanking each intact gene of the pathway. Comparing the sequences of the PCR products with the wild type revealed a point mutation in the gnd gene, coding for 6-phosphogluconate dehydrogenase. Using the same allelic replacement methodology described earlier, the mutated allele was added to the engineered wild type containing pyc458, lysC311, and hom59. The yield of this strain improved by 15%, and it again retained the vitality of the wild type, so that the fermentation was completed in 30 h compared with 50 h for the industrial B6 producer. Thus, this mutation had also been coselected, and its addition to the other three mutations in a background free of undesirable lesions led to the development of a high-producing, vigorous strain.


Integrated Production of Butanol from Glycerol

Keerthi P. Venkataramanan, Carmen Scholz, in Biorefineries, 2014

11.2.1 Improving Product Yield and Productivity

The lack of a genome sequence for C. pasteurianum has limited the engineering of this microorganism using the tools developed for Clostridium species and Gram-positive bacteria in general. However, a recently released draft genome sequence offers clear insights into the metabolic organization and capabilities of the organism [19]. Using Pacific Biosciences RS II technology to identify host R-M (restriction-modification) systems can lead to the development of genetic tools that facilitate the study and generation of genetically intractable strains of this bacterium [20,21], and one study has reported the successful use of chemical mutagens, such as N-methyl-N′-nitro-N-nitrosoguanidine (NTG), to generate a mutant strain of C. pasteurianum capable of enhanced butanol production and greater selectivity for butanol over other coproducts [16]. A mutant of C. pasteurianum ATCC 6013 generated by chemical mutagenesis produced butanol with a yield of 0.43 g per gram of glycerol consumed, an approximately 50% increase in butanol formation compared with the wild-type strain. Jensen et al. (2012) demonstrated another chemical mutagenesis approach using ethyl methanesulfonate (EMS), a mutagen that has been shown to be very effective in producing mutant strains [17,18]. An EMS-generated mutant was able to tolerate crude glycerol at a very high concentration of 205 g/L. Such mutant strains eliminate the need to cleanse crude glycerol of its inhibitory compounds, mainly free fatty acids in the form of soap. The EMS-generated mutant strain MN06 was also capable of growing in crude glycerol with a high rate of glycerol utilization (7.59 g/L/h), while producing more butanol and 1,3-propanediol (1,3-PDO) than the same strain grown on technical-grade glycerol.

Continuous fermentation of glycerol offers numerous advantages over batch fermentation in terms of maintaining process parameters at levels that maximize production of the desired product. Butanol formation is a phase-dependent process, and productivity correlates strongly with the pH of the medium. In a batch culture, the cells must first undergo acidogenesis, which lowers the pH, a condition favorable for solvent formation. Maintaining an acidic pH therefore drives solventogenic fermentation of the medium predominantly to butanol, leading to higher total butanol productivity.

Researchers have also studied fermentations that use a secondary carbon source at a minimal concentration to increase butanol titers. Specifically, simple sugars (glucose, xylose, and arabinose) and acids (lactate) have been employed as the secondary carbon source. Adding glucose in this manner has been found to increase butanol productivity and titer, increasing the yield of butanol from glycerol [22]. Likewise, adding thin stillage and lactate to the glycerol fermentation broth has yielded similar results, enhancing butanol production by C. pasteurianum [23].


RADIONUCLIDE IMAGING

LORAINE V. UPHAM, DAVID F. ENGLERT, in Handbook of Radioactivity Analysis (Second Edition), 2003

4. DNA Microarray Applications

The emergence of whole genome sequence data has given rise to gene array technology for differential gene expression, mutation screening, sequence analysis, and drug target identification. The commercial availability of mouse, human, and rat genome sequences preprinted on nylon membranes provides a convenient way to conduct gene array assays using radiolabeled sequences and a storage phosphor system. Storage phosphor imaging technology provides the long linear dynamic range and accuracy required to detect subtle changes across the wide range of gene expression levels that can occur within a given experiment. The following are two examples of the use of storage phosphor screen imaging for radiolabeled gene array samples.

Gene arrays can be used to analyze the effects of drug treatments at the molecular level. In one study, Atlas Rat Toxicology II arrays (Clontech, Palo Alto, CA) were used to analyze rat liver total RNA. Filters containing 465 unique cDNA fragments in duplicate were hybridized with radiolabeled cDNA reverse transcribed from RNA isolated from rats exposed to fenofibrate for 10 days. Gene expression profiles from control and treated animals were analyzed for clues to changes that may result from drug treatment and potentially cause adverse effects in humans (Jiao and Zhao, 2002). Filters were exposed for 18–24 h on SR screens and scanned with the Cyclone. The images were overlaid in QuantArray software to determine which genes were up- or downregulated by drug treatment. Figures 13.13a,b show the images obtained by this method. Figures 13.14a,b show the scatter plot of quantified spots and the results for one spot as analyzed by QuantArray (Upham and Fox, 2001).


FIGURE 13.13. Cyclone images of arrays hybridized with control (a) and fenofibrate-treated (b) rat liver total RNA reverse transcribed into cDNA and radiolabeled with ³³P-dATP.


FIGURE 13.14. (a) Scatterplot display of comparison of control and treated filters based on data from Cyclone; (b) Representation of specific spots as selected from Scatterplot.
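Deciding which genes are up- or downregulated amounts to comparing spot intensities between the control and treated filters. A minimal sketch is below, assuming background-subtracted intensities keyed by hypothetical gene identifiers and a conventional twofold cutoff (|log2 ratio| ≥ 1). QuantArray's actual algorithm is not described in the text, and real analyses would normally also normalize each filter before computing ratios.

```python
import math


def log2_ratios(control: dict[str, float], treated: dict[str, float]) -> dict[str, float]:
    """log2(treated / control) for spots with positive signal on both filters.

    Intensities are assumed to be background-subtracted already.
    """
    ratios = {}
    for gene in control.keys() & treated.keys():
        c, t = control[gene], treated[gene]
        if c > 0 and t > 0:
            ratios[gene] = math.log2(t / c)
    return ratios


def classify(ratios: dict[str, float], threshold: float = 1.0) -> dict[str, str]:
    """Label spots as up/down at |log2 ratio| >= threshold (1.0 = twofold)."""
    calls = {}
    for gene, r in ratios.items():
        if r >= threshold:
            calls[gene] = "up"
        elif r <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    control = {"gene_1": 200.0, "gene_2": 1000.0, "gene_3": 400.0}
    treated = {"gene_1": 1600.0, "gene_2": 900.0, "gene_3": 100.0}
    print(classify(log2_ratios(control, treated)))
```

Working on the log2 scale makes up- and downregulation symmetric, which is also why scatter plots of control versus treated intensities are commonly drawn on logarithmic axes.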

Another application of quantitative gene array analysis is research on the effects of the environment on human gene expression. For example, it is well documented that sun exposure increases the incidence of actinic keratosis and, eventually, squamous cell carcinoma (SCC) (Hodges and Smoller, 2002). Researchers at the University of New Mexico collect punch biopsies from patients diagnosed with SCC. Four samples are collected from each patient: tissue from (a) the SCC, (b) an actinic keratosis, a precursor lesion of SCC, (c) adjacent sun-exposed normal skin, and (d) unexposed skin from the buttocks. Total RNA is isolated from each sample, reverse transcribed into cDNA, labeled with α-³³P-dATP, and hybridized to an ID1001 DermArray filter containing 5000 human genes from Invitrogen/Research Genetics (Carlsbad, CA). After washing, the membranes are exposed to SR screens for 24 hours to bring out low expressors and scanned on the Cyclone system. The spot intensities correspond to the relative abundance of the various transcripts at the time the RNA was harvested. By comparing multiple filters, differences in gene expression profiles between states, such as tumor versus adjacent normal skin and unexposed versus exposed skin, can be observed. Figure 13.15 shows quantitative images of these high-density gene array filters.

FIGURE 13.15. Gene expression arrays probed with mRNA isolated from normal and tumor tissue.

(Courtesy of Dr. Bryan E. Alexander, University of New Mexico.)
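Comparing the four biopsy types collected from each patient amounts to six pairwise comparisons (tumor versus actinic keratosis, tumor versus adjacent normal skin, and so on). The Python sketch below, an illustration rather than the University of New Mexico group's actual analysis, computes a per-gene log2 ratio for every pair. It assumes each filter has already been quantified and normalized; the state labels, the pseudocount of 1, and the `PATIENT` values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

# The four biopsy types collected from each patient (labels are illustrative).
STATES = ("SCC", "actinic_keratosis", "sun_exposed_normal", "unexposed_normal")

Profile = Dict[str, float]  # gene -> normalized spot intensity on one filter


def pairwise_log2_ratios(
    profiles: Dict[str, Profile], pseudocount: float = 1.0
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """For each state pair (first, second), compute log2((first + pc) / (second + pc)).

    A positive value means higher expression in the first state; the pseudocount
    avoids division by zero for genes undetected on one filter.
    """
    comparisons = {}
    for first, second in combinations(STATES, 2):
        shared = profiles[first].keys() & profiles[second].keys()
        comparisons[(first, second)] = {
            gene: math.log2(
                (profiles[first][gene] + pseudocount)
                / (profiles[second][gene] + pseudocount)
            )
            for gene in sorted(shared)
        }
    return comparisons


# Hypothetical normalized intensities for one patient.
PATIENT = {
    "SCC": {"gene_x": 15.0, "gene_y": 3.0},
    "actinic_keratosis": {"gene_x": 7.0, "gene_y": 3.0},
    "sun_exposed_normal": {"gene_x": 3.0, "gene_y": 3.0},
    "unexposed_normal": {"gene_x": 3.0, "gene_y": 7.0},
}

if __name__ == "__main__":
    ratios = pairwise_log2_ratios(PATIENT)
    print(ratios[("SCC", "sun_exposed_normal")])
```

Here the tumor-versus-adjacent-normal comparison gives a log2 ratio of 2.0 for `gene_x` (fourfold higher in the tumor) and 0.0 for `gene_y`. Working on the log scale makes up- and downregulation symmetric, which is convenient when the same gene is compared across several state pairs.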

Other studies in the literature describe the use of storage phosphor screen imaging for receptor binding assays (Chan et al., 1991); Southern and northern blot analysis (Muller and Gebel, 1994; Robben et al., 2002); western blot analysis (Taylor et al., 1992; Shelton et al., 1994); gel shift assays (Zinck et al., 1993; Olivas and Maher, 1995); biochip imaging (Schena, 2000); microarray assays (Popovici et al., 2000); imaging of other related isotopes (Gonzalez et al., 2002); and double-label autoradiography (Pickett et al., 1992).

Conclusions and Future Trends

Eleni I. Georga, ... Stelios K. Tigas, in Personalized Predictive Modeling in Type 1 Diabetes, 2018

10.2 Toward Precision Diabetes Medicine

Knowledge of the human genome sequence has contributed significantly to our understanding of the pathogenesis and underlying molecular mechanisms of both type 1 and type 2 diabetes [13–16]. HLA class II alleles account for almost 50% of the genetic susceptibility to type 1 diabetes, whereas genome-wide association studies (GWAS) explain a lower proportion of type 2 diabetes heritability. Meyer et al. see no apparent way for precision medicine to enter insulin therapy for type 1 diabetes; instead, they locate its role mainly in (1) identifying effective preventive interventions for genetically susceptible individuals (primary prevention) and (2) understanding the immune mediators and propagators of β-cell destruction in individuals with islet autoimmunity (secondary prevention) [13]. To this end, the multicenter Environmental Determinants of Diabetes in the Young (TEDDY) study has already provided significant insight into the genetic–environmental associations that trigger islet autoimmunity or promote progression to type 1 diabetes in genetically at-risk children (≤5 years) [17–19]. Recently, the TEDDY study showed clear differences in the initiation of autoimmunity (insulin autoantibodies, GAD antibodies) according to genetic factors (e.g., the SNPs rs689 [INS], rs2476601 [PTPN22], rs2292239 [ERBB3], rs3184504 [SH2B3], and rs3757247 [BACH2]) and other factors (sex, family history, HLA, country, probiotics at age 28 days, and weight at age 12 months) in infants with high-risk HLA-DR genotypes followed up to 6 years of age [18]. For type 2 diabetes, by contrast, Meyer et al. postulate that pharmacological interventions are more likely to benefit from precise knowledge of an individual's genotype and of predictive biomarkers of secondary complications; however, they acknowledge that robust scientific evidence to guide drug regulation is currently lacking.
As a first step toward individualizing type 2 diabetes therapy, GRADE, a long-term comparative effectiveness study of commonly used glucose-lowering medications (sulfonylurea, DPP-4 inhibitor, GLP-1 receptor agonist, insulin) combined with metformin, has enrolled ~5000 participants and assesses differences in study outcomes by race/ethnicity, sex, age, diabetes duration, weight, body mass index, HbA1c, and measures of insulin sensitivity, insulin secretion, and the glucose disposal index [20].

The NIH Precision Medicine Initiative, together with linked precision medicine activities (e.g., the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine [TOPMed] Program) that support the collection of longitudinal multivariate data (genome, proteome, metabolome, microbiome, exposome, and phenome) from large population cohorts, will enable systems biology approaches to elucidating the pathophysiological mechanisms of diabetes onset and progression and to identifying new biomarkers of diabetes-related vascular complications [3,15,21–23]. Mining daily longitudinal self-monitoring data (e.g., continuous glucose monitoring, physical activity, stress) together with EHR data is another valuable asset: it can help explain both the short-term and long-term glycemic status of an individual and support evaluation of the glycemic effectiveness of a specific intervention [24–28]. In addition, the resulting finer stratification of people with type 1 or type 2 diabetes, possibly defining new diabetes subtypes, could open opportunities for more effective personalized therapeutic schemes and for new hypotheses about disease pathogenesis and medical care that could be tested at different stages of disease progression [3,15,21–23]. An example is the Integrated Human Microbiome Project (iHMP), which aims to identify physiological changes in temporal microbiome–host omics profiles under healthy and stress conditions [4]. Its diabetes-associated exemplar substudy tests the individual effect of stress [i.e., medical illness, physical injury/pain, major or minor operation, and major life changes (birth, death, divorce, marriage, and change of home or job)] on the human microbiome, metabolome, and epigenome, as well as its joint effect on the host and the microbiome, based on 3-year longitudinal observation of ~60 individuals at risk for developing type 2 diabetes.
Multiomic analysis, including whole-metagenome shotgun and metatranscriptome sequencing, host whole-genome/transcriptome sequencing, cytokine and autoantibody profiles, metabolome profiles, standard clinical lab tests, and surveys of behavioral and psychosocial information (e.g., physical activity, food intake, stress), lays the foundation for analyzing the biological properties of the human microbiome and host during the onset and progression of type 2 diabetes.
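The chapter does not say which quantities are mined from continuous glucose monitoring (CGM) data. Purely as an illustration, the Python sketch below summarizes a window of CGM readings with a few widely used measures: mean glucose, coefficient of variation, percentage of readings in range, and the Glucose Management Indicator (GMI). The 70–180 mg/dL target range and the GMI regression (3.31 + 0.02392 × mean glucose in mg/dL) come from the CGM consensus literature, not from this chapter, and the sketch assumes evenly spaced readings so that the share of readings approximates the share of time.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def cgm_summary(
    glucose_mg_dl: Sequence[float], low: float = 70.0, high: float = 180.0
) -> Dict[str, float]:
    """Summarize a window of evenly spaced CGM readings (mg/dL)."""
    if not glucose_mg_dl:
        raise ValueError("need at least one reading")
    mean_glucose = mean(glucose_mg_dl)
    in_range = sum(1 for g in glucose_mg_dl if low <= g <= high)
    return {
        "mean_mg_dl": mean_glucose,
        # Coefficient of variation (%): a common measure of glycemic variability.
        "cv_percent": 100.0 * pstdev(glucose_mg_dl) / mean_glucose,
        # Share of readings inside the target range (assumes even sampling).
        "time_in_range_percent": 100.0 * in_range / len(glucose_mg_dl),
        # Glucose Management Indicator: HbA1c-like estimate from mean glucose.
        "gmi_percent": 3.31 + 0.02392 * mean_glucose,
    }


if __name__ == "__main__":
    # Hypothetical readings, not patient data.
    print(cgm_summary([100.0, 150.0, 200.0, 250.0]))
```

Applying the same function to a single day or to a longer window is one simple way to separate the short-term and long-term glycemic status mentioned in the text.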

Recent Advancements on the Role of Biologically Active Secondary Metabolites from Aspergillus

Shafiquzzaman Siddiquee, in New and Future Developments in Microbial Biotechnology and Bioengineering, 2018

4.2.19 Aflatrem

Before A. flavus genome sequence data were available, studies of the tremorgenic indole-diterpene aflatrem (1) (Fig. 4.22) used degenerate primers for conserved domains of geranylgeranyl diphosphate (GGPP) synthases to clone a GGPP synthase gene (atmG), and chromosome walking to identify a cluster containing two additional secondary metabolite genes (atmC and atmM) (Zhang et al., 2004). Penicillium paxilli produces a structurally similar indole-diterpene, paxilline (2) (Fig. 4.22). A plasmid carrying a copy of atmM was introduced into a P. paxilli strain lacking the ortholog paxM; this rescued paxilline production and implicated atmM (and the clustered genes) in aflatrem biosynthesis. Whole-genome sequencing of A. flavus subsequently revealed four additional candidate aflatrem genes on another chromosome, identified by their homology to paxilline genes (Nicholson et al., 2009). When the monooxygenase gene atmP was introduced into a P. paxilli paxP mutant, the mutant predominantly synthesized paxilline.

Figure 4.22. Structures of aflatrem (1) and paxilline (2).

Proteomic Techniques for Functional Identification of Bacterial Adhesins

Elisabet Carlsohn, Carol L. Nilsson, in Lectins, 2007

5 Proteomic Analysis of H. pylori

Since the genome sequences of H. pylori strains 26695 and J99 were completed in 1997 and 1999, respectively [72, 73], a large number of proteomic analyses have been applied to this pathogen, making the H. pylori proteome one of the best-characterized microbial proteomes.

The first proteomic investigation of H. pylori aimed to identify diagnostic and vaccine candidates [91]. Using 2D-GE and MALDI-TOF analysis, McAtee and coauthors identified twenty proteins, including urease, flagellin, and AlpA, that reacted with sera from H. pylori-infected patients. Two years later, Jungblut et al. presented a comparative proteome analysis of three H. pylori strains. They used MALDI-TOF MS to identify 126 proteins from strain 26695. Several virulence factors, including urease and HpaA, were detected, but no OMPs were identified. Their main finding was the high proteomic variability between the strains, most likely due to shifts in the amino acid composition of certain proteins [92].

In 2002, Sabarth et al. published a new proteomic approach in which intact H. pylori cells were biotinylated, followed by affinity purification of membrane proteins with streptavidin. Several virulence factors, including two OMPs (HefA and HP1564), were among the eighteen proteins identified [93]. The same year, Jungblut and coworkers used immunoproteomics to identify H. pylori antigens. They reported that several antigens, including some surface proteins, were recognized differentially by sera from patients with different clinical outcomes, suggesting that certain proteins could serve as candidate indicators of clinical manifestations [94]. This group also characterized the H. pylori secretome by proteomic analysis [95].

Later, Hynes et al. used protein chip technology to compare OMP profiles between H. pylori strains and found differences in protein profile between culture collection strains and clinical isolates that had undergone few passages [96]. In 2004, Lee et al. presented a proteomic analysis of an H. pylori ferric uptake regulator mutant [97], and Baik et al. used subcellular fractionation combined with 2D-GE analysis to identify sixteen OMPs expressed by H. pylori strain 26695, four of which (Omp11, Omp14, Omp20, and Omp21) were immunoreactive [98]. Recently, a subproteomic study identified numerous virulence factors, including some OMPs [99]. These authors now aim to establish a dynamic 2D-GE reference database covering multiple subproteomes of H. pylori.

Standard proteomic approaches are useful for mapping protein expression but cannot easily link a protein's identity to its function. For adhesins, knowledge of the receptor saccharide can be used to identify the microbial protein functionally, provided that the microbe's genome sequence is available, by combining proteomics with affinity tagging.

Identification of Genetic Targets to Improve Lignocellulosic Hydrocarbon Production in Trichoderma reesei Using Public Genomic and Transcriptomic Datasets

Shihui Yang, ... Min Zhang, in Direct Microbial Conversion of Biomass to Advanced Biofuels, 2015

Trichoderma reesei Protein Function Annotation and Pathway Reconstruction

Although the 34-Mb genome sequence of T. reesei has been reported and annotated,24 the annotation has not been systematically updated since its first release to reflect the recent explosion of genomic information. To identify proteins related to hydrocarbon (e.g., terpenoid and fatty acid) biosynthesis, metabolism, and regulation, the T. reesei protein sequences were extracted and functionally reannotated. In brief, 9143 protein sequences, comprising all manually curated and automatically annotated models from the filtered model set that represents the best gene model at each locus (TreeseiV2_FrozenGeneCatalog20081022.proteins.fasta), were downloaded from the JGI website (http://genome.jgi-psf.org/Trire2/Trire2.download.ftp.html) and imported into CLC Genomics Workbench (V7.0) as reference protein sequences for BLAST searches. The protein sequences were also imported into Blast2GO for functional annotation and into the CAZymes Analysis Toolkit (CAT) for analysis and annotation of CAZymes (Carbohydrate-Active enZymes),59,60 and the CAZyme annotation was then compared with a recent reannotation of T. reesei CAZy genes.22 KEGG pathways were extracted from the annotation results, together with KOG classifications, enzyme codes, and reaction substrates and products. Potential homologous genes in T. reesei were identified by reiterated BlastP searches. Protein product information and conserved domains were examined, and the pathways were reconstructed using enzyme and pathway information from the literature (Figure 1).

Figure 1. Flowchart of pathway reconstruction and omics data integration for this study.
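Two early steps of this workflow, loading the JGI protein FASTA file and keeping the best BlastP hit for each query, can be sketched in plain Python. This is not the pipeline the authors ran (they used CLC Genomics Workbench and Blast2GO); it assumes the BlastP results were exported in BLAST+'s standard 12-column tabular format (`-outfmt 6`), where columns 11 and 12 hold the e-value and bit score, and the 1e-5 e-value cutoff is a hypothetical choice. It shows a single pass, whereas the text describes reiterated searches.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}; the ID is the first word after '>'."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def best_hits(
    blast_tabular: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float, float]]:
    """Keep the highest-bit-score subject per query from -outfmt 6 output.

    Returns {query: (subject, evalue, bitscore)}; hits above max_evalue are ignored.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in blast_tabular:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

Both functions accept any iterable of lines, so an open file handle works directly, for example `read_fasta(open("TreeseiV2_FrozenGeneCatalog20081022.proteins.fasta"))` after downloading the file; if it is the same file version, the number of records should match the 9143 sequences reported above. Keeping only the top hit by bit score is a deliberately simple rule; a stricter homology call could add reciprocal best-hit checks.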

Plant Metabolic Engineering

Neelam S. Sangwan, ... Rajender S. Sangwan, in Omics Technologies and Bio-Engineering, 2018

9.3.3.3.3 Genome-Scale Model-Based Analysis

Genome-scale models are built from genome sequences as networks of stoichiometric reactions. These global metabolic pathway models are used to understand metabolism, predict phenotypes, identify essential genes, and determine targets for metabolic engineering. Although many genome-scale models of microorganisms are available, such models remain limited for plants because of compartmentation within the cell and the presence of distinct tissues and organs (Poolman et al., 2009; Grafahrend-Belau et al., 2009; Hay and Schwender, 2011; Saha et al., 2011). In any case, a model must be tested and validated experimentally before it can be applied in plant metabolic engineering. Microbial genome-scale models offer examples of successful metabolic design: OptForce, a computational multilevel optimization procedure, predicted the complete set of metabolic modifications (knockouts, upregulations, downregulations) in E. coli that led to overproduction of the target chemicals (acetyl-CoA and malonyl-CoA) at approximately four times the wild-type level. An integrated flux approach can thus provide more specific targets quantitatively (Xu et al., 2011). Only 13% of plant genes have been assigned functions experimentally; a few more have computational assignments, and the rest remain unknown (Collakova et al., 2012).

The free-living nematode Caenorhabditis elegans is a key laboratory model for metazoan biology. C. elegans has also become a model for parasitic nematodes despite being only distantly related to most parasitic species. All of the ∼65 Caenorhabditis species currently in culture are free-living, and most have been isolated from decaying plant or fungal matter. Caenorhabditis bovis is a particularly unusual species that has been isolated several times from the inflamed ears of Zebu cattle in Eastern Africa, where it is associated with the disease bovine parasitic otitis. C. bovis is therefore of particular interest to researchers studying the evolution of nematode parasitism. However, because C. bovis is not in laboratory culture, it remains little studied. Here, by sampling livestock markets and slaughterhouses in Western Kenya, we successfully reisolated C. bovis from the ear of an adult female Zebu. We sequenced the genome of C. bovis using the Oxford Nanopore MinION platform in a nearby field laboratory and used the data to generate a chromosome-scale draft genome sequence. We used this draft genome sequence to reconstruct the phylogenetic relationships of C. bovis to other Caenorhabditis species and to reveal the changes in genome size and content that have occurred during its evolution. We also identified expansions in several gene families that have been implicated in parasitism in other nematode species. The high-quality draft genome and our analyses of it represent a significant advance in our understanding of this unusual Caenorhabditis species.

Genomics comprises several distinct areas of research: transcriptomics, the study of global RNA expression; genotyping, the measurement of DNA polymorphisms and mutations; and bioinformatics, the systematic analysis of biological data generated by technologies such as genomics. The field of genomics has had a rocky past, not only in toxicology but in the biomedical sciences in general, primarily because of the nature of the studies. In a major shift from the paradigm that has dominated research since the earliest philosophers and thinkers, genomics studies do not require a hypothesis. They are, in fact, considered "hypothesis generating." Other ways of describing the genomic approach are "not hypothesis limited" or "discovery-based" investigations. For scientists trained from their earliest science fair projects in the absolute requirement for a testable hypothesis, genomics is indeed a "fishing expedition" and unfamiliar territory.

Introduction

Genomics is the study of genes: their structure, function, and expression. The number of genes in a genome depends on the species, but determining the exact number is very difficult. Scientists estimate, for example, that the human genome contains about 20 000–25 000 protein-coding genes. There is no clear relationship between genome size and gene number; the number of protein-coding genes usually levels off at around 25 000, even as genome size increases.

Advances in Genome Biology

Volume 5, 1998, Pages 179-210

Genome architecture

Andrei O. Zalensky

https://doi.org/10.1016/S1067-5701(98)80021-1

Publisher Summary

This chapter describes genome architecture. There is no doubt that spatial order exists within the cell nucleus, and genome architecture is a prominent constituent of this order. An extreme example is provided by metaphase chromosomes, which reappear at each cell division in a reliable, recognizable, and reproducible pattern. A much more meaningful example is chromosome organization in the nucleus during interphase, when DNA is transcribed and replicated. Interphase nuclei are arranged so that replication, transcription, repair, and RNA processing can occur at restricted sites. These events are accompanied (and may be regulated) by dynamic changes in genome architecture at levels well above the individual gene and even above chromatin structure. Genome architecture is currently a very dynamic and active field of research, relevant to topics such as cell differentiation, carcinogenesis, and development; yet after more than a century, the field is still in its adolescence. Collectively, genome architecture refers to the spatial arrangement of chromosomes within the nuclear volume. (The same term is also used, in a different context, to describe the relative linear organization of different types of DNA sequences, e.g., unique and repetitive.)