Epigenetic Regulation



 

Contributed by Josh Paulk, Jacob Carlson, and Linjiao Luo


 

Introduction:

 

Epigenetics can be defined as “heritable changes in gene function that occur without a change in the DNA sequence.” These changes include covalent modification of histone proteins, methylation of DNA, and alterations in higher-order chromatin structure. Gene expression can be greatly influenced by these epigenetic changes, so deciphering the ‘epigenome’ is key to understanding how certain cell states are maintained. Bioinformatics has played an extremely important role in the advancement of this field. Below is an overview of the current state of epigenetics and some of the technologies developed to advance the field. Epigenetic control operates on three major levels: nucleosome dynamics, histone modifications, and DNA methylation.

 

Nucleosome Dynamics:

 

The Nucleosome:

 

DNA in eukaryotic organisms is packaged in a nucleic acid-protein complex called chromatin. All genetic processes depend on DNA in the chromatin context and are regulated through the controlled access of cellular machinery to free DNA. The nucleosome (shown below) represents the fundamental repeating unit of chromatin (occurring every 157–240 bp depending on the organism; ref. 10). The nucleosome core is composed of an octamer of histone proteins: two histone H3–H4 dimers and two histone H2A–H2B dimers, complexed to 147 bp of DNA. However, these canonical histones can be replaced by histone variants (e.g. H2A.X, H3.3) for purposes such as DNA repair and replication (ref. 9). In the chromatin fiber, this core is accompanied by an additional histone, H1, that ‘locks’ the complexed 147 bp of DNA onto the nucleosome core (and complexes with ~50 bp of linker DNA).

 

 

Nucleosomes Regulate Gene Expression:

 

Nucleosomes can block RNA polymerase access to the promoter regions of genes during transcription initiation, thus allowing expression to be regulated at this level. Although the promoters of many genes (mostly housekeeping genes) have low nucleosome occupancy, genes that are regulated at the level of chromatin have their promoters effectively blocked. Chromatin remodeling proteins can relieve this block by actively changing the position of the nucleosomes to allow assembly of the initiation complex. To identify which genes are regulated in this way, many studies have aimed to understand how and where these nucleosomes are positioned.

 

Mapping Nucleosome Positioning:

 

Mapping on arrays-

Nucleosome positioning at a specific locus can be determined by digestion of total genomic DNA with micrococcal nuclease (MNase) followed by analysis of the protected fragments by microarray (see figure below). DNA microarrays with tiling oligonucleotide probes allow nucleosome positions to be determined at genome scale. Unfortunately, the resolution of this method depends on the spacing between adjacent oligonucleotide probes; thus, it is very difficult to generate a high-resolution genome-wide map of nucleosome positioning for organisms with very large genomes (e.g. human).

 

Mapping by sequencing-

In a 2008 paper published in Cell, Schones and colleagues demonstrated that high-resolution genome-wide maps of nucleosome positions could be constructed by using Solexa high-throughput sequencing technology in place of microarrays. The authors first digested chromatin from human T cells with MNase to obtain protected DNA fragments of approximately 150 bp. They then sequenced the ends of the mononucleosome-sized DNA using the Solexa sequencing technique to map genome-wide nucleosome positions (see figure below). Figure 1 (from Schones et al., 2008) has been included to illustrate some of the data obtained.
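As a rough illustration of how sequenced fragment ends can be turned into a positional map, the Python sketch below shifts the 5' end of each read by half of the approximately 150 bp fragment length toward the presumed nucleosome center and counts how many reads support each center. The function name `dyad_counts`, the `(position, strand)` read representation, and the toy reads are assumptions made for illustration only; this is not the scoring scheme used by Schones et al.

```python
from collections import Counter


def dyad_counts(reads, fragment_len=150):
    """Count inferred nucleosome centers (dyads) per genomic position.

    reads: iterable of (position, strand) pairs, where position is the
    5' end of a sequenced read and strand is '+' or '-'.
    fragment_len: assumed mononucleosome fragment length in bp.
    """
    half = fragment_len // 2
    counts = Counter()
    for pos, strand in reads:
        # A '+' read starts at the left edge of the fragment, a '-' read
        # at the right edge, so both are shifted toward the middle.
        center = pos + half if strand == '+' else pos - half
        counts[center] += 1
    return counts


# Toy reads (hypothetical coordinates on a single chromosome).
reads = [(1000, '+'), (1002, '+'), (1150, '-'), (1148, '-'), (1400, '+')]
scores = dyad_counts(reads)
print(scores.most_common(1))  # the best-supported center in this toy data
```

In this toy input, reads from both strands converge on positions around 1075, which is the kind of agreement that suggests a well-positioned nucleosome.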

 

 

 

Genomic Code for Nucleosome Positioning:

Using data from nucleosome position mapping (including sequences with both low and high affinity for nucleosomes), Segal and colleagues (2006) aimed to discover the genomic code for nucleosome positioning. By combining experimental and computational approaches to determine the DNA sequence preferences of nucleosomes and the genome-wide nucleosome organization that results from these preferences, the authors concluded that eukaryotic genomes use a nucleosome positioning code to allow for specific chromatin-level regulation at certain genes (see Figure 1 from Segal et al., 2006 below).

 

 

 

Histone Modifications:

 

Types of Post-translational Histone Modifications

 

Post-translational modification of the core histone subunits of nucleosomes is a fundamental mechanism by which the transcriptional activity of an associated gene locus can be regulated. Typically, specific residues (lysine, arginine, serine, threonine) in the positively charged N-terminal histone ‘tails’ are subject to modifications that include acetylation, methylation, phosphorylation, and ubiquitination. Within a histone tail, more than 100 single modifications are possible, which allows an enormous diversity of unique multiple-modification states.

 

 

 

Though the regulatory consequences of a given modification state can be subtle, some clear trends regarding specific modifications have begun to emerge. For example, acetylation of histone tail lysines often correlates with an increase in accessibility of an associated chromatin region to transcriptional machinery, presumably because the neutralization of positive charge diminishes charge-based DNA-histone interactions, thus relaxing the condensed nucleosome structure. The impact of other modifications, such as lysine methylation, depends on the residue being modified, as illustrated below.

 

 

 

Technologies for Studying Histone Modification Patterns

 

Studies in which a particular histone modification is associated with a chromatin state have become increasingly common, and are enabled by the ChIP-chip and ChIP-seq technologies. The workflow for these methods is described below.

 

 

For ChIP-chip, cells are treated with formaldehyde to cross-link DNA with associated proteins, after which the DNA is harvested and fragmented by enzymatic (e.g. nuclease digestion) or physical (e.g. sonication) means and incubated with an antibody that is specific for the histone modification of interest. The antibody/histone/DNA complexes are recovered (ChIP), and the associated DNA is liberated, amplified using fluorescently tagged primers, and hybridized to a DNA microarray (chip). Tiling arrays are particularly useful for this analysis because they provide fairly high-resolution whole genome coverage that includes non-coding regions (intergenic regions and introns) which often contain regulatory sequence elements.

 

The ChIP-seq method uses a similar antibody immunoprecipitation technique, but after this step the liberated DNA is ligated to small adapter sequences, amplified, and identified by high-throughput parallel sequencing (e.g. Solexa).

 

 

The result of a ChIP-chip or ChIP-seq experiment is a profile of the enrichment of the histone modification of interest across the entire genome. In ChIP-chip this enrichment is observed as an increased ratio of a given DNA fragment derived from ChIP isolation relative to untreated nuclear DNA. In ChIP-seq the absolute number of fragments of a given sequence is counted and used to determine abundance. In either setup the results reveal DNA loci that are hot-spots for a particular histone modification. Shown below is a comparison of the results obtained from ChIP-chip and ChIP-seq for a specific histone modification in a particular region of the genome. Though both methods can introduce subtle biases, in general the agreement between methods is quite strong.
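The two enrichment measures described above can be sketched in Python as follows. For ChIP-chip, the sketch computes a log2 ratio of ChIP signal to input signal for each probe; for ChIP-seq, it counts reads in fixed-width genomic bins. The function names, the pseudocount, the 200 bp bin size, and the toy numbers are all assumptions chosen for illustration, not values taken from the studies cited here.

```python
import math
from collections import Counter


def chip_chip_log_ratios(ip_signal, input_signal, pseudocount=1.0):
    """Per-probe log2(ChIP / input); the pseudocount avoids division by zero."""
    return [
        math.log2((ip + pseudocount) / (inp + pseudocount))
        for ip, inp in zip(ip_signal, input_signal)
    ]


def chip_seq_bin_counts(read_positions, bin_size=200):
    """Count sequenced reads falling into consecutive bins of bin_size bp."""
    return dict(Counter(pos // bin_size for pos in read_positions))


# Toy data: three probes, and six read start positions.
ratios = chip_chip_log_ratios([3, 7, 31], [3, 3, 3])
bins = chip_seq_bin_counts([10, 150, 210, 250, 399, 400])
print(ratios)  # [0.0, 1.0, 3.0]
print(bins)    # {0: 2, 1: 3, 2: 1}
```

A probe with a log ratio near zero shows no enrichment, while larger positive values indicate loci where the modification is more abundant; in the sequencing case the bin with the most reads plays the same role.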

 

 

Examples of Histone Modification Studies

This information can be used at a rudimentary level to annotate the sequence loci occupied by various modified histones, and in particular the localization of these nucleosomes relative to the transcription start sites and other regulatory sequence elements of genes. A higher-level analysis can also correlate the frequency of these modifications with expression levels. Shown below are data that relate the abundance of particular histone modifications to their position relative to transcription start sites and to the expression levels of downstream genes, the latter obtained from classical microarray experiments on total cellular transcripts. The results indicate that some modifications (H3K4me) are correlated with increased gene expression, while others (H3K27me3) correlate with decreased gene expression. The peaks observed in the H3K4me3 profile for genes at high expression levels occur at +50, +210, and +360 bases, which correlates well with the known spacing interval for nucleosome positioning. Furthermore, the dip in abundance at the transcription start site is consistent with local nucleosome depletion at actively expressed genes.
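An aggregate profile of this kind can be sketched in Python by converting each tag position into an offset relative to the nearest-considered transcription start site (TSS), taking gene orientation into account, and counting tags per offset bin. The function name `tss_profile`, the window and bin sizes, and the toy coordinates are hypothetical; the sketch also ignores the stratification by expression level described above.

```python
from collections import Counter


def tss_profile(tss_list, tag_positions, window=500, bin_size=50):
    """Aggregate tag counts by position relative to transcription start sites.

    tss_list: list of (position, strand) pairs with strand '+' or '-'.
    tag_positions: genomic positions of ChIP tags for one modification.
    Returns a Counter mapping the start of each offset bin to a tag count.
    """
    profile = Counter()
    for tss, strand in tss_list:
        for tag in tag_positions:
            # Positive offsets are downstream of the TSS on either strand.
            offset = tag - tss if strand == '+' else tss - tag
            if -window <= offset < window:
                profile[(offset // bin_size) * bin_size] += 1
    return profile


# Toy data: two genes on opposite strands and five tags.
tss_list = [(1000, '+'), (5000, '-')]
tags = [1060, 1210, 4940, 4790, 9000]
profile = tss_profile(tss_list, tags)
print(sorted(profile.items()))  # [(50, 2), (200, 2)]
```

The nested loop is quadratic and only suitable for small examples; a real analysis would index tags by position first.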

 

 

Histone Modifying and Reading Enzymes

 

The enzymes that install these modifications are themselves regulated by specific upstream signaling processes and therefore serve as intermediates between a cellular input and a desired transcriptional effect. The resulting histone modification state reflects the integration of input from multiple cellular processes, thereby enabling the sophisticated coordination of complex epigenetic states. There are various mechanisms by which a histone modification state can be ‘translated’. One direct mechanism involves the simple electrostatic interactions of DNA with histones; positively charged lysine residues will interact more tightly with negatively charged DNA than will neutral (acetylated lysine) or negatively charged (phospho-serine, phospho-threonine) residues. In this way histone modifications can impact the extent of condensation of chromatin and thus the accessibility of an associated gene to transcriptional machinery.

 

Another mechanism involves the ‘reading’ of histone tails by specialized protein domains that bind to specific modified residues. A listing of these domains and their recognition elements is shown below. Proteins that contain these domains can be specifically recruited to nucleosomes with a particular histone modification which results in the localization of the protein’s activity to specific chromosome regions. These activities include chromatin remodeling, further modification of histones, and recruitment or blocking of transcriptional machinery.

 

 

DNA Methylation:

What is DNA Methylation?

Another well-characterized epigenetic mechanism is DNA methylation, which involves the addition of a methyl group (CH3) to selected sites on DNA. This process is carried out by a group of enzymes called DNA methyltransferases. The figure below shows the most common eukaryotic DNA modification, which occurs at the number 5 carbon of the cytosine pyrimidine ring.

 

 

DNA Methylation and CpG Islands:

DNA methylation mainly occurs at CpG sites, where a cytosine is followed by a guanine, the two being separated by a phosphate. In the human genome, the frequency of CpG is generally low, except in some very small “islands”, called CpG islands. Although ~80% of CpG sites in regions of low CpG frequency are methylated, most CpG sites in CpG islands are not. Genes in regions of low CpG frequency are often inactive, while genes located at CpG islands are often very important for cell development or cell function. DNA methylation is believed to play an important role in this pattern. To see why, imagine the opposite arrangement, in which all the important genes are methylated and all the inactive genes are unmethylated. Over evolutionary time scales, methylated CG is converted to TG by accidental deamination and is thus progressively eliminated from the genome, so the CpG sites at the important genes would be lost, whereas unmethylated CG is conserved.
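The depletion of CpG outside islands can be quantified with the ratio of observed to expected CpG dinucleotides, where the expected count is (number of C × number of G) / sequence length. The Python sketch below computes this ratio together with GC content. The thresholds in `is_candidate_island` (length of at least 200 bp, GC content above 50%, observed/expected above 0.6) are a commonly used heuristic that is not taken from this article, and the function names and toy sequences are assumptions for illustration.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count('C')
    g = seq.count('G')
    cpg = seq.count('CG')  # 'CG' cannot overlap itself, so count() is exact
    gc_content = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp


def is_candidate_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply an assumed heuristic for CpG-island-like sequence windows."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


# Toy sequences, far shorter than any real island.
gc, oe = cpg_stats("CGCGCGCGAT")
print(round(gc, 2), round(oe, 2))  # 0.8 2.5
print(is_candidate_island("CGCGCGCGAT"))              # False: too short
print(is_candidate_island("CGCGCGCGAT", min_len=10))  # True with relaxed length
```

A sequence in which methylated CG has been converted to TG over time would show an observed/expected ratio well below 1, which is the signature the paragraph above describes for the bulk of the genome.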

 

DNA Methylation and Cancer

 

DNA methylation is one of the most important epigenetic modifications involved in disease. Many recent studies report that abnormal methylation may cause cancer.

As mentioned before, important genes at CpG islands are normally unmethylated, which is essential for their transcription. Once a CpG island becomes methylated, the associated gene becomes permanently silenced. Cancer cells are characterized by abnormal DNA methylation patterns. There are two categories of DNA methylation aberrations: transcriptional silencing of tumor suppressor genes by CpG island promoter hypermethylation, and massive global genomic hypomethylation. The figure below shows an example of the first category.

 

 

Technologies for Studying DNA methylation:

There are different strategies for high-throughput detection of DNA methylation. They are either microarray-based or sequencing-based.

MeDIP-Chip

MeDIP stands for methylated DNA immunoprecipitation, and “chip” refers to the microarray. MeDIP-chip therefore means applying a microarray after methylated DNA immunoprecipitation to map DNA methylation sites in the genome. This microarray-based method is very similar to ChIP-chip. What is special about MeDIP-chip is that an antibody against 5-methyl-cytosine is used to specifically recognize methylated DNA. The figure below shows the basic principle of MeDIP-chip. DNA is first randomly sheared by sonication into fragments of 300–1000 bp. The sonicated DNA is then immunoprecipitated with an antibody against 5-methyl-cytosine. Since CpG sites are not evenly distributed in the genome (recall CpG islands), both the methylation status of the target sequence and the density of CpG sites can influence MeDIP enrichment. Unmethylated DNA should be used as a control. Input DNA and methylated (immunoprecipitated) DNA are differentially labeled with Cy3 and Cy5. Peak finding can then be carried out with the same methods used for ChIP-chip.
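One very simple way to find peaks in two-channel array data is to compute the log2 ratio of immunoprecipitated to input intensity for each probe and report runs of consecutive probes above a threshold. The Python sketch below does this; the function name `call_peaks`, the threshold, the minimum run length, and the toy intensities are assumptions, and this is not necessarily the peak-finding method used in the studies cited here. Which dye labels which sample is left unspecified, as in the text.

```python
import math


def call_peaks(positions, ip_intensity, input_intensity,
               threshold=1.0, min_probes=2):
    """Report (start, end) for runs of consecutive enriched probes.

    positions: probe coordinates in genomic order.
    ip_intensity / input_intensity: positive per-probe intensities for the
    immunoprecipitated and input channels.
    """
    peaks = []
    run = []
    for pos, ip, inp in zip(positions, ip_intensity, input_intensity):
        if math.log2(ip / inp) >= threshold:
            run.append(pos)
        else:
            if len(run) >= min_probes:
                peaks.append((run[0], run[-1]))
            run = []
    # Close a run that extends to the last probe.
    if len(run) >= min_probes:
        peaks.append((run[0], run[-1]))
    return peaks


# Toy data: six probes spaced 100 bp apart.
positions = [100, 200, 300, 400, 500, 600]
ip = [100, 400, 800, 100, 500, 100]
inp = [100, 100, 100, 100, 100, 100]
peaks = call_peaks(positions, ip, inp)
print(peaks)  # [(200, 300)]
```

Requiring at least two adjacent probes discards the isolated high value at position 500, which reflects the fact that sheared fragments of several hundred base pairs should be detected by more than one neighboring probe on a tiling array.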

Methyl-Seq

Methyl-Seq is a sequencing-based method. The figure below shows an unpublished Methyl-Seq map from D. Johnson and R. Myers. Methylation-specific enzymes were used to digest the DNA, and the CpG island fragments were sequenced. Here “U” denotes unmethylated sequence and “M” methylated sequence.

Conclusions:

 

An understanding of the role of epigenetic modifications in cell regulation and cell-fate determination will continue to be enhanced by the use of genome-wide analysis technologies and biostatistics. These methods reveal global trends in nucleosome positioning, histone modifications, and DNA methylation that correlate with fundamental biological phenomena (e.g. regulation of gene expression). Such analysis is critical for understanding the significant influence of biological determinants that are not strictly encoded in the primary DNA sequence.

 

 

 

References:

1) Barski et al. High-resolution profiling of histone methylations in the human genome. Cell (2007) 129, 823-37

2) Schones et al. Genome-wide approaches to studying chromatin modifications. Nat Rev Genet (2008) 9, 179-91

3) Mikkelsen et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature (2007) 448, 553-60

4) Taverna et al. How chromatin-binding modules interpret histone modifications: lessons from professional pocket pickers. Nat Struct Mol Biol (2007) 14, 1025-40

5) Bernstein et al. The mammalian epigenome. Cell (2007) 128, 669-81

6) Segal, E., et al. A genomic code for nucleosome positioning. Nature (2006) 442, 772-778

7) Segal, M. R. Re-cracking the nucleosome positioning code. Stat Appl Genet Mol Biol (2008) 7, Article14 

8) Luger, K., et al. Nucleosome and chromatin fiber dynamics. Curr Opin Struct Biol (2005) 15, 188-196

9) Schones, D. E., et al. Dynamic regulation of nucleosome positioning in the human genome. Cell (2008) 132, 887-898

10) Davey, C. A., et al. Solvent mediated interactions in the structure of the nucleosome core particle at 1.9 Å resolution. J Mol Biol (2002) 319, 1097-1113

11) Singal et al. DNA methylation. Blood (1999) 93, 4059-70

12) Liu XS. Getting started in tiling microarray analysis. PLoS Comput Biol. 2007 Oct;3(10):1842-4.

13) Wold B. et al. Sequence census methods for functional genomics. Nat Methods (2008) 5, 19-21

14) Zhang X. et al. Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell (2006) 126, 1189-201