ChIP-on-chip and Genome Tiling Microarrays
Summary and Importance
ChIP stands for Chromatin ImmunoPrecipitation. The second "chip" refers to a microarray chip. The ChIP-on-chip technique is commonly used to locate transcription factor binding sites within a genome. A transcription factor is a protein that binds to DNA, so in eukaryotic cells transcription factors do their work in the nucleus. There they regulate transcription by binding to particular pieces of the genome. The DNA they bind to is called a promoter. Promoters are typically close to the "beginning" of a gene, or upstream of it, but their location can vary. Once a transcription factor binds the promoter, transcription of the nearby gene is turned either up or down. If it is turned up, more RNA is produced from that gene. If it is turned down, less RNA is produced. In this way, transcription factors are very important to a cell. They control whether or not a gene is expressed. For example, genes that are turned on in the eye may not be turned on in the skin. Transcription factors also control when a gene is turned on. For example, it is very important in development to have genes turned on and off at the right times. Transcription factors also control how much of a gene product is produced. For example, when the temperature rises many organisms produce heat shock proteins.
Immunoprecipitation
Overview
ChIP-chip is a method to find the binding sites (promoters) for a particular transcription factor of the researcher's choosing. First the researcher chooses an organism, tissue type, or developmental stage in which the transcription factor of interest is present. Then the researcher uses immunoprecipitation to collect all the pieces of DNA that bind to the transcription factor. This is done in two parts. The first part is not specific to the particular transcription factor: it involves isolating DNA from a cell while preserving any DNA-protein bonds. The second part is the specific step: the researcher selects only the DNA that is bound to the transcription factor of interest.
Image 1: The Immunoprecipitation Process
Specific Methods
First the researcher collects cells that express their transcription factor of interest. The cells can be "fixed" in formaldehyde, which maintains the bonds between the protein and nucleic acid during the next procedure. Then the cells are lysed (broken open) and the DNA is extracted with the bound proteins still "fixed" to it. The DNA is then sonicated (or sheared) to produce fragments about 500 bp long. This results in a pool of small pieces of DNA, most of which are not bound to the transcription factor of interest. To select the desired transcription factor binding sites, an antibody that specifically binds to the transcription factor of interest is added to the DNA. The antibody binds to the transcription factor/DNA complex and precipitates it from solution. In this way, only DNA that is bound to the transcription factor should be collected. DNA that is not a binding site for the transcription factor is washed away.
Finally, the bond between the transcription factor and the DNA is removed, and this DNA is purified.
Note on specificity of Procedure
This immunoprecipitation process is not entirely specific. Antibodies sometimes bind to other proteins within the cell instead of the transcription factor being studied. This generates significant noise that must be accounted for later in the analysis of results.
PCR Amplification
Why do we need to amplify?
The ChIP (chromatin immunoprecipitation) step can produce a very small amount of DNA. To make enough DNA to use in our microarray (the microarray chip is the second chip in "ChIP on chip"!), the researcher uses PCR amplification.
How do we adapt the DNA for PCR?
PCR (polymerase chain reaction) requires a primer. A primer is a short sequence of DNA that matches the end of the DNA you want to amplify. In this case, we do not know the sequence of the DNA we want to amplify. We want to amplify the 500 base pair pieces of sheared DNA that each contain a binding site specific to the transcription factor, but we don't know the sequence of this binding site. That is what we are trying to find out by ChIP on chip! So how can we amplify this DNA? Instead of using specific primers, we ligate these 500 bp DNA pieces into a plasmid. A plasmid is a circular piece of DNA for which we know the sequence. To make this easy, we just mix all of the sequences we found from the immunoprecipitation with lots and lots of copies of the same plasmid. This creates many pieces of circular DNA, each with the same plasmid backbone but with a different 500 bp insertion. We can amplify them all at the same time using primers based on the plasmid sequence (which we know). If you have never heard of plasmids before, please look at the following simple diagram to get an idea of how DNA is inserted into a plasmid. Do not worry about the language, just look at the pictures:
Image 2: Ligating DNA fragments into a plasmid
Microarray
Preparing the probe sequences
The PCR reaction has now amplified all of the 500 bp long fragments that contain a transcription factor binding site (note that the binding site itself is much smaller than 500 bp, so there is a lot of extra DNA). This pool of DNA is mixed with nucleotides containing fluorophores, so that it fluoresces under the appropriate light. This DNA is also made single stranded.
Preparing the chip
The right microarray chip is one designed for the same organism, dotted with many short DNA sequences. These sequences are single stranded and represent potential promoter sequences found in the genome of the organism being studied. How are the potential promoter sequences chosen? Many times they are unknown, and the purpose of the ChIP on chip study is to find new promoter sequences. In this case, the segments of DNA on the chip are all intergenic DNA (DNA that is not protein coding, so it may possibly be a promoter). In other cases, for organisms with known promoters, the chip is dotted with promoter sequences.
Applying the probe to the chip
When the pool of fluorescent DNA is applied to the chip, any matches between the pool and the chip will hybridize because both the pool and the chip are single stranded. When the appropriate light is applied to the chip, any spots that have bound to their match in the pool of DNA will fluoresce. The sequence of each spot on the microarray chip is already known, so now you can look up which promoter sequences were bound to your transcription factor.
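The lookup step described above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the spot names, probe sequences and fragment are made up, probes are far shorter than on a real chip, and hybridization is treated as perfect base pairing with no mismatches.

```python
# Hypothetical mini-chip: spot name -> single-stranded probe sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that would base-pair with seq."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizing_spots(fragment, chip):
    """Names of spots whose probe is perfectly complementary to part of the fragment."""
    return [name for name, probe in chip.items()
            if reverse_complement(probe) in fragment]


chip = {
    "spot_A": "ACGTTGCC",
    "spot_B": "GGGCCCAA",
    "spot_C": "TTTTAAAC",
}
labeled_fragment = "ACCTTGGGCCCGTA"  # made-up fluorescent fragment from the pool

lit_spots = hybridizing_spots(labeled_fragment, chip)
print(lit_spots)  # ['spot_B']
```

Because the sequence of every spot is known in advance, the list of lit spots translates directly into a list of candidate sequences.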
Reality Check
McDreamy
The previous paragraph is a pretty dreamy explanation of how a microarray chip works. In fact, all of the spots on the microarray chip will probably light up at least a little bit, because expression is not so perfectly and tightly regulated in a cell. As a result, quite a few controls must be performed in this experiment.
Controls
Typically these microarrays are run multiple times, alongside control ChIP-chip experiments in which no specific transcription factor is used to search for promoters. Sequences that are consistently enriched in the experimental trials compared to the controls are likely candidates for promoters that bind specifically to the transcription factor, as opposed to random DNA fragments that are found in many immunoprecipitation reactions.
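One common way to express the comparison against controls is a log ratio of intensities averaged over replicates. The sketch below assumes that approach; the intensity values, the pseudocount and the two-fold cutoff are all hypothetical choices made for illustration, not values from any particular experiment.

```python
import math
from statistics import mean


def log2_enrichment(experiment, control, pseudocount=1.0):
    """log2 ratio of mean experimental intensity to mean control intensity."""
    return math.log2((mean(experiment) + pseudocount) / (mean(control) + pseudocount))


# Made-up intensities: spot -> (experimental replicates, control replicates).
spots = {
    "spot_1": ([900, 1100, 1000], [240, 260, 250]),
    "spot_2": ([300, 310, 290], [295, 305, 300]),
}

THRESHOLD = 1.0  # assumed cutoff: at least a two-fold enrichment

scores = {name: log2_enrichment(exp, ctl) for name, (exp, ctl) in spots.items()}
candidates = [name for name, score in scores.items() if score >= THRESHOLD]
print(candidates)  # ['spot_1']
```

Here spot_2 is just as bright in the control as in the experiment, so it is treated as background even though it lights up.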
What kind of chip? The Genome Tiling Microarray
A genome tiling microarray is pretty incredible. It can contain an entire genome! The genome is broken up into very small pieces (25 bp), and each little segment is a spot on a microarray chip. This is very useful for ChIP on chip analysis for a few reasons that I will go through now.
Our knowledge of promoter sequences is quite limited. If we were to use all intergenic DNA to make a microarray chip, and break this DNA into large pieces to create each spot (unlike a tiling array, which has small pieces), we would not know which parts of those large pieces were promoters and which parts were just the sequence next to the promoters. Promoters are generally thought to be pretty short, on the order of 20 bp. The size of the oligonucleotide spots on a tiling array chip is therefore well suited to studies like ChIP on chip that are searching for promoters.
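To make the tiling idea concrete, here is a minimal sketch that cuts a sequence into consecutive 25 bp probes and lists the probes that fall inside one 500 bp fragment. The placeholder genome and the fragment position are invented for the example, and the probes are assumed to sit end to end with no gaps or overlaps, which a real array design may not do.

```python
def tile_genome(genome, probe_len=25):
    """Cut a sequence into consecutive, non-overlapping probes: (start, sequence)."""
    return [(start, genome[start:start + probe_len])
            for start in range(0, len(genome) - probe_len + 1, probe_len)]


def tiles_inside(tiles, frag_start, frag_len, probe_len=25):
    """Tiles that lie entirely within a fragment starting at frag_start."""
    frag_end = frag_start + frag_len
    return [(start, seq) for start, seq in tiles
            if frag_start <= start and start + probe_len <= frag_end]


genome = "ACGT" * 250                 # 1000 bp placeholder sequence
tiles = tile_genome(genome)           # 40 probes of 25 bp
covered = tiles_inside(tiles, frag_start=100, frag_len=500)
print(len(tiles), len(covered))       # 40 20
```

A single 500 bp fragment therefore matches about twenty different spots, which is why one bound fragment lights up a whole run of neighboring probes.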
The use of tiling microarrays is also very important for analyzing the 500 bp long fragments we obtained in our immunoprecipitation experiment. These larger fragments pose the same problem as described above. We know we have found a site that bound to our transcription factor of interest, but we think it is much smaller than 500 bp. How do we tell which part of our 500 bp sequence has hybridized with the 25 bp long sequences on the chip? How do we detect which short sequence within this larger piece is the promoter?
This is solved by using a bell curve correction, as described below. Remember that most of the genome is represented by 25 bp spots on the chip, so each 500 bp fragment in the pool will hybridize to the promoter site PLUS the sites next to the promoter that are also part of the fragment. How does this help? Now I am telling you multiple spots will light up, not just the right spots. Actually, this does help, because the right spot should light up the most. Why? See below.
Data Analysis: Bell Curve Correction
I will again describe the problem in case people are reading out of order. The first factor complicating data analysis is that each 500 bp fragment is longer than the promoter sequence it contains. It will bind to many of the little spots on the microarray, even though most of the spots are not the promoter. The promoter sequence is probably only 20 bp or so. How do we tell which specific piece is the promoter sequence? This is solved with the help of a computer algorithm, and is best understood by the following diagram:
The key point here is that you assume the sonication broke the genomic DNA in random places. For each 500 bp DNA fragment containing a real transcription factor binding site, there are about 480 extra base pairs in the fragment. In the most common situations, there are about 240 extra base pairs on each side of the binding site, or 200 on one side and 280 on the other. In the two extreme situations, the binding site is at the very end of the 500 bp fragment, so that all 480 extra base pairs are on one side or the other. In every case, the binding site itself MUST be present, because it was used to bind the transcription factor (TF) and precipitate the DNA fragment. So of all of the 500 bp, the piece that binds the chip the most will be the binding site.
The pieces 480 bp to the right or left of the binding site will hybridize to the chip much less often than the binding site itself, because they are only present in the rare cases when the binding site is all the way on one side or the other of the 500 bp fragment. The pieces close to the binding site will hybridize less than the binding site, but more than the ends. This creates a bell shaped curve, where the promoter (aka TF binding site) is in the very center.
So to find the real binding site you sequence the 500 bp fragments (find the DNA sequence on a sequencing machine). Then you find all the spots on the chip that would have hybridized to any part of the 500 bp fragment. Then you compare all these spots to see which one was bound the most (the most enriched piece, and the one with the strongest fluorescence). This spot represents the real transcription factor binding site. Hurray!
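The argument in this section can be checked with a small calculation. The sketch below is an idealized model, not the algorithm used by any real analysis package: it assumes every fragment is exactly 500 bp, that every break position that keeps the whole binding site inside the fragment is equally likely, and that a spot only counts as hybridized when its 25 bp probe lies entirely inside the fragment. The site position and genome length are arbitrary.

```python
def expected_tile_signal(site_start, site_len=20, frag_len=500,
                         probe_len=25, genome_len=2000):
    """Count, for each tile, how many possible fragments fully contain it.

    Every fragment must contain the whole binding site, and every allowed
    fragment start is treated as equally likely (random shearing).
    """
    n_tiles = genome_len // probe_len
    signal = [0] * n_tiles
    first_start = site_start + site_len - frag_len   # site at the right edge
    last_start = site_start                          # site at the left edge
    for frag_start in range(first_start, last_start + 1):
        frag_end = frag_start + frag_len
        for tile in range(n_tiles):
            tile_start = tile * probe_len
            if frag_start <= tile_start and tile_start + probe_len <= frag_end:
                signal[tile] += 1
    return signal


signal = expected_tile_signal(site_start=1000)
peak_tile = signal.index(max(signal))
print(peak_tile, peak_tile * 25)  # 40 1000
```

The tile that contains the binding site gets the highest count, and the counts fall off steadily on both sides until they reach zero about one fragment length away. With a fixed fragment length the profile is closer to a tent than a true bell; variation in fragment length would be expected to round it off.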
Finding Transcription Factor Binding Motifs
Actually, the fluorescence doesn't automatically find the transcription factor binding sites. Really, the enriched spots are just larger probes that probably contain smaller binding motifs of, say, 10 bp. So the DNA probes must be mined for the true binding sites. Certain binding sequences should be enriched in the probes, and these correspond to the sites to which a given protein can bind. Each motif is scored based on signal abundance, conservation, and how likely the sequence is to occur randomly based on nucleotide biases in the genomic background. Longer motifs are preferred, but sometimes it is difficult to find long conserved motifs.
(equation for calculating motif values- from class notes)
This scoring method is related to the TF motif finding methods seen here: TF finding page.
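As a rough illustration of the "how likely is this sequence by chance" part of the score, the sketch below compares how often a candidate motif appears in a set of bound sequences with how often it would be expected from background base frequencies alone. It is not the class-notes equation and it ignores signal abundance and conservation; the base frequencies, sequences, pseudocount and function names are all hypothetical.

```python
import math


def background_probability(motif, base_freqs):
    """Chance of seeing motif at one position if bases were drawn independently."""
    p = 1.0
    for base in motif:
        p *= base_freqs[base]
    return p


def count_occurrences(motif, sequences):
    """Number of positions, over all sequences, where motif matches exactly."""
    k = len(motif)
    return sum(seq[i:i + k] == motif
               for seq in sequences
               for i in range(len(seq) - k + 1))


def motif_log2_enrichment(motif, sequences, base_freqs, pseudocount=0.01):
    """log2 of observed over expected occurrences of motif in the sequences."""
    k = len(motif)
    positions = sum(max(len(seq) - k + 1, 0) for seq in sequences)
    expected = positions * background_probability(motif, base_freqs)
    observed = count_occurrences(motif, sequences)
    return math.log2((observed + pseudocount) / (expected + pseudocount))


# Assumed AT-rich background and made-up enriched sequences.
base_freqs = {"A": 0.31, "C": 0.19, "G": 0.19, "T": 0.31}
bound = ["TTACACGTGATA", "GGCACGTGTTAA", "ATCACGTGCCAT"]

for motif in ("CACGTG", "AAAAAA"):
    print(motif, round(motif_log2_enrichment(motif, bound, base_freqs), 2))
```

A motif that shows up in every bound sequence despite being unlikely under the background gets a high score, while a motif that is easy to produce by chance but never appears scores below zero.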
MAT
MAT stands for Model based Analysis of Tiling arrays. MAT offers a way to tell whether or not a specific ChIP-chip experiment has worked.
Sonication shears DNA at random locations, producing fragments that are randomly positioned about the transcription factor binding site. Also, fragments are substantially bigger than the binding site, meaning that many probes centered about the true binding site will be enriched. Furthermore, probes occasionally mismatch, binding incorrectly to other pieces of DNA. Most ChIP-chip experiments require at least three replicates, and many researchers have trouble getting experimental techniques to work.
MAT provides a means for ascertaining whether or not a given ChIP-chip experiment has produced meaningful results. The standard ChIP-chip experiment contains so much noise that a single microarray is insufficient to discern signals. MAT predicts the amount of noise that a given probe should generate. Probes are assigned probe values based on their GC content and the positional effects of nucleotides within the probe. Probes are then sorted by probe value into bins, so that all probes within a given bin should theoretically generate similar noise levels. Any probe that generates a high signal relative to the other probes in its bin is more likely to indicate real signal. The end result is a list of the true peaks for any given ChIP-chip experiment, and experimental noise should be drastically reduced.
(MAT probe value equation- from class notes)
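The binning idea can be illustrated with a much simpler stand-in for the probe value. The sketch below is not the MAT model or the class-notes equation: it assumes, purely for illustration, that the number of G and C bases in a probe is the probe value, groups probes by that number, and expresses each probe's signal as a z-score relative to the other probes in its bin. The probe sequences and intensities are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev


def gc_count(probe):
    """Stand-in 'probe value': number of G or C bases (an assumption)."""
    return sum(base in "GC" for base in probe)


def standardize_by_bin(probes):
    """probes: list of (sequence, log_intensity). Returns z-scores in the same order."""
    bins = defaultdict(list)
    for seq, value in probes:
        bins[gc_count(seq)].append(value)
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in bins.items()}
    scores = []
    for seq, value in probes:
        bin_mean, bin_sd = stats[gc_count(seq)]
        scores.append((value - bin_mean) / bin_sd if bin_sd > 0 else 0.0)
    return scores


# Made-up data: AT-rich probes are usually dim, GC-rich probes usually bright.
probes = [
    ("ATATATAT", 5.0), ("TATAATAT", 5.2), ("AATTATAA", 4.8), ("TTAATATA", 9.0),
    ("GCGCGCGC", 9.0), ("CGCGGCGC", 9.2), ("GGCCGCGC", 8.8), ("CCGGCGCG", 9.0),
]
scores = standardize_by_bin(probes)
print([round(s, 2) for s in scores])
```

Two probes here have the same raw intensity of 9.0, but only the AT-rich one stands out from its bin. That is the point of comparing each probe against probes expected to behave similarly instead of against the whole chip.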
ChIP-on-chip with Yeast
Researchers at MIT attempted to map the entire S. cerevisiae regulatory network. They took each transcription factor and added to it a small peptide called myc. They already had an antibody that recognizes myc, so they could do ChIP on each modified transcription factor. From all of their ChIP experiments they were able to find most yeast TF binding sites. They then constructed the full yeast regulatory network shown below.
S. cerevisiae regulatory network
One detailed network of genes
There is the possibility that adding the myc epitope could change protein structure in such a way that the transcription factor might be less functional. So the experiment is probably accurate for most transcription factors, but some connections in the network might need a little tweaking with transcription factor specific antibody driven ChIP-chip.
One of the main reasons that this regulatory mapping was feasible is that the S. cerevisiae genome is extremely compact. S. cerevisiae has very little intergenic DNA, even compared with other yeasts. Furthermore, most regulatory sequences fall very close to the genes they modify. These two factors allow almost all the potential regulatory regions to fit on a reasonable number of chips, making this ambitious project possible.
The mapping of the S. cerevisiae regulatory network has been crucial for the success of network biology. Many recent studies concerning the topology and stability of biological networks have relied heavily on the yeast protein network as produced via ChIP-chip. This field attempts to examine the form biological networks typically take, and to examine them for the influences of selection on network structure, either through enrichment of certain structural motifs, or through examining the impact these motifs have on robustness and molecular flux in organisms.
ChIP-on-chip in Humans
Affymetrix has made available genome tiling arrays that contain all non-repetitive DNA in the human genome. When combined with the immunoprecipitation technique, these chips can be used to find transcription factor binding sites in humans. Agilent Technologies has also started offering a set of gene chips containing potential human promoter sequences, with one 60mer per 100-300 bp.
ChIP-chip is much more difficult in humans than in yeast. Human regulatory regions can be extremely far from the genes they modify and are occasionally bi-directional, so simply assigning a given regulatory sequence to a downstream gene is not trivial. The distance factor also makes it difficult to include all potential regulatory sites in a single ChIP-chip experiment. Hence, it may be difficult to construct a gene regulatory map in humans like the one in yeast.
ChIP-chip has been used to identify over 3500 unique binding sites for the estrogen receptor (ER) in breast cancer cells. However, identifying the genes regulated by ER binding was very difficult. The ChIP-chip experiment was then combined with RNA microarray studies to properly identify the genes ER affects. Some of the genes were as far as 250,000 bp away from the binding site. ChIP-on-chip has also been used to find human promoter sequences under a variety of conditions, including regulatory regions for stem cells, and to find DNA-binding enzymes used in DNA repair.
Summary
ChIP-on-chip is a process commonly used to discover transcription factor binding sites and other important points of DNA-protein interaction. DNA bound to proteins within a cell is collected through immunoprecipitation. The DNA is then hybridized with a microarray to discover which sections of the genome the protein was bound to. Potential binding sites are then found through computational means similar to those used in TF binding site discovery, discussed in an earlier wiki page. The ChIP-chip process generates a substantial amount of noise, and the methods used are sometimes difficult to perfect. MAT offers a means of correcting for large amounts of noise, to find true peaks for an experiment. High-throughput ChIP-chip has been used to map the S. cerevisiae regulatory network and is currently being used to examine the human regulatory network.
Group members
- Rebekah Rogers
- Kerry Geiler
Sources
www.chiponchip.org
www.reactome.org
http://web.wi.mit.edu/young/regulator_network/
Class Notes
Top 20 Journal Articles
1. Alekseyenko, A.A., et al., High-resolution ChIP-chip analysis reveals that the Drosophila MSL complex selectively identifies active genes on the male X chromosome. Genes Dev, 2006. 20(7): p. 848-57.
2. Blanchette, M., et al., Genome-wide computational prediction of transcriptional regulatory modules reveals new insights into human gene expression. Genome Res, 2006. 16(5): p. 656-68.
3. Buck, M.J. and J.D. Lieb, ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics, 2004. 83(3): p. 349-60.
4. Buck, M.J., A.B. Nobel, and J.D. Lieb, ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol, 2005. 6(11): p. R97.
5. Bulyk, M.L., DNA microarray technologies for measuring protein-DNA interactions. Curr Opin Biotechnol, 2006. 17(4): p. 422-30.
6. Herring, C.D., et al., Immobilization of Escherichia coli RNA polymerase and location of binding sites by use of chromatin immunoprecipitation and microarrays. J Bacteriol, 2005. 187(17): p. 6166-74.
7. Horak, C.E. and M. Snyder, ChIP-chip: a genomic approach for identifying transcription factor binding sites. Methods Enzymol, 2002. 350: p. 469-83.
8. Ji, H., S.A. Vokes, and W.H. Wong, A comparative analysis of genome-wide chromatin immunoprecipitation data for mammalian transcription factors. Nucleic Acids Res, 2006. 34(21): p. e146.
9. Ji, H. and W.H. Wong, Computational biology: toward deciphering gene regulatory information in mammalian genomes. Biometrics, 2006. 62(3): p. 645-63.
10. Jin, V.X., et al., A computational genomics approach to identify cis-regulatory modules from chromatin immunoprecipitation microarray data--A case study using E2F1. Genome Res, 2006. 16(12): p. 1585-95.
11. Johnson, W.E., et al., Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci U S A, 2006. 103(33): p. 12457-62.
12. Li, W., C.A. Meyer, and X.S. Liu, A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences. Bioinformatics, 2005. 21 Suppl 1: p. i274-82.
13. Moses, A.M., et al., Large-scale turnover of functional transcription factor binding sites in Drosophila. PLoS Comput Biol, 2006. 2(10): p. e130.
14. O'Geen, H., et al., Comparison of sample preparation methods for ChIP-chip assays. Biotechniques, 2006. 41(5): p. 577-80.
15. Pyne, S., B. Futcher, and S. Skiena, Meta-analysis based on control of false discovery rate: combining yeast ChIP-chip datasets. Bioinformatics, 2006. 22(20): p. 2516-22.
16. Scacheri, P.C., G.E. Crawford, and S. Davis, Statistics for ChIP-chip and DNase hypersensitivity experiments on NimbleGen arrays. Methods Enzymol, 2006. 411: p. 270-82.
17. Smith, A.D., et al., Mining ChIP-chip data for transcription factor and cofactor binding sites. Bioinformatics, 2005. 21 Suppl 1: p. i403-12.
18. Tsai, H.K., et al., Method for identifying transcription factor binding sites in yeast. Bioinformatics, 2006. 22(14): p. 1675-81.
19. Wu, J., et al., ChIP-chip comes of age for genome-wide functional analysis. Cancer Res, 2006. 66(14): p. 6899-902.
20. Wu, W.S., W.H. Li, and B.S. Chen, Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle. BMC Bioinformatics, 2006. 7: p. 421.