Transcription Regulation and Transcription Factor Motif Finding

Saved by PBworks on December 18, 2006 at 10:52:08 pm
 


Contributed by Tim Schmidt

 


 

Introduction:

Shirley used this analogy, which I have embellished a little for clarity, to explain the necessity of gene regulation:

 


 

 

Imagine that you are a chef at a fabulous resort. You have a massive cookbook with every single recipe that you could ever need. You are in charge of cooking for all occasions, from the most everyday noontime meal to the most extravagant dinner party. The problem is that you need to know exactly which recipes to use, at which times, and in which quantities.

 

How do you manage? Well, you have learned to connect certain external cues with the appropriate context for each dish. For example, when you see the sun rise, wake up from sleeping, and see other people at the resort waking up, you know that it is time to make breakfast. So, when you experience those cues, you pull out and use the recipes for eggs, waffles, coffee, etc. For a fancy dinner party, you know that people with high expectations, dressed in fancy clothes, will be coming to eat in the evening. That triggers the connection in your mind with recipes like prime rib, red wine, Caesar salad, etc.

 

To make the analogy complete and useful, replace the chef and resort with the cell (the entity that both "makes the decisions" and requires regulation), and replace the giant cookbook with the cell's genome. A living cell has a constant stream of complex requirements in order to survive: it must "eat," it must make protein, it must grow, it must reproduce, and it must perform many, many other crucial tasks of life. The instructions for everything the cell needs reside in its genome, its giant "cookbook." The only problem is ensuring that the proper genes are expressed at the proper times and in the proper quantities. For example, the cell cycle requires a continuous parade of different genes to coax a cell through the various stages that lead to reproduction.

 

 

To accomplish this monumental organizational feat, the cell uses a plethora of transcription factors, proteins that specifically affect the transcription of DNA into mRNA. These proteins interact with genomic DNA either directly or indirectly, through contact with other transcription factors.

 

Transcription factors bind to specific transcription factor binding motifs, sequence patterns in DNA that play a major role in gene regulation. These motifs are generally found upstream of genes, and they bind proteins that either upregulate or downregulate the transcription of the nearby gene. The types and quantities of transcription factor binding motifs upstream of a gene determine that gene's regulatory profile. For example, in yeast, during stressful situations, the ribosomal protein genes are largely shut down in order to conserve resources (ribosomal protein gene transcription accounts for about 40% of all gene transcription under normal conditions). Consistent with this coordinated shutdown, almost all of the ribosomal protein genes contain an upstream transcription factor binding motif that is recognized, either directly or indirectly, by a transcription factor called Sfp1. This transcription factor is found in the nucleus only under normal conditions, and is shuttled out of the nucleus when times get tough. Thus, its shuttling putatively regulates the transcription of ribosomal protein genes. I happen to be currently researching what causes this transcription factor to shuttle.
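As a rough illustration of how the number and position of motif sites upstream of each gene could be tallied, here is a minimal Python sketch. The gene names, the upstream sequences, and the consensus WGATAR are all made up for illustration (this is not the motif recognized by Sfp1), and find_sites is a hypothetical helper, not part of any published tool. It converts an IUPAC consensus into a regular expression and reports matches on both strands.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(upstream, consensus):
    """Return sorted (position, strand) pairs for every consensus match.

    Positions are 0-based and always refer to the forward strand.
    """
    body = "".join(IUPAC[c] for c in consensus.upper())
    # A lookahead keeps the match zero-width, so overlapping sites are found.
    pattern = re.compile("(?=" + body + ")")
    upstream = upstream.upper()
    width = len(consensus)

    sites = [(m.start(), "+") for m in pattern.finditer(upstream)]
    # Scan the reverse complement, then map hits back to forward coordinates.
    rc = reverse_complement(upstream)
    sites += [(len(upstream) - m.start() - width, "-")
              for m in pattern.finditer(rc)]
    return sorted(sites)


# Hypothetical upstream regions (toy data, assumed for illustration only).
upstream_regions = {
    "geneA": "AAGATAAGTT",
    "geneB": "CCTTATCTGG",
    "geneC": "CCCCCCCCCC",
}

for gene, seq in upstream_regions.items():
    print(gene, find_sites(seq, "WGATAR"))
```

An exact consensus match is the crudest possible model of a binding site: real motifs are usually degenerate and are better described by position weight matrices. Note also that a palindromic consensus will be reported once on each strand at the same position.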

 

 

Eukaryotic Transcription Factors in Action

 

 

Transcription regulation, through the placement of transcription factor binding motifs, is a fascinating subject because it helps explain how life organizes the use of the vast quantity of genomic information. Without transcription regulation, there could be no life, because genes must be "turned on and off" according to the specific needs of a living cell. Discovering which genes are controlled by which transcription factors, and what motifs those factors recognize, is an active and illuminating area of research.

 

Main Description:

 

 

 

Explain the general methodologies used in this problem and the intuition behind the approaches. Explain the assumptions and limitations, and possibly the theoretical foundations, of the different approaches. Include figures (with proper acknowledgement) to illustrate and clarify the text description, but make sure each figure takes no more than 500 KB of space. You can show some interesting papers to see how the technique was used to help reach the biological findings. This part should be about 3,000 words long.

Gene Clustering by Co-Expression

 

Computational Motif Finding

 

Challenges

 

Scan for Transcription Factor Motif Sites

 

Sequence Logo

 

Moby Dick

 

Regular Expression Enumeration

 

Consensus

 

Expectation Maximization

 

Gibbs Sampling Model

 

Scoring Motifs

 

 

Conclusions:

Briefly summarize the main description text. Point out where this field is going, e.g., are there new directions of research or new findings in this area, have the techniques used here been applied to other computational biology problems, etc.

 

References:

 

Recommended Reading

Title | Author | Publisher | Purpose
Molecular Biology of the Gene | Watson, et al. | Pearson | Good introduction to molecular biology, including the ins and outs of basic transcription factor-mediated regulation
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids | Durbin, et al. | Cambridge | Introduction to the theory behind multiple sequence alignment methods
Essential Bioinformatics | Jin Xiong | Cambridge | Excellent source with a chapter titled “Protein Motifs and Domain Prediction”
Bioinformatics: New Research | Ed. Peter V. Yan | Nova | Collection of mini-textbooks on various topics in bioinformatics, including one called “Computational Approaches for Deciphering the Transcriptional Regulatory Network by Promoter Analysis”, by Ping Qiu
BIO 280 Notes | Liu, X
Wikipedia Entry for “Transcription”
Wikipedia Entry for “Transcription Factor”
Wikipedia Entry for “Sequence Logo”
Wikipedia Entry for “Regular Expression”
Wikipedia Entry for “Consensus Sequence”

 

 

Transcription Factor-Related Sites and Web Applications

Weblogo

Promoter Sequence Retrieval: Regulatory Sequence Analysis (RSA)

TRANSFAC: A public database for transcription factor motifs (free registration required)

JASPAR: The high-quality transcription factor binding profile database

MEME: Multiple Expectation-Maximization for Motif Elicitation

BioProspector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes (X. Liu)

PROSITE: Database of protein domains, families and functional sites

Improbizer

Moby Dick

MDscan

Weeder

Gibbs Motif Sampler

AlignACE

CONSENSUS

 

Primary Literature

 

1. Stormo, Gary D. (2000). DNA binding sites: representation and discovery. Bioinformatics 16(1), 16–23.

2. Harmen J. Bussemaker, Hao Li, and Eric D. Siggia, Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc Natl Acad Sci U S A. 2000 Aug 29;97(18):10096-100.

3. Waleev T, Shtokalo D, Konovalova T, Voss N, Cheremushkin E, Stegmaier P, Kel-Margoulis O, Wingender E, Kel A, Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W541-5.

4. Bailey TL, Williams N, Misleh C, Li WW., MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W369-73.

5. Conlon EM, Liu XS, Lieb JD, Liu JS. Integrating regulatory motif discovery and genome-wide expression analysis. Proc Natl Acad Sci U S A. 2003 Mar 18;100(6):3339-44. Epub 2003 Mar 7.

6. Liu X, Brutlag DL, Liu JS. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput. 2001;:127-38.

7. Pabo, C. Transcription factors: structural families and principles of DNA recognition. Annu. Rev. Biochem. 1992. 61:1053-95.

8. Timothy L. Bailey and Charles Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers", Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.

9. Liu XS, Brutlag DL, Liu JS. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol. 2002 Aug;20(8):835-9. Epub 2002 Jul 8.

10. Thompson W, Rouchka EC, Lawrence CE., Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res. 2003 Jul 1;31(13):3580-5.

11. van Helden, J., André, B. & Collado-Vides, J. (1998). Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5), 827-42.

12. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B., JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D91-4.

13. Charles E. Lawrence, Andrew A. Reilly, An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences, Proteins: Structure, Function, and Genetics, Volume 7, Issue 1, Pages 41-51.

14. Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E., MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 2003 Jul 1;31(13):3576-9.

15. Sinha S, Tompa M., YMF: A program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 2003 Jul 1;31(13):3586-8.

16. Che D, Jensen S, Cai L, Liu JS., BEST: binding-site estimation suite of tools. Bioinformatics. 2005 Jun 15;21(12):2909-11. Epub 2005 Apr 6.

17. Favorov AV, Gelfand MS, Gerasimova AV, Ravcheev DA, Mironov AA, Makeev VJ., A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics. 2005 May 15;21(10):2240-5. Epub 2005 Feb 22.

18. Berezikov E, Guryev V, Cuppen E., CONREAL web server: identification and visualization of conserved transcription factor binding sites. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W447-50.

19. Liu Y, Wei L, Batzoglou S, Brutlag DL, Liu JS, Liu XS., A suite of web-based programs to search for transcriptional regulatory motifs. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W204-7.

20. Grau J, Ben-Gal I, Posch S, Grosse I., VOMBAT: prediction of transcription factor binding sites using variable order Bayesian trees. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W529-33.

21. Fu Y, Frith MC, Haverty PM, Weng Z., MotifViz: an analysis and visualization tool for motif discovery. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W420-3.

22. Sosinsky A, Bonin CP, Mann RS, Honig B., Target Explorer: An automated tool for the identification of new target genes for a specified set of transcription factors. Nucleic Acids Res. 2003 Jul 1;31(13):3589-92.

23. Ponomarenko JV, Ponomarenko MP, Frolov AS, Vorobyev DG, Overton GC, Kolchanov NA., Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics. 1999 Jul-Aug;15(7-8):654-68.

24. Akiyama Y, Hosoya T, Poole AM, Hotta Y., The gcm-motif: a novel DNA-binding motif conserved in Drosophila and mammals. Proc Natl Acad Sci U S A. 1996 Dec 10;93(25):14912-6.

 

 

 

  • Web pointers to other online lecture notes (<= 3) or textbooks in this area.
  • Pointers to online databases or web server applications (<=5) related to this area.
  • Pointer to BIO280 class video on this topic.
  • References (<=20 best papers in this field). For each reference, include a URL link to PubMed abstract, and preferably an additional link to the freely available .pdf found through Google Scholar.
