Transcription Regulation and Transcription Factor Motif Finding

Page history last edited by PBworks 14 years, 9 months ago


Contributed by Tim Schmidt

 


 

Introduction:

Professor Shirley Liu used the following analogy, which I have embellished a little for clarity, to explain the necessity of gene regulation:

 

Imagine that you are a chef. You have a massive cookbook with every single recipe that you could ever need. You are in charge of cooking for all occasions, from the most everyday noontime meal to the most extravagant dinner party, for a fabulous resort. Now, the problem arises that you need to know exactly which recipes to use, at which times, and in which quantities.

 

How do you manage? Well, there are certain connections in your memory that you have learned to make between external cues and your knowledge about the appropriate context for each dish. For example, when you see the sun rise, wake up from sleeping, see other people at the resort waking up, etc., you know that it is time to make breakfast. So, when you experience those cues, you pull out and use the recipes for eggs, waffles, coffee, etc. For a fancy dinner party, you know that people with high expectations, dressed in fancy clothes, in the evening, will be coming to eat. Thus, that triggers the connection in your mind with recipes like prime rib, red wine, Caesar salad, etc.

 

To make the analogy complete and useful, replace the chef and resort with the cell (the entity that both "makes the decisions" and requires regulation), and replace the giant cookbook with the cell's genome. A living cell has a constant stream of complex requirements in order to survive: it must "eat," it must make protein, it must grow, it must reproduce, and it must perform many, many other crucial tasks of life. The instructions for everything the cell needs reside in its genome, its giant "cookbook." The only problem is making sure that the proper genes are expressed at the proper times and in the proper quantities. For example, the cell cycle requires a continuous parade of different genes to coax a cell through the various stages that lead to reproduction.

 

 

To accomplish this monumental organizational feat, the cell uses a plethora of transcription factors, proteins that specifically affect the transcription of DNA into mRNA. These proteins interact with genomic DNA either directly or indirectly, through contact with other transcription factors.

 

Transcription factors bind to specific transcription factor binding motifs, sequence patterns in DNA that play a major role in gene regulation. These motifs are generally found upstream of genes, and they bind proteins that either upregulate or downregulate the transcription of the nearby gene. The types and quantities of transcription factor binding motifs upstream of a gene determine that gene’s regulatory profile. For example, in yeast, during stressful situations, the ribosomal protein genes are largely shut down in order to conserve resources (ribosomal protein gene transcription accounts for about 40% of all gene transcription under normal conditions). Accordingly, almost all of the ribosomal protein genes contain an upstream transcription factor binding motif that is recognized, either directly or indirectly, by a transcription factor called Sfp1. This transcription factor is found in the nucleus only under normal conditions, and is shuttled out of the nucleus when times get tough. Thus, its shuttling putatively regulates the transcription of ribosomal protein genes. I happen to be currently researching what causes this transcription factor to shuttle.

 

 

Eukaryotic Transcription Factors in Action

 

Transcriptional regulation is very complex, and many researchers devote all their efforts to unraveling that complexity, studying what they call “transcriptional networks.” The idea behind transcriptional networks is that transcription factors do not act in a vacuum. Multiple transcription factors can affect the expression of one gene, and can often affect each other’s expression in complex loops. All in all, these relationships form a network that ultimately describes how a cell is wired to use its genomic information in a functional way.

 

For example, take this figure from Manoli’s thesis at MIT:

 

[Figure: http://compbio.pbwiki.com/f/combinatorial%20regulation%20%28manoli%20at%20MIT%29.gif]

 

In this case, Ste12 and Tec1 each have independent functions. When Ste12 is present, mating genes are turned on, and when Tec1 is present, budding genes are turned on. But when both are present together, they each bind to their respective motifs on promoters that facilitate filamentation, interacting with each other and with other transcriptional machinery to express filamentation genes.

 

Transcription regulation, through the placement of transcription factor binding motifs, is a fascinating area of research because it helps explain how life organizes the use of the vast quantity of genomic information. Without transcription regulation, there could be no life because genes must be "turned on and off" according to specific needs of a living cell. Discovering which genes are controlled by which transcription factors, and what motifs those factors recognize, is an active and illuminating area of research.

 

Transcription Factor Motif Finding:

 

So you may be wondering how computational methods are applied to research in transcription regulation. Well, the first difficult task in understanding transcription regulation is finding the motifs that direct specific transcription factors to the promoters of specific genes. Finding those common motifs among large amounts of sequence data is where the computational methods become crucial.

 

Gene Clustering by Co-Expression

 

Imagine that you have just run multiple microarray experiments (see the microarray chapter of this wiki textbook) on yeast in different conditions. After analyzing the resultant data, you find that there is a set of genes that seem to be upregulated and downregulated together. For example, this set of genes could be stress response genes that respond to conditions of starvation, heat, or osmotic stress. You hypothesize that there is a transcription factor that regulates all these genes in common. That particular transcription factor would need to integrate many different types of stress signals in order to communicate the information to the nucleus so that the cell could respond correctly to the changing environmental conditions (homeostasis is one of the most important roles of transcription regulation).

 

Now, an indirect, but intuitive, way of testing your hypothesis is to group all the genes that you found were co-regulated under stress, look at their promoter regions, and try to find some pattern in the sequences of those promoter regions that is present in most, if not all, of them.

 

The assumption here is that you have successfully done the microarray experiment and that clustering has given you enough genes, which you are confident are co-regulated, for a computer to find a motif. The more sequences you have, the clearer it will be to a motif-finding algorithm that a motif is present. If you provide a motif algorithm with only one promoter region to search, it won’t be able to find a shared motif. Intuitively, then, the more promoter regions you have to search, the better, and about 20-800 is best.

 

Keep in mind that transcription initiation is a complicated process that involves far more than just one transcription factor. There is the polymerase of course, but in order to load the polymerase and start transcription, other factors, some of which interact with other DNA motifs and some of which interact with your transcription factor, must play a role. Below is a picture from the Pingry SMART Team of a sigma factor-DNA promoter complex. The sigma factor is the protein component that determines where transcription begins in prokaryotes. It is the simplest transcription factor, and there are many variations of it that come into play under different conditions.

 

 

The model illustrates where transcription starts in prokaryotes. In prokaryotes, the relevant transcription factors bind nearby, around 200 base pairs upstream. In eukaryotes, however, the picture is more complex, and transcription factor motifs can be found much further upstream (more than 1,000 base pairs). Either way, once the promoter region sequences have been culled by using the microarray data, the next step is to use computational methods to find common motifs.

 

Computational Motif Finding

 

By determining the motifs upstream of co-regulated genes, you can backtrack and use databases of known transcription factors, along with the motifs they recognize, to discover the transcription factors that act at the promoters of your genes of interest. Finding out the relevant transcription factors allows you to connect the regulation of your genes to previously explored regulatory mechanisms of known transcription factors. For example, you might discover that your culled promoter regions have a motif that is recognized by Sfp1, a yeast stress response transcription factor. This would support the observation that the discovered motif is found at the promoter regions of genes that are co-regulated in response to stress.

 

Challenges

 

First of all, computational methods must be able to find a motif that is not only present in a majority of the inputted sequences, but is present more often than it would be by chance. In other words, the abundance of a candidate motif must be statistically significant. For example, a short motif such as ATTA is very likely to show up in most, if not all, of the inputted promoter sequences. It is too short to be differentiated from random, non-coding nucleotide noise. Fortunately, transcription factors bind motifs that are longer than four bases, for the same reason that we can only search for longer motifs: transcription factors must be reasonably specific for particular promoters and thus must be able to recognize longer sequences that won’t show up in the genome very often by chance. Otherwise, the transcription factors would be “distracted” by much of the genome that does not need any regulation (most of it).

 

Another challenge is that the motif could appear in either strand of a promoter sequence. In other words, the reverse complements of the culled promoter sequences must be scanned as well.
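As a small illustration of the point above, here is a minimal Python sketch of how the reverse strand can be added to the search. The function names are my own and are not taken from any particular motif-finding tool.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def both_strands(promoters):
    """Yield every promoter followed by its reverse complement."""
    for seq in promoters:
        yield seq.upper()
        yield reverse_complement(seq)

print(reverse_complement("ATTGC"))  # GCAAT
```

Feeding the output of `both_strands` to a scanner doubles the search space, which is one reason the significance of a candidate motif has to be judged against the background.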

 

Base substitutions pose a creative challenge to the computational biologist. Transcription factors generally recognize a broad array of similar motifs. This makes sense since the motifs are generally long enough to allow for base substitutions while maintaining affinity for the transcription factor. Like the challenge of finding a significantly abundant motif, this challenge also has biological significance. Transcription factors will bind to motifs with differing affinities based on those motifs’ deviation from an ideal. This ideal is called the *consensus sequence*. However, most of the sequences that you want to find aren’t the consensus sequence. Thus, the computational strategy must involve some way to “score” a potential sequence in terms of its similarity to a consensus. However, the consensus is determined at the end. How is this possible? Read on!

 

Another challenge is that a motif may be present once, not at all, or in multiple copies in a promoter sequence. Some genes that engage in heavy transcriptional activity under certain conditions will have multiple copies of the motif of a particular transcription factor in their promoters in order to recruit multiple copies of the transcription factor. Also, important genes may have multiple copies of a motif that attracts a relatively low-concentration transcription factor in order to ensure its presence at the proper time (more sites means a higher chance of binding). This happens more often in higher organisms, like humans.

 

Finally, some transcription factors contact DNA at two sites that are separated by some intervening sequence. Thus, only the areas of direct contact are strongly conserved (according to the consensus). Sometimes a transcription factor will dimerize at a promoter, and thus two palindromic sequences will be seen separated from each other by a nonconserved sequence. The palindromic nature of the binding sites is due to the opposite orientation of the transcription factors when they bind.

 

 

In the above image, called a sequence logo (see later section), you can see the two-block nature of the binding of a transcription factor called CRP. The size of the letters indicates their relative importance in the consensus sequence. You can see that between the two blocks of conservation there is a stretch that is not conserved. This property must also be dealt with by computational methods.

 

Scan for Transcription Factor Motif Sites

 

This experimental procedure assumes that your culled promoter sequences contain motif(s) that have already been identified. In a way this is cheating. You simply need to check if your promoter sequences have any of a set of known motifs. However, it isn’t that simple. Remember that motifs are variable, and only tend to cluster around a consensus.

 

There are sites on the internet that contain known transcription factor motifs, such as TRANSFAC and JASPAR (see References). In order to search for motifs that match parts of your sequences, the motifs have to be defined in a probabilistic manner. The figure from Professor Shirley Liu’s lecture illustrates this concept well:

 

 

On the left is a collection of sequences that are going to be checked against a motif in the database. The database motif has been broken down into a *position weight matrix*, which defines a probability of each base occurring at each position in a motif.

 

The ratio of the probability of generating a particular segment of a promoter region from the position weight matrix divided by the probability of getting the segment from the genomic background is called the segment score.
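To make the segment score concrete, here is a minimal Python sketch of that ratio. The three-position matrix and the uniform background are made-up numbers for illustration, and I assume that positions are independent of each other, which is the usual simplification for a position weight matrix.

```python
import math

# Hypothetical 3-position weight matrix: one dict of base probabilities per position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
# Assumed genomic background: all four bases equally likely.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def segment_score(segment, pwm=PWM, background=BACKGROUND):
    """P(segment | motif) / P(segment | background), positions treated independently."""
    if len(segment) != len(pwm):
        raise ValueError("segment length must equal motif width")
    ratio = 1.0
    for column, base in zip(pwm, segment):
        ratio *= column[base] / background[base]
    return ratio

def log_segment_score(segment, pwm=PWM, background=BACKGROUND):
    """log2 of the ratio; positive values mean the segment looks more like the motif."""
    return math.log2(segment_score(segment, pwm, background))

print(segment_score("ATG"))  # well above 1: motif-like
print(segment_score("CCC"))  # below 1: background-like
```

In practice the logarithm of the ratio is often used so that the per-position contributions add up instead of multiplying.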

 

Note that in normal position weight matrices, usually no position will have a probability of zero. This is because that condition is too harsh, and will rule out ANY sequence that has the zero-probability base at that site. It is conceivable that the sample sequences used to make the position weight matrix were simply not diverse enough to have a given nucleotide at one position, but that in reality that position can contain that nucleotide.
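One common way to avoid zero probabilities is to add a small pseudocount to every base before normalizing. The sketch below assumes a pseudocount of 1 per base and uses four made-up binding sites; the databases may use different corrections.

```python
def build_pwm(sites, pseudocount=1.0):
    """Estimate base probabilities per position from aligned, equal-length sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        # Start every base at the pseudocount so that no probability ends up at zero.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({base: counts[base] / total for base in "ACGT"})
    return pwm

sites = ["TATA", "TATA", "TACA", "TATT"]  # made-up example sites
pwm = build_pwm(sites)
print(pwm[0])  # T dominates, but A, C and G keep a small nonzero probability
```

With only four sites, a T was seen at the first position every time, yet the matrix still leaves room for the other three bases.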

 

JASPAR allows the user to define a cutoff for the segment score. Defining cutoffs is a tricky business, and there is no “right” answer. You don’t want to exclude variations on the consensus that are biologically significant, but you also don’t want to find false motifs in your promoter regions, which would be very confusing. One way of checking a hit is to ask whether the known function of the motif fits the genes in which you found it.
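The effect of the cutoff can be seen in a short Python sketch. This is not JASPAR's own code or scoring scale: the matrix, the sequence, and the two cutoff values are assumptions chosen for illustration, and the score is the log2 likelihood ratio against a uniform background.

```python
import math

def scan(sequence, pwm, background, cutoff):
    """Return (position, segment, log2 score) for every window scoring at or above cutoff."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        segment = sequence[start:start + width]
        score = sum(math.log2(col[b] / background[b]) for col, b in zip(pwm, segment))
        if score >= cutoff:
            hits.append((start, segment, score))
    return hits

def column(favored):
    """Hypothetical matrix column: 0.85 for the favored base, 0.05 for each other base."""
    return {b: (0.85 if b == favored else 0.05) for b in "ACGT"}

pwm = [column(b) for b in "TATA"]
background = {b: 0.25 for b in "ACGT"}
promoter = "GGTATAGGTTTAGG"

strict = scan(promoter, pwm, background, cutoff=5.0)   # only the exact TATA
lenient = scan(promoter, pwm, background, cutoff=2.0)  # also the one-mismatch TTTA
print([hit[:2] for hit in strict])
print([hit[:2] for hit in lenient])
```

The strict cutoff keeps only the perfect match, while the lenient one also reports a site with one substitution. Whether that second site is a real binding site or a false hit is exactly the judgment call described above.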

 

Scanning for known transcription factor motifs has its limitations. Only so many motifs are known, and your promoter sequences may contain an undocumented motif. The motifs in the databases are generalized from a limited number of samples. Thus, each “documented motif” may not be representative of the true binding preference of the transcription factor. Motifs in the databases are also sometimes poorly described. Their lengths can be off, for example. Finally, many motifs look similar to each other, so a single scan can return hits for multiple similar motifs.

 

Sequence Logo

 

 

I have shown an example of a sequence logo already, in my example of a two-block motif. Above is an image from Tom Schneider of the Center for Cancer Research Nanobiology Program. It shows 12 sequences that have been aligned and used to generate a sequence logo. A popular sequence logo generator is called WebLogo (see References for web application), published by Crooks et al. in 2004.

 

There are three things to watch out for in a sequence logo:

1. The height of the stack of letters represents the overall conservation at that location.

2. The height of the letters within the stack represents the predominance of those letters in the observed sequences.

3. Special, upside-down letters indicate that the letter is significantly NOT found at that site.

 

A sequence logo is a good way to visualize the consensus of a motif and handy for publishing papers and trying to understand a transcription factor’s binding character.
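The first two points can be sketched numerically. In the usual formulation, the height of a DNA stack is its information content in bits (2 minus the Shannon entropy of the column), and each letter gets a share of that height in proportion to its frequency. The Python sketch below assumes this formulation and leaves out the small-sample correction that logo generators typically apply.

```python
import math

def column_information(column):
    """Information content (bits) of one DNA alignment column: 2 - Shannon entropy."""
    counts = {base: column.count(base) for base in "ACGT"}
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n:
            p = n / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy

def logo_heights(sites):
    """For each position, map base -> letter height (frequency x column information)."""
    heights = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        info = column_information(column)
        heights.append({b: info * column.count(b) / len(column)
                        for b in "ACGT" if b in column})
    return heights

print(column_information(list("AAAA")))  # 2.0 bits: perfectly conserved
print(column_information(list("ACGT")))  # 0.0 bits: no conservation
```

A perfectly conserved position reaches the maximum of 2 bits, and a position where all four bases are equally common has a stack height of zero.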

 

De novo Sequence Motif Finding

 

This is a different goal from the “scan for known transcription factor motifs” strategy. Here, there is no assumption that an enriched motif in a set of promoter sequences has ever been documented before. The only characteristic that will define a motif is its enrichment. It must be enriched above some established genomic background, which can be represented either by a type of position weight matrix for single nucleotides, or by a Markov background, in which nucleotide probability depends on the preceding nucleotide (as in a Hidden Markov Model).
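The two kinds of background can be estimated with a few lines of Python. This is only a sketch: the function names are mine, the input is assumed to contain only A, C, G and T, and a real background would be estimated from the whole genome (or all of its promoter regions) instead of from a handful of sequences.

```python
from collections import Counter

def zero_order_background(sequences):
    """Single-nucleotide frequencies over all the sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}

def first_order_background(sequences, pseudocount=1.0):
    """P(next base | previous base), estimated from adjacent base pairs."""
    pair_counts = {prev: {nxt: pseudocount for nxt in "ACGT"} for prev in "ACGT"}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[prev][nxt] += 1
    return {prev: {nxt: n / sum(row.values()) for nxt, n in row.items()}
            for prev, row in pair_counts.items()}

markov = first_order_background(["ACACAC"])
print(markov["A"]["C"])  # C follows A far more often than the other bases do
```

The first-order table captures simple dependencies, such as a dinucleotide being unusually common, that single-nucleotide frequencies cannot express.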

 

Regular Expression

 

Regular Expression Enumeration

 

The computational advantage here comes in the enumeration. The computer performs oligonucleotide analysis: it checks over every w-mer (where w is some whole number that matches the desired motif length) in all of the input data (the promoter sequences) to find significantly enriched sequences. A score is generated for each w-mer, similarly to that produced during the scan of known transcription factor motifs. The equation from Professor Liu’s lecture is below:

 

 

The expected occurrence is calculated based on the genomic background probabilities of specific letters and/or Markov Background. Also, the larger the data size, the more likely any given sequence will appear, so enrichment requirements are more strict for larger data sets. Ultimately, each possible w-mer has a given expected frequency.

 

To see if an enumerated w-mer is significant, its frequency is compared against the expected frequency in the shown ratio. Some cutoff is defined to find over-represented motifs, and those are the potential binding sites.
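The enumeration can be sketched in Python as follows. I am assuming the simplest possible score, a plain observed/expected ratio under an independent single-nucleotide background with nonzero base probabilities. The equation in the lecture may differ in its details, and the two input sequences are made up.

```python
from collections import Counter

def enumerate_wmers(sequences, w):
    """Count every w-mer (overlapping windows) in all input sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - w + 1):
            counts[seq[i:i + w]] += 1
    return counts

def expected_count(wmer, background, n_windows):
    """Expected occurrences if every base were drawn independently from the background."""
    p = 1.0
    for base in wmer:
        p *= background[base]
    return p * n_windows

def enrichment(sequences, w, background):
    """Map each observed w-mer to its observed/expected ratio."""
    counts = enumerate_wmers(sequences, w)
    n_windows = sum(counts.values())
    return {wmer: obs / expected_count(wmer, background, n_windows)
            for wmer, obs in counts.items()}

background = {b: 0.25 for b in "ACGT"}
sequences = ["ACGTACGT", "TTACGTAA"]  # made-up promoters
ratios = enrichment(sequences, 4, background)
best = max(ratios, key=ratios.get)
print(best, ratios[best])
```

Note how the expected count grows with the number of windows, which is why a larger data set needs a stricter enrichment requirement.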

 

Regular expression enumeration is not as slow as it might seem because it involves 2-bit encoding, which allows fast index access. It is exhaustive and guarantees a global optimum while also finding multiple motifs. Its problems, however, include excluding certain motifs with certain base substitutions, providing a long list of possible motifs as output, and only being able to deal with a limited motif width.
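The 2-bit encoding can be sketched as follows: each base takes two bits, so a w-mer becomes an integer that indexes directly into a table of counts. The particular assignment A=0, C=1, G=2, T=3 is an assumption; any fixed assignment works.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def encode(wmer):
    """Pack a w-mer into an integer, two bits per base."""
    index = 0
    for base in wmer:
        index = (index << 2) | CODE[base]
    return index

def decode(index, w):
    """Recover the w-mer from its integer index."""
    bases = []
    for _ in range(w):
        bases.append("ACGT"[index & 3])
        index >>= 2
    return "".join(reversed(bases))

def count_wmers(seq, w):
    """Count all w-mers in a flat table of 4**w counters using a rolling index."""
    counts = [0] * (4 ** w)
    mask = (1 << (2 * w)) - 1  # keeps only the last w bases
    index = 0
    for i, base in enumerate(seq):
        index = ((index << 2) | CODE[base]) & mask
        if i >= w - 1:
            counts[index] += 1
    return counts

print(encode("ACGT"))  # 27, i.e. binary 00 01 10 11
```

The table has 4**w entries, so it grows very quickly with the motif width, which is one way to see why this approach can only handle a limited width.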

 

 

Moby Dick

 

The most well-known example of a regular expression strategy applied to de novo motif finding is Moby Dick.

 

The Moby Dick algorithm was first impressively demonstrated on the famous novel itself, where it predicted English words from all the letters of the book strung together. The same idea is applied to promoter sequences to find transcription factor “words”, or motifs. Moby Dick is another form of regular expression enumeration, but builds long motifs from short ones rather than finding a fixed length consensus from the beginning. The following is a brief description of the algorithm.

 

First, a “dictionary” of single letters is made: a frequency is determined for each of the four bases from the given input sequences. Then, the input sequences are scanned for those one-letter “words” to see if any pairs of words (at the beginning, pairs of letters) are over-represented. For example, the probability of finding an A at random may be 0.3. But in the input sequences there is a motif with lots of A’s strung together, such that finding 4 A’s in a row occurs more often than the 0.0081 frequency expected (0.3 to the fourth power, assuming independence).
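The over-representation test at the heart of this step can be sketched in Python. This covers only the comparison of observed and expected frequencies for concatenated words. The published algorithm fits the dictionary statistically, which this sketch does not attempt, and the `min_ratio` cutoff is my own placeholder.

```python
from itertools import product

def expected_frequency(words, dictionary):
    """Expected frequency of the words strung together, assuming independence."""
    p = 1.0
    for word in words:
        p *= dictionary[word]
    return p

def observed_frequency(word, sequences):
    """Fraction of windows (of the word's length) that equal the word."""
    hits = windows = 0
    for seq in sequences:
        for i in range(max(len(seq) - len(word) + 1, 0)):
            windows += 1
            hits += seq[i:i + len(word)] == word
    return hits / windows if windows else 0.0

def overrepresented_pairs(dictionary, sequences, min_ratio=2.0):
    """Find pairs of dictionary words that occur together more than expected."""
    found = {}
    for w1, w2 in product(dictionary, repeat=2):
        expected = expected_frequency([w1, w2], dictionary)
        observed = observed_frequency(w1 + w2, sequences)
        if expected > 0 and observed / expected >= min_ratio:
            found[w1 + w2] = observed
    return found

# The arithmetic from the text: four A's in a row at P(A) = 0.3.
print(expected_frequency(["A"] * 4, {"A": 0.3}))  # about 0.0081
print(overrepresented_pairs({"A": 0.5, "C": 0.5}, ["ACACACAC"]))
```

In the toy sequence, the pair "AC" clears the cutoff and would be added to the dictionary as a new, longer word for the next round.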

 

Next, a new dictionary of over-represented word PAIRS is created and scanned back through the input data to see if any word pairs go together more than expected.

 

In this way, longer and longer words can be formed, while some shorter words no longer associate with other short words more than expected. Finally, the longer words discovered should be motifs that are enriched in the input sequences and involve smaller subunits that are associated more than expected.

 

Matrix Update Methods

 

Consensus

 

The best way to explain the consensus algorithm is by walking through it. The G. Stormo paper (see References) can explain in more technical detail.

 

You give the algorithm a series of promoter regions. It takes two of those sequences (at random) and compares all the w-mers in the two sequences. From that comparison, the algorithm develops a list of good motif candidates, since those candidates matched in the two sequences, and bad motif candidates because there were no similar sequences in common between the two sequences. The following is an outline overview of the consensus procedure from Professor Liu’s lecture:

 

 

As you might be able to guess from the figure, this process of matching candidate motifs against subsequent sequence members of the input data continues. In each round, a new sequence is scanned to see if it contains any w-mers that match or closely resemble the “good motifs” developed from previous comparisons. This repeats until all the sequences have been scanned, and the most popular motifs have been found.

 

What is interesting, and a bit of a drawback, of Consensus motif finding is that the first few sequences used have a large effect on the outcome. For example, if the first two sequences used don’t have the motif for whatever reason, the motif itself will be identified as a “bad motif” and will be ignored in later sequence scans. To get around this, versions of the consensus algorithm repeat the process and compare the answers from each iteration to make sure nothing was missed.
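The greedy flavor of this procedure, and its dependence on the first sequences, can be seen in a simplified Python sketch. It is not the CONSENSUS program itself: I start from the first two sequences instead of a random pair, I score an alignment by counting the most common base in each column instead of using information content, and the `keep` parameter is my own placeholder for the number of candidates carried forward.

```python
def wmers(seq, w):
    """All overlapping w-mers of a sequence."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def score(alignment):
    """Sum over columns of the most common base's count (higher = more conserved)."""
    return sum(max(col.count(b) for b in "ACGT") for col in zip(*alignment))

def greedy_consensus(sequences, w, keep=10):
    """Grow candidate alignments one sequence at a time, keeping only the best few."""
    # Seed: every pair of w-mers from the first two sequences.
    candidates = [[a, b] for a in wmers(sequences[0], w) for b in wmers(sequences[1], w)]
    candidates.sort(key=score, reverse=True)
    candidates = candidates[:keep]  # "good" candidates survive, the rest are dropped
    for seq in sequences[2:]:
        # Extend each surviving candidate with every w-mer of the next sequence.
        extended = [alignment + [m] for alignment in candidates for m in wmers(seq, w)]
        extended.sort(key=score, reverse=True)
        candidates = extended[:keep]
    return candidates[0]

sequences = ["GGTATAAAGG", "CCCTATAAAC", "ATATAAATTT"]  # made-up promoters
print(greedy_consensus(sequences, 6))
```

Because candidates dropped in an early round never come back, a motif that is missing from the first two sequences cannot be recovered later, which is the drawback described above.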

 

 

Expectation Maximization and Gibbs Sampling Modeling

 

Expectation Maximization AND Gibbs Sampling take these inputs from Professor Liu’s BIO280 Lecture:

 

 

These are the most technically complicated of the de novo motif finding algorithms, and if you are interested in more details, I suggest Wikipedia.

 

Expectation maximization is centered around the idea of alternating between an expectation step and a maximization step. The expectation step calculates the expected frequency, based on the same likelihood ratio seen in previous algorithms: P(given sequence based on input) / P(given sequence based on genomic background). The expectation step is iterated over every position in a sequence. The maximization step follows, as all the possible motifs are lined up and weighted according to the expectation score previously calculated. To get the initial score matrices, the algorithm enumerates every word in the input sequences.
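Here is a minimal Python sketch of that alternation. It assumes exactly one motif occurrence per sequence and a uniform background, and the starting weight of 0.5 and the pseudocount of 0.1 are arbitrary choices of mine. Real implementations such as MEME are considerably more elaborate.

```python
def pwm_from_word(word, weight=0.5):
    """Turn one w-mer into a starting matrix: `weight` on its base, the rest shared."""
    rest = (1.0 - weight) / 3
    return [{b: (weight if b == base else rest) for b in "ACGT"} for base in word]

def em_step(sequences, pwm, background, pseudocount=0.1):
    """One expectation step followed by one maximization step."""
    w = len(pwm)
    counts = [{b: pseudocount for b in "ACGT"} for _ in range(w)]
    for seq in sequences:
        # Expectation: likelihood ratio of every start position in this sequence.
        ratios = []
        for i in range(len(seq) - w + 1):
            ratio = 1.0
            for column, base in zip(pwm, seq[i:i + w]):
                ratio *= column[base] / background[base]
            ratios.append(ratio)
        total = sum(ratios)
        # Each w-mer contributes to the new counts in proportion to its ratio.
        for i, ratio in enumerate(ratios):
            for j, base in enumerate(seq[i:i + w]):
                counts[j][base] += ratio / total
    # Maximization: renormalize the weighted counts into a new matrix.
    return [{b: col[b] / sum(col.values()) for b in "ACGT"} for col in counts]

def consensus(pwm):
    """Most probable base at each position."""
    return "".join(max(col, key=col.get) for col in pwm)

background = {b: 0.25 for b in "ACGT"}
sequences = ["GGTATAAAGG", "CCCTATAAAC", "ATATAAATTT"]  # made-up promoters
pwm = pwm_from_word("TATAAA")  # one of the enumerated starting words
for _ in range(10):
    pwm = em_step(sequences, pwm, background)
print(consensus(pwm))
```

In a full run, every enumerated word would be tried as a starting point, and the matrix with the best final likelihood would be reported.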

 

Both the C. Lawrence and TL Bailey papers are linked in the References section of this chapter for your Expectation Maximization perusal.

 

The Gibbs Sampler has a much more intuitive feel, despite similarly complex statistics. The following slide is from The Bioinformatics Center at Rensselaer and Wadsworth:

 

 

First, each sequence has a motif chosen at random from it as an initial “guess”. Next, a scoring matrix is made based on those randomly chosen sequence segments. Then, one sequence is chosen at random, and each of its w-mers is compared against the initialized scoring matrix. The highest scoring w-mer is then thrown back into the pile of “motifs” and a new scoring matrix is calculated based on the new set of motifs (new because one is different). Then, another sequence is taken out and scanned for its highest-scoring w-mer, that w-mer is taken as that sequence’s new representative motif, and it is thrown back in and a new scoring matrix is calculated. Thus, the iterating occurs, and, interestingly, quickly converges to a motif solution.

 

The intuition of the Gibbs Sampling method is that the given promoter sequences DO have an enriched motif. Once that motif is randomly selected, it quickly entrenches itself and multiplies among the motifs used in the scoring matrices because each sequence has that motif. Soon, two sequences have matching words, then three, and it quickly converges.
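The loop described above can be sketched in Python. Two things here are assumptions on my part: the scoring matrix is rebuilt from all sequences except the one being rescanned, and the new w-mer is drawn at random in proportion to its score, which is how I understand the sampler of Lawrence et al. Always taking the top-scoring w-mer, as in the description above, is the greedy variant of the same idea. The uniform background, the pseudocount, and the fixed number of iterations are placeholders.

```python
import random

def gibbs_sample(sequences, w, iterations=200, pseudocount=1.0, seed=0):
    """Return one motif start position per sequence after Gibbs-style updating."""
    rng = random.Random(seed)
    # Initial guess: a random start position in every sequence.
    positions = [rng.randrange(len(s) - w + 1) for s in sequences]
    for _ in range(iterations):
        held_out = rng.randrange(len(sequences))
        # Scoring matrix from the current motif instances of all other sequences.
        counts = [{b: pseudocount for b in "ACGT"} for _ in range(w)]
        for k, (seq, pos) in enumerate(zip(sequences, positions)):
            if k != held_out:
                for j, base in enumerate(seq[pos:pos + w]):
                    counts[j][base] += 1
        pwm = [{b: col[b] / sum(col.values()) for b in "ACGT"} for col in counts]
        # Score every w-mer of the held-out sequence (uniform background assumed).
        seq = sequences[held_out]
        weights = []
        for i in range(len(seq) - w + 1):
            score = 1.0
            for col, base in zip(pwm, seq[i:i + w]):
                score *= col[base] / 0.25
            weights.append(score)
        # Draw the new position in proportion to its score.
        positions[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions

sequences = ["GGTATAAAGG", "CCCTATAAAC", "ATATAAATTT"]  # made-up promoters
positions = gibbs_sample(sequences, 6)
print([seq[p:p + 6] for seq, p in zip(sequences, positions)])
```

Because the procedure is random, different seeds can give different answers, so in practice the sampler is run several times and the best-scoring result is kept.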

 

 

Example of microarray analysis, hierarchical clustering, Gibbs sampling, and hidden Markov modeling from the Literature

 

In the paper, “Whole-genome expression profiling defines the HrpL regulon of Pseudomonas syringae pv. tomato DC3000, allows de novo reconstruction of the Hrp cis element, and identifies novel coregulated genes”, by Ferreira, et al., the authors use an array of bioinformatic techniques to characterize the regulation of the hypersensitive response and pathogenicity (Hrp) type III secretion system in a plant model bacterial pathogen. The relevant part of the paper is that, after performing a microarray experiment to find differentially regulated genes between Hrp deficient and wild type strains of bacteria, the authors used Gibbs sampling to find the motif recognized by the HrpL sigma factor. They then went on to develop a hidden Markov model based on their findings that could predict Hrp promoters in all strains of the bacteria.

 

 

Conclusions:

 

Transcriptional regulation is as important in the productive utilization of a genome as a knowledgeable chef is for the productive utilization of a cookbook. Thus, research into how transcription is regulated, and, more specifically, where certain transcription factors bind and their effects on transcription, is critical to an understanding of life itself.

 

The two broad methods of finding motifs are:

1. Scanning for known transcription factor motifs among a set of promoter sequences

2. Finding motifs among those sequences de novo through either regular expression enumeration (as in oligonucleotide analysis or Moby Dick) or position weight matrix update (as in Consensus, Expectation Maximization, and Gibbs sampling).

 

 

The above image is of the p53 regulatory network, taken from the BioBase website that supplies TRANSFAC. This regulatory network is only a small part of the gene regulation that goes on in eukaryotes, yet it is crucial in its role in DNA damage repair and cancer prevention. Understanding how such a regulatory network functions requires a quantitative understanding of transcriptional regulatory relationships. The field of transcription factor motif finding will be critical in its complex applications to higher eukaryotes.

 

 

 

Recommended Reading

Title | Author | Publisher | Purpose
Molecular Biology of the Gene | Watson, et al. | Pearson | Good introduction to molecular biology, including the ins and outs of basic transcription factor-mediated regulation
Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids | Durbin, et al. | Cambridge | Introduction to the theory behind multiple sequence alignment methods
Essential Bioinformatics | Jin Xiong | Cambridge | Excellent source with a chapter titled, “Protein Motifs and Domain Prediction”
Bioinformatics: New Research | Ed. Peter V. Yan | Nova | Collection of mini-textbooks of various topics in bioinformatics, including one called “Computational Approaches for Deciphering the Transcriptional Regulatory Network by Promoter Analysis”, by Ping Qiu
BIO 280 Notes | Liu, X
Wikipedia Entry for “Transcription”
Wikipedia Entry for “Transcription Factor”
Wikipedia Entry for “Sequence Motif”
Wikipedia Entry for “Sequence Logo”
Wikipedia Entry for “Regular Expression”
Wikipedia Entry for “Consensus Sequence”
Wikipedia Entry for “Expectation Maximization Algorithm”
Wikipedia Entry for “Gibbs Sampling” (http://en.wikipedia.org/wiki/Gibbs_sampling)
BIO280 Video | Not Available | Check the website

 

 

Transcription Factor-Related Sites and Web Applications

Weblogo

Promoter Sequence Retrieval: Regulatory Sequence Analysis (RSA)

TRANSFAC: A public database for transcription factor motifs (free registration required)

JASPAR: The high-quality transcription factor binding profile database

MEME: Multiple Expectation-Maximization for Motif Elicitation

BioProspector: Discovering Conserved DNA Motifs in Upstream Regulatory Regions of Co-Expressed Genes (X. Liu)

PROSITE: Database of protein domains, families and functional sites

Improbizer

Moby Dick

MD Scan

Weeder

Gibbs Motif Sampler

Align ACE

CONSENSUS

 

 

Papers in the Field (Including new algorithms and applications)

 

1.Stormo, Gary D. (2000). DNA binding sites: representation and discovery. Bioinformatics 16 16–23.

2.Harmen J. Bussemaker, Hao Li, and Eric D. Siggia, Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc Natl Acad Sci U S A. 2000 Aug 29;97(18):10096-100.

3. Waleev T, Shtokalo D, Konovalova T, Voss N, Cheremushkin E, Stegmaier P, Kel-Margoulis O, Wingender E, Kel A, Composite Module Analyst: identification of transcription factor binding site combinations using genetic algorithm. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W541-5.

4.Bailey TL, Williams N, Misleh C, Li WW., MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W369-73.

5.Conlon EM, Liu XS, Lieb JD, Liu JS. Proc Natl Acad Sci U S A. 2003 Mar 18;100(6):3339-44. Epub 2003 Mar 7.

6.Liu X, Brutlag DL, Liu JS. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac Symp Biocomput. 2001;:127-38.

7.Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC., Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993 Oct 8;262(5131):208-14.

8.Timothy L. Bailey and Charles Elkan, “Fitting a mixture model by expectation maximization to discover motifs in biopolymers”, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994

9.Liu XS, Brutlag DL, Liu JS. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation microarray experiments. Nat Biotechnol. 2002 Aug;20(8):835-9. Epub 2002 Jul 8.

10.Thompson W, Rouchka EC, Lawrence CE., Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res. 2003 Jul 1;31(13):3580-5.

11.van Helden, J., André, B. & Collado-Vides, J. (1998). Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J Mol Biol 281(5), 827-42.

12.Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B., JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D91-4.

13.Charles E. Lawrence, Andrew A. Reilly, An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences, Proteins: Structure, Function, and Genetics, Volume 7, Issue 1  , Pages 41 - 51

14.Thompson W, Rouchka EC, Lawrence CE., Gibbs Recursive Sampler: finding transcription factor binding sites. Nucleic Acids Res. 2003 Jul 1;31(13):3580-5.

15.Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E., MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res. 2003 Jul 1;31(13):3576-9.

16.Sinha S, Tompa M., YMF: A program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 2003 Jul 1;31(13):3586-8.

17.Che D, Jensen S, Cai L, Liu JS., BEST: binding-site estimation suite of tools. Bioinformatics. 2005 Jun 15;21(12):2909-11. Epub 2005 Apr 6.

18.Favorov AV, Gelfand MS, Gerasimova AV, Ravcheev DA, Mironov AA, Makeev VJ., A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics. 2005 May 15;21(10):2240-5. Epub 2005 Feb 22.

19.Berezikov E, Guryev V, Cuppen E., CONREAL web server: identification and visualization of conserved transcription factor binding sites. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W447-50.

20.Liu Y, Wei L, Batzoglou S, Brutlag DL, Liu JS, Liu XS., A suite of web-based programs to search for transcriptional regulatory motifs. Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W204-7.
