
RNA Folding

Page history last edited by PBworks 19 years, 1 month ago


 


 

 

by Rebekah Rogers

 

Introduction

In all of biology, structure correlates with function. The shape of a butterfly's mouth enables it to pull nectar from a flower, the form of an enzyme dictates which molecules it may act upon, and the shape of a joint limits the range in which our bones can move. Similarly, the structure of an RNA molecule can help us predict both its function and its stability within the cell, and therefore its effect on cellular processes.

 

Potential Folding Patterns

 

RNA is a single-stranded molecule. Unlike in the DNA double helix, its bases are not automatically paired with a complementary strand of nucleic acid. Often, complementary portions of the RNA pair with one another, making the molecule more stable. The unpaired RNA between paired portions forms different kinds of loops. Once the secondary patterns of base pairing are established, the RNA molecule can twist into a unique three-dimensional structure that renders the molecule more stable. Proteins within the cell may bind to the RNA molecule, altering its structure and functional properties.

 

Various RNA secondary structure patterns

a. hairpin loop

b. internal loop

c. bulge loop

d. multibranched loop

e. stem

f. pseudoknot

(http://ludwig-sun2.unil.ch/~bsondere/nussinov/)

 

In the hairpin, RNA folds back on itself, and any unpaired RNA forms a loop. The paired portion is called the stem. An internal loop is a section in the middle of a longer stretch of paired RNA where both sides are unpaired. A bulge loop occurs where one side pairs continuously but the other side has a few extra unpaired bases that bulge out to one side. A multibranched loop is a ring of unpaired RNA that connects three or more stems. A pseudoknot is a more complicated structure in which part of a hairpin loop reaches over and pairs with a completely different section of the RNA.
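One common way to write such secondary structures as text is dot-bracket notation, in which matched parentheses mark paired bases and dots mark unpaired ones. This page does not otherwise use that notation, so it is introduced here only as an illustrative assumption. A minimal Python sketch (the function names are hypothetical) that recovers the stems and hairpin loops from such a string:

```python
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) base pairs (0-based) from a dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


def hairpin_loops(structure):
    """Return the base pairs that close a hairpin loop.

    A pair closes a hairpin when everything between its two bases is unpaired.
    """
    return [(i, j) for i, j in parse_dot_bracket(structure)
            if set(structure[i + 1:j]) <= {"."}]


if __name__ == "__main__":
    example = "((..)).((...))"   # two stems, each closed by a hairpin loop
    print(parse_dot_bracket(example))
    print(hairpin_loops(example))
```

With a single bracket type, this notation can only describe nested pairings, so a pseudoknot cannot be written this way. This mirrors the reason pseudoknots are hard for many prediction algorithms.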

 

 

Types of RNAs

 

  • mRNA
  • tRNA
  • rRNA
  • snRNA and other catalytic RNA

 

 

Messenger RNA, or mRNA, molecules serve as intermediates in protein biosynthesis, ferrying information from the genomic code to the ribosomes where translation occurs. For these molecules, the folding patterns and base pairings primarily affect molecular stability and susceptibility to degradation by cellular enzymes, such as RNases. Some RNase enzymes, such as E. coli's RNase III, cleave RNA molecules only at stem-loop structures. Most others, such as RNase E, degrade only single-stranded RNA at specific sites. These single-strand cleavers can be blocked by placing the recognition sequence within a paired stem-loop structure where the enzyme cannot access it.

 

Some RNase enzymes and related degradation enzymes bind to one end of the RNA before cleaving an internal site. These enzymes can be blocked further by placing a hairpin at either the 5' or 3' end of the molecule. The secondary structure of RNA therefore has a strong role in stability, which in turn plays a strong role in the regulation of mRNA levels. Careful control of mRNA levels is important for precise control of protein synthesis.

 

Transcription of many bacterial genes is regulated by attenuation, in which a hairpin loop forms in the mRNA as it is being transcribed. This hairpin causes the RNA polymerase to dissociate from the DNA template, ending transcription prematurely. Here, mRNA structure directly regulates mRNA production.

 

The tRNA molecules within the cell all share a characteristic secondary structure with three hairpin loops, arranged like a cloverleaf.

(Molecular Biology of the Cell, 3rd Ed. Part II, Molecular Genetics, Chapter 6. Basic Genetic Mechanisms, RNA and Protein Synthesis)

 

These hairpin loops interact with one another to produce a three-dimensional structure.

 

The middle clover loop houses the anticodon, which "reads" the mRNA, while the 3' end of the tRNA provides a site to which a specific amino acid may attach. The structure of the tRNA must allow for proper availability of the anticodon, a proper shape to fit within the ribosome during translation, and the correct chemical pattern for the amino acid to bind. Some tRNAs must be processed to remove introns before they become mature and able to carry amino acids. Changes in tRNA structure could prevent binding to amino acids, entry into the ribosome, or correct reading of the mRNA template. Even slight changes to tRNA structure can have drastic impacts on translational accuracy and efficiency, potentially affecting every protein made in the cell.

 

 

Ribosomes are organelles within the cell which translate mRNAs into proteins. Ribosomal RNA, or rRNA, forms the backbone of each ribosomal subunit. These rRNAs are first transcribed as one long molecule and then cleaved to form smaller units, such as the 28S and 18S rRNAs. Each of these is packaged with proteins to form the large and small ribosomal subunits. Proper conformation of the rRNA is necessary to give each ribosome its proper shape. Ribosomal RNA is often the target of antibacterial drugs, as the prokaryotic and eukaryotic rRNAs are usually sufficiently different to allow specific targeting of only the prokaryotic molecule.

 

(http://fig.cox.miami.edu/~cmallery/255/255hist/mcb4.24.rRNA.jpg)

 

Some RNA molecules can also be catalytic. In eukaryotes, small nuclear ribonucleoproteins (snRNPs) remove the introns from newly transcribed mRNA molecules. These snRNPs are complexes of small nuclear RNA (snRNA) molecules and proteins. Any mistake in snRNA function can have cascading effects, as snRNAs are needed to process intron-containing mRNAs (and therefore to produce their proteins) throughout the cell.

 

(BIOLOGY, Campbell and Reece, Ch 17, Figure 17.3, p. 303)

 

Predicting RNA structure

One main caveat applies to all RNA structure prediction algorithms. Many molecules are modified before they mature, so the nucleic acid sequence of the final product must be entered. Additionally, the folding of the RNA prior to modification may in part be preserved, influencing the final folding pattern of the mature RNA. For these molecules, the prediction programs may not correctly identify the in vivo structure of the RNA molecule at maturity.

 

Covariance

The Covariance Model of RNA structure prediction assumes that orthologous RNA from multiple species should have similar structures and functions. For example, a tRNA molecule from S. cerevisiae should have a structure that is very similar to the tRNA from humans. RNA sequences are aligned, and each pair of columns in the alignment is scored for covarying substitutions. Each pair of columns is given a "Mutual Information" score between zero and two bits (inclusive).
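The mutual information score for a pair of alignment columns can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular covariance program: the function name and the toy alignment are made up, and the sketch assumes an ungapped alignment of equal-length sequences.

```python
import math
from collections import Counter


def mutual_information(alignment, i, j):
    """Mutual information, in bits, between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences. Gaps are not treated
    specially here, which is a simplifying assumption.
    """
    n = len(alignment)
    freq_i = Counter(seq[i] for seq in alignment)              # counts in column i
    freq_j = Counter(seq[j] for seq in alignment)              # counts in column j
    joint = Counter((seq[i], seq[j]) for seq in alignment)     # pair counts
    score = 0.0
    for (x, y), count in joint.items():
        f_xy = count / n
        expected = (freq_i[x] / n) * (freq_j[y] / n)           # if columns were independent
        score += f_xy * math.log2(f_xy / expected)
    return score


if __name__ == "__main__":
    # Toy alignment: columns 0 and 3 always form a complementary pair,
    # columns 1 and 2 never vary.
    toy = ["GAAC", "CAAG", "AAAU", "UAAA"]
    print(mutual_information(toy, 0, 3))
    print(mutual_information(toy, 1, 2))
```

Two columns that vary together perfectly over all four bases reach the maximum of two bits, while a fully conserved pair of columns scores zero. This is why the method needs sequences that are similar enough to align yet different enough to show substitutions.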

 

The covariance method will not be useful for all RNA molecules, as it requires sequences similar enough to be aligned, yet different enough to reveal covarying substitutions. The covariance approach generally requires a large number of RNA sequences to generate meaningful results.

 

This rationale seems reasonable for molecules where structure dictates almost all functionality, such as rRNA and tRNA. However, I personally would imagine that application to orthologous mRNA across species will probably not generate exceptionally useful results. The mRNAs simply do not experience the same evolutionary constraints on structure across species. For such molecules, a model which recognizes base-pairing rules is likely to be more useful.

 

Dot Plot

The Dot Plot method simply seeks to find all possible base pairings for a given RNA molecule. The bases are listed in one direction across the columns of a matrix and in the reverse direction along its rows. Complementary pairings are marked with dots, and runs of dots are extended along diagonals as far as possible. This readout gives only the set of all possible base pairings, with no regard for maximum base pairing or overall molecular stability, and is therefore a very simplistic model.
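A dot plot of this kind is easy to build directly. The sketch below is one possible layout rather than the output format of any specific program: it lists the sequence in the same order along both axes, so a stem appears as a run of dots along an anti-diagonal. The names and the optional G-U wobble flag are assumptions added for illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}   # optional, off by default


def dot_plot(seq, allow_wobble=False):
    """Return an n x n matrix of booleans; entry [i][j] is True when
    base i can pair with base j."""
    allowed = set(WATSON_CRICK)
    if allow_wobble:
        allowed |= WOBBLE
    n = len(seq)
    return [[(seq[i], seq[j]) in allowed for j in range(n)] for i in range(n)]


def render(seq, matrix):
    """Format the matrix as text, with '*' for a possible pair."""
    lines = ["  " + " ".join(seq)]
    for base, row in zip(seq, matrix):
        lines.append(base + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    sequence = "GGGAAACCC"
    print(render(sequence, dot_plot(sequence)))
```

The plot marks every complementary pair, including many that could never form at the same time, which is exactly the limitation described above.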

 

 

Base Pair Maximization

The Nussinov Algorithm attempts to maximize the number of bases within a molecule that are paired within the structure. The algorithm is recursive, and adds bases either as paired or unpaired to the previous best structure. The reasoning is reminiscent of the Smith-Waterman method for sequence alignment. The algorithm allows the i and j bases to be paired with one another, to have one paired within a loop and one unpaired, or to pair in two separate loops within the same molecule. The last step, which allows bifurcations, greatly increases the computational time of the algorithm.

 

 

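A matrix $\gamma$ is filled for the entire sequence of length $L$, where $\gamma(i, j)$ holds the maximum number of base pairs that can form within the subsequence from base $i$ to base $j$, according to the following rules:

 

Initialization

$\gamma(i, i-1) = 0$ for $i = 2$ to $L$

$\gamma(i, i) = 0$ for $i = 1$ to $L$

Recursion:

$$\gamma(i, j) = \max \begin{cases} \gamma(i+1, j) \\ \gamma(i, j-1) \\ \gamma(i+1, j-1) + \delta(i, j) \\ \max_{i < k < j} \left[ \gamma(i, k) + \gamma(k+1, j) \right] \end{cases}$$

where $\delta(i, j) = 1$ if bases $i$ and $j$ are complementary and $0$ otherwise.

(Formatted equation taken from http://ludwig-sun2.unil.ch/~bsondere/nussinov/)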

 

Note that since the front half of the RNA simply pairs with the back half of the RNA, we only need to fill in half of the matrix. The best set of pairings is traced back from the upper right-hand corner of the matrix. The Nussinov Algorithm is a very simplistic model of RNA folding that only maximizes base pairings. It does not take into account the stability effects that different bulge lengths may have, nor does it consider any tradeoffs or interactions between different structures. However, for the time at which it was introduced, the algorithm was extremely advanced, and it provided an excellent basis for future algorithm development.
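The fill and traceback steps can be sketched in Python as follows. This is a minimal sketch of the textbook base pair maximization recursion, not code taken from the cited website. It uses 0-based indices, counts only Watson-Crick pairs, and imposes no minimum hairpin loop length, all of which are simplifying assumptions. The matrix name `gamma` and the function names are chosen here for illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def nussinov_fill(seq):
    """Fill the matrix: gamma[i][j] is the maximum number of base pairs
    in seq[i..j]. Entries with j <= i stay 0 (the initialization)."""
    n = len(seq)
    gamma = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # work outward from short subsequences
        for i in range(n - span):
            j = i + span
            delta = 1 if (seq[i], seq[j]) in PAIRS else 0
            best = max(gamma[i + 1][j],              # i unpaired
                       gamma[i][j - 1],              # j unpaired
                       gamma[i + 1][j - 1] + delta)  # i pairs with j
            for k in range(i + 1, j):                # bifurcation into two substructures
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best
    return gamma


def nussinov_traceback(seq, gamma):
    """Recover one optimal set of base pairs, starting from the upper right corner."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if gamma[i + 1][j] == gamma[i][j]:
            stack.append((i + 1, j))
        elif gamma[i][j - 1] == gamma[i][j]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and gamma[i + 1][j - 1] + 1 == gamma[i][j]:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if gamma[i][k] + gamma[k + 1][j] == gamma[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return sorted(pairs)


if __name__ == "__main__":
    sequence = "GGGAAACCC"
    matrix = nussinov_fill(sequence)
    print(matrix[0][len(sequence) - 1])
    print(nussinov_traceback(sequence, matrix))
```

The inner loop over `k` is the bifurcation step. It is what raises the running time from quadratic to cubic in the sequence length.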

 

Free Energy

 

Most current programs for RNA structure prediction focus on minimizing free energy. These calculations use a nearest-neighbor model, in which the energy contribution of each base pair is assumed to depend only on the base pairs immediately adjacent to it. The free energy contributions of all elements of the structure are then summed. Scoring includes the effects of neighboring base pairs, penalties for loop initiation, and rewards for proper base pairing. Terminal base pairings are not considered to be stable, and internal loops and bulges also destabilize the structure.
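The additive structure of such a score can be shown with a small Python sketch. The numbers below are not measured thermodynamic parameters. They come from a toy rule invented for this example (pairs weighted by hydrogen bond count, plus an arbitrary loop penalty), and real programs instead load experimentally derived nearest-neighbor tables. All names are hypothetical.

```python
# Toy weights, NOT thermodynamic data: G-C counts as 3, A-U as 2.
HYDROGEN_BONDS = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2}

# Hypothetical cost of closing the loop at the end of the helix, arbitrary units.
TOY_LOOP_PENALTY = 4.0


def toy_stack_energy(outer, inner):
    """Score for one base pair stacked on its neighbor (toy rule, arbitrary units)."""
    return -(HYDROGEN_BONDS[outer] + HYDROGEN_BONDS[inner]) / 2.0


def helix_free_energy(pairs, stack_energy=toy_stack_energy,
                      loop_penalty=TOY_LOOP_PENALTY):
    """Sum nearest-neighbor contributions for one helix closed by a loop.

    `pairs` lists the base pairs of the stem from the outside in.
    Each adjacent pair of base pairs contributes one stacking term.
    """
    total = loop_penalty
    for outer, inner in zip(pairs, pairs[1:]):
        total += stack_energy(outer, inner)
    return total


if __name__ == "__main__":
    stem = [("G", "C"), ("G", "C"), ("A", "U")]
    print(helix_free_energy(stem))
    print(helix_free_energy(stem[:1]))   # a lone pair gains no stacking term
```

Under this toy rule, a single isolated pair pays the loop penalty without any stacking reward, so its score is positive. This illustrates why short or terminal pairings contribute little stability in additive models of this kind.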

 

Neighboring base pair effects

 

The structures of these RNA molecules are never absolute as molecules always move about the equilibrium, and there may be more than one stable structure which occurs often within the cell. Computational programs do not calculate every possible RNA structure and free energy, as the computational time required would be too great. Rather, they employ dynamic programming to produce a few structures which should be most stable.

 

Free Energy methods have a few main weaknesses. These methods all assume that the most stable structure is the one which occurs within the cell. However, the cell may favor some slightly less stable structures which are more or less resistant to degradation by endonucleases. Also, for catalytic or ribosomal RNAs, as with enzymes, the most stable structure may not be the most functional.

 

Likewise, protein interactions are generally ignored in these algorithms, so any RNA molecule which ultimately binds to proteins may not be predicted accurately. However, the accuracy of some predictions, especially those for rRNA, goes up dramatically when the function of cellular folding enzymes is included in the analysis. Finally, the majority of these programs do not consider extremely complex RNA foldings such as pseudoknots, so any molecule which in fact forms this pattern will not be predicted accurately.

 

Several programs are available, either through a web-based interface or as a download.

 

 

 

RNA Prediction Programs

 

MFold

 

The MFold software is available both as a web-based interface and as downloadable software for Unix and Linux. The program predicts the single minimum energy folding, as well as several alternative sub-optimal foldings. The folding temperature for all predictions is set at 37 degrees Celsius and cannot be changed. Only sequences of 800 bp or fewer can be folded on the MFold server. Output includes free energy dot plots and diagrams of predicted structures.

 

Vienna

 

 

Vienna is a web-based RNA structure prediction program which is also available for download and execution on Unix or Linux based machines. Vienna offers a few advantages over the MFold program. First, in addition to free energy calculations, it will calculate a partition function to find the probability of each base pair in a submitted sequence. This partition function can be turned on or off as desired. The program allows greater flexibility in that dangling end energies can be turned off, so that molecules can end with paired bases, and the temperature for energy calculations can be changed. Single-stranded DNA sequences can be submitted with altered energy rules to reflect differences in chemical structure.
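The partition function and the base pair probabilities derived from it can be written out explicitly. The notation below is the standard statistical mechanics form and is an assumption added here, not notation taken from the program's documentation: $S$ ranges over all secondary structures of the sequence, $\Delta G(S)$ is the free energy of structure $S$, $R$ is the gas constant, and $T$ is the absolute temperature.

```latex
\[
  Z = \sum_{S} e^{-\Delta G(S)/RT},
  \qquad
  P(S) = \frac{e^{-\Delta G(S)/RT}}{Z},
  \qquad
  p_{ij} = \sum_{S \ni (i,j)} P(S)
\]
```

Here $p_{ij}$ is the probability that bases $i$ and $j$ are paired, summed over every structure that contains that pair. Because $T$ appears in each exponent, changing the folding temperature changes both the favored structure and the pair probabilities.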

 

The program then reports the most likely structure and a dot plot of the base pairing probabilities. In these plots, the lower left triangle shows the base pairs of the predicted minimum free energy structure, and the upper right triangle shows dots whose size reflects the probability of each base pair.

 

(From BIOINFORMATICS, Ch 6, Predictive Methods Using RNA Sequences, p. 154)

 

 

RNAstructure

RNAstructure is a Windows-based RNA structure prediction program, which uses thermodynamic parameters to find the optimal secondary structure and, optionally, suboptimal structures. Output is displayed as a secondary structure diagram, showing all paired bases, loops, and the free energy of the structure. The program is useful for PCR primer design, as it contains an OligoWalk feature which predicts the binding efficiency of a single-stranded nucleic acid to a given target.

 

SFold

 

SFold is an RNA prediction program that first runs RNA free energy predictions to find the most stable structure, and then uses a partition function with a stochastic sampling procedure to predict the probability that any base will be paired or single-stranded, regardless of what other part of the molecule it pairs with. It also displays a dot plot, similar to Vienna's, that plots potential base pairings with dot size corresponding to the probability of that particular match.

 

(From BIOINFORMATICS, Ch 6, Predictive Methods Using RNA Sequences, p. 158)

 

Additional Prediction Programs

One alternative to the standard dynamic programming approach uses evolutionary programming to select the most stable structure prediction. The program STAR starts with a structure and makes slight changes which may make the structure better or worse. The better structures are kept and others discarded. This program does not guarantee the best prediction, as the program may never make it through a selection valley to get to a neighboring selection peak. However, it does offer the advantage of including pseudoknots in a reasonable computational time, a feat which none of the previously mentioned programs can do.

 

RNA tertiary structure prediction is exceptionally difficult, in part because the number of precisely known tertiary structures occurring in nature is very limited. Current biochemical techniques have difficulty purifying tRNA for X-ray crystallography, although some high quality measurements of ribosomal structure have been made. One model so far focuses on covariance. If non-paired nucleotides covary (are evolutionarily constrained), then they may be instrumental in determining tertiary structure. However, the success of this approach so far is extremely limited.

 

The most promising RNA tertiary structure modeling involves a random walk. The program yaamp starts with a simplified version of the RNA secondary structure, in which each helix (a stem, or stretch of paired bases) is represented by a single symbol. From there, the computer randomly constructs a variety of structures, which are selected based on simulated annealing and energy minimization. Experimental data on folding constraints were also factored into model construction. All favored models are then combined to produce one final favored structure.
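The selection step in simulated annealing can be sketched generically in Python. This is not the implementation of the program described above: the function, its parameters, and the one-dimensional toy energy in the usage example are all assumptions made for illustration. In a real model, the state would be a candidate three-dimensional structure and the energy would come from a molecular force field.

```python
import math
import random


def simulated_annealing(initial, energy, neighbor, t_start=10.0, t_end=0.01,
                        cooling=0.95, steps_per_temp=50, rng=None):
    """Minimize `energy` by random moves, accepting uphill moves with a
    probability that shrinks as the temperature is lowered.

    `neighbor(state, rng)` must return a randomly perturbed copy of `state`.
    Returns the best state seen and its energy.
    """
    rng = rng or random.Random(0)       # fixed seed so runs are repeatable
    current, e_current = initial, energy(initial)
    best, e_best = current, e_current
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            e_candidate = energy(candidate)
            delta = e_candidate - e_current
            # Metropolis criterion: always accept improvements, and sometimes
            # accept a worse state so the search can cross energy barriers.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        temperature *= cooling
    return best, e_best


if __name__ == "__main__":
    # Toy problem: find the integer that minimizes (x - 3)^2.
    result = simulated_annealing(
        20,
        energy=lambda x: (x - 3) ** 2,
        neighbor=lambda x, rng: x + rng.choice([-1, 1]),
    )
    print(result)
```

Occasionally accepting a worse structure is what lets the search escape a local optimum. A strategy that keeps only improvements can remain stuck on one peak.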

 

Summary

 

RNA structure dictates RNA function and RNA stability. Many programs currently predict RNA secondary structures based on Free Energy constraints. Although these programs are the most consistent and widely used, they may not always accurately predict the true in vivo structure of an RNA. Additional algorithms must be employed to produce more complicated structures, such as pseudoknots. RNA tertiary structure prediction is currently extremely limited. Still, as molecular and computational biology progress, new algorithms may provide a reasonable means of accurately modeling RNA tertiary structure.

 

Recommended Reading

Title | Author | Publisher
BIOLOGY | Campbell, NA and Reece, JB | Benjamin Cummings
BIOINFORMATICS: A practical guide to the analysis of genes and proteins | Baxevanis, AD and Ouellette, BFF, Eds. | Wiley Press
BIO 280 Notes | Liu, X |
Nussinov Website | |

 

Primary Literature

1. Barreda, D.C.J., et al., RNA 3D structure prediction: (1) assessing RNA 3D structure similarity from 2D structure similarity. Genome Inform, 2004. 15(2): p. 112-20.

2. Bernhart, S.H., et al., Partition function and base pairing probabilities of RNA heterodimers. Algorithms Mol Biol, 2006. 1(1): p. 3.

3. Bindewald, E. and B.A. Shapiro, RNA secondary structure prediction from sequence alignments using a network of k-nearest neighbor classifiers. RNA, 2006. 12(3): p. 342-52.

4. Ding, Y., Statistical and Bayesian approaches to RNA secondary structure prediction. RNA, 2006. 12(3): p. 323-31.

5. Do, C.B., D.A. Woods, and S. Batzoglou, CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 2006. 22(14): p. e90-8.

6. Doshi, K.J., et al., Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics, 2004. 5: p. 105.

7. Dowell, R.D. and S.R. Eddy, Efficient pairwise RNA structure prediction and alignment using sequence alignment constraints. BMC Bioinformatics, 2006. 7: p. 400.

8. Favaretto, P., A. Bhutkar, and T.F. Smith, Constraining ribosomal RNA conformational space. Nucleic Acids Res, 2005. 33(16): p. 5106-11.

9. Hamada, M., et al., Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics, 2006. 22(20): p. 2480-7.

10. Kierzek, E., et al., Facilitating RNA structure prediction with microarrays. Biochemistry, 2006. 45(2): p. 581-93.

11. Kim, N., et al., Candidates for novel RNA topologies. J Mol Biol, 2004. 341(5): p. 1129-44.

12. Lemieux, S. and F. Major, Automated extraction and classification of RNA tertiary structure cyclic motifs. Nucleic Acids Res, 2006. 34(8): p. 2340-6.

13. Liu, H., et al., An RNA folding algorithm including pseudoknots based on dynamic weighted matching. Comput Biol Chem, 2006. 30(1): p. 72-6.

14. Lu, Z.J., D.H. Turner, and D.H. Mathews, A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation. Nucleic Acids Res, 2006. 34(17): p. 4912-24.

15. Malhotra, A. and S.C. Harvey, A quantitative model of the Escherichia coli 16 S RNA in the 30 S ribosomal subunit. J Mol Biol, 1994. 240(4): p. 308-40.

16. Mathews, D.H., Revolutions in RNA secondary structure prediction. J Mol Biol, 2006. 359(3): p. 526-32.

17. Mathews, D.H. and D.H. Turner, Prediction of RNA secondary structure by free energy minimization. Curr Opin Struct Biol, 2006. 16(3): p. 270-8.

18. Rodland, E.A., Pseudoknots in RNA secondary structures: representation, enumeration, and prevalence. J Comput Biol, 2006. 13(6): p. 1197-213.

19. Voss, B., Structural analysis of aligned RNAs. Nucleic Acids Res, 2006. 34(19): p. 5471-81.

20. Voss, B., R. Giegerich, and M. Rehmsmeier, Complete probabilistic analysis of RNA shapes. BMC Biol, 2006. 4: p. 5.
