SNPs and Association Studies

Page history last edited by PBworks 14 years, 8 months ago

By Rosa Ng (mng@fas.)

 


What Are Single Nucleotide Polymorphisms?

 

A single nucleotide polymorphism, or SNP (pronounced "snip"), is a change in a single nucleotide in the genome that causes variation in DNA sequence between members of the same species. For the variation to be considered a “polymorphism,” it has to be “common” in the population; otherwise, it is just a rare mutation. For example, a single-nucleotide change in the DNA sequence of a specific gene, such as from ACGCCAG to ACTCCAG, that occurs in at least 1% of the population can be considered a SNP. These variants are called alleles, representing alternative forms of DNA at a locus, or position on the chromosome. Sometimes short (1 to 3 base-pair-long) insertions and deletions in the DNA sequence, or indels, are considered SNPs as well.

 

The idea of polymorphism in genetics was first explored by founders of evolutionary genetics such as R. A. Fisher as early as the 1930s. (1) Polymorphism is considered important for evolution because genetic variation is the raw material on which selection acts. Still, it was with the increasing interest in and study of DNA, particularly with the discovery of restriction fragment length polymorphisms (RFLPs) in the 1980s (2), that SNPs became known.

 

SNP Characteristics

 

Origin and Mode of Inheritance

Figure: Recombination of chromosomes over many generations leads to many different descendant chromosomes that may share common DNA segments. (3)

Single nucleotide polymorphisms usually arise from mistakes during DNA replication in individual germ-line cells (the cells that give rise to gametes such as sperm and eggs) and are then passed on to a subset of the population. We can understand this inheritance of genetic variation by recalling genetic recombination, or crossing over, from high school biology. During meiosis, non-sister chromatids of homologous chromosomes can exchange genetic material, leading to recombinants in the resulting haploid gametes. Thus the offspring, the product of a union of two gametes, will be genetically distinct from the parents, yet will carry chromosome regions containing segments of DNA sequence, or haplotypes, that are shared by others with common ancestry because these segments have been passed on through the generations. Since SNPs are biallelic, i.e. they have only two alleles as opposed to all four possible forms (from the four types of nucleotides), they occur more frequently in the population and tend to be more stable when passed on from generation to generation. This makes SNPs very useful for anthropological and evolutionary tracing, and for disease studies. These will be explored in the sections below.

 

Distribution in the Genome

According to the Human Genome Project Information website, SNPs make up about 90% of all human genetic variation (keep in mind that the DNA sequences of any two humans are more than 99% alike). (4) There is on average approximately 1 SNP per 1200 bp of DNA in the human genome (3), the result of a balance between the rate at which mutations are introduced and the rate at which they are lost, which can happen within a few generations due to genetic recombination as described above. This means that there are about 3 million SNPs in the 3.2-billion-base-pair human genome! SNP distribution differs between the genomes of different species, however. For example, organisms such as flies and maize have higher levels of polymorphism, on average about 1 SNP per 50-100 bp, which is an order of magnitude greater than in humans. These differences can probably be attributed to differences in the rate of genetic recombination between organisms, as nucleotide diversity has been found to correlate positively with recombination rate. (5,6) More information on nucleotide diversity can be found in the sub-section below.

 

Note that the frequencies mentioned above represent averages. They do NOT mean that there is necessarily a SNP every thousand or so nucleotides; there are "SNP hot-spots" in the genome, as well as regions of the chromosome where there are no SNPs at all. As Professor Liu mentioned in lecture, SNPs can occur in both coding and non-coding sequences in the genome. SNPs in non-coding regions may affect splicing, transcription factor binding or the sequence of non-coding RNA. They tend to occur less frequently in the more conserved sequences. In coding regions, SNPs are often synonymous, meaning that the alternative alleles all result in the same protein sequence, so that the variation has no effect on the protein produced and thus on cellular function.

 

It is also interesting to note that most SNP-related mutations are transitions, i.e. A/G and C/T changes, rather than transversions, i.e. A/T, A/C, G/T, G/C changes. In fact, two-thirds of SNPs involve a C (cytosine) to T (thymine) substitution. (4)
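The distinction between transitions and transversions can be checked mechanically: a transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), and anything else is a transversion. The following is a minimal sketch; the function name `classify_substitution` is made up for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Same chemical class (both purines or both pyrimidines) means transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# The C/T change mentioned in the text is a transition; A/T is a transversion.
ct_change = classify_substitution("C", "T")
at_change = classify_substitution("A", "T")
```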

 

Allele Frequency Distribution

The above sub-section dealt with how SNPs are distributed in an individual's genome. In a population, the distribution of genetic variation from SNPs can be described with allele frequencies. We have mentioned above that a variant has to be “common” (occurring in at least 1% of the population) to be considered a SNP. Still, most alleles are actually quite rare: the minor allele frequency, i.e. the frequency of the less common allele in the population, is typically under 10% for most SNPs. There are databases that document allele frequency data for human populations, such as ALFRED and Frequency Finder.
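As a small illustration of the minor allele frequency, the sketch below counts alleles at one biallelic site and applies the 1% threshold from the definition of a SNP. The sample counts are hypothetical and the function name is an assumption made for this example.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the less common allele at a biallelic site.

    `alleles` holds one symbol per sampled chromosome, so a diploid
    individual contributes two entries.
    """
    counts = Counter(alleles)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at the site")
    return min(counts.values()) / sum(counts.values())


# Hypothetical sample: 100 diploid individuals, i.e. 200 chromosomes.
sample = ["C"] * 184 + ["T"] * 16
maf = minor_allele_frequency(sample)  # 16 / 200
is_common_enough = maf >= 0.01        # 1% threshold for calling it a SNP
```

Here the T allele has a frequency of 16/200 = 0.08, so the site would count as a SNP and falls in the "under 10%" range that the text describes as typical.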

 

Figure: The minor allele of most SNPs occurs in less than 10% (or 0.1) of the population. Source: Lecture notes

 

Nucleotide Diversity

Nucleotide diversity is another measure of the degree of polymorphism in a population. It is the average fraction of nucleotide differences between any two randomly chosen alleles from the population. It can be calculated by taking, for each pair of distinct sequences, the proportion of nucleotides that differ between them, and weighting it by the frequencies of the two sequences. The following example from the lecture notes should illustrate the principle:

 

 

The 20 stands for the 20 bp in the sequence, with 17 nucleotides that do not differ, 1 nucleotide differing 1 out of 5 sequences, and 2 nucleotides differing 2 out of 5 sequences.
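Since the worked formula from the lecture notes is not reproduced here, the following is only a sketch of the pairwise definition given above (the average fraction of differing nucleotides over all pairs of sequences). The five 20 bp sequences are invented to match the description: 17 invariant sites, one site where 1 of 5 sequences differs, and two sites where 2 of 5 differ. The lecture's own arithmetic may have been laid out differently.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average fraction of differing sites over all pairs of aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


def mutate(seq, changes):
    """Return a copy of seq with {position: new_base} substitutions applied."""
    chars = list(seq)
    for pos, base in changes.items():
        chars[pos] = base
    return "".join(chars)


base = "ACGTACGTACGTACGTACGT"  # hypothetical 20 bp reference
seqs = [
    base,
    mutate(base, {0: "G"}),           # site 0 differs in 1 of 5 sequences
    mutate(base, {5: "T"}),           # site 5 differs in 2 of 5 sequences
    mutate(base, {5: "T", 10: "A"}),
    mutate(base, {10: "A"}),          # site 10 differs in 2 of 5 sequences
]
pi = nucleotide_diversity(seqs)
```

With 5 sequences there are 10 pairs. The first variable site separates 4 of them and each of the other two separates 6, so the pairs differ at 16 site comparisons out of 10 × 20 = 200, giving a diversity of 0.08.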

 

Linkage Disequilibrium

One interesting characteristic of SNPs is that alleles may exhibit linkage disequilibrium, or non-random associations between alleles at two or more loci, on the same or different chromosomes, that result in a certain combination of alleles occurring more (or less) frequently than would be expected by chance. This means that certain SNPs may be inherited along with other genetic markers in a specific way. We say that these SNPs are linked.

 

To understand linkage disequilibrium, we need to first revisit the idea of Hardy-Weinberg Equilibrium that we might have heard of in high school biology. The Hardy-Weinberg principle states that for an “idealized” population, i.e. a population that is

  1. sufficiently large,
  2. exhibits random mating,
  3. experiences no selection,
  4. experiences no mutation, and
  5. has no gene flow (such as migration),

the frequencies of the genotypes formed by two alleles, B and b, are $p^2$, $q^2$ and $2pq$ for genotypes BB, bb and Bb respectively at equilibrium, where $p$ is the frequency of allele B and $q$ is that of allele b in the population. For example, if $p$ is 0.60 and $q$ is 0.40, the resulting genotype frequencies will be as follows at equilibrium. The same system of analysis can be applied to the case of two loci (four alleles), such as A/a B/b, as well.

One locus, two alleles
Two loci, four alleles
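The equilibrium frequencies for the example above can be computed directly. The sketch below covers the one-locus case from the Hardy-Weinberg principle and, for the two-locus case, assumes that alleles at the two loci associate at random, so that each haplotype frequency is the product of its allele frequencies. The function names and the two-locus input values are illustrative choices.

```python
def hardy_weinberg(p):
    """Genotype frequencies for alleles B (frequency p) and b (frequency 1 - p)."""
    q = 1.0 - p
    return {"BB": p * p, "Bb": 2 * p * q, "bb": q * q}


def two_locus_equilibrium(p_a, p_b):
    """Haplotype frequencies for loci A/a and B/b when the loci are independent.

    p_a is the frequency of allele A and p_b that of allele B.
    """
    return {
        "AB": p_a * p_b,
        "Ab": p_a * (1.0 - p_b),
        "aB": (1.0 - p_a) * p_b,
        "ab": (1.0 - p_a) * (1.0 - p_b),
    }


genotypes = hardy_weinberg(0.60)
haplotypes = two_locus_equilibrium(0.60, 0.30)  # hypothetical allele frequencies
```

For p = 0.60 and q = 0.40 this gives BB = 0.36, Bb = 0.48 and bb = 0.16, which sum to 1.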

 

Linkage disequilibrium reflects a deviation from this kind of equilibrium principle:

Disequilibrium

It can arise when any of the assumptions outlined above is not fulfilled. For example, a population that exhibits non-random mating, such as a bacterial population that reproduces asexually, will never reach equilibrium because there is no chance for any linkage between two loci to break down over generations through genetic recombination. Even when genetic recombination does occur in a population, there is a limit to the breaking down of linked loci, especially if the linked loci are very close together on a chromosome, due to the physical structure of the chromosome.

 

Figure: Crossing over in chromosomes allows for genetic recombination, but the closer in distance two genes are, the less likely crossing over will occur between them, and therefore the more "linked" they are. Image from http://secondary.thomsonlearning.com.au

This means that linkage disequilibrium is higher for loci that are closer together, and for each SNP it is expected to decrease monotonically as distance increases on either side. For mammalian chromosomes, linkage is often lost at around 100,000 bases, while in Drosophila, linkage disequilibrium usually decays within a few hundred bases.

 

Figure: Linkage disequilibrium decreases as physical distance between SNPs increases. (7)

 

Calculation:

There are three ways to describe linkage disequilibrium mathematically.

 

1. We can describe linkage disequilibrium with the parameter D, which is the deviation of the observed haplotype frequencies from those expected if the loci were independent. It can be calculated from the haplotype frequencies and allele frequencies. For example, if we have two loci, A and B, each with two alleles, and if we let the allele frequencies be p1, p2, q1 and q2 for A1, A2, B1 and B2 respectively, and the haplotype frequencies be x11, x12, x21 and x22 for A1B1, A1B2, A2B1 and A2B2 respectively, then $D = x_{11} - p_1 q_1$. The following table illustrates the calculation, where the orange ovals highlight the observed frequencies and the teal ovals highlight the expected frequencies:

 

2. Richard Lewontin introduced in 1964 another parameter for measuring linkage disequilibrium, D’, which is a normalized version of D whose magnitude is constrained to lie between 0 and 1. The normalization is achieved by dividing D by its theoretical maximum given the observed allele frequencies, Dmax, i.e.

D' = D / Dmax

where

$$D_{\max} = \begin{cases} \min(p_1 q_2,\ p_2 q_1) & \text{if } D > 0 \\ \min(p_1 q_1,\ p_2 q_2) & \text{if } D < 0 \end{cases}$$

In general, when D’ is close to 0, there is little to no linkage disequilibrium. Coupling between the two loci occurs when D’ is close to 1. D’ can also be negative, indicating repulsion.

 

3. Linkage disequilibrium can also be described by the square of the correlation coefficient between the alleles at the two loci, $r^2$, which is defined as

$$r^2 = \frac{D^2}{p_1 p_2 q_1 q_2}$$

 

Of course, measurements of linkage disequilibrium also have to be tested for statistical significance. The chi-square test with 1 degree of freedom, general chi-square tests and permutation tests are all used to assess the statistical significance of linkage disequilibrium.
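The three measures above can be sketched together for a pair of biallelic loci, using the same symbols as in the text (haplotype frequencies x11, x12, x21, x22 for A1B1, A1B2, A2B1, A2B2). The sketch uses the standard definitions of D, Lewontin's D' and r². The example frequencies, the sample size, and the use of N·r² as the 1-degree-of-freedom chi-square statistic are assumptions made for illustration and are not taken from the lecture notes.

```python
def ld_measures(x11, x12, x21, x22):
    """Return (D, D', r^2) from the four haplotype frequencies of two loci."""
    if abs((x11 + x12 + x21 + x22) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p1, p2 = x11 + x12, x21 + x22  # allele frequencies of A1, A2
    q1, q2 = x11 + x21, x12 + x22  # allele frequencies of B1, B2

    d = x11 - p1 * q1              # observed minus expected A1B1 frequency

    # Largest magnitude D can reach given the allele frequencies.
    d_max = min(p1 * q2, p2 * q1) if d >= 0 else min(p1 * q1, p2 * q2)
    d_prime = d / d_max if d_max > 0 else 0.0

    denom = p1 * p2 * q1 * q2
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical haplotype frequencies estimated from 200 chromosomes.
d, d_prime, r2 = ld_measures(0.5, 0.1, 0.1, 0.3)
n_chromosomes = 200
chi2 = n_chromosomes * r2  # compared against chi-square with 1 degree of freedom
```

In this example p1 = q1 = 0.6, so the expected A1B1 frequency is 0.36 and D = 0.14. Dmax is 0.24, giving D' of about 0.58, while r² is about 0.34. This illustrates that the two normalized measures can differ noticeably for the same data.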

 

Significance:

Linkage disequilibrium is important because the linking of SNPs means that there are fewer common haplotypes in the population than the actual number of possibilities. For example, a sequence [C/T] A T X C [A/C] [T/A] contains three biallelic sites and can therefore give 2^3 = 8 possible haplotypes. However, in reality, 90% of the variation can be captured by a few common haplotypes. Because these common haplotypes share a number of SNPs, this also means that we can capture the majority of diversity within a region by typing only a few "tag" SNPs, thus eliminating redundant information and making studies of genetic variation more efficient. (8)
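To make the counting concrete, the sketch below enumerates every haplotype that the example sequence could in principle produce: three biallelic sites give 2 × 2 × 2 = 8 combinations. The list-of-tuples encoding of the sequence is just a convenient representation chosen for this example.

```python
from itertools import product

# Fixed bases are plain strings; each SNP site is a tuple of its two alleles.
# This encodes the sequence [C/T] A T X C [A/C] [T/A] from the text.
template = [("C", "T"), "A", "T", "X", "C", ("A", "C"), ("T", "A")]


def enumerate_haplotypes(sites):
    """List every sequence obtainable by choosing one allele per SNP site."""
    choices = [site if isinstance(site, tuple) else (site,) for site in sites]
    return ["".join(combo) for combo in product(*choices)]


haplotypes = enumerate_haplotypes(template)
n_possible = len(haplotypes)
```

In a real population only a handful of these eight would typically be common, which is what makes tagging a few SNPs sufficient.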

 

Linkage disequilibrium also helps define haplotype boundaries. It was found that in the human genome there are sizable regions, or haplotype blocks, with strong linkage disequilibrium (highly linked, low recombination) within the blocks and no linkage disequilibrium between blocks. (9) Most of the blocks are small, though most of the sequence spanned by blocks lies in large blocks. (9)

 

Figure: (A) Haplotype size distribution and (B) distribution of all genome sequence spanned by blocks over block sizes for the Swedish (white bars) and Nigerian (grey bars) population samples. (9)

The association between linkage disequilibrium, SNPs and haplotypes means that patterns of linkage disequilibrium can serve as a tool for genetic tracing, in both evolutionary genetics and disease epidemiology. (10) There are also many software packages available for linkage disequilibrium analysis, listed at the end of this page.

Population Stratification

As a final note on SNP characteristics, it is important to be aware that the presence and inheritance of SNPs may differ between populations even within the same species. For example, it was found that in a Swedish human population sample, the linkage disequilibrium measure D’ remains above 0.5 on average when polymorphic sites are up to 60 kb apart, while for a sample of Nigerians, appreciable D’ extends only up to about 5 kb, one order of magnitude lower. (7) The two populations are believed to have diverged from common ancestors about 100,000 years ago, and this divergence is reflected in the difference in genetic variation. (7) Such population stratification could be the result of environmental, cultural or simply genetic selection. It plays an influential role in the results of case-control association studies for human diseases (see below).

 

Figure: An example of population stratification is that the distance between polymorphic loci over which linkage disequilibrium, D', remains appreciable differs between the Swedish and the Yoruba (Nigerian) populations. The LD curve from a Utah population was used for normalization. (7)

 

 

SNP Discovery and Genotyping

Discovery

We have been talking about SNPs and SNP characteristics, but how do we find the SNPs in the human genome?

 

The obvious way to do so is to sequence the genomes of many individuals and find mismatches in alignments of the sequences. However, as you can imagine, it is too costly to sequence everyone’s genome. Before the Human Genome Project was completed, a technique called reduced representation shotgun (RRS) sequencing was used to discover SNPs. Modified from the whole-genome shotgun sequencing method, RRS sequencing focuses on only a subset of the genome, allowing scientists to compare DNA sequences at randomly selected sites throughout the genome for SNPs. (11) With the human genome having been sequenced, researchers can now also use the genomic-alignment (GA) strategy, aligning individual shotgun sequences with the publicly available sequence, to identify new SNPs. For example, about 2,730 SNPs in human chromosome 22 have been identified using both the RRS and GA strategies. (12) After the sequencing experiments, computational methods can then be used to differentiate between potential SNPs and sequencing errors. Computational tools can also serve to align the genome assembly to expressed sequence tags (ESTs) for SNP detection in the coding regions of the genome. A number of software packages are available for sequence alignment and annotation, such as Phred, Phrap, and Consed, and for sequence variant detection, such as PolyPhred and Consed (see list of software below). Of course, potential SNPs are re-sequenced to verify their candidacy.
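The basic idea of finding mismatches in aligned sequences, and of separating candidate SNPs from likely sequencing errors, can be sketched as follows. This is a deliberately crude illustration: it assumes the reads are already aligned and of equal length, and it uses a hypothetical rule (the second most common base must be seen at least `min_minor_count` times) in place of the quality-score-based methods that real tools rely on. It does not reflect how Phred, PolyPhred or Consed work internally.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2):
    """Return [(position, base_counts)] for columns that look polymorphic.

    A column is reported when at least two different bases are seen and the
    second most common base appears at least `min_minor_count` times; a base
    seen only once is treated as a possible sequencing error.
    """
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")
    candidates = []
    for pos in range(length):
        counts = Counter(r[pos] for r in aligned_reads if r[pos] in "ACGT")
        if len(counts) < 2:
            continue
        _, minor_count = counts.most_common(2)[1]
        if minor_count >= min_minor_count:
            candidates.append((pos, dict(counts)))
    return candidates


# Hypothetical aligned reads from five individuals.
reads = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACATTAGC",
    "ACATTAGC",
    "ACGTTCGC",  # lone mismatch at position 5, not reported
]
found = candidate_snps(reads)
positions = [pos for pos, _ in found]
```

Here only position 2 (G in three reads, A in two) is reported. Any candidate found this way would still need re-sequencing for confirmation, as the text notes.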

 

The disadvantage of the sequencing methods in SNP discovery is that they require a priori knowledge of DNA sequences. There are also sequence-free methods for SNP detection, including temperature modulated heteroduplex analysis (TMHA) using denaturing high pressure liquid chromatography (DHPLC). DHPLC compares DNA segments from two samples by denaturing and reannealing the PCR-amplified samples, then detecting the resulting duplex products. (13) In the absence of sequence differences, the DNA should form homoduplexes, while sequence variation would cause heteroduplexes to form. The different duplexes produce different chromatography patterns, which can be used to detect SNPs.

 

Figure: Denaturing high pressure liquid chromatography (DHPLC) can reveal single nucleotide variation in DNA samples. Source: rbc.gs-im3.fr

 

Genotyping

Besides allowing for the identification of unknown SNPs, DHPLC can also be used as a diagnostic tool for already-characterized SNPs. This kind of “diagnosis” is known as SNP genotyping. In other words, we are asking: for a known SNP locus, which allele does an individual have? A number of methods exist for SNP genotyping. For example, we can use allele-specific primers to amplify DNA samples, so that any PCR product can be traced back to the primer specific to one allele. Alternatively, we can discriminate between alleles during the nucleotide incorporation step, using a technique called minisequencing, or allele-specific single-base primer extension. (14)

 

Figure: Allele-specific primer can be used for SNP genotyping by discriminating against the alternative allele. (14)

 

Figure: Single-base primer extension method discriminates between different alleles during nucleotide incorporation. (14)

 

We can also use fluorescent probes such as molecular beacons, which are hairpin-shaped oligonucleotide hybridization probes that become fluorescent only when bound to their specific target DNA sequence. Furthermore, SNP chips based on microarray technology allow for genome-wide SNP genotyping, i.e. simultaneous genotyping of thousands of SNPs. References (14) and (15) provide detailed reviews of most SNP discovery and genotyping methods.

 

Figure: Molecular beacons can also be used for SNP genotyping. (15)

 

Why Do We Care?

As mentioned before, SNPs are the most common form of genetic variation, making up about 90% of all human genetic variation. (4) They are important because, owing to their mode of inheritance, distribution and other characteristics mentioned above, they can serve as biological markers for anthropological/evolutionary tracing and for studies of human disease. Both of these applications have been reviewed in References (16) and (17). The latter application is particularly significant, because the genetic variation due to SNPs can affect not only how different people respond to diseases, but also how each one of us responds to environmental stresses such as toxins and chemicals, and to drugs and therapies. (4) A classic example is the SNP that can lead to sickle cell anemia, an autosomal recessive disorder caused by an A to T mutation in the beta-hemoglobin gene located on chromosome 11. The point mutation (i.e. SNP) changes a hydrophilic glutamate residue in beta-globin to a hydrophobic valine, which disrupts the folding of the protein, inhibiting the hemoglobin’s oxygen-carrying ability. Thus, if the sickle cell anemia allele is present in two copies in a person, the person will develop the deadly disease. On the other hand, interestingly, a person carrying only one copy of the allele is less susceptible to malaria than non-carriers.

 

Figure: A SNP in a coding region can lead to a change in protein sequence, which here disrupts hemoglobin formation and causes sickle cell anemia. Bottom left: Normal human de-oxy hemoglobin, with red box indicating site of mutation. Bottom right: Mutated hemoglobin molecules can clump together. Source: www.ornl.gov

The link between SNPs and human diseases also implies that we can tailor “personalized medicine” to people based on their genetic makeup. For example, the anticoagulant drug warfarin is metabolized efficiently by people with the wild-type cytochrome P450 CYP2C9*1 allele, while patients with the variant alleles CYP2C9*2 and CYP2C9*3 are poor warfarin metabolizers and can be at higher risk of hemorrhage. (18)

 

The importance of SNPs in human diseases gave rise to efforts in discovering genetic mutations that potentially underlie common diseases through association studies.

SNP Association Studies

Association studies refer to the linking of genetic markers with phenotypic expression. For SNP association studies, this means finding the SNP or haplotype markers for “disease genes” that increase susceptibility to diseases and thus allow for disease prediction and diagnosis. There are two types of SNP association studies: Population-based case-control association studies and family-based association studies.

 

Population-Based Case-Control Association Studies

The idea of population-based case-control SNP association studies seems quite straightforward: from two large sample populations matched in age and gender, one healthy (the control) and one with a disease, individuals are genotyped for the allele they possess. If one allele is significantly more prevalent in the disease group than in the healthy one, the allele could be considered to be related to the disease, probably because the SNP is within or near a gene that influences disease susceptibility. Although this seems like a formidable task due to the number of SNPs present in the genome, the existence of linkage disequilibrium places SNPs into haplotype blocks; thus, only a representative subset of SNPs needs to be typed for genome-wide coverage in association studies.

 

The lecture notes presented an example of the allele frequency results for this type of association studies:

Note that the expected is calculated as follows:

To see whether these results are significant, a chi-square test can be performed, giving a p-value of statistical significance:
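The lecture's table is not reproduced here, so the following sketch uses hypothetical allele counts to show the mechanics: the expected count in each cell of the 2 × 2 table is (row total × column total) / grand total, and the chi-square statistic sums (observed − expected)² / expected over the four cells. The p-value uses the closed form for 1 degree of freedom, erfc(√(χ²/2)), so that only the Python standard library is needed.

```python
import math


def chi_square_2x2(case_counts, control_counts):
    """Chi-square test of association for a 2x2 table of allele counts.

    Each argument is (count of allele A, count of allele a).
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    observed = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("a row or column of the table is empty")
            chi2 += (observed[i][j] - expected) ** 2 / expected

    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical data: 100 chromosomes sampled from cases, 100 from controls.
chi2, p_value = chi_square_2x2(case_counts=(60, 40), control_counts=(40, 60))
```

With these made-up counts every expected cell is 50, the statistic is 8.0, and the p-value is roughly 0.005, which would count as significant at the conventional 0.05 level for a single test. A genome-wide study testing many SNPs would need a much stricter threshold.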

 

Family-Based Association Studies

Family-based association studies can be conducted in unrelated families and within the same family. In studies with unrelated families, the sample can be controlled by having subject families that all have one affected child in each, such as in the following example:

By looking at allele transmission, predictions can be made as to the significance of the association between the allele and the disease. For example, in the results given above, there are 9 A alleles and 2 a alleles, giving a transmission frequency of:
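With the counts quoted above (9 transmissions of A and 2 of a), the transmission frequency of A is 9/11. One common way to judge whether such an imbalance is more than chance is the transmission disequilibrium test (TDT). The text does not say which test the lecture used, so the TDT here is an assumption. It also assumes that the counts are transmissions from heterozygous (Aa) parents to affected children.

```python
import math


def transmission_summary(transmitted_a_upper, transmitted_a_lower):
    """Summarize transmissions of allele A versus allele a.

    Returns (fraction of transmissions that were A, TDT chi-square statistic,
    p-value for 1 degree of freedom).
    """
    total = transmitted_a_upper + transmitted_a_lower
    if total == 0:
        raise ValueError("no transmissions observed")
    fraction = transmitted_a_upper / total
    # TDT statistic: (b - c)^2 / (b + c), chi-square with 1 degree of freedom.
    chi2 = (transmitted_a_upper - transmitted_a_lower) ** 2 / total
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return fraction, chi2, p_value


fraction, chi2_tdt, p_tdt = transmission_summary(9, 2)
```

Under the null hypothesis of no association, A and a would each be transmitted about half the time. Here A is transmitted in about 82% of cases, and the statistic 49/11 ≈ 4.45 exceeds 3.84, the 5% critical value for 1 degree of freedom. With only 11 transmissions, the chi-square approximation is rough, so an exact binomial test would be a safer choice in practice.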

 

In studies within the same family, allele frequencies of the affected and unaffected children can be compared to assess the significance of the association. Furthermore, genotyping affected and unaffected individuals within the family can reveal the locus of the associated disease gene. For example, the pedigree chart below shows affected individuals in the family as filled circles or squares, and the white line across the depicted chromosomes in the third generation indicates the loci shared by all three members afflicted with the disease, suggesting a link between that part of the chromosome and the disease.

 

Figure: A pedigree chart of a family with filled shapes representing members affected by the disease. The affected members all seem to share a particular portion of the chromosome (green). Source: Lecture Notes.

 

Impact and Complications

SNP association studies have successfully identified a number of polymorphisms or haplotypes that consistently show association with complex diseases, such as those listed in the table below. As recently as April 2006, a collaborating group of scientists in Boston and Germany reported that they had identified a common genetic variant associated with adult and childhood obesity. (19)

 

Figure: Diseases found to be consistently associated with SNPs or haplotypes. (20)

Still, making a true association between a genetic variation and a disease is very challenging and complicated. For example, within months after the paper regarding the obesity-linked common genetic variant was published, four “Technical Comments” appeared in the January 2007 issue of Science, all with results indicating that there is actually no strong association between the genetic variant and the obesity condition, i.e. the original results from April 2006 were false positives. (19) The pitfalls of association studies result from the facts that:

 

1. Association doesn’t necessarily imply causation;

2. Several genes may affect a single phenotypic trait;

3. Population stratification may cause mismatching between sample groups because certain SNPs are unique to certain ethnic groups; and

4. Weak association effects may not be captured by the studies.

 

These complications could all lead to false positives and/or false negatives, resulting in inconsistent and irreproducible outcomes of association studies. For studies to be reliable, careful study designs involving appropriate controls are needed. Large sample sizes, preferably upward of thousands of people, are also crucial to ensure that both rare and “common” variants with different relative effect sizes can be captured in the studies. (20)

 

Conclusion

SNPs consist of single point mutations and occasionally short (1-3 bp) indels that occur in at least 1% of the population. They make up about 90% of the genetic variation between human beings, and are distributed throughout the genome. They are passed on through genetic inheritance and recombination, and they may exhibit linkage disequilibrium. Advances have been made in identifying and genotyping them, using techniques ranging from molecular beacon probing to microarray chips. The “commonness” of SNPs and their other characteristics make them excellent genetic markers for anthropological/evolutionary and human epidemiological studies, particularly for association studies with human diseases. Although the results from these association studies show great promise in providing an increased understanding of the genetic basis of complex human disorders, and even in leading to the development of personalized medicine, many gene-disease associations still lack validation due to poor study designs and the lack of reproducibility. Still, the significance of SNPs and their applications is undeniable. A number of efforts, including the SNP Consortium and the International HapMap Project, aim to identify all SNPs/haplotypes in the human genome and to construct a haplotype map. Over 6 million SNPs are now available in dbSNP, the SNP database at NCBI. See the references and the lists of databases below for more information.

 

References

(1) Thompson, E.A. R.A. Fisher's Contribution to Genetical Statistics. Biometrics. 46:905-914. 1990.

(2) Botstein, D., White, R.L., Skolnick, M.H., and Davis, R.W. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. American Journal of Human Genetics. 32:314-331. 1980.

(3) "The Origins of Haplotypes". International HapMap Project. Available: http://www.hapmap.org/originhaplotype.html.ja

(4) "SNP Fact Sheet". Human Genome Project Information. Available: http://www.ornl.gov/sci/techresources/Human_Genome/faq/snps.shtml

(5) Begun, D.J., and Aquadro, C.F. Levels of naturally occurring DNA polymorphism correlate with recombination rates in D. melanogaster. Nature. 356:519-20. 1992.

(6) Nachman, M.W. Single nucleotide polymorphisms and recombination rate in humans. Trends Genet. 17:481-5. 2001.

(7) Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T., Kouyoumjian, R., Farhadian, S.F., Ward, R., and Lander, E.S. Linkage disequilibrium in the human genome. Nature. 411: 199-204. 2001.

(8) "Tagging SNPs". Available: http://slack.ser.man.ac.uk/theory/tagging.html

(9) Gabriel, S.B., Schaffner, S.F., Nguyen, H., Moore, J.M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S.N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E.S., Daly, M.J., and Altshuler, D. The structure of haplotype blocks in the human genome. Science. 296:2225-2229. 2002.

(10) Goldstein, D.B. and Weale, M.E. Population genomics: Linkage disequilibrium holds the key. Current Biology. 11: R576-R579. 2001.

(11) Altshuler, D., Pollara, V.J., Cowles, C.R., Van Etten, W.J., Baldwin, J., Linton, L., and Lander, E.S. An SNP map of the human genome generated by reduced representation shotgun sequencing. Nature. 407:513-516. 2000.

(12) Mullikin J.C. et al. An SNP map of human chromosome 22. Nature. 407: 516-520. 2000.

(13) Wolford, J.K., Blunt, D., Ballecer, C., and Prochazka, M. High-throughput SNP detection by using DNA pooling and denaturing high performance liquid chromatography (DHPLC). Human Genetics. 107: 483-3. 2000.

(14) Twyman, R.M. SNP discovery and typing technologies for pharmacogenomics. Current Topics in Medicinal Chemistry. 4: 1421-29. 2004.

(15) Gupta, P.K., Roy, J.K., and Prasad, M. Single nucleotide polymorphisms: A new paradigm for molecular marker technology and DNA polymorphism detection with emphasis on their use in plants. Current Science. 80: 524-535. 2001.

(16) Stoneking, M. Single nucleotide polymorphisms: From the evolutionary past... Nature. 409: 821-822. 2001.

(17) Chakravarti, A. Single nucleotide polymorphisms: ... to a future of genetic medicine. Nature. 409: 822-823. 2001.

(18) Aithal, G.P., Day, C.P., Kesteven, P.J., and Daly, A.K. Association of polymorphisms in the cytochrome P450 CYP2C9 with warfarin dose requirement and risk of bleeding complications. Lancet. 353: 717-719. 1999.

(19) Herbert et al. A common genetic variant is associated with adult and childhood obesity. Science. 312: 279-283. 2006. With technical comments and response published in Science 315:187. 2007.

(20) Andrew, T., Hattersley, D.M., and McCarthy, M.I. What makes a good genetic association study? Lancet. 366: 1315-23. 2005.

 

Further Information

  • Professor Liu's STAT115 lecture note can be found here.

  • Database resources:

 

ALFRED (ALlele FREquency Database)

dbSNP (SNP search from NCBI)

Entrez Gene (Gene database from NCBI)

Frequency Finder (Public Allele Frequency Metasearch database)

Genetic Association Database (An archive of human genetic association studies of diseases)

GVS (Genome Variation Server)

HAPMAP (International HapMap Project)

HGMD (Human Gene Mutation Database)

HuGENet (Human Genome Epidemiology Network)

NIEHS SNPs (National Institute of Environmental Health Sciences Environmental Genome Project)

OMIM (Online Mendelian Inheritance in Man from NCBI)

SeattleSNPs (SNP discovery and genotyping database)

A list of databases from D. Nickerson's group at University of Washington

 

  • Software:

 

Haploview (Program for linkage disequilibrium and haplotype block analysis, single SNP and haplotype association tests, etc. from M. Daly's lab at the Broad Institute at MIT)

ldSelect (Program from D. Nickerson's group that analyzes linkage disequilibrium patterns)

Phred/ Phrap/ Consed (Programs for assigning quality values to each base from DNA sequencing trace files, assembling shotgun DNA sequence data and viewing assembled data)

PolyPhred (Program for SNP detection from fluorescence-based sequences)

PyPop (Python for Population Genetics)

Linkage Disequilibrium programs from the Wellcome Trust Centre for Human Genetics at University of Oxford

A list of software from D. Nickerson's group at University of Washington

This webpage provides an exhaustive, alphabetical list of genetic analysis software.
