- There are
several popular DNA array platforms in use. A major difference between the
platforms is how the genes are probed. In the case of cDNA arrays, full length cDNA is spotted on the DNA array
for each gene to be queried. Oligonucleotide arrays are composed of k-mer nucleotide probes for each gene (k is typically 25-40). An advantage of oligonucleotide
probes is their enhanced sensitivity, or the ability to detect weakly expressed
transcripts. However, the length of the gene probe directly corresponds to how
specific the hybridization of the probe is. A variation of oligonucleotide
arrays is the perfect match/mismatch array, introduced by Affymetrix. A set of paired
oligonucleotide probes, typically 25-mers, is designed for each gene. Each pair
contains the canonical sequence of the gene, or perfect match probe, and the same sequence with a
deliberate mutation in the 13th (middle) position of the probe, or
mismatch probe. The mismatch probe measures the degree of cross-hybridization,
that is, how much of the detected signal is noise from nonspecific binding.
Compare Affymetrix arrays with
cDNA arrays with regard to the given criteria:
- Maximum number of probes per chip with good detection capability.
- High specificity of probes, detection of only the desired gene.
- High sensitivity of probes, ability to detect weakly expressed genes.
- Lowest price per DNA array to experimenter.
- A key
issue when comparing DNA arrays is normalization, or the process by which
expression levels are made comparable. A common approach to normalization is
global normalization. In this approach, the averages of the expression
distributions (expression levels for all genes within a DNA array) across DNA
arrays are set to be equal. This follows from the assumption that while individual genes
can be differentially expressed, the total amount of transcription is essentially
the same across samples.
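Global normalization as described here can be sketched in a few lines. This is a minimal illustration, assuming linear (not log-transformed) intensities and scaling every array to the mean of the per-array means; the function name `global_normalize` and the intensity values are hypothetical and chosen only for the example.

```python
from statistics import mean

def global_normalize(arrays):
    """Scale each array so that its mean intensity equals a common target.

    `arrays` is a list of arrays, each a list of intensities for the same
    genes in the same order. The target is the mean of the per-array means
    (an assumption; any common value would serve).
    """
    array_means = [mean(a) for a in arrays]
    target = mean(array_means)
    return [[x * target / m for x in a] for a, m in zip(arrays, array_means)]

# Hypothetical intensities for the same three genes on two arrays.
array_a = [100.0, 200.0, 300.0]
array_b = [200.0, 400.0, 600.0]

normalized = global_normalize([array_a, array_b])
normalized_means = [mean(a) for a in normalized]
```

After scaling, both arrays have the same mean intensity, so any remaining per-gene differences are interpreted as differential expression. That interpretation holds only if the underlying assumption about total transcription holds.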
- You are
comparing two separate single-sample hybridization DNA array experiments, one
from liver tissue and one from whole blood. Assume the major cell types for
liver are hepatocytes and for whole blood are red blood cells. Would you expect
global normalization to perform adequately in this case? Why or why not? Hint:
Consider the organelles in both cell types.
- Affymetrix
implements a secondary normalization by utilizing the perfect match/mismatch
model. In this model, the mismatch probe contains a deliberate mismatch in the
13th position to measure the degree of cross-hybridization, and its signal is
"subtracted" from that of the perfect match probe. However, Naef et al. (2003) have
recently shown that for weakly expressed genes, the perfect match signal alone is a good
indicator. For saturated genes, the mismatch signal alone is still a valid indicator.
Given their findings, what would you suggest as an alternative to Affymetrix's
normalization procedure?
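For reference, the "subtraction" in the perfect match/mismatch model can be sketched as a plain average of PM minus MM differences over the probe pairs of one gene. This is a simplified illustration, not Affymetrix's actual algorithm; the function name `pm_mm_signal` and the probe intensities are hypothetical.

```python
def pm_mm_signal(pm, mm):
    """Average PM - MM difference over the probe pairs of a single gene."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical intensities for four probe pairs of one gene.
pm = [520.0, 610.0, 480.0, 555.0]
mm = [130.0, 150.0, 120.0, 140.0]

signal = pm_mm_signal(pm, mm)
```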
- A researcher gives you a list of common housekeeping genes to help you with
normalization. Housekeeping genes are a set of genes that are ubiquitously expressed
in a relatively stable manner. As opposed to normalizing to the average expression of
all genes on the array, this form of normalization uses the average of the
housekeeping genes.
For your experiments,
you are asked to compare the differences in gene intensities due to global
versus housekeeping normalization. As a simple test, you notice that when you
use global normalization, the housekeeping genes are significantly higher (but
not saturated) in one array versus another.
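The two normalization schemes differ only in which genes define the scaling factor, which the following sketch makes explicit. It assumes linear intensities stored per gene; the function name `scale_to_reference`, the gene names, and the values are hypothetical (ACTB and GAPDH stand in for whatever housekeeping list the researcher provides).

```python
from statistics import mean

def scale_to_reference(array, reference_genes, target=1.0):
    """Scale all intensities so the mean of `reference_genes` equals `target`."""
    ref_mean = mean([array[g] for g in reference_genes])
    return {gene: value * target / ref_mean for gene, value in array.items()}

# Hypothetical intensities from one array.
array = {"ACTB": 800.0, "GAPDH": 1200.0, "geneX": 500.0, "geneY": 2500.0}
housekeeping = ["ACTB", "GAPDH"]  # illustrative housekeeping genes

by_housekeeping = scale_to_reference(array, housekeeping)
by_global = scale_to_reference(array, list(array))  # every gene is a reference
```

With these numbers the same gene receives a different normalized value under each scheme, because the housekeeping mean and the array-wide mean differ.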
- A
biologist designed a series of two-sample hybridization DNA array experiments
consistent with a reference design. See Figure 6.17(c) from Pevsner. Halfway
through his experiments, he realizes he has inadvertently been hybridizing twice as much
sample as pool (reference).
- What
effect will this have on normalization?
- He decides
to adjust the sample to the amount suggested in his protocol for future
experiments. What effect will this have on his remaining DNA arrays?
- A researcher gives you a list of 1,000 significantly expressed genes from a
DNA array with 12,000 genes. He used the t-test with a p-value cutoff of
0.05. He realizes that 600 false positives would be expected by chance
(12,000 x 0.05). How many of the genes
identified as significantly expressed do you think have a biologically
significant role? What if the p-value cutoff had been 0.000001 (0.012 expected false
positives)? Hint: Think about why a biologist
clusters genes rather than directly studying gene lists.
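The false positive counts quoted in this problem follow from multiplying the number of genes tested by the p-value cutoff. The short sketch below reproduces that arithmetic, assuming every gene is tested independently and that the count is the expectation when no gene is truly changed; the function name `expected_false_positives` is hypothetical.

```python
def expected_false_positives(n_genes, p_cutoff):
    """Expected number of genes passing the cutoff by chance alone."""
    return n_genes * p_cutoff

n_genes = 12_000   # genes on the array
n_called = 1_000   # genes reported as significant

fp_loose = expected_false_positives(n_genes, 0.05)       # about 600
fp_strict = expected_false_positives(n_genes, 0.000001)  # about 0.012

# Rough share of the reported genes that could be chance findings at 0.05.
fp_fraction = fp_loose / n_called
```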
- When
examining the clusters of genes that result from clustering gene expression
values, researchers will usually use one of the following assumptions to
help them analyze their data. State which of the following assumptions you
agree with and why.
- Genes in the cluster are co-regulated.
- Genes in the cluster are involved in the same biological function.
- Genes in the cluster are bound by the same transcription factor(s).
- Genes can be a member of only one cluster.
- A
researcher is studying cancer of the thyroid. There are two hospitals that
have run DNA arrays on the same chip (Affymetrix U133A) for several samples of
normal thyroid tissue and cancerous thyroid tissue. The researcher ran SAM (Significance Analysis of Microarrays) on
the data from one of the hospitals comparing the normal and cancerous tissues
and identified several differentially expressed genes. However, when they
pooled both datasets together, they got the following SAM plot:
They ask you why SAM failed to find any differentially
expressed genes. What should you tell them?
Suggested Reading
Naef et al. (2003). A study of accuracy and precision in oligonucleotide arrays: extracting more signal at large concentrations. Bioinformatics 19(2): 178-184.
Pevsner (2003). Bioinformatics and Functional Genomics. Wiley-Liss, pp. 551-562.
Tusher, Tibshirani and Chu (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98: 5116-5121.