Supplementary materials for
"Variations in yeast 3'-processing cis-elements correlate with transcript stability"
Joel H. Graber, In press
Trends in Genetics, Sept 2003

Abstract
Supplemental Table 1
Supplemental Table 2
Supplemental Figure 1
Supplemental Figure 2
Flanking Sequence Analysis

Abstract

A large set of yeast mRNA 3'-processing regulatory sequences was analyzed statistically, revealing a systematic variation that correlates with measured mRNA stability. Transcripts with relatively short half-lives more frequently include 3'-processing elements that contain the core sequence of binding sites for the PUF proteins, which enhance mRNA turnover. These results suggest that regulatory sequence variation, typically modeled as random, may instead arise from the necessity or advantage of specifying multiple functions in a common sequence element.

Supplemental Information and Data

Sequences

The sequences for analysis were obtained by combining two sources: the transcript half-life measurements made available by Pat Brown's group at Stanford, from the associated manuscript, Wang Y et al., "Precision and functional specificity in mRNA decay" (2002) PNAS 99:5860-5865; and the Hidden Markov Model (HMM)-based 3'-processing site prediction tool from Graber JH, McAllister GD, and Smith TF, "Probabilistic prediction of Saccharomyces cerevisiae mRNA 3'-processing sites" (2002) Nucleic Acids Res 30:1851-8. The sequence files contain the final 100 nt of the predicted transcripts, based on the most probable 3'-processing site:
SLOW ("slowUTR.fa"): 379 sequences with t1/2 > 50 minutes
FAST ("fastUTR.fa"): 393 sequences with t1/2 < 10 minutes
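The set construction described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes the half-lives and the most probable 3'-processing site for each transcript are already available (the HMM prediction itself is not reproduced), and the record layout and function names (`classify`, `final_window`, `build_sets`, `to_fasta`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Thresholds from the set definitions above (minutes).
SLOW_MIN_T_HALF = 50.0  # SLOW: t1/2 > 50 min
FAST_MAX_T_HALF = 10.0  # FAST: t1/2 < 10 min
WINDOW = 100            # final 100 nt of each predicted transcript

# Hypothetical record layout: name -> (sense-strand sequence, 0-based index of the
# last transcribed base at the predicted 3'-processing site, t1/2 in minutes).
Record = Tuple[str, int, float]


def classify(t_half: float) -> Optional[str]:
    """Return 'SLOW', 'FAST', or None for intermediate half-lives."""
    if t_half > SLOW_MIN_T_HALF:
        return "SLOW"
    if t_half < FAST_MAX_T_HALF:
        return "FAST"
    return None


def final_window(seq: str, site: int, window: int = WINDOW) -> str:
    """Return up to `window` nt ending at `site` (inclusive), written as RNA."""
    start = max(0, site + 1 - window)
    return seq[start:site + 1].upper().replace("T", "U")


def build_sets(records: Dict[str, Record]) -> Dict[str, List[Tuple[str, str]]]:
    """Split transcripts into SLOW and FAST sets; intermediate half-lives are dropped."""
    sets: Dict[str, List[Tuple[str, str]]] = {"SLOW": [], "FAST": []}
    for name, (seq, site, t_half) in records.items():
        label = classify(t_half)
        if label is not None:
            sets[label].append((name, final_window(seq, site)))
    return sets


def to_fasta(entries: List[Tuple[str, str]]) -> str:
    """Format (name, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in entries)
```

Under these assumptions, `to_fasta(build_sets(records)["SLOW"])` would produce text in the same form as a file like "slowUTR.fa".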

Supplemental Table 1: Tests on equality of the proportions displayed in Figure 2 of the manuscript


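The large sample sizes allow use of the normal approximation to the binomial distribution to test for equality of the proportions. p is the two-sided p-value: the probability, if the two underlying proportions were equal, of observing a difference at least as large as the measured one. Z is the standard normal test statistic from which p is computed under the two-sided equality test; a positive Z indicates that the first set in the comparison (e.g., FAST in FAST-SLOW) has the higher proportion.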
Sequence          Comparison   Z      p
UGUAUA or UAUGUA  FAST-SLOW     4.72  2.33E-06
                  FAST-random   4.53  5.85E-06
                  SLOW-random  -2.29  0.022
UAUAUA            FAST-SLOW     0.92  0.357
                  FAST-random   0.52  0.602
                  SLOW-random  -0.78  0.436
UACAUA            FAST-SLOW     0.72  0.471
                  FAST-random   0.43  0.667
                  SLOW-random  -0.59  0.558
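The equality test described above can be written as a pooled two-proportion z-test. The exact formula used is an assumption here, but the pooled standard error reproduces the tabulated values: for example, 195 of 379 SLOW sequences versus 270 of 393 FAST sequences containing UGUA (the first row of Supplemental Table 2) gives Z ≈ -4.896. The function name `two_proportion_z` is hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test for p1 = x1/n1 versus p2 = x2/n2.

    Returns (Z, two-sided p). Z > 0 means the first proportion is larger.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided standard normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# SLOW versus FAST for UGUA, using per-sequence counts from Supplemental Table 2.
z, p = two_proportion_z(195, 379, 270, 393)
print(f"Z = {z:.3f}, p = {p:.3g}")
```

Swapping the argument order (FAST first) flips the sign of Z, which is how the FAST-SLOW convention of Table 1 relates to the SLOW-FAST convention of Table 2.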

Supplemental Table 2: Comparison of word frequencies in the SLOW and FAST sets

This table is available under two different analyses:
  1. In the first version, words are counted on a "per sequence" basis: each count is the number of sequences that contain at least one copy of the word. Z and p are computed as in Table 1, here for the SLOW-FAST comparison, so a negative Z indicates that the word occurs in a larger fraction of FAST sequences. The words are then sorted by Z-score. The top and bottom five rows are shown below, and the complete table is available in either html or tab-delimited text formats. The probability values are computed under the assumption of a single test; at the very least, a Bonferroni correction or some similar compensation for multiple testing should be applied.
    Word    SLOW count  SLOW fraction  FAST count  FAST fraction  Z             p
    UGUA    195         0.514512       270         0.687023       -4.896047895  9.79227E-07
    UGUAU   108         0.28496        163         0.414758       -3.777365604  0.000158547
    UAUGUA  65          0.171504       108         0.274809       -3.44119599   0.000579246
    AUGUA   102         0.269129       151         0.384224       -3.405940422  0.000659468
    UGUAUA  65          0.171504       104         0.264631       -3.128226086  0.001758783
    ...
    GGUUUU  22          0.0580475      7           0.0178117       2.939298967  0.003289692
    GCC     166         0.437995       131         0.333333        2.988054601  0.002807732
    GUUU    197         0.519789       161         0.409669        3.067249102  0.002160525
    AACUUC  15          0.0395778      2           0.00508906      3.264404125  0.001097067
    UUCAA   96          0.253298       61          0.155216        3.384748915  0.000712539
  2. In the second version, each word is counted as a fraction of all words of the same length counted in the set; for example, of all tetramers counted in the SLOW set, 290, or 0.789%, were UGUA. As above, rows from the top and bottom of the sorted table are shown below, and the complete table is available in either html or tab-delimited text formats. These probability values likewise assume a single test and should be corrected for multiple testing.
    Word    SLOW count  SLOW fraction  FAST count  FAST fraction  Z             p
    UGUA    290         0.00788837     439         0.0115359      -5.077765312  3.82551E-07
    AUAU    900         0.0244811      1093        0.0287216      -3.601119673  0.000316924
    AU      4534        0.120839       5023        0.129322       -3.541646742  0.00039772
    UGUAU   141         0.00387533     213         0.00565557     -3.51137193   0.000445887
    ...
    AAAAA   541         0.0148692      460         0.0122139       3.12831989   0.001758222
    AACUUC  15          0.000416609    2           5.36639E-05     3.261813759  0.00110714
    GUU     641         0.0172581      548         0.014253        3.319558351  0.000901714
    UUCAA   108         0.00296834     63          0.00167277      3.672182253  0.000240553
    AAAAAA  340         0.00944313     248         0.00665432      4.278118528  1.88611E-05
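The two counting schemes and the multiple-testing correction recommended above can be sketched as follows. This is an illustrative reconstruction, not the original analysis code: the function names are hypothetical, the z-test is the pooled two-proportion test (which appears to reproduce the tabulated Z values when applied to their counts), the per-word version treats each counted k-mer position as one trial, and the Bonferroni factor here is the number of words of a single length tested, whereas an analysis spanning several word lengths would use the total number of tests.

```python
import math
from collections import Counter
from typing import List, Tuple

Row = Tuple[str, float, float, float]  # (word, Z, p, Bonferroni-adjusted p)


def per_sequence_counts(seqs: List[str], k: int) -> Counter:
    """Version 1: number of sequences containing at least one copy of each k-mer."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def per_word_counts(seqs: List[str], k: int) -> Tuple[Counter, int]:
    """Version 2: all (overlapping) k-mer occurrences, plus the total number of k-mers counted."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    return counts, sum(counts.values())


def z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (Z, two-sided p) for x1/n1 - x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0.0:  # word present in every trial: no evidence of a difference
        return 0.0, 1.0
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))


def rank_words(slow: List[str], fast: List[str], k: int, per_sequence: bool = True) -> List[Row]:
    """Compare SLOW against FAST for every observed k-mer and sort rows by Z.

    Negative Z means the word is more frequent in FAST, as in Supplemental Table 2.
    """
    if per_sequence:
        cs, ns = per_sequence_counts(slow, k), len(slow)
        cf, nf = per_sequence_counts(fast, k), len(fast)
    else:
        cs, ns = per_word_counts(slow, k)
        cf, nf = per_word_counts(fast, k)
    words = sorted(set(cs) | set(cf))
    m = len(words)  # number of tests for the Bonferroni correction
    rows: List[Row] = []
    for w in words:
        z, p = z_test(cs[w], ns, cf[w], nf)
        rows.append((w, z, p, min(1.0, p * m)))
    rows.sort(key=lambda r: r[1])
    return rows
```

Given the two sequence sets as lists of strings, `rank_words(slow, fast, 4)` yields rows in the order of the first table, and `per_sequence=False` switches to the second counting scheme.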

Supplemental Figure 1: Variation in the distribution of t1/2 with number of copies of UGUA

Distribution of half-life measurements, with transcripts grouped by the number of copies of UGUA in the final 100 nucleotides of the predicted 3'-UTR. All distributions fit a log-normal curve well. Note that the distributions of the UGUA-containing transcripts (red and blue plots) show a small but consistent shift toward shorter half-lives throughout the entire distribution when compared to the UGUA-absent transcripts. This shift is consistent with the introduction of a small population of fast-turnover transcripts mediated by a UGUA-containing motif.
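The grouping and log-normal fit described in this caption can be sketched as follows, assuming each transcript is given as its final 100 nt and its measured half-life. The grouping cap (two or more copies pooled into one group) and the function names are illustrative choices, not taken from the figure. For a log-normal distribution, the maximum-likelihood parameters are the mean and population standard deviation of ln(t1/2), and exp(mu) is the fitted median, so a shift of the whole distribution appears as a change in mu.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


def count_motif(seq: str, motif: str = "UGUA") -> int:
    """Number of (possibly overlapping) occurrences of `motif` in `seq`."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def group_by_copies(records: List[Tuple[str, float]], max_group: int = 2) -> Dict[int, List[float]]:
    """Group half-lives by UGUA copy number; counts >= max_group share one group."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for seq, t_half in records:
        groups[min(count_motif(seq), max_group)].append(t_half)
    return dict(groups)


def fit_lognormal(half_lives: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood log-normal fit: mean and population SD of ln(t1/2)."""
    logs = [math.log(t) for t in half_lives]
    return statistics.fmean(logs), statistics.pstdev(logs)


def summarize(records: List[Tuple[str, float]]) -> Dict[int, Tuple[int, float, float]]:
    """Per group: (number of transcripts, fitted median half-life exp(mu), sigma)."""
    out = {}
    for copies, values in sorted(group_by_copies(records).items()):
        mu, sigma = fit_lognormal(values)
        out[copies] = (len(values), math.exp(mu), sigma)
    return out
```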

Supplemental Figure 2 (not referenced in the paper): Sequence Logo Representation of Gibbs Sampler Analysis of the SLOW and FAST sets


Panels: SLOW, FAST (sequence logo images not reproduced)
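The Gibbs sampler itself is not reproduced here. As a minimal sketch of how a sequence logo is derived from the aligned sites such a sampler reports, the code below computes per-position information content in bits (log2 4 minus the Shannon entropy of each column) and the height of each letter, assuming a uniform background and no small-sample correction. The function names are hypothetical, and the logos in the figure may have been generated with different settings.

```python
import math
from typing import Dict, List

ALPHABET = "ACGU"


def column_frequencies(sites: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies for equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("aligned sites must all have the same length")
    freqs = []
    for j in range(width):
        column = [s[j] for s in sites]
        freqs.append({b: column.count(b) / len(column) for b in ALPHABET})
    return freqs


def information_content(freqs: List[Dict[str, float]]) -> List[float]:
    """Bits per column: log2(4) minus Shannon entropy (uniform background assumed)."""
    out = []
    for f in freqs:
        h = -sum(p * math.log2(p) for p in f.values() if p > 0)
        out.append(math.log2(len(ALPHABET)) - h)
    return out


def letter_heights(freqs: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Height of each letter in each stack: its frequency times the column's information content."""
    ic = information_content(freqs)
    return [{b: f[b] * r for b in ALPHABET} for f, r in zip(freqs, ic)]
```

A fully conserved column scores 2 bits and a column with all four bases equally frequent scores 0, so conserved motif cores such as UGUA stand out as tall stacks.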

Analysis of UGUA-flanking sequences

coming soon...