Supplementary materials for
"Variations in yeast 3'-processing
cis-elements correlate with transcript stability"
Joel H. Graber, in press, Trends in Genetics, Sept 2003
Abstract
A large set of yeast mRNA 3'-processing regulatory sequences was analyzed statistically, revealing a systematic variation that correlates with measured mRNA stability. Transcripts with relatively short half-lives have a higher frequency of inclusion of 3'-processing elements that include the core sequence of binding sites for the PUF proteins, which enhance mRNA turnover. These results suggest that regulatory sequence variation, typically modeled as random, may instead arise from the necessity or advantage of specifying multiple functions in a common sequence element.
Supplemental Information and Data
Sequences
The sequences for analysis were obtained by combining the transcript half-life measurements made available by Pat Brown's group at Stanford, from the associated manuscript, Wang Y et al., "Precision and functional specificity in mRNA decay" (2002) PNAS 99:5860-5865, with the Hidden Markov Model (HMM)-based 3'-processing site prediction tool from Graber JH, McAllister GD, and Smith TF, "Probabilistic prediction of Saccharomyces cerevisiae mRNA 3'-processing sites." (2002) Nucleic Acids Res 30:1851-8. The sequence files contain the final 100 nt of the predicted transcripts, based on the most probable 3'-processing site:
Supplemental Table 1: Tests on equality of the proportions displayed in Figure 2 of the manuscript
The large sample sizes allow use of the normal approximation to the binomial distribution to test for equality of the proportions. Z is the standard normal test statistic for the difference between the two measured proportions. p is the two-sided probability of obtaining a value of Z at least this extreme if the two underlying proportions were equal.
| sequence | comparison | Z | p |
| UGUAUA or UAUGUA | FAST-SLOW | 4.72 | 2.33E-06 |
| UGUAUA or UAUGUA | FAST-random | 4.53 | 5.85E-06 |
| UGUAUA or UAUGUA | SLOW-random | -2.29 | 0.022 |
| UAUAUA | FAST-SLOW | 0.92 | 0.357 |
| UAUAUA | FAST-random | 0.52 | 0.602 |
| UAUAUA | SLOW-random | -0.78 | 0.436 |
| UACAUA | FAST-SLOW | 0.72 | 0.471 |
| UACAUA | FAST-random | 0.43 | 0.667 |
| UACAUA | SLOW-random | -0.59 | 0.558 |
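As a minimal sketch of the test described above, the following Python computes a pooled two-proportion Z statistic and its two-sided p-value. The function name `two_proportion_z` is illustrative and does not come from the manuscript, and the pooled standard error is an assumption about how the test was carried out. The example uses the per-sequence UGUA counts from Supplemental Table 2; the set sizes (379 SLOW, 393 FAST) are not stated in the text and are inferred here by dividing each count by its reported fraction.

```python
import math

def two_proportion_z(count_a, n_a, count_b, n_b):
    """Pooled two-proportion Z test (normal approximation to the binomial).

    Returns (Z, two-sided p) under the null hypothesis that both samples
    share one underlying proportion. Z is positive when sample A has the
    larger observed proportion.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    # Pooled estimate of the common proportion under the null hypothesis.
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided

# UGUA, per-sequence counts from Supplemental Table 2 (SLOW first, FAST second).
# Set sizes are inferred as count / fraction; they are not given in the text.
z, p = two_proportion_z(195, 379, 270, 393)
print(f"Z = {z:.3f}, p = {p:.3g}")
```

With these inputs the statistic comes out close to the value tabulated for UGUA in Supplemental Table 2, which is consistent with (but does not confirm) the use of a pooled standard error.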
Supplemental Table 2: Comparison of word frequencies in the SLOW and FAST sets
This table is available under two different analyses:
- In the first version, words are counted on a per-sequence basis: the count for a word is the number of sequences that contain at least one copy of that word. Z and p are defined as in Supplemental Table 1, with Z computed as SLOW minus FAST, so negative values mark words that are more frequent in the FAST set. The words are sorted by Z-score. The top and bottom five rows are shown below, and the complete table is available in either html or tab-delimited text formats. The probability values are computed under the assumption of a single test. At the very least, a Bonferroni correction or some similar compensation for multiple testing should be applied.
| word | SLOW count | SLOW fraction | FAST count | FAST fraction | Z | p |
| UGUA | 195 | 0.514512 | 270 | 0.687023 | -4.896047895 | 9.79227E-07 |
| UGUAU | 108 | 0.28496 | 163 | 0.414758 | -3.777365604 | 0.000158547 |
| UAUGUA | 65 | 0.171504 | 108 | 0.274809 | -3.44119599 | 0.000579246 |
| AUGUA | 102 | 0.269129 | 151 | 0.384224 | -3.405940422 | 0.000659468 |
| UGUAUA | 65 | 0.171504 | 104 | 0.264631 | -3.128226086 | 0.001758783 |
| ... | ... | ... | ... | ... | ... | ... |
| GGUUUU | 22 | 0.0580475 | 7 | 0.0178117 | 2.939298967 | 0.003289692 |
| GCC | 166 | 0.437995 | 131 | 0.333333 | 2.988054601 | 0.002807732 |
| GUUU | 197 | 0.519789 | 161 | 0.409669 | 3.067249102 | 0.002160525 |
| AACUUC | 15 | 0.0395778 | 2 | 0.00508906 | 3.264404125 | 0.001097067 |
| UUCAA | 96 | 0.253298 | 61 | 0.155216 | 3.384748915 | 0.000712539 |
- In the second version, words are counted relative to the total number of words of the same size; for example, of all tetramers counted in the SLOW set, 290, or 0.789%, were UGUA. As above, the top and bottom rows of the sorted table are shown below, and the complete table is available in either html or tab-delimited text formats. As in the first version, the probability values are computed under the assumption of a single test, so a Bonferroni correction or some similar compensation for multiple testing should be applied.
| word | SLOW count | SLOW fraction | FAST count | FAST fraction | Z | p |
| UGUA | 290 | 0.00788837 | 439 | 0.0115359 | -5.077765312 | 3.82551E-07 |
| AUAU | 900 | 0.0244811 | 1093 | 0.0287216 | -3.601119673 | 0.000316924 |
| AU | 4534 | 0.120839 | 5023 | 0.129322 | -3.541646742 | 0.00039772 |
| UGUAU | 141 | 0.00387533 | 213 | 0.00565557 | -3.51137193 | 0.000445887 |
| ... | ... | ... | ... | ... | ... | ... |
| AAAAA | 541 | 0.0148692 | 460 | 0.0122139 | 3.12831989 | 0.001758222 |
| AACUUC | 15 | 0.000416609 | 2 | 5.36639E-05 | 3.261813759 | 0.00110714 |
| GUU | 641 | 0.0172581 | 548 | 0.014253 | 3.319558351 | 0.000901714 |
| UUCAA | 108 | 0.00296834 | 63 | 0.00167277 | 3.672182253 | 0.000240553 |
| AAAAAA | 340 | 0.00944313 | 248 | 0.00665432 | 4.278118528 | 1.88611E-05 |
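The two counting schemes used for Supplemental Table 2 can be sketched as follows. This is an illustration under assumptions, not the code used for the analysis: the function names are hypothetical, occurrences are counted over overlapping windows, and the three toy sequences are invented for the example rather than taken from the data set. A Bonferroni helper is included because the text recommends that correction; the number of tests `m` has to be supplied by the reader, since the size of the complete tables is not stated here.

```python
from collections import Counter

def kmers(seq, k):
    """Yield every overlapping window of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def per_sequence_counts(seqs, k):
    """For each word, the number of sequences containing at least one copy."""
    counts = Counter()
    for seq in seqs:
        counts.update(set(kmers(seq, k)))  # set(): each word counted once per sequence
    return counts

def per_word_counts(seqs, k):
    """For each word, its total occurrences, plus the total number of windows."""
    counts = Counter()
    for seq in seqs:
        counts.update(kmers(seq, k))
    return counts, sum(counts.values())

def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m tests, capped at 1."""
    return min(1.0, p * m)

# Toy input, invented for illustration only.
toy = ["UGUAUGUA", "AAUGUAAA", "CCCCCCCC"]

by_seq = per_sequence_counts(toy, 4)
by_word, total = per_word_counts(toy, 4)

print(by_seq["UGUA"], len(toy))  # sequences containing UGUA, out of all sequences
print(by_word["UGUA"], total)    # UGUA occurrences, out of all tetramer windows
```

The overlapping-window assumption is at least consistent with the SLOW column of the second table: 290 / 0.00788837 is approximately 36,763 tetramers, which equals 379 x 97, the number of overlapping tetramer windows in 379 sequences of 100 nt. The set size of 379 is itself inferred from the first table (195 / 0.514512), not stated in the text.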
Supplemental Figure 1: Variation in the distribution of t1/2 with number of copies of UGUA
Distribution of half-life measurements, with transcripts grouped based on the number of copies of UGUA in the final 100 nucleotides of the projected 3'-UTR. All distributions fit a log-normal curve well. Note that the distributions of the UGUA-containing transcripts (red and blue plots) show a small but consistent shift throughout the entire distribution when compared to the UGUA-absent transcripts. This shift is consistent with the introduction of a small population of fast-turnover transcripts mediated by a UGUA-containing motif.
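The grouping behind this figure can be sketched roughly as below. Everything specific here is an assumption made for illustration: the function names, the choice to pool transcripts with two or more copies into one group, and the toy records (sequence, half-life), which are invented and are not measurements from the data set. The sketch summarizes each group by the sample mean and standard deviation of ln(t1/2), the usual parameter estimates when a log-normal shape is assumed.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def count_copies(seq, word="UGUA"):
    """Count occurrences of word in seq, scanning every start position."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

def lognormal_summary_by_copies(records, max_group=2):
    """Group half-lives by UGUA copy number and summarize ln(t1/2).

    records: iterable of (sequence, half_life) pairs.
    max_group: copy numbers at or above this value share one group
               (an assumption; the figure's exact grouping is not stated).
    Returns {group: (mean_ln, sd_ln, n)}.
    """
    groups = defaultdict(list)
    for seq, half_life in records:
        n_copies = min(count_copies(seq), max_group)
        groups[n_copies].append(math.log(half_life))
    summary = {}
    for n_copies, logs in sorted(groups.items()):
        sd = stdev(logs) if len(logs) > 1 else 0.0  # stdev needs >= 2 values
        summary[n_copies] = (mean(logs), sd, len(logs))
    return summary

# Toy records, invented for illustration only: (3'-end sequence, half-life).
records = [
    ("AAUGUAAA", 10.0),
    ("UGUACCUGUA", 8.0),
    ("CCCCCCCC", 20.0),
    ("GGGGAAAA", 30.0),
    ("AUGUAUCC", 12.0),
]

summary = lognormal_summary_by_copies(records)
for n_copies, (mu, sd, n) in summary.items():
    print(n_copies, round(mu, 3), round(sd, 3), n)
```

A shift between groups would then appear as a difference in the mean of ln(t1/2) with a similar spread, which is the pattern the caption describes for UGUA-containing versus UGUA-absent transcripts.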
Supplemental Figure 2 (not referenced in the paper): Sequence Logo Representation of Gibbs Sampler Analysis of the SLOW and FAST sets
[Sequence logo panels: SLOW, FAST]
Analysis of UGUA-flanking sequences
coming soon...