Abstract
A large set of yeast mRNA 3'-processing regulatory sequences was analyzed statistically, revealing a systematic variation that correlates with measured mRNA stability. Transcripts with relatively short half-lives more frequently contain 3'-processing elements that include the core sequence of binding sites for the PUF proteins, which enhance mRNA turnover. These results suggest that regulatory sequence variation, typically modeled as random, may instead arise from the necessity or advantage of specifying multiple functions in a common sequence element.
Supplemental Information and Data
Sequences
The sequences for analysis were obtained by combining the transcript half-life measurements made available by Pat Brown's group at Stanford, from the associated manuscript, Wang Y et al, "Precision and functional specificity in mRNA decay" (2002) PNAS 99:5860-5865, with the Hidden Markov Model (HMM)-based 3'-processing site prediction tool from Graber JH, McAllister GD, and Smith TF, "Probabilistic prediction of Saccharomyces cerevisiae mRNA 3'-processing sites." (2002) Nucleic Acids Res 30:1851-8. The sequence files contain the final 100 nt of the predicted transcripts, based on the most probable 3'-processing site.
The large sample sizes allow use of the normal approximation to the binomial distribution to test for equality of the proportions. Z is the standard normal test statistic. p is the corresponding two-sided p-value: the probability, if the two underlying proportions were equal, of observing a difference at least as large as the one measured.

sequence  comparison  Z  p
UGUAUA or UAUGUA  FAST-SLOW  4.72  2.33E-06
UGUAUA or UAUGUA  FAST-random  4.53  5.85E-06
UGUAUA or UAUGUA  SLOW-random  2.29  0.022
UAUAUA  FAST-SLOW  0.92  0.357
UAUAUA  FAST-random  0.52  0.602
UAUAUA  SLOW-random  0.78  0.436
UACAUA  FAST-SLOW  0.72  0.471
UACAUA  FAST-random  0.43  0.667
UACAUA  SLOW-random  0.59  0.558
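The test described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the pooled-variance form of the two-proportion test; the function name is hypothetical. The sample sizes used in the example (379 SLOW and 393 FAST sequences) are not stated in the text; they are inferred by dividing the UGUA counts by the fractions reported in Supplemental Table 2.

```python
import math


def two_proportion_z(count_a, n_a, count_b, n_b):
    """Pooled two-proportion test using the normal approximation.

    Returns (Z, p), where Z is positive when set A has the larger
    proportion and p is the two-sided p-value.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    # Pooled estimate of the common proportion under the null hypothesis.
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# UGUA, per-sequence counts from Supplemental Table 2:
# 270 FAST sequences and 195 SLOW sequences contain the word.
# The totals 393 and 379 are inferred (count / fraction), not stated.
z, p = two_proportion_z(270, 393, 195, 379)
print(f"Z = {z:.3f}, p = {p:.3e}")
```

With these inferred totals, the sketch agrees with the tabulated Z and p for UGUA to within rounding, which suggests (but does not confirm) that the pooled form was used.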

Supplemental Table 2: Comparison of word frequencies in the SLOW and FAST sets
This table is available under two different analyses:
 In the first version, words are measured on a per-sequence basis: the measured count is the number of sequences that contain at least one copy of that word. Z and p are defined as in Table 1. The words are then sorted by Z-score. The top and bottom five rows are shown below, and the complete table is available in either html or tab-delimited text formats. The probability values are computed under the assumption of a single test. At the very least, a Bonferroni correction or some similar compensation for multiple testing should be performed.
word  SLOW count  SLOW fraction  FAST count  FAST fraction  Z  p
UGUA  195  0.514512  270  0.687023  4.896047895  9.79227E-07
UGUAU  108  0.28496  163  0.414758  3.777365604  0.000158547
UAUGUA  65  0.171504  108  0.274809  3.44119599  0.000579246
AUGUA  102  0.269129  151  0.384224  3.405940422  0.000659468
UGUAUA  65  0.171504  104  0.264631  3.128226086  0.001758783
...
GGUUUU  22  0.0580475  7  0.0178117  -2.939298967  0.003289692
GCC  166  0.437995  131  0.333333  -2.988054601  0.002807732
GUUU  197  0.519789  161  0.409669  -3.067249102  0.002160525
AACUUC  15  0.0395778  2  0.00508906  -3.264404125  0.001097067
UUCAA  96  0.253298  61  0.155216  -3.384748915  0.000712539
 In the second version, words are measured as their count relative to the total number of words of the same size that were counted. For example, of all tetramers counted in the SLOW set, 290 (0.789%) were UGUA. As above, the top and bottom rows are shown below, and the complete table is available in either html or tab-delimited text formats. The probability values are computed under the assumption of a single test. At the very least, a Bonferroni correction or some similar compensation for multiple testing should be performed.
word  SLOW count  SLOW fraction  FAST count  FAST fraction  Z  p
UGUA  290  0.00788837  439  0.0115359  5.077765312  3.82551E-07
AUAU  900  0.0244811  1093  0.0287216  3.601119673  0.000316924
AU  4534  0.120839  5023  0.129322  3.541646742  0.00039772
UGUAU  141  0.00387533  213  0.00565557  3.51137193  0.000445887
...
AAAAA  541  0.0148692  460  0.0122139  -3.12831989  0.001758222
AACUUC  15  0.000416609  2  5.36639E-05  -3.261813759  0.00110714
GUU  641  0.0172581  548  0.014253  -3.319558351  0.000901714
UUCAA  108  0.00296834  63  0.00167277  -3.672182253  0.000240553
AAAAAA  340  0.00944313  248  0.00665432  -4.278118528  1.88611E-05
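The two counting schemes, and the Bonferroni adjustment recommended for both, can be sketched as follows. This is an illustrative sketch, not the code used to build the tables; all function names are hypothetical. It assumes that words are counted in overlapping windows, which is consistent with the SLOW denominators (for example, 290 / 0.00788837 is about 36763, or 97 tetramers for each of 379 sequences of 100 nt, where 379 is itself inferred from the per-sequence counts). The number of tests used in the Bonferroni example is likewise an assumption, since the size of the complete table is not stated here.

```python
from collections import Counter


def per_sequence_count(sequences, word):
    """Number of sequences that contain at least one copy of `word`."""
    return sum(1 for seq in sequences if word in seq)


def per_word_frequency(sequences, word):
    """Count of `word` relative to all words of the same length.

    Words are taken from overlapping windows (an assumption).
    Returns (count, total_words, fraction).
    """
    k = len(word)
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    fraction = counts[word] / total if total else 0.0
    return counts[word], total, fraction


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Toy sequences, for illustration only (not from the data set).
toy = ["UGUAUGUA", "CCCCCCCC"]
print(per_sequence_count(toy, "UGUA"))   # sequences containing UGUA
print(per_word_frequency(toy, "UGUA"))   # (count, total tetramers, fraction)

# Hypothetical number of tests: all RNA words of length 2 to 6.
n_tests = sum(4 ** k for k in range(2, 7))
print(bonferroni(9.79227e-07, n_tests))
```

Under that hypothetical test count, the smallest tabulated p-values would remain below 0.05 after adjustment, while most of the other rows shown would not.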
Supplemental Figure 1: Variation in the distribution of t_{1/2} with number of copies of UGUA
Distribution of half-life measurements, with transcripts grouped by the number of copies of UGUA in the final 100 nucleotides of the projected 3'-UTR. All distributions fit a log-normal curve well. Note that the distributions of the UGUA-containing transcripts (red and blue plots) show a small but consistent shift throughout the entire distribution when compared to the UGUA-absent transcripts. This shift is consistent with the introduction of a small population of fast-turnover transcripts mediated by a UGUA-containing motif.
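The grouping behind this figure can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the grouping into 0, 1, and 2 or more copies is a guess at how the plotted groups were formed, and the records are toy values rather than measured half-lives. Each group is summarized by the mean and standard deviation of the log half-lives, which are the parameters of a fitted log-normal distribution.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def count_copies(seq, word="UGUA"):
    """Number of occurrences of `word` in `seq` (overlapping windows)."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def group_by_copies(records, word="UGUA", cap=2):
    """Group half-lives by copy number; counts >= cap share one group."""
    groups = defaultdict(list)
    for seq, t_half in records:
        groups[min(count_copies(seq, word), cap)].append(t_half)
    return dict(groups)


def lognormal_summary(half_lives):
    """Return (mu, sigma, geometric mean) of the log-transformed values."""
    logs = [math.log(t) for t in half_lives]
    mu = mean(logs)
    sigma = stdev(logs) if len(logs) > 1 else 0.0
    return mu, sigma, math.exp(mu)


# Toy (sequence, half-life) records, for illustration only.
records = [
    ("AAAAUGUAAA", 12.0),
    ("CCCCCCCCCC", 30.0),
    ("UGUAUGUACC", 8.0),
    ("GGGGGGGGGG", 20.0),
]
for copies, values in sorted(group_by_copies(records).items()):
    mu, sigma, geo_mean = lognormal_summary(values)
    print(copies, len(values), round(geo_mean, 2))
```

A shift of the whole distribution, as described above, would appear as a lower mu (and geometric mean) in the UGUA-containing groups with a similar sigma.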

SLOW 
FAST 
Analysis of UGUA-flanking sequences
coming soon...