EST library analysis
Analytic Tools
Supplemental Results referenced in the manuscript
Supplemental Figure 1. Mouse libraries: median 3'-UTR length vs. L-divergence from the reference set
In this figure, we show the L-divergence between the transcript length distributions of
the ENSEMBL reference transcript set and each EST library for which we could determine at
least 50 unique putative 3'-processing sites. We plot the median 3'-UTR length (based on the putative
3'-processing sites) as a function of the L-divergence between the EST library and the
ENSEMBL reference transcript set. As shown, the scatter plot naturally segregates into two classes,
for which L = 0.15 acts as an approximate separation. All of the libraries with L > 0.15 come
from the NIH Brain Molecular Anatomy Project, a somewhat expected result
since many of these EST libraries were subject to very
specific and extreme selection on cDNA insert size. Examination of the data for libraries
with L < 0.15 reveals a significant correlation (r = 0.61) between median 3'-UTR length and L,
indicating a general lengthening
of the 3'-UTR distribution with increasing divergence from the reference cDNA set.
This result is consistent with a depletion or absence of short transcripts (and therefore
short 3'-UTR sequences) in many EST libraries.
In further support of the depletion of short transcripts, we examined a number of comparative
plots such as the one shown in Figure 1 of the manuscript, together with the annotation of the associated EST
libraries (data not shown). We found a general depletion of
short (< 500 nucleotides) transcripts in the transcript distributions,
and annotations indicating removal of clones with short cDNA inserts, respectively. The EST
libraries in this plot are restricted to those for which we could identify at least 50 unique 3'-processing sites.
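The comparison described in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for the manuscript. It assumes that the L-divergence is the symmetric, Jensen-Shannon-type measure L(p, q) = K(p, q) + K(q, p), with K(p, q) = sum_i p_i log2(p_i / ((p_i + q_i) / 2)), that length distributions are compared as normalized histograms, and that r is a Pearson correlation coefficient. The function names, the histogram bin width, and the toy inputs are all hypothetical.

```python
import math
from collections import Counter
from statistics import median


def length_histogram(lengths, bin_size=50):
    """Return a normalized histogram {bin_index: probability}.

    bin_size is an assumption; the legend does not state the binning
    used for the transcript length distributions.
    """
    counts = Counter(length // bin_size for length in lengths)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def k_divergence(p, q):
    """K(p, q) = sum_i p_i * log2(p_i / ((p_i + q_i) / 2))."""
    total = 0.0
    for b, p_i in p.items():
        if p_i > 0:
            m_i = 0.5 * (p_i + q.get(b, 0.0))
            total += p_i * math.log2(p_i / m_i)
    return total


def l_divergence(p, q):
    """Assumed definition: L(p, q) = K(p, q) + K(q, p)."""
    return k_divergence(p, q) + k_divergence(q, p)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy transcript lengths (nucleotides); not data from the study.
    reference_lengths = [300, 450, 800, 1200, 2500, 3100]
    library_lengths = [900, 1300, 2600, 3200, 4100]
    library_utr_lengths = [250, 400, 900, 1100, 1500]

    ref_hist = length_histogram(reference_lengths)
    lib_hist = length_histogram(library_lengths)

    # One point of the scatter plot: (L-divergence, median 3'-UTR length).
    print(l_divergence(ref_hist, lib_hist), median(library_utr_lengths))
```

Under this assumed definition with base-2 logarithms, L is 0 for identical distributions and 2 for distributions that share no bins. Applying `pearson_r` to the (L, median 3'-UTR length) pairs of the libraries with L < 0.15 would give a correlation of the kind reported above.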
Supplemental Figure 2. Mouse: 3'-UTR length distribution as a function of transcript length, from PACdb
In this figure, we display a heat map representation of the dependence
of 3'-UTR length on approximate transcript length, based on high-confidence
mouse 3'-processing sites extracted from PACdb (http://harlequin.jax.org/pacdb/).
Since 5'-UTRs are roughly constant in size and relatively short, the
sum of CDS and 3'-UTR lengths was used as a surrogate for transcript length. As shown,
the distribution of 3'-UTR lengths with respect to transcript length is far from uniform,
with longer transcripts preferentially associating with long 3'-UTR sequences.
Accurate assessment of the 3'-UTR length distribution and, more importantly,
comparison of distributions between distinct EST libraries will therefore be highly affected
by the transcript sampling in the EST library. Both transcript
and 3'-UTR lengths were separated into 50-nucleotide bins. The heat map displays
the number of transcripts observed in PACdb for each pairing of 3'-UTR length (y-coordinate)
and transcript length (x-coordinate) bins.
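The binning behind such a heat map can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce the figure: each record is taken to be a hypothetical (CDS length, 3'-UTR length) pair, transcript length is approximated as their sum as described above, and bins are assumed to be half-open intervals of 50 nucleotides starting at 0. The function name and the toy records are invented for the example.

```python
from collections import Counter

BIN_SIZE = 50  # nucleotides, as stated in the legend


def heatmap_counts(records, bin_size=BIN_SIZE):
    """Count transcripts per (transcript-length bin, 3'-UTR-length bin).

    records: iterable of (cds_length, utr3_length) pairs in nucleotides.
    Returns a Counter keyed by (x_bin, y_bin), where x is the transcript
    length surrogate (CDS + 3'-UTR) and y is the 3'-UTR length.
    """
    counts = Counter()
    for cds_length, utr3_length in records:
        transcript_length = cds_length + utr3_length  # 5'-UTR ignored
        x_bin = transcript_length // bin_size
        y_bin = utr3_length // bin_size
        counts[(x_bin, y_bin)] += 1
    return counts


if __name__ == "__main__":
    # Toy records; not data from PACdb.
    toy_records = [(300, 120), (310, 130), (1000, 900)]
    for (x_bin, y_bin), n in sorted(heatmap_counts(toy_records).items()):
        print(x_bin * BIN_SIZE, y_bin * BIN_SIZE, n)
```

Each key identifies one cell of the heat map by the lower edge of its bins, and the count is the value that would be mapped to a color.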