The first display is an overview of the entire submitted
sequence, shown below for the yeast gene CYC1 (YJR048W).
The two plots represent the relative likelihood of each
position being the 3'-most end of the transcript. The gray
plot is generated by the maximum-likelihood (ML) model, while
the blue plot is generated by the discrete state-space model
(DSM), also referred to as a hidden Markov model (HMM).
The magenta dashed line is at a DSM score
of approximately 3.8, the point at which our analysis indicated
that the false positive rate is 1 in 1000.
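
As an illustration of how the magenta threshold line can be read, the
following minimal Python sketch lists the positions whose DSM score
reaches the threshold. It assumes the DSM scores are available as a plain
list with one value per sequence position; the function name and the
example scores are hypothetical and are not part of the server's output.

```python
# Approximate DSM score quoted in the text; the false positive rate
# at this score is stated to be about 1 in 1000.
DSM_THRESHOLD = 3.8


def positions_above_threshold(dsm_scores, threshold=DSM_THRESHOLD):
    """Return the 0-based positions whose DSM score is at or above the threshold."""
    return [i for i, score in enumerate(dsm_scores) if score >= threshold]


# Hypothetical per-position scores, for illustration only.
example_scores = [0.2, 1.1, 4.5, 3.9, 0.7, 3.7]
print(positions_above_threshold(example_scores))
```

Positions returned by such a filter correspond to the parts of the blue
plot that rise to or above the magenta dashed line.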
If the sequence under investigation can be identified as one
of the predicted or known genes from yeast, then the coding
sequence (CDS) is marked by a green horizontal line. In addition,
any known sites, whether published or implied by our EST analysis,
are shown as red asterisks. The y-value used for the known
sites is that of the corresponding DSM prediction.
The ML method is still under development. Its results
are included here because they help to identify possible false
positives in the DSM predictions. The DSM will occasionally
make predictions in regions where matches to the processing
elements are strong, but the surrounding sequence is dissimilar
to typical 3'UTR sequence. (Previous experimental studies,
referenced in our paper, have shown that the intervening sequence
can have an inhibitory effect.) The DSM currently has no way
to penalize these sites, whereas the ML method takes into
account the local sequence content. The characteristic signature
of this type of pseudo-site is a high DSM score but a low ML
score.
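
The pseudo-site signature described above (high DSM score, low ML score)
could be checked with a short Python sketch like the one below. It assumes
both score tracks are available as equal-length lists indexed by position.
The ML cutoff of 1.0 is an arbitrary placeholder chosen for illustration,
since the text does not give a threshold for the ML score; the function
name and example values are likewise hypothetical.

```python
def flag_pseudo_sites(dsm_scores, ml_scores, dsm_threshold=3.8, ml_cutoff=1.0):
    """Return 0-based positions with a high DSM score but a low ML score.

    dsm_threshold is the approximate value quoted in the text.
    ml_cutoff is an assumed placeholder, not a value from the source.
    """
    if len(dsm_scores) != len(ml_scores):
        raise ValueError("DSM and ML score lists must have the same length")
    return [
        i
        for i, (dsm, ml) in enumerate(zip(dsm_scores, ml_scores))
        if dsm >= dsm_threshold and ml < ml_cutoff
    ]


# Hypothetical scores, for illustration only.
dsm = [0.5, 4.2, 4.0, 1.0]
ml = [0.1, 2.5, 0.3, 0.2]
print(flag_pseudo_sites(dsm, ml))
```

Positions flagged this way are candidates for closer inspection rather
than confirmed false positives.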