Output Overview Plot

The first display is an overview of the entire submitted sequence, shown below for the yeast gene CYC1 (YJR048W). The two plots represent the relative likelihood of each position being the 3'-most end of the transcript. The gray plot is generated by the maximum-likelihood (ML) model, while the blue plot is generated by the discrete state-space model (DSM), also referred to as a hidden Markov model (HMM). The magenta dashed line marks a DSM score of approximately 3.8, the point at which our analysis indicated a false positive rate of 1 in 1000.
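
As a rough illustration of how the dashed line can be read, the sketch below picks out the positions whose DSM score reaches the threshold. It is not part of the prediction software: the function name, the list-of-scores input, and the choice to treat scores at or above 3.8 as significant are assumptions made for this example.

```python
# Hypothetical helper, not part of the prediction server.
# ASSUMPTION: per-position DSM scores are available as a plain list of floats,
# and a score at or above the dashed line counts as a confident prediction.
DSM_THRESHOLD = 3.8  # approximate position of the magenta dashed line


def positions_above_threshold(dsm_scores, threshold=DSM_THRESHOLD):
    """Return the 0-based positions whose DSM score reaches the threshold."""
    return [i for i, score in enumerate(dsm_scores) if score >= threshold]


# Made-up scores for five consecutive positions.
example_scores = [0.2, 1.5, 4.1, 3.9, 0.7]
print(positions_above_threshold(example_scores))  # prints [2, 3]
```

Because the value 3.8 is only approximate, it is kept as a parameter so that a stricter or looser cutoff can be passed in.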

If the sequence under investigation can be identified as one of the predicted or known yeast genes, the coding sequence (CDS) is marked by a green horizontal line. In addition, any known sites, whether published or implied by our EST analysis, are shown as red asterisks. The y-value used for each known site is that of the corresponding DSM prediction.

[Figure: CYC1 3' end prediction overview]

The ML method is under continuing development. Its results are included here because they help to identify possible false positives of the DSM prediction. The DSM will occasionally make predictions in regions where matches to the processing elements are strong but the surrounding sequence is dissimilar to typical 3' UTR sequence. (Previous experimental studies, referenced in our paper, have shown that the intervening sequence can have an inhibitory effect.) The DSM currently has no ability to penalize these sites, whereas the ML method takes the local sequence content into account. The characteristic signature of this type of pseudo-site is a high DSM score but a low ML score.
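
The pseudo-site signature described above can be checked mechanically. The following is a minimal sketch, assuming the DSM and ML scores are available as two equal-length lists indexed by sequence position. The function name and the ML cutoff of 1.0 are placeholders chosen for illustration; only the DSM value of roughly 3.8 comes from the plot.

```python
# Hypothetical helper, not part of the prediction server.
DSM_THRESHOLD = 3.8  # approximate DSM score of the magenta dashed line
ML_LOW = 1.0         # ASSUMPTION: arbitrary illustrative cutoff for a "low" ML score


def flag_pseudo_sites(dsm_scores, ml_scores,
                      dsm_threshold=DSM_THRESHOLD, ml_low=ML_LOW):
    """Return 0-based positions with a high DSM score but a low ML score."""
    if len(dsm_scores) != len(ml_scores):
        raise ValueError("DSM and ML score lists must have the same length")
    return [
        i
        for i, (dsm, ml) in enumerate(zip(dsm_scores, ml_scores))
        if dsm >= dsm_threshold and ml < ml_low
    ]


# Made-up scores: position 1 is supported by both models,
# position 2 has a strong DSM score with little ML support.
dsm = [0.5, 4.2, 4.0, 1.0]
ml = [0.1, 3.0, 0.2, 0.1]
print(flag_pseudo_sites(dsm, ml))  # prints [2]
```

Positions flagged this way are candidates for closer inspection rather than confirmed false positives, since the ML method is still under development and its cutoff here is arbitrary.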