harlequin.jax.org
BU Bioinformatics
BMERC

Comments, Questions to Joel Graber



List of Top Sites

Following the overview plot is a list of the most probable sites for 3'-end processing, ordered by decreasing DSM score. As shown below for CYC1, each entry gives the position, e-value, and DSM score of a site. The e-value is the expected number of sites with that score or higher, given the length of the query sequence. Only sites with DSM >= 3.0 are listed; this cutoff corresponds to a probability of roughly 0.008 of occurrence in random sequence.

The probability of occurrence in random sequence was estimated empirically by analyzing 2 x 10^5 nucleotides of random sequence generated with 2nd-order statistics (preserving the nucleotide, dinucleotide, and trinucleotide frequencies) of yeast transcripts (including 5' UTR, CDS, and 3' UTR). The e-value is obtained by multiplying this probability by the length of the query sequence.
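The two steps described above can be sketched in Python. This is only an illustration, not the predictor's actual code: the function names, the toy scores, and the 1000-nucleotide query length are hypothetical, and treating the probability as a per-position tail frequency (the fraction of scored positions at or above a threshold) is an assumption about how the empirical estimate was made.

```python
from typing import Sequence


def empirical_tail_probability(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scored positions whose score is at or above the threshold.

    Assumption: one score per position of the random background sequence.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for s in scores if s >= threshold)
    return hits / len(scores)


def e_value(p_random: float, query_length: int) -> float:
    """Expected number of sites at or above a score in a query of this length."""
    if not 0.0 <= p_random <= 1.0:
        raise ValueError("p_random must be a probability in [0, 1]")
    if query_length < 0:
        raise ValueError("query_length must be non-negative")
    return p_random * query_length


# Toy background scores (hypothetical values, for illustration only).
background_scores = [1.0, 3.5, 2.0, 4.0]
p_cutoff = empirical_tail_probability(background_scores, threshold=3.0)

# At the documented cutoff (DSM >= 3.0, probability about 0.008), a
# hypothetical 1000-nucleotide query would be expected to contain about
# 0.008 * 1000 sites by chance.
expected_sites = e_value(0.008, 1000)
print(p_cutoff, expected_sites)
```

Because the e-value scales linearly with query length, the same DSM score yields a larger e-value in a longer query.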

Top Sites (for CYC1)

position  e-value     DSM
554       0.09826344  4.497124
559       1.266864    3.659934
549       4.144594    3.197224
404       4.364864    3.175544
402       4.993354    3.118584
407       5.988874    3.040034
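
For downstream processing, a list like the one above can be read into records and filtered by score. The sketch below is a hypothetical helper rather than part of the tool; it assumes whitespace-separated columns in the order position, e-value, DSM, and it skips any line (such as the header) that does not start with an integer position.

```python
from typing import List, NamedTuple


class Site(NamedTuple):
    position: int
    e_value: float
    dsm: float


# The CYC1 example from this page, copied as plain text.
TOP_SITES_TEXT = """\
position e-value DSM
554 0.09826344 4.497124
559 1.266864 3.659934
549 4.144594 3.197224
404 4.364864 3.175544
402 4.993354 3.118584
407 5.988874 3.040034
"""


def parse_top_sites(text: str, cutoff: float = 3.0) -> List[Site]:
    """Parse a top-sites listing and keep sites with DSM >= cutoff."""
    sites = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything malformed.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        site = Site(int(fields[0]), float(fields[1]), float(fields[2]))
        if site.dsm >= cutoff:
            sites.append(site)
    # Highest DSM score (most probable site) first.
    return sorted(sites, key=lambda s: s.dsm, reverse=True)


sites = parse_top_sites(TOP_SITES_TEXT)
best = sites[0]
print(best.position, best.e_value, best.dsm)
```

Raising the cutoff argument above 3.0 narrows the list to the stronger candidates; in this example, sorting by decreasing DSM score gives the same order as sorting by increasing e-value.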