List of Top Sites

Following the overview plot comes an ordered list of the most probable sites for 3'-end processing. As shown below for CYC1, the list gives the position, e-value, and DSM score of each site. The e-value is the expected number of sites with that score or higher in random sequence of the same length as the query. Only sites with DSM >= 3.0 are listed; this threshold corresponds roughly to a probability of 0.008 of occurrence at a given position in random sequence.

The probability of occurrence in random sequence was obtained empirically by analyzing 2 x 10^5 nucleotides of sequence generated with second-order statistics (preserving nucleotide, dinucleotide, and trinucleotide frequencies) from yeast transcripts (including 5'UTR, CDS, and 3'UTR). The e-value is obtained by multiplying this probability of occurrence by the length of the query sequence.
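The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used by the server: the function names `tail_probability` and `e_value` are hypothetical, the background scores are assumed to be already computed for the random sequence, and the query length of 1000 in the usage example is made up rather than the actual length of the CYC1 query.

```python
from bisect import bisect_left


def tail_probability(background_scores, threshold):
    """Empirical probability that a position in random sequence scores
    at or above `threshold`.

    `background_scores` is assumed to hold one score per position of the
    randomly generated background sequence (hypothetical input).
    """
    ordered = sorted(background_scores)
    # bisect_left returns the index of the first score >= threshold,
    # so everything from that index onward counts toward the tail.
    n_at_or_above = len(ordered) - bisect_left(ordered, threshold)
    return n_at_or_above / len(ordered)


def e_value(probability, query_length):
    """Expected number of sites at or above a score in a random sequence
    of `query_length` nucleotides: probability times length."""
    return probability * query_length


if __name__ == "__main__":
    # Illustrative only: the cutoff probability quoted in the text (0.008)
    # applied to a made-up query length of 1000 nucleotides.
    print(e_value(0.008, 1000))
```

An e-value well below 1 therefore marks a site whose score would rarely be reached by chance in a sequence of that length, while e-values of several units mean that a few such sites are expected in random sequence alone.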

Top Sites (for CYC1)

position  e-value     DSM
554       0.09826344  4.497124
559       1.266864    3.659934
549       4.144594    3.197224
404       4.364864    3.175544
402       4.993354    3.118584
407       5.988874    3.040034