
Comments, Questions to Joel Graber



Empirical Positioning of the Elements

Structure of the discrete state-space model (DSM) used to model 3'-processing control sequences in yeast. All state-to-state transitions not explicitly labeled have a probability of 1.0. The hexagonal elements are background elements that take on any length in the given range with equal probability. The functional elements e1-e4 are hexamers, with individual nucleotide frequencies determined through analysis with the Gibbs Sampler. Probabilities p1-p4 were optimized empirically through analysis of known processing sites. The cleavage and polyadenylation site lies at the center of the c element. Nucleotide probabilities for the c element were obtained from the 1,352 training sequences.

[Figure: element positioning near 3'-end site]



The positioning of the elements was determined by measuring, for each hexamer, the distribution of its positions in the region near the 3'-processing site across 1,352 putative processing sites. Hexamers with similar positional profiles were then clustered using a k-means algorithm.
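The profile-and-cluster step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function names, the `site_index` and `window` parameters, and the use of plain Lloyd's k-means with squared Euclidean distance are all assumptions, since the text does not specify the window size, the distance measure, or the number of clusters.

```python
import random
from collections import defaultdict

K = 6  # hexamer length


def positional_profiles(sequences, site_index, window):
    """Count how often each hexamer starts at each offset from the site.

    Assumes every sequence is aligned so that the putative processing
    site sits at index `site_index`. Offsets from -window to window-1
    are recorded; the profile list is indexed by offset + window.
    """
    profiles = defaultdict(lambda: [0] * (2 * window))
    for seq in sequences:
        for start in range(len(seq) - K + 1):
            offset = start - site_index
            if -window <= offset < window:
                profiles[seq[start:start + K]][offset + window] += 1
    return dict(profiles)


def normalize(profile):
    """Scale a count profile so it sums to 1 (left unchanged if all zero)."""
    total = sum(profile)
    return [c / total for c in profile] if total else list(profile)


def kmeans(vectors, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per input vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

In use, one would normalize each hexamer's profile before clustering so that hexamers are grouped by the shape of their positional distribution rather than by how often they occur overall.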

The probabilities shown in the model at the top were empirically optimized to p1 = 0.8, p2 = 0.65, p3 = 0.5, and p4 = 0.65. The figure below shows the resulting probability of occurrence for each of the elements e1-e4 in our model.

[Figure: element positioning in the model]
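To give a feel for how such an occurrence profile arises, the sketch below computes the probability that an element sits a given number of nucleotides away from a reference point when it is separated from that point by background spacers of uniformly distributed length. The topology is an assumption: the text does not state how p1-p4 attach to e1-e4, so `p_present` is simply treated as the probability that the element is present at all, and the spacer ranges in the usage example are made up for illustration.

```python
from fractions import Fraction


def uniform_length_pmf(lo, hi):
    """Background element: every length in [lo, hi] is equally likely."""
    n = hi - lo + 1
    return {length: Fraction(1, n) for length in range(lo, hi + 1)}


def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent lengths."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out


def element_offset_pmf(p_present, spacer_ranges):
    """Probability that the element occurs at each total spacer distance.

    p_present     : assumed probability that the element is present
    spacer_ranges : (lo, hi) length ranges of the background spacers
                    between the element and the reference point
    The returned values sum to p_present, not to 1.
    """
    total = {0: Fraction(1)}
    for lo, hi in spacer_ranges:
        total = convolve(total, uniform_length_pmf(lo, hi))
    return {offset: p_present * prob for offset, prob in total.items()}


# p = 0.8 matches p1 in the text; the two spacer ranges are hypothetical.
pmf = element_offset_pmf(Fraction(4, 5), [(0, 3), (1, 2)])
```

Chaining several uniform spacers gives a flat-topped or trapezoidal profile rather than a sharp peak, which is one reason an element's occurrence probability spreads over a range of positions.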