Empirical Positioning of the Elements

Structure of the discrete state-space model (DSM) used to model 3'-processing control sequences in yeast. All state-to-state transitions not explicitly labeled have a probability of 1.0. The hexagonal elements are background elements whose length can take any value in the given range with equal probability. The functional elements e1-e4 are hexamers, with individual nucleotide frequencies determined through analysis with the Gibbs Sampler. Probabilities p1-p4 were optimized empirically by analyzing known processing sites. The cleavage and polyadenylation site lies at the center of the c element. Nucleotide probabilities for the c element were obtained from the 1,352 training sequences.

element positioning near 3'-end site

The positioning of the elements was determined by measuring, for each hexamer, the distribution of its positions in the region near the 3'-processing site across 1,352 putative processing sites. Hexamers with similar positional profiles were then clustered using a k-means algorithm.
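The procedure described above can be illustrated with a minimal sketch. It assumes that the sequences are windows of equal length aligned on the processing site, so that a column index stands in for a position relative to the site. The function names, the normalization of each profile to sum to 1, and the plain Lloyd-style k-means are illustrative choices, not the authors' implementation, and the toy sequences are made up.

```python
import random
from collections import defaultdict


def hexamer_profiles(sequences, k=6):
    """Return, for each k-mer, its normalized distribution over start columns.

    Assumption: all sequences have the same length and are aligned on the
    processing site, so a column index is a position relative to the site.
    """
    width = len(sequences[0]) - k + 1
    counts = defaultdict(lambda: [0] * width)
    for seq in sequences:
        for start in range(width):
            counts[seq[start:start + k]][start] += 1
    profiles = {}
    for kmer, column_counts in counts.items():
        total = sum(column_counts)
        profiles[kmer] = [c / total for c in column_counts]
    return profiles


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, n_clusters, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(n_clusters), key=lambda j: squared_distance(v, centers[j]))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(n_clusters):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy input (made up): two aligned windows of length 12.
toy_sequences = ["AAAAAATTTTTT", "AAAAAATTTTTT"]
toy_profiles = hexamer_profiles(toy_sequences)
toy_kmers = sorted(toy_profiles)
toy_labels = kmeans([toy_profiles[h] for h in toy_kmers], n_clusters=2)
```

Hexamers that receive the same label have similar positional profiles and would be candidates for merging into a single element.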

The probabilities shown in the model at the top were empirically optimized to p1 = 0.8, p2 = 0.65, p3 = 0.5, and p4 = 0.65. The figure below shows the resulting probability of occurrence for each of the elements e1-e4 in our model.
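One way to see how uniform background lengths and the probabilities p1-p4 shape where an element falls is to convolve the length distributions of the segments between that element and the processing site. The sketch below rests on two assumptions not stated in the text: that each p is the probability of passing through the corresponding hexamer rather than bypassing it, and that segment lengths are independent. The background ranges (5 to 10 and 0 to 4) are hypothetical placeholders, not the ranges of the actual model.

```python
def uniform_length(lo, hi):
    """Length distribution of a background element: uniform over lo..hi."""
    n = hi - lo + 1
    return {length: 1.0 / n for length in range(lo, hi + 1)}


def optional_element(p, length=6):
    """Assumed behavior: the hexamer is present with probability p, else skipped."""
    return {length: p, 0: 1.0 - p}


def convolve(a, b):
    """Distribution of the sum of two independent segment lengths."""
    out = {}
    for len_a, prob_a in a.items():
        for len_b, prob_b in b.items():
            out[len_a + len_b] = out.get(len_a + len_b, 0.0) + prob_a * prob_b
    return out


# Hypothetical chain between an upstream element and the processing site:
# background (5..10 nt), optional hexamer with p = 0.65, background (0..4 nt).
distance = convolve(uniform_length(5, 10), optional_element(0.65))
distance = convolve(distance, uniform_length(0, 4))

for offset in sorted(distance):
    print(offset, round(distance[offset], 4))
```

Each printed line gives a distance in nucleotides and its probability under these assumptions; the probabilities sum to 1.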

element positioning in the model