Positioning of the Elements
Structure of the discrete state-space model (DSM) used to model
3'-processing control sequences for yeast. All state-to-state
transitions not explicitly labeled have a probability of 1.0.
The hexagonal elements are background elements that can take
on any length in the given range with equal probability. The
functional elements e1-e4 are hexamers, with
individual nucleotide frequencies determined through analysis
with the Gibbs Sampler. Probabilities p1-p4
were optimized empirically in analysis of known processing
sites. The cleavage and polyadenylation site lies at
the center of the c element. Nucleotide probabilities
for the c element were obtained from the 1,352 training sites.
The positioning of the elements was determined by measuring
the distribution of the positions for each hexamer in the region
near the 3'-processing site for 1,352 putative processing sites.
Hexamers with similar positional profiles were clustered using
a k-means algorithm.
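The profile-and-cluster step described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: it assumes all sequences are windows of equal length aligned on the processing site, and the function names (positional_profile, kmeans), the Euclidean distance, and the farthest-first initialization are choices made for this example only.

```python
import numpy as np

def positional_profile(sequences, hexamer):
    """Normalized histogram of start positions of `hexamer` across aligned windows."""
    length = len(sequences[0])
    counts = np.zeros(length - 5)  # a hexamer can start at length - 5 positions
    for seq in sequences:
        start = seq.find(hexamer)
        while start != -1:  # count overlapping occurrences too
            counts[start] += 1
            start = seq.find(hexamer, start + 1)
    total = counts.sum()
    return counts / total if total else counts

def kmeans(profiles, k, n_iter=20):
    """Plain k-means on profile rows; deterministic farthest-first start (an assumption)."""
    centers = [profiles[0]]
    while len(centers) < k:
        # distance of every profile to its nearest already-chosen center
        d = np.min([np.linalg.norm(profiles - c, axis=1) for c in centers], axis=0)
        centers.append(profiles[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(profiles[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = profiles[labels == j]
            if len(members):  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return labels, centers
```

In use, one profile would be computed per hexamer, the profiles stacked into a matrix, and the resulting cluster labels inspected to see which hexamers share a positional preference.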
The probabilities shown in the model at the top were empirically
optimized to p1 = 0.8, p2 = 0.65, p3 =
0.5, and p4 = 0.65. The figure below shows the resulting
probability of occurrence for each of the elements e1-e4
in our model.
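To illustrate how such occurrence probabilities follow from the model parameters, here is a rough sketch under several assumptions that are not taken from the source: the elements are visited in the linear order e1 to e4, each p value is the probability of entering the corresponding hexamer rather than bypassing it, and each element is preceded by a background element whose length range is invented for the example. Only the four p values come from the text.

```python
import numpy as np

# Optimized probabilities quoted in the text.
P_ELEMENT = {"e1": 0.8, "e2": 0.65, "e3": 0.5, "e4": 0.65}

# HYPOTHETICAL inclusive length ranges of the background element before each hexamer.
BACKGROUND_RANGE = {"e1": (0, 10), "e2": (5, 15), "e3": (0, 5), "e4": (0, 5)}

def uniform_length(lo, hi):
    """Length distribution of a background element: uniform on lo..hi."""
    dist = np.zeros(hi + 1)
    dist[lo:] = 1.0 / (hi - lo + 1)
    return dist

def optional_hexamer(p):
    """Length distribution of an optional hexamer: 6 nt with probability p, else 0."""
    dist = np.zeros(7)
    dist[0] = 1.0 - p
    dist[6] = p
    return dist

def start_distributions(order=("e1", "e2", "e3", "e4")):
    """Probability that each element occurs and starts at a given offset from the model start."""
    offset = np.array([1.0])  # distribution of the sequence length emitted so far
    result = {}
    for name in order:
        # lengths add, so their distributions convolve
        offset = np.convolve(offset, uniform_length(*BACKGROUND_RANGE[name]))
        result[name] = P_ELEMENT[name] * offset
        offset = np.convolve(offset, optional_hexamer(P_ELEMENT[name]))
    return result
```

Under these assumptions each curve sums to the corresponding p value, and its shape is set entirely by the background length ranges, which is why those ranges would need to come from the measured positional distributions rather than from the placeholder values used here.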