The
structure of the discrete state-space model (DSM) used to model
3'-processing control sequences for yeast. All state-to-state
transitions not explicitly labeled have a probability of 1.0.
The hexagonal elements are background elements that can take
on any length in the given range with equal probability. The
functional elements e1-e4 are hexamers, with
individual nucleotide frequencies determined through analysis
with the Gibbs Sampler. Probabilities p1-p4
were optimized empirically in analysis of known processing
sites. Cleavage and polyadenylation occur at
the center of the c element. Nucleotide probabilities
for the c element were obtained from the 1,352 training
sequences.
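The two kinds of emitting elements described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the model: the function names, the length range, the background base frequencies, and the frequency table in the usage note are all hypothetical, since the actual values appear only in the figure.

```python
import random

NUCLEOTIDES = "ACGT"


def sample_background(min_len, max_len, base_freqs, rng):
    """Emit a background element.

    The length is drawn uniformly from min_len..max_len (inclusive), matching
    the "equal probability" rule in the caption. The per-base frequencies are
    an assumption; the caption does not state the background composition.
    """
    length = rng.randint(min_len, max_len)  # randint includes both endpoints
    return "".join(rng.choices(NUCLEOTIDES, weights=base_freqs, k=length))


def sample_hexamer(freq_table, rng):
    """Emit a hexamer from a table of six rows of A, C, G, T frequencies.

    Each row holds the nucleotide frequencies for one hexamer position,
    for example as estimated with a Gibbs sampler.
    """
    if len(freq_table) != 6:
        raise ValueError("a hexamer needs exactly six frequency rows")
    return "".join(
        rng.choices(NUCLEOTIDES, weights=row, k=1)[0] for row in freq_table
    )
```

For example, `sample_background(5, 10, [0.25] * 4, random.Random(0))` returns a string of 5 to 10 nucleotides, and `sample_hexamer` with six identical rows `[1, 0, 0, 0]` always returns `"AAAAAA"`.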
The positioning of the elements was determined by measuring
the distribution of the positions for each hexamer in the region
near the 3'-processing site for 1,352 putative processing sites.
Hexamers with similar positional profiles were clustered using a
k-means algorithm.
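The profile-building and clustering steps might look like the following sketch. It assumes that the sequences are already aligned so that a given index corresponds to the same offset from the processing site, that each profile is a normalized histogram of hexamer start positions, and that similarity is measured by squared Euclidean distance; none of these details are given in the caption, and the function names are hypothetical. The k-means routine is a plain Lloyd's iteration written with the standard library only.

```python
import random
from collections import defaultdict


def positional_profiles(sequences, window, k=6):
    """Map each k-mer to a normalized histogram of its start positions."""
    n_bins = window - k + 1
    counts = defaultdict(lambda: [0] * n_bins)
    for seq in sequences:
        for i in range(min(len(seq), window) - k + 1):
            counts[seq[i:i + k]][i] += 1
    # Every stored k-mer was seen at least once, so the total is never zero.
    return {kmer: [c / sum(hist) for c in hist] for kmer, hist in counts.items()}


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and center update steps."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(n_clusters), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In use, one would pass `list(profiles.values())` to `kmeans` and group the hexamers by the returned labels. The number of clusters is a free parameter here, and the result of k-means depends on the random initialization.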
The probabilities shown in the model at the top were empirically
optimized to p1 = 0.8, p2 = 0.65, p3 =
0.5, and p4 = 0.65. The figure below shows the resulting
probability of occurrence for each of the elements e1-e4
in our model.
