The structure of the discrete state-space model (DSM) used to model
3'-processing control sequences in yeast. All state-to-state
transitions not explicitly labeled have a probability of 1.0.
The hexagonal elements are background elements that can take
on any length in the given range with equal probability. The
functional elements e1-e4 are hexamers, with individual
nucleotide frequencies determined through analysis with the
Gibbs Sampler. Probabilities p1-p4 were optimized empirically
in an analysis of known processing sites. The cleavage and
polyadenylation site is located at the center of the c element.
Nucleotide probabilities for the c element were obtained from
the 1,352 training sequences.
Discrete State-space Models (DSM)
A DSM is fully defined by its state-to-state transition matrix
(F), its emission matrix (H), and its state vector (x).
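As a sketch of what "fully defined" entails, the hypothetical DSM class below simply holds F, H and x and checks that they are consistent probability tables, using the index conventions for F_{i,j} and H_{i,j} defined in the following paragraphs. It assumes every column of F sums to 1, so a terminal state would need, for example, a self-transition of probability 1.0.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]

@dataclass
class DSM:
    """Hypothetical container for a discrete state-space model.

    F[i][j]: probability of a transition from state j to state i.
    H[i][j]: probability of emitting character i while in state j.
    x[j]:    probability that the model is currently in state j.
    """
    F: Matrix
    H: Matrix
    x: List[float]

    def validate(self, tol: float = 1e-9) -> None:
        n = len(self.x)
        if len(self.F) != n or any(len(row) != n for row in self.F):
            raise ValueError("F must be n x n for n states")
        if any(len(row) != n for row in self.H):
            raise ValueError("H must have one column per state")
        values = [v for row in self.F + self.H for v in row] + list(self.x)
        if any(v < 0 for v in values):
            raise ValueError("probabilities must be non-negative")
        # Each column of F (outgoing transitions of state j) and each column
        # of H (emissions of state j) must be a probability distribution.
        for j in range(n):
            if abs(sum(self.F[i][j] for i in range(n)) - 1.0) > tol:
                raise ValueError(f"column {j} of F does not sum to 1")
            if abs(sum(row[j] for row in self.H) - 1.0) > tol:
                raise ValueError(f"column {j} of H does not sum to 1")
        if abs(sum(self.x) - 1.0) > tol:
            raise ValueError("x does not sum to 1")
```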
The transition matrix is defined such that F_{i,j} is the
probability of a transition from state j to state i.
In the DSM for 3'-processing site identification in yeast, all
elements of F are either 0 or 1, other than those defined by
probabilities p1-p4.
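Because F_{i,j} is the probability of moving from state j to state i, each column of F sums to 1 and one time step of the model is the matrix-vector product F x. The sketch below shows this with a hypothetical three-state fragment in which a labeled branch probability p (set to 0.25 purely for illustration, not one of the fitted p1-p4) splits the path; the function name predict is also hypothetical.

```python
def predict(F, x):
    """One transition step: x_next[i] = sum over j of F[i][j] * x[j].

    Since F[i][j] is P(j -> i), every column of F sums to 1 and the
    product keeps x a probability distribution.
    """
    n = len(x)
    return [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]

# Hypothetical fragment: state 0 moves to state 1 with probability p and to
# state 2 with probability 1 - p; states 1 and 2 keep a self-transition of
# 1.0 only to keep the example small.
p = 0.25
F = [
    [0.0,     0.0, 0.0],
    [p,       1.0, 0.0],
    [1.0 - p, 0.0, 1.0],
]
x0 = [1.0, 0.0, 0.0]  # start with certainty in state 0
x1 = predict(F, x0)   # [0.0, 0.25, 0.75]
```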
The emission matrix is defined such that H_{i,j} is the
probability of emitting character i when the model is in state j.
In the DSM for 3'-processing site identification in yeast, the
emission probabilities for all elements other than the background
elements are as shown above.
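Combining F and H gives a filtering recursion in the spirit of the "optimal filtering" named in the White, Stultz and Smith (1994) title: predict the state distribution with F, then reweight each state by the probability of the observed character. The sketch below is the standard hidden-Markov forward recursion written with this section's index conventions; it is an illustration under that assumption, not the procedure from the cited papers. The name forward_filter is hypothetical, observations are row indices into H, and the two-state toy parameters are invented for illustration.

```python
import math

def forward_filter(F, H, x0, observations):
    """Forward recursion with F[i][j] = P(j -> i) and H[c][j] = P(emit c | j).

    x0 is the distribution over the state that emits the first character.
    Returns the filtered state distribution after each observation and the
    log-likelihood of the whole observation sequence.
    """
    n = len(x0)
    x = list(x0)
    filtered = []
    log_lik = 0.0
    for t, c in enumerate(observations):
        if t > 0:
            # Predict: propagate the distribution one step through F.
            x = [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Update: weight each state by its probability of emitting c.
        weighted = [H[c][i] * x[i] for i in range(n)]
        total = sum(weighted)  # P(character t | characters before t)
        if total == 0.0:
            raise ValueError(f"observation {c} at position {t} has zero probability")
        log_lik += math.log(total)
        x = [w / total for w in weighted]
        filtered.append(x)
    return filtered, log_lik

# Toy two-state model (invented values): state 0 is uniform background,
# state 1 prefers A. Rows of H follow the alphabet order.
ALPHABET = "ACGT"
F = [[0.9, 0.2], [0.1, 0.8]]
H = [[0.25, 0.7], [0.25, 0.1], [0.25, 0.1], [0.25, 0.1]]
obs = [ALPHABET.index(ch) for ch in "AAAC"]
filtered, log_lik = forward_filter(F, H, [0.5, 0.5], obs)
```

Normalizing at every step keeps the numbers in range for long sequences, while the accumulated log of each normalizer still recovers the sequence likelihood.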
DSMs and their application to biological sequences are described
in:
White, J.V. (1988) In Spall, J.C. (ed.), Bayesian Analysis of Time
Series and Dynamic Models. Marcel Dekker, New York.
White, J.V., Stultz, C.M. and Smith, T.F. (1994) Protein classification
by stochastic modeling and optimal filtering of amino-acid
sequences. Math Biosci, 119, 35-75.
Stultz, C.M., White, J.V. and Smith, T.F. (1993) Structural
analysis based on state-space modeling. Protein Sci,
2, 305-314.
