Description of DSM - Overview Plot

The structure of the discrete state-space model (DSM) used to model 3'-processing control sequences in yeast. All state-to-state transitions not explicitly labeled have a probability of 1.0. The hexagonal elements are background elements that can take on any length in the given range with equal probability. The functional elements e1-e4 are hexamers, with individual nucleotide frequencies determined through analysis with the Gibbs Sampler. Probabilities p1-p4 were optimized empirically by analysis of known processing sites. The cleavage and polyadenylation site lies at the center of the c element. Nucleotide probabilities for the c element were obtained from the 1,352 training sequences.
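The description above says each background element takes any length in its range with equal probability, but not how this is realized in the state structure. One common construction, given here only as a sketch under that assumption, is a chain of positions where the exit probability after position k equals P(length = k | length >= k). The function names and the range 3-6 are hypothetical; the actual ranges are those given in the overview plot.

```python
from fractions import Fraction


def uniform_length_exit_probs(min_len, max_len):
    """Exit probability after each position (1-based) of a background chain.

    Chosen so that the total length is uniform on min_len..max_len:
    exit(k) = P(L = k | L >= k) = 1 / (max_len - k + 1) for k >= min_len.
    """
    if not 0 < min_len <= max_len:
        raise ValueError("need 0 < min_len <= max_len")
    exits = {}
    for k in range(1, max_len + 1):
        if k < min_len:
            exits[k] = Fraction(0)  # too short: always continue
        else:
            exits[k] = Fraction(1, max_len - k + 1)
    return exits


def implied_length_distribution(exits):
    """Length distribution obtained by walking the chain with these exits."""
    dist = {}
    still_inside = Fraction(1)  # probability of not having exited yet
    for k in sorted(exits):
        dist[k] = still_inside * exits[k]
        still_inside *= 1 - exits[k]
    return dist


# Hypothetical range, for illustration only.
exits = uniform_length_exit_probs(3, 6)
lengths = implied_length_distribution(exits)
for k in sorted(lengths):
    print(k, exits[k], lengths[k])
```

Exact fractions are used so that the uniformity of the resulting length distribution can be checked without rounding error.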

Figure: Overview plot of the S. cerevisiae 3'-end DSM.

Discrete State-space Models (DSM)

A DSM is fully defined by its state-to-state transition matrix (F), its emission matrix (H), and its state vector (x).

The transition matrix is defined such that F_{i,j} = probability of transition from state j to state i. In the DSM for 3'-processing site identification in yeast, all elements are either 0 or 1, other than those defined by probabilities p1-p4.
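The column convention used here (entry i,j is the probability of moving from state j to state i) means that each column of F sums to 1 and that a vector of state probabilities is propagated as x_next = F x. The following is a minimal sketch on a made-up three-state model, not the yeast DSM itself; the branching probability p = 0.25 and all function names are illustrative assumptions.

```python
def make_toy_transition_matrix(p):
    """Toy 3-state model: state 0 branches to state 1 (prob p) or 2 (prob 1 - p).

    F[i][j] = P(next state is i | current state is j), so COLUMNS sum to 1.
    """
    return [
        # from: 0        1    2
        [0.0,     0.0, 0.0],  # to state 0
        [p,       0.0, 0.0],  # to state 1
        [1.0 - p, 1.0, 1.0],  # to state 2 (absorbing)
    ]


def column_sums(F):
    """Sum of each column; every entry should be 1 for a valid F."""
    n = len(F)
    return [sum(F[i][j] for i in range(n)) for j in range(n)]


def step(F, x):
    """Propagate state probabilities one step: x_next[i] = sum_j F[i][j] * x[j]."""
    n = len(F)
    return [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]


F = make_toy_transition_matrix(p=0.25)  # arbitrary illustrative value
x0 = [1.0, 0.0, 0.0]                    # start in state 0 with certainty
x1 = step(F, x0)
x2 = step(F, x1)
print(column_sums(F), x1, x2)
```

As in the model described above, every entry of this toy F is 0 or 1 except for the pair defined by the branching probability.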

The emission matrix is defined such that H_{i,j} = probability of emission of character i when the model is in state j. In the DSM for 3'-processing site identification in yeast, the elements, other than background, are as shown in the overview plot above.
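To illustrate how an emission matrix with this layout is used, the sketch below applies a generic Bayesian update of the state vector after one observed nucleotide: each state's probability is weighted by the probability of emitting that nucleotide, then renormalized. This is a standard discrete-state filtering step and is assumed, not confirmed, to correspond to the recursion in the cited references. The two states and their emission probabilities are invented for illustration.

```python
ALPHABET = "ACGT"

# H[i][j] = P(character ALPHABET[i] | state j); each COLUMN sums to 1.
# State 0: uniform background. State 1: a made-up A-rich position.
H = [
    [0.25, 0.7],  # A
    [0.25, 0.1],  # C
    [0.25, 0.1],  # G
    [0.25, 0.1],  # T
]


def update(H, x, char):
    """Condition state probabilities x on one observed character.

    Returns (posterior, likelihood), where likelihood = P(char | x).
    """
    i = ALPHABET.index(char)
    weighted = [H[i][j] * x[j] for j in range(len(x))]
    likelihood = sum(weighted)
    if likelihood == 0.0:
        raise ValueError("observed character is impossible under this model")
    return [w / likelihood for w in weighted], likelihood


prior = [0.5, 0.5]  # hypothetical prior over the two states
posterior, likelihood = update(H, prior, "A")
print(posterior, likelihood)
```

Observing an A shifts probability toward the A-rich state, because that state's column of H assigns A a higher emission probability than the background column does.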

DSMs and their application to biological sequences are described in:
White, J.V. (1988) In Spall, J.C. (ed.), Bayesian Analysis of Time Series and Dynamic Models. Marcel Dekker, New York.
White, J.V., Stultz, C.M. and Smith, T.F. (1994) Protein classification by stochastic modeling and optimal filtering of amino-acid sequences. Math. Biosci., 119, 35-75.
Stultz, C.M., White, J.V. and Smith, T.F. (1993) Structural analysis based on state-space modeling. Protein Sci., 2, 305-314.