BU Bioinformatics
BMERC


Comments, Questions to Joel Graber



Description of DSM - Overview Plot

The figure below shows the structure of the discrete state-space model (DSM) used to model 3'-processing control sequences in yeast. All state-to-state transitions not explicitly labeled have a probability of 1.0. The hexagonal elements are background elements whose length can take any value in the given range with equal probability. The functional elements e1-e4 are hexamers whose individual nucleotide frequencies were determined through analysis with the Gibbs Sampler. Probabilities p1-p4 were optimized empirically through analysis of known processing sites. The cleavage and polyadenylation position is the center of the c element. Nucleotide probabilities for the c element were obtained from the 1,352 training sequences.

[Figure: overview plot of the S. cerevisiae 3'-end DSM]
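
The equal-probability length rule for the background elements can be illustrated with a short Python sketch. The function names and the range 10-30 are hypothetical and chosen only for illustration; the actual ranges are those given in the overview plot.

```python
from fractions import Fraction


def uniform_length_distribution(min_len, max_len):
    """Give every length in [min_len, max_len] the same probability."""
    if min_len > max_len:
        raise ValueError("min_len must not exceed max_len")
    n_lengths = max_len - min_len + 1
    return {length: Fraction(1, n_lengths)
            for length in range(min_len, max_len + 1)}


def expected_length(distribution):
    """Mean length under a {length: probability} distribution."""
    return sum(length * p for length, p in distribution.items())


# Hypothetical range, NOT taken from the model: 10 to 30 nucleotides.
dist = uniform_length_distribution(10, 30)
mean_len = expected_length(dist)
```

Exact fractions are used so that the probabilities sum to exactly 1, which makes the sketch easy to check by hand.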




Discrete State-space Models (DSM)

A DSM is fully defined by its state-to-state transition matrix (F), its emission matrix (H), and its state vector (x).

The transition matrix is defined such that F_{i,j} is the probability of a transition from state j to state i. In the DSM for 3'-processing site identification in yeast, all elements are either 0 or 1, other than those defined by probabilities p1-p4.
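
As a minimal sketch of this convention, the Python below builds a small transition matrix in which F[i][j] is the probability of moving from state j to state i, so each column sums to 1, and advances a state vector by one step. The four-state topology and the value 0.75 for p1 are assumptions made for illustration; they are not the topology or the optimized probabilities of the yeast model.

```python
def make_transition_matrix(p1):
    """Hypothetical 4-state model (not the yeast DSM topology).

    States: 0 = upstream background, 1 = functional element,
            2 = bypass, 3 = downstream (absorbing).
    F[i][j] = P(next state is i | current state is j).
    """
    F = [[0.0] * 4 for _ in range(4)]
    F[1][0] = p1         # labeled branch: background -> element
    F[2][0] = 1.0 - p1   # complementary branch: background -> bypass
    F[3][1] = 1.0        # unlabeled transitions have probability 1.0
    F[3][2] = 1.0
    F[3][3] = 1.0        # downstream state loops on itself
    return F


def columns_sum_to_one(F, tol=1e-12):
    """Check that every column of F is a probability distribution."""
    n = len(F)
    return all(abs(sum(F[i][j] for i in range(n)) - 1.0) <= tol
               for j in range(n))


def step(F, x):
    """One step of the state vector: x_next[i] = sum_j F[i][j] * x[j]."""
    n = len(F)
    return [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]


F = make_transition_matrix(0.75)  # 0.75 is an illustrative value for p1
x0 = [1.0, 0.0, 0.0, 0.0]         # start in the upstream background
x1 = step(F, x0)
x2 = step(F, x1)
```

Because the source state indexes the column, the update is a matrix-vector product with the state vector on the right.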

The emission matrix is defined such that H_{i,j} is the probability of emitting character i when the model is in state j. In the DSM for 3'-processing site identification in yeast, the elements, other than background, are as shown above.
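
To make the indexing concrete, the sketch below stores a hypothetical emission matrix for one hexamer element, with one column per hexamer position (state) and one row per nucleotide, and scores a candidate hexamer by summing log emission probabilities. The frequencies are invented for illustration and are not the Gibbs Sampler results; the alphabet order A, C, G, T and the function name are also assumptions.

```python
import math

NUCLEOTIDES = "ACGT"  # assumed row order of the emission matrix

# Hypothetical emission matrix: H[i][j] = P(nucleotide i | hexamer state j).
# The values are made up; each column sums to 1.
H = [
    [0.10, 0.70, 0.10, 0.70, 0.10, 0.70],  # A
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.10],  # C
    [0.10, 0.10, 0.10, 0.10, 0.10, 0.10],  # G
    [0.70, 0.10, 0.70, 0.10, 0.70, 0.10],  # T
]


def hexamer_log_probability(hexamer, H):
    """Log probability of emitting `hexamer` from the six states in order."""
    if len(hexamer) != len(H[0]):
        raise ValueError("sequence length must match the number of states")
    total = 0.0
    for j, nucleotide in enumerate(hexamer.upper()):
        i = NUCLEOTIDES.index(nucleotide)  # ValueError on non-ACGT input
        total += math.log(H[i][j])
    return total


favored = hexamer_log_probability("TATATA", H)
disfavored = hexamer_log_probability("CCCCCC", H)
```

Working in log space avoids underflow when many emission probabilities are multiplied along a long sequence.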

DSMs and their application to biological sequences are described in:
White, J.V. (1988) In Spall, J.C. (ed.), Bayesian Analysis of Time Series and Dynamic Models. Marcel Dekker, New York.
White, J.V., Stultz, C.M. and Smith, T.F. (1994) Protein classification by stochastic modeling and optimal filtering of amino-acid sequences. Math. Biosci., 119, 35-75.
Stultz, C.M., White, J.V. and Smith, T.F. (1993) Structural analysis based on state-space modeling. Protein Sci., 2, 305-314.