Computational studies of 3'-processing in Arabidopsis


Post-transcriptional processing of mRNA is an important mechanism of gene regulation. This project is concerned with 3'-processing of mRNA, a process that includes cleavage of the precursor RNA sequence and subsequent polyadenylation. 3'-processing sites are determined by cis-acting signal elements, short control sequences within the immature precursor mRNA. The signals in yeast have been well characterized, but studies in Arabidopsis have only begun to identify the precise sequence and positioning characteristics of the required cis-elements.

Like alternative splicing, alternative 3'-processing of a specific mRNA is a mechanism for regulating the sequence of the mature mRNA. Unlike alternative splicing, however, alternative 3'-processing typically changes only the 3'-untranslated region (3'-UTR) of the mRNA, leaving the protein coding sequence unaltered. Varying the 3'-UTR sequence can add, remove, or alter regulatory elements located in the 3'-UTR, such as elements that control mRNA stability, translation, or localization.
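As a toy illustration of this point, the following Python sketch cleaves one precursor at two alternative sites and compares the products. The sequences, the site positions, the tail length, and the function names are all invented for illustration, and the DNA alphabet is used for simplicity; none of this is data from the project.

```python
def mature_mrna(precursor, cleavage_site, poly_a_length=10):
    """Cleave the precursor after `cleavage_site` bases and add a poly(A) tail."""
    return precursor[:cleavage_site] + "A" * poly_a_length


def three_prime_utr(mrna, cds_length, poly_a_length=10):
    """Return the region between the end of the coding sequence and the tail."""
    return mrna[cds_length:len(mrna) - poly_a_length]


# Hypothetical toy sequences (not real Arabidopsis data).
CDS = "ATGGCTTGA"                          # start codon, one codon, stop codon
DOWNSTREAM = "GTTTCAATAAAGTCTTGTATTTGC"    # invented downstream region
precursor = CDS + DOWNSTREAM

# Two alternative cleavage sites, chosen arbitrarily for the example.
proximal = mature_mrna(precursor, len(CDS) + 8)
distal = mature_mrna(precursor, len(CDS) + 20)

utr_proximal = three_prime_utr(proximal, len(CDS))
utr_distal = three_prime_utr(distal, len(CDS))

# The coding sequence is identical in both products; only the 3'-UTR differs.
same_cds = proximal[:len(CDS)] == distal[:len(CDS)]
```

Both products carry the same coding sequence, but the distal product has a longer 3'-UTR that may contain regulatory elements absent from the proximal one.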

The specific goals of this project are:

  1. establishment of a curated database of experimentally determined 3'-processing sites for Arabidopsis,
  2. construction of a predictive tool for Arabidopsis mRNA 3'-processing sites based on a discrete state-space model (DSM), and
  3. creation of a freely accessible web server interface to both the database and the predictive tools.

DSMs are a form of Hidden Markov Model (HMM) in which the model structure is designed manually rather than automatically. DSM-based models have previously been used to predict 3'-processing sites in the yeast Saccharomyces cerevisiae, which has 3'-processing control elements similar to those found in plants.
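To make the idea of a manually designed model structure concrete, here is a minimal Python sketch of a three-state, left-to-right HMM decoded with the standard Viterbi algorithm. The state names (UP, SIG, DOWN), the transition and emission probabilities, and the test sequence are hypothetical values chosen for illustration; they are not the project's model or the published yeast model.

```python
import math

# Hand-designed structure: states can only be visited in this order.
STATES = ["UP", "SIG", "DOWN"]  # upstream, A-rich signal, downstream (hypothetical)
START = {"UP": 1.0, "SIG": 0.0, "DOWN": 0.0}
TRANS = {
    "UP":   {"UP": 0.9, "SIG": 0.1, "DOWN": 0.0},
    "SIG":  {"UP": 0.0, "SIG": 0.8, "DOWN": 0.2},
    "DOWN": {"UP": 0.0, "SIG": 0.0, "DOWN": 1.0},
}
# Illustrative emission probabilities, not fitted to any data.
EMIT = {
    "UP":   {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "SIG":  {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    "DOWN": {"A": 0.15, "C": 0.15, "G": 0.15, "T": 0.55},
}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state path for a sequence over A, C, G, T."""
    if not seq:
        return []
    score = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            score[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][base])
            pointers[s] = best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


toy_sequence = "GC" + "A" * 10 + "T" * 10   # invented sequence
toy_path = viterbi(toy_sequence)
```

In this sketch, the zero entries in TRANS are the manual design: they force every path to pass through the states in a fixed order, so a position of interest can be read off as the point where the decoded path changes state.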

Public access to predictive tools will make it possible for external researchers to analyze any gene or group of genes of interest, even in the absence of experimentally determined sites in the curated database. Prediction of 3'-processing sites can also be coupled with gene prediction software to make the prediction more complete, including probable 3'-UTR sequences.

Updates, May 2003

  • The 3'-processing site database has been redesigned, and data from Arabidopsis are being entered. We estimate that full public access to the database will be available on 15 July 2003.
  • Access to the HMM-based prediction of 3'-processing sites in Arabidopsis sequences is expected to be available in August 2003.