
Funded by NIH (NIGMS)

A multispecies comparison of the metazoan 3'-processing downstream elements and the CstF-64 RNA recognition motif

Supplemental Results

Motif Finding

The DSE region spanning 80 nt downstream of the cleavage site was examined with the Gibbs Recursive Sampler, MEME, and Improbizer. From each species, 500 3'-processing site sequences were randomly selected without replacement. As a position-independent control, 500 sequences were generated from a species-specific trained 0th-order model and run in parallel. Several preliminary runs were performed with each program to determine optimal settings, and at least 10 independent production runs were performed for each dataset.

The Gibbs Recursive Sampler extracted 3 variable-length motifs using the command-line options "-E 3 -W 0 -t -n -r -F -i 200 -S 200 -d 1,5,10,2,5,10,3,5,10". Motifs listed in the "suboptimal optimal" output section were used in order to maximize the number of example motifs tabulated. MEME was run on a Beowulf cluster with the options "-dna -mod oops -nmotifs 3 -text -p52 -maxsize 1000000". Improbizer runs used the options "numMotifs=3 background=1 maxOcc=1", and for additional control runs the "controlRun=on" parameter was set.

Motif sequence information from all three programs was gathered with a Perl script and used to make sequence logo images with the WebLogo script. A custom Perl script was written to collect and graph motif positions from the Gibbs Recursive Sampler and MEME output.
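The sampling and control-generation steps described above can be sketched in Python. This is a minimal illustration, not the authors' scripts: the function names are hypothetical, and it assumes the species-specific 0th-order model is trained on that species' own 80-nt DSE sequences (the text does not say which sequences were used for training).

```python
import random
from collections import Counter

BASES = "ACGT"


def sample_sites(sites, n=500, seed=None):
    """Randomly select n 3'-processing site sequences without replacement."""
    rng = random.Random(seed)
    return rng.sample(sites, n)  # raises ValueError if fewer than n sites


def train_order0(seqs):
    """Estimate a 0th-order (position-independent) base composition model.
    ASSUMPTION: trained on the same species' DSE sequences."""
    counts = Counter(b for s in seqs for b in s.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T characters to train on")
    return {b: counts[b] / total for b in BASES}


def generate_control(model, n=500, length=80, seed=None):
    """Draw n sequences of `length` nt; each base is drawn independently
    of its position, using only the model's base frequencies."""
    rng = random.Random(seed)
    weights = [model[b] for b in BASES]
    return ["".join(rng.choices(BASES, weights=weights, k=length)) for _ in range(n)]
```

Because every control base is drawn independently of position, any motif the programs report in the control set can reflect base composition alone, not a positional signal downstream of the cleavage site.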

Hexamer PWC analysis

HEXAMER PWC K-MEANS CLUSTER RESULTS

File downloads

  • FASTA-formatted sequences span plus/minus 200 nt around the putative cleavage site, with headers in the format: >seq_number est_count
  • Tetramer positional word counts (PWC) cover the 80 nt downstream of the putative cleavage site, tallied in 2-nt windows and sorted in decreasing order of each word's positional non-uniformity as measured by S-squared, a chi-squared-like statistic
human (H. sapiens) fasta file tetramer PWC
dog (C. familiaris) fasta file tetramer PWC
rat (R. norvegicus) fasta file tetramer PWC
mouse (M. musculus) fasta file tetramer PWC
chicken (G. gallus) fasta file tetramer PWC
fugu (T. rubripes) fasta file tetramer PWC
zebrafish (D. rerio) fasta file tetramer PWC
mosquito (A. gambiae) fasta file tetramer PWC
fruit fly (D. melanogaster) fasta file tetramer PWC
nematode (C. elegans) fasta file tetramer PWC
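The two file types above can be illustrated with a Python sketch that reads a FASTA file in the stated header format, cuts out the 80-nt downstream region, and tabulates tetramer positional word counts in 2-nt windows. Several details are assumptions, not the authors' procedure: that the cleavage site lies immediately after the first 200 nt of each record, that a word is assigned to the window containing its start position, and that S-squared can be approximated by a Pearson chi-squared statistic against an even positional spread (its exact definition is not given here). All function names are hypothetical.

```python
from collections import defaultdict

FLANK = 200   # each record spans +/-200 nt around the putative cleavage site
DSE_LEN = 80  # length of the downstream region analysed
WINDOW = 2    # positional window size in nt
K = 4         # word length (tetramers)


def read_fasta(lines):
    """Yield (seq_number, est_count, sequence) from '>seq_number est_count' records."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield (*header, "".join(chunks))
            seq_number, est_count = line[1:].split()[:2]
            header, chunks = (seq_number, int(est_count)), []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield (*header, "".join(chunks))


def downstream(seq, flank=FLANK, length=DSE_LEN):
    """Return the DSE region. ASSUMPTION: the cleavage site falls right after
    the first `flank` nt, so the region starts at 0-based index `flank`."""
    return seq[flank:flank + length]


def window_widths(k=K, window=WINDOW, length=DSE_LEN):
    """Number of possible k-mer start positions in each window."""
    n_starts = length - k + 1
    return [min(window, n_starts - w) for w in range(0, n_starts, window)]


def tetramer_pwc(dse_seqs, k=K, window=WINDOW, length=DSE_LEN):
    """Count each k-mer in the window that contains its start position."""
    n_windows = len(window_widths(k, window, length))
    counts = defaultdict(lambda: [0] * n_windows)
    for s in dse_seqs:
        for i in range(min(len(s), length) - k + 1):
            word = s[i:i + k]
            if set(word) <= set("ACGT"):  # skip words containing N, etc.
                counts[word][i // window] += 1
    return dict(counts)


def s_squared(row, widths):
    """HYPOTHETICAL stand-in for S-squared: Pearson chi-squared of a word's
    window counts against counts spread evenly over all start positions."""
    total, span = sum(row), sum(widths)
    if total == 0:
        return 0.0
    expected = [total * w / span for w in widths]
    return sum((o - e) ** 2 / e for o, e in zip(row, expected))


def rank_words(pwc, widths):
    """Sort words by decreasing non-uniformity."""
    return sorted(pwc, key=lambda w: s_squared(pwc[w], widths), reverse=True)
```

Only 77 tetramer start positions fit in an 80-nt region, so the last 2-nt window holds a single start; weighting the expected counts by window width keeps that short window from inflating the statistic.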

Data description


Discriminant Function Scores
Complete Multiple Alignment
AAUAAA-like table