PACdb Download Area


Our interlinked databases now hold over 36 gigabytes of data, so we are unable to post the full contents of the database tables here. However, we are posting organism-wide sequence data, which is available for public use. Please cite PACdb if you use the data in your research.
Thanks for your interest, and feel free to e-mail us if you have any questions or requests.


Organism        | 3'-UTR*                   | PolyA site flanking seq.** | Version/Date
Human           | Very High; High/Very High | Not yet available          | Build 36
Mouse           | Very High; High/Very High | Not yet available          | Build 36
Rat             | Not yet available         | Not yet available          | N/A
Dog             | Not yet available         | Not yet available          | N/A
Chicken         | Not yet available         | Not yet available          | N/A
Zebrafish       | Not yet available         | Not yet available          | N/A
Fugu Puffer     | Not yet available         | Not yet available          | N/A
D. melanogaster | Very High; High/Very High | Not yet available          | 12/19/2005
Mosquito        | Not yet available         | Not yet available          | N/A
C. elegans      | Very High; High/Very High | Not yet available          | 12/19/2005
Rice            | Not yet available         | Not yet available          | N/A
A. thaliana     | High/Very High            | Not yet available          | N/A
S. cerevisiae   | Not yet available         | Not yet available          | N/A
S. pombe        | Not yet available         | Not yet available          | N/A

* The 3'-UTR sequence is the single longest sequence for each gene whose terminating putative cleavage site is of "High" or "Very High" confidence and has been clustered with any adjacent putative cleavage sites using an organism-specific distance threshold (see PACdb paper).

** The flanking sequence is for any site that is of "High" or "Very High" confidence and has been clustered with any adjacent putative cleavage sites using an organism-specific distance threshold (see PACdb paper).

PACdb's FASTA formatting

FASTA-formatted sequences from PACdb use a "rich defline" that carries information about the sequence: genomic information, 3'-processing information, and gene information. Fields are separated by colons, as the example below shows. A field may or may not have a value; when it does not, the field's abbreviation appears in its place. Here are the fields:

Genomic Fields:
genome sequence id (gsid) : could be a chromosome or contig
genome sequence start (gsi)
genome sequence stop (gsf)

3'-Processing Fields:
3'-processing coordinate (pac)
multiplicity, or the number of sequences (ESTs) giving this site (trc)

Gene Fields:
probable gene id (exid)
gene CDS start (tri)
gene CDS stop (trf)


Example:
>1:5695:5895:5895:trc:At1g01010.1:tri:trf
TTCTTTGCTCTGTTTTCTCGCTCCGGAAAAGTTTGAAGTTATATTTTATT
AGTATGTAAAGAAGAGAAAAAGGGGGAAAGAAGAGAGAAGAAAAATGCAG
AAAATCATATATATGAATTGGAAAAAAGTATATGTAATAATAATTAGTGC
ATCGTTTTGTGGTGTAGTTTATATAAATAAAGTGATATATAGTCTTGTAT
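
The defline in the example can be split into its fields programmatically. The following Python function is a minimal sketch, not an official PACdb tool. It assumes that the fields always appear in the order listed above (gsid, gsi, gsf, pac, trc, exid, tri, trf), that the separator is a colon, and that coordinates and multiplicity are integers. These assumptions are inferred from the single example and are not confirmed elsewhere on this page. The names parse_defline, FIELDS, and INT_FIELDS are hypothetical.

```python
from typing import Dict, Optional, Union

# Assumed field order, inferred from the field list and the example defline.
FIELDS = ("gsid", "gsi", "gsf", "pac", "trc", "exid", "tri", "trf")
# Fields assumed to hold integers (coordinates and EST multiplicity).
INT_FIELDS = {"gsi", "gsf", "pac", "trc", "tri", "trf"}


def parse_defline(defline: str) -> Dict[str, Optional[Union[int, str]]]:
    """Split a PACdb-style defline into named fields.

    A field whose value is its own abbreviation (e.g. "trc") is treated
    as absent and returned as None.
    """
    parts = defline.strip().lstrip(">").split(":")
    if len(parts) != len(FIELDS):
        # A gene id containing ':' would also land here; this sketch
        # does not try to handle that case.
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record: Dict[str, Optional[Union[int, str]]] = {}
    for name, raw in zip(FIELDS, parts):
        if raw == name:
            record[name] = None          # placeholder: value not present
        elif name in INT_FIELDS:
            record[name] = int(raw)
        else:
            record[name] = raw
    return record


rec = parse_defline(">1:5695:5895:5895:trc:At1g01010.1:tri:trf")
print(rec)
```

For the example defline, the genomic fields and the 3'-processing coordinate are populated, the gene id is At1g01010.1, and trc, tri, and trf come back as None because only their abbreviations are present.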