Below are detailed explanations of the EST/putative 3'-processing site properties and the confidence level assignment.



EST/Putative 3'-Processing Site Properties

PACdb stores information about each putative 3'-processing site and the ESTs that define it. These properties help decide whether a putative 3'-processing site is real or likely to be false. PACdb stores them in a bitstring called "properties". The table below lists each property that PACdb tracks (or plans to track in the future) and shows how the bits in the properties bitstring encode it.

| Bit Property | Number of bits | How bits are used | Description |
|---|---|---|---|
| Genomic A-rich sequence adjacent to site | 2 | 00: s < 8; 01: 8 ≤ s < 16; 10: 16 ≤ s < 36; 11: s ≥ 36 | Score based on consecutive and non-consecutive length (nts) of genomic poly(A) sequence adjacent to putative 3'-processing site |
| Presence of Repeats | 1 | 0: absent; 1: present | Check for repeats in EST (excluding ubiquitous reps) |
| PolyA/T Tail | 2 | 00: tail absent; 01: trailing polyA present; 10: leading polyT present | Presence of PolyA or PolyT tail |
| Library Wide Contamination (not linker) (NYI) | 1 | 0: absent; 1: present | Presence of library-wide contamination other than unclipped linker sequence |
| EST hits one or multiple places | 1 | 0: unique hit; 1: multiple hits | EST hits in one or multiple places in genome |
| Vector contamination | 1 | 0: absent; 1: present | If EST contains vector sequence at terminal end(s) |
| Reverse complement (NYI) | 1 | 0: forward sense; 1: reverse complemented | If EST is found to be reverse complemented |
| EST Multiplicity | 2 | 00: 1 EST hit; 01: 2-3 ESTs; 10: 4-5 ESTs; 11: 6+ ESTs | Number of ESTs providing evidence for a putative 3'-processing site |
| Restriction Enzyme Site | 2 | 00: RE site absent; 01: imperfect RE site match; 10: perfect RE site match | Terminal restriction enzyme (RE) site check (necessary since restriction enzymes are used in the EST creation process). Due to enzyme star activity, imperfect matches may still be real RE site hits. |
| Sequencing Error Percentile (NYI) | 2 | 00: x ≥ 95; 01: 95 > x ≥ 90; 10: 90 > x ≥ 80; 11: x < 80 | Comparative Positional Sequence Error |
| EST Insert (NYI) | 1 | 0: absent; 1: present | EST has unknown (non-genomic) insert |
| TOTAL BITS | 17 | | |

*NYI: not yet implemented
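
As an illustration of how such a bitstring could be unpacked, here is a minimal Python sketch. It is not PACdb's actual code. The function name `decode_properties`, the field names, and the assumption that fields appear left to right in table order are all hypothetical. The widths listed in the table sum to 16 bits, one fewer than the stated total of 17, so the sketch reads only the first 16 characters and ignores anything after them.

```python
# Hypothetical field layout: (name, width) pairs in the order the table lists
# them. The real position of each field within the bitstring is an assumption.
FIELDS = [
    ("a_rich", 2),
    ("repeats", 1),
    ("polya_tail", 2),
    ("library_contamination", 1),
    ("multiple_hits", 1),
    ("vector_contamination", 1),
    ("reverse_complement", 1),
    ("est_multiplicity", 2),
    ("re_site", 2),
    ("sequencing_error", 2),
    ("est_insert", 1),
]


def decode_properties(bits):
    """Split a string of '0'/'1' characters into per-property bit codes."""
    needed = sum(width for _, width in FIELDS)
    if len(bits) < needed or set(bits) - {"0", "1"}:
        raise ValueError("expected at least %d characters of 0/1" % needed)
    codes = {}
    pos = 0
    for name, width in FIELDS:
        codes[name] = bits[pos:pos + width]  # raw code, e.g. "01"
        pos += width
    return codes


# Example: trailing polyA present (01) and 6+ supporting ESTs (11),
# every other field zero.
props = decode_properties("0000100001100000")
print(props["polya_tail"], props["est_multiplicity"])
```

Each returned code can then be looked up in the "How bits are used" column above.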

Confidence Level Assignment

The table below lists the properties used to calculate the confidence level of a putative 3'-processing site. PACdb does not store the confidence level; it calculates it on demand from the properties bitstring associated with an EST-site pair. Certain properties, depending on their value, strongly indicate a false site. In the table below, these are the fields that are shaded gray. Going from "Very High" to "Very Low" confidence, the likelihood that a putative 3'-processing site is false increases.

| Bit Property | Very High | High | Medium | Low | Very Low |
|---|---|---|---|---|---|
| Genomic A-rich score (false priming) | s ≤ 8 | s ≤ 8 | 8 < s < 16 | 16 ≤ s < 36 | s ≥ 36 |
| EST has unique/multiple hits | Unique | Unique | Unique | Multi | D/M |
| Number of EST hits supporting site | 6+ ESTs | 3-5 ESTs | D/M | D/M | D/M |
| Restriction Enzyme Site | None | None | None | Imperfect match | Perfect match |
| PolyA/T Tail | Present | D/M | D/M | D/M | D/M |

D/M = Doesn't matter
Gray field = if the condition in a gray field is satisfied, it alone decides the confidence level
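
The table can be read as a cascade of checks, roughly as in the Python sketch below. This is one possible reading, not PACdb's actual implementation, and it rests on several assumptions: the names `confidence_level` and `RESite` are hypothetical; the inputs are raw values (the score s and the EST count) rather than bit codes; the Very Low and Low conditions are treated as the decisive fields and tested first; "3-5 ESTs" is read as "at least 3" for High; and any site that fits no other column falls back to Medium.

```python
from enum import Enum


class RESite(Enum):
    """Hypothetical stand-in for the Restriction Enzyme Site property."""
    NONE = 0
    IMPERFECT = 1
    PERFECT = 2


def confidence_level(s, multiple_hits, est_count, re_site, has_tail):
    """Return a confidence label for a putative 3'-processing site.

    s             -- genomic A-rich score
    multiple_hits -- True if the EST hits more than one place in the genome
    est_count     -- number of ESTs supporting the site
    re_site       -- a RESite member
    has_tail      -- True if a polyA/T tail is present
    """
    # Assumption: any single Very Low condition decides the level.
    if s >= 36 or re_site is RESite.PERFECT:
        return "Very Low"
    # Assumption: any single Low condition decides the level.
    if s >= 16 or multiple_hits or re_site is RESite.IMPERFECT:
        return "Low"
    # From here on: s < 16, unique hit, no RE site.
    if s <= 8 and est_count >= 6 and has_tail:
        return "Very High"
    if s <= 8 and est_count >= 3:
        return "High"
    # Assumption: everything else is Medium.
    return "Medium"


print(confidence_level(5, False, 7, RESite.NONE, True))
print(confidence_level(20, False, 7, RESite.NONE, True))
```

Testing the Low and Very Low conditions first means that a single strong indicator of a false site, such as a perfect RE site match, overrides otherwise good evidence.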