Below are detailed explanations of the EST/putative 3'-processing site properties and of confidence level assignment.

EST/Putative 3'-Processing Site Properties

AtPACdb stores information about each putative 3'-processing site and the ESTs that determine it. These properties help decide whether a putative 3'-processing site is real or likely to be false. AtPACdb stores this information in a bitstring called "properties". The table below lists each property that AtPACdb tracks (or plans to track in the future) and shows how the bits in the properties bitstring encode it.

| Bit Property | Number of bits | How bits are used | Description |
|---|---|---|---|
| Genomic A-rich sequence adjacent to site | 2 | 00: s < 8; 01: 8 ≤ s < 16; 10: 16 ≤ s < 36; 11: s ≥ 36 | Score based on consecutive and non-consecutive length (nts) of genomic poly(A) sequence adjacent to putative 3'-processing site |
| Presence of Repeats | 1 | 0: absent; 1: present | Check for repeats in EST (excluding ubiquitous repeats) |
| PolyA | 1 | 0: absent; 1: present | Presence of PolyA tail |
| Library Wide Contamination (not linker) (NYI) | 1 | 0: absent; 1: present | Presence of library-wide contamination other than unclipped linker sequence |
| EST hits one or multiple places | 1 | 0: unique hit; 1: multiple hits | EST hits in one or multiple places in genome |
| Vector contamination | 1 | 0: absent; 1: present | If EST contains vector sequence at terminal end(s) |
| Reverse complement (NYI) | 1 | 0: forward sense; 1: reverse complemented | If EST is found to be reverse complemented |
| EST Multiplicity | 2 | 00: 1 EST hit; 01: 2-3 ESTs; 10: 4-5 ESTs; 11: 6+ ESTs | Number of ESTs providing evidence for a putative 3'-processing site |
| Restriction Enzyme Site | 4 | 0000: RE1 absent, RE2 absent; 0001: RE1 imperfect, RE2 absent; 0010: RE1 perfect, RE2 absent; 0011: RE1 absent, RE2 imperfect; 0100: RE1 imperfect, RE2 imperfect; 0101: RE1 perfect, RE2 imperfect; 0110: RE1 absent, RE2 perfect; 0111: RE1 imperfect, RE2 perfect; 1000: RE1 perfect, RE2 perfect | EST has terminal RE site. Since two restriction enzymes are used, an EST could potentially have both RE sites. Due to enzyme star activity, imperfect matches may still be real RE sites. |
| Sequencing Error Percentile (NYI) | 2 | 00: x ≥ 95; 01: 95 > x ≥ 90; 10: 90 > x ≥ 80; 11: x < 80 | Comparative Positional Sequence Error |
| EST Insert (NYI) | 1 | 0: absent; 1: present | EST has unknown (non-genomic) insert |
| TOTAL BITS | 17 | | |

*NYI: not yet implemented
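
As an illustration of how such a bitstring could be unpacked, here is a minimal Python sketch. It assumes the fields are packed in the order the table lists them, with the first field leftmost, and that the bitstring is available as a 17-character string of '0' and '1'. The source does not state the actual bit order or storage type, and the field names and the function `decode_properties` are hypothetical.

```python
# Field widths come from the properties table; the names are illustrative.
# ASSUMPTION: fields are packed in table order, first field leftmost.
FIELDS = [
    ("a_rich_score", 2),
    ("repeats", 1),
    ("poly_a", 1),
    ("library_contamination", 1),   # NYI
    ("multiple_hits", 1),
    ("vector_contamination", 1),
    ("reverse_complement", 1),      # NYI
    ("est_multiplicity", 2),
    ("restriction_site", 4),
    ("sequencing_error", 2),        # NYI
    ("est_insert", 1),              # NYI
]

TOTAL_BITS = sum(width for _, width in FIELDS)  # 17, matching the table


def decode_properties(bits: str) -> dict:
    """Split a properties bitstring into integer-valued fields."""
    if len(bits) != TOTAL_BITS or set(bits) - {"0", "1"}:
        raise ValueError(f"expected {TOTAL_BITS} characters of '0'/'1'")
    fields = {}
    pos = 0
    for name, width in FIELDS:
        # Interpret each slice as an unsigned binary number.
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width
    return fields


# Example: PolyA tail present, 6+ ESTs (multiplicity bits 11), all else zero.
example = decode_properties("00010000110000000")
```

Under these assumptions, `example["poly_a"]` is 1 and `example["est_multiplicity"]` is 3 (binary 11, meaning 6+ ESTs).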

Confidence Level Assignment

The table below lists the properties used to calculate the confidence level of a putative 3'-processing site. AtPACdb does not store the confidence level; it calculates it on demand from the properties bitstring associated with an EST-site pair. Depending on their degree, certain properties can strongly indicate a false site. In the table below, these are the fields that are shaded gray. Moving from "Very High" confidence to "Very Low" confidence, the likelihood that a putative 3'-processing site is false increases.

| Bit Property | Very High | High | Medium | Low | Very Low |
|---|---|---|---|---|---|
| Genomic A-rich score (false priming) | s < 8 | s < 8 | 8 ≤ s < 16 | 16 ≤ s < 36 | s ≥ 36 |
| EST has unique/multiple hits | Unique | Unique | Unique | Multi | D/M |
| Number of EST hits supporting site | 6+ ESTs | 3-5 ESTs | D/M | D/M | D/M |
| Restriction Enzyme Site | None | None | None | Imperfect match | Perfect match |

D/M = Doesn't matter
Gray field = If the condition in a gray field is satisfied, it alone decides the confidence level
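
One possible reading of this table can be sketched in Python as follows. The exact rule AtPACdb applies, including which fields are gray, is not recoverable from the text here, so this sketch assumes a "worst property wins" interpretation: each property caps the confidence level at the best column whose condition it satisfies, and the site receives the lowest of those caps. The function `confidence_level` and its parameters are hypothetical, and it takes already-decoded values rather than the raw bitstring.

```python
# Levels ordered from worst to best so that min() picks the limiting one.
LEVELS = ["Very Low", "Low", "Medium", "High", "Very High"]


def confidence_level(a_rich_score: int, unique_hit: bool,
                     est_count: int, re_match: str) -> str:
    """Return a confidence level under an assumed 'worst property wins' rule.

    re_match is one of "none", "imperfect", "perfect" (the worse of the
    two restriction enzyme sites). ASSUMPTION: not AtPACdb's confirmed logic.
    """
    # Genomic A-rich score thresholds from the table.
    if a_rich_score < 8:
        s_cap = 4
    elif a_rich_score < 16:
        s_cap = 2
    elif a_rich_score < 36:
        s_cap = 1
    else:
        s_cap = 0

    # Multiple genomic hits first appear in the "Low" column.
    hit_cap = 4 if unique_hit else 1

    # 6+ ESTs allow "Very High", 3-5 allow "High"; from "Medium" down the
    # count does not matter, so fewer ESTs cap the level at "Medium".
    if est_count >= 6:
        count_cap = 4
    elif est_count >= 3:
        count_cap = 3
    else:
        count_cap = 2

    re_cap = {"none": 4, "imperfect": 1, "perfect": 0}[re_match]

    return LEVELS[min(s_cap, hit_cap, count_cap, re_cap)]
```

For example, under this reading a site with s = 0, a unique hit, 4 supporting ESTs and no restriction enzyme site would be rated "High", while the same site with a perfect restriction enzyme match would be rated "Very Low".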