NMF analysis pipeline output files:

All output filenames begin with the prefix set by the -o option at the command line. The files generated, in the order of the analysis steps, are:

1: fa2stats:

prefix.stat : mono- and di-nucleotide counts and frequencies from input file

2: WindowCount (assuming w = 3, k = 4):

prefix.w3k4.s2.counts.txt: The critical file for further analysis. It is a tab-delimited text file containing the positional word count (PWC) matrix, in which each row is a k-mer and each column holds the counts within one aligned window. The first row gives the start position of each window as column headers, and the first column lists the k-mers (ranked by decreasing s-squared).
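As an illustration of the layout described above, here is a minimal Python sketch that reads a PWC matrix into memory. It is not part of the pipeline. The function name read_pwc_matrix is hypothetical, the sketch assumes the header row has a leading cell above the k-mer column and that window start positions are integers, and the example data are made up.

```python
import csv
import io


def read_pwc_matrix(handle):
    """Read a tab-delimited PWC matrix.

    Returns (positions, kmers, counts), where counts[i][j] is the count of
    kmers[i] in the window starting at positions[j].
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: the first header cell sits above the k-mer column and
    # carries no position, so it is skipped.
    positions = [int(p) for p in header[1:]]
    kmers, counts = [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        kmers.append(row[0])
        counts.append([float(x) for x in row[1:]])
    return positions, kmers, counts


# Made-up example data, not real pipeline output.
example = "kmer\t-3\t0\t3\nACGT\t5\t9\t2\nTTTT\t1\t0\t4\n"
positions, kmers, counts = read_pwc_matrix(io.StringIO(example))
print(positions)
print(kmers)
print(counts[0])
```

For a real run, pass an open file handle for prefix.w3k4.s2.counts.txt instead of the in-memory example.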

Additional files that can be ignored for NMF:

prefix.w3k4.s2.txt: the actual s^2 values (similar to chi-squared) that rank the k-mers for output in the .s2.counts.txt file.

prefix.w3k4.ranks.txt

prefix.w3k4.chi.txt

3: nmfSmoothAndPseudoCounts:

prefix.sw3k4.counts.txt: the transformed PWC matrix after smoothing and addition of pseudocounts

4: nnmf (assuming r = 8):

prefix..sw3k4r8.weights.txt: The positioning matrix, giving the probability of observing a given motif at a specific position.

prefix..sw3k4r8.bases.txt: The sequence content, giving the probability of observing each k-mer as a part of each motif.

prefix..sw3k4r8.prog.txt: A file that simply tracks the progress of the analysis (can be deleted after the run is complete).

5: nmfSortMatrix:

prefix.sw3k4r8.nweights.txt: Normalized version of the NMF weights, such that all vectors integrate to the same total.

prefix.sw3k4r8.rwords.txt: A text table built from the columns of the bases file, with each column sorted by decreasing contribution of the k-mer to the motif. The output has two columns per motif, k-mer and weight, sorted by decreasing weight.
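The paired-column layout of the rwords table can be read with a short Python sketch like the one below. It is not part of the pipeline. The function name read_rwords is hypothetical, and the sketch assumes the table is tab-delimited with no header row, neither of which is stated above. The example data are made up.

```python
import io


def read_rwords(handle, top=None):
    """Return one list of (kmer, weight) pairs per motif."""
    motifs = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # Columns come in (k-mer, weight) pairs: columns 0-1 belong to the
        # first motif, columns 2-3 to the second, and so on.
        for m in range(len(fields) // 2):
            kmer, weight = fields[2 * m], fields[2 * m + 1]
            while len(motifs) <= m:
                motifs.append([])
            motifs[m].append((kmer, float(weight)))
    if top is not None:
        # Rows are already sorted by decreasing weight, so slicing keeps
        # the strongest k-mers for each motif.
        motifs = [pairs[:top] for pairs in motifs]
    return motifs


# Made-up example data, not real pipeline output.
example = "ACGT\t0.40\tTTTT\t0.55\nCGTA\t0.25\tTTTA\t0.20\n"
motifs = read_rwords(io.StringIO(example), top=1)
print(motifs)
```

If the real file does carry a header row, skip the first line before calling float on the weights.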

6: nmfWplot:

prefix.sw3k4r8.n.png: Line plot of positioning for normalized nmf weights

prefix.sw3k4r8.w.png: Line plot of positioning for raw nmf weights

7: buildMotifs:

prefix.sw3k4r8.motifs: Text listing of the matrices describing the MCMC-derived motifs from the NMF bases

8: nmfMotifsToModels:

prefix.sw3k4r8.models.txt: reformatted file with matrices for the motifs

9: pwmToExamples:

prefix.sw3k4r8.A.logoEx.txt: random sequences for logo for the first motif

prefix.sw3k4r8.B.logoEx.txt: random sequences for logo for the second motif

10: seqlogo:

prefix.sw3k4r8.A.png: sequence logo image for the first motif

prefix.sw3k4r8.B.png: sequence logo image for the second motif

11: nmfWeb:

prefix.sw3k4r8.html: The web page that displays the results.
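The naming pattern used in this list (a prefix, then a tag built from w, k and r) can be sketched in Python to locate a few of the key files for a given run. This is an illustration only, not a pipeline tool. The function names run_tags and key_outputs and the prefix "run1" are hypothetical, reading the leading "s" in sw3k4 as marking the smoothed matrix is an assumption, and only a handful of the files listed above are covered.

```python
def run_tags(w, k, r):
    """Return the tags that appear in the file names listed above."""
    window_tag = f"w{w}k{k}"        # e.g. w3k4, used by WindowCount
    smooth_tag = f"s{window_tag}"   # e.g. sw3k4, assumed to mark smoothing
    rank_tag = f"{smooth_tag}r{r}"  # e.g. sw3k4r8, used once r is chosen
    return window_tag, smooth_tag, rank_tag


def key_outputs(prefix, w, k, r):
    """Map short labels to expected file names for a few pipeline outputs."""
    window_tag, smooth_tag, rank_tag = run_tags(w, k, r)
    return {
        "pwc_counts": f"{prefix}.{window_tag}.s2.counts.txt",
        "smoothed_counts": f"{prefix}.{smooth_tag}.counts.txt",
        "normalized_weights": f"{prefix}.{rank_tag}.nweights.txt",
        "ranked_words": f"{prefix}.{rank_tag}.rwords.txt",
        "html_report": f"{prefix}.{rank_tag}.html",
    }


# "run1" is a hypothetical prefix; w, k and r match the values assumed above.
files = key_outputs("run1", w=3, k=4, r=8)
for label, name in files.items():
    print(f"{label}: {name}")
```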