Computational Analysis of Gene Regulation and Interactions

The advent of genome-scale biology has provided biologists with enormous amounts of data to analyze, understand, and incorporate into ever-improving models of how organisms function at a molecular level. A fundamental problem in these studies is the identification and characterization of an organism's genes, especially those that code for proteins. While a great deal of attention has understandably focused on delineating the function of the resulting proteins, it is equally important to ascertain the context in which genes function. Gene regulation is a complex phenomenon that can be controlled at multiple stages, from genomic initiation of transcription, through processing of the nascent RNA transcript, to post-translational modification of the final protein product. A complete model of regulation requires an understanding of control at all stages of expression. Post-transcriptional regulation of genes is frequently mediated by cis-acting sequence elements located in the untranslated regions (UTRs) of the transcript. Alternative polyadenylation necessarily results in an altered 3'-UTR, which can in turn change gene expression. Our research in the near future will focus on three principal areas:

The study of the role and impact of genetic variation in the sequence elements that control mRNA processing, with a specific focus on polyadenylation

Our recent studies have helped to illuminate the functional significance of systematic alternative polyadenylation at different stages of development, in different cell types, and in primary tumor samples. We will now extend these studies to explore the relationship between genetic variability and the control of alternative polyadenylation. We will build upon our existing database of polyA sites (PACdb), adding new, large-scale analyses derived from microarray and high-throughput mRNA sequencing efforts. Through this research program, we expect to generate and disseminate a genome-wide view of polyadenylation in mouse, the preeminent model mammalian system. Integration with genetic variation, reinforced with experimental validation of selected predictions, will provide new insights into the control, extent, and consequences of alternative polyadenylation.

The role and downstream consequences of disrupted regulation of mRNA processing in tumorigenesis

We recently developed a probe-level microarray analysis and applied it to data obtained from mouse models of pre-B-cell lymphoma, identifying genome-wide, systematic, and characteristic changes in mRNA processing. This work contributes to a growing understanding of the role of alternative processing (specifically alternative polyadenylation) in tumorigenesis, and it has the potential to provide new models for, and understanding of, the disruption in regulation that accompanies tumor initiation and progression. As we move forward, we plan to broaden the studies to additional tumor types, while also switching from microarrays to mRNA-seq or other high-throughput sequencing-based methodologies.

The continued development and validation of computational approaches to regulatory motif identification and characterization

We are interested in developing improved methods for identifying and characterizing the regulatory sequences that guide mRNA processing and gene regulation. Most of the approaches available in popular tools pay little or no attention to the positioning of motifs, despite the clear role that positioning plays in many critical processes, such as splicing and polyadenylation. Our recent work includes a novel motif characterization based on non-negative matrix factorization. Our methodology has the unique feature of simultaneously determining both sequence content and positioning. We envision a number of improvements and investigations of alternative approaches as the work progresses.
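To illustrate the general idea of recovering sequence content and positioning together, the following is a minimal sketch, not the methodology described above. It assumes equal-length sequences aligned on a common reference point (for example, a polyA site), counts each k-mer at each position, and factorizes the resulting non-negative matrix with the standard multiplicative update rules for the Frobenius objective. The function names `kmer_position_matrix` and `nmf`, the choice of k, and the rank are all hypothetical choices made for illustration.

```python
from itertools import product

import numpy as np


def kmer_position_matrix(seqs, k=2):
    """Count each k-mer at each position across equal-length, aligned sequences.

    Returns the list of k-mers (row labels) and a matrix V of shape
    (4**k, sequence_length - k + 1).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    n_pos = len(seqs[0]) - k + 1
    V = np.zeros((len(kmers), n_pos))
    for seq in seqs:
        for j in range(n_pos):
            i = index.get(seq[j:j + k])
            if i is not None:  # skip k-mers containing N or other symbols
                V[i, j] += 1
    return kmers, V


def nmf(V, rank, n_iter=500, seed=0, eps=1e-9):
    """Factorize V ~ W @ H with non-negative W and H (multiplicative updates).

    Each column of W weights the k-mers in one component (sequence content);
    the matching row of H gives that component's profile along the sequence
    (positioning).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Under these assumptions, calling `kmers, V = kmer_position_matrix(seqs)` followed by `W, H = nmf(V, rank=2)` yields, for each component, a set of k-mer weights in a column of `W` and a positional profile in the corresponding row of `H`. The rank would need to be chosen for the data at hand, and results depend on the random initialization.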

  • August 2012: CGDSNPdb v1.5 maintenance release
  • Coming soon: Web services to access both the CGDSNPdb and GraberTranscript databases