Background
Transcriptional regulation is an important part of regulatory control in eukaryotes. the action of 149 transcription factors on these genes. Several of these programs were significantly enriched for known biological processes and signalling pathways. One transcriptional program has a significant overlap with a reference set of cell cycle specific transcription factors.

Conclusion
Our method is able to pick out higher order structure from noisy sequence analyses. The transcriptional programs it identifies potentially represent common mechanisms of regulatory control across the genome. It simultaneously predicts which genes are co-regulated and which sets of transcription factors cooperate to achieve this co-regulation. The programs we discovered enable biologists to choose new genes and transcription factors to study in specific transcriptional regulatory systems.

Background
Organisms ranging in complexity from bacteria to higher eukaryotes are able to react and adapt to environmental and cellular signals. These responses are often encoded as complex gene regulatory networks. In these networks the expression of a gene's products is regulated by the activity of other genes. Although regulation can occur at many levels, we focus on transcriptional regulation, one of the most important and pervasive methods of regulation in eukaryotes. Transcriptional regulation occurs when certain gene products, transcription factors (TFs), bind to the DNA at binding sites (TFBSs) and affect the transcription of the regulated gene by modulation of the RNA polymerase complex. TFBSs often appear in clusters or cis-regulatory modules (CRMs), presumably to enable interactions between TFs binding there.

Combinatorics of transcriptional regulation
TFs do not work in isolation from each other.
Particularly in higher organisms, combinatorial operations are often necessary for the response of a cell to external stimuli or developmental programs. Such a response is frequently implemented as a transcriptional switch where a combination of presence or absence of certain TFs regulates the expression of a certain gene. Several well characterised examples of the coordination of TFs are known. For instance, a set of well studied TFs in Drosophila melanogaster that govern spatial patterns of development in its embryo is described in [1]; higher eukaryotes are known to use CRMs to integrate cellular signalling information [2]; the development of the anterior pituitary gland is regulated by combinatorial actions of specific activating and restricting factors [3] which determine cell type. Conversely, cellular processes often involve the coordinated expression of sets of genes. Hence there is reason to suppose that not only do particular sets of transcription factors regulate particular genes but that these sets are also reused across the genome: that is, co-regulated genes are often targets of the same TFs. Genomic data commonly available today, such as sequence data, expression data or TF localisation data, do not permit direct inference of the higher order structure in transcriptional regulation. Most analyses of these data operate at the individual TF level. When the data permit it and the biologist is interested in this level of detail, such an analysis is certainly appropriate. However, genomic data are often noisy or incomplete. In this case a summary or view of higher order structure in transcriptional regulation is easier to interpret.

Identification of binding sites by sequence analysis
The databases TRANSFAC [4] and JASPAR [5] hold the most widely used collections of position specific scoring matrices (PSSMs).
Each PSSM is a probabilistic model of the DNA binding specificities of a particular TF: given the PSSM and a stretch of DNA, the likelihood of that TF binding to different positions in the sequence can be computationally predicted. There are several problems with this approach: algorithms that find putative binding sites are known to generate many false positives; the regions in which regulatory TFBSs are located are not normally known in advance; and, unfortunately, JASPAR and TRANSFAC do not contain PSSMs for all TFs of interest. We chose to use the PSSMs in TRANSFAC for our analysis.

Our model
Our model aims.
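
The PSSM-based prediction described above, under the identification of binding sites, can be illustrated with a minimal Python sketch. It is not the scanning procedure used in this work and does not read TRANSFAC or JASPAR files. The count matrix `EXAMPLE_COUNTS`, the pseudocount of 0.5, the uniform background of 0.25 per base, the threshold and the function names `log_odds_matrix` and `scan` are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    Assumes a uniform background frequency for every base; the
    pseudocount avoids taking the logarithm of zero.
    """
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + pseudocount * len(BASES)
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for each window scoring at least threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in BASES for base in window):
            continue
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical counts for a 4 bp motif resembling TATA (illustration only).
EXAMPLE_COUNTS = [
    {"A": 1, "C": 0, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
    {"A": 9, "C": 0, "G": 0, "T": 1},
]

if __name__ == "__main__":
    pssm = log_odds_matrix(EXAMPLE_COUNTS)
    for position, score in scan("GGCTATAAGC", pssm, threshold=4.0):
        print(position, round(score, 2))
```

In this sketch the threshold decides which windows are reported as putative sites. Lowering it admits more windows, which is one simple way to see why PSSM scans tend to produce many false positives.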