Nucleic Acids Research Advance Access originally published online on September 6, 2009
Nucleic Acids Research 2009 37(19):6305-6315; doi:10.1093/nar/gkp682
© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
CpG-depleted promoters harbor tissue-specific transcription factor binding signals—implications for motif overrepresentation analyses
1Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, 2Bergen Center for Computational Science, University of Bergen, Thormøhlensgate 55, N-5008 Bergen, Norway and 3Centre for Medical Molecular Virology, University College London, London W1T 4JF, UK
*To whom correspondence should be addressed. Tel: +49 30 8413 1151; Fax: +49 30 8413 1152; Email: roider{at}molgen.mpg.de
Received May 30, 2009. Revised August 2, 2009. Accepted August 3, 2009.
Motif overrepresentation analysis of proximal promoters is a common approach to characterizing the regulatory properties of co-expressed sets of genes. Here we show that this approach performs well on mammalian CpG-depleted promoter sets that regulate expression in terminally differentiated tissues such as liver and heart. In contrast, CpG-rich promoters show very little overrepresentation signal, even when associated with genes that display highly constrained spatiotemporal expression. For instance, while 50% of heart-specific genes possess CpG-rich promoters, we find that the frequently observed enrichment of MEF2-binding sites upstream of heart-specific genes is due solely to contributions from CpG-depleted promoters. Similar results are obtained for all sets of tissue-specific genes, indicating that CpG-rich and CpG-depleted promoters differ fundamentally in their distribution of regulatory inputs around the transcription start site. To avoid diluting the respective transcription factor binding signals, the two promoter types should therefore be treated as separate sets in any motif overrepresentation analysis.
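The recommendation to treat the two promoter types as separate sets can be illustrated with a minimal Python sketch. It is not the procedure used in this article. Several parts are assumptions made purely for illustration: the observed/expected CpG ratio as the classification measure, the cutoff of 0.5, the hypergeometric test as the overrepresentation statistic, and all function names. The `has_motif` predicate is a hypothetical stand-in for whatever binding-site prediction method is actually used.

```python
from math import comb  # requires Python 3.8+


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def split_by_cpg(promoters, threshold=0.5):
    """Partition {gene: sequence} into (CpG-rich, CpG-depleted) dicts.

    The threshold is an illustrative assumption, not a value from the article.
    """
    rich, depleted = {}, {}
    for gene, seq in promoters.items():
        target = rich if cpg_obs_exp(seq) >= threshold else depleted
        target[gene] = seq
    return rich, depleted


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n promoters are drawn from N, of which K carry the motif."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def motif_enrichment(foreground, background, has_motif):
    """Overrepresentation p-value of motif hits in a foreground gene set.

    foreground: iterable of gene ids (e.g. tissue-specific genes)
    background: {gene: promoter sequence} of the SAME CpG class
    has_motif:  hypothetical predicate, sequence -> bool
    """
    fg = [gene for gene in foreground if gene in background]
    N = len(background)
    K = sum(1 for seq in background.values() if has_motif(seq))
    n = len(fg)
    k = sum(1 for gene in fg if has_motif(background[gene]))
    return hypergeom_pvalue(k, n, K, N)


def enrichment_by_class(foreground, promoters, has_motif, threshold=0.5):
    """Run the test separately within CpG-rich and CpG-depleted promoters."""
    rich, depleted = split_by_cpg(promoters, threshold)
    return {
        "cpg_rich": motif_enrichment(foreground, rich, has_motif),
        "cpg_depleted": motif_enrichment(foreground, depleted, has_motif),
    }
```

Calling `enrichment_by_class(heart_genes, all_promoters, has_motif)` with a user-supplied gene list, promoter dictionary and predicate returns one p-value per promoter class. Each foreground subset is compared only against background promoters of its own class, which is the point of keeping the sets separate.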