A functional and evolutionary perspective on transcription factor binding in Arabidopsis thaliana
Klaas Vandepoele
Comparative & Integrative Genomics group
Department of Plant Biotechnology and Bioinformatics, Ghent University
Department of Plant Systems Biology, VIB - Belgium
1. A functional and evolutionary perspective on
transcription factor binding in Arabidopsis thaliana
Potsdam, October 2014
plaza_genomics
2. OVERVIEW
1. Transcriptional gene regulation in plants
2. Inference of transcriptional networks using an ensemble
framework for phylogenetic footprinting in plants
3. An integrated gene regulatory network using experimental ChIP
data of 27 transcription factors in Arabidopsis
4. Conclusions
Jan Van de Velde
Ken Heyndrickx
5. COMPUTATIONAL ANALYSIS OF CIS-REGULATORY
ELEMENTS
Mapping of known TF binding sites on promoter sequences
False positives
Low quality motifs (PWMs) + many motifs lack information about
binding factor
Motif redundancy & multi-gene transcription factor families
Database    # CREs   Species
PLACE       469      Vascular plants
AGRIS       99       Arabidopsis thaliana
AtProbe     172      Arabidopsis thaliana
PlantCARE   435      Monocots and dicots
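To make the false-positive problem of naive motif mapping concrete, here is a minimal Python sketch. It is not the pipeline used in this work: the function names, the example promoter string and the choice of the I-box core GATAAG are illustrative assumptions. It scans both strands for a consensus motif and estimates how many hits are expected by chance, assuming a non-degenerate motif and uniform base composition.

```python
import re

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Reverse complement of a motif written in IUPAC codes."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def to_regex(motif):
    """Turn an IUPAC consensus into a regular expression."""
    return "".join("[" + IUPAC[base] + "]" for base in motif.upper())


def map_motif(promoter, motif):
    """Return sorted (start, strand) hits of the motif on both strands."""
    promoter = promoter.upper()
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # The zero-width lookahead also reports overlapping occurrences.
        for match in re.finditer("(?=" + to_regex(pattern) + ")", promoter):
            hits.add((match.start(), strand))
    return sorted(hits)


def expected_chance_hits(promoter_length, motif_length):
    """Expected random hits on two strands (uniform bases, no degeneracy)."""
    windows = max(promoter_length - motif_length + 1, 0)
    return 2 * windows / 4 ** motif_length


# Hypothetical promoter fragment, for illustration only.
print(map_motif("AAGATAAGTTCTTATCAA", "GATAAG"))  # [(2, '+'), (10, '-')]
print(round(expected_chance_hits(1000, 6), 2))
```

Under these assumptions a 6-nt motif is expected to match a 1 kb promoter by chance roughly once every two promoters, which is why motif presence alone is weak evidence of regulation.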
6. 2. PHYLOGENETIC FOOTPRINTING: DETECTION OF
CONSERVED NON-CODING SEQUENCES (CNS)
Comparative analysis of noncoding DNA sequences to identify
candidate regulatory elements (in orthologous genes)
Regulatory elements are conserved during evolution due to
functional constraint (vs. neutral carry-over)
The power of phylogenetic footprinting increases significantly
when data are available from several related species that have
diverged sufficiently
7. DEVELOPING AN ENSEMBLE FRAMEWORK FOR
PHYLOGENETIC FOOTPRINTING IN PLANTS
Application of motif mapping and different pairwise alignment tools
Aggregate alignments in multi-species footprint using 11 comparator dicot genomes
Evaluate statistical significance incl. FDR analysis
Figure labels: AtProbe; Feature map @ RSAT; 144 regulatory elements (63 genes); 774 DNA motifs
8. FROM PAIRWISE ALIGNMENTS TO MULTI-SPECIES FOOTPRINTS
Generate all pairwise alignments
between Arabidopsis query gene
and its orthologs
Map all pairwise alignments back
to reference promoter
Count per position the #species
that support a footprint
Significance estimation
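The counting step above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the published implementation: the function names, the species labels and the half-open (start, end) interval convention on the reference promoter are hypothetical, and the significance estimation step is left out.

```python
def species_support(promoter_length, alignments):
    """Count, per promoter position, how many species support a footprint.

    alignments maps a species name to a list of (start, end) half-open
    intervals, already projected onto the reference promoter.
    """
    support = [0] * promoter_length
    for blocks in alignments.values():
        covered = set()
        for start, end in blocks:
            covered.update(range(max(0, start), min(promoter_length, end)))
        # Each species counts at most once per position.
        for pos in covered:
            support[pos] += 1
    return support


def footprints(support, min_species):
    """Return maximal (start, end) runs supported by >= min_species species."""
    regions = []
    start = None
    for pos, count in enumerate(support):
        if count >= min_species and start is None:
            start = pos
        elif count < min_species and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(support)))
    return regions


# Hypothetical pairwise alignments for a 20-nt reference promoter.
example = {
    "species_a": [(2, 10)],
    "species_b": [(5, 12), (6, 9)],
    "species_c": [(7, 15)],
}
support = species_support(20, example)
print(footprints(support, 2))  # [(5, 12)]
print(footprints(support, 3))  # [(7, 10)]
```

Collapsing each species to a set of covered positions keeps overlapping alignments from the same ortholog from inflating the count.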
Van Bel, M., Proost, S., Wischnitzki, E., Movahedi, S., Scheerlinck, C., Van de Peer, Y., Vandepoele, K. (2012) Dissecting plant
genomes with the PLAZA comparative genomics platform. Plant Physiology 158:590-600
9. EVALUATION ATPROBE EXPERIMENTAL CIS-
REGULATORY ELEMENTS
Significance of experimental motifs (figure; genes shown: RBCS1A, GAPA)
Motif        Sequence    Scmm   P value
G-box        ACGTGGC     0.54   < 0.001
GA motif     ATAGATAA    0.09   0.48
I-box        GATAAGATT   0.36   < 0.001
(unlabeled)  TATATATA    0.7    < 0.001
Other motif labels in figure: ACA motif, C-motif
10. PROPERTIES CNS
69,361 CNSs associated with 17,895 genes
Protein-coding genes (99%), miRNA genes (1%)
Median length: 11 nt (min-max: 5-514 nt)
CNSs cover 1,070 kb of the non-coding Arabidopsis genome
11. DETECTION OF EXPERIMENTAL REGULATORY
ELEMENTS
• Black boxes: percentage of recovered elements
• White boxes: percentage of uniquely recovered elements in this study
12. RECOVERY OF IN VIVO FUNCTIONAL TARGETS
USING CNS INFORMATION
• White boxes: fold enrichment for CNSs
• Black boxes: fold enrichment naïve motif mapping
High-quality dataset of 15 TFs
ChIP-Seq binding + TF binding site + differentially expressed upon TF perturbation (n=2708)
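The slides do not spell out how fold enrichment is computed. One common definition, shown here purely as an assumption, divides the fraction of predicted targets found in the gold standard by the fraction expected for a random gene set of the same background. Function and gene names below are hypothetical.

```python
def fold_enrichment(predicted, gold_standard, background):
    """Observed over expected overlap between predictions and a gold standard.

    Assumed definition (not confirmed by the source):
    (|predicted & gold| / |predicted|) / (|gold| / |background|)
    """
    background = set(background)
    predicted = set(predicted) & background
    gold = set(gold_standard) & background
    if not predicted or not gold:
        raise ValueError("predicted and gold standard sets must be non-empty")
    observed = len(predicted & gold) / len(predicted)
    expected = len(gold) / len(background)
    return observed / expected


# Toy example: 100 background genes, 10 gold-standard targets,
# 10 predictions of which 5 are in the gold standard.
genes = ["g%d" % i for i in range(100)]
gold = genes[:10]
predicted = genes[:5] + genes[50:55]
print(fold_enrichment(predicted, gold, genes))
```

In this toy case half of the predictions are true targets against a 10% background rate, so the enrichment is about five-fold.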
13. GENOME-WIDE REGULATORY ANNOTATION
Collapsed TF-target module network
40,758 TF-target interactions (157 TFs)
9/13 TFs show significant overlap with experimentally
confirmed targets (AtRegNet/Hussey et al., 2013)
Various functional genomics metrics confirm the quality
of the predicted GRN
Van de Velde, J.*, Heyndrickx, K.S.*, and Vandepoele, K. (2014). Inference of Transcriptional Networks in Arabidopsis through
Conserved Noncoding Sequence Analysis. Plant Cell.
15. 3. AN INTEGRATED GENE REGULATORY NETWORK USING CHIP
DATA OF 27 TRANSCRIPTION FACTORS
How is TF binding organised across different target genes?
Do Highly Occupied Target (HOT) genes in plants have the same
distinct regulatory features as in other organisms?
To what extent is binding linked to differential expression and
TF binding site presence?
16. * Heyndrickx KS, Vandepoele K (2012) Systematic Identification of Functional Plant Modules through the Integration of Complementary Data
Sources. Plant physiology 159: 884-901
17. TF PROTEIN-PROTEIN INTERACTION NETWORK
De Bodt S, Hollunder J, Nelissen H, Meulemeester N, Inze D (2012) CORNET 2.0: integrating plant coexpression, protein-protein
interactions, regulatory interactions, gene associations and functional annotations. The New phytologist 195: 707-720
Flowering
Light Response
22. BINDING SITE ORGANISATION
DH I sites: Zhang W, Zhang T, Wu Y, Jiang J (2012) Genome-Wide Identification of Regulatory DNA Elements and Protein-Binding
Footprints Using Signatures of Open Chromatin in Arabidopsis. The Plant Cell
24. COOPERATIVE TF BINDING: HUB & HOT
Hub genes
1,170 genes bound by ≥ 8 TFs
Significantly enriched for TFs and miRNAs
Highly Occupied Target (HOT) regions
1,179 regions bound by ≥ 7 TFs
Enrichment for regulatory genes (TFs, kinases), response to
stimuli & developmental genes
25. CHROMATIN STATES AND NUCLEOTIDE DIVERSITY OF TF-
BOUND REGIONS
Sequeira-Mendes et al., … Gutierrez, C. (2014). The Functional Topography of the Arabidopsis Genome Is Organized in a Reduced
Number of Linear Motifs of Chromatin States. Plant Cell.
Population sequence diversity based on
369 Arabidopsis strains (Weigel lab)
28. EXPRESSION LEVELS ARE CORRELATED WITH
THE TOTAL NUMBER OF BOUND TFS
Low: < 3 TFs; intermediate: >= 3 TFs and < 8 TFs; hub: >= 8 TFs
(n=406 flowering-related genes)
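The three occupancy classes can be assigned directly from a list of TF-target binding events. The sketch below uses the thresholds stated on this slide. The function names and the TF and gene identifiers are hypothetical.

```python
def occupancy_class(n_tfs):
    """Classify a gene by the number of distinct TFs bound to it."""
    if n_tfs >= 8:
        return "hub"
    if n_tfs >= 3:
        return "intermediate"
    return "low"


def classify_targets(bindings):
    """Map each gene to its occupancy class.

    bindings is an iterable of (tf, gene) pairs; repeated pairs (for
    example several peaks of one TF at one gene) are counted once.
    """
    tfs_per_gene = {}
    for tf, gene in bindings:
        tfs_per_gene.setdefault(gene, set()).add(tf)
    return {gene: occupancy_class(len(tfs)) for gene, tfs in tfs_per_gene.items()}


# Hypothetical binding events.
events = [("TF%d" % i, "geneA") for i in range(8)]
events += [("TF1", "geneB"), ("TF2", "geneB"), ("TF3", "geneB")]
events += [("TF1", "geneC"), ("TF1", "geneC")]
print(classify_targets(events))
# {'geneA': 'hub', 'geneB': 'intermediate', 'geneC': 'low'}
```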
34. NEW HYPOTHESES ON CO-BINDING AND
TETHERING
[Figure: co-binding and tethering models; TFs shown: AP1, PIF3, PRR7, FHY3, SEP3, PRR5]
35. CONCLUSIONS & PERSPECTIVES
Integration of CNS with complementary experimental data sources
offers new possibilities for regulatory gene annotation in plants
High specificity to predict TF-target interactions
Complementary to experimental TF-target detection methods
Study GRN conservation and rewiring across species
Integrated 27 TF ChIP-Seq gene regulatory network reveals
Complexly regulated genes are enriched for regulatory genes
HOT-associated regions represent functional binding events
Open chromatin
Sequence constraint
TF binding sites
Enrichment for regulated target genes
Co-binding and tethering patterns could explain the apparent
discrepancy between binding and regulation in ChIP-chip/Seq
studies
36. FURTHER READING
Van de Velde, J.*, Heyndrickx, K.S.*, and Vandepoele, K. (2014). Inference of
Transcriptional Networks in Arabidopsis through Conserved Noncoding Sequence
Analysis. The Plant Cell 26(7):2729-2745
Proost, S., Van Bel, M. … and Vandepoele, K. (2015). PLAZA 3.0: an access point
for plant comparative genomics. Nucleic Acids Research (accepted)
Heyndrickx, K.S.*, Van de Velde, J.*, and Vandepoele, K. (2014). A functional
and evolutionary perspective on transcription factor binding in Arabidopsis
thaliana. The Plant Cell (accepted)