In this case study, we propose to emulate the approach of Mikkelsen et al. in "Genome-wide maps of chromatin state in pluripotent and lineage-committed cells", Nature. 2007 August 2; 448(7153): 553–560, and use H3K4me3 and H3K36me3 ENCODE ChIP-Seq data to uncover genes and non-coding RNA loci. Unlike the original study, we rely on a simpler clustering of the ChIP-Seq signal in lieu of the Hidden Markov Model (HMM) segmentation of the reference genome into ‘enriched’ and ‘unenriched’ intervals. H3K4me3 is typically associated with promoter regions, while H3K36me3 is typically associated with actively transcribed genomic intervals. By looking for stretches of H3K36me3 signal with an extremity that overlaps a stretch of H3K4me3 signal, we aim to recover the location of known loci and potentially uncover novel coding and non-coding loci.
This case study will provide examples of how to:
- select relevant data from DEX to build tracks
- modify the way the raw mapping data is rendered
- sub-select experiments of interest from within the glyph interface
- rename the experiments based on their associated metadata in order to better apprehend the data being visualized
- cluster raw data into regions of higher signal density
- intersect such clusters to reveal the hallmark of transcribed coding and non-coding loci.
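The last two steps, clustering and intersection, can be sketched outside ZENBU as follows. This is only an illustration of the idea, not ZENBU's implementation: the function names, the `min_signal`, `max_gap` and `edge` parameters, and the example values are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]           # half-open [start, end), in base pairs
Bin = Tuple[int, int, float]         # (start, end, signal)


def cluster_bins(bins: List[Bin], min_signal: float, max_gap: int) -> List[Interval]:
    """Merge bins with signal >= min_signal into clusters, bridging gaps up to max_gap bp."""
    clusters: List[Interval] = []
    for start, end, signal in sorted(bins):
        if signal < min_signal:
            continue  # too weak to count as enriched (assumed threshold)
        if clusters and start - clusters[-1][1] <= max_gap:
            # Close enough to the previous cluster: extend it.
            clusters[-1] = (clusters[-1][0], max(clusters[-1][1], end))
        else:
            clusters.append((start, end))
    return clusters


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def candidate_loci(k36: List[Interval], k4: List[Interval], edge: int) -> List[Interval]:
    """Keep H3K36me3 clusters with an H3K4me3 cluster within `edge` bp of either extremity."""
    loci = []
    for start, end in k36:
        left = (start - edge, start + edge)
        right = (end - edge, end + edge)
        if any(overlaps(left, p) or overlaps(right, p) for p in k4):
            loci.append((start, end))
    return loci


# Made-up binned signal, for illustration only.
k36_bins = [(1000, 1100, 5.0), (1100, 1200, 6.0), (1200, 1300, 0.5),
            (1300, 1400, 4.0), (5000, 5100, 7.0)]
k4_bins = [(900, 1000, 8.0), (1000, 1100, 9.0)]

k36_clusters = cluster_bins(k36_bins, min_signal=1.0, max_gap=100)
k4_clusters = cluster_bins(k4_bins, min_signal=1.0, max_gap=100)
loci = candidate_loci(k36_clusters, k4_clusters, edge=200)
```

Both extremities are tested because the data are strandless, so the promoter mark may sit at either end of the transcribed interval.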
Selection of the H3K4me3 and H3K36me3 ENCODE ChIP-Seq data
Using DEX we search for "encode h3k36me3".
This returns 70 datasets, all of which we select to build a single track ("select all", then "build track" in the track and view building panel at the upper right).
We will name this track "all encode H3k36me3".
ChIP-Seq data are strandless in nature, so we switch the strandless flag on and set the display to the area-based "expression" style, which produces a wig-like track.
We similarly search for "encode h3k4me3" and select all the data to build the track we name "all encode H3k4me3".
We also add to our view the gencode V11 annotation dataset and choose to display its content as "transcript".
We thus obtain a view with three tracks:
- "all encode H3k36me3" as a wig track
- "all encode H3k4me3" as a wig track
- "gencode V11" as transcripts
Modification of the rendering of the ChIP-Seq tracks and renaming of the experiments
The wig tracks do not allow us to easily grasp differences in histone methylation state between the various cell lines and replicates. A better overview is obtained by switching the display from the wig style to a heatmap style.
Genomic locations without any signal appear white, which makes weak signal hard to tell apart from absent signal. By setting the background color to grey, we can better distinguish areas without ChIP-Seq signal from areas with only weak signal.
The same duplicate-and-modify rendering steps are then applied to the "all encode H3k4me3" track.
Note how the ordering of the rows in the heatmap matches that of the "expression level over the selected area" histogram summary at the bottom.
By clicking on the bold experiment name header or on the histogram header ("strandless"), we can reorder the heatmap rows alphabetically by experiment name or by signal strength.
The experiment names contain a large amount of information, such as the lab of origin, the cell line, the antibody used for the ChIP experiment and the replicate number, but sorting them alphabetically fails to give a useful order, such as an ordering by cell line.
Using the metadata associated with each experiment, we can rename the displayed experiments in a way that makes the dataset easier to apprehend.
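To see why renaming helps, the effect can be sketched in a few lines of Python. This is not how ZENBU performs the renaming: the `rename_experiment` helper, the space-joined name format, the lab names and the metadata values are assumptions made for illustration. Only the tag names `enc:cell`, `enc:cell_lineage` and `enc:cell_tissue` are the ones used in this case study.

```python
from typing import Dict, List


def rename_experiment(metadata: Dict[str, str], tags: List[str],
                      fallback: str = "unnamed") -> str:
    """Build a display name from the values of the given metadata tags, in order."""
    parts = [str(metadata[tag]) for tag in tags if metadata.get(tag)]
    return " ".join(parts) if parts else fallback


TAGS = ["enc:cell", "enc:cell_lineage", "enc:cell_tissue"]

# Hypothetical experiments: original names start with the lab, so an
# alphabetical sort groups them by lab rather than by cell line.
experiments = [
    {"name": "LabB H3K4me3 K562 rep1",
     "metadata": {"enc:cell": "K562", "enc:cell_lineage": "mesoderm",
                  "enc:cell_tissue": "blood"}},
    {"name": "LabA H3K4me3 HepG2 rep1",
     "metadata": {"enc:cell": "HepG2", "enc:cell_lineage": "endoderm",
                  "enc:cell_tissue": "liver"}},
    {"name": "LabA H3K4me3 K562 rep2",
     "metadata": {"enc:cell": "K562", "enc:cell_lineage": "mesoderm",
                  "enc:cell_tissue": "blood"}},
]

by_old_name = sorted(e["name"] for e in experiments)
by_new_name = sorted(rename_experiment(e["metadata"], TAGS) for e in experiments)
```

Sorting on the new names places the two K562 experiments next to each other, whereas sorting on the original names interleaves cell lines.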
Overview of the metadata of a given experiment
To accomplish this task, we turn to the processing script facility of ZENBU. Opening the track edit panel, we replace the default "expression histogram" processing script with one that renames the experiments, as described on the RenameExperiments processing module wiki page. Having removed the default processing, we must reproduce its grid binning ourselves so that the heatmap can still display expression levels. Since this binning step has to be added anyway, we take the opportunity to set the bin width to 100bp, which reduces the volume of data sent to the browser and allows faster rendering.
The full processing script is therefore:
<zenbu_script>
  <stream_queue>
    <spstream module="TemplateCluster">
      <overlap_mode>height</overlap_mode>
      <expression_mode>sum</expression_mode>
      <overlap_subfeatures>true</overlap_subfeatures>
      <ignore_strand>true</ignore_strand>
      <side_stream>
        <spstream module="FeatureEmitter">
          <width>100</width>
          <fixed_grid>true</fixed_grid>
          <both_strands>false</both_strands>
        </spstream>
      </side_stream>
    </spstream>
    <spstream module="RenameExperiments">
      <tag>enc:cell</tag>
      <tag>enc:cell_lineage</tag>
      <tag>enc:cell_tissue</tag>
    </spstream>
  </stream_queue>
</zenbu_script>
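The binning part of this script can be pictured with the following Python sketch. It is a simplification under stated assumptions, not a reimplementation of the TemplateCluster and FeatureEmitter modules: here every feature simply adds its full expression value to each 100bp grid bin it overlaps, and the function name and example features are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Feature = Tuple[int, int, float]  # (start, end, expression), half-open coordinates


def bin_on_fixed_grid(features: Iterable[Feature], width: int = 100) -> Dict[int, float]:
    """Sum expression per fixed-grid bin; keys are bin start coordinates."""
    totals: Dict[int, float] = defaultdict(float)
    for start, end, expression in features:
        first_bin = (start // width) * width  # snap to the grid
        for bin_start in range(first_bin, end, width):
            # Assumed rule: each overlapped bin receives the full value.
            totals[bin_start] += expression
    return dict(sorted(totals.items()))


# Made-up mapped reads, for illustration only.
reads = [(120, 156, 1.0), (180, 216, 2.0), (290, 326, 1.0)]
binned = bin_on_fixed_grid(reads, width=100)
```

A wider grid yields fewer bins per displayed region, which is why raising the bin width reduces the amount of data sent to the browser.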
Refining the selection of the H3K4me3 and H3K36me3 ENCODE ChIP-Seq to K562 cell originating data from the glyph interface
As can be seen, H3K4me3 data are available for many more cell lines than H3K36me3 data.
We can use the glyph interface to dynamically modify the selection of data sources in a track.
First, let us duplicate the heatmap once more, shrink the other tracks, and change the background color of the H3K36me3 track to light blue and that of the H3K4me3 track to pink, so as to easily differentiate them.