CHipSeq

From ZENBU documentation wiki

In this case study, we emulate the approach of Mikkelsen et al., "Genome-wide maps of chromatin state in pluripotent and lineage-committed cells", Nature. 2007 August 2; 448(7153): 553–560, and use H3K4me3 and H3K36me3 ENCODE ChIP-Seq data to uncover genes and non-coding RNA loci. Instead of the Hidden Markov Model (HMM) segmentation of the reference genome into ‘enriched’ and ‘unenriched’ intervals, we rely on a simpler clustering of the ChIP-Seq signal. H3K4me3 is typically associated with promoter regions, while H3K36me3 is typically associated with actively transcribed genomic intervals. By looking for stretches of H3K36me3 signal whose extremities overlap with stretches of H3K4me3 signal, we aim to recover the location of known loci and potentially uncover novel coding and non-coding loci.
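The search for H3K36me3 stretches with an H3K4me3 stretch at one extremity can be sketched in a few lines of Python. This is only an illustration of the idea, not what ZENBU runs internally: the function names, the `edge` parameter (how many bases at each end count as the "extremity") and the example coordinates are all assumptions made for this sketch.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, all on the same chromosome


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def candidate_loci(k36: List[Interval], k4: List[Interval], edge: int = 500) -> List[Interval]:
    """Keep H3K36me3 stretches that have an H3K4me3 stretch at either extremity.

    `edge` is a hypothetical parameter: the number of bases at each end of
    an H3K36me3 stretch that are treated as its extremity.
    """
    hits = []
    for start, end in k36:
        left = (start, min(start + edge, end))
        right = (max(end - edge, start), end)
        if any(overlaps(left, peak) or overlaps(right, peak) for peak in k4):
            hits.append((start, end))
    return hits


# Made-up coordinates, for illustration only.
k36_stretches = [(1000, 9000), (20000, 26000)]
k4_stretches = [(800, 1300), (15000, 15400)]
print(candidate_loci(k36_stretches, k4_stretches))
```

Both extremities are tested because the ChIP-Seq signal is strandless, so the promoter mark may sit at either end of the transcribed stretch.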

This case study will provide examples of how to:

  • select relevant data from DEX to build tracks
  • modify the way the raw mapping data is rendered
  • sub-select experiments of interest from within the glyph interface
  • rename the experiments based on their associated metadata in order to better understand the data being visualized
  • cluster raw data into regions of higher signal density
  • intersect such clusters to reveal the hallmark of transcribed coding and non-coding loci.

Selection of the H3K4me3 and H3K36me3 ENCODE ChIP-Seq data

Using DEX we search for "encode h3k36me3".
This returns 70 datasets, all of which we select to build a single track ("select all", then "build track" in the track and view building panel in the upper right).
We name this track "all encode H3k36me3".
ChIP-Seq data are strandless in nature, so we turn the strandless flag on and set the display style to the area-based "expression", which produces a wig-like track.
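To make the strandless, wig-like display concrete, here is a minimal Python sketch that piles reads up into per-base depth while ignoring their strand. It only illustrates the concept; the function name and the read tuples are assumptions for this example and do not reflect how ZENBU computes the track.

```python
from collections import Counter


def strandless_coverage(reads):
    """Per-base read depth, ignoring strand.

    `reads` is an iterable of (start, end, strand) tuples with half-open
    coordinates; the strand is deliberately discarded.
    """
    depth = Counter()
    for start, end, _strand in reads:
        for pos in range(start, end):
            depth[pos] += 1
    return dict(depth)


# Two made-up reads on opposite strands contribute to the same profile.
reads = [(10, 13, "+"), (11, 14, "-")]
print(strandless_coverage(reads))
```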

LincRNA.1.png


We similarly search for "encode h3k4me3" and select all the data to build the track we name "all encode H3k4me3".
We also add to our view the gencode V11 annotation dataset and choose to display its content as "transcript".


LincRNA.2.png
LincRNA.3.png

We thus obtain a view with 3 tracks:

  • "all encode H3k36me3" as a wig track
  • "all encode H3k4me3" as a wig track
  • "gencode V11" as transcripts
LincRNA.4.png


Modification of the rendering of the ChIP-Seq tracks and renaming of the experiments

The wig tracks do not allow us to easily grasp differences in histone methylation states between the various cell lines and replicates. A better overview is obtained by switching the display from the wig style to a heatmap style.

LincRNA.5.png


Genomic locations without any signal appear white, which makes weak signal hard to tell apart from absent signal. By setting the background color to grey, we can better distinguish areas without ChIP-Seq signal from areas with only weak signal.
LincRNA.6.png LincRNA.7.png LincRNA.8.png


The same duplicate-and-modify procedure for the rendering is then applied to the "all encode H3k4me3" track.

LincRNA.9.png



Note how the ordering of the rows in the heatmap matches the ordering in the "expression level over the selected area" histogram summary at the bottom.
By clicking on the bold experiment name header or on the histogram header ("strandless"), we can reorder the heatmap rows alphabetically by experiment name or by signal strength.
LincRNA.10.png
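The two row orderings amount to two different sort keys, as in the Python sketch below. The experiment names and signal values are invented for illustration, and sorting by decreasing signal is an assumption about the direction of the ordering.

```python
# Hypothetical experiment names with their summed signal over the selected area.
signal_by_experiment = {
    "lab3 K562 H3K36me3 rep1": 42.0,
    "lab2 HepG2 H3K36me3 rep1": 7.5,
    "lab1 K562 H3K36me3 rep2": 30.0,
}


def order_rows(signal, by="name"):
    """Return experiment names ordered alphabetically or by decreasing signal."""
    if by == "name":
        return sorted(signal)
    if by == "signal":
        return sorted(signal, key=signal.get, reverse=True)
    raise ValueError(f"unknown ordering: {by}")


print(order_rows(signal_by_experiment, by="name"))
print(order_rows(signal_by_experiment, by="signal"))
```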

The names of the experiments contain a large amount of information, such as the lab of origin, the cell line, the antibody used for the ChIP experiment and the replicate number, but their alphabetical ordering fails to provide a useful order such as ordering by cell line.

We can rename the displayed experiments using the metadata associated with each experiment, so that the dataset becomes easier to interpret.

LincRNA.12.png

Overview of the metadata of a given experiment
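As a rough illustration of renaming from metadata, the Python sketch below builds a display name from a list of metadata tags, so that sorting by name groups experiments by cell line. The tag names are those used in the processing script of this case study; the metadata values, the space-separated joining and the function name are assumptions for this sketch and may differ from what the RenameExperiments module actually does.

```python
def display_name(metadata, tags, fallback="unnamed"):
    """Join the values of the requested tags, skipping tags that are missing or empty."""
    parts = [metadata[tag] for tag in tags if metadata.get(tag)]
    return " ".join(parts) if parts else fallback


tags = ["enc:cell", "enc:cell_lineage", "enc:cell_tissue"]

# Made-up metadata records for three experiments.
experiments = [
    {"enc:cell": "K562", "enc:cell_lineage": "mesoderm", "enc:cell_tissue": "blood"},
    {"enc:cell": "HepG2", "enc:cell_tissue": "liver"},
    {"enc:cell": "K562", "enc:cell_tissue": "blood"},
]

# Sorting the new names now places experiments from the same cell line together.
for name in sorted(display_name(meta, tags) for meta in experiments):
    print(name)
```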


To accomplish this, we use the processing script facility of ZENBU. Opening the track edit panel, we replace the default "expression histogram" processing script with one that renames the experiments, as described in the processing module wiki page RenameExperiments. Having removed the default "expression histogram" processing, we must also reproduce the grid binning ourselves so that the heatmap can display expression levels. We take advantage of this required grid binning step to set the bin width to 100bp, which reduces the volume of data sent to the browser and allows faster rendering of the data.
LincRNA.13.png LincRNA.17.png
The full processing script is therefore:

<zenbu_script>
	<stream_queue>
		<spstream module="TemplateCluster">
			<overlap_mode>height</overlap_mode>
			<expression_mode>sum</expression_mode>
			<overlap_subfeatures>true</overlap_subfeatures>
			<ignore_strand>true</ignore_strand>
			<side_stream>
				<spstream module="FeatureEmitter">
					<width>100</width>
					<fixed_grid>true</fixed_grid>
					<both_strands>false</both_strands>
				</spstream>
			</side_stream>
		</spstream>
		<spstream module="RenameExperiments">
			<tag>enc:cell</tag>
			<tag>enc:cell_lineage</tag>
			<tag>enc:cell_tissue</tag>
		</spstream>
	</stream_queue>
</zenbu_script>
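To give an intuition for the grid binning step, the following Python sketch sums signal into fixed 100bp bins. It is a simplified stand-in, not the TemplateCluster and FeatureEmitter implementation: here every feature adds its full value to each bin it touches, which may differ from how `overlap_mode` "height" weights the signal. The function name and the example features are assumptions for this sketch.

```python
from collections import defaultdict


def bin_signal(features, width=100):
    """Sum feature values into fixed-width grid bins.

    `features` is an iterable of (start, end, value) with half-open
    coordinates. Returns {bin_start: summed_value}, ordered by position.
    """
    bins = defaultdict(float)
    for start, end, value in features:
        first = start // width
        last = (end - 1) // width
        for index in range(first, last + 1):
            bins[index * width] += value
    return dict(sorted(bins.items()))


# Three made-up reads; the second one straddles the boundary at position 100.
features = [(10, 46, 1.0), (90, 126, 1.0), (150, 186, 2.0)]
print(bin_signal(features))
```

Whatever the exact weighting, the effect on data volume is the same: the browser receives at most one value per 100bp bin and per experiment instead of every individual read.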


As illustrated below, the experiments have been renamed and can be reordered by cell line or by expression level.
LincRNA.18.png LincRNA.16.png LincRNA.15.png

Refining the selection of the H3K4me3 and H3K36me3 ENCODE ChIP-Seq data to K562-derived data from the glyph interface

As can be seen, H3K4me3 data are available for many more cell lines than H3K36me3 data. We can use the glyph interface to dynamically modify the selection of data sources in a track.
First, let us duplicate the heatmap once more, shrink the other tracks, and change the background color of the H3K36me3 track to light blue and that of the H3K4me3 track to pink, so as to easily differentiate them.

LincRNA.21.png