OverlapMerge

From ZENBU documentation wiki


Description

The OverlapMerge processing module checks the features on the primary stream for overlap with one another, taking the <distance> parameter into account. Features that overlap are merged into a single larger cluster.
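The merging step described above can be sketched in a few lines of Python. This is only an illustration, not ZENBU's implementation: the `Feature` class and the `overlap_merge` function are hypothetical names, coordinates are assumed to be inclusive, `distance` is assumed to be the largest gap still treated as an overlap, and strandless clusters are marked with "." purely for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    chrom: str
    start: int   # inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def overlap_merge(features, distance=0, ignore_strand=False):
    """Merge features that overlap, or lie within `distance` bases (assumed meaning)."""
    clusters = []
    open_clusters = {}  # (chrom, strand) -> most recent cluster, still extendable
    for f in sorted(features, key=lambda f: (f.chrom, f.start, f.end)):
        strand = "." if ignore_strand else f.strand
        key = (f.chrom, strand)
        current = open_clusters.get(key)
        if current is not None and f.start <= current.end + distance:
            # Feature reaches into the open cluster: extend the cluster.
            current.end = max(current.end, f.end)
        else:
            # No overlap: start a new cluster from this feature.
            current = Feature(f.chrom, f.start, f.end, strand)
            open_clusters[key] = current
            clusters.append(current)
    return clusters


features = [
    Feature("chr1", 100, 200, "+"),
    Feature("chr1", 150, 300, "+"),
    Feature("chr1", 320, 400, "+"),
    Feature("chr1", 180, 250, "-"),
]
for cluster in overlap_merge(features, distance=20):
    print(cluster)
```

With `distance=0` the feature at 320..400 stays separate from the 100..300 cluster; with `distance=20` the two are joined because the gap between them is within the tolerance.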

Parameters

  • <expression_mode> : defines how expression within matching Experiments is collated together. Possible values are:
    • sum : sum the expression between multiple primary stream features for each matching experiment into the template feature
    • min : calculate the minimum expression value between multiple primary stream features for each matching experiment
    • max : calculate the maximum expression value between multiple primary stream features for each matching experiment
    • count : count the number of primary stream features for each matching experiment overlapping the template feature.
    • mean : calculate the average expression value among primary stream features for each matching experiment overlapping the template feature
  • <overlap_check_subfeatures> : if features contain subfeatures (e.g. transcript gene models), setting this option to true will require that the subfeatures overlap each other in order to trigger collation of expression. If one of the features does not have subfeatures, then the genomic bounds of that feature are used in the overlap calculation. If both features have subfeatures, then it must be a subfeature-to-subfeature overlap to trigger collation.
  • <distance>
  • <ignore_strand>
  • <merge_subfeatures>
  • <subfeature_category_filter>
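As a rough illustration of two of the parameters above, the following Python sketch shows how the <expression_mode> values might collate a list of expression values per experiment, and how the <overlap_check_subfeatures> rule could be evaluated. The names `collate`, `spans_overlap` and `features_overlap` are hypothetical, coordinates are assumed to be inclusive `(start, end)` pairs, and the real module may differ in detail.

```python
from statistics import mean

# One collation function per documented expression_mode value.
COLLATORS = {
    "sum": sum,
    "min": min,
    "max": max,
    "count": len,
    "mean": mean,
}


def collate(values_by_experiment, expression_mode="sum"):
    """Collapse the expression values of each experiment into one value."""
    collator = COLLATORS[expression_mode]
    return {exp: collator(vals) for exp, vals in values_by_experiment.items() if vals}


def spans_overlap(a, b):
    """True if two inclusive (start, end) spans share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def features_overlap(bounds_a, subs_a, bounds_b, subs_b, overlap_check_subfeatures=False):
    """Overlap test; a feature without subfeatures falls back to its own bounds."""
    if not overlap_check_subfeatures:
        return spans_overlap(bounds_a, bounds_b)
    spans_a = subs_a or [bounds_a]
    spans_b = subs_b or [bounds_b]
    return any(spans_overlap(x, y) for x in spans_a for y in spans_b)


values = {"expA": [2.0, 4.0, 6.0]}
for mode in COLLATORS:
    print(mode, collate(values, mode))

# A two-exon gene model and a feature that falls inside its intron.
exons = [(100, 200), (500, 600)]
print(features_overlap((100, 600), exons, (300, 400), []))
print(features_overlap((100, 600), exons, (300, 400), [], overlap_check_subfeatures=True))
```

The last two calls show the effect of the subfeature check: the intronic feature overlaps the gene's genomic bounds, but not any of its exons.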


Example1 : combining a Proxy module (co-localization with selected regions)

To quantify the amount of signal co-localized with particular regions, TemplateCluster can be combined with a Proxy module that defines the regions of interest. In the script below, these are all the regions corresponding to Entrez genes on the human, mouse or rat genomes.

<zenbu_script>
    <datastream name="entrez" output="simple_feature">
        <source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::50:::FeatureSource" name="Entrez_gene_mm9"/>
        <source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::31:::FeatureSource" name="Entrez_gene_hg18"/>
        <source id="B1880D44-F935-11DF-82E8-6158894DF986::15:::FeatureSource" name="Entrez_gene_hg19"/>
        <source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::47:::FeatureSource" name="Entrez_gene_rn4"/>
    </datastream>
    <stream_processing>
        <spstream module="TemplateCluster">
            <ignore_strand>false</ignore_strand>
            <side_stream><spstream module="Proxy" name="entrez"/></side_stream>
        </spstream>
    </stream_processing>
</zenbu_script>
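Conceptually, the script in Example1 adds up the signal of every primary stream feature that overlaps a template region on the same strand. A minimal Python sketch of that idea follows; `template_cluster` is a hypothetical helper, all features are assumed to lie on one chromosome, coordinates are inclusive, and the default "sum" collation is assumed.

```python
def template_cluster(templates, features, ignore_strand=False):
    """Sum feature values over each template region they overlap.

    templates: list of (name, start, end, strand)
    features:  list of (start, end, strand, value)
    """
    totals = {name: 0.0 for name, _, _, _ in templates}
    for name, t_start, t_end, t_strand in templates:
        for f_start, f_end, f_strand, value in features:
            same_strand = ignore_strand or f_strand == t_strand
            if same_strand and f_start <= t_end and t_start <= f_end:
                totals[name] += value
    return totals


templates = [("geneA", 100, 500, "+"), ("geneB", 400, 900, "-")]
features = [
    (120, 130, "+", 3.0),
    (450, 460, "+", 2.0),
    (450, 460, "-", 5.0),
    (1000, 1010, "+", 1.0),
]
print(template_cluster(templates, features))
print(template_cluster(templates, features, ignore_strand=True))
```

With strand checking enabled, each gene only collects signal from its own strand; with `ignore_strand=True`, features on either strand contribute to every template they overlap.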

Example2 : combining with a FeatureEmitter module (strand-aware grid binning)

One of the most common uses of TemplateCluster is in combination with FeatureEmitter. This script combines a FeatureEmitter with TemplateCluster to create a regular grid of features at a "screen resolution" of 970 separate "bins". It always generates the same number of output features (bins) irrespective of the size of the queried region, which is useful for display purposes. Input expression is collated evenly into each overlapping "bin".

<zenbu_script>
    <parameters>
        <source_outmode>skip_metadata</source_outmode>
        <skip_default_expression_binning>true</skip_default_expression_binning>
    </parameters>
    <stream_processing>
        <spstream module="TemplateCluster">
            <overlap_mode>height</overlap_mode>
            <expression_mode>sum</expression_mode>
            <ignore_strand>false</ignore_strand>
            <overlap_subfeatures>true</overlap_subfeatures>
            <side_stream>
                <spstream module="FeatureEmitter">
                    <num_per_region>970</num_per_region>
                    <fixed_grid>true</fixed_grid>
                    <both_strands>true</both_strands>
                </spstream>
            </side_stream>
        </spstream>
    </stream_processing>
</zenbu_script>
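The "same number of bins for any region size" behaviour in Example2 can be pictured with a small Python sketch. `make_bins` is a hypothetical function, not part of ZENBU. It assumes inclusive coordinates and a region at least `num_per_region` bases long, and it ignores strand for simplicity.

```python
def make_bins(region_start, region_end, num_per_region=970):
    """Split an inclusive region into num_per_region contiguous bins."""
    length = region_end - region_start + 1
    bins = []
    for i in range(num_per_region):
        # Integer arithmetic keeps the bins contiguous and non-overlapping.
        b_start = region_start + (i * length) // num_per_region
        b_end = region_start + ((i + 1) * length) // num_per_region - 1
        bins.append((b_start, b_end))
    return bins


small = make_bins(1, 9700)
large = make_bins(1, 97000)
print(len(small), small[0], small[-1])
print(len(large), large[0], large[-1])
```

Both calls produce 970 bins; only the bin width changes with the size of the queried region.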

Example3 : combining with a FeatureEmitter module (strand-less grid binning)

This script combines a FeatureEmitter with TemplateCluster to create a regular grid of strandless features at a resolution of 100 bases, and collates expression evenly into each overlapping "bin".

<zenbu_script>
    <stream_processing>
        <spstream module="TemplateCluster">
            <overlap_mode>height</overlap_mode>
            <expression_mode>sum</expression_mode>
            <overlap_subfeatures>true</overlap_subfeatures>
            <ignore_strand>true</ignore_strand>
            <side_stream>
                <spstream module="FeatureEmitter">
                    <width>100</width>
                    <fixed_grid>true</fixed_grid>
                    <both_strands>false</both_strands>
                </spstream>
            </side_stream>
        </spstream>
    </stream_processing>
</zenbu_script>

Example ZENBU view showing this script in use
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=bUULzhgRIBkifTt2sXZEEB;loc=hg19::chr8:128746973..128755020
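The strandless, fixed-width binning of Example3 can be sketched in Python as follows. `bin_expression` is a hypothetical helper, not part of ZENBU. The sketch assumes 1-based inclusive coordinates and bins aligned to multiples of `width`. It also assumes that the full value of a feature is added to every bin it overlaps, which may not match what ZENBU's `height` overlap mode actually does.

```python
from collections import defaultdict


def bin_expression(features, width=100):
    """Sum feature values into fixed-width bins, keyed by bin start.

    features: iterable of (start, end, value), 1-based inclusive; strand is ignored.
    """
    totals = defaultdict(float)
    for start, end, value in features:
        first = (start - 1) // width
        last = (end - 1) // width
        for idx in range(first, last + 1):
            # Assumption: each overlapping bin receives the full value.
            totals[idx * width + 1] += value
    return dict(totals)


features = [(5, 20, 1.0), (95, 105, 2.0), (150, 160, 4.0)]
print(bin_expression(features))
```

The feature at 95..105 straddles the boundary between the bins starting at 1 and 101, so it contributes to both.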
