TemplateCluster

From ZENBU documentation wiki

Description

The TemplateCluster processing module takes a stream of template features on a side stream and compares them for overlap against features with expression on the primary data stream. When an overlap occurs, expression is collated from the primary-stream feature into the overlapping side-stream (template) feature. The output of this module is the set of template features with collated expression values.

Parameters

  • <side_stream> : data source definition for features to be used as the templates for collation. This can be as simple as a FeatureEmitter or Proxy or it can be a processed stream of features.
  • <overlap_mode> : defines how overlap is calculated between features on the primary stream and features on the side stream. Possible values are:
    • area : if a primary stream feature overlaps multiple templates, its expression is divided evenly among those templates so that total counts remain the same as in the input stream. Visually, this creates an effect whereby the expression corresponds to the "area under the curve" of the feature, or the number of pixels.
    • height : if a primary stream feature overlaps multiple templates, its full expression is copied/collated into each of those template features. Visually, this gives the effect whereby the height of the resulting feature represents the collated expression, but the total sum of expression across output features is no longer preserved.
    • 5end : the primary stream feature is compressed to the 5' end and overlap is compared against that single base location.
    • 3end : the primary stream feature is compressed to the 3' end and overlap is compared against that single base location.
  • <expression_mode> : defines how expression within matching Experiments is collated together. Possible values are:
    • sum : sum the expression between multiple primary stream features for each matching experiment into the template feature
    • min : calculate the minimum expression value between multiple primary stream features for each matching experiment
    • max : calculate the maximum expression value between multiple primary stream features for each matching experiment
    • count : count the number of primary stream features for each matching experiment overlapping the template feature.
    • mean : calculate the average expression value among primary stream features for each matching experiment overlapping the template feature
  • <ignore_strand> : ignore strand specificity when comparing features between the primary and template streams. Enable by setting to true.
  • <overlap_subfeatures> : if features contain subfeatures (e.g. transcript gene models), setting this option to true requires that the subfeatures overlap each other in order to trigger collation of expression. If one of the features does not have subfeatures, the genomic bounds of that feature are used in the overlap calculation. If both features have subfeatures, there must be a subfeature-to-subfeature overlap to trigger collation.
  • <skip_empty_templates> : by default, templates which do not collate any expression are removed from the stream. Set to false to retain templates with zero expression.
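
The four overlap_mode values can be pictured with a small Python sketch. This is not ZENBU's implementation: the collate function, the tuple layout and the closed-interval overlap test are assumptions made for illustration, strand matching is left out, and the treatment of a 5end/3end position that hits several templates (copied into each here) is a guess.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """Closed-interval overlap test on 1-based coordinates (assumed convention)."""
    return a_start <= b_end and b_start <= a_end

def collate(primary, templates, overlap_mode="area"):
    """Sum primary-stream expression into templates (hypothetical helper).

    primary   : list of (start, end, strand, expression)
    templates : list of (name, start, end)
    """
    totals = defaultdict(float)
    for start, end, strand, expr in primary:
        if overlap_mode == "5end":
            # Compress the feature to its 5' end: start on '+', end on '-'.
            pos = start if strand == "+" else end
            start = end = pos
        elif overlap_mode == "3end":
            # Compress the feature to its 3' end: end on '+', start on '-'.
            pos = end if strand == "+" else start
            start = end = pos
        hits = [name for name, t_start, t_end in templates
                if overlaps(start, end, t_start, t_end)]
        if not hits:
            continue
        # area: split evenly so the total is preserved; otherwise copy in full.
        share = expr / len(hits) if overlap_mode == "area" else expr
        for name in hits:
            totals[name] += share
    return dict(totals)

# One feature straddling the boundary between two templates.
primary = [(95, 105, "+", 10.0)]
templates = [("A", 1, 100), ("B", 101, 200)]

print(collate(primary, templates, "area"))    # {'A': 5.0, 'B': 5.0}
print(collate(primary, templates, "height"))  # {'A': 10.0, 'B': 10.0}
print(collate(primary, templates, "5end"))    # {'A': 10.0}
print(collate(primary, templates, "3end"))    # {'B': 10.0}
```

In area mode the output still sums to 10, while in height mode it sums to 20, which is the trade-off described for the two modes.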

Example1 : combining with a Proxy module (co-localization with selected regions)

To quantify the amount of signal co-localized with particular regions, TemplateCluster can be used in combination with a Proxy module that defines the regions of interest (in the example below, all regions corresponding to Entrez genes on the human, mouse or rat genomes).

<zenbu_script>
    <datastream name="entrez" output="simple_feature">
        <source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::50:::FeatureSource" name="Entrez_gene_mm9"/>
        <source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::31:::FeatureSource" name="Entrez_gene_hg18"/>
        <source id="B1880D44-F935-11DF-82E8-6158894DF986::15:::FeatureSource" name="Entrez_gene_hg19"/>
        <source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::47:::FeatureSource" name="Entrez_gene_rn4"/>
    </datastream>
    <stream_processing>
        <spstream module="TemplateCluster">
            <ignore_strand>false</ignore_strand>
            <side_stream><spstream module="Proxy" name="entrez"/></side_stream>
        </spstream>
    </stream_processing>
</zenbu_script>
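
As a rough illustration of what collating signal into gene regions means per experiment, the following Python sketch applies the documented expression_mode, ignore_strand and skip_empty_templates options to toy data. The function names, the data layout and the gene names are hypothetical, and the option values are passed explicitly because this page does not state the module's defaults for expression_mode.

```python
from statistics import mean

# Reducers named after the documented expression_mode values.
REDUCERS = {"sum": sum, "min": min, "max": max, "count": len, "mean": mean}

def collate_by_experiment(template, primary, expression_mode="sum",
                          ignore_strand=False):
    """Collate expression per experiment into one template (illustrative only).

    template : (start, end, strand)
    primary  : list of (start, end, strand, {experiment: value})
    """
    t_start, t_end, t_strand = template
    per_experiment = {}
    for start, end, strand, expression in primary:
        if not (start <= t_end and t_start <= end):
            continue  # no genomic overlap
        if not ignore_strand and strand != t_strand:
            continue  # strand-aware comparison
        for experiment, value in expression.items():
            per_experiment.setdefault(experiment, []).append(value)
    reducer = REDUCERS[expression_mode]
    return {exp: reducer(vals) for exp, vals in per_experiment.items()}

def run(templates, primary, skip_empty_templates=True, **kwargs):
    """Apply the collation to every template, dropping empty ones by default."""
    out = {}
    for name, template in templates.items():
        collated = collate_by_experiment(template, primary, **kwargs)
        if collated or not skip_empty_templates:
            out[name] = collated
    return out

templates = {"geneA": (100, 200, "+"), "geneB": (300, 400, "-")}
primary = [
    (120, 130, "+", {"exp1": 4.0, "exp2": 1.0}),
    (150, 160, "+", {"exp1": 2.0}),
    (180, 190, "-", {"exp1": 8.0}),
]

print(run(templates, primary))
# {'geneA': {'exp1': 6.0, 'exp2': 1.0}}
print(run(templates, primary, expression_mode="count", ignore_strand=True))
# {'geneA': {'exp1': 3, 'exp2': 1}}
```

With skip_empty_templates=False the second gene would be kept with an empty expression set instead of being dropped.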

Example2 : combining with a FeatureEmitter module (strand-aware grid binning)

One of the most common uses of TemplateCluster is in combination with FeatureEmitter. This script combines a FeatureEmitter with TemplateCluster to create a regular grid of features at a "screen resolution" of 970 separate "bins". It always generates the same number of output features (bins) irrespective of the size of the queried region, which is useful for display purposes. Input expression is collated into each overlapping "bin"; because overlap_mode is set to height, a feature that spans several bins contributes its full expression to each of them.

<zenbu_script>
    <parameters>
        <source_outmode>skip_metadata</source_outmode>
        <skip_default_expression_binning>true</skip_default_expression_binning>
    </parameters>
    <stream_processing>
        <spstream module="TemplateCluster">
            <overlap_mode>height</overlap_mode>
            <expression_mode>sum</expression_mode>
            <ignore_strand>false</ignore_strand>
            <overlap_subfeatures>true</overlap_subfeatures>
            <side_stream>
                <spstream module="FeatureEmitter">
                    <num_per_region>970</num_per_region>
                    <fixed_grid>true</fixed_grid>
                    <both_strands>true</both_strands>
                </spstream>
            </side_stream>
        </spstream>
    </stream_processing>
</zenbu_script>
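
To see why a fixed bin count gives the same number of output features for any query size, here is a minimal Python sketch that splits a region into 970 contiguous bins. The region_bins function and its integer arithmetic are assumptions for illustration only; how FeatureEmitter actually places bin boundaries (including the effect of fixed_grid and both_strands) is not described on this page and is not modelled here.

```python
def region_bins(region_start, region_end, num_per_region=970):
    """Split [region_start, region_end] into num_per_region contiguous bins.

    Hypothetical helper; assumes the region is at least num_per_region bases long.
    """
    length = region_end - region_start + 1
    bins = []
    for i in range(num_per_region):
        b_start = region_start + (i * length) // num_per_region
        b_end = region_start + ((i + 1) * length) // num_per_region - 1
        bins.append((b_start, b_end))
    return bins

# An 8,048-base region and a 970,000-base region both yield 970 bins;
# only the bin width changes.
small = region_bins(128746973, 128755020)
large = region_bins(1, 970000)
print(len(small), len(large))  # 970 970
print(large[0], large[-1])     # (1, 1000) (969001, 970000)
```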

Example3 : combining with a FeatureEmitter module (strand-less grid binning)

This script combines a FeatureEmitter with TemplateCluster to create a regular grid of strandless features at a 100-base resolution. Expression is collated into each overlapping "bin"; because overlap_mode is set to height, a feature that spans several bins contributes its full expression to each of them.

<zenbu_script>
    <stream_processing>
        <spstream module="TemplateCluster">
            <overlap_mode>height</overlap_mode>
            <expression_mode>sum</expression_mode>
            <overlap_subfeatures>true</overlap_subfeatures>
            <ignore_strand>true</ignore_strand>
            <side_stream>
                <spstream module="FeatureEmitter">
                    <width>100</width>
                    <fixed_grid>true</fixed_grid>
                    <both_strands>false</both_strands>
                </spstream>
            </side_stream>
        </spstream>
    </stream_processing>
</zenbu_script>

Example ZENBU view showing this script in use
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=bUULzhgRIBkifTt2sXZEEB;loc=hg19::chr8:128746973..128755020
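
The fixed-width binning of Example3 can be sketched in Python as follows. The sketch assumes that bins are aligned to multiples of the width starting at coordinate 1 (one possible reading of fixed_grid, not confirmed by this page), ignores strand, and uses height overlap with sum collation. Both function names are hypothetical.

```python
def bins_for_feature(start, end, width=100):
    """Return (bin_start, bin_end) for each fixed-width bin a feature overlaps.

    Assumes 1-based coordinates and bins 1..width, width+1..2*width, and so on.
    """
    first = (start - 1) // width
    last = (end - 1) // width
    return [(i * width + 1, (i + 1) * width) for i in range(first, last + 1)]

def collate_height_sum(features, width=100):
    """Copy each feature's full value into every bin it overlaps, then sum."""
    totals = {}
    for start, end, value in features:
        for b in bins_for_feature(start, end, width):
            totals[b] = totals.get(b, 0.0) + value
    return totals

# (start, end, expression); strand is omitted because it is ignored here.
features = [(250, 420, 3.0), (390, 395, 2.0)]
print(collate_height_sum(features))
# {(201, 300): 3.0, (301, 400): 5.0, (401, 500): 3.0}
```

Unlike the 970-bin grid of Example2, the number of bins here grows with the size of the queried region, since each bin always covers 100 bases.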

[Image: FeatureEmitter.1.png]