From ZENBU documentation wiki
Revision as of 11:07, 4 December 2012 by Jessica Severin (talk | contribs) (Examples)



The TemplateFilter processing module takes a stream of template features on a side stream and compares them for overlap against features on the primary data stream. When an overlap occurs, the primary-stream feature is either passed through the filter (default behaviour) or blocked, depending on this module's parameter settings.


  • <side_stream> : data source definition for features to be used as the templates for overlap comparison. This is most often a Proxy for another set of defined annotations, but it can also be a processed stream of features.
  • <overlap_mode> : defines how overlap is calculated between features on the primary stream and features on the side stream. Possible values are:
    • area : the full length of both features is used in the comparison.
    • 5end : the primary stream feature is compressed to the 5' end and overlap is compared against that single base location.
    • 3end : the primary stream feature is compressed to the 3' end and overlap is compared against that single base location.
  • <ignore_strand> : ignore strand specificity when comparing features between the primary and template streams. Enable by setting to true.
  • <inverse> : inverts the filtering process. If set to true, overlapping features are blocked rather than passed through. If set to false (the default behaviour), overlapping features are allowed to pass through.
  • <overlap_subfeatures> : if features contain subfeatures (e.g. transcript gene models), setting this option to true requires that the subfeatures overlap each other. If one of the features does not have subfeatures, then the genomic bounds of that feature are used in the overlap calculation. If both features have subfeatures, then the overlap must be subfeature to subfeature.
  • <distance> : features are allowed to be up to distance basepairs away from each other and still be considered to overlap.
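
As a rough illustration of how the parameters above fit together, the sketch below configures a TemplateFilter that blocks any primary-stream feature whose 5' end lies within 100 basepairs of a template feature on either strand. The datastream name "templates", the parameter values, and the enclosing <zenbu_script> and <stream_processing> wrapper elements are assumptions made for illustration and are not taken from this page; the source id is reused from the example further down.

```xml
<zenbu_script>
	<!-- hypothetical named data source that supplies the template features -->
	<datastream name="templates">
		<source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
	</datastream>

	<stream_processing>
		<spstream module="TemplateFilter">
			<!-- compare only the 5' end of each primary-stream feature -->
			<overlap_mode>5end</overlap_mode>
			<!-- match templates on either strand -->
			<ignore_strand>true</ignore_strand>
			<!-- block overlapping features instead of passing them through -->
			<inverse>true</inverse>
			<!-- illustrative value: count features up to 100 bp apart as overlapping -->
			<distance>100</distance>
			<side_stream>
				<spstream module="Proxy" name="templates"/>
			</side_stream>
		</spstream>
	</stream_processing>
</zenbu_script>
```

Leaving out <inverse>, or setting it to false, would turn the same configuration into a pass-through filter that keeps only the features near a template.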


In this example, we use TemplateFilter with Gencode transcripts as a side stream to filter the datastream (in the linked View below, RNA-seq from ENCODE) for signal overlapping exons (by enforcing 'overlap_subfeatures'). The histogram is then recreated by combining a FeatureEmitter with TemplateCluster, which creates a regular grid and collates expression evenly into each overlapping "bin".

	<datastream name="gencode">
		<source name="UCSC_gencodeV10_hg19_20120101"  id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource" category="gene"/>
	</datastream>

	<n>first collate expression into genome segment grid</n>
	<spstream module="TemplateCluster">
		<side_stream>
			<spstream module="FeatureEmitter"/>
		</side_stream>
	</spstream>

	<n>then filter those genome segments against gencode exons</n>
	<spstream module="TemplateFilter">
		<overlap_subfeatures>true</overlap_subfeatures>
		<side_stream>
			<spstream module="Proxy" name="gencode"/>
		</side_stream>
	</spstream>

Example ZENBU view showing this script in use: http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=4B6jmi8YIt8sXiI1Coa3MC;loc=hg19::chr19:50351422..50392834