NeighborCutoff

Description

The NeighborCutoff processing module is a filtering algorithm which operates on the significance of Features. It is based on the concept that "a hill next to a mountain is lost in the background, while that same hill in a field looks like a giant". Filtering is performed based on the ratio of each Feature's significance to that of its neighbors: strong Features shadow weaker neighboring Features and filter them out.

The motivation for NeighborCutoff is that sequence data often appears to "spill over": when there is a strong signal, the background around that signal is often stronger than the background in other areas. It is therefore sometimes necessary to adjust the noise cutoff level relative to the signal in an area. This is what NeighborCutoff does.

Parameters

  • <distance> : distance between Features within which they are considered neighbors
  • <ratio> : maximum allowed ratio of the largest Feature in a neighborhood to the smallest. Features with significance less than (strongest neighbor significance / ratio) are filtered out. It can be thought of as the distance to the noise floor: the larger the ratio, the more noisy / weaker neighbors are allowed to remain.
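
As a rough illustration of how the two parameters interact, the fragment below is a minimal sketch of a NeighborCutoff stage on its own. The values 100 and 20 are hypothetical, chosen only to keep the arithmetic simple, and the unit of <distance> is assumed here to be base pairs, which this page does not state. With <ratio> set to 20, a neighborhood whose strongest Feature has a significance of 1000 would have a cutoff of 1000 / 20 = 50, so a neighbor with significance 40 would be filtered out while one with significance 60 would remain.

```xml
<spstream module="NeighborCutoff">
	<!-- hypothetical value: Features within this distance of each other are treated as neighbors (units assumed to be base pairs) -->
	<distance>100</distance>
	<!-- hypothetical value: Features weaker than (strongest neighbor significance / 20) are filtered out -->
	<ratio>20</ratio>
</spstream>
```

Since the filter operates on Feature significance, a stage like this would presumably sit inside a <stream_queue> after a module that sets significance, such as CalcFeatureSignificance.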

Example

This is a complex script which combines expression histogram binning (TemplateCluster with a FeatureEmitter side stream) with de novo clustering via Paraclu, followed by CalcFeatureSignificance and then several filtering steps: NeighborCutoff, CutoffFilter, and FeatureLengthFilter.

<zenbu_script>
	<stream_queue>
		<spstream module="TemplateCluster">
			<overlap_mode>area</overlap_mode>
			<expression_mode>sum</expression_mode>
			<side_stream>
				<spstream module="FeatureEmitter">
					<width>1</width>
					<fixed_grid>true</fixed_grid>
					<both_strands>true</both_strands>
				</spstream>
			</side_stream>
		</spstream>

		<spstream module="Paraclu">
			<min_cutoff>10</min_cutoff>
			<stability>0</stability>
			<max_cluster_length>100</max_cluster_length>
		</spstream>

		<spstream module="CalcFeatureSignificance">
			<expression_mode>sum</expression_mode>
		</spstream>

		<spstream module="NeighborCutoff">
			<ratio>300</ratio>
			<distance>100</distance>
		</spstream>

		<spstream module="CutoffFilter">
			<min_cutoff>100</min_cutoff>
		</spstream>

		<spstream module="FeatureLengthFilter">
			<max_length>50</max_length>
		</spstream>

	</stream_queue>
</zenbu_script>

Example ZENBU view showing this script in use:
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=jqr7D6J2PxMrOdqTg8glvD;loc=hg19::chr19:49990236..49997699

Paraclu.1.png