TopHits

Description

The TopHits processing module returns at most a fixed number of Features from a stream query region, chosen by ranking them on significance. Because the number of top Features is specified in advance, a fixed-length queue can be used and the running set of top Features can be updated in a streaming manner without loading all data into memory. Once the most significant Features have been found, they are re-sorted by chromosome location and streamed out. This module is useful as a visualization filter to limit the number of objects on screen, or for "genome scanning" with data download, where only the most significant results should be returned.
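The fixed-length queue idea described above can be sketched in a few lines of Python. This is only an illustration, not ZENBU's actual implementation: the `Feature` record, its `significance` field, and the `top_hits` function are hypothetical names introduced here, and a min-heap is just one reasonable way to maintain the running top set.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Feature(NamedTuple):
    # Hypothetical minimal record; not ZENBU's real Feature model.
    chrom: str
    start: int
    end: int
    significance: float


def top_hits(features: Iterable[Feature], queue_length: int) -> Iterator[Feature]:
    """Keep the queue_length most significant features, then emit them by location."""
    if queue_length <= 0:
        return
    # Min-heap of (significance, arrival index, feature). The weakest kept
    # feature sits at heap[0]; the unique index breaks ties between entries.
    heap = []
    for index, feature in enumerate(features):
        entry = (feature.significance, index, feature)
        if len(heap) < queue_length:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new feature beats the weakest kept one, so swap it in.
            heapq.heapreplace(heap, entry)
    # Re-sort the survivors by chromosome location before streaming them out.
    by_location = sorted(heap, key=lambda e: (e[2].chrom, e[2].start, e[2].end))
    for _, _, feature in by_location:
        yield feature


if __name__ == "__main__":
    stream = [
        Feature("chr8", 300, 400, 5.0),
        Feature("chr8", 100, 200, 1.0),
        Feature("chr8", 200, 250, 9.0),
        Feature("chr8", 50, 80, 7.0),
    ]
    for hit in top_hits(stream, queue_length=2):
        print(hit)
```

Memory use in this sketch is bounded by `queue_length` rather than by the size of the input stream, which is the property the description relies on.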

Note: Currently the filtering region interacts with the TrackCache system, so TopHits are selected within the fixed 100kb grid segments of the TrackCache rather than across the whole query region. If track caching is disabled, the module returns the TopHits within the query region window. We will improve control over this behavior in the future.

Parameters

  • <queue_length> : maximum number of most significant Features to return

Example

This script shows how TopHits can be used as a visualization filter, so that a track is not flooded with too many items and shows only the most significant ones.

<zenbu_script>
	<stream_queue>
		<spstream module="UniqueFeature">
			<ignore_strand>true</ignore_strand>
		</spstream>
		<spstream module="TopHits">
			<queue_length>200</queue_length>
		</spstream>
	</stream_queue>
</zenbu_script>

Example ZENBU view showing this script in use
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=FbcK_M26MlUnac19PXdOXB;loc=hg19::chr8:128746973..128755020
