NormalizeRPKM

From ZENBU documentation wiki


Description

The NormalizeRPKM processing module extends NormalizePerMillion. It normalizes each feature's expression by both the total expression of the experiment and the cumulative length of the feature's subfeatures, recomputing the expression level as <datatype> per million per 1000 base pairs (RPKM). If the experiment has no total count, normalization falls back to subfeature length alone (<datatype> per 1000 base pairs).
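The rescaling described above can be sketched as plain arithmetic. This is an illustrative sketch, not ZENBU's implementation; the function name and argument names are hypothetical.

```python
def normalize_rpkm(count, total_count, length_bp):
    """Illustrative RPKM-style rescaling (hypothetical helper, not ZENBU code).

    count       -- raw expression of the feature
    total_count -- total expression in the experiment (0/None if unknown)
    length_bp   -- cumulative length of the feature's subfeatures, in bp
    """
    value = count / (length_bp / 1000.0)   # per 1000 base pairs
    if total_count:                        # per million, only if a total exists
        value = value / (total_count / 1.0e6)
    return value

# 500 counts on a 2 kb gene model in a 10-million-count experiment
print(normalize_rpkm(500, 10_000_000, 2000))   # 25.0
# With no experiment total, only the length normalization applies
print(normalize_rpkm(500, 0, 2000))            # 250.0
```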

Parameters

  • <category_filter> : defines the subfeature categories that count toward the total subfeature length; multiple <category_filter> elements may be given, as in the example below
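The effect of the category filter on the length calculation can be sketched as follows. The subfeature records and field layout here are hypothetical, chosen only to show how filtered categories contribute to the cumulative length.

```python
# Hypothetical subfeature records: (category, start, end) in bp coordinates
subfeatures = [
    ("exon",   100, 300),    # 200 bp, kept
    ("intron", 300, 900),    # skipped: category not in the filter
    ("block",  900, 1200),   # 300 bp, kept
]

# Mirrors two <category_filter> entries: exon and block
allowed = {"exon", "block"}

# Cumulative length over the filtered subfeatures only
total_length = sum(end - start
                   for category, start, end in subfeatures
                   if category in allowed)
print(total_length)   # 500
```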

Example

This script uses a Proxy / TemplateCluster pair to collate expression onto Gencode V10 gene models. The collated expression is then normalized by the NormalizeRPKM module, and the script finishes with CalcFeatureSignificance so that the features can be displayed with score coloring.

<zenbu_script>
	<datastream name="gencode" output="full_feature">
		<source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
	</datastream>
	<stream_queue>
		<spstream module="TemplateCluster">
			<overlap_mode>height</overlap_mode>
			<skip_empty_templates>false</skip_empty_templates>
			<expression_mode>sum</expression_mode>
			<overlap_subfeatures>true</overlap_subfeatures>
			<ignore_strand>true</ignore_strand>
			<side_stream>
				<spstream module="Proxy" name="gencode"/>
			</side_stream>
		</spstream>

		<spstream module="NormalizeRPKM">
			<category_filter>exon</category_filter>
			<category_filter>block</category_filter>
		</spstream>

		<spstream module="CalcFeatureSignificance"/>
	</stream_queue>
</zenbu_script>

Here is a ZENBU view showing this script in use:
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=vtPXLwwqO9KjD1YYWqCMGD;loc=hg19::chr8:128746973..128755020

[Image: Gencode.rpkm.collation.1.png]