ResizeFeatures

From ZENBU documentation wiki
Revision as of 19:08, 14 October 2012 by Nicolas.bertin (talk | contribs)


Description

The ResizeFeatures processing module alters the genomic coordinates of Features. Because resizing can change the order of features, the module re-sorts features on the data stream as needed to preserve stream integrity. Typical use cases:

  • shrink a feature to its 5' end, making it 1bp wide (CAGE data)
  • expand a feature 500bp upstream of its 5' end (e.g. extending RefSeq genes into their potential promoter regions for use in overlap analysis)
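
As a rough sketch, the two use cases above might be written as the following ResizeFeatures steps, using only the mode and expand parameters documented on this page. The two steps are independent alternatives, not one pipeline. That expand_5prime grows the feature outward (upstream) from its 5' end is an assumption based on the use-case description, not something verified against the module.

```xml
<!-- Alternative 1: reduce each feature to the 1bp position at its 5' end (e.g. CAGE data) -->
<spstream module="ResizeFeatures">
	<mode>shrink_5prime</mode>
</spstream>

<!-- Alternative 2: extend each feature by 500bp at its 5' end
     (assumed to mean upstream of the 5' end on either strand) -->
<spstream module="ResizeFeatures">
	<mode>expand_5prime</mode>
	<expand>500</expand>
</spstream>
```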

Parameters

  • <category_filter> : only Features matching categories will be resized
  • <mode> : defines which resize method will be performed
    • shrink_start : shrink the feature to 1bp length on the chrom_start
    • shrink_end : shrink the feature to 1bp length on the chrom_end
    • shrink_5prime : shrink the feature to 1bp length on the 5' end (start for + strand, end for - strand)
    • shrink_3prime : shrink the feature to 1bp length on the 3' end (start for - strand, end for + strand)
    • expand_start : expand the chrom_start by <expand> amount up-stream
    • expand_end : expand the chrom_end by <expand> amount down-stream
    • expand_5prime : expand the 5' end by <expand> amount
    • expand_3prime : expand the 3' end by <expand> amount
    • store : save the coordinates prior to resizing, so that they can be restored in a later call to ResizeFeatures
    • restore : redefine the coordinates as those stored in a previous ResizeFeatures call
  • <expand> : for the expand modes, the amount (in bp) by which the feature is expanded.
  • <retain_subfeatures> : defaults to "false", which is usually the desired behavior: after resizing, the subfeatures no longer match the feature's coordinates. The exception is when the original coordinates are stored and later restored. Set this flag to "true" if you plan to restore the coordinates saved by a previous ResizeFeatures call together with the subfeatures.
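
A minimal sketch of how these parameters might combine in a single step. The category name "mRNA" is hypothetical and the value 200 is arbitrary. The element form of <category_filter> is assumed to match the way it is written for StreamSubfeatures in the first example script below.

```xml
<spstream module="ResizeFeatures">
	<!-- "mRNA" is a hypothetical category name, for illustration only -->
	<category_filter>mRNA</category_filter>
	<!-- extend the 3' end of each matching feature by 200bp (arbitrary value) -->
	<mode>expand_3prime</mode>
	<expand>200</expand>
</spstream>
```

With such a filter, features whose category does not match would be expected to pass through with their coordinates unchanged, since only matching Features are resized.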

Example

This script combines a CalcInterSubfeatures module with a StreamSubfeatures module to generate introns. ResizeFeatures and UniqueFeature then reduce the introns to a set of unique intron donor sites with counts of their abundance.

<zenbu_script>
	<stream_processing>
		<spstream module="CalcInterSubfeatures"/>
		<spstream module="StreamSubfeatures">
			<category_filter>intron</category_filter>
			<transfer_expression>true</transfer_expression>
			<unique>
				<ignore_strand>true</ignore_strand>
			</unique>
		</spstream>
		<spstream module="ResizeFeatures">
			<mode>shrink_start</mode>
		</spstream>
		<spstream module="UniqueFeature"/>
	</stream_processing>
</zenbu_script>

Here is a ZENBU view showing this script in use
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=fuf2V6ehKhHlbabZkQWebB;loc=hg19::chr8:128746973..128755020
ProcessingResizeFeatures.SpliceDonorAcceptor.example.png
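
The intron script above uses shrink_start, which selects the chrom_start regardless of strand. As a hedged variant that is not part of the original script, the resize step could use the strand-aware modes instead. By the mode definitions above, shrink_5prime would keep the donor side of each intron on both strands and shrink_3prime the acceptor side, assuming the intron features carry strand information.

```xml
<!-- Variant A: 1bp at the 5' end of each intron (splice donor side) -->
<spstream module="ResizeFeatures">
	<mode>shrink_5prime</mode>
</spstream>

<!-- Variant B: 1bp at the 3' end of each intron (splice acceptor side) -->
<spstream module="ResizeFeatures">
	<mode>shrink_3prime</mode>
</spstream>
```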

This script illustrates how to collate CAGE-based expression onto transcripts. It chains several ResizeFeatures modules: the first ones define the [-500..+500] region around the start of each transcript, within which CAGE TSS are associated with the transcript by a TemplateCluster module, and a final one restores the original transcript coordinates and their exon structure (subfeatures).

<zenbu_script>
	<note> Lists RefSeq transcript sources for all the main assemblies loaded in zenbu </note>
	<datastream name="refseq" output="full_feature">
		<source id="025F0224-D145-4E28-86A2-DB37A42A89CB::21:::FeatureSource" name="UCSC_RefSeq_canFam2_20120101"/>
		<source id="025F0224-D145-4E28-86A2-DB37A42A89CB::5:::FeatureSource" name="UCSC_RefSeq_galGal3_20120101"/>
		<source id="D71B7748-1450-4C62-92CB-7E913AB12899::13:::FeatureSource" name="UCSC_RefSeq_hg19_20120101"/>
		<source id="4043B030-0201-495F-824B-BC197EA3C272::6:::FeatureSource" name="UCSC_RefSeq_mm9_20120101"/>
		<source id="025F0224-D145-4E28-86A2-DB37A42A89CB::35:::FeatureSource" name="UCSC_RefSeq_rn4_20120101"/>
		<source id="0583D02E-BA10-11DE-B45C-8D369A8382FD::78:::FeatureSource" name="UCSC_hg18_refgene"/>
	</datastream>

	<stream_processing>

		<note> Get the transcription start sites (TSS) revealed by the 5' ends of CAGE-derived reads </note>
		<spstream module="ResizeFeatures">
			<mode>shrink_5prime</mode>
		</spstream>

		<note> Collate the CAGE TSS along regions defined as RefSeq TSS +/-500bp </note>
		<spstream module="TemplateCluster">
			<ignore_strand value="false"/>
			<side_stream>
				<spstream module="Proxy" name="refseq"/>

				<note> Temporarily store the original RefSeq coordinates so they can be restored later </note>
				<spstream module="ResizeFeatures">
					<retain_subfeatures>true</retain_subfeatures>
					<mode>store</mode>
				</spstream>

				<note> Modify the coordinates to RefSeq TSS +/-500bp </note>
				<spstream module="ResizeFeatures">
					<retain_subfeatures>true</retain_subfeatures>
					<mode>shrink_5prime</mode>
				</spstream>
				<spstream module="ResizeFeatures">
					<retain_subfeatures>true</retain_subfeatures>
					<mode>expand_start</mode>
					<expand>500</expand>
				</spstream>
				<spstream module="ResizeFeatures">
					<retain_subfeatures>true</retain_subfeatures>
					<mode>expand_end</mode>
					<expand>500</expand>
				</spstream>

			</side_stream>
		</spstream>

		<note> Restore the original RefSeq coordinates </note>
		<spstream module="ResizeFeatures">
			<retain_subfeatures>true</retain_subfeatures>
			<mode>restore</mode>
		</spstream>

		<note> Sum the expression over all samples and save the value as the RefSeq score so that it can be colored accordingly </note>
		<spstream module="CalcFeatureSignificance">
			<expression_mode>sum</expression_mode>
		</spstream>

	</stream_processing>
</zenbu_script>

Here is a ZENBU view showing this script in use
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=PyTxIWwAO5apFGJYNVOGjB;loc=hg18::chr11:129118781..129899006
ProcessingResizeFeatures.RefseqPromCAGEexpression.example.png