Difference between revisions of "Proxy"

From ZENBU documentation wiki
(Created page with "==Description== The '''FilterSubfeatures''' processing module is designed to work on Features with subfeatures to remo...")
 
(Getting datastream xml definitions from tracks)
 
(50 intermediate revisions by 2 users not shown)

Latest revision as of 09:02, 31 October 2012

Description

The Proxy is a special placeholder processing module designed to work in coordination with the <datastream> section of the ZENBU scripting system.

Each <datastream> has a name attribute and a pool of data sources, each declared with a <source> tag. Each data source is identified by its ZENBU system id. The other attributes of a <source> are ignored by the system, but they can be helpful to script writers as comments. Here is an example of a datastream pool of four RNA-seq experiments on HepG2 cells from the ENCODE project.

<datastream name="encode_wold_hepg2" output="skip_metadata" datatype="tagcount" >
   <source id="904A696A-62EC-4665-85B9-4F92DDFA9814::2:::Experiment" platform="RNA-seq"/>
   <source id="8EB257B8-6B26-4DB7-8470-07A708EC7CEF::2:::Experiment" platform="RNA-seq"/>
   <source id="A80763D0-F12C-449D-AFEA-288BEBE55C4A::2:::Experiment" platform="RNA-seq"/>
   <source id="74E98401-90F9-4FE4-B534-2AC4D3955753::2:::Experiment" platform="RNA-seq"/>
</datastream>

The matching Proxy for this datastream in a script would look like this:

  ...
  <spstream module="Proxy" name="encode_wold_hepg2"/>
  ...

Separating the data sources from the Proxy placeholders provides several benefits:

  • Commonly used <datastream> blocks are easy to copy and paste between different scripts.
  • The system can check that the current user is allowed access to the data sources defined in the <datastream> sections.
  • The pooled data sources can be reused in different sections of the same script by placing multiple Proxy modules with the same name.
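
As an illustration of reusing one pool, the following is a minimal sketch, not a tested script. It uses only elements that appear elsewhere on this page, and the assumption is that two modules may each reference the same <datastream> through their own Proxy. Chaining two TemplateCluster modules is done here only to show the reuse pattern.

```xml
<zenbu_script>
	<!-- the pool of data sources is defined once -->
	<datastream name="gencode" output="full_feature">
		<source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
	</datastream>
	<stream_queue>
		<!-- first use of the pool -->
		<spstream module="TemplateCluster">
			<side_stream>
				<spstream module="Proxy" name="gencode"/>
			</side_stream>
		</spstream>
		<!-- second use: another Proxy with the same name (illustrative only) -->
		<spstream module="TemplateCluster">
			<side_stream>
				<spstream module="Proxy" name="gencode"/>
			</side_stream>
		</spstream>
	</stream_queue>
</zenbu_script>
```

Because both Proxy modules carry the name "gencode", changing the <source> list in one place changes the data seen at both points in the script.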

Datastream attributes

Proxy and <datastream> are a special module pairing in the ZENBU script system and use these attributes to control the nature of the data on the datastream.

  • name : name of the <datastream> which will be injected in place of this proxy at query time. The name on the Proxy must match the name of a <datastream> definition for the Proxy to initialize correctly.
  • output : defines the level of data which will be provided on this datastream. By limiting the level of data loading, performance can be increased. Valid values are:
    • full_feature : Features are loaded with all available data -- genome coordinates, name, subfeatures, expression and feature metadata.
    • simple_feature : Features are loaded with only genome coordinates and names. No subfeatures, expression, or feature metadata. This is the default if no output is specified.
    • subfeature : Features are loaded with -- genome coordinates, name, subfeatures. No expression nor metadata.
    • expression : Features are loaded with -- genome coordinates, name, expression. No subfeatures nor metadata.
    • skip_metadata : Features are loaded with -- genome coordinates, name, subfeatures, and expression. No metadata.
    • skip_expression : Features are loaded with -- genome coordinates, name, subfeatures, and metadata. No expression.
  • datatype : defines the expression datatype for the datastream. If not specified, no expression will be available on the datastream.
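
To make the attribute choices concrete, here is a hedged sketch of two pools that load different levels of data. The datastream names "gencode_simple" and "hepg2_expression" are hypothetical, chosen only for this illustration. The source ids, the output values, and the "tagcount" datatype are taken from the examples on this page.

```xml
<!-- annotation pool: genome coordinates and names only (the default level) -->
<datastream name="gencode_simple" output="simple_feature">
	<source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
</datastream>

<!-- expression pool: coordinates, name and expression; no subfeatures or metadata -->
<datastream name="hepg2_expression" output="expression" datatype="tagcount">
	<source id="904A696A-62EC-4665-85B9-4F92DDFA9814::2:::Experiment"/>
	<source id="8EB257B8-6B26-4DB7-8470-07A708EC7CEF::2:::Experiment"/>
</datastream>
```

Requesting only the level of data a script needs follows the performance note above. The second pool sets datatype because, without it, no expression would be available on that datastream.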

Getting datastream xml definitions from tracks

The easiest way to get the XML for a Proxy Datastream pool is to use an already existing track configured with the desired data sources.

For example, below we can see a Gencode annotation track on top, which we want to use as a Proxy datastream to collate expression from the second track in order to create the third track.

Gencode collation RNAseq.png

First access the Track Reconfiguration panel (Track controls-reconfigure track.jpg) and then select the datastream xml control. This brings up a pop-up panel with the XML datastream definition for this track, which can be copied and pasted into your script in another track. Note that when using this interface, the default name of the datastream is an arbitrary track number. It is best to rename the datastream pool to something easier to remember after copying it into your script.

Datastream xml widget.png

Example

This script pairs a Proxy with a TemplateCluster module to collate expression into Gencode V10 gene models. The expression is then normalized via the NormalizeRPKM normalization module. The script finishes with CalcFeatureSignificance so that the Features can be displayed via score-coloring.

<zenbu_script>
	<datastream name="gencode" output="full_feature">
		<source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
	</datastream>
	<stream_queue>
		<spstream module="TemplateCluster">
			<overlap_mode>height</overlap_mode>
			<skip_empty_templates>false</skip_empty_templates>
			<expression_mode>sum</expression_mode>
			<overlap_subfeatures>true</overlap_subfeatures>
			<ignore_strand>true</ignore_strand>
			<side_stream>
				<spstream module="Proxy" name="gencode"/>
			</side_stream>
		</spstream>

		<spstream module="NormalizeRPKM"/>

		<spstream module="CalcFeatureSignificance"/>
	</stream_queue>
</zenbu_script>

Here is a ZENBU view showing this script in use:
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=vtPXLwwqO9KjD1YYWqCMGD