The Proxy is a special placeholder processing module designed to work in coordination with the <datastream> section of the ZENBU scripting system.
Each <datastream> has a name attribute and a pool of data sources, each given by a <source> tag. Each data source is identified by its ZENBU system id. The other attributes of each <source> are ignored, but they can be helpful to script writers as comments. Here is an example of a datastream pool of 4 RNA-seq experiments on HepG2 cells from the ENCODE project:
<datastream name="encode_wold_hepg2" output="skip_metadata" datatype="tagcount">
  <source id="904A696A-62EC-4665-85B9-4F92DDFA9814::2:::Experiment" platform="RNA-seq"/>
  <source id="8EB257B8-6B26-4DB7-8470-07A708EC7CEF::2:::Experiment" platform="RNA-seq"/>
  <source id="A80763D0-F12C-449D-AFEA-288BEBE55C4A::2:::Experiment" platform="RNA-seq"/>
  <source id="74E98401-90F9-4FE4-B534-2AC4D3955753::2:::Experiment" platform="RNA-seq"/>
</datastream>
The matching Proxy for this datastream in a script would look like this:
... <spstream module="Proxy" name="encode_wold_hepg2"/> ...
Separating the data sources from the proxy placeholders provides several benefits:
- Commonly used <datastream> blocks are easy to copy and paste between different scripts.
- The system can check that the current user is allowed to access the data sources defined in the <datastream> sections.
- The pooled data sources can be reused in different sections of the same script by placing multiple Proxy modules with the same name.
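As a minimal sketch of the reuse case, the fragment below defines one pool and refers to it from two Proxy modules. The source id is the Gencode id used in the full script later on this page. The arrangement of two TemplateCluster stages is an assumption chosen only to show the structure, and the TemplateCluster parameters are omitted for brevity.

```xml
<zenbu_script>
  <!-- The pool is defined once. -->
  <datastream name="gencode" output="full_feature">
    <source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
  </datastream>
  <stream_queue>
    <!-- First use of the pool (module parameters omitted; illustrative only). -->
    <spstream module="TemplateCluster">
      <side_stream>
        <spstream module="Proxy" name="gencode"/>
      </side_stream>
    </spstream>
    <spstream module="NormalizeRPKM"/>
    <!-- Second use: another Proxy with the same name refers to the same pool. -->
    <spstream module="TemplateCluster">
      <side_stream>
        <spstream module="Proxy" name="gencode"/>
      </side_stream>
    </spstream>
  </stream_queue>
</zenbu_script>
```

Whether clustering twice is useful depends on the analysis. The point is only that the <source> list appears once, while each Proxy refers to it by name.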
Proxy and <datastream> are a special module pairing in the ZENBU script system. They use the following attributes to control the nature of the data on the datastream:
- name : name of the <datastream> which will be injected in place of this Proxy at query time. The name must match a <datastream> definition for the Proxy to initialize correctly.
- output : defines the level of data which will be provided on this datastream. Limiting the level of data loaded can increase performance. Valid values are:
  - full_feature : Features are loaded with all available data: genome coordinates, name, subfeatures, expression, and feature metadata.
  - simple_feature : Features are loaded with only genome coordinates and names. No subfeatures, expression, or feature metadata. This is the default if output is not specified.
  - subfeature : Features are loaded with genome coordinates, name, and subfeatures. No expression or metadata.
  - expression : Features are loaded with genome coordinates, name, and expression. No subfeatures or metadata.
  - skip_metadata : Features are loaded with genome coordinates, name, subfeatures, and expression. No metadata.
  - skip_expression : Features are loaded with genome coordinates, name, subfeatures, and metadata. No expression.
- datatype : defines the expression datatype for the datastream. If not specified, no expression will be available on the datastream.
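To illustrate how these attributes combine, here is a sketch of two pools loaded at different levels. The ids, the tagcount datatype, and the datastream names are taken from the examples on this page. The choice of output level for each pool is an assumption about what a downstream module might need, not a recommendation.

```xml
<!-- Gene models loaded with coordinates, names and subfeatures only
     (assumes expression and metadata are not needed downstream). -->
<datastream name="gencode" output="subfeature">
  <source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
</datastream>

<!-- RNA-seq experiments loaded with coordinates, names and expression.
     datatype is set because no expression is available without it. -->
<datastream name="encode_wold_hepg2" output="expression" datatype="tagcount">
  <source id="904A696A-62EC-4665-85B9-4F92DDFA9814::2:::Experiment" platform="RNA-seq"/>
  <source id="8EB257B8-6B26-4DB7-8470-07A708EC7CEF::2:::Experiment" platform="RNA-seq"/>
</datastream>
```

A Proxy with name="gencode" or name="encode_wold_hepg2" would then receive the data at the level set on the matching pool.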
Getting datastream xml definitions from tracks
The easiest way to get the XML for a Proxy Datastream pool is to use an already existing track configured with the desired data sources.
First open the Track Reconfiguration panel and select the datastream xml control. This brings up a pop-up panel with the XML datastream definition for the track, which can be copied and pasted into a script in another track. Note that the default name of a datastream obtained through this interface is an arbitrary track number. It is best to rename the datastream pool to something easier to remember after copying it into your script, and to use that same name in the matching Proxy.
The following script combines a Proxy with TemplateCluster to collate expression into Gencode V10 gene models. The expression is then normalized by the NormalizeRPKM module. The script finishes with CalcFeatureSignificance so that the Features can be displayed with score-coloring.
<zenbu_script>
  <datastream name="gencode" output="full_feature">
    <source id="D71B7748-1450-4C62-92CB-7E913AB12899::19:::FeatureSource"/>
  </datastream>
  <stream_queue>
    <spstream module="TemplateCluster">
      <overlap_mode>height</overlap_mode>
      <skip_empty_templates>false</skip_empty_templates>
      <expression_mode>sum</expression_mode>
      <overlap_subfeatures>true</overlap_subfeatures>
      <ignore_strand>true</ignore_strand>
      <side_stream>
        <spstream module="Proxy" name="gencode"/>
      </side_stream>
    </spstream>
    <spstream module="NormalizeRPKM"/>
    <spstream module="CalcFeatureSignificance"/>
  </stream_queue>
</zenbu_script>
Here is a ZENBU view showing this script in use