Track visualization styles

From ZENBU documentation wiki
Revision as of 19:39, 22 July 2013 by Jessica Severin (talk | contribs) (1D heatmap visualization)

Tracks in the ZENBU genome browser fall into three main categories of visualization style:

  • Annotation tracks: where the data sources only contain genomic information and no expression
    Annotation-track-transcript.png
    This includes the styles: thick-arrow, medium-arrow, arrow, centroid, transcript, transcript2, thick-transcript, thin-transcript, box, thick-box, thin-box, probesetloc, seqtag, scorethick, and cytoband.


  • Numerical signal tracks: where signal level is displayed without feature boundaries, in a user-interactive data exploration tool.
    Signal style tracks.png
    This includes the styles: signal-histogram, xyplot, 1D-heatmap, and experiment-heatmap


  • Hybrid tracks: ZENBU enhanced visualizations which allow processed data to contain both genomic features and multi-experiment expression data.
    Hybrid tracks longRNAseq.png
    This primarily includes the use of annotation drawing styles with dynamic signal mapped colorings.

Annotation Tracks

Annotation tracks are for the visualization of genomic positional data within the ZENBU genome browser.

Annotation-track-transcript.png

The visualization style can be changed in the Visualization section of the track configuration interface panel.
Track visualization interface.jpg

Different visualization styles vary in the amount of information displayed and the amount of vertical screen space used. For dense data tracks, a more compact visualization may be better, depending on how one will use the visualization. The strand of the annotation is color coded. The interface provides color-pickers to allow one to choose any color to identify the forward/reverse strand features and the background color.

Annotation-track-thickarrow.png thick arrow style: a colored rectangle with arrow head is drawn from the start to the end of the annotation data. The name of the annotation feature is also displayed. The arrow head is drawn on the 3' end of the annotation data.
Annotation-track-mediumarrow.jpg medium arrow style: a colored rectangle with arrow head is drawn from the start to the end of the annotation data. The arrow head is drawn on the 3' end of the annotation data.
Annotation-track-arrow.png arrow style: a colored arrow is drawn on the 5' end of the annotation pointing toward the 3' end.
Annotation-track-centroid.png centroid style: a thin line is drawn along the span of the annotation feature, and an arrow is placed in the center pointing in the direction of the strand. The annotation is labeled.
Annotation-track-transcript.png transcript style: a complete annotation visualization of the exonic, intronic and UTR regions of a transcript in a style similar to the UCSC transcript visualization. The UTR regions are displayed as thinner blocks, whereas the coding exons are displayed as thick blocks. There is a very thin backbone line spanning the entire length of the transcript. The name of the annotation feature is also displayed. If this style is used with annotation data without exon/intron/UTR data, the visualization displays just the backbone line without any raised subfeatures.
Annotation-track-transcript2 v2.png transcript2 style: a complete annotation visualization of the exonic, intronic and UTR regions of a transcript. The UTR regions are displayed as an outline box, exons are displayed as colored boxes, and a gray line spans the entire length of the transcript. The name of the annotation feature is also displayed. If this style is used with annotation data without exon/intron/UTR data, the visualization displays the same as the line style.
Annotation-track-thicktranscript.png thick-transcript style: similar to the transcript style but in a more compact vertical format and without the name of the annotation feature.
Annotation-track-thintranscript.png thin-transcript style: similar to the thick-transcript style but without UTR visualization, and even more compact.
Annotation-track-box.png box style: a colored rectangle is drawn from the start to the end of the annotation feature. The name of the annotation feature is also displayed.
Annotation-track-thick-box.png thick-box style: a thick colored rectangle is drawn from the start to the end of the annotation feature.
Annotation-track-thin-box.png thin-box style: a thin colored rectangle is drawn from the start to the end of the annotation feature.
cytoband style:

Numerical signal tracks

Expression tracks are for the visualization of numerical expression data from Experiment Data Sources in the ZENBU genome browser on a segmented genomic grid similar to the UCSC genome browser wiggle visualization.

Signal histogram visualization

In the signal-histogram visualization, numerical signal (such as RNA expression, logFC, or p-values) is visualized as a signal-height graph along genomic coordinate space. ZENBU can visualize both strandless and stranded signal histograms.

Here is an example of FANTOM4 CAGE signal which is stranded in nature
Glyphs expression track.png

Here is an example of FANTOM4 ChIP-chip signal which is strandless in nature
Glyphs expression track-chipchip.png

Here is an example of ENCODE strandless-protocol RNAseq signal configured to only display expression signal in areas of sequence alignment (skipping gaps of alignments)
Woldlab RNAseq exonic expression signal.jpg
http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=l_D-jGt1IlehEahizVAMeB;loc=hg19::chr8:128746973..128755020

Here is an example of ENCODE stranded-protocol RNAseq exonic expression signal
LongRNASeq CSH exonic expression.png


To create this style of visualization, the primary expression data must be processed into the dynamic genomic segmentation grid, either with the expression binning processing module in the graphical interface or with a custom data processing script.
Expression track configuration.jpg

The expression binning processing GUI parameters are as follows:

  • overlap mode: since ZENBU can work directly with sequence alignment data (often uploaded from BAM files), the alignments must be transformed before they can be properly visualized. The options are:
    • area under the curve: the expression is spread evenly along the length of the alignment so that the area under the curve represents the level of expression.
      This only affects alignments which overlap more than one of the genomic segmentation bins. If all alignments are shorter than the genomic segmentation then area and height modes generate the same visualization.
    • height: the expression is collated so that the height of the curve represents the level of expression at the genomic segment.
      This only affects alignments which overlap more than one of the genomic segmentation bins. If all alignments are shorter than the genomic segmentation then area and height modes generate the same visualization.
    • 5'end: the expression signal is concentrated at the 5'end of the sequence alignment prior to being collated into the genomic segmentation binning.
      This is primarily used for CAGE-based sequencing experiments.
    • 3'end: the expression is concentrated on the 3'end of the sequence alignment prior to being collated into the genomic segmentation binning.
      Currently there are few RNA sequencing technologies which can utilize this mode of processing, but it is included for new technology development.
  • expression binning: the mathematical operation used when multiple expression values from the same experiment collate into the same genomic segmentation bin. Each Experiment is kept distinct and this math is applied across different expression features within the same Experiment. The options are:
    • sum : sum the different expression values within each experiment
    • min : calculate the minimum value of the different expression values of each experiment
    • max : calculate the maximum value of the different expression values of each experiment
    • mean : calculate the mean average of different expression values of each experiment
    • count : simply report the count of different expression values within each experiment that collate into the genomic segmentation bin.
  • fixed bin size: by default the processing script creates dynamic bin sizes based on the zoom level of the genomic view and the width of the display, so that each segmentation bin maps approximately to a single pixel width on the screen. This preserves a fine enough visualization resolution without creating unneeded sub-pixel resolution. If a finer or coarser segmentation binning is desired, it can be entered here. For example, here is the track above using a 100 base pair fixed bin size.
    Woldlab RNAseq exonic expression signal 100bp.jpg
  • process ignoring strand: if the primary expression experiments use a strandless protocol, or one wishes to process stranded expression in a strandless manner, check this option; a strandless genomic segmentation binning grid will be used and the strand of the primary data will be ignored. If the data is processed as strandless, it is best to also select the strandless option within the visualization options.
  • overlap via subfeatures: sometimes RNA sequencing experiments generate gapped sequence alignments when an RNA molecule spans an intronic splicing junction. This information is contained in BAM files and is preserved during ZENBU uploading. To get an accurate visualization of true RNA exonic signal, these intronic gaps should not be collated into the genomic segmentation bins. The example above of the ENCODE Wold lab RNAseq experiments contains such gapped alignments. Here is this BAM sequence alignment data processed without this option enabled, so that both RNA exon and intron signal are collated into the expression visualization.
    Woldlab RNAseq exonic expression signal withgaps.jpg
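
The processing parameters above can be illustrated with a small sketch. This is not ZENBU's implementation: the function names (dynamic_bin_size, spread_alignment, collate), the mode names, the 0-based half-open coordinates and the exact arithmetic are all assumptions chosen only to make the described behavior concrete.

```python
from collections import defaultdict

def dynamic_bin_size(region_start, region_end, display_width_px):
    """Bin size in bp so that one bin maps to roughly one pixel (assumed rule)."""
    span = region_end - region_start
    return max(1, round(span / display_width_px))

def spread_alignment(start, end, strand, value, bin_size, mode):
    """Return {bin_index: signal} for one alignment covering [start, end).

    mode is "area", "height", "5end" or "3end" (hypothetical names for the
    overlap modes described above).
    """
    if mode in ("5end", "3end"):
        # Concentrate all signal on a single base before binning.
        five_prime = start if strand == "+" else end - 1
        three_prime = end - 1 if strand == "+" else start
        pos = five_prime if mode == "5end" else three_prime
        return {pos // bin_size: value}
    signal = {}
    for b in range(start // bin_size, (end - 1) // bin_size + 1):
        if mode == "height":
            # Every overlapped bin receives the full value.
            signal[b] = value
        else:
            # "area": share the value in proportion to the overlapped bases.
            overlap = min(end, (b + 1) * bin_size) - max(start, b * bin_size)
            signal[b] = value * overlap / (end - start)
    return signal

# The five "expression binning" operations, applied within one experiment.
BINNING_OPS = {
    "sum": sum,
    "min": min,
    "max": max,
    "mean": lambda vals: sum(vals) / len(vals),
    "count": len,
}

def collate(alignments, bin_size, mode="area", op="sum"):
    """Collate one experiment's (start, end, strand, value) alignments into bins."""
    per_bin = defaultdict(list)
    for start, end, strand, value in alignments:
        spread = spread_alignment(start, end, strand, value, bin_size, mode)
        for b, v in spread.items():
            per_bin[b].append(v)
    return {b: BINNING_OPS[op](vals) for b, vals in sorted(per_bin.items())}
```

In this sketch an alignment that fits inside a single bin yields the same result in area and height modes, which matches the note above that the two modes only differ for alignments spanning more than one bin.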


Additional visualization options are available for expression Experiment tracks (visualization style of express):

  • hide empty experiment: this parameter affects the track-linked Experiment Expression panel. If selected, only those Experiments with a non-zero expression value are displayed.
  • color expression: currently has no effect when the track is in express mode.
  • display datatype: depending on how the track was configured and processed, there may be more than one datatype available for visualization. If more than one is available, please select one.
  • background color: the option of altering the background color to help visually group related tracks in very large views. The color can be specified using any of the HTML web color syntaxes (named colors, #FFFFFF style or rgb(255,255,255) style).
  • track pixel height: adjusts the screen height of the track. This can also be adjusted by click-dragging the resize widget on the left side of the track.
    Express track height resize.jpg
  • express scale: adjusts the numerical scale on which the expression values are displayed. By default this is auto, meaning that the expression track is visually rescaled to fit into the height of the track. If one desires to use a fixed scaling among several tracks, this can be set here. Expression above this scale limit is clipped.
  • log scale: for visualization, the expression can be dynamically compressed onto a log scale. If the expression has a huge dynamic range, this can be helpful to expand the low background signal and compress the higher peaks. For example, here is the FANTOM4 CAGE expression track from above visualized on a log scale.
    Express track fantom4 cage logscale.jpg
  • strandless: this visualization option should be set in coordination with data processing which is also strandless.
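
The interaction between track pixel height, express scale and log scale can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, values are assumed to be non-negative, and log(1 + x) is used as the log transform because the exact transform ZENBU applies is not stated here.

```python
import math

def auto_scale(values):
    """Auto express scale: the largest visible value fills the track height."""
    return max(values, default=0)

def bar_height(value, track_height_px, scale_max, log_scale=False):
    """Pixel height of one histogram bar; values above scale_max are clipped."""
    if log_scale:
        # Assumed transform: log(1 + x), which keeps zero at zero.
        value, scale_max = math.log1p(value), math.log1p(scale_max)
    if scale_max <= 0:
        return 0
    return round(track_height_px * min(value, scale_max) / scale_max)
```

With a fixed scale of 100 on a 100 pixel track, a value of 500 is drawn at the full 100 pixels, the same as a value of 100. With the log option, a low value such as 9 on a scale of 99 rises from about 9 pixels to about 50, which is the "expand the low background signal" effect described above.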

1D heatmap visualization

The same expression-binning data processing can also be displayed in the 1D-heatmap visualization style. This is a visualization style for expression data but can also be applied to non-overlapping hybrid data. It draws the expression on a single layer of a track using only the false-color-spectrum to visualize expression differences. It can be used in combination with normal "expression binning" processing or with more advanced scripts. The 1D-heatmap visualization does not display strand information, so processing should be done in a strandless manner, or in a separated-strand manner using custom scripting.

Configure spectrum visualization.png

This example RNAseq data is processed to display only exonic signal (no gaps/introns) and displayed with the "blue1" false-color-spectrum. This style of visualization gives a very compact track which allows people to use it in situations where they might need many separate expression tracks.

Woldlab-rnaseq-exonic-spectrum.png
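
As a rough illustration of false coloring, the sketch below maps an expression value onto a two-color gradient. The real color stops of ZENBU spectra such as "blue1" are not given on this page, so the white-to-blue default here is a placeholder, and the function name and linear interpolation are assumptions.

```python
def spectrum_color(value, vmin, vmax, low=(255, 255, 255), high=(0, 0, 255)):
    """Map value onto a low-to-high gradient and return an HTML #RRGGBB string.

    low and high are placeholder color stops, not the actual ZENBU spectra.
    """
    if vmax <= vmin:
        frac = 0.0
    else:
        # Clamp to the [vmin, vmax] range, then normalize to 0..1.
        frac = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    rgb = [round(lo + frac * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02X}{:02X}{:02X}".format(*rgb)
```

A 1D-heatmap track then amounts to calling such a function once per genomic segmentation bin and drawing one colored cell per bin on a single layer.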

Experiment heatmap visualization

This is a visualization style for datasource pooled tracks with many experiments and expression. In this style of visualization each experiment is given a unique horizontal layer in the image, vertical slices represent genomic segments, and the false-color-spectrum is applied to the expression value at the intersection of genomic-position and experiment. This style of visualization simultaneously shows spatial variation in expression and differential expression between experiments.

In this example the RNAseq is processed for exonic signal and binned into a genomic-segmentation grid, and the experiments are sorted by highest expression value.

Woldla exonic rnaseq heatmap.png

Hovering over elements in the heatmap reveals the name of the experiment, the location and the expression value collated into that genomic segment.

Woldlab-RNAseq-heatmap-mousehover.png

The order of experiments matches the order in the linked Experiment-expression graph, and resorting in that panel changes the sort order of experiments in the heatmap. Here the sort order is changed to be by sample cell-type name.

Woldlab-RNAseq-heatmap-sample-sort.png
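
The layout of the experiment heatmap (one row per experiment, one column per genomic segment, rows reordered by the chosen sort) can be sketched as below. The data structure and the function name heatmap_rows are hypothetical, and sorting by total expression is only one plausible reading of an expression-based sort.

```python
def heatmap_rows(expression, bins, sort_key=None):
    """Build heatmap rows from {experiment_name: {bin_index: value}}.

    Returns a list of (experiment_name, [value per bin]) in display order.
    """
    rows = [
        (name, [per_bin.get(b, 0.0) for b in bins])
        for name, per_bin in expression.items()
    ]
    if sort_key is None:
        # Assumed default: highest total expression first.
        sort_key = lambda row: -sum(row[1])
    return sorted(rows, key=sort_key)

# Hypothetical data: two experiments over two bins.
example = {"HeLa": {0: 1.0, 1: 2.0}, "K562": {1: 5.0}}
by_expression = heatmap_rows(example, bins=[0, 1])
by_name = heatmap_rows(example, bins=[0, 1], sort_key=lambda row: row[0])
```

Each cell of a row would then be passed through the false-color-spectrum to obtain its color.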

Another example showing more dramatic differential expression and spatial difference between RNAseq exonic signal among different ENCODE samples.

Hybrid track longRNAseq spectrum.png

xyplot - stranded, signed signal visualization

Hybrid Tracks

Hybrid tracks are ZENBU advanced visualization tracks which combine genomic annotation and expression Experiments, often in combination with ZENBU data processing, to create novel visualizations. There are several different types of visualizations which can be categorized as hybrid tracks.

Expression false coloring of genomic features

In this style of hybrid visualization, genomic annotation Features have expression collated onto them. This can either be generated inside ZENBU by a data processing script or by utilizing the BED file with score-as-expression loading options or with OSCtable files with combined annotation and expression. This visualization is enabled by selecting one of the annotation visualization styles, checking the color expression box and selecting a false color spectrum.
Hybrid track colored features config.jpg
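
To make the idea of "score as expression" concrete, here is a minimal sketch that reads one BED line and treats its score column as the expression value of the feature. It relies only on the standard BED column order (chrom, start, end, name, score, strand); the function name and the returned dictionary layout are hypothetical and do not describe how ZENBU stores features internally.

```python
def bed_feature_with_expression(line):
    """Parse a tab-separated BED line, using the score column as expression."""
    fields = line.rstrip("\n").split("\t")
    has_score = len(fields) > 4 and fields[4] != "."
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3] if len(fields) > 3 else "",
        # Features without a usable score get an expression of 0.0 (assumption).
        "expression": float(fields[4]) if has_score else 0.0,
        "strand": fields[5] if len(fields) > 5 else ".",
    }
```

A feature parsed this way carries both its genomic position and a numerical value, which is what the color expression option needs in order to color an annotation glyph.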

For example, here is a track which uses ZENBU data processing of the ENCODE Wold lab RNAseq expression (which was loaded via BAM files) collated into Gencode Gene models to give gene expression. This data processing is then visualized as a hybrid track with the transcript visualization style and the color expression option with the fire1 false-color-spectrum.
Hybrid track gencode gene RNAseq expression.jpg
The top track is the hybrid track showing the processed gene expression. The second track is the RNAseq expression signal, which was collated into the Gencode gene models shown in the third track. For details on how to create tracks like this, please see the case study RNAseq_expression_collated_onto_gene_models


Here is a variation on the previous collated-expression situation, but here we use advanced scripting to dynamically generate new genomic-features from the primary data and then use false-color-spectrum to show their abundance.

Woldlab-intronsupport.png

In this track RNAseq alignment gaps are extracted by ZENBU processing into new genomic-features and then "uniqued" and counted. In gapped RNAseq, long gaps mainly occur because of RNA spanning introns and these gaps represent evidence for introns. These "intron evidence" features are then filtered for length and minimum abundance before being displayed using "medium-exon" and a "fire1" spectrum.
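
The gap extraction, "uniquing", counting and filtering steps can be sketched in a few lines. This is not the ZENBU processing script itself: the input layout (each read as chromosome, strand and a list of aligned blocks), the function names and the default thresholds are all hypothetical.

```python
from collections import Counter

def alignment_gaps(blocks):
    """Gaps between consecutive aligned blocks [(start, end), ...] of one read."""
    ordered = sorted(blocks)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]

def intron_evidence(reads, min_length=50, min_count=2):
    """Count identical gaps across reads, then filter by length and abundance.

    reads: iterable of (chrom, strand, blocks). Thresholds are placeholders.
    """
    counts = Counter()
    for chrom, strand, blocks in reads:
        for gap_start, gap_end in alignment_gaps(blocks):
            # Identical gaps collapse onto one key: the "uniqued" feature.
            counts[(chrom, strand, gap_start, gap_end)] += 1
    return {
        feature: n
        for feature, n in counts.items()
        if feature[3] - feature[2] >= min_length and n >= min_count
    }
```

The resulting counts play the role of the abundance that the false-color-spectrum then displays on each "intron evidence" feature.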