Data loading

From ZENBU documentation wiki
Revision as of 21:37, 26 June 2012 by Nicolas.bertin (talk | contribs) (Uploading of data with associated experiment/expression)

ZENBU supports several file types for uploading primary data into the system. Because ZENBU provides built-in data processing capabilities, data can be uploaded in a raw or primary format. When data is loaded into the system, it is first translated into the internal ZENBU Data Model, which allows the ZENBU system to manipulate that data as genomic annotation, expression data, and descriptive metadata.

File formats

The file types currently supported by ZENBU upload are:

Secured data uploading

ZENBU provides data loading through the secured user profile system.
This guarantees that the data is available only to the specific users who should have access to it.
After a user has securely logged into the ZENBU system, they can upload data either for private use or for sharing with specific collaborations.


Uploading of data with associated experiment/expression

The UCSC genome browser and the IGV genome browser tie the data upload format to its visualization. For example, in UCSC, BED files are always displayed as annotation and wig files are always displayed as "wiggle" tracks. With UCSC or IGV, all processing must be performed outside the system before the visualization files are created.
In contrast, ZENBU offers greater flexibility: files in BED format containing typical annotations (ESTs, gene models, ...) can be used to produce wiggle tracks or heatmaps, BAM files can be displayed as annotations (so as to see individual reads), and so on.

Experiment expression data can be loaded via three different means.

  • as BED files
    • BED-file-based data uploading offers the option to use the score column as an expression value of a specific data type: check the [BED.score column has expression values] option and select the datatype associated with those expression values.
    • If the expression is simply a count of '1' for every feature (for example, when loading mapped reads), then one can use BED or GFF style files and simply check the [single-best-mapping expression] option.
  • as OSCtable files
    • OSCtables provide a rich controlled vocabulary for specifying multiple experiments within a single file, experiment metadata, and multiple datatypes in multiple columns of the file. This allows every possible mapping of data into the internal data model. Because the OSCtable specification is highly flexible, the ZENBU OSCtable parser can support an extended vocabulary of metadata directives and column namespaces.
    • The ZENBU OSCtable parser is able to parse both tab-separated and space-separated files.
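
The two BED options described above can be illustrated with a minimal Python sketch. It is not ZENBU's actual loader: the function `parse_bed_line`, the `Feature` type, the parameter names, and the datatype labels `"tpm"` and `"count"` are all hypothetical and chosen only for illustration. The sketch shows one way a BED score column could be mapped to a user-selected expression datatype, or every feature given an expression count of 1.

```python
from typing import Dict, NamedTuple, Optional


class Feature(NamedTuple):
    """Hypothetical in-memory record for one BED line (not the ZENBU Data Model)."""
    chrom: str
    start: int
    end: int
    name: str
    strand: str
    expression: Dict[str, float]  # datatype label -> expression value


def parse_bed_line(line: str,
                   score_datatype: Optional[str] = None,
                   count_each_feature: bool = False) -> Feature:
    """Parse one BED line and attach expression values.

    score_datatype: if given, the BED score column (5th field) is stored as an
        expression value under this datatype label (assumed analogue of the
        [BED.score column has expression values] option).
    count_each_feature: if True, each feature gets an expression count of 1
        (assumed analogue of the [single-best-mapping expression] option).
    """
    stripped = line.rstrip("\r\n")
    # Accept tab-separated lines, falling back to whitespace-separated ones.
    fields = stripped.split("\t") if "\t" in stripped else stripped.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")

    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else ""
    score = float(fields[4]) if len(fields) > 4 else None
    strand = fields[5] if len(fields) > 5 else "."

    expression: Dict[str, float] = {}
    if score_datatype is not None and score is not None:
        expression[score_datatype] = score
    if count_each_feature:
        expression["count"] = 1.0  # hypothetical datatype label

    return Feature(chrom, start, end, name, strand, expression)


# Score column interpreted as an expression value of datatype "tpm".
scored = parse_bed_line("chr1\t100\t200\ttagA\t12.5\t+", score_datatype="tpm")

# Mapped read loaded with a count of 1, ignoring the score column.
read = parse_bed_line("chr1 100 200 read1 0 -", count_each_feature=True)
```

In this sketch the two options are independent flags, so a caller could enable either or both. Whether ZENBU allows combining them is not stated here and should not be inferred from the example.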