OSCtable

From ZENBU documentation wiki
Revision as of 12:46, 18 April 2012

OSCtable1.1

Basic structure

  • A simple tab-delimited text file.
  • The first line after the comment and metadata lines (see below) is the header line, which gives the column names of the table.
  • Column order is flexible.
  • The first column should hold a 'key' for the data (unique in many cases, but not necessarily), and its column name should be 'id'.
  • If a cell needs to include multiple values, a comma (',') is recommended as the separator.
  • Lines starting with '#' are comments.
  • Lines starting with '##' are attributes or metadata of the table. (See 'Metadata' section below)
  • All the comment and attribute lines should appear above the header line.
  • All columns should be described in the metadata. (See 'Metadata' section below)
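The basic structure can be read with a few lines of code. The Python sketch below is illustrative only: `parse_osctable` and `split_cell` are hypothetical helper names, blank lines are assumed to be ignorable, and the check on the 'id' column treats the "should" above as a hard rule.

```python
from typing import Dict, List, Tuple


def parse_osctable(text: str) -> Tuple[List[str], List[str], List[Dict[str, str]]]:
    """Split OSCtable text into ('##' metadata lines, header columns, data rows).

    Hypothetical helper; each row maps a column name to its raw cell string.
    """
    metadata: List[str] = []
    header: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue                              # assumption: blank lines are ignored
        if line.startswith("##"):
            metadata.append(line[2:].strip())     # attribute / metadata line
        elif line.startswith("#"):
            continue                              # plain comment line
        elif not header:
            header = line.split("\t")             # first non-comment line is the header
        else:
            rows.append(dict(zip(header, line.split("\t"))))
    if header and header[0] != "id":
        # strict reading of "the column name should be 'id'"
        raise ValueError("first column should be named 'id', got %r" % header[0])
    return metadata, header, rows


def split_cell(cell: str, sep: str = ",") -> List[str]:
    """Split a multi-valued cell on the recommended ',' separator."""
    return cell.split(sep) if cell else []
```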


Metadata ('##' lines)

The basic structure is '##qualifier = value'.
Required metadata: FileFormat, Date, ProtocolREF, ColumnVariable, ContactName, ContactEmail

When using

##NameSpace=genomic_coordinate

the 'genome_assembly' parameter value is also required.
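A minimal sketch of checking the required metadata, assuming each line follows the '##qualifier = value' form (optionally '##qualifier[name] = value', as ColumnVariable and ParameterValue do) and assuming the genome assembly is supplied as `##ParameterValue[genome_assembly] = ...`. The function name `missing_metadata` is hypothetical.

```python
import re
from typing import Dict, List

REQUIRED = ["FileFormat", "Date", "ProtocolREF", "ColumnVariable",
            "ContactName", "ContactEmail"]

# '##qualifier = value' or '##qualifier[name] = value'
META_RE = re.compile(r"^##\s*([A-Za-z_]+)(?:\[([^\]]*)\])?\s*=\s*(.*)$")


def missing_metadata(lines: List[str]) -> List[str]:
    """Return required qualifiers absent from the given '##' lines (hypothetical helper)."""
    present: Dict[str, List[str]] = {}
    params: Dict[str, List[str]] = {}
    for line in lines:
        m = META_RE.match(line)
        if not m:
            continue
        qualifier, name, value = m.group(1), m.group(2), m.group(3).strip()
        present.setdefault(qualifier, []).append(value)
        if qualifier == "ParameterValue" and name is not None:
            params.setdefault(name, []).append(value)
    missing = [q for q in REQUIRED if q not in present]
    # genomic_coordinate needs genome_assembly; assumed to be a ParameterValue entry
    if "genomic_coordinate" in present.get("NameSpace", []) and "genome_assembly" not in params:
        missing.append("ParameterValue[genome_assembly]")
    return missing
```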

Required (mandatory) metadata

FileFormat

describes the file format of this file.
Example:

##FileFormat = OSCtable1.1

Date

describes the date when the data file was generated

##Date = 20090602
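The example value suggests a YYYYMMDD layout. Under that assumption (the format is not stated explicitly here), a Date value can be checked with Python's standard `datetime.strptime`; `parse_osc_date` is a hypothetical name.

```python
from datetime import date, datetime


def parse_osc_date(value: str) -> date:
    """Parse a ##Date value, assuming the YYYYMMDD form of the example above.

    Raises ValueError for values in any other layout.
    """
    return datetime.strptime(value.strip(), "%Y%m%d").date()
```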

ProtocolREF

describes the protocol used to generate the data file.

##ProtocolREF = CAGEmappingv1.0

ColumnVariable

describes all the columns used in the data file

##ColumnVariable[start] = this is a start position of the genomic coordinate
##ColumnVariable[end] = this is a stop position of the genomic coordinate
##ColumnVariable[norm.THP10h] = this is TPM normalized value with 10h
##ColumnVariable[entrez_gene_id] = Entrez gene ID, which is assigned to the cluster
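Because every column must be described, a simple consistency check compares the header with the ColumnVariable entries. This sketch (hypothetical `undescribed_columns`) treats the bracketed text as the exact column name, so the 'id' column needs its own entry too.

```python
import re
from typing import Dict, List

# '##ColumnVariable[column_name] = description'
COLVAR_RE = re.compile(r"^##\s*ColumnVariable\[([^\]]+)\]\s*=\s*(.*)$")


def undescribed_columns(metadata_lines: List[str], header: List[str]) -> List[str]:
    """Return header columns that have no ##ColumnVariable[...] description."""
    described: Dict[str, str] = {}
    for line in metadata_lines:
        m = COLVAR_RE.match(line)
        if m:
            described[m.group(1)] = m.group(2).strip()
    return [col for col in header if col not in described]
```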

ContactName

describes the name of the contact person for the data file.

##ContactName = Hideya Kawaji

ContactEmail

describes the contact e-mail address for the data file

##ContactEmail = kawaji@gsc.riken.jp

Optional metadata

InputFile

describes the file(s) used to generate the data file

##InputFile = lane1.fa
##InputFile = lane2.fa

ParameterValue

describes the parameter(s) of the protocol used to generate the data file.
The parameter(s) should be consistent with the protocol description.

##ParameterValue[alignment_program] = BWA
##ParameterValue[alignment_program_version] = 1.3.5
##ParameterValue[UCSC_gene_tracks] = RefSeq
##ParameterValue[UCSC_gene_tracks] = ENSEMBL transcript
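InputFile and ParameterValue may be repeated, as in the UCSC_gene_tracks example, so a reader should keep every value rather than overwrite earlier ones. A minimal sketch keyed on (qualifier, bracketed name); `collect_metadata` is a hypothetical name.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# '##qualifier = value' or '##qualifier[name] = value'
META_RE = re.compile(r"^##\s*([A-Za-z_]+)(?:\[([^\]]*)\])?\s*=\s*(.*)$")

Key = Tuple[str, Optional[str]]


def collect_metadata(lines: List[str]) -> Dict[Key, List[str]]:
    """Group '##' lines by (qualifier, bracketed name), keeping repeats in file order."""
    table: Dict[Key, List[str]] = defaultdict(list)
    for line in lines:
        m = META_RE.match(line)
        if m:
            table[(m.group(1), m.group(2))].append(m.group(3).strip())
    return dict(table)
```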

NameSpace

describes the name space for the column names. See 'Name Spaces' below.

##NameSpace=genomic_coordinate
##NameSpace=expression

Name Spaces

  • A set of column names (and parameters) to be used for a specific purpose or context.
  • Columns with the same name in the same name space are treated as having the same (equivalent) meaning.
  • Supported name spaces: genomic_coordinate, expression

genomic_coordinate

  • column names are: chrom, start.0base, start.1base, end, strand
    • chrom: chromosome name used in the genome assembly. For example, chr1, chr2, chr3, ... chrM for the UCSC hg18 genome assembly.
    • start.0base: start position (bp) on the chromosome in the 0-based coordinate system (BED, PSL, BLAT, exonerate, and nexAlign style)
    • start.1base: start position (bp) on the chromosome in the 1-based coordinate system (the conventional coordinate system, also adopted in GFF)
    • end: end position (bp) on the chromosome
    • strand: strand on the chromosome; optional
  • parameter value: genome_assembly
  • Note:
    • Not all of the above columns are required. For example, start.0base is not needed if you have start.1base, and strand is not needed if the annotation has no strand distinction.
    • The 'genome_assembly' parameter value is required.
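The two start columns differ by exactly one: a base at 0-based position p sits at 1-based position p + 1. The sketch below (hypothetical `with_both_starts`) derives whichever start column is missing. It leaves 'end' untouched, assuming, as the single 'end' column implies, that the end value is the same in both systems (as with BED half-open and GFF closed intervals).

```python
from typing import Dict


def with_both_starts(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of row with start.0base and start.1base both filled in when one is present."""
    out = dict(row)
    if "start.0base" in out and "start.1base" not in out:
        out["start.1base"] = str(int(out["start.0base"]) + 1)  # 0-based p -> 1-based p + 1
    elif "start.1base" in out and "start.0base" not in out:
        out["start.0base"] = str(int(out["start.1base"]) - 1)  # 1-based p -> 0-based p - 1
    # 'end' is copied unchanged (assumed identical in both systems)
    return out
```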

expression

generic expression tags
  • column names are: raw.ZZZ, norm.ZZZ, or exp.YYY.ZZZ
    • raw.ZZZ: raw expression values, such as raw counts and raw signal intensities.
    • norm.ZZZ: normalized expression values.
    • ZZZ identifies the experiment (cell condition, RNA, etc.) that the expression values come from.
    • YYY indicates the type of expression value and should not include a dot (.).
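Since YYY may not contain a dot, an expression column name can be split unambiguously from the left. A sketch with a hypothetical `parse_expression_column`; it assumes ZZZ itself may contain dots and returns None for names outside this name space. For example, `norm.THP10h` parses to prefix 'norm' with experiment 'THP10h'; 'tpm' in the tests is only an illustrative YYY.

```python
from typing import NamedTuple, Optional


class ExpressionColumn(NamedTuple):
    prefix: str               # "raw", "norm" or "exp"
    expr_type: Optional[str]  # YYY of exp.YYY.ZZZ; None for raw/norm
    experiment: str           # ZZZ: the experiment the values come from


def parse_expression_column(name: str) -> Optional[ExpressionColumn]:
    """Parse raw.ZZZ, norm.ZZZ or exp.YYY.ZZZ; return None for other names."""
    if name.startswith(("raw.", "norm.")):
        prefix, experiment = name.split(".", 1)
        return ExpressionColumn(prefix, None, experiment) if experiment else None
    if name.startswith("exp."):
        # YYY contains no dot, so two splits separate 'exp', YYY and ZZZ
        parts = name.split(".", 2)
        if len(parts) == 3 and parts[1] and parts[2]:
            return ExpressionColumn("exp", parts[1], parts[2])
    return None
```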
'mapcount' tag

To specify that alignments may be mapped to more than one location, you can either use...