BED file support
BED files are a common interchange format for genomic annotations. BED lines have three required fields and nine additional optional fields. The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used.
The 12 BED columns are labeled as follows:
- chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
- chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
- chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
- name - Defines the name of the BED line.
- score - A score between 0 and 1000.
- strand - Defines the strand - either '+' or '-'.
- thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
- thickEnd - The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).
- itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to "On", this RGB value will determine the display color of the data contained in this BED line.
- blockCount - The number of blocks (exons) in the BED line.
- blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
- blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
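As a minimal sketch of how the column layout above might be read in Python: the names `BED_COLUMNS`, `parse_bed_line`, and `block_intervals` are hypothetical helpers written for illustration, not part of ZENBU or of any BED library. The sketch assumes whitespace-separated fields (so names containing spaces are not supported) and leaves score as text.

```python
# Column names in their fixed order, as listed above.
BED_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb",
    "blockCount", "blockSizes", "blockStarts",
]
INT_COLUMNS = ("chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount")
LIST_COLUMNS = ("blockSizes", "blockStarts")


def parse_bed_line(line):
    """Parse one BED data line into a dict keyed by column name."""
    fields = line.split()  # assumption: no whitespace inside any field
    if not 3 <= len(fields) <= len(BED_COLUMNS):
        raise ValueError(f"expected 3 to 12 fields, got {len(fields)}")
    # The field order is binding, so zip() assigns names positionally.
    record = dict(zip(BED_COLUMNS, fields))
    for key in INT_COLUMNS:
        if key in record:
            record[key] = int(record[key])
    for key in LIST_COLUMNS:
        if key in record:
            # Ignore empty items so a trailing comma is tolerated.
            record[key] = [int(item) for item in record[key].split(",") if item]
            if len(record[key]) != record["blockCount"]:
                raise ValueError(f"{key} does not match blockCount")
    return record


def block_intervals(record):
    """Return blocks as 0-based, end-exclusive (start, end) pairs.

    blockStarts are relative to chromStart, so each is offset by it.
    """
    origin = record["chromStart"]
    return [
        (origin + start, origin + start + size)
        for start, size in zip(record["blockStarts"], record["blockSizes"])
    ]


record = parse_bed_line("chr1 0 100")
print(record["chromEnd"] - record["chromStart"])  # 100 bases, numbered 0-99
```

Because chromEnd is not included in the feature, the length of a feature is simply chromEnd minus chromStart, with no off-by-one correction.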
ZENBU interpretation of BED files
The BED file format easily maps into the ZENBU data model.
- chrom, chromStart, chromEnd, strand are directly interpreted as genomic coordinates. Note that BED files use a 0-based, end-exclusive (half-open) coordinate space, while ZENBU uses a 1-based, inclusive coordinate space. ZENBU automatically handles the conversion between the two coordinate spaces.
- name is stored in the ZENBU Feature name.
- score is stored in the Feature significance. When uploading data, there is an option to copy the score into an Expression value of a specified DataType.
- the three columns blockCount, blockSizes, blockStarts work together and are interpreted into SubFeatures on the primary Feature. Each of these SubFeatures is created with a FeatureSource category of block. If these columns generate SubFeatures, then ZENBU can also interpret the thickStart and thickEnd columns as follows:
- if thickStart is not equal to chromStart then the region from chromStart to thickStart is interpreted into a SubFeature of category 5utr
- if thickEnd is not equal to chromEnd then the region from thickEnd to chromEnd is interpreted into a SubFeature of category 3utr
- itemRgb is currently loaded but not interpreted.
For example, this BED line
chr5 137801180 137805004 NM_001964 0.00 + 137801451 137803770 0 2 576,2558 0,1265
is interpreted into the ZENBU data model as follows (displayed here in the ZENBU XML export/interchange format):
<feature name="NM_001964" start="137801181" end="137805004" strand="+" >
  <chrom chr="chr5" asm="hg19" ucsc_sm="hg19" ncbi_asm="GRCh37" taxon_id="9606" length="180915260"/>
  <featuresource category="refgene" name="UCSC_hg19_refgene" feature_count="35067"/>
  <subfeatures count="4">
    <feature category="5utr" start="137801181" end="137801451" strand="+"/>
    <feature category="block" start="137801181" end="137801757" strand="+"/>
    <feature category="block" start="137802446" end="137805004" strand="+"/>
    <feature category="3utr" start="137803770" end="137805004" strand="+"/>
  </subfeatures>
</feature>
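The coordinate conversion applied to the primary feature in this example can be sketched in a few lines of Python. This is only an illustration of the general rule for moving between a 0-based, end-exclusive space and a 1-based, inclusive space; the function names are hypothetical and this is not ZENBU's own code.

```python
def bed_to_one_based(chrom_start, chrom_end):
    """Convert a 0-based, end-exclusive BED interval to 1-based, inclusive.

    Only the start shifts: the excluded end position in BED has the same
    number as the last included base in 1-based numbering.
    """
    return chrom_start + 1, chrom_end


def one_based_to_bed(start, end):
    """Inverse conversion, from 1-based inclusive back to BED coordinates."""
    return start - 1, end


# chromStart and chromEnd from the BED line in the example.
print(bed_to_one_based(137801180, 137805004))  # (137801181, 137805004)
```

The result matches the start and end attributes of the top-level feature element in the XML export.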
BED as OSCtable header
BED files can easily be represented with an OSCtable column header line using the ZENBU extended column namespace.
eedb:chrom eedb:start.0base eedb:end eedb:name eedb:score eedb:strand eedb:bed_thickstart eedb:bed_thickend bed:itemRgb eedb:bed_block_count eedb:bed_block_sizes eedb:bed_block_starts
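A rough sketch of building such a header line in Python is shown below. It rests on two assumptions that are not stated above: that OSCtable columns are tab-delimited, and that a BED file with fewer than 12 fields uses only the corresponding leading column names. The names `OSC_BED_COLUMNS` and `osctable_header` are hypothetical.

```python
# Column names copied from the header line above, in BED field order.
OSC_BED_COLUMNS = [
    "eedb:chrom", "eedb:start.0base", "eedb:end", "eedb:name",
    "eedb:score", "eedb:strand", "eedb:bed_thickstart", "eedb:bed_thickend",
    "bed:itemRgb", "eedb:bed_block_count", "eedb:bed_block_sizes",
    "eedb:bed_block_starts",
]


def osctable_header(num_fields, sep="\t"):
    """Return a header line covering the first num_fields BED columns.

    Assumption: columns are tab-delimited, and shorter BED files use only
    the leading column names (BED field order is binding).
    """
    if not 3 <= num_fields <= len(OSC_BED_COLUMNS):
        raise ValueError(f"expected 3 to 12 fields, got {num_fields}")
    return sep.join(OSC_BED_COLUMNS[:num_fields])


# Header for a six-column BED file (chrom through strand).
print(osctable_header(6))
```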
The official BED specification is available at http://genome.ucsc.edu/FAQ/FAQformat.html#format1