BED file support
BED files are a common interchange format for genomic annotations. BED lines have three required fields and nine additional optional fields. The number of fields per line must be consistent throughout any single set of data in an annotation track. The order of the optional fields is binding: lower-numbered fields must always be populated if higher-numbered fields are used.
The 12 BED columns are labeled as follows:
- chrom - The name of the chromosome (e.g. chr3, chrY, chr2_random) or scaffold (e.g. scaffold10671).
- chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
- chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature. For example, the first 100 bases of a chromosome are defined as chromStart=0, chromEnd=100, and span the bases numbered 0-99.
- name - Defines the name of the BED line.
- score - A score between 0 and 1000.
- strand - Defines the strand - either '+' or '-'.
- thickStart - The starting position at which the feature is drawn thickly (for example, the start codon in gene displays).
- thickEnd - The ending position at which the feature is drawn thickly (for example, the stop codon in gene displays).
- itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0). If the track line itemRgb attribute is set to "On", this RGB value will determine the display color of the data contained in this BED line.
- blockCount - The number of blocks (exons) in the BED line.
- blockSizes - A comma-separated list of the block sizes. The number of items in this list should correspond to blockCount.
- blockStarts - A comma-separated list of block starts. All of the blockStart positions should be calculated relative to chromStart. The number of items in this list should correspond to blockCount.
Special note: BED4 (4-column) files are given a special translation so that they match the bedGraph format (chrom, start, end, score). BED3, BED6 and BED12 are the standard versions commonly used in the field.
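As an illustration of the column layout described above, here is a minimal Python sketch of a BED line parser. It is not ZENBU's own parser: the function name `parse_bed_line` and the `bed4_as_bedgraph` flag are hypothetical, and reading the score as a float is an assumption made so that bedGraph-style values are accepted.

```python
BED_COLUMNS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb", "blockCount", "blockSizes", "blockStarts",
]
# Column order assumed for 4-column files treated like bedGraph.
BEDGRAPH_COLUMNS = ["chrom", "chromStart", "chromEnd", "score"]

INT_COLUMNS = {"chromStart", "chromEnd", "thickStart", "thickEnd", "blockCount"}
LIST_COLUMNS = {"blockSizes", "blockStarts"}


def parse_bed_line(line, bed4_as_bedgraph=True):
    """Split one BED line into a dict keyed by the standard column names."""
    line = line.rstrip("\r\n")
    # Prefer tabs when present; otherwise fall back to any whitespace.
    fields = line.split("\t") if "\t" in line else line.split()
    if not 3 <= len(fields) <= 12:
        raise ValueError(f"expected 3 to 12 BED fields, got {len(fields)}")

    columns = BED_COLUMNS
    if len(fields) == 4 and bed4_as_bedgraph:
        columns = BEDGRAPH_COLUMNS

    record = {}
    for column, value in zip(columns, fields):
        if column in INT_COLUMNS:
            record[column] = int(value)
        elif column in LIST_COLUMNS:
            # Lists may carry a trailing comma, so empty items are dropped.
            record[column] = [int(item) for item in value.split(",") if item]
        elif column == "score":
            record[column] = float(value)
        else:
            record[column] = value

    # The block lists should have exactly blockCount entries.
    if "blockStarts" in record:
        count = record["blockCount"]
        if len(record["blockSizes"]) != count or len(record["blockStarts"]) != count:
            raise ValueError("blockSizes/blockStarts do not match blockCount")
    return record
```

Because `zip` stops at the shorter sequence, a BED6 line simply yields the first six keys, which mirrors the rule that lower-numbered fields must be populated before higher-numbered ones.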
ZENBU interpretation of BED files
The BED file format easily maps into the ZENBU data model.
- chrom, chromStart, chromEnd, strand are directly interpreted as genomic coordinates. Note that BED files use a 0-based, end-exclusive (half-open) coordinate space, while ZENBU uses a 1-based, inclusive coordinate space. ZENBU automatically handles the conversion between the two.
- name is stored in the ZENBU Feature name
- score is stored in the Feature significance. On data uploading there is an option to copy the score into an Expression value of a specified DataType.
- the three columns blockCount, blockSizes, blockStarts work together and are interpreted into SubFeatures on the primary Feature. Each of these SubFeatures is created with a FeatureSource category of block. If these columns generate SubFeatures, then ZENBU can also interpret the thickStart and thickEnd columns as follows:
- if thickStart is not equal to chromStart then the region from chromStart to thickStart is interpreted into a SubFeature of category 5utr
- if thickEnd is not equal to chromEnd then the region from thickEnd to chromEnd is interpreted into a SubFeature of category 3utr
- itemRgb allows manual coloring of features in tracks (loaded from BED12 or BED9 files). If the itemRgb column is empty, it is not inserted into the metadata of the features. To visualize the color stored in the itemRgb metadata, make sure the metadata is present and that "full_feature" is enabled for the source outmode.
For example, this BED line
chr5 137801180 137805004 NM_001964 0.00 + 137801451 137803770 0 2 577,2559 0,1265
is interpreted into the ZENBU data model (displayed here in the ZENBU XML export/interchange format):
<feature name="NM_001964" start="137801181" end="137805004" strand="+" >
  <chrom chr="chr5" asm="hg19" ucsc_sm="hg19" ncbi_asm="GRCh37" taxon_id="9606" length="180915260"/>
  <featuresource category="refgene" name="UCSC_hg19_refgene" feature_count="35067"/>
  <subfeatures count="4">
    <feature category="5utr" start="137801181" end="137801451" strand="+"/>
    <feature category="block" start="137801181" end="137801757" strand="+"/>
    <feature category="block" start="137802446" end="137805004" strand="+"/>
    <feature category="3utr" start="137803770" end="137805004" strand="+"/>
  </subfeatures>
</feature>
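The interpretation rules above can be sketched in Python as follows. This is an illustrative approximation rather than ZENBU's implementation: the function name `bed_to_subfeatures` is hypothetical, and the conversion shown is the strict one (a 0-based half-open interval [start, end) becomes the 1-based inclusive interval start + 1 to end). ZENBU's own boundary handling may differ by a base, so treat the export above as authoritative.

```python
def bed_to_subfeatures(chrom_start, chrom_end, thick_start, thick_end,
                       block_sizes, block_starts):
    """Return (category, start, end) tuples in 1-based inclusive coordinates.

    All inputs are BED values: 0-based, end-exclusive, with block starts
    given relative to chrom_start.
    """
    def to_one_based(start0, end0):
        # [start0, end0) covers the same bases as 1-based [start0 + 1, end0].
        return start0 + 1, end0

    subfeatures = []
    for size, offset in zip(block_sizes, block_starts):
        block_start0 = chrom_start + offset
        subfeatures.append(
            ("block", *to_one_based(block_start0, block_start0 + size))
        )

    # The thick columns are only interpreted when blocks produced SubFeatures.
    if subfeatures:
        if thick_start != chrom_start:
            subfeatures.append(("5utr", *to_one_based(chrom_start, thick_start)))
        if thick_end != chrom_end:
            subfeatures.append(("3utr", *to_one_based(thick_end, chrom_end)))
    return subfeatures


# Hypothetical feature spanning BED coordinates 100-200 with two blocks.
for category, start, end in bed_to_subfeatures(100, 200, 110, 180, [30, 40], [0, 60]):
    print(category, start, end)
```

As in the rules above, the 5utr and 3utr categories are assigned by position only; the sketch does not swap them for features on the '-' strand.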
BED as OSCtable header
BED files can easily be represented with an OSCtable column header line using the ZENBU extended column namespace.
BED12
eedb:chrom eedb:start.0base eedb:end eedb:name eedb:score eedb:strand eedb:bed_thickstart eedb:bed_thickend bed:itemRgb eedb:bed_block_count eedb:bed_block_sizes eedb:bed_block_starts
BED6
eedb:chrom eedb:start.0base eedb:end eedb:name eedb:score eedb:strand
BED4 : aka bedGraph
eedb:chrom eedb:start.0base eedb:end eedb:score
BED3
eedb:chrom eedb:start.0base eedb:end
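For scripts that convert BED files into OSCtable files, the header lines above can be chosen from the number of BED columns. The following Python sketch assumes a tab-delimited header and supports only the four layouts listed here; the function name `osc_header_for_bed` is hypothetical and not part of ZENBU.

```python
# Column names copied from the BED12 header line above.
OSC_BED12_COLUMNS = [
    "eedb:chrom", "eedb:start.0base", "eedb:end", "eedb:name", "eedb:score",
    "eedb:strand", "eedb:bed_thickstart", "eedb:bed_thickend", "bed:itemRgb",
    "eedb:bed_block_count", "eedb:bed_block_sizes", "eedb:bed_block_starts",
]
# BED4 is treated like bedGraph: the fourth column is the score, not the name.
OSC_BED4_COLUMNS = ["eedb:chrom", "eedb:start.0base", "eedb:end", "eedb:score"]


def osc_header_for_bed(column_count, separator="\t"):
    """Return the OSCtable header line for a BED3, BED4, BED6 or BED12 file."""
    if column_count == 4:
        columns = OSC_BED4_COLUMNS
    elif column_count in (3, 6, 12):
        # BED3 and BED6 are prefixes of the BED12 column list.
        columns = OSC_BED12_COLUMNS[:column_count]
    else:
        raise ValueError(f"no header defined for {column_count} BED columns")
    return separator.join(columns)


print(osc_header_for_bed(6, separator=" "))
```

Writing the returned line as the first line of the file, followed by the unchanged BED rows, gives a table whose start column is declared as 0-based through the eedb:start.0base name.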
The official BED specification is available here: http://genome.ucsc.edu/FAQ/FAQformat.html#format1