GFF and GTF file support
GFF (General Feature Format) consists of one line per feature, each containing 9 columns of data, plus optional track definition lines. The following documentation is based on the Version 2 specification.
The GTF (General Transfer Format) is nearly identical to GFF version 2.
ZENBU currently provides full support for parsing GFF, GFF2, and GTF files. GFF3 file parsing is mostly supported, except for the linking of parents and children (the Parent= and ID= attributes) to create transcripts with subfeatures. Full GFF3 support will come in a future version of ZENBU.
Fields must be tab-separated. All but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'.
- seqname - name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix.
- source - name of the program that generated this feature, or the data source (database or project name)
- feature - feature type name, e.g. Gene, Variation, Similarity
- start - Start position of the feature, with sequence numbering starting at 1.
- end - End position of the feature, with sequence numbering starting at 1.
- score - A floating point value.
- strand - defined as + (forward) or - (reverse).
- frame - One of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
- attribute - A semicolon-separated list of tag-value pairs, providing additional information about each feature. Format is tag=value or tag <space> value.
Sample GFF output from Ensembl export:
X	Ensembl	Repeat	2419108	2419128	42	.	.	hid=trf; hstart=1; hend=21
X	Ensembl	Repeat	2419108	2419410	2502	-	.	hid=AluSx; hstart=1; hend=303
X	Ensembl	Repeat	2419108	2419128	0	.	.	hid=dust; hstart=2419108; hend=2419128
X	Ensembl	Pred.trans.	2416676	2418760	450.19	-	2	genscan=GENSCAN00000019335
X	Ensembl	Variation	2413425	2413425	.	+	.
X	Ensembl	Variation	2413805	2413805	.	+	.
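The nine-column layout described above can be illustrated with a small Python sketch. This is not ZENBU's own parser; the `GffRecord` type and the `parse_gff_line` function are hypothetical names introduced here, and the sketch assumes that a '.' placeholder should be read as "no value" and that a line with only eight columns simply has no attribute column.

```python
from typing import NamedTuple, Optional


class GffRecord(NamedTuple):
    """One GFF feature line (hypothetical container, not a ZENBU type)."""
    seqname: str
    source: str
    feature: str
    start: int              # 1-based start position
    end: int                # 1-based end position
    score: Optional[float]
    strand: Optional[str]
    frame: Optional[int]
    attribute: str


def parse_gff_line(line: str) -> GffRecord:
    """Split one tab-separated GFF feature line into its nine columns."""
    # maxsplit=8 keeps everything after the eighth tab in the attribute column
    cols = line.rstrip("\n").split("\t", 8)
    if len(cols) == 8:
        cols.append("")  # assumption: the final (attribute) column may be absent
    if len(cols) != 9:
        raise ValueError(f"expected 8 or 9 tab-separated columns, got {len(cols)}")
    seqname, source, feature, start, end, score, strand, frame, attribute = cols
    return GffRecord(
        seqname=seqname,
        source=source,
        feature=feature,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=None if strand == "." else strand,
        frame=None if frame == "." else int(frame),
        attribute=attribute.strip(),
    )


sample = "X\tEnsembl\tPred.trans.\t2416676\t2418760\t450.19\t-\t2\tgenscan=GENSCAN00000019335"
record = parse_gff_line(sample)
print(record.feature, record.start, record.end, record.score, record.strand, record.frame)
```

The example line is taken from the Ensembl sample above; lines such as the two Variation records, which carry no attribute column, are accepted and given an empty attribute string.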
ZENBU interpretation of GFF / GTF files
The GFF/GTF file format maps easily onto the ZENBU data model. Both ZENBU and GFF use a 1-based, inclusive coordinate system, so no adjustment is needed between coordinate spaces.
- seqname : is mapped to the Feature chromosome
- source : is not interpreted but simply stored as metadata on the Feature with the tag gff:source
- feature : is interpreted as a FeatureSources category multiplexer. This allows a complex GFF file with many different feature / category types to be organized into separate ZENBU FeatureSources after loading.
- start : is mapped to the Feature chrom_start (1-based, inclusive coordinate system)
- end : is mapped to the Feature chrom_end
- score : is stored in the Feature significance. On data uploading there is an option to copy the score into an Expression value of a specified DataType.
- strand : is mapped to the Feature strand
- frame : is not interpreted but simply stored as metadata on the Feature
- attributes : is parsed into Metadata attached to the Feature. In the future ZENBU will support the following GFF3 special tags for extended parsing:
  - ID= : will be used for feature/subfeature linking and not stored into metadata. Not currently supported.
  - Parent= : will be used for feature/subfeature linking and not stored into metadata. Not currently supported.
  - Name= : will be stored as the name of the Feature and not stored into metadata. Not currently supported.
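The column mapping listed above can be sketched in Python as follows. The dictionary keys, the `gff_to_feature` and `group_by_category` functions, and the use of gff:frame and gff:attributes as metadata tags are illustrative assumptions, not ZENBU's actual API; only the gff:source tag is stated in the list above. The attribute column is kept as a raw string here for brevity, whereas ZENBU parses it into individual metadata entries.

```python
from collections import defaultdict


def gff_to_feature(columns):
    """Map the nine GFF column strings onto a plain dict (hypothetical Feature stand-in)."""
    seqname, source, feature, start, end, score, strand, frame, attributes = columns
    return {
        "chrom": seqname,
        "category": feature,          # selects the FeatureSource the feature goes to
        "chrom_start": int(start),    # kept as the 1-based GFF value
        "chrom_end": int(end),
        "significance": None if score == "." else float(score),
        "strand": strand,
        # source and frame are not interpreted, only stored as metadata
        "metadata": {
            "gff:source": source,
            "gff:frame": frame,             # assumed tag name
            "gff:attributes": attributes,   # raw string; simplification in this sketch
        },
    }


def group_by_category(features):
    """Organize features into one list per category, mimicking the multiplexer idea."""
    sources = defaultdict(list)
    for feature in features:
        sources[feature["category"]].append(feature)
    return dict(sources)


rows = [
    ["X", "Ensembl", "Repeat", "2419108", "2419128", "42", ".", ".", "hid=trf; hstart=1; hend=21"],
    ["X", "Ensembl", "Repeat", "2419108", "2419410", "2502", "-", ".", "hid=AluSx; hstart=1; hend=303"],
    ["X", "Ensembl", "Variation", "2413425", "2413425", ".", "+", ".", ""],
]
sources = group_by_category(gff_to_feature(row) for row in rows)
for category, features in sources.items():
    print(category, len(features))
```

Grouping by the feature column is what lets a mixed file end up as separate sources, one per category, in this sketch.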
ZENBU supports all variations of attribute formatting found in GFF, GFF2, GTF, GTF2 and GFF3:
some_tag=some_value;
some_tag="some value";
some_tag some_value;
some_tag "some value";
some_tag="value 1","value 2","value 3";
some_tag "value 1","value 2","value 3";
some_tag=value1,value2,value3;
some_tag value1,value2,value3;
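One way to handle all of the attribute variations above in a single pass is sketched below in Python. This is an illustrative approach, not ZENBU's parser: `split_outside_quotes` and `parse_attributes` are hypothetical helpers, and the sketch assumes that semicolons and commas inside double quotes belong to the value and that every tag may carry a list of values.

```python
import re

# tag, then either "=" (optionally padded) or whitespace, then the raw value text
TAG_VALUE = re.compile(r'^([^\s=]+)(?:\s*=\s*|\s+)(.*)$', re.S)


def split_outside_quotes(text, sep):
    """Split text on sep, ignoring separators inside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_attributes(column):
    """Parse a GFF/GTF attribute column into {tag: [value, ...]}."""
    metadata = {}
    for entry in split_outside_quotes(column, ";"):
        entry = entry.strip()
        if not entry:
            continue  # empty piece, e.g. after a trailing semicolon
        match = TAG_VALUE.match(entry)
        if match is None:
            metadata.setdefault(entry, [])  # bare tag without a value
            continue
        tag, raw = match.groups()
        values = [v.strip().strip('"') for v in split_outside_quotes(raw, ",")]
        metadata.setdefault(tag, []).extend(v for v in values if v)
    return metadata


print(parse_attributes('hid=trf; hstart=1; hend=21'))
print(parse_attributes('some_tag "value 1","value 2","value 3";'))
```

Storing every value in a list keeps the single-value and multi-value forms uniform, at the cost of a one-element list for the common case.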
GFF as OSCtable header
GFF files can easily be represented with an OSCtable column header line using the ZENBU extended column namespace.
eedb:chrom gff:source eedb:fsrc_category eedb:start.1base eedb:end eedb:score eedb:strand gff:frame gff:attributes
The gff:attributes column has a complete ZENBU parser attached to it. The parser can interpret this column in either the older GFF/GTF tag<space>value format or the GFF2/GFF3 style tag=value format. In the future, this gff:attributes parser will be expanded to parse the special GFF3 specification tags for 'feature names' and the GFF3 style of storing feature/subfeature relationships. Currently all data in the gff:attributes column is parsed into metadata of the Feature.
gff:source and gff:frame are currently not interpreted but simply stored as Metadata.
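As a rough illustration of the header described above, the following Python sketch prepends the OSCtable column header to the feature lines of a GFF file. It assumes that the header columns are tab-separated like the data lines, and that '#' comment lines and track definition lines should be dropped; neither assumption is confirmed here. The `gff_to_osctable` function is a hypothetical helper, not part of ZENBU.

```python
OSC_HEADER = [
    "eedb:chrom", "gff:source", "eedb:fsrc_category", "eedb:start.1base",
    "eedb:end", "eedb:score", "eedb:strand", "gff:frame", "gff:attributes",
]


def gff_to_osctable(gff_lines):
    """Yield an OSCtable-style header line followed by the GFF feature lines."""
    yield "\t".join(OSC_HEADER)  # assumption: header columns are tab-separated
    for line in gff_lines:
        line = line.rstrip("\n")
        # assumption: comments and track definition lines are not data rows
        if not line or line.startswith("#") or line.startswith("track"):
            continue
        yield line


gff = [
    "##gff-version 2\n",
    "X\tEnsembl\tRepeat\t2419108\t2419128\t42\t.\t.\thid=trf; hstart=1; hend=21\n",
]
for row in gff_to_osctable(gff):
    print(row)
```

Because the nine GFF columns already line up one-to-one with the header names, the feature lines themselves are passed through unchanged.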
GFF GTF specifications
For more information about this file format, see the documentation on these external websites.