Metadata searching

From ZENBU documentation wiki

Metadata and metadata searching form one of the primary systems in ZENBU.

Search is modeled on the Google/Yahoo approach of prefix-based, multiple-keyword searching. ZENBU also provides additional logic elements to fine-tune queries.

  • and : by default, a space separating keywords in a search is interpreted as an and operation. This operates in the same way as set intersection.
  • or : combines queries in the same way as set union.
  • not <keyword or (phrase)> : excludes any items which match the keyword or phrase from the results. For example, "not spliced" will return experiments which do not have the keyword spliced.
  • ! : shorthand for not.
  • ( ) : nesting of logic with parentheses is supported.
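
The set semantics of these operators can be illustrated with a minimal Python sketch. This is not ZENBU's implementation: the toy index, the experiment names, and the helper functions term and not_ are all hypothetical and only show how prefix matching, intersection, union, and exclusion combine.

```python
# Toy index: experiment name -> extracted keywords (all hypothetical data).
INDEX = {
    "exp1": {"hepatocyte", "fantom5", "hg19", "donor1"},
    "exp2": {"hepatocyte", "sinusoidal", "fantom5", "hg19"},
    "exp3": {"encode", "hepg2", "rnaseq", "spliced"},
}

def term(prefix):
    """Return the items having at least one keyword that starts with `prefix`."""
    prefix = prefix.lower()
    return {name for name, keywords in INDEX.items()
            if any(kw.startswith(prefix) for kw in keywords)}

def not_(matches):
    """`not` / `!`: everything in the index except the given matches."""
    return set(INDEX) - matches

# "hepatocyte !sinusoidal": the space acts as `and` (intersection),
# and `!` excludes matching items.
q1 = term("hepatocyte") & not_(term("sinusoidal"))

# "(hepatocyte !sinusoidal) or encode": `or` acts as set union.
q2 = q1 | term("encode")

# "donor" matches the keyword "donor1" because matching is prefix-based.
q3 = term("donor")

print(sorted(q1))  # ['exp1']
print(sorted(q2))  # ['exp1', 'exp3']
print(sorted(q3))  # ['exp1']
```

In this sketch, parentheses in a query correspond simply to the order in which the set operations are grouped.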

It is good practice to always give a good description when saving configurations (view/track/script) or uploading data. ZENBU performs automatic keyword extraction from all metadata, which provides a wealth of ways to search the system.

Another good practice is to take advantage of the rich Metadata editing system to add key:value metadata to your configurations and uploaded datasets.

With rich metadata, the search system can restrict a search to specific metadata types (keys).

  • key:=value : requires an exact match of the value for the given key
  • key~=value : requires a "like" (non-exact) match of the value for the given key
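
The difference between the two operators can be sketched as follows. This is only an illustration under stated assumptions: the function matches is hypothetical, the exact match is assumed to be a plain string comparison, and the "like" match is assumed here to be a case-insensitive substring test, which may differ from what ZENBU actually does.

```python
def matches(metadata, key, op, value):
    """Check one key/value filter against an item's metadata dictionary."""
    actual = metadata.get(key)
    if actual is None:
        return False  # the item has no metadata under this key
    if op == ":=":
        # Exact match (ASSUMPTION: plain, case-sensitive string equality).
        return actual == value
    if op == "~=":
        # "Like" match (ASSUMPTION: case-insensitive substring test).
        return value.lower() in actual.lower()
    raise ValueError(f"unknown operator: {op}")

# Hypothetical metadata for a single experiment.
item = {"enc:localization": "nucleus", "enc:rnaExtract": "longPolyA"}

exact_hit = matches(item, "enc:rnaExtract", ":=", "longPolyA")   # True
exact_miss = matches(item, "enc:rnaExtract", ":=", "polya")      # False
like_hit = matches(item, "enc:rnaExtract", "~=", "polya")        # True
missing_key = matches(item, "enc:cell", ":=", "HepG2")           # False
```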

Here is a complex example that searches for specific ENCODE and FANTOM5 hepatocyte datasets in the experiment section of the Data Explorer interface:

(encode hg19 cshl rnaseq hepg2 nucleus !splice !spike) or (hepatocyte !sinusoidal fantom5 hg19 donor )

https://fantom.gsc.riken.jp/zenbu/dex/#section=DataSources;collab=all;datasource=experiments;search=(encode%20hg19%20cshl%20rnaseq%20hepg2%20nucleus%20!splice%20!spike)%20or%20(hepatocyte%20!sinusoidal%20fantom5%20hg19%20donor%20)

Another search, using specific key:value pairs:

enc:cell_karyotype:=cancer and enc:rnaExtract:=longPolyA and enc:localization:=nucleus

https://fantom.gsc.riken.jp/zenbu/dex/#section=DataSources;collab=all;datasource=experiments;search=enc:cell_karyotype:=cancer%20and%20enc:rnaExtract:=longPolyA%20and%20enc:localization:=nucleus
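
Links like the ones above can be generated from a query string by percent-encoding the spaces. The following Python sketch reproduces the link format shown on this page. The function name dex_search_url is hypothetical, and the fragment parameters (section, collab, datasource) are copied from the example links rather than taken from any documented ZENBU API.

```python
from urllib.parse import quote

BASE = "https://fantom.gsc.riken.jp/zenbu/dex/"

def dex_search_url(query):
    """Build a Data Explorer experiment-search link for a query string."""
    # Keep the query operators ( ) ! : = readable, as in the example links;
    # spaces become %20.
    encoded = quote(query, safe="()!:=")
    fragment = ("section=DataSources;collab=all;datasource=experiments;"
                "search=" + encoded)
    return BASE + "#" + fragment

url = dex_search_url(
    "enc:cell_karyotype:=cancer and enc:rnaExtract:=longPolyA "
    "and enc:localization:=nucleus"
)
print(url)
```

The printed link is identical to the key:value example link above.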

Other general comments to help users with searching:

  • If a search is performed with too many terms, it may fail to return any results. This is the same behavior as in Google or Yahoo.
  • Keywords are generally extracted from free-form metadata such as names and descriptions, but also from controlled metadata such as the genome assembly, controlled vocabulary, and ontology metadata.
  • For OSCtable files, keywords are extracted from the ParameterValue and ExperimentMetadata sections.
  • Any metadata added into the system via the Metadata editing system becomes immediately available for searching.