# Difference between revisions of "RescalePseudoLog"


## Latest revision as of 19:14, 14 October 2012


## Description

The **RescalePseudoLog** processing module is a simple signal processor which rescales expression levels as pseudo-log values.

Log transformation is convenient for visualizing data whose expression levels vary over a wide range of values, but zero values are commonplace in genomic data and log(base, 0) is not defined.

We would thus need to resort to a pseudocount (typically by arbitrarily adding 1 or 0.5 to all values).

Alternatively, we can use the pseudolog, defined as **asinh(x/2) / log(base)**, which has the following nice properties:

- is defined for all real x values
- pseudolog(base, 0) = 0
- pseudolog(base, -x) = -1 * pseudolog(base, x)
- pseudolog(base, x) ~ log(base, x) for x > base
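
The last property follows from the logarithmic form of the inverse hyperbolic sine. A short derivation, using only the definition above and assuming that log(base) denotes the natural logarithm ln(base), which is consistent with the values listed on this page:

$$
\operatorname{asinh}\!\left(\frac{x}{2}\right)
= \ln\!\left(\frac{x}{2} + \sqrt{\frac{x^{2}}{4} + 1}\right)
= \ln\!\left(\frac{x + \sqrt{x^{2} + 4}}{2}\right)
$$

For large x, the square root is close to x, so asinh(x/2) is close to ln(x) and

$$
\operatorname{pseudolog}(\mathrm{base}, x)
= \frac{\operatorname{asinh}(x/2)}{\ln(\mathrm{base})}
\approx \frac{\ln(x)}{\ln(\mathrm{base})}
= \log_{\mathrm{base}}(x)
$$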

For information:

| x | pseudolog(10, x) | log10(x) | pseudolog(2, x) | log2(x) |
|---|---|---|---|---|
| 1 | 0.2089876 | 0 | 0.6942419 | 0 |
| 2 | 0.3827757 | 0.3010300 | 1.2715533 | 1 |
| 3 | 0.5188791 | 0.4771213 | 1.7236790 | 1.584963 |
| 4 | 0.6269629 | 0.6020600 | 2.0827257 | 2 |
| 5 | 0.7153834 | 0.6989700 | 2.3764522 | 2.321928 |
| 10 | 1.0042792 | 1 | 3.3361433 | 3.321928 |
| 100 | 2.0000430 | 2 | 6.6440004 | 6.643856 |
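
The values listed above can be reproduced with a few lines of Python. This is only a minimal sketch of the formula given in the description, not the ZENBU implementation; the function name `pseudolog` and its argument order are illustrative choices that mirror the notation used on this page.

```python
import math


def pseudolog(base, x):
    """Pseudolog as defined on this page: asinh(x/2) / ln(base).

    Illustrative helper only; the name and signature are assumptions.
    """
    return math.asinh(x / 2.0) / math.log(base)


if __name__ == "__main__":
    # Compare pseudolog with the ordinary logarithm for a few values.
    for base in (10, 2):
        for x in (1, 2, 3, 4, 5, 10, 100):
            print(
                f"pseudolog({base},{x}) = {pseudolog(base, x):.7f}; "
                f"log{base}({x}) = {math.log(x, base):.7f}"
            )
    # Unlike log, pseudolog is defined at zero and for negative values.
    print(pseudolog(10, 0), pseudolog(10, -5))
```

Because the function is odd and passes through zero, negative and zero signal values need no special handling before rescaling.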

## Parameters

The sole parameter, which is optional, is the base of the pseudo-log rescaling. By default, the rescaling base is set to 10.

## Example

Simple script...

    <zenbu_script>
      <stream_queue>
        <spstream module="RescalePseudoLog">
          <base>10</base>
        </spstream>
      </stream_queue>
    </zenbu_script>

The usefulness of pseudolog rescaling is illustrated in the view "pseudolog rescaling example.1", in which the Wold lab ENCODE RNA-seq data are displayed as a heatmap, and the expression collated and RPKM-normalized onto Gencode V10 models is rescaled as pseudo_log2 and pseudo_log10. This provides a good overview of expression pattern dynamics across a wide range of expression levels.

http://fantom.gsc.riken.jp/zenbu/gLyphs/#config=ATMu21bIIZvgCegdIRpteD;loc=hg19::chr8:124010566..124067231

In the center of this view, on a light yellow background, are the RNA-seq reads from the Wold lab ENCODE dataset (unstranded protocol) and the Gencode V10 transcript models.

On both sides, in increasingly darker shades of blue, are the RNA-seq signal as a heatmap and the expression levels of the Gencode V10 transcripts (obtained by collating the RNA-seq reads and normalizing as RPKM): untouched, pseudo-log2 rescaled, and pseudo-log10 rescaled.