The RescalePseudoLog processing module is a simple signal processor which rescales expression levels to a pseudo-log scale.
Log transformation is convenient for visualizing data whose expression levels vary over a wide range of values, but zero values are commonplace in genomic data and log(base, 0) is not defined.
We would thus need to resort to a pseudocount (typically by arbitrarily adding 1 or 0.5 to all values).
Alternatively, we can use the pseudolog, defined as pseudolog(base, x) = asinh(x/2) / log(base), where log is the natural logarithm. It has the following nice properties:
- is defined for all real x values
- pseudolog(base, 0) = 0
- pseudolog(base, -x) = -1 * pseudolog(base, x)
- pseudolog(base, x) ~ log(base, x) for x > base
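The last property follows from the logarithmic form of asinh. A short derivation, using only the definition given above (with log(base) read as the natural logarithm, written ln here), might look like this:

```latex
\operatorname{asinh}(u) = \ln\!\left(u + \sqrt{u^{2} + 1}\right)
\quad\Longrightarrow\quad
\operatorname{asinh}\!\left(\tfrac{x}{2}\right)
  = \ln\!\left(\tfrac{x}{2} + \sqrt{\tfrac{x^{2}}{4} + 1}\right)
  \approx \ln\!\left(\tfrac{x}{2} + \tfrac{x}{2}\right)
  = \ln(x)
  \quad \text{for large } x,
\qquad\text{so}\qquad
\operatorname{pseudolog}(b, x)
  = \frac{\operatorname{asinh}(x/2)}{\ln(b)}
  \approx \frac{\ln(x)}{\ln(b)}
  = \log_{b}(x).
```

This also suggests why the argument is x/2 rather than x: without the division by 2, the function would approach log(base, 2x) instead of log(base, x).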
For information:
pseudolog(10,1)   = 0.2089876; log10(1)   = 0
pseudolog(10,2)   = 0.3827757; log10(2)   = 0.3010300
pseudolog(10,3)   = 0.5188791; log10(3)   = 0.4771213
pseudolog(10,4)   = 0.6269629; log10(4)   = 0.6020600
pseudolog(10,5)   = 0.7153834; log10(5)   = 0.6989700
pseudolog(10,10)  = 1.0042792; log10(10)  = 1
pseudolog(10,100) = 2.0000430; log10(100) = 2
pseudolog(2,1)    = 0.6942419; log2(1)    = 0
pseudolog(2,2)    = 1.2715533; log2(2)    = 1
pseudolog(2,3)    = 1.7236790; log2(3)    = 1.584963
pseudolog(2,4)    = 2.0827257; log2(4)    = 2
pseudolog(2,5)    = 2.3764522; log2(5)    = 2.321928
pseudolog(2,10)   = 3.3361433; log2(10)   = 3.321928
pseudolog(2,100)  = 6.6440004; log2(100)  = 6.643856
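The values listed above can be reproduced with a few lines of Python. This is only a minimal sketch of the formula, not the module's actual implementation: the function name `pseudolog` and its argument order are chosen here to mirror the notation on this page, and log(base) is assumed to be the natural logarithm.

```python
import math


def pseudolog(base, x):
    """Illustrative pseudo-log: asinh(x / 2) / ln(base).

    Name and signature are assumptions that mirror the notation
    used on this page; they are not taken from the module itself.
    """
    return math.asinh(x / 2.0) / math.log(base)


# Defined at zero, unlike a true logarithm.
print("pseudolog(10, 0) =", pseudolog(10, 0))

# Odd symmetry: negative inputs give the negated result.
print("pseudolog(10, -5) =", pseudolog(10, -5))
print("-pseudolog(10, 5) =", -pseudolog(10, 5))

# Side-by-side comparison with the true logarithm for both bases.
for base in (10, 2):
    for x in (1, 2, 3, 4, 5, 10, 100):
        print(
            f"pseudolog({base},{x}) = {pseudolog(base, x):.7f}; "
            f"log{base}({x}) = {math.log(x, base):.7f}"
        )
```

Unlike the pseudocount approach, no arbitrary offset has to be chosen, and negative values are handled symmetrically.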
The sole optional parameter is the base of the pseudo-log rescaling. By default, the rescaling base is set to 10:
<zenbu_script>
  <stream_queue>
    <spstream module="RescalePseudoLog">
      <base>10</base>
    </spstream>
  </stream_queue>
</zenbu_script>
An example of the usefulness of pseudolog rescaling is presented in the view "pseudolog rescaling example.1", which displays the Wold lab ENCODE RNA-seq data as a heatmap, together with expression collated onto Gencode V10 models and RPKM-normalized. Rescaling these tracks as pseudo-log2 and pseudo-log10 provides a good overview of expression pattern dynamics across a wide range of expression levels.
In the center of this view, with a light yellow background, are the RNA-seq reads from the Wold lab ENCODE dataset (unstranded protocol) and the Gencode V10 transcript models.
On both sides, in increasingly darker shades of blue, are the RNA-seq signal as a heatmap and the expression levels of Gencode V10 transcripts (obtained by collating the RNA-seq reads and normalizing as RPKM): untouched, pseudo-log2 rescaled, and pseudo-log10 rescaled.