From ZENBU documentation wiki
Revision as of 19:04, 14 October 2012 by Nicolas.bertin (talk | contribs)



The RescalePseudoLog processing module is a simple signal processor which rescales expression levels as a pseudo log.
Log transformation is convenient for visualizing data whose expression levels vary over a wide range of values, but zero values are commonplace in genomic data and log(base, 0) is not defined.
We would thus need to resort to a pseudocount (typically arbitrarily adding 1 or 0.5 to all values).
Alternatively, we can use the pseudolog, defined as pseudolog(base, x) = asinh(x/2) / log(base) with log the natural logarithm, which has the following nice properties:

  • is defined for all real x values
  • pseudolog(base, 0) = 0
  • pseudolog(base, -x) = -1 * pseudolog(base, x)
  • pseudolog(base, x) ~ log(base, x) for x > base
           For reference:
                 pseudolog(10,1)   = 0.2089876;     log10(1)   = 0
                 pseudolog(10,2)   = 0.3827757;     log10(2)   = 0.3010300
                 pseudolog(10,3)   = 0.5188791;     log10(3)   = 0.4771213
                 pseudolog(10,4)   = 0.6269629;     log10(4)   = 0.6020600
                 pseudolog(10,5)   = 0.7153834;     log10(5)   = 0.6989700
                 pseudolog(10,10)  = 1.0042792;     log10(10)  = 1
                 pseudolog(10,100) = 2.0000430;     log10(100) = 2

                 pseudolog(2,1)    = 0.6942419;     log2(1)    = 0
                 pseudolog(2,2)    = 1.2715533;     log2(2)    = 1
                 pseudolog(2,3)    = 1.7236790;     log2(3)    = 1.584963
                 pseudolog(2,4)    = 2.0827257;     log2(4)    = 2
                 pseudolog(2,5)    = 2.3764522;     log2(5)    = 2.321928
                 pseudolog(2,10)   = 3.3361433;     log2(10)   = 3.321928
                 pseudolog(2,100)  = 6.6440004;     log2(100)  = 6.643856
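
The convergence towards the true logarithm seen in these values can be worked out from the logarithmic form of asinh. The short derivation below is a sketch, reading log(base) in the definition as the natural logarithm ln(b), which is the reading consistent with the tabulated values; the subscript notation pseudolog_b is only a shorthand introduced here.

```latex
\begin{align*}
\operatorname{asinh}(y) &= \ln\!\left(y + \sqrt{y^2 + 1}\right) \\
\operatorname{pseudolog}_b(x) &= \frac{\operatorname{asinh}(x/2)}{\ln b}
  = \frac{1}{\ln b}\,\ln\!\left(\frac{x + \sqrt{x^2 + 4}}{2}\right) \\
&= \log_b x + \frac{1}{\ln b}\,\ln\!\left(\frac{1 + \sqrt{1 + 4/x^2}}{2}\right) \qquad (x > 0) \\
&\approx \log_b x + \frac{1}{x^2 \ln b} \qquad (x \gg 2)
\end{align*}
```

The gap to the true logarithm therefore shrinks roughly as 1/x^2: for base 10 it is about 0.004 at x = 10 and about 0.00004 at x = 100, in line with the values listed above. The division of x by 2 inside asinh is what makes the two curves meet, since asinh(x) alone would tend to ln(2x) rather than ln(x).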


The sole optional parameter is the base of the pseudo-log rescaling. By default, the rescaling base is set to 10.
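
For readers who want to reproduce the rescaling outside of ZENBU, the definition can be sketched in a few lines of Python. This is only an illustration and not the module's actual implementation: the function name `pseudolog` and its `base` argument are hypothetical, and log(base) is taken to be the natural logarithm, as the tabulated values imply.

```python
import math


def pseudolog(x, base=10.0):
    """Return asinh(x / 2) / ln(base); the default base of 10 mirrors the module default."""
    if base <= 0 or base == 1:
        raise ValueError("base must be positive and different from 1")
    return math.asinh(x / 2.0) / math.log(base)


# Zero and negative inputs are handled without any pseudocount.
for x in (0, 1, 10, 100, -10):
    print(x, round(pseudolog(x), 7), round(pseudolog(x, base=2), 7))
```

Because asinh is defined on the whole real line, no special-casing of zero or negative expression values is needed, which is the main practical difference from adding a pseudocount before taking a logarithm.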


Simple script...

	<spstream module="RescalePseudoLog"/>

An example of the usefulness of pseudolog rescaling is presented in the View "pseudolog rescaling example.1", in which the Wold lab ENCODE RNA-seq data is displayed as a heatmap, with expression collated onto Gencode V10 models, RPKM normalized, and rescaled as pseudo-log2 and pseudo-log10. This provides a good overview of expression pattern dynamics across a wide range of expression levels.

Pseudolog rescaling.1.png

The center of this view shows, on a light yellow background, the RNA-seq reads from the Wold lab ENCODE dataset (unstranded protocol) and the Gencode V10 transcript models.
On both sides, in increasingly darker shades of blue, the RNA-seq signal is shown as a heatmap and as expression levels of Gencode V10 transcripts (obtained by collating the RNA-seq reads and normalizing as RPKM): untouched, pseudo-log2 rescaled, and pseudo-log10 rescaled.