winsorize

Syntax

winsorize(X, limit, [inclusive=true], [nanPolicy='upper'])

Arguments

X is a vector.

limit is a scalar or a vector of 2 elements giving the fraction of data to cut on each side of X, as floating-point values between 0 and 1. A scalar applies the same fraction to both sides. If X has n elements (including NULLs), the n * limit[0] smallest elements and the n * limit[1] largest elements are masked, so that, before the counts are truncated or rounded, n * (1 - limit[0] - limit[1]) elements remain unmasked. Set either element of limit to 0 to leave that side unmasked.

inclusive is a Boolean scalar or a vector of 2 Boolean values indicating whether the number of elements masked on each side is truncated (true) or rounded (false). The default value is true.
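
The way limit and inclusive combine into a per-side count can be sketched in Python. This is an illustrative model, not DolphinDB code: the helper name cut_counts is hypothetical, and the handling of exact .5 products under rounding (Python's round-half-to-even) is an assumption that the text above does not specify.

```python
def cut_counts(n, limit, inclusive=True):
    """Return (low_count, high_count): how many elements are cut on each side.

    Hypothetical helper that models the rule described above for a vector
    of n elements. It is not part of DolphinDB.
    """
    # A scalar applies to both sides; a pair is (lower side, upper side).
    low, high = (limit, limit) if isinstance(limit, (int, float)) else limit
    inc_low, inc_high = (
        (inclusive, inclusive) if isinstance(inclusive, bool) else inclusive
    )

    def count(fraction, inc):
        product = n * fraction
        # true  -> truncate toward zero; false -> round to nearest integer
        # (tie-breaking on exact .5 values is an assumption).
        return int(product) if inc else int(round(product))

    return count(low, inc_low), count(high, inc_high)


print(cut_counts(10, 0.1))                            # (1, 1)
print(cut_counts(10, (0.12, 0.17)))                   # (1, 1): 1.2 and 1.7 truncated
print(cut_counts(10, (0.12, 0.17), inclusive=False))  # (1, 2): 1.7 rounds up to 2
```

With n = 10 and limit = 0.12 0.17, truncation cuts one element on each side, while rounding cuts one on the lower side and two on the upper side.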

nanPolicy is a string indicating how to handle NULL values. The following options are available (default is 'upper'):

  • 'upper': allows NULL values and treats them as the largest values of X.

  • 'lower': allows NULL values and treats them as the smallest values of X.

  • 'raise': throws an error.

  • 'omit': performs the calculations without masking NULL values.

Details

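Return a winsorized version of X. The elements cut on each side are not removed: as the examples below show, each one is replaced with the nearest remaining value on that side, so the result has the same length as X.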
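
The replacement step can be modeled with a short Python sketch. It is a rough model under stated assumptions, not DolphinDB's implementation: the function name winsorize_sketch is hypothetical, None stands in for NULL, limit[0] + limit[1] is assumed to be below 1, and the 'omit' policy is left out because its exact output is not specified here.

```python
import math


def winsorize_sketch(x, limit, inclusive=True, nan_policy="upper"):
    """Illustrative model of winsorizing a list; None plays the role of NULL."""
    low, high = (limit, limit) if isinstance(limit, (int, float)) else limit
    inc_low, inc_high = (
        (inclusive, inclusive) if isinstance(inclusive, bool) else inclusive
    )
    if nan_policy not in ("upper", "lower", "raise"):
        raise NotImplementedError("only 'upper', 'lower' and 'raise' are modeled")
    if nan_policy == "raise" and any(v is None for v in x):
        raise ValueError("input contains NULL values")

    n = len(x)

    def count(fraction, inc):
        # Truncate when inclusive is true, otherwise round to nearest.
        return int(n * fraction) if inc else int(round(n * fraction))

    k_low, k_high = count(low, inc_low), count(high, inc_high)

    # Rank positions by value; NULLs sort last ('upper') or first ('lower').
    null_rank = -math.inf if nan_policy == "lower" else math.inf
    order = sorted(range(n), key=lambda i: null_rank if x[i] is None else x[i])

    out = list(x)
    if k_low > 0:
        floor_value = x[order[k_low]]          # smallest value that is kept
        for i in order[:k_low]:
            out[i] = floor_value
    if k_high > 0:
        cap_value = x[order[n - k_high - 1]]   # largest value that is kept
        for i in order[n - k_high:]:
            out[i] = cap_value
    return out


x = list(range(1, 20)) + [None]
print(winsorize_sketch(x, 0.1, nan_policy="lower"))
# [2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 17, 17, 2]
```

Under 'lower', the NULL in the last position counts as one of the two smallest elements, so it is replaced with 2 together with the value 1. The two largest values, 18 and 19, become 17.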

Examples

$ x=1..10;
$ winsorize(x, 0.1);
[2,2,3,4,5,6,7,8,9,9]

$ winsorize(x, 0.12 0.17);
[2,2,3,4,5,6,7,8,9,9]

$ winsorize(x, 0.12 0.17, inclusive=false);
[2,2,3,4,5,6,7,8,8,8]


$ x=1..20;
$ x[19:]=NULL;
$ x;
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,]

$ winsorize(x, 0.1);
[3,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,18,18]

$ winsorize(x, 0.1, nanPolicy='upper');
[3,3,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,18,18]

$ winsorize(x, 0.1, nanPolicy='lower');
[2,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,17,17,2]