Yet another operator to simplify data preparation with memisc
The recently published version 0.99.31.6 of the memisc package also contains an
%$$% operator that simplifies routine data preparation steps that hitherto
would involve calls to the function within(). It is analogous to the operator
%$%, which is provided by the "magrittr" package, but is also defined by this
package.
These operators are illustrated by the following code examples.
Here we create a simple example data frame:
df <- data.frame(a = 1:7, x = rnorm(7))
df
a x
1 1 1.3709584
2 2 -0.5646982
3 3 0.3631284
4 4 0.6328626
5 5 0.4042683
6 6 -0.1061245
7 7 1.5115220
The following code creates two new variables, b and x.sq, in the data frame
using within():
df <- within(df, {
  b <- a + 4
  x.sq <- x^2
})
df
a x x.sq b
1 1 1.3709584 1.87952706 5
2 2 -0.5646982 0.31888402 6
3 3 0.3631284 0.13186224 7
4 4 0.6328626 0.40051508 8
5 5 0.4042683 0.16343288 9
6 6 -0.1061245 0.01126241 10
7 7 1.5115220 2.28469875 11
This is a bit tedious, because we have to write the name of the data frame
(i.e. "df") twice. Using the operator %<>% from the magrittr package, one
needs to write the name of the data frame only once:
library(magrittr)  # provides %<>% and %$%
df %<>% within({
  b <- a + 4
  x.sq <- x^2
})
df
a x x.sq b
1 1 1.3709584 1.87952706 5
2 2 -0.5646982 0.31888402 6
3 3 0.3631284 0.13186224 7
4 4 0.6328626 0.40051508 8
5 5 0.4042683 0.16343288 9
6 6 -0.1061245 0.01126241 10
7 7 1.5115220 2.28469875 11
The magrittr package defines an operator %$% that can be used as a shorthand
for with():
with(df, mean(x))
[1] 0.5159882
df %$% mean(x)
[1] 0.5159882
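To make the idea of a with() shorthand concrete, here is a minimal sketch of how such an operator could be written in base R. The name %with% is hypothetical and chosen only for illustration. This is not how magrittr implements %$%, just one simple way to get similar behaviour.

```r
# Hypothetical operator for illustration; not part of magrittr or memisc.
`%with%` <- function(data, expr) {
  # Capture the right-hand side unevaluated, then evaluate it with the
  # columns of `data` visible as variables. Names not found in `data`
  # are looked up in the caller's environment.
  eval(substitute(expr), data, parent.frame())
}

df_demo <- data.frame(a = 1:7, x = c(1, 2, 3, 4, 5, 6, 7))

m1 <- with(df_demo, mean(x))   # the usual call
m2 <- df_demo %with% mean(x)   # the same computation in infix form
identical(m1, m2)
```

The key step is substitute(), which keeps mean(x) from being evaluated before the data frame's columns are in scope.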
Thus it does not seem far-fetched to have an analogous shorthand for within().
The most recent version of memisc defines one, the %$$% operator:
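library(memisc)  # provides %$$%
# Remove the variables created above, then re-create them with %$$%
df[c("b","x.sq")] <- NULL
df %$$% {
  b <- a + 4
  x.sq <- x^2
}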
df
a x b x.sq
1 1 1.3709584 5 1.87952706
2 2 -0.5646982 6 0.31888402
3 3 0.3631284 7 0.13186224
4 4 0.6328626 8 0.40051508
5 5 0.4042683 9 0.16343288
6 6 -0.1061245 10 0.01126241
7 7 1.5115220 11 2.28469875
Besides being shorter than a call to within(), the %$$% operator results in a
data frame (or data set) in which the variables are ordered by their creation:
variables created first appear first in the resulting data frame.
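For readers curious how an operator can modify the data frame named on its left-hand side, the following is a rough sketch in base R. The operator name %upd% is hypothetical, and the code is not taken from memisc. It simply wraps within() and writes the result back into the calling environment. Because it relies on within(), it also inherits within()'s column ordering rather than the creation order that %$$% provides.

```r
# Hypothetical operator for illustration; not memisc's implementation.
`%upd%` <- function(data, expr) {
  target <- substitute(data)   # the symbol on the left-hand side
  stopifnot(is.name(target))   # only plain variable names are supported
  code <- substitute(expr)     # the unevaluated block on the right
  caller <- parent.frame()
  # Build the call within(<data>, <code>) and evaluate it in the caller's frame.
  result <- eval(call("within", data, code), caller)
  # Write the modified data frame back under its original name.
  assign(as.character(target), result, envir = caller)
  invisible(result)
}

dat <- data.frame(a = 1:3, x = c(2, 4, 6))
dat %upd% {
  b <- a + 4
  x.sq <- x^2
}
names(dat)  # contains a, x, b and x.sq; the order follows within()
```

The sketch only accepts a plain variable name on the left, so an expression such as a list element would be rejected by the stopifnot() check.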