Yet another operator to simplify data preparation with memisc
20 August 2023
The recently published version 0.99.31.6 of the memisc package also contains an
%$$% operator that simplifies routine data preparation steps that hitherto would
involve calls to the function within(). It is analogous to the operator %$%,
which is provided by the “magrittr” package, but is also defined by memisc.
These operators are illustrated by the following code examples.
library(magrittr)
library(memisc)
set.seed(42)
Loading required package: lattice
Loading required package: MASS
Attaching package: ‘memisc’
The following object is masked from ‘package:magrittr’:
%$%
The following objects are masked from ‘package:stats’:
contr.sum, contr.treatment, contrasts
The following object is masked from ‘package:base’:
as.array
Here we create a simple example data frame:
df <- data.frame(a = 1:7, x = rnorm(7))
df
  a          x
1 1  1.3709584
2 2 -0.5646982
3 3  0.3631284
4 4  0.6328626
5 5  0.4042683
6 6 -0.1061245
7 7  1.5115220
The following code creates two new variables, b and x.sq, in the data frame
using within():
df <- within(df,{
b <- a + 4
x.sq <- x^2
})
df
  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11
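Note that in the output above x.sq comes before b, although b was assigned first. The following base R sketch reproduces this ordering on a toy data frame; the names d, first and second are made up for the illustration.

```r
# Toy data frame; all names here are illustrative only.
d <- data.frame(a = 1:3)

d <- within(d, {
  first  <- a + 1   # assigned first
  second <- a * 2   # assigned second
})

# Inspect the order in which the new columns were appended.
names(d)
```

With base R's within(), the newly created columns are expected to appear in the reverse order of their assignment, as they do in the output shown above.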
This is a bit tedious, because we have to write the name of the data frame
(i.e. “df”) twice. With the operator %<>% from the magrittr package, one
needs to write the name of the data frame only once:
df %<>% within({
b <- a + 4
x.sq <- x^2
})
df
  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11
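The %<>% operator can be read as a pipe followed by an assignment back to the left-hand side. As a small check, the sketch below applies the same within() call once with an explicit assignment and once with %<>%, and then compares the results. The data frames d1 and d2 are made up for the illustration, and only magrittr is assumed to be installed.

```r
library(magrittr)

# Two identical toy data frames (illustrative names).
d1 <- data.frame(a = 1:3)
d2 <- d1

# Explicit form: the name of the data frame is written twice.
d1 <- within(d1, b <- a + 4)

# Compound assignment pipe: the name is written only once.
d2 %<>% within(b <- a + 4)

# Compare the two results.
identical(d1, d2)
```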
The magrittr package defines an operator %$% that can be used as a shorthand
for with():
with(df, mean(x))
[1] 0.5159882
df %$% mean(x)
[1] 0.5159882
It therefore does not seem far-fetched to have an analogous shorthand for
within(). The most recent version of memisc defines one, the %$$% operator:
df[c("b","x.sq")] <- NULL
df %$$% {
b <- a + 4
x.sq <- x^2
}
df
  a          x  b       x.sq
1 1  1.3709584  5 1.87952706
2 2 -0.5646982  6 0.31888402
3 3  0.3631284  7 0.13186224
4 4  0.6328626  8 0.40051508
5 5  0.4042683  9 0.16343288
6 6 -0.1061245 10 0.01126241
7 7  1.5115220 11 2.28469875
Besides being shorter than a call to within(), the %$$% operator results in a
data frame (or data set) in which the variables are ordered by their creation:
variables created first appear first in the resulting data frame.
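To give a rough idea of how an operator that modifies its left-hand side in place can be written, here is a minimal base R sketch. It is not the memisc implementation: the operator name %within% is hypothetical, and the sketch simply delegates to base R's within(), so it keeps that function's column order rather than the creation order described above.

```r
# Hypothetical operator for illustration only; NOT how memisc defines %$$%.
`%within%` <- function(data, expr) {
  target <- substitute(data)        # the symbol on the left-hand side
  stopifnot(is.name(target))        # only plain variable names are supported
  caller <- parent.frame()          # environment the operator was called from

  # Build the call within(<data>, <expr>) and evaluate it in the caller.
  result <- eval(call("within", target, substitute(expr)), caller)

  # Store the modified data frame under its original name.
  assign(as.character(target), result, envir = caller)
  invisible(result)
}

# Usage with a toy data frame (illustrative names).
d <- data.frame(a = 1:3, x = c(1, 2, 3))
d %within% {
  b <- a + 4
  x.sq <- x^2
}
d
```

The sketch relies on substitute() to capture both the name of the data frame and the unevaluated code block, which is why the left-hand side has to be a plain variable name.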