Yet another operator to simplify data preparation with memisc

20 August 2023

The recently published version 0.99.31.6 of the memisc package also contains an operator %$$% that simplifies routine data preparation steps that hitherto would have required calls to the function within(). It is analogous to the operator %$%, which is provided by the “magrittr” package and is also defined by memisc.

These operators are illustrated by the following code examples.

library(magrittr)
library(memisc) 
set.seed(42)

Loading required package: lattice
Loading required package: MASS

Attaching package: ‘memisc’

The following object is masked from ‘package:magrittr’:

    %$%

The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts

The following object is masked from ‘package:base’:

    as.array

Here we create a simple example data frame:

df <- data.frame(a = 1:7, x = rnorm(7))
df

  a          x
1 1  1.3709584
2 2 -0.5646982
3 3  0.3631284
4 4  0.6328626
5 5  0.4042683
6 6 -0.1061245
7 7  1.5115220

The following code creates two new variables b and x.sq in the data frame using within():

df <- within(df,{
    b <- a + 4
    x.sq <- x^2
})
df

  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11

This is a bit tedious, because we have to write the name of the data frame (i.e. “df”) twice. With the assignment pipe operator %<>% from the magrittr package, the name of the data frame needs to be written only once:

df %<>% within({
    b <- a + 4
    x.sq <- x^2
})
df

  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11
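
To make explicit what %<>% does here, the following sketch compares it with the longer form that pipes with %>% and assigns the result back. The data frames demo1 and demo2 are made up for illustration and are not part of the example above.

```r
library(magrittr)

# Two identical illustrative data frames (hypothetical, not from the post)
demo1 <- data.frame(a = 1:3, x = c(0.5, -1, 2))
demo2 <- demo1

# Assignment pipe: pipe demo1 into within() and store the result in demo1
demo1 %<>% within({
    b <- a + 4
    x.sq <- x^2
})

# Longer form: pipe with %>% and assign the result back explicitly
demo2 <- demo2 %>% within({
    b <- a + 4
    x.sq <- x^2
})

# Both forms should yield the same data frame
identical(demo1, demo2)
```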

The magrittr package defines an operator %$% that can be used as a shorthand for with():

with(df, mean(x))

[1] 0.5159882

df %$% mean(x)

[1] 0.5159882
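
Because %$% makes the columns of the data frame available to the expression on its right-hand side, it can also be used with expressions that involve several variables. A small sketch, again with a made-up data frame demo that is not part of the example above:

```r
library(magrittr)

# Illustrative data frame (hypothetical values)
demo <- data.frame(a = 1:5, x = c(2.1, 3.9, 6.2, 8.1, 9.8))

# Correlation of two columns, written with with() ...
r1 <- with(demo, cor(a, x))

# ... and with the %$% shorthand
r2 <- demo %$% cor(a, x)

# Both calls evaluate the same expression on the same columns
all.equal(r1, r2)
```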

Thus it does not seem far-fetched to use an analogous shorthand for within(). This is what the operator %$$%, defined in the most recent version of memisc, provides:

# Remove the variables created above so that they can be created anew
df[c("b","x.sq")] <- NULL

df %$$% {
    b <- a + 4
    x.sq <- x^2
}
df

  a          x  b       x.sq
1 1  1.3709584  5 1.87952706
2 2 -0.5646982  6 0.31888402
3 3  0.3631284  7 0.13186224
4 4  0.6328626  8 0.40051508
5 5  0.4042683  9 0.16343288
6 6 -0.1061245 10 0.01126241
7 7  1.5115220 11 2.28469875

Besides being shorter than a call to within(), it results in a data frame (or data set) in which the variables are ordered by their creation: variables created first appear first in the resulting data frame.
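
For comparison, when only base R is used, the order of creation can be restored after a call to within() by selecting the columns explicitly. The following is a minimal sketch under that assumption; the data frame demo is made up for illustration.

```r
# Illustrative data frame (hypothetical values)
demo <- data.frame(a = 1:3, x = c(0.5, -1, 2))

demo <- within(demo, {
    b <- a + 4      # created first
    x.sq <- x^2     # created second
})
names(demo)         # as in the output above, x.sq is typically listed before b

# Restore the order of creation by selecting the columns by name
demo <- demo[c("a", "x", "b", "x.sq")]
names(demo)
```

This extra step is what %$$% makes unnecessary.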