Yet another operator to simplify data preparation with memisc

The recently published version 0.99.31.6 of the memisc package also contains an operator %$$% that simplifies routine data preparation steps that hitherto required calls to the function within(). It is analogous to the operator %$%, which is provided by the “magrittr” package and is also defined by memisc.

These operators are illustrated by the following code examples.

library(magrittr)
library(memisc) 
set.seed(42)
Loading required package: lattice
Loading required package: MASS
Attaching package: ‘memisc’
The following object is masked from ‘package:magrittr’:

    %$%
The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts
The following object is masked from ‘package:base’:

    as.array

Here we create a simple example data frame:

df <- data.frame(a = 1:7, x = rnorm(7))
df
  a          x
1 1  1.3709584
2 2 -0.5646982
3 3  0.3631284
4 4  0.6328626
5 5  0.4042683
6 6 -0.1061245
7 7  1.5115220

The following code creates two new variables b and x.sq in the data frame using within():

df <- within(df,{
    b <- a + 4
    x.sq <- x^2
})
df
  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11

This is a bit tedious because the name of the data frame (i.e. “df”) has to be written twice. With the operator %<>% from the magrittr package, the name needs to be written only once:

df %<>% within({
    b <- a + 4
    x.sq <- x^2
})
df
  a          x       x.sq  b
1 1  1.3709584 1.87952706  5
2 2 -0.5646982 0.31888402  6
3 3  0.3631284 0.13186224  7
4 4  0.6328626 0.40051508  8
5 5  0.4042683 0.16343288  9
6 6 -0.1061245 0.01126241 10
7 7  1.5115220 2.28469875 11
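
To make explicit what %<>% does here, the following minimal sketch compares it with an ordinary assignment. The data frames df1 and df2 are illustrative names that do not appear elsewhere on this page.

```r
library(magrittr)

# Two identical copies of a small illustrative data frame
# (df1 and df2 are names made up for this sketch).
df1 <- data.frame(a = 1:3, x = c(0.5, -1, 2))
df2 <- df1

# Explicit form: the data frame is named on both sides of the assignment.
df1 <- within(df1, {
    b <- a + 4
})

# Compound assignment pipe: df2 is piped into within() and the
# result is assigned back to df2.
df2 %<>% within({
    b <- a + 4
})

# Compare the two results
identical(df1, df2)
```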

The magrittr package defines an operator %$% that can be used as a shorthand for with():

with(df, mean(x))
[1] 0.5159882
df %$% mean(x)
[1] 0.5159882
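
The idea behind such a shorthand can be sketched in a few lines of base R. The operator %with% below is hypothetical and written only for illustration; it is not how magrittr or memisc implement %$%.

```r
# Hypothetical operator for illustration only; this is not the
# implementation of %$% in magrittr or memisc.
`%with%` <- function(data, expr) {
    # Capture the right-hand side unevaluated, then evaluate it with the
    # columns of `data` visible by name. Names not found in `data` are
    # looked up in the caller's environment.
    eval(substitute(expr), data, parent.frame())
}

# df_demo is a made-up data frame for this sketch.
df_demo <- data.frame(x = c(1, 2, 3, 6))

with(df_demo, mean(x))
df_demo %with% mean(x)
```

Both calls evaluate mean(x) with the column x of df_demo in scope, so they return the same value.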

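Thus it does not seem far-fetched to have an analogous shorthand for within(). The most recent version of memisc defines one, the operator %$$%. In the following, the variables b and x.sq are first removed from the data frame and then created again with %$$%: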

df[c("b","x.sq")] <- NULL

df %$$% {
    b <- a + 4
    x.sq <- x^2
}
df
  a          x  b       x.sq
1 1  1.3709584  5 1.87952706
2 2 -0.5646982  6 0.31888402
3 3  0.3631284  7 0.13186224
4 4  0.6328626  8 0.40051508
5 5  0.4042683  9 0.16343288
6 6 -0.1061245 10 0.01126241
7 7  1.5115220 11 2.28469875

Besides being shorter than a call to within(), %$$% results in a data frame (or data set) in which the variables are ordered by their creation: variables created first appear first in the resulting data frame.
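
For comparison, the following sketch shows how the creation order can be restored by hand when plain within() is used, assuming only base R. The data frame df_small and the object res are illustrative names introduced here.

```r
# df_small is a made-up data frame for this sketch.
df_small <- data.frame(a = 1:3, x = c(0.5, -1, 2))

res <- within(df_small, {
    b <- a + 4
    x.sq <- x^2
})

# Column order as returned by within(); in the output shown earlier
# on this page, x.sq comes before b.
names(res)

# Put the new variables into creation order by selecting columns explicitly.
res <- res[c(names(df_small), "b", "x.sq")]
names(res)
```

The explicit column selection is the extra step that %$$% makes unnecessary.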

© Copyright 2022, Martin Elff. Last updated on 05 Sep 2024.