An R equivalent to Stata’s ‘replace if’ in memisc
07 April 2023
Version 0.99.31.6 of the package “memisc” was recently (3rd March 2023) published on CRAN. One of the new features of this version is the %if% operator, which allows assigning values to subsets of observations. To see how it works, consider the following example:
library(memisc)
Loading required package: lattice
Loading required package: MASS
Attaching package: ‘memisc’
The following objects are masked from ‘package:stats’:
contr.sum, contr.treatment, contrasts
The following object is masked from ‘package:base’:
as.array
x <- 1:7
(y <- 1) %if% (x > 5)
(y <- 2) %if% (x <= 5)
(y <- 3) %if% (x <= 3)
data.frame(y,x,check.names=FALSE)
y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7
I implemented this feature at the suggestion of a colleague who missed such a feature for data preparation.
While the %if% operator is supposed to mimic the functionality of Stata’s replace if, there are three notable differences:
- The variable to which values are assigned does not have to exist prior to the assignment. In this case, the newly created vector variable will have the same length as the logical vector that contains the condition for the assignment. Its elements will have a missing value (i.e. NA) where the assignment condition is FALSE; otherwise its elements will equal the right-hand side of the assignment.
- The right-hand side of the assignment does not have to have the same length as the left-hand side. Alternatively, it can have as many elements as there are instances in which the condition vector is TRUE, or a single element, which is recycled to the appropriate length.
- Due to the fixed operator precedence in R, both the assignment operation and the logical condition need to be put between parentheses.
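The points above can be illustrated with a minimal base-R sketch. It only mimics the described semantics with plain index assignment and is not memisc's own code; the names cond, z, unparenthesised and outer_call are chosen here for illustration. The last lines use quote() to show how R parses the expression when the parentheses are left out.

```r
# Base-R sketch of the semantics described above (not memisc's implementation).
x <- 1:7
cond <- x > 5

# Target variable does not exist yet: start from NA, one element per condition value.
y <- rep(NA_real_, length(cond))
y[cond] <- 1                 # a single value is recycled over the TRUE positions

# Right-hand side with one element per TRUE in the condition.
z <- rep(NA_real_, length(cond))
z[cond] <- c(10, 20)

print(y)
print(z)

# Why the parentheses are needed: %op% operators bind tighter than `>` and `<-`,
# so without parentheses `<-` becomes the outermost call and the comparison
# is applied to the result of the %if% call.
unparenthesised <- quote(y <- 1 %if% x > 5)
outer_call <- as.character(unparenthesised[[1]])
inner_call <- as.character(unparenthesised[[3]][[1]])
print(outer_call)
print(inner_call)
```

Because quote() only parses the expression, this runs even when no %if% operator is defined in the session.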
Of course, similar results can be obtained using cases() (from the “memisc” package) or ifelse() from base R (see footnote 1), but this syntax should make it easier to translate data preparation scripts from Stata to R.
For comparison, see the following example with ifelse():
y <- ifelse(x <= 3, 3,
            ifelse(x <= 5, 2, 1))
data.frame(y,x,check.names=FALSE)
y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7
Note that in every variant one has to take into account the order in which the conditions are checked. The same recoding with cases():
y <- cases(x > 5  -> 1,
           x > 3  -> 2,
           x <= 3 -> 3)
data.frame(y,x,check.names=FALSE)
Warning in cases(1 <- x > 5, 2 <- x > 3, 3 <- x <= 3):
Conditions are not mutually exclusive
y x
1 3 1
2 3 2
3 3 3
4 2 4
5 2 5
6 1 6
7 1 7
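To make the point about the order of conditions concrete, here is a small base-R sketch (the variable names y_ok, y_swapped and y_seq are made up for this illustration). With nested ifelse() the first matching condition wins, whereas with a sequence of conditional assignments the last matching one wins, so the narrowest condition has to come first in one case and last in the other.

```r
x <- 1:7

# Narrowest condition first: x <= 3 is tested before x <= 5.
y_ok <- ifelse(x <= 3, 3, ifelse(x <= 5, 2, 1))

# Broadest condition first: x <= 5 already catches 1..3, so 3 is never assigned.
y_swapped <- ifelse(x <= 5, 2, ifelse(x <= 3, 3, 1))

# Sequential subset assignment: the last matching assignment wins,
# so here the narrowest condition has to come last.
y_seq <- rep(1, length(x))
y_seq[x <= 5] <- 2
y_seq[x <= 3] <- 3

print(y_ok)      # 3 3 3 2 2 1 1
print(y_swapped) # 2 2 2 2 2 1 1
print(y_seq)     # 3 3 3 2 2 1 1
```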
1. In fact, %if% uses ifelse() internally, albeit with some appropriate length checks.
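As a rough idea of what an ifelse()-based conditional assignment with length checks could look like, consider the following sketch. The helper assign_if() and its arguments are hypothetical and written only for this illustration; it is not the code used in memisc, and it assumes that the condition contains no missing values.

```r
# Hypothetical helper, not part of memisc: conditional assignment via ifelse().
assign_if <- function(cond, value, old = NULL) {
  stopifnot(is.logical(cond), !anyNA(cond))
  n <- length(cond)
  if (is.null(old)) old <- rep(NA, n)        # target did not exist yet
  if (length(old) != n) stop("target and condition differ in length")

  if (length(value) == n) {
    full <- value                            # one value per observation
  } else if (length(value) == sum(cond)) {
    full <- rep(NA, n)                       # one value per TRUE in cond
    full[cond] <- value
  } else if (length(value) == 1L) {
    full <- rep(value, n)                    # single value, recycled
  } else {
    stop("value has an incompatible length")
  }
  ifelse(cond, full, old)
}

# Reproduces the recoding from the first example of this post.
x <- 1:7
y <- assign_if(x > 5, 1)
y <- assign_if(x <= 5, 2, old = y)
y <- assign_if(x <= 3, 3, old = y)
print(y)   # 3 3 3 2 2 1 1
```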