Introduction to the ‘memisc’ Package
Description
This package collects an assortment of tools that are intended to make work with R
easier for the author of this package and are submitted to the public in the hope that
they will also be useful to others.
The tools in this package can be grouped into four major categories:
- Data preparation and management
- Data analysis
- Presentation of analysis results
- Programming
Data preparation and management
Survey Items
memisc provides facilities to work with what users of other packages like SPSS,
SAS, or Stata know as ‘variable labels’, ‘value labels’ and ‘user-defined missing
values’. In the context of this package these aspects of the data are represented by the
"description", "labels", and "missing.values" attributes of a data vector.
These facilities are useful, for example, if you work with survey data that contain coded
items like vote intention that may have the following structure:
Question: ‘If there was a parliamentary election next Tuesday, which party would you vote for?’
 1 | Conservative Party
 2 | Labour Party
 3 | Liberal Democrat Party
 4 | Scottish National Party
 5 | Plaid Cymru
 6 | Green Party
 7 | British National Party
 8 | Other party
96 | Not allowed to vote
97 | Would not vote
98 | Would vote, do not know yet for which party
A statistical package like SPSS allows one to attach labels like ‘Conservative Party’,
‘Labour Party’, etc. to the codes 1, 2, 3, etc. and to mark the codes 96, 97, 98, 99
as ‘missing’ and thus to exclude these responses from statistical analyses. memisc
provides similar facilities. Labels can be attached to codes by calls like
labels(x) <- something and expanded by calls like labels(x) <- labels(x) + something;
codes can be marked as ‘missing’ by calls like missing.values(x) <- something and
missing.values(x) <- missing.values(x) + something.
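As a rough illustration of these calls, the following sketch attaches value labels and missing-value codes to a small vector of responses. The response values are invented, the label "No answer" for code 99 is an assumption that does not appear in the table above, and the vector is converted with as.item() first purely as a precaution.

```r
library(memisc)

# Invented responses that use the codes of the vote-intention item
vote <- as.item(c(1, 2, 3, 5, 8, 96, 97, 98, 2, 1))

# Attach value labels to the codes
labels(vote) <- c(
  "Conservative Party"      = 1,
  "Labour Party"            = 2,
  "Liberal Democrat Party"  = 3,
  "Scottish National Party" = 4,
  "Plaid Cymru"             = 5,
  "Green Party"             = 6,
  "British National Party"  = 7,
  "Other party"             = 8,
  "Not allowed to vote"     = 96,
  "Would not vote"          = 97,
  "Would vote, do not know yet for which party" = 98
)

# Expand the existing labels (the label for code 99 is hypothetical)
labels(vote) <- labels(vote) + c("No answer" = 99)

# Mark the codes 96 to 99 as user-defined missing values
missing.values(vote) <- c(96, 97, 98, 99)

# Attach a variable label
description(vote) <- "Vote intention"

# Flag the responses that fall under the missing-value codes
is.missing(vote)
```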
memisc defines a class called “data.set”, which is similar to the class “data.frame”.
The main difference is that it is especially geared toward containing survey item data.
Transformations of and within “data.set” objects retain the information about value
labels, missing values etc. Using as.data.frame sets the data up for R’s
statistical functions, but doing this explicitly is seldom necessary. See data.set.
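A minimal sketch of this workflow, using invented data and hypothetical variable names (vote, age), might look as follows. It only shows that a “data.set” can be built from labelled items and converted to an ordinary data frame when needed.

```r
library(memisc)

# A labelled survey item with two codes marked as missing (invented data)
vote <- as.item(
  c(1, 2, 97, 3, 98),
  labels = c(
    "Conservative Party"     = 1,
    "Labour Party"           = 2,
    "Liberal Democrat Party" = 3,
    "Would not vote"         = 97,
    "Would vote, do not know yet for which party" = 98
  ),
  missing.values = c(97, 98)
)

# Combine the item with a second, unlabelled variable in a data.set
ds <- data.set(vote = vote, age = c(34, 51, 27, 62, 45))
print(ds)

# Explicit conversion for use with R's statistical functions
df <- as.data.frame(ds)
str(df)
```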
More Convenient Import of External Data
Survey data sets are often relatively large and contain up to a few thousand variables.
For specific analyses, however, one needs only a relatively small subset of these
variables. Although modern computers have enough RAM to load such data sets completely
into an R session, it is inefficient to load everything and then drop most of the
variables. Loading such a large data set completely can also be time-consuming, because R
has to allocate space for each of the many variables. Loading just the subset of
variables really needed for an analysis is more convenient and tends to be
much quicker. This package therefore provides facilities to load such subsets of
variables without the need to load a complete data set. Further, the loading of data
from SPSS files is organized in such a way that all information about variable labels,
value labels, and user-defined missing values is retained. This is made possible by the
definition of importer objects, for which a subset method exists. importer
objects contain only the information about the variables in the external data set but not
the data. The data itself is loaded into memory when the functions subset or
as.data.set are used.
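The importer workflow could be sketched roughly as below. The file name survey.sav and the variable names vote, age and region are hypothetical placeholders, so the import step is guarded and only runs if such a file actually exists.

```r
library(memisc)

# Hypothetical path to an SPSS system file; replace with a real file
survey_path <- "survey.sav"

if (file.exists(survey_path)) {
  # Creates an importer object: variable information only, no data yet
  survey <- spss.system.file(survey_path)

  # Only the selected variables are read into memory here
  # (vote, age and region are assumed variable names)
  survey_subset <- subset(survey, select = c(vote, age, region))
  print(survey_subset)
}
```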
Recoding
memisc also contains facilities for recoding survey items. Simple recodings, for
example collapsing answer categories, can be done using the function recode. More
complex recodings, for example the construction of indices from multiple items, and
complex case distinctions, can be done using the function cases. This function may
also be useful for programming, in so far as it is a generalization of ifelse.
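The two functions might be used along the following lines. The data are invented, and the sketch relies on the `new <- old` form of recode and on the named-condition form of cases as they are understood from the package documentation.

```r
library(memisc)

# Collapse six answer categories into three
x <- 1:6
collapsed <- recode(x,
                    1 <- 1:2,
                    2 <- 3:4,
                    3 <- 5:6)
print(collapsed)

# A case distinction with more than two branches; the conditions
# are mutually exclusive and the names become factor levels
age <- c(19, 35, 71, 48, 64)
age_group <- cases(
  "young"  = age < 30,
  "middle" = age >= 30 & age < 60,
  "old"    = age >= 60
)
print(age_group)
```

With only two branches, the same distinction could be written with ifelse; cases is convenient once there are three or more.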
Code Books
There is a function codebook which produces a code book of an external data set or an
internal “data.set” object. A codebook contains in a conveniently formatted way concise
information about every variable in a data set, such as which value labels and missing
values are defined and some univariate statistics.
An extended example of all these facilities is contained in the vignette “anes48”, and in
demo(anes48)
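For a quick look at what a code book entry contains without running the full demo, a small sketch with an invented item could be:

```r
library(memisc)

# An invented survey item with labels and one missing-value code
vote <- as.item(
  c(1, 2, 2, 3, 97, 1, 2),
  labels = c(
    "Conservative Party"     = 1,
    "Labour Party"           = 2,
    "Liberal Democrat Party" = 3,
    "Would not vote"         = 97
  ),
  missing.values = 97
)
description(vote) <- "Vote intention"

# Produce and print the code book entry for this item
cb <- codebook(vote)
print(cb)
```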
Data Analysis
Tables and Data Frames of Descriptive Statistics
genTable is a generalization of xtabs: instead of counts, descriptive
statistics like means or variances can be reported conditional on levels of factors.
Conditional percentages of a factor can also be obtained using this function.
In addition, an Aggregate function is provided, which has the same syntax as
genTable but gives a data frame of descriptive statistics instead of a table
object.
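The difference between the two functions can be sketched with the ToothGrowth data set that ships with R. The choice of data set and of the statistic names Mean, StDev and N is purely illustrative.

```r
library(memisc)

# Table of descriptive statistics, conditional on supplement type and dose
tab <- genTable(
  c(Mean = mean(len), StDev = sd(len), N = length(len)) ~ supp + dose,
  data = ToothGrowth
)
print(ftable(tab))

# The same formula with Aggregate gives a data frame instead of a table
agg <- Aggregate(
  c(Mean = mean(len), StDev = sd(len), N = length(len)) ~ supp + dose,
  data = ToothGrowth
)
print(agg)
```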
Per-Subset Analysis
By is a variant of the standard function by: conditioning factors are specified
by a formula and are obtained from the data frame the subsets of which are to be
analysed. Therefore there is no need to attach the data frame or to use the dollar
operator.
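A small sketch of this usage, again with the ToothGrowth data chosen only for illustration, fits one regression per supplement type:

```r
library(memisc)

# One linear model per level of 'supp'; the conditioning factor is
# given as a formula and looked up in the data frame
fits <- By(~ supp, lm(len ~ dose), data = ToothGrowth)

# Inspect the coefficients of each per-subset model
print(lapply(fits, coef))
```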
Presentation of Results of Statistical Analysis
Publication-Ready Tables of Coefficients
Journals in the political and social sciences usually require that estimates of regression models are presented in the following form:
==================================================
                  Model 1    Model 2    Model 3
--------------------------------------------------
Coefficients
(Intercept)       30.628***   6.360***  28.566***
                  (7.409)    (1.252)    (7.355)
pop15             -0.471**              -0.461**
                  (0.147)               (0.145)
pop75             -1.934                -1.691
                  (1.041)               (1.084)
dpi                           0.001     -0.000
                             (0.001)    (0.001)
ddpi                          0.529*     0.410*
                             (0.210)    (0.196)
--------------------------------------------------
Summaries
R-squared          0.262      0.162      0.338
adj. R-squared     0.230      0.126      0.280
N                 50         50         50
==================================================
Such tables of coefficient estimates can be produced by mtable. To see some of the
possibilities of this function, use example(mtable).
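A table of this kind could be produced roughly as follows. The variable names in the table match the LifeCycleSavings data set that ships with R, so the sketch assumes that this is the data behind it; the model object names are arbitrary.

```r
library(memisc)

# Three nested regressions of the savings ratio
lm1 <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)
lm2 <- lm(sr ~ dpi + ddpi, data = LifeCycleSavings)
lm3 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Combine the estimates into one table with selected summary statistics
tab <- mtable(
  "Model 1" = lm1,
  "Model 2" = lm2,
  "Model 3" = lm3,
  summary.stats = c("R-squared", "adj. R-squared", "N")
)
print(tab)
```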
LaTeX Representation of R Objects
Output produced by mtable can be transformed into LaTeX tables by an appropriate
method of the generic function toLatex, which is defined in the package utils. In
addition, memisc defines toLatex methods for matrices and ftable objects.
Note that results produced by genTable can be coerced into ftable objects. Also,
a default method for the toLatex function is defined which coerces its argument to a
matrix and applies the matrix method of toLatex.
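As a hedged sketch, the conversion of an mtable result and of a plain matrix might look like this. The model and the matrix are invented for illustration, and the LaTeX output itself is not reproduced here.

```r
library(memisc)

# LaTeX code for a table of coefficient estimates
fit <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)
tab <- mtable("Model 1" = fit, summary.stats = c("R-squared", "N"))
latex_table <- toLatex(tab)
print(latex_table)

# LaTeX code for an ordinary numeric matrix
m <- matrix(c(1.5, 2.25, 3, 4.75), nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y")))
latex_matrix <- toLatex(m)
print(latex_matrix)
```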
Programming
Looping over Variables
Sometimes users want to construct loops that run over variables rather than values, for
example to set the missing values of a battery of items. For this purpose,
the package contains the function foreach. To set 8 and 9 as missing values for the
items knowledge1, knowledge2, knowledge3, one can use
foreach(x=c(knowledge1,knowledge2,knowledge3),
missing.values(x) <- 8:9)
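Embedded in a complete script, the call above could be used as in the following sketch. The item values are invented, and the loop is placed inside within() so that it operates on the variables of a “data.set”.

```r
library(memisc)

# A small battery of knowledge items; 8 and 9 are non-substantive codes
ds <- data.set(
  knowledge1 = c(1, 2, 8, 9, 1),
  knowledge2 = c(2, 2, 1, 8, 9),
  knowledge3 = c(9, 1, 1, 2, 8)
)

# Run the same expression once for each of the three variables
ds <- within(ds, {
  foreach(x = c(knowledge1, knowledge2, knowledge3), {
    missing.values(x) <- 8:9
  })
})

# Count the responses now flagged as missing in the first item
sum(is.missing(ds$knowledge1))
```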
Changing Names of Objects and Labels of Factors
R already makes it possible to change the names of an object: substituting the
names or dimnames can be done with some programming tricks. This package defines
the functions rename, dimrename, colrename, and rowrename, which implement
these tricks in a convenient way, so that programmers (like the author of this package)
need not reinvent the wheel in every instance of changing names of an object.
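A brief sketch of two of these functions, with invented objects, is given below. Each argument maps an old name to its replacement.

```r
library(memisc)

# Rename the elements of a named vector
x <- c(a = 1, b = 2)
x_renamed <- rename(x, a = "A", b = "B")
print(x_renamed)

# Rename the columns of a matrix
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
m_renamed <- colrename(m, c1 = "first", c2 = "second")
print(m_renamed)
```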
Dimension-Preserving Versions of lapply and sapply
If a function that is involved in a call to sapply returns an array or a
matrix, the dimensional information gets lost. Also, if a list object to which lapply
or sapply is applied has a dimension attribute, the result loses this information.
The functions Lapply and Sapply defined in this package preserve such dimensional
information.
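The first case can be illustrated with a function that returns a small matrix. The function make_block is made up for this sketch, and the comparison uses sapply with its default arguments.

```r
library(memisc)

# A hypothetical function that returns a 2 x 2 matrix
make_block <- function(i) matrix(i, nrow = 2, ncol = 2)

# Default sapply flattens each matrix into a column of length 4
flat <- sapply(1:3, make_block)
print(dim(flat))

# Sapply keeps the dimensions of the individual results
kept <- Sapply(1:3, make_block)
print(dim(kept))
```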
Combining Vectors and Arrays by Names
The generic function collect collects several objects of the same mode into one
object, using their names, rownames, colnames and/or dimnames. There are
methods for atomic vectors, arrays (including matrices), and data frames. For example
x <- c(a=1,b=2)
y <- c(a=10,c=30)
collect(x,y)
leads to
   x  y
a  1 10
b  2 NA
c NA 30
Reordering of Matrices and Arrays
The memisc package includes a reorder method for arrays and matrices. For
example, the matrix method by default reorders the rows of a matrix according to the
results of a function.