Logistic Regression, Average Marginal Effects, and the Linear Probability Model - Part II: Coefficients and AMEs of nested models
As mentioned in the previous post, one of the claims made by Mood (2010) is that coefficients of nested logistic regression models are not comparable, because they tend to increase when predictor variables are added to a model, even if these added variables are uncorrelated with those already included. This post examines how big a problem this is and whether it can be avoided by the use of average marginal effects. The investigation is based on a few simulation studies and an empirical example.
A simulation study
The first simulation study examines the change in coefficient estimates between two nested models that involve two normally distributed predictor variables. The study repeatedly fits two logistic regression models and two linear regression models to artificial data. These data are created such that two predictor variables consisting of i.i.d. random numbers affect a binary response variable. Of the two logistic regression models and the two linear regression models, one includes only one predictor variable and the other includes both. For each logistic regression model fitted to an artificial data set, the average marginal effect is also computed for the predictor that is present in both models.
The following two lines of code activate the “memisc” package and run a code file in which the simulation infrastructure is defined.
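A minimal sketch of these two lines, assuming the simulation infrastructure lives in a file called “simsim.R” (a placeholder name for the actual file that defines the simsim() function used further below), is:
library(memisc)
source("simsim.R")   # placeholder file name; defines the simsim() helper used below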
The following line defines the logistic function.
logistic <- function(x) 1/(1+exp(-x))
The next few lines define a function for the computation of the average marginal effects of the predictor variables in a logistic regression model.
# Note that this function is non-generic and is correct only for
# logistic regression (logit) models
AME <- function(mod) {
p <- predict(mod,type="response")
cf <- coef(mod)
cf <- cf[names(cf)!="(Intercept)"]
unname(mean(p*(1-p),na.rm=TRUE))*cf
}
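As a quick sanity check (a sketch on made-up data, not part of the original analysis), the function can be applied to a simple single-predictor logit model; the result is the coefficient of x scaled by the average of p(1-p):
# Hypothetical illustration of AME() on simulated data
set.seed(42)
x <- rnorm(1000)
y <- rbinom(1000, size = 1, prob = logistic(-0.5 + x))
m <- glm(y ~ x, family = binomial)
AME(m)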
The following lines define the function that is evaluated in each replication of the simulation study. The arguments a, b1, b2 are the intercept and slopes of a “true” logistic regression model, according to which binary responses are generated based on the values of two predictor variables x1 and x2. The arguments mu.x1 and mu.x2 are the expected values of x1 and x2, and n is the simulation sample size. The sample size is set to a large number by default, so that the results of the simulation study mainly reflect the structural features of the model estimates and the average marginal effects, without being swamped by sampling error. The objects glm1 and glm2 are the fitted logistic regression models, while lm1 and lm2 are the fitted linear regression models. The number of iterations in the computation of the ML estimates for the logistic regression models is set to a maximum of 6, so that the (occasionally occurring) cases of separation do not lead to runaway coefficient estimates.
fun <- function(a=0,b1=1,b2=1,mu.x1=0,mu.x2=0,n=5000) {
    # Two independent (hence uncorrelated) predictors
    x1 <- rnorm(n=n,mean=mu.x1)
    x2 <- rnorm(n=n,mean=mu.x2)
    # Binary responses generated from the "true" logistic model
    p <- logistic(a+b1*x1+b2*x2)
    y <- rbinom(n=n,size=1,prob=p)
    # Logistic regressions with x1 only and with both predictors
    glm1 <- glm(y~x1, family = binomial,maxit=6)
    glm2 <- glm(y~x1+x2, family = binomial,maxit=6)
    # Linear probability models with x1 only and with both predictors
    lm1 <- lm(y~x1)
    lm2 <- lm(y~x1+x2)
    # Collect slope estimates and AMEs into a named vector
    c(
        b_glm1=coef(glm1)[-1],
        b_glm2=coef(glm2)[-1],
        AME_glm1=AME(glm1),
        AME_glm2=AME(glm2),
        b_lm1=coef(lm1)[-1],
        b_lm2=coef(lm2)[-1]
    )
}
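For illustration (this call is not part of the original code), a single evaluation of the function returns a named vector with the coefficients and AMEs of x1 (and x2) from the reduced and the full models:
# Hypothetical single run with moderate effects of both predictors
set.seed(1)
fun(b1 = 1, b2 = 1)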
The following code runs a simulation with various settings for the parameters b1 and b2, the coefficients of the response-generating model. The settings comprise zero as well as small and large values for the coefficients.
simres <- simsim(fun,
conditions=list(
b1=c(0,0.1,.3,0.5,1,1.5,2,2.5,3),
b2=c(0,0.1,.3,0.5,1,1.5,2,2.5,3)
),
nsim=100)
There were 50 or more warnings (use warnings() to see the first 50)
The simulation results are collected into an array, which in the following line of code is transferred into a data frame.
simres.df <- as.data.frame(simres)
For the visualization of the simulation results, the package “ggplot2” is activated and a particular theme for the overall appearance is set.
library(ggplot2)
theme_set(theme_bw())
The following diagram plots the distribution of the differences between the estimates of b1 from a logistic regression fitted with x1 only (b_glm1.x1) and with both x1 and x2 (b_glm2.x1). The difference b_glm2.x1 - b_glm1.x1 expresses how much the estimate of b1 changes when the variable x2 is added to the model. The diagram makes clear that (almost) no change occurs on average if either b1 or b2 (or both) are equal to 0, while the change becomes larger the larger the values of b1 and b2 are.
ggplot(simres.df) +
geom_boxplot(aes(b2,b_glm2.x1 - b_glm1.x1,group=b2)) +
facet_wrap(~b1, labeller = label_both,scales="free_y")
In the same way as the previous diagram plots the difference between the estimates of the coefficient of x1 from glm1 and glm2, the following diagram plots the differences between the average marginal effects of x1 according to these models. It becomes clear that, at least on average, adding the variable x2 to the model with x1 does not change the AME of x1.
ggplot(simres.df) +
geom_boxplot(aes(b2,AME_glm2.x1 - AME_glm1.x1,group=b2)) +
facet_wrap(~b1, labeller = label_both)
Is the change in the coefficient estimate of x1 the consequence of a bias in the model with both variables x1 and x2 (i.e. glm2) or in the model with only x1 (i.e. glm1)? This can be gleaned from the following diagram, which shows the distribution of the difference between the estimate of the coefficient b1 in the full model glm2 and its “true” value (i.e. the value from which the observations are generated).
ggplot(simres.df) +
geom_boxplot(aes(b2,b_glm2.x1-b1,group=b2)) +
facet_wrap(~b1, labeller = label_both)
The plot shows that on average, the difference between the estimate b_glm2.x1 and the true parameter value b1 is zero. This should not be surprising, however, since the maximum likelihood estimator of the logistic regression coefficients is consistent and asymptotically unbiased for correctly specified models (i.e. for a model like glm2) and the sample size is large (with n equal to 5000). This means that, conversely, the estimate b_glm1.x1 is biased.
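This attenuation can also be read off directly from the simulation results. The following lines (a small addition, using only the simres.df data frame created above) average the bias of the reduced-model estimate b_glm1.x1 within each combination of b1 and b2:
# Average bias of the x1 estimate from the reduced model (y ~ x1) per condition;
# the bias should grow in size with both b1 and b2
simres.df$bias_glm1.x1 <- with(simres.df, b_glm1.x1 - b1)
aggregate(bias_glm1.x1 ~ b1 + b2, data = simres.df, FUN = mean)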
In the empirical example in the previous blog post we saw that the average marginal effect of a variable can be quite different from its maximal value if predicted probabilities are mainly close to 0 and 1. The following diagram examines how the average marginal effect varies with the true coefficients of the generating model:
# Here the AME's come from glm2, which has the
# model formula y~x1+x2
ggplot(simres.df) +
geom_boxplot(aes(b2,AME_glm2.x1,group=b2)) +
facet_wrap(~b1, labeller = label_both,scales="free_y")
This shows that the AME of x1 in the full model glm2 not only increases with the coefficient of x1 but also decreases with the coefficient of the other variable, i.e. x2.
But recall now that the AME hardly changes when a variable is added to a model or removed from it. This implies that the AME of x1 decreases with the coefficient of x2, whether or not x2 is in the model! Even though this may be hard to believe, the next diagram shows that this is the case:
# Note that here the AME's come from glm1, which has the
# model formula y~x1
ggplot(simres.df) +
geom_boxplot(aes(b2,AME_glm1.x1,group=b2)) +
facet_wrap(~b1, labeller = label_both,scales="free_y")
A claim by Mood which gained a lot of attention is that if the influence of the predictor variables in a logistic regression is not “too strong”, so that the link between the predictor variables and the predicted probability is (almost) linear, then the AME from a logistic regression and the coefficient from a linear regression should be (almost) identical. The following code creates a diagram that provides an impression of under what conditions this is (more or less) the case.
ggplot(simres.df) +
geom_boxplot(aes(b2,b_lm2.x1-AME_glm2.x1,group=b2)) +
facet_wrap(~b1, labeller = label_both,scales="free_y")
The diagram just created shows the differences between the OLS estimates from the linear regression and the AMEs from the logistic regression. It shows that, indeed, AMEs and linear regression coefficients are very close and that there is hardly any systematic variation in the difference between these two quantities.
A real-world example
The following explores the issue of comparing coefficients across nested models with “real-world” data. We use the same data as in the previous post: a survey conducted on the occasion of the 1988 Chilean plebiscite, available from the R package “carData”.
The following lines activate two packages, “carData” and “memisc”.
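Presumably these are just the two corresponding library() calls:
library(carData)   # provides the Chile data set
library(memisc)    # provides mtable(), collect(), and related helpers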
The next few lines create a binary variable for the votes, where 1 represents a
vote for Pinochet and 0 represents a vote against him.
Chile %$$% {
vote2 <- factor(vote, levels = c("N","Y"))
vote.bin <- as.integer(vote2 == "Y")
}
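As a quick check (added here for illustration), the recoding can be verified by cross-tabulating the original and the binary vote variable; “Y” should map to 1, “N” to 0, and the remaining categories to NA:
with(Chile, table(vote, vote.bin, useNA = "ifany"))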
The following lines of code define a set of logistic regression models with the binary vote variable (vote.bin) as response and age, sex, and statusquo as predictor variables. There are several nesting relations among these models. Both glm_1 and glm_2 are nested in glm_3, glm_1 and glm_4 are nested in glm_5, glm_2 and glm_4 are nested in glm_6, and finally glm_1, glm_2, glm_3, glm_4, glm_5, and glm_6 are all nested in glm_7. Comparing these models allows us to see how much coefficient estimates are affected when variables are added to a model, and how this varies depending on whether the existing or the added variables have a “strong” impact, i.e. lead to many of the predicted probabilities being close to 0 or 1.
glm_1 <- glm(vote.bin ~ age,
data = Chile,
family = binomial)
glm_2 <- glm(vote.bin ~ sex,
data = Chile,
family = binomial)
glm_3 <- glm(vote.bin ~ age + sex,
data = Chile,
family = binomial)
glm_4 <- glm(vote.bin ~ statusquo,
data = Chile,
family = binomial)
glm_5 <- glm(vote.bin ~ age + statusquo,
data = Chile,
family = binomial)
glm_6 <- glm(vote.bin ~ sex + statusquo,
data = Chile,
family = binomial)
glm_7 <- glm(vote.bin ~ age + sex + statusquo,
data = Chile,
family = binomial)
The next lines allow for a comparison of the coefficient estimates of the models specified and estimated above. Comparing the coefficients suggests that adding even a strong predictor such as status-quo preferences to a model with a weak predictor does not necessarily lead to major changes in the estimates. The most drastic change that occurs is that of the coefficient of sex when status-quo preferences are added to the model: in model glm_2 the coefficient is -0.584, while in model glm_6 it is -0.718.
mtable(
"(1)"=glm_1,
"(2)"=glm_2,
"(3)"=glm_3,
"(4)"=glm_4,
"(5)"=glm_5,
"(6)"=glm_6,
"(7)"=glm_7
) |> show_html()
|                | (1)       | (2)       | (3)       | (4)      | (5)      | (6)       | (7)       |
|----------------|-----------|-----------|-----------|----------|----------|-----------|-----------|
| (Intercept)    | −0.814*** | 0.279***  | −0.529*** | 0.215*   | −0.200   | 0.594***  | 0.233     |
|                | (0.133)   | (0.070)   | (0.141)   | (0.100)  | (0.266)  | (0.146)   | (0.296)   |
| age            | 0.021***  |           | 0.022***  |          | 0.011    |           | 0.009     |
|                | (0.003)   |           | (0.003)   |          | (0.007)  |           | (0.007)   |
| sex: M/F       |           | −0.584*** | −0.610*** |          |          | −0.718*** | −0.696*** |
|                |           | (0.097)   | (0.098)   |          |          | (0.197)   | (0.198)   |
| statusquo      |           |           |           | 3.206*** | 3.196*** | 3.208***  | 3.195***  |
|                |           |           |           | (0.143)  | (0.143)  | (0.144)   | (0.144)   |
| Log-likelihood | −1197.036 | −1199.256 | −1177.407 | −376.294 | −374.881 | −369.561  | −368.591  |
| N              | 1757      | 1757      | 1757      | 1754     | 1754     | 1754      | 1754      |

Significance: *** = p < 0.001; ** = p < 0.01; * = p < 0.05
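The change in the coefficient of sex just mentioned can also be extracted directly (a small addition for illustration; the coefficient is named sexM because “F” is the reference category):
coef(glm_2)["sexM"]   # coefficient of sex without statusquo in the model
coef(glm_6)["sexM"]   # coefficient of sex with statusquo in the model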
The following lines define a function that allows a convenient display of various quantities, such as AMEs and correlations.
print_nicely <- function(x, digits=3) {
x <- formatC(x,format="f",digits=digits)
x <- gsub("NA"," ",x,fixed=TRUE)
x <- format(x,justify="right")
print(x,quote=FALSE)
}
In a similar way as above, the following lines allow a comparison of the AMEs between nested models. This comparison yields a surprising result, given the claimed advantage of AMEs, namely that they supposedly change little between nested models: the AME of sex changes quite drastically if status-quo preferences are added to the model (as the comparison of the AMEs from glm_2 and glm_6 indicates). While we found a substantial change in the coefficient of sex from model glm_2 to model glm_6, that change was small relative to the coefficient's size. The change in the AME of sex from model glm_2 to model glm_6, however, is large compared to its initial value of about -0.14. In glm_6 the AME of sex is less than half of that in glm_2. Notably, while the coefficient gets larger (in absolute value), the AME gets smaller.
collect(
"(1)"=AME(glm_1),
"(2)"=AME(glm_2),
"(3)"=AME(glm_3),
"(4)"=AME(glm_4),
"(5)"=AME(glm_5),
"(6)"=AME(glm_6),
"(7)"=AME(glm_7)
) |> print_nicely()
               (1)     (2)     (3)     (4)     (5)     (6)     (7)
age          0.005            0.005            0.001            0.001
sexM                -0.143  -0.146                   -0.043   -0.041
statusquo                            0.195   0.193   0.191    0.190
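The relative change can be computed directly as well (again a small addition for illustration):
# Ratio of the AME of sex with and without statusquo in the model;
# the result is well below one half, in line with the comparison above
AME(glm_6)["sexM"] / AME(glm_2)["sexM"]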
While the simulation study discussed earlier indicates that coefficients differ more between nested models than AMEs do, the example of the Chile plebiscite seems to suggest otherwise. Yet the predictor variables in the simulation study were, at least at the population level, uncorrelated. The different conclusion from the empirical data may thus come from the predictor variables being correlated. However, as the results from the following code show, the correlations among the predictor variables in the empirical example are far from strong.
with(Chile,cor(
cbind(age,sex,statusquo),
use="pairwise"
)) |> print_nicely()
age sex statusquo
age 1.000 0.017 0.115
sex 0.017 1.000 -0.067
statusquo 0.115 -0.067 1.000
What can be concluded from the previous example is that the specific “advantage” of average marginal effects breaks down when the predictor variables are correlated: in the example, an AME (the one related to respondents' sex) changed quite substantially when a predictor was added to the model, while the coefficient changed relatively little. This could be interpreted as meaning that, in the present case, the logistic regression coefficient fails to express that the effect of sex on voting for Pinochet is partially mediated by status-quo preferences, while this mediation is reflected by the average marginal effects.
Discussion and conclusion
The simulation studies reported in this post suggest the following conclusions:
- Coefficient estimates of a predictor in a logistic regression model with an omitted variable (i.e., a predictor that does “in reality” have an effect on the response, but is not included in the model) are biased even if the omitted predictor is uncorrelated with the variable(s) present in the model. This is different from the situation with classical linear regression models estimated by OLS: there, an omitted-variable bias occurs only if the omitted variable is correlated with at least one of the included variables.
- The omitted-variable bias in logistic regression is the larger, the stronger the “true” effect of the omitted variable.
- As a consequence, coefficients of variables in a logistic regression model will always change when an omitted variable is added to the model, provided that its “true” effect is non-zero.
- In contrast to its coefficient, the average marginal effect of a predictor variable in a logistic regression model does not change, or changes very little, when an omitted variable is added to the model. This, however, means that an AME not only represents the influence of a predictor variable present in a model, but is also affected by the influence of omitted variables.
- Under a wide set of conditions, the AME of a predictor variable in a logistic regression equals its coefficient in a linear regression. (This is in contrast to the empirical example of the previous post, where the AME and the linear regression coefficient of status-quo preferences on voting for Pinochet were quite different.)
The empirical example leads to a qualification of the conclusions drawn from the simulation study: if predictor variables are correlated, then it may happen that the relative change of the AME of a predictor variable due to adding an omitted variable is larger than the relative change of the corresponding coefficient. On the other hand, the change in a coefficient caused by adding an omitted variable to the model is not always substantial in size.
That coefficients change when an omitted variable is added, even if it is uncorrelated with the variables already present in the model, may make it difficult if not impossible to decompose the total effect of a predictor variable into a direct and an indirect effect. However, on the one hand, such a decomposition of total effects may only work if total, direct, and indirect effects are all linear (or nearly linear), and it is arguably a very strong (and not very plausible) assumption that all relations between variables in the social sciences are linear. The linearity assumption is all too often motivated by convenience and (for the same reason) remains untested. On the other hand, there seems to be a solution to the (alleged) problem of coefficient incomparability, the KHB technique developed by Karlson, Holm, and Breen (2012).
That AMEs are influenced not only by the coefficient of the predictor variable of interest but also by other variables with an influence on the response (whether they are included in the model or not) has some consequences that one may consider paradoxical or inconvenient, if one sticks to using AMEs to describe the effects of predictor variables on a binary response. These consequences will be uncovered in the next blog post.