Logistic Regression, Average Marginal Effects, and the Linear Probability Model - Part III: How AMEs are affected by the distribution of predictor variables
By construction, the AME of an independent variable in a logistic regression model depends not only on its coefficient, but also on the predicted values of the dependent variable. The formula for the AME of the $j$-th independent variable is
$$\text{AME}_j = \hat\beta_j\,\frac{1}{n}\sum_{i=1}^{n}\hat\pi_i\left(1-\hat\pi_i\right),$$
where $\hat\pi_i$ is the predicted probability of a positive response for observation $i$ and $\hat\beta_j$ is the estimated logit coefficient.
Since the predicted values depend not only on the coefficient estimate(s) but also on the values of the independent variable(s), one may ask how the distribution of an independent variable affects its AME. This is explored in the simulation studies discussed below.
A first simulation study: How the mean of a predictor variable affects the AME
The first simulation study discussed in this post looks at how the mean of the independent variable and its coefficient affect its AME.
The following two lines activate the “memisc” package and run an R script that defines the functions providing the simulation infrastructure.
The following line defines the logistic function.
logistic <- function(x) 1/(1+exp(-x))
The next few lines define a function for the computation of the average marginal effects of the predictor variables in a logistic regression model.
AME <- function(mod) {
  # Predicted probabilities from the fitted model
  p <- predict(mod, type="response")
  # Coefficients without the intercept
  cf <- coef(mod)
  cf <- cf[names(cf) != "(Intercept)"]
  # AME of each predictor: its coefficient times the mean of p*(1-p)
  unname(mean(p*(1-p), na.rm=TRUE))*cf
}
The function defined in the following lines provides the core replications of
the simulation study. In this function, a normally distributed random vector is
created with expected value mu.x and standard deviation sd.x. Further, a binary
dependent variable is created according to a logistic model with intercept a
and slope coefficient b. Then both a logistic regression and a linear
regression are fitted to the dependent and independent variables. The function
returns the coefficient of x in the logistic regression, the AME of x computed
from the logistic regression, and the coefficient of x from the linear regression.
fun <- function(a=0, b=1, mu.x=1, sd.x=1, n=5000) {
  # Draw the predictor from a normal distribution with mean mu.x
  x <- rnorm(n=n, mean=mu.x, sd=sd.x)
  # True response probabilities according to the logit model
  p <- logistic(a + b*x)
  # Binary response drawn from these probabilities
  y <- rbinom(n=n, size=1, prob=p)
  glm1 <- glm(y ~ x, family=binomial, maxit=6)
  lm1 <- lm(y ~ x)
  c(
    b_glm1 = coef(glm1)[-1],
    AME_glm1 = AME(glm1),
    b_lm1 = coef(lm1)[-1]
  )
}
In the first simulation, the slope coefficient b is varied between 0 and 3, and
the mean of x is varied between -2 and +2, while the standard deviation of x
equals 1 and the intercept in the model is equal to zero.
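The results data frame simres.df used below could be assembled along the following lines. This is only a sketch, not the original simulation infrastructure; the grid values and the number of replications per condition are assumptions.

```r
# Sketch: run fun() (defined above) over a grid of conditions and
# collect the results in a data frame. Grid values and the number of
# replications (100) are assumptions, not taken from the original post.
settings <- expand.grid(b = c(0, 0.3, 1, 2, 3), mu.x = -2:2)
simres.df <- do.call(rbind, lapply(seq_len(nrow(settings)), function(i) {
  res <- t(replicate(100, fun(b = settings$b[i], mu.x = settings$mu.x[i])))
  data.frame(b = settings$b[i], mu.x = settings$mu.x[i], res)
}))
```

The replicated results carry the names b_glm1.x, AME_glm1.x, and b_lm1.x, which is why these variable names appear in the plotting code below.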
For the visualisation of the results of the simulation, we again use the package “ggplot2” and set a colour theme with a white background.
library(ggplot2)
theme_set(theme_bw())
Attaching package: ‘ggplot2’
The following object is masked from ‘package:memisc’:
syms
The following lines of code plot the distribution of the coefficient estimates
of x against the mean of x, conditional on the settings of the true slope
b. The resulting diagram indicates that the average of the coefficient values
does not systematically change with mu.x, i.e. the expected value of x; the
variation is merely the result of sampling error (i.e. a consequence of the
samples being finite).
ggplot(simres.df) +
  geom_boxplot(aes(mu.x, b_glm1.x, group = mu.x)) +
  facet_wrap(~ b, labeller = label_both, scales = "free_y")
The diagram created next plots the distribution of the AME of x against the
mean of x, conditional on the settings of the true slope
b.
ggplot(simres.df) +
  geom_boxplot(aes(mu.x, AME_glm1.x, group = mu.x)) +
  facet_wrap(~ b, labeller = label_both, scales = "free_y")
It becomes quite clear from the diagram that, unless the influence of x on y
is very small (with b equal to 0.3 or less), the AME varies strongly with the
mean of x. It reaches its highest value if the mean of x is zero and
becomes smaller the greater the absolute value of the mean of x is. Furthermore,
the influence of the mean of x is the stronger, the greater the (absolute)
value of the slope coefficient is.
A second simulation study: How the mean and the standard deviation of a predictor variable affect the AME
In the second simulation study, the logit coefficient of x is fixed to 1,
while both the mean and the standard deviation of x are varied.
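The grid of conditions for this study might be set up as follows; the specific values are assumptions, and the per-condition replications again use the function fun() defined earlier.

```r
# Second study: b fixed at 1, while the mean and the standard deviation
# of x are varied. The grid values are assumptions, not from the post.
settings1 <- expand.grid(mu.x = -2:2, sd.x = c(0.5, 1, 1.5, 2))
```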
The diagram created by the following code displays how the estimates of the logit
coefficients vary with the mean and the standard deviation of the independent
variable x. Again, it is clear that the logit coefficient estimates show no
bias that depends on the parameters of the distribution of x. However, the
sampling variance of the coefficient estimates increases with smaller values
of the standard deviation of x.
ggplot(simres1.df) +
  geom_boxplot(aes(sd.x, b_glm1.x, group = sd.x)) +
  facet_wrap(~ mu.x, labeller = label_both)
The next code shows, in a similar way, how the AMEs vary with the mean and the
standard deviation of x. Yet while the distribution of the coefficient
estimates is affected only in terms of its dispersion, the distribution of the
AMEs is strongly affected in terms of its mean.
ggplot(simres1.df) +
  geom_boxplot(aes(sd.x, AME_glm1.x, group = sd.x)) +
  facet_wrap(~ mu.x, labeller = label_both, scales = "free_y")
While in the previous simulation study we already found that the mean of the
distribution of the independent variable affects the mean of the distribution of
the AMEs, we now see that the standard deviation of the distribution of the
independent variable also affects the central tendency of the AME: greater
values of the standard deviation lead to smaller values of the AME. This is so
because, with a greater dispersion of x, very large or very small predicted
probabilities become more likely, which lowers the value of the AME. For large
values of the mean of the distribution of x, the relation between its
standard deviation and the AME becomes non-monotonic. This is so because,
when the mean is large in absolute value and the standard deviation is small,
predicted probabilities close to zero or unity are more common; increasing the
dispersion then moves part of the observations back into the region where the
predicted probabilities are closer to one half.
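This mechanism can be checked numerically. The following snippet is an illustration added here, not part of the original simulations: it averages the factor p(1-p) over a normal predictor and shows how that average, and hence the AME, shrinks when predicted probabilities are pushed towards 0 or 1.

```r
# Average of p*(1-p) over a normally distributed predictor
logistic <- function(x) 1/(1 + exp(-x))
avg_deriv <- function(mu, sd, a = 0, b = 1, n = 1e5) {
  p <- logistic(a + b * rnorm(n, mean = mu, sd = sd))
  mean(p * (1 - p))
}
set.seed(1)
avg_deriv(mu = 0, sd = 1)    # near its maximum
avg_deriv(mu = 3, sd = 0.5)  # much smaller: p is close to 1 for most x
avg_deriv(mu = 3, sd = 3)    # larger again: more x fall where p is near 0.5
```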
A third simulation study: How the mean of a predictor variable and the intercept in the model affect the AME
For a given value of the logit coefficient(s), the predicted probabilities of a response in a logit model depend not only on the values of the predictors but also on the intercept. Since an AME depends both on the logit coefficient and the predicted probabilities, it may be interesting to see how the distribution of a predictor variable and the intercept jointly affect an AME. This is explored in the simulation discussed in the following.
In the simulation study, both the mean of the predictor variable x and the
“true” intercept a are varied. The same replication function is used
as before.
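A possible grid of conditions for this third study is sketched below; the specific values are assumptions, and the replications per condition again use fun(), this time varying its argument a.

```r
# Third study: the mean of x and the "true" intercept a are varied,
# b fixed at 1. The grid values are assumptions, not from the post.
settings2 <- expand.grid(mu.x = -2:2, a = -2:2)
```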
The following lines of code create a diagram that plots the logit coefficient
estimates against the mean of x and the true intercept a. Obviously, the
estimates are not systematically influenced by either.
ggplot(simres2.df) +
  geom_boxplot(aes(mu.x, b_glm1.x, group = mu.x)) +
  facet_wrap(~ a, labeller = label_both)
In contrast to the logit coefficient estimates, the values of the AME are
systematically related to the mean of x and the value of a: If a equals -2,
the AME values increase on average as the mean of the distribution of x
increases from -2 to +2. Conversely, if a equals +2, the AME values decrease
on average as the mean of the distribution of x increases from -2 to +2. If
the value of a is between -2 and +2, the AME values increase with the mean of
x until their average reaches a maximum. If a equals zero, the AME values tend
to be maximal if the mean of the distribution of x is also equal to zero. It
appears that the average of the AME is maximal if a and the mean of the
distribution of x sum to zero.
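This last pattern can also be checked directly. The snippet below is an illustration added here (not part of the original simulations): with b = 1, the average of p(1-p) over x ~ N(mu, 1), and hence the AME, peaks where a + mu = 0.

```r
# Average of p*(1-p) over x ~ N(mu, 1) for a given intercept a, b = 1
logistic <- function(x) 1/(1 + exp(-x))
avg_pq <- function(a, mu, n = 1e5) {
  p <- logistic(a + rnorm(n, mean = mu, sd = 1))
  mean(p * (1 - p))
}
set.seed(42)
sapply(c(-2, -1, 0, 1, 2), function(mu) avg_pq(a = -1, mu = mu))
# the largest value occurs at mu = 1, i.e. where a + mu = 0
```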
ggplot(simres2.df) +
  geom_boxplot(aes(mu.x, AME_glm1.x, group = mu.x)) +
  facet_wrap(~ a, labeller = label_both)
Summary and discussion
The simulation studies in this post reveal a fundamental difference between logit coefficients and AMEs. The values of logit coefficients and their estimates are independent of the distribution of the corresponding predictor variable, while AMEs are not. This is so because a logit coefficient describes the conditional distribution of the response for any given value of the predictor, while an AME is an average of marginal effects taken over the particular distribution of the predictor in the sample. Therefore, the values of AMEs for the same predictor variable cannot be compared between different samples or (sub-)populations when its distribution varies between the samples or (sub-)populations.
Some people might be tempted to argue that an AME is superior to a logit coefficient, because it “takes into account” the distribution of the predictor variable, while the coefficient “ignores” it. But this is misleading, or at least based on a very peculiar conception of the superiority of one quantity of interest over another. That an AME depends on the distribution of a predictor variable reduces its generality and restricts its meaning to a particular sample or a particular (sub-)population. It thus makes AMEs incomparable between different samples or (sub-)populations even if the fundamental relation between the independent and dependent variables is the same in all of them.
To understand why the dependence of an AME on the distribution of a predictor variable may lead to erroneous conclusions, consider the following (artificially simplified) example: In (the fictitious country of) Syldavia, the probability to vote in a general election for the legislative assembly increases with the years a citizen has spent in full-time education, and this increase is described by a logit model with a particular logit coefficient and a particular intercept. With these parameters of the logit model, one can predict the probability that someone participates in the election for any level of education. With a particular distribution of the level of education in a particular year, say 1997, one can compute an average marginal effect (AME). Now imagine further that the average level of education has considerably changed in Syldavia, so that twice as many people reach a post-secondary qualification in 2027 as in 1997. Because the AME is affected by the mean of the predictor variable education, it will be different in 2027 from its value in 1997. A researcher who rests his or her conclusion on the AME will therefore conclude that the impact of education changed, e.g. became weaker, even though the relation between education and turnout did not change.
One could object that this argument rests on the assumption that a logit model is the correct model describing the influence of the predictor variables. Perhaps the AME performs better when the true model is a linear probability model, because AMEs tend to be close to the coefficients of a linear regression model fitted to the binary response variable. However, this is generally not the case, as I will show in a later blog post.