Logistic Regression, Average Marginal Effects, and the Linear Probability Model - Part III: How AMEs are affected by the distribution of predictor variables

By construction, the AME of an independent variable in a logistic regression model depends not only on its coefficient, but also on the predicted values of the dependent variable. The formula for the AME of the $h$-th independent variable is

$$\textrm{AME}_h = \frac{1}{n}\sum_{i=1}^{n}\pi_i(1-\pi_i)\beta_h$$

Since the predicted values $\pi_i$ depend not only on the estimates of the coefficient(s) but also on the values of the independent variable(s), one may ask how the distribution of an independent variable affects its AME. This is explored in the simulation studies discussed below.
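Before turning to the simulations, note that the multiplier $\pi_i(1-\pi_i)$ in this formula reaches its maximum of 0.25 at $\pi_i = 0.5$ and vanishes as $\pi_i$ approaches 0 or 1. A quick check (with arbitrarily chosen probability values):

```r
# The AME multiplier p*(1-p) peaks at p = 0.5 and vanishes at the extremes
p <- c(0.01, 0.1, 0.5, 0.9, 0.99)
round(p * (1 - p), 4)
# 0.0099 0.0900 0.2500 0.0900 0.0099
```

This is why the position of the predicted probabilities, and hence the distribution of the predictors, matters for the AME.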

A first simulation study: How the mean of a predictor variable affects the AME

The first simulation study discussed in this post looks at how the mean of the independent variable and its coefficient affect its AME.

The following two lines activate the “memisc” package and run an R script that defines the functions providing the simulation infrastructure.

library(memisc)
source("simsim.R")
Loading required package: lattice
Loading required package: MASS

Attaching package: ‘memisc’

The following objects are masked from ‘package:stats’:

    contr.sum, contr.treatment, contrasts

The following object is masked from ‘package:base’:

    as.array

The following line defines the logistic function.

logistic <- function(x) 1/(1+exp(-x))

The next few lines define a function for the computation of the average marginal effects of the predictor variables in a logistic regression model.

AME <- function(mod) {
    # Average marginal effects of all predictors in a fitted logit model:
    # the sample mean of p*(1-p) multiplied by each slope coefficient
    p <- predict(mod, type = "response")
    cf <- coef(mod)
    cf <- cf[names(cf) != "(Intercept)"]
    unname(mean(p*(1-p), na.rm = TRUE))*cf
}
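As a quick sanity check of this function (a hypothetical example with simulated data, not part of the simulation studies themselves), one can fit a logit model with a true slope of 1 to a standard-normal predictor and inspect the resulting AME:

```r
# Hypothetical check of AME(): simulate from a logit model with slope 1
set.seed(42)
x <- rnorm(1000)
y <- rbinom(1000, size = 1, prob = 1/(1 + exp(-x)))
mod <- glm(y ~ x, family = binomial)
AME(mod)  # around 0.2: the factor mean(p*(1-p)) attenuates the slope of 1
```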

The function defined in the following lines provides the core replications of the simulation study. In this function, a normally distributed random vector is created with expected value mu.x and standard deviation sd.x. Further, a binary dependent variable is created according to a logistic model with intercept a and slope coefficient b. Then a logistic regression as well as a linear regression is fitted to the dependent and independent variables. The function returns the coefficient of x in the logistic regression, the AME of x computed from the logistic regression, and the coefficient of x from the linear regression.

fun <- function(a=0, b=1, mu.x=1, sd.x=1, n=5000) {
    # Draw the predictor and a binary response from a logistic model
    x <- rnorm(n=n, mean=mu.x, sd=sd.x)
    p <- logistic(a + b*x)
    y <- rbinom(n=n, size=1, prob=p)
    # Fit a logistic regression (with a low iteration limit) and an LPM
    glm1 <- glm(y~x, family=binomial, maxit=6)
    lm1 <- lm(y~x)
    
    c(
        b_glm1 = coef(glm1)[-1],  # logit coefficient of x
        AME_glm1 = AME(glm1),     # its average marginal effect
        b_lm1 = coef(lm1)[-1]     # coefficient of x from the LPM
    )
}
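A single call of this function (with a fixed seed, purely for illustration) shows the shape of the vector it returns:

```r
set.seed(1)
fun()
# a named vector containing the logit coefficient of x (b_glm1.x),
# its AME (AME_glm1.x), and the LPM coefficient (b_lm1.x)
```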

In the first simulation, the slope coefficient b is varied between 0 and 3, and the mean of x is varied between -2 and +2, while the standard deviation of x equals 1 and the intercept in the model is equal to zero.

simres <- simsim(fun,
                 conditions=list(
                     b=c(0,0.1,.3,0.5,1,1.5,2,2.5,3),
                     mu.x=c(-2,-1.5,-1,-.5,0,.5,1,1.5,2)
                 ),
                 nsim=100)
simres.df <- as.data.frame(simres)
There were 50 or more warnings (use warnings() to see the first 50)

For the visualisation of the results of the simulation, we again use the package “ggplot2” and set a colour theme with a white background.

library(ggplot2)
theme_set(theme_bw())

Attaching package: ‘ggplot2’

The following object is masked from ‘package:memisc’:

    syms

The following lines of code plot the distribution of the coefficient estimates of x against the mean of x, conditional on the settings of the true slope b. The resulting diagram indicates that the average of the coefficient values does not systematically change with mu.x, i.e. the expected value of x; the variation is merely the result of sampling error (i.e. a consequence of the samples being finite).

ggplot(simres.df) + 
  geom_boxplot(aes(mu.x,b_glm1.x,group=mu.x)) +
  facet_wrap(~b, labeller = label_both,scales="free_y")

The diagram created next plots the distribution of the AME of x against the mean of x, conditional on the settings of the true slope b.

ggplot(simres.df) + 
  geom_boxplot(aes(mu.x,AME_glm1.x,group=mu.x)) +
  facet_wrap(~b, labeller = label_both,scales="free_y")

It becomes quite clear from the diagram that, unless the influence of x on y is very small (with b equal to 0.3 or less), the AME varies strongly with the mean of x. It reaches its highest value if the mean of x is zero and gets smaller the greater the absolute value of the mean of x is. Furthermore, the influence of the mean of x is the stronger, the greater the (absolute) value of the slope coefficient is.

A second simulation study: How the mean and the standard deviation of a predictor variable affect the AME

In the second simulation study, the logit coefficient of x is fixed to 1, while both the mean and the standard deviation of x are varied.

simres1 <- simsim(fun,
                 conditions=list(
                     mu.x=c(-2,-1.5,-1,-.5,0,.5,1,1.5,2),
                     sd.x=c(0.5,1,1.5,2,2.5,3)
                 ),
                 nsim=100)
simres1.df <- as.data.frame(simres1)

The diagram created by the following code displays how the estimates of the logit coefficients vary with the mean and the standard deviation of the independent variable x. Again, it is clear that the logit coefficient estimates show no bias that could be influenced by the parameters of the distribution of x. However, the sampling variance of the coefficient estimates increases as the standard deviation of x decreases.

ggplot(simres1.df) + 
  geom_boxplot(aes(sd.x,b_glm1.x,group=sd.x)) +
  facet_wrap(~mu.x, labeller = label_both)

The next code shows, in a similar way, how the AMEs vary with the mean and the standard deviation of x. Yet while the distribution of the coefficient estimates is affected only in terms of its dispersion, the distribution of the AMEs is strongly affected in terms of its mean.

ggplot(simres1.df) + 
  geom_boxplot(aes(sd.x,AME_glm1.x,group=sd.x)) +
  facet_wrap(~mu.x, labeller = label_both,scales="free_y")

While in the previous simulation study we already found that the mean of the distribution of the independent variable affects the mean of the distribution of the AMEs, we now see that the standard deviation of the distribution of the independent variable also affects the central tendency of the AME: greater values of the standard deviation lead to smaller values of the AME. This is so because, with a greater dispersion in x, large or small predicted values are more likely, which in turn lowers the value of the AME. For large values of the mean of the distribution of x, the relation between its standard deviation and the AME becomes non-monotonic. This is so because, when the mean is large in absolute value and the standard deviation is small, predicted probabilities close to zero or unity are more common.
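The first part of this mechanism can be illustrated directly, without fitting any models: for a centred predictor with unit slope (hypothetical settings chosen here for illustration), the average of the multiplier $\pi(1-\pi)$ shrinks as the standard deviation of x grows, because more probability mass is pushed towards the extremes.

```r
# Larger spread of x pushes more predicted probabilities toward 0 or 1,
# which lowers the average of p*(1-p) and hence the AME
logistic <- function(x) 1/(1 + exp(-x))
set.seed(1)
sapply(c(0.5, 1, 2, 3), function(s) {
  x <- rnorm(1e5, mean = 0, sd = s)
  p <- logistic(x)
  mean(p * (1 - p))
})
# yields a decreasing sequence: the larger sd, the smaller the multiplier
```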

A third simulation study: How the mean of a predictor variable and the intercept in the model affect the AME

For a given value of the logit coefficient(s), the predicted probabilities of a response in a logit model depend not only on the values of the predictors but also on the intercept. Since an AME depends both on the logit coefficient and the predicted probabilities, it may be interesting to see how the distribution of a predictor variable and the intercept jointly affect an AME. This is explored in the simulation discussed in the following.

In the simulation study, both the mean of the predictor variable x and the “true” intercept a are varied. The same replication function is used as before.

simres2 <- simsim(fun,
                 conditions=list(
                     a=c(-2,-1,-0.5,-0.3,0,.3,0.5,1,2),
                     mu.x=c(-2,-1.5,-1,-.5,0,.5,1,1.5,2)
                 ),
                 nsim=100)
simres2.df <- as.data.frame(simres2)
There were 50 or more warnings (use warnings() to see the first 50)

The following lines of code create a diagram that plots the logit coefficient estimates against the mean of x and the true intercept a. Obviously, the estimates are not systematically influenced by either.

ggplot(simres2.df) + 
  geom_boxplot(aes(mu.x,b_glm1.x,group=mu.x)) +
  facet_wrap(~a, labeller = label_both)

In contrast to the logit coefficient estimates, the values of the AME are systematically related to the mean of x and the value of a: If a equals -2, the AME values increase on average as the mean of the distribution of x increases from -2 to +2. Conversely, if a equals +2, the AME values decrease on average as the mean of the distribution of x increases from -2 to +2. If the value of a is between -2 and +2, the AME values increase with the mean of x until their average reaches a maximum. If a equals zero, the AME values tend to be maximal if the mean of the distribution of x is also equal to zero. It appears that the average of the AME is maximal if a and the mean of the distribution of x sum to zero.

ggplot(simres2.df) + 
  geom_boxplot(aes(mu.x,AME_glm1.x,group=mu.x)) +
  facet_wrap(~a, labeller = label_both)
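This pattern can also be checked without any model fitting: with b = 1, the linear predictor a + x is centred at a + mu.x, and the multiplier $\pi(1-\pi)$ peaks where the linear predictor is zero. A quick numerical sketch (with a hypothetical setting of a = -1):

```r
logistic <- function(x) 1/(1 + exp(-x))
# Average multiplier mean(p*(1-p)) for a given intercept a and mean of x
avg_mult <- function(a, mu, sd = 1, n = 1e5) {
  x <- rnorm(n, mean = mu, sd = sd)
  p <- logistic(a + x)
  mean(p * (1 - p))
}
set.seed(1)
# With a = -1, the average multiplier peaks near mu = 1, i.e. where a + mu = 0
sapply(c(-1, 0, 1, 2, 3), function(mu) avg_mult(a = -1, mu = mu))
```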

Summary and discussion

The simulation studies in this post reveal a fundamental difference between logit coefficients and AMEs. The values of logit coefficients and their estimates are independent of the distribution of the corresponding predictor variable, while AMEs are not. This is so because a logit coefficient describes the conditional distribution of the response for any given value of the predictor, while an AME averages the marginal effects over the particular distribution of the predictor in the sample. Therefore, the values of AMEs for the same predictor variable cannot be compared between different samples or (sub-)populations when its distribution varies between the samples or (sub-)populations.

Some people might be tempted to argue that an AME is superior to a logit coefficient, because it “takes into account” the distribution of the predictor variable, while the coefficient “ignores” it. But this is misleading, or at least is based on a very peculiar conception of the superiority of one quantity of interest over another. That an AME depends on the distribution of a predictor variable reduces its generality and restricts its meaning to a particular sample or a particular (sub-)population. It thus makes AMEs incomparable between different samples or (sub-)populations even if the fundamental relation between the independent and dependent variables is the same in all (sub-)populations.

To understand why the dependence of an AME on the distribution of a predictor variable may lead to erroneous conclusions, consider the following (artificially simplified) example: In (the fictitious country of) Syldavia, the probability of voting in a general election for the legislative assembly increases with the years a citizen has spent in full-time education, and this increase is described by a logit model with a particular logit coefficient and a particular intercept. With these parameters of the logit model, one can predict the probability that someone participates in the election for any level of education. With a particular distribution of the level of education in a particular year, say 1997, one can compute an average marginal effect (AME). Now imagine further that the average level of education has changed considerably in Syldavia, so that twice as many people reach a post-secondary qualification in 2027 as in 1997. Because the AME is affected by the mean of the predictor variable education, it will be different in 2027 from its value in 1997. A researcher who bases his or her conclusions on the AME will therefore conclude that the impact of education has changed, e.g. become weaker, even though the relation between education and turnout did not change.
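The Syldavia example can be made concrete with a small sketch. The intercept, coefficient, and education distributions below are entirely hypothetical; the point is only that identical model parameters combined with two different predictor distributions yield two different AMEs:

```r
logistic <- function(x) 1/(1 + exp(-x))
# Hypothetical turnout model: logit(P(vote)) = -2 + 0.3 * education
ame_turnout <- function(edu, a = -2, b = 0.3) {
  p <- logistic(a + b * edu)
  mean(p * (1 - p)) * b
}
set.seed(1)
edu_1997 <- rnorm(1e5, mean = 10, sd = 3)  # lower average education
edu_2027 <- rnorm(1e5, mean = 13, sd = 3)  # expanded education
c(AME_1997 = ame_turnout(edu_1997), AME_2027 = ame_turnout(edu_2027))
# different AMEs, although the model parameters are identical
```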

One could argue that this argument rests on the assumption that a logit model is the correct model describing the influence of the predictor variables. Perhaps the AME performs better when the true model is a linear probability model, because AMEs tend to be close to the coefficients of a linear regression model fitted to the binary response variable. However, this is generally not the case, as I will show in a later blog post.