Logistic Regression, Average Marginal Effects, and the Linear Probability Model - Part IV: How AMEs are affected by the distribution of omitted variables
In the previous post, we saw that an average marginal effect (AME) of an independent variable in a logistic regression model reflects not only the influence of the variable in focus (i.e., the variable for which the AME is computed), but also the impact of variables omitted from the model. A perhaps desirable consequence of this is that AMEs change little if an omitted variable is added to the model (provided it is uncorrelated with the variables already present in the model). However, we also saw that the mean and standard deviation of the variable in focus affect its AME.
Since AMEs are affected by the strength of the influence of an omitted variable, one may ask whether the distribution of an omitted variable affects the AME. This is explored in the following simulation studies.
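To fix notation: the AME considered in this series is the logit coefficient scaled by the sample average of the derivative factor, i.e., for a fitted model with predicted probabilities \(\hat p_i\),

\[ \mathrm{AME}_1 = \hat\beta_1 \cdot \frac{1}{n}\sum_{i=1}^{n}\hat p_i\,(1-\hat p_i), \]

which is exactly the quantity computed by the AME() helper defined below.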
The function defined in the following provides the core replications of the
simulation study. In this function, two normally distributed random variables x1 and
x2 are created with expected values mu.x1 and mu.x2. Additionally, a binary
dependent variable is created that follows a logistic regression model with
intercept a and coefficients b1 and b2 for the predictor variables
x1 and x2. The function returns the coefficient from a logistic
regression, the coefficient from a linear regression, and the AME from the logistic regression, where x1 is
included in the regression model and x2 is omitted.
fun <- function(a=0, b1=1, b2=1, mu.x1=0, mu.x2=0, n=5000) {
    x1 <- rnorm(n=n, mean=mu.x1)
    x2 <- rnorm(n=n, mean=mu.x2)
    p <- logistic(a + b1*x1 + b2*x2)
    y <- rbinom(n=n, size=1, prob=p)
    glm <- glm(y ~ x1, family=binomial, maxit=6)
    lm <- lm(y ~ x1)
    c(
        b_glm = coef(glm)[-1],
        AME_glm = AME(glm),
        b_lm = coef(lm)[-1]
    )
}
Before the simulations are run, a few preparations are made.
library(memisc)
source("simsim.R")
logistic <- function(x) 1/(1+exp(-x))
AME <- function(mod) {
    p <- predict(mod, type="response")
    cf <- coef(mod)
    cf <- cf[names(cf) != "(Intercept)"]
    unname(mean(p*(1-p), na.rm=TRUE)) * cf
}
library(ggplot2)
theme_set(theme_bw())
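As a quick sanity check of these helpers, the following self-contained sketch re-creates a simplified, single-predictor version of the data-generating step in fun (it inlines the computation rather than calling the functions above):

```r
# Self-contained sanity check: in a single-predictor logistic model the AME
# equals the coefficient scaled by the average of p*(1-p), so its magnitude
# is always smaller than that of the coefficient itself.
set.seed(42)
logistic <- function(x) 1/(1 + exp(-x))
x <- rnorm(5000)
y <- rbinom(5000, size = 1, prob = logistic(x))   # true coefficient is 1
mod <- glm(y ~ x, family = binomial)
p <- predict(mod, type = "response")
ame <- unname(mean(p * (1 - p)) * coef(mod)["x"])
ame   # roughly 0.2, clearly smaller than the logit coefficient of about 1
```

Since p(1-p) can never exceed 0.25, the AME is at most a quarter of the logit coefficient in absolute value.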
A first simulation study: How the mean of an omitted variable affects the AME
The following simulation run varies the true coefficient of the omitted variable as well as its expected value.
simres <- simpar(fun,
conditions=list(
b2=c(0,0.1,.3,0.5,1,1.5,2,2.5,3),
mu.x2=c(-2,-1.5,-1,-.5,0,.5,1,1.5,2)
),
nsim=100)
Maximum number of cores available is 20.
Using 18 cores for 18 threads/jobs ...
simres.df <- as.data.frame(simres)
The following code creates a diagram that shows the distribution of the estimates of the
coefficient of x1 in the logistic regression.
The resulting diagram shows, as already seen earlier, that the estimate is biased if the omitted variable has an influence on the response. The bias increases with the strength of the influence of the omitted variable. If this influence is strong, the expected value of the omitted variable also has a slight influence on this bias, with greater values leading to smaller bias.
ggplot(simres.df) +
geom_boxplot(aes(mu.x2,b_glm.x1,group=mu.x2)) +
facet_wrap(~b2, labeller = label_both)
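A plausible explanation for this pattern (stated here as a heuristic, not derived in this post) is the attenuation that results from marginalizing a logistic model over an omitted, uncorrelated predictor. Using the standard normal-logistic approximation, averaging over \(x_2 \sim N(\mu_2, 1)\) gives approximately

\[ \Pr(y=1 \mid x_1) \approx \operatorname{logistic}\!\left(\frac{a + b_2\mu_2 + b_1 x_1}{\sqrt{1 + c^2 b_2^2}}\right), \qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588, \]

so the coefficient of x1 recovered from the misspecified model is attenuated by roughly the factor \(1/\sqrt{1+c^2 b_2^2}\), which depends on b2 but, to this order of approximation, not on mu.x2; the mean of the omitted variable enters only through the intercept, which is consistent with its comparatively slight influence on the bias seen above.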
The following code creates a diagram that shows the distribution of the AME of x1 from the logistic regression.
The diagram indicates that the AME of x1 becomes smaller as the influence of x2 (expressed by the
coefficient b2) grows. (This was already observed in a previous post.)
Yet in addition, the AME of x1 is also affected by the mean of x2: the further the mean of x2 is from zero, the further the predicted probabilities are pushed towards zero or one, which shrinks the factor p(1-p) and with it the AME.
ggplot(simres.df) +
geom_boxplot(aes(mu.x2,AME_glm.x1,group=mu.x2)) +
facet_wrap(~b2, labeller = label_both)
A second simulation study: Comparing the effect of the mean of a predictor variable and an omitted variable on the AME
The second simulation study varies the expected values of both a predictor variable present in a
model (represented by x1) and an omitted variable (represented by x2) and examines their
effect on the AME values.
simres1 <- simpar(fun,
conditions=list(
mu.x1=c(-2,-1.5,-1,-.5,0,.5,1,1.5,2),
mu.x2=c(-2,-1.5,-1,-.5,0,.5,1,1.5,2)
),
nsim=100)
Maximum number of cores available is 20.
Using 18 cores for 18 threads/jobs ...
simres1.df <- as.data.frame(simres1)
The next two diagrams show how the values of the AMEs vary with the expected values
of both x1 and x2. The first diagram has the expected values of x2 on the
horizontal axis, while the expected values of x1 define the panels; the second
diagram has the expected values of x1 on the horizontal axis, with
the expected values of x2 defining the panels.
ggplot(simres1.df) +
geom_boxplot(aes(mu.x2,AME_glm.x1,group=mu.x2)) +
facet_wrap(~mu.x1, labeller = label_both)
ggplot(simres1.df) +
geom_boxplot(aes(mu.x1,AME_glm.x1,group=mu.x1)) +
facet_wrap(~mu.x2, labeller = label_both)
The two diagrams are very similar to each other, suggesting that the two variables, the included and the omitted one, are exchangeable in terms of their impact on the AME.
The third diagram obtained from this simulation study shows that logistic regression coefficients are also affected by the means of included and omitted variables. However, this influence appears much weaker than in the case of average marginal effects.
ggplot(simres1.df) +
geom_boxplot(aes(mu.x2,b_glm.x1,group=mu.x2)) +
facet_wrap(~mu.x1, labeller = label_both)
Conclusion
The simulation study described above casts further doubt on the comparability of average marginal effects (AMEs) across samples from different (sub-)populations. That the mean of a predictor variable included in the model affects the values of the AME was already demonstrated in the previous post. If there were no omitted variables, one could at least anticipate whether AMEs from different samples are incomparable by comparing the distributions of the variable for which the AME is computed. In practice, however, it can seldom be ruled out completely that relevant variables are omitted from a logistic regression model. This means that it cannot always be decided whether differences between samples in terms of the distribution of observed variables lead to an incomparability of AMEs, because samples may (also) differ in the distribution of relevant variables that are not included in the logistic regression model.