# Using Mixed-Effects Models For Linear Regression

Mixed-effects regression models are a powerful tool for linear regression models when your data contains global and group-level trends. This article walks through an example using fictitious data relating exercise to mood to introduce this concept. R has had anundeserved rough time in the news
lately, so this post will use R as a small condolence to the language, though a robust framework exist in Python as well.

Mixed-effect models are common in political polling analysis where national-level characteristics are assumed to occur at a state-level while state-level sample sizes may be too small to drive those characteristics on their own. They are also common in scientific experiments where a given effect is assumed to be present among all study individuals which needs to be teased out from a specific effect on a treatment group. In a similar vein, this framework can be helpful in pre/post studies of interventions.

#### Data Analysis

This data simulates a survey of residents of 4 states who were asked about their daily exercise habits and their overall mood on a scale from 1–10. We’ll assume for purposes of example that mood scores are linear, but in the real world we may want to treat this as an ordinal
variable. What we will see (because I made up the data) is that exercise improves mood, however there are strong state-level effects.

summary(data)

#### First Try: Fixed-Effect Linear Regression

There are clear positive correlations between exercise and mood, though the model fit is not great: exercise is a significant predictor, though adjusted r-squared is fairly low. By the way, I love using R for quick regression questions: a clear, comprehensive output is often easy to find.

reg1 <- lm(Mood ~ Exercise, data = data)
summary(reg1)

with(data, plot(Exercise, Mood))
abline(reg1)

We may conclude at this point that the data is noisy, but let’s dig a little deeper. Recall that one of the assumptions of linear regression is “homoscedasticity”, i.e. that variance is constant among for all independent variables. When heteroscedasticity is present (and homoscedasticity is violated), the regression may give too much weight to the subset of data where the error variance is the largest. You can validate this assumption by looking at a residuals plot.

For reference, below is a linear regression that does not violate homoscedasticity.

x <- rnorm(25, 0, 1)
y = 2*x - 0.5 + rnorm(25, 0, 0.25)
reg.test <- lm(y ~ x)
plot(x, y)
abline(reg.test)
plot(x, resid(reg.test), ylab="Residuals", xlab="x")
abline(0, 0)

And then there is our data:

plot(data\$Mood, resid(reg1), ylab="Residuals", xlab="Mood")
abline(0, 0)

This plot shows that a simple linear regression is not appropriate — the model consistently produces negative residuals for low mood scores, and positive residuals for high mood scores. We might suspect at this point that mood and state are correlated in a way or model is not incorporating, which is a good guess => variance in residuals differs by state.

plot(data\$State, resid(reg1), ylab="Residuals", xlab="Mood")
abline(0, 0)

#### Second Try: A More Robust Linear Regression

But wait — if state is a predictor, let’s include it in our regression and fix everything. We’ll see that this is mostly
correct.

reg2 <- lm(Mood ~ Exercise + State, data = data)
summary(reg1)

with(data, plot(Exercise, Mood))
abline(reg2)
plot(data\$State, resid(reg2), ylab="Residuals", xlab="State")
abline(0, 0)

R-squared improves significantly, but now the plotted line looks awfully goofy — we consistently undershoot, and the coefficient estimate for Exercise is near zero (and has a non-significant p-value). This is an example of the effect of heteroskedasticity — the groups (i.e. states) with larger variance override groups with smaller variance.

We’re getting somewhere. What if we did a separate linear regression for each state? That way we’re not going to have problems with group interactions skewing our coefficients, right?

library(ggplot2)
ggplot(data, aes(x = Exercise, y = Mood, color = State)) +
geom_point() +
geom_smooth(method='lm',formula=y~x)

Well, we have an opposite problem now — notice that in state C exercise is now decreasing mood. and the slope coefficient in other states is much lower than the 0.42951 that we saw in the Mood ~ Exercise regression. So now there is information in the high-level model that we’re neglecting because of our focus on the state-level models.

#### Mixed-Effect Models

The final example above leads right into a mixed-effect model. In this model, we can allow the state-level regressions to incorporate some of the information from the overall regression, but also retain some state-level components. We can use the lme4 library to do this.

The notation is similar to an lm regression with a key difference: the (1 + Exercise | State) notation allows the model to use a term with different slopes and intercepts for Mood ~ Exercise for each State value. See the coefficient values below.

library(lme4)
reg3 <- lmer(Mood ~ Exercise + (1 + Exercise | State), data = data, REML = FALSE)
summary(reg3)
coef(reg3)

We have run 3 models now:

1. Mood ~ Exercise
2. Mood ~ Exercise + State
3. Mood ~ Exercise + (1 + Exercise | State)

We can calculate the RMSE of each model.

reg1_predict <- predict(reg1, data[1:2])
reg2_predict <- predict(reg2, data[1:2])
reg3_predict <- predict(reg3, data[1:2])
sqrt(sum((data[3] - reg1_predict)**2))
sqrt(sum((data[3] - reg2_predict)**2))
sqrt(sum((data[3] - reg3_predict)**2))

RMSE improved significantly moving from models 2 to 3 — this suggests that the majority of the difference between states and mood is due to average mood in each state. Moving from model 2 to 3 captured this state-level intercept information, but also calculated a slope coefficient for Mood ~ Exercise for each state which incorporated information from the total dataset and from the state-level information (recall that using only state-level slope information produced a negative slope in State C).

A few final notes on Mixed-Effect Models. There are multiple approaches and ongoing research into how to determine p-values for mixed-effect models. One can use an anova likelihood test to determine if an added variable is significant with respect to a model without that added variable.

#### Conclusion

Mixed-Effect models provide a framework for smoothing global and group level characteristics in your data.

I learned about these models primarily from Richard McElreath and his wonderful text Statistical Rethinking
. I’d recommend it highly to any reader: it is a great help in rethinking many of the statistical assumptions that were made for me in entry-level classes that I never knew to reconsider.

OJ Watson also has a well-done Kaggle post
that presents a python-based framework for mixed-effect models.