R Markdown File to Augment Bliese’s (2016) Tutorial
Multilevel Modeling - Bliese’s Steps
The following R Markdown file is meant to augment Bliese’s (2016) multilevel package tutorial
Prior to starting the actual steps to multilevel modeling, let’s preload the packages
#install.packages("multilevel")
library(multilevel)
## Loading required package: nlme
## Loading required package: MASS
Now, let’s call the data that is preloaded in R (thanks to Bliese)
data(bh1996)
Although we’re not to Bliese’s Step 1 yet (p. 52), his step 1.5 is actually Aguinis et al.’s (2013) Step 1, so let’s start there
Null.gls<-gls(WBEING~1,data=bh1996,
control=list(opt="optim"),na.action = na.omit) #Adding na.action to handle missing data
Here, we’re checking to see if there’s any variation among individual’s response to well being (i.e., intercept only)
STEP ONE: Examine the ICC for the Outcome Variable
Null.Model<-lme(WBEING~1,random=~1|GRP,data=bh1996,
control=list(opt="optim"),na.action = na.omit) #Adding na.action to handle missing data
Well being is the outcome variable. The intercept is allowed to vary by group. Thus, we’re modeling the intercept plus random between- and within-group error terms. AKA the unconditional means model.
Option 1: Manually Calculate ICC(1)
VarCorr(Null.Model)
## GRP = pdLogChol(1)
## Variance StdDev
## (Intercept) 0.03580079 0.1892110
## Residual 0.78949727 0.8885366
ICC(1) = Intercept Variance / (Intercept Variance + Residual Variance)
0.03580079/(0.03580079+0.78949727)
## [1] 0.04337922
Option 2: Let R Calculate ICC(1) by Creating an aov model
tmod<-aov(WBEING~as.factor(GRP),data=bh1996) # Creates a model comparing well-being among groups
ICC1(tmod)
## [1] 0.04336905
Option 3: Let R Calculate ICC(1) and ICC(2) Using psychometric package
# ICCs of Mixed Effects Models
#install.packages("psychometric")
library(psychometric)
ICC1.lme(WBEING, GRP, bh1996) # Format is (DV, grouping variable, data-frame)
## [1] 0.0433792
ICC2.lme(WBEING, GRP, bh1996, weighted = FALSE)
## [1] 0.7335211
- As you can see, all ICC(1) calculations are very close. Roughly 4% of the variance in individual well being can be “explained” by the group.
- Remember, ICC(1) can be interpretted as the amount of variance “explained” by the group membership (i.e., the higher level entity)
- ICC(2) is how reliably you can say that the group means differ from each other. >.70 is a rule of thumb cut-off.
Compare Models to See if Including the Random Intercept (i.e., different group means) Improves Model Fit
anova(Null.gls, Null.Model)
## Model df AIC BIC logLik Test L.Ratio p-value
## Null.gls 1 2 19540.17 19553.98 -9768.084
## Null.Model 2 3 19353.34 19374.06 -9673.669 1 vs 2 188.8303 <.0001
The anova function simplifies the -2 logLiklihood comparison that uses Chi-square distribution, based on degrees of freedom, to test significance The results of this comparison suggest that including the random intercept significantly improves the model fit. Thus, we can move forward to the next step in the multilevel process.
STEP TWO: Explain Level 1 and 2 Intercept Variance
Now, we’re adding in level-1 (HRS) and level-2 (G.HRS) predictor variables. *Note: G.HRS is the group mean for individual-level hours worked.
Model.1<-lme(WBEING~HRS+G.HRS,random=~1|GRP,data=bh1996,
control=list(opt="optim"),na.action = na.omit)
summary(Model.1)
## Linear mixed-effects model fit by REML
## Data: bh1996
## AIC BIC logLik
## 19222.28 19256.81 -9606.14
##
## Random effects:
## Formula: ~1 | GRP
## (Intercept) Residual
## StdDev: 0.11639 0.8832353
##
## Fixed effects: WBEING ~ HRS + G.HRS
## Value Std.Error DF t-value p-value
## (Intercept) 4.740829 0.21368746 7282 22.185808 0
## HRS -0.046461 0.00488798 7282 -9.505056 0
## G.HRS -0.126926 0.01940357 97 -6.541368 0
## Correlation:
## (Intr) HRS
## HRS 0.000
## G.HRS -0.965 -0.252
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.35320562 -0.65024982 0.03760797 0.71319835 2.70917777
##
## Number of Observations: 7382
## Number of Groups: 99
Checking to see how much variance is explained at each level.
VarCorr(Model.1)
## GRP = pdLogChol(1)
## Variance StdDev
## (Intercept) 0.01354663 0.1163900
## Residual 0.78010466 0.8832353
- Variance Explained = 1 - (Var with Predictor/Var without Predictor)
WithinVarExp=1-(.780/.789)
WithinVarExp
## [1] 0.01140684
BetweenVarExp=1-(.014/.036)
BetweenVarExp
## [1] 0.6111111
“…the variance estimates should be treated with some degree of caution because they are partially dependent upon how one specifies the models” (Bliese, 2016, p. 57).
STEP THREE: Examine and Predict Slope Variance
Now, we’re trying to explain the third source of variation, slope variation
library(lattice)
xyplot(WBEING~LEAD|as.factor(GRP),data=bh1996[1:1582,], # We're looking at the first 1582 rows to see the first 25 groups
type=c("p","g","r"),col="dark blue",col.line="black", # "p" = points, "g" = grid, "r" = regression line
xlab="Leadership Consideration",
ylab="Well-Being")
- Based on the plot, it looks like the slopes vary among the groups, but we have to test this.
Before we allow the WBEING~LEAD slope to vary among groups, we should first build a model with a fixed slope & random intercepts.
- Note: I’m rearranging Bliese’s order because it makes sense to start with the less complicated models and work to more complicated models. Thus, Model.2a comes before Model.2 in my version, but the models match Bliese’s (2016).
Model.2a<-lme(WBEING~HRS+LEAD+G.HRS,random=~1|GRP, data=bh1996, # We're allowing the intercept to vary among groups
control=list(opt="optim"),na.action=na.omit)
summary(Model.2a)
## Linear mixed-effects model fit by REML
## Data: bh1996
## AIC BIC logLik
## 17862.68 17904.12 -8925.341
##
## Random effects:
## Formula: ~1 | GRP
## (Intercept) Residual
## StdDev: 0.1190809 0.8043905
##
## Fixed effects: WBEING ~ HRS + LEAD + G.HRS
## Value Std.Error DF t-value p-value
## (Intercept) 2.5592587 0.21580452 7281 11.85915 0e+00
## HRS -0.0282785 0.00447625 7281 -6.31745 0e+00
## LEAD 0.4956385 0.01277894 7281 38.78559 0e+00
## G.HRS -0.0790096 0.01888137 97 -4.18453 1e-04
## Correlation:
## (Intr) HRS LEAD
## HRS -0.027
## LEAD -0.260 0.105
## G.HRS -0.950 -0.228 0.065
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.94871064 -0.65589603 0.03691456 0.70114760 3.63347646
##
## Number of Observations: 7382
## Number of Groups: 99
Now, we can add another layer of complication by allowing the WBEING~LEAD slopes to vary among groups
Model.2<-lme(WBEING~HRS+LEAD+G.HRS,random=~LEAD|GRP, data=bh1996, # We're allowing the WBEING~LEAD slope to vary by group
control=list(opt="optim"),na.action=na.omit)
summary(Model.2)
## Linear mixed-effects model fit by REML
## Data: bh1996
## AIC BIC logLik
## 17838.58 17893.83 -8911.29
##
## Random effects:
## Formula: ~LEAD | GRP
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.3794891 (Intr)
## LEAD 0.1021935 -0.97
## Residual 0.8008079
##
## Fixed effects: WBEING ~ HRS + LEAD + G.HRS
## Value Std.Error DF t-value p-value
## (Intercept) 2.4631348 0.20832607 7281 11.823459 0e+00
## HRS -0.0284776 0.00446795 7281 -6.373764 0e+00
## LEAD 0.4946550 0.01680846 7281 29.428928 0e+00
## G.HRS -0.0705047 0.01789284 97 -3.940387 2e-04
## Correlation:
## (Intr) HRS LEAD
## HRS -0.022
## LEAD -0.338 0.074
## G.HRS -0.929 -0.245 0.064
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.87093730 -0.65503152 0.04205164 0.69693131 3.95467766
##
## Number of Observations: 7382
## Number of Groups: 99
Now, we compare the two models to see if allowing the slopes to vary among groups improves the model fit
anova(Model.2a,Model.2)
## Model df AIC BIC logLik Test L.Ratio p-value
## Model.2a 1 6 17862.68 17904.12 -8925.341
## Model.2 2 8 17838.58 17893.83 -8911.290 1 vs 2 28.10254 <.0001
*The results suggest allowing the WBEING~LEAD slopes to vary among groups provides a significantly better fit
Now, we add a cross-level interaction between level-1 LEAD and level-2 G.HRS
Final.Model<-lme(WBEING~HRS+LEAD+G.HRS+LEAD:G.HRS, random=~LEAD|GRP,data=bh1996,
control=list(opt="optim"),na.action=na.omit)
summary(Final.Model)
## Linear mixed-effects model fit by REML
## Data: bh1996
## AIC BIC logLik
## 17843.94 17906.09 -8912.968
##
## Random effects:
## Formula: ~LEAD | GRP
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.36229909 (Intr)
## LEAD 0.09770136 -0.965
## Residual 0.80087761
##
## Fixed effects: WBEING ~ HRS + LEAD + G.HRS + LEAD:G.HRS
## Value Std.Error DF t-value p-value
## (Intercept) 3.653755 0.7261065 7280 5.031983 0.0000
## HRS -0.028559 0.0044683 7280 -6.391465 0.0000
## LEAD 0.125570 0.2173923 7280 0.577621 0.5635
## G.HRS -0.174924 0.0635883 97 -2.750884 0.0071
## LEAD:G.HRS 0.032460 0.0190634 7280 1.702740 0.0887
## Correlation:
## (Intr) HRS LEAD G.HRS
## HRS -0.018
## LEAD -0.962 0.017
## G.HRS -0.994 -0.058 0.958
## LEAD:G.HRS 0.958 -0.012 -0.997 -0.959
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.83632991 -0.66059970 0.04112998 0.69533044 3.94797744
##
## Number of Observations: 7382
## Number of Groups: 99
Now, we compare the random slope models with and without the cross-level interaction
anova(Model.2,Final.Model)
## Warning in anova.lme(Model.2, Final.Model): fitted objects with different fixed
## effects. REML comparisons are not meaningful.
## Model df AIC BIC logLik Test L.Ratio p-value
## Model.2 1 8 17838.58 17893.83 -8911.290
## Final.Model 2 9 17843.94 17906.09 -8912.968 1 vs 2 3.356567 0.0669
NOTE: Even though this final model is not significantly better at the .05 level and Final.Model’s interaction is not significant at the .05 level, we proceed for illustrative purposes as if these were both significant.
Calculate Psuedo-R-squared for lme Models
#install.packages("MuMIn")
library(MuMIn)
Model.1rs <- r.squaredGLMM(Model.1) #pseudo R-squared
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
Model.1rs
## R2m R2c
## [1,] 0.03855485 0.05496551
Model.2rs <- r.squaredGLMM(Model.2) #pseudo R-squared
Model.2rs
## R2m R2c
## [1,] 0.2030021 0.2271794
Final.Modelrs <- r.squaredGLMM(Final.Model) #pseudo R-squared
Final.Modelrs
## R2m R2c
## [1,] 0.2053143 0.2283785
PLOTTING CROSS-LEVEL INTERACTIONS
STEP 1: Calculate High and Low Values of Both Variables, Based on 1sd Above and Below the Mean
mean(bh1996$G.HRS)
## [1] 11.2987
sd(bh1996$G.HRS)
## [1] 0.8608297
# Low = 11.30 - .86 = 10.44
# High = 11.30 + .86 = 12.16
mean(bh1996$LEAD)
## [1] 2.890665
sd(bh1996$LEAD)
## [1] 0.7719381
# Low = 2.89 - .77 = 2.12
# High = 2.89 + .77 = 3.66
STEP 2: Create a Temporary Dataset (TDAT) with High and Low Values for the Interaction Variables and Mean Values for Non-Interactive Variables
TDAT<-data.frame(HRS=c(11.2987,11.2987,11.2987,11.2987),
LEAD=c(2.12,2.12,3.66,3.66),
G.HRS=c(10.44,12.16,10.44,12.16),
GRP=c(1,1,1,1))
predict(Final.Model,TDAT,level=0)
## [1] 2.489508 2.307001 3.204766 3.108239
## attr(,"label")
## [1] "Predicted values"
STEP 3: Plot the Interaction Using interaction.plot
TDAT$WBEING<-predict(Final.Model,TDAT,level=0)
with(TDAT,interaction.plot(LEAD,G.HRS,WBEING))
REFERENCES
Aguinis, H., Gottfredson, R. K., & Culpepper, S. A. (2013). Best-practice recommendations for estimating cross-level interaction effects using multilevel modeling. Journal of Management (Vol. 39). https://doi.org/10.1177/0149206313478188
Bliese, P. D. (2016). Multilevel modeling in R (2.6): A brief introduction to R, the multilevel package and the nlme package. https://doi.org/10.1177/0269216316671280