
SEM: Moderation and moderated mediation
knitr::opts_chunk$set(echo = TRUE)

Mediation with latent variables

What are mediating/indirect effects?


  • A given variable may be said to function as a mediator to the extent that it accounts for the relation between the predictor and the criterion. Mediators explain how external physical events take on internal psychological significance (Baron & Kenny, 1986, p. 1176).

    • Mediation analysis answers “how” or “why” the observed effect occurs.

    • “The underlying mechanism”

  • The “Mediating Effects”

    • Also known as the indirect effect in sociology, economics, and Structural Equation Modeling

      • Direct effect: The direct influence of the predictor on the dependent variable after controlling for the mediator

      • Indirect effect: The influence of the predictor on the dependent variable via the mediator

      • Total effect: The sum of direct and indirect effects

  • Types of mediation

    • Complete mediation

    • Partial mediation

    • Notes. Current research reports rarely stress the difference. Why?

    • There can be more than one mediator

  • The equations?

    • \(M=\beta_{01}+\beta_{11}X\)

    • \(Y=\beta_{02}+\beta_{12}X+\beta_{22}M\)

  • Is it enough to make the following claim if it refers to “mediation”?

If our model fits the data well, we are safe to conclude that the hypothesized causal pattern among A, B, and C is well supported by the data, and that it can be used to explain the observed correlations among these three variables.

Then, how?

The “traditional” approach: Baron and Kenny’s 4-step approach

  1. The predictor significantly correlates with the dependent variable
  2. The predictor is related to the mediator
  3. The mediator predicts the dependent variable
  4. The strength of the relation between the predictor and the dependent variable is significantly reduced when the mediator is added to the model
  • How to test these steps in regression?
set.seed(233)
x <- rnorm(300,4,1)                       # predictor
med <- 0.5*x + rnorm (300,0,0.5)          # mediator: a = 0.5
y <- 0.1*x + 0.6*med + rnorm(300,0,0.5)   # outcome: c' = 0.1, b = 0.6
df2 <- data.frame(x,med,y)
  1. The predictor significantly correlates with the dependent variable: byx
model.step1 <- lm(y~x,df2)
summary(model.step1)
## 
## Call:
## lm(formula = y ~ x, data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.66491 -0.41371  0.03236  0.39688  2.14289 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.02799    0.15699   0.178    0.859    
## x            0.39168    0.03793  10.328   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6179 on 298 degrees of freedom
## Multiple R-squared:  0.2636, Adjusted R-squared:  0.2611 
## F-statistic: 106.7 on 1 and 298 DF,  p-value: < 2.2e-16
  2. The predictor is related to the mediator: bmx
model.step2 <- lm(med~x,df2)
summary(model.step2)
## 
## Call:
## lm(formula = med ~ x, data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.28852 -0.33578 -0.00593  0.37153  1.29721 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.35855    0.12734  -2.816  0.00519 ** 
## x            0.59257    0.03076  19.263  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5012 on 298 degrees of freedom
## Multiple R-squared:  0.5546, Adjusted R-squared:  0.5531 
## F-statistic: 371.1 on 1 and 298 DF,  p-value: < 2.2e-16
  3. The mediator predicts the dependent variable: bym
model.step3 <- lm(y~med,df2)
summary(model.step3)
## 
## Call:
## lm(formula = y ~ med, data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.72668 -0.35811 -0.00736  0.33982  1.44663 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.29468    0.08876    3.32  0.00101 ** 
## med          0.64636    0.04102   15.76  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5318 on 298 degrees of freedom
## Multiple R-squared:  0.4545, Adjusted R-squared:  0.4526 
## F-statistic: 248.2 on 1 and 298 DF,  p-value: < 2.2e-16
  4. The strength of the relation between the predictor and the dependent variable is significantly reduced when the mediator is added to the model: byx.m
model.step4 <- lm(y~x+med,df2)
summary(model.step4)
## 
## Call:
## lm(formula = y ~ x + med, data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71829 -0.35459 -0.00681  0.34935  1.42725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.25321    0.13710   1.847   0.0658 .  
## x            0.01946    0.04898   0.397   0.6915    
## med          0.62814    0.06156  10.205   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5326 on 297 degrees of freedom
## Multiple R-squared:  0.4548, Adjusted R-squared:  0.4511 
## F-statistic: 123.9 on 2 and 297 DF,  p-value: < 2.2e-16
  • The a, b, c, and c’ in the mediation model

    • a: bmx

    • b: bym.x

    • c: byx

    • c’: byx.m

  • The criterion of complete mediation: c’ = 0

  • Comments on the 4-step approach

    • Advantage: Easy to understand

    • Disadvantage: Low power

      • How many hypotheses have been tested in these 4 steps?
    • Test of significance of c’

      • Not significant: Complete mediation

      • Significant: Then what?

Sobel test: The significance of a*b

  • Among the a, b, c, c’, what is the effect of mediation?

  • Which one is the total effect (the overall effect of x on y)? Which one is the direct effect (the distinctive effect of x on y after controlling for med)?

  • Mediation effect or, more precisely, indirect effect = total effect - direct effect = c - c’

  • Prove c-c’=a*b

    • \(z_y=\beta_{yx}z_x;\ z_m=\beta_{mx}z_x;\ z_y=\beta_{yx.m}z_x+\beta_{ym.x}z_m\)

    • \(\beta_{yx}=r_{xy}=\frac{\sum z_xz_y}{n-1}=\frac{\sum z_x(\beta_{yx.m}z_x+\beta_{ym.x}z_m)}{n-1}=\frac{\beta_{yx.m}\sum z_x^2+\beta_{ym.x}\sum z_x(\beta_{mx}z_x)}{n-1}=\beta_{yx.m}+\beta_{ym.x}\beta_{mx}\)

    • c=c’+a*b

  • To test whether the indirect effect is significant, we can test whether a*b is significantly different from 0.

    • Sobel test: \(z=\frac{ab}{\sqrt{b^2s^2_a+a^2s^2_b}}\)

    • Aroian test: \(z=\frac{ab}{\sqrt{b^2s^2_a+a^2s^2_b+s^2_as^2_b}}\)

    • Goodman test: \(z=\frac{ab}{\sqrt{b^2s^2_a+a^2s^2_b-s^2_as^2_b}}\)

    • Baron & Kenny recommended the Aroian formula
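The identity c = c’ + a*b and the three z formulas can be verified numerically in base R. The chunk below is a minimal sketch that regenerates the simulated df2 data from the 4-step example above:

```r
## Regenerate the simulated data from the 4-step example
set.seed(233)
x <- rnorm(300, 4, 1)
med <- 0.5*x + rnorm(300, 0, 0.5)
y <- 0.1*x + 0.6*med + rnorm(300, 0, 0.5)
df2 <- data.frame(x, med, y)

## a, b, c, c' and their standard errors
a_tab <- summary(lm(med ~ x, df2))$coefficients
b_tab <- summary(lm(y ~ x + med, df2))$coefficients
a <- a_tab["x", "Estimate"];   sa <- a_tab["x", "Std. Error"]
b <- b_tab["med", "Estimate"]; sb <- b_tab["med", "Std. Error"]
c_total <- coef(lm(y ~ x, df2))["x"]
c_prime <- b_tab["x", "Estimate"]

## The OLS identity c = c' + a*b holds exactly in this simple mediation model
all.equal(unname(c_total), unname(c_prime + a*b))

## The three z tests differ only in the denominator
z <- c(Sobel   = a*b / sqrt(b^2*sa^2 + a^2*sb^2),
       Aroian  = a*b / sqrt(b^2*sa^2 + a^2*sb^2 + sa^2*sb^2),
       Goodman = a*b / sqrt(b^2*sa^2 + a^2*sb^2 - sa^2*sb^2))
round(z, 3)
```

Since the Aroian denominator is the largest and the Goodman denominator the smallest, the three z values are always ordered Aroian < Sobel < Goodman.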

library(bda)
## Loading required package: boot
## bda - 18.2.2
mediation.test(df2$med,df2$x,df2$y)
##                Sobel       Aroian      Goodman
## z.value 9.017374e+00 9.007901e+00 9.026877e+00
## p.value 1.926512e-19 2.100374e-19 1.766401e-19
  • What is the problem of the Sobel test (family)?

  • How did we generate the z score?

  • What is the underlying assumption of z-test?

Bootstrap: The mainstream test for indirect effect

  • Bootstrap is suggested as a better approach: A nonparametric approach

    • Bollen, K. A., & Stine, R. (1990). Direct and Indirect Effects: Classical and Bootstrap Estimates of Variability. Sociological Methodology, 20, 115. https://doi.org/10.2307/271084
  • To determine whether a regression coefficient is different from 0, what would we do?

    • t-test; sampling distribution; p-value

  • The bootstrapping approach: By resampling with replacement

  • How does the bootstrapping procedure determine the significance of a*b?

    • How to do it?

    • The famous PROCESS

      • Hayes, A. F., & Preacher, K. J. (2014). Statistical mediation analysis with a multicategorical independent variable. British Journal of Mathematical and Statistical Psychology, 67(3), 451–470. https://doi.org/10.1111/bmsp.12028

        Preacher, K. J., & Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4), 717–731.
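Before turning to packaged implementations, the resampling idea can be sketched in base R. This is a minimal illustration, regenerating the simulated df2 data from above and drawing only 1000 resamples:

```r
## Regenerate the simulated data from the 4-step example
set.seed(233)
x <- rnorm(300, 4, 1)
med <- 0.5*x + rnorm(300, 0, 0.5)
y <- 0.1*x + 0.6*med + rnorm(300, 0, 0.5)
df2 <- data.frame(x, med, y)

## Resample rows with replacement and recompute a*b each time
boot_ab <- replicate(1000, {
  i <- sample(nrow(df2), replace = TRUE)
  a <- coef(lm(med ~ x, data = df2[i, ]))["x"]
  b <- coef(lm(y ~ x + med, data = df2[i, ]))["med"]
  a * b
})

## Percentile CI: a*b is "significant" if the interval excludes 0
quantile(boot_ab, c(.025, .975))
```

The bootstrap distribution of a*b is typically skewed, which is exactly why the percentile interval is preferred over the normal-theory z tests above.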

library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:boot':
## 
##     logit
my.model3 <- mediate(y~x+(med),data=df2,n.iter=100)

summary(my.model3)
## Call: mediate(y = y ~ x + (med), data = df2, n.iter = 100)
## 
## Direct effect estimates (traditional regression)    (c') X + M on Y 
##              y   se     t  df     Prob
## Intercept 0.25 0.14  1.85 297 6.58e-02
## x         0.02 0.05  0.40 297 6.91e-01
## med       0.63 0.06 10.20 297 3.72e-21
## 
## R = 0.67 R2 = 0.45   F = 123.85 on 2 and 297 DF   p-value:  7.66e-40 
## 
##  Total effect estimates (c) (X on Y) 
##              y   se     t  df     Prob
## Intercept 0.03 0.16  0.18 298 8.59e-01
## x         0.39 0.04 10.33 298 1.42e-21
## 
##  'a'  effect estimates (X on M) 
##             med   se     t  df     Prob
## Intercept -0.36 0.13 -2.82 298 5.19e-03
## x          0.59 0.03 19.26 298 2.85e-54
## 
##  'b'  effect estimates (M on Y controlling for X) 
##        y   se    t  df     Prob
## med 0.63 0.06 10.2 297 3.72e-21
## 
##  'ab'  effect estimates (through all  mediators)
##      y boot   sd lower upper
## x 0.37 0.37 0.04  0.29  0.45
  • What if you don’t have the original data? Or, the above methods are not applicable?

  • Monte Carlo method: bootstrapping the parameters in mediation

summary(model.step2)
## 
## Call:
## lm(formula = med ~ x, data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.28852 -0.33578 -0.00593  0.37153  1.29721 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.35855    0.12734  -2.816  0.00519 ** 
## x            0.59257    0.03076  19.263  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5012 on 298 degrees of freedom
## Multiple R-squared:  0.5546, Adjusted R-squared:  0.5531 
## F-statistic: 371.1 on 1 and 298 DF,  p-value: < 2.2e-16
summary(model.step4)
## 
## Call:
## lm(formula = y ~ x + med, data = df2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71829 -0.35459 -0.00681  0.34935  1.42725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.25321    0.13710   1.847   0.0658 .  
## x            0.01946    0.04898   0.397   0.6915    
## med          0.62814    0.06156  10.205   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5326 on 297 degrees of freedom
## Multiple R-squared:  0.4548, Adjusted R-squared:  0.4511 
## F-statistic: 123.9 on 2 and 297 DF,  p-value: < 2.2e-16
RMediation::medci(model.step2$coefficients["x"],
                  model.step4$coefficients["med"],
                  se.x=summary(model.step2)$coefficients["x","Std. Error"],
                  se.y=summary(model.step4)$coefficients["med","Std. Error"],
                  type="MC")
## $`95% CI`
##      2.5%     97.5% 
## 0.2936716 0.4552401 
## 
## $Estimate
## [1] 0.3722536
## 
## $SE
## [1] 0.04138573
## 
## $`MC Error`
## [1] 4.138573e-07
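What medci(type="MC") does can be sketched in base R using only the point estimates and standard errors printed above; no raw data are needed:

```r
## Draw a and b from the normal sampling distributions implied by the
## regression output above (a and SE(a) from model.step2; b and SE(b)
## from model.step4), then summarize the distribution of a*b
set.seed(1)
a_draw <- rnorm(20000, mean = 0.59257, sd = 0.03076)
b_draw <- rnorm(20000, mean = 0.62814, sd = 0.06156)
ab <- a_draw * b_draw

mean(ab)                      # close to the estimate 0.372
quantile(ab, c(.025, .975))   # close to RMediation's 95% MC interval
```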
  • What if… your theory predicts more than one mediator?

  • Or, you have more than one DV?

  • Generally speaking, for these models, fitting them as observed variables in Structural Equation Modeling with the lavaan package is easier.

  • Sometimes, you may have the need to compare the indirect effects from 2 different independent samples.

    • e.g., whether a mediation is larger in China than in the US.

    • \(I_{diff}=I_1-I_2\)

    • \(SE_{I_{diff}}=\sqrt{SE^2_{I_1}+SE^2_{I_2}}\)

    • \(z=\frac{I_{diff}}{SE_{I_{diff}}}\)

    • The logic is the same as the change-score approach in a mixed design. Therefore, it could be regarded as a mediation model with a moderator, or as a multi-group analysis in SEM.
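A sketch of this two-sample comparison in base R; the indirect effects and standard errors below are hypothetical values for illustration only:

```r
## Hypothetical indirect effects and SEs from two independent samples
I1 <- 0.37; se1 <- 0.04   # e.g., sample 1 (hypothetical values)
I2 <- 0.25; se2 <- 0.05   # e.g., sample 2 (hypothetical values)

I_diff  <- I1 - I2                 # difference in indirect effects
se_diff <- sqrt(se1^2 + se2^2)     # SE of the difference
z <- I_diff / se_diff
p <- 2 * pnorm(-abs(z))            # two-tailed p-value
c(z = z, p = p)
```

In practice the SEs would come from bootstrap or Monte Carlo estimates of each sample's indirect effect, as in the sections above.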

How to conduct equivalent GLM mediation using lavaan package?

Two intermediate mediators (serial mediation)

library(lavaan)
## This is lavaan 0.6-17
## lavaan is FREE software! Please report any bugs.
## 
## Attaching package: 'lavaan'
## The following object is masked from 'package:psych':
## 
##     cor2cov
library(semPlot)
## Warning in check_dep_version(): ABI version mismatch: 
## lme4 was built with Matrix ABI version 0
## Current Matrix ABI version is 1
## Please re-install lme4 from source or restore original 'Matrix' package
## Set seed for replicability of results
set.seed(3091003)
my.df <- read.table("SEM03.dat", header=TRUE)

my.med1 <- 'y ~ x + m1 + c*m2
m2 ~ x + b*m1
m1 ~ a*x
## Define indirect effect
indirct := a*b*c'

## Bootstrap is usually run with 5000 resamples; here 200 is used to speed up the computation
my.fit1 <- sem(my.med1, data=my.df, se="bootstrap", bootstrap=200)
summary(my.fit1)
## lavaan 0.6.17 ended normally after 1 iteration
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
## 
##   Number of observations                           100
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws              200
##   Number of successful bootstrap draws             200
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   y ~                                                 
##     x                 0.074    0.134    0.548    0.584
##     m1               -0.115    0.129   -0.893    0.372
##     m2         (c)    0.361    0.122    2.960    0.003
##   m2 ~                                                
##     x                -0.076    0.109   -0.694    0.487
##     m1         (b)    0.693    0.079    8.775    0.000
##   m1 ~                                                
##     x          (a)    0.703    0.098    7.167    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y                11.706    1.971    5.938    0.000
##    .m2                9.029    1.235    7.310    0.000
##    .m1               11.030    1.311    8.416    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     indirct           0.176    0.064    2.761    0.006

semPaths(my.fit1, whatLabels="est", color="green")

Two specific mediators (parallel mediation)

## Set seed for replicability of results
set.seed(3091003)
my.df <- read.table("SEM04.dat", header=TRUE)

my.med2 <- 'y ~ x + b*m1 + d*m2
m2 ~ c*x
m1 ~ a*x
## Free the residual covariance between m1 and m2
m1 ~~ m2
## Define total indirect effect
ind_tot := a*b+c*d
## Define difference on indirect effects
ind_dif := a*b-c*d'

my.fit2 <- sem(my.med2, data=my.df, se="bootstrap", bootstrap=200)
summary(my.fit2)
## lavaan 0.6.17 ended normally after 10 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
## 
##   Number of observations                           100
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws              200
##   Number of successful bootstrap draws             200
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   y ~                                                 
##     x                 0.143    0.160    0.896    0.370
##     m1         (b)    0.316    0.105    2.992    0.003
##     m2         (d)    0.361    0.122    2.960    0.003
##   m2 ~                                                
##     x          (c)    0.560    0.096    5.842    0.000
##   m1 ~                                                
##     x          (a)    0.703    0.098    7.167    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .m2 ~~                                               
##    .m1                2.133    0.866    2.462    0.014
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .y                11.706    1.971    5.938    0.000
##    .m2                9.441    1.291    7.315    0.000
##    .m1               11.030    1.311    8.416    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ind_tot           0.424    0.094    4.504    0.000
##     ind_dif           0.020    0.115    0.173    0.862
## Obtain the bootstrap CI
parameterEstimates(my.fit2)
##        lhs op     rhs   label    est    se     z pvalue ci.lower ci.upper
## 1        y  ~       x          0.143 0.160 0.896  0.370   -0.170    0.485
## 2        y  ~      m1       b  0.316 0.105 2.992  0.003    0.132    0.572
## 3        y  ~      m2       d  0.361 0.122 2.960  0.003    0.082    0.548
## 4       m2  ~       x       c  0.560 0.096 5.842  0.000    0.356    0.711
## 5       m1  ~       x       a  0.703 0.098 7.167  0.000    0.484    0.892
## 6       m2 ~~      m1          2.133 0.866 2.462  0.014    0.599    4.022
## 7        y ~~       y         11.706 1.971 5.938  0.000    7.646   15.654
## 8       m2 ~~      m2          9.441 1.291 7.315  0.000    6.286   12.071
## 9       m1 ~~      m1         11.030 1.311 8.416  0.000    8.412   13.604
## 10       x ~~       x         10.790 0.000    NA     NA   10.790   10.790
## 11 ind_tot := a*b+c*d ind_tot  0.424 0.094 4.504  0.000    0.239    0.629
## 12 ind_dif := a*b-c*d ind_dif  0.020 0.115 0.173  0.862   -0.170    0.268
semPaths(my.fit2, whatLabels="est", color="green")

What are latent variables?

  • Latent variables:

    • Abstract and hypothetical constructs.

    • They cannot be measured directly.

    • Most constructs in psychology and social sciences are unobservable.

      • For example, motivation, stress, depression, intelligence and satisfaction.
    • What is the common strategy to measure psychological constructs in psychology?

  • Observed or measured variables:

    • Indicators of the latent constructs.

      • For example, items to measure depression, test scores of intelligence.
    • How many items do we need to measure depression?

CFA (Confirmatory factor analysis) Model specification

  • What is the proposed model?

  • \(x = \mu_x + \Lambda f + e\)

    • For example, \(x_1 = \mu_{x_1} + \lambda_{11}f_1 + e_1\)

    • x1: observed continuous variable

    • µx1: intercept of the observed variable. It is usually set at the sample mean.

    • f1: latent factor. Its mean is usually fixed at 0. The mean can be non-zero in multiple-group analysis and latent growth model.

    • λ11: factor loading. It is similar to regression coefficient.

    • e1: measurement error (and unique factor).
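The measurement equation can be made concrete by simulating indicators from it. In the base-R sketch below, the loadings, intercepts, and error SD are arbitrary illustrative values:

```r
## Simulate x_j = mu_j + lambda_j * f + e_j for three indicators
set.seed(42)
n <- 1000
f      <- rnorm(n, 0, 1)       # latent factor, mean fixed at 0, variance 1
lambda <- c(1.0, 0.8, 0.6)     # factor loadings (illustrative values)
mu     <- c(3.0, 2.5, 2.0)     # item intercepts (illustrative values)
X <- sapply(1:3, function(j) mu[j] + lambda[j]*f + rnorm(n, 0, 0.5))

round(colMeans(X), 1)  # sample means recover the intercepts mu
round(cov(X), 2)       # approx. lambda %*% t(lambda) + 0.25 * diag(3)
```

The model-implied covariance structure (loadings times loadings, plus error variance on the diagonal) is exactly what CFA tries to reproduce from data.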

Mediation with latent variables

  • The X, M, and Y are all latent variables
dat <- read.csv("exampleMod.csv")

my.med3 <- '
tvalue =~ t1 + t2 + t3 + t4
sources =~ s1 + s2 + s3 + s4 + s5
tip =~ tip1 + tip2 + tip3


tip ~ tvalue + b*sources 
sources ~ a*tvalue
indirect:= a*b
'
my.fit3 <- sem(my.med3, data=dat, se="bootstrap", bootstrap=200)
summary(my.fit3)
## lavaan 0.6.17 ended normally after 30 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        27
## 
##                                                   Used       Total
##   Number of observations                          7037        7314
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1261.996
##   Degrees of freedom                                51
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws              200
##   Number of successful bootstrap draws             200
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   tvalue =~                                           
##     t1                1.000                           
##     t2                0.988    0.012   83.152    0.000
##     t3                0.809    0.014   57.256    0.000
##     t4                0.653    0.016   40.678    0.000
##   sources =~                                          
##     s1                1.000                           
##     s2                1.164    0.027   43.456    0.000
##     s3                0.860    0.022   38.981    0.000
##     s4                0.967    0.021   45.905    0.000
##     s5                1.280    0.028   45.502    0.000
##   tip =~                                              
##     tip1              1.000                           
##     tip2              0.297    0.027   10.900    0.000
##     tip3              0.980    0.037   26.329    0.000
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   tip ~                                               
##     tvalue           -0.080    0.015   -5.159    0.000
##     sources    (b)    0.559    0.019   29.897    0.000
##   sources ~                                           
##     tvalue     (a)   -0.167    0.014  -11.883    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .t1                0.290    0.013   23.157    0.000
##    .t2                0.193    0.010   19.800    0.000
##    .t3                0.339    0.012   27.418    0.000
##    .t4                0.879    0.015   59.438    0.000
##    .s1                0.642    0.017   38.928    0.000
##    .s2                0.724    0.018   40.463    0.000
##    .s3                1.000    0.017   57.218    0.000
##    .s4                0.643    0.017   38.241    0.000
##    .s5                0.546    0.019   28.824    0.000
##    .tip1              0.736    0.023   31.627    0.000
##    .tip2              0.985    0.017   56.270    0.000
##    .tip3              0.777    0.025   31.692    0.000
##     tvalue            0.750    0.018   42.485    0.000
##    .sources           0.552    0.020   26.930    0.000
##    .tip               0.342    0.021   16.587    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     indirect         -0.093    0.008  -11.182    0.000
## Obtain the bootstrap CI
parameterEstimates(my.fit3)
##         lhs op     rhs    label    est    se       z pvalue ci.lower ci.upper
## 1    tvalue =~      t1           1.000 0.000      NA     NA    1.000    1.000
## 2    tvalue =~      t2           0.988 0.012  83.152      0    0.962    1.014
## 3    tvalue =~      t3           0.809 0.014  57.256      0    0.782    0.837
## 4    tvalue =~      t4           0.653 0.016  40.678      0    0.623    0.688
## 5   sources =~      s1           1.000 0.000      NA     NA    1.000    1.000
## 6   sources =~      s2           1.164 0.027  43.456      0    1.110    1.217
## 7   sources =~      s3           0.860 0.022  38.981      0    0.817    0.906
## 8   sources =~      s4           0.967 0.021  45.905      0    0.926    1.006
## 9   sources =~      s5           1.280 0.028  45.502      0    1.228    1.337
## 10      tip =~    tip1           1.000 0.000      NA     NA    1.000    1.000
## 11      tip =~    tip2           0.297 0.027  10.900      0    0.242    0.348
## 12      tip =~    tip3           0.980 0.037  26.329      0    0.901    1.059
## 13      tip  ~  tvalue          -0.080 0.015  -5.159      0   -0.110   -0.046
## 14      tip  ~ sources        b  0.559 0.019  29.897      0    0.522    0.596
## 15  sources  ~  tvalue        a -0.167 0.014 -11.883      0   -0.195   -0.142
## 16       t1 ~~      t1           0.290 0.013  23.157      0    0.264    0.314
## 17       t2 ~~      t2           0.193 0.010  19.800      0    0.171    0.212
## 18       t3 ~~      t3           0.339 0.012  27.418      0    0.315    0.367
## 19       t4 ~~      t4           0.879 0.015  59.438      0    0.849    0.909
## 20       s1 ~~      s1           0.642 0.017  38.928      0    0.608    0.672
## 21       s2 ~~      s2           0.724 0.018  40.463      0    0.690    0.758
## 22       s3 ~~      s3           1.000 0.017  57.218      0    0.965    1.032
## 23       s4 ~~      s4           0.643 0.017  38.241      0    0.608    0.672
## 24       s5 ~~      s5           0.546 0.019  28.824      0    0.510    0.584
## 25     tip1 ~~    tip1           0.736 0.023  31.627      0    0.688    0.779
## 26     tip2 ~~    tip2           0.985 0.017  56.270      0    0.948    1.021
## 27     tip3 ~~    tip3           0.777 0.025  31.692      0    0.728    0.827
## 28   tvalue ~~  tvalue           0.750 0.018  42.485      0    0.712    0.786
## 29  sources ~~ sources           0.552 0.020  26.930      0    0.510    0.597
## 30      tip ~~     tip           0.342 0.021  16.587      0    0.301    0.381
## 31 indirect :=     a*b indirect -0.093 0.008 -11.182      0   -0.109   -0.079
semPaths(my.fit3, whatLabels="est", color="green")

Last things about mediation

  • Let’s look back at the 4-step approach: are all 4 steps necessary?

    • e.g., the first step c!=0?

      • There is now, however, an emerging consensus (see Kenny, Kashy, & Bolger, 1998; MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002; Shrout & Bolger, 2002) that this first condition is unnecessary.

        (Judd & Kenny, 2010)

    • What is needed to determine a significant indirect effect?

      • ab=c-c’!=0

      • The conclusion that only a single condition must hold to demonstrate mediation, i.e., ab = c - c’ != 0, has led to the unfortunate conclusion among many social psychologists that there is a statistical “test” of mediation and all that one needs to do to argue for mediation is to have that “test” be statistically significant. This is far from true. The “test” simply amounts to a test of whether a partial slope is different from an unpartialed or zero-order slope.

        …..

        What the statistics do is estimate and test the indirect path if the model is true.

        (Judd & Kenny, 2010)

  • Statistics analyses should test your theory or model.

  • And this leads to another misunderstanding that mediation can test for causality.

  • Experimental psychology: 3 criteria for causality

    • Association

    • Temporal sequence

    • No explanation by third variable

  • The causal inference should be based on the research design, not on statistical analyses

  • Things you may consider: manipulating the mediator(s)

  • Or, cross-lagged panel model.

The main course: Moderation with latent variables

Schoemann, A. M., & Jorgensen, T. D. (2021). Testing and Interpreting Latent Variable Interactions Using the semTools Package. Psych, 3(3), 322–335. https://doi.org/10.3390/psych3030024

What we already know (or you don’t know :)

Moderating effect between two latent variables

  • There are several approaches to model latent interactions.

  • Cross-products on the observed variables, e.g., Kenny and Judd (1984) and Marsh, Wen, and Hau (2004).

    • Whether to model all possible pairs of indicators;

    • Whether to impose constraints on the factor loadings and error variances.

  • Maximum likelihood estimation of latent interaction (e.g., Klein & Moosbrugger, 2000, LMS)

    • No product term on the indicators;

    • ML estimates are obtained by maximizing the log-likelihood of the joint distribution of indicators for the focal predictor and outcome, conditional on indicators of the moderator—essentially, a mixture distribution across the moderator indicators

    • Only available in Mplus and the nlsem package in R.

  • Quasi-ML estimator (QML, Klein and Muthén 2007)

    • Less computationally intensive than LMS

    • More robust to distributional assumptions

  • Markov chain Monte Carlo (MCMC) estimation

    • Incorporate latent product terms directly into the model

    • In a Bayesian framework, factor scores can be sampled from their posterior distribution at each iteration of the Markov chain, along with all other unknown parameters

    • Only available in Mplus


LMS and QML in R: nlsem

Let’s try lavaan first

my.med4 <- '
  tvalue =~ t1 + t2 + t3 + t4
  sources =~ s1 + s2 + s3 + s4 + s5
  tip =~ tip1 + tip2 + tip3
  tip ~ tvalue + sources + tvalue:sources
'
fit4 <- sem(my.med4, data=dat, se="bootstrap", bootstrap=200)
## Error in lavaan::lavaan(model = my.med4, data = dat, se = "bootstrap", : lavaan ERROR:
##     Interaction terms involving latent variables (tvalue:sources) are
##     not supported. You may consider creating product indicators to
##     define the latent interaction term. See the indProd() function in
##     the semTools package.
summary(fit4, fit.measures = TRUE)
## Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'summary': object 'fit4' not found
semPaths(fit4)
## Error in eval(expr, envir, enclos): object 'fit4' not found
parameterEstimates(fit4)
## Error in eval(expr, envir, enclos): object 'fit4' not found

LMS or QML in nlsem

library(nlsem)
model5 <- '
  tvalue =~ t1 + t2 + t3 + t4
  sources =~ s1 + s2 + s3 + s4 + s5
  tip =~ tip1 + tip2 + tip3
  tip ~ tvalue + sources + tvalue:sources
'
nl_model5<- lav2nlsem(model5, constraints="direct1",class.spec="class")
start <- runif(count_free_parameters(nl_model5))
## qml=TRUE uses the QML estimator; set qml=FALSE for LMS
## Notice the NA values: em() needs complete cases, hence na.omit()
em5 <- em(nl_model5, na.omit(dat[,2:13]), start, qml=TRUE)
summary(em5)

Product–indicator approaches

  • Product indicators are assumed to be continuous, since they are formed by multiplying indicator values

    • parceling strategy
  • The number of observed variables

Unconstrained approach

  • It is similar to creating cross-products. However, it skips some constraints.

  • It is called the unconstrained approach (Marsh, Wen, & Hau, 2004).

  • Model:

    • y1 to y3: \(y = \lambda f_y + e_y\)

    • x1 to x6 are centered scores.

    • x1 to x3: \(x = \lambda f_1 + e\)

    • x4 to x6: \(x = \lambda f_2 + e\)

    • x7 to x9: \(x = \lambda f_1f_2 + e\), where x7 = x1*x4, x8 = x2*x5, and x9 = x3*x6.

Mean-centering first-order indicators (Marsh et al., 2004); matched-pairs strategy
## Center the items on the predictors
## Be careful! It replaces the original data.
my.df <- read.csv("SEM01.csv", header=TRUE)
my.df[, 4:9] <- as.data.frame(scale(my.df[, 4:9], scale=FALSE))
## Create product terms
my.df$x7 <- with(my.df, x1*x4)
my.df$x8 <- with(my.df, x2*x5)
my.df$x9 <- with(my.df, x3*x6)

my.model1 <- '## Measurement model on fy
            fy =~ y1 + y2 + y3
            ## Measurement model on f1 and f2
            f1 =~ x1 + x2 + x3
            f2 =~ x4 + x5 + x6
            ## Measurement model for the latent interaction
            f1f2 =~ x7 + x8 + x9

            ## Covariance between f1 and f2
            f1 ~~ cf1f2*f2
            ## Covariances between f1 and f1f2, and f2 and f1f2 are fixed at 0
            f1 ~~ 0*f1f2
            f2 ~~ 0*f1f2

            ## Intercepts on the f1 and f2
            f1 ~ 0*1
            f2 ~ 0*1
            ## Intercept on f1f2 is fixed at cov(f1,f2)
            f1f2 ~ cf1f2*1

            ## Intercepts on items are fixed at 0
            x1 ~ 0*1
            x2 ~ 0*1
            x3 ~ 0*1
            x4 ~ 0*1
            x5 ~ 0*1
            x6 ~ 0*1
            x7 ~ 0*1
            x8 ~ 0*1
            x9 ~ 0*1

            ## Structural model
            fy ~ f1 + f2 + f1f2'

my.fit1 <- sem(my.model1, data=my.df)
summary(my.fit1, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
## lavaan 0.6.17 ended normally after 71 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        32
##   Number of equality constraints                     1
## 
##   Number of observations                           500
## 
## Model Test User Model:
##                                                       
##   Test statistic                                44.805
##   Degrees of freedom                                59
##   P-value (Chi-square)                           0.914
## 
## Model Test Baseline Model:
## 
##   Test statistic                               647.120
##   Degrees of freedom                                66
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.027
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -8751.313
##   Loglikelihood unrestricted model (H1)      -8728.911
##                                                       
##   Akaike (AIC)                               17564.627
##   Bayesian (BIC)                             17695.280
##   Sample-size adjusted Bayesian (SABIC)      17596.884
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.011
##   P-value H_0: RMSEA <= 0.050                    1.000
##   P-value H_0: RMSEA >= 0.080                    0.000
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.028
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   fy =~                                                                 
##     y1                1.000                               0.731    0.613
##     y2                1.259    0.113   11.120    0.000    0.920    0.721
##     y3                1.638    0.150   10.891    0.000    1.197    0.796
##   f1 =~                                                                 
##     x1                1.000                               0.440    0.423
##     x2                1.206    0.205    5.884    0.000    0.531    0.546
##     x3                1.582    0.293    5.389    0.000    0.696    0.678
##   f2 =~                                                                 
##     x4                1.000                               0.561    0.534
##     x5                1.036    0.246    4.218    0.000    0.581    0.579
##     x6                0.613    0.144    4.266    0.000    0.344    0.337
##   f1f2 =~                                                               
##     x7                1.000                               0.275    0.240
##     x8                1.887    0.850    2.221    0.026    0.519    0.559
##     x9                0.496    0.312    1.590    0.112    0.136    0.126
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   fy ~                                                                  
##     f1                0.149    0.116    1.284    0.199    0.089    0.089
##     f2                0.210    0.102    2.063    0.039    0.161    0.161
##     f1f2              0.491    0.273    1.799    0.072    0.185    0.185
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   f1 ~~                                                                 
##     f2      (cf12)    0.056    0.019    2.881    0.004    0.227    0.227
##     f1f2              0.000                               0.000    0.000
##   f2 ~~                                                                 
##     f1f2              0.000                               0.000    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     f1                0.000                               0.000    0.000
##     f2                0.000                               0.000    0.000
##     f1f2    (cf12)    0.056    0.019    2.881    0.004    0.204    0.204
##    .x1                0.000                               0.000    0.000
##    .x2                0.000                               0.000    0.000
##    .x3                0.000                               0.000    0.000
##    .x4                0.000                               0.000    0.000
##    .x5                0.000                               0.000    0.000
##    .x6                0.000                               0.000    0.000
##    .x7                0.000                               0.000    0.000
##    .x8                0.000                               0.000    0.000
##    .x9                0.000                               0.000    0.000
##    .y1                3.761    0.056   67.766    0.000    3.761    3.155
##    .y2                3.534    0.060   58.626    0.000    3.534    2.769
##    .y3                3.253    0.072   45.291    0.000    3.253    2.163
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .y1                0.887    0.070   12.756    0.000    0.887    0.624
##    .y2                0.782    0.081    9.633    0.000    0.782    0.480
##    .y3                0.829    0.121    6.882    0.000    0.829    0.367
##    .x1                0.889    0.067   13.288    0.000    0.889    0.821
##    .x2                0.662    0.066    9.992    0.000    0.662    0.701
##    .x3                0.568    0.094    6.033    0.000    0.568    0.540
##    .x4                0.789    0.090    8.718    0.000    0.789    0.715
##    .x5                0.671    0.091    7.358    0.000    0.671    0.665
##    .x6                0.920    0.066   13.841    0.000    0.920    0.886
##    .x7                1.233    0.088   13.980    0.000    1.233    0.942
##    .x8                0.594    0.147    4.033    0.000    0.594    0.688
##    .x9                1.148    0.075   15.331    0.000    1.148    0.984
##    .fy                0.494    0.078    6.358    0.000    0.925    0.925
##     f1                0.194    0.053    3.686    0.000    1.000    1.000
##     f2                0.315    0.089    3.535    0.000    1.000    1.000
##     f1f2              0.076    0.044    1.730    0.084    1.000    1.000
## 
## R-Square:
##                    Estimate
##     y1                0.376
##     y2                0.520
##     y3                0.633
##     x1                0.179
##     x2                0.299
##     x3                0.460
##     x4                0.285
##     x5                0.335
##     x6                0.114
##     x7                0.058
##     x8                0.312
##     x9                0.016
##     fy                0.075
semPaths(my.fit1, whatLabels="est", color="green")

  • Although the matched-pairs strategy performs well in simulations, different choices of indicator pairs can substantially alter the results with real data.
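To interpret the latent interaction, the simple slope of f1 on fy at a given level of f2 is b1 + b3*f2. A hand computation, hard-coding the unstandardized estimates printed above and probing at ±1 SD of f2:

```r
## Simple slopes of f1 on fy at levels of the moderator f2,
## using the unstandardized estimates from the output above
b1 <- 0.149             ## fy ~ f1
b3 <- 0.491             ## fy ~ f1f2
sd.f2 <- sqrt(0.315)    ## SD of f2 from its estimated variance
m <- c(-1, 0, 1) * sd.f2
round(b1 + b3 * m, 3)   ## -0.127 0.149 0.425
```

The effect of f1 is essentially absent at low f2 and sizable at high f2, which is what the significant-ish f1f2 path suggests.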

The dessert: Moderated mediation

## Use bootstrapLavaan() with a function that takes the simple slopes
## from probe2WayMC() and calculates the indirect effects manually in
## each bootstrap sample.
## From the help-page example, treating the latent outcome as a mediator and 
## adding an additional arbitrary outcome:
library(semTools)
  
dat2wayMC <- indProd(dat2way, 1:3, 4:6)
dat2wayMC$newDV <- rowMeans(dat2way) + rnorm(nrow(dat2way))
model1 <- "
f1 =~ x1 + x2 + x3
f2 =~ x4 + x5 + x6
f12 =~ x1.x4 + x2.x5 + x3.x6
f3 =~ x7 + x8 + x9
f3 ~ f1 + f2 + f12
f12 ~~ 0*f1 + 0*f2
x1 + x4 + x1.x4 + x7 ~ 0*1 # identify latent means
f1 + f2 + f12 + f3 ~ NA*1
newDV ~ b*f3
"
fitMC2way <- sem(model1, data = dat2wayMC, meanstructure = TRUE)
probe <- probe2WayMC(fitMC2way, nameX = c("f1", "f2", "f12"),
                     nameY = "f3", modVar = "f2", valProbe = c(-1, 0, 1))
probe$SimpleSlope
##   f2   est    se      z pvalue
## 1 -1 0.294 0.015 20.081      0
## 2  0 0.436 0.012 35.238      0
## 3  1 0.578 0.016 36.182      0
## conditional indirect effects on newDV
probe$SimpleSlope$est * coef(fitMC2way)[["b"]]
## [1] 0.3531317 0.5239150 0.6946983
## custom function to return simple slopes
condIndFX <- function(fit) {
  condFX <- probe2WayMC(fit, nameX = c("f1", "f2", "f12"), nameY = "f3",
                        modVar = "f2", valProbe = c(-1, 0, 1))
  indFX <- condFX$SimpleSlope$est * coef(fit)[["b"]]
  names(indFX) <- paste("f2 =", condFX$SimpleSlope$f2)
  indFX
}
## test once on original data
condIndFX(fitMC2way)
##   f2 = -1    f2 = 0    f2 = 1 
## 0.3531317 0.5239150 0.6946983
## bootstrap sample of simple slopes (R = 100 is too small for real use)
bootOut <- bootstrapLavaan(fitMC2way, R = 100, FUN = condIndFX)
## percentile 95% CI
apply(bootOut, MARGIN = 2, FUN = quantile, probs = c(.025, .975))
##         f2 = -1    f2 = 0   f2 = 1
## 2.5%  0.3150987 0.5014642 0.664913
## 97.5% 0.3753435 0.5450563 0.724830
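Percentile intervals are one option; the same bootstrap matrix also yields normal-approximation intervals (estimate ± 1.96 × bootstrap SE). A generic sketch with a stand-in matrix, since `bootOut` depends on the fitted model above:

```r
## Normal-approximation 95% CI from a bootstrap matrix
## (rows = replicates, columns = quantities; boot.mat stands in for bootOut)
set.seed(42)
boot.mat <- cbind("f2 = -1" = rnorm(100, 0.35, 0.02),
                  "f2 = 1"  = rnorm(100, 0.69, 0.02))
est <- colMeans(boot.mat)
se  <- apply(boot.mat, 2, sd)
ci  <- rbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
ci
```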
林闻正
Lecturer of Psychology

My research interests include humility, morality, and prosociality.