CGSS data analysis

CGSS data analysis

Lin Wenzheng

2024-06-30

Parenting motivation

## [1] "A68b. 请问您有几个18岁以下未成年子女(包括继子继女、养"

偏态的计数数据(zero inflation),作为自变量,转换为两分变量:有未成年子女在家 vs. 没有。

P12 Pro-environmental intensions

lapply(df2[, c("P12_1","P12_2","P12_3","P12_4","P12_5","P12_6","P12_7")], function(x) attr(x, "label"))
## $P12_1
## [1] "[1.像我这样的人很难为环境保护做什么] P12.您在多大程度上"
## 
## $P12_2
## [1] "[2.即使要花费更多的钱和时间,我也要做有利于环境的事] P12"
## 
## $P12_3
## [1] "[3.生活中还有比环境保护更重要的事情要做] P12.您在多大程"
## 
## $P12_4
## [1] "[4.除非大家都做,否则我保护环境的努力就没有意义] P12.您"
## 
## $P12_5
## [1] "[5.许多关于环境威胁的说法都是夸大其词] P12.您在多大程度"
## 
## $P12_6
## [1] "[6.我很难弄清楚我现在的生活方式是对环境有害还是有利] P12"
## 
## $P12_7
## [1] "[7.环境问题直接影响我的日常生活] P12.您在多大程度上同意"
df2[c("P12_1_r","P12_3_r","P12_4_r","P12_5_r","P12_6_r")] <- car::recode(as.matrix(df2[c("P12_1","P12_3","P12_4","P12_5","P12_6")]),
            "1=5;2=4;3=3;4=2;5=1")
alpha(df2[c("P12_1_r","P12_2","P12_3_r","P12_4_r","P12_5_r","P12_6_r","P12_7")])
## 
## Reliability analysis   
## Call: alpha(x = df2[c("P12_1_r", "P12_2", "P12_3_r", "P12_4_r", "P12_5_r", 
##     "P12_6_r", "P12_7")])
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.52      0.52     0.5      0.13 1.1 0.0081    3 0.59     0.15
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt      0.5  0.52  0.53
## Duhachek   0.5  0.52  0.53
## 
##  Reliability if an item is dropped:
##         raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## P12_1_r      0.41      0.42    0.39      0.11 0.71   0.0101 0.0083 0.085
## P12_2        0.51      0.52    0.49      0.15 1.07   0.0083 0.0106 0.176
## P12_3_r      0.49      0.50    0.47      0.14 0.99   0.0086 0.0102 0.148
## P12_4_r      0.44      0.44    0.42      0.12 0.79   0.0096 0.0078 0.085
## P12_5_r      0.44      0.44    0.42      0.12 0.79   0.0095 0.0093 0.086
## P12_6_r      0.49      0.49    0.46      0.14 0.94   0.0087 0.0095 0.148
## P12_7        0.55      0.54    0.51      0.17 1.20   0.0075 0.0078 0.176
## 
##  Item statistics 
##            n raw.r std.r r.cor r.drop mean   sd
## P12_1_r 2624  0.64  0.62  0.55  0.396  3.2 1.21
## P12_2   2637  0.42  0.43  0.23  0.168  3.5 1.01
## P12_3_r 2594  0.45  0.47  0.30  0.217  2.4 0.98
## P12_4_r 2640  0.60  0.58  0.48  0.342  2.6 1.21
## P12_5_r 2534  0.58  0.58  0.47  0.342  3.4 1.07
## P12_6_r 2517  0.49  0.50  0.34  0.239  3.0 1.09
## P12_7   2633  0.40  0.37  0.13  0.088  3.2 1.21
## 
## Non missing response frequency for each item
##            1    2    3    4    5 miss
## P12_1_r 0.07 0.31 0.10 0.38 0.13 0.68
## P12_2   0.03 0.17 0.21 0.46 0.12 0.68
## P12_3_r 0.14 0.51 0.18 0.13 0.03 0.68
## P12_4_r 0.16 0.43 0.11 0.22 0.08 0.68
## P12_5_r 0.04 0.20 0.18 0.45 0.14 0.69
## P12_6_r 0.06 0.34 0.23 0.30 0.08 0.69
## P12_7   0.09 0.25 0.13 0.40 0.13 0.68
alpha(df2[c("P12_1_r","P12_3_r","P12_4_r","P12_5_r","P12_6_r")])
## 
## Reliability analysis   
## Call: alpha(x = df2[c("P12_1_r", "P12_3_r", "P12_4_r", "P12_5_r", "P12_6_r")])
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.58      0.57    0.52      0.21 1.3 0.0073  2.9 0.71     0.21
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.56  0.58  0.59
## Duhachek  0.56  0.58  0.59
## 
##  Reliability if an item is dropped:
##         raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## P12_1_r      0.49      0.49    0.42      0.19 0.95   0.0091 0.0034  0.21
## P12_3_r      0.56      0.56    0.49      0.24 1.29   0.0079 0.0014  0.24
## P12_4_r      0.49      0.49    0.42      0.19 0.94   0.0092 0.0041  0.19
## P12_5_r      0.50      0.50    0.43      0.20 0.99   0.0088 0.0048  0.21
## P12_6_r      0.55      0.55    0.48      0.23 1.21   0.0080 0.0027  0.23
## 
##  Item statistics 
##            n raw.r std.r r.cor r.drop mean   sd
## P12_1_r 2624  0.68  0.65  0.51   0.38  3.2 1.21
## P12_3_r 2594  0.52  0.54  0.33   0.25  2.4 0.98
## P12_4_r 2640  0.67  0.65  0.51   0.39  2.6 1.21
## P12_5_r 2534  0.63  0.63  0.48   0.36  3.4 1.07
## P12_6_r 2517  0.57  0.57  0.37   0.28  3.0 1.09
## 
## Non missing response frequency for each item
##            1    2    3    4    5 miss
## P12_1_r 0.07 0.31 0.10 0.38 0.13 0.68
## P12_3_r 0.14 0.51 0.18 0.13 0.03 0.68
## P12_4_r 0.16 0.43 0.11 0.22 0.08 0.68
## P12_5_r 0.04 0.20 0.18 0.45 0.14 0.69
## P12_6_r 0.06 0.34 0.23 0.30 0.08 0.69
df2$pe_intentM <- rowMeans(df2[c("P12_1_r","P12_2","P12_3_r","P12_4_r","P12_5_r","P12_6_r","P12_7")])

量表内部一致性信度表现较差,无论只使用反向题(最大数量)还是全部题目,alpha均只有0.5+。

Pro-environmental behaviors

# P11a - P11d, reverse coding P11d
lapply(df2[, c("P11a","P11b","P11c","P11d")], function(x) attr(x, "label"))
## $P11a
## [1] "P11a.为了保护环境,您在多大程度上愿意支付更高的价格?"
## 
## $P11b
## [1] "P11b.为了保护环境,您在多大程度上愿意缴纳更高的税?"
## 
## $P11c
## [1] "P11c.为了保护环境,您在多大程度上愿意降低生活水平?"
## 
## $P11d
## [1] "P11d.为了经济发展,您在多大程度上愿意接受减少国家自然保"
df2$P11d_r <- 6 - df2$P11d
alpha(df2[, c("P11a","P11b","P11c","P11d_r")])
## Warning in alpha(df2[, c("P11a", "P11b", "P11c", "P11d_r")]): Some items were negatively correlated with the first principal component and probably 
## should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## Some items ( P11d_r ) were negatively correlated with the first principal component and 
## probably should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
## 
## Reliability analysis   
## Call: alpha(x = df2[, c("P11a", "P11b", "P11c", "P11d_r")])
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.52      0.52     0.6      0.21 1.1 0.0084  2.9 0.76     0.19
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt      0.5  0.52  0.54
## Duhachek   0.5  0.52  0.54
## 
##  Reliability if an item is dropped:
##        raw_alpha std.alpha G6(smc) average_r  S/N alpha se var.r  med.r
## P11a        0.17      0.15    0.27     0.058 0.18   0.0159 0.147 -0.109
## P11b        0.16      0.15    0.26     0.057 0.18   0.0161 0.129 -0.077
## P11c        0.39      0.38    0.50     0.172 0.62   0.0119 0.210 -0.077
## P11d_r      0.79      0.79    0.73     0.554 3.72   0.0042 0.016  0.496
## 
##  Item statistics 
##           n raw.r std.r r.cor r.drop mean  sd
## P11a   2636  0.82  0.82  0.81   0.60  2.8 1.1
## P11b   2622  0.83  0.82  0.82   0.59  3.0 1.1
## P11c   2645  0.70  0.68  0.53   0.38  3.2 1.1
## P11d_r 2432  0.24  0.23 -0.18  -0.16  2.6 1.1
## 
## Non missing response frequency for each item
##           1    2    3    4    5 miss
## P11a   0.09 0.40 0.21 0.23 0.07 0.68
## P11b   0.07 0.36 0.20 0.27 0.10 0.68
## P11c   0.06 0.27 0.19 0.37 0.12 0.68
## P11d_r 0.15 0.38 0.25 0.17 0.05 0.70
alpha(df2[, c("P11a","P11b","P11c","P11d")])
## 
## Reliability analysis   
## Call: alpha(x = df2[, c("P11a", "P11b", "P11c", "P11d")])
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.68      0.68    0.68      0.34 2.1 0.0058  3.1 0.83     0.34
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.67  0.68  0.69
## Duhachek  0.67  0.68  0.69
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## P11a      0.53      0.53    0.48      0.27 1.1   0.0090 0.040  0.22
## P11b      0.50      0.50    0.45      0.25 1.0   0.0095 0.038  0.22
## P11c      0.56      0.56    0.58      0.30 1.3   0.0087 0.123  0.11
## P11d      0.79      0.79    0.73      0.55 3.7   0.0042 0.016  0.50
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean  sd
## P11a 2636  0.79  0.79  0.75   0.58  2.8 1.1
## P11b 2622  0.82  0.81  0.78   0.61  3.0 1.1
## P11c 2645  0.77  0.76  0.63   0.54  3.2 1.1
## P11d 2432  0.49  0.49  0.19   0.16  3.4 1.1
## 
## Non missing response frequency for each item
##         1    2    3    4    5 miss
## P11a 0.09 0.40 0.21 0.23 0.07 0.68
## P11b 0.07 0.36 0.20 0.27 0.10 0.68
## P11c 0.06 0.27 0.19 0.37 0.12 0.68
## P11d 0.05 0.17 0.25 0.38 0.15 0.70
alpha(df2[, c("P11a","P11b","P11c")])
## 
## Reliability analysis   
## Call: alpha(x = df2[, c("P11a", "P11b", "P11c")])
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.79      0.79    0.73      0.55 3.7 0.0042    3 0.95      0.5
## 
##     95% confidence boundaries 
##          lower alpha upper
## Feldt     0.78  0.79  0.79
## Duhachek  0.78  0.79  0.79
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## P11a      0.66      0.66    0.50      0.50 2.0   0.0075    NA  0.50
## P11b      0.63      0.63    0.46      0.46 1.7   0.0081    NA  0.46
## P11c      0.82      0.82    0.70      0.70 4.7   0.0040    NA  0.70
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean  sd
## P11a 2636  0.86  0.86  0.78   0.67  2.8 1.1
## P11b 2622  0.88  0.87  0.80   0.69  3.0 1.1
## P11c 2645  0.79  0.78  0.57   0.52  3.2 1.1
## 
## Non missing response frequency for each item
##         1    2    3    4    5 miss
## P11a 0.09 0.40 0.21 0.23 0.07 0.68
## P11b 0.07 0.36 0.20 0.27 0.10 0.68
## P11c 0.06 0.27 0.19 0.37 0.12 0.68
df2[, c("P11d","P11d_r")]
## # A tibble: 8,148 × 2
##    P11d                    P11d_r
##    <dbl+lbl>                <dbl>
##  1  3 [既非愿意也非不愿意]      3
##  2  3 [既非愿意也非不愿意]      3
##  3 NA                          NA
##  4 NA                          NA
##  5 NA                          NA
##  6 NA                          NA
##  7  1 [非常愿意]                5
##  8 NA                          NA
##  9 NA                          NA
## 10 NA                          NA
## # ℹ 8,138 more rows
df2$pe_sacri <- rowMeans(df2[, c("P11a","P11b","P11c")])

反向题表现不好,删去。

lapply(df2[, c("P19a","P19b","P20","P21_1","P21_2","P21_3")], function(x) attr(x, "label"))
## $P19a
## [1] "P19a.您经常会特意将玻璃、铝罐、塑料或报纸等进行分类以方"
## 
## $P19b
## [1] "P19b.您经常会特意为了环境保护而不去购买某些产品吗?"
## 
## $P20
## [1] "P20.您是否加入了任何以保护环境为目的的社团?"
## 
## $P21_1
## [1] "[1.就某个环境问题签署过请愿书] P21.在过去5年中,您是否有"
## 
## $P21_2
## [1] "[2.给环保团体捐过钱] P21.在过去5年中,您是否有过以下行动"
## 
## $P21_3
## [1] "[3.为某个环境问题参加过抗议或示威游行] P21.在过去5年中,"
df2$pe_trash <- 2-df2$P19a
df2$pe_purchase <- 2-df2$P19b
df2$pe_commu <- 2-df2$P20
df2$pe_petition <- 2-df2$P21_1
df2$pe_donate <- 2-df2$P21_2
df2$pe_protest <- 2-df2$P21_3

均为2分变量,1=是,2=否,反向计分为1=是,0=否

Correlation matrix

df3 <- df2[,c("gender", "age", "income", "ParMotiv", "ParMotivB", "pe_intentM", "pe_sacri", "pe_trash", "pe_purchase", "pe_commu", "pe_petition", "pe_donate", "pe_protest")]
pairs.panels(df3, stars = TRUE)

Regression

DV:Intention

根据相关结果初步进行回归分析。控制变量:性别、年龄。同时也检验了性别的调节模型。

df3$gender <- as.factor(df3$gender)

my.lm1.1 <- lm(pe_intentM ~ ParMotivB + gender + age, df3)
car::Anova(my.lm1.1)
## Anova Table (Type II tests)
## 
## Response: pe_intentM
##           Sum Sq   Df F value    Pr(>F)    
## ParMotivB   0.02    1  0.0525    0.8188    
## gender      0.11    1  0.3537    0.5521    
## age         7.46    1 24.5351 7.817e-07 ***
## Residuals 713.02 2345                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
my.lm1.2 <- lm(pe_intentM ~ ParMotivB*gender + age, df3)
car::Anova(my.lm1.2)
## Anova Table (Type II tests)
## 
## Response: pe_intentM
##                  Sum Sq   Df F value    Pr(>F)    
## ParMotivB          0.02    1  0.0527  0.818404    
## gender             0.11    1  0.3552  0.551247    
## age                7.51    1 24.7910 6.855e-07 ***
## ParMotivB:gender   3.20    1 10.5614  0.001171 ** 
## Residuals        709.83 2344                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat_plot(my.lm1.2, pred = gender, modx = ParMotivB)

testInteractions(my.lm1.2,fixed=c("gender"),pairwise = c("ParMotivB"))
## Warning in rbind(deparse.level, ...): number of columns of result, 6, is not a
## multiple of vector length 5 of arg 2
## F Test: 
## P-value adjustment method: holm
##               Value      SE     Df Sum of Sq      F Pr(>F)
## 0-1 : 1    0.070425    0.04   1.00   0.82265 2.7166 0.1185
## 0-1 : 2   -0.077892    0.04   1.00   1.07868 3.5620 0.1185
## Residuals           2344.00 709.83

主效应模型:加入年龄作为控制变量后,主效应消失。 调节模型:加入性别与抚育动机的调节项发现,二者调节作用显著。对于男性来说,抚育动机抑制亲环境行为倾向。在女性上,抚育动机促进亲环境行为倾向。 Note. 但亲环境意图量表的信度指标表现不好。

DV:Purchase

控制变量:性别、年龄,收入。同时也检验了性别的调节模型。

my.lm2.1 <- lm(pe_purchase ~ ParMotivB + gender + age + income, df3)
summary(my.lm2.1)
## 
## Call:
## lm(formula = pe_purchase ~ ParMotivB + gender + age + income, 
##     data = df3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.25195 -0.90611 -0.08782  0.76749  2.27594 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.265e-01  1.082e-01  -5.790 7.86e-09 ***
## ParMotivB1  -4.945e-03  5.687e-02  -0.087  0.93071    
## gender2     -1.394e-02  3.697e-02  -0.377  0.70606    
## age         -6.478e-03  1.602e-03  -4.043 5.42e-05 ***
## income      -1.496e-08  5.769e-09  -2.594  0.00954 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9593 on 2736 degrees of freedom
##   (5407 observations deleted due to missingness)
## Multiple R-squared:  0.01488,    Adjusted R-squared:  0.01344 
## F-statistic: 10.33 on 4 and 2736 DF,  p-value: 2.65e-08
my.lm2.2 <- lm(pe_purchase ~ ParMotivB*gender + age + income, df3)
summary(my.lm2.2)
## 
## Call:
## lm(formula = pe_purchase ~ ParMotivB * gender + age + income, 
##     data = df3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.27123 -0.90782 -0.08788  0.74171  2.29826 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -5.978e-01  1.095e-01  -5.458 5.23e-08 ***
## ParMotivB1         -7.135e-02  6.937e-02  -1.029   0.3038    
## gender2            -6.725e-02  4.883e-02  -1.377   0.1685    
## age                -6.469e-03  1.602e-03  -4.039 5.52e-05 ***
## income             -1.480e-08  5.768e-09  -2.566   0.0103 *  
## ParMotivB1:gender2  1.241e-01  7.429e-02   1.671   0.0949 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9589 on 2735 degrees of freedom
##   (5407 observations deleted due to missingness)
## Multiple R-squared:  0.01588,    Adjusted R-squared:  0.01408 
## F-statistic: 8.828 on 5 and 2735 DF,  p-value: 2.528e-08

加入控制变量后,主效应模型与调节效应模型均不显著。

捐款

控制变量:性别、年龄,收入。同时也检验了性别的调节模型。

my.glm3.1 <- glm(pe_donate ~ ParMotivB + gender + age, family = binomial, df3)
summary(my.glm3.1)
## 
## Call:
## glm(formula = pe_donate ~ ParMotivB + gender + age, family = binomial, 
##     data = df3)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.417171   0.403415  -3.513 0.000443 ***
## ParMotivB1   0.325766   0.218916   1.488 0.136729    
## gender2     -0.055131   0.138731  -0.397 0.691077    
## age         -0.022826   0.006174  -3.697 0.000218 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1603.8  on 2740  degrees of freedom
## Residual deviance: 1543.3  on 2737  degrees of freedom
##   (5407 observations deleted due to missingness)
## AIC: 1551.3
## 
## Number of Fisher Scoring iterations: 5
my.glm3.2 <- glm(pe_donate ~ ParMotivB*gender + age, family = binomial, df3)
summary(my.glm3.2)
## 
## Call:
## glm(formula = pe_donate ~ ParMotivB * gender + age, family = binomial, 
##     data = df3)
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -1.547787   0.419241  -3.692 0.000223 ***
## ParMotivB1          0.521346   0.272863   1.911 0.056049 .  
## gender2             0.164272   0.228103   0.720 0.471422    
## age                -0.022748   0.006164  -3.690 0.000224 ***
## ParMotivB1:gender2 -0.351733   0.287658  -1.223 0.221425    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1603.8  on 2740  degrees of freedom
## Residual deviance: 1541.8  on 2736  degrees of freedom
##   (5407 observations deleted due to missingness)
## AIC: 1551.8
## 
## Number of Fisher Scoring iterations: 5