24.6 研究婴儿出生体重低的相关危险因素

在线性回归的基础上,响应变量是离散的类别,且无序 (Hasan, Wang, and Mahani 2016)

birthwt 数据是 1986 年在马萨诸塞州斯普林菲尔德的 Baystate 医疗中心收集的,用于研究婴儿出生体重低的相关危险因素

# 加载数据
# library(MASS)
data(birthwt, package = "MASS")
# 查看 birthwt 数据集 `help(birthwt)`
head(birthwt)
##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557
## 88   0  21 108    1     1   0  0  1   2 2594
## 89   0  18 107    1     1   0  0  1   0 2600
## 91   0  21 124    3     0   0  0  0   0 2622
str(birthwt)
## 'data.frame':    189 obs. of  10 variables:
##  $ low  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
##  $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
##  $ race : int  2 3 1 1 1 3 1 3 1 1 ...
##  $ smoke: int  0 0 1 1 1 0 0 0 1 1 ...
##  $ ptl  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ht   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ui   : int  1 0 0 1 1 0 0 0 0 0 ...
##  $ ftv  : int  0 3 1 2 0 0 1 1 1 0 ...
##  $ bwt  : int  2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...

low 表示婴儿出生体重小于 2.5kg,age 表示母亲的年龄(年),lwt 母亲最后一次月经期间的体重(磅),race 母亲的种族(1 =白人,2 =黑人,3 =其他)。,smoke 怀孕期间的吸烟状况,ptl 以前早产的次数,ht 高血压病史,ui 子宫过敏,ftv 妊娠头三个月的医生就诊次数,bwt 出生体重(克)

with(birthwt, tapply(lwt, ui, var))
##        0        1 
## 940.8472 783.7196
t.test(lwt ~ ui, data = birthwt, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  lwt by ui
## t = 2.1138, df = 187, p-value = 0.03586
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   0.8753389 25.3544748
## sample estimates:
## mean in group 0 mean in group 1 
##        131.7578        118.6429
t.test(lwt ~ ui, data = birthwt)
## 
##  Welch Two Sample t-test
## 
## data:  lwt by ui
## t = 2.2547, df = 39.163, p-value = 0.02982
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   1.351128 24.878685
## sample estimates:
## mean in group 0 mean in group 1 
##        131.7578        118.6429
# birthwt$ui <- as.factor(birthwt$ui)
# library(lattice)
# bwplot(lwt ~ ui, data = birthwt, pch = "|")

boxplot(lwt ~ ui, data = birthwt)

# 重新编码,数据预处理,方便代入模型
bwt <- with(birthwt, {
  race <- factor(race, labels = c("white", "black", "other"))
  ptd <- factor(ptl > 0)
  ftv <- factor(ftv)
  levels(ftv)[-(1:2)] <- "2+" # 除了前两个水平外,其余的都编码为 2+
  data.frame(
    low = factor(low), age, lwt, race, smoke = (smoke > 0),
    ptd, ht = (ht > 0), ui = (ui > 0), ftv
  )
})
# 查看编码后的数据
head(bwt)
##   low age lwt  race smoke   ptd    ht    ui ftv
## 1   0  19 182 black FALSE FALSE FALSE  TRUE   0
## 2   0  33 155 other FALSE FALSE FALSE FALSE  2+
## 3   0  20 105 white  TRUE FALSE FALSE FALSE   1
## 4   0  21 108 white  TRUE FALSE FALSE  TRUE  2+
## 5   0  18 107 white  TRUE FALSE FALSE  TRUE   0
## 6   0  21 124 other FALSE FALSE FALSE FALSE   0
str(bwt)
## 'data.frame':    189 obs. of  9 variables:
##  $ low  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ age  : int  19 33 20 21 18 21 22 17 29 26 ...
##  $ lwt  : int  182 155 105 108 107 124 118 103 123 113 ...
##  $ race : Factor w/ 3 levels "white","black",..: 2 3 1 1 1 3 1 3 1 1 ...
##  $ smoke: logi  FALSE FALSE TRUE TRUE TRUE FALSE ...
##  $ ptd  : Factor w/ 2 levels "FALSE","TRUE": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ht   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ ui   : logi  TRUE FALSE FALSE TRUE TRUE FALSE ...
##  $ ftv  : Factor w/ 3 levels "0","1","2+": 1 3 2 3 1 1 2 2 2 1 ...

广义线性模型拟合,二项逻辑回归,响应变量为婴儿出生的体重,以2.5kg为界,它被编码成二分类变量 0或1

options(contrasts = c("contr.treatment", "contr.poly"))
glm(formula = low ~ ., family = binomial, data = bwt)
## 
## Call:  glm(formula = low ~ ., family = binomial, data = bwt)
## 
## Coefficients:
## (Intercept)          age          lwt    raceblack    raceother    smokeTRUE  
##     0.82302     -0.03723     -0.01565      1.19241      0.74068      0.75553  
##     ptdTRUE       htTRUE       uiTRUE         ftv1        ftv2+  
##     1.34376      1.91317      0.68020     -0.43638      0.17901  
## 
## Degrees of Freedom: 188 Total (i.e. Null);  178 Residual
## Null Deviance:       234.7 
## Residual Deviance: 195.5     AIC: 217.5

多项逻辑回归

library(nnet)
(bwt.mu <- multinom(formula = low ~ ., data = bwt))
## # weights:  12 (11 variable)
## initial  value 131.004817 
## iter  10 value 98.029803
## final  value 97.737759 
## converged
## Call:
## multinom(formula = low ~ ., data = bwt)
## 
## Coefficients:
## (Intercept)         age         lwt   raceblack   raceother   smokeTRUE 
##  0.82320102 -0.03723828 -0.01565359  1.19240391  0.74065606  0.75550487 
##     ptdTRUE      htTRUE      uiTRUE        ftv1       ftv2+ 
##  1.34375901  1.91320116  0.68020207 -0.43638470  0.17900392 
## 
## Residual Deviance: 195.4755 
## AIC: 217.4755
summary(bwt.mu)
## Call:
## multinom(formula = low ~ ., data = bwt)
## 
## Coefficients:
##                  Values  Std. Err.
## (Intercept)  0.82320102 1.24476766
## age         -0.03723828 0.03870437
## lwt         -0.01565359 0.00708079
## raceblack    1.19240391 0.53598076
## raceother    0.74065606 0.46176615
## smokeTRUE    0.75550487 0.42503626
## ptdTRUE      1.34375901 0.48063449
## htTRUE       1.91320116 0.72076133
## uiTRUE       0.68020207 0.46434974
## ftv1        -0.43638470 0.47941107
## ftv2+        0.17900392 0.45639129
## 
## Residual Deviance: 195.4755 
## AIC: 217.4755

计算 Z 分数和 P 值

z <- summary(bwt.mu)$coefficients / summary(bwt.mu)$standard.errors
z
## (Intercept)         age         lwt   raceblack   raceother   smokeTRUE 
##   0.6613291  -0.9621210  -2.2107121   2.2247140   1.6039635   1.7775069 
##     ptdTRUE      htTRUE      uiTRUE        ftv1       ftv2+ 
##   2.7958023   2.6544170   1.4648486  -0.9102516   0.3922159
p <- (1 - pnorm(abs(z), 0, 1)) * 2
p
## (Intercept)         age         lwt   raceblack   raceother   smokeTRUE 
## 0.508401310 0.335988847 0.027055777 0.026100443 0.108722092 0.075484881 
##     ptdTRUE      htTRUE      uiTRUE        ftv1       ftv2+ 
## 0.005177106 0.007944557 0.142962228 0.362689827 0.694898695

模型解释

参考文献

Hasan, Asad, Zhiyu Wang, and Alireza S. Mahani. 2016. “Fast Estimation of Multinomial Logit Models: R Package mnlogit.” Journal of Statistical Software 75 (3): 1–24. https://doi.org/10.18637/jss.v075.i03.