第 24 章 功效分析
CRAN 上有很多功效计算和分析的 R 包,我们针对不同的混合效应模型和统计检验,提供对应的 R 实现。
MKpower 包提供 Welch 和 Hsu(许宝騄)t 检验、Wilcoxon 秩和检验、符号秩检验的功效分析和样本量计算,经验功效和第一类错误的计算方法是蒙特卡罗模拟。Superpower 基于模拟的方法分析三因素方差分析实验设计的功效,开发者写了本书介绍,详见 https://aaroncaldwell.us/SuperpowerBook/,也开发了两个 Shiny 应用。powerlmm 可用于计算两、三个水平的纵向多水平/线性混合效应模型的功效。pwrAB Welch 两样本 t 检验的功效分析,常用于 A/B 测试。Metin Bulus 开发 PowerUpR 计算响应变量是连续型的多水平随机对照实验统计功效,最小可检测的效应大小,最小样本量要求。simr 通过模拟方法分析广义线性混合效应模型的功效。WebPower 提供相关性、比例、t 检验、单因素方差分析、两因素方差分析、线性回归、逻辑回归、泊松回归、纵向数据分析、结构方程模型和多水平模型等的功效分析,详见网站 https://webpower.psychstat.org/,包含书籍和功效分析的工具。PowerAnalysisIL 功效分析的 shiny 应用 http://daniellakens.blogspot.com/2015/01/always-use-welchs-t-test-instead-of.html。
此外,还有 lmerTest [50] 和 lmtest [51]。 试验设计 [52] 可以视为一种组织形式,包括各类检验, R 语言实战 [53] 作者 Robert I. Kabacoff 创建了网站 Quick-R,实战这本书第 10 章功效分析主要基于 pwr 包来介绍,Jacob Cohen 的著作《Statistical Power Analysis for the Behavioral Sciences》第二版 [54]
https://powerandsamplesize.com/ 功效和样本量计算器
pbkrtest 提供 parametric bootstrap test、Kenward-Roger-type F-test、Satterthwaite-type F-test 用于线性混合效应模型,parametric bootstrap test 用于广义线性混合效应模型
24.1 方差分析检验的功效
power.anova.test()
计算平衡的单因素方差分析检验的功效
usage(power.anova.test)
power.anova.test(groups = NULL, n = NULL, between.var = NULL, within.var = NULL,
sig.level = 0.05, power = NULL)
power.anova.test(
groups = 4, # 4 个组
between.var = 1, # 组间方差为 1
within.var = 3, # 组内方差为 3
power = 0.95 # 1 - 犯第二类错误的概率
)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 18.18245
## between.var = 1
## within.var = 3
## sig.level = 0.05
## power = 0.95
##
## NOTE: n is number in each group
24.2 比例检验的功效
power.prop.test()
计算两样本比例检验的功效
usage(power.prop.test)
power.prop.test(n = NULL, p1 = NULL, p2 = NULL, sig.level = 0.05, power = NULL,
alternative = c("two.sided", "one.sided"), strict = FALSE,
tol = .Machine$double.eps^0.25)
功效可以用来计算实验所需要的样本量,检验统计量的功效越大/高,检验方法越好,实验所需要的样本量越少
# p1 >= p2 的检验 单边和双边检验
power.prop.test(
p1 = .65, p2 = 0.6, sig.level = .05,
power = 0.90, alternative = "one.sided"
)
##
## Two-sample comparison of proportions power calculation
##
## n = 1603.846
## p1 = 0.65
## p2 = 0.6
## sig.level = 0.05
## power = 0.9
## alternative = one.sided
##
## NOTE: n is number in *each* group
power.prop.test(
p1 = .65, p2 = 0.6, sig.level = .05,
power = 0.90, alternative = "two.sided"
)
##
## Two-sample comparison of proportions power calculation
##
## n = 1968.064
## p1 = 0.65
## p2 = 0.6
## sig.level = 0.05
## power = 0.9
## alternative = two.sided
##
## NOTE: n is number in *each* group
pwr 包 pwr.2p.test()
函数提供了类似 power.prop.test()
函数的功能
library(pwr)
# 明确 p1 > p2 的检验
# 单边检验拆分更加明细,分为大于和小于
pwr.2p.test(
h = ES.h(p1 = 0.65, p2 = 0.6),
sig.level = 0.05, power = 0.9, alternative = "greater"
)
##
## Difference of proportion power calculation for binomial distribution (arcsine transformation)
##
## h = 0.1033347
## n = 1604.007
## sig.level = 0.05
## power = 0.9
## alternative = greater
##
## NOTE: same sample sizes
已知两样本的样本量不等,检验 H0: \(p_1 = p_2\) H1: \(p_1 \neq p_2\) 的功效
library(pwr)
pwr.2p2n.test(
h = 0.30, n1 = 80, n2 = 245,
sig.level = 0.05, alternative = "greater"
)
##
## difference of proportion power calculation for binomial distribution (arcsine transformation)
##
## h = 0.3
## n1 = 80
## n2 = 245
## sig.level = 0.05
## power = 0.7532924
## alternative = greater
##
## NOTE: different sample sizes
h 表示两个样本的差异,计算得到的功效是 0.75
24.3 t 检验的功效
power.t.test()
计算单样本或两样本的 t 检验的功效,或者根据功效计算参数,如样本量
Cohen’s d 单样本/配对 t 检验的功效分析
n = 30 # 样本量(只是一个例子)
x = seq(0, 12, 0.01)
library(ggplot2)
dat <- data.frame(xx = x/sqrt(n), yy = 2 * (1 - pt(x, n - 1)))
ggplot(data = dat, aes(x = xx, y = yy)) +
geom_line() +
geom_vline(xintercept = c(0.01, 0.2, 0.5, 0.8, 1.2, 2), linetype = 2) +
theme_minimal() +
labs(x = "d = t / sqrt(n)", y = "2 * (1 - pt(x, n - 1))")

图 24.1: t 检验的功效
usage(power.t.test)
power.t.test(n = NULL, delta = NULL, sd = 1, sig.level = 0.05, power = NULL,
type = c("two.sample", "one.sample", "paired"),
alternative = c("two.sided", "one.sided"), strict = FALSE,
tol = .Machine$double.eps^0.25)
power.t.test(
n = 100, delta = 2.2,
sd = 1, sig.level = 0.05,
type = "two.sample",
alternative = "two.sided"
)
##
## Two-sample t test power calculation
##
## n = 100
## delta = 2.2
## sd = 1
## sig.level = 0.05
## power = 1
## alternative = two.sided
##
## NOTE: n is number in *each* group
参数 | 含义 |
---|---|
n |
每个组的样本量 |
delta |
两个组的均值之差 |
sd |
标准差,默认值 1 |
sig.level |
显著性水平,默认是 0.05 (犯第 I 类错误的概率) |
power |
检验的功效(1 - 犯第 II 类错误的概率) |
type |
t 检验的类型 "two.sample" 两样本、"one.sample" 单样本或 "paired" 配对样本 |
alternative |
单边或双边检验,取值为 "two.sided" 或 "one.sided"
|
参数 n
,delta
,power
,sd
和 sig.level
必须有一个值为 NULL
,为 NULL
的参数是由其它参数决定的。
Jacob Cohen 提出的 Cohen’s d 和 Cohen’s f 详见书籍 [54],他的代表性文章,地球是圆的 [34]
# 前面 t 检验和方差分析检验的等价功效计算
library(pwr)
pwr.t.test(
d = 2.2 / 6.4,
n = 100,
sig.level = 0.05,
type = "two.sample",
alternative = "two.sided"
)
##
## Two-sample t test power calculation
##
## n = 100
## d = 0.34375
## sig.level = 0.05
## power = 0.6768572
## alternative = two.sided
##
## NOTE: n is number in *each* group
# f 是如何和上面的组间/组内方差等价指定的
pwr.anova.test(
k = 4, # 组数
f = 0.5,
power = 0.95 # 检验的效
)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 18.18244
## f = 0.5
## sig.level = 0.05
## power = 0.95
##
## NOTE: n is number in each group
with(
aggregate(
data = PlantGrowth, weight ~ group,
FUN = function(x) c(dist_mean = mean(x), dist_sd = sd(x))
),
cbind.data.frame(weight, group)
)
R 3.5.0 以后,函数 aggregate
的参数 drop
默认设置为 TRUE
表示扔掉未用来分组的变量,聚合返回的是一个矩阵类型的数据对象。
ggsignif 添加显著性注释
library(ggplot2)
library(ggsignif)
ggplot(data = PlantGrowth, aes(x = group, y = weight)) +
geom_boxplot() +
geom_signif(comparisons = list(c("ctrl", "trt1"), c("trt1", "trt2")),
map_signif_level = function(p) sprintf("p = %.2g", p),
textsize = 6, test = "t.test") +
theme_minimal()
无条件 \(2 \times 2\) 列联表
fisher.test https://en.wikipedia.org/wiki/Fisher's_exact_test
Exact https://en.wikipedia.org/wiki/Barnard's_test exact.test power.exact.test
24.4 运行环境
## R version 4.1.3 (2022-03-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_3.3.5 lme4_1.1-28 Matrix_1.4-1 pwr_1.3-0 formatR_1.11
## [6] magrittr_2.0.3
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.2 xfun_0.30 bslib_0.3.1 purrr_0.3.4
## [5] splines_4.1.3 lattice_0.20-45 colorspace_2.0-3 vctrs_0.4.0
## [9] generics_0.1.2 htmltools_0.5.2 yaml_2.3.5 utf8_1.2.2
## [13] rlang_1.0.2 jquerylib_0.1.4 nloptr_2.0.0 pillar_1.7.0
## [17] withr_2.5.0 DBI_1.1.2 glue_1.6.2 lifecycle_1.0.1
## [21] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 memoise_2.0.1
## [25] evaluate_0.15 labeling_0.4.2 knitr_1.38 fastmap_1.1.0
## [29] curl_4.3.2 fansi_1.0.3 highr_0.9 Rcpp_1.0.8.3
## [33] scales_1.1.1 cachem_1.0.6 jsonlite_1.8.0 sysfonts_0.8.8
## [37] farver_2.1.0 fs_1.5.2 digest_0.6.29 stringi_1.7.6
## [41] bookdown_0.25 dplyr_1.0.8 grid_4.1.3 cli_3.2.0
## [45] tools_4.1.3 sass_0.4.1 tibble_3.1.6 crayon_1.5.1
## [49] pkgconfig_2.0.3 downlit_0.4.0 MASS_7.3-56 ellipsis_0.3.2
## [53] xml2_1.3.3 assertthat_0.2.1 minqa_1.2.4 rmarkdown_2.13
## [57] R6_2.5.1 boot_1.3-28 nlme_3.1-157 compiler_4.1.3