24.2 Theoretical Foundations

This section introduces the exponential family and then generalized linear models (GLMs). A distribution belongs to the exponential family (with dispersion parameter \(\phi\)) if its density can be written as

\[ f(y;\theta,\phi) = \exp\left[ \frac{a(y)\, b(\theta) + c(\theta)}{h(\phi)} + d(y,\phi) \right] \]

The Poisson distribution (with \(\lambda \to \theta\), \(x \to y\), and \(\phi = 1\)):

\[\begin{equation} \begin{split} f(y;\theta) & = \exp(-\theta) \theta^y/(y!) \\ & = \exp\left( \underbrace{y}_{a(y)} \underbrace{\log \theta}_{b(\theta)} + \underbrace{(-\theta)}_{c(\theta)} + \underbrace{(- \log(y!))}_{d(y)} \right) \end{split} \end{equation}\]
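As a quick sanity check, the decomposition above can be verified numerically by rebuilding the Poisson density from its exponential-family pieces and comparing with R's built-in `dpois()` (a sketch; the value \(\theta = 3\) is an arbitrary choice):

```r
# Rebuild the Poisson density from the exponential-family pieces
#   a(y) = y, b(theta) = log(theta), c(theta) = -theta, d(y) = -log(y!)
# and compare with dpois(); theta = 3 is an arbitrary example value.
theta <- 3
y <- 0:10
f_expfam <- exp(y * log(theta) - theta - lfactorial(y))
all.equal(f_expfam, dpois(y, lambda = theta))  # TRUE
```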

24.2.1 Ridge Regression

Geometry and properties of generalized ridge regression in high dimensions http://web.ccs.miami.edu/~hishwaran/papers/IR.conmath2014.pdf

This paper uses three-dimensional geometric figures to illustrate generalized ridge regression in the high-dimensional setting.
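For the ordinary (non-generalized) case, ridge estimates can be sketched directly from the closed-form solution \(\hat\beta(\lambda) = (X^\top X + \lambda I)^{-1} X^\top y\). The simulated data and function name below are illustrative only; in practice one would use, e.g., glmnet with `alpha = 0` or `MASS::lm.ridge`:

```r
# Ridge regression via the closed-form solution
#   beta_hat(lambda) = (X'X + lambda * I)^{-1} X'y
# on simulated data; for real work use e.g. glmnet(..., alpha = 0).
set.seed(42)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, rep(0, p - 2))
y <- drop(X %*% beta + rnorm(n))

ridge <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

b0  <- ridge(X, y, lambda = 0)   # lambda = 0 gives ordinary least squares
b10 <- ridge(X, y, lambda = 10)
sum(b10^2) < sum(b0^2)           # TRUE: the penalty shrinks the norm
```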

24.2.2 Lasso

glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models https://glmnet.stanford.edu
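The lasso penalty can be illustrated with a bare-bones coordinate-descent sketch: each coefficient is updated by soft-thresholding its partial-residual correlation. The function and data below are invented for illustration (standardized predictors, no intercept); `glmnet()` is what one would actually use:

```r
# Coordinate-descent lasso via soft-thresholding (standardized X, no
# intercept); a didactic sketch -- glmnet() is the tool for real use.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

lasso_cd <- function(X, y, lambda, sweeps = 200) {
  b <- rep(0, ncol(X))
  n <- nrow(X)
  for (s in 1:sweeps) {
    for (j in seq_along(b)) {
      r_j  <- y - X[, -j, drop = FALSE] %*% b[-j]        # partial residual
      b[j] <- drop(soft(crossprod(X[, j], r_j) / n, lambda) /
                     (crossprod(X[, j]) / n))
    }
  }
  b
}

set.seed(1)
X <- scale(matrix(rnorm(100 * 5), 100, 5))
y <- drop(X %*% c(3, 0, 0, 0, 0) + rnorm(100))
b <- lasso_cd(X, y, lambda = 0.5)
b  # only the first coefficient should remain clearly nonzero
```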

24.2.3 Best Subset Regression

bestglm: Best Subset GLM and Regression Utilities
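What bestglm automates can be sketched in base R for a toy linear model: enumerate every non-empty predictor subset, fit each, and keep the one with the lowest AIC (the data and variable names below are invented for illustration):

```r
# Exhaustive best-subset search scored by AIC (toy linear model in base R;
# the bestglm package automates this, including for GLMs).
set.seed(7)
n <- 80
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1 + 2 * d$x1 - 1.5 * d$x3 + rnorm(n)

vars <- c("x1", "x2", "x3")
subsets <- unlist(lapply(seq_along(vars),
                         function(k) combn(vars, k, simplify = FALSE)),
                  recursive = FALSE)
aic <- sapply(subsets, function(s) AIC(lm(reformulate(s, "y"), data = d)))
best <- subsets[[which.min(aic)]]
best  # the informative predictors x1 and x3 should appear here
```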

24.2.4 Partial Least Squares Regression

The pls package (Mevik and Wehrens 2007) implements partial least squares regression (PLS) and principal component regression (PCR); see its homepage at https://mevik.net/work/software/pls.html. The package's documentation is of high quality and fairly comprehensive. Its main features include:

  • several algorithms: the traditional orthogonal scores (NIPALS) PLS algorithm, kernel PLS, wide kernel PLS, SIMPLS, and PCR through SVD
  • supports multi-response models (aka PLS2)
  • flexible cross-validation
  • Jackknife variance estimates of regression coefficients
  • extensive and flexible plots: scores, loadings, predictions, coefficients, (R)MSEP, R², correlation loadings
  • formula interface, modelled after lm(), with methods for predict, print, summary, plot, update, etc.
  • extraction functions for coefficients, scores and loadings
  • MSEP, RMSEP and R² estimates
  • multiplicative scatter correction (MSC)
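The orthogonal-scores (NIPALS) algorithm listed above can be sketched for a single response (PLS1) in a few lines of base R. The function below uses my own illustrative naming, not the pls API; a useful check is that with as many components as predictors it reduces to ordinary least squares:

```r
# Minimal orthogonal-scores (NIPALS) PLS1 sketch for one response;
# for real analyses use pls::plsr(), which this loosely mirrors.
pls1_nipals <- function(X, y, ncomp) {
  X <- scale(X, scale = FALSE)          # center predictors
  y <- y - mean(y)                      # center response
  p <- ncol(X)
  W <- P <- matrix(0, p, ncomp)         # weights and X-loadings
  q <- numeric(ncomp)                   # y-loadings
  for (a in 1:ncomp) {
    w  <- drop(crossprod(X, y)); w <- w / sqrt(sum(w^2))  # weight vector
    tt <- drop(X %*% w)                                   # scores
    pa <- drop(crossprod(X, tt)) / sum(tt^2)              # X-loadings
    q[a] <- sum(tt * y) / sum(tt^2)                       # y-loading
    X <- X - tcrossprod(tt, pa)                           # deflate X
    y <- y - tt * q[a]                                    # deflate y
    W[, a] <- w; P[, a] <- pa
  }
  drop(W %*% solve(crossprod(P, W), q))  # coefficients for centered data
}

set.seed(3)
X <- matrix(rnorm(60 * 4), 60, 4)
y <- drop(X %*% c(1, 2, 0, -1) + rnorm(60))
b_pls <- pls1_nipals(X, y, ncomp = 4)
b_ols <- unname(coef(lm(y ~ X))[-1])
# with ncomp = ncol(X), PLS reproduces the least-squares coefficients
```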

References

Mevik, Björn-Helge, and Ron Wehrens. 2007. “The pls Package: Principal Component and Partial Least Squares Regression in R.” Journal of Statistical Software 18 (2): 1–23. https://doi.org/10.18637/jss.v018.i02.