23.12 1888 年瑞士生育率分析

1888 年瑞士生育率和社会经济指标数据,各个指标都是百分比的形式,探索性分析

1888 年瑞士生育率和社会经济指标的关系

图 23.3: 1888 年瑞士生育率和社会经济指标的关系

fit_swiss <- lm(Fertility ~ . - 1, data = swiss)
summary(fit_swiss)
## 
## Call:
## lm(formula = Fertility ~ . - 1, data = swiss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.8358  -6.3606  -0.5603   6.0585  23.3203 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## Agriculture       0.11100    0.07424   1.495  0.14233    
## Examination       0.44406    0.31435   1.413  0.16514    
## Education        -0.70674    0.25009  -2.826  0.00719 ** 
## Catholic          0.11707    0.04860   2.409  0.02046 *  
## Infant.Mortality  2.98366    0.31683   9.417 6.53e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.893 on 42 degrees of freedom
## Multiple R-squared:  0.9828, Adjusted R-squared:  0.9807 
## F-statistic: 478.8 on 5 and 42 DF,  p-value: < 2.2e-16
anova(fit_swiss)
## Analysis of Variance Table
## 
## Response: Fertility
##                  Df Sum Sq Mean Sq   F value    Pr(>F)    
## Agriculture       1 204039  204039 2084.6865 < 2.2e-16 ***
## Examination       1  16781   16781  171.4556 < 2.2e-16 ***
## Education         1     24      24    0.2454    0.6229    
## Catholic          1   4782    4782   48.8556 1.504e-08 ***
## Infant.Mortality  1   8680    8680   88.6858 6.528e-12 ***
## Residuals        42   4111      98                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Cook 距离 ?plot.lm

par(mar = c(4, 4, 2, 2))
plot(fit_swiss, which = 4, sub.caption = "")

par(mar = c(4, 4, 2, 2))
plot(fit_swiss, which = 5, sub.caption = "")

X <- as.matrix(swiss[, setdiff(names(swiss), "Fertility")])
Y <- as.matrix(swiss[, "Fertility"])
# beta 的估计
(beta_hat <- solve(a = crossprod(X, X), b = crossprod(X, Y)))
##                        [,1]
## Agriculture       0.1110005
## Examination       0.4440591
## Education        -0.7067362
## Catholic          0.1170662
## Infant.Mortality  2.9836617
# Y 的预测 MSE 残差平方和 
sigma2_hat <- (t(Y) %*% (diag(rep(1, dim(X)[1])) - X %*% solve(crossprod(X)) %*% t(X)) %*% Y)/(dim(X)[1] - qr(X)$rank)
# RMSE
sqrt(sigma2_hat)
##          [,1]
## [1,] 9.893187