Foundation of Mathematical Biology / The Elements of Statistical Learning
Smoothing Splines
Avoid the knot-selection problem by regularization: among all functions f with two continuous derivatives, minimize
RSS(f; λ) = ∑_{i=1}^N {y_i − f(x_i)}² + λ ∫ {f″(t)}² dt
The first term measures closeness to the data; the second penalizes curvature in f. λ controls the trade-off:
λ = 0: f can be any interpolating function (very rough);
λ = ∞: f is the simple least-squares line fit (very smooth).
Solution: a natural cubic spline with knots at the unique x_i.
Linear smoother: f̂ = (f̂(x_i)) = S_λ y. Calibrate the smoothing parameter λ via the effective degrees of freedom df_λ = trace(S_λ).
Pick λ by cross-validation or generalized cross-validation (GCV).
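As an illustration of a linear smoother with df_λ = trace(S_λ), here is a minimal sketch using a discrete analogue of the penalized criterion above (a Whittaker smoother with a second-difference penalty, not the exact natural-cubic-spline solution; the function name and data are my own):

```python
import numpy as np

# Discrete analogue of the smoothing-spline criterion (illustrative):
# minimize ||y - f||^2 + lam * ||D f||^2, D = second-difference operator.
# Solution is a linear smoother: f = S_lam y, S_lam = (I + lam D'D)^{-1}.
def whittaker_smoother(y, lam):
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)            # (n-2) x n second differences
    S = np.linalg.inv(np.eye(n) + lam * D.T @ D)   # smoother matrix S_lam
    return S @ y, np.trace(S)                      # fit and df_lam = trace(S_lam)

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 3 * np.pi, 100)) + rng.normal(0, 0.2, 100)
fit, df = whittaker_smoother(y, lam=10.0)   # larger lam -> smoother fit, smaller df
```

As λ → 0, df → n (interpolation); as λ → ∞, df → 2 (a straight line), matching the two limits above.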
Additive Models
Multiple linear regression:
E(Y | X1, …, Xp) = β0 + β1 X1 + … + βp Xp
Additive model extension:
E(Y | X1, …, Xp) = β0 + s1(X1) + … + sp(Xp)
Estimation of s j via backfitting algorithm:
1. Initialize: β̂0 = (1/N) ∑_{i=1}^N y_i; ŝ_j ≡ 0 ∀ j.
2. Cycle: j = 1, 2, …, p, 1, 2, …, p, …
   ŝ_j ← Smooth_j [ {y_i − β̂0 − ∑_{k≠j} ŝ_k(x_ik)}_{i=1}^N ]
until the ŝ_j converge.
The same generalization – replacing the linear predictor with a sum of smooth functions – and the same backfitting method apply to binary and count outcomes (generalized additive models).
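The backfitting cycle above can be sketched as follows, using a crude nearest-neighbour running-mean in place of each Smooth_j (all function names and the toy data are illustrative, not from the text):

```python
import numpy as np

def smooth(x, z, frac=0.3):
    """Crude nearest-neighbour running-mean smoother (stand-in for Smooth_j)."""
    k = max(2, int(frac * len(x)))
    out = np.empty(len(z))
    for i, xi in enumerate(x):
        nearest = np.argsort(np.abs(x - xi))[:k]
        out[i] = z[nearest].mean()
    return out

def backfit(X, y, n_iter=10):
    """Backfitting for an additive model E(Y|X) = beta0 + sum_j s_j(X_j)."""
    n, p = X.shape
    beta0 = y.mean()                      # step 1: initialize beta0
    s = np.zeros((n, p))                  # step 1: s_j = 0 for all j
    for _ in range(n_iter):               # step 2: cycle over j
        for j in range(p):
            # partial residuals: remove the fit of every other s_k
            partial = y - beta0 - s.sum(axis=1) + s[:, j]
            s[:, j] = smooth(X[:, j], partial)
            s[:, j] -= s[:, j].mean()     # center each s_j for identifiability
    return beta0, s

# Toy additive data: y = x1^2 + sin(2 x2) + noise
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = X[:, 0] ** 2 + np.sin(2 * X[:, 1]) + rng.normal(0, 0.3, 200)
beta0, s = backfit(X, y)
fit = beta0 + s.sum(axis=1)
```

Centering each ŝ_j after smoothing keeps the intercept identified; without it, constants could shift freely between β̂0 and the ŝ_j.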
Prostate Cancer: Additive Model Fits
[Figure: fitted smooth functions s(lcavol), s(lweight), s(age), s(lbph), s(lcp), and s(pgg45), each plotted against its covariate.]
Prostate Cancer: Additive Model
             Df   Npar Df   Npar F   Pr(F)
s(lcavol)     1         3     1.15    0.33
s(lweight)    1         3     1.65    0.18
s(lcp)        1         3     2.11    0.10
s(pgg45)      1         3     1.15    0.33
Initial Model: lpsa ~ s(lcavol) + s(lweight) + s(lcp) + s(pgg45)
Final Model: lpsa ~ lcavol + lweight + s(lcp) + s(pgg45)
Step   From            To              Df   Resid Df   AIC
1                                           80         57.5
2      s(lweight)      s(lweight, 2)    2   82         56.4
3      s(lcavol)       s(lcavol, 2)     2   84         55.6
4      s(lcavol, 2)    lcavol           1   85         55.3
5      s(lweight, 2)   lweight          1   86         55.3
Tree-Structured Regression Paradigm
Tree-based methods involve four components:
1. A set of questions – splits – phrased in terms of covariates that serve to partition the covariate space. A tree structure derives from recursive splitting, and a binary tree results if the questions are yes/no. The subgroups created by assigning cases according to splits are termed nodes.
2. A split function φ(s; g) that can be evaluated for any split s of any node g. The split function is used to assess the worth of the competing splits.
3. A means for determining appropriate tree size.
4. Statistical summaries for the nodes of the tree.
Allowable Splits
An interpretable, flexible, feasible set of splits is obtained by requiring that
1. each split depends upon the value of only a single covariate,
2. for continuous or ordered categorical covariates, Xj, only splits resulting from questions of the form "Is Xj ≤ c?" for c ∈ domain(Xj) are considered; thus ordering is preserved,
3. for categorical covariates all possible splits into disjoint subsets of the categories are allowed.
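For a categorical covariate with q levels, rule 3 gives 2^(q−1) − 1 distinct splits into two non-empty subsets. A small enumeration sketch (function name and example levels are my own):

```python
from itertools import combinations

# Enumerate all distinct splits of a set of category levels into two
# non-empty disjoint subsets; fixing one level on the left side avoids
# counting each split twice (left/right mirrors).
def categorical_splits(levels):
    levels = list(levels)
    anchor = levels[0]
    splits = []
    for r in range(0, len(levels) - 1):        # r extra levels join the anchor
        for extra in combinations(levels[1:], r):
            left = {anchor, *extra}
            splits.append((left, set(levels) - left))
    return splits

splits = categorical_splits(["a", "b", "c", "d"])   # 2**3 - 1 = 7 splits
```

This exponential count is why trees with high-cardinality categorical covariates can be expensive to grow.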
Growing a Tree
1. Initialize: root node comprises the entire sample.
2. Recurse: for every terminal node, g,
   (a) examine all splits, s, on each covariate,
   (b) select and execute (create left, gL, and right, gR, daughter nodes) the best of these splits.
3. Stopping: grow large; prune back.
4. Selection: cross-validation, test sample.
Best split determined by split function φ(s; g).
Let ȳ_g = (1/N_g) ∑_{i∈g} y_i be the outcome average for node g.
Within-node sum of squares: SS(g) = ∑_{i∈g} (y_i − ȳ_g)².
Define φ(s; g) = SS(g) − SS(g_L) − SS(g_R).
The best split s* is such that φ(s*; g) = max_s φ(s; g).
Easily computed via updating formulae.
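The updating-formula computation can be sketched as follows (illustrative code, names my own): sorting by x and keeping cumulative sums of y and y² gives each candidate split's sum of squares in O(1), via SS = ∑y² − (∑y)²/n.

```python
import numpy as np

# Best split of one node on a single continuous covariate:
# maximize phi(s, g) = SS(g) - SS(gL) - SS(gR), using cumulative sums
# so each candidate cutpoint costs O(1) after one sort.
def best_split(x, y):
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(ys)
    csum, csq = np.cumsum(ys), np.cumsum(ys ** 2)
    ss_total = csq[-1] - csum[-1] ** 2 / n
    best_phi, best_c = -np.inf, None
    for i in range(1, n):                     # left node = first i sorted points
        if xs[i] == xs[i - 1]:
            continue                          # split only between distinct x values
        ss_left = csq[i - 1] - csum[i - 1] ** 2 / i
        ss_right = (csq[-1] - csq[i - 1]) - (csum[-1] - csum[i - 1]) ** 2 / (n - i)
        phi = ss_total - ss_left - ss_right
        if phi > best_phi:
            best_phi, best_c = phi, (xs[i - 1] + xs[i]) / 2
    return best_c, best_phi

x = np.arange(10.0)
y = np.where(x < 5, 0.0, 1.0)     # step function: jump between x = 4 and x = 5
c, phi = best_split(x, y)         # best cut lands midway, c = 4.5
```

On this toy step function the split at c = 4.5 makes both daughter nodes pure, so φ equals the full SS(g) = 2.5.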
Prostate Cancer: Regression Tree
[Regression tree, reconstructed from the figure: each node shows the mean lpsa and node size n.]
2.4780 (n=97): lcavol < 2.46165?
  yes: 2.1230 (n=76): lcavol < -0.478556?
    yes: 0.6017 (n=9)
    no: 2.3270 (n=67): lweight < 3.68886?
      yes: 2.0330 (n=38): pgg45 < 7.5?
        yes: 1.7250 (n=21): lcavol < 0.774462?
          yes: 1.2630 (n=8)
          no: 2.0100 (n=13)
        no: 2.4130 (n=17)
      no: 2.7120 (n=29): lcavol < 0.821736?
        yes: 2.2880 (n=10)
        no: 2.9360 (n=19)
  no: 3.7650 (n=21): lcavol < 2.79352?
    yes: 3.2840 (n=10)
    no: 4.2030 (n=11)
Prostate Cancer: Regression Tree
[Figure: relative squared error versus number of splits, showing cross-validation and training error curves.]
Prostate Cancer: Pruned Regression Tree
[Pruned regression tree, reconstructed from the figure: each node shows the mean lpsa and node size n.]
2.4780 (n=97): lcavol < 2.46165?
  yes: 2.1230 (n=76): lcavol < -0.478556?
    yes: 0.6017 (n=9)
    no: 2.3270 (n=67)
  no: 3.7650 (n=21)