Below is an R Markdown file showing different techniques for feature selection in R.
Covered techniques are:

- Best subset selection
- Forward step-wise selection
- Model selection using a validation set
- Model selection by cross-validation
- Ridge and lasso regression
---
title: "ModelSelection"
output: html_document
---

```{r}
library(ISLR)
summary(Hitters)
```

Hitters is a baseball dataset where we aim to predict each player's salary. The summary shows that `Salary` contains some NA values, so let's remove them. `na.omit` removes any row that contains an NA value.

```{r}
Hitters = na.omit(Hitters)
with(Hitters, sum(is.na(Salary)))  # confirm that no NAs remain in Salary
```

Best subset selection
______________________

For each model size, look at all combinations of predictors and keep the best model of that size. To choose among the resulting best models, pick the one with the smallest Cp statistic.

```{r}
library(leaps)
regFit.full = regsubsets(Salary ~ ., data = Hitters, nvmax = 19)  # up to all 19 predictors
reg.summary = summary(regFit.full)
names(reg.summary)
plot(reg.summary$cp, xlab = "Number of Variables", ylab = "Cp")
best.model.num = which.min(reg.summary$cp)
coef(regFit.full, best.model.num)
```

Forward step-wise selection
______________________

A greedy algorithm: at each step it adds the one variable that most improves the fit, producing a nested sequence of models.

```{r}
regFit.fwd = regsubsets(Salary ~ ., data = Hitters, nvmax = 19, method = "forward")
reg.summary.fwd = summary(regFit.fwd)
plot(regFit.fwd, scale = "Cp")
```

Model selection using a validation set
______________________

Let's make a training set and a validation set, so that we can choose a good subset model.

```{r}
dim(Hitters)
set.seed(1)
train = sample(seq(263), 180, replace = FALSE)  # 180 of the 263 rows for training
regFit.fwd.v = regsubsets(Salary ~ ., data = Hitters[train, ], nvmax = 19, method = "forward")
reg.summary.fwd.v = summary(regFit.fwd.v)
plot(regFit.fwd.v, scale = "Cp")
```

Now prepare the test data and compute the validation error for each model size. Note: `regFit.fwd.v$rss[-1]` drops the RSS of the null model, so the training curve covers model sizes 1 through 19.

```{r}
errors.v = rep(NA, 19)
test.v = model.matrix(Salary ~ ., data = Hitters[-train, ])
for(i in 1:19){
  coef.i = coef(regFit.fwd.v, id = i)        # coefficients of the size-i model
  pred = test.v[, names(coef.i)] %*% coef.i  # predictions on the validation set
  errors.v[i] = mean((Hitters$Salary[-train] - pred)^2)
}
plot(sqrt(errors.v), ylab = "RMSE", ylim = c(300, 400), pch = 19, type = "b")
points(sqrt(regFit.fwd.v$rss[-1] / 180), col = "blue", pch = 19, type = "b")
legend("topright", legend = c("Training", "Validation"), col = c("blue", "black"), pch = 19)
```

Model selection by cross-validation
______________________

We shall use 10-fold cross-validation.

```{r}
set.seed(11)
folds = sample(rep(1:10, length = nrow(Hitters)))  # assign each row to one of 10 folds
table(folds)
errors.cv = matrix(NA, 10, 19)
for(k in 1:10){
  best.fit = regsubsets(Salary ~ ., data = Hitters[folds != k, ], nvmax = 19, method = "forward")
  test.v = model.matrix(Salary ~ ., data = Hitters[folds == k, ])
  for(i in 1:19){
    coef.i = coef(best.fit, id = i)
    pred = test.v[, names(coef.i)] %*% coef.i
    errors.cv[k, i] = mean((Hitters$Salary[folds == k] - pred)^2)
  }
}
rmse = sqrt(apply(errors.cv, 2, mean))  # average over the folds, then take the root
plot(rmse, pch = 19, type = "b")
```

Ridge & Lasso Regression
______________________

Split the Hitters data into predictors (x) and response (y).

```{r}
library(glmnet)
x = model.matrix(Salary ~ . - 1, data = Hitters)  # design matrix without an intercept column
y = Hitters$Salary
```

First we shall fit a ridge regression by calling `glmnet` with `alpha = 0`, and let the `cv.glmnet` function do the cross-validation for us. Ridge regression keeps all predictors and shrinks their coefficients toward zero (without setting any exactly to zero). The penalty is the L2-norm, i.e. the sum of squared coefficients, so ridge minimizes RSS + lambda * (sum of squared coefficients).

```{r}
fit.ridge = glmnet(x, y, alpha = 0)
plot(fit.ridge, xvar = "lambda", label = TRUE)
cv.ridge = cv.glmnet(x, y, alpha = 0)
plot(cv.ridge)
```
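A short aside, not in the original post: once `cv.glmnet` has run, calling `coef` on the `cv.ridge` object reports the coefficients at a cross-validated lambda; `s = "lambda.min"` selects the value with the smallest CV error.

```{r}
# Aside (not in the original post): ridge coefficients at the lambda
# that minimized the cross-validation error. Note that ridge keeps
# every predictor; none of these coefficients is exactly zero.
coef(cv.ridge, s = "lambda.min")
```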
Next we shall use lasso regression, by setting `alpha = 1`. The lasso does both shrinkage and feature selection. Its penalty is the L1-norm, i.e. the sum of the absolute values of the coefficients, so the lasso minimizes RSS + lambda * (sum of absolute coefficients). The L1 penalty drives some coefficients exactly to zero, which is what removes features from the model.

```{r}
fit.lasso = glmnet(x, y, alpha = 1)
plot(fit.lasso, xvar = "lambda", label = TRUE)
cv.lasso = cv.glmnet(x, y, alpha = 1)
plot(cv.lasso)
coef(cv.lasso)  # by default, coefficients at lambda.1se; zero entries are dropped variables
```
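As a sketch of how the fitted lasso would actually be used (reusing the `cv.lasso`, `x`, and `y` objects above; this step is not in the original post), `predict` on a `cv.glmnet` object takes the chosen lambda through `s`:

```{r}
# Sketch: predictions at the lambda with the smallest CV error,
# then the in-sample MSE as a quick sanity check.
pred.lasso = predict(cv.lasso, newx = x, s = "lambda.min")
mean((y - pred.lasso)^2)
```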
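A closing note, not in the original post: `regsubsets` has no built-in `predict` method, which is why the validation and cross-validation loops above rebuild the model matrix and multiply by the coefficients by hand. That repeated logic is often factored into a small helper; a minimal sketch, assuming a `regsubsets` fit created with a formula as above:

```{r}
# Minimal sketch of a predict method for regsubsets fits (leaps provides none).
#   object:  a regsubsets fit
#   newdata: a data frame of new observations
#   id:      the model size to use
predict.regsubsets = function(object, newdata, id, ...){
  form = as.formula(object$call[[2]])  # recover the formula, e.g. Salary ~ .
  mat = model.matrix(form, newdata)    # build the design matrix for newdata
  coef.i = coef(object, id = id)       # coefficients of the size-id model
  mat[, names(coef.i)] %*% coef.i      # predictions
}
# Example use (via S3 dispatch): predict(regFit.fwd.v, Hitters[-train, ], id = 10)
```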