Below is an R Markdown file showing different techniques for feature selection in R.
Covered techniques are:

- Best subset selection
- Forward step-wise selection
- Model selection using a validation set
- Model selection by cross-validation
- Ridge and lasso regression
---
title: "ModelSelection"
output: html_document
---

```{r}
library(ISLR)
summary(Hitters)
```

Hitters is a baseball dataset where we aim to predict each player's salary. The summary shows that `Salary` contains some NA values, so let's remove them. `na.omit` removes any row that contains an NA value.

```{r}
Hitters = na.omit(Hitters)
with(Hitters, sum(is.na(Salary)))  # confirm that no NAs remain in Salary
```

Best subset selection
______________________

For each model size, look at all combinations of predictors and keep the best model of that size. To choose among the resulting best models, pick the one with the smallest Cp statistic.

```{r}
library(leaps)
regFit.full = regsubsets(Salary ~ ., data = Hitters, nvmax = 19)  # up to all 19 predictors
reg.summary = summary(regFit.full)
names(reg.summary)
plot(reg.summary$cp, xlab = "Number of Variables", ylab = "Cp")
best.model.num = which.min(reg.summary$cp)
coef(regFit.full, best.model.num)
```

Forward step-wise selection
______________________

A greedy algorithm: at each step it adds the one variable that most improves the fit, producing a nested sequence of models.

```{r}
regFit.fwd = regsubsets(Salary ~ ., data = Hitters, nvmax = 19, method = "forward")
reg.summary.fwd = summary(regFit.fwd)
plot(regFit.fwd, scale = "Cp")
```

Model selection using a validation set
______________________

Let's make a training set and a validation set, so that we can choose a good subset model.

```{r}
dim(Hitters)
set.seed(1)
train = sample(seq(263), 180, replace = FALSE)  # 180 of the 263 rows for training
regFit.fwd.v = regsubsets(Salary ~ ., data = Hitters[train, ], nvmax = 19, method = "forward")
reg.summary.fwd.v = summary(regFit.fwd.v)
plot(regFit.fwd.v, scale = "Cp")
```

Now prepare the test data and compute the validation error for each model size. Note: `regFit.fwd.v$rss[-1]` drops the RSS of the null model, so the training curve covers model sizes 1 through 19.

```{r}
errors.v = rep(NA, 19)
test.v = model.matrix(Salary ~ ., data = Hitters[-train, ])
for(i in 1:19){
  coef.i = coef(regFit.fwd.v, id = i)        # coefficients of the size-i model
  pred = test.v[, names(coef.i)] %*% coef.i  # predictions on the validation set
  errors.v[i] = mean((Hitters$Salary[-train] - pred)^2)
}
plot(sqrt(errors.v), ylab = "RMSE", ylim = c(300, 400), pch = 19, type = "b")
points(sqrt(regFit.fwd.v$rss[-1] / 180), col = "blue", pch = 19, type = "b")
legend("topright", legend = c("Training", "Validation"), col = c("blue", "black"), pch = 19)
```

Model selection by cross-validation
______________________

We shall use 10-fold cross-validation.

```{r}
set.seed(11)
folds = sample(rep(1:10, length = nrow(Hitters)))  # assign each row to one of 10 folds
table(folds)
errors.cv = matrix(NA, 10, 19)
for(k in 1:10){
  best.fit = regsubsets(Salary ~ ., data = Hitters[folds != k, ], nvmax = 19, method = "forward")
  test.v = model.matrix(Salary ~ ., data = Hitters[folds == k, ])
  for(i in 1:19){
    coef.i = coef(best.fit, id = i)
    pred = test.v[, names(coef.i)] %*% coef.i
    errors.cv[k, i] = mean((Hitters$Salary[folds == k] - pred)^2)
  }
}
rmse = sqrt(apply(errors.cv, 2, mean))  # average over the folds, then take the root
plot(rmse, pch = 19, type = "b")
```

Ridge & Lasso Regression
______________________

Split the Hitters data into predictors (x) and response (y).

```{r}
library(glmnet)
x = model.matrix(Salary ~ . - 1, data = Hitters)  # design matrix without an intercept column
y = Hitters$Salary
```

First we shall fit a ridge regression by calling `glmnet` with `alpha = 0`, and let the `cv.glmnet` function do the cross-validation for us. Ridge regression keeps all predictors and shrinks their coefficients toward zero (without setting any exactly to zero). The penalty is the L2-norm, i.e. the sum of squared coefficients, so ridge minimizes RSS + lambda * (sum of squared coefficients).

```{r}
fit.ridge = glmnet(x, y, alpha = 0)
plot(fit.ridge, xvar = "lambda", label = TRUE)
cv.ridge = cv.glmnet(x, y, alpha = 0)
plot(cv.ridge)
```
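A short aside, not in the original post: once `cv.glmnet` has run, calling `coef` on the `cv.ridge` object reports the coefficients at a cross-validated lambda; `s = "lambda.min"` selects the value with the smallest CV error.

```{r}
# Aside (not in the original post): ridge coefficients at the lambda
# that minimized the cross-validation error. Note that ridge keeps
# every predictor; none of these coefficients is exactly zero.
coef(cv.ridge, s = "lambda.min")
```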
Next we shall use lasso regression, by setting `alpha = 1`. The lasso does both shrinkage and feature selection. Its penalty is the L1-norm, i.e. the sum of the absolute values of the coefficients, so the lasso minimizes RSS + lambda * (sum of absolute coefficients). The L1 penalty drives some coefficients exactly to zero, which is what removes features from the model.

```{r}
fit.lasso = glmnet(x, y, alpha = 1)
plot(fit.lasso, xvar = "lambda", label = TRUE)
cv.lasso = cv.glmnet(x, y, alpha = 1)
plot(cv.lasso)
coef(cv.lasso)  # by default, coefficients at lambda.1se; zero entries are dropped variables
```
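As a sketch of how the fitted lasso would actually be used (reusing the `cv.lasso`, `x`, and `y` objects above; this step is not in the original post), `predict` on a `cv.glmnet` object takes the chosen lambda through `s`:

```{r}
# Sketch: predictions at the lambda with the smallest CV error,
# then the in-sample MSE as a quick sanity check.
pred.lasso = predict(cv.lasso, newx = x, s = "lambda.min")
mean((y - pred.lasso)^2)
```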
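A closing note, not in the original post: `regsubsets` has no built-in `predict` method, which is why the validation and cross-validation loops above rebuild the model matrix and multiply by the coefficients by hand. That repeated logic is often factored into a small helper; a minimal sketch, assuming a `regsubsets` fit created with a formula as above:

```{r}
# Minimal sketch of a predict method for regsubsets fits (leaps provides none).
#   object:  a regsubsets fit
#   newdata: a data frame of new observations
#   id:      the model size to use
predict.regsubsets = function(object, newdata, id, ...){
  form = as.formula(object$call[[2]])  # recover the formula, e.g. Salary ~ .
  mat = model.matrix(form, newdata)    # build the design matrix for newdata
  coef.i = coef(object, id = id)       # coefficients of the size-id model
  mat[, names(coef.i)] %*% coef.i      # predictions
}
# Example use (via S3 dispatch): predict(regFit.fwd.v, Hitters[-train, ], id = 10)
```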