Below is the R Markdown for Tree-Based Models.
---
title: "TreeBasedModels"
output: html_document
---

# Tree-Based Models

## Decision Trees

```{r}
require(ISLR)
require(tree)
attach(Carseats)
hist(Sales)
# Recode Sales into a binary response; tree() expects a factor
High = as.factor(ifelse(Sales <= 8, "No", "Yes"))
Carseats = data.frame(Carseats, High)
fit.tree = tree(High ~ . - Sales, data = Carseats)
summary(fit.tree)
plot(fit.tree)
text(fit.tree, pretty = 0)
```

Printing the fitted tree gives a detailed description of each node: the split, the number of observations, the deviance, the predicted class, and the class proportions (`%yes`, `%no`).

```{r}
fit.tree
```

Let's create a test set by splitting the Carseats data into 250 training and 150 test observations, grow the tree on the training set, and evaluate its performance on the test set.

```{r}
set.seed(1011)
train = sample(1:nrow(Carseats), 250)
fit.tree.1 = tree(High ~ . - Sales, data = Carseats, subset = train)
summary(fit.tree.1)
plot(fit.tree.1)
text(fit.tree.1, pretty = 0)
# Confusion matrix on the 150 held-out observations
fit.tree.predict = predict(fit.tree.1, Carseats[-train, ], type = "class")
with(Carseats[-train, ], table(fit.tree.predict, High))
```

Let's now use cross-validation to prune the tree and reduce its variance.

```{r}
fit.tree.cv = cv.tree(fit.tree.1, FUN = prune.misclass)
plot(fit.tree.cv)
prune.fit.tree.1 = prune.misclass(fit.tree.1, best = 13)
plot(prune.fit.tree.1)
text(prune.fit.tree.1, pretty = 0)
prune.tree.1.predict = predict(prune.fit.tree.1, Carseats[-train, ], type = "class")
with(Carseats[-train, ], table(prune.tree.1.predict, High))
```

# Random Forests and Boosting

## Random Forest (package: randomForest)

We use the Boston housing data from the MASS package, with `medv` (median home value) as the response.

```{r}
require(randomForest)
require(MASS)
set.seed(101)
attach(Boston)
train = sample(1:nrow(Boston), 300)
fit.RF = randomForest(medv ~ ., data = Boston, subset = train)
fit.RF
```

The mean of squared residuals (MSR) and the % of variance explained printed above are based on out-of-bag (OOB) samples. For regression, the number of variables randomly chosen at each split defaults to $\lfloor p/3 \rfloor = 4$; since $p = 13$, we can try all 13 values of `mtry`.

```{r}
oob.error = double(13)
test.error = double(13)
for (mtry in 1:13) {
  fit = randomForest(medv ~ ., data = Boston, subset = train,
                     mtry = mtry, ntree = 400)
  oob.error[mtry] = fit$mse[400]  # OOB MSE after 400 trees
  pred = predict(fit, Boston[-train, ])
  test.error[mtry] = with(Boston[-train, ], mean((medv - pred)^2))
}
matplot(1:mtry, cbind(test.error, oob.error), pch = 19,
        col = c("red", "blue"), type = "b", ylab = "MSE")
legend("topright", legend = c("Test", "OOB"), pch = 19, col = c("red", "blue"))
```

`mtry = 13` corresponds to bagging, since every split then considers all predictors.

## Boosting (package: gbm)

Boosting tries to reduce bias, unlike random forests, which target variance. It does so by sequentially building a large number of shallow trees (`n.trees` trees of depth `interaction.depth`). `summary(fit.boost)` gives the variable-importance plot.

```{r}
require(gbm)
fit.boost = gbm(medv ~ ., data = Boston[train, ], distribution = "gaussian",
                n.trees = 10000, shrinkage = 0.01, interaction.depth = 4)
summary(fit.boost)
plot(fit.boost, i = "lstat")
plot(fit.boost, i = "rm")
```

Tuning the number of trees:

```{r}
n.trees = seq(from = 100, to = 10000, by = 100)
# One column of predictions per value of n.trees
predmat = predict(fit.boost, newdata = Boston[-train, ], n.trees = n.trees)
# Column-wise test MSE
error = with(Boston[-train, ], apply((predmat - medv)^2, 2, mean))
plot(n.trees, error, pch = 19, ylab = "MSE", xlab = "# Trees",
     main = "Boosting Test Error")
# Red line: best random-forest test error from the mtry loop above
abline(h = min(test.error), col = "red")
```
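Rather than eyeballing the plot, we can read the minimum off the curve directly. A minimal follow-up sketch using the `error`, `n.trees`, and `test.error` vectors computed above (the helper name `best.n.trees` is my own, purely illustrative):

```{r}
# Number of trees minimizing the boosting test MSE (illustrative name)
best.n.trees = n.trees[which.min(error)]
best.n.trees
min(error)       # best boosting test MSE
min(test.error)  # best random-forest test MSE, for comparison
```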
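If no test set were available, `gbm` can also choose the number of trees by cross-validation. A minimal sketch reusing the tuning parameters above, assuming 5-fold CV (the fold count is my choice, not from the original lab); `gbm.perf` with `method = "cv"` returns the CV-optimal iteration:

```{r}
# Refit with 5-fold CV so gbm.perf can estimate the optimal number of trees
fit.boost.cv = gbm(medv ~ ., data = Boston[train, ], distribution = "gaussian",
                   n.trees = 10000, shrinkage = 0.01, interaction.depth = 4,
                   cv.folds = 5)  # assumption: 5 folds
# Plots training/CV error and returns the CV-chosen iteration
gbm.perf(fit.boost.cv, method = "cv")
```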