Below is the R Markdown for Tree-Based Models.
---
title: "TreeBasedModels"
output: html_document
---

# Tree-Based Models

## Decision Trees

```{r}
require(ISLR)
require(tree)
attach(Carseats)
hist(Sales)
# Recode Sales into a binary response; tree() expects a factor
High = as.factor(ifelse(Sales <= 8, "No", "Yes"))
Carseats = data.frame(Carseats, High)
fit.tree = tree(High ~ . - Sales, data = Carseats)
summary(fit.tree)
plot(fit.tree)
text(fit.tree, pretty = 0)
```

Printing the fitted tree gives a detailed description of each node: the split, the number of observations, the deviance, the predicted class, and the class proportions (`%yes`, `%no`).

```{r}
fit.tree
```

Let's create a test set by splitting the Carseats data into 250 training and 150 test observations, grow the tree on the training set, and evaluate its performance on the test set.

```{r}
set.seed(1011)
train = sample(1:nrow(Carseats), 250)
fit.tree.1 = tree(High ~ . - Sales, data = Carseats, subset = train)
summary(fit.tree.1)
plot(fit.tree.1)
text(fit.tree.1, pretty = 0)
# Confusion matrix on the 150 held-out observations
fit.tree.predict = predict(fit.tree.1, Carseats[-train, ], type = "class")
with(Carseats[-train, ], table(fit.tree.predict, High))
```

Let's now use cross-validation to prune the tree and reduce its variance.

```{r}
fit.tree.cv = cv.tree(fit.tree.1, FUN = prune.misclass)
plot(fit.tree.cv)
prune.fit.tree.1 = prune.misclass(fit.tree.1, best = 13)
plot(prune.fit.tree.1)
text(prune.fit.tree.1, pretty = 0)
prune.tree.1.predict = predict(prune.fit.tree.1, Carseats[-train, ], type = "class")
with(Carseats[-train, ], table(prune.tree.1.predict, High))
```

# Random Forests and Boosting

## Random Forest (package: randomForest)

We use the Boston housing data from the MASS package, with `medv` (median home value) as the response.

```{r}
require(randomForest)
require(MASS)
set.seed(101)
attach(Boston)
train = sample(1:nrow(Boston), 300)
fit.RF = randomForest(medv ~ ., data = Boston, subset = train)
fit.RF
```

The mean of squared residuals (MSR) and the % of variance explained printed above are based on out-of-bag (OOB) samples. For regression, the number of variables randomly chosen at each split defaults to $\lfloor p/3 \rfloor = 4$; since $p = 13$, we can try all 13 values of `mtry`.

```{r}
oob.error = double(13)
test.error = double(13)
for (mtry in 1:13) {
  fit = randomForest(medv ~ ., data = Boston, subset = train,
                     mtry = mtry, ntree = 400)
  oob.error[mtry] = fit$mse[400]  # OOB MSE after 400 trees
  pred = predict(fit, Boston[-train, ])
  test.error[mtry] = with(Boston[-train, ], mean((medv - pred)^2))
}
matplot(1:mtry, cbind(test.error, oob.error), pch = 19,
        col = c("red", "blue"), type = "b", ylab = "MSE")
legend("topright", legend = c("Test", "OOB"), pch = 19, col = c("red", "blue"))
```

`mtry = 13` corresponds to bagging, since every split then considers all predictors.

## Boosting (package: gbm)

Boosting tries to reduce bias, unlike random forests, which target variance. It does so by sequentially building a large number of shallow trees (`n.trees` trees of depth `interaction.depth`). `summary(fit.boost)` gives the variable-importance plot.

```{r}
require(gbm)
fit.boost = gbm(medv ~ ., data = Boston[train, ], distribution = "gaussian",
                n.trees = 10000, shrinkage = 0.01, interaction.depth = 4)
summary(fit.boost)
plot(fit.boost, i = "lstat")
plot(fit.boost, i = "rm")
```

Tuning the number of trees:

```{r}
n.trees = seq(from = 100, to = 10000, by = 100)
# One column of predictions per value of n.trees
predmat = predict(fit.boost, newdata = Boston[-train, ], n.trees = n.trees)
# Column-wise test MSE
error = with(Boston[-train, ], apply((predmat - medv)^2, 2, mean))
plot(n.trees, error, pch = 19, ylab = "MSE", xlab = "# Trees",
     main = "Boosting Test Error")
# Red line: best random-forest test error from the mtry loop above
abline(h = min(test.error), col = "red")
```
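Rather than eyeballing the plot, we can read the minimum off the curve directly. A minimal follow-up sketch using the `error`, `n.trees`, and `test.error` vectors computed above (the helper name `best.n.trees` is my own, purely illustrative):

```{r}
# Number of trees minimizing the boosting test MSE (illustrative name)
best.n.trees = n.trees[which.min(error)]
best.n.trees
min(error)       # best boosting test MSE
min(test.error)  # best random-forest test MSE, for comparison
```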
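If no test set were available, `gbm` can also choose the number of trees by cross-validation. A minimal sketch reusing the tuning parameters above, assuming 5-fold CV (the fold count is my choice, not from the original lab); `gbm.perf` with `method = "cv"` returns the CV-optimal iteration:

```{r}
# Refit with 5-fold CV so gbm.perf can estimate the optimal number of trees
fit.boost.cv = gbm(medv ~ ., data = Boston[train, ], distribution = "gaussian",
                   n.trees = 10000, shrinkage = 0.01, interaction.depth = 4,
                   cv.folds = 5)  # assumption: 5 folds
# Plots training/CV error and returns the CV-chosen iteration
gbm.perf(fit.boost.cv, method = "cv")
```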