Below is the R Markdown file with snippets on non-linear models.
---
title: "NonLinearModels"
output: html_document
---

# Nonlinear Models

```{r}
require(ISLR)
attach(Wage)
```

## Polynomial Regression

The `poly()` function generates a basis of *orthogonal polynomials*. The individual coefficients therefore differ from those of a raw polynomial fit (`poly(age, 4, raw = TRUE)`), but the fitted values are the same.

```{r}
fit.poly = lm(wage ~ poly(age, 4), data = Wage)
summary(fit.poly)
```

Plot the fitted function along with bands of two standard errors on either side of the fit.

```{r}
age.limits = range(age)
age.grid = seq(from = age.limits[1], to = age.limits[2])
# se.fit = TRUE also returns the standard error of each fitted value
preds = predict(fit.poly, newdata = list(age = age.grid), se.fit = TRUE)
se.bands = cbind(preds$fit + 2 * preds$se.fit, preds$fit - 2 * preds$se.fit)
plot(age, wage, col = "darkgrey")
lines(age.grid, preds$fit, col = "blue")
matlines(age.grid, se.bands, col = "blue", lty = 2)
```

Use `anova()` to compare a sequence of nested models: each row is an F-test of whether a model improves significantly on the simpler model before it.

```{r}
fita = lm(wage ~ education, data = Wage)
fitb = lm(wage ~ education + age, data = Wage)
fitc = lm(wage ~ education + poly(age, 2), data = Wage)
fitd = lm(wage ~ education + poly(age, 3), data = Wage)
anova(fita, fitb, fitc, fitd)
```

## Polynomial Logistic Regression

Define a binary response that is 1 when `wage > 250` and 0 otherwise. Because `wage` is recorded in thousands of dollars, this flags workers earning more than $250K.
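A `glm()` fit with `family = "binomial"` models the log-odds, so by default `predict()` returns fitted values and standard errors on the logit scale. The logistic chunk builds its ±2 SE bands on that scale and only then maps them to probabilities. The sketch below isolates that step with made-up numbers: `eta` and `se` are hypothetical values standing in for one element of `preds.log$fit` and `preds.log$se.fit`, not results from the Wage fit. `plogis()` is base R's inverse logit, equivalent to `exp(x) / (1 + exp(x))`.

```{r}
# Hypothetical link-scale values (NOT taken from the Wage fit)
eta = -4
se = 0.6

# Fit and +/- 2 SE band on the logit (link) scale
link.band = eta + c(fit = 0, lower = -2 * se, upper = 2 * se)

# Map all three values to the probability scale with the inverse logit
prob.band = plogis(link.band)
prob.band

# Same result as the explicit exp(x) / (1 + exp(x)) formula
all.equal(prob.band, exp(link.band) / (1 + exp(link.band)), check.attributes = FALSE)
```

Because the inverse logit is increasing and bounded, the transformed band keeps its ordering and stays inside (0, 1). A band built as p ± 2 SE directly on the probability scale can instead extend below 0 when p is small.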
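```{r}
fit.log = glm(I(wage > 250) ~ poly(age, 3), data = Wage, family = "binomial")
summary(fit.log)
# Fitted values and standard errors on the logit (link) scale
preds.log = predict(fit.log, newdata = list(age = age.grid), se.fit = TRUE)
se.bands.1 = preds.log$fit + cbind(fit = 0, lower = -2 * preds.log$se.fit, upper = 2 * preds.log$se.fit)
# Inverse logit maps the fit and its bands to probabilities
prob.bands = exp(se.bands.1) / (1 + exp(se.bands.1))
matplot(age.grid, prob.bands, col = "blue", lwd = c(2, 1, 1), lty = c(1, 2, 2), type = "l", ylim = c(0, 0.1))
# matplot() starts a new plot, so add the data afterwards: high earners
# are drawn at 0.1 (top of the plot) and everyone else at 0
points(jitter(age), (wage > 250) / 10, pch = "|", cex = 0.5)
```

## Splines

Let's fit a cubic spline with knots at ages 25, 40 and 60. `bs()` generates the B-spline basis for a cubic spline (degree 3 by default). With three interior knots the basis has 6 columns, so the model estimates 7 coefficients including the intercept.

```{r}
require(splines)
fit.splines = lm(wage ~ bs(age, knots = c(25, 40, 60)), data = Wage)
plot(age, wage, col = "darkgray")
lines(age.grid, predict(fit.splines, list(age = age.grid)), col = "green", lwd = 2)
abline(v = c(25, 40, 60), lty = 2, col = "darkgreen")
```

Smoothing splines do not require knot selection. Instead they have a smoothing parameter, which can be set indirectly by specifying the effective degrees of freedom `df`.

```{r}
# When knitted, a chunk cannot draw on the previous chunk's plot,
# so redraw the data before adding the curve
plot(age, wage, col = "darkgray")
fit.smooth = smooth.spline(age, wage, df = 16)
lines(fit.smooth, col = "red", lwd = 2)
```

Another way to choose the smoothing parameter is leave-one-out cross-validation (LOOCV), requested with `cv = TRUE`. Because `age` contains many tied values, R warns that cross-validation with non-unique `x` values is doubtful.

```{r}
plot(age, wage, col = "darkgray")
fit.smooth.loocv = smooth.spline(age, wage, cv = TRUE)
lines(fit.smooth.loocv, col = "blue", lwd = 2)
# Effective degrees of freedom selected by LOOCV
fit.smooth.loocv$df
```

## GAM - Generalized Additive Models

To fit a model with more than one nonlinear term, we use a GAM from the `gam` package. In `gam()`, `s()` specifies a smoothing-spline term with the given degrees of freedom.

```{r}
require(gam)
fit.gam = gam(wage ~ s(age, df = 4) + s(year, df = 4) + education, data = Wage)
par(mfrow = c(1, 3))
plot(fit.gam, se = TRUE)
```

Let's test whether we need a nonlinear term for `year` by comparing a model in which `year` enters linearly against the full model. As with `anova()` for linear models, list the simpler model first.

```{r}
fit.gam.1 = gam(wage ~ s(age, df = 4) + year + education, data = Wage)
anova(fit.gam.1, fit.gam, test = "Chisq")
```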
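The `gam` package fits smoothing-spline terms by backfitting. When every nonlinear term is a regression spline with fixed knots, the same kind of additive model is just a linear model in the spline basis, so it can be fit with plain `lm()` using natural-spline bases from `ns()` in the `splines` package. Nested versions can then be compared with the same `anova()` F-test used for the polynomial models. The sketch below uses simulated data so that it runs without `Wage`: the variables `x1`, `x2`, `grp` and the data frame `sim` are hypothetical stand-ins for `age`, `year` and `education`, and the output says nothing about the real data.

```{r}
require(splines)

set.seed(1)
# Simulated stand-in data, NOT the Wage data
n = 500
x1 = runif(n, 18, 80)                                      # plays the role of age
x2 = sample(2003:2009, n, replace = TRUE)                  # plays the role of year
grp = factor(sample(c("a", "b", "c"), n, replace = TRUE))  # plays the role of education
y = 50 + 30 * sin(x1 / 12) + 1.5 * (x2 - 2003) + 10 * as.integer(grp) + rnorm(n, sd = 5)
sim = data.frame(y, x1, x2, grp)

# Additive model: natural-spline bases for x1 and x2, dummy variables for grp
fit.ns = lm(y ~ ns(x1, df = 4) + ns(x2, df = 3) + grp, data = sim)
# Same model with x2 entering linearly
fit.ns.lin = lm(y ~ ns(x1, df = 4) + x2 + grp, data = sim)

# Nested comparison, simpler model first: an F-test on the 2 extra
# degrees of freedom used by the nonlinear x2 term
anova(fit.ns.lin, fit.ns)
```

The trade-off is that `ns()` needs its knots (here, placed at quantiles through `df`) fixed in advance, whereas `s()` in `gam()` controls smoothness through a penalty.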