Running head: HOMEWORK 8 – STEPWISE REGRESSION, LASSO AND ELASTIC NET

Homework 8 – Use Stepwise Regression, Lasso, Elastic Net and glmnet
Amitava Chatterjee
OMS Analytics, GATECH – Fall 2019

Abstract

Apply stepwise regression, lasso, and elastic net models to the crime data, scaling the predictors so that the penalty constraints act on comparable coefficient scales, and assessing a range of alpha values for the elastic net.

Homework 8 – Use Stepwise Regression, Lasso, Elastic Net and glmnet

Stepwise Regression

As per Wikipedia, stepwise regression is a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion.

Lasso

As per Wikipedia, lasso regression is a type of linear regression that uses shrinkage. Shrinkage is where data values are shrunk towards a central point, like the mean. The lasso procedure encourages simple, sparse models (i.e., models with fewer parameters).

Elastic net

The elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods.

Question 11.1

Using the crime data set uscrime.txt from Questions 8.2, 9.1, and 10.1, build a regression model using:

1. Stepwise regression
2. Lasso
3. Elastic net

For Parts 2 and 3, remember to scale the data first – otherwise, the regression coefficients will be on different scales and the constraint won't have the desired effect. For Parts 2 and 3, use the glmnet function in R.

Notes on R:

• For the elastic net model, what we called λ in the videos, glmnet calls "alpha"; you can get a range of results by varying alpha from 1 (lasso) to 0 (ridge regression) [and, of course, other values of alpha in between].
• In a function call like glmnet(x, y, family="mgaussian", alpha=1) the predictors x need to be in R's matrix format, rather than data frame format. You can convert a data frame to a matrix using as.matrix – for example, x <- as.matrix(data[,1:(n-1)])
• Rather than specifying a value of T, glmnet returns models for a variety of values of T.

Answer 11.1

The solution runs stepwise regression, lasso, and elastic net on both the scaled raw data and principal components found using PCA. The R code for each model covers three things: (1) it uses the method to identify a set of variables to use, (2) it builds a regression model using those variables, and (3) it eliminates the insignificant variables in that regression and builds a new regression using the remaining variables. After building each model, the code reports the R-squared value on the training data, and then uses cross-validation to estimate the model's true R-squared value. The elastic net code tests 11 different values of alpha, from 0.0 to 1.0 at intervals of 0.1. Results will differ slightly depending on the random number generator seed.
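For reference (this formula is not part of the assigned solution), the lasso and ridge penalties described above combine in the standard elastic net objective that glmnet minimizes for family="gaussian"; here α is glmnet's alpha mixing parameter and λ its penalty weight:

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2n}\sum_{i=1}^{n}\left(y_i-\beta_0-x_i^{\top}\beta\right)^2
\;+\;\lambda\left[\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
\;+\;\alpha\,\lVert\beta\rVert_1\right]
```

Setting α = 1 recovers the pure lasso (L1) penalty and α = 0 the pure ridge (L2) penalty, matching the assignment's note about varying alpha between 0 and 1.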
# Clear environment
rm(list = ls())

library(caret)

## Loading required package: lattice
## Loading required package: ggplot2

library(glmnet)

## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-18

# Set the random number generator seed so the results are reproducible
set.seed(1)

# Read in the data
ac_data = read.table("uscrime.txt", stringsAsFactors = FALSE, header = TRUE)

# Make sure the data is read correctly
head(ac_data)

##      M So   Ed  Po1  Po2    LF   M.F Pop   NW    U1  U2 Wealth Ineq
## 1 15.1  1  9.1  5.8  5.6 0.510  95.0  33 30.1 0.108 4.1   3940 26.1
## 2 14.3  0 11.3 10.3  9.5 0.583 101.2  13 10.2 0.096 3.6   5570 19.4
## 3 14.2  1  8.9  4.5  4.4 0.533  96.9  18 21.9 0.094 3.3   3180 25.0
## 4 13.6  0 12.1 14.9 14.1 0.577  99.4 157  8.0 0.102 3.9   6730 16.7
## 5 14.1  0 12.1 10.9 10.1 0.591  98.5  18  3.0 0.091 2.0   5780 17.4
## 6 12.1  0 11.0 11.8 11.5 0.547  96.4  25  4.4 0.084 2.9   6890 12.6
##       Prob    Time Crime
## 1 0.084602 26.2011   791
## 2 0.029599 25.2999  1635
## 3 0.083401 24.3006   578
## 4 0.015801 29.9012  1969
## 5 0.041399 21.2998  1234
## 6 0.034201 20.9995   682

tail(ac_data)

##       M So   Ed  Po1 Po2    LF   M.F Pop   NW    U1  U2 Wealth Ineq
## 42 14.1  0 10.9  5.6 5.4 0.523  96.8   4  0.2 0.107 3.7   4890 17.0
## 43 16.2  1  9.9  7.5 7.0 0.522  99.6  40 20.8 0.073 2.7   4960 22.4
## 44 13.6  0 12.1  9.5 9.6 0.574 101.2  29  3.6 0.111 3.7   6220 16.2
## 45 13.9  1  8.8  4.6 4.1 0.480  96.8  19  4.9 0.135 5.3   4570 24.9
## 46 12.6  0 10.4 10.6 9.7 0.599  98.9  40  2.4 0.078 2.5   5930 17.1
## 47 13.0  0 12.1  9.0 9.1 0.623 104.9   3  2.2 0.113 4.0   5880 16.0
##        Prob    Time Crime
## 42 0.088904 12.1996   542
## 43 0.054902 31.9989   823
## 44 0.028100 30.0001  1030
## 45 0.056202 32.5996   455
## 46 0.046598 16.6999   508
## 47 0.052802 16.0997   849

# Crime is the response; the other variables are predictors.

---------------------------- Stepwise Regression ----------------------------

Stepwise regression using the original variables and cross-validation. We use backward stepwise regression.
The lower model has only the intercept; the full model has all the variables.

# Scale the data, except the response variable and the binary categorical variable So
scaledData = as.data.frame(scale(ac_data[,c(1,3,4,5,6,7,8,9,10,11,12,13,14,15)]))
scaledData = cbind(ac_data[,2], scaledData, ac_data[,16]) # Add columns 2 (So) and 16 (Crime) back in
colnames(scaledData)[1] = "So"
colnames(scaledData)[16] = "Crime"

# Perform 5-fold CV, repeated 5 times, using the code below
ctrl = trainControl(method = "repeatedcv", number = 5, repeats = 5)
lmFit_Step = train(Crime ~ ., data = scaledData, "lmStepAIC",
                   scope = list(lower = Crime~1, upper = Crime~.),
                   direction = "backward", trControl = ctrl, trace = FALSE)
summary(lmFit_Step)

##
## Call:
## lm(formula = .outcome ~ M + Ed + Po1 + M.F + U1 + U2 + Ineq +
##     Prob, data = dat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -444.70 -111.07    3.03  122.15  483.30
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   905.09      28.52  31.731  < 2e-16 ***
## M             117.28      42.10   2.786  0.00828 **
## Ed            201.50      59.02   3.414  0.00153 **
## Po1           305.07      46.14   6.613 8.26e-08 ***
## M.F            65.83      40.08   1.642  0.10874
## U1           -109.73      60.20  -1.823  0.07622 .
## U2            158.22      61.22   2.585  0.01371 *
## Ineq          244.70      55.69   4.394 8.63e-05 ***
## Prob          -86.31      33.89  -2.547  0.01505 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.5 on 38 degrees of freedom
## Multiple R-squared:  0.7888, Adjusted R-squared:  0.7444
## F-statistic: 17.74 on 8 and 38 DF,  p-value: 1.159e-10

Fit a new model with the 8 selected variables:

mod_Step = lm(Crime ~ M.F+U1+Prob+U2+M+Ed+Ineq+Po1, data = scaledData)
summary(mod_Step)

##
## Call:
## lm(formula = Crime ~ M.F + U1 + Prob + U2 + M + Ed + Ineq + Po1,
##     data = scaledData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -444.70 -111.07    3.03  122.15  483.30
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   905.09      28.52  31.731  < 2e-16 ***
## M.F            65.83      40.08   1.642  0.10874
## U1           -109.73      60.20  -1.823  0.07622 .
## Prob          -86.31      33.89  -2.547  0.01505 *
## U2            158.22      61.22   2.585  0.01371 *
## M             117.28      42.10   2.786  0.00828 **
## Ed            201.50      59.02   3.414  0.00153 **
## Ineq          244.70      55.69   4.394 8.63e-05 ***
## Po1           305.07      46.14   6.613 8.26e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 195.5 on 38 degrees of freedom
## Multiple R-squared:  0.7888, Adjusted R-squared:  0.7444
## F-statistic: 17.74 on 8 and 38 DF,  p-value: 1.159e-10

We obtain an adjusted R-squared value of 0.7444 using the 8 variables selected by backward stepwise regression with cross-validation.

Now let's use cross-validation to see how good this model really is. Because we only have 47 data points, let's use 47-fold cross-validation (equivalently, leave-one-out cross-validation).

SStot = sum((ac_data$Crime - mean(ac_data$Crime))^2)
totsse = 0
for(i in 1:nrow(scaledData)) {
  mod_Step_i = lm(Crime ~ M.F+U1+Prob+U2+M+Ed+Ineq+Po1, data = scaledData[-i,])
  pred_i = predict(mod_Step_i, newdata = scaledData[i,])
  totsse = totsse + ((pred_i - ac_data[i,16])^2)
}
r2_md = 1 - totsse/SStot
r2_md

##        1
## 0.667621

In the model above, the p-value for M.F is above 0.1. We might keep it in the model because it's close to 0.1 and might be important; that's what we tested above. Or we might remove it and re-run the model without it. Let's see what happens if we do:

mod_Step = lm(Crime ~ U1+Prob+U2+M+Ed+Ineq+Po1, data = scaledData)
summary(mod_Step)

##
## Call:
## lm(formula = Crime ~ U1 + Prob + U2 + M + Ed + Ineq + Po1, data = scaledData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -520.76 -105.67    9.53  136.28  519.37
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   905.09      29.14  31.062  < 2e-16 ***
## U1            -63.86      54.48  -1.172   0.2482
## Prob          -84.83      34.61  -2.451   0.0188 *
## U2            134.13      60.71   2.209   0.0331 *
## M             134.20      41.70   3.218   0.0026 **
## Ed            244.38      54.07   4.520 5.62e-05 ***
## Ineq          264.65      55.52   4.767 2.61e-05 ***
## Po1           314.89      46.73   6.738 4.91e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 199.8 on 39 degrees of freedom
## Multiple R-squared:  0.7738, Adjusted R-squared:  0.7332
## F-statistic: 19.06 on 7 and 39 DF,  p-value: 8.805e-11

U1 doesn't look significant either, so we can take it out too and re-run the model:

mod_Step = lm(Crime ~ Prob+U2+M+Ed+Ineq+Po1, data = scaledData)
summary(mod_Step)

##
## Call:
## lm(formula = Crime ~ Prob + U2 + M + Ed + Ineq + Po1, data = scaledData)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -470.68  -78.41  -19.68  133.12  556.23
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   905.09      29.27  30.918  < 2e-16 ***
## Prob          -86.44      34.74  -2.488  0.01711 *
## U2             75.47      34.55   2.185  0.03483 *
## M             131.98      41.85   3.154  0.00305 **
## Ed            219.79      50.07   4.390 8.07e-05 ***
## Ineq          269.91      55.60   4.855 1.88e-05 ***
## Po1           341.84      40.87   8.363 2.56e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 200.7 on 40 degrees of freedom
## Multiple R-squared:  0.7659, Adjusted R-squared:  0.7307
## F-statistic: 21.81 on 6 and 40 DF,  p-value: 3.418e-11

This model looks good, so now let's see how it cross-validates:

SStot = sum((ac_data$Crime - mean(ac_data$Crime))^2)
totsse = 0
for(i in 1:nrow(scaledData)) {
  mod_Step_i = lm(Crime ~ Prob+U2+M+Ed+Ineq+Po1, data = scaledData[-i,])
  pred_i = predict(mod_Step_i, newdata = scaledData[i,])
  totsse = totsse + ((pred_i - ac_data[i,16])^2)
}
r2_md = 1 - totsse/SStot
r2_md

##         1
## 0.6661638

So, cross-validation shows that the result is about the same whether we include M.F and U1 (0.668) or not (0.666). That gives some support to the idea that M.F and U1 really aren't significant. Since the quality is about the same, we should probably use the simpler model.

---------------------------- Lasso Regression ----------------------------

# Build the lasso model (alpha = 1)
XP = data.matrix(scaledData[,-16])
YP = data.matrix(scaledData$Crime)
lasso = cv.glmnet(x = as.matrix(scaledData[,-16]),
                  y = as.matrix(scaledData$Crime),
                  alpha = 1, nfolds = 5,
                  type.measure = "mse", family = "gaussian")

# Output the coefficients of the variables selected by lasso
coef(lasso, s = lasso$lambda.min)

## 16 x 1 sparse Matrix of class "dgCMatrix"
##                      1
## (Intercept) 889.059998
## So           47.073756
## M            85.028207
## Ed          124.299584
## Po1         308.896175
## Po2           .
## LF            1.112481
## M.F          52.034368
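The text above describes testing 11 values of alpha from 0.0 to 1.0 for the elastic net. As a sketch of how that sweep could be written (not the author's exact code; it reuses the scaledData object and the cv.glmnet call pattern from the lasso section above):

```r
library(glmnet)

# Sweep glmnet's alpha mixing parameter from 0 (ridge) to 1 (lasso) in steps
# of 0.1. For each alpha, cv.glmnet chooses lambda by 5-fold cross-validation;
# we record the cross-validated MSE at the best lambda for that alpha.
alphas = seq(0, 1, by = 0.1)
cv_mse = numeric(length(alphas))
for (j in seq_along(alphas)) {
  fit = cv.glmnet(x = as.matrix(scaledData[,-16]),
                  y = as.matrix(scaledData$Crime),
                  alpha = alphas[j], nfolds = 5,
                  type.measure = "mse", family = "gaussian")
  cv_mse[j] = min(fit$cvm)   # CV error at lambda.min
}

# Pick the alpha with the lowest cross-validated error
best_alpha = alphas[which.min(cv_mse)]
best_alpha
```

Because cv.glmnet's fold assignment is random, the best alpha can change from run to run unless the seed is fixed, which matches the note above that results depend on the random number generator.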