ISYE 6501 Homework 2 Latest Update


Homework 2

Question 3.1

Using the same data set (credit_card_data.txt or credit_card_data-headers.txt) as in Question 2.2, use the ksvm or kknn function to find a good classifier:

(a) using cross-validation (do this for the k-nearest-neighbors model; SVM is optional); and

Using leave-one-out cross-validation with different kernels for classification

library(kknn)  # provides train.kknn and kknn

data <- read.csv("credit_card_data-headers.txt", header = TRUE, sep = "")

# Splitting data for training (70%) and validating (30%)
number_of_data_points <- nrow(data)
training_sample <- sample(number_of_data_points, size = round(number_of_data_points * 0.7))
training_data <- data[training_sample, ]
validating_data <- data[-training_sample, ]

# train.kknn runs leave-one-out cross-validation over k = 1..kmax for each kernel
kmax <- 100
model <- train.kknn(R1 ~ ., training_data, kmax = kmax, scale = TRUE,
                    kernel = c("rectangular", "triangular", "epanechnikov",
                               "gaussian", "rank", "optimal"))
model

##
## Call:
## train.kknn(formula = R1 ~ ., data = training_data, kmax = kmax, kernel = c("rectangular", "triang
##
## Type of response variable: continuous
## minimal mean absolute error: 0.1957787
## Minimal mean squared error: 0.107393
## Best kernel: gaussian
## Best k: 41

# R1 is treated as continuous, so predictions are rounded to 0/1 before comparison
pred <- predict(model, validating_data[, -11])
accuracy <- sum(as.integer(round(pred) == validating_data[, 11])) / nrow(validating_data)

# Accuracy on validation data
cat("Best accuracy for validation data is", accuracy, " for K value of", model$best.parameters$k, "\n\n")

## Best accuracy for validation data is 0.8469388 for K value of 41

(b) splitting the data into training, validation, and test data sets (pick either KNN or SVM; the other is optional).
library(kknn)  # provides kknn

data <- read.csv("credit_card_data-headers.txt", header = TRUE, sep = "")

# Splitting data into training: 60%, validating: 20% and testing: 20%
number_of_data_points <- nrow(data)
training_sample <- sample(number_of_data_points, size = round(number_of_data_points * 0.6))
training_data <- data[training_sample, ]
non_training_data <- data[-training_sample, ]

number_of_non_training_data_points <- nrow(non_training_data)
validating_sample <- sample(number_of_non_training_data_points, size = round(number_of_non_training_data_points * 0.5))
validating_data <- non_training_data[validating_sample, ]
testing_data <- non_training_data[-validating_sample, ]

# Using kknn with the hold-out validation set to choose k
Ks <- seq(1, 100)
bestK <- 0
bestAccuracy <- 0
bestModel <- NULL
for (k in Ks) {
  model <- kknn(R1 ~ ., training_data, validating_data, k = k, scale = TRUE)
  pred <- round(predict(model))
  accuracy <- sum(pred == validating_data[, 11]) / nrow(validating_data)

  # Keeping the best accuracy data for later use
  if (accuracy > bestAccuracy) {
    bestAccuracy <- accuracy
    bestK <- k
    bestModel <- model
  }
}

# Best K and its accuracy on validation data
cat("Best K value is", bestK, "with accuracy of", bestAccuracy, "\n\n")

## Best K value is 11 with accuracy of 0.870229

# Running the test data with best K value
model <- kknn(R1 ~ ., training_data, testing_data, k = bestK, scale = TRUE)
pred <- round(predict(model))
accuracy <- sum(pred == testing_data[, 11]) / nrow(testing_data)

# Accuracy of test data with best K
cat("Accuracy with K value of", bestK, "on test data is", accuracy, "\n\n")

## Accuracy with K value of 11 on test data is 0.8549618

Question 4.1

Describe a situation or problem from your job, everyday life, current events, etc., for which a clustering model would be appropriate. List some (up to 5) predictors that you might use.

One of the key revenue generators for our e-commerce business is recommendation-based online sales.
In order to make product recommendations, we need to group our online visitors and returning customers into various groups. Some of the common predictors we use are:
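As a rough illustration of the grouping step in the Question 4.1 answer, the base R sketch below clusters synthetic visitor records with k-means. The three column names are hypothetical placeholders chosen for this example, not the predictors from the answer above. The data are simulated, and the choice of three clusters is an assumption.

```r
# Synthetic visitor data; all column names are hypothetical placeholders.
set.seed(42)
n <- 150
visitors <- data.frame(
  visits_per_month         = c(rnorm(n / 3, 2, 0.5), rnorm(n / 3, 8, 1),   rnorm(n / 3, 15, 2)),
  avg_order_value          = c(rnorm(n / 3, 20, 5),  rnorm(n / 3, 60, 10), rnorm(n / 3, 150, 20)),
  days_since_last_purchase = c(rnorm(n / 3, 90, 10), rnorm(n / 3, 30, 5),  rnorm(n / 3, 5, 2))
)

# Scale the columns so no single predictor dominates the Euclidean distance.
scaled <- scale(visitors)

# centers = 3 is an assumption for this sketch; nstart = 25 reruns k-means
# from several random starts and keeps the best solution.
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Cluster sizes, then per-cluster means in the original units.
print(table(fit$cluster))
print(aggregate(visitors, by = list(cluster = fit$cluster), FUN = mean))
```

In practice the number of clusters would be chosen by comparing `fit$tot.withinss` across several values of `centers` rather than being fixed in advance.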


Last updated: 3 years ago
