University of Maryland, Baltimore County - IS 733 Assignment 4
Student ID: QH45693
IS 733 - Data Mining

1(a) Briefly describe the boosting algorithm. State why it may improve classification accuracy.

1(b) What is the bias-variance trade-off for machine learning methods? Explain.

1(c) Briefly describe the bagging procedure. Discuss why it may improve the accuracy of decision tree classifiers, in terms of the bias-variance trade-off.

2(a) Using any software tool or programming language of your choice, create and print out a scatter plot of this dataset, eruption time versus waiting time. Note that for many tools, before the data can be loaded you will need to make a copy of the file and delete the header information. You will need to ignore the first column, which contains ID numbers for each instance.

2(b) How many clusters do you see based on your scatter plot? For the purposes of this question, a cluster is a “blob” of many data points that are close together, with regions of fewer data points between it and other “blobs”/clusters.

2(c) Describe the steps of a hierarchical clustering algorithm. Based on your scatter plot, would this method be appropriate for this dataset?

Question 2b. I recommend using a high-level, data-friendly programming language such as MATLAB, R, or Python. Be sure to ignore the first column, which contains instance ID numbers. Report the following items:

• Your source code for the k-means algorithm. You do not need to report code for loading the data, or for drawing a scatter plot. You need to implement the algorithm from scratch.

• A scatter plot of your final clustering, with the data points in each cluster color-coded, or plotted with different symbols. Include the cluster centers in your plot.

• A plot of the k-means objective function versus iterations of the algorithm. Recall that the objective function is

  E = \sum_{i=1}^{k} \sum_{p \in C_i} \| p - c_i \|^2 ,

  where k is the number of clusters, C_i is the set of instances assigned to the i-th cluster, and c_i is the cluster center for the i-th cluster. Note that the objective function should never increase from one iteration to the next. If it does, look for a bug in your code.

• Did the method manage to find the clusters that you identified in Question 2b? If not, did it help to run the method again with another random initialization
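As an illustration of the objective function above, here is a minimal Python sketch of one common form of k-means (Lloyd's alternating assign/update steps) that records E after every iteration. It is not the assignment's reference solution. The function names `kmeans` and `kmeans_objective`, the `seed` and `n_iters` parameters, and the choice to initialise centers from randomly picked data points are all assumptions made for this example.

```python
import numpy as np

def kmeans_objective(points, centers, labels):
    """E = sum_i sum_{p in C_i} ||p - c_i||^2 for the given assignment."""
    diffs = points - centers[labels]
    return float(np.sum(diffs ** 2))

def kmeans(points, k, n_iters=100, seed=0):
    """Hypothetical helper: returns (centers, labels, objective history)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    history = []
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.sum((points[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        new_labels = np.argmin(dists, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for i in range(k):
            members = points[labels == i]
            if len(members) > 0:  # leave an empty cluster's center where it is
                centers[i] = members.mean(axis=0)
        history.append(kmeans_objective(points, centers, labels))
    return centers, labels, history
```

Calling `kmeans(data, k)` on an n-by-2 NumPy array (for example, the eruption-time and waiting-time columns with the ID column dropped) returns a `history` list that can be plotted against the iteration index. Each entry should be no larger than the one before it, and changing `seed` gives a different random initialization.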
About the document
Uploaded on: Apr 20, 2021
Number of pages: 8