University of Maryland, Baltimore County - IS 733assign4

Document Content and Description Below

Student id: QH45693
IS 733 - Data Mining

1(a) Briefly describe the boosting algorithm. State why it may improve classification accuracy.

1(b) What is the bias-variance trade-off for machine learning methods? Explain.

1(c) Briefly describe the bagging procedure. Discuss why it may improve the accuracy of decision tree classifiers, in terms of the bias-variance trade-off.

2(a) Using any software tool or programming language of your choice, create and print out a scatter plot of this dataset, eruption time versus waiting time. Note that for many tools, before the data can be loaded you will need to make a copy of the file and delete the header information. You will need to ignore the first column, which contains ID numbers for each instance.

2(b) How many clusters do you see based on your scatter plot? For the purposes of this question, a cluster is a "blob" of many data points that are close together, with regions of fewer data points between it and other "blobs"/clusters.

2(c) Describe the steps of a hierarchical clustering algorithm. Based on your scatter plot, would this method be appropriate for this dataset?

Question 2b. I recommend using a high-level data-friendly programming language such as MATLAB, R, or Python. Be sure to ignore the first column, which contains instance ID numbers. Report the following items:

• Your source code for the k-means algorithm. You do not need to report code for loading the data, or for drawing a scatter plot. You need to implement the algorithm from scratch.

• A scatter plot of your final clustering, with the data points in each cluster color-coded, or plotted with different symbols. Include the cluster centers in your plot.

• A plot of the k-means objective function versus iterations of the algorithm. Recall that the objective function is

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - c_i \rVert^2,$$

where $k$ is the number of clusters, $C_i$ is the set of instances assigned to the $i$th cluster, and $c_i$ is the cluster center for the $i$th cluster. Note that the objective function should never increase from one iteration to the next. If it does, look for a bug in your code.

• Did the method manage to find the clusters that you identified in Question 2b? If not, did it help to run the method again with another random initialization
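The objective function described above can be checked on a small example. The following is a minimal sketch in plain Python, not a reference solution: the helper names (`kmeans_objective`, `assign`, `update`), the toy points, and the starting centers are all hypothetical and chosen only for illustration. It evaluates E after each assign-then-update pass so that the "never increases" property can be verified.

```python
def sq_dist(p, c):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pj - cj) ** 2 for pj, cj in zip(p, c))


def kmeans_objective(points, centers, labels):
    """E: sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[i]) for p, i in zip(points, labels))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            for p in points]


def update(points, labels, centers):
    """Move each center to the mean of its members (unchanged if it has none)."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(p[d] for p in members) / len(members)
                                 for d in range(len(old))))
    return new_centers


# Toy data, made up for illustration (NOT the assignment's dataset).
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
centers = [(0.0, 0.0), (1.0, 1.0)]  # arbitrary starting centers (assumption)

history = []
for _ in range(5):
    labels = assign(points, centers)
    centers = update(points, labels, centers)
    history.append(kmeans_objective(points, centers, labels))

print(history)
```

Plotting `history` against the iteration index gives the requested objective-versus-iterations curve. A value that rises between consecutive entries would point to a bug in the assignment or update step.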

Last updated: 2 years ago


Document information


About the document


Uploaded On

Apr 20, 2021

Number of pages

8

Seller


Muchiri
