CSC550Z: Data Mining & Distributed Computing (Summer 2019) Week 1 Assignment Solution (100 points).

Document Content and Description Below

2.1 Assuming that data mining techniques are to be used in the following cases, identify whether the task required is supervised or unsupervised learning. (30 points)

2.1.a Deciding whether to issue a loan to an applicant based on demographic and financial data (with reference to a database of similar data on prior customers).
2.1.b In an online bookstore, making recommendations to customers concerning additional items to buy based on the buying patterns in prior transactions.
2.1.c Identifying a network data packet as dangerous (virus, hacker attack) based on comparisons to other packets whose threat status is known.
2.1.d Identifying segments of similar customers.
2.1.e Predicting whether a company will go bankrupt based on comparing its financial data to those of similar bankrupt and nonbankrupt firms.
2.1.f Estimating the repair time required for an aircraft based on a trouble ticket.
2.1.g Automatic sorting of mail by zip code scanning.
2.1.h Printing of customer discount coupons at the conclusion of a grocery store checkout based on what you just bought and what others have bought recently.

2.2 Describe the difference in roles assumed by the validation partition and the test partition. (10 points)

2.3 Consider the sample from a database of credit applications shown in Figure 2.13. Comment on the likelihood that it was sampled randomly, and whether it is likely to be a useful sample. (10 points)
Answer:

2.5 Using the concept of overfitting, explain why, when a model is fit to training data, zero error with those data is not necessarily good. (10 points)
Answer:

2.8 Normalize the data in Table 2.3. (30 points)
Answer: A measurement is normalized by subtracting the average from it and dividing the difference by the standard deviation.

2.10 Two models are applied to a dataset that has been partitioned. Model A is considerably more accurate than model B on the training data, but slightly less accurate than model B on the validation data. Which one are you more likely to consider for final deployment? (10 points)
Answer:
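For exercise 2.8, the normalization rule stated in the answer (subtract the average, divide by the standard deviation) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_scores` and the sample values are hypothetical and are not the contents of Table 2.3, and the sample standard deviation (n - 1 denominator) is assumed because the listing does not say which one the course expects.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (x - average) / standard deviation for each x in values."""
    mu = mean(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    # Use statistics.pstdev instead if the population form is required.
    sigma = stdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical measurements, NOT the data from Table 2.3.
ages = [25, 56, 65, 32, 41, 49]
normalized_ages = z_scores(ages)

for raw, z in zip(ages, normalized_ages):
    print(f"{raw:>3} -> {z:+.3f}")
```

After normalization the values have mean 0 and standard deviation 1, which puts variables measured on different scales on a common footing.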

Last updated: 2 years ago


Document information

Uploaded on: Sep 22, 2020
Number of pages: 3
Seller: QuizMaster
