CS 412: Introduction to Data Mining (Spring ’21), University of Illinois
Take-Home Midterm (Due Tuesday, March 23, 10:00 am)

General Instructions

• You must answer the questions yourself; you may not consult with other students in the class. This is an open-book exam, so you may use the textbook and the material shared in class (e.g., slides and lectures).
• The take-home midterm is due at 10 am on Tuesday, March 23. We will use Compass to collect the assignments; please submit your answers via Compass (http://compass2g.illinois.edu). Contact the TAs if you have technical difficulties submitting the assignment. We will NOT accept late submissions.
• Your answers must be typeset and submitted as a PDF. You may not submit a handwritten, scanned version of your midterm.
• You DO NOT have to submit code for any of the questions.
• You will not get full credit if you give only a final result. Show the necessary details, calculation steps, and explanations as appropriate.
• If you have clarification questions, you can use Slack or Campuswire. However, since the midterm must be submitted within 24 hours, please do your best to answer the questions based on your own understanding, in case responses are delayed.

1. (18 points) This question considers summarization and visualization of probability distributions:
(a) (3 points) Describe what a five-number summary of a distribution is.
(b) (3 points) Describe what boxplots are and explain how boxplots incorporate the five-number summary.
(c) (3 points) Can two different distributions have exactly the same boxplot? Clearly explain your answer.
(d) (3 points) Describe what quantile plots are.
(e) (3 points) Describe what quantile-quantile plots are.
(f) (3 points) How is a quantile-quantile plot different from a quantile plot? Clearly explain.

2. (22 points) Table 1 summarizes customers’ purchase history of diapers and beer. In particular, for a total of 1000 customers, the table shows how many bought both Beer and Diapers, how many bought Beer but not Diapers, and so on. For this problem, we treat both ‘Buy Beer’ and ‘Buy Diaper’ as binary attributes.

                  Buy Diaper    Not Buy Diaper
    Buy Beer          100            300
    Not Buy Beer      400            200

Table 1: Contingency table for Beer and Diaper sales.

(a) (3 points) Under the null hypothesis that ‘Buy Beer’ and ‘Buy Diaper’ are independent, what is the expected count for ‘Buy Beer’ and ‘Buy Diaper’?
(b) (3 points) Under the same null hypothesis, what is the expected count for ‘Buy Beer’ and ‘Not Buy Diaper’?
(c) (4 points) What is the χ² statistic for the contingency table? Show the steps of your calculation.
(d) (4 points) At a significance level of α = 0.05, are the two variables ‘Buy Beer’ and ‘Buy Diaper’ independent? Explain your answer.
(e) (4 points) Consider an updated contingency table in which the entry for ‘Not Buy Beer’ and ‘Not Buy Diaper’ is 20,000 instead of 200, and all other entries are unchanged. What is the χ² statistic for this updated table? Show the steps of your calculation.
(f) (4 points) For the updated contingency table, at a significance level of α = 0.05, are the two variables ‘Buy Beer’ and ‘Buy Diaper’ independent? Explain your answer.

3. (24 points) This question considers frequent pattern mining and association rule mining.
(a) (12 points) A transaction database (Table 2) has 5 transactions. We consider frequent pattern and association rule mining with (relative) minimum support min_sup = 0.6 and (relative) minimum confidence min_conf = 0.6.
i. (6 points) What is the frequent k-itemset for the largest k? Explain your answer. If there is more than one, it is sufficient to mention (and explain) only one.
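Parts (a) and (b) of Question 1 concern the five-number summary and boxplots. The sketch below computes the summary (minimum, Q1, median, Q3, maximum) and the parts a boxplot draws from it: the box spans Q1 to Q3, and the whiskers reach the most extreme observations within 1.5 × IQR of the quartiles, with anything beyond reported as an outlier. It assumes the "inclusive" quartile rule of Python's `statistics.quantiles`; textbooks and plotting libraries use several slightly different quartile conventions, so small samples may give different values. The function names are hypothetical.

```python
import statistics
from typing import NamedTuple, Sequence


class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float


def five_number_summary(data: Sequence[float]) -> FiveNumberSummary:
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    xs = sorted(data)
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return FiveNumberSummary(xs[0], q1, med, q3, xs[-1])


def boxplot_parts(data: Sequence[float]):
    """Return the summary, the whisker ends, and the points flagged as outliers.

    Whiskers end at the most extreme observations lying within 1.5 * IQR of
    the box; observations beyond those fences are listed as outliers.
    """
    s = five_number_summary(data)
    iqr = s.q3 - s.q1
    low_fence, high_fence = s.q1 - 1.5 * iqr, s.q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return s, (min(inside), max(inside)), outliers


if __name__ == "__main__":
    summary, whiskers, outliers = boxplot_parts([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
    print(summary)   # min=1, Q1=3.25, median=5.5, Q3=7.75, max=40
    print(whiskers)  # (1, 9)
    print(outliers)  # [40]
```

In the example, the maximum of the five-number summary is 40, but the upper whisker stops at 9 because 40 lies beyond Q3 + 1.5 × IQR = 14.5.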
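Parts (d) through (f) of Question 1 contrast quantile plots with quantile-quantile (Q-Q) plots. A minimal sketch, assuming the f-value convention f_i = (i − 0.5)/N (used, for example, in Han, Kamber, and Pei's Data Mining: Concepts and Techniques; other texts use i/(N+1) or similar): a quantile plot pairs each sorted observation of a single sample with its f-value, whereas a Q-Q plot pairs quantiles of two samples at common f-values. When the samples differ in size, this sketch takes the f-values of the smaller sample and linearly interpolates the larger one, which is one common choice rather than the only one. It returns coordinate pairs and leaves the actual plotting out; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def f_values(n: int) -> List[float]:
    """f_i = (i - 0.5) / n for i = 1..n: roughly the fraction of data <= x_i."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def quantile_plot_points(data: Sequence[float]) -> List[Tuple[float, float]]:
    """Quantile plot: each sorted observation x_i paired with its f-value."""
    xs = sorted(data)
    return list(zip(f_values(len(xs)), xs))


def quantile_at(xs_sorted: Sequence[float], f: float) -> float:
    """Invert f = (i - 0.5)/n, interpolating linearly between neighbours."""
    n = len(xs_sorted)
    pos = f * n - 0.5  # 0-based fractional index into the sorted data
    if pos <= 0:
        return xs_sorted[0]
    if pos >= n - 1:
        return xs_sorted[-1]
    lo = int(pos)
    frac = pos - lo
    return xs_sorted[lo] * (1 - frac) + xs_sorted[lo + 1] * frac


def qq_plot_points(a: Sequence[float], b: Sequence[float]) -> List[Tuple[float, float]]:
    """Q-Q plot: quantiles of `a` against quantiles of `b` at shared f-values."""
    xa, xb = sorted(a), sorted(b)
    fs = f_values(min(len(xa), len(xb)))
    return [(quantile_at(xa, f), quantile_at(xb, f)) for f in fs]


if __name__ == "__main__":
    print(quantile_plot_points([30, 10, 20, 40]))
    print(qq_plot_points([4, 1, 3, 2], [30, 10]))
```

Note the difference in axes: a quantile plot has f-values on one axis and data values on the other, while both axes of a Q-Q plot carry data values, one sample per axis.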
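Question 2 relies on expected counts under independence and Pearson's χ² statistic. For observed counts o_ij, the expected count is e_ij = (row total_i × column total_j) / N, and χ² = Σ (o_ij − e_ij)² / e_ij, summed over all cells (without Yates' continuity correction). The sketch below implements these two formulas in plain Python. The usage example enters Table 1 with rows Buy Beer / Not Buy Beer and columns Buy Diaper / Not Buy Diaper; that row/column assignment is an assumption about the original table's layout, although χ² itself is unchanged by transposing the table. The constant 3.841 is the standard χ² critical value for (2 − 1)(2 − 1) = 1 degree of freedom at α = 0.05. Function names are hypothetical.

```python
from typing import List, Sequence

# Critical value of the chi-square distribution with 1 degree of freedom
# at significance level 0.05 (from standard chi-square tables).
CHI2_CRIT_DF1_ALPHA_05 = 3.841


def expected_counts(table: Sequence[Sequence[float]]) -> List[List[float]]:
    """e_ij = (row_i total) * (column_j total) / N under independence."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson's statistic: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


if __name__ == "__main__":
    # Rows: Buy Beer, Not Buy Beer; columns: Buy Diaper, Not Buy Diaper (assumed layout).
    original = [[100, 300], [400, 200]]
    updated = [[100, 300], [400, 20000]]  # (Not Buy Beer, Not Buy Diaper) = 20,000
    for name, table in [("original", original), ("updated", updated)]:
        stat = chi_square(table)
        print(name, expected_counts(table), round(stat, 2),
              "exceeds critical value:", stat > CHI2_CRIT_DF1_ALPHA_05)
```

Running the file prints the expected counts, the statistic, and the comparison with the critical value for both tables, which shows how much χ² changes when only the (Not Buy Beer, Not Buy Diaper) count changes.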
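Question 3 uses relative thresholds: an itemset is frequent if its support (the fraction of transactions containing it) is at least min_sup, which for 5 transactions and min_sup = 0.6 means appearing in at least 3 of them; a rule X ⇒ Y is reported if support(X ∪ Y) / support(X) ≥ min_conf. Table 2 is not reproduced in this preview, so the sketch below runs on a hypothetical five-transaction database. It is a small level-wise (Apriori-style) search intended for hand-checking answers, not an optimized implementation; function names and data are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def frequent_itemsets(
    transactions: Sequence[Set[str]], min_sup: float
) -> Dict[FrozenSet[str], int]:
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    n_tx = len(transactions)
    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent: Dict[FrozenSet[str], int] = {}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        # Compare relative support directly to avoid rounding issues in min_sup * n_tx.
        current = {c: cnt for c, cnt in counts.items() if cnt / n_tx >= min_sup}
        frequent.update(current)
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def association_rules(
    frequent: Dict[FrozenSet[str], int], min_conf: float
) -> List[Tuple[FrozenSet[str], FrozenSet[str], float]]:
    """All rules lhs => rhs with confidence support(lhs | rhs) / support(lhs) >= min_conf."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = count / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


if __name__ == "__main__":
    # Hypothetical transactions for illustration only; NOT Table 2 from the exam.
    db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B", "D"}]
    freq = frequent_itemsets(db, min_sup=0.6)
    for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count / len(db))
    for lhs, rhs, conf in association_rules(freq, min_conf=0.6):
        print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

On this toy database the largest frequent itemsets are {A, B} and {A, C} (support 0.6 each); {A, B, C} is pruned without counting because its subset {B, C} appears in only two transactions.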