Final Exam CMU 10-601: Machine Learning (Spring 2016)
April 27, 2016

Name:
Andrew ID:

START HERE: Instructions
• This exam has 16 pages and 6 questions (page one is this cover page). Check to see if any pages are missing. Enter your name and Andrew ID above.
• You are allowed to use one page of notes, front and back.
• Electronic devices are not acceptable.
• Note that the questions vary in difficulty. Make sure to look over the entire exam before you start, and answer the easier questions first.

Question        Points   Score
1               20
2               16
3               16
4               16
5               16
6               16
Extra Credit    6
Total           106

1 Topics before Midterm [20 pts. + 2 Extra Credit]

Answer each of the following questions with T or F and provide a one-line justification.

(a) [2 pts.] T or F: Naive Bayes can only be used with MLE estimates, and not MAP estimates.

Solution: F. Naive Bayes can also be trained with MAP: for binary Naive Bayes, for example, we can place a $\mathrm{Beta}(2, 2)$ prior on each Bernoulli parameter and compute the corresponding MAP solution, which amounts to add-one smoothing (sketched in code after the questions).

(b) [2 pts.] T or F: Logistic regression cannot be trained with the gradient descent algorithm.

Solution: F. Since the objective function of logistic regression is differentiable, it can be trained using gradient descent (see the sketch after the questions).

(c) [2 pts.] T or F: Assume we have a set of $n$ data points $\{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$, sampled i.i.d. from a linear model $y_i = w^T x_i + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. Then minimizing the squared loss $\sum_{i=1}^{n} (y_i - w^T x_i)^2$ is equivalent to maximizing the log-likelihood.

Solution: T. Under the Gaussian noise model, $\log p(y_i \mid x_i, w, \sigma^2) = -\frac{(y_i - w^T x_i)^2}{2\sigma^2} - \log(\sigma \sqrt{2\pi})$, and the second term does not depend on $w$. Hence
$$\min_w \sum_{i=1}^{n} (y_i - w^T x_i)^2 \iff \max_w -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - w^T x_i)^2 \iff \max_w \sum_{i=1}^{n} \log p(y_i \mid x_i, w, \sigma^2).$$

(d) [2 pts.] T or F: Leaving out one training data point will always change the decision boundary obtained by the perceptron.

Solution: F. The perceptron update rule fires only on instances that are misclassified when they are visited. So if we remove an instance that never triggers an update during training, the sequence of updates, and hence the final hyperplane, is unchanged (see the sketch after the questions).

(e) [2 pts.] T or F: The function $K(x, z) = -2 x^T z$ is a valid kernel function.

Solution: F. A kernel function must be positive semi-definite. For this example, $K(x, x) = -2 x^T x < 0$ whenever $x \neq 0$ (see the Gram-matrix check after the questions).

(f) [2 pts.] T or F: The function $K(x, z) = x^T z + (x^T z)^2$ is a valid kernel function.

Solution: T. Both $x^T z$ and $(x^T z)^2$ are valid kernel functions, and the sum of two valid kernels is still a valid kernel (see the Gram-matrix check after the questions).

(g) [2 pts.] T or F: A data set containing 8 points with binary labels is shown in Fig. 1. The subset of data points with a large circle around them is sufficient to determine the max-margin classifier in this case.

Solution: T. The support vectors are sufficient to determine the max-margin classifier. In this example, the support vectors are $\{(1, 1), (1, 2), (3, 1), (3, 2)\}$.

(h) [2 pts.] T or F: The VC dimension of a finite concept class $H$ is upper bounded by $\lceil \log_2 |H| \rceil$.

Solution: T. For any finite set $S$, if $H$ shatters $S$, then $H$ must contain at least $2^{|S|}$ hypotheses, which implies $|S| \leq \log_2 |H| \leq \lceil \log_2 |H| \rceil$.
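To make the MAP claim in (a) concrete, here is a minimal Python sketch of MAP vs. MLE estimation for one Bernoulli parameter in binary Naive Bayes. The counts and the Beta(2, 2) prior are illustrative assumptions, not values from the exam.

    # Sketch for (a): MAP vs. MLE for one Bernoulli parameter in binary
    # Naive Bayes. Counts and the Beta(2, 2) prior are illustrative
    # assumptions, not values from the exam.
    n1, n0 = 3, 7                    # feature active in 3 of 10 examples of a class
    mle = n1 / (n1 + n0)             # maximum likelihood estimate: 0.30
    alpha, beta = 2, 2               # Beta(2, 2) prior on the parameter
    # Posterior is Beta(alpha + n1, beta + n0); its mode is the MAP estimate.
    map_est = (n1 + alpha - 1) / (n1 + n0 + alpha + beta - 2)   # (3 + 1) / (10 + 2)
    print(mle, map_est)              # 0.3 vs. 0.333... (add-one smoothing)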
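For (b), a minimal sketch of training logistic regression with batch gradient descent on the differentiable negative log-likelihood; the synthetic data, learning rate, and iteration count are illustrative assumptions.

    import numpy as np

    # Sketch for (b): logistic regression trained by batch gradient descent.
    # Synthetic data, learning rate, and iteration count are illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                       # 100 points, 2 features
    y = (X @ np.array([1.5, -2.0]) > 0).astype(float)   # linearly separable labels

    w = np.zeros(2)
    lr = 0.1
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        grad = X.T @ (p - y) / len(y)        # gradient of average negative log-likelihood
        w -= lr * grad                       # descend the differentiable objective

    print(w)   # points roughly along the true direction (1.5, -2.0)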
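For (d), a sketch showing that the perceptron's weight vector changes only on mistakes, so dropping a point that never triggers an update leaves the boundary unchanged. The data set and visiting order are illustrative assumptions.

    import numpy as np

    # Sketch for (d): the perceptron updates only on misclassified points.
    def perceptron(X, y, epochs=10):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:   # update only on a mistake
                    w += yi * xi
        return w

    X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0], [5.0, 5.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

    w_full = perceptron(X, y)
    w_drop = perceptron(X[:-1], y[:-1])   # drop (5, 5): it never triggers an update
    print(np.allclose(w_full, w_drop))    # True -- same final hyperplane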
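For (e) and (f), an empirical Gram-matrix check: a valid kernel must produce a positive semi-definite Gram matrix on any finite point set, so a single negative eigenvalue certifies invalidity, while non-negative eigenvalues on sampled points are merely consistent with validity. The sample points are illustrative assumptions.

    import numpy as np

    # Sketch for (e) and (f): eigenvalues of the Gram matrix on random points.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))      # 6 illustrative points in R^3
    G = X @ X.T                      # Gram matrix of the linear kernel x^T z

    # (e) K(x, z) = -2 x^T z: a negative eigenvalue certifies it is not a kernel.
    print(np.linalg.eigvalsh(-2 * G).min())      # strictly negative

    # (f) K(x, z) = x^T z + (x^T z)^2: elementwise G + G**2 is its Gram matrix;
    # it is PSD (sums and Schur products of PSD matrices are PSD).
    print(np.linalg.eigvalsh(G + G**2).min())    # >= 0 up to round-off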