Final Exam CMU 10-601: Machine Learning (Spring 2016)
April 27, 2016

Name:
Andrew ID:

START HERE: Instructions
• This exam has 16 pages and 6 questions (page one is this cover page). Check to see if any pages are missing. Enter your name and Andrew ID above.
• You are allowed to use one page of notes, front and back.
• Electronic devices are not acceptable.
• Note that the questions vary in difficulty. Make sure to look over the entire exam before you start, and answer the easier questions first.

Question        Points   Score
1               20
2               16
3               16
4               16
5               16
6               16
Extra Credit    6
Total           106

1 Topics before Midterm [20 pts. + 2 Extra Credit]

Answer each of the following questions with T or F and provide a one-line justification.

(a) [2 pts.] T or F: Naive Bayes can only be used with MLE estimates, and not MAP estimates.

Solution: F. Naive Bayes can also be trained with MAP: for binary Naive Bayes, for example, we can place a $\mathrm{Beta}(2, 2)$ prior on each Bernoulli parameter and compute the corresponding MAP solution, which amounts to add-one smoothing (sketched in code after the questions).

(b) [2 pts.] T or F: Logistic regression cannot be trained with the gradient descent algorithm.

Solution: F. Since the objective function of logistic regression is differentiable, it can be trained using gradient descent (see the sketch after the questions).

(c) [2 pts.] T or F: Assume we have a set of $n$ data points $\{(x_i, y_i)\}_{i=1}^{n}$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$, sampled i.i.d. from a linear model $y_i = w^T x_i + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. Then minimizing the squared loss $\sum_{i=1}^{n} (y_i - w^T x_i)^2$ is equivalent to maximizing the log-likelihood.

Solution: T. Under the Gaussian noise model, $\log p(y_i \mid x_i, w, \sigma^2) = -\frac{(y_i - w^T x_i)^2}{2\sigma^2} - \log(\sigma \sqrt{2\pi})$, and the second term does not depend on $w$. Hence
$$\min_w \sum_{i=1}^{n} (y_i - w^T x_i)^2 \iff \max_w -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - w^T x_i)^2 \iff \max_w \sum_{i=1}^{n} \log p(y_i \mid x_i, w, \sigma^2).$$

(d) [2 pts.] T or F: Leaving out one training data point will always change the decision boundary obtained by the perceptron.

Solution: F. The perceptron update rule fires only on instances that are misclassified when they are visited. So if we remove an instance that never triggers an update during training, the sequence of updates, and hence the final hyperplane, is unchanged (see the sketch after the questions).

(e) [2 pts.] T or F: The function $K(x, z) = -2 x^T z$ is a valid kernel function.

Solution: F. A kernel function must be positive semi-definite. For this example, $K(x, x) = -2 x^T x < 0$ whenever $x \neq 0$ (see the Gram-matrix check after the questions).

(f) [2 pts.] T or F: The function $K(x, z) = x^T z + (x^T z)^2$ is a valid kernel function.

Solution: T. Both $x^T z$ and $(x^T z)^2$ are valid kernel functions, and the sum of two valid kernels is still a valid kernel (see the Gram-matrix check after the questions).

(g) [2 pts.] T or F: A data set containing 8 points with binary labels is shown in Fig. 1. The subset of data points with a large circle around them is sufficient to determine the max-margin classifier in this case.

Solution: T. The support vectors are sufficient to determine the max-margin classifier. In this example, the support vectors are $\{(1, 1), (1, 2), (3, 1), (3, 2)\}$.

(h) [2 pts.] T or F: The VC dimension of a finite concept class $H$ is upper bounded by $\lceil \log_2 |H| \rceil$.

Solution: T. For any finite set $S$, if $H$ shatters $S$, then $H$ must contain at least $2^{|S|}$ hypotheses, which implies $|S| \leq \log_2 |H| \leq \lceil \log_2 |H| \rceil$.
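To make the MAP claim in (a) concrete, here is a minimal Python sketch of MAP vs. MLE estimation for one Bernoulli parameter in binary Naive Bayes. The counts and the Beta(2, 2) prior are illustrative assumptions, not values from the exam.

    # Sketch for (a): MAP vs. MLE for one Bernoulli parameter in binary
    # Naive Bayes. Counts and the Beta(2, 2) prior are illustrative
    # assumptions, not values from the exam.
    n1, n0 = 3, 7                    # feature active in 3 of 10 examples of a class
    mle = n1 / (n1 + n0)             # maximum likelihood estimate: 0.30
    alpha, beta = 2, 2               # Beta(2, 2) prior on the parameter
    # Posterior is Beta(alpha + n1, beta + n0); its mode is the MAP estimate.
    map_est = (n1 + alpha - 1) / (n1 + n0 + alpha + beta - 2)   # (3 + 1) / (10 + 2)
    print(mle, map_est)              # 0.3 vs. 0.333... (add-one smoothing)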
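For (b), a minimal sketch of training logistic regression with batch gradient descent on the differentiable negative log-likelihood; the synthetic data, learning rate, and iteration count are illustrative assumptions.

    import numpy as np

    # Sketch for (b): logistic regression trained by batch gradient descent.
    # Synthetic data, learning rate, and iteration count are illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))                       # 100 points, 2 features
    y = (X @ np.array([1.5, -2.0]) > 0).astype(float)   # linearly separable labels

    w = np.zeros(2)
    lr = 0.1
    for _ in range(500):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # sigmoid predictions
        grad = X.T @ (p - y) / len(y)        # gradient of average negative log-likelihood
        w -= lr * grad                       # descend the differentiable objective

    print(w)   # points roughly along the true direction (1.5, -2.0)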
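For (d), a sketch showing that the perceptron's weight vector changes only on mistakes, so dropping a point that never triggers an update leaves the boundary unchanged. The data set and visiting order are illustrative assumptions.

    import numpy as np

    # Sketch for (d): the perceptron updates only on misclassified points.
    def perceptron(X, y, epochs=10):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:   # update only on a mistake
                    w += yi * xi
        return w

    X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0], [5.0, 5.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

    w_full = perceptron(X, y)
    w_drop = perceptron(X[:-1], y[:-1])   # drop (5, 5): it never triggers an update
    print(np.allclose(w_full, w_drop))    # True -- same final hyperplane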
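For (e) and (f), an empirical Gram-matrix check: a valid kernel must produce a positive semi-definite Gram matrix on any finite point set, so a single negative eigenvalue certifies invalidity, while non-negative eigenvalues on sampled points are merely consistent with validity. The sample points are illustrative assumptions.

    import numpy as np

    # Sketch for (e) and (f): eigenvalues of the Gram matrix on random points.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))      # 6 illustrative points in R^3
    G = X @ X.T                      # Gram matrix of the linear kernel x^T z

    # (e) K(x, z) = -2 x^T z: a negative eigenvalue certifies it is not a kernel.
    print(np.linalg.eigvalsh(-2 * G).min())      # strictly negative

    # (f) K(x, z) = x^T z + (x^T z)^2: elementwise G + G**2 is its Gram matrix;
    # it is PSD (sums and Schur products of PSD matrices are PSD).
    print(np.linalg.eigvalsh(G + G**2).min())    # >= 0 up to round-off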