Computer Science > EXAM > CS221 Midterm exam Spring 2019 ( all solutions are 100% correct ) (All)

Also available in bundle ( Click Here )

CS221 Midterm exam Spring 2019 ( all solutions are 100% correct )

Document Content and Description Below

CS221 Exam Solutions CS221 Spring 2019 Name: | {z } by writing my name I agree to abide by the honor code SUNet ID: Read all of the following information before starting the exam: • This test has 3 problems printed on 32 pages and is worth 180 points total. It is your responsibility to make sure that you have all of the pages. • Only the printed (top) side of each page will be scanned so write all your answers on that side of the paper. • Keep your answers precise and concise. We may award partial credit so show all your work clearly and in order. • Don’t spend too much time on one problem. Read through all the problems carefully and do the easier ones first. Try to understand the problems intuitively; it really helps to draw a picture. • You cannot use any external aids except one double-sided 81 2” x 11” page of notes. • Good luck! Problem Part Max Score Score 1 a 18 b 15 c 18 d 9 2 a 13 b 10 c 11 d 16 3 a 20 b 19 c 15 d 16 Total Score: + + = 11. Machine Learning (60 points) In an alternate universe, the course staff of CS 221 does not release the mechanism by which final grades are computed. As an enterprising student, you decide to procrastinate by trying to predict final course grades. You have obtained access to data points (x; y) for each person, where x 2 R6 gives 6 pieces of information about each student’s grades in previous classes and progress so far in CS 221 (you don’t need to know what those are). You wish to design a model that predicts the student’s final grade: A, B, C, D, or F. (a) Binary classification. (18 points, 6 points each) You start by trying to solve a simpler problem: only using the data x(i) = (x( 1i); x(2i))T 2 R2 for each person i in the historical database, you wish to predict whether student i passes the class, label z. Thus, z = 1 when the student’s final grade is in fA; B; Cg and they pass, and z = −1 when the student receives a fD; Fg and thus NC (no credit) for the class. You have N training examples, x(1); : : : ; x(N) (viewing them in R2) with associated pass/fail labels z(1); : : : ; z(N) 2 f−1; 1g. (i) Before applying a machine learning algorithm, you plot x(1); : : : x(N) and observe the following, where circle dots have label z = 1 and star dots have label z = −1: Based on this plot, give an expression for one additional polynomial feature φ(x(i)), you would add to the vector x(i) to train on, that would likely improve the performance of a machine learning model that used a linear predictor, so x(i) := (x( 1i); x( 2i); φ(x(i)))T. Solution x2 1=49 + x2 2=9 2(ii) After playing around with the data, you realize you have many poorly classified points where z and your prediction z^ = θT x are far from each other. Thus, you wish to penalize large losses linearly, but wish to use a squared loss for non-outlier data points. Given this background, construct a continuous, differentiable, loss function ‘1 in the piecewise format below: ‘1(z; zb) = (? ? j otherwise z − zbj < 1 What loss function L(θ) (with respect to the penalty ‘1) do you minimize to find the optimal predictor θ ? Solution L(θ) = PN i=1 ‘1(z(i); θ>x(i)) where ‘1(z; zb) = (j12z(z−−zbj − zb)2 1 2 j otherwise z − zbj < 1 3(iii) With a step size of η = 0:1 and θ initialized to all zeros, manually compute one step of a training update using stochastic gradient descent if you randomly select data point x(1) = (0:9; 0:5); z(1) = 1 (you do not need to use the feature from (i)). Solution We compute the gradient of ‘1(z(j); θ>x(j)) with respect to θ as r‘1(z(j); θ>x(j)) = (− −x x( (j j) ) · · ( sign z(j)(−z(jθ)>−x(θjT))x(j)) j otherwise z(j) − θ>x(j)j < 1 : Then we update θ := θ − η · r‘1(z(j); θ>x(j)), so θ = (0:09; 0:05)

[Show More]

Last updated: 3 years ago

Preview 1 out of 39 pages

Buy Now

Instant download