CS 188 Spring 2020 Written Homework 4 Solutions
Q1. [60 pts] Probabilistic Language Modeling

In lecture, you saw an example of supervised learning where we used Naive Bayes for a binary classification problem: predicting whether an email was ham or spam. To do so, we needed a labeled (i.e., ham or spam) dataset of emails. To avoid this requirement for labeled datasets, let's instead explore the area of unsupervised learning, where we don't need a labeled dataset.

In this problem, let's consider the setting of language modeling. Language modeling is a field of Natural Language Processing (NLP) that tries to model the probability of the next word, given the previous words. Here, instead of predicting a binary label of "yes" or "no," we instead need to predict a multiclass label, where the label is the word (from all possible words of the vocabulary) that correctly fills in the blank.

One possible way to model this problem is with Naive Bayes. Recall that in Naive Bayes, the features X1, ..., Xm are assumed to be pairwise independent given the label Y. For this problem, let Y be the word we are trying to predict, and let our features be Xi for i = −n, ..., −1, 1, ..., n, where Xi is the word i places from Y. (For example, X−2 would be the word 2 places in front of Y.) Again, recall that we assume each feature Xi to be independent of the others, given the word Y. For example, in the sequence "Neural networks ____ a lot", X−2 = Neural, X−1 = networks, Y = the blank word (our label), X1 = a, and X2 = lot.

(a) First, let's examine the problem of language modeling with Naive Bayes.

(i) [1 pt] Draw the Bayes Net structure for the Naive Bayes formulation of modeling the middle word of a sequence given two preceding words and two succeeding words. You may think of the example sequence listed above: "Neural networks ____ a lot".

[Figure: Y is the root node, with a directed edge from Y to each of X−2, X−1, X+1, and X+2.]

(ii) [1 pt] Write the joint probability P(X−2, X−1, Y, X1, X2) in terms of the relevant Conditional Probability Tables (CPTs) that describe the Bayes Net.

P(X−2, X−1, Y, X1, X2) = P(Y) P(X−2 | Y) P(X−1 | Y) P(X1 | Y) P(X2 | Y)

(iii) [1 pt] What is the size of the largest CPT involved in calculating the joint probability? Assume a vocabulary size of V, so each variable can take on one of V possible values.

The maximum CPT size is V²: each conditional table P(Xi | Y) has an entry for each of the V values of Xi paired with each of the V values of Y, whereas P(Y) has only V entries.

(iv) [1 pt] Write an expression of what label y tha…
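To make parts (ii) and (iii) concrete, here is a minimal Python sketch of this Naive Bayes fill-in-the-blank model. The toy corpus, the add-one (Laplace) smoothing, and all function names are illustrative assumptions, not part of the original homework. The sketch estimates P(Y) and each CPT P(Xi | Y) from counts, computes the joint from part (ii), and predicts the blank word by taking the label with the highest joint probability over the vocabulary.

```python
from collections import Counter, defaultdict

# Toy training data (an assumption for illustration): 5-word windows
# of the form (x_-2, x_-1, y, x_1, x_2), where y is the middle word.
windows = [
    ("neural", "networks", "help", "a", "lot"),
    ("neural", "networks", "overfit", "a", "lot"),
    ("decision", "trees", "help", "a", "bit"),
]

OFFSETS = [-2, -1, 1, 2]  # feature positions relative to the blank Y

# Count occurrences for P(Y) and for each CPT P(X_i | Y).
label_counts = Counter()
feature_counts = {i: defaultdict(Counter) for i in OFFSETS}
vocab = set()

for x_m2, x_m1, y, x_p1, x_p2 in windows:
    words = {-2: x_m2, -1: x_m1, 1: x_p1, 2: x_p2}
    label_counts[y] += 1
    vocab.update(words.values())
    vocab.add(y)
    for i in OFFSETS:
        feature_counts[i][y][words[i]] += 1

V = len(vocab)   # vocabulary size, as in part (iii)
N = len(windows)

def p_label(y):
    """P(Y = y), with add-one smoothing over the vocabulary."""
    return (label_counts[y] + 1) / (N + V)

def p_feature(i, x, y):
    """P(X_i = x | Y = y), with add-one smoothing.
    Each such table has V x V = V^2 entries, matching part (iii)."""
    return (feature_counts[i][y][x] + 1) / (label_counts[y] + V)

def joint(context, y):
    """P(x_-2, x_-1, y, x_1, x_2) = P(y) * prod_i P(x_i | y), as in part (ii)."""
    p = p_label(y)
    for i in OFFSETS:
        p *= p_feature(i, context[i], y)
    return p

def predict(context):
    """Multiclass prediction: the word y maximizing the joint probability."""
    return max(vocab, key=lambda y: joint(context, y))

# Fill in the blank for "neural networks ____ a lot".
ctx = {-2: "neural", -1: "networks", 1: "a", 2: "lot"}
print(predict(ctx))  # prints "help" on this toy data
```

Note that maximizing the joint probability is equivalent to maximizing the posterior P(Y | X−2, X−1, X1, X2), since the normalizing constant does not depend on y; this is the usual Naive Bayes decision rule.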