CS 188 Spring 2020 Written Homework 4 Solutions
Q1. [60 pts] Probabilistic Language Modeling

In lecture, you saw an example of supervised learning where we used Naive Bayes for a binary classification problem: predicting whether an email was ham or spam. To do so, we needed a labeled (i.e., ham or spam) dataset of emails. To avoid this requirement for labeled datasets, let's instead explore the area of unsupervised learning, where we don't need a labeled dataset.

In this problem, let's consider the setting of language modeling. Language modeling is a field of Natural Language Processing (NLP) that tries to model the probability of the next word, given the previous words. Here, instead of predicting a binary label of "yes" or "no," we instead need to predict a multiclass label, where the label is the word (from all possible words of the vocabulary) that correctly fills in the blank.

One possible way to model this problem is with Naive Bayes. Recall that in Naive Bayes, the features X1, ..., Xm are assumed to be pairwise independent given the label Y. For this problem, let Y be the word we are trying to predict, and let our features be Xi for i = −n, ..., −1, 1, ..., n, where Xi is the word i places from Y. (For example, X−2 would be the word 2 places in front of Y.) Again, recall that we assume each feature Xi to be independent of the others, given the word Y. For example, in the sequence "Neural networks ____ a lot", X−2 = Neural, X−1 = networks, Y = the blank word (our label), X1 = a, and X2 = lot.

(a) First, let's examine the problem of language modeling with Naive Bayes.

(i) [1 pt] Draw the Bayes Net structure for the Naive Bayes formulation of modeling the middle word of a sequence given two preceding words and two succeeding words. You may think of the example sequence listed above: "Neural networks ____ a lot".

[Figure: Y is the root node, with a directed edge from Y to each of X−2, X−1, X+1, and X+2.]

(ii) [1 pt] Write the joint probability P(X−2, X−1, Y, X1, X2) in terms of the relevant Conditional Probability Tables (CPTs) that describe the Bayes Net.

P(X−2, X−1, Y, X1, X2) = P(Y) P(X−2 | Y) P(X−1 | Y) P(X1 | Y) P(X2 | Y)

(iii) [1 pt] What is the size of the largest CPT involved in calculating the joint probability? Assume a vocabulary size of V, so each variable can take on one of V possible values.

The maximum CPT size is V²: each conditional table P(Xi | Y) has an entry for each of the V values of Xi paired with each of the V values of Y, whereas P(Y) has only V entries.

(iv) [1 pt] Write an expression of what label y tha…
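To make parts (ii) and (iii) concrete, here is a minimal Python sketch of this Naive Bayes fill-in-the-blank model. The toy corpus, the add-one (Laplace) smoothing, and all function names are illustrative assumptions, not part of the original homework. The sketch estimates P(Y) and each CPT P(Xi | Y) from counts, computes the joint from part (ii), and predicts the blank word by taking the label with the highest joint probability over the vocabulary.

```python
from collections import Counter, defaultdict

# Toy training data (an assumption for illustration): 5-word windows
# of the form (x_-2, x_-1, y, x_1, x_2), where y is the middle word.
windows = [
    ("neural", "networks", "help", "a", "lot"),
    ("neural", "networks", "overfit", "a", "lot"),
    ("decision", "trees", "help", "a", "bit"),
]

OFFSETS = [-2, -1, 1, 2]  # feature positions relative to the blank Y

# Count occurrences for P(Y) and for each CPT P(X_i | Y).
label_counts = Counter()
feature_counts = {i: defaultdict(Counter) for i in OFFSETS}
vocab = set()

for x_m2, x_m1, y, x_p1, x_p2 in windows:
    words = {-2: x_m2, -1: x_m1, 1: x_p1, 2: x_p2}
    label_counts[y] += 1
    vocab.update(words.values())
    vocab.add(y)
    for i in OFFSETS:
        feature_counts[i][y][words[i]] += 1

V = len(vocab)   # vocabulary size, as in part (iii)
N = len(windows)

def p_label(y):
    """P(Y = y), with add-one smoothing over the vocabulary."""
    return (label_counts[y] + 1) / (N + V)

def p_feature(i, x, y):
    """P(X_i = x | Y = y), with add-one smoothing.
    Each such table has V x V = V^2 entries, matching part (iii)."""
    return (feature_counts[i][y][x] + 1) / (label_counts[y] + V)

def joint(context, y):
    """P(x_-2, x_-1, y, x_1, x_2) = P(y) * prod_i P(x_i | y), as in part (ii)."""
    p = p_label(y)
    for i in OFFSETS:
        p *= p_feature(i, context[i], y)
    return p

def predict(context):
    """Multiclass prediction: the word y maximizing the joint probability."""
    return max(vocab, key=lambda y: joint(context, y))

# Fill in the blank for "neural networks ____ a lot".
ctx = {-2: "neural", -1: "networks", 1: "a", 2: "lot"}
print(predict(ctx))  # prints "help" on this toy data
```

Note that maximizing the joint probability is equivalent to maximizing the posterior P(Y | X−2, X−1, X1, X2), since the normalizing constant does not depend on y; this is the usual Naive Bayes decision rule.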