
Stanford University CS 234: midterm_2017_sol

Document Content and Description Below

1 Markov Decision Process (10 Points)

You are in a Las Vegas casino! You have $20 for this casino venture and will play until you lose it all or as soon as you double your money (i.e., increase your holding to at least $40). You can choose to play two slot machines: 1) slot machine A costs $10 to play and will return $20 with probability 0.05 and $0 otherwise; and 2) slot machine B costs $20 to play and will return $30 with probability 0.01 and $0 otherwise. Until you are done, you will choose to play machine A or machine B on each turn. In the space below, provide an MDP that captures the above description. Describe the state space, action space, rewards, and transition probabilities. Assume the discount factor γ = 1. You can express your solution using equations instead of a table, or a diagram.

2 Bellman Operator with Function Approximation (12 Points)

Consider an MDP M = (S, A, R, P, γ) with finite discrete state space S and action space A. Assume M has dynamics model P(s'|s, a) for all s, s' ∈ S and a ∈ A, and reward model R(s, a) for all s ∈ S and a ∈ A. Recall that the Bellman operator B applied to a function V : S → R is defined as

B(V)(s) = max_a [ R(s, a) + γ Σ_{s'} P(s'|s, a) V(s') ]

(a) (4 Points) Now consider a new operator which first applies a Bellman backup and then applies a function approximation, to map the value function back to a space representable by the function approximator. We will consider a linear value function approximator over a continuous state space. Consider the following graphs. The graphs show linear regression on the sample X0 = {0, 1, 2} for hypothetical underlying functions. On the left, a target function f (solid line) that evaluates to f(0) = f(1) = f(2) = 0, and its corresponding fitted function f̂(x) = 0. On the right, another target function g (solid line) that evaluates to g(0) = 0 and g(1) = g(2) = 1, and its fitted function ĝ(x) = 1/6 + (1/2)x. What happens to the distance between the points {f(0), f(1), f(2)} and {g(0), g(1), g(2)} after we do the linear approximation? In other words, compare max_{x ∈ X0} |f(x) − g(x)| and max_{x ∈ X0} |f̂(x) − ĝ(x)|.

(b) (4 Points) Is the linear function approximator here a contraction operator? Explain your answer.

(c) (4 Points) Will the new operator be guaranteed to converge to a single value function? If yes, will this be the optimal value function for the problem? Justify your answers.

3 Probably Approximately Correct (6 Points)

Let A(α, β) be a hypothetical reinforcement learning algorithm, parametrized in terms of α and β, such that for any α > β > 1 it selects an action a for state s satisfying |Q(s, a) − V*(s)| ≤ αβ on all but N = |S||A| / ((1 − γ)αβ) steps, with probability at least 1 − 1/β². Per the definition of Probably Approximately Correct reinforcement learning, express N as a function of |S|, |A|, δ, ε, and γ. What is the resulting N? Is algorithm A probably approximately correct (PAC)? Briefly justify.
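For Question 1, one way to make the formulation concrete is to enumerate the reachable bankroll levels and the per-action transition distributions. The sketch below is illustrative only: the choice of dollar holdings as states, the terminal set {0, 40}, and the reward convention (change in holdings per turn) are assumptions of this sketch, not part of the original exam text.

```python
# A sketch of one possible MDP encoding for the casino problem (Question 1).
# Assumptions: states are reachable dollar holdings; 0 and 40 are terminal
# (busted / doubled up); reward is the change in holdings each turn; gamma = 1.

START, TERMINALS = 20, {0, 40}

# transitions[s][a] = list of (next_state, probability).
# Machine A: pay $10, win $20 w.p. 0.05. Machine B: pay $20, win $30 w.p. 0.01.
# Only machines the player can afford in state s are listed.
transitions = {
    10: {"A": [(20, 0.05), (0, 0.95)]},
    20: {"A": [(30, 0.05), (10, 0.95)],
         "B": [(30, 0.01), (0, 0.99)]},
    30: {"A": [(40, 0.05), (20, 0.95)],
         "B": [(40, 0.01), (10, 0.99)]},
}

def reward(s, a, s_next):
    """Assumed reward: the net change in holdings on this turn."""
    return s_next - s

# Sanity check: each action's outcome probabilities sum to 1.
for s, acts in transitions.items():
    for a, outcomes in acts.items():
        assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9
```

With γ = 1 and this reward convention, the total return from the starting state equals the final holdings minus $20, which is one natural way to encode the "double your money or lose it all" objective.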
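For Question 2(a), the comparison can be checked numerically: fit ordinary least-squares lines to the two target functions on X0 = {0, 1, 2} and compare the maximum pointwise gaps before and after the fit. This is a minimal sketch using NumPy; treating the "linear approximation" step as plain least squares is an assumption consistent with the regression graphs described above.

```python
import numpy as np

# Sample points and the two target functions from Question 2(a).
X0 = np.array([0.0, 1.0, 2.0])
f = np.array([0.0, 0.0, 0.0])   # f(0) = f(1) = f(2) = 0
g = np.array([0.0, 1.0, 1.0])   # g(0) = 0, g(1) = g(2) = 1

# Ordinary least-squares linear fits (the assumed approximation step).
f_hat = np.polyval(np.polyfit(X0, f, deg=1), X0)   # identically 0
g_hat = np.polyval(np.polyfit(X0, g, deg=1), X0)   # 1/6 + x/2 -> [1/6, 2/3, 7/6]

print("max |f - g|         =", np.max(np.abs(f - g)))          # 1.0
print("max |f_hat - g_hat| =", np.max(np.abs(f_hat - g_hat)))  # 7/6 ~ 1.167
```

Under this least-squares assumption, the maximum gap between the two functions grows from 1 to 7/6 after fitting, i.e., the distance between the fitted functions is larger than the distance between the original target values.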
